Login

prince76 · 02-29-2012, 10:37 AM

Tomorrow is my first presentation on my project Compiler Construction.

I have been writing these stuff which I have to present tomorrow.

feeling nervous , hopefully all goes well ;)

now I have to make PPT slides from this text.

Code:
[ Compiler Construction ]

First of all , what is a Comipler ??

Compiler is a computer program which converts your source code into an assembly code. 

Source code is a code written in programming languages like C++ , C , Java , which are usually termed as High level Language.

Compiler takes your source code and checks it for errors and if not found any , converts it into an assembly code .

assembly code is a code written in a assembly language which is a low level language. 

Construction 

Basically it is a two pass Compiler, 

Front End and back End. 

Front End is composed of two things 

Scanner

Parser 

Back End is composed of many parts 

few of them are 

Code Optimization 

Instruction Selection 

Registry Allocation 

Scanner is called as a lexical analyzer also known as tokenizer.

lexical analysis is the process of converting a sequence of characters into a sequence of tokens.

Token is composed of two things 

<token type , value>

example

Consider this expression in C++ programming language.

    sum=3+2;

sum = identifier

=     Assignment operator

3     =Number

+     =Addition operator

2     =Number

;     =End of statement

Token = <number , 3>

how it works ?

Scanner has some basic definations of the token type for the language we are working on.

firstly we deifne a regular language for the Scanner and then derive regular expressions  from it.

then we convert it into a Non Deterministic finite automata then to Deterministic finite automata  and then minimizes it. 

Regular language can be 

[0-9] means 0 to 9

[a-z]*  * means in a loop                                            b  

a|b  a or b 

thse are regular expressions defined for our language . 

we make a finite automata through this expression.

Lex is a tool which is used to generate this Scanner.

Coding in C++ for Scanner

%

{#include "tokendef.h"

}

%

D   [0-9]

L  [a-zA-Z_]

id {L} ({L} {D})*

%%

void {return(TOK_VOID); }

int {return (TOK_INT}; }

(D)+

while   {return (TOK_WHILE);}

if      {return (TOK_IF);}

else    {return (TOK_ELSE);}

%%

#define void 1

#define int 2

#define wihle 3

#define if 4

#define else 5

.cpp file 

#include 

int main(){

FlexLexer lex;

int tc = lex. [method name  which basically contains all the defination statement ]

while(tc !=0){

cout <<tc <<" , " <<lex.YYText() <<endl;

tc =lex.[same method name]

}

}

}

Parser 

Parser is used to create an Intermediate Representation of this tokens. 

one form of this IR is Abstract Syntax Tree [AST]

which is derived from Parse Tree.

Parser deals with the Syntax and Semantics.

it basically checks wheather your expression belongs to your programming language or not

lets take an example to further explain this 

expression

fi(a=20;a<78;a++){

as per the syntax it should be "if" , instead of "fi"

also , there is no semicolon at the end.

this is what you call syntax.

Scanner will pass this as an identifier.

what is semantics?

Linguistic semantics is the study of meaning that is used to understand human expression through language.

Other forms of semantics include the semantics of programming languages.

lets take an example to explain this 

suppose you initialize a variable "a" 

and do not declare it,

our Comipler should throw a semantic error.

The whole concept of Parser is based on CFG

which is Context free Grammer. 

It is a formal grammar in which every production rule is of the form

    V --> w

where V is a single nonterminal symbol, and w is a string of nonterminals.

There are certain rules defined in CFG which helps  us to make Parser.

There are variuos types of parsing technique

top down parsing 

bottom up parsing

preductive parsing

recursive descend parsing

Backup Normal Form(BNF) notation was used to derive certain rules for making syntax for programming languages.

These rules are based on 

[Context free Grammer]

the way of doing this is 

take your expression and start deriving it 

either form the left hand side or right hand side 

example 

goal --> expr 

these expr is further derived .

lets take an example 

expression x-2*y

lets do a leftmost derivation 

goal ---> expr

expr--?> expr , op, term 

expr --->expr , op , term 

result --> (x-2)*y

when you do this same derivation by using right most derivation 

result would be x-(2*y)

now in the case of programming language, 

our expressions should be based on BODMAS rule.

for this we define levels.

Top up down parsing 

in this method , for the construction of our parse tree, we follow our path from root to last leaf

let us take an example 

same expression used before

x-2*y

we follow the construction from root .

this is our rule for defining grammer 

goal -> expr

expr -> expr + term   [here we are taking + just to try out our rule]

later we will see that we have not applied the correct rule , then we will change our operator.

how we will do this ? by backtrack

backtrack is a method which tells you to go back and change your rule.

now we will change our operator to - wherever it was +.

our last term in this rule is  always factor

replace your actual value by factor.

this technique only works for right recursive grammer 

it fails when left recursive grammer occurs.

eg for left recursive grammer

term --> term * factor

now we have to convert this left recursive grammer to right one 

expr --> term  expr'

expr' ----> + term expr'

expr'------> - term expr'

expr'----->e [epsolong sign]

this is done by using epsolong sign.

Preductive Parsing

this technique is known as Preductive Parsing because we predict for the next token.

it is based on the LL(k) property

L--> left to right scan

L-->  left side parsing

k is the look ahead token.

very complex technique to use.

LL(k) property is true

when A->a amd A->b

then first(a) U first(b) = null;

this property was basically used in Recursive descend Parsing.

in this we have to make functions of each non terminal , i.e for expr(), term(), Eprime().

We use OOP language C++ , so that we can use objects and concept of Inheritance to make it simpler and easy.

first thing we have to do is make a SuperClass NonTerminal and then all the non terminals will be the sub class of this superclass.

class NonTerminal{

public:NonTerminal(Scanner* sc){

s= sc; tree =null;

}

virtual = ~NonTerminal();

virtual bool isPresent() =0;

TreeNode* AST(){return tree;}

protected:

scanner* s;

TreeNode* tree;

}

our Grammer rule 

goal --->expr

expr---> term expr*  

expr* --> + term expr'

    | - term expr*

we have to make subclasses of each Non Terminal

class expr:public NonTerminal{

bool expr::isPresent(){

Term* op1 = new Term(s);

if(op1->ispresent())

return false;

tree = op1->AST();

Eprime* op2 =new Eprime(s,tree);

if(op2->ispresent())

tree = op2->AST();

return true;

Subclass for Eprime 

bool Eprime::ispresent(){

int op = s->nextToken();

if(op==PLUS || op==MINUS){

s->advance();

Term* op2 = new Term(s);

if(!op2->isPresent()

{syntaxError(s);

TreeNode* t2 = op2->AST();

tree = new TreeNode(op, exprsofar, t2);

Eprime op3 = new Eprime(s, tree);

if(op3->isPresent())

tree = op3->AST();

return true;

}

else return false;

More to Come [] . .

There are many corrections to be done , please let me know if someone can , thanks Smile

Rhynorater · 04-07-2012, 09:30 AM

Ummm, that looks pretty good bro! It's a little over my head, but I think you got dis!

Login
Username:
Password:	Lost Password?
	Remember me