Yet Another Compiler-Compiler
Automatic generation of CF parsers
YACC – Yet Another Compiler-Compiler
- YACC (Bison) is a parser generator for LALR(1) grammars
▫ Given a description of the grammar, it generates the C source of the parser
- The input is a file that contains the grammar description with a formalism similar to the BNF (Backus-Naur Form) notation for language specification
▫ non terminal symbols – lowercase identifiers
expr, stmt
▫ terminal symbols – uppercase identifiers or single characters
INTEGER, FLOAT, IF, WHILE, ';', '.'
▫ Grammar rules (production rules)
expr: expr '+' expr | expr '*' expr ;
E → E+E | E*E
Language processing technologies Marco Maggini
YACC – integration with the lexical scanner
- A lexical scanner is exploited to detect the terminal symbols in the parsed file
▫ The parser generated by YACC calls the lexical scanner whenever it
needs to read the next terminal symbol from the input
- The parse is implemented by the function yyparse() that needs the support of the lexical scanner (e.g. yylex()), the error handling functions and the caller procedure
[Figure: the lexical scanner reads the input file and passes the next token (INTEGER, IDENTIFIER, ...) to the parser, which executes the actions]
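The scanner side of this interface can be sketched by hand in C. The sketch below is only an illustration, not the scanner used in the course: the token code values, the `input` buffer and the helper `scan_string()` are assumptions introduced here (in a real project the token codes come from the header generated by YACC/Bison, and the scanner is usually generated by lex). It follows the usual convention of returning 0 at the end of the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes: normally generated by YACC/Bison in a header file.
   The values used here are assumptions for illustration. */
enum { INTEGER = 258, IDENTIFIER = 259 };

int yylval;                     /* semantic value of the last token */
static const char *input = "";  /* hypothetical in-memory input buffer */

/* Hypothetical helper: select the text to be scanned. */
void scan_string(const char *text) {
    assert(text != NULL);
    input = text;
}

/* Returns the code of the next terminal symbol: a token code,
   the character code for single-character tokens, or 0 at end of input. */
int yylex(void) {
    while (*input == ' ' || *input == '\t')
        input++;                              /* skip blanks */
    if (*input == '\0')
        return 0;                             /* end of input */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return INTEGER;                       /* value stored in yylval */
    }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input))
            input++;
        return IDENTIFIER;
    }
    return (unsigned char)*input++;           /* single character, e.g. '+' */
}
```

After `scan_string("x1 + 42;")`, successive calls to `yylex()` return IDENTIFIER, '+', INTEGER (with `yylval` set to 42), ';' and finally 0.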
Grammar file
- The grammar is defined in a text file (usually with extension .y)
    %{
    C declarations (include/define/global variables)
    %}
    YACC definitions (terminal/non-terminal symbols and their properties)
    %%
    grammar rules and associated actions
    %%
    C code (it is copied into the generated file after the yyparse() function)
Rule definition- 1
- A grammatical rule has the following structure
result: components... ;
▫ result is the non terminal symbol to which the right side of the
production rule is reduced
▫ The right side of the production rule is a sequence of components that
consist of terminal symbols, non terminal symbols and actions (C code
between {…})
    expr: expr '+' expr { $$ = $1 + $3; }
('+' is a terminal symbol; the C block between braces is the action)
Rule definition- 2
- Alternative reductions for the same non terminal symbol can be listed, separated by |
    expr: expr '+' expr { $$ = $1 + $3; }
        | expr '*' expr { $$ = $1 * $3; }
        ;
- If the right side of the production rule is empty, the rule is satisfied by the
empty string
    expr: /* empty */
        | expr1
        ;
- The rule is recursive if the non terminal symbol on the left side (result)
also appears on the right side (it is better to avoid right recursion, which makes the parser stack grow with the length of the sequence)
    exprseq: expr
           | exprseq ',' expr
           ;
Semantics definition- 2
- If different data types are to be used for different symbols
▫ The complete list of data types must be specified in the YACC
declarations with %union
▫ One of the declared data types is associated with a terminal/non terminal
symbol through the declarations %token and %type
    %union {
        double val;   /* val, sptr: names associated with the types in YACC */
        char *sptr;   /* double, char *: C types */
    }
    %token <val> NUM
    %type <sptr> string_ass
Semantics definition- 3
- The action is a C code block that is executed when a production rule is
applied (reduction)
▫ Actions can also appear between the symbols in the string of the right
side of the production rule and, in this case, they are executed when the
rule is partially matched (this makes rules less clear to understand)
▫ The action C code can refer to the semantic values associated with the
symbols of the rule
  ▫ The value of the n-th symbol on the right side is associated with the identifier $n
  ▫ $$ represents the left side value
  ▫ If no action is specified then $$=$1 by default
  ▫ The data type of $n is the one declared for the corresponding symbol (it can be overridden explicitly with $<type>n)
    expr: expr '+' expr { $$ = $1 + $3; } ;
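What the action `$$ = $1 + $3` does at reduction time can be pictured with a small C model of a semantic value stack. This is a simplified sketch under stated assumptions, not the code that YACC/Bison actually generates: the names `values`, `top`, `push_value()` and `reduce_add()` are hypothetical, and a real parser keeps a state stack and uses tables as well.

```c
#include <assert.h>

#define STACK_MAX 64           /* hypothetical fixed capacity */

int values[STACK_MAX];         /* semantic value stack (model only) */
int top = 0;                   /* number of values currently stored */

/* Models a shift: the semantic value of a symbol is pushed. */
void push_value(int v) {
    assert(top < STACK_MAX);
    values[top++] = v;
}

/* Models the reduction by  expr: expr '+' expr { $$ = $1 + $3; }  */
int reduce_add(void) {
    assert(top >= 3);
    int *rhs = &values[top - 3];    /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];   /* $$ = $1 + $3 */
    top -= 3;                       /* the right side is popped ... */
    push_value(result);             /* ... and replaced by the value of $$ */
    return result;
}
```

Pushing 2, the value slot of '+', and 5, then calling `reduce_add()`, leaves the single value 7 on the stack: the three right-side values have been replaced by the left-side value.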
The interface with C
- The parse function yyparse() reads terminal symbols, executes the actions and returns when
▫ The end of file is reached (return value 0)
▫ A fatal syntax error is found (return value 1)
▫ One of the macros YYACCEPT (return value 0) or YYABORT (return value 1) is
invoked in an action
- The terminal symbols are detected by the lexical scanner (e.g. yylex()), which returns the corresponding code (the character code for single-character tokens, or the value assigned by YACC with #define)
▫ The semantic value associated with the terminal symbol, if any, must be
stored into the global variable yylval
▫ If a single data type YYSTYPE is used in YACC, then yylval is of type
YYSTYPE, otherwise it is a C union data structure
The interface with C - yylval
- If yylval is of a single data type, in lex its value is assigned as
    …. yylval = value; return ID; }
- If several data types are used, the field of the union is selected explicitly
    %union { double val; char *sptr; }
    …. yylval.sptr = string; return STRING; }
▫ If the assigned value is a pointer, the address should be in the global
memory space or in the heap (dynamic allocation with malloc)
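The reason for the heap requirement is that the scanner's text buffer (yytext in lex) is reused when the next token is read, so a pointer to it would no longer be valid when the parser uses the value. A minimal sketch of the copy, assuming the union shown above: the helpers `string_token()` and `number_token()` and the token code values are hypothetical, introduced only for illustration, and YYSTYPE is written by hand here although YACC/Bison normally generates it from %union.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C counterpart of  %union { double val; char *sptr; }
   (normally generated by YACC/Bison; written by hand for this sketch). */
typedef union {
    double val;
    char *sptr;
} YYSTYPE;

YYSTYPE yylval;

enum { NUM = 258, STRING = 259 };   /* assumed token codes */

/* Hypothetical helper that a lex action could call with yytext and yyleng:
   the text is copied to the heap because the scanner buffer is reused. */
int string_token(const char *text, size_t len) {
    assert(text != NULL);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        exit(EXIT_FAILURE);         /* sketch only: no recovery attempted */
    memcpy(copy, text, len);
    copy[len] = '\0';
    yylval.sptr = copy;             /* the parser actions own the copy */
    return STRING;
}

/* Hypothetical helper for numeric tokens: the value is stored by copy,
   so no allocation is needed. */
int number_token(const char *text) {
    assert(text != NULL);
    yylval.val = strtod(text, NULL);
    return NUM;
}
```

With this scheme the string stays valid after the scanner moves on, and some parser action (or the code that consumes the result) is responsible for calling free() on it.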
Error handling - 2
- After the call to yyerror() the parser tries to recover from the error condition if an error recovery rule is defined in the grammar; otherwise it exits yyparse(), returning 1
- The variable yynerrs stores the number of encountered errors (it is a global variable for non-reentrant parsers)
- In general it is preferable to avoid halting the parser at the first error
▫ The error handling policy can be defined by exploiting the special
terminal symbol error in the production rule
▫ The special symbol error is generated by the parser every time that a
syntax error is found
Error handling- 3
- If there is an error in exp
▫ Some incomplete derivations in the parser stack and terminal symbols in the input are likely to be found before an input '\n' is matched
▫ The parser forces the application of the rule, removing part of the syntactical context from the stack and the input
  ▫ it removes states and objects from the stack until the rule containing error is matched (it finds the previous stmt)
  ▫ it pushes the symbol error onto the stack
  ▫ it reads input symbols until it finds a matching lookahead terminal symbol ('\n' in this case)
    stmt: /* empty */
        | stmt '\n'
        | stmt exp '\n'
        | stmt error '\n'
        ;
Error handling– the art of… 2
- If the wrong error policy is used, a syntax error can be the cause of another one...
- To avoid an uncontrolled generation of error messages the parser does not report a new error message for a syntax error that is found just after the last one (three consecutive input symbols must be shifted successfully before a new error is reported)
- The reporting of error messages can be reactivated immediately by invoking the macro yyerrok in the action
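The suppression policy can be modelled with a single counter. The sketch below is a simplified model with hypothetical names (`errstatus`, `reported_errors`, `on_shift()`, `on_syntax_error()`, `errok()`), not the generated parser code; it uses the threshold of three shifted tokens that the Bison manual documents, and `errok()` plays the role of yyerrok.

```c
#include <assert.h>

int reported_errors = 0;   /* model of the error counter (yynerrs) */
int errstatus = 0;         /* > 0 while error messages are suppressed */

/* Called whenever an input token is shifted successfully. */
void on_shift(void) {
    if (errstatus > 0)
        errstatus--;
}

/* Called on a syntax error; returns 1 if a message is reported,
   0 if it is suppressed because recovery is still in progress. */
int on_syntax_error(void) {
    int reported = 0;
    if (errstatus == 0) {
        reported_errors++;
        reported = 1;
    }
    errstatus = 3;         /* three shifts needed before the next message */
    return reported;
}

/* Model of yyerrok: reporting is reactivated immediately. */
void errok(void) {
    errstatus = 0;
}
```

In this model an error that follows the previous one after a single shifted token is silent, while an error that occurs after three shifts, or right after `errok()`, is reported again.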
An example- calculator
- Parser to implement the operations of a multifunction calculator that has the following features
▫ Arithmetic operators ('+', '-', '*', '/', '^')
▫ Predefined functions (sin, cos, exp, log,…) to be invoked as f(x)
▫ Variables with variable names and assignments (v=1)
- Source file for YACC/Bison
- Source file for LEX
- Utility file in C (symbol table management)
- The parser C source is generated by the command bison -d calc.y
▫ The generated files are calc.tab.c and calc.tab.h
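The symbol table utility listed above could look like the following C sketch, which keeps the variables of the calculator in a linked list. It is only one possible design under stated assumptions: the names `symrec`, `sym_table`, `getsym()` and `putsym()` are chosen here for illustration and are not taken from the course files.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry of the symbol table: a variable name and its current value. */
typedef struct symrec {
    char *name;
    double value;
    struct symrec *next;
} symrec;

symrec *sym_table = NULL;      /* head of the list */

/* Returns the entry with the given name, or NULL if it is not defined. */
symrec *getsym(const char *name) {
    assert(name != NULL);
    for (symrec *p = sym_table; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Defines or updates a variable; returns NULL if memory is exhausted. */
symrec *putsym(const char *name, double value) {
    assert(name != NULL);
    symrec *p = getsym(name);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        p->name = malloc(strlen(name) + 1);
        if (p->name == NULL) {
            free(p);
            return NULL;
        }
        strcpy(p->name, name);  /* private copy: the scanner buffer is reused */
        p->next = sym_table;    /* insert at the head of the list */
        sym_table = p;
    }
    p->value = value;
    return p;
}
```

An assignment such as v=1 would then call `putsym("v", 1.0)` in its action, and a later reference to v would read `getsym("v")->value` after checking that the result is not NULL.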