Computer Science 3rd year some specific parts pdf, Study notes of Computer Science

Typology: Study notes

2016/2017

Uploaded on 11/29/2017

saavan-singh

Yet Another Compiler-Compiler

Automatic generation of CF parsers

YACC – Yet Another Compiler-Compiler

  • YACC (Bison) is a parser generator for LALR(1) grammars

▫ Given a description of the grammar, it generates the C source of the parser

  • The input is a file that contains the grammar description with a formalism similar to the BNF (Backus-Naur Form) notation for language specification

▫ non terminal symbols – lowercase identifiers

 expr, stmt

▫ terminal symbols – uppercase identifiers or single characters

 INTEGER, FLOAT, IF, WHILE, ';', '.'

▫ Grammar rules (production rules)

 expr: expr '+' expr | expr '*' expr ;   (E → E+E | E*E)

Language processing technologies Marco Maggini

YACC – integration with the lexical scanner

  • A lexical scanner is exploited to detect the terminal symbols in the parsed file

▫ The parser generated by YACC calls the lexical scanner whenever it needs to read the next terminal symbol from the input

  • The parser is implemented by the function yyparse(), which relies on the lexical scanner (e.g. yylex()), the error handling functions and the caller procedure

[Figure: the input file is read by the lexical scanner, which passes the next token (INTEGER, IDENTIFIER, ….) to the parser; the parser executes the actions]
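The division of labour between scanner and parser can be sketched in plain C. This is a hand-written stand-in, not the code that YACC or LEX generate: the token names INTEGER and IDENTIFIER come from the figure, their numeric values are an assumption, yylval is assumed to be a plain int, and set_input() and count_tokens() are hypothetical helpers.

```c
#include <ctype.h>
#include <stdlib.h>

/* Named tokens get codes above the character range;
   the exact values are an assumption of this sketch. */
enum { INTEGER = 258, IDENTIFIER = 259 };

int yylval;                      /* semantic value of the last token */
const char *input = "";          /* hypothetical input buffer */
size_t pos = 0;                  /* current position in the buffer */

void set_input(const char *s) { input = s; pos = 0; }

/* Returns the code of the next token, or 0 at the end of the input. */
int yylex(void)
{
    while (input[pos] == ' ')
        pos++;                                   /* skip blanks */
    if (input[pos] == '\0')
        return 0;
    if (isdigit((unsigned char)input[pos])) {
        char *end;
        yylval = (int)strtol(input + pos, &end, 10);
        pos = (size_t)(end - input);
        return INTEGER;
    }
    if (isalpha((unsigned char)input[pos])) {
        while (isalnum((unsigned char)input[pos]))
            pos++;
        return IDENTIFIER;
    }
    return (unsigned char)input[pos++];          /* single character: its own code */
}

/* Stand-in for the parser loop: ask for tokens until yylex() returns 0. */
int count_tokens(void)
{
    int n = 0;
    while (yylex() != 0)
        n++;
    return n;
}
```

A real yyparse() would decide, for each token returned by yylex(), whether to shift it or to reduce a rule; here the loop only counts the tokens.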

Grammar file

  • The grammar is defined in a text file (usually with extension .y)

%{
C declarations (include/define/global variables)
%}
YACC definitions (terminal/non-terminal symbols and their properties)
%%
grammar rules and associated actions
%%
C code (it is copied into the generated file after the yyparse() function)

Rule definition- 1

  • A grammatical rule has the following structure

result: components….. ;

▫ result is the non terminal symbol to which the right side of the production rule is reduced

▫ The right side of the production rule is a sequence of components that consist of terminal symbols, non terminal symbols and actions (C code between {…})

expr: expr '+' expr {$$=$1+$3; }

 '+' is a terminal symbol
 {$$=$1+$3; } is the ACTION

Rule definition- 2

  • Alternative reductions for the same non terminal symbol can be listed, separated by |

expr: expr '+' expr {$$=$1+$3}
    | expr '*' expr {$$=$1*$3}
    ;

  • If the right side of the production rule is empty, the rule is satisfied by the empty string

expr: /* empty */
    | expr1
    ;

  • The rule is recursive if the non terminal symbol on the left side (result) appears also in the right side (it is better to avoid right recursion, which needs parser stack space proportional to the length of the sequence)

exprseq: expr
       | exprseq ',' expr
       ;
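The advice about right recursion can be illustrated with a small simulation. It is only a sketch under simplifying assumptions: each expr is counted as a single symbol on the parser stack, the right-recursive variant exprseq: expr | expr ',' exprseq is not part of the slides and is introduced here for comparison, and the function names are hypothetical.

```c
/* Left recursion:  exprseq: expr | exprseq ',' expr ;
   Every new element is reduced as soon as it has been read. */
int max_depth_left(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        depth += (i == 0) ? 1 : 2;   /* shift expr, or ',' and expr */
        if (depth > max)
            max = depth;
        depth = 1;                   /* reduce at once to a single exprseq */
    }
    return max;
}

/* Right recursion (assumed variant):  exprseq: expr | expr ',' exprseq ;
   Nothing can be reduced before the last element has been read. */
int max_depth_right(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        depth += (i == 0) ? 1 : 2;   /* symbols keep piling up */
        if (depth > max)
            max = depth;
    }
    return max;                      /* reductions start only after the last expr */
}
```

With ten elements the left-recursive rule never holds more than three symbols on the stack, while the right-recursive one holds nineteen before the first reduction.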

Semantics definition- 2

  • If different data types are to be used for different symbols

▫ The complete list of data types must be specified in the YACC declarations with %union

▫ One of the declared data types is associated to a terminal/non terminal symbol with the declarations %token and %type

%union {
  double val;
  char *sptr;
}
%token <val> NUM
%type <sptr> string_ass

 val and sptr are the names associated to the types in YACC
 double and char * are the C types
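In C terms, the %union declaration roughly corresponds to a union type that holds every semantic value. The sketch below assumes the conventional type name YYSTYPE; the helpers num_value() and string_value() are hypothetical and only show which member is touched for each declared name.

```c
/* Roughly what  %union { double val; char *sptr; }  turns into. */
typedef union {
    double val;    /* member selected by the name val  */
    char  *sptr;   /* member selected by the name sptr */
} YYSTYPE;

/* Hypothetical helpers: the name chosen with %token / %type decides
   which member an action reads or writes for that symbol. */
YYSTYPE num_value(double d)
{
    YYSTYPE v;
    v.val = d;
    return v;
}

YYSTYPE string_value(char *s)
{
    YYSTYPE v;
    v.sptr = s;
    return v;
}
```

Since a union stores one member at a time, reading val from a value that was written through sptr is meaningless; the declarations let YACC pick the right member automatically.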

Semantics definition- 3

  • The action is a C code block that is executed when a production rule is applied (reduction)

▫ Actions can also appear between the symbols of the right side of the production rule; in this case they are executed when the rule is partially matched (this makes rules harder to understand)

▫ The action C code can refer to the semantic values associated to the symbols of the rule

 The value of the n-th component is associated to the identifier $n
 $$ represents the left side value
 If no action is specified then $$=$1 by default
 The data type of $n is the one declared for the corresponding symbol (if needed, it can be overridden with $<type>n)

expr: expr '+' expr {$$=$1+$3} ;

 $$ is the value of the left side expr, $1 and $3 are the values of the two expr on the right side
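How $$, $1 and $3 map onto the parser's data can be sketched with an explicit value stack. This is an assumption-laden illustration, not the generated code: vstack, vtop, push_value() and reduce_add() are hypothetical names and all values are doubles.

```c
#include <assert.h>

#define STACK_MAX 64

double vstack[STACK_MAX];   /* hypothetical semantic value stack */
int vtop = 0;               /* number of values currently on the stack */

void push_value(double v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Reduction by  expr: expr '+' expr {$$=$1+$3} ;
   The values of the three right side symbols are the top three slots. */
double reduce_add(void)
{
    assert(vtop >= 3);
    double d1 = vstack[vtop - 3];   /* $1: left expr                 */
                                    /* vstack[vtop - 2] is $2 ('+')  */
    double d3 = vstack[vtop - 1];   /* $3: right expr                */
    double result = d1 + d3;        /* $$ = $1 + $3                  */
    vtop -= 3;                      /* pop the right side            */
    push_value(result);             /* push $$ for the left side     */
    return result;
}
```

After the reduction the three slots of the right side are replaced by a single slot holding $$, which becomes $1 or $3 of an enclosing rule.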

The interface with C

  • The parse function yyparse() reads terminal symbols, executes the actions and returns when

▫ The end of the input is reached (return value 0)

▫ A fatal syntax error is found (return value 1)

▫ One of the macros YYACCEPT (return value 0) or YYABORT (return value 1) is invoked in an action

  • The terminal symbols are detected by the lexical scanner (for instance yylex()), which returns the corresponding code (ASCII code or YACC #define value)

▫ The semantic value associated to the terminal symbol, if any, must be stored into the global variable yylval

▫ The type of yylval is YYSTYPE: a single data type if only one is used in YACC, otherwise a C union data structure
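The caller side of this interface can be sketched as follows. The function yyerror() is the error handling function that the parser calls; error_count and exit_code_for() are hypothetical names introduced for this example, and yyparse() itself is not defined here because it comes from the generated file.

```c
#include <stdio.h>
#include <stdlib.h>

int error_count = 0;   /* hypothetical counter kept by the caller */

/* Error handling function called by the parser with a message
   such as "syntax error". */
void yyerror(const char *msg)
{
    error_count++;
    fprintf(stderr, "error: %s\n", msg);
}

/* Hypothetical helper: turns the value returned by yyparse()
   into a process exit code (0 means the input was accepted,
   any other value is treated as a failure). */
int exit_code_for(int rc)
{
    return rc == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A caller procedure would then typically end with something like return exit_code_for(yyparse());.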


The interface with C - yylval

  • If yylval has a single data type, its value is assigned in the lex action as

…. yylval = value; return ID; }

  • If several data types are used, declared with %union { double val; char *sptr; }, the matching member of the union is assigned

…. yylval.sptr = string; return STRING; }

▫ If the assigned value is a pointer, the address must refer to the global memory space or to the heap (dynamic allocation with malloc)
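The remark about pointers matters because the scanner reuses its text buffer for the next token. The sketch below shows one way a lex action for a string token could copy the text to the heap; the token code for STRING, the stand-alone declaration of yylval and the helper string_token() are assumptions made to keep the example self-contained.

```c
#include <stdlib.h>
#include <string.h>

enum { STRING = 258 };                       /* assumed token code */

union { double val; char *sptr; } yylval;    /* stand-in for the YACC union */

/* What the action for a string token might do: yytext is the scanner
   buffer, so its content is copied before the pointer is handed over. */
int string_token(const char *yytext)
{
    size_t n = strlen(yytext) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return 0;                 /* sketch: allocation failure ends the input */
    memcpy(copy, yytext, n);
    yylval.sptr = copy;           /* must be freed later by the parser side */
    return STRING;
}
```

Storing yytext itself in yylval.sptr would leave the parser with a pointer to text that changes as soon as the next token is scanned.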

Error handling - 2

  • After the call to yyerror() the parser tries to recover from the error condition if an error recovery policy is implemented; otherwise yyparse exits returning 1
  • The variable yynerrs stores the number of errors encountered so far (it is a global variable for non-reentrant parsers)
  • In general it is preferable to avoid halting the parser at the first error

▫ The error handling policy can be defined by exploiting the special terminal symbol error in the production rules

▫ The special symbol error is generated by the parser every time a syntax error is found

Error handling- 3

stmt: /* empty */
    | stmt '\n'
    | stmt exp '\n'
    | stmt error '\n'

  • If there is an error in exp

▫ Some incomplete derivations in the parser stack and some terminal symbols in the input are likely to be found before an input '\n' is matched

▫ The parser forces the application of the rule by removing part of the syntactical context from the stack and from the input

 it pops states and objects from the stack until it reaches a state where the rule containing error can be applied (it goes back to the previous stmt)
 it pushes the symbol error onto the stack
 it discards input symbols until it finds a matching lookahead terminal symbol ('\n' in this case)
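The recovery steps can be mimicked with two small functions working on arrays. This is a simplified model, not the parser's real tables: accepts_error marks, for each stack entry, whether the rule containing error can be applied from that state, tokens is the remaining input, and both function names are hypothetical.

```c
#include <stddef.h>

/* Step 1: pop stack entries until the topmost one is a state from which
   the rule containing error can be applied. Returns the new stack depth
   (0 if no such state exists). */
size_t pop_to_error_state(const int *accepts_error, size_t depth)
{
    while (depth > 0 && !accepts_error[depth - 1])
        depth--;                  /* entry popped */
    return depth;
}

/* Step 3: starting at index pos, discard input tokens until the
   synchronising token '\n' is the lookahead. Returns its index,
   or n if the input ends first. */
size_t skip_to_newline(const int *tokens, size_t n, size_t pos)
{
    while (pos < n && tokens[pos] != '\n')
        pos++;                    /* token discarded */
    return pos;
}
```

Between the two steps the parser pushes the symbol error; after the '\n' is matched, the rule stmt error '\n' is reduced and parsing continues with the next line.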

Error handling– the art of… 2

  • If the wrong error policy is used, a syntax error can be the cause of another one...
  • To avoid an uncontrolled flood of error messages, the parser does not report a syntax error that is found just after the previous one (three consecutive input symbols must be read successfully before a new error is reported)
  • The reporting of error messages can be reactivated immediately by invoking the macro yyerrok in the action
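The suppression of cascading messages amounts to a small counter. The sketch below is a model of that idea with hypothetical names (error_status, reported, on_syntax_error(), on_token_shifted(), errok()); the threshold is a named constant set to three, which is the number of consecutive shifted tokens given in the Bison manual.

```c
#define RESUME_AFTER 3    /* shifted tokens needed before reporting resumes */

int error_status = 0;     /* tokens still to shift before a new report */
int reported = 0;         /* number of messages actually issued */

/* Called when a syntax error is detected. */
void on_syntax_error(void)
{
    if (error_status == 0)
        reported++;               /* a real parser would call yyerror() here */
    error_status = RESUME_AFTER;  /* following errors are silenced for a while */
}

/* Called every time a token is shifted successfully. */
void on_token_shifted(void)
{
    if (error_status > 0)
        error_status--;
}

/* Effect of yyerrok in an action: reporting is re-enabled at once. */
void errok(void)
{
    error_status = 0;
}
```

An error that occurs while error_status is still positive triggers recovery again but produces no new message.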

An example- calculator

  • Parser to implement the operations of a multifunction calculator that has the following features

▫ Arithmetic operators ('+', '-', '*', '/', '^')

▫ Predefined functions (sin, cos, exp, log,…) to be invoked as f(x)

▫ Named variables and assignments (v=1)

  • Source file for YACC/Bison
  • Source file for LEX
  • Utility file in C (symbol table management)
  • The parser C source is generated by the command bison -d calc.y

▫ The generated files are calc.tab.c and calc.tab.h
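The symbol table utility mentioned in the list could look like the following linked list. It is a minimal sketch, not the file used in the course: the type symrec and the functions getsym() and putsym() are assumed names, and only named variables holding a double are handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry: a named variable and its value. */
typedef struct symrec {
    char *name;
    double value;
    struct symrec *next;
} symrec;

symrec *sym_table = NULL;    /* head of the list */

/* Looks a name up; returns NULL if it is not in the table. */
symrec *getsym(const char *name)
{
    for (symrec *p = sym_table; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Creates the variable if needed and assigns its value (v=1). */
symrec *putsym(const char *name, double value)
{
    symrec *p = getsym(name);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        p->name = malloc(strlen(name) + 1);
        if (p->name == NULL) {
            free(p);
            return NULL;
        }
        strcpy(p->name, name);
        p->next = sym_table;     /* insert at the head of the list */
        sym_table = p;
    }
    p->value = value;
    return p;
}
```

The scanner would call getsym() when it recognises an identifier, and the action of the assignment rule would call putsym() to store the new value.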
