1Introduction• What is YACC ?– Tool which will produce a parser for a given grammar.– YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of a language produced by this grammar.History• Yacc original written by Stephen C. Johnson, 1975.•Variants:– lex, yacc (AT&T)– bison: a yacc replacement (GNU)– flex: fast lexical analyzer (GNU)– BSD yacc– PCLEX, PCYACC (AbraxasSoftware) yaccHow YACC Works(1) ParseYACC source (foo.y)y.tab.hy.tab.ccc / gcc(2) Compiley.tab.ca.outa.out(3) RunToken streamAbstractSyntaxTreey.outputAn YACC File Example%{#include <stdio.h>%}%token NAME NUMBER%%statement: NAME '=' expression | expression { printf("= %d\n", $1); } ;expression: expression '+' NUMBER { $$ = $1 + $3; } | expression '-' NUMBER { $$ = $1 - $3; } | NUMBER { $$ = $1; } ;%%int yyerror(char *s){ fprintf(stderr, "%s\n", s); return 0;}int main(void){ yyparse(); return 0;}YACC File Format%{C declarations%}yacc declarations%%Grammar rules%%Additional C code– Comments in /* ... */ may appear in any of the sections. Definitions Section%{#include <stdio.h>#include <stdlib.h>%}%token ID NUM%start exprIt is a terminalstart from expr2Start Symbol• The first non-terminal specified in the grammar specification section.• To overwrite it with %start declaraction.%start non-terminalRules Section• Is a grammar• Exampleexpr : expr '+' term | term;term : term '*' factor | factor;factor : '(' expr ')' | ID | NUM;Rules Section• Normally written like this• Example:expr : expr '+' term | term;term : term '*' factor | factor;factor : '(' expr ')' | ID | NUM;The Position of Rulesexpr : expr '+' term { $$ = $1 + $3; }| term { $$ = $1; };term : term '*' factor { $$ = $1 * $3; }| factor { $$ = $1; };factor : '(' expr ')' { $$ = $2; }| ID| NUM;Works with LEX12 + 26call yylex()[0-9]+next token is NUMNUM ‘+’ NUMLEX and YACC need a way to identify tokensCommunication between LEX and YACCyacc -d gram.yWill produce:y.tab.h• Use enumeration / define• YACC creates y.tab.h• LEX includes y.tab.h3Communication between LEX and YACC%{#include <stdio.h>#include "y.tab.h"%}id [_a-zA-Z][_a-zA-Z0-9]*%%int { return INT; }char { return CHAR; }float { return FLOAT; }{id} { return ID;}%{#include <stdio.h>#include <stdlib.h>%}%token CHAR, FLOAT, ID, INT%%yacc -d xxx.yproducesy.tab.h# define CHAR 258# define FLOAT 259# define ID 260# define INT 261parser.yscanner.lYACC• Rules may be recursive• Rules may be ambiguous*• Uses bottom up Shift/Reduce parsing– Get a token– Push onto stack– Can it reduced ?• yes: Reduce using a rule• no: Get another token• Yacc cannot look ahead more than one tokenPassing value of token• Every terminal-token (symbol) may represent a value or data type– May be a numeric quantity in case of a number (42)– May be a pointer to a string ("Hello, World!")• When using lex, we put the value into yylval– In complex situations yylval is a union• Typical lex code:[0-9]+ {yylval = atoi(yytext); return NUM}Passing value of token• Yacc allows symbols to have multiple types of value symbols%union {double dval;int vblno;char* strval;}Passing value of token%union {double dval;int vblno;char* strval;}yacc -dy.tab.h…extern YYSTYPE yylval;[0-9]+ { yylval.vblno = atoi(yytext);return NUM;}[A-z]+ { yylval.strval = strdup(yytext);return STRING;}Lex fileinclude “y.tab.h”Yacc Example• Taken from Lex & Yacc• Example: Simple calculatora = 4 + 6aa=10b = 7c = a + bcc = 17$4Grammarexpression ::= expression '+' term |expression '-' term | termterm ::= term '*' factor | term '/' factor | factorfactor ::= '(' expression ')' |'-' factor |NUMBER | NAME #define NSYMS 20 /* maximum number of symbols */struct symtab {char *name;double value;} symtab[NSYMS];struct symtab *symlook();parser.hname value0name value1name value2name value3name value4name value5name value6name value7name value8name value9name value10Symbol Table%{#include "parser.h"#include <string.h>%}%union {double dval;struct symtab *symp;}%token <symp> NAME%token <dval> NUMBER%type <dval> expression%type <dval> term%type <dval> factor%%parser.yParserTerminal NAME and <symp>have the same data type.Nonterminal expression and <dval> have the same data type.statement_list: statement '\n'| statement_list statement '\n';statement: NAME '=' expression { $1->value = $3; }| expression { printf("= %g\n", $1); };expression: expression '+' term { $$ = $1 + $3; }| expression '-' term { $$ = $1 - $3; }| term;parser.yParser (cont’d)term: term '*' factor { $$ = $1 * $3; }| term '/' factor { if ($3 == 0.0)yyerror("divide by zero");else$$ = $1 / $3;}| factor;factor: '(' expression ')' { $$ = $2; }| '-' factor { $$ = -$2; }| NUMBER { $$ = $1; }| NAME { $$ = $1->value; };%%parser.yParser (cont’d)%{#include "y.tab.h"#include "parser.h"#include <math.h>%}%%([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {yylval.dval = atof(yytext); return NUMBER;}[ \t] ; /* ignore white space */Scannerscanner.l5[A-Za-z][A-Za-z0-9]* { /* return symbol pointer */yylval.symp = symlook(yytext);return NAME; }"$" { return 0; /* end of input */ }\n|”=“|”+”|”-”|”*”|”/” return yytext[0];%%Scanner (cont’d)scanner.lPrecedence / Association1. 1-2-3 = (1-2)-3? or 1-(2-3)?2. 1-2*3 = 1-(2*3) or (1-2)*3?Yacc: Shift/Reduce conflicts. Default is to shift.(1) 1 - 2 - 3(2) 1 - 2 * 3Precedence / Association%right ‘=‘%left '<' '>' NE LE GE%left '+' '-‘%left '*' '/' highest precedencePrecedence / Associationexpr : expr ‘+’ expr { $$ = $1 + $3; }| expr ‘-’ expr { $$ = $1 - $3; }| expr ‘*’ expr { $$ = $1 * $3; }| expr ‘/’ expr{ if($3==0)yyerror(“divide 0”);else$$ = $1 / $3;}| ‘-’ expr %prec UMINUS {$$ = -$2; }%left '+' '-'%left '*' '/'%noassoc UMINUSIF-ELSE Ambiguity• Consider following rule:Following state ?IF expr IF expr stmt ELSE stmtstmt : IF expr stmt| IF expr stmt ELSE stmt……IF-ELSE Ambiguity• It is a shift/reduce conflict.• Yacc will always choose to shift.• A solution:stmt : matched | unmatched ;matched: other_stmt | IF expr THEN matched ELSE matched ;unmatched: IF expr THEN stmt | IF expr THEN matched ELSE unmatched ;6Shift/Reduce Conflicts• shift/reduce conflict– occurs
View Full Document