COP 4342 Fall 2006 Program Development 5

Bison and parsing

From the area of compilers, we get a host of tools to convert text files into programs. After lexical analysis, the second part of that process when you are dealing with traditional languages such as C is syntax analysis, which is also known as parsing.

A good tool for creating parsers is bison. It takes a specification file and creates a syntax analyzer. The output file was called y.tab.c by yacc and is now generally just FILENAME.tab.c.

Parsing terms

☞ Production rules define a parser. Informally, these can be expressed in BNF/EBNF form.
☞ A production rule is made up of a left-hand side, which is a single non-terminal, and a right-hand side made up of terminals and non-terminals.
☞ A terminal “represents a class of syntactically equivalent tokens” [Bison manual].

Attributes for terminals and non-terminals

Terminals and non-terminals can have attributes.
Constants could have the value of the constant, for instance.
Identifiers might have a pointer to a location where information is kept about the identifier.

Some general approaches to syntax analysis

Use a compiler-compiler tool, such as bison.
Write a one-off recursive descent parser.
Write a one-off parser suited to your program.

Bison - our syntax analyzer generator

The generated parser is called as yyparse().
It is easy to interface with flex/lex.

    .y file                  → bison           → y.tab.c (*.tab.c)
    y.tab.c and other files  → gcc             → syntax analyzer
    input stream             → syntax analyzer → actions taken when rules are applied

Calling Bison

Here’s an example of calling Bison (which will be very useful when compiling assign6):

    Assign6-solution.out: Assign6-solution.y Assign6-solution.l
            bison -d --debug --verbose Assign6-solution.y
            flex Assign6-solution.l
            cc -c lex.yy.c
            cc -c Assign6-solution.tab.c
            cc -o Assign6-solution.out Assign6-solution.tab.o lex.yy.o

The -d option tells bison to also write an explicit y.tab.h/*.tab.h header file for flex to include. Specifying --debug and --verbose (combined with setting yydebug to a non-zero value) makes it much easier to debug your parser!

Bison specifications

Bison source:

    { definitions }
    %%
    { rules }
    %%
    { user subroutines }

Definitions

☞ Declarations of ordinary C variables and constants.
☞ bison declarations.

Rules

The general form for production rules is:

    <non-terminal> : <sequence of terminals and non-terminals> {action}
                   | ...
                   ;

The actions are C/C++ code. Actions can appear in the middle of the sequence of terminals and non-terminals.

Bison declarations

    %token TOKEN     declare a token type TOKEN
    %union { }       declare a union type for yylval (the semantic values)
    %right TOKEN     declare a token type TOKEN that is right-associative
    %left TOKEN      declare a token type TOKEN that is left-associative

Bison actions

Actions are C source fragments.

Example rule:

    variableDeclaration : ID COLON ID SEMICOLON
            { printf("emitting var %s of type %s\n", $3, $1); }
            ;

The $3 and $1 refer to the values of the third and first items on the right-hand side of the production rule.

An example of Bison: first, its matching flex file

    %{
    #include <stdlib.h>
    #include <string.h>
    /* Semantic values are strings; this must match the bison file. */
    #define YYSTYPE char *
    #include "Assign6-solution.tab.h"
    extern int linecount;
    %}
    %%
    program         return PROGRAM;
    end             return END;
    variables       return VARIABLES;
    var             return VAR;
    functions       return FUNCTIONS;
    define          return DEFINE;
    statements      return STATEMENTS;
    if              return IF;
    then            return THEN;
    else            return ELSE;
    while           return WHILE;
    ,               return COMMA;
    "("             return LPARENTHESIS;
    ")"             return RPARENTHESIS;
    "{"             return LBRACE;
    "}"             return RBRACE;
    :               return COLON;
    ;               return SEMICOLON;
    [a-zA-Z0-9]+    yylval = strdup(yytext); return ID;
    [\n]            linecount++;
    [ \t]+          ;   /* skip whitespace */

An example Bison program

    %{
    #include <stdlib.h>
    #include <stdio.h>
    /* Semantic values are strings (the text of each ID), not the default int. */
    #define YYSTYPE char *
    int yylex(void);    /* supplied by the flex-generated lex.yy.c */
    int linecount = 0;
    void yyerror(char *s)
    {
      fprintf(stderr,"file is not okay -- problem at line %d\n",linecount);
      exit(1);
    }
    int yywrap()
    {
      return 1;
    }
    %}
    %token ID
    %token PROGRAM
    %token END
    %token VARIABLES
    %token VAR
    %token STATEMENTS
    %token IF
    %token THEN
    %token ELSE
    %token WHILE
    %token LBRACE
    %token RBRACE
    %token COLON
    %token SEMICOLON
    %token FUNCTIONS
    %token COMMA
    %token DEFINE
    %token LPARENTHESIS
    %token RPARENTHESIS
    %%
    program : PROGRAM ID variablesSection functionsSection statementsSection END
            ;
    variablesSection : VARIABLES LBRACE variableDeclarations RBRACE
            ;
    variableDeclarations :
            | variableDeclarations variableDeclaration
            ;
    variableDeclaration : ID COLON ID SEMICOLON
            { printf("emitting var %s of type %s\n", $3, $1); }
            ;
    functionsSection : FUNCTIONS LBRACE functionDeclarations RBRACE
            ;
    functionDeclarations :
            | functionDeclarations functionDeclaration
            ;
    functionDeclaration : DEFINE ID COLON ID LPARENTHESIS argsList RPARENTHESIS LBRACE statements RBRACE
            ;
    statementsSection : STATEMENTS LBRACE statements RBRACE
            ;
    statements :
            | statements statement
            ;
    statement : VAR variableDeclaration
            | whileLoop
            | ifStruct
            | subroutineCall SEMICOLON
            ;
    whileLoop : WHILE LPARENTHESIS subroutineCall RPARENTHESIS LBRACE statements RBRACE
            ;
    ifStruct : IF LPARENTHESIS subroutineCall RPARENTHESIS LBRACE statements RBRACE
            | IF LPARENTHESIS subroutineCall RPARENTHESIS LBRACE statements RBRACE ELSE LBRACE statements RBRACE
            ;
    subroutineCall : ID LPARENTHESIS callArgsList RPARENTHESIS
            ;
    argsList :
            | argPair
            | argsList COMMA argPair
            ;
    argPair : ID ID
            ;
    callArgsList :
            | ID
            | callArgsList COMMA ID
            ;
    %%
    int main(int argc, char **argv)
    {
      // yydebug = 1;
      yyparse();
      /* yyerror() exits on a syntax error, so this line is reached only on success. */
      printf("input is okay\n");
      return 0;
    }
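
For comparison with the bison grammar, here is a minimal sketch of the "one-off recursive descent parser" approach listed among the general approaches to syntax analysis. It covers only the subroutineCall and callArgsList rules. Every name in it (enum tok, struct parser, advance, match_tok, parse_call) is hypothetical and not part of the assignment code, and the tiny scanner is an assumption that stands in for the flex file's ID and punctuation rules.

```c
#include <ctype.h>

/* Hypothetical token codes; the real ones come from Assign6-solution.tab.h. */
enum tok { T_ID, T_LPAREN, T_RPAREN, T_COMMA, T_EOF, T_BAD };

struct parser {
    const char *p;      /* next unread input character */
    enum tok look;      /* one token of lookahead */
};

/* Scan the next token into ps->look, skipping blanks, tabs and newlines. */
static void advance(struct parser *ps)
{
    while (*ps->p == ' ' || *ps->p == '\t' || *ps->p == '\n')
        ps->p++;
    unsigned char c = (unsigned char)*ps->p;
    if (c == '\0') { ps->look = T_EOF; return; }
    if (isalnum(c)) {                       /* ID is [a-zA-Z0-9]+ */
        while (isalnum((unsigned char)*ps->p))
            ps->p++;
        ps->look = T_ID;
        return;
    }
    ps->p++;
    switch (c) {
    case '(': ps->look = T_LPAREN; break;
    case ')': ps->look = T_RPAREN; break;
    case ',': ps->look = T_COMMA;  break;
    default:  ps->look = T_BAD;    break;
    }
}

/* Consume the lookahead if it is t; return 1 if consumed, 0 otherwise. */
static int match_tok(struct parser *ps, enum tok t)
{
    if (ps->look != t)
        return 0;
    advance(ps);
    return 1;
}

/* callArgsList : | ID | callArgsList COMMA ID ;
   The left recursion is rewritten as a loop: [ID] { COMMA ID }.
   Like the grammar, this accepts a list that starts with a comma. */
static int call_args_list(struct parser *ps)
{
    (void)match_tok(ps, T_ID);              /* optional first ID */
    while (match_tok(ps, T_COMMA))
        if (!match_tok(ps, T_ID))
            return 0;
    return 1;
}

/* subroutineCall : ID LPARENTHESIS callArgsList RPARENTHESIS ; */
static int subroutine_call(struct parser *ps)
{
    return match_tok(ps, T_ID) && match_tok(ps, T_LPAREN)
        && call_args_list(ps) && match_tok(ps, T_RPAREN);
}

/* Return 1 if text is exactly one subroutineCall, 0 otherwise. */
int parse_call(const char *text)
{
    struct parser ps = { text, T_EOF };
    advance(&ps);                           /* load the first lookahead token */
    return subroutine_call(&ps) && ps.look == T_EOF;
}
```

Each non-terminal becomes one C function, and the left-recursive callArgsList rule has to be turned into a loop by hand, which bison does not require. A caller would use it as, for example, parse_call("f(a, b)"), which returns 1 for input that the two rules accept.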