228CS 536 Spring 2008©ExampleLet’s look at the CUP specificationfor CSX-lite. Recall its CFG isprogram → { stmts }stmts → stmt stmts| λstmt → id = expr ;| if ( expr ) stmtexpr → expr + id| expr - id|id229CS 536 Spring 2008©The corresponding CUPspecification is:/***This Is A Java CUP Specification ForCSX-lite, a Small Subset of The CSXLanguage, Used In Cs536 ***//* Preliminaries to set up and use thescanner. */import java_cup.runtime.*;parser code {: public void syntax_error(Symbol cur_token){ report_error(“CSX syntax error at line “+String.valueOf(((CSXToken)cur_token.value).linenum),null);}:};init with {: :};scan with {:return Scanner.next_token();:};230CS 536 Spring 2008©/* Terminals (tokens returned by thescanner). */terminal CSXIdentifierToken IDENTIFIER;terminal CSXToken SEMI, LPAREN, RPAREN,ASG, LBRACE, RBRACE;terminal CSXToken PLUS, MINUS, rw_IF;/* Non terminals */non terminal csxLiteNode prog;non terminal stmtsNode stmts;non terminal stmtNode stmt;non terminal exprNode exp;non terminal nameNode ident;start with prog;prog::= LBRACE:l stmts:s RBRACE {: RESULT=new csxLiteNode(s,l.linenum,l.colnum); :};stmts::= stmt:s1 stmts:s2 {: RESULT=new stmtsNode(s1,s2,s1.linenum,s1.colnum); :}231CS 536 Spring 2008©| {: RESULT= stmtsNode.NULL; :};stmt::= ident:id ASG exp:e SEMI {: RESULT=new asgNode(id,e,id.linenum,id.colnum); :}| rw_IF:i LPAREN exp:e RPAREN stmt:s {: RESULT=new ifThenNode(e,s, stmtNode.NULL,i.linenum,i.colnum); :};exp::=exp:leftval PLUS:op ident:rightval {: RESULT=new binaryOpNode(leftval,sym.PLUS, rightval,op.linenum,op.colnum); :}| exp:leftval MINUS:op ident:rightval {: RESULT=new binaryOpNode(leftval,sym.MINUS,rightval,op.linenum,op.colnum); :}| ident:i {: RESULT = i; :};232CS 536 Spring 2008©ident::= IDENTIFIER:i {: RESULT = new nameNode(new identNode(i.identifierText, i.linenum,i.colnum),exprNode.NULL,i.linenum,i.colnum); :};233CS 536 Spring 2008©Let’s parse{ a = b ; }First, a is parsed usingident::= IDENTIFIER:i {: RESULT = new nameNode(new identNode(i.identifierText, i.linenum,i.colnum),exprNode.NULL,i.linenum,i.colnum); :}We buildnameNodeidentNodenullExprNodea234CS 536 Spring 2008©Next, b is parsed usingident::= IDENTIFIER:i {: RESULT = new nameNode(new identNode(i.identifierText, i.linenum,i.colnum),exprNode.NULL,i.linenum,i.colnum); :}We buildnameNodeidentNodenullExprNodeb235CS 536 Spring 2008©Then b’s subtree is recognized asan exp:| ident:i {: RESULT = i; :}Now the assignment statement isrecognized:stmt::= ident:id ASG exp:e SEMI {: RESULT=new asgNode(id,e,id.linenum,id.colnum); :}We buildnameNodeidentNodenullExprNodeanameNodeidentNodenullExprNodebasgNode236CS 536 Spring 2008©The stmts →λ production ismatched (indicating that there areno more statements in theprogram).CUP matchesstmts::= {: RESULT= stmtsNode.NULL; :}and we buildNext,stmts → stmt stmtsis matched usingstmts::= stmt:s1 stmts:s2 {: RESULT=new stmtsNode(s1,s2,s1.linenum,s1.colnum); :}nullStmtsNode237CS 536 Spring 2008©This buildsAs the last step of the parse, theparser matchesprogram → { stmts }using the CUP ruleprog::= LBRACE:l stmts:s RBRACE {: RESULT=new csxLiteNode(s,l.linenum,l.colnum); :};nameNodeidentNodenullExprNodeanameNodeidentNodenullExprNodebasgNodestmtsNodenullStmtsNode238CS 536 Spring 2008©The final AST reurned by theparser isnameNodeidentNodenullExprNodeanameNodeidentNodenullExprNodebasgNodestmtsNodenullStmtsNodecsxLiteNode239CS 536 Spring 2008©Errors in Context-FreeGrammarsContext-free grammars cancontain errors, just as programsdo. Some errors are easy to detectand fix; others are more subtle.In context-free grammars we startwith the start symbol, and applyproductions until a terminal stringis produced.Some context-free grammars maycontain useless non-terminals.Non-terminals that areunreachable (from the startsymbol) or that derive no terminalstring are considered useless.Useless non-terminals (andproductions that involve them)can be safely removed from a240CS 536 Spring 2008©grammar without changing thelanguage defined by the grammar.A grammar containing uselessnon-terminals is said to be non-reduced.After useless non-terminals areremoved, the grammar is reduced.ConsiderS → AB| xB → bA → aAC → dWhich non-terminals areunreachable? Which derive noterminal string?241CS 536 Spring 2008©Finding Useless Non-terminalsTo find non-terminals that canderive one or more terminalstrings, we’ll use a markingalgorithm.We iteratively mark terminals thatcan derive a string of terminals,until no more non-terminals canbe marked. Unmarked non-terminals are useless.(1) Mark all terminal symbols(2) RepeatIf all symbols on therighthand side of aproduction are markedThen mark the lefthand sideUntil no more non-terminalscan be marked242CS 536 Spring 2008©We can use a similar markingalgorithm to determine whichnon-terminals can be reachedfrom the start symbol:(1) Mark the Start Symbol(2) RepeatIf the lefthand side of aproduction is markedThen mark all non-terminalsin the righthand sideUntil no more non-terminalscan be marked243CS 536 Spring 2008©λ DerivationsWhen parsing, we’ll sometimesneed to know which non-terminalscan derive λ. (λ is “invisible” andhence tricky to parse).We can use the following markingalgorithm to decide which non-terminals derive λ(1) For each production A →λmark A(2) RepeatIf the entire righthandside of a productionis markedThen mark the lefthand sideUntil no more non-terminalscan be marked244CS 536 Spring 2008©As an example considerS → A B CA → aB → CDD → d| λC → c| λ245CS 536 Spring 2008©Recall that compilers prefer anunambiguous grammar because aunique parse tree structure can beguaranteed for all inputs.Hence a unique translation,guided by the parse treestructure, will be obtained.We would like an algorithm thatchecks if a grammar isambiguous.Unfortunately, it is undecidablewhether a given CFG isambiguous, so such an algorithmis impossible to create.Fortunately for certain grammarclasses, including those for whichwe can generate parsers, we canprove included grammars areunambiguous.246CS 536 Spring 2008©Potentially, the most serious flawthat a grammar might have is thatit generates the “wronglanguage."This is a subtle point as agrammar serves as the definitionof a language.For established languages (like Cor Java) there is usually a suite ofprograms created to test andvalidate new compilers. Anincorrect grammar will almostcertainly lead to
View Full Document