DOC PREVIEW
UW-Madison CS 536 - Lecture notes

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

295CS 536 Spring 2008©How do LL(1) Parsers BuildSyntax Trees?So far our LL(1) parser has actedlike a recognizer. It verifies thatinput token are syntacticallycorrect, but it produces nooutput.Building complete (concrete)parse trees automatically is fairlyeasy.As tokens and non-terminals arematched, they are pushed onto asecond stack, the semantic stack.At the end of each production, anaction routine pops off n itemsfrom the semantic stack (where nis the length of the production’srighthand side). It then builds asyntax tree whose root is the296CS 536 Spring 2008©lefthand side, and whose childrenare the n items just popped off.For example, for productionStmt → id = Expr ;the parser would include anaction symbol after the “;” whoseactions are:P4 = pop(); // Semicolon tokenP3 = pop(): // Syntax tree for ExprP2 = pop(); // Assignment tokenP1 = pop(); // Identifier tokenPush(new StmtNode(P1,P2,P3,P4));297CS 536 Spring 2008©Creating Abstract SyntaxTreesRecall that we prefer that parsersgenerate abstract syntax trees,since they are simpler and moreconcise.Since a parser generator can’tknow what tree structure we wantto keep, we must allow the userto define “custom” action code,just as Java CUP does.We allow users to include “codesnippets” in Java or C. We alsoallow labels on symbols so thatwe can refer to the tokens andtress we wish to access. Ourproduction and action code willnow look like this:Stmt → id:i = Expr:e ;{: RESULT = new StmtNode(i,e); :}298CS 536 Spring 2008©How do We Make GrammarsLL(1)?Not all grammars are LL(1);sometimes we need to modify agrammar’s productions to createthe disjoint Predict sets LL1)requires.There are two common problemsin grammars that make uniqueprediction difficult or impossible:1. Common prefixes.Two or more productions withthe same lefthand side beginwith the same symbol(s).For example,Stmt → id = Expr ;Stmt → id ( Args ) ;299CS 536 Spring 2008©2. Left-RecursionA production of the formA → A ...is said to be left-recursive.When a left-recursive productionis used, a non-terminal isimmediately replaced by itself(with additional symbolsfollowing).Any grammar with a left-recursiveproduction can never be LL(1).Why?Assume a non-terminal A reachesthe top of the parse stack, withCT as the current token. The LL(1)parse table entry, T[A][CT],predicts A → A ...We expand A again, and T[A][CT],so we predict A → A ... again. Weare in an infinite prediction loop!300CS 536 Spring 2008©Eliminating Common PrefixesAssume we have two of moreproductions with the samelefthand side and a commonprefix on their righthand sides:A → α β | α γ | ... | α δWe create a new non-terminal, X.We then rewrite the aboveproductions into:A → αXX→ β | γ | ... | δFor example,Stmt → id = Expr ;Stmt→ id ( Args ) ;becomesStmt → id StmtSuffixStmtSuffix→ = Expr ;StmtSuffix→ ( Args ) ;301CS 536 Spring 2008©Eliminating Left RecursionAssume we have a non-terminalthat is left recursive:A → Aα A → β | γ | ... | δTo eliminate the left recursion, wecreate two new non-terminals, Nand T.We then rewrite the aboveproductions into:A → N T N → β | γ | ... | δT → α T | λ302CS 536 Spring 2008©For example,Expr → Expr + idExpr → idbecomesExpr → N TN → idT → + id T | λThis simplifies to:Expr → id TT → + id T | λ303CS 536 Spring 2008©Reading AssignmentRead Sections 6.1 to 6.5.1 ofCrafting a Compiler featuringJava.304CS 536 Spring 2008©How does JavaCup Work?The main limitation of LL(1)parsing is that it must predict thecorrect production to use when itfirst starts to match theproduction’s righthand side.An improvement to this approachis the LALR(1) parsing methodthat is used in JavaCUP (and Yaccand Bison too).The LALR(1) parser is bottom-upin approach. It tracks the portionof a righthand side alreadymatched as tokens are scanned. Itmay not know immediately whichis the correct production tochoose, so it tracks sets ofpossible matching productions.305CS 536 Spring 2008©ConfigurationsWe’ll use the notationX → A B•C Dto represent the fact that we aretrying to match the productionX → A B•C D with A and Bmatched so far.A production with a “•”somewhere in its righthand side iscalled a configuration.Our goal is to reach aconfiguration with the “dot” at theextreme right:X → A B C D•This indicates that an entireproduction has just beenmatched.306CS 536 Spring 2008©Since we may not know whichproduction will eventually be fullymatched, we may need to track aconfiguration set. A configurationset is sometimes called a state.When we predict a production, weplace the “dot” at the beginning ofa production:X →•A B C DThis indicates that the productionmay possibly be matched, but nosymbols have actually yet beenmatched.We may predict a λ-production:X →λ•When a λ-production is predicted,it is immediately matched, since λcan be matched at any time.307CS 536 Spring 2008©Starting the ParseAt the start of the parse, we knowsome production with the startsymbol must be used initially. Wedon’t yet know which one, so wepredict them all:S →•A B C DS →•e F gS →•h I...308CS 536 Spring 2008©ClosureWhen we encounter aconfiguration with the dot to theleft of a non-terminal, we knowwe need to try to match that non-terminal.Thus inX →•A B C Dwe need to match someproduction with A as its left handside.Which production?We don’t know, so we predict allpossibilities:A →•P Q RA →•s T...309CS 536 Spring 2008©The newly added configurationsmay predict other non-terminals,forcing additional productions tobe included. We continue thisprocess until no additionalconfigurations can be added.This process is called closure (ofthe configuration set).Here is the closure algorithm:ConfigSet Closure(ConfigSet C){repeatif (X → a•B d is in C &&B is a non-terminal)Add all configurations ofthe form B →•g to C)until (no more configurationscan be added);return C;}310CS 536 Spring 2008©Example of ClosureAssume we have the followinggrammar:S → AbA → CDC → DC → cD → dTo compute Closure(S →•Ab)we first include all productionsthat rewrite A:A →•CDNow C productions are included:C →•DC →•c311CS 536 Spring 2008©Finally, the D production is added:D →•dThe complete configuration set is:S →•AbA →•CDC →•DC →•cD →•dThis set tells us that if we want tomatch an A, we will need tomatch a C, and this is done bymatching a c


View Full Document

UW-Madison CS 536 - Lecture notes

Download Lecture notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?