UD CISC 672 - Parsing IV - D2311603

School name University of Delaware

Course CISC 672 - Compiler Construction

Pages 31

This preview shows page 1-2-14-15-30-31 out of 31 pages.

Parsing IV

LR(1) Parsers

• LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition
• LR(1) parsers recognize languages that have an LR(1) grammar

Informal definition:
A grammar is LR(1) if, given a rightmost derivation
    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn–1 ⇒ γn ⇒ sentence
we can
1. isolate the handle of each right-sentential form γi, and
2. determine the production by which to reduce,
by scanning γi from left to right, going at most 1 symbol beyond the right end of the handle of γi

LR(1) Parsers

A table-driven LR(1) parser looks like

[Figure: block diagram with the labels source code, Scanner, Table-driven Parser, IR, grammar, Parser Generator, ACTION & GOTO Tables]

Tables can be built by hand
However, this is a perfect task to automate

LR(1) Skeleton Parser

    stack.push(INVALID);
    stack.push(s0);
    token = scanner.next_token();
    while (TRUE) {
        s = stack.top();
        if (ACTION[s,token] == "shift si") {
            stack.push(token);
            stack.push(si);
            token = scanner.next_token();
        }
        else if (ACTION[s,token] == "reduce A→β") {
            stack.popnum(2*|β|);      // pop 2*|β| symbols
            s = stack.top();
            stack.push(A);
            stack.push(GOTO[s,A]);
        }
        else if (ACTION[s,token] == "accept" and token == EOF)
            return;
        else
            report a syntax error and recover;
    }

The skeleton parser
• uses ACTION & GOTO tables
• does |words| shifts
• does |derivation| reductions
• does 1 accept
• detects errors by failure of the 3 other cases

LR(1) Parsers (parse tables)

To make a parser for L(G), need a set of tables

[Figures: The grammar; The tables]

Example Parse 1: The string “baa”

[Figures: The grammar; The tables; the parse trace]

Example Parse 2: The string “baa baa”

[Figures: The grammar; The tables; the parse trace]

LR(1) Parsers

How does this LR(1) stuff work?
• Unambiguous grammar ⇒ unique rightmost derivation
• Keep upper fringe on a stack
  → All active handles include top of stack (TOS)
  → Shift inputs until TOS is right end of a handle
• Language of handles is regular (finite)
  → Build a handle-recognizing DFA
  → ACTION & GOTO tables encode the DFA
• Final state in DFA ⇒ a reduce action
  → New state is GOTO[state at TOS (after pop), lhs]
  → For SN, this takes the DFA to s1

Building LR(1) Parsers

How do we generate the ACTION and GOTO tables?
• Use the grammar to build a model of the DFA
• Use the model to build ACTION & GOTO tables
• If construction succeeds, the grammar is LR(1)

The Big Picture
• Model the state of the parser
• Use two functions goto( s, X ) and closure( s )
  → goto() is analogous to Delta() in the subset construction
  → closure() adds information to round out a state
• Build up the states and transition functions of the DFA
• Use this information to fill in the ACTION and GOTO tables

LR(1) Items

The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1) parser

An LR(1) item is a pair [P, a], where
  P is a production A→β with a • at some position in the rhs, and
  a is a lookahead word (or EOF)

The • in an item indicates the position of the top of the stack

[A→•βγ,a] means that the input seen so far is consistent with the use of A→βγ immediately after the symbol on top of the stack
[A→β•γ,a] means that the input seen so far is consistent with the use of A→βγ at this point in the parse, and that the parser has already recognized β
[A→βγ•,a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A

LR(1) Items

The production A→β, where β = B1B2B3, with lookahead a, can give rise to 4 items
  [A→•B1B2B3,a], [A→B1•B2B3,a], [A→B1B2•B3,a], & [A→B1B2B3•,a]

The set of LR(1) items for a grammar is finite

What’s the point of all these lookahead symbols?
• Carry them along to choose the correct reduction, if there is a choice
• Lookaheads are bookkeeping, unless the item has • at the right end
  → Has no direct use in [A→β•γ,a]
  → In [A→β•,a], a lookahead of a implies a reduction by A→β
  → For { [A→β•,a],[B→γ•δ,b] }, a ⇒ reduce to A; FIRST(δ) ⇒ shift
⇒ Limited right context is enough to pick the actions

High-level overview

1 Build the canonical collection of sets of LR(1) items
  a Begin in an appropriate state, CC0
    ♦ [S’→•S,EOF], along with any equivalent items
    ♦ Derive equivalent items as closure( CC0 )
  b Repeatedly compute, for each CCk, and each X, goto(CCk,X)
    ♦ If the set is not already in the collection, add it
    ♦ Record all the transitions created by goto( )
  This eventually reaches a fixed point
2 Fill in the tables from the collection of sets of LR(1) items

The canonical collection
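Step 1 of the high-level overview (closure, goto, and the fixed-point loop over the sets CCk) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction as the slides present it: the grammar is assumed to be Goal → SN, SN → SN baa | baa (the preview text names "SN" and the string "baa" but does not show the grammar), items are represented as hypothetical tuples (production index, dot position, lookahead), and the names GRAMMAR, compute_first, and canonical_collection are invented for this example. The FIRST computation is only valid for grammars with no empty productions.

```python
# Assumed grammar (not shown in the text): Goal -> SN ; SN -> SN baa | baa
GRAMMAR = [
    ("Goal", ("SN",)),
    ("SN", ("SN", "baa")),
    ("SN", ("baa",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
EOF = "EOF"


def compute_first():
    """FIRST sets for nonterminals; assumes no production has an empty rhs."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = compute_first()


def closure(items):
    """Add every item implied by a dot standing in front of a nonterminal."""
    items = set(items)
    work = list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # Item is [A -> beta . C delta, a]; new lookaheads are FIRST(delta a).
        rest = rhs[dot + 1:] + (la,)
        head = rest[0]
        lookaheads = FIRST[head] if head in NONTERMINALS else {head}
        for q, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    item = (q, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, dot + 1, la) for p, dot, la in state
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved)


def canonical_collection():
    """Return (list of item sets, {(state index, symbol): state index})."""
    cc0 = closure({(0, 0, EOF)})        # [Goal -> . SN, EOF] plus its closure
    states, transitions, work = [cc0], {}, [cc0]
    while work:                         # stops once no new set appears
        state = work.pop(0)
        symbols = sorted({GRAMMAR[p][1][dot] for p, dot, _ in state
                          if dot < len(GRAMMAR[p][1])})
        for x in symbols:
            target = goto(state, x)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(states.index(state), x)] = states.index(target)
    return states, transitions
```

For the assumed grammar, canonical_collection() yields four sets. CC0 holds five items, and the recorded transitions are (0, SN) → 1, (0, baa) → 2, and (1, baa) → 3. Transitions on terminals become shift entries in ACTION, transitions on nonterminals become GOTO entries, and items with the • at the right end supply the reduce entries for their lookahead.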
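The skeleton parser loop shown earlier can also be tried out directly. The sketch below is a Python rendering of that loop, under assumptions: the ACTION and GOTO tables are written by hand for the assumed grammar 1: Goal → SN, 2: SN → SN baa, 3: SN → baa (the preview text does not reproduce the tables, so the state numbers and production numbers here are illustrative), and parse, PRODUCTIONS, and the tuple encoding of table entries are hypothetical names chosen for this example. Error recovery is reduced to raising an exception.

```python
EOF = "EOF"

# Assumed production numbering: 1: Goal -> SN, 2: SN -> SN baa, 3: SN -> baa
PRODUCTIONS = {2: ("SN", 2), 3: ("SN", 1)}   # number -> (lhs, |rhs|)

# Hand-written tables for the assumed grammar; a missing entry means error.
ACTION = {
    (0, "baa"): ("shift", 2),
    (1, EOF): ("accept",),
    (1, "baa"): ("shift", 3),
    (2, EOF): ("reduce", 3),
    (2, "baa"): ("reduce", 3),
    (3, EOF): ("reduce", 2),
    (3, "baa"): ("reduce", 2),
}
GOTO = {(0, "SN"): 1}


def parse(words):
    """Return the production numbers used, in order, or raise SyntaxError."""
    tokens = iter(list(words) + [EOF])
    stack = [None, 0]                  # INVALID marker, then start state s0
    token = next(tokens)
    reductions = []
    while True:
        state = stack[-1]
        act = ACTION.get((state, token))
        if act is None:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        if act[0] == "shift":
            stack += [token, act[1]]   # push the symbol, then the new state
            token = next(tokens)
        elif act[0] == "reduce":
            lhs, size = PRODUCTIONS[act[1]]
            del stack[-2 * size:]      # pop 2*|rhs| entries (symbol/state pairs)
            state = stack[-1]
            stack += [lhs, GOTO[(state, lhs)]]
            reductions.append(act[1])
        else:                          # accept appears only in the EOF column
            return reductions
```

With these tables, parse(["baa"]) returns [3] and parse(["baa", "baa"]) returns [3, 2]: one shift per word and one reduction per derivation step before the accept, as the skeleton-parser slide states. An empty input raises SyntaxError because ACTION has no entry for state 0 on EOF.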
