ParsingReview for the MidtermStephen A. EdwardsColumbia UniversityFall 2008The Midterm70 minutes4–5 problemsClosed bookOne sheet of notes of your own devisingComprehensive: Anything discussed in class is fair gameLittle, if any, programming.Details of O’Caml/C/C++/Java syntax not requiredBroad knowledge of languages discussedTopicsStructure of a CompilerScanning and ParsingRegular ExpressionsContext-Free GrammarsBottom-up ParsingASTsName, Scope, and BindingsPart IStructure of a CompilerCompiling a Simple Programint gcd(int a, int b){while (a != b) {if (a > b) a -= b;else b -= a;}return a;}What the Compiler Seesint gcd(int a, int b){while (a != b) {if (a > b) a -= b;else b -= a;}return a;}i n t sp g c d ( i n t sp a , sp in t sp b ) nl { nl sp sp w h i l e sp( a sp ! = sp b ) sp { nl sp sp sp sp if sp ( a sp > sp b ) sp a sp - = sp b; nl sp sp sp sp e l s e sp b sp - = spa ; nl sp sp } nl sp sp r e t u r n spa ; nl } nlText file is a sequence of charactersLexical Analysis Gives Tokensint gcd(int a, int b){while (a != b) {if (a > b) a -= b;else b -= a;}return a;}intgcd ( int a , int b ) { while ( a!= b ) { if ( a > b ) a -= b ; elseb -= a ; } return a ; }A stream of tokens. Whitespace, comments removed.Parsing Gives a n ASTfuncintgcd argsargint aargint bseqwhile!=a bif>a b-=a b-=b areturnaint gcd(int a, int b){while (a != b) {if (a > b) a -= b;else b -= a;}return a;}Abstract syntax tree built from parsing rules.Semantic Anal ysis Resolves SymbolsSymbolTable:int aint bfuncintgcd argsargint aargint bseqwhile!=a bif>a b-=a b-=b areturnaTypes checked; references to symbols resolvedTranslation into 3-Address CodeL0: sne $1, a, bseq $0, $1, 0btrue $0, L1 % while (a != b)sl $3, b, aseq $2, $3, 0btrue $2, L4% if (a < b)sub a, a, b % a -= bjmp L5L4: sub b, b, a % b -= aL5: jmp L0L1: ret aint gcd(int a, int b){while (a != b) {if (a > b) a -= b;else b -= a;}return a;}Idealized a ssembly language w/ infinite registersGene ration of 80386 Assemblygcd: pushl %ebp % Save FPmovl %esp,%ebpmovl 8(%ebp),%eax% Load a from stackmovl 12(%ebp),%edx % Load b from stack.L8: cmpl %edx,%eaxje .L3% while (a != b)jle .L5 % if (a < b)subl %edx,%eax % a -= bjmp .L8.L5: subl %eax,%edx % b -= ajmp .L8.L3: leave % Restore SP, BPretPart IIScanningDescribing TokensAlphabet: A finite set of symbolsExamples: { 0, 1 }, { A, B, C, .. . , Z }, ASCII, UnicodeString: A finite sequence of symbols f rom an alphabetExamples: ǫ (the empty string), Stephen, αβγLanguage: A set of strings over an alphabetExamples: ; (the empty language), { 1, 11, 111, 1111 }, all Englishwords, strings that start with a letter followed by any sequence ofletters and digitsOperations on LanguagesLet L = { ǫ, wo }, M = { man, men }Concatenation: Strings from one followed by the otherLM = { man, men, woman, women }Union: All strings f rom each languageL ∪ M = {ǫ, wo, man, men }Kleene Closure: Zero or more concatenationsM∗= {ǫ} ∪ M ∪ M M ∪ M M M · ·· ={ǫ, man, men, manman, ma nmen, menman, menmen,manmanman, manmanmen, manmenma n, . . . }Regular Expressions over an Alphabet ΣA standard way to express languages for tokens.1. ǫ is a regular expression that denotes {ǫ}2. If a ∈ Σ, a is an RE that denotes {a}3. If r and s denote languages L(r ) and L(s),Ï(r )|(s) denotes L(r ) ∪ L(s)Ï(r )( s) denotes {tu : t ∈ L(r ),u ∈ L(s)}Ï(r )∗denotes ∪∞i=0Li(L0= {ǫ} and Li= LLi−1)Nondeterministic Finite Automata“All strings containing aneven number of 0’s and 1’s”A BCD00001111start1. Set of states S:nA , B , C , Do2. Set of input symbols Σ: {0,1}3. Transition function σ : S × Σǫ→ 2Sstateǫ 0 1A – {B} {C}B– {A} {D}C– {D} {A}D– {C} {B}4. Start state s0: A5. Set of accepting states F :nAoThe Language induced by an NFAAn NFA accepts an input string x iff there is a path from the startstate to an accepting state that “spells out” x.A BCD00001111startShow that the string “010010” isaccepted.A B DCD B A010 010Translating REs into NFAsastartar1r2ifr1r2startr1|r2ifr1r2startǫǫǫǫ(r )∗ifrstartǫǫǫǫTranslating REs into NFAsExample: translate (a|b)∗abb into an NFA0123456 7 8 910ǫǫaǫbǫǫǫ abbǫǫShow that the string “aabb” is accepted.0 1 2 3 6 7 8 9 10ǫǫaǫ ǫ ab bSimulating NFAsProblem: you must follow the “right” arcs to show that a string isaccepted. How do you know which arc is right?Solution: follow them all an d sort it out later.“Two-stack” NFA simulation algorithm:1. Initial states: the ǫ-closure of the start state2. For each character c,ÏNew states: follow all transitions labeled cÏForm the ǫ-closure of the current states3. Accept if any final state is acceptingSimulating an NFA: ·aabb, Start0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: ·aabb, ǫ-cl osure0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: a·abb0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: a·abb, ǫ-closure0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aa·bb0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aa·bb, ǫ-closure0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aab·b0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aab·b, ǫ-closure0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aabb·0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫSimulating an NFA: aabb·, Done0 12 34 56 7 8 910ǫǫaǫbǫǫǫ abbǫǫDeterm inist ic Finite AutomataRestricted form of NFAs:ÏNo state has a transition on ǫÏFor each state s and symbol a, there is at most one edgelabeled a leaving s.Differs subtly from the definition used in COMS W3261 (Sipser,Introduction to the Theory of Computation)Very easy to check acceptance: simulate by maintaining currentstate. Accept if you end up on an accepting state. Reject if you endon a non-accepting state or if there is no transition from the currentstate for the next symbol.Determ inist ic Finite Automata{type token = ELSE | ELSEIF}rule token =parse "else" { ELSE }| "elseif" { ELSEIF }els eifDeterm inist ic Finite Automata{ type token = IF | ID of string | NUM of string }rule token =parse "if" { IF }| [’a’-’z’] [’a’-’z’ ’0’-’9’]*as lit { ID(lit) }| [’0’-’9’]+ as num { NUM(num) }ID IFIDNUMifa-z0-9a-eg-z0-9a-z0-9a-hj-z0-90-9Building a DFA from an NFASubset construction algorithmSimulate the NFA for all possible inputs and track the states thatappear.Each unique state during simulation becomes a state in the DFA.Subset construction for (a|b)∗abb (1)abSubset
View Full Document