DOC PREVIEW
UMD CMSC 330 - Context-Free Grammars: Pushdown Automaton

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CMSC 330: Organization of Programming LanguagesContext-Free Grammars:Pushdown AutomatonCMSC 330 2Reminders• Project 2 Due Oct. 12CMSC 330 3Regular expressions and CFGs• Programming languages are not regular– Matching (an arbitrary number of) brackets so that they are balanced• Usually almost context-free, with some hacksMachineDescriptionpushdown automata (PDAs)context-free grammarscontext-free languagesDFAs, NFAsregular expressionsregular languagesCMSC 330 4Equivalence of DFA and regular grammarsCMSC 330 5Pushdown Automaton (PDA)•A pushdown automaton (PDA) is an abstract machine similar to the DFA– Has a finite set of states– Also has a pushdown stack• Moves of the PDA are as follows:– An input symbol is read and the top symbol on the stack is read– Based on both inputs, the machine• Enters a new state, and• Writes zero or more symbols onto the pushdown stack• Or pops zero or more symbols from the stack– String accepted if the stack is empty AND the string has endedCMSC 330 6Power of PDAs• PDAs are more powerful than DFAs–anbn, which cannot be recognized by a DFA, can easily be recognized by the PDA• Stack all a symbols and, for each b, pop an a off the stack. • If the end of input is reached at the same time that the stack becomes empty, the string is accepted2CMSC 330 7Context-free Grammars in Practice• Regular expressions are used to turn raw text into a string of tokens– E.g., “if”, “then”, “identifier”, etc.– Whitespace and comments are simply skipped– These tokens are the input for the next phase of compilation– Standard tools used include lex and flex• Many others for Java• CFGs are used to turn tokens into parse trees– This process is called parsing– Standard tools used include yacc and bison• Those trees are then analyzed by the compiler, which eventually produces object codeCMSC 330 8Parsing• There are many efficient techniques for turning strings into parse trees– They all have strange names, like LL(k), SLR(k), LR(k)...– Take CMSC 430 for more details• We will look at one very simple technique: recursive descent parsing– This is a “top-down” parsing algorithm because we’re going to begin at the start symbol and try to produce the stringCMSC 330 9ExampleE  id = n | { L }L  E ; L | – Here n is an integer and id is an identifier• One input might be– { x = 3; { y = 4; }; }– This would get turned into a list of tokens{x=3;{y=4;}; }– And we want to turn it into a parse treeCMSC 330 10Example (cont’d)E  id = n | { L }L  E ; L | { x = 3; { y = 4; }; }E{L}E;Lx= 3 E ; L{L }E;Ly=4 CMSC 330 11Parsing Algorithm• Goal: determine if we can produce a string to be parsed from the grammar's start symbol• At each step, we'll keep track of two facts– What tree node are we trying to match?– What is the next token (lookahead) of the input string?• There are three cases:– If we’re trying to match a terminal and the next token (lookahead) is that token, then succeed, advance the lookahead, and continue– If we’re trying to match a nonterminal then pick which production to apply based on the lookahead– Otherwise, fail with a parsing errorCMSC 330 12Example (cont’d)E  id = n | { L }L  E ; L | { x = 3 ; { y = 4 ; } ; }E{L}E;Lx= 3 E ; L{L }E;Ly=4 lookahead3CMSC 330 13Definition of First()• First( ), for any terminal or nonterminal , is the set of initial terminals of all strings that may expand to– We’ll use this to decide what production to applyCMSC 330 14Definition of First(), cont’d• For a terminal a, First(a) = { a }• For a nonterminal N:–If N , then add  to First(N) –If N  1 2... n, then (note the iare all the symbols on the right side of one single production):•Add First(12... n) to First(N), where First(12... n) is defined as–First(1) if  ∉ First(1)– Otherwise (First(1)–) First(2... n)•If  ∈ First(i) for all i, 1 ≤ i ≤ k, then add  to First(N)CMSC 330 15ExamplesE  id = n | { L }L  E ; L | First(id) = { id }First("=") = { "=" }First(n) = { n }First("{")= { "{" }First("}")= { "}" }First(";")= { ";" }First(E) = { id, "{" }First(L) = { id, "{",  }E  id = n | { L } | L  E ; L | First(id) = { id }First("=") = { "=" }First(n) = { n }First("{")= { "{" }First("}")= { "}" }First(";")= { ";" }First(E) = { id, "{",  }First(L) = { id, "{", ";",  }CMSC 330 16Implementing a Recursive Descent Parser• For each terminal symbol a, create a function parse_a, which:– If the lookahead is a it consumes the lookahead by advancing the lookahead to the next token, and returns– Otherwise fails with a parse error if the lookahead is not a• For each nonterminal N, create a function parse_N– This function is called when we’re trying to parse a part of the input which corresponds to (or can be derived from) N– parse_S for the start symbol S begins the processCMSC 330 17Implementing a Recursive Descent Parser, con't.• The body of parse_N for a nonterminal N does the following:– Let N 1| ... | kbe the productions of N•Here iis the entire right side of a production- a sequence of terminals and nonterminals– Pick the production N isuch that the lookahead is in First(i)• It must be that First(i)  First(j) =  for i  j• If there is no such production, but N then return• Otherwise, then fail with a parse error– Suppose i= 1 2... n. Then call parse_ 1(); ... ; parse_ n() to match the expected right-hand side, and returnCMSC 330 18ExampleE  id = n | { L }L  E ; L | let parse_term t =if !lookahead = tthen lookahead := <next token>else raise <Parse error>let rec parse_E () =if lookahead = 'id' then beginparse_term 'id';parse_term '=';parse_term 'n'endelse if lookahead = '{' then beginparse_term '{';parse_L ();parse_term '}';endelse raise <Parse error>;(not quitevalid OCaml)4CMSC 330 19Example (cont’d)E  id = n | { L }L  E ; L | and parse_L () =if lookahead = 'id'|| lookahead = '{' then beginparse_E ();parse_term ';';parse_L ()end(* else return (not an error) *)mutually recursive with previous let recCMSC 330 20Things to Notice• If you draw the execution trace of the parser as a tree, then you get the parse tree• This parsing strategy may fail on certain grammars because the First sets overlap– This doesn’t mean the grammar is not


View Full Document

UMD CMSC 330 - Context-Free Grammars: Pushdown Automaton

Documents in this Course
Exam #1

Exam #1

6 pages

Quiz #1

Quiz #1

2 pages

Midterm 2

Midterm 2

12 pages

Exam #2

Exam #2

7 pages

Ocaml

Ocaml

7 pages

Parsing

Parsing

38 pages

Threads

Threads

12 pages

Ruby

Ruby

7 pages

Quiz #3

Quiz #3

2 pages

Threads

Threads

7 pages

Quiz #4

Quiz #4

2 pages

Exam #2

Exam #2

6 pages

Exam #1

Exam #1

6 pages

Threads

Threads

34 pages

Quiz #4

Quiz #4

2 pages

Threads

Threads

26 pages

Exam #2

Exam #2

9 pages

Exam #2

Exam #2

6 pages

Load more
Download Context-Free Grammars: Pushdown Automaton
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Context-Free Grammars: Pushdown Automaton and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Context-Free Grammars: Pushdown Automaton 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?