DOC PREVIEW
Berkeley COMPSCI 164 - Building a Parser II

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Prof. Bodik CS 164 Lecture 6 1Building a Parser IICS1643:30-5:00 TT10 EvansProf. Bodik CS 164 Lecture 62Administrativia• PA2 assigned today–due in 12 days• WA1 assigned today–due in a week– it’s a practice for the exam•First midterm–Oct 5– will contain some project-inspired questionsProf. Bodik CS 164 Lecture 63Overview• Grammars• derivations• Recursive descent parser• Eliminating left recursionProf. Bodik CS 164 Lecture 64Grammars• Programming language constructs have recursive structure. – which is why our hand-written parser had this structure, too•An expression is either:•number, or•variable, or• expression + expression, or• expression - expression, or• ( expression ), or•…Prof. Bodik CS 164 Lecture 65Context-free grammars (CFG)• a natural notation for this recursive structure• grammar for our balanced parens expressions:BalancedExpression d a | ( BalancedExpression )• describes (generates) strings of symbols:– a, (a), ((a)), (((a))), …• like regular expressions but can refer to – other expressions (here, BalancedExpression)– and do this recursively (giving is “non-finite state”)Prof. Bodik CS 164 Lecture 66Example: arithmetic expressions• Simple arithmetic expressions:E d n | id | ( E ) | E + E | E * E• Some elements of this language:–id –n–( n )–n + id–id * ( id + id )2Prof. Bodik CS 164 Lecture 67Symbols: Terminals and Nonterminals• grammars use two kinds of symbols•terminals: – no rules for replacing them– once generated, terminals are permanent– these are tokens of our language •nonterminals:– to be replaced (expanded)– in regular expression lingo, these serve as names of expressions– start non-terminal: the first symbol to be expandedProf. Bodik CS 164 Lecture 68Notational Conventions• In these lecture notes, let’s adopt a notation:– Non-terminals are written upper-case– Terminals are written lower-case or as symbols, e.g., token LPAR is written as ( – The start symbol is the left-hand side of the first productionProf. Bodik CS 164 Lecture 69Derivations• This is how a grammar generates strings:– think of grammar rules (called productions) as rewrite rules• Derivation: the process of generating a string1. begin with the start non-terminal2. rewrite the non-terminal with some of its productions3. select a non-terminal in your current stringi. if no non-terminal left, done. ii. otherwise go to step 2.Prof. Bodik CS 164 Lecture 610Example: derivationGrammar: E d n | id | ( E ) | E + E | E * E•a derivation:E rewrite E with ( E )( E ) rewrite E with n( n ) this is the final string of terminals• another derivation (written more concisely):E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id )Prof. Bodik CS 164 Lecture 611So how do derivations help us in parsing?• A program (a string of tokens) has no syntax error if it can be derived from the grammar.– but so far you only know how to derive some (any) string, not how to check if a given string is derivable• So how to do parsing?– a naïve solution: derive all possible strings and check if your program is among them– not as bad as it sounds: there are parsers that do this, kind of. Coming soon.Prof. Bodik CS 164 Lecture 612Decaf ExampleA fragment of Decaf:STMT d while ( EXPR ) STMT| id ( EXPR ) ;EXPR d EXPR + EXPR| EXPR – EXPR| EXPR < EXPR| ( EXPR )| id3Prof. Bodik CS 164 Lecture 613Decaf Example (Cont.)Some elements of the (fragment of) language:Question: One of the strings is not from the language.Which one?id ( id ) ;id ( ( ( ( id ) ) ) ) ;while ( id < id ) id ( id ) ;while ( while ( id ) ) id ( id ) ;while ( id ) while ( id ) while ( id ) id ( id ) ;STMT d while ( EXPR ) STMT| id ( EXPR ) ;EXPR d EXPR + EXPR | EXPR – EXPR| EXPR < EXPR | ( EXPR ) | id Prof. Bodik CS 164 Lecture 614CFGs (definition)•A CFG consists of– A set of terminal symbols T– A set of non-terminalsymbols N– A start symbolS (a non-terminal)– A set of productions:produtions are of two forms (X  N)X d ε , or X d Y1Y2... Ynwhere Yi N 4 TProf. Bodik CS 164 Lecture 615context-free grammars• what is “context-free”?– means the grammar is not context-sensitive• context-sensitive gramars– can describe more languages than CFGs– because their productions restrict when a non-terminal can be rewritten. An example production:d N d d A B c– meaning: N can be rewritten into ABc only when preceded by d– can be used to encode semantic checks, but parsing is hardProf. Bodik CS 164 Lecture 616Now let’s parse a string• recursive descent parser derives all strings – until it matches derived string with the input string– or until it is sure there is a syntax errorProf. Bodik CS 164 Lecture 617Recursive Descent Parsing• Consider the grammarE → T + E | TT → int | int * T | ( E )• Token stream is: int5* int2• Start with top-level non-terminal E• Try the rules for E in orderProf. Bodik CS 164 Lecture 618Recursive Descent Parsing. Example (Cont.)•Try E0→ T1+ E2• Then try a rule for T1 → ( E3 )– But ( does not match input token int5•TryT1 → int . Token matches. – But + after T1does not match input token *•Try T1 → int * T2– This will match but + after T1will be unmatched• Have exhausted the choices for T1– Backtrack to choice for E04Prof. Bodik CS 164 Lecture 619Recursive Descent Parsing. Example (Cont.)•Try E0→ T1• Follow same steps as before for T1– And succeed with T1 → int * T2and T2 → int– With the following parse treeE0T1int5*T2int2Prof. Bodik CS 164 Lecture 620A Recursive Descent Parser (2)• Define boolean functions that check the token string for a match of– A given token terminalbool term(TOKEN tok) { return in[next++] == tok; }– A given production of S (the nth)bool Sn() { … }– Any production of S: bool S() { … }• These functions advance nextProf. Bodik CS 164 Lecture 621A Recursive Descent Parser (3)• For production E → T + Ebool E1() { return T() && term(PLUS) && E(); }• For production E → Tbool E2() { return T(); }• For all productions of E (with backtracking)bool E() {int save = next;return (next = save, E1()) || (next = save, E2()); }Prof. Bodik CS 164 Lecture 622A Recursive Descent Parser (4)• Functions for non-terminal Tbool T1() { return term(OPEN) && E() && term(CLOSE);


View Full Document

Berkeley COMPSCI 164 - Building a Parser II

Documents in this Course
Lecture 8

Lecture 8

40 pages

Load more
Download Building a Parser II
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Building a Parser II and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Building a Parser II 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?