Berkeley COMPSCI 164 - Lecture Notes

Lecture 8: Parsers
Grammar derivations, recursive descent parser vs. CYK parser, Prolog vs. Datalog

Ras Bodik, Shaon Barman, Thibaud Hottelier
Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2012, UC Berkeley

Administrativia

You will earn PA extra credit for bugs in solutions, starter kits, handouts.
Today is back-to-basic Thursday. We have some advanced material to cover.

Today: Parsing

Why parsing? Making sense out of these sentences:

This lecture is dedicated to my parents, Mother Teresa and the pope.
    The (missing) serial comma determines whether “Mother Teresa and the pope” associate to “my parents” or to “dedicated to”.
Seven-foot doctors filed a law suit.
    The hyphen associates “seven” to “foot” rather than to “doctors”.
if E1 then if E2 then E3 else E4
    Typical semantics associates “else E4” with the closest if (i.e., “if E2”).

In general, programs and data exist in text form, which needs to be understood by parsing.

The cs164 concise parsing story

Courses often spend two weeks on parsing. CS164 deals with parsing in 2 lectures, and teaches non-parsing lessons along the way.

1. Write a random expression generator.
2. Invert this recursive generator into a parser by replacing print with scan and random with oracle.
3. Now rewrite this parser in Prolog, which is your oracle. This gives you the ubiquitous recursive descent parser.
4. An observation: this Prolog parser has no negation. It’s in Datalog!
5. Datalog programs are evaluated bottom-up (dynamic programming). Rewriting the Prolog parser into Datalog thus yields the CYK parser.
6. Datalog evaluation can be optimized with a Magic Set Transformation, which yields the Earley parser. (Covered in Lecture 9.)

Grammar: a recursive definition of a language

Language: a set of (desired) strings
Example: the language of regular expressions (RE).
RE can be defined as a grammar:
    base case: any character c is a regular expression;
    inductive case: if e1, e2 are regular expressions, then the following are also regular expressions:
        e1 | e2
        e1 e2
        e1*
        (e1)

Example:
    a few strings in this language:
    a few strings not in this language:

Terminals, non-terminals, productions

The grammar notation: R ::= c | R R | R|R | R* | (R)

terminals (red on the slide): input characters
    also called the alphabet (of the language)
non-terminals: will be rewritten to terminals
    convention: capitalized
start non-terminal: starts the derivation of a string
    convention: the start non-terminal is always the first non-terminal mentioned
productions: rules that govern string derivation
    the example has five: R ::= c, R ::= R R, R ::= R|R, R ::= R*, R ::= (R)

It’s grammar, not grammer.

“Not all writing is due to bad grammer.” (sic)
Saying “grammer” is a lexical error, not a syntactic (i.e., grammatical) one.
In the compiler, this error is caught by the lexer: the lexer fails to recognize “grammer” as being in the lexicon.
In cs164, you learn which part of the compiler finds errors: lexer, parser, syntactic analysis, or runtime checks?

Grammars vs. languages

Write a grammar for the language of all strings ba^i, i > 0.
    grammar 1: S ::= Sa | ba
    grammar 2: S ::= baA
               A ::= aA | ε
A language can be described with multiple grammars.
L(G) = language (strings) described by grammar G
Left-recursive grammar:
Right-recursive grammar:
neither:

Why do we care about left-/right-recursion?

Some parsers can’t handle left-recursive grammars: left recursion may send them into infinite recursion.
Luckily, we can rewrite a left-recursive grammar into a right-recursive one.
Example 1:
    S ::= Sa | a
is rewritten into
    S ::= aS | a
Example 2:
    E ::= a | E + E | E * E | (E)
becomes
    E ::= T | T + E
    T ::= F | F * T
    F ::= a | ( E )
T (a term) and F (a factor) introduce desirable precedence and associativity. More in L9.

Deriving a string from a grammar

How is a string derived in a grammar:
1. write down the start non-terminal S
2. rewrite S with the rhs of a production S → rhs
3. pick a non-terminal N
4. rewrite N with the rhs of a production N → rhs
5. if no non-terminal remains, we have generated a string.
6. otherwise, go to 3.

Example:
    grammar G:
        E ::= T | T + E
        T ::= F | F * T
        F ::= a | ( E )
    derivation of a string from L(G) (the start non-terminal of G is E):
        E → T + E → F + E → a + E → a + T → a + F → a + a

Generate a string from L(G)

Is there a recipe for printing all strings from L(G)?
Depends on whether you are willing to wait: L(G) may be infinite.
Write a function gen(G) that prints a string s ∈ L(G).
If L(G) is finite, rerunning gen(G) should eventually print any string in L(G).

gen(G)

Grammar G and its language L(G):
    G: E ::= a | E + E | E * E
    L(G) = { a, a+a, a*a, a*a+a, … }
For simplicity, we hardcode G into gen():

    def gen() { E(); print EOF }
    def E() {
      switch (choice()) {
        case 1: print "a"
        case 2: E(); print "+"; E()
        case 3: E(); print "*"; E()
      }
    }

Visualizing string generation with a parse tree

The tree that describes a string derivation is a parse tree.
Are we generating the string top-down or bottom-up? Top-down.
Can we do it the other way around? Sure. See CYK.

Parsing

Parsing is the inverse of string generation: given a string, we want to find the parse tree.
If parsing is just the inverse of generation, let’s obtain the parser mechanically from the generator!

    def gen() { E(); print EOF }
    def E() {
      switch (choice()) {
        case 1: print "a"
        case 2: E(); print "+"; E()
        case 3: E(); print "*"; E()
      }
    }

Generator vs. parser

Generator:

    def gen() { E(); print EOF }
    def E() {
      switch (choice()) {
        case 1: print "a"
        case 2: E(); print "+"; E()
        case 3: E(); print "*"; E()
      }
    }

Parser:

    def parse() { E(); scan(EOF) }
    def E() {
      switch (oracle()) {
        case 1: scan("a")
        case 2: E(); scan("+"); E()
        case 3: E(); scan("*"); E()
      }
    }
    def scan(s) { if input starts with s, consume s; else abort }

Parsing == reconstruction of the parse tree

Why do we need the parse tree?
We evaluate it to obtain the AST, or perhaps to directly compute the value of the program.
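
As a minimal sketch of the generator-to-parser inversion above, the Python below hardcodes the grammar E ::= a | E + E | E * E and passes the choices in as an explicit list, so the same list can play the role of choice() in the generator and of oracle() in the parser. The names gen, parse, ParseError, and the tuple-based tree shape are assumptions made for illustration, not part of the lecture. A real parser is not handed an oracle; it has to search for one.

```python
from typing import List, Tuple, Union

# Parse tree for E ::= a | E + E | E * E:
# a leaf is "a"; an inner node is (left_subtree, operator, right_subtree).
Tree = Union[str, Tuple["Tree", str, "Tree"]]


class ParseError(Exception):
    """Raised when the input does not match what the oracle predicts."""


def gen(choices: List[int]) -> str:
    """Generator: emit the string selected by `choices` (1, 2, or 3 per E)."""
    it = iter(choices)
    out: List[str] = []

    def E() -> None:
        c = next(it)                      # plays the role of choice()
        if c == 1:
            out.append("a")
        elif c == 2:
            E(); out.append("+"); E()
        elif c == 3:
            E(); out.append("*"); E()
        else:
            raise ValueError(f"unknown production {c}")

    E()
    return "".join(out)


def parse(text: str, oracle: List[int]) -> Tree:
    """Parser: same shape as gen(), with print replaced by scan."""
    it = iter(oracle)
    pos = 0

    def scan(s: str) -> None:
        # If the remaining input starts with s, consume it; else abort.
        nonlocal pos
        if not text.startswith(s, pos):
            raise ParseError(f"expected {s!r} at position {pos}")
        pos += len(s)

    def E() -> Tree:
        c = next(it)                      # plays the role of oracle()
        if c == 1:
            scan("a")
            return "a"
        if c not in (2, 3):
            raise ValueError(f"unknown production {c}")
        op = "+" if c == 2 else "*"
        left = E()
        scan(op)
        right = E()
        return (left, op, right)

    tree = E()
    if pos != len(text):                  # corresponds to scan(EOF)
        raise ParseError(f"unexpected input at position {pos}")
    return tree
```

For instance, gen([2, 1, 3, 1, 1]) produces "a+a*a", and parse("a+a*a", [2, 1, 3, 1, 1]) reconstructs the tree ("a", "+", ("a", "*", "a")) from the same choices.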
Next slide shows use of parse tree for evaluation.
Exercise: construct AST from a parse tree.

Example 1: evaluate an expression (calculator)

Input: 2 * (4 + 5)

Annotated parse tree (figure; each node is labeled with its value):
    internal nodes: E (18), T (18), F (9), T (2), F (2), E (9), T (5), F (5), E (4), T (4), F (4)
    leaves: int (2), *, (, int (4), +, int (5), )

Parse tree vs. abstract syntax tree

Parse tree = concrete syntax tree: contains all syntactic symbols from the

