DOC PREVIEW
CSU CS 553 - Scanning and Parsing

This preview shows page 1-2 out of 5 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS553 Lecture Scanning and Parsing 2Scanning and ParsingAnnouncements– Project 1 is 5% of total grade– Project 2 is 10% of total grade– Project 3 is 15% of total grade– Project 4 is 10% of total gradeToday– Outline of planned topics for course– Overall structure of a compiler– Lexical analysis (scanning)– Syntactic analysis (parsing) CS553 Lecture Scanning and Parsing 3Structure of a Typical Interpreter“sentences”Synthesisoptimizationcode generationtarget languageIRIR code generationIRAnalysischaracter streamlexical analysis“words”tokenssemantic analysissyntactic analysisASTannotated ASTinterpreterCompilerCS553 Lecture Scanning and Parsing 4Lexical Analysis (Scanning) Break character stream into tokens (“words”)– Tokens, lexemes, and patterns– Lexical analyzers are usually automatically generated from patterns(regular expressions) (e.g., lex) Examples“.*”“hi”, “mom”string[0-9]+ | [0-9]*.[0-9]+3.14159,570number[a-zA-Z_]+[a-zA-Z0-9_]*foo,indexidentifier< | <= | = | != | ...<,<=,=,!=,...relationifififconstconstconstpatternlexeme(s)tokenconst pi := 3.14159 ⇒ const, identifier(pi), assign,number(3.14159)CS553 Lecture Scanning and Parsing 5Interaction Between Scanning and ParsingLexicalanalyzerParsercharacter streamlexer.next()lexer.peek()tokenparse treeor AST2CS553 Lecture Scanning and Parsing 6Specifying Tokens with SableCC Theory meets practice:– Regular expressions, formallanguages, grammars, parsing… SableCC example input file: Package minijava; Helpers all = [0..0xFFFF]; cr = 13; digit = ['0'..'9']; letter = ['a'..'z'] | ['A'..'Z']; underscore = ’_’; not_star = [all - '*']; not_star_slash = [not_star - '/']; c_comment = '/*' not_star* ('*'(not_star_slash not_star*)?)* '*/'; Tokens t_plus = '+'; t_if = 'if'; t_id = letter (letter | digit | underscore)*; t_blank = (' ' | eol | tab)+; t_comment = c_comment | line_comment; Ignored Tokens t_blank, t_comment;CS553 Lecture Scanning and Parsing 7Recognizing Tokens with DFAs ‘if‘ letter (letter | digit)* Ambiguity due to matching substrings– Longest match– Rule priorityletter or digitletter12fi145t_ift_idCS553 Lecture Scanning and Parsing 8 Impose structure on token stream– Limited to syntactic structure (⇒ high-level)– Structure usually represented with an abstract syntax tree (AST)– Parsers are usually automatically generated from context-free grammars(e.g., yacc, bison, cup, javacc, sablecc) Example for i = 1 to 10 do a[i] = x * 5; for id(i) equal number(1) to number(10) do id(a) lbracket id(i) rbracket equal id(x) times number(5) semiSyntactic Analysis (Parsing)fori 1 10 asgaitmsx5arrCS553 Lecture Scanning and Parsing 9Interaction Between Scanning and ParsingLexicalanalyzerParsercharacter streamlexer.next()lexer.peek()tokenparse treeor AST3CS553 Lecture Scanning and Parsing 10Bottom-Up Parsing: Shift-Reduce Rightmost derivation: expand rightmost non-terminals first SableCC, yacc, and bison generate shift-reduce parsers:– LALR(1): look-ahead, left-to-right, rightmost derivation in reverse, 1 symbol lookahead– LALR is a parsing table construction method, smaller tables than canonical LRReference: Barbara Ryder’s 198:515 lecture notes(1) S -> E(2) E -> E + T(3) E -> T(4) T -> idGrammerS -> E -> E + T -> E + id -> E + T + id -> E + id + id -> T + id + id -> id + id + ida + b + cCS553 Lecture Scanning and Parsing 11Shift-Reduce Parsing ExampleReference: Barbara Ryder’s 198:515 lecture notes(1) S -> E(2) E -> E + T(3) E -> T(4) T -> idStack Input Actionaccept$ Sreduce (1)$ Ereduce (2)$ E + Treduce (4)$ E + cshiftc$ E +shift+ c$ Ereduce (2)+ c$ E + Treduce (4)+ c$ E + bshiftb + c$ E +shift+ b + c$ Ereduce (3)+ b + c$ Treduce (4)+ b + c$ ashifta + b + c$CS553 Lecture Scanning and Parsing 12Shift-Reduce Parsing Example (precedence problem)(1) S -> E(2) E -> E + T(3) E -> E * T(4) E -> T(5) T -> idStack Input Actionshifta + b * c$CS553 Lecture Scanning and Parsing 13Syntax-directed Translation: AST Construction example AST for a+b+cReference: Barbara Ryder’s 198:515 lecture notes Grammer with production rulesS: E { $$ = $1; };E: E ‘+’ T { $$ = new node(“+”, $1, $3); } | T { $$ = $1; } ;T: T_ID { $$ = new leaf(“id”, $1); }; Implicit parse tree for a+b+cSEET+aabbccT_IDT_IDT_IDTT+E++4CS553 Lecture Scanning and Parsing 14Using SableCC to specify grammar and generate AST Productions cst_program {-> program} = cst_main_class cst_class_decl* {-> New program(cst_main_class.main_class,[cst_class_decl.class_decl])} ; cst_exp_list {-> exp* } = {many_rule} cst_exp cst_exp_rest* {-> [cst_exp.exp, cst_exp_rest.exp] } | {empty_rule} {-> [] } ; cst_exp_rest {-> exp* } = t_comma cst_exp {-> [cst_exp.exp] }; Abstract Syntax Tree program = main_class [class_decls]:class_decl*; exp = {call} exp t_id [args]:exp* | ...CS553 Lecture Scanning and Parsing 15Parsing Terms CFG (Context-free Grammer)– production rule– terminal– nonterminal– FOLLOW(X): “the set of terminals that can immediately follow X” BNF (Backus-Naur Form) and EBNF (Extended BNF): equivalent to CFGsCS553 Lecture Scanning and Parsing 16Parsing Terms cont … Top-down parsing– LL(1): left-to-right reading of tokens, leftmost derivation, 1 symbol look-ahead– Predictive parser: an efficient non-backtracking top-down parser that can handleLL(1)– More generally recursive descent parsing may involve backtracking Bottom-up Parsing– LR(1): left-to-right reading of tokens, rightmost derivation in reverse, 1 symbollookahead– Shift-reduce parsers: for example, bison, yacc, and SableCC generated parsers– Methods for producing an LR parsing table– SLR, simple LR– Canonical LR, most powerful– LALR(1)CS553 Lecture Scanning and Parsing 17Concepts Compilation stages in a compiler– Scanning, parsing, semantic analysis, intermediate code generation,optimization, code generation Lexical analysis or scanning– Tools: SableCC, lex, flex, etc. Syntactic analysis or parsing– Tools: SableCC, yacc, bison, etc.5CS553 Lecture Scanning and Parsing 18Next TimeLecture– More undergraduate compilers reviewCS553 Lecture Scanning and Parsing 19Language Implementation TimelineFlow-sens. defined [Banning]Itanium ships & Jikes RVM [IBM]CS553 @ CSU ‘80 ‘90 2000 2010Sparse cond. const. [Wegman&Zadeck]Superblock scheduling [Hwu] Java [Gosling&Sun]


View Full Document

CSU CS 553 - Scanning and Parsing

Download Scanning and Parsing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Scanning and Parsing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Scanning and Parsing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?