DOC PREVIEW
CU-Boulder CSCI 3155 - Syntax

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1SyntaxAmer DiwanWhat is syntax?• Syntax– Determines the form of programs• Semantics– Determines the meaning of programs– Not all syntactically correct programs are “meaningful”– e.g., int i; i.m();is syntactically correct but semantically meaningless2Compilation and syntaxCharacter streamScanner (lexical anal)Token streamParser (syntax analysis)Parse treeSemantic analysis andIR generationASTMachine-independent optimizationsModified intermediate formTarget code generationAssembly or machine languageMachine-specificoptimizations Modified assemblySyntaxCSCI4555Lexical analysis• What are tokens?– Variables (x, i, theCount, ...)– Keywords (while, begin, ...)– Numbers (10, 1.5E10, ...)– Other symbols ({, ...)• How do languages specify legal tokens?(what is a legal variable name?)3Specifying tokens with regular expressions• What is a regular expression?– A character (e.g., opening_brace = {)– An empty string, denoted ?– Two regular expressions next to each other (e.g., auto_incr = plus plus)– Two regular expressions separated by a vertical bar (e.g., binary_op = plus | minus)– A regular expression followed by a kleene-star (e.g., integer = digit digit*)Examples of regular expressions• variable_name = (a|b|...|z|A|B|...|Z)(a|b|...|z|A|B|...|Z|0|...|9)*What kinds of variable names does this allow?• digit = 0|1|...|9unsigned_integer = digit digit*unsigned_number = unsigned_integer((. unsigned_integer) | ?)4Recognizing tokens: Ad-hoc approach(small grammars)• getWord(InputStream is) {...}• getNumber(InputStream is) {...}• matchKeyword(String s) {if (s.equal(“BEGIN”)) ...else if (s.equal(“END”)) ......}• if is_num(next_char) return new Number(getNumber())else { word = getWord();if matchKeyword(word) then ...else return new Identifier(word)}Recognizing tokens(larger grammars)• Build (or have a tool build) a scanner based on a DFAstartcharacterdigit•Want the longest token•myVarand not m or my or myV or ...5Language choices may complicate lexical analysis• FORTRAN (pre ‘90)– DO 5 I = 1.25assigns a value 1.25 to variable DO5I (spaces are ignored)– May lead to programmer errors!• Pascal: is 4.– Start of a floating point number (4.51)?– Half way through a subrange (4..20)?– Need to look ahead 2 charactersParsing• What are the possible statements of a language?– Addition (a+b)– While loop (while (b) { ... })– For loop (for (i = 0; i < 10; ++i) { ...})– Class declaration (class S extends T {...})– ...• How do languages specify legal statements?6Context-free grammars• regular expressions + recursion• expr -> identifier | number | - expr| expr operator exproperator -> + | - | * | /• Symbols that appear on lhs are non-terminals• Symbols that appear only on rhs are terminals– A scanner returns a stream of terminals to the parserParse tree for -x+y expr- exprexpr operator expridentifier(x) identifier(y)+Does this match what you would mean with -x+y?expr-> identifier | number | - expr| exproperator exproperator -> + | - | * | /7Another parse for -x+yexprexpr operator expridentifier(y)+- expridentifier(x)Does this match what you would mean with -x+y?expr-> identifier | number | - expr| exproperator exproperator -> + | - | * | /Ambiguity in grammars• When more than one production can be used– Resolution mechanisms:• Precedence (* has higher precedence than +)• Associativity (10 - 4 - 3 means (10 - 4) - 3)• Let’s rewrite the grammar to eliminate ambiguities810-4-3number (10)number(4)number(3)-a nonterminala nonterminal-Desired parse:“-” groups to the “left” (left associative)instead of expr -> expr operator expr, haveexpr -> expr operator term | termterm -> identifier | number10-4*3number (10)number(4)number(3)*a nonterminala nonterminal-Desired parse:Note how the higher precedence operators come lower in the derivation treeexpr -> expr add_op term | termterm -> term mult_op factor | factorfactor -> identifier | number | - factor | ( expression)9Parsing using the grammarTop down parsing• Predict non-terminal and match– id_list -> id id_list_tailid_list_tail -> , id id_list_tail | ;match_id_list() {match_id(); match_id_list_tail(); }match_id_list_tail() {if (cur_token == comma) {match_comma(); match_id(); match_id_list_tail();} else if (cur_token == semicolon) {match_semicolon();} else error();– More intuitive (when grammar is suitable)ParsingWhat next?• What does the parser in the previous slide do?– x, y, z; Yup!– x,, y, z: Nope!– x y, z: Nope!• What else needs to happen in a parser?– Build a parse or abstract syntax tree for later use10Building an AST• E.g., AST for:<ul><li> item 1 </li><li> item 2</li></ul> symbol tableitemsparentListNodetextparenttextparentPlainTextNodePlainTextNode“item 1” “item 2”StringStringitem valuenextitem valuenextparent parentListItemListItemsymbol tablesymbol tableTop down parsingWhen is the grammar suitable?• When you can predict the next production with a fixed (small) number of lookaheads• How many lookaheads do we need?– id_list -> id id_list_tailid_list_tail -> , id id_list_tail | ;– id_list -> id_list, id; | id;– expr -> expr add_op term | termterm -> term mult_op factor | factorfactor -> identifier | number | - factor | (expression)11An example• A -> Aa | Aab | cAnother example• S -> a S a | epsilon12Bottom-up parsing• Look at token stream and construct non-terminals– id_list -> id id_list_tailid_list_tail -> , id id_list_tail | ;– Parsing A,B• “A”: doesn’t match the rhs of any production• “A,”: ditto• “A,B”: ditto• “A,B;”: “;” matches a rhs of id_list_tail• “A,B id_list_tail”: ditto• “A id_list_tail”: matches rhs of id_list. Done!More on bottom-up parsing• Can handle more grammars than top-down since there is no prediction• We will look at only top-down in this course13Language design and parsing• Language design can have a significant effect on the complexity of parsers– Languages that are not amenable to straightforward parsing may also interfere with human readability– e.g., Pascal if-then-else statementsif (x) then if (y) thenwrite(“y’s then”);elsewrite (“whose else??”);Relationship to other courses• CSCI 4555: – More emphasis on using parser generation tools for larger grammars than we will use here14Relationship to readings• Reading covers parsing and lexical analysis in more detail– Gives more examples for both–


View Full Document

CU-Boulder CSCI 3155 - Syntax

Download Syntax
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Syntax and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Syntax 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?