CSE 3302 Programming LanguagesPhases of CompilationSyntax and SemanticsPurpose of Describing Syntax and SemanticsScanning and ParsingTokensReserved words vs. Predefined identifiersPrinciple of Longest SubstringWhite SpaceIndentation in PythonRegular ExpressionSlide 12Tasks of a ScannerScannerSlide 15GrammarSlide 17Grammars Produce LanguagesContext-Free GrammarWhat does “Context-Free” mean?Backus-Naur Form(BNF)BNF Example 1BNF Example 2Parse TreeSlide 25Slide 26Abstract Syntax TreeAST Example 1AST Example 2CSE 3302 Programming LanguagesChengkai Li, Weimin HeSpring 2008SyntaxLecture 2 - Syntax, Spring 2008 1CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008Phases of Compilation[Programming Language Pragmatics, by Michael Scott]Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20082Syntax and Semantics•Defining a programming language:–Specifications of syntaxSyntax – structure (form) of programs (the form a program in the language must take).–Specifications of semanticsSemantics - the meaning of programs•Precise definition, without ambiguity–Given a program, there is only one unique interpretation.Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20083Purpose of Describing Syntax and Semantics •Purpose–For language designers: Convey the design principles of the language–For language implementers: Define precisely what to be implemented–For language programmers: Describe the language that is to be used•How to describe? –Natural language: ambiguous –Formal ways: especially for syntaxLecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20084Scanning and Parsing•Lexical Structure: The structure of tokens (words)–scanning phase (lexical analysis) : scanner/lexer–recognize tokens from characters•Syntactical Structure: The structure of programs–parsing phase (syntax analysis) : parser–determines the syntactic structurecharacter streamparse treetoken streamscanner (lexical analysis)parser (syntax analysis)Lecture 2 - Syntax, Spring 2008 5CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008TokensTokens (words): Building blocks of programs•Reserved words (keywords): e.g., if,while,int,return•Literals/constants:–numeric literal: 42 –string literal: "hello"•Special symbols: e.g., “;”, “<=”, “+”•Identifiers: e.g., x24, monthly_balance, putcharLecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20086Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20087Reserved words vs. Predefined identifiers•Reserved words:–cannot be redefined.•e.g., double if; is illegal. •Predefined identifiers:–have initial meaning –allow redefinition (not a good idea in practice)•e.g., String,Object,System,Integer in JavaPrinciple of Longest Substring•doif vs. do if; x12 vs. x 12•The longest possible string of characters is collected into a single token.•An exception: FORTRAN–DO 99 I = 1.10 (the same as DO99I=1.10)–DO 99 I = 1, 10Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20088White Space•Principle of longest substring requires that tokens are separated by white space.•White space (token delimiters):–Blanks, newlines, tabs–ignored except that they separate tokens•Free-format language: format has no effect on the program structure–Most languages are free format–One exception: pythonLecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20089Indentation in Python def perm(l): for i in range(len(l)): s = l[:i] + l[i+1:] p = perm(l[:i] + l[i+1:]) for x in p: r.append(l[i:i+1] + x) return r Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 200810def perm(l):for i in range(len(l)):s = l[:i] + l[i+1:] p = perm(l[:i] + l[i+1:]) for x in p: r.append(l[i:i+1] + x) return r #error: first line indented#error: not indented#error: unexpected indent#error: inconsistent dedentRegular Expression•A form for representing sets of strings•Description of patterns of characters•Basic operations:–Concatenation–Repetition–SelectionLecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 20081112Regular ExpressionName REepsilon symbol aconcatenation ABselection A | Brepetition A*Shortcuts: A+ = AA*A? = A| [a-z] = (a|b|...|z)[a-z][a-z0-9]* (a|b)*aa(a|b)* [0-9]+(\.[0-9]+)? Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 200813Tasks of a Scanner•Recognizes keywords•Recognizes special characters•Recognizes identifiers, integers, reals, decimals, strings, etc.•Ignores white spaces and commentsLecture 3 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He 200814Scannerregular expression description of the tokens→ (Lex or JLex)scanner of a language•Example: Figure 4.1 (page 82)Scanning and Parsing•Lexical Structure: The structure of tokens (words)•Syntactical Structure: The structure of programscharacter streamparse treetoken streamscanner (lexical analysis)parser (syntax analysis)Lecture 3 - Syntax, Spring 2008 15CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He 2008regular expressiongrammarLecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 200816GrammarExample:(1) sentence noun-phrase verb-phrase .(2) noun-phrase article noun(3) article a | the(4) noun girl | dog(5) verb-phrase verb noun-phrase(6) verb sees | petsFigure 4.2 (page 83)Lecture 2 - Syntax, Spring 2008CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 200817Grammar•Language: the programs (character streams) allowed•Grammar rules (productions): "produce" the languageleft-hand side, right-hand side•nonterminals (structured names): noun-phrase verb-phrase •terminals (tokens): . dog•metasymbols: (“consists of”) | (choice)•start symbol: the nonterminal that stands for the entire structure (sentence,
View Full Document