2008-1-18
CSE 3302 Programming Languages
Syntax
Chengkai Li, Weimin He
Spring 2008

Lecture 3 - Syntax, Spring 2008
CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, 2008

Phases of Compilation
[Figure: phases of compilation, from Programming Language Pragmatics, by Michael Scott]

Syntax and Semantics
• Defining a programming language:
– Specifications of syntax
  Syntax: the structure (form) of programs, i.e., the form a program in the language must take.
– Specifications of semantics
  Semantics: the meaning of programs.
• Precise definition, without ambiguity
– Given a program, there is exactly one interpretation.

Purpose of Describing Syntax and Semantics
• Purpose
– For language designers: Convey the design principles of the language
– For language implementers: Define precisely what is to be implemented
– For language programmers: Describe the language that is to be used
• How to describe?
– Natural language: ambiguous
– Formal ways: especially for syntax

Scanning and Parsing
• Lexical Structure: The structure of tokens (words)
– scanning phase (lexical analysis): scanner/lexer
– recognizes tokens from characters
• Syntactical Structure: The structure of programs
– parsing phase (syntax analysis): parser
– determines the syntactic structure

character stream → scanner (lexical analysis) → token stream → parser (syntax analysis) → parse tree

Tokens
Tokens (words): Building blocks of programs
• Reserved words (keywords): e.g., if, while, int, return
• Literals/constants:
– numeric literal: 42
– string literal: "hello"
• Special symbols: e.g., “;”, “<=”, “+”
• Identifiers: e.g., x24, monthly_balance, putchar

Reserved words vs. Predefined identifiers
• Reserved words:
– cannot be redefined.
• e.g., double if; is illegal.
• Predefined identifiers:
– have an initial meaning
– allow redefinition (not a good idea in practice)
• e.g., String, Object, System, Integer in Java

Principle of Longest Substring
• doif vs. do if; x12 vs. x 12
• The longest possible string of characters is collected into a single token.
• An exception: FORTRAN, which ignores blanks
– DO 99 I = 1.10 (the same as DO99I=1.10, an assignment)
– DO 99 I = 1, 10 (a loop)

White Space
• The principle of longest substring requires that tokens which would otherwise run together (e.g., do if) be separated by white space.
• White space (token delimiters):
– Blanks, newlines, tabs
– ignored except that they separate tokens
• Free-format language: format has no effect on the program structure
– Most languages are free format
– One exception: Python

Indentation in Python
 def perm(l):                       #error: first line indented
for i in range(len(l)):             #error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   #error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                #error: inconsistent dedent

Regular Expression
• A form for representing sets of strings
• Description of patterns of characters
• Basic operations:
– Concatenation
– Repetition
– Selection

Regular Expression
Name            RE
epsilon         ε
symbol          a
concatenation   AB
selection       A | B
repetition      A*

Shortcuts:
A+ = AA*
A? = A|ε
[a-z] = (a|b|...|z)

Examples:
[a-z][a-z0-9]*
(a|b)*aa(a|b)*
[0-9]+(\.[0-9]+)?
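The regular expressions and the principle of longest substring can be tried out together in a small scanner. The following is only a minimal sketch, assuming Python's `re` module stands in for a generator such as Lex; the names `TOKEN_SPEC`, `RESERVED`, and `tokenize` are illustrative and do not come from the lecture, and the token classes cover just a few of the examples on the slides.

```python
import re

# Token classes, each described by a regular expression.
# These names and patterns are illustrative assumptions, not the lecture's scanner.
TOKEN_SPEC = [
    ("NUMBER", re.compile(r"[0-9]+(\.[0-9]+)?")),   # numeric literal: 42, 3.5
    ("IDENTIFIER", re.compile(r"[a-z][a-z0-9]*")),  # identifier pattern from the slide
    ("SYMBOL", re.compile(r"<=|[;+=<]")),           # "<=" tried before "<"
    ("SPACE", re.compile(r"[ \t\n]+")),             # white space only separates tokens
]

# A few reserved words; a lexeme that looks like an identifier but is in
# this set is reported as RESERVED instead.
RESERVED = {"if", "while", "int", "return", "do"}


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the longest-substring rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_lexeme = None, ""
        # Try every token class at this position and keep the longest match.
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match and len(match.group()) > len(best_lexeme):
                best_kind, best_lexeme = kind, match.group()
        if best_kind is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos += len(best_lexeme)
        if best_kind == "SPACE":
            continue  # ignored except that it separates tokens
        if best_kind == "IDENTIFIER" and best_lexeme in RESERVED:
            best_kind = "RESERVED"
        tokens.append((best_kind, best_lexeme))
    return tokens
```

With this sketch, `tokenize("doif")` yields a single identifier, while `tokenize("do if")` yields two reserved words, which mirrors the doif vs. do if example. Checking the reserved-word set after matching, rather than giving each keyword its own pattern, is one common design choice; it keeps the longest-substring rule in a single place.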

Tasks of a Scanner
• Recognizes keywords
• Recognizes special characters
• Recognizes identifiers, integers, reals, decimals, strings, etc.
• Ignores white spaces and comments

Scanner
regular expression description of the tokens → (Lex or JLex) → scanner of a language
• Example: Figure 4.1 (page 82)

Scanning and Parsing
• Lexical Structure: The structure of tokens (words)
• Syntactical Structure: The structure of programs

character stream → scanner (lexical analysis) → token stream → parser (syntax analysis) → parse tree
scanner: regular expression
parser: grammar

Grammar
Example:
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
Figure 4.2 (page 83)

Grammar
• Language: the programs (character streams) allowed
• Grammar rules (productions): "produce" the language
– left-hand side, right-hand side
• nonterminals (structured names): noun-phrase, verb-phrase
• terminals (tokens): . dog
• metasymbols: → (“consists of”) | (choice)
• start symbol: the nonterminal that