CSc 453 — Compilers and Systems Software5 : Concrete and Abstract GrammarsChristian CollbergDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergSeptember 9, 20091 Syntax• Thesyntax of a language (formal or natural) is the way the words in a sentence/program can bearranged.• eats dog bone the is not a legal arrangement of words in English.• = y x + 5 is not a legal arrangement of tokens in Java.• Somehow, we need to describe what constitutes legal and illegal sentences in a particular language.• We useproduction rules to describe the syntax of a language.2 Production Rules• Here’s a production rule:IfStat → if ( expr ) stat• This rule states that to construct an if-statement in C you have to type1. an if, then2. a (, then3. some sort of expression, then4. a ), then finally5. some sort of statement.3 A Grammar for English• Agrammar can be used for1. sentence generation (i.e. which sentences does this grammar generate?), or2. parsing (i.e. is sentence S generated by this grammar?).• Let’s look at a simple grammar for a fragment of English.14 Syntactic CategoriesS [Sentence] John likes Sarah’s black hairN [Noun] John, hairV [Verb] eating, satAdj [Adjective] black, longDet [Determiner] the, a, everyNP [Noun Phrase] Sarah’s long black hairVP [Verb Phrase]eating apples5 A Simple English GrammarS → NP VPVP → V NPVP → VNP → NNP → Det NN → JohnN → LisaN → houseV → diedV → kissedDet → theDet → a• S, NP, VP, N, Det, V arenon-terminal symbols.• John, Lisa, house, died,... areterminal symbols.• S is thestart symbol.6 Sentence Generation1. Start with the start sym-bol.2. Pick a non-terminal X onthe right hand side.3. Pick a grammar rule X →γ.4. Replace X with γ.5. Repeat until left with astring of words.SS→NP VP=⇒ NP VPNP→N=⇒ N VPN→John=⇒ John VPVP→V NP=⇒ John V NPV→kissed=⇒ John kissed NPNP→N=⇒ John kissed NN→Lisa=⇒ John kissed Lisa7 Terminology• A grammar is a 4-tuple(non-terminals, terminals, productions, start-symbol)2or(N, Σ, P, S)• A production is of the form α → β where α, β are taken from NSΣ.• Read α → β as “rewrite α with β”.• Read ⇒ as “directly derives”.• Readr⇒ as “directly derives using rule r”.• Read∗⇒ as “derives in one or more steps”.8 A Simple PL GrammarHere’s a grammar for a simple programming language:Program ::= BEGIN Stat ENDStat ::= ident := ExprExpr::= Expr + Expr |Expr * Expr |ident | number• We write terminal symbols like this.• We write non-terminal symbols likethis.• Sometimes we write ::= instead of →.• A → b | c is the same as A → b; A → c. Read | as “or”.9 A Simple PL Grammar. . .We know the sentenceBEGIN a := 5 + 4 * 3 ENDis in the language because we can derive it from the start symbol:Program ⇒ BEGIN Stat END⇒ BEGIN ident := Expr END⇒ BEGIN "a" := Expr END⇒ BEGIN "a" := Expr + Expr END⇒ BEGIN "a" := 5 + Expr END⇒ BEGIN "a" := 5 + Expr * Expr END⇒ BEGIN "a" := 5 + 4 * Expr END⇒ BEGIN "a" := 5 + 4 * 3 END310 Terminology. . .• Our English grammar is the 4-tuple({S,NP,V,. . . },{John,house,died,. . . },{S → NP VP, VP → V,. . . },S)• Our PL grammar is the 4-tuple({Program,Stat,. . . },{BEGIN,:=,*,. . . },{ Program ::= BEGIN Stat END,. . . },Program)11 Parse Trees• We often want to show how a particular sentence was derived. We can do this without listing all thesteps explicitly by drawing aparse tree.• A parse tree is a tree where1. The root is labeled by the start symbol.2. Each leaf is labeled by a terminal symbol.3. Each interior node is labeled by a non-terminal symbol.12 Parse Trees. . .• If one step of our derivation is· · · A · · · ⇒ · · · X Y Z · · ·(i.e, we used the rule A → XY Z) then we’ll get a parse (sub-)tree............AXYZ13 Parse Trees. . .SS→NP VP=⇒ NP VPNP→N=⇒ N VPN→John=⇒ John VPVP→V NP=⇒ John V NPV→kissed=⇒ John kissed NPNP→N=⇒ John kissed NN→Lisa=⇒ John kissed LisaLisaNPSVPNVNPNkissedJohn414 Parse Trees. . .Program ⇒ BEGIN Stat END⇒ BEGIN ident := Expr END⇒ BEGIN "a" := Expr END⇒ BEGIN "a" := Expr + Expr END⇒ BEGIN "a" := 5 + Expr END⇒ BEGIN "a" := 5 + Expr * Expr END⇒ BEGIN "a" := 5 + 4 * Expr END⇒ BEGIN "a" := 5 + 4 * 3 END+ident"a"Expr5Expr4Expr3Expr*ExprEND:=BEGIN StatProgram15Regular Grammars and Lexical Analysis16 Regular Grammars• A grammar isregular if all rules are of the formA → aBA → a• By convention, the symbols A, B, C, . . . are non-terminals, a, b, c, . . . are terminals, and α, β, γ, . . . arestrings of symbols.• Regular grammars are used to describe the lexical structure of programs, i.e. what tokens look like.17Parsing and the Definition of Syntax18 Context-Free Grammars• Programming language syntax is described by acontext free grammar (CFG).• In a CFG all rules are of the formA → γγ is any sequence of terminals or non-terminals. A is a single non-terminal.• Example: an if-statement consists of an if-token, expression, then-token, statement, and (maybe) anelse-token followed by a statement.519 EBNF• BNF is Backus-Naur Form, a way to write CFGs. EBNF (Extended BNF) is a more expressive wayto write CFGs.• Repetition and choice are common structures in a language (and hence, its grammar).• Repetition:int x,y,z,w,....;• Choice:class C { ... }class C extends D { ... }20 EBNF. . .• In BNF, our variable declarationint x,y,z,w,....;looks like this:vars ::= ident ident idlist ;idlist ::= , ident idlist | ǫ• In EBNF, it looks like this:vars ::= ident ident { , ident } ;• I.e. {e} means that e is repeated 0 or more times.21 EBNF. . .• In BNF, our class declarationclass C extends D { ... }looks like this:class ::= class ident extends { . . . }extends ::= extends ident | ǫ• In EBNF, it looks like this:class ::= class ident [extends ident] { . . . }• I.e. [e] means that e is optional.622 EBNF for Lucaprogram ::=PROGRAM ident ; decl list block .decl list ::={declaration ; }declaration ::=VAR ident : ident |TYPEident = RECORD [ [field list] ] |TYPE ident = ARRAY expression OF ident |CONST ident : ident = expression |PROCEDUREident ( [formal list] ) decl list block ;23 EBNF for Luca. . .field list ::= field decl {; field decl }field decl ::= ident : identformal list ::= formal param {; formal param }formal param ::= [VAR] ident : identactual list ::= expression {, expression }block ::= BEGIN
View Full Document