DOC PREVIEW
CMU CS 15312 - Concrete Syntax

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

15-312 Foundations of Programming LanguagesRecitation 1: Concrete SyntaxDaniel Spoonhowerspoons+@csRevised by: Matthew [email protected] 30, 20041 Defining a GrammarWe can imagine a spectrum of different representations of a concept or an idea,from the most concrete (e.g. written on a blackboard) to the more ethereal (e.g.your understanding). We’ll begin with a description of one of the more concrete:the syntax of our language. We will be looking for two characteristics of thisrepresentation: its success in conveying the more abstract understanding thatunderlies it (i.e. readability) and to what extent it specifies a program withoutambiguity.Example The number three is an abstract concept. The following are fivedifferent concrete representations of the number three.· · · III 3 1 + 1 + 1 011(Which representation is best?)We will use a context-free grammar to define the syntax of our language.Using a grammar, we will combine strings of symbols, letters, words, tokens, oratoms to form sentences, phrases, statements, and programs. In addition, wewill see that syntax lets us make simple syntactic distinctions, for example asbetween digits (i.e. 0, 1, 2, . . . , 9) and numbers (e.g. 5119). Finally, we will seean example of an ambiguous grammar and how we might disambiguate it.1.1 Context-Free Grammars (CFGs)A context-free grammar is composed of the following three components:• An alphabet Σ composed of a finite set of terminals (or symbols)• A set of non-terminals N that form syntactic categories1• A set of production rules P of the formA → αwhere α is a string of terminals and non-terminals(Why is this form context-free? What would the productions of a context-sensitive grammar look like?)Consider the following grammar:d → 0 d → 1 d → 2 d → 3d → 4 d → 5 d → 6 d → 7d → 8 d → 9 n → d n → d n(What are the terminals? The non-terminals?)Strings are derived from the grammar by the application of production rules.For each non-terminal A that occurs in the current string β, and given a ruleA → α, we substitute the right-hand side of the rule α for each instance of A inβ.Example Using the rule n → d n, we can show the following.d n ⇒ d d nHow do we show that the numeral “3” can be derived beginning with thenon-terminal n? What about “5119”?n ⇒ d ⇒ 3n ⇒ d n ⇒ 5 n ⇒ 5 d n ⇒ 5 1 n ⇒ 5 1 d n ⇒ . . .Given a grammar, the language of a non-terminal A is the set of all stringsderivable from applications of the production rules, starting with A. We willcall this language L(A).We have just shown that “3” ∈ L(n) and “5119” ∈ L(n). What is anothername for L(n)? What about L(d)? How does it compare to L(n)?1.2 Backus-Naur Form (BNF)Rather than write the same non-terminals over and over again, we can abbre-viate the grammar using Backus-Naur Form:d ::= 0 | 1 | 2 | . . . | 9n ::= d | d n(Many instances of BNF use ankle-brackets around non-terminals (e.g. hni ::=hdi hni) but I’ll avoid doing so unless there is an overlap in the way we writeterminals and non-terminals.)22 AmbiguityLet’s add sums and products to our language using another non-terminal:d ::= 0 | 1 | . . . | 9n ::= d | d ne ::= n | e + e | e ∗ eNow take the expression “3 + 4 ∗ 5”. Here is one derivation of this expression.e ⇒ e + e ⇒ n + e ⇒ d + e ⇒ 3 + e ⇒ 3 + e ∗ e ⇒ . . .But what about this alternative? (How is this significant?)e ⇒ e ∗ e ⇒ e ∗ n ⇒ e ∗ d ⇒ e ∗ 5 ⇒ . . .The conventional rules of arithmetic would tell us that ∗ has higher precedencethan +, that we should read the expression as “3 + (4 ∗ 5)”.How can we make the grammar unambiguous? Conceptually, we’d like toseparate those expressions that we multiply together (i.e. factors) from thosewe add (i.e. terms)....e ::= t | t + et ::= f | f ∗ tf ::= n | (e)What does a derivation of “3 + 4 ∗ 5” look like given our new rules?e ⇒ t + e ⇒ f + e ⇒ n + e ⇒ d + e ⇒ 3 + e ⇒ 3 + t ⇒ 3 ∗ f + t ⇒ . . .3 ParsingThe process of finding a derivation for a given string is called parsing; this isexactly what we have been doing by hand, above. Obviously, this is a processwe would like to automate.How would we write a program to parse our arithmetic expressions startingwith the non-terminal e? Consider the two productions for e: both start withthe non-terminal t, so we start by parsing a term. Afterward, if we’ve reachedthe end of the input string, we use the first production (and then we’ve finishedparsing), otherwise we expect to see “+” followed by another expression: weconsume the “+” and start again. We can use a similar algorithm in parsingterms t. Finally, for factors f, if the next symbol is a digit, we use the firstproduction. Otherwise, we expect an “(” followed by an expression and a “)”.Note that for our arithmetic grammar, whenever we need to make a decisionas to which production to use, the next terminal of the input always tells uswhich choice to make. This sort of algorithm is called a recursive-descent parser.3In fact, the prose in the preceding paragraph could easily be translated to forma working recursive-descent parser.Not all context-free grammars are this simple; in general, finding the deriva-tion for a string given a CFG and a starting non-terminal can be far more dif-ficult. While most programming languages have grammars that can be parsedin a reasonable (linear) amount of time, there are few where you would actuallywant to write the parser. In many cases, programmers who write compilers useparser generators turn a BNF grammar into a working program.In this course, we won’t need a parser generator— our grammar will remainfairly simple— and you won’t need to worry much about parsing: we will domost of that work for you.4 Further reading• Andrew Appel’s “Modern Compiler Implementation in {ML, Java, C}”covers a variety of different types of grammars and their respective parsers.• “Compilers: Principles, Techniques and Tools” by Aho, Sethi, and Ullmanhas similar coverage of these


View Full Document

CMU CS 15312 - Concrete Syntax

Download Concrete Syntax
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Concrete Syntax and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Concrete Syntax 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?