UT Dallas CS 4337 - #Sebesta ch04 Parsing & LL & LR (58 pages)

Previewing pages 1, 2, 3, 4, 27, 28, 29, 30, 55, 56, 57, 58 of 58 page document View the full content.
View Full Document

#Sebesta ch04 Parsing & LL & LR



Previewing pages 1, 2, 3, 4, 27, 28, 29, 30, 55, 56, 57, 58 of actual document.

View the full content.
View Full Document
View Full Document

#Sebesta ch04 Parsing & LL & LR

36 views


Pages:
58
School:
University of Texas at Dallas
Course:
Cs 4337 - Organization of Programming Languages
Unformatted text preview:

Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis The Parsing Problem Recursive Descent Parsing Bottom Up Parsing Syntax Analysis The syntax analysis portion of a language processor nearly always consists of two parts A low level part called a lexical analyzer mathematically a finite automaton based on a regular grammar A high level part called a syntax analyzer or parser mathematically a push down automaton based on a context free grammar or BNF BNF for syntax specification Copyright 2012 Addison Wesley All rights reserved 1 2 Advantages of Using BNF to Describe Syntax BNF example expr term term term factor factor factor id int constant expr Provides a clear and concise syntax description The parser can be based directly on the BNF Parsers based on BNF are easy to maintain Copyright 2012 Addison Wesley All rights reserved 1 3 Reasons to Separate Lexical and Syntax Analysis Simplicity less complex approaches can be used for lexical analysis separating them simplifies the parser Efficiency separation allows optimization of the lexical analyzer Portability parts of the lexical analyzer may not be portable but the parser always is portable Copyright 2012 Addison Wesley All rights reserved 1 4 Lexical Analysis A lexical analyzer is a pattern matcher for character strings is a front end for the parser Identifies substrings of the source program that belong together lexemes Lexemes match a character pattern which is associated with a lexical category called a token sum is a lexeme its token may be IDENT Copyright 2012 Addison Wesley All rights reserved 1 5 Lexical Analysis continued The lexical analyzer is usually a function which is called by the parser when it needs the next token Two approaches to building a lexical analyzer Write a formal description of the tokens and use a software tool that constructs a table driven lexical analyzer from such a description Design a state diagram that describes the tokens and write a program that implements the state diagram or hand construct a table driven implementation of the state diagram Copyright 2012 Addison Wesley All rights reserved 1 6 Lexical Analysis with State Diagram In many cases transitions can be combined to simplify the state diagram When recognizing an identifier all uppercase and lowercase letters are equivalent Use a character class that includes all letters When recognizing an integer literal all digits are equivalent use a digit class Reserved words and identifiers can be recognized together rather than having a part of the diagram for each reserved word Use a table lookup to determine whether a possible identifier is in fact a reserved word Copyright 2012 Addison Wesley All rights reserved 1 7 Lexical Analysis continued Convenient utility subprograms getChar gets the next character of input puts it in nextChar determines its class and puts the class in charClass addChar puts the character from nextChar into the place the lexeme is being accumulated lexeme lookup determines whether the string in lexeme is a reserved word returns a code Copyright 2012 Addison Wesley All rights reserved 1 8 State Diagram Copyright 2012 Addison Wesley All rights reserved 1 9 Lexical Analyzer Following is the output of the lexical analyzer of front c pp 172 177 when used on sum 47 total Next Next Next Next Next Next Next Next token token token token token token token token is is is is is is is is 25 11 21 10 26 24 11 1 Next Next Next Next Next Next Next Next Copyright 2012 Addison Wesley All rights reserved lexeme lexeme lexeme lexeme lexeme lexeme lexeme lexeme is is is is is is is is sum 47 total EOF 1 10 The Parsing Problem Goals of the parser given an input program Find all syntax errors for each produce an appropriate diagnostic message and recover quickly Produce the parse tree or at least a trace of the parse tree for the program Copyright 2012 Addison Wesley All rights reserved 1 11 The Parsing Problem continued Two categories of parsers Top down produce the parse tree beginning at the root Order is that of a leftmost derivation Traces or builds the parse tree in preorder Bottom up produce the parse tree beginning at the leaves Order is that of the reverse of a rightmost derivation Many parsers look only one token ahead in the input lookahead Copyright 2012 Addison Wesley All rights reserved 1 12 The Parsing Problem continued Top down Parsers Given a sentential form xA the parser must choose the correct A rule to get the next sentential form in the leftmost derivation using only the first token produced by A The most common top down parsing algorithms Recursive descent a coded implementation LL parsers table driven implementation Copyright 2012 Addison Wesley All rights reserved 1 13 The Parsing Problem continued Bottom up parsers Given a right sentential form determine what substring of is the right hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation The most common bottom up parsing algorithms are in the LR family Copyright 2012 Addison Wesley All rights reserved 1 14 The Parsing Problem continued The Complexity of Parsing Parsers that work for any unambiguous grammar are complex and inefficient O n3 where n is the length of the input Compilers use parsers that only work for a subset of all unambiguous grammars but do it in linear time O n where n is the length of the input Copyright 2012 Addison Wesley All rights reserved 1 15 Copyright 2012 Addison Wesley All rights reserved 1 16 Recursive Descent Parsing There is a subprogram for each nonterminal in the grammar which can parse sentences that can be generated by that nonterminal EBNF is ideally suited for being the basis for a recursive descent parser because EBNF minimizes the number of nonterminals Copyright 2012 Addison Wesley All rights reserved 1 17 Recursive Descent Parsing continued A grammar for simple expressions expr term term term factor factor factor id int constant expr Copyright 2012 Addison Wesley All rights reserved 1 18 Recursive Descent Parsing continued Assume we have a lexical analyzer named lex which puts the next token code in nextToken The coding process when there is only one RHS For each terminal symbol in the RHS compare it with the next input token if they match continue else there is an error For each nonterminal symbol in the RHS call its associated parsing subprogram Copyright 2012 Addison Wesley All rights reserved 1 19 Recursive Descent Parsing continued Function


View Full Document

Access the best Study Guides, Lecture Notes and Practice Exams

Loading Unlocking...
Login

Join to view #Sebesta ch04 Parsing & LL & LR and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view #Sebesta ch04 Parsing & LL & LR and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?