Wright CS 780 - Implementation of Lexical Analysis (Scanning)

Unformatted text preview:

1Implementation of Lexical Analysis(Scanning)CS780(Prasad) L4Lexer 1Adapted from material by:Prof. Alex Aiken and Prof. George Necula (UCB)Prof. Saman Amarasinghe (MIT)Outline• Specifying lexical structure using regular expressions• Recognizing tokens using finite automata¾Deterministic Finite Automata (DFAs)CS780(Prasad) L4Lexer 2()¾Non-deterministic Finite Automata (NFAs)• Implementation of regular expressionsRegExp => NFA => DFA => TablesRegular Expressions => Lexical Spec. 1. Write a regular expression for the lexemes of each token• Number = digit + (Kleene plus)+ (Kleene plus)•Keyword = ‘if’ | ‘else’ | … (Union)(Union)•Identifier =letter (letter | digit)*CS780(Prasad) L4Lexer 3•Identifier = letter (letter | digit)*• OpenPar = ‘(’•…2. Construct R, matching all lexemes for all tokensR = Keyword | Identifier | Number | …= R1| R2| … (cont’d)3. Let input be the sequence of characters x1…xn• For each 1 ≤ i ≤ n, check ifx1…xi∈ L(R)4. It must be thatCS780(Prasad) L4Lexer 4x1…xi∈ L(Rj) for some j5. Remove x1…xifrom input and go to (3)Problem: There are ambiguities in the algorithm.2Ambiguities • How much input is used? What if x1…xi∈ L(R) and also x1…xK∈ L(R)¾Rule: Pick longest possible string in L(R) The “maximal munch” << vs < , != vs !CS780(Prasad) L4Lexer 5• Which token is used? What if x1…xi∈ L(Rj) and also x1…xi∈ L(Rk)¾Rule: use rule listed first (j if j < k)Treats “if” as a keyword not an identifierError Handling• What ifNo rule matches a prefix of input ?• Problem: Can get stuck …CS780(Prasad) L4Lexer 6• Solution: ¾Write a rule matching all “bad” strings¾Put it last (catch-all clause)Summary• Regular expressions provide a concise notation for string patterns.• Use in lexical analysis requires small extensions:¾TlbiiiCS780(Prasad) L4Lexer 7¾To resolve ambiguities¾To handle errors• Good algorithms known¾Require only single pass over the input¾Few operations per character (table lookup)Parity Problem1s. ofnumber even contains )(: }1,0{**ωωω⇔→ΣΣ∈=ΣparitybooleanparityCS780(Prasad) L4Lexer 8EPOP1100Finite automaton = Recognizer3Basic Features• Consumes the entire input string.• Remembers the parity of the bit string by abstracting from the number of 1s in the string.• Finite amount of memory required for this purposeCS780(Prasad) L4Lexer 9purpose.Observe that counting requires unbounded memory, while computing the parity requires very small and fixed amount of memory.• Accepts/Rejects the input in a deterministic fashion.Deterministic Finite State Automaton (DFA)Q: Finite set of statesΣ: Finite Alphabet),,,,(0FqQMδΣ=CS780(Prasad) L4Lexer 10pδ: Transition function total function from Q x Σ to Q: Initial/Start StateF : Set of final/accepting state0qFinite Automata State Graphs• The start state• StateCS780(Prasad) L4Lexer 11• An accepting state• A transitionaOperation of the machine•Read the current letter of input under the0010qInput TapeFinite ControlCS780(Prasad) L4Lexer 12•Read the current letter of input under the tape head.• Transit to a new state depending on the current input and the current state, as dictated by the transition function.• Halt after consuming the entire input.4• Machine configuration:• Yields relation:Associating Language with DFA∗Σ∈∈ωω, where],[ QqqCS780(Prasad) L4Lexer 13• Language:]),,([ ],[ Mωδωaqaq a} ],[ ],[ |{*M0*Fqqq ∈∧Σ∈444344421aεωω• Set of strings over {a,b} that contain bb•Design states by partitioningΣ*∗∗)|()|( babbbaExampleCS780(Prasad) L4Lexer 14•Design states by partitioning Σ.Strings containing bb q2Strings not containing bbStrings that end in b q1Strings that do not end in b q0¾Initial state: q0¾Final state: q2q2State Diagram and Tableq0 q1abababCS780(Prasad) L4Lexer 15bδabq0 q0 q1q1 q0 q2q2 q2 q2}2{},{}2,1,0{qFbaqqqQ==Σ=],1[],0[*εqaabq aq0 q2Strings over {a,b} that do not contain bbq1abababCS780(Prasad) L4Lexer 16bδabq0 q0 q1q1 q0 q2q2 q2 q2}1,0{},{}2,1,0{qqFbaqqqQ==Σ=],0[],0[*εqbaq a5Strings over {a,b} containing even number of a’s and odd number of b’s.Σ*Ea OaEb Ob ObEbCS780(Prasad) L4Lexer 17bbbbaaaa[Oa,Ob][Ea,Eb] [Ea,Ob][Oa,Eb]Example• Alphabet {0,1}• What language does this recognize?10CS780(Prasad) L4Lexer 180 011Nondeterministic Finite Automataqiqjqkqqi qjaaaDFANFAaCS780(Prasad) L4Lexer 19relationsubset function total)(:function total :QQQΡowQQQNFANFADFA×Σ×⊆→Σ×→Σ×δδδq2NFA State Diagram (Strings over {a,b} ending in bb)q0 q1abb}210{qqqQ=bCS780(Prasad) L4Lexer 20δabq0 {q0}{q0,q1}q1φ{q2}q2φφ}2{},{}2,1,0{qFbaqqqQ==Σ=bbba∗)|(6],0[],0[],0[],0[εqbqbbqabbq aaa],1[],0[],0[],0[εqbqbbqabbq aaaHalts in non-accepting state after consuming the input.CS780(Prasad) L4Lexer 21],2[],1[],0[],0[εqbqbbqabbq aaaHalts in accepting state after consuming the input.)(NFALabb ∈Introducing ε-transitions into NFA•A ε-transition causes the machine to hidiiill)(}){(: QPQ →∪Σ×εδCS780(Prasad) L4Lexer 22change its state non-deterministically, without consuming any input.)()( NFAsLDFAsL ⊆Deterministic and Nondeterministic Automata• Deterministic Finite Automata (DFA)¾One transition per input per state¾No ε-moves• Nondeterministic Finite Automata (NFA)CS780(Prasad) L4Lexer 23¾Can have multiple transitions for one input in a given state¾Can have ε-moves• Finite automata have finite memory¾Need only to encode the current stateHow do we associate a language with NFA?• A DFA can take only one path through the state graph• NFAs can choose¾Whether to makeεmovesCS780(Prasad) L4Lexer 24¾Whether to make ε-moves¾Or one of the multiple transitions for a single input to take Accept if there exists a accepting computation. Reject if all computations are non-accepting.7• Every DFA is an NFA. However, does non-determinism make NFAs strictly more expressive (powerful) than DFAs?)()( DFAsLNFAsL ⊇CS780(Prasad) L4Lexer 25¾ For type 0 languages (Turning Machines) and type 3 languages (regular languages), non-determinism does not add expressive power.¾ For context-free languages and context-sensitive languages, non-determinism does enhance the expressive power.NFA State Diagram (a | b)* bb (a | b)*q2q0 q1abbbbaCS780(Prasad) L4Lexer 26q2q0


View Full Document

Wright CS 780 - Implementation of Lexical Analysis (Scanning)

Download Implementation of Lexical Analysis (Scanning)
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Implementation of Lexical Analysis (Scanning) and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Implementation of Lexical Analysis (Scanning) 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?