Unformatted text preview:

Outline Specifying lexical structure using regular expressions Recognizing tokens using finite automata Implementation of Lexical Analysis Scanning Deterministic Finite Automata DFAs Non deterministic Finite Automata NFAs Adapted from material by Prof Alex Aiken and Prof George Necula UCB Prof Saman Amarasinghe MIT Implementation of regular expressions RegExp NFA DFA Tables CS780 Prasad L4Lexer 1 CS780 Prasad Regular Expressions Lexical Spec Number digit Keyword if else Identifier letter letter digit OpenPar 3 Let input be the sequence of characters x1 xn For each 1 i n check if x1 xi L R Kleene plus Union 4 It must be that x1 xi L Rj L4Lexer for some j 5 Remove x1 xi from input and go to 3 2 Construct R matching all lexemes for all tokens R Keyword Identifier Number R1 R2 CS780 Prasad 2 cont d 1 Write a regular expression for the lexemes of each token L4Lexer Problem There are ambiguities in the algorithm 3 CS780 Prasad L4Lexer 4 1 Ambiguities Error Handling How much input is used What if What if x1 xi L R and also x1 xK L R No rule matches a prefix of input Rule Pick longest possible string in L R The maximal munch Problem Can get stuck vs vs Solution Which token is used What if x1 xi L Rj and also x1 xi L Rk Write a rule matching all bad strings Put it last catch all clause Rule use rule listed first j if j k Treats if as a keyword not an identifier CS780 Prasad L4Lexer 5 CS780 Prasad L4Lexer 6 Parity Problem Summary Regular expressions provide a concise notation for string patterns Use in lexical analysis requires small extensions 0 1 parity boolean parity contains even number of 1s To resolve T l ambiguities bi i i To handle errors Finite automaton Recognizer Good algorithms known Require only single pass over the input Few operations per character table lookup CS780 Prasad L4Lexer 0 7 CS780 Prasad 1 EP 1 L4Lexer OP 0 8 2 Deterministic Finite State Automaton DFA Basic Features M Q q0 F Consumes the entire input string Remembers the parity of the bit string by abstracting from the number of 1s in the string Finite amount of memory required for this purpose purpose Q Finite set of states Finite Alphabet p Transition function total function from Q x to Q q0 Initial Start State F Set of final accepting state Observe that counting requires unbounded memory while computing the parity requires very small and fixed amount of memory Accepts Rejects the input in a deterministic fashion CS780 Prasad L4Lexer 9 CS780 Prasad Finite Automata State Graphs L4Lexer Operation of the machine State 0 q0 The start state CS780 Prasad a L4Lexer 0 1 Input Tape Finite Control Read the current letter of input under the tape head Transit to a new state depending on the current input and the current state as dictated by the transition function Halt after consuming the entire input An accepting state A transition 10 11 CS780 Prasad L4Lexer 12 3 Associating Language with DFA Machine configuration q Example where q Q Set of strings over a b that contain bb a b bb a b Yields relation Design states by partitioning q a a M q a Strings containing bb Strings not containing bb Language q 0 a q 1 4 442 4 4 4 3 CS780 Prasad Strings that end in b Strings that do not end in b q F M L4Lexer q2 Initial state Final state 13 q1 q0 q0 q2 CS780 Prasad L4Lexer 14 Strings over a b that do not contain bb State Diagram and Table a a b q0 b q1 q2 a Q q0 q1 q2 a b F q2 q 0 aab a q1 CS780 Prasad a a L4Lexer a b q0 q0 q1 q1 q0 q2 q2 q2 q2 b q0 b q1 q2 a b Q q0 q1 q2 a b F q0 q1 q 0 ba a q 0 15 CS780 Prasad L4Lexer a b q0 q0 q1 q1 q0 q2 q2 q2 q2 b 16 4 Strings over a b containing even number of a s and odd number of b s Example Alphabet 0 1 What language does this recognize Ea Eb Ea Eb a a Oa Eb Ob Ob Oa Eb CS780 Prasad b 0 Ea Ob b a b a b L4Lexer a 17 a qi qj q a CS780 Prasad L4Lexer 18 b a qj q0 b b q1 q2 qk Q q0 q1 q2 a b F q2 total function NFA Q ow Q NFA Q Q L4Lexer NFA State Diagram Strings over a b ending in bb NFA DFA Q Q 1 CS780 Prasad a DFA 0 1 Oa Ob Nondeterministic Finite Automata qi 0 1 total function a b bb subset relation 19 CS780 Prasad L4Lexer a b q0 q0 q0 q1 q1 q2 q2 20 5 q0 abb a q0 bb a q0 b a q0 Introducing transitions into NFA q0 abb a q0 bb a q0 b a q1 Q P Q Halts in non accepting state after consuming the input q0 abb a q0 bb a q1 b a q2 A transition causes the machine to change h its i state non deterministically d i i i ll without consuming any input Halts in accepting state after consuming the input L DFAs L NFAs abb L NFA CS780 Prasad L4Lexer 21 Deterministic and Nondeterministic Automata One transition per input per state No moves 22 How do we associate a language with NFA NFAs can choose Nondeterministic Finite Automata NFA Can have multiple transitions for one input in a given state Can have moves Finite automata have finite memory Whether to make moves moves Or one of the multiple transitions for a single input to take Accept if there exists a accepting computation Reject if all computations are non accepting Need only to encode the current state L4Lexer L4Lexer A DFA can take only one path through the state graph Deterministic Finite Automata DFA CS780 Prasad CS780 Prasad 23 CS780 Prasad L4Lexer 24 6 NFA State Diagram a b bb a b Every DFA is an NFA However does nondeterminism make NFAs strictly more expressive powerful than DFAs L4Lexer NFA for b aa bb a q0 a q1 b q0 a b q1 b q2 a CS780 Prasad L4Lexer 26 NFA vs DFA b a b a q2 a a 25 b q1 b DFA For type 0 languages Turning Machines and type 3 languages regular languages non determinism does not add expressive power For context free languages and context sensitive languages non determinism does enhance the expressive power a b q0 L NFAs L DFAs CS780 Prasad a b a Equivalent machines b q2 1 NFA 0 0 0 0 0 a b CS780 Prasad q11 L4Lexer b b 1 DFA q22 0 1 1 27 CS780 Prasad L4Lexer 28 7 NFA vs DFA Reg Expr to Finite Automata High level sketch NFAs and DFAs recognize the same set of languages regular languages NFA DFAs are faster to execute There are no choices to consider For a …


View Full Document

Wright CS 780 - Implementation of Lexical Analysis (Scanning)

Loading Unlocking...
Login

Join to view Implementation of Lexical Analysis (Scanning) and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Implementation of Lexical Analysis (Scanning) and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?