CSc 453Compilers and Systems Software4 : Lexical Analysis IIDepartment of Computer ScienceUniversity of [email protected]° 2009 Christian CollbergImplementing AutomataNFAs and DFAscan be hard-codedusing this pattern:state := start statec := first charwhile (true) {case state of {1: case c of {char1: {c := nextChar();state :=new state; }2: case c of {char2: {c := nextChar();state :=new state; }char3: {return; /* accept */}}}Implementing Automata. . .We can also encode the transitions directly into a transitiontable:next statestatechar1char2other Accepting1 22 2 2 [3]3√States in brackets don’t consume their inputs. Acceptingstates are indicated by a√. Empty entries represent errorstates.Implementing Automata. . .Given the table, we can write an interpreter to perform lexicalanalysis ofany DFA:state := 1c :=first charwhile not ACCEPT[state] do {newstate := NEXTSTATE[state,c]if ADVANCE[state,c] thenc := nextChar()state := newstate}if ACCEPT[state] then accept;Table-driven C Comments*3401all charsexcept *all charsexcept *,/2*/*/state / * other Accepting0 11 22 2 3 23 4 3 24√Table-driven C Comments. . .class Comments {public static final int SLASH = 0;public static final int STAR = 1;public static final int OTHER = 2;public static final int END = 3;static int[][] NEXTSTATE = {// "/" "*" other{ 1, -1, -1},{-1, 2, -1},{ 2, 3, 2},{ 4, 3, 2},{-1, -1, -1}};Table-driven C Comments. . .static boolean[] ACCEPT ={false,false,false,false,true};static boolean[][] ADVANCE = {// "/" "*" other{true, true, true},{true, true, true},{true, true, true},{true, true, true},{true, true, true}};Table-driven C Comments. . .static String input;static int current = -1;static int nextChar() {int ch;current++;if (current >= input.length()) return END;switch (input.charAt(current)) {case ’/’ : { ch = SLASH; break;}case ’*’ : { ch = STAR; break;}default : { ch = OTHER; break;}}return ch;}Table-driven C Comments. . .public static boolean interpret () {int state = 0;int c = nextChar();while ((c != END) && (state>=0) && !ACCEPT[state])int newstate = NEXTSTATE[state][c];if (ADVANCE[state][c])c = nextChar();state = newstate;}return (state>=0) && ACCEPT[state];}public static void main (String[] args) {input = args[0];boolean result = interpret();}Hard-coded C Comments*3401all charsexcept *all charsexcept *,/2*/*/Let’s do the same thing again, but this time we will hard-codethe interpreter using switch-statements.nextChar and the constant declarations are the same as forthe previous program.Hard-coded C Comments. . .class Comments {// Declarations of SLASH,STAR,OTHER,END, and nextChar().public static boolean interpret() {int state = 0;int ch = nextChar();while(true) {switch (state) {case -1 :return false;case 0 :switch (ch) {case SLASH:ch=nextChar();state=1;break;default :return false;}break;case 1 :switch (ch) {case STAR: ch=nextChar(); state=2;break;default : return false;}break;case 2 :switch (ch) {case SLASH: ch=nextChar(); state=2;break;case STAR : ch=nextChar(); state=3;break;case OTHER: ch=nextChar(); state=2;break;default : return false;}break;Hard-coded C Comments. . .case 3 :switch (ch) {case SLASH: ch=nextChar(); state=4;break;case STAR : ch=nextChar(); state=3;break;case OTHER: ch=nextChar(); state=2;break;default : return false;}break;case 4 :return (ch == END);}}}From REs to NFAsFrom REs to NFAsWe will describe our tokens using REs, convert these to anNFA, convert this to a DFA, and finally code this into aprogram or a table to be interpreted:NFADFAREprogramtableinterpreterWe will next show how to construct an NFA from a regularexpression. This algorithm is called Thompson’s Construction(after Ken Thompson of Bell Labs).Thompson’s ConstructionEach piece of a regular expression is turned into a part of anNFA.Each part is glued together (using ǫ-transitions) into acomplete automaton.An RE matching the character a translates intoaAn RE matching ǫ translates intoǫThompson’s Construction – ConcatenationWe represent an RE component r by the figure:rfor rStart stateAccepting statefor rAn RE matching the regular expression r followed by theregular expression s (rs) translates intosrǫThompson’s Construction – AlternationThe regular expression r|s translates intosrǫǫǫǫThompson’s Construction – RepetitionThe regular expression r* translates intorǫǫǫǫThompson’s Construction – Example IThe regular expression ab|a translates intoa baǫǫǫǫǫThompson’s Construction – Example IIThe regular expression letter(letter|digit)* translatesintoletterletterdigitǫǫǫǫǫǫǫǫǫFrom NFA to DFAFrom NFA to DFAWe now know how to translate a regular expression into anNFA, and how to translate a DFA into code. The missingpiece is how to translate an NFA into a DFA.NFADFAREprogramtableinterpreterFrom NFA to DFA. . .Each state in the DFA corresponds to a set of states in theNFA.The DFA will be in state2,3,4if the NFA could have been in any of the states2°, 3°, 4°.After reading a1a2···anthe DFA is in a a state thatrepresents the states the NFA could be in after seeing theinput a1a2···an.From NFA to DFA. . .4312aa bǫǫǫbBAbbaCA°in the DFA represents the set of states {1°, 2°, 4°} in theNFA. These are the states the FAs could be in before anyinput is consumed (the start states).B°in the DFA represents the set of states {2°, 3°, 4°} in theNFA. These are the states we can get to on the symbol afrom A°.From NFA to DFA. . .We need three functions:1ǫ-closure(T ) is the set of NFA states reachable from someNFA state s in T on ǫ-transitions alone. This is essentially agraph exploration algorithm that finds the nodes in a graphreachable from a griven node.2move(T ,a) is the set of NFA states to which there is atransition on input symbol a from some NFA state s ∈ T .3SubsetConstruction(N) returns a DFAD=(Dstates,Dtrans) corresponding to NFA N.ǫ-closure(T )procedure ǫ-closure(T )push all states in T onto stackC := Twhile stack is not empty dot := pop(stack)for each edge tǫ→ u doif u is not in C thenC := C ∪ upush(stack, u)return Cǫ-closure(T ) – Example4312aa bǫǫǫǫ-closure( 1°) = {1°, 2°, 4°}ǫ-closure( 2°) = {2°}ǫ-closure( 4°) = {2°, 4°}ǫ-closure({ 3°, 4°}) = {2°, 3°, 4°}move(T ,a) – Example4312aa bǫǫǫmove({ 1°}, a) = { 2°, 3°}move({ 2°, 3°}, b) = { 4°}SubsetConstruction(N)procedure SubsetConstruction(NFA N)Dstates := {ǫ-closure(s0)}Dtrans := {}repeatT := an unexplored state in Dstatesfor each input symbol a
View Full Document