DOC PREVIEW
UW-Madison CS 536 - Lecture Notes

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

97CS 536 Spring 2007©These four components of afinite automaton are oftenrepresented graphically:Finite automata (the plural ofautomaton is automata) arerepresented graphically usingtransition diagrams. We start atthe start state. If the next inputcharacter matches the label onis a transitionis the start stateis an accepting stateis a state98CS 536 Spring 2007©a transition from the currentstate, we go to the state itpoints to. If no move ispossible, we stop. If we finishin an accepting state, thesequence of characters readforms a valid token; otherwise,we have not seen a valid token.In this diagram, the validtokens are the stringsdescribed by the regularexpression (a b (c)+ )+.abcca99CS 536 Spring 2007©Deterministic Finite AutomataAs an abbreviation, a transitionmay be labeled with more thanone character (for example,Not(c)). The transition may betaken if the current inputcharacter matches any of thecharacters labeling the transition.If an FA always has a uniquetransition (for a given state andcharacter), the FA is deterministic(that is, a deterministic FA, orDFA). Deterministic finiteautomata are easy to programand often drive a scanner.If there are transitions to morethan one state for some character,then the FA is nondeterministic(that is, an NFA).100CS 536 Spring 2007©A DFA is conveniently representedin a computer by a transitiontable. A transition table, T, is atwo dimensional array indexed bya DFA state and a vocabularysymbol.Table entries are either a DFAstate or an error flag (oftenrepresented as a blank tableentry). If we are in state s, andread character c, then T[s,c] willbe the next state we visit, or T[s,c]will contain an error markerindicating that c cannot extendthe current token. For example,the regular expression// Not(Eol)* Eolwhich defines a Java or C++single-line comment, might betranslated into101CS 536 Spring 2007©The corresponding transitiontable is:A complete transition tablecontains one column for eachcharacter. To save space, tablecompression may be used. Onlynon-error entries are explicitlyrepresented in the table, usinghashing, indirection or linkedstructures.State Character/ Eol a b …12233343334eofEol//Not(Eol)1234102CS 536 Spring 2007©All regular expressions can betranslated into DFAs that accept(as valid tokens) the stringsdefined by the regularexpressions. This translation canbe done manually by aprogrammer or automaticallyusing a scanner generator.A DFA can be coded in:• Table-driven form• Explicit control formIn the table-driven form, thetransition table that defines aDFA’s actions is explicitlyrepresented in a run-time tablethat is “interpreted” by a driverprogram.In the direct control form, thetransition table that defines aDFA’s actions appears implicitly asthe control logic of the program.103CS 536 Spring 2007©For example, supposeCurrentChar is the current inputcharacter. End of file isrepresented by a special charactervalue, eof. Using the DFA for theJava comments shown earlier, atable-driven scanner is:State = StartStatewhile (true){if (CurrentChar == eof)breakNextState =T[State][CurrentChar] if(NextState == error)breakState = NextStateread(CurrentChar)}if (State in AcceptingStates)// Process valid tokenelse // Signal a lexical error104CS 536 Spring 2007©This form of scanner is producedby a scanner generator; it isdefinition-independent. Thescanner is a driver that can scanany token if T contains theappropriate transition table.Here is an explicit-control scannerfor the same comment definition:if (CurrentChar == '/'){read(CurrentChar)if (CurrentChar == '/')repeatread(CurrentChar)until (CurrentChar in{eol, eof})else //Signal lexical errorelse // Signal lexical errorif (CurrentChar == eol)// Process valid tokenelse //Signal lexical error105CS 536 Spring 2007©The token being scanned is“hardwired” into the logic of thecode. The scanner is usually easyto read and often is moreefficient, but is specific to a singletoken definition.106CS 536 Spring 2007©More Examples• A FORTRAN-like real literal (whichrequires digits on either or both sidesof a decimal point, or just a string ofdigits) can be defined asRealLit = (D+(λ | . )) | (D*. D+)This corresponds to the DFA.DDDD.107CS 536 Spring 2007©• An identifier consisting of letters,digits, and underscores, which beginswith a letter and allows no adjacentor trailing underscores, may bedefined asID = L (L | D)* ( _ (L | D)+)*This definition includes identifierslike sum or unit_cost, butexcludes _one and two_ andgrand___total. The DFA is:L | DLL | D_108CS 536 Spring 2007©Lex/Flex/JLexLex is a well-known Unix scannergenerator. It builds a scanner, inC, from a set of regularexpressions that define thetokens to be scanned.Flex is a newer and faster versionof Lex.Jlex is a Java version of Lex. Itgenerates a scanner coded inJava, though its regularexpression definitions are veryclose to those used by Lex andFlex.Lex, Flex and JLex are largely non-procedural. You don’t need to tellthe tools how to scan. All youneed to tell it what you wantscanned (by giving it definitionsof valid tokens).109CS 536 Spring 2007©This approach greatly simplifiesbuilding a scanner, since most ofthe details of scanning (I/O,buffering, character matching,etc.) are automatically handled.110CS 536 Spring 2007©JLexJLex is coded in Java. To use it,you enterjava JLex.Main f.jlexYour CLASSPATH should be set tosearch the directories where JLex’sclasses are stored.(The CLASSPATH we gave youincludes JLex’s classes).After JLex runs (assuming thereare no errors in your tokenspecifications), the Java sourcefilef.jlex.java is created. (f standsfor any file name you choose.Thus csx.jlex might hold tokendefinitions for CSX, andcsx.jlex.java would hold thegenerated scanner).111CS 536 Spring 2007©You compile f.jlex.java justlike any Java program, using yourfavorite Java compiler.After compilation, the class fileYylex.class is created.It contains the methods:• Token yylex() which is the actualscanner. The constructor for Yylextakes the file you want scanned, sonew Yylex(System.in)will build a scanner that reads fromSystem.in. Token is the tokenclass you want returned by thescanner; you can tell JLex what classyou want returned.• String yytext() returns thecharacter text matched by the last callto yylex.112CS 536 Spring 2007©A simple example of the use ofJLex is in~cs536-1/pubic/jlexJust entermake test113CS 536 Spring 2007©Input to JLexThere are three sections,delimited by %%. The


View Full Document

UW-Madison CS 536 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?