UW-Madison CS 536 - Lecture 09.4 Up




CS 536 Spring 2008 ©

The corresponding transition table is:

    State |  /    Eol   a    b    …
    ------+--------------------------
      1   |  2
      2   |  3
      3   |  3    4     3    3    3
      4   |

[Figure: DFA for the comment, with states 1 to 4 and edge labels /, /, Not(Eol), Eol and eof]

A complete transition table contains one column for each character. To save space, table compression may be used. Only non-error entries are explicitly represented in the table, using hashing, indirection or linked structures.

All regular expressions can be translated into DFAs that accept (as valid tokens) the strings defined by the regular expressions. This translation can be done manually by a programmer or automatically using a scanner generator.

A DFA can be coded in:
• Table-driven form
• Explicit control form

In the table-driven form, the transition table that defines a DFA's actions is explicitly represented in a run-time table that is "interpreted" by a driver program.

In the explicit control form, the transition table that defines a DFA's actions appears implicitly as the control logic of the program.

For example, suppose CurrentChar is the current input character. End of file is represented by a special character value, eof. Using the DFA for the Java comments shown earlier, a table-driven scanner is:

    State = StartState
    while (true) {
        if (CurrentChar == eof)
            break
        NextState = T[State][CurrentChar]
        if (NextState == error)
            break
        State = NextState
        read(CurrentChar)
    }
    if (State in AcceptingStates)
        // Process valid token
    else // Signal a lexical error

This form of scanner is produced by a scanner generator; it is definition-independent. The scanner is a driver that can scan any token if T contains the appropriate transition table.

Here is an explicit-control scanner for the same comment definition:

    if (CurrentChar == '/') {
        read(CurrentChar)
        if (CurrentChar == '/')
            repeat
                read(CurrentChar)
            until (CurrentChar in {eol, eof})
        else // Signal lexical error
    } else // Signal lexical error
    if (CurrentChar == eol)
        // Process valid token
    else // Signal lexical error

The token being scanned is "hardwired" into the logic of the code.
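The two scanner forms above can be sketched in Java as follows. This is only an illustration under stated assumptions, not code from the lecture: the class name CommentScanners, the use of 0 for the error entry and -1 for eof, the restriction to 7-bit characters, '\n' standing in for Eol, and reading from a String instead of calling read(CurrentChar) are all choices made here.

```java
import java.util.Arrays;

// Hypothetical class name; both methods recognize a "//" comment ended by Eol.
final class CommentScanners {
    static final int ERROR = 0;   // assumed encoding of an error (empty) table entry
    static final int START = 1;   // state numbers follow the transition table above
    static final int ACCEPT = 4;
    static final int EOF = -1;    // assumed end-of-input marker

    // T[state][ch] for 7-bit characters; rows 0 and 4 stay all-ERROR.
    static final int[][] T = new int[5][128];
    static {
        T[1]['/'] = 2;
        T[2]['/'] = 3;
        Arrays.fill(T[3], 3);     // Not(Eol): stay in state 3
        T[3]['\n'] = 4;           // Eol: the comment is complete
    }

    // Table-driven form: a generic driver that interprets T.
    static boolean tableDriven(String input) {
        int pos = 0;
        int state = START;
        while (true) {
            int ch = (pos < input.length()) ? (int) input.charAt(pos) : EOF;
            if (ch == EOF) break;
            int next = (ch < 128) ? T[state][ch] : ERROR;
            if (next == ERROR) break;
            state = next;
            pos++;                // plays the role of read(CurrentChar)
        }
        return state == ACCEPT;
    }

    // Explicit control form: the DFA is hardwired into the control flow.
    static boolean explicitControl(String input) {
        int pos = 0;
        if (pos < input.length() && input.charAt(pos) == '/') {
            pos++;
            if (pos < input.length() && input.charAt(pos) == '/') {
                // repeat read(CurrentChar) until CurrentChar is eol or eof
                do {
                    pos++;
                } while (pos < input.length() && input.charAt(pos) != '\n');
                return pos < input.length();   // true only if we stopped on eol
            }
        }
        return false;                          // lexical error
    }

    public static void main(String[] args) {
        System.out.println(tableDriven("// hi\n"));      // true
        System.out.println(explicitControl("// hi\n"));  // true
        System.out.println(tableDriven("// hi"));        // false: eof before Eol
        System.out.println(explicitControl("/ hi\n"));   // false: second '/' missing
    }
}
```

Changing the token in tableDriven only means filling T differently, while explicitControl would have to be rewritten.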
An explicit-control scanner is usually easy to read and is often more efficient, but it is specific to a single token definition.

More Examples

• A FORTRAN-like real literal (which requires digits on either or both sides of a decimal point, or just a string of digits) can be defined as

    RealLit = (D+ (λ | . )) | (D* . D+)

This corresponds to the DFA:

[Figure: DFA for RealLit, with edges labeled D and .]

• An identifier consisting of letters, digits, and underscores, which begins with a letter and allows no adjacent or trailing underscores, may be defined as

    ID = L (L | D)* ( _ (L | D)+)*

This definition includes identifiers like sum or unit_cost, but excludes _one and two_ and grand___total. The DFA is:

[Figure: DFA for ID, with edges labeled L, L | D, and _]

Lex/Flex/JLex

Lex is a well-known Unix scanner generator. It builds a scanner, in C, from a set of regular expressions that define the tokens to be scanned.

Flex is a newer and faster version of Lex.

JLex is a Java version of Lex. It generates a scanner coded in Java, though its regular expression definitions are very close to those used by Lex and Flex.

Lex, Flex and JLex are largely non-procedural. You don't need to tell the tools how to scan. All you need to do is tell them what you want scanned (by giving them definitions of valid tokens).

This approach greatly simplifies building a scanner, since most of the details of scanning (I/O, buffering, character matching, etc.) are automatically handled.

JLex

JLex is coded in Java. To use it, you enter

    java JLex.Main f.jlex

Your CLASSPATH should be set to search the directories where JLex's classes are stored. (The CLASSPATH we gave you includes JLex's classes.)

After JLex runs (assuming there are no errors in your token specifications), the Java source file f.jlex.java is created.
(f stands for any file name you choose. Thus csx.jlex might hold token definitions for CSX, and csx.jlex.java would hold the generated scanner.)

You compile f.jlex.java just like any Java program, using your favorite Java compiler.

After compilation, the class file Yylex.class is created. It contains the methods:

• Token yylex(), which is the actual scanner. The constructor for Yylex takes the file you want scanned, so

    new Yylex(System.in)

  will build a scanner that reads from System.in. Token is the token class you want returned by the scanner; you can tell JLex what class you want returned.
• String yytext(), which returns the character text matched by the last call to yylex.

A simple example of the use of JLex is in

    ~cs536-1/public/jlex

Just enter

    make test

Input to JLex

There are three sections, delimited by %%. The general structure is:

    User Code
    %%
    JLex Directives
    %%
    Regular Expression rules

The User Code section is Java source code to be copied into the generated Java source file. It contains utility classes or return type classes you need. Thus if you want to return a class IntlitToken (for integer literals that are scanned), you include its definition in the User Code section.

JLex directives are various instructions you can give JLex to customize the scanner you generate. These are detailed in the JLex manual. The most important are:

• %{
  Code copied into the Yylex class (extra fields or methods you may want)
  %}
• %eof{
  Java code to be executed when the end of file is reached
  %eof}
• %type classname
  classname is the return type you want for the scanner method, yylex()

Macro Definitions

In section two you may also define macros, which are used in section three.
A macro allows you to give a name to a regular expression or character class. This allows you to reuse definitions and make regular expression rules more readable.

Macro definitions are of the form

    name = def

Macros are defined one per line. Here are some simple examples:

    Digit=[0-9]
    AnyLet=[A-Za-z]

In section 3, you use a macro by placing its name within { and }. Thus {Digit} expands to the character class defining the digits 0 to 9.

Regular Expression Rules

The third section of the JLex input file is a series of token definition rules of the form

    RegExpr {Java code}

When a token matching the given RegExpr is recognized, the corresponding Java code (enclosed in "{" and "}") is executed. JLex figures out which RegExpr applies; you need only say what the token looks like (using RegExpr) and what you want done when the token is matched (this is usually to return some token object, perhaps with some processing of the token text).

Here are some
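The following is not JLex input and not generated scanner code. It is a minimal plain-Java sketch, using java.util.regex, of the two ideas described above: a macro is a named sub-pattern that gets spliced into larger expressions, and a rule pairs a regular expression with code to run on the matched text. The class name RuleSketch, the two rules, the INTLIT and ID labels, and the conflict policy (longest match wins, earlier rule wins ties, which is the usual Lex-family convention) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; nothing here is produced by JLex.
final class RuleSketch {
    // The macros Digit and AnyLet from the text, as Java regex fragments.
    static final String DIGIT = "[0-9]";
    static final String ANY_LET = "[A-Za-z]";

    // One rule: a regular expression plus the action run on the matched text.
    static final class Rule {
        final Pattern regExpr;
        final Function<String, String> action;

        Rule(String regExpr, Function<String, String> action) {
            this.regExpr = Pattern.compile(regExpr);
            this.action = action;
        }
    }

    // Assumed example rules; the macro names are expanded by string splicing.
    static final List<Rule> RULES = new ArrayList<>();
    static {
        RULES.add(new Rule(DIGIT + "+", text -> "INTLIT(" + text + ")"));
        RULES.add(new Rule(ANY_LET + "(" + ANY_LET + "|" + DIGIT + ")*",
                           text -> "ID(" + text + ")"));
    }

    // Runs the action of the rule with the longest match starting at pos.
    // Earlier rules win ties. Returns null if no rule matches there.
    static String matchAt(String input, int pos) {
        String result = null;
        int bestLength = 0;
        for (Rule rule : RULES) {
            Matcher m = rule.regExpr.matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt() && m.end() - pos > bestLength) {
                bestLength = m.end() - pos;
                result = rule.action.apply(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchAt("x1 42", 0));  // ID(x1)
        System.out.println(matchAt("x1 42", 3));  // INTLIT(42)
    }
}
```

A generated scanner does this work with a single DFA rather than by trying each pattern in turn; the loop here only mimics the observable behavior.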

