UW-Madison CS 536 - CS 536 Lecture Notes


CS 536 Spring 2007

JLex

JLex is coded in Java. To use it, you enter

    java JLex.Main f.jlex

Your CLASSPATH should be set to search the directories where JLex’s classes are stored. (The CLASSPATH we gave you includes JLex’s classes.)

After JLex runs (assuming there are no errors in your token specifications), the Java source file f.jlex.java is created. (f stands for any file name you choose. Thus csx.jlex might hold token definitions for CSX, and csx.jlex.java would hold the generated scanner.)

You compile f.jlex.java just like any Java program, using your favorite Java compiler. After compilation, the class file Yylex.class is created.

It contains the methods:
• Token yylex(), which is the actual scanner. The constructor for Yylex takes the file you want scanned, so
      new Yylex(System.in)
  will build a scanner that reads from System.in. Token is the token class you want returned by the scanner; you can tell JLex what class you want returned.
• String yytext(), which returns the character text matched by the last call to yylex.

A simple example of the use of JLex is in

    ~cs536-1/public/jlex

Just enter

    make test

Input to JLex

There are three sections, delimited by %%. The general structure is:

    User Code
    %%
    JLex Directives
    %%
    Regular Expression rules

The User Code section is Java source code to be copied into the generated Java source file. It contains utility classes or return type classes you need. Thus if you want to return a class IntlitToken (for integer literals that are scanned), you include its definition in the User Code section.

JLex directives are various instructions you can give JLex to customize the scanner you generate. These are detailed in the JLex manual.
The most important are:
• %{
      Code copied into the Yylex class (extra fields or methods you may want)
  %}
• %eof{
      Java code to be executed when the end of file is reached
  %eof}
• %type classname
  classname is the return type you want for the scanner method, yylex()

Macro Definitions

In section two you may also define macros, which are used in section three. A macro allows you to give a name to a regular expression or character class. This allows you to reuse definitions and make regular expression rules more readable.

Macro definitions are of the form

    name = def

Macros are defined one per line. Here are some simple examples:

    Digit=[0-9]
    AnyLet=[A-Za-z]

In section 3, you use a macro by placing its name within { and }. Thus {Digit} expands to the character class defining the digits 0 to 9.

Regular Expression Rules

The third section of the JLex input file is a series of token definition rules of the form

    RegExpr {Java code}

When input matching the given RegExpr is read, the corresponding Java code (enclosed in “{” and “}”) is executed. JLex figures out what RegExpr applies; you need only say what the token looks like (using RegExpr) and what you want done when the token is matched (this is usually to return some token object, perhaps with some processing of the token text).

Here are some examples:

    "+"      {return new Token(sym.Plus);}
    (" ")+   {/* skip white space */}
    {Digit}+ {return new IntToken(sym.Intlit,
                  new Integer(yytext()).intValue());}

Regular Expressions in JLex

To define a token in JLex, the user associates a regular expression with commands coded in Java. When input characters that match a regular expression are read, the corresponding Java code is executed. As a user of JLex you don’t need to tell it how to match tokens; you need only say what you want done when a particular token is matched.

Tokens like white space are deleted simply by having their associated command not return anything.
Scanning continues until a command with a return in it is executed.

The simplest form of regular expression is a single string that matches exactly itself. For example,

    if {return new Token(sym.If);}

If you wish, you can quote the string representing the reserved word ("if"), but since the string contains no delimiters or operators, quoting it is unnecessary.

For a regular expression operator, like +, quoting is necessary:

    "+" {return new Token(sym.Plus);}

Character Classes

Our specification of the reserved word if, as shown earlier, is incomplete. We don’t (yet) handle upper or mixed case.

To extend our definition, we’ll use a very useful feature of Lex and JLex: character classes. Characters often naturally fall into classes, with all characters in a class treated identically in a token definition. In our definition of identifiers all letters form a class, since any of them can be used to form an identifier. Similarly, in a number, any of the ten digit characters can be used.

Character classes are delimited by [ and ]; individual characters are listed without any quotation or separators. However \, ^, ] and -, because of their special meaning in character classes, must be escaped. The character class [xyz] can match a single x, y, or z. The character class [\])] can match a single ] or ). (The ] is escaped so that it isn’t misinterpreted as the end of the character class.)

Ranges of characters are separated by a -; [x-z] is the same as [xyz]. [0-9] is the set of all digits and [a-zA-Z] is the set of all letters, upper- and lower-case.
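
The character classes described above can be tried out without running JLex. The sketch below uses Java's own java.util.regex package, not JLex's matcher, on the assumption that these basic classes (lists, ranges, and an escaped ]) behave the same way in both. The class name CharClassDemo and the helper matchesWhole are made up for illustration, and the [iI][fF] pattern is only one plausible way to accept mixed-case "if"; the notes do not show their own version.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true when the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [xyz] and [x-z] denote the same three characters.
        System.out.println(matchesWhole("[xyz]", "y"));
        System.out.println(matchesWhole("[x-z]", "y"));

        // An escaped ] inside a class stands for the ] character itself.
        System.out.println(matchesWhole("[\\])]", "]"));
        System.out.println(matchesWhole("[\\])]", ")"));

        // Digits and letters as ranges.
        System.out.println(matchesWhole("[0-9]", "7"));
        System.out.println(matchesWhole("[a-zA-Z]", "Q"));
        System.out.println(matchesWhole("[a-zA-Z]", "7"));

        // Assumed extension (not shown in the notes): one class per
        // letter accepts "if" in any mix of upper and lower case.
        System.out.println(matchesWhole("[iI][fF]", "If"));
    }
}
```

Note that in Java source each backslash in a pattern is doubled inside the string literal; in a JLex specification the class would be written with a single backslash, as in [\])].
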
\ is the escape character, used to represent unprintables and to escape special symbols. Following C and Java conventions, \n is the newline (that is, end of line), \t is the tab character, \\ is the backslash symbol itself, and \010 is the character corresponding to octal 10.

The ^ symbol complements a character class (it is JLex’s representation of the Not operation). [^xy] is the character class that matches any single character except x and y. The ^ symbol applies to all characters that follow it in a character class definition, so [^0-9] is the set of all characters that aren’t digits. [^] can be used to match all characters.

Here are some examples of character classes:

    Character Class   Set of Characters Denoted
    [abc]             Three characters: a, b and c
    [cba]             Three characters: a, b and c
    [a-c]             Three characters: a, b and c
    [aabbcc]          Three characters: a, b and c
    [^abc]            All characters except a, b and c
    [\^\-\]]          Three characters: ^, - and ]
    [^]               All characters
    "[abc]"           Not a character class. This is one five character
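
The escape and complement rules can be checked with a short Java program. This is only a sketch using java.util.regex rather than JLex itself, on the assumption that \n, \t, \\, octal escapes, and ^ inside a class carry the same meaning in both. The names ComplementDemo and matchesWhole are hypothetical. The [^] form is left out because the notes present it as JLex notation, and the sketch does not assume Java accepts it.

```java
import java.util.regex.Pattern;

public class ComplementDemo {
    // Hypothetical helper: true when the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ^ complements the whole class that follows it.
        System.out.println(matchesWhole("[^xy]", "z"));
        System.out.println(matchesWhole("[^xy]", "x"));
        System.out.println(matchesWhole("[^0-9]", "a"));
        System.out.println(matchesWhole("[^0-9]", "5"));

        // Escaped ^, - and ] are ordinary members of a class.
        System.out.println(matchesWhole("[\\^\\-\\]]", "^"));
        System.out.println(matchesWhole("[\\^\\-\\]]", "-"));
        System.out.println(matchesWhole("[\\^\\-\\]]", "]"));

        // Escapes for newline, tab, and backslash.
        System.out.println(matchesWhole("\\n", "\n"));
        System.out.println(matchesWhole("\\t", "\t"));
        System.out.println(matchesWhole("\\\\", "\\"));

        // \010 is octal 10, i.e. the character with code 8 (backspace).
        System.out.println(matchesWhole("\\010", String.valueOf((char) 8)));
    }
}
```
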

