UD CISC 672 - Phase I- The Scanner

CISC 672 Advanced Compiler Construction Fall 2009

Phase I: The Scanner

Due Date: September 28th.
Teamwork: Forbidden.

1 Overview

The assignments will direct you to design and build a compiler for Cool. Each assignment will cover one component of the compiler: lexical analysis, parsing, semantic analysis, and code generation. Each assignment will ultimately result in a working compiler phase which can interface with other phases.

For this assignment you are to write a lexical analyzer, also called a scanner, using a lexical analyzer generator (the Java tool is called JLex). You will describe the set of tokens for Cool in an appropriate input format, and the analyzer generator will generate the actual code (Java) for recognizing tokens in Cool programs.

You should do this assignment individually. That means that you should not be working in groups with the same lex specification, but learning how to use JLex for your own specification.

If you have any questions on the project, send an email to the TA or visit the TA during office hours.

2 Tasks

1. Check out the svn-repository. If you have trouble doing this, contact your TA immediately. See below for information on what you find in the directory.
2. Read the JLex specification manual at http://www.cs.princeton.edu/~appel/modern/java/JLex/current/manual.html.
3. The file lexer/CoolLexer.lex contains a skeleton for a lexical description for Cool. You can actually build a scanner with this description, but it does not do much. Modify this file to be a JLex specification file for Cool. Any auxiliary routines that you wish to write should be added directly to this file in the appropriate section (see comments in the file). This is the main task of this assignment.
4. The file README_Phase1.txt contains detailed instructions for the assignment. You should also edit this file to include the write-up for your project. You should explain design decisions and why your code is correct.
   It is part of the assignment to clearly and concisely explain things in text as well as to comment your code.

3 Files and Directories

The repository now contains two more folders: examples and testCasesLexer. The first contains a number of small Cool programs. If your lexer can handle all of those, great! The second folder contains a number of odd, often incorrect Cool programs. For full credit, your lexer should be able to handle most of the files in this folder as well.

In your own folder you will find skeleton files for the class project. Don't be overwhelmed by the mass of files; most of them will not be used in this phase. Instead, you should focus on the files in the lexer package.

Do not modify any Java file.

4 Scanner Results

You should follow the specification of the lexical structure of Cool given in Section 10 and Figure 1 of the Cool manual. Your scanner should be robust: it should work for any conceivable input. For example, you must handle errors such as an EOF occurring in the middle of a string or comment, as well as string constants that are too long. These are just some of the errors that can occur; see the manual for the rest.

You must make some provision for graceful termination if a fatal error occurs. Core dumps or uncaught exceptions are unacceptable.

Programs tend to have many occurrences of the same lexeme. For example, an identifier generally is referred to more than once in a program (or else it isn't very useful!). To save space and time, a common compiler practice is to store lexemes in a string table. We provide a string table implementation for Java. See the following sections for the details.

All errors will be passed along to the parser, which is better equipped to handle them. The Cool parser knows about a special error token called ERROR.
When an invalid character is encountered, that character and any invalid characters that follow should be gathered together into a string until the lexer finds a character that can begin a new token. This string will be returned as the error message. For errors besides strings of invalid characters (e.g., a string constant that is too long, or an end-of-file inside of a comment) it is sufficient to return an informative error message (e.g., "String constant too long" or "EOF in comment"). Make sure that the error message is informative so that we can understand what you did. The following sections clarify how to actually return the error message.

There is an issue in deciding how to handle the special identifiers for the basic classes (Object, Int, Bool, String), SELF_TYPE, and self. However, this issue doesn't actually come up until later phases of the compiler; the scanner should treat the special identifiers exactly like any other identifier.

Finally, if the lexical specification is incomplete (some input has no regular expression that matches), then the generated scanner will invoke a default action on unmatched strings. The default action simply copies the string to the console. Your final scanner should have no default actions. Note that default actions are very bad for mycoolc, which works by piping output from one compiler phase to the next; any extra output will cause errors in downstream phases.

5 Brief Discussion of the Skeleton Code

• Each call on the scanner returns the next token and lexeme from the input. The value returned by the method CoolLexer.next_token is an object of class java_cup.runtime.Symbol. This object has a field representing the syntactic category of a token: whether it is an integer literal, semicolon, the if keyword, etc. The syntactic codes for all tokens are defined in the file TokenConstants.java. The second component, the semantic value or lexeme (if any), is also placed in a java_cup.runtime.Symbol object.
  The documentation for the class java_cup.runtime.Symbol as well as other supporting code is available on the course web page.
• For class identifiers, object identifiers, integers and strings, the semantic value should be of type AbstractSymbol. For boolean constants, the semantic value is of type java.lang.Boolean. Except for errors (see below), the lexemes for the other tokens do not carry any interesting information. Since the value field of class java_cup.runtime.Symbol has generic type java.lang.Object, you will need to cast it to a proper type before calling any methods on it.
• We provide you with a string table implementation, which is defined in AbstractTable.java.
• When a
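
The points above about the string table and about casting the semantic value can be illustrated with a small, self-contained sketch. It does not use the course's skeleton classes: TokenSketch stands in for java_cup.runtime.Symbol, EntrySketch for AbstractSymbol, and StringTableSketch for the table defined in AbstractTable.java. All three names, their fields and methods, and the token codes are hypothetical and chosen only for illustration; the real classes may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for java_cup.runtime.Symbol: it models only a
// token code and an untyped semantic value.
class TokenSketch {
    final int sym;       // syntactic category (token code)
    final Object value;  // semantic value; its static type is Object

    TokenSketch(int sym, Object value) {
        this.sym = sym;
        this.value = value;
    }
}

// Hypothetical stand-in for a string table entry.
class EntrySketch {
    final String text;  // the lexeme itself
    final int index;    // position in which the lexeme was first added

    EntrySketch(String text, int index) {
        this.text = text;
        this.index = index;
    }
}

// Hypothetical string table: each distinct lexeme is stored once, and
// adding the same lexeme again returns the entry created the first time.
class StringTableSketch {
    private final Map<String, EntrySketch> entries = new HashMap<>();

    EntrySketch add(String lexeme) {
        EntrySketch entry = entries.get(lexeme);
        if (entry == null) {
            entry = new EntrySketch(lexeme, entries.size());
            entries.put(lexeme, entry);
        }
        return entry;
    }

    int size() {
        return entries.size();
    }
}

class SemanticValueDemo {
    // Made-up token codes; the real ones live in TokenConstants.java.
    static final int OBJECTID = 1;
    static final int BOOL_CONST = 2;

    // The value field is an Object, so it has to be cast to its actual
    // type before any type-specific member can be used.
    static String describe(TokenSketch token) {
        if (token.value instanceof EntrySketch) {
            EntrySketch entry = (EntrySketch) token.value;
            return "entry#" + entry.index + ":" + entry.text;
        }
        if (token.value instanceof Boolean) {
            Boolean flag = (Boolean) token.value;
            return "bool:" + flag;
        }
        return "no value";
    }
}
```

In this sketch, adding the lexeme "x" twice yields the same EntrySketch object, which is the space saving the string table is meant to provide, and describe shows the instanceof check followed by a cast that an Object-typed value field requires.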