UK CS 541 - Programming Assignment 2 CSX Scanner


CS 541, Spring 2014
Programming Assignment 2
CSX Scanner

Your next project step is to write a scanner module for the programming language CSX (Computer Science eXperimental). Use the JFlex scanner-generation tool (based on Lex). Future assignments will involve a CSX parser, type checker and code generator.

The CSX Scanner

Generate the CSX scanner, a member of class Yylex, using JFlex. Your main task is to create the file csx.jflex, the input to JFlex. The jflex file specifies the regular expression patterns for all the CSX tokens, as well as any special processing required by tokens.

When a valid CSX token is matched by member function yylex(), it returns an object that is an instance of class java_cup.runtime.Symbol (the class our parser expects to receive from the scanner). Symbol contains an integer field sym that identifies the token class just matched. Possible values of sym are identified in the class sym [1].

Symbol also contains a field value, which contains token information beyond the token's identity. For CSX, the value field references an instance of class CSXToken (or a subclass of CSXToken). CSXToken contains the line number and column number at which each token was found. This information is necessary to frame high-quality error messages. The line number on which a token appears is stored in linenum. The column number at which a token begins is stored in colnum. The column number counts tabs as one character, even though they expand into several blanks when viewed.

You must also store auxiliary information for identifiers, integer literals, character literals and string literals. Each of the following classes is a subclass of CSXToken:

• For identifiers, class CSXIdentifierToken contains the identifier's name in field identifierText.
• For integer literals, class CSXIntLitToken contains the literal's numeric value in field intValue.
• For character literals, class CSXCharLitToken contains the literal's character value in field charValue.
• For string literals, class CSXStringLitToken contains a field stringText, the full text of the string (with enclosing double quotes and internal escape sequences included as they appeared in the original string text that was scanned).

[1] Java class names normally are capitalized. However, certain classes created by the tool Java CUP ignore this convention.

CSX Tokens

The CSX language uses the following classes of tokens:

• The reserved words of the CSX language:

    bool break char class const continue else false
    if int read return true void while print

  The break and continue reserved words are optional; compilers that include them receive extra credit.

• Identifiers. An identifier is a sequence of letters and digits starting with a letter, excluding reserved words.

    Id = (A | B | … | Z | a | b | … | z) (A | B | … | Z | a | b | … | z | 0 | 1 | … | 9)* − Reserved

• Integer Literals. An integer literal is a sequence of digits, optionally preceded by a ~. A ~ denotes a negative value.

    IntegerLit = (~ | λ) (0 | 1 | … | 9)+

• String Literals. A string literal is any sequence of printable ASCII characters, delimited by double quotes. A double quote within the text of a string must be escaped (as \", to avoid being misinterpreted as the end of the string). Tabs and newlines within a string must be escaped (\n is a newline and \t is a tab). Backslashes within a string must also be escaped (as \\). No other escaped characters are allowed. Strings may not cross line boundaries.

    StringLit = " ( Not(" | \ | UnprintableChar) | \" | \n | \t | \\ )* "

• Character Literals. A character literal is any printable ASCII character, enclosed within single quotes. A single quote within a character literal must be escaped (as \', to avoid being misinterpreted as the end of the literal). A tab or newline must be escaped ('\n' is a newline and '\t' is a tab). A backslash must also be escaped (as '\\'). No other escaped characters are allowed.

    CharLit = ' ( Not(' | \ | UnprintableChar) | \' | \n | \t | \\ ) '

• Other Tokens. These are miscellaneous one- or two-character symbols representing operators and delimiters.

    ( ) [ ] = ; + - * / == != && || < > <= >= , ! { } :

• End-of-File (EOF) Token. The EOF token is automatically returned by yylex() when it reaches the end of file while scanning the first character of a token.

Comments and white space, as defined below, are not tokens because they are not returned by the scanner. Nevertheless, they must be matched (and skipped) when they are encountered.

• A Single Line Comment. As in C++ and Java, this style of comment begins with a pair of slashes and ends at the end of the current line. Its body can include any character other than an end-of-line.

    LineComment = // Not(Eol)* Eol

• A Multi-Line Comment. This comment begins with the pair ## and ends with the pair ##. Its body can include any character sequence other than two consecutive #'s.

    BlockComment = ## ( (# | λ) Not(#) )* ##

• White Space. White space separates tokens; otherwise it is ignored.

    WhiteSpace = ( Blank | Tab | Eol )+

Any character that cannot be scanned as part of a valid token, comment or white space is invalid and should generate an error message.

Considerations/Requirements

• Because reserved words look like identifiers, you must be careful not to mis-scan them as identifiers. You should include distinct token definitions for each reserved word before your definition of identifiers.
• Upper- and lower-case letters are equivalent in reserved words and in identifiers. When you print a reserved word or an identifier, you may either print its original case or a conversion to standard case (such as lower case).
• Print character and string literals as they are input, that is, with the escaped characters shown as \n, \\, or whatever, and with the surrounding quotes. However, you should also store the effective values of character and string literals, in which escaped characters are replaced by their meaning, and the surrounding quotes are removed.
• You should not assume any limit on the length of identifiers.
• You should not assume any limit on the length of input lines that are scanned.
• You may use Java API classes to convert strings representing integer literals to their corresponding integer values. Be careful
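The literal-handling requirements above (a leading ~ for negative integers, and effective values for character and string literals) can be illustrated with a small Java helper. This is only a sketch under stated assumptions: the class name CsxLiterals and its two methods are hypothetical and are not part of the assignment's provided classes, the input text is assumed to have already been matched by the scanner's IntegerLit, CharLit or StringLit pattern, and clamping out-of-range integers is one possible policy that the text shown here does not prescribe.

```java
// Hypothetical helper; not one of the classes supplied with the assignment.
final class CsxLiterals {
    private CsxLiterals() {}

    /**
     * Converts the matched text of a CSX integer literal ("123" or "~45")
     * to an int. CSX writes a negative value with a leading '~'.
     */
    static int intValue(String text) {
        boolean negative = text.startsWith("~");
        String digits = negative ? text.substring(1) : text;
        try {
            // Parse with the sign attached so that ~2147483648
            // (Integer.MIN_VALUE) is accepted rather than overflowing.
            return Integer.parseInt(negative ? "-" + digits : digits);
        } catch (NumberFormatException e) {
            // Assumed policy: clamp literals that do not fit in an int.
            return negative ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        }
    }

    /**
     * Returns the effective value of a character or string literal:
     * the enclosing quotes are dropped and the escapes \n, \t, \\, \"
     * and \' are replaced by the characters they stand for.
     */
    static String effectiveText(String raw) {
        StringBuilder out = new StringBuilder();
        // Skip the first and last characters, which are the quotes.
        for (int i = 1; i < raw.length() - 1; i++) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char escaped = raw.charAt(++i); // character after the backslash
            switch (escaped) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                default:  out.append(escaped); break; // \\, \" or \'
            }
        }
        return out.toString();
    }
}
```

In a JFlex action, such a helper would typically be called on yytext(); for a character literal, the effective value is the single character effectiveText(raw).charAt(0). The original text (quotes and escapes intact) is still what gets stored in stringText and printed.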

