DOC PREVIEW
Berkeley COMPSCI 164 - Building a Scanner

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 67 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Ras Bodik, CS 164, Fall 2004 1Building a ScannerCS164, Fall 2004Ras Bodik, CS 164, Fall 20044Recall: The Structure of a Compilerscannerparsercheckercode genDecaf program (stream of characters)stream of tokensAbstract Syntax Tree (AST) AST with annotations (types, declarations)maybe x86Ras Bodik, CS 164, Fall 20045Recall: Lexical Analysis• The input is just a sequence of characters. Example:if (i == j)z = 0;elsez = 1;• More accurately, the input is string:\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;• Goal: find lexemes and map them to tokens:1. partition input string into substrings (called lexemes), and2. classify them according to their role (role = token)Ras Bodik, CS 164, Fall 20046Continued• Lexer input:\tif(i==j)\n\t\tz = 0;\n\telse\n\t\tz = 1;• partitioned into these lexemes:\t if ( i == j ) \n\t\t z = 0 ; \n\t else \n\t\t z = 1 ;• mapped to a sequence of tokensIF, LPAR, ID(“i”), EQUALS, ID(“j”) …• Notes: – whitespace lexemes are dropped, not mapped to tokens• is this the same fatal mistake as in FORTRAN? (see Lecture 1)– some tokens have attributes: the lexeme and/or line number• why do we need them?Ras Bodik, CS 164, Fall 20047What’s a Token?• A token is a syntactic category– In English:noun, verb, adjective, …– In a programming language:Identifier, Integer, Keyword, Whitespace, …• Parser relies on the token distinctions: – identifiers are treated differently than keywords– but all identifiers are treated the same, regardless of what lexeme created them &Ras Bodik, CS 164, Fall 20048What are lexemes?• Webster: – “items in the vocabulary of a language”• cs164: – same: items in the vocabulary of a language:• numbers, keywords, identifiers, operators, etc.– strings into which the input string is partitioned.&Ras Bodik, CS 164, Fall 200410Writing the lexer• Not by hand– tedious, repetitious, error-prone, non-maintainable• Instead, we’ll build a lexer generator– once we have the generator, we’ll only describe the lexemes and their tokens …• that is, provide Decaf’s lexical specification (the What)– … and generate code that performs the partitioning• generated code hides repeated code (the How)&Ras Bodik, CS 164, Fall 200411Code generator: key benefit&The scanner generator allows the programmer to focus on:• Whatthe lexer should do,• rather than How it should be done.• what: declarative programming• how: imperative programming&Ras Bodik, CS 164, Fall 200412Imperative scanner (in Java)• Let’s first build the scanner in Java, by hand:• to see how it is done, and where’s the repetitiouscode that we want to hide• A simple scanner will do. Only four tokens:“*”TIMES“+”PLUS“==“EQUALSa sequence of one or more letters or digits starting with a letterIDLexemeTOKENRas Bodik, CS 164, Fall 200413Imperative scannerc=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;})))Ras Bodik, CS 164, Fall 200414Imperative scanner• You could write your entire scanner in this style– and for small scanners this style is appropriate• This code looks simple and clean, but try to add – tokens that start with the same string: “if” and “iffy”– C-style comments: /* anything here /* nested comments */ */– string literals with escape sequences: “…\t … \”…”– error handling, e.g., badly formed strings (see PA2)• Look at StreamTokenizer.nextToken() for an example of real imperative scanner– in Eclipse, type Ctrl+Shift+T.– Enter StreamTokenizer. – Press F4. In the Hierarchy view, select method nextToken.)))Ras Bodik, CS 164, Fall 200415Maximal munch rule• What is the need for undoNextChar()?– it performs look-ahead, to determine whether the ID lexeme can be grown further• This is an example of maximal much rule:– this rule followed by all scanners– the rule: the input character stream is partitioned into lexemes that are as large as possible– Ex.: in Java, “iffy” is not partitioned into “if” (the IF keyword) and “fy” (ID), but into “iffy” (ID))))Ras Bodik, CS 164, Fall 200416Imperative Lexer: what vs. howc=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;} )littlelogic, much plumbing)))Ras Bodik, CS 164, Fall 200417Identifying the plumbing (the how)c=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;} )characters read always the same way)))Ras Bodik, CS 164, Fall 200418Identifying the plumbing (the how)c=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;} )tokens are always return-edRas Bodik, CS 164, Fall 200419Identifying the plumbing (the how)c=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;} )the lookahead is explicit))Ras Bodik, CS 164, Fall 200420Identifying the plumbing (the how)c=nextChar();if (c == ‘=‘) { c=nextChar(); if (c == ‘=‘) {return EQUALS;}}if (c == ‘+’) { return PLUS; }if (c == ‘*’) { return TIMES; }if (c is a letter) { c=NextChar(); while (c is a letter or digit) { c=NextChar(); }undoNextChar(c);return ID;} )must build decision tree out of nested if’s (yuck!)))Ras Bodik, CS 164, Fall 200421Can we hide the plumbing?• That is, can we avoid having to – spell out the details of calls to the input method?– write the return in front of tokens?– code the look-ahead code explicitly?– use if’s and while’s to indicate the decision tree?• In short, can we make the code look like the specification table?“*”TIMES“+”PLUS“==“EQUALSa sequence


View Full Document

Berkeley COMPSCI 164 - Building a Scanner

Documents in this Course
Lecture 8

Lecture 8

40 pages

Load more
Download Building a Scanner
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Building a Scanner and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Building a Scanner 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?