DOC PREVIEW
U of I CS 421 - Lexical Analysis

This preview shows page 1-2-20-21 out of 21 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS421 Lecture 5–Lexical AnalysisCS421 Lecture 5 Lexical Analysis▶Today’s class▶Today s class▶ Lexing▶FiniteState Machine as Lexer▶Finite-State Machine as LexerCompiler OutlineCompiler Outline▶Front-End▶FrontEnd▶ Takes Input Source Code▶Returns Abstract Syntax Tree▶Returns Abstract Syntax Tree▶ Back-End▶ Takes Abstract Syntax Tree▶ Returns Machine Executable Binary CodeCompiler Outline (Figure)Compiler Outline (Figure)Manual and automatic methodsManual and automatic methods▶ We will study how to write lexers and yparsers. For each, we will give a manual technique and an automatic one:q▶ Lexing:▶Manual: Finite-state machinesManual: Finitestate machines▶ Automatic: Regular expressions – ocamllex▶Parsing▶Parsing▶ Manual: Top-down (recursive descent) parsingAt ti Btt(LR(1))l▶Automatic: Bottom-up (LR(1)) -ocamlyaccLexerLexer▶Divide input into“tokens”▶Divide input into tokens▶ Tokens are smallest units that are useful for parsing E g parser needs to know iffor parsing. E.g. parser needs to know if “while” keyword appears; doesn’t need to know that it is made up of characters w hknow that it is made up of characters w, h, etc.Wh ? Effi i▶Why? Efficiency▶ Simpler to specify grammatical structure, and il i fkimplement parser, in terms of tokensLexer Input & OutputLexer Input & Output▶Lexer Input▶Lexer Input▶ Character stream in the form of▶Input Stream or▶Input Stream, or▶ String▶Lexer Output▶Lexer Output▶ Stream of tokens, or▶List of tokens▶List of tokensTokenstype token = EOF | BOOLEAN | BREAK | CASE | CHAR | CLASS | CONST | CONTINUE| DO | DOUBLE | ELSE | EXTENDS | FINAL | FINALLY | FLOAT | FOR | DEFAULT | IMPLEMENTS | IMPORT | INT | NEW | IF | PUBLIC | SWITCH | RETURN | VOID | STATIC | WHILE | THIS| SWITCH | RETURN | VOID | STATIC | WHILE | THIS| NULL_LITERAL | LPAREN | RPAREN | LBRACE | RBRACE | LBRACK | RBRACK | SEMICOLON | COMMA | DOT | EQ | GT | LT | NOT | COMP | QUESTION | COLON | EQEQ | LTEQ | GTEQ | NOTEQ | ANDAND | OROR| PLUSPLUS | MINUSMINUS | PLUS | MINUS | MULT | DIV | AND | OR | XOR | MOD | LSHIFT | RSHIFT | URSHIFT | PLUSEQ | MINUSEQ | MULTEQ | DIVEQ | ANDEQ | OREQ | XOREQ | MODEQ | LSHIFTEQ | RSHIFTEQ|| || | | || URSHIFTEQ| BOOLEAN_LITERAL of bool| INTEGER_LITERAL of int| FLOAT LITERAL of float| FLOAT_LITERAL of float| IDENTIFIER of string| STRING_LITERAL of stringExampleExample▶Input▶Input“class MP1 { public static void main ( ……”▶ Output[CLASS; IDENTIFIER“MP1”; LBRACE; PUBLIC;[CLASS; IDENTIFIER MP1 ; LBRACE; PUBLIC; STATIC; VOID; IDENTIFIER “main”; LPAREN; …… ]Lexing with FSMLexing with FSM▶Words recognized by corresponding finite▶Words recognized by corresponding finite state automaton▶Deterministic Finite Automaton (DFA)▶Deterministic Finite Automaton (DFA)▶ A directed graph whose verticesare labeled from a setTokens U {Error}and whoseedgesfrom a set Tokens U {Error}and whose edgesare labeled with sets of characters. Also, if the outgoing edges from vertex v are {e1,…,en},outgoing edges from vertex v are {e1, …, en}, then the sets label(e1), …, label(en) are disjoint. Also, a vertex u specified as the start vertex.Example 1Example 1▶DFA for keywords▶DFA for keywords class case finallyExample 2Example 2▶DFA for Operators▶DFA for Operators ; { + += < <= << <<=Example 3Example 3▶DFA for integer constants▶DFA for integer constantsExample 4Example 4▶DFA for integers and decimal▶DFA for integers and decimalCompleting the DFACompleting the DFA▶Need to create a single DFA for all tokens–▶Need to create a single DFA for all tokens recall that all outgoing edges must have disjoint label setsdisjoint label sets.▶ DFA labels are similar to tokens in the token data type but not necessarily identicaldata type, but not necessarily identical▶ For keyword & identifiers:▶ Instead of creating the DFA shown earlier, create a small DFA and use action to distinguish keywordsdistinguish keywordsImplementing lexing with a DFAImplementing lexing with a DFA▶ Define a transition function▶ state x character -> state u {-1}▶ Label▶ state -> token u {error}▶ Assume start state = 0Implementing lexing with a DFAImplementing lexing with a DFAFunction to get a single token:(state x string) getnexttoken() {s = 0; tokenchars = “”;hil ( ) {while (true) {c = peek at next charif (move(s c) ==1)if (move(s,c) == -1)return (s,tokenchars)move c from input to tokencharsmove c from input to tokencharss = move(s,c)}Implementing lexing with a DFAImplementing lexing with a DFAtoken list gettokens() {tokenlis = []while (true) {c = peek at next charpif (c == eofchar) {tokenlis = tokenlis @ [EOF]breakbreak}(s, tokenchars) = getnexttoken()perform action based on s and tokenchars}return tokenlis}Typical lexer actionsTypical lexer actions▶Recall that label(s) is a token or error▶Recall that label(s) is a token or error. Action depends on that label, e.g.:▶Error: Represents an erroneous input; abort▶Error: Represents an erroneous input; abort.▶ LTLT:▶ IDENT:▶ COMMENTMore DFAsMore DFAsMore DFAsMore DFAsMore DFAsMore


View Full Document

U of I CS 421 - Lexical Analysis

Documents in this Course
Lecture 2

Lecture 2

12 pages

Exams

Exams

20 pages

Lecture

Lecture

32 pages

Lecture

Lecture

21 pages

Lecture

Lecture

15 pages

Lecture

Lecture

4 pages

Lecture

Lecture

68 pages

Lecture

Lecture

68 pages

Lecture

Lecture

84 pages

s

s

32 pages

Parsing

Parsing

52 pages

Lecture 2

Lecture 2

45 pages

Midterm

Midterm

13 pages

LECTURE

LECTURE

10 pages

Lecture

Lecture

5 pages

Lecture

Lecture

39 pages

Load more
Download Lexical Analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexical Analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexical Analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?