DOC PREVIEW
U of I CS 421 - Lexing and ocamllex

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

OutlineOverviewLexingocamllexActivityCS421 Lecture 12: Lexing and ocamllex1Mark [email protected] of Illinois at Urbana-ChampaignJune 27, 20061Based on slides by Mattox Beckman, as updated by Vikram Adve, GulAgha, and Elsa GunterMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewLexingocamllexActivityMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityWhere We Are Going◮Overall goal: we want to turn strings of characters – code –into computer instructions◮Easiest to break this down into phases◮First, turn strings into abstract syntax trees (ASTs) – this isparsing◮Next, turn abstract syntax trees into executable instructions –compiling or interpretingMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityLexing and ParsingStrings are converted into ASTs in two phases:Lexing Convert strings (streams of characters) into lists (orstreams) of tokens, representing words in thelanguageParsing Convert lists of tokens into abstract syntax treesMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsLexingWith lexing, we break sequences of characters into differentsyntactic categories, called tokens. As an example, we could breakthis:asd 123 jkl 3.14into this:[String ‘‘asd’’, Int 123; String ‘‘jkl’’; Float 3.14]Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsLexing StrategyOut strategy will be to leverage regular expressions and finiteautomata to recognize tokens:◮each syntactic category will be described by a regularexpression (with some extended syntax)◮words will be recognized by an encoding of a correspondingfinite state machineHowever, this s til l leaves us with a problem. How do we pullmultiple words out of a string, instead of just recognizing a singleword?Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsLexing: Multiple TokensTo solve this, we will modify the behavior of the DFA.◮if we find a character where there is no transition from thecurrent state, stop processing the string◮if we are in an accepting state, return the token correspondingto what we found as well as the remainder of the string◮now, use iterator or recursion to keep pulling out more tokens◮if we were not in an accepting state, fail – invalid syntaxMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsExamples0s1s2s30 − 9a − za − z0 − 9.0 − 9Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsLexing OptionsWe could write a lexer by writing regular expressions, and thentranslating these by hand into a DFA. That sounds tedious andrepetitive – perfect for a computer! Can we write a program thattakes regular expressions and generates automata for us?◮Someone already did – Lex!◮OCaml version of this is ocamllexMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityOverviewStrategyOptionsHow does it work?We need a few core items to get this working:◮Some way to identify the input string – we’ll call this thelexing buffer◮A set of regular expressions that correspond to tokens in ourlanguage◮A corresponding set of actions to take when tokens arematchedThe lexer can then take the regular expressions to build statemachines, which are then used to process the lexing buffer. If wereach an accept state and can take no further transitions, we canapply the actions.Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning CommentsMechanics of Using ocamllex◮Lexer definitions using ocamllex are written in a file with a.mll extension. The file includes the regular expressions in atable, with associated actions for each.◮OCaml code for the lexe r is generated withocamllex file.mll◮This generates the code for the lexer in file file.mlMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning CommentsSample Lexer1 rule main = parse2 | [’0’-’9’]+ { print_string "Int\n"}3 | [’0’-’9’]+’.’[’0’-’9’]+ { print_string "Float\n"}4 | [’a’-’z’]+ { print_string "String\n"}5 | _ { main lexbuf }6 {7 let newlexbuf = (Lexing.from_channel stdin) in8 print_string "Ready to lex.\n";9 main newlexbuf10 }Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning CommentsGeneral Lexer Format1 { header }2 let ident = regexp ...3 rule entrypoint [arg1... argn] = parse4 | regexp { action }5 | ...6 | regexp { action }7 and entrypoint [arg1... argn] = parse8 ...and ...9 { trailer }Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning Commentsocamllex Input◮header and footer contain arbitrary OCaml code to insert intogenerated .ml file◮shorthands for regular expressions can be introduced withlet ident = regexp◮multiple entry points turn into multiple functions in the .mlfile, with the given arguments and an additional argument forthe lexing bufferMark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning CommentsRegular Expressions in ocamllexThe regular expression format is similar to what we’ve seen so far,but still slightly different.◮Single quoted characters for letters: ’a’◮Underscores match any letter:◮End-of-file marker: eof◮Concatenation of most expressions same as before◮Concatenation of character sequence shown as a string:‘‘while’’◮Choice: instead of e1+ e2, it is e1| e2Mark Hills CS421 Lecture 12: Lexing and ocamllexOutlineOverviewLexingocamllexActivityGetting StartedLexer InputRegular ExpressionsExample 1Example 2Scanning CommentsRegular Expressions, cont.◮Character ranges – pick any character in the range, based oncharacter codes: [c1− c2]◮Negative character ranges – any character not in the range:[c1− c2]◮e∗ has same meaning as we’ve already seen◮e+ means one ore more,


View Full Document

U of I CS 421 - Lexing and ocamllex

Documents in this Course
Lecture 2

Lecture 2

12 pages

Exams

Exams

20 pages

Lecture

Lecture

32 pages

Lecture

Lecture

21 pages

Lecture

Lecture

15 pages

Lecture

Lecture

4 pages

Lecture

Lecture

68 pages

Lecture

Lecture

68 pages

Lecture

Lecture

84 pages

s

s

32 pages

Parsing

Parsing

52 pages

Lecture 2

Lecture 2

45 pages

Midterm

Midterm

13 pages

LECTURE

LECTURE

10 pages

Lecture

Lecture

5 pages

Lecture

Lecture

39 pages

Load more
Download Lexing and ocamllex
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexing and ocamllex and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexing and ocamllex 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?