DOC PREVIEW
U of I CS 421 - Programming Languages and Compilers

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Programming Languages andCompilers (CS 421)Elsa L Gunter2112 SC, UIUChttp://www.cs.uiuc.edu/class/sp07/cs421/Based in part on slides by Mattox Beckman, as updatedby Vikram Adve and Gul AghaElsa L. GunterWhere We Are Going• We want to turn strings (code) intocomputer instructions• Done in phases• Turn strings into abstract syntax trees(parse)• Translate abstract syntax trees intoexecutable instructions (interpret orcompileElsa L. GunterLexing and Parsing• Converting strings to abstract syntaxtrees done in two phases– Lexing: Converting string (or streamsof characters) into lists (or streams) oftokens (the “words” of the language)– Parsing: Convert a list of tokens intoan abstract syntax treeElsa L. GunterLexing• Different syntactic categories of“words”: tokensExample:• Convert sequence of characters intosequence of strings, integers, andfloating point numbers.• "asd 123 jkl 3.14" will become:[String "asd"; Int 123; String "jkl"; Float3.14]Elsa L. GunterLexing• Each category described by regularexpression (with extended syntax)• Words recognized by (encoding of)corresponding finite stateautomaton• Problem: we want to pull words outof a string; not just recognize asingle wordElsa L. GunterLexing• Modify behavior of DFA• When we encounter a character in astate for which there is no transaction– Stop processing the string– If in an accepting state, return thetoken that corresponds to the state,and the remainder of the string– If not, fail• Add recursive layer to get sequenceElsa L. GunterExample• s1 return a string• s2 return an integer• s3 return a float.s3s2s0a-z0-9s1a-z0-90-9Elsa L. GunterLex, ocamllex• Could write the regular expression, thentranslate to DFA by hand– A lot of work• Better: Write program to take regularexpression as input and automaticallygenerate automata• Lex is such a program• ocamllex version for ocamlElsa L. GunterHow to do it• To use regular expressions tolex our input we need:–Some way to identify the inputstring — call it a lexing buffer–Set of regular expressions,–Corresponding set of actions totake when they are matched.Elsa L. GunterHow to do it• The lexer will take the regularexpressions and generate a statemachine.• The state machine will take our lexingbuffer and apply the transitions...• If we reach an accept state from whichwe can go no further, the machine willperform the appropriate action.Elsa L. GunterMechanics• Put table of reg exp and correspondingactions (written in ocaml) into a file<filename>.mll• Callocamllex <filename>.mll• Produces Caml code for a lexicalanalyzer in file <filename>.mlElsa L. GunterSample Inputrule main = parse ['0'-'9']+ { print_string "Int\n"} | ['0'-'9']+'.'['0'-'9']+ { print_string "Float\n"} | ['a'-'z']+ { print_string "String\n"} | _ { main lexbuf } { let newlexbuf = (Lexing.from_channel stdin) in print_string "Ready to lex.\n"; main newlexbuf}Elsa L. GunterGeneral Input{ header }let ident = regexp ...rule entrypoint [arg1... argn] = parse regexp { action } | ... | regexp { action }and entrypoint [arg1... argn] = parse...and ...{ trailer }Elsa L. GunterOcamllex Input• header and trailer contain arbitraryocaml code put at top an bottom of<filename>.ml• let ident = regexp ... Introducesident for use in later regularexpressionsElsa L. GunterOcamllex Input• <filename>.ml contains one lexingfunction per entrypoint– Name of function is name given forentrypoint– Each entry point becomes an OCamlfunction that takes n+1 arguments,the extra implicit last argument beingof type Lexing.lexbuf• arg1... argn are for use in actionElsa L. GunterOcamllex Regular Expression• Single quoted characters for letters:‘a’• _: (underscore) matches any letter• Eof: special “end_of_file” marker• Concatenation same as usual• “string”: concatenation of sequenceof characters• e1 | e2 : choice - what was e1 ∨ e2Elsa L. GunterOcamllex Regular Expression• [c1 - c2]: choice of any characterbetween first and second inclusive,as determined by character codes• [^c1 - c2]: choice of any characterNOT in set• e*: same as before• e+: same as e e*• e?: option - was e1 ∨ εElsa L. GunterOcamllex Regular Expression• e1 # e2: the characters in e1 but notin e2; e1 and e2 must describe justsets of characters• ident: abbreviation for earlier regexp in let ident = regexp• e1 as id: binds the result of e1 to idto be used in the associated actionElsa L. GunterOcamllex Manual• More details can be found athttp://caml.inria.fr/pub/docs/manual-ocaml/manual026.htmlElsa L. GunterExample : test.mll{ type result = Int of int | Float of float |String of string }let digit = ['0'-'9']let digits = digit +let lower_case = ['a'-'z']let upper_case = ['A'-'Z']let letter = upper_case | lower_caselet letters = letter +Elsa L. GunterExample : test.mllrule main = parse (digits)'.'digits as f { Float (float_of_string f) } | digits as n { Int (int_of_string n) } | letters as s { String s} | _ { main lexbuf } { let newlexbuf = (Lexing.from_channel stdin) in print_string "Ready to lex."; print_newline (); main newlexbuf }Elsa L. GunterExample# #use "test.ml";;…val main : Lexing.lexbuf -> result = <fun>val __ocaml_lex_main_rec : Lexing.lexbuf -> int-> result = <fun>Ready to lex.hi there 234 5.2- : result = String "hi"• What happened to the rest?!?Elsa L. GunterExample# let b = Lexing.from_channel stdin;;# main b;;hi 673 there- : result = String "hi"# main b;;- : result = Int 673# main b;;- : result = String "there"Elsa L. GunterProblem• How to get lexer to look at more thanthe first token?• Answer: action has to tell it to --recursive calls• Side Benefit: can add “state” into lexing• Note: already used this with the _ caseElsa L. GunterExamplerule main = parse (digits) '.' digits as f { Float (float_of_stringf) :: main lexbuf} | digits as n { Int (int_of_string n) ::main lexbuf } | letters as s { String s :: main lexbuf} | eof { [] } | _ { main lexbuf }Elsa L. GunterExample ResultsReady to lex.hi there 234 5.2- : result list = [String "hi"; String "there";Int 234; Float 5.2]#• Used Ctrl-d to send the end-of-filesignalElsa L. GunterDealing with comments• First Attemptlet open_comment = "(*"let close_comment = "*)"rule main = parse (digits) '.' digits as f { Float(float_of_string f) :: main lexbuf} | digits as n { Int


View Full Document

U of I CS 421 - Programming Languages and Compilers

Documents in this Course
Lecture 2

Lecture 2

12 pages

Exams

Exams

20 pages

Lecture

Lecture

32 pages

Lecture

Lecture

21 pages

Lecture

Lecture

15 pages

Lecture

Lecture

4 pages

Lecture

Lecture

68 pages

Lecture

Lecture

68 pages

Lecture

Lecture

84 pages

s

s

32 pages

Parsing

Parsing

52 pages

Lecture 2

Lecture 2

45 pages

Midterm

Midterm

13 pages

LECTURE

LECTURE

10 pages

Lecture

Lecture

5 pages

Lecture

Lecture

39 pages

Load more
Download Programming Languages and Compilers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Programming Languages and Compilers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Programming Languages and Compilers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?