DOC PREVIEW
U of I CS 421 - Regular expressions

This preview shows page 1-2-14-15-30-31 out of 31 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 421 Lecture 6: Regular expressions Announcements Lecture outline Regular expressions Ocamllex6/8/20091Announcements MP2 extension and update New due date 1:00pm Friday, June 12 Problem 10 has been updated “Valid” old solutions will get full credit MP3 has been posted Due 1:00pm Wed, June 17 Warning: more work than the first MPs Collaboration is allowed in two-person teamsOverview of Ocamllex Automatic OCaml lexer generatorOCaml definition of a lexing functionSpecification of tokens via regular expressionsOcamllexRegular expressions A regular expression is one ofε, a.k.a. “” ‘a’ for any character a r1r2, where r1and r2are regular expr’s r1|r2, where r1and r2are regular expr’s r*, where r is a regualr expr Ø Every regular expr r represents a set of strings, denoted L(r)Languageof rRegular expression examples L(‘a’ ’b’ ’c’) = {“abc”} L((‘a’ | ’b’) ‘c’) = {“ac”, “bc”} L((‘a’ | ‘b’)* ‘c’) = {“c”, “ac”, “bc”, “aac”, “abc”, …}Regular expression examples Keywords: ‘c’ ‘a’ ‘s’ ‘e’ | ‘c’ ‘ ‘l’ ‘a’ ‘s’ ‘s’ | … Operators ‘<‘ | ‘<‘ ‘<‘ | ‘<‘ ‘=‘ | … Identifiers (‘a’ | ‘b’ | … | ‘z’ | ‘A’ | … | ‘Z’)(‘a’ | ‘b’ | … | ‘z’ | ‘A’ | … | ‘Z’ | ‘0’ | ‘1’ | … | ‘9’)* Int literals ??Abbreviations “c1c2… cn” => ‘c1’ ‘c2’ … ‘cn’ [‘a’ – ‘z’ ‘#’] => ‘a’ | ‘b’ | … | ‘z’ | ‘#’ [‘a’ ‘w’ ‘#’] => ‘a’ | ‘w’ | ‘#’ r+ => r(r*) r? => r | “” [^ ‘a’ – ‘z’] => all chars except ‘a’ – ‘z’(complement of ‘a’ – ‘z’) _ => any single charRegular expressions examples Floating-point literal[‘0’-’9’]+ . [‘0’-’9’]+ ([‘e’’E’] [‘+’’-’]? [‘0’-’9’]+)? Note: r* = (r+)?Regular expression examples C++ style comments (// …)“//” [^ ‘\n’]* ‘\n’ C style comments (/* … */)“/*” ([^ ‘*’] | ‘*’+ [^ ‘*’’/’])* “*/”Implementing regular expressions Translate REs to NFAs Translate NFAs to DFAsLexing with regular expressions Create one large RE:RE for case {action for case}| RE for class {action for class}| …| RE for idents {action for idents}| RE for FP lits {action for FP lits}| RE for Int lits {action for int lits} Then add some actionsLexing with regular expressions (cont.) Ambiguous cases: Two tokens found, one longer Choose the longer one Two tokens found, the same length Choose the earlier reg. expr.Ocamllex mechanics Put table of regular expressions and corresponding actions (written in Ocaml) into a file <filename>.mll Callocamllex <filename.mll> Produces Ocaml code for a lexical analyzer in<filename>.mlOcamllex input{header} let ident = regexp …rule entrypoint[arg1 … argn] = parse regexp {action}and entrypoint[arg1 … argn] =parse … and …{trailer}Ocamllex input{header} let ident = regexp …rule entrypoint[arg1 … argn] = parse regexp {action}and entrypoint[arg1 … argn] =parse … and …{trailer}header – ocaml defnsEntrypoint – name of gen’d function with argsarg1, …, argn, lexbuftrailer – ocaml defnsOcamllex inputheaderand trailercontain arbitrary Ocaml code put at top and bottom of <filename>.ml let ident= regexp… introduces identfor use in later regular expressionsSample inputrule main = parse[‘0’-’9’]+ { print_string “Int\n” }| [‘0’-’9’]+’.’[‘0’-’9’] { print_string “Float\n” }| [‘a’-’z’]+ { print_string “String\n” }| _ { main lexbuf }{let newlexbuf = (Lexing.from_channel stdin) inprint_string “Ready to lex.\n”;main newlexbuf}Ocamllex output <filename>.ml contains one lexing function per entrypoint Name of function is name given for entrypoint Each entry point becomes an Ocaml function that takes n+1 arguments The extra implicit argument being of type Lexing.lexbufarg1…argnare for use in actionOcamllex regular expressions ‘a’ : single quoted characters for letters _ : matches any character eof : special end_of_file marker e1e2: concatenation “string” : concatenation of a sequence of characters e1|e2: choiceOcamllex regular expressions [c1-c2] : choice of any character between first and second, inclusive, as determined by character codes [^c1-c2] : choice of any character NOT in the set e* : same as before e+ : same as e e* e? : option – was e1|εOcamllex regular expression e1#e2: the characters in e1but not in e2; e1and e2must describe just sets of characters ident: abbreviation for earlier reg exp in let ident= regexp e1 as id: binds the result of e1 to id, to be used in the associated action Example([‘0’-’9’]+ as decpart ‘.’ ([‘0’-’9’]+ as fracpart …Ocamllex manual More details can be found at http://caml.inria.fr/pub/docs/manual-ocaml/manual026.htmlExample: test.mll{ type result = Int of int | Float of float | String of string }let digit = [‘0’-’9’]let digits = digit+let lower_case = [‘a’-’z’]let upper_case = [‘A’-’Z’]let letter = lower_case | upper_caselet letters = letter+...Example: test.mllrule main = parsedigits’.’digits as f { Float (float_of_string f) }| digits as n { Int (int_of_string n) }| letters as s { String s }| _ { main lexbuf }{let newlexbuf = (Lexing.from_channel stdin) inprint_string “Ready to lex.\n”;main newlexbuf }Example> ocamllex test.mll> ocaml# #use “test.ml”...val main : Lexing.lexbuf -> result = <fun>Ready to lex.hi there 234 5.6- : result = String “hi”# What happened to the rest?Example# let b = Lexing.from_channel stdin;;# main b;;hi 789 there- : result = String “hi”# main b;;- : result = Int 789# main b;;- : result = String “there”Problem How to get the lexer to look at more than the first token? Answer 1: repeatedly call lexing function Answer 2: actionhas to tell it to – recursive calls. Value of action is token list instead of token. Note: already used this with the _ caseExamplerule main = parsedigits’.’digits as f { Float (float_of_string f):: main lexbuf }| digits as n { Int (int_of_string n):: main lexbuf }| letters as s { String s :: main lexbuf }| eof { [] }| _ { main lexbuf


View Full Document

U of I CS 421 - Regular expressions

Documents in this Course
Lecture 2

Lecture 2

12 pages

Exams

Exams

20 pages

Lecture

Lecture

32 pages

Lecture

Lecture

21 pages

Lecture

Lecture

15 pages

Lecture

Lecture

4 pages

Lecture

Lecture

68 pages

Lecture

Lecture

68 pages

Lecture

Lecture

84 pages

s

s

32 pages

Parsing

Parsing

52 pages

Lecture 2

Lecture 2

45 pages

Midterm

Midterm

13 pages

LECTURE

LECTURE

10 pages

Lecture

Lecture

5 pages

Lecture

Lecture

39 pages

Load more
Download Regular expressions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Regular expressions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Regular expressions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?