Unformatted text preview:

COS 320 CompilersOutlineThe Front EndLexical AnalysisLexical Analysis ExampleSlide 6Slide 7Slide 8Lexer ImplementationSlide 10Slide 11Slide 12Some DefinitionsRegular Expressions: ConstructionRegular ExpressionsSlide 16Slide 17Ambiguous Token Rule SetsSlide 19Slide 20Slide 21Slide 22Slide 23ML-Lex SpecificationUser DeclarationsML-LEX DefinitionsRulesSlide 28Slide 29A Simple LexerUsing Multiple LexersSlide 32A (Marginally) More Exciting LexerImplementing LexersTable-driven algorithmSlide 36Slide 37SummaryCOS 320CompilersDavid WalkerOutline•Last Week–Introduction to ML•Today:–Lexical Analysis–Reading: Chapter 2 of AppelThe Front End•Lexical Analysis: Create sequence of tokens from characters•Syntax Analysis: Create abstract syntax tree from sequence of tokens•Type Checking: Check program for well-formedness constraintsLexer Parserstream ofcharactersstream oftokensabstractsyntaxTypeCheckerLexical Analysis•Lexical Analysis: Breaks stream of ASCII characters (source) into tokens•Token: An atomic unit of program syntax–i.e., a word as opposed to a sentence•Tokens and their types:Type:IDREALSEMILPARENNUMIFCharacters Recognized:foo, x, listcount10.45, 3.14, -2.1;(50, 100ifToken:ID(foo), ID(x), ...REAL(10.45), REAL(3.14), ...SEMILPARENNUM(50), NUM(100)IFLexical Analysis Examplex = ( y + 4.0 ) ;Lexical Analysis Examplex = ( y + 4.0 ) ;ID(x)Lexical AnalysisLexical Analysis Examplex = ( y + 4.0 ) ;ID(x) ASSIGNLexical AnalysisLexical Analysis Examplex = ( y + 4.0 ) ;ID(x) ASSIGN LPAREN ID(y) PLUS REAL(4.0) RPAREN SEMILexical AnalysisLexer Implementation•Implementation Options:1. Write a Lexer from scratch–Boring, error-prone and too much work2. Use a Lexer Generator–Quick and easy. Good for lazy compiler writers.Lexer SpecificationLexer Implementation•Implementation Options:1. Write a Lexer from scratch–Boring, error-prone and too much work2. Use a Lexer Generator–Quick and easy. Good for lazy compiler writers.Lexer SpecificationlexergeneratorLexerLexer Implementation•Implementation Options:1. Write a Lexer from scratch–Boring, error-prone and too much work2. Use a Lexer Generator–Quick and easy. Good for lazy compiler writers.Lexer SpecificationlexergeneratorLexerstream ofcharactersstream oftokens•How do we specify the lexer?–Develop another language –We’ll use a language involving regular expressions to specify tokens•What is a lexer generator?–Another compiler ....Some Definitions•We will want to define the language of legal tokens our lexer can recognize–Alphabet – a collection of symbols (ASCII is an alphabet)–String – a finite sequence of symbols taken from our alphabet–Language of legal tokens – a set of strings•Language of ML keywords – set of all strings which are ML keywords (FINITE)•Language of ML tokens – set of all strings which map to ML tokens (INFINITE)•Some people use the word “language” to mean more general sets:–eg: ML Language – set of all strings representing correct ML programs (INFINITE).Regular Expressions: Construction•Base Cases:–For each symbol a in alphabet, a is a RE denoting the set {a}–Epsilon (e) denotes { }•Inductive Cases (M and N are REs)–Alternation (M | N) denotes strings in M or N•(a | b) == {a, b}–Concatenation (M N) denotes strings in M concatenated with strings in N•(a | b) (a | c) == { aa, ac, ba, bc }–Kleene closure (M*) denotes strings formed by any number of repetitions of strings in M•(a | b )* == {e, a, b, aa, ab, ba, bb, ...}Regular Expressions•Integers begin with an optional minus sign, continue with a sequence of digits•Regular Expression: (- | e) (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)*Regular Expressions•Integers begin with an optional minus sign, continue with a sequence of digits•Regular Expression: (- | e) (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)*•So writing (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9) and even worse (a | b | c | ...) gets tedious...Regular Expressions•common abbreviations: –[a-c] == (a | b | c)–. == any character except \n–\n == new line character–a+ == one or more–a? == zero or one•all abbreviations can be defined in terms of the “standard” regular expressionsAmbiguous Token Rule Sets•A single expression is a completely unambiguous specification of a token.•Sometimes, when we put together a set of regular expressions to specify all of the tokens in a language, ambiguities arise:–i.e., two regular expression match the same stringAmbiguous Token Rule Sets•Example:–Identifier tokens: a-z* (a-z | 0-9)*–Sample keyword tokens: if, then, ...•How do we tokenize:–foobar ==> ID(foobar) or ID(foo) ID(bar)–if ==> ID(if) or IFAmbiguous Token Rule Sets•We resolve ambiguities using two rules:–Longest match: The regular expression that matches the longest string takes precedence.–Rule Priority: The regular expressions identifying tokens are written down in sequence. If two regular expressions match the same (longest) string, the first regular expression in the sequence takes precedence.Ambiguous Token Rule Sets•Example:–Identifier tokens: a-z (a-z | 0-9)*–Sample keyword tokens: if, then, ...•How do we tokenize:–foobar ==> ID(foobar) or ID(foo) ID(bar)–if ==> ID(if) or IFAmbiguous Token Rule Sets•Example:–Identifier tokens: a-z* (a-z | 0-9)*–Sample keyword tokens: if, then, ...•How do we tokenize:–foobar ==> ID(foobar) or ID(foo) ID(bar)–if ==> ID(if) or IFLexer ImplementationImplementation Options:1. Write Lexer from scratch–Boring and error-prone2. Use Lexical Analyzer Generator–Quick and easyml-lex is a lexical analyzer generator for ML.lex and flex are lexical analyzer generators for C.ML-Lex Specification•Lexical specification consists of 3 parts:User Declarations%%ML-LEX Definitions%%RulesUser Declarations•User Declarations:–User can define various values that are available to the action fragments.–Two values must be defined in this section:•type lexresult–type of the value returned by each rule action.•fun eof ()–called by lexer when end of input stream is reached.ML-LEX Definitions•ML-LEX Definitions:–User can define regular expression abbreviations:–Define multiple lexers to work together. Each is given a unique name.DIGITS =


View Full Document

Princeton COS 320 - Compilers

Download Compilers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Compilers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Compilers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?