Unformatted text preview:

Lexical Analysis (Sections 2.1.2-2.1.3)Phases of CompilationSpecification of Programming LanguagesSlide 4ScannerFormal definition of tokensRegular ExpressionsToken Definition ExampleScanningDFAsDifficultiesPowerPoint PresentationScanner GeneratorsSlide 14Scanners and String Processing11Lexical AnalysisLexical Analysis(Sections 2.1.2-2.1.3)(Sections 2.1.2-2.1.3)CSCI 431 Programming LanguagesCSCI 431 Programming LanguagesFall 2003Fall 2003A modification of slides developed by Felix A modification of slides developed by Felix Hernandez-Campos at UNC Chapel Hill Hernandez-Campos at UNC Chapel Hill22Phases of CompilationPhases of Compilation33Specification of Programming LanguagesSpecification of Programming Languages•PLs require precise definitions (i.e. no PLs require precise definitions (i.e. no ambiguityambiguity))–LanguageLanguage form form (Syntax)(Syntax)–LanguageLanguage meaning meaning (Semantics) (Semantics)•Consequently, PLs are specified using formal Consequently, PLs are specified using formal notation:notation:–Formal syntaxFormal syntax»TokensTokens»GrammarGrammar–Formal semanticsFormal semantics44Phases of CompilationPhases of Compilation55ScannerScanner•Main task: identify tokensMain task: identify tokens–Basic building blocks of programsBasic building blocks of programs–E.g.E.g. keywords, identifiers, numbers, punctuation marks keywords, identifiers, numbers, punctuation marks•Other tasks: remove comments, deal with pragmas, Other tasks: remove comments, deal with pragmas, save source locationssave source locations•Desk calculator language example:Desk calculator language example:read Aread Asum := A + 3.45e-3sum := A + 3.45e-3write sumwrite sumwrite sum / 2write sum / 266Formal definition of tokensFormal definition of tokens•A set of tokens is a set of strings over an alphabetA set of tokens is a set of strings over an alphabet–{read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}{read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}•A set of tokens is a A set of tokens is a regular setregular set that can be defined by that can be defined by comprehension using a comprehension using a regular expressionregular expression•For every regular set, there is a For every regular set, there is a deterministic finite deterministic finite automatonautomaton (DFA) that can recognize it (DFA) that can recognize it–i.e.i.e. determine whether a string belongs to the set or not determine whether a string belongs to the set or not–Scanners extract tokens from source code in the same way Scanners extract tokens from source code in the same way DFAs determine membershipDFAs determine membership77Regular ExpressionsRegular Expressions•A regular expression (RE) is:A regular expression (RE) is:–A single characterA single character–The empty string, The empty string, –The The concatenationconcatenation of two regular expressions of two regular expressions»Notation:Notation: RE RE11 RE RE22 ( (i.e. i.e. RERE11 followed by RE followed by RE22))–The The unionunion of two regular expressionsof two regular expressions»Notation: Notation: RERE11 | RE | RE22–The The closureclosure of a regular expression of a regular expression»Notation: Notation: RE*RE*»* is known as the * is known as the Kleene starKleene star»* * represents the concatenation of 0 or more stringsrepresents the concatenation of 0 or more strings88Token Definition ExampleToken Definition Example•Numeric literals in PascalNumeric literals in Pascal–Definition of the token Definition of the token unsigned_numberunsigned_numberdigit digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9unsigned_integer unsigned_integer  digitdigit digitdigit**unsigned_number unsigned_number  unsigned_integer unsigned_integer ( ( . ( ( . unsigned_integer unsigned_integer ) | ) |  ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | |  ) )•Recursion is not allowed!Recursion is not allowed!•Notice the use of parentheses to avoid ambiguityNotice the use of parentheses to avoid ambiguity99ScanningScanning•Pascal scannerPascal scannerPseudo-codePseudo-code1010DFAsDFAs•Scanners are Scanners are deterministicdeterministicfinite automatafinite automata(DFAs)(DFAs)–With someWith somehackshacks1111DifficultiesDifficulties•Keywords and variable namesKeywords and variable names•Look-aheadLook-ahead–Pascal’s ranges [1..10]Pascal’s ranges [1..10]–FORTRAN’s exampleFORTRAN’s exampleDO 5 I=1,25DO 5 I=1,25=> Loop 25 times up to label 5=> Loop 25 times up to label 5DO 5 I=1.25DO 5 I=1.25 => Assign 1.25 to DO5I=> Assign 1.25 to DO5I»NASA’s Mariner 1 (apocryphal?)NASA’s Mariner 1 (apocryphal?)•Pragmas: Pragmas: significant commentssignificant comments–Compiler optionsCompiler options1212•Outline ofOutline ofthe Scannerthe Scanner1313Scanner GeneratorsScanner Generators•Scanners generators:Scanners generators:– E.g. E.g. lexlex, flex, flex–These programs take regular expressions as their input and These programs take regular expressions as their input and return a program (return a program (i.e.i.e. a a scannerscanner) that can extract tokens ) that can extract tokens from a stream of charactersfrom a stream of characters1414•Table-drivenTable-drivenscannerscanner•Lexical errorsLexical errors1515Scanners and String ProcessingScanners and String Processing•Scanning is a common task in programmingScanning is a common task in programming–String processingString processing–E.g.E.g. reading configuration files, processing log files,… reading configuration files, processing log files,…•StringTokenizerStringTokenizer and and StreamTokenizerStreamTokenizer in Java in Java•Regular expressions in Perl and other scripting Regular expressions in Perl and other scripting


View Full Document

UNCA CSCI 431 - Lexical Analysis

Download Lexical Analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexical Analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexical Analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?