UD CISC 471 - The Transport Layer- Tutorial and Survey

Unformatted text preview:

The Front End: Scanning and Parsing How they work together…What is a token? A lexeme?Designing a ScannerThen, why did they invent lex?Specifying lexemes with Regular ExpressionsExamples of Regular Expressions for LexemesPractice with writing regular expressionsWhat strings are accepted here?From Specification to Scanning…From Reg Expr to NFAScanning as a Finite AutomatonThe Whole Scanner Generator ProcessHowever, 3 Major Ways to Build ScannersA Semi-mechanical DFA WayManually written scanner codeThe Scanner Generator WayMore on the Scanner Generator on Thursday…Form of a Lex/Flex Spec FileLex Spec ExampleSome Notes on LexA Makefile for the scannerA typical token.h fileA typical driver for testing the scanner without a parserIn summary, Scanner is the only phase to see the input file, so…Why separate phases?The Front End:Scanning and ParsingHow they work together…scanner parserstring tableSourcefileIRGet next tokenerrorstokenWhat is a token? A lexeme?• English? • Programming Languages?• Lexeme •Token•Examples?lexemes tokensDesigning a ScannerStep 1: define a finite set of tokensHow? Step 2: describe the strings (lexemes) for each tokenHow? So, a simple scanner design?Then, why did they invent lex?It is not so straightforward…Even, simple examples: i vs if ; = vs ==Specifying lexemes with Regular ExpressionsLet ∑ be an alphabet.Rules for Defining regular expressions over ∑ :- ε Denotes the set containing the empty string.-For each a in ∑ , a is the reg expr denoting {a}- If r and s are reg expr’s, thenr s = set of strings consisting of strings from r followed by strings from sr | s = set of strings for either r or sr * = 0 or more strings from r (closure) (r) used to indicate precedenceExamples of Regular Expressions for LexemesWhat strings/lexemes are represented by these regular expressions?Practice with writing regular expressions- Binary numbers of at least one digit- Capitalized words-Legal identifiers that must start with a letter, can contain either upper or lower case letters, digits, or _.-white space including tabs, newlines, spacesShorthand for regular expressions?What strings are accepted here?• Numerical literals in Pascal may be generated by the following:From Specification to Scanning…From Reg Expr to NFAHow do we build an NFA for:a?Concatenation? abAlternation? a | bClosure? a*Scanning as a Finite AutomatonThe Whole Scanner Generator ProcessHowever, 3 Major Ways to Build Scanners– ad-hoc– semi-mechanical pure DFA (usually realized as nested case statements)– table-driven DFA• Ad-hoc generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically-generated scanners come very closeA Semi-mechanical DFA WayManually written scanner codeThe Scanner Generator WayMore on the Scanner Generator on Thursday…Since the scanner is the only phase to touch the input source file, what else does it need to do?Form of a Lex/Flex Spec FileDefinitions/declarations used for re clarity%%Reg exp0 {action0} // translation rules to be Reg exp1 {action1} // converted to scanner……%%Auxiliary functions to be copied directlyLex Spec Exampledelim [ \t\n]ws {delim}+letter [A-Aa-z]digit [0-9]id {letter}({letter}|{digit})*number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?%%{ws} {/*no action and no return*?}if {return(IF);}then {return(THEN);}{id} {yylval=(int) installID(); return(ID);}{number} {yylval=(int) installNum(); return(NUMBER);}%%Int installID() {/* code to put id lexeme into string table*/}Int installNum() {/* code to put number constants into constant table*/}Some Notes on Lex• yylval – global integer variable to pass additional information about the lexeme• yyleng – length of lexeme matched• yytext – points to start of lexemeLex/flex scanner generatorTurtle.lLex speclex.yy.cc/c++ compiler scannerTest.tltToken streamA Makefile for the scannereins.out: eins.tlt scanner scanner < eins.tlt > eins.outlex.yy.o: lex.yy.c token.h symtab.hgcc -c lex.yy.clex.yy.c: turtle.lflex turtle.lscanner: lex.yy.o symtab.cgcc lex.yy.o symtab.c -lfl -o scannerA typical token.h file#define SEMICOLON 274#define PLUS 275#define MINUS 276#define TIMES 277#define DIV 278#define OPEN 279#define CLOSE 280#define ASSIGN 281… /*for all tokens*/typedef union YYSTYPE{ int i; node *n; double d;}YYSTYPE;YYSTYPE yylval;A typical driver for testing the scanner without a parser%%main(){int token;while ((token = yylex()) != 0) {switch (token) {case JUMP : printf("JUMP\n"); break;/*need a case here for every token possible, printing yylval as needed for those with more than one lexeme per token*/default:printf("ILLEGAL CHARACTER\n"); break;}}}In summary, Scanner is the only phase to see the input file, so…The scanner is responsible for:– tokenizing source– removing comments– (often) dealing with pragmas (i.e., significant comments)– saving text of identifiers, numbers, strings– saving source locations (file, line, column) for error messagesWhy separate phases?Slide From Keith Cooper,


View Full Document

UD CISC 471 - The Transport Layer- Tutorial and Survey

Documents in this Course
Load more
Download The Transport Layer- Tutorial and Survey
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The Transport Layer- Tutorial and Survey and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Transport Layer- Tutorial and Survey 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?