AdministrativeSyntactic AnalysisExample of Scanner OutputInternal representationsStrategy Overview for ScannerAdministrative• (Repeat) CS staff (not us!) will process the wait list as space be-comes available.• (Repeat) If you decide to drop , p l ease inform TeleBEARS as soon aspossible to let others in.• Register with the course electronically from your account (whetheror not enrolled yet)by Monday.• CSUA (Computer Science Undergra dua tes As sociation):– Web site: http://csua. berkeley.edu– Mon., Jan 24 at 5PM, 337 Soda: First Gener a l Meeting (+ freedinner)– Tue, Jan 25 at 6PM, 306 Soda: vi and emacs Help Session– Thu, Jan 27, 6PM, Wozniak Lounge (4th floor Soda): MentoringMeeting - Pair up with an UD major.Last modified: Fri Jan 21 12:4 0:30 2005 CS164: Lecture #2 1Syntactic Analysis• First part of the “front end” of compiler.• Purpose: convert string of characters (from file, editor input buffer,etc.) into a tree or other interme di ate form.• Like all phases of compiler:– Eliminates (or moves aside) information not needed by later phases.– Detects errors and substitutes something correct.– Gathers and converts other information to convenient form.• Traditionally divided into alexical analyzer(scanner) and(context-free) par ser.• Scanner converts stream of characters into stream oftokens:moreconvenient chunks.Last modified: Fri Jan 21 12:4 0:30 2005 CS164: Lecture #2 2Example of Scanner Output• Origina l program might beif(i== j)z = 0; /* No work needed */elsez= 1;• Or as the translator sees it:\tif(i== j)\n \t\tz = 0; /* No work needed */\n\te lse\n\t\tz= 1;• Scanner is intended to convert this to something like:IF, LPAR, ID("i"), EQUAL S, ID ("j"), RPAR, ID("z"), AS SIGN,INTLIT("0"), SEMI, ELSE, ID("z"), ASSIGN, INTLIT("1"), SEMI• That is, a sequence of objects, each with asyntactic categor y(IF,etc.) of interest to the pa rser, and possi bl y alexical valueof inter-est to later things.• May also add source-location information for error messages.Last modified: Fri Jan 21 12:4 0:30 2005 CS164: Lecture #2 3Internal representations• Syntactic categories (symbolic in example) can simply be member sof a finite set of integer s or other discrete values:– (convenient for equality comparison or indexing tables)• The lexical value could be anything. In our example, it is thelexemeitself (the actual source characters ).• So one might define:class Token {enum SyntacticCategory { IF, LPAR, ID, EQUALS, RP A R , ASSIGN, ... };SyntacticCat e g o ry syntax;Object value;Location sourcePosi t i o n;...}Last modified: Fri Jan 21 12:4 0:30 2005 CS164: Lecture #2 4Strategy Overview for Scanner•Regular expr essionscan describe lexeme s.•Finite-state automata(FSAs): abstract machines that can recog-nize languages .•Determinis ti c finite-state automata(DFAs): s ubset of FSAs easilyconverted into programs.• Can convert regular expressions to FSAs.• Any FSA can be converte d to a DFA ( i f not already there), and henceto program.• The total process from regular expression to program is automat-able (in our course, flex and jflex).Last modified: Fri Jan 21 12:4 0:30 2005 CS164: Lecture #2
View Full Document