UVA CS 671 - Lexical Analysis & Syntactic Analysis

Unformatted text preview:

Lexical Analysis & Syntactic AnalysisLast TimeFinite AutomataSlide 4ImplementationRegExp  Finite AutomatonDeterministic vs. NondeterministicDFA vs. NFARegExp  NFAInductive ConstructionExecuting NFAExampleNFADFA ConversionDFA MinimizationAutomatic Scanner ConstructionBuilding a LexerLexical Analysis SummaryWhere Are We?Phases of a CompilerWhat is Parsing?Tree RepresentationsOverview of Syntactic AnalysisWhat Parsing Doesn’t DoSpecifying Language SyntaxNeed a More Powerful RepresentationContext-Free GrammarsContext-Free Grammar TerminologySum GrammarDevelop a Context-Free Grammar for … 1. anbncn 2. ambncm+nConstructing a DerivationDerivation ExampleDerivation  Parse TreeParse Tree vs. ASTDerivation OrderSlide 35AssociativityAnother ExampleLeft vs. Right derivationsRight-Most DerivationLeft-Most DerivationImpact of AmbiguityDerivations and PrecedenceEliminating AmbiguityAdding precedenceExpression exampleAmbiguous grammarsIf-then-elseIf-then-else AmbiguityRemoving AmbiguityLimits of CFGsBig PictureRoadmapLexical Analysis & Syntactic AnalysisCS 671January 24, 20082CS 671 – Spring 2008Last TimeLexical analyzerSyntax analyzerSemantic analyzerIntermediate code generatorCode optimizerCode generatorSource programTarget programLexical Analyzer•Group sequence of characters into lexemes – smallest meaningful entity in a language (keywords, identifiers, constants)•Characters read from a file are buffered – helps decrease latency due to i/o. Lexical analyzer manages the buffer•Makes use of the theory of regular languages and finite state machines•Lex and Flex are tools that construct lexical analyzers from regular expression specifications3CS 671 – Spring 2008Finite AutomataTakes an input string and determines whether it’s a valid sentence of a language–A finite automaton has a finite set of states–Edges lead from one state to another–Edges are labeled with a symbol–One state is the start state–One or more states are the final state0 1 2i fIF0 1a-za-z0-9ID26 edges4CS 671 – Spring 2008Finite AutomataAutomaton (DFA) can be represented as:•A transition table \“ [^”]* \”•A graph” non-”0120 1 2””Non-”5CS 671 – Spring 2008Implementationboolean accept_state[NSTATES] = { … };int trans_table[NSTATES][NCHARS] = { … };int state = 0;while (state != ERROR_STATE) {c = input.read();if (c < 0) break;state = table[state][c];}return accept_state[state];0 1 2““Non-“6CS 671 – Spring 2008RegExp  Finite Automaton• Can we build a finite automaton for every regular expression?• Strategy: consider every possible kind of regular expression (define by induction)0 1aaR1R2R1|R2?7CS 671 – Spring 2008Deterministic vs. NondeterministicDeterministic finite automata (DFA) – No two edges from the same state are labeled with the same symbolNondeterministic finite automata (NFA) – may have arrows labeled with  (which does not consume input)baba a8CS 671 – Spring 2008DFA vs. NFA• DFA: action of automaton on each input symbol is fully determined–obvious table-driven implementation• NFA:–automaton may have choice on each step–automaton accepts a string if there is any way to make choices to arrive at accepting state / every path from start state to an accept state is a string accepted by automaton–not obvious how to implement efficiently!9CS 671 – Spring 2008RegExp  NFA-? [0-9]+ (-|) [0-9][0-9]*0,1,2…-0,1,2…10CS 671 – Spring 2008Inductive ConstructionaaR1R2R1|R2R*R1R2R1R2R11CS 671 – Spring 2008Executing NFA• Problem: how to execute NFA efficiently?“strings accepted are those for which there is some corresponding path from start state to an accept state”• Conclusion: search all paths in graph consistent with the string• Idea: search paths in parallel–Keep track of subset of NFA states that search could be in after seeing string prefix–“Multiple fingers” pointing to graph12CS 671 – Spring 2008Example• Input string: -23• NFA States____________________• Terminology: -closure - set of all reachable states without consuming any input -closure of 0 is {0,1}0,1,2…321-00,1,2…13CS 671 – Spring 2008NFADFA Conversion• Can convert NFA directly to DFA by same approach• Create one DFA for each distinct subset of NFA states that could arise• States: {0,1}, {1}, {2, 3}0,1,2…321-00,1,2…{0,1} {1}{2,3}-0,1,2… 0,1,2…0,1,2…14CS 671 – Spring 2008DFA Minimization• DFA construction can produce large DFA with many states• Lexer generators perform additional phase of DFA minimization to reduce to minimum possible size1110 00What does this DFA do?Can it be simplified?15CS 671 – Spring 2008Automatic Scanner ConstructionTo convert a specification into code1. Write down the RE for the input language2. Build a big NFA3. Build the DFA that simulates the NFA4. Systematically shrink the DFA5. Turn it into codeScanner generators•Lex and flex work along these lines•Algorithms are well known and understood•Key issue is interface to the parser16CS 671 – Spring 2008Building a LexerSpecification“if”“while”[a-zA-Z][a-zA-Z0-9]*[0-9][0-9]*()NFA for each REGiant NFAGiant DFA Table-driven code17CS 671 – Spring 2008Lexical Analysis Summary• Regular expressions–efficient way to represent languages–used by lexer generators• Finite automata–describe the actual implementation of a lexer• Process–Regular expressions (+priority) converted to NFA–NFA converted to DFA18CS 671 – Spring 2008Where Are We?Source code: if (b==0) a = “Hi”;Token Stream: if (b == 0) a = “Hi”;Abstract Syntax Tree (AST)Lexical AnalysisSyntactic AnalysisSemantic Analysisif==b0=a“Hi”;Do tokens conform to the language syntax?19CS 671 – Spring 2008Phases of a CompilerLexical analyzerSyntax analyzerSemantic analyzerIntermediate code generatorCode optimizerCode generatorSource programTarget programParser• Convert a linear structure – sequence of tokens – to a hierarchical tree-like structure – an AST• The parser imposes the syntax rules of the language• Work should be linear in the size of the input (else unusable)  type consistency cannot be checked in this phase•Deterministic context free languages and pushdown automata for the basis• Bison and yacc allow a user to construct parsers from CFG specifications20CS 671 – Spring 2008What is Parsing?Parsing: Recognizing whether a sentence (or


View Full Document

UVA CS 671 - Lexical Analysis & Syntactic Analysis

Download Lexical Analysis & Syntactic Analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lexical Analysis & Syntactic Analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lexical Analysis & Syntactic Analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?