RecognizersParsers and recognizersBuilding a recognizerReview of BNFRecognizing simple alternatives, IRecognizing simple alternatives, IIJava codeHelper methodsRecognizing simple alternatives, IIIFirst implementation of symbolImplementing symbolnextTokenMatches #1nextTokenMatches #2addOperator repriseSequences, ISequences, IIImplementing a fancier pushBack()Sequences, IIISequences, IVSimple sequences in Javafalse vs. errorSequences and alternativesRecursion, IRecursion, IIExtended BNF—optional partsExtended BNF—zero or moreBack to parsersConclusionsThe EndJan 15, 2019Recognizers2Parsers and recognizersGiven a grammar (say, in BNF) and a string,A recognizer will tell whether the string belongs to the language defined by the grammarA parser will try to build a tree corresponding to the string, according to the rules of the grammarInput string Recognizer result Parser result2 + 3 * 4 true2 + 3 * false Error3Building a recognizerOne way of building a recognizer from a grammar is called recursive descentRecursive descent is pretty easy to implement, once you figure out the basic ideasRecursive descent is a great way to build a “quick and dirty” recognizer or parserProduction-quality parsers use much more sophisticated and efficient techniquesIn the following slides, I’ll talk about how to do recursive descent, and give some examples in Java4Review of BNF“Plain” BNF< > indicate a nonterminal that needs to be further expanded, for example, <variable>Symbols not enclosed in < > are terminals; they represent themselves, for example, if, while, (The symbol ::= means is defined asThe symbol | means or; it separates alternatives, for example, <addop> ::= + | -Extended BNF[ ] enclose an optional part of the ruleExample:<if statement> ::= if ( <condition> ) <statement> [ else <statement> ]{ } mean the enclosed can be repeated zero or more timesExample:<parameter list> ::= ( ) | ( { <parameter> , } <parameter> )5Recognizing simple alternatives, IConsider the following BNF rule:<add_operator> ::= + | -That is, an add operator is a plus sign or a minus signTo recognize an add operator, we need to get the next token, and test whether it is one of these charactersIf it is a plus or a minus, we simply return trueBut what if it isn’t?We not only need to return false, but we also need to put the token back because it doesn’t belong to us, and some other grammar rule probably wants itOur tokenizer needs to be able to take back tokensUsually, it’s enough to be able to put just one token backMore complex grammars may require the ability to put back several tokens6Recognizing simple alternatives, IIOur rule is <add_operator> ::= + | -Our method for recognizing an <add_operator> (which we will simply call addOperator) looks like this:public boolean addOperator() { Get the next token, call it t If t is a “+”, return true If t is a “-”, return true If t is anything else, put the token back return false}7Java codepublic boolean addOperator() { Token t = myTokenizer.next(); if (t.type == Type.SYMBOL && t.value.equals("+")) { return true; } if (t.type == Type.SYMBOL && t.value.equals("-")) { return true; } myTokenizer.pushBack(1); return false;}While this code isn’t particularly long or hard to read, we are going to have a lot of very similar methods8Helper methodsRemember the DRY principle: Don’t Repeat YourselfIf we turn each BNF production directly into Java, we will be writing a lot of very similar codeWe should write some auxiliary or “helper” methods to hide some of the details for usFirst helper method:private boolean symbol(String expectedSymbol)Gets the next token and tests whether it matches the expectedSymbolIf it matches, returns trueIf it doesn’t match, puts the symbol back and returns falseWe’ll look more closely at this method in a moment9Recognizing simple alternatives, IIIOur rule is <add_operator> ::= + | -Our pseudocode is:public boolean addOperator() { Get the next token, call it t If t is a “+”, return true If t is a “-”, return true If t is anything else, put the token back return false}Thanks to our helper method, our actual Java code is:public boolean addOperator() { return symbol("+") || symbol("-");}10First implementation of symbolHere’s what symbol does:Gets a tokenMakes sure that the token is a symbolCompares the symbol to the desired symbol (by value)If all the above is satisfied, returns true Else (if not satisfied) puts the token back, and returns falseprivate boolean symbol(String value) { Token t = tokenizer.next(); if (t.type == Type.SYMBOL && value.equals(t.value())) { return true; } else { tokenizer.pushBack(1); return false; }}11Implementing symbolWe can implement methods name, number, and maybe eol the same wayAll this code will look pretty much alikeThe main difference is in checking for the typeThe DRY principle suggests we should use a helper method for symbolprivate boolean symbol(String expectedValue) { return nextTokenMatches(Type.SYMBOL, expectedValue);}12nextTokenMatches #1The nextTokenMatches method should:Get a tokenCompare types and valuesReturn true if the token is as expectedPut the token back and return false if it doesn’t matchprivate boolean nextTokenMatches(Type type, String value) { Token t = tokenizer.next(); if (type == t.type() && value.equals(t.value())) { return true; } else { tokenizer.pushBack(1); return false; }}13nextTokenMatches #2The previous method is fine for symbols, but what if we only care about the type?For example, we want to get a number—any numberWe need to compare only type, not valueprivate boolean nextTokenMatches(Type type, String value) { Token t = tokenizer.next(); omit this parameter if (type == t.type() && value.equals(t.getValue())) return true; else tokenizer.pushBack(1); omit this test return false;}It’s easier to overload nextTokenMatches than to combine the two versions, and both versions are fairly short, so we are probably better off with the code duplication14addOperator reprisepublic boolean addOperator() { return symbol("+") || symbol("-");}private boolean symbol(String
View Full Document