RecognizersParsers and recognizersBuilding a recognizerRecognizing simple alternatives, ICreating a decorator classPushbackTokenizer IPushbackTokenizer IISample use of PushbackTokenizerRecognizing simple alternatives, IIHelper methodsRecognizing simple alternatives, IIICategories of tokensImplementing symbolnextTokenMatches #1nextTokenMatches #2addOperator repriseSequences, ISequences, IISequences, IIISimple sequences in JavaSequences and alternativesRecursion, IRecursion, IIExtended BNF—optional partsExtended BNF—zero or moreBack to parsersConclusionsThe EndJan 14, 2019Recognizers2Parsers and recognizersGiven a grammar (say, in BNF) and a string,A recognizer will tell whether the string belongs to the language defined by the grammarA parser will try to build a tree corresponding to the string, according to the rules of the grammarInput string Recognizer result Parser result2 + 3 * 4 true2 + 3 * false Error3Building a recognizerOne way of building a recognizer from a grammar is called recursive descentRecursive descent is pretty easy to implement, once you figure out the basic ideasRecursive descent is a great way to build a “quick and dirty” recognizer or parserProduction-quality parsers use much more sophisticated and efficient techniquesIn the following slides, I’ll talk about how to do recursive descent, and give some examples in Java4Recognizing simple alternatives, IConsider the following BNF rule:<add_operator> ::= “+” | “-”That is, an add operator is a plus sign or a minus signTo recognize an add operator, we need to get the next token, and test whether it is one of these charactersIf it is a plus or a minus, we simply return trueBut what if it isn’t?We not only need to return false, but we also need to put the token back because it doesn’t belong to us, and some other grammar rule probably wants itWe need a tokenizer that can take back charactersWe will make do with putting back only one token at a time5Creating a decorator classA Decorator class is a class that extends another class and add functionalityIn this case, StringTokenizer may do what we need, except be able to take back tokensWe can decorate StringTokenizer and add the ability to take back tokensFor simplicity, we’ll allow only a single token to be returnedOur decorator class will need a little extra storage, and will need to override and extend some methods6PushbackTokenizer Ipublic class PushbackTokenizer extends StringTokenizer { String pushedToken = null; // to hold returned token public PushbackTokenizer(String s) { super(s); // superclass has no default constructor } public void pushBack(String token) { // added method pushedToken = token; }7PushbackTokenizer II public boolean hasMoreTokens() { if (pushedToken != null) return true; else return super.hasMoreTokens(); }Notice how we not only overrode this method, but if we didn’t have anything special to do, we just let the superclass’s method handle it public String nextToken() { if (pushedToken == null) return super.nextToken(); // We only return a pushedToken once String result = pushedToken; pushedToken = null; return result; }Again, we just use the superclass’s method, we don’t reinvent it8Sample use of PushbackTokenizerpublic static void main(String[ ] args) { PushbackTokenizer pb = new PushbackTokenizer("This is too cool"); String token; System.out.print(pb.nextToken( ) + " "); // “This” System.out.print(pb.nextToken( ) + " "); // “is” System.out.print (token = pb.nextToken() + " "); // “too” pb.pushBack(token); // return “too” System.out.print(pb.nextToken( ) + " "); // get “too” again System.out.print(pb.nextToken( ) + " "); // “cool”}Output: This is too too coolQuestion: Why the extra space?9Recognizing simple alternatives, IIOur rule is <add_operator> ::= “+” | “-”Our method for recognizing an <add_operator> (which we will simply call addOperator) looks like this:public boolean addOperator() { Get the next token, call it t If t is a “+”, return true If t is a “-”, return true If t is anything else, put the token back return false}10Helper methodsWe could turn the preceding pseudocode directly into Java codeBut we will be writing a lot of very similar code......and it won’t be very readable codeWe should write some auxiliary or “helper” methods to hide some of the details for usFirst helper method:private boolean symbol(String expectedSymbol)Gets the next token and tests whether it matches the expectedSymbolIf it matches, return trueIf it doesn’t match, put the symbol back and return falseWe’ll look more closely at this method in a moment11Recognizing simple alternatives, IIIOur rule is <add_operator> ::= “+” | “-”Our pseudocode is:public boolean addOperator() { Get the next token, call it t If t is a “+”, return true If t is a “-”, return true If t is anything else, put the token back return false}Thanks to our helper method, our actual Java code is:public boolean addOperator() { return symbol("+") || symbol("-");}12Categories of tokensTokens are always strings, but they come in a variety of kinds:Names: "limit", "y", "maxValue"Keywords: "if", "while", "instanceof"Numbers: "25", "3"Symbols: "+", "=", ";"Special: "\n", end_of_inputInstead of treating tokens as simple strings, it’s convenient to create a Token class that holds both its string value and a constant telling what kind of token it isclass Token { String value; int type;...and this class should define some constants to represent the various types:public static final int NAME = 1;public static final int SYMBOL = 2; etc.If you are using Java 5.0, this is what enums were invented for!13Implementing symbolsymbol gets a token, makes sure it’s a symbol, compares it to the desired value, possibly puts the token back, and returns true or falseWe will want to do something similar for numbers, names, end of lines, and maybe end of inputIt would be foolish to write and debug all of these separatelyAgain, we should use an auxiliary methodprivate boolean symbol(String expectedSymbol) { return nextTokenMatches(Token.SYMBOL, expectedSymbol);}14nextTokenMatches #1The
View Full Document