Recognizers Jan 14 2019 Parsers and recognizers Given a grammar say in BNF and a string A recognizer will tell whether the string belongs to the language defined by the grammar A parser will try to build a binary tree corresponding to the string according to the rules of the grammar Input string Recognizer result 2 3 4 true 2 3 false Parser result Error 2 Building a recognizer One way of building a recognizer from a grammar is called recursive descent Recursive descent is pretty easy to implement once you figure out the basic ideas Recursive descent is a great way to build a quick and dirty recognizer or parser Production quality parsers use much more sophisticated and efficient techniques In the following slides I ll talk about how to do recursive descent and give some examples in Java 3 Recognizing simple alternatives I Consider the following BNF rule add operator That is an add operator is a plus sign or a minus sign To recognize an add operator we need to get the next token and test whether it is one of these characters If it is a plus or a minus we simply return true But what if it isn t We not only need to return false but we also need to put the token back because it doesn t belong to us and some other grammar rule probably wants it Our tokenizer needs to be extended to take back characters We will make do with putting back only one token at a time 4 Recognizing simple alternatives II Our rule is add operator Our method for recognizing an add operator which we will simply call addOperator looks like this public boolean addOperator Get the next token call it t If t is a return true If t is a return true If t is anything else put the token back return false 5 Helper methods We could turn the preceding pseudocode directly into Java code But we will be writing a lot of very similar code and it won t be very readable code We should write some auxiliary or helper methods to hide some of the details for us First helper method private boolean symbol String expectedSymbol Gets the next token and tests whether it matches the expectedSymbol If it matches return true If it doesn t match put the symbol back and return false We ll look more closely at this method in a moment 6 Recognizing simple alternatives III Our rule is add operator Our pseudocode is public boolean addOperator Get the next token call it t If t is a return true If t is a return true If t is anything else put the token back return false Thanks to our helper method our actual Java code is public boolean addOperator return symbol symbol 7 Implementing symbol symbol gets a token makes sure it s a symbol compares it to the desired value possibly puts the token back and returns true or false We will want to do something similar for numbers names end of lines and maybe end of input It would be foolish to write and debug all of these separately Again we should use an auxiliary method private boolean symbol String expectedSymbol return nextTokenMatches Token SYMBOL expectedSymbol 8 nextTokenMatches 1 The nextTokenMatches method should Get a token Compare types and values Return true if the token is as expected Put the token back and return false if it doesn t match private boolean nextTokenMatches int type String value Token t tokenizer next if type t getType value equals t getValue return true else tokenizer pushBack t return false Wait a minute pushBack This is a method that you will have to add to your Tokenizer class Real code changes 9 nextTokenMatches 2 The previous method is fine for symbols but what if we only care about the type For example we want to get a number any number We need to compare only type not value private boolean nextTokenMatches int type String value Token t tokenizer next omit this parameter if type t getType value equals t getValue return true else tokenizer pushBack t omit this test return false The two versions of nextTokenMatches are difficult to combine and fairly small so we won t worry about the code duplication too much 10 addOperator reprise public boolean addOperator return symbol symbol private boolean symbol String expectedSymbol return nextTokenMatches Token SYMBOL expectedSymbol private boolean nextTokenMatches int type String value Token t tokenizer next if type t getType value equals t getValue return true else tokenizer pushBack t return false 11 Sequences I Suppose we want to recognize a grammar rule in which one thing follows another for example The code for this would be fairly simple empty list public boolean emptyList return symbol symbol except for one thing What happens if we get a and don t get a The above method won t work why not Only the second call to symbol failed and only one token gets pushed back 12 Sequences II The grammar rule is empty list And the token string contains 5 Solution 1 Write a pushBack method that can keep track of more than one token at a time say in a Vector Solution 2 Call it an error You might be able to get away with this depending on the grammar For example for any reasonable grammar 2 3 is clearly an error Solution 3 Change the grammar This will allow you to put the back both the and the 5 The code gets pretty messy You have to be very careful of the order in which you return tokens Tricky and may not be possible Solution 4 Combine rules See the next slide 13 Sequences III Suppose the grammar really says list number Now your pseudocode should look something like this public boolean list if first token is if second token is return true else if second token is a number if third token is return true else error else put back first token Revised grammar list rest of list rest of list number 14 Simple sequences in Java Suppose you have this rule factor expression A good way to do this is often to test whether the grammar rule is not met public boolean factor if symbol if expression error Error in parenthesized expression if symbol error Unclosed parenthetical expression return true return false 15 Sequences and alternatives Here s the real grammar rule for factor factor name number expression And here s the actual code public boolean factor if name return true if number return true if symbol if expression error Error in parenthesized expression if symbol error Unclosed parenthetical expression return true return false 16 Recursion I Here s an unfortunate but legal grammar rule Here s some code for it expression expression term public boolean expression if expression return false if addOperator return true if term error Error in expression after or
View Full Document