U of I CS 421 - Lecture Notes - D1059564

Home> Schools> University of Illinois> Computer Science (CS) > CS 421> Lecture Notes

DOC PREVIEW

U of I CS 421 - Lecture Notes

School name University of Illinois

Course Cs 421- Natural Language Processing

Pages 4

This preview shows page 1 out of 4 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 4 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CS 421 – Spring 2007Lecture Notes Set 19:BNF GrammarsElsa L. Gunter1Transcribed by: Pooja MathurCorresponding to Slides: 16 - ocamllex (slides 27-end) 17 - bnf-grammars (slides 1-27)Made available: March 5, 2007Revision 1.01 Change Log1.0 Initial Release.2 Comments2.1 In general (slides 27 - 28)To deal with comments, we are going to add another regular expression to our parser. That is the open comment andclose comment expression. Now, it may seem very simple to use these regular expressions. You see the open comment,and you wait until you see a close comment. First, note that when you see the first open comment, you want to startparsing differently. Other regular expressions that aren’t the open/close regular expressions, don’t mean anything tothis parser, which means we have to handle them differently.To deal with this, we make two lexers. In our main lexer, when we see an open comment, we call our other parser.This other parser, basically sits and waits until the close comment regular expression is seen. Anything that is not thatexression, simply gets eaten up by the parser. When the close comment regular expression is finally seen, then theaction associated with that would be to go back to the first parser with whatever is left of lexbuf.OCaml doesn’t do this, but if we wanted to add into a language, the single line comment, the way to do that wouldbe, to go to your comment parser, but instead of simply eating characters until the close comment expression was seen,the parser would be eating characters until the newline character expression was seen. When that is seen, the actionwould be to go back to the main parser and continue parsing like normal.Even so, we still do have OCaml’s comment structure quite exactly yet. This is the C++ way of handling things.There really isn’t much in there for nested comments. OCaml is different though, OCaml wants to handle nestedcomments, but it can’t do it like the way we have been. We need to keep track of how many open/close commentexpressions we’ve seen.2.2 Nested comments (slides 29 - 31)Now, we need to tell our comment parser how many open comments expressions we have seen without close commentexpressions. If we do that, we can being to handle nested comments. So, we have the argument depth that keepstrack of how many comment expressions are waiting to be closed. The simple easy case is that we have only seen onecomment expression, and so when we call the comment parser, we are giving it the additional arument of 1. Then,inside the comment parser, when another open comment expression is seen, we call comment again, with 1 addedto the depth argument. If we see a close comment expression, we call comment with 1 subtracted from its depthargument, however, if the depth is 1 and you see a close comment expression, then that is when it is time to back to1c 2007, Share and Enjoy1the main parser. And like normal, for a comment parser, anything that is not an open/close comment expression getsthrown aside.The idea is that the depth argument is like a counter. Every time you see an open comment expression, that meansyou have at least 1 close comement to see. Then, if inside the first open comment expression, you see another opencomment expression, you add another to your counter. You continue this trend until your counter is reduced to 1,where the next close comment expression will take lexbuf back to the main parser.This is a very typical way of using arguments with parsers.3 Grammars (slide set 2: slides 2 - 3)We talked about this before. Grammars are formalized descriptions of which character sets of accepted by a certainlanguage. They are basically the regular expressions we have been discussing for a few languages now. We discussedthe way of using ASCII characters to describe a language. We have talked about using finite state automata as well.3.1 Sample Grammar (slide 4)Here, our language is describing all parenthesized sums of 0’s and 1’s. So our language specifically includes the leftparenthesis, right parenthesis, the character, 0, the character 1, and finally the addition character. So we have, ’(’, ’)’,’1’, ’0’, and ’+’. These five characters make up our language. We also have the special symbol, <Sum>. Even thoughit isn’t exactly a single character, look at it pretending it is one. So, when we say that <Sum> ::= 0, we are reallysaying that <Sum> can have the value of 0. <Sum> ::= 1 means that <Sum> can have the value 1. With the additionoperator, we can say that <Sum> can have a value that is the addition of two <Sum>’s. And finally, we can say thatyou can put parentheses around it and it will still make sense.3.2 Syntax for BNF Grammars (slides 5 - 7)We will start with the characters, which on the corresponding slides are denoted as the early lower case letters, inthis case, a, b, c,... These are your terminals, that is, the characters we are actually going to use to build our strings.Seperately, you give a completely disjoint set of characters for nonterminals. These, will will denote as the later uppercase letter, X, Y , Z, ... Non-terminals, are the variables we will use to build up the strings in our language. From thenon-terminal set of characters, we need to pull one special symbol, that we will call our start symbol. It will tell ushow to start building strings for our language.Now that we have to pieces of our language, we will start to build rules, which can also be called productions. Weuse the ::= or an arrow to denote that a production is being made. That is, we have some non-terminal is the productionof some terminal. The left side of the production symbol is always the non-terminal. So if we had X ::= a, we aresaying that X is the production of a. Note that terminals and non-terminals can appear on the right side.So with our example from before, our terminals were, 0, 1, +, (, and ). There, technically no meaning to theterminals. They are simply symbols and should be treated as such. Our non-terminal was the <Sum>. For our startsymbol, since we only have one non-terminal, it is trivial to choose the <Sum>. (In BNF Grammars, it is traditionalto put angle brackets around non-terminals, so that is what we have done.) So, the basis of our language may looklike:<Sum> ::= 0<Sum> ::= 1<Sum> ::= <Sum> + <Sum><Sum> ::= ( <Sum> )But this can also be written as the following, applying the use of the verticle bar as the or symbol:<Sum> ::= 0 | 1 | <Sum> + <Sum> | ( <Sum> )3.3 BNF

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1 out of 4 pages.

U of I CS 421 - Lecture Notes

Sign up for free to view:

Please select your school