DOC PREVIEW
MIT 6 863J - Laboratory 4: Parsing Parsing

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Massachvsetts Institvte of TechnologyDepartment of Electrical Engineering and Computer Science6.863J/9.611J, Natural Language ProcessingLaboratory 4: Parsing ParsingHanded out: March 28, 2011 Due: April 10, 2011Goals of the LaboratoryLaboratory 4 is an introduction to context-free parsing: learning how ambiguity makes natural languageparsing difficult, and learning a bit about two basic algorithmic approaches to solving this problem, onestack-based, and t he other using a data structure called a “chart” to “memoize” and re-use alread y -b ui ltstructure. Re-using already-built st r uc t ur e is the key t o efficient context-free parsing. We als o introduceyou to prob ab il i st i c context-free parsing: how this s peeds up sorting through the tremendous number ofparses that can arise even with simple sentences when using more than a “toy” grammar; how to infer rulesfrom already-parsed example sentences; and some of the strengths and weaknesses of this simple learningapproach. In this lab, you’ll have a chance to work with an actual, large-scale corpus, the Penn Treebank,which will give you familiar i ty with the d at a conventions commonly referenced in computational linguisticsand NLP pape rs .After completing this laboratory, you sh oul d be able to understand and use both the classical and themodern probabilistic “tools of t h e trade” for context-free parsing, as well as some of their limitations.What you must turn in. As before, please email your write-up as a pdf file t o [email protected] of the questions that f ol low require only short answers. You may use the Latex write-u p templateprovided here:http://web.mit.edu/6.863/spring2011/writeups/lab4/As usual, you may collaborate with whomever you wish; just please n ot e the names of your coll aborators inyour report. Your report should be recognizably your own work.Initial Preparation : First, be sure you have Python with NLTK and Numpy installed as for the previousLaboratory. Then, download the software for this lab from:http://web.mit.edu/6.863/spring2011/code/lab4.zipSome of the tools you use in this l ab will need to be run on Athena.1 Conte xt -f ree parsing and Stack-based parsingBackground reading: Read chapter 9 of the NLTK book that we have provided:http://web.mit.edu/~6.863/www/spring2011/labs/nltk-ch9.pdfWarning: please make sure you read the pdf of the NLTK chapter that we provide in the spe-cific url specified above. Any newer online versions on the nltk.org site itself do not containthe material required for this Laboratory.Additional reference material on context-free parsing is available in the Jurafsky and Mart i n textbook,chapter 13. If you do not have the textbook , you can read the draft here (there are some slight differences;e.g., this chapter is numbered chapter 12 in its draft form):http://www.mit.edu/~6.863/spring2011/jmnew/12.pdf1The following python commands will bring up a GUI with a simple, interactive shift-reduce parser. (Youmay see a warning about pylab after the import line, but don’t worry about it si n ce it will not interfere withyour ability to do this lab.)>>> from nltk.app import srparser_app>>> srparser_app.app()You should get a window that looks like this:The pane on the right contains the stack and the remai ni n g input. The pane on the left contains theentire grammar, as a list of pro du ct i on rules.You can run the parser by clicking the button labeled “Step” repeatedl y. Each step will animate onestep of the parser, either shifting a word onto the stack or reducing two subtrees on the stack into a newsubtree. Which action it has just carried out is displayed at t he lower left, in the “Last Operation” panel.Try stepping through the parser, using the provided example sentence, “my dog saw a man in the parkwith a statue”. Click on “Step” until the parser gets stuck in a situation where it cannot shift, because theinput has no tokens left, and it cannot reduce, because there is no prod uc t ion that involves the items onthe stack. As we d em ons t rat e d in class, there is not really a way to get any thin g parsed if you used a pure“always shi ft before reduce” policy in terms of this demo software. You get stuck immediately. For eachsymbol shifted you need to immediately reduce it to get the POS tag, otherwise t he parse will guarantee t ofail. In effect then, we should state that a “shift” operation really consists of two operations: a shift andan immediate reduce to get the part of speech tag. Even so, it is still easy to run into some state wherethe parser can no longer proceed, because of some incorrect choice between a shift and a reduce action.This is called a shift-reduce conflict. To get around this parsing misstep, you can now back up the parser byrepeatedly hitting the “Undo” button.Try this for yourself: making sure that you always follow a shift by a reduce in order to get a part ofspeech tag, see how to force the parser int o some corner where neither a shift nor a reduce action will lead tofurther processing and production of a parse tree that spans th e entire sentence. Then, back up to the pointat which the incorrect choice was made, and click “Shift” or “Reduce” yourself to make the correct choice.Problem 1.1. By making m anual decisions to shi f t or reduce when necessary, obtain a parse tree for thewhole sentence. Include a screenshot of this parse tree in your lab report.We therefore see that context-free parsers in general must operate non-det e rm in i st i cal l y – that is, theymust guess. This is a primary source of computational difficulty in natural language p rocessing. Sin ce nophysical machine can operate non-deterministically, we must somehow either dodge this problem (use an2oracle, order actions probabilis t i cal l y, figure out how people operate...), or else simulate a nondeterministicmachine.For the moment, we will pursue another plan: developing a way to “always make the right choice” – thatis, a pars i ng strategy that orders choices systematically when more than one action can apply at a givenstep. In a shift-reduce parser, there are only two possible such conflicts:• A shift-reduce conflict, in which it is ambiguous whether the parser should shift a new word onto thestack or reduce two items on the stack• A reduce-reduce conflic t , in which there is more than one way to r e du ce the items on the stack into anew nonterminalA strategy is a sy st e mat i c


View Full Document

MIT 6 863J - Laboratory 4: Parsing Parsing

Documents in this Course
N-grams

N-grams

42 pages

Semantics

Semantics

75 pages

Semantics

Semantics

82 pages

Semantics

Semantics

64 pages

Load more
Download Laboratory 4: Parsing Parsing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Laboratory 4: Parsing Parsing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Laboratory 4: Parsing Parsing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?