Massachvsetts Institvte of Technology
Department of Electrical Engineering and Computer Science
6.863J/9.611J, Natural Language Processing
Laboratory 5&6: Advanced Parsing – Features and Lexical Semantics
Handed out: April 11, 2011    Due: April 27, 2011

Goals of the Laboratory. Laboratory 4 and Competitive Grammar Writing introduced you to probabilistic context-free parsing. In this lab you will explore this method of parsing more fully.

In Part I, you will explore:

1. How to use features to simplify grammar construction.
2. How to tie the pykimmo system into a set of context-free grammar rules.
3. How to accommodate the phenomenon of movement in language.

We want you to understand the trade-off between the simplicity of the grammar and its precision in terms of covering more and more detailed grammatical phenomena.

In Part II, you will learn about the following:

4. What are the strengths and weaknesses of current state-of-the-art statistical natural language parsers? If they were perfect, we'd be done, at least for the syntax part of natural language processing. But the parsers are not perfect. We will look at ambiguity (again). One of the things modern statistical parsers do better is adding information about particular words in order to figure out how to prune parsing possibilities. A classic example is "I saw the guy on the hill with the telescope": does "with the telescope" associate more strongly with "the hill" or with "saw"?

Part III will give a deeper understanding of how statistical parsers work and how they interact with lexical frequencies and with syntactic and semantic regularities. In particular, you will be assigned a 'verb of your own' and then asked to:

5. Investigate the connection between lexical (word-level) semantics and parsing, using the Penn Treebank (PTB) as a concrete test bed.
6. Explore how a state-of-the-art probabilistic parser will handle these issues.

What you must turn in. As usual, you will need to turn in a write-up that covers Parts I–III.
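The attachment ambiguity in point 4 above can be made concrete with a small, self-contained sketch. This is plain Python, independent of the lab's NLTK setup; the nested tuples and labels are illustrative stand-ins for parse trees, not the parser's actual output format:

```python
# The classic PP-attachment ambiguity: "I saw the guy on the hill with the telescope".
# Nested tuples stand in for constituents: (label, *children).

# Reading 1: "with the telescope" attaches to the VP (the seeing was done with it).
vp_attach = ("S", "I",
             ("VP",
              ("VP", "saw", ("NP", "the guy", ("PP", "on the hill"))),
              ("PP", "with the telescope")))

# Reading 2: "with the telescope" attaches inside the "on the hill" PP
# (the hill is the one with the telescope).
np_attach = ("S", "I",
             ("VP", "saw",
              ("NP", "the guy",
               ("PP", "on the hill", ("PP", "with the telescope")))))

def attachment_site(tree, pp_text="with the telescope"):
    """Return the label of the node that immediately dominates the target PP."""
    if isinstance(tree, tuple):
        label = tree[0]
        for child in tree[1:]:
            if isinstance(child, tuple) and child[0] == "PP" and child[1] == pp_text:
                return label
            found = attachment_site(child, pp_text)
            if found:
                return found
    return None

print(attachment_site(vp_attach))  # VP
print(attachment_site(np_attach))  # PP
```

A statistical parser must pick between exactly these kinds of competing bracketings, and lexical information (how often "see ... with telescope" co-occurs versus "hill with telescope") is what tips the choice.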
This includes a writeup of Parts I and II and a report for the verb you have been assigned for Part III. Please email your write-ups as PDF files to [email protected]. In your email, include 6.863 Lab 5&6 in your subject. As usual, you may collaborate with whomever you wish; just note the names of your collaborators in your report. Your report should be recognizably your own work. You may use the write-up templates provided here:

http://web.mit.edu/6.863/spring2011/writeups/lab5_6/

Part I
Advanced Parsing with Features

Initial Preparation:

• Background reading:
Read (or re-read) Chapter 15 of the textbook, or our semi-Google Books version of Chapter 15 of the 2nd edition JM text, here:
http://www.mit.edu/~6.863/spring2011/jmnew/ch15.pdf
Read the (old, not online) version of the NLTK description of feature-based parsing, here:
http://www.mit.edu/~6.863/spring2011/labs/featgram.pdf

• Software for feature-based parsing:
NLTK: In this laboratory, you will again be using an older nltk package, namely 0.9.8. This is the version running on Athena, so if you run and test your code there, your work is done. Otherwise, you can download nltk 0.9.8 from:
http://web.mit.edu/6.863/spring2011/code/nltk-0.9.8.zip
The nltk Earley parser feature package: you can either ssh in to Athena and run the text-based nltk feature-based Earley parser as described below, or else download, uncompress, and untar the files in:
http://web.mit.edu/6.863/spring2011/code/lab5_6/parse.zip
which will give you a directory parse. You can then cd to this new directory on your machine and run the nltk feature-based Earley parser as described below.

• Running the Earley feature-based-pykimmo parser: We now assume that you have either downloaded the feature-parser archive and unpacked it into the directory parse, or are connected to Athena. Let's first check that you can run the feature-based parser. If you are on Athena, add 6.863 as usual, then cd to the directory /mit/6.863/spring2011/code/lab5_6/parse/.
If you are working on your own machine, cd to the directory parse. To test out the parser, you can load the grammar system, set the tracing to minimal (i.e., 0; we explain tracing levels below), and parse each sentence in the file test-sentences.txt:

    % python
    Python 2.5.4 (r254:67916, Mar  9 2009, 00:23:22)
    [GCC 4.3.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from featurechart import *
    >>> g = load_earley('gazdar6.cfg', trace=0)
    >>> g.batch_test('test-sentences.txt')
    Sentence: Which guy does Mary see
    1 parses.
    ([INIT]:(Start:(Q:(NP[agr=[person=3, -plural], +wh]:(DET[+wh]: 'Which')
    <. . . rest of parse tree here>
    Sentence: I will eat a raw eggplant
    2 parses.
    <. . . 2 parse trees>

1 From a feature-less to a feature-based grammar

For the first part of this lab, we want you to take an existing feature-free grammar that is in the parse directory, starter.cfg, and convert it to a feature-based grammar that will parse the same sentences as before and a bit more, using kimmo-style rules and a lexicon. You can take a look at this starter grammar in any text editor, which is also how you can edit this file to make your new grammar. Note the following important points about this grammar:

• It does not have separate pykimmo lex and yaml spelling-change rule files. Thus, all lexical items are introduced in the grammar itself via single context-free productions such as N -> 'detectives'. Note that case matters for the lexical rules: 'Poirot' is different from 'poirot' (this is easy to forget when testing sentences).

• Verbs are classified as one of only 5 types, or subcategories: V0, V1, V2, V3, and V4.
V0 takes zero arguments (an intransitive verb); V1 takes one argument (a normal transitive verb); V2 takes two arguments (the first an NP, the second a PP headed by "to", as in send the solutions to the police); V3 takes a full sentence as an argument, a CP (complementizer phrase), as in Poirot thought that the detectives solved the case, where that the detectives solved the case is a full proposition or sentence form; and V4 takes an adjective phrase, as in the police were incompetent, where "incompetent" is an adjectival form. Obviously we have omitted many other, more refined subcategories, e.g., send the police the solutions.

• It accounts for just a few examples of 'filler-gap' relations by means of an expanded set of nonterminal names, as discussed in class and in the nltk documentation, via rules such as S_WhNPGap -> NP VP_WhNPGap that
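The V0–V4 classification above amounts to a subcategorization frame per verb: a verb licenses a sentence only if the categories following it match its frame. A toy checker sketches the idea in plain Python; the mini-lexicon and the frame labels (NP, PP-to, CP, AP) are illustrative inventions, not the contents of starter.cfg:

```python
# Hypothetical mini-lexicon mapping verbs to the argument frames described above.
SUBCAT = {
    "sleep": [],               # V0: intransitive, zero arguments
    "see":   ["NP"],           # V1: normal transitive
    "send":  ["NP", "PP-to"],  # V2: NP plus a PP headed by "to"
    "think": ["CP"],           # V3: full-sentence (complementizer phrase) argument
    "be":    ["AP"],           # V4: adjective-phrase argument
}

def licenses(verb, args):
    """True if the verb's frame exactly matches the argument categories found after it."""
    return SUBCAT.get(verb) == args

print(licenses("send", ["NP", "PP-to"]))  # True:  "send the solutions to the police"
print(licenses("send", ["NP"]))           # False: the to-PP is missing
```

In a feature-based grammar, the same effect is achieved by putting the frame into a feature on the verb (e.g. a subcat feature) so that one set of VP rules, rather than five verb categories, does the checking.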