Massachvsetts Institvte of Technology
Department of Electrical Engineering and Computer Science
6.863J/9.611J, Natural Language Processing

Laboratory 5&6: Advanced Parsing – Features and Lexical Semantics

Handed out: April 11, 2011    Due: April 27, 2011

Goals of the Laboratory. Laboratory 4 and Competitive Grammar Writing introduced you to probabilistic context-free parsing. In this Lab you will explore this method of parsing more fully.

In Part I, you will explore:
1. How to use features to simplify grammar construction.
2. How to tie the pykimmo system into a set of context-free grammar rules.
3. How to accommodate the phenomenon of movement in language.
We will want you to understand the trade-off between the simplicity of the grammar and its precision in terms of covering more and more detailed grammatical phenomena.

In Part II, you will learn about the following:
4. What are the strengths and weaknesses of current state-of-the-art statistical natural language parsers? If they were perfect, we’d be done, at least for the syntax part of natural language processing. But the parsers are not perfect. We will look at ambiguity (again). One of the things modern statistical parsers do better is adding information about particular words in order to figure out how to prune parsing possibilities. Classic examples would be those like I saw the guy on the hill with the telescope. Does with the telescope associate more strongly with the hill or saw?

Part III will give deeper understanding of how statistical parsers work and how they interact with lexical frequencies, and syntactic and semantic regularities. In particular, you will be assigned a ‘verb of your own’ and then asked to:
5. Investigate the connection between lexical (word-level) semantics and parsing, using the Penn Tree Bank (PTB) as a concrete test bed.
6. Explore how a state-of-the-art probabilistic parser will handle these issues.

What you must turn in. As usual, you will need to turn in a write-up that covers Parts I – III.
This includes a write-up of Parts I and II and a report for the verb you have been assigned for Part III. Please email your write-ups as pdf files to [email protected]. In your email, include 6.863 Lab 5&6 in your subject. As usual, you may collaborate with whomever you wish; just note the names of your collaborators in your report. Your report should be recognizably your own work. You may use the write-up templates provided here:
http://web.mit.edu/6.863/spring2011/writeups/lab5_6/

Part I
Advanced Parsing with Features

Initial Preparation:
• Background reading:
Read (or re-read) chapter 15 of the textbook or our semi-Google books version of Chapter 15 of the 2nd edition JM text, here:
http://www.mit.edu/~6.863/spring2011/jmnew/ch15.pdf
Read the (old, not online) version of the NLTK description of feature-based parsing, here:
http://www.mit.edu/~6.863/spring2011/labs/featgram.pdf
• Software for feature-based parsing:
NLTK: In this laboratory, you will again be using an older nltk package, namely 0.9.8. This is the version running on Athena, so if you run and test your code there, no installation is needed. Otherwise, you can download nltk 0.9.8 from http://web.mit.edu/6.863/spring2011/code/nltk-0.9.8.zip
The nltk Earley parser feature package: you can either ssh in to Athena and run the text-based nltk feature-based Earley parser as described below, or else download and unpack the files in:
http://web.mit.edu/6.863/spring2011/code/lab5_6/parse.zip
which will give you a directory parse. You can then cd to this new directory on your machine, and run the nltk feature-based Earley parser as described below.
• Running the Earley Feature-based-pykimmo Parser: We now assume that you have either downloaded the feature-parser archive and unpacked it into the directory parse, or are connected to Athena. Let’s first check that you can run the feature-based parser. If you are on Athena, add 6.863 as usual, then cd to the directory /mit/6.863/spring2011/code/lab5_6/parse/.
If you are working on your own machine, cd to the directory parse. To test out the parser, you can load the grammar system, set the tracing to minimal (i.e., 0; we explain tracing levels below), and parse each sentence in the file test-sentences.txt:

    % python
    Python 2.5.4 (r254:67916, Mar 9 2009, 00:23:22)
    [GCC 4.3.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from featurechart import *
    >>> g = load_earley('gazdar6.cfg', trace=0)
    >>> g.batch_test('test-sentences.txt')
    Sentence: Which guy does Mary see
    1 parses.
    ([INIT]:
      (Start:
        (Q:
          (NP[agr=[person=3, -plural], +wh]:
            (DET[+wh]: 'Which')
    <. . . rest of parse tree here>
    Sentence: I will eat a raw eggplant
    2 parses.
    <. . . 2 parse trees >

1 From a feature-less to a feature-based grammar

For the first part of this lab, we want you to take an existing feature-free grammar that is in the parse directory, starter.cfg, and convert it to a feature-based grammar that will parse the same sentences as before and a bit more, using kimmo-style rules and a lexicon. You can take a look at this starter grammar in any text editor, which is also how you can edit this file to make your new grammar. Note the following important points about this grammar:
• It does not have separate pykimmo lex and yaml spelling change rule files. Thus, all lexical items are introduced in the grammar itself via single context-free productions such as N -> 'detectives'. Note that case matters for the lexical rules: ‘Poirot’ is different from ‘poirot’ (this is easy to forget when testing sentences).
• Verbs are classified as one of only 5 types, or subcategories: V0, V1, V2, V3, and V4.
V0 takes zero arguments (an intransitive verb); V1 takes one argument (a normal transitive verb); V2 takes two arguments (the first an NP, the second a PP headed by “to”, as in send the solutions to the police); V3 takes a full sentence as an argument, a CP (complementizer phrase), as in Poirot thought that the detectives solved the case, where that the detectives solved the case is a full proposition or sentence form; and V4 takes an adjective phrase, as in the police were incompetent, where “incompetent” is an adjectival form. Obviously we have omitted many other, more refined subcategories, e.g., send the police the solutions.
• It accounts for just a few examples of ‘filler-gap’ relations by means of an expanded set of nonterminal names as discussed in class and in the nltk documentation, via rules such as S_WhNPGap -> NP VP_WhNPGap that
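
As a rough illustration of what moving from subcategory names to features can look like, the sketch below maps the five verb types V0 to V4 described above onto a single category V carrying a subcat feature, and shows a very simple unification check over flat feature bundles. This is plain Python, not the nltk 0.9.8 featurechart API and not the contents of starter.cfg: the names SUBCAT_FRAMES, as_feature_category, and unify, and the labels 'PP[to]' and 'AP', are hypothetical choices made only for this example.

```python
# Hypothetical sketch: none of these names come from starter.cfg or NLTK.

# Argument frames for the five feature-less verb categories in the text.
SUBCAT_FRAMES = {
    'V0': [],                # intransitive: no arguments
    'V1': ['NP'],            # transitive: one NP argument
    'V2': ['NP', 'PP[to]'],  # NP plus a PP headed by "to"
    'V3': ['CP'],            # full sentence (complementizer phrase)
    'V4': ['AP'],            # adjective phrase
}


def as_feature_category(old_name):
    """Map an old nonterminal such as 'V2' to one V category with a subcat feature."""
    return {'cat': 'V', 'subcat': tuple(SUBCAT_FRAMES[old_name])}


def unify(a, b):
    """Unify two flat feature dicts; return the merged dict, or None on a clash."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values for the same feature
        merged[key] = value
    return merged
```

For instance, unify({'cat': 'V'}, as_feature_category('V1')) succeeds, while unify({'agr': 'sg'}, {'agr': 'pl'}) returns None. Real feature grammars use nested feature structures and variables, which this flat version deliberately leaves out.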