MIT 6 863J - Earley Parser - D2786955

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 863J> Earley Parser

DOC PREVIEW

MIT 6 863J - Earley Parser

School name Massachusetts Institute of Technology

Course 6 863j- Natural Language and the Computer Representation of Knowledge

Pages 9

This preview shows page 1-2-3 out of 9 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 9 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 9 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 9 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 9 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

The 6.863 Earley Parser1 IntroductionThe 6.863 Earley parser will work with any well-defined context-free grammar, including recursiveones and those containing empty categories. It can use either no lookahead or a one wordlookahead, i.e. k = 0 or k = 1. A feature system has been overlaid on the parser, allowingpowerful grammars to be created more easily than with a simple CFG. Although this featuresystem does not by any means implement any reasonable variation of the GPSG theory, it willbe referred to as GPSG for historical reasons.In addition to the parser, the Earley system has Lisp facilities for creating and editing CFG andGPSG grammars, dictionaries, and sentence sets.2 Loading the SystemAll of the parser and utility code is Com mon Lisp, and were all Common Lisps equal, theparser would be machine independent. The files are set to load into the package USER The fileearley.lisp contains all the necessary code.3 A Quick OverviewThe Earley system stores various types of rule sets, such as CFG grammars, GPSG grammars, orsentence sets. These can be edited and used as input for the parser. The parser requires specialgrammar tables as input, which are compiled from featureless context-free grammars (GPSGsare automatically translated into featureless CFGs).To look at the rule sets stored, evaluate (list-all). This prints out the name, type, and numberof rules in each rule set. There are 6 different types of rule sets: CFG, GPSG, CFG-TABLE,GPSG-TABLE, SENTENCES, and DICTIONARY. The -TABLE types are used as input forthe parser. They can not be edited, but all other types of rule sets can.3.1 Parsing SentencesBefore we can parse a sentence we need to supply grammar information. The parser requiresa DICTIONARY and either a CFG-TABLE or a GPSG-TABLE before it can parse. Parsingis handled through the p function. Let’s say we have the CFG-TABLE CT stored, and also thedictionary DICT. We can parse the sentence “John saw Mary” by evaluating> (p ’(JOHN SAW MARY) :grammar ’CT :dictionary ’DICT)1which will return a list of valid tree structures. The grammar and dictionary variables aresticky, so now that we’ve specified the grammar to use, to parse “John walked into the tree”only requires> (p ’(JOHN WALKED INTO THE TREE))Parses are returned as lists of possible tree structures, where each tree structure has the form(Category SubTree1 SubTree2 . . . ). Thus for the above example we might get back the parse list((ROOT (S (NP (N JOHN))(VP (VP (V WALKED))(PP (P INTO) (NP (D THE) (NP (N TREE))))))))The full form of the p function is(p sentence &key (template ’*) grammar dictionary (use-lookahead t)print-states file sentence-set)where sentence can either be a list or a string; :template specifies which parses are to be ac-cepted; :grammar and :dictionary specify the grammar and dictionary to use; :use-lookaheadspecifies whether or not to use lookahead if it is available; :print-states sp ec ifies whether ornot to print out all the Earley states at the end of the parse; :file, if given, is the name of afile to send all output to; and :sentence-set specifies a set of sentences to parse. After a parsehas been completed the function retrieve-parses with the format(retrieve-parses &optional (template ’*))can be used to retrieve the results again. More parsing functions, including ones to parse sets ofsentences at one time, are given in the function listing later.3.2 Editing Rule SetsIf there is a rule set SD, a DICTIONARY, it can be listed by e valuating (list-rules ’SD). Thiswill print out all of the rules (dictionary entries) in the set. A template may be specified, so allverbs in SD could be listed with (list-rules ’SD ’(* V)). Likewise, all rules with verb phrasesas heads in the CFG GRAMMAR1 could be listed with (list-rules ’GRAMMAR1 ’(VP **)).List-rules does more than just print out rules: it also sets up rule sets for quick editing. Let’ssay we evaluated (list-rules ’SD ’(* A)) to list out all the adjectives in the dictionary SD.Perhaps the result was2> (list-rules ’SD ’(* A))1. HARD A2. QUICK A3. RED A4. SLIMY A5. SLOW AWe can now use the three functions ADD, DEL, and RPL to edit the dictionary SD. To delete “hard”and “slimy” we evaluate (del 1 4). Now, we decide to add “lazy” in their place, so we do (add’(LAZY A)), and finally we decide to replace “quick” with “fast” and “slow” with an adverb,“slowly” - (rpl 2 ’(FAST A) 5 ’(SLOWLY ADV)). These three functions work on the last ruleset listed with list-rules. Thus they permit quick and easy editing of grammars, dictionaries,and test sentences.3.3 Creating and Removing Rule SetsTo simply remove or add a rule set, we can use the functions (remove-rule-set name) and(add-rule-set name type). Then rules can be added or removed from them either by usingthe previously mentioned e diting functions, or through(add-rule name-of-rule-set rule);; (add-rule ’GRAMMAR1 ’(S ==> NP AUXP VP))(del-rule name-of-rule-set rule)(add-rule-list name-of-rule-set rule-list);; (add-rule-list ’DICT ’((THE D) (AND CONJ)))(del-rule-list name-of-rule-set rule-list)These will not help us generate the tables the parsers need for running, but only base rulesets. Specialized functions are provided to compile rule sets into -TABLEs. Translating a CFGGRAMMAR1 to a C FG-TABLE CT usable by the parser takes one step, which requires specifyingthe root node of the CFG and also whether k = 0 or k = 1. The lookahead table takes little timeto compute, so presumably k will be 1. The translation function is create-cfg-table, whichtakes the form(create-cfg-table table-name cfg-name root k)or in our case, with root ROOT(create-cfg-table ’CT ’GRAMMAR1 ’ROOT 1)The corresponding function for GPSG grammars is create-gpsg-table, which takes the samearguments.33.4 TemplatesThere are two places templates are used, in retrieving parses and in listing rule se ts. Theymerely provide a filtering system to either limit the types of parse tree structures accepted orlimit the types of rules to be printed. Look back to the respective sections for more backgroundinformation.Templates follow four basic rules1. A symbol matches itself or any list with that symbol as it’s first element, or car.2. A list matches another list if each element matches, or if each element before a doubleasterisk in the template list matches the other list.3. An asterisk * matches any symbol or list.4. A double asterisk ** matches any symbol or

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3 out of 9 pages.

MIT 6 863J - Earley Parser

Sign up for free to view:

Please select your school