Stanford CS 224 - Study Notes

Increasing Accuracy While Maintaining Minimal Grammars in CKY Parsing
Anna Rafferty and TongKe Xue

Significant work in both lexicalized and unlexicalized parsing has been done in the past ten years. F1 measures of accuracy of over 90% have been achieved (Bikel, 2005), and linguistic notions of lexical dependencies and head words have been harnessed to create significant improvements in probabilistic CFG (PCFG) parsers (Bikel, 2005; Collins, 1996, 1997, 1999; Klein and Manning, 2003). Klein and Manning (2003) note, however, that many of the techniques for improving lexicalized parsing create relatively little gain while making the algorithms more complex. While Collins' (1999) parser is extremely useful if F1 is of the utmost importance, Klein and Manning's (2003) parser achieves an F1 within 5% of that parser without invoking lexicalization.

For our project, we attempted to increase the precision and recall (F1) of the CKY parser built for project four. Our goal was to significantly improve F1 while keeping the parser unlexicalized and maintaining a relatively small number of non-terminals. Although such a parser might not be as accurate as lexicalized models or unlexicalized models with larger grammars, we wished to show that acceptable F1 scores can be attained using minimal grammars and no lexicalization; we hypothesized that an F1 score within 1% of the best unlexicalized parser we found in the literature (Klein and Manning, 2003), which achieves an F1 of 86.36%, was achievable. The minimal grammar allows for fast parsing; given a fully optimized parser, this grammar might be used for extremely quick trials that approximate the results of larger, more accurate grammars, or for pre-processing purposes.

In increasing the F1 of our parser, we modeled our changes closely on those described in "Accurate Unlexicalized Parsing" (2003) by Dan Klein and Chris Manning. This paper provided the clearest suggestions for improving unlexicalized parsing and allowed us to explore linguistic patterns that are useful in parsing. All of our improvements were made by annotating the grammar in various ways to reflect external areas of the parse tree that might affect the current non-terminal's behavior, as well as internal properties of that non-terminal or the structure below it.

Our baseline parser was the one created for project four. This parser included second-order vertical Markovization and first-order horizontal Markovization. Additionally, it annotated preterminal nodes with the tag of their parents, just as vertical Markovization annotates other nodes with their parent tag. This parser produced a baseline F1 of 81.92%. Given previous results suggesting that increasing the order of vertical Markovization vastly increases the number of tags and is difficult to annotate further without creating problems of sparseness, we chose to include only second-order vertical Markovization in our improved parser rather than experiment further with this variable. We also limited the model to first-order horizontal Markovization, rather than the second order used by Klein and Manning (2003), to limit the number of non-terminals, which increases significantly with higher-order horizontal Markovization, and thus to test our hypothesis that a relatively small number of non-terminals can be used to achieve high F1.
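To make the baseline concrete, the following is a minimal sketch of parent annotation, the mechanism behind second-order vertical Markovization and the preterminal parent tags described above. The Tree class, the "^" separator, and the example sentence are illustrative assumptions, not the actual project-four code.

```python
# A minimal sketch of parent annotation (second-order vertical Markovization).
# The Tree class and the "^" separator are illustrative assumptions.

class Tree:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # no children => a terminal (word)

    def is_terminal(self):
        return not self.children


def annotate_parents(node, parent_label=None):
    """Annotate every non-terminal, including preterminals, with the
    original label of its parent, e.g. NP under S becomes NP^S."""
    original_label = node.label
    if parent_label is not None and not node.is_terminal():
        node.label = f"{original_label}^{parent_label}"
    for child in node.children:
        annotate_parents(child, original_label)
    return node


# Example: (S (NP (DT the) (NN cat)) (VP (VBD slept)))
tree = Tree("S", [
    Tree("NP", [Tree("DT", [Tree("the")]), Tree("NN", [Tree("cat")])]),
    Tree("VP", [Tree("VBD", [Tree("slept")])]),
])
annotate_parents(tree)
# Resulting labels: NP^S, DT^NP, NN^NP, VP^S, VBD^VP; the words are unchanged.
```

Each annotated label then acts as a distinct non-terminal when rule probabilities are estimated, which is why higher-order Markovization rapidly inflates the grammar.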
Unary Annotations

One feature we used to produce annotations was whether or not a node produced only one child. Nodes producing only one child are fairly rare and tend to occur within specific constructions, so marking nodes that produced only one child allowed recognition of these patterns. Including this trait increased our F1 to 84.05%, an absolute increase of 2.13%. This annotation actually produced the largest increase of any individual annotation.

Figure 1

Consider Figure 1, an example of part of a sentence that was parsed incorrectly by the baseline but correctly when unary annotations were added. The baseline parse includes three instances of non-preterminals that have only one child; in contrast, the parse with unary annotations includes only one such instance. By learning when a unary child is likely to occur, the unary parser assigns lower probabilities to the unary children that are falsely created in the baseline parse, decreasing the probability that such a parse will be chosen. Many similar fixes occurred in other sentences in the test set.

As an extension to this unary tagging, we chose to annotate certain tags that were only children themselves. Based again on Klein and Manning (2003), we experimented with annotating determiners and adverbs that were only children. By annotating only these specific types of tags, we limited the number of non-terminals in our grammar, a concern for reasons of parsing speed, while increasing our F1 to 84.20%.

Figure 2

Although this annotation was clearly not as successful as the previous one in improving parsing, it was good at fixing errors like the one in Figure 2. In this case it is appropriate for RB to be a single child, but adding the annotation allows the parser to differentiate what sorts of words appear as children of RBs that are only children. This differentiation improves parsing by choosing ADVP rather than NP as the parent of RB.

Head Tag Annotations

We attempted a few annotations based on the tag of the head word of a phrase (the "head tag"). Initially, we were going to identify head words based on hand-taggings. However, we felt this was in some ways working with more data than was contained within the model in general, and we wished to be able to use our parser to learn a grammar from corpora that did not include such hand-taggings. Additionally, we thought it would be interesting to examine how much of a gain in accuracy we could produce from head words that were only approximated. Thus, based on some rough linguistic analysis and the discussion of Collins' parser in Bikel (2004), which suggested that head words tend to appear at the beginning of a phrase, we used the tag of the first word in the phrase as a proxy for the correct head tag. An additional limitation on our use of head tags was that we did not propagate them up the tree more than one level. For instance, we did not perform any annotations based on head tags for nodes located above the parents of preterminals in the tree. This limits the amount of information that can be gained from using head tags but also limits the growth of non-terminals. We felt such a limitation was appropriate given
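Taken together, the unary, only-child, and head-tag annotations described above can be sketched as a single tree transformation. The sketch below reuses the illustrative Tree class from the earlier sketch; the "-U", "-ONLY", and "~" markers and the helper names are assumptions chosen for exposition, not the markers or code used in the project.

```python
# A minimal sketch combining the unary, only-child, and head-tag annotations.
# Reuses the illustrative Tree class defined in the earlier sketch; the
# "-U", "-ONLY", and "~" markers are assumptions for exposition only.

def is_preterminal(node):
    """True if the node dominates exactly one terminal (a word)."""
    return len(node.children) == 1 and node.children[0].is_terminal()


def annotate(node, is_only_child=False):
    if node.is_terminal():
        return node

    # Unary annotation: mark non-preterminals that produce exactly one child.
    if len(node.children) == 1 and not is_preterminal(node):
        node.label += "-U"

    # Only-child annotation, restricted to determiners (DT) and adverbs (RB)
    # to keep the number of non-terminals small.
    if is_only_child and is_preterminal(node) and node.label in ("DT", "RB"):
        node.label += "-ONLY"

    # Head-tag proxy: only at parents of preterminals, annotate with the tag
    # of the first word of the phrase as an approximation of the head tag.
    if not is_preterminal(node) and is_preterminal(node.children[0]):
        node.label += "~" + node.children[0].label

    only_child = len(node.children) == 1
    for child in node.children:
        annotate(child, is_only_child=only_child)
    return node


# Example fragment: an adverb phrase whose RB is an only child, as in Figure 2.
fragment = Tree("ADVP", [Tree("RB", [Tree("soon")])])
annotate(fragment)
# ADVP becomes ADVP-U~RB and RB becomes RB-ONLY.
```

In practice these relabelings would be applied to the training trees before rule probabilities are estimated, so each annotation trades a larger non-terminal set for more specific rule statistics.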

