Stanford CS 224 - Study Notes

Increasing Accuracy While Maintaining Minimal Grammars in CKY Parsing
Anna Rafferty and TongKe Xue

Significant work in both lexicalized and unlexicalized parsing has been done in the past ten years. F1 measures of accuracy of over 90% have been achieved (Bikel, 2005), and linguistic notions of lexical dependencies and head words have been harnessed to create significant improvements in probabilistic CFG (PCFG) parsers (Bikel, 2005; Collins, 1996, 1997, 1999; Klein and Manning, 2003). Klein and Manning (2003) note, however, that many of the techniques for improving lexicalized parsing create relatively little gain while making the algorithms more complex. While Collins' (1999) parser is extremely useful if F1 is of the utmost importance, Klein and Manning's (2003) parser achieves an F1 within 5% of that parser without invoking lexicalization.

For our project, we attempted to increase the precision and recall (F1) of the CKY parser built for project four. Our goal was to significantly improve F1 while keeping the parser unlexicalized and maintaining a relatively small number of non-terminals. Although such a parser might not be as accurate as lexicalized models or unlexicalized models with larger grammars, we wished to show that acceptable F1 scores can be attained using minimal grammars and no lexicalization; we hypothesized that an F1 score within 1% of the best unlexicalized parser we found in the literature (Klein and Manning, 2003), which achieves an F1 of 86.36%, was achievable. The minimal grammar allows for fast parsing; given a fully optimized parser, this grammar might be used for extremely quick trials that approximate the results of larger, more accurate grammars, or for pre-processing purposes.

In increasing the F1 of our parser, we modeled our changes closely on those described in "Accurate Unlexicalized Parsing" (2003) by Dan Klein and Chris Manning. This paper provided the clearest suggestions for improving unlexicalized parsing and allowed us to explore linguistic patterns that are useful in parsing. All of our improvements were made by annotating the grammar in various ways to reflect external areas of the parse tree that might affect the current non-terminal's behavior, as well as internal properties of that non-terminal or the structure below it.

Our baseline parser was the one created for project four. This parser included second-order vertical Markovization and first-order horizontal Markovization. Additionally, it annotated preterminal nodes with the tag of their parents, just as vertical Markovization annotates other nodes with their parent tag. This parser produced a baseline F1 of 81.92%. Given previous results suggesting that increasing the order of vertical Markovization vastly increases the number of tags and is difficult to annotate further without creating problems of sparseness, we chose to include only second-order vertical Markovization in our improved parser rather than experiment further with this variable. We also limited the model to first-order horizontal Markovization, rather than the second order used by Klein and Manning (2003), to limit the number of non-terminals, which increases significantly with higher-order horizontal Markovization, and thus to test our hypothesis that a relatively small number of non-terminals can be used to achieve high F1.
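To make the baseline concrete, the following is a minimal sketch of parent annotation, the mechanism behind second-order vertical Markovization and the preterminal parent tags described above. The Tree class, the "^" separator, and the example sentence are illustrative assumptions, not the actual project-four code.

```python
# A minimal sketch of parent annotation (second-order vertical Markovization).
# The Tree class and the "^" separator are illustrative assumptions.

class Tree:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # no children => a terminal (word)

    def is_terminal(self):
        return not self.children


def annotate_parents(node, parent_label=None):
    """Annotate every non-terminal, including preterminals, with the
    original label of its parent, e.g. NP under S becomes NP^S."""
    original_label = node.label
    if parent_label is not None and not node.is_terminal():
        node.label = f"{original_label}^{parent_label}"
    for child in node.children:
        annotate_parents(child, original_label)
    return node


# Example: (S (NP (DT the) (NN cat)) (VP (VBD slept)))
tree = Tree("S", [
    Tree("NP", [Tree("DT", [Tree("the")]), Tree("NN", [Tree("cat")])]),
    Tree("VP", [Tree("VBD", [Tree("slept")])]),
])
annotate_parents(tree)
# Resulting labels: NP^S, DT^NP, NN^NP, VP^S, VBD^VP; the words are unchanged.
```

Each annotated label then acts as a distinct non-terminal when rule probabilities are estimated, which is why higher-order Markovization rapidly inflates the grammar.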
Unary Annotations

One feature we used to produce annotations was whether or not a node produced only one child. Nodes producing only one child are fairly rare and tend to occur within specific constructions, so marking nodes that produced only one child allowed recognition of these patterns. Including this trait increased our F1 to 84.05%, an absolute increase of 2.13%. This annotation actually produced the largest increase of any individual annotation.

Figure 1

Consider Figure 1, an example of part of a sentence that was parsed incorrectly by the baseline but correctly when unary annotations were added. The baseline parse includes three instances of non-preterminals that have only one child; in contrast, the parse with unary annotations includes only one such instance. By learning when a unary child is likely to occur, the unary parser assigns lower probabilities to the unary children that are falsely created in the baseline parse, decreasing the probability that such a parse will be chosen. Many similar fixes occurred in other sentences in the test set.

As an extension to this unary tagging, we chose to annotate certain tags that were only children themselves. Based again on Klein and Manning (2003), we experimented with annotating determiners and adverbs that were only children. By annotating only these specific types of tags, we limited the number of non-terminals in our grammar, a concern for reasons of parsing speed, while increasing our F1 to 84.20%.

Figure 2

Although this annotation was clearly not as successful as the previous one in improving parsing, it was good at fixing errors like the one in Figure 2. In this case it is appropriate for RB to be a single child, but adding the annotation allows the parser to differentiate what sorts of words appear as children of RBs that are only children. This differentiation improves parsing by choosing ADVP rather than NP as the parent of RB.

Head Tag Annotations

We attempted a few annotations based on the tag of the head word of a phrase (the "head tag"). Initially, we were going to identify head words based on hand-taggings. However, we felt this was in some ways working with more data than was contained within the model in general, and we wished to be able to use our parser to learn a grammar from corpora that did not include such hand-taggings. Additionally, we thought it would be interesting to examine how much of a gain in accuracy we could produce from head words that were only approximated. Thus, based on some rough linguistic analysis and the discussion of Collins' parser in Bikel (2004), which suggested that head words tend to appear at the beginning of a phrase, we used the tag of the first word in the phrase as a proxy for the correct head tag. An additional limitation on our use of head tags was that we did not propagate them up the tree more than one level. For instance, we did not perform any annotations based on head tags for nodes located above the parents of preterminals in the tree. This limits the amount of information that can be gained from using head tags but also limits the growth of non-terminals. We felt such a limitation was appropriate given
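Taken together, the unary, only-child, and head-tag annotations described above can be sketched as a single tree transformation. The sketch below reuses the illustrative Tree class from the earlier sketch; the "-U", "-ONLY", and "~" markers and the helper names are assumptions chosen for exposition, not the markers or code used in the project.

```python
# A minimal sketch combining the unary, only-child, and head-tag annotations.
# Reuses the illustrative Tree class defined in the earlier sketch; the
# "-U", "-ONLY", and "~" markers are assumptions for exposition only.

def is_preterminal(node):
    """True if the node dominates exactly one terminal (a word)."""
    return len(node.children) == 1 and node.children[0].is_terminal()


def annotate(node, is_only_child=False):
    if node.is_terminal():
        return node

    # Unary annotation: mark non-preterminals that produce exactly one child.
    if len(node.children) == 1 and not is_preterminal(node):
        node.label += "-U"

    # Only-child annotation, restricted to determiners (DT) and adverbs (RB)
    # to keep the number of non-terminals small.
    if is_only_child and is_preterminal(node) and node.label in ("DT", "RB"):
        node.label += "-ONLY"

    # Head-tag proxy: only at parents of preterminals, annotate with the tag
    # of the first word of the phrase as an approximation of the head tag.
    if not is_preterminal(node) and is_preterminal(node.children[0]):
        node.label += "~" + node.children[0].label

    only_child = len(node.children) == 1
    for child in node.children:
        annotate(child, is_only_child=only_child)
    return node


# Example fragment: an adverb phrase whose RB is an only child, as in Figure 2.
fragment = Tree("ADVP", [Tree("RB", [Tree("soon")])])
annotate(fragment)
# ADVP becomes ADVP-U~RB and RB becomes RB-ONLY.
```

In practice these relabelings would be applied to the training trees before rule probabilities are estimated, so each annotation trades a larger non-terminal set for more specific rule statistics.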

