CORNELL CS 674 - Study Notes

Comparing a Linguistic and a Stochastic Tagger
Christer Samuelsson and Atro Voutilainen

Overview
- Compares an HMM tagger to EngCG-2: the HMM is statistical, EngCG is based on hand-coded linguistic rules
- An attempt to allay fears of bias in previous EngCG results
- The original EngCG work reported 99.7% correct analyses (with some small ambiguity remaining), and the validity of these results was questioned. Skeptics said:
  - Even human linguists can only agree about 97% of the time; how can a machine reach 99% accuracy?
  - The test corpus may be biased toward high performance for EngCG
  - The EngCG tag set may be so basic that it makes POS tagging easy
  - The low error rate may be due to high remaining ambiguity

Background: How Does EngCG Work?
- Sequentially applied modules:
  - Morphological analyser: assigns all possible POS tags to each word; heuristics determine the possible tags of unseen words. For example, "free" receives five readings:

        "<free>"
          "free" A ABS
          "free" <SVO> V SUBJUNCTIVE VFIN
          "free" <SVO> V IMP VFIN
          "free" <SVO> V INF
          "free" <SVO> V PRES -SG3 VFIN

  - Disambiguator: removes illegitimate analyses; can leave ambiguity!
  - Optionally, application-specific heuristics / statistical disambiguators for still-ambiguous words

Background: The Disambiguator
- Multiple passes (5 subgrammars)
- Starts with very reliable rules, e.g.

        REMOVE (V)(-1C DET) ;

  which discards a verb reading when the word immediately to the left is unambiguously ("C", careful mode) a determiner (a toy illustration follows below)
- Proceeds to rough heuristics; error rates increase to 10%-30% in the final two subgrammars

  Subgrammar   # Rules   % Extra Senses Removed   % Correct Remain
  1             2967            91.70                  99.88
  2              158            92.87                  99.86
  3              374            94.42                  99.85
  4               71            95.74                  99.71
  5               44            96.54                  99.55

  (read cumulatively: each row gives the figures after applying subgrammars 1 through n)
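To make the rule format concrete, below is a minimal Python sketch of how such a REMOVE rule might be applied. It is an illustration under simplifying assumptions, not EngCG's actual engine: the token representation and the function name are invented here, and the real condition language is far richer (barriers, clause boundaries, rule ordering across subgrammars).

    # Toy illustration of the Constraint Grammar rule REMOVE (V)(-1C DET):
    # discard verb readings when the word immediately to the left is
    # unambiguously ("C", careful mode) a determiner. Simplified sketch,
    # not EngCG's real rule engine.

    def apply_remove_v_after_det(sentence):
        for i in range(1, len(sentence)):
            left = sentence[i - 1]["readings"]
            cur = sentence[i]["readings"]
            # -1C DET: the previous token has exactly one reading, and it is DET
            if len(left) == 1 and "DET" in next(iter(left)):
                non_verb = {r for r in cur if "V" not in r.split()}
                if non_verb:  # a rule never removes the last surviving reading
                    sentence[i]["readings"] = non_verb
        return sentence

    # "the free ...": 'free' starts out five-ways ambiguous (see above)
    sent = [
        {"word": "the",  "readings": {"DET"}},
        {"word": "free", "readings": {"A ABS", "V SUBJUNCTIVE VFIN",
                                      "V IMP VFIN", "V INF",
                                      "V PRES -SG3 VFIN"}},
    ]
    apply_remove_v_after_det(sent)
    print(sent[1]["readings"])  # {'A ABS'}: only the adjective reading survives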
Issue One: Maximum Accuracy
- Samuelsson and Voutilainen believe inter-linguist agreement can approach 100%
- In creating the benchmark corpus, agreement between the two experts was 99.3% before corrections; after correction of simple errors it reached 99.96%
- Two features of their setup help:
  - The EngCG tag set avoids semantically motivated tags
  - The linguists have a "Grammarian's Manual" covering the most common ambiguous cases and their correct resolution
- Statistical tests show, with 95% confidence, that human evaluators agree more than 99.2% of the time on average under these conditions

Issue Two: Bias in Corpora
- Because the paper's focus is an unbiased comparison, the methods used to create the corpora are especially important
- Two corpora were used:
  - Training corpus: a 357,000-word sample from the Brown corpus, used to train the HMM
  - Test corpus: a 55,000-word sample of journalistic, scientific, and manual texts, with no subject overlap with the training corpus. Does this help EngCG?

The Training Corpus
- Annotated with EngCG tags: the first pass was the original EngCG algorithm, and remaining ambiguities were resolved by an expert
- Used in testing EngCG-2, and continually improved as new rules were tested and deployed
- Does this lead to a bias favoring EngCG?
  - If the corpus is tagged by EngCG, that sets an upper bound on how well the HMM can perform: imagine EngCG were only 50% accurate; the HMM could then never do better than 50%
  - However, this is standard practice in NLP, and given many iterations of testing and correction, most incorrect classifications were most likely weeded out

The Test Corpus
- First analyzed using only the morphological analyzer, then independently disambiguated by two linguists
- Agreement reached 99.96%: after clerical errors were corrected, the only disagreements were 21 words (out of 55,000) that were genuinely ambiguous at the meaning level
- The final "consensus corpus" was made from one of the two disambiguated versions

Issue Three: Simple Tagset
- The claim: EngCG performs so well only because its tagset is so simple that annotating corpora is trivial
- While tagsets can't be compared directly, their relative "difficulty" can be: train the same algorithm with two different tagsets and compare error rates
- Here, the HMM's performance with the EngCG tagset was compared to its performance with more common tagsets, and was found to be similar

Issue Four: Ambiguity / Accuracy Tradeoff
- The claim: EngCG performs so well only because of the ambiguity that remains in its POS assignments
- This can't be disproven without forcing EngCG to disambiguate fully, so rather than removing ambiguity from EngCG, the authors allowed it in the HMM:
  - When annotating with the HMM, tags with probabilities over a certain threshold are assigned to the word in addition to the most probable tag (a sketch of this thresholding appears at the end of these notes)
  - Varying the threshold varies the allowable ambiguity, so the HMM's ambiguity can be set equal to EngCG's
- Issues? The HMM was not designed to work this way and may not take advantage of the allowed ambiguity as much as EngCG does

Experiment
- First, test the HMM on the Brown corpus at various training-set sizes:
  - Hold back 35,000 words from the training corpus
  - Train the HMM on successively larger chunks of the remaining words, evaluating on the held-back subset
- Main experiment:
  - HMM: train on the full 357,000 words, then test on the 55,000-word test corpus at varying levels of allowable ambiguity
  - EngCG: run on the entire test corpus at varying levels of ambiguity (varying the number of subgrammars used)
  - Compare the HMM and EngCG at the same ambiguity levels

Results: HMM Testing
- Learning curve of the HMM with respect to training-set size
- The paper states the curve "has leveled off at 322,000 words, indicating that little is to be gained from further training"
- Has it? Remember "Scaling to Very Very Large Corpora for Natural Language Disambiguation" (Banko and Brill 2001): learning curves that look flat at hundreds of thousands of words can keep improving with orders of magnitude more data

Results: Algorithm Comparison
- EngCG dominates at comparable ambiguity levels: its error rate ranges from 8.6 to 28 times smaller than the HMM's
- However, the HMM's accuracy here is also about 1% lower than when it is trained and tested on subsets of the Brown corpus alone
- This indicates that training the HMM on a larger corpus, and/or one that includes documents similar to the benchmark corpus, could improve its performance

Discussion
- Caveats of EngCG:
  - Vastly more work to create. However, Chanod and Tapanainen (1995) suggest that, given a limited amount of time to build both an HMM and a constraint-based system, the constraint-based system still outperforms the HMM
  - Does not disambiguate fully, and is therefore unsuitable for some tasks. Could be corrected for by using ...
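As an end note, here is a minimal Python sketch of the ambiguity thresholding described under Issue Four and used in the experiment. It is a sketch under assumptions, not the paper's implementation: the function name is invented, the per-tag probabilities are assumed to come from the HMM (e.g., forward-backward posteriors), and the numbers are illustrative.

    # Sketch of threshold-based ambiguous tagging: besides the single most
    # probable tag, keep every tag whose probability clears a threshold.
    # Lowering the threshold admits more ambiguity; raising it toward 1.0
    # forces a single tag. Illustrative only; not the paper's code.

    def tags_above_threshold(tag_probs, threshold):
        # tag_probs: hypothetical dict of tag -> P(tag | sentence) for one
        # word, e.g. posteriors from an HMM's forward-backward pass
        best = max(tag_probs, key=tag_probs.get)
        kept = {tag for tag, p in tag_probs.items() if p >= threshold}
        kept.add(best)  # the most probable tag is always assigned
        return kept

    # Invented numbers for one word, just to show the mechanism:
    tag_probs = {"NN": 0.55, "VB": 0.30, "JJ": 0.15}
    print(tags_above_threshold(tag_probs, 0.25))  # {'NN', 'VB'}: ambiguous
    print(tags_above_threshold(tag_probs, 0.60))  # {'NN'}: fully disambiguated

Sweeping the threshold until the average number of tags per word matches EngCG's residual ambiguity is what lets the two systems be compared at equal ambiguity levels.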

