CS 294-5: Statistical Natural Language Processing
POS Tagging II
Lecture 8: 9/26/05

Recap: POS Ambiguity

- Words are syntactically ambiguous:

      Fed   raises   interest   rates   0.5   percent
      NNP   NNS      NN         NNS     CD    NN
      VBN   VBZ      VBP        VBZ
      VBD            VB

- Two sources of information:
  - Clues from the input (current word, next word, capitalization, suffixes, word shape)
  - Clues from adjacent hidden labels (connectivity)
- What of this could HMMs capture?
- Remember: POS sequence models will be the basis of information extraction methods later.

Recap: Accuracies

- Roadmap of (known word / unknown word) accuracies:
  - Most freq tag:   ~90%  / ~50%
  - Trigram HMM:     ~95%  / ~55%
  - Maxent P(t|w):   93.7% / 82.6%
  - TnT (HMM++):     96.2% / 86.0%
  - Maxent tagger:   96.9% / 86.9%
  - Cyclic tagger:   97.2% / 89.0%
  - Upper bound:     ~98%
- Most errors are on unknown words.

Recap: Errors

- Common errors [from Toutanova & Manning 00]:
  - official knowledge:     NN/JJ  NN
  - made up the story:      VBD  RP/IN  DT  NN
  - recently sold shares:   RB  VBD/VBN  NNS

Better Features

- Can do surprisingly well just looking at a word by itself:
  - Word:            the: the → DT
  - Lowercased word: Importantly: importantly → RB
  - Prefixes:        unfathomable: un- → JJ
  - Suffixes:        Importantly: -ly → RB
  - Capitalization:  Meridian: CAP → NNP
  - Word shapes:     35-year: d-x → JJ
- Then build a maxent (or whatever) model to predict the tag from these features (a feature-extraction sketch follows the Decoding slide below).
- Maxent P(t|w): 93.7% / 82.6%

Sequence-Free Tagging?

- What about looking at a word and its environment, but using no sequence information?
  - Add in the previous / next word:   the __
  - Previous / next word shapes:       X __ X
  - Occurrence pattern features:       [X: x X occurs]
  - Crude entity detection:            __ ….. (Inc.|Co.)
  - Phrasal verb in sentence?          put …… __
  - Conjunctions of these things
- All features except sequence: 96.6% / 86.8%
- Uses lots of features: > 200K
- Why isn't this the standard approach?

Maxent Taggers

- One step up: also condition on previous tags.
- Train P(t_i | w, t_{i-1}, t_{i-2}) as a normal maxent problem, then use it to score sequences.
- This is referred to as a maxent tagger [Ratnaparkhi 96].
- Beam search is effective! (Why?) (a decoder sketch also follows the Decoding slide below)
- What's the advantage of beam size 1?

Feature Templates

- We've been sloppy so far:
  - Features:          <w_0=future, t_0=JJ>
  - Feature templates: <w_0, t_0>
- In maxent taggers we can now add edge feature templates:
  - <t_-1, t_0>
  - <t_-2, t_-1, t_0>
- Also mixed feature templates:
  - <t_-1, w_0, t_0>

Decoding

- Decoding maxent taggers works just like decoding HMMs: Viterbi, beam search, posterior decoding.
- Viterbi algorithm (HMMs):
      δ_i(t) = max_{t'} [ δ_{i-1}(t') · P(t | t') · P(w_i | t) ]
- Viterbi algorithm (Maxent):
      δ_i(t) = max_{t'} [ δ_{i-1}(t') · P(t | w, i, t') ]
  (with states extended to tag pairs when conditioning on two previous tags)
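To make the word-by-word features from the "Better Features" slide concrete, here is a minimal sketch of a feature extractor. The function names and the exact shape encoding are illustrative assumptions, not the feature set of the lecture's actual tagger:

```python
import re

def word_shape(word):
    # Collapse character classes and squeeze repeats, so that
    # "35-year" -> "d-x" and "Meridian" -> "Xx".
    shape = re.sub(r"[A-Z]+", "X", word)
    shape = re.sub(r"[a-z]+", "x", shape)
    shape = re.sub(r"[0-9]+", "d", shape)
    return shape

def word_features(word):
    # Word-by-itself indicator features in the style of the slide:
    # the word, its lowercase form, prefixes, suffixes, capitalization, shape.
    feats = {
        "word=" + word,
        "lower=" + word.lower(),
        "shape=" + word_shape(word),
        "cap=" + str(word[:1].isupper()),
    }
    for k in (1, 2, 3):                    # short prefixes/suffixes only
        feats.add("prefix=" + word[:k])    # e.g. "un-" for unfathomable
        feats.add("suffix=" + word[-k:])   # e.g. "-ly" for Importantly
    return feats

print(sorted(word_features("Importantly")))  # includes suffix=ly, shape=Xx
```

A maxent model over indicator features like these is what gives the 93.7% / 82.6% P(t|w) numbers above; the payoff of prefix, suffix, and shape features is almost entirely on unknown words.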
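Relatedly, here is a compact sketch of beam-search decoding for a maxent-style tagger in the spirit of [Ratnaparkhi 96]. The toy tag set and the hand-written local scorer are stand-ins for a trained model P(t_i | w, t_{i-1}, t_{i-2}):

```python
TAGS = ["DT", "JJ", "NN", "NNS", "VBZ"]   # toy tag set (assumption)

def local_score(tag, words, i, prev, prev2):
    # Stand-in for a trained maxent model's log P(t_i | w, t_{i-1}, t_{i-2}):
    # a real tagger would fire its feature templates and sum learned weights.
    word, score = words[i], 0.0
    if word == "the" and tag == "DT":
        score += 2.0
    if word.endswith("s") and tag in ("NNS", "VBZ"):
        score += 1.0
    if prev == "DT" and tag in ("JJ", "NN"):
        score += 0.5
    return score

def beam_decode(words, beam_size=3):
    # Each hypothesis is (score, tags-so-far); keep the top-k at each position.
    beam = [(0.0, [])]
    for i in range(len(words)):
        candidates = []
        for score, tags in beam:
            prev = tags[-1] if tags else "<s>"
            prev2 = tags[-2] if len(tags) >= 2 else "<s>"
            for tag in TAGS:
                candidates.append(
                    (score + local_score(tag, words, i, prev, prev2),
                     tags + [tag]))
        beam = sorted(candidates, key=lambda h: -h[0])[:beam_size]
    return beam[0][1]

print(beam_decode("the old man runs".split()))
```

With beam_size=1 this collapses to greedy left-to-right tagging, a single pass with no bookkeeping, which is one answer to the beam-size-1 question above.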
JJ/NN) They get stellar accuracies: 98.5% on their tag set Linguistic representation matters… … but it’s easier to win when you make up the rules3CRF Taggers Newer, higher-powered discriminative sequence models CRFs (also voted perceptrons, M3Ns) Do not decompose training into independent local regions Can be deathly slow to train – require repeated inference on training set Differences tend not to be too important for POS tagging However: one issue worth knowing about in local models “Label bias” and other explaining away effects Maxent taggers’ local scores can be near one without having both good “transitions” and “emissions” This means that often evidence doesn’t flow properly Why isn’t this a big deal for POS tagging?Domain Effects Accuracies degrade outside of domain Up to triple error rate Usually make the most errors on the things you care about in the domain (e.g. protein names) Open questions How to effectively exploit unlabeled data from a new domain (what could we gain?) How to best incorporate domain lexica in a principled way (e.g. UMLS specialist lexicon, ontologies)Unsupervised Tagging? AKA part-of-speech induction Task: Raw sentences in Tagged sentences out Obvious thing to do: Start with a (mostly) uniform HMM Run EM Inspect resultsEM for HMMs: Quantities Remember from last time: Can calculate in O(s2n) time (why?)EM for HMMs: Process From these quantities, we can re-estimate transitions: And emissions: If you don’t get these formulas immediately, just think about hard EM instead, where were re-estimate from the Viterbi sequencesMerialdo: Setup Some (discouraging) experiments [Merialdo 94] Setup: You know the set of allowable tags for each word Fix k training examples to their true labels Set P(w|t) on these examples Set P(t|t-1,t-2) on these examples Re-estimate with EM for n iterations Note: we know allowed tags but not frequencies4Merialdo: Results So How to Fix It? Lots of progress in learning parts-of-speech Distributional word clustering methods Morphology-driven models Contrastive estimation Other ideas! Stay

