Berkeley COMPSCI 294 - POS Tagging II


School: University of California, Berkeley

Course: COMPSCI 294 - Special Topics


CS 294-5: Statistical Natural Language Processing
POS Tagging II
Lecture 8: 9/26/05


Recap: POS Ambiguity

- Words are syntactically ambiguous:

    Fed   raises   interest   rates   0.5   percent
    NNP   NNS      NN         NNS     CD    NN
    VBN   VBZ      VBP        VBZ
    VBD            VB

- Two sources of information:
  - Clues from the input (current word, next word, capitalization, suffixes, word shape)
  - Clues from adjacent hidden labels (connectivity)
  - What of this could HMMs capture?
- Remember: POS sequence models will be the basis of information extraction methods later


Recap: Accuracies

- Roadmap of (known / unknown) accuracies:

    Most freq tag:    ~90%  / ~50%
    Trigram HMM:      ~95%  / ~55%
    Maxent P(t|w):    93.7% / 82.6%
    TnT (HMM++):      96.2% / 86.0%
    Maxent tagger:    96.9% / 86.9%
    Cyclic tagger:    97.2% / 89.0%
    Upper bound:      ~98%

- Most errors on unknown words


Recap: Errors

- Common errors [from Toutanova & Manning 00]

    NN/JJ  NN               official knowledge
    VBD  RP/IN  DT  NN      made up the story
    RB  VBD/VBN  NNS        recently sold shares


Better Features

- Can do surprisingly well just looking at a word by itself:

    Word              the: the → DT
    Lowercased word   Importantly: importantly → RB
    Prefixes          unfathomable: un- → JJ
    Suffixes          Importantly: -ly → RB
    Capitalization    Meridian: CAP → NNP
    Word shapes       35-year: d-x → JJ

- Then build a maxent (or whatever) model to predict tag
  - Maxent P(t|w): 93.7% / 82.6%


Sequence-Free Tagging?

- What about looking at a word and its environment, but no sequence information?
  - Add in previous / next word: the __
  - Previous / next word shapes: X __ X
  - Occurrence pattern features: [X: x X occurs]
  - Crude entity detection: __ ….. (Inc.|Co.)
  - Phrasal verb in sentence? put …… __
  - Conjunctions of these things
- All features except sequence: 96.6% / 86.8%
- Uses lots of features: > 200K
- Why isn't this the standard approach?


Maxent Taggers

- One step up: also condition on previous tags
- Train up P(t_i | w, t_{i-1}, t_{i-2}) as a normal maxent problem, then use to score sequences
- This is referred to as a maxent tagger [Ratnaparkhi 96]
- Beam search effective! (Why?)
- What's the advantage of beam size 1?


Feature Templates

- We've been sloppy:
  - Features: <w0=future, t0=JJ>
  - Feature templates: <w0, t0>
- In maxent taggers:
  - Can now add edge feature templates: <t-1, t0> and <t-2, t-1, t0>
  - Also, mixed feature templates: <t-1, w0, t0>


Decoding

- Decoding maxent taggers:
  - Just like decoding HMMs
  - Viterbi, beam search, posterior decoding
- Viterbi algorithm (HMMs): [equation not preserved in the source text]
- Viterbi algorithm (Maxent): [equation not preserved in the source text]


TBL Tagger

- [Brill 95] presents a transformation-based tagger
- Label the training set with most frequent tags

    DT    MD    VBD   VBD      .
    The   can   was   rusted   .

- Add transformation rules which reduce training mistakes
  - MD → NN : DT __
  - VBD → VBN : VBD __ .
- Stop when no transformations do sufficient good
- Does this remind anyone of anything?
- Probably the most widely used tagger (esp. outside NLP)
- … but not the most accurate: 96.6% / 82.0%


TBL Tagger II

- What gets learned? [from Brill 95]
  [table not preserved in the source text]


EngCG Tagger

- English constraint grammar tagger
  - [Tapanainen and Voutilainen 94]
  - Something else you should know about
  - Hand-written and knowledge driven
  - "Don't guess if you know" (general point about modeling more structure!)
  - Its tag set doesn't make all of the hard distinctions that the standard tag set does (e.g. JJ/NN)
  - They get stellar accuracies: 98.5% on their tag set
  - Linguistic representation matters…
  - … but it's easier to win when you make up the rules


CRF Taggers

- Newer, higher-powered discriminative sequence models
  - CRFs (also voted perceptrons, M3Ns)
  - Do not decompose training into independent local regions
  - Can be deathly slow to train: they require repeated inference on the training set
- Differences tend not to be too important for POS tagging
- However: one issue worth knowing about in local models
  - "Label bias" and other explaining away effects
  - Maxent taggers' local scores can be near one without having both good "transitions" and "emissions"
  - This means that often evidence doesn't flow properly
  - Why isn't this a big deal for POS tagging?


Domain Effects

- Accuracies degrade outside of domain
  - Up to triple error rate
  - Usually make the most errors on the things you care about in the domain (e.g. protein names)
- Open questions
  - How to effectively exploit unlabeled data from a new domain (what could we gain?)
  - How to best incorporate domain lexica in a principled way (e.g. UMLS specialist lexicon, ontologies)


Unsupervised Tagging?

- AKA part-of-speech induction
- Task:
  - Raw sentences in
  - Tagged sentences out
- Obvious thing to do:
  - Start with a (mostly) uniform HMM
  - Run EM
  - Inspect results


EM for HMMs: Quantities

- Remember from last time: [equations not preserved in the source text]
- Can calculate in O(s^2 n) time (why?)


EM for HMMs: Process

- From these quantities, we can re-estimate transitions: [equation not preserved in the source text]
- And emissions: [equation not preserved in the source text]
- If you don't get these formulas immediately, just think about hard EM instead, where we re-estimate from the Viterbi sequences


Merialdo: Setup

- Some (discouraging) experiments [Merialdo 94]
- Setup:
  - You know the set of allowable tags for each word
  - Fix k training examples to their true labels
    - Set P(w|t) on these examples
    - Set P(t | t-1, t-2) on these examples
  - Re-estimate with EM for n iterations
- Note: we know allowed tags but not frequencies


Merialdo: Results

[results table not preserved in the source text]


So How to Fix It?

- Lots of progress in learning parts-of-speech
  - Distributional word clustering methods
  - Morphology-driven models
  - Contrastive estimation
- Other ideas!
- Stay
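
The word-level clues listed under Better Features (word identity, lowercased form, prefixes, suffixes, capitalization, word shape) can be sketched as a small feature extractor. This is only an illustration: the function names, the feature-string format, and the maximum affix length of 3 are assumptions, not details taken from the lecture. The shape rule (digits to d, lowercase to x, uppercase to X, repeated classes collapsed) is one convention that reproduces the slide's 35-year → d-x example.

```python
def word_shape(word):
    """Collapse a word into a coarse shape string, e.g. '35-year' -> 'd-x'.

    Assumed convention: digits -> 'd', lowercase -> 'x', uppercase -> 'X',
    other characters kept as-is; runs of the same class are collapsed.
    """
    shape = []
    for ch in word:
        if ch.isdigit():
            c = "d"
        elif ch.isalpha():
            c = "X" if ch.isupper() else "x"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def word_features(word, max_affix=3):
    """Return a set of string features for a single word, ignoring context.

    max_affix is a hypothetical setting for the longest prefix/suffix used.
    """
    feats = {
        "word=" + word,
        "lower=" + word.lower(),
        "shape=" + word_shape(word),
    }
    if word[:1].isupper():
        feats.add("CAP")
    # Affixes shorter than the word itself, up to max_affix characters.
    for k in range(1, min(max_affix, len(word) - 1) + 1):
        feats.add("prefix=" + word[:k].lower())
        feats.add("suffix=" + word[-k:].lower())
    return feats
```

A feature set like this would be fed to whatever classifier predicts P(t|w). For instance, `word_features("Importantly")` contains `suffix=ly`, `CAP`, and `lower=importantly`, matching the cues the slide associates with RB.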

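The Decoding slide names Viterbi for HMMs without the recurrence surviving in this text, so here is a minimal sketch of standard bigram-HMM Viterbi in log space. The function signature, the `unk` fallback score for unseen word/tag pairs, and all the toy probabilities are made up for illustration; they are not the lecture's numbers. The nested loop over positions, tags, and previous tags has the same shape as the O(s^2 n) cost mentioned for the EM quantities.

```python
import math


def viterbi(words, tags, log_init, log_trans, log_emit, unk=-20.0):
    """Most probable tag sequence under a bigram HMM (log-space scores).

    Assumes `words` is non-empty. `unk` is an assumed fallback log score
    for word/tag pairs missing from `log_emit`.
    """
    # delta[i][t]: best log score of any tag sequence for words[:i+1] ending in t
    delta = [{t: log_init[t] + log_emit[t].get(words[0], unk) for t in tags}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Best previous tag for reaching tag t at this position.
            prev = max(tags, key=lambda p: delta[-1][p] + log_trans[p][t])
            scores[t] = delta[-1][prev] + log_trans[prev][t] + log_emit[t].get(w, unk)
            pointers[t] = prev
        delta.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: delta[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(probs):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Toy parameters, invented purely for illustration.
tags = ["DT", "NN", "MD"]
log_init = logs({"DT": 0.8, "NN": 0.1, "MD": 0.1})
log_trans = {
    "DT": logs({"DT": 0.05, "NN": 0.9, "MD": 0.05}),
    "NN": logs({"DT": 0.1, "NN": 0.3, "MD": 0.6}),
    "MD": logs({"DT": 0.4, "NN": 0.3, "MD": 0.3}),
}
log_emit = {
    "DT": logs({"the": 1.0}),
    "NN": logs({"can": 0.3, "story": 0.7}),
    "MD": logs({"can": 0.7, "will": 0.3}),
}
```

With these toy tables, `viterbi(["the", "can"], tags, log_init, log_trans, log_emit)` returns `["DT", "NN"]`: the DT → NN transition outweighs the stronger MD emission for "can", which is the kind of connectivity clue the recap slide attributes to adjacent hidden labels. Decoding "can" on its own returns `["MD"]`.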