600.465 - Intro to NLP - J. Eisner

Part-of-Speech Tagging: A Canonical Finite-State Task

The Tagging Task
- Input: the lead paint is unsafe
- Output: the/Det lead/N paint/N is/V unsafe/Adj
- Uses:
  - text-to-speech (how do we pronounce “lead”?)
  - can write regexps like (Det) Adj* N+ over the output
  - preprocessing to speed up a parser (but a little dangerous)
  - if you know the tag, you can back off to it in other tasks

Why Do We Care?
- The first statistical NLP task
- Been done to death by different methods
- Easy to evaluate (how many tags are correct?)
- Canonical finite-state task
- Can be done well with methods that look only at local context (though we should “really” do it by parsing!)

Degree of Supervision
- Supervised: the training corpus is tagged by humans
- Unsupervised: the training corpus isn’t tagged
- Partly supervised: the training corpus isn’t tagged, but you have a dictionary giving the possible tags for each word
- We’ll start with the supervised case and move to decreasing levels of supervision.

Current Performance
- How many tags are correct? About 97% currently
- But the baseline is already 90%
- Baseline = the performance of the stupidest possible method: tag every word with its most frequent tag, and tag unknown words as nouns

What Should We Look At?

  Bill directed a   cortege of   autos through the dunes
  PN   Verb     Det Noun    Prep Noun  Prep    Det Noun    <- correct tags

Each word also has other possible tags (maybe more), e.g. Bill: PN or Verb, directed: Verb or Adj, ...? Each unknown tag is constrained by its word and by the tags to its immediate left and right. But those tags are unknown too ...
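The 90% baseline described above (most frequent tag per word, unknowns become nouns) fits in a few lines. A minimal sketch; the toy training counts below are invented for illustration:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    """Learn each word's most frequent tag from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words, most_frequent_tag):
    """Tag each word with its most frequent tag; unknown words become nouns."""
    return [(w, most_frequent_tag.get(w, "N")) for w in words]

# Invented toy counts: "lead" is more often a noun than a verb here.
corpus = [("the", "Det"), ("lead", "N"), ("lead", "N"), ("lead", "V"),
          ("paint", "N"), ("is", "V"), ("unsafe", "Adj")]
model = train_baseline(corpus)
print(baseline_tag("the lead paint is unsafe".split(), model))
# [('the', 'Det'), ('lead', 'N'), ('paint', 'N'), ('is', 'V'), ('unsafe', 'Adj')]
```

This baseline never looks at context, which is exactly why the 90%-to-97% gap is what context-sensitive taggers compete over.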
Three Finite-State Approaches
- Noisy channel picture: the real language X (part-of-speech tags, an n-gram model) passes through a noisy channel that replaces tags with words, yielding the yucky language Y (text); we want to recover X from Y.
1. Noisy Channel Model (statistical)
2. Deterministic baseline tagger composed with a cascade of fixup transducers
3. Nondeterministic tagger composed with a cascade of finite-state automata that act as filters

Review: Noisy Channel
- The real language X passes through the noisy channel to give the yucky language Y.
- p(X) * p(Y | X) = p(X, Y)
- We want to recover x ∈ X from y ∈ Y: choose the x that maximizes p(x | y), or equivalently p(x, y).
- As weighted machines: p(X) .o. p(Y | X) = p(X, Y). Note that p(x, y) sums to 1. Suppose y = “C”; what is the best x?
- Compose further with an acceptor for the observation: p(X) .o. p(Y | X) .o. (Y = y) = p(X, y), restricting just to the paths compatible with the output “C”.
Noisy Channel for Tagging
- p(X): an acceptor scoring p(tag sequence) (the “Markov Model”)
- p(Y | X): a transducer from tags to words (“Unigram Replacement”)
- (Y = y): an acceptor for the observed words (a “straight line”)
- Their composition p(X) .o. p(Y | X) .o. (Y = y) = p(X, y) is a transducer that scores candidate tag sequences on their joint probability with the observed words; pick the best path.

Markov Model (tag bigrams) as an FSA
- States: Start, Det, Adj, Noun, Verb, Prep, Stop; each arc carries a tag-bigram probability.
- Arc weights recoverable from the slides: Start -> Det (0.8), Det -> Adj (0.3), Det -> Noun (0.7), Adj -> Adj (0.4), Adj -> Noun (0.5), Noun -> Stop (0.2); the slides show further arcs (including one weighted 0.1) whose endpoints are not recoverable here.
- Example path: p(Start Det Adj Adj Noun Stop) = 0.8 * 0.3 * 0.4 * 0.5 * 0.2
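The path probability on these slides can be checked mechanically. A minimal sketch; the bigram table below contains only the arc weights recoverable from the slides:

```python
# Tag-bigram (Markov model) arc weights read off the slides' FSA.
bigram = {
    ("Start", "Det"): 0.8, ("Det", "Adj"): 0.3, ("Det", "Noun"): 0.7,
    ("Adj", "Adj"): 0.4, ("Adj", "Noun"): 0.5, ("Noun", "Stop"): 0.2,
}

def p_tag_seq(tags):
    """Multiply the bigram probabilities along the path through the FSA."""
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= bigram[(prev, cur)]
    return p

print(p_tag_seq(["Start", "Det", "Adj", "Adj", "Noun", "Stop"]))
# 0.8 * 0.3 * 0.4 * 0.5 * 0.2 = 0.0096
```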
Noisy Channel for Tagging (continued)
- The composed transducer scores candidate tag sequences on their joint probability with the observed words (“the cool directed autos”); we should pick the best path.
- Sample Unigram Replacement arcs (tag:word/probability): Noun:Bill/0.002, Noun:autos/0.001, Noun:cortege/0.000001, Adj:cortege/0.000001, ...
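Picking the best path through the composed machine is best-path (Viterbi) decoding. A minimal sketch, reusing the slides' transition weights and the Noun:autos/0.001 emission arc; the other emission probabilities are invented:

```python
# Best-path (Viterbi) decoding through the composed machine:
# maximize p(tags) * p(words | tags) over tag sequences.
trans = {  # tag-bigram arcs from the slides' FSA
    ("Start", "Det"): 0.8, ("Det", "Adj"): 0.3, ("Det", "Noun"): 0.7,
    ("Adj", "Adj"): 0.4, ("Adj", "Noun"): 0.5, ("Noun", "Stop"): 0.2,
}
emit = {  # unigram-replacement arcs: one from the slides, the rest invented
    ("Det", "the"): 0.4,          # invented
    ("Adj", "cool"): 0.005,       # invented
    ("Adj", "directed"): 0.0002,  # invented
    ("Noun", "autos"): 0.001,     # from the slides
}

def viterbi(words):
    """Dynamic program over last tags: keep the best-scoring path per tag."""
    best = {"Start": (1.0, ["Start"])}
    for w in words:
        new = {}
        for prev, (p, path) in best.items():
            for (a, tag), pt in trans.items():
                if a == prev and (tag, w) in emit:
                    cand = p * pt * emit[(tag, w)]
                    if cand > new.get(tag, (0.0, []))[0]:
                        new[tag] = (cand, path + [tag])
        best = new
    # Weight each surviving path by its transition to Stop, then pick the best.
    tag = max(best, key=lambda t: best[t][0] * trans.get((t, "Stop"), 0.0))
    return best[tag][1][1:]  # drop the initial Start state

print(viterbi("the cool directed autos".split()))
# ['Det', 'Adj', 'Adj', 'Noun']
```

Keeping only the best path per last tag is exactly what makes this linear in sentence length rather than exponential, mirroring the best-path computation over the composed FST.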