600.465 - Intro to NLP - J. Eisner

Slide 1: Finite-State and the Noisy Channel

Slide 2: Word Segmentation
- theprophetsaidtothecity
- What does this say? And what other words are substrings?
- Could segment with parsing (how?), but slow.
- Given L = a "lexicon" FSA that matches all English words. How to apply it to this problem?
- What if the lexicon is weighted? From unigrams to bigrams?
- Smooth L to include unseen words?

Slide 3: Spelling Correction
- Spelling correction also needs a lexicon L.
- But there is distortion ...
- Let T be a transducer that models common typos and other spelling errors:
  - ance → ence (deliverance, ...)
  - e → ε (deliverance, ...)
  - ε → e // Cons _ Cons (athlete, ...)
  - rr → r (embarrass, occurrence, ...)
  - ge → dge (privilege, ...)
  - etc.
- Now what can you do with L .o. T ?
- Should T and L have probabilities?
- Want T to include "all possible" errors ...

Slide 4: Noisy Channel Model
- noisy channel: X → Y
- real language X; yucky language Y
- want to recover X from Y

Slide 5: Noisy Channel Model
- X = correct spelling; channel = typos; Y = misspelling

Slide 6: Noisy Channel Model
- X = (lexicon space)*; channel = delete spaces; Y = text w/o spaces

Slide 7: Noisy Channel Model
- X = (lexicon space)* (language model); channel = pronunciation (acoustic model); Y = speech

Slide 8: Noisy Channel Model
- X = tree (probabilistic CFG); channel = delete everything but terminals; Y = text

Slide 9: Noisy Channel Model
- p(X) * p(Y | X) = p(X, Y)
- want to recover x ∈ X from y ∈ Y
- choose x that maximizes p(x | y), or equivalently p(x, y)
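Slide 2's word-segmentation problem can be sketched directly as a dynamic program. This is a minimal illustration, not the lecture's FSA construction: the toy lexicon and its probabilities are invented for the example, and the DP over split points plays the role of finding the best path through the input composed with the weighted lexicon's closure L*.

```python
import math

# Toy weighted lexicon (hypothetical probabilities, for illustration only;
# slide 2's L would be a large FSA covering all English words).
LEXICON = {
    "the": 0.05, "prophet": 1e-4, "said": 2e-3, "to": 0.03,
    "city": 5e-4, "he": 0.01, "prop": 1e-5, "at": 0.01,
}

def segment(text, lexicon=LEXICON, max_word_len=12):
    """Best segmentation under a unigram model: maximize the sum of
    log p(word). Equivalent to the best path through text .o. L*,
    but done here with a simple dynamic program over split points."""
    n = len(text)
    best = [None] * (n + 1)          # best[i] = (score, backpointer j, word)
    best[0] = (0.0, 0, "")
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            w = text[j:i]
            if w in lexicon and best[j] is not None:
                score = best[j][0] + math.log(lexicon[w])
                if best[i] is None or score > best[i][0]:
                    best[i] = (score, j, w)
    if best[n] is None:
        return None                  # no segmentation covers the whole string
    words, i = [], n
    while i > 0:
        _, j, w = best[i]
        words.append(w)
        i = j
    return words[::-1]
```

On the slide's example, `segment("theprophetsaidtothecity")` returns `["the", "prophet", "said", "to", "the", "city"]`; replacing the unigram scores with bigram probabilities would answer the slide's "from unigrams to bigrams?" question.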
Slide 10: Noisy Channel Model
- p(X) .o. p(Y | X) = p(X, Y)
- Note p(x, y) sums to 1.
- Suppose y = "C"; what is best "x"?

Slide 11: Noisy Channel Model
- p(X) .o. p(Y | X) = p(X, Y)
- Suppose y = "C"; what is best "x"?

Slide 12: Noisy Channel Model
- p(X) .o. p(Y | X) .o. (Y = y)? = p(X, y)
- restrict just to paths compatible with output "C"

Slide 13: Morpheme Segmentation
- Let Lexicon be a machine that matches all Turkish words.
- Same problem as word segmentation, just at a lower level: morpheme segmentation.
- Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na
  "(behaving) as if you are among those whom we could not cause to become civilized"
- Some constraints on morpheme sequence: bigram probs.
- Generative model: concatenate, then fix up the joints
  (stop + -ing = stopping, fly + -s = flies, vowel harmony).
- Use a cascade of transducers to handle all the fixups.
- But this is just morphology! Can use probabilities here too (but people often don't).

Slide 14: Edit Distance Transducer
- O(k) deletion arcs
- O(k) insertion arcs
- O(k²) substitution arcs
- O(k) no-change arcs

Slide 15: Edit Distance Transducer (Stochastic)
- O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) identity arcs
- Likely edits = high-probability arcs

Slide 16: Edit Distance Transducer (Stochastic)
- clara .o. (stochastic edit distance transducer) .o. caca
- Best path (by Dijkstra's algorithm)

Slide 17: Speech Recognition by FST Composition (Pereira & Riley 1996)
- trigram language model .o. pronunciation model .o. acoustic model .o. observed acoustics
- p(word seq) * p(phone seq | word seq) * p(acoustics | phone seq)

Slide 18: Speech Recognition by FST Composition (Pereira & Riley 1996)
- p(word seq) * p(phone seq | word seq) * p(acoustics | phone seq)
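The composition on slide 16 — clara .o. T .o. caca — produces an acyclic lattice whose best path is the minimum-cost edit sequence. A minimal sketch of that best-path computation, with assumed unit costs standing in for the transducer's negative log-probabilities:

```python
def edit_distance(x, y, del_cost=1.0, ins_cost=1.0, sub_cost=1.0):
    """Min-cost path through x .o. EditDistanceTransducer .o. y.

    With arc costs read as negative log-probabilities, the minimum-cost
    alignment is the most probable edit sequence (slide 15's stochastic
    version); identity arcs cost 0 here.
    """
    m, n = len(x), len(y)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost          # delete all of x
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost          # insert all of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0.0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,       # delete x[i-1]
                          D[i][j - 1] + ins_cost,       # insert y[j-1]
                          D[i - 1][j - 1] + match)      # substitute / copy
    return D[m][n]
```

On slide 16's pair, `edit_distance("clara", "caca")` gives 2.0 (delete the l, substitute r → c). Since the composed lattice is acyclic, this left-to-right dynamic program finds the same best path that Dijkstra's algorithm would.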
[Figure: fragments of the composed machine, e.g. a pronunciation-model arc CAT : k æ t and phone-in-context arcs for æ, under the trigram language model.]
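Slide 3's typo transducer and slide 9's decision rule combine into a noisy-channel spelling corrector: choose the lexicon word x maximizing p(x) * p(y | x). A minimal sketch, with hypothetical priors and an assumed per-edit penalty in place of a trained channel model T:

```python
import math

# Hypothetical priors p(x); a real system would use smoothed corpus counts.
PRIORS = {"deliverance": 1e-5, "difference": 1e-4, "privilege": 2e-5}

EDIT_PENALTY = 7.0   # assumed -log p(one edit); would be tuned on data

def edits(x, y):
    """Plain Levenshtein distance, standing in for the best path
    through the typo transducer T (slide 3)."""
    m, n = len(x), len(y)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                        # delete
                         cur[j - 1] + 1,                     # insert
                         prev[j - 1] + (x[i - 1] != y[j - 1]))  # sub / copy
        prev = cur
    return prev[n]

def correct(y, priors=PRIORS):
    """argmax_x p(x) * p(y | x), i.e. minimize
    -log p(x) + EDIT_PENALTY * edits(x, y)  (slide 9's decision rule)."""
    return min(priors,
               key=lambda x: EDIT_PENALTY * edits(x, y) - math.log(priors[x]))
```

For example, `correct("deliverence")` picks "deliverance" (one edit beats the higher-prior but more distant "difference"), matching slide 3's ance → ence error.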