Unformatted text preview:

CMSC 723 Computational Linguistics I Session 11 Word Sense Disambiguation Jimmy Lin The iSchool University of Maryland Wednesday November 11 2009 Material drawn from slides by Saif Mohammad and Bonnie Dorr Progression of the Course Words Structure Finite state morphology Part of speech tagging TBL HMM CFGs parsing CKY Earley N gram language models Meaning Today s Agenda Word sense disambiguation Beyond lexical semantics Semantic attachments to syntax Shallow semantics PropBank Word Sense Disambiguation Recap Word Sense From WordNet Noun pipe tobacco pipe a tube with a small bowl at one end used for smoking tobacco pipe pipage piping a long tube made of metal or plastic that is used to carry water or oil or gas etc pipe tube a hollow cylindrical shape pipe a tubular wind instrument organ pipe pipe pipework the flues and stops on a pipe organ Verb shriek shrill pipe up pipe utter a shrill cry pipe transport by pipeline pipe oil water and gas into the desert pipe play on a pipe pipe a tune pipe trim with piping pipe the skirt Word Sense Disambiguation Task automatically select the correct sense of a word Theoretically useful for many applications Lexical sample All words Semantic similarity remember from last time Information retrieval Machine translation Solution in search of a problem Why How big is the problem Most words in English have only one sense But the others tend to have several senses Average of 3 83 in LDOCE Average of 2 96 in WordNet Ambiguous words are more frequently used 62 in Longman s Dictionary of Contemporary English 79 in WordNet In the British National Corpus 84 of instances have more than one sense Some senses are more frequent than others Ground Truth Which sense inventory do we use Issues there Application specificity Corpora Lexical sample All words line hard serve corpus 4k sense tagged examples interest corpus 2 369 sense tagged examples SemCor 234k words subset of Brown Corpus Senseval 3 2081 tagged content words from 5k total words Observations about the size Evaluation Intrinsic Measure accuracy of sense selection wrt ground truth Extrinsic Integrate WSD as part of a bigger end to end system e g machine translation or information retrieval Compare WSD Baseline Upper Bound Baseline most frequent sense Equivalent to take first sense in WordNet Does surprisingly well 62 accuracy in this case Upper bound Fine grained WordNet sense 75 80 human agreement Coarser grained inventories 90 human agreement possible What does this mean WSD Approaches Depending on use of manually created knowledge sources Knowledge lean Knowledge rich Depending on use of labeled data Supervised Semi or minimally supervised Unsupervised Lesk s Algorithm Intuition note word overlap between context and dictionary entries Unsupervised but knowledge rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable rate mortgage securities WordNet Lesk s Algorithm Simplest implementation Lots of variants Count overlapping content words between glosses and context Include the examples in dictionary definitions Include hypernyms and hyponyms Give more weight to larger overlaps e g bigrams Give extra weight to infrequent words e g idf weighting Works reasonably well Supervised WSD NLP meets ML WSD as a supervised classification task Train a separate classifier for each word Three components of a machine learning problem Training data corpora Representations features Learning method algorithm model Supervised Classification Training Testing training data label1 label2 label3 unlabeled document label4 Representation Function label1 supervised machine learning algorithm label2 Classifier label3 label4 Three Laws of Machine Learning Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data Thou shalt not mingle training data with test data you o d t a But wh m d e e n u do if yo a ta d t s e t ore Features Possible features POS and surface form of the word itself Surrounding words and POS tag Positional information of surrounding words and POS tags Same as above but with n grams Grammatical information Richness of the features Richer features ML algorithm does less of the work More impoverished features ML algorithm does more of the work Classifiers Once we cast the WSD problem as supervised classification many learning techniques are possible Na ve Bayes the thing to try first Decision lists Decision trees MaxEnt Support vector machines Nearest neighbor methods Classifiers Tradeoffs Which classifier should I use It depends Number of features Types of features Number of possible values for a feature Noise General advice Start with Na ve Bayes Use decision trees lists if you want to understand what the classifier is doing SVMs often give state of the art performance MaxEnt methods also work well Na ve Bayes Pick the sense that is most probable given the context Context represented by feature vector s arg max P s f s S By Bayes Theorem s arg max s S P f s P s P f We can ignore this term why Problem data sparsity The Na ve Part Feature vectors are too sparse to estimate directly n P f s P f j s j 1 So assume features are conditionally independent given the word sense This is na ve because Putting everything ntogether s arg max P s P f j s s S j 1 Na ve Bayes Training How do we estimate the probability distributions n s arg max P s P f j s s S Maximum Likelihood Estimates MLE P si P f j s j 1 count si w j count w j count f j s count s What else do we need to do Well how well does it work later Decision List Ordered list of tests equivalent to case statement Example decision list discriminating between bass fish and bass music Building Decision Lists Simple algorithm Compute how discriminative each feature is P S1 f i log P S 2 f i Create ordered list of tests from these values Limitation How do you build n way classifiers from binary classifiers One vs rest sequential vs parallel Another learning problem Well how well does it work later Decision Trees Instead of a list imagine a tree fish in k words yes FISH no striped bass yes no FISH guitar in k words yes MUSIC no Using Decision Trees Given an instance list of feature values Start at the root At each interior node check feature value Follow corresponding branch based on the test When a leaf node is reached return its category Decision tree material drawn from slides by Ed Loper Building Decision Trees Basic idea build tree top down recursively


View Full Document

UMD CMSC 723 - Word Sense Disambiguation

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Loading Unlocking...
Login

Join to view Word Sense Disambiguation and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Word Sense Disambiguation and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?