1.
Depending on use of high-quality, manually created knowledge sources:
  - Knowledge-lean
  - Knowledge-rich
Depending on use of labeled data:
  - Supervised
  - Semi- or minimally supervised
  - Unsupervised

2.
Lesk's algorithm
Sense $s_i$ of ambiguous word $w$ is likely to be the intended sense if many of the words used in the dictionary definition of $s_i$ are also used in the definitions of words in the context window.
Only consider content words.

3.
"the keyboard of the terminal was"
terminal
  1. a point on an electrical device at which electric current enters or leaves
  2. where transport vehicles load or unload passengers or goods
  3. an input-output device providing access to a computer
keyboard
  1. set of keys on a piano or organ or typewriter or typesetting machine or computer or the like
  2. an arrangement of hooks on which keys or locks are hung

4.
Many variants possible:
  - include the examples in dictionary definitions
  - include other manually tagged example texts
  - give more weight to larger overlaps
  - give extra weight to infrequent words occurring in the bags
Results:
  - Simple versions of Lesk achieve accuracy around 50-60%
  - Lesk plus simple smarts gets to nearly 70%

5.
Manually labeled training data is fed to a machine learning algorithm.
Training and test sets must be non-overlapping. Why?
More annotated data is again expensive to create.
Tractable only for small lexical-sample tasks.
Create a separate classifier for each word.

6.
Each training instance is converted to a feature vector.
Commonly used features:
  - Surface form of the target
  - Part of speech of the target
  - Unigrams and bigrams in the context of the target word, and their part of speech
  - Syntactic dependencies (verb-object, subject-object)

7.
Instance: "I opened an account at the bank"; tag: financial institution
Bag-of-words feature vector: <I, opened, an, account, at, the, bank>
Could take position into account.
Exploit collocations such as "fine wine", "blood bank".
All the training-instance feature vectors are fed to a machine learning algorithm: decision trees, decision lists, naïve Bayes.

8.
A classifier is learnt for each word.
Number of possible classes equals number of senses seen in training data.
Convert unseen test instance into a feature vector.
Feed the feature vector to the classifier, which assigns a suitable sense (class) to it.

9.
Pick the sense that is most probable given the context.
Represent context by a bag of words.
Let $f$ be the test-instance feature vector.
Let $S$ be the set of all senses of a target word.
Intended sense of $w_t$ = $\arg\max_{s \in S} P(s \mid f)$
Data sparseness is a problem.
Intended sense of $w_t$ = $\arg\max_{s \in S} P(f \mid s)\, P(s)$

10.
Independence assumption: $P(f \mid s)$ approximated by the product of individual features:
  $P(f \mid s) \approx \prod_{j} P(f_j \mid s)$
  $P(f_j \mid s) = \frac{\mathrm{count}(f_j, s)}{\mathrm{count}(s)}$
  $P(s_i) = \frac{\mathrm{count}(s_i, w_t)}{\mathrm{count}(w_t)}$
Systems using Naïve Bayes have achieved accuracies in the range of 62% to 72% with adequate training data.

11.
Ordered list of strong clues (features) to the senses of the target.

12.
Learned decision list for each target word.
Bootstrapped from:
  - seeds
  - very large corpus
  - heuristics: one sense per discourse, one sense per collocation
Used supervised algorithm to build decision list.
Corpus: 460M words, mixed texts.

13.
Think of seed features for each sense:
  - "manufacturing" in context of "plant": industrial building sense
  - "life" in context of "plant": the living thing sense
Compile the first set of training data.

14, 15.
(no text in source)

16.
Create a new decision-list classifier: supervised training with the data tagged so far (training set).
Looks for collocations as features for classification.
Apply new classifier to the test set (remaining data); tag some new instances.
Optional: apply the one-sense-per-discourse rule wherever one sense now dominates a text.
Co-training.

17, 18.
(no text in source)

19.
Stop when:
  - error on training data is less than a threshold
  - no more training data is covered
Use final decision list for WSD.
Performance was shown to be as good as a supervised algorithm.

20.
Strength of method:
  - the one-sense heuristics
  - automatically generated a huge training corpus
  - bootstrapping: unsupervised use of supervised algorithm
Disadvantages:
  - train each word separately
  - works well for homonyms only
  - danger of snowballing error with co-training

21.
Unsupervised WSD approaches: choose that sense of the target which is closest in meaning to the context of the target word.
  - E.g., the Lesk algorithm
Supervised WSD approaches: choose that sense of the target whose context is closest to the training-set contexts of that sense.
  - E.g., bag-of-words features approach using decision lists, naïve Bayes
  - Yarowsky (1995) method

22.
Word matching helps determine similarity, as in the Lesk algorithm.
But it is very limited.
What about word pairs that have different word forms but yet are close in meaning?
There are hundreds of thousands of such word pairs.

23.
The bench dismissed the case.
Bench
  - a long seat for two or more persons
  - the persons who sit as judges
  - a former wave-cut shore of a sea or lake or floodplain of a river
Case
  - a set of circumstances or conditions
  - a suit or action in law or equity

24.
(repeats the text of 23)

25.
  - Cognate identification
  - Coreference resolution
  - Document clustering
  - Information retrieval
  - Multiword expression identification
  - Paraphrasing and textual entailment
  - Question answering
  - Real-word spelling error detection
  - Relation extraction
  - Semantic similarity of texts
  - Speech recognition
  - Subjectivity determination
  - Summarization
  - Textual inference
  - Word prediction
  - Word sense disambiguation
  - Word sense discovery
  - Word sense dominance determination
  - Word translation

26.
Semantically close: bank-money, apple-fruit, tree-forest, bank-river, pen-paper, run-walk, mistake-error, car-wheel
Semantically distant: doctor-beer, painting-January, money-river, apple-penguin, nurse-bottle, pen-river, clown-tramway, car-algebra

27.
Word meaning: the two concepts are close in terms of their meaning.
World knowledge: the two concepts have similar properties, or often occur together, or occur in similar contexts.
Psychology: we often think of the two concepts together.

28.
Two terms are semantically close or related if there is a lexical semantic relation between them: synonymy, hyponymy, meronymy, troponymy.
Two terms are semantically similar if there is a particular lexical semantic relation between them: synonymy, hyponymy, troponymy.
  - meronymy (dog and paw): semantically related, not similar
  - dog and golden retriever: semantically related and semantically similar

29.
semantically distant semantically
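
The overlap counting in Lesk's algorithm (item 2) can be tried directly on the terminal/keyboard definitions from item 3. What follows is a minimal sketch, not a reference implementation: the stopword list, the whitespace tokenizer, and the function names `content_words` and `lesk` are assumptions made for illustration.

```python
# Toy stopword list (an assumption); a real system would use a fuller list
# or a part-of-speech tagger to keep only content words.
STOPWORDS = {"a", "an", "the", "of", "on", "at", "or", "to",
             "which", "where", "and"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def lesk(target_glosses, context_glosses):
    """Return (index of best sense, overlap count per sense).

    target_glosses: one definition string per sense of the target word.
    context_glosses: definitions of the words in the context window.
    """
    context_bag = set()
    for gloss in context_glosses:
        context_bag |= content_words(gloss)
    overlaps = [len(content_words(g) & context_bag) for g in target_glosses]
    best = max(range(len(overlaps)), key=overlaps.__getitem__)
    return best, overlaps


terminal = [
    "a point on an electrical device at which electric current enters or leaves",
    "where transport vehicles load or unload passengers or goods",
    "an input output device providing access to a computer",
]
keyboard = [
    "set of keys on a piano or organ or typewriter or typesetting machine "
    "or computer or the like",
    "an arrangement of hooks on which keys or locks are hung",
]

best, overlaps = lesk(terminal, keyboard)
print(best + 1, overlaps)  # prints: 3 [0, 0, 1]
```

Only the third sense of "terminal" shares a content word ("computer") with the definitions of "keyboard", so it is chosen. Ties go to the first sense listed, which is a design choice of this sketch only.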

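The naïve Bayes decision rule in items 9 and 10 can be sketched for a single target word as follows. Two things here are assumptions rather than part of the slides: the three training instances are invented toy data, and the feature probabilities use add-one smoothing over per-sense word counts (a common multinomial variant, swapped in to cope with the data sparseness the slides mention) instead of the unsmoothed count(f_j, s) / count(s) estimate. The names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(instances):
    """instances: list of (context_words, sense) pairs for ONE target word."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in instances:
        sense_counts[sense] += 1
        for w in words:
            feature_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, feature_counts, vocab


def classify(words, model):
    """Return argmax over senses of log P(s) + sum_j log P(f_j | s)."""
    sense_counts, feature_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior P(s)
        # Add-one smoothing (assumption): every vocabulary word gets +1.
        denom = sum(feature_counts[sense].values()) + len(vocab)
        for w in words:
            if w not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((feature_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Invented toy training data for the target word "bank".
training = [
    (["opened", "account", "deposit"], "financial"),
    (["account", "loan", "money"], "financial"),
    (["river", "water", "fishing"], "river"),
]
model = train(training)
print(classify(["account", "money"], model))  # prints: financial
print(classify(["river", "water"], model))    # prints: river
```

Working in log space avoids underflow when many small probabilities are multiplied, and it does not change which sense attains the maximum.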

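The decision list of items 11 to 16 is an ordered list of clues in which the first matching clue decides the sense. The sketch below covers only the supervised step (building and applying the list), not the bootstrapping loop. Ranking clues by a smoothed log-likelihood ratio, the smoothing constant `alpha`, the restriction to two senses, and the toy tagged data are all assumptions made for illustration; the slides say only that the list is ordered by clue strength.

```python
import math
from collections import Counter, defaultdict


def build_decision_list(instances, alpha=0.1):
    """instances: list of (features, sense) pairs with exactly two senses.

    Returns rules (strength, feature, sense), strongest first.
    """
    senses = sorted({sense for _, sense in instances})
    if len(senses) != 2:
        raise ValueError("this sketch handles exactly two senses")
    a, b = senses
    counts = defaultdict(Counter)
    for feats, sense in instances:
        for f in set(feats):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # Smoothed log-likelihood ratio (assumed ranking criterion).
        llr = math.log((c[a] + alpha) / (c[b] + alpha))
        rules.append((abs(llr), f, a if llr > 0 else b))
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules


def classify(feats, rules, default=None):
    """Apply the first rule whose feature occurs in the instance."""
    feats = set(feats)
    for _, f, sense in rules:
        if f in feats:
            return sense
    return default


# Invented toy data for "plant", starting from the seed words in item 13.
tagged = [
    (["manufacturing", "equipment"], "industrial"),
    (["manufacturing", "workers"], "industrial"),
    (["life", "animal"], "living"),
    (["life", "species"], "living"),
]
rules = build_decision_list(tagged)
print(classify(["life", "equipment"], rules))  # prints: living
```

In the example, "life" was seen twice with the living sense while "equipment" was seen once with the industrial sense, so the stronger clue wins. An instance containing no known clue falls through to `default`.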
UMD CMSC 723 - WSD and Semantic Distance