600.465 - Intro to NLP - J. Eisner

Splitting Words
a.k.a. "Word Sense Disambiguation"

(Several introductory slides courtesy of D. Yarowsky.)

Representing Word as Vector

We could average over many occurrences of the word ...
- Each word type has a different vector?
- Each word token has a different vector?
- Each word sense has a different vector?
  (for this one, we need sense-tagged training data)
  (is this more like a type vector or a token vector?)
What is each of these good for?

Each word type has a different vector

We saw this yesterday. It's good for grouping words:
- similar semantics?
- similar syntax?
- it depends on how you build the vector

Each word token has a different vector

Good for splitting words, i.e., unsupervised WSD. Cluster the tokens: each cluster is a sense! Some contexts of the token "party":
  ... have turned it into the hot dinner-party topic. The comedy is the ...
  ... selection for the World Cup party, which will be announced on May 1 ...
  ... the by-pass there will be a street party.
"Then," he says, "we are going ...
  ... in the 1983 general election for a party which, when it could not bear to ...
  ... to attack the Scottish National Party, who look set to seize Perth and ...
  ... number-crunchers within the Labour party, there now seems little doubt ...
  ... that had been passed to a second party who made a financial decision ...
  ... A future obliges each party to the contract to fulfil it by ...

Each word sense has a different vector

Represent each new word token as a vector, too. Now assign each token the closest sense. (We could lump together all tokens of the word in the same document and assume they all have the same sense.) For example, which sense does "party" have in this new context?
  ... let you know that there's a party at my house tonight. Directions: Drive ...

Where can we get sense-labeled training data?

To do supervised WSD, we need many examples of each sense in context, like the "party" concordance lines above.
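As a concrete illustration of "assign each token the closest sense," here is a minimal sketch, not taken from the lecture: the bag-of-words context window, the cosine similarity measure, the centroid-as-sum choice, and the tiny "party" training examples are all illustrative assumptions.

```python
from collections import Counter
import math

def context_vector(tokens, target_index, window=5):
    """Bag-of-words vector for one token of the target word."""
    lo = max(0, target_index - window)
    context = tokens[lo:target_index] + tokens[target_index + 1:target_index + 1 + window]
    return Counter(w.lower() for w in context)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def sense_centroid(vectors):
    """A sense's vector = the sum of its labeled tokens' context vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def closest_sense(token_vec, centroids):
    """Assign a new token the sense whose centroid is nearest by cosine."""
    return max(centroids, key=lambda s: cosine(token_vec, centroids[s]))

# Toy sense-labeled data (hypothetical, echoing the "party" examples above).
labeled = {
    "celebration": [["street", "party", "with", "music", "and", "dancing"],
                    ["birthday", "party", "with", "cake", "and", "music"]],
    "political":   [["labour", "party", "wins", "the", "general", "election"],
                    ["national", "party", "seeks", "votes", "in", "election"]],
}
centroids = {sense: sense_centroid(
                 [context_vector(toks, toks.index("party")) for toks in exs])
             for sense, exs in labeled.items()}

new = ["a", "party", "at", "my", "house", "with", "music"]
print(closest_sense(context_vector(new, new.index("party")), centroids))
# prints celebration ("with" and "music" overlap with that sense's contexts)
```

Note the design question raised on the slide: the centroids here are built from sense-labeled tokens, so a sense vector behaves like a type vector restricted to one sense, while each new occurrence still gets its own token vector.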
Where can we get sense-labeled training data? (continued)

Sources of sense-labeled training text:
- Human-annotated text - expensive.
- Bilingual text (a Rosetta stone) - we can figure out which sense of "plant" is meant from how it translates.
- The dictionary definition of a sense is one sample context.
- The Roget's thesaurus entry of a sense is one sample context.
  (The last two give hardly any data per sense - but we'll use them later to get unsupervised training started.)

A problem with the vector model

It's a bad idea to treat all context positions equally. Possible solutions:
- Faraway words don't count as strongly?
- Words in different positions relative to "plant" are different elements of the vector? (I.e., (pesticide, -1) and (pesticide, +1) are different features.)
- Words in different syntactic relationships to "plant" are different elements of the vector?

Just one cue is sometimes enough ...   [slide courtesy of D. Yarowsky (modified)]

An assortment of possible cues ...   [slides courtesy of D. Yarowsky (modified)]

This generates a whole bunch of potential cues - use data to find out which ones work best, and produce a merged ranking of all cues of all these types. Even a weak cue is kept: we'll trust it if there's nothing better.

Final decision list for lead (abbreviated)   [slide courtesy of D. Yarowsky (modified)]

To disambiguate a token of "lead":
- Scan down the sorted list of cues.
- The first cue that is found gets to make the decision all by itself.
- Not as subtle as combining cues, but it works well for WSD.
A cue's score is its log-likelihood ratio:
  log [ p(cue | sense A) [smoothed] / p(cue | sense B) ]
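The decision-list idea just described - score each cue by its smoothed log-likelihood ratio, sort by the absolute score, and let the first matching cue decide by itself - can be sketched as follows. This is a simplified illustration, not Yarowsky's exact implementation: the add-alpha smoothing scheme, the generic sense labels "A"/"B", and the toy cue sets are assumptions.

```python
import math
from collections import defaultdict

def build_decision_list(labeled, alpha=0.1):
    """labeled: list of (cues, sense) pairs, sense in {"A", "B"}.
    Returns (score, cue, sense) triples sorted with the strongest
    evidence first; score = |log p(cue|A)/p(cue|B)|, add-alpha smoothed."""
    counts = {"A": defaultdict(int), "B": defaultdict(int)}
    totals = {"A": 0, "B": 0}
    for cues, sense in labeled:
        totals[sense] += 1
        for cue in cues:
            counts[sense][cue] += 1
    vocab = set(counts["A"]) | set(counts["B"])
    dlist = []
    for cue in vocab:
        pa = (counts["A"][cue] + alpha) / (totals["A"] + alpha * len(vocab))
        pb = (counts["B"][cue] + alpha) / (totals["B"] + alpha * len(vocab))
        llr = math.log(pa / pb)
        dlist.append((abs(llr), cue, "A" if llr > 0 else "B"))
    return sorted(dlist, reverse=True)

def disambiguate(cues, dlist, default="A"):
    """Scan down the list; the first cue present decides all by itself."""
    for score, cue, sense in dlist:
        if cue in cues:
            return sense
    return default

# Hypothetical training contexts, loosely in the spirit of the "lead" slide.
data = [({"narrow", "vein"}, "A"), ({"mining", "vein"}, "A"),
        ({"actor", "role"}, "B"), ({"role", "story"}, "B")]
dlist = build_decision_list(data)
print(disambiguate({"vein"}, dlist))  # prints A: "vein" is a strong sense-A cue
```

Note that a cue seen only once still gets a nonzero (weak) score thanks to smoothing, matching the slide's point that a weak cue is trusted when nothing better is present.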
Yarowsky's bootstrapping algorithm   [slides courtesy of D. Yarowsky (modified)]

Unsupervised learning! There is a very readable paper at http://cs.jhu.edu/~yarowsky/acl95.ps; the algorithm is sketched on the following slides ...

First, find a pair of "seed words" that correlate well with the 2 senses

If "plant" really has 2 senses, it should appear in:
- 2 dictionary entries: pick content words from those.
- 2 thesaurus entries: pick synonyms from those.
- 2 different clusters of documents: pick representative words from those.
- 2 translations in parallel text: use the translations as seed words.
Or just have a human name the seed words (maybe from a list of words that occur unusually often near "plant") - that makes the method "minimally supervised."

Target word: plant   [table taken from Yarowsky (1995)]
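The bootstrapping loop that starts from these seed words can be sketched as follows. This is a self-contained, heavily simplified illustration, not the algorithm as published: it retrains a decision list from the seed-labeled tokens and repeatedly labels the remaining tokens whose best matching cue is confident enough. The smoothing constant, confidence threshold, and toy "plant" contexts are assumptions, and the paper's one-sense-per-discourse constraint is omitted.

```python
import math
from collections import defaultdict

def train_decision_list(labeled, alpha=0.1):
    """Map each cue to (sense, score) from (cues, sense) training pairs;
    score is the absolute smoothed log-likelihood ratio log p(cue|A)/p(cue|B)."""
    counts = {"A": defaultdict(int), "B": defaultdict(int)}
    totals = {"A": 0, "B": 0}
    for cues, sense in labeled:
        totals[sense] += 1
        for cue in cues:
            counts[sense][cue] += 1
    vocab = set(counts["A"]) | set(counts["B"])
    dlist = {}
    for cue in vocab:
        pa = (counts["A"][cue] + alpha) / (totals["A"] + alpha * len(vocab))
        pb = (counts["B"][cue] + alpha) / (totals["B"] + alpha * len(vocab))
        llr = math.log(pa / pb)
        dlist[cue] = ("A" if llr > 0 else "B", abs(llr))
    return dlist

def yarowsky_bootstrap(tokens, seed_a, seed_b, rounds=10, threshold=1.0):
    """tokens: one set of context cues per occurrence of the target word."""
    # 1. Label only the tokens that contain a seed cue.
    labels = {i: "A" for i, c in enumerate(tokens) if seed_a in c}
    labels.update({i: "B" for i, c in enumerate(tokens) if seed_b in c})
    for _ in range(rounds):
        # 2. Retrain a decision list on the currently labeled tokens.
        dlist = train_decision_list([(tokens[i], s) for i, s in labels.items()])
        # 3. Relabel: the best-scoring matching cue decides, if confident.
        new_labels = dict(labels)
        for i, cues in enumerate(tokens):
            if i in labels:
                continue
            matches = [(dlist[c][1], dlist[c][0]) for c in cues if c in dlist]
            if matches:
                score, sense = max(matches)
                if score >= threshold:
                    new_labels[i] = sense
        if new_labels == labels:   # 4. Converged: no new confident labels.
            break
        labels = new_labels
    return labels

# Hypothetical contexts of "plant"; seeds as on the slide ("life"/"manufacturing").
tokens = [
    {"life", "animal"},              # labeled A by the seed "life"
    {"manufacturing", "equipment"},  # labeled B by the seed "manufacturing"
    {"animal", "species"},           # unlabeled; shares "animal" with token 0
    {"equipment", "automate"},       # unlabeled; shares "equipment" with token 1
]
print(yarowsky_bootstrap(tokens, seed_a="life", seed_b="manufacturing"))
# prints {0: 'A', 1: 'B', 2: 'A', 3: 'B'}
```

The key property illustrated is that cues learned from the seed-labeled tokens ("animal", "equipment") spread the labels to tokens that never contained a seed word at all.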