CS 388: Natural Language Processing: Word Sense Disambiguation
Lexical Ambiguity
Motivation for Word Sense Disambiguation (WSD)
Sense Inventory
WordNet
WordNet Synset Relationships
EuroWordNet
WordNet Senses
Senses Based on Needs of Translation
Learning for WSD
Feature Engineering
Contextual Features
Surrounding Bag of Words
POS of Neighboring Words
Local Collocations
Syntactic Relations (Ambiguous Verbs)
Syntactic Relations (Ambiguous Nouns)
Syntactic Relations (Ambiguous Adjectives)
Using Syntax in WSD
Evaluation of WSD
Lexical Sample vs. All Word Tagging
WSD “line” Corpus
Senses of “line”
Experimental Data for WSD of “line”
Learning Algorithms
Nearest-Neighbor Learning Algorithm
K Nearest-Neighbor
Similarity Metrics
3 Nearest Neighbor Illustration (Euclidean Distance)
Perceptron
Decision Tree Learning
Rule Learning
Decision List Learning
Decision Lists and Language
Evaluating Categorization
N-Fold Cross-Validation
Learning Curves
N-Fold Learning Curves
Learning Curves for WSD of “line”
Discussion of Learning Curves for WSD of “line”
Train Time Curves for WSD of “line”
Discussion of Train Time Curves for WSD of “line”
Test Time Curves for WSD of “line”
Discussion of Test Time Curves for WSD of “line”
Senseval
Senseval 1: 1998
Senseval 1 English Sense Inventory
Senseval Metrics
Senseval 1 Overall English Results
Senseval 2: 2001
Senseval 2 Results
Slide 52
Slide 53
Ensemble Models
Senseval 3: 2004
Senseval 3 English Lexical Sample
Senseval 3: English All Words Task
Other Approaches to WSD
Issues in WSD

CS 388: Natural Language Processing: Word Sense Disambiguation
Raymond J. Mooney
University of Texas at Austin

Lexical Ambiguity
• Most words in natural languages have multiple possible meanings.
  – “pen” (noun)
    • The dog is in the pen.
    • The ink is in the pen.
  – “take” (verb)
    • Take one pill every morning.
    • Take the first right past the stoplight.
• Syntax helps distinguish meanings for different parts of speech of an ambiguous word.
  – “conduct” (noun or verb)
    • John’s conduct in class is unacceptable.
    • John will conduct the orchestra on Thursday.

Motivation for Word Sense Disambiguation (WSD)
• Many tasks in natural language processing require disambiguation of ambiguous words.
  – Question Answering
  – Information Retrieval
  – Machine Translation
  – Text Mining
  – Phone Help Systems
• Understanding how people disambiguate words is an interesting problem that can provide insight in psycholinguistics.

Sense Inventory
• What is a “sense” of a word?
  – Homonyms (disconnected meanings)
    • bank: financial institution
    • bank: sloping land next to a river
  – Polysemes (related meanings with joint etymology)
    • bank: financial institution as corporation
    • bank: a building housing such an institution
• Sources of sense inventories
  – Dictionaries
  – Lexical databases

WordNet
• A detailed database of semantic relationships between English words.
• Developed by famous cognitive psychologist George Miller and a team at Princeton University.
• About 144,000 English words.
• Nouns, adjectives, verbs, and adverbs grouped into about 109,000 synonym sets called synsets.

WordNet Synset Relationships
• Antonym: front → back
• Attribute: benevolence → good (noun to adjective)
• Pertainym: alphabetical → alphabet (adjective to noun)
• Similar: unquestioning → absolute
• Cause: kill → die
• Entailment: breathe → inhale
• Holonym: chapter → text (part to whole)
• Meronym: computer → cpu (whole to part)
• Hyponym: plant → tree (specialization)
• Hypernym: apple → fruit (generalization)

EuroWordNet
• WordNets for
  – Dutch
  – Italian
  – Spanish
  – German
  – French
  – Czech
  – Estonian

WordNet Senses
• WordNet senses (like many dictionary senses) tend to be very fine-grained.
• “play” as a verb has 35 senses, including
  – play a role or part: “Gielgud played Hamlet”
  – pretend to have certain qualities or state of mind: “John played dead.”
• Difficult to disambiguate to this level for people and computers. Only expert lexicographers are perhaps able to reliably differentiate senses.
• Not clear such fine-grained senses are useful for NLP.
• Several proposals for grouping senses into coarser, easier to identify senses (e.g. homonyms only).

Senses Based on Needs of Translation
• Only distinguish senses that translate to different words in some other language.
  – play: tocar vs. jugar
  – know: conocer vs. saber
  – be: ser vs. estar
  – leave: salir vs. dejar
  – take: llevar vs. tomar vs. sacar
• May still require overly fine-grained senses
  – river in French is either:
    • fleuve: flows into the ocean
    • rivière: does not flow into the ocean

Learning for WSD
• Assume part-of-speech (POS), e.g. noun, verb, adjective, for the target word is determined.
• Treat as a classification problem with the appropriate potential senses for the target word given its POS as the categories.
• Encode context using a set of features to be used for disambiguation.
• Train a classifier on labeled data encoded using these features.
• Use the trained classifier to disambiguate future instances of the target word given their contextual features.

Feature Engineering
• The success of machine learning requires instances to be represented using an effective set of features that are correlated with the categories of interest.
• Feature engineering can be a laborious process that requires substantial human expertise and knowledge of the domain.
• In NLP it is common to extract many (even thousands of) potential features and use a learning algorithm that works well with many relevant and irrelevant features.

Contextual Features
• Surrounding bag of words
• POS of neighboring words
• Local collocations
• Syntactic relations
Experimental evaluations indicate that all of these features are useful, and the best results come from integrating all of these cues in the disambiguation process.

Surrounding Bag of Words
• Unordered individual words near the ambiguous word.
• Words in the same sentence.
• May include words in the previous sentence or surrounding paragraph.
• Gives general topical cues of the context.
• May use feature selection to determine a smaller set of words that help discriminate possible senses.
• May just remove common “stop words” such as articles, prepositions, etc.

POS of Neighboring Words
• Use part-of-speech of immediately neighboring words.
• Provides evidence of local syntactic context.
• P_{-i} is the POS of the word i positions to the left of the target word.
• P_i is the POS of the word i positions to
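
The feature encoding and classification steps described in "Learning for WSD", "Surrounding Bag of Words", and "POS of Neighboring Words" can be sketched roughly as follows. This is a minimal illustration, not the method used in the course: the stop-word list, the window size, the feature-name format, the POS tags, the sense labels, and the overlap-based nearest-neighbor rule are all assumptions made for the example. In particular, P_i for positive i is assumed to refer to the word i positions to the right of the target, mirroring the definition of P_{-i}.

```python
# Illustrative sketch only: the stop-word list, window size, and feature
# names below are assumptions, not taken from the slides.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "on", "to", "and"}


def extract_features(tagged, target_index, window=2):
    """Encode the context of tagged[target_index] as a set of string features.

    tagged is a list of (word, pos) pairs for one sentence.
    """
    features = set()

    # Surrounding bag of words: unordered words in the same sentence,
    # excluding the target itself and common stop words.
    for i, (word, _pos) in enumerate(tagged):
        if i != target_index and word.lower() not in STOP_WORDS:
            features.add("bow=" + word.lower())

    # POS of neighboring words: a negative offset is to the left of the
    # target; a positive offset is assumed to be to the right.
    for offset in range(-window, window + 1):
        j = target_index + offset
        if offset != 0 and 0 <= j < len(tagged):
            features.add("P%+d=%s" % (offset, tagged[j][1]))

    return features


def classify(features, training_examples):
    """Return the sense of the training example sharing the most features.

    A 1-nearest-neighbor rule using feature overlap as the similarity;
    training_examples is a list of (feature_set, sense) pairs.
    """
    best = max(training_examples, key=lambda ex: len(features & ex[0]))
    return best[1]


# Toy labeled data for "pen", based on the example sentences in the slides;
# the POS tags and sense labels are hand-assigned for illustration.
train = [
    ([("The", "DT"), ("dog", "NN"), ("is", "VBZ"), ("in", "IN"),
      ("the", "DT"), ("pen", "NN")], 5, "enclosure"),
    ([("The", "DT"), ("ink", "NN"), ("is", "VBZ"), ("in", "IN"),
      ("the", "DT"), ("pen", "NN")], 5, "writing_instrument"),
]
training_examples = [(extract_features(s, i), sense) for s, i, sense in train]

# A new, hypothetical sentence containing the ambiguous word at index 1.
test_sentence = [("The", "DT"), ("pen", "NN"), ("leaked", "VBD"),
                 ("ink", "NN")]
test_features = extract_features(test_sentence, 1)
predicted = classify(test_features, training_examples)
```

Representing each instance as a set of string features keeps the sketch independent of any particular learner. A real system would more likely map these features to a sparse vector and train one of the learning algorithms listed in the outline on a much larger labeled corpus.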

