SWARTHMORE CS 97 - Wordnet Word Sense Disambiguation Using an Automatically Generated Ontology (9 pages)

Wordnet Word Sense Disambiguation Using an Automatically Generated Ontology

Sven Olsen, Swarthmore College, solsen1@swarthmore.edu

Appeared in Proceedings of the Class of 2003 Senior Conference, pages 69-77, Computer Science Department, Swarthmore College.

Abstract

In this paper we present a word sense disambiguation method in which ambiguous words are first disambiguated to senses from an automatically generated ontology, and from there mapped to Wordnet senses. We use the clustering by committee algorithm to automatically generate sense clusters given untagged text. The content of each cluster is used to map ambiguous words from those clusters to Wordnet senses. The algorithm does not require any training data, but we suspect that performance could be improved by supplementing the text to be disambiguated with untagged text from a similar source. We compare our algorithm to a similar disambiguation scheme that does not make use of automatically generated senses, as well as to an intermediate algorithm that makes use of the automatically generated semantic categories but does not limit itself to the actual sense clusters. While the results we were able to gather show that the direct disambiguator outperforms our other two algorithms, there are a number of reasons not to give up hope in the approach.

1 Introduction

Word sense disambiguation algorithms are valuable because there are a number of tasks, such as machine translation and information extraction, for which being able to perform effective word sense disambiguation is helpful or even necessary. In order to fully define the task of word sense disambiguation (WSD), we need to know the set of senses associated with a given word. What set of senses ought to be associated with any word almost certainly depends on the context we are working in. In the case of automatic translation from English to another language, the best sense set for each word should be influenced by the set of translations of that word into the target language. Translation between distant languages, such as English and Inuit, might require much finer sense disambiguation than would be needed when going between related languages, such as English and German.

WSD becomes a much more tractable problem when we have some understanding of the semantics of the senses that we are disambiguating. For this reason, word sense disambiguation experiments are usually done assuming the sense sets of large ontologies such as Wordnet. Using Wordnet senses gives researchers access to information regarding the semantic relationships of the senses of different words, and many WSD algorithms rely on knowledge of these relationships. Using Wordnet senses may also make the act of sense disambiguation more useful; for example, an information extraction algorithm may take advantage of the semantic content implied by Wordnet senses. However, there are a number of reasons why Wordnet might not be the ideal ontology for any given task. If we try to use Wordnet in an information retrieval task, we may find that important technical terms are missing (O'Sullivan, 1995). If we try to use Wordnet for machine translation tasks, we may find that the sense distinctions are too fine. In a perfect world we would have a separate ontology specifically tailored for each task. However, compiling ontologies tends to be very difficult, and so Wordnet is still the de facto standard for most WSD experiments.

Naturally, there is a demand for algorithms that can automatically infer ontologies from text, thus providing researchers with an infinite set of viable alternatives to Wordnet. While no current automatically generated ontology can compete with Wordnet's fine sense distinctions, Pantel and Lin (2002) present an algorithm capable of generating sense groups of a quality similar to those in Roget's thesaurus (2002). Unlike Wordnet, this automatically generated ontology has no hierarchical information; instead, it simply provides groups of related word senses.

In this paper we present an algorithm which automatically generates an ontology given untagged text, and then disambiguates that text into the senses of the generated ontology. Thus we hope to provide researchers with a context-sensitive alternative to Wordnet-based disambiguation. We also outline a method for converting our senses to Wordnet senses. This allows us to disambiguate text to Wordnet senses by first disambiguating to the automatically generated senses and then mapping the results to Wordnet. Because we expect the automatically generated sense clusters to be coarser than those of Wordnet, and because the act of generating the senses leaves our algorithm with access to extra information regarding the ambiguous senses, we expect that disambiguating to the automatically generated senses will be easy.

There are ways in which our method of disambiguating to Wordnet senses might have advantages over more direct approaches. Because the senses used by our system are inferred from the text to be disambiguated, we can expect to avoid confusion caused by senses that never appear in our text. Additionally, our system has the advantage of requiring no tagged training data. Mapping the automatically generated senses to Wordnet senses may be complicated by the fact that the generated senses are coarser than Wordnet's; however, we expect the type of mistakes realized because of this to be similar to those mistakes that a human would make when tagging text with the often frustratingly fine Wordnet senses.
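As a concrete illustration of the cluster-to-Wordnet mapping step, the following sketch is not the implementation used in this work: it assumes that a generated sense cluster is simply a list of related words, uses NLTK's WordNet interface, and substitutes Wu-Palmer similarity for whatever scoring the full system would use; map_cluster_to_synset is a hypothetical helper, not a routine from this paper.

    # Minimal sketch: pick the WordNet synset of an ambiguous word that best
    # fits the other members of its automatically generated sense cluster.
    # Assumptions (not from the paper): a cluster is a plain list of words,
    # and Wu-Palmer similarity stands in for the real scoring function.
    # Requires the NLTK WordNet corpus data to be installed.
    from nltk.corpus import wordnet as wn

    def map_cluster_to_synset(target_word, cluster_words):
        best_synset, best_score = None, float("-inf")
        for candidate in wn.synsets(target_word):
            score = 0.0
            for other in cluster_words:
                if other == target_word:
                    continue
                # Best similarity between the candidate sense and any sense of
                # the other cluster member (wup_similarity is None across POS).
                sims = [candidate.wup_similarity(s) or 0.0 for s in wn.synsets(other)]
                score += max(sims, default=0.0)
            if score > best_score:
                best_synset, best_score = candidate, score
        return best_synset

    # A cluster grouping "plant" with industrial terms should map to the
    # factory sense rather than the botanical one.
    print(map_cluster_to_synset("plant", ["plant", "factory", "refinery", "mill"]))

In the full system the choice of similarity measure and the weighting of cluster members matter considerably; the sketch is only meant to show the overall shape of the mapping from a generated sense cluster to a Wordnet synset.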
2 Related Work

Lin (1994) introduced PRINCIPAR, a broad-coverage English parser that works using a message passing model. Among other things, PRINCIPAR can be made to output a set of dependency triples given any sentence. Recent work done using MiniPar, PRINCIPAR's publicly available successor, has shown that these dependency triples prove quite useful in the context of a number of different tasks. Lin (1997) introduces an algorithm for word sense disambiguation based on information from MiniPar's dependency triples. Lin (1998) includes an excellent articulation of the means through which the syntactic information represented by the dependency triples can be used to infer semantic knowledge. Papers such as our own and Pantel and Lin (2002) tend to rush their descriptions of the methods first outlined in that paper, and readers trying to implement our algorithms for themselves will be well served by referring back to it. Pantel and Lin (2002) presents an algorithm in which the information from the dependency triples is used to automatically generate sense clusters from untagged text.
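For readers unfamiliar with the dependency triples discussed above: each triple records a dependent word, the grammatical relation, and its head word. MiniPar is no longer easy to obtain, so the sketch below uses spaCy purely as an assumed stand-in to show what such triples look like; it is not the parser used in this work.

    # Illustrative only: (dependent, relation, head) triples in the spirit of
    # MiniPar's output, extracted here with spaCy (an assumption, not the
    # parser used in this paper).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed

    def dependency_triples(sentence):
        doc = nlp(sentence)
        return [(tok.text, tok.dep_, tok.head.text)
                for tok in doc if tok.dep_ != "ROOT"]

    for triple in dependency_triples("The committee clustered similar word senses."):
        print(triple)
    # e.g. ('committee', 'nsubj', 'clustered'), ('senses', 'dobj', 'clustered')

Pantel and Lin's algorithm builds distributional feature vectors from large numbers of such triples; the example is only meant to make the underlying data structure concrete.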

