SWARTHMORE CS 97 - Wordnet Word Sense Disambiguation Using an Automatically Generated Ontology (9 pages)

Wordnet Word Sense Disambiguation Using an Automatically Generated Ontology

Sven Olsen, Swarthmore College, solsen1@swarthmore.edu

Appeared in Proceedings of the Class of 2003 Senior Conference, pages 69-77, Computer Science Department, Swarthmore College.

Abstract

In this paper we present a word sense disambiguation method in which ambiguous words are first disambiguated to senses from an automatically generated ontology, and from there mapped to Wordnet senses. We use the clustering by committee algorithm to automatically generate sense clusters given untagged text. The content of each cluster is used to map ambiguous words from those clusters to Wordnet senses. The algorithm does not require any training data, but we suspect that performance could be improved by supplementing the text to be disambiguated with untagged text from a similar source. We compare our algorithm to a similar disambiguation scheme that does not make use of automatically generated senses, as well as to an intermediate algorithm that makes use of the automatically generated semantic categories but does not limit itself to the actual sense clusters. While the results we were able to gather show that the direct disambiguator outperforms our other two algorithms, there are a number of reasons not to give up hope in the approach.

1 Introduction

Word sense disambiguation algorithms are valuable because there are a number of tasks, such as machine translation and information extraction, for which being able to perform effective word sense disambiguation is helpful or even necessary. In order to fully define the task of word sense disambiguation (WSD), we need to know the set of senses associated with a given word. What set of senses ought to be associated with any word almost certainly depends on the context we are working in. In the case of automatic translation from English to another language, the best sense set for each word should be influenced by the set of translations of that word into the target language. Translation between distant languages, such as English and Inuit, might require much finer sense disambiguation than would be needed when going between related languages, such as English and German.

WSD becomes a much more tractable problem when we have some understanding of the semantics of the senses that we are disambiguating. For this reason, word sense disambiguation experiments are usually done assuming the sense sets of large ontologies such as Wordnet. Using Wordnet senses gives researchers access to information regarding the semantic relationships of the senses of different words, and many WSD algorithms rely on knowledge of these relationships. Using Wordnet senses may also make the act of sense disambiguation more useful; for example, an information extraction algorithm may take advantage of the semantic content implied by Wordnet senses. However, there are a number of reasons why Wordnet might not be the ideal ontology for any given task. If we try to use Wordnet in an information retrieval task, we may find that important technical terms are missing (O'Sullivan, 1995). If we try to use Wordnet for machine translation tasks, we may find that the sense distinctions are too fine. In a perfect world we would have a separate ontology specifically tailored for each task. However, compiling ontologies tends to be very difficult, and so Wordnet is still the de facto standard for most WSD experiments.

Naturally, there is a demand for algorithms that can automatically infer ontologies from text, thus providing researchers with an infinite set of viable alternatives to Wordnet. While no current automatically generated ontology can compete with Wordnet's fine sense distinctions, Pantel and Lin (2002) present an algorithm capable of generating sense groups of a quality similar to those in Roget's thesaurus (2002). Unlike Wordnet, this automatically generated ontology has no hierarchical information; instead, it simply provides groups of related word senses.

In this paper we present an algorithm which automatically generates an ontology given untagged text, and then disambiguates that text into the senses of the generated ontology. Thus we hope to provide researchers with a context-sensitive alternative to Wordnet-based disambiguation. We also outline a method for converting our senses to Wordnet senses. This allows us to disambiguate text to Wordnet senses by first disambiguating to the automatically generated senses and then mapping the results to Wordnet. Because we expect the automatically generated sense clusters to be coarser than those of Wordnet, and because the act of generating the senses leaves our algorithm with access to extra information regarding the ambiguous senses, we expect that disambiguating to the automatically generated senses will be easy.

There are ways in which our method of disambiguating to Wordnet senses might have advantages over more direct approaches. Because the senses used by our system are inferred from the text to be disambiguated, we can expect to avoid confusion caused by senses that never appear in our text. Additionally, our system has the advantage of requiring no tagged training data. Mapping the automatically generated senses to Wordnet senses may be complicated by the fact that the generated senses are coarser than Wordnet's; however, we expect the type of mistakes realized because of this to be similar to those mistakes that a human would make when tagging text with the often frustratingly fine Wordnet senses.
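As a concrete illustration of the cluster-to-Wordnet mapping step, the following sketch is not the implementation used in this work: it assumes that a generated sense cluster is simply a list of related words, uses NLTK's WordNet interface, and substitutes Wu-Palmer similarity for whatever scoring the full system would use; map_cluster_to_synset is a hypothetical helper, not a routine from this paper.

    # Minimal sketch: pick the WordNet synset of an ambiguous word that best
    # fits the other members of its automatically generated sense cluster.
    # Assumptions (not from the paper): a cluster is a plain list of words,
    # and Wu-Palmer similarity stands in for the real scoring function.
    # Requires the NLTK WordNet corpus data to be installed.
    from nltk.corpus import wordnet as wn

    def map_cluster_to_synset(target_word, cluster_words):
        best_synset, best_score = None, float("-inf")
        for candidate in wn.synsets(target_word):
            score = 0.0
            for other in cluster_words:
                if other == target_word:
                    continue
                # Best similarity between the candidate sense and any sense of
                # the other cluster member (wup_similarity is None across POS).
                sims = [candidate.wup_similarity(s) or 0.0 for s in wn.synsets(other)]
                score += max(sims, default=0.0)
            if score > best_score:
                best_synset, best_score = candidate, score
        return best_synset

    # A cluster grouping "plant" with industrial terms should map to the
    # factory sense rather than the botanical one.
    print(map_cluster_to_synset("plant", ["plant", "factory", "refinery", "mill"]))

In the full system the choice of similarity measure and the weighting of cluster members matter considerably; the sketch is only meant to show the overall shape of the mapping from a generated sense cluster to a Wordnet synset.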
2 Related Work

Lin (1994) introduced PRINCIPAR, a broad-coverage English parser that works using a message passing model. Among other things, PRINCIPAR can be made to output a set of dependency triples given any sentence. Recent work done using MiniPar, PRINCIPAR's publicly available successor, has shown that these dependency triples prove quite useful in the context of a number of different tasks. Lin (1997) introduces an algorithm for word sense disambiguation based on information from MiniPar's dependency triples. Lin (1998) includes an excellent articulation of the means through which the syntactic information represented by the dependency triples can be used to infer semantic knowledge. Papers such as our own and Pantel and Lin (2002) tend to rush their descriptions of the methods first outlined in that paper, and readers trying to implement our algorithms for themselves will be well served by referring back to it. Pantel and Lin (2002) presents an algorithm in which the information from the dependency triples is used to automatically generate sense clusters from untagged text.
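For readers unfamiliar with the dependency triples discussed above: each triple records a dependent word, the grammatical relation, and its head word. MiniPar is no longer easy to obtain, so the sketch below uses spaCy purely as an assumed stand-in to show what such triples look like; it is not the parser used in this work.

    # Illustrative only: (dependent, relation, head) triples in the spirit of
    # MiniPar's output, extracted here with spaCy (an assumption, not the
    # parser used in this paper).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed

    def dependency_triples(sentence):
        doc = nlp(sentence)
        return [(tok.text, tok.dep_, tok.head.text)
                for tok in doc if tok.dep_ != "ROOT"]

    for triple in dependency_triples("The committee clustered similar word senses."):
        print(triple)
    # e.g. ('committee', 'nsubj', 'clustered'), ('senses', 'dobj', 'clustered')

Pantel and Lin's algorithm builds distributional feature vectors from large numbers of such triples; the example is only meant to make the underlying data structure concrete.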

