Creating a Bilingual Ontology: A Corpus-Based Approach for Aligning WordNet and HowNet
Marine Carpuat, Grace Ngai, Pascale Fung, Kenneth W. Church

About this paper
• Creates a bilingual ontology by aligning WordNet with an existing Chinese ontology, HowNet.
• Borrows techniques used in information retrieval and machine translation.
• Aims to show that there is an efficient algorithm capable of aligning ontologies whose languages have very different structures.
• Relying on structural information within the ontologies:
– Not applicable to ontologies that have vastly different structures

A Bilingual Chinese-English Ontology
• Links the American English WordNet and the Simplified Chinese HowNet through their most basic concepts: the WordNet synset and the HowNet definition.
• Why were WordNet and HowNet picked?
– Structure
– Polysemous words
– Excellent test for the portability of the algorithm

WordNet
• Electronic lexical database.
• Differentiates word senses from each other through the use of synsets. Ex: “address”: {address, computer address}, {address, speech}
• Synsets are linked to other synsets through hierarchical relations (ex: hyponyms, hypernyms).
• A total of 109,377 synsets are defined.

HowNet
• Electronic lexical database.
• Mostly in Chinese, with some English technical terms (ex: ASCII).
• Synsets are not explicitly defined.
• Many words often belong to the same definition.
• 1500 basic definitions.
• A total of 16,788 word concepts, each composed of a subset of the basic definitions.

Want to know more?
• A detailed WordNet–HowNet structural comparison can be found in Wong & Fung (2002).

Word Sense Ambiguity Problem
• Finding the correct translation for polysemous words in Chinese and English was the biggest problem.
– Example: “crane”
• The ambiguity problem can be seen in a baseline experiment:
Step 1: Pick 2000 HowNet definitions (and their associated words) at random.
Step 2: Translate each of these words into English.
Step 3: Associate each of the translated English words with one synset in WordNet.

Result of Baseline Experiment
• For every definition in HowNet, there are on average about 5 Chinese words with that definition.
• For every definition in HowNet, there are on average about 8 associated WordNet synsets.

Average no. of HowNet entries per definition: 5.4
Average no. of WordNet synsets per definition: 8.1

Finer-Mapping Approach…
• Definition Match Algorithm (Knight & Luk, 1994)
o Compares words with their contexts from example sentences and definitions found in a dictionary.
o Uses word contexts from a large bilingual corpus.
• Fung & Lo's information retrieval-like method
o Compares word contexts across languages and corpora that need not be parallel.
o Effective at extracting bilingual word translation pairs.

Using Synsets for Word Sense Disambiguation
Goal of the algorithm: align the proper translation pair to the correct word sense.
• The candidate WordNet synsets are ranked according to their similarity with the Chinese HowNet definition.
• The alignment “winner” is defined as the highest-ranking WordNet synset.

Word Sense Alignment Method…
1. Given a HowNet definition d, first extract its associated set of Chinese words and their English translations.
2. For each word from the English translations, find all the WordNet synsets that it belongs to.
3. For each of these candidate WordNet synsets s,
a) If s contains only a single word (|s| = 1), expand it by adding words from its direct hyperset*.
b) Define: [equation not preserved in the extracted text]

What is a hyperset?
The set of hypernyms of the current word, which are included to aid in defining its meaning.

Why is it needed?
The algorithm works better with synsets that contain more entries. The more elements in the synset, the greater the value of Similarity(d, s).

Experiment…
• Bilingual data source: the English-Chinese Hong Kong News Corpus, which comprises 18,500 aligned article pairs from news documents released between 1997 and 2000.
* Over 6 million words on the English side.
* The entire HowNet vocabulary is used as a lexicon.
• The word list for context vector construction was extracted by taking the monosemous (single-meaning) words from WordNet.
• All words that had more than one translation in Chinese were thrown out.

Overall Result
• For each HowNet definition, the highest-scoring WordNet synset aligned to it and the corresponding alignment score are shown.
• The reverse mapping of WordNet synsets to HowNet definitions can also demonstrate the capabilities of the method.

Final Analysis
• A 1-to-1 mapping from all HowNet definitions to WordNet synsets does not exist.
• Seed word coverage (a seed word is a word that can be directly translated from one language to the other):
– Precise translation? (!! No !!)
– What about rare words? They create lots of blank fields.
• Non-compositional compounds (NCC) cause problems. Ex: floppy disk, hot dog
• Implement a stemming technique, to capture more accurately the way a word is used.

Conclusion and Future Work
• Does not make any assumptions about the structural alignment between the two ontologies.
• Expand the work to:
– Address the concerns in the analysis section
– Produce a full alignment from HowNet to WordNet
– Expand the algorithm with more structural information
– Examine the use of the aligned ontology in applications (cross-lingual information retrieval and machine
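
The steps of the word sense alignment method described in the slides (collect translations, gather candidate synsets, expand single-word synsets with their hypernym words, rank by similarity) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionaries, the identifiers such as `crane#bird`, the context vectors, and all function names are hypothetical and do not come from HowNet, WordNet, or the paper. In particular, the slides' own definition of Similarity(d, s) is not available in this text, so the sketch uses the mean pairwise cosine similarity between context vectors as a stand-in; it may behave differently from the original (for example, it does not grow with synset size).

```python
from math import sqrt

# Hypothetical toy data (NOT real HowNet/WordNet entries).
HOWNET_WORDS = {"def_bird": ["鹤"], "def_machine": ["起重机"]}
TRANSLATIONS = {"鹤": ["crane"], "起重机": ["crane", "derrick"]}
SYNSETS = {
    "crane#bird": ["crane"],
    "crane#machine": ["crane", "derrick"],
    "wading_bird#1": ["wader"],
}
HYPERNYM = {"crane#bird": "wading_bird#1"}  # direct hypernym synset

# Hypothetical context vectors over four seed-word dimensions.
CONTEXT = {
    "鹤": [3, 2, 0, 0],
    "起重机": [0, 0, 4, 3],
    "crane": [2, 1, 2, 1],
    "wader": [3, 1, 0, 0],
    "derrick": [0, 0, 3, 2],
}


def candidate_synsets(definition):
    """Steps 1-2: translate the definition's words, collect their synsets."""
    found = []
    for chinese in HOWNET_WORDS[definition]:
        for english in TRANSLATIONS.get(chinese, []):
            for synset_id, words in SYNSETS.items():
                if english in words and synset_id not in found:
                    found.append(synset_id)
    return found


def expanded_words(synset_id):
    """Step 3a: a single-word synset borrows words from its direct hypernym."""
    words = list(SYNSETS[synset_id])
    if len(words) == 1 and synset_id in HYPERNYM:
        words += [w for w in SYNSETS[HYPERNYM[synset_id]] if w not in words]
    return words


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(definition, synset_id):
    """Stand-in score (an assumption): mean pairwise cosine similarity."""
    scores = [
        cosine(CONTEXT[c], CONTEXT[e])
        for c in HOWNET_WORDS[definition] if c in CONTEXT
        for e in expanded_words(synset_id) if e in CONTEXT
    ]
    return sum(scores) / len(scores) if scores else 0.0


def align(definition):
    """Return the highest-ranking candidate synset for the definition."""
    return max(candidate_synsets(definition),
               key=lambda s: similarity(definition, s))


print(align("def_bird"))     # crane#bird
print(align("def_machine"))  # crane#machine
```

In this toy setup both definitions share the ambiguous translation "crane", and the context vectors of the other synset members ("wader", "derrick") are what separate the two senses, which is the role the slides assign to hyperset expansion.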


UA CSC 620 - Creating a Bilingual Ontology
