Ontology Alignment

Problem Statement
  Given:
    - N ontologies (O1, ..., On) in a particular domain
    - Different levels of coverage
  Goal:
    - Evaluate the commonality of entities
    - Rank entities

Challenges and Solutions
  - Ontology alignments
  - Largest Common Subgraph (LCS)
  - Vector Space Model (TF-IDF)
  - Accuracy of entities in aligned concepts
  - Ranking entities

LCS Algorithm for Multiple Ontologies
  1. Find the LCS for two ontologies
  2. Align the LCS with the other ontologies

Largest Common Subgraph (LCS): Algorithm between Two Ontologies
  Data structure for the LCS algorithm
  [Figure: two example ontology graphs over concept nodes C1 ... C7 and C'1 ... C'6]

Similarity Measure for Corresponding Entities
  - Node similarity
  - Structural similarity

  Ranked candidates for each concept (score as shown on the slide):

  Concept   Candidates
  C1        C'1 95   C'6 77   C'3 71   C'4 65   C'5 54   C'2 34
  C2        C'3 85   C'2 67   C'1 51   C'4 45   C'5 24   C'6 14
  C3        C'4 90   C'1 67   C'3 51   C'2 45   C'5 34   C'6 24
  C4        C'2 95   C'1 65   C'3 51   C'4 45   C'5 23   C'6 14

Node Similarity (Instance-based)
  Representing types using N-grams

Node Similarity (Name Match)
  Find common N-grams (N = 2) for corresponding columns.

  Table A:
    StrName          FENAME         Status
    LOCUSTGROVE DR   LOCUST GROVE   BUILT
    LOUISE LN        LOUISE         BUILT

  Table B (row alignment not recoverable from the extraction):
    Street:    TRAIL RANGE DR, CR45, MANE T CT
    Laddress:  1600, 2500
    Raddress:  1798, 2598

  N-gram types from A.StrName: LO, OC, CU, ST, ...
  N-gram types from B.Street:  TR, RA, R4, 45, ...

  Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham, Shashi Shekhar.
  "Content-Based Ontology Matching for GIS Datasets." ACM SIGSPATIAL International
  Conference on Advances in Geographic Information Systems (ACM GIS 2008), pages 407-410,
  Irvine, California, USA, November 2008.

Node Similarity (Instance-based)
  Visualizing entropy and conditional entropy

    H(C) = -\sum_i p_i \log p_i        for all x \in C1 \cup C2
    H(C|T), H(C|T) / H(C)              for all x \in C1 \cup C2 and t \in T

Node Similarity: Faults of this Method
  Semantically similar columns are not guaranteed to have a high similarity score.

  Columns A \in O1 and B \in O2:

    City           Country        ctyName        country
    Shanghai       China          Dallas         USA
    Beijing        China          Houston        USA
    Tokyo          Japan          Kingston       Jamaica
    New Delhi      India          Halifax        Canada
    Kuala Lumpur   Malaysia       Mexico City    Mexico

  2-grams extracted from A: Da, al, la, as, Ho, ou, us, ...
  2-grams extracted from B: Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, ...

Node Similarity (Instance-based): K-medoid + NGD Instance Similarity
  Step 1: Extract distinct keywords from the compared columns
  Step 2: Group distinct keywords together into semantic clusters
  Step 3: Calculate similarity = H(C|T) / H(C)

  C1 \in O1                      C2 \in O2
    roadName      City             Road          County
    Johnson Rd    Plano            Custer Pwy    Collin
    School Dr     Richardson       15th St       Collin
    Zeppelin St   Lakehurst        Parker Rd     Collin

  Keywords extracted from the columns (C1 \cup C2): Johnson, Rd, School, 15th, ...
  Example semantic clusters: {Johnson, School, ...}, {Rd, Dr, St, Pwy}

Node Similarity (Instance-based): Problems with K-medoid + NGD
  It is possible that two different geographic entities in the same location (e.g., Dallas, TX
  and Dallas County) will have a very low computed NGD value and thus be mistaken for being
  similar.

    roadName      City             Road               County
    Johnson Rd    Plano            Custer Pwy         Cooke
    School Dr     Richardson       15th St            Collin
    Zeppelin St   Lakehurst        Parker Rd          Collin
    Alma Dr       Richardson       Alma Dr            Collin
    Preston Rd    Addison          Campbell Rd        Denton
    Dallas Pkwy   Dallas           Harry Hines Blvd   Dallas

  Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham. "Semantic Schema Matching Without
  Shared Instances." To appear in Third IEEE International Conference on Semantic Computing,
  Berkeley, CA, USA, September 14-16, 2009.

Node Similarity (Instance-based): Using Geographic Type Information
  We use a gazetteer to determine the geographic type of an instance.
  [Figure: instances of O1 and O2 mapped to geotypes]

  Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham. "Geographically-Typed Semantic Schema
  Matching." To appear in ACM SIGSPATIAL International Conference on Advances in Geographic
  Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.

Node Similarity (Instance-based)
  Results of geographic matching over 2 separate road network data sources
  [Figure: results chart]

Structural Similarity

Structural Similarity Measurement I: Neighbor Similarity
  [Figure: two example ontology graphs; the neighbors of corresponding concepts are compared]

Structural Similarity Measurement I: isA / Properties Similarity
  [Figure: two example ontology graphs with typed edges: isA, subClass, hasColor, hasFlavor,
  hasTopping, hasDrink, hasFood]
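The slides list two RTC vectors for this example (apparently per-relation-type edge counts) followed by a similarity value, but the extracted text does not say which measure is used. A minimal sketch, assuming cosine similarity over the count vectors; the helper name `cosine` is hypothetical, and a relation type that is not listed for a concept is treated as a count of 0:

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse count vectors; absent keys count as 0."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Counts as listed on the slide for RTC1 and RTC2.
rtc1 = {"isA": 3, "subClass": 2, "hasFlavor": 1,
        "hasColor": 1, "hasFood": 0, "hasTopping": 1}
# hasTopping is not listed for RTC2; it is omitted here and so counts as 0
# (an assumption, not a value from the slide).
rtc2 = {"isA": 1, "subClass": 1, "hasFlavor": 2,
        "hasColor": 0, "hasFood": 1}

similarity = cosine(rtc1, rtc2)
print(round(similarity, 3))  # 0.661
```

Cosine is used here only because it ignores the overall number of edges and compares the mix of relation types; any other vector similarity could be substituted.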
  RTC1 = {3 isA, 2 subClass, 1 hasFlavor, 1 hasColor, 0 hasFood, 1 hasTopping}
  RTC2 = {1 isA, 1 subClass, 2 hasFlavor, 0 hasColor, 1 hasFood}
  Similarity

Results of Pairwise Ontology Matching (I3CON Benchmark)
  - Matching using name similarity, RTS
  - Matching using name similarity, RTS and neighbor

Ontology Matching: Vector Space Model (VSM)
  Define the VSM for each entity: a collection of the words in its label, edge types, comment
  and neighbors.
  [Figure: the two example ontology graphs with typed edges]

    VSM(C1)  = {1 C1, 1 C2, 1 C3, 1 C5, 1 C6, 1 isA, 2 subClass, 1 hasFlavor}
    VSM(C'1) = {1 C'3, C'4, 1 C'5, 1 isA, 2 hasFlavor}

Ontology Matching: Vector Space Model (VSM)
  - Update the VSM by word score using TF-IDF
  - Calculate the cosine similarity for corresponding entities: cos(VSM(C1), VSM(C2))

Aligned Concepts
  - Aggregate different ontologies
  - Example
  [Figure: example of aligned concepts]

Statistical Model: Aligned Concepts
  - Calculate the probabilities of appearance of each entity in GO
  - Use maximum likelihood estimation
  - Calculate ... and ...

Reification
  - Reification can be considered metadata about RDF/OWL statements.
  - Ontology alignment approaches rely on probabilistic measures to find matches between
    concepts in different ontologies.
  - Reification data can be attached to the alignment information to show the match factor
    between two concepts in OWL 2.
  - Advanced analytic algorithms can benefit from reification in establishing the relevance of
    search results.

OWL 2
  OWL 2 is an extension to OWL. Some of the new features in OWL 2 are:
  - Syntactic sugar (e.g., disjoint union of classes)
  - Property chains
  - Richer datatypes, data ranges
  - Qualified cardinality restrictions
  - New constructs that increase expressivity
  - Simple metamodeling capabilities
  - Extended annotation capabilities
  The following link lists all the new features in OWL 2:
  http://www.w3.org/TR/2009/REC-owl2-new-features-20091027/

Ontology Extraction from Text Documents

Problem Statement

Our Solution for Ontology Construction of Documents
  - Use a hierarchical clustering algorithm to build a hierarchy for documents:
      - Hierarchical Agglomerative Clustering (HAC)
      - Modified Self-Organizing Tree (MSOT)
      - Hierarchical Growing Self-Organizing Tree (HGSOT)
  - Assign a concept to each node in the hierarchy
      - Usage of WordNet

Concept Assignment
  Concept assignment to a document (LVQ1): a topic vector t is built by training with the ...
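For the hierarchy-building step, the plainest of the three listed options is HAC. The slides do not specify a document representation, similarity measure or linkage, so the following is only a sketch under assumptions: documents are bags of lower-cased words, similarity is cosine over raw term counts, and clusters are merged by average linkage. The function name `hac` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hac(docs: list[str]):
    """Average-linkage agglomerative clustering.

    Returns the merge tree as nested tuples of document indices.
    """
    if not docs:
        raise ValueError("need at least one document")
    vectors = [Counter(d.lower().split()) for d in docs]
    # Each cluster is (subtree, list of member document indices).
    clusters = [(i, [i]) for i in range(len(docs))]

    def linkage(a: list[int], b: list[int]) -> float:
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x][1], clusters[y][1])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


docs = [
    "ontology alignment of road networks",
    "ontology alignment of road datasets",
    "hierarchical clustering of text documents",
    "clustering text documents into a hierarchy",
]
tree = hac(docs)
print(tree)  # ((0, 1), (2, 3))
```

Each internal node of the returned tree corresponds to a node in the document hierarchy, which is where a concept would then be assigned.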