Ontology Alignment

Outline
◦ Ontology Alignment
◦ Problem Statement
◦ Challenges & Solutions
◦ LCS Algorithm for Multiple Ontologies
◦ Largest Common Subgraph (LCS) Algorithm between two Ontologies
◦ Data Structure for LCS Algorithm
◦ Node Similarity: Instance-based (Representing types using N-grams*)
◦ Node Similarity: Instance-based (Visualizing Entropy and Conditional Entropy)
◦ Node Similarity: Faults of this Method
◦ Structural Similarity
◦ Similarity Results of Pairwise Ontology Matching (I3CON Benchmark)
◦ Ontology Matching Vector Space Model (VSM)
◦ Aligned Concepts
◦ Reification
◦ OWL - 2
◦ Ontology Extraction from Text Documents
◦ Concept Assignment

Problem Statement
Given N ontologies (O1, …, ON)
◦ In a particular domain
◦ Different levels of coverage
Goal
◦ Evaluate commonality of entities
◦ Rank entities

Challenges & Solutions
Ontology alignments
◦ Largest Common Subgraph (LCS)
◦ Vector Space Model (TF/IDF)
Accuracy of entities in aligned concepts
◦ Ranking entities

LCS Algorithm for Multiple Ontologies
◦ Find the LCS for two ontologies
◦ Align the LCS with the other ontologies

Largest Common Subgraph (LCS) Algorithm between two Ontologies

Data Structure for LCS Algorithm
[Figure: nodes C1 to C7 and C’1 to C’6]
Similarity measure for corresponding entities: Node Similarity + Structural Similarity
Each row lists the candidate matches for one node as (node, candidate, similarity) triples, in descending order of similarity:
C1: (C1,C’1,.95), (C1,C’6,.77), (C1,C’3,.71), (C1,C’4,.65), (C1,C’5,.54), (C1,C’2,.34)
C2: (C2,C’3,.85), (C2,C’2,.67), (C2,C’1,.51), (C2,C’4,.45), (C2,C’5,.24), (C2,C’6,.14)
C3: (C3,C’4,.90), (C3,C’1,.67), (C3,C’3,.51), (C3,C’2,.45), (C3,C’5,.34), (C3,C’6,.24)
C4: (C4,C’2,.95), (C4,C’1,.65), (C4,C’3,.51), (C4,C’4,.45), (C4,C’5,.23), (C4,C’6,.14)
C5: (C5,C’4,.80), (C5,C’1,.67), (C5,C’3,.65), (C5,C’2,.35), (C5,C’5,.34), (C5,C’6,.24)
C6: (C6,C’1,.20), (C6,C’1,.15), (C6,C’3,.12), (C6,C’2,.12), (C6,C’5,.09), (C6,C’6,.08)
C7: (C7,C’4,.31), (C7,C’1,.25), (C7,C’3,.23), (C7,C’2,.15), (C7,C’5,.14), (C7,C’6,.12)

Node Similarity: Instance-based
Representing types using N-grams*
Node Similarity (Name-Match)
◦ Find common N-grams (N = 2) for corresponding columns

Table A
StrName            FENAME          Status
LOCUST-GROVE DR    LOCUST GROVE    BUILT
LOUISE LN          LOUISE          BUILT

Table B
Street             Laddress    Raddress
TRAIL RANGE DR     1600        1798
CR45/MANET CT      2500        2598

N-gram types from A.StrName = {LO, OC, CU, ST, …}
N-gram types from B.Street = {TR, RA, R4, 5/, …}

*Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham & Shashi Shekhar, “Content Based Ontology Matching for GIS Datasets”, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), pp. 407-410, Irvine, California, USA, November 2008.

Node Similarity: Instance-based
Visualizing Entropy and Conditional Entropy
H(C) = -Σ_i p_i log p_i, for all x ∈ C1 ∪ C2
H(C | T) = H(C, T) - H(T), for all x ∈ C1 ∪ C2 and t ∈ T

Node Similarity: Faults of this Method
• Semantically similar columns are not guaranteed to have a high similarity score

Table A ∈ O1
City           Country
Dallas         USA
Houston        USA
Kingston       Jamaica
Halifax        Canada
Mexico City    Mexico

Table B ∈ O2
ctyName         country
Shanghai        China
Beijing         China
Tokyo           Japan
New Delhi       India
Kuala Lumpur    Malaysia

2-grams extracted from A: {Da, al, la, as, Ho, ou, us, …}
2-grams extracted from B: {Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, …}
[Figure legend: Column 1, Column 2]

Step 1: Extract distinct keywords from the compared columns
Keywords extracted from columns = {Johnson, Rd., School, 15th, …}
Step 2: Group distinct keywords together into semantic clusters
◦ “Rd.”, “Dr.”, “St.”, “Pwy”, …
◦ “Johnson”, “School”, “Dr.”, …
Step 3: Calculate similarity
Similarity = H(C | T) / H(C), with C1 ∈ O1 and C2 ∈ O2
[Figure: C1, C2, C1 ∪ C2]

Column C1
roadName        City
Johnson Rd.     Plano
School Dr.      Richardson
Zeppelin St.    Lakehurst

Column C2
Road           County
Custer Pwy     Collin
15th St.       Collin
Parker Rd.     Collin

Node Similarity: Instance-based
K-medoid + NGD instance similarity

Node Similarity: Instance-based
Problems with K-medoid + NGD*
Two different geographic entities in the same location (e.g., Dallas, TX and Dallas County) can have a very low computed NGD value and thus be mistaken for being similar:

roadName        City
Johnson Rd.     Plano
School Dr.      Richardson
Zeppelin St.    Lakehurst
Alma Dr.        Richardson
Preston Rd.     Addison
Dallas Pkwy     Dallas

Road                 County
Custer Pwy           Cooke
15th St.             Collin
Parker Rd.           Collin
Alma Dr.             Collin
Campbell Rd.         Denton
Harry Hines Blvd.    Dallas

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Semantic Schema Matching Without Shared Instances,” to appear in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA, September 14-16, 2009.

Node Similarity: Instance-based
Using geographic type information*
We use a gazetteer to determine the geographic type of an instance:
[Figure: O1, O2, Geotypes]

*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” to appear in ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.

Node Similarity: Instance-based
Results of Geographic Matching Over 2 Separate Road Network Data Sources

Structural Similarity
◦ Structural Similarity Measurement
I. Neighbor Similarity
[Figure: nodes C’1, C’3, C’4, C’5 and C1, C2, C3, C5, C6]

Structural Similarity
◦ Structural Similarity Measurement
I. Properties Similarity
[Figure: nodes C1 to C7 and C’1 to C’6, with edges labelled isA, subClass, hasFlavor, hasColor, hasFood, hasDrink, hasTopping]
RTC1 = [3 isA, 2 subClass, 1 hasFlavor, 1 hasColor, 0 hasFood, 1 hasTopping]
RTC2 = [1 isA, 1 subClass, 2 hasFlavor, 0 hasColor, 1 hasFood]

Similarity Results of Pairwise Ontology Matching (I3CON Benchmark)
◦ Matching using Name Similarity + RTS
◦ Matching using Name Similarity +
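The relation-type vectors RTC1 and RTC2 on the Properties Similarity slide can be compared numerically. The slides do not say which measure is used, so the sketch below assumes cosine similarity purely for illustration. The names `RELATION_TYPES`, `to_vector`, and `cosine_similarity` are hypothetical, and the missing hasTopping entry in RTC2 is assumed to be 0.

```python
import math

# Fixed ordering of relation types; the order itself is an arbitrary choice.
RELATION_TYPES = ["isA", "subClass", "hasFlavor", "hasColor", "hasFood", "hasTopping"]

# Counts copied from the slide. RTC2 lists no hasTopping entry; 0 is assumed.
RTC1 = {"isA": 3, "subClass": 2, "hasFlavor": 1, "hasColor": 1, "hasFood": 0, "hasTopping": 1}
RTC2 = {"isA": 1, "subClass": 1, "hasFlavor": 2, "hasColor": 0, "hasFood": 1}


def to_vector(counts):
    """Turn a {relation type: count} mapping into a list in RELATION_TYPES order."""
    return [counts.get(relation, 0) for relation in RELATION_TYPES]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a node with no relations is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


rt_similarity = cosine_similarity(to_vector(RTC1), to_vector(RTC2))
print(round(rt_similarity, 3))
```

Under these assumptions the dot product is 7 and the norms are 4 and sqrt(7), so the score is sqrt(7)/4, roughly 0.661. Any other vector measure could be substituted in `cosine_similarity` without changing the rest of the sketch.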
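The instance-based node similarity from the N-gram and entropy slides, Similarity = H(C | T) / H(C), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes C is the column (C1 or C2) that an N-gram occurrence comes from, T is the N-gram type, and H(C | T) is computed as H(C, T) - H(T). The function names `ngrams`, `entropy`, and `ebd_similarity` are hypothetical.

```python
import math
from collections import Counter


def ngrams(value, n=2):
    """All overlapping character n-grams of a string, in order."""
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def ebd_similarity(column_1, column_2, n=2):
    """H(C | T) / H(C) over the n-gram occurrences of two columns (assumed reading)."""
    joint = Counter()  # counts of (column label, n-gram type) pairs
    for label, column in (("C1", column_1), ("C2", column_2)):
        for value in column:
            for gram in ngrams(value, n):
                joint[(label, gram)] += 1

    column_counts = Counter()
    type_counts = Counter()
    for (label, gram), count in joint.items():
        column_counts[label] += count
        type_counts[gram] += count

    h_c = entropy(column_counts.values())
    if h_c == 0:
        return 0.0  # one column has no n-grams, so there is nothing to compare
    h_c_given_t = entropy(joint.values()) - entropy(type_counts.values())
    return h_c_given_t / h_c


# Column values taken from the N-gram slide.
a_str_name = ["LOCUST-GROVE DR", "LOUISE LN"]
b_street = ["TRAIL RANGE DR", "CR45/MANET CT"]
score = ebd_similarity(a_str_name, b_street)
print(score)
```

With this reading, two columns with identical N-gram distributions score 1 (knowing the N-gram type says nothing about the column), and two columns with no N-grams in common score 0. The ratio does not depend on the logarithm base, so base 2 is used only for convenience.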