UTD CS 7301 - Geographically-Typed Geospatial Data Source Matching with High-Quality


Geographically-Typed Geospatial Data Source Matching with High-Quality Clustering and Multi-Attribute Matching
Jeffrey Partyka
Dr. Latifur Khan
Dr. Bhavani Thuraisingham
Funded by NGA & US Air Force

Topic Outline
• Problem Statement
• Background Information
• Matching Procedures
  - Generalized Solution
  - N-grams
  - Non-Geographic Matching (NGT Matching)
  - Geographic Matching (GT Matching)
  - Attribute Weighting
  - High-Quality Clustering
  - 1:N Matching
• Experimental Results
• Future Work

Motivation
• Internet Architecture
  ▫ Highly Distributed
  ▫ Federated Architecture
• Web Application Problems
  ▫ Low Performance for Information Retrieval
  ▫ Accuracy of Retrieved Information

Sample Scenario
Query: Publication of Academic Staff
Rank Data Source: MIT Ontology, Karlsruhe Ontology, UMBC Ontology
{Article, Book, Booklet, InBook, InCollection, InProceedings, Manual, Misc, Proceedings, Report, Technical Report, Project Report, Thesis, Master Thesis, PhD Thesis, Unpublished, Faculty Member, Lecturer}

Different Bibliography Ontologies
[Figure: MIT Ontology, Karlsruhe Ontology, UMBC Ontology]

Problem Statement: Schema Matching
Given 2 data sources, S1 and S2, each of which is composed of a set of tables where {T11, T12, T13…T1k…T1m} ∈ S1 and {T21, T22, T23…T2j…T2n} ∈ S2, with 1 <= k <= m and 1 <= j <= n, determine the similarity between T1k and T2j.

Example tables (labeled S1 and S2 on the slide):

  roadName      City
  Johnson Rd.   Plano
  School Dr.    Richardson
  Zeppelin St.  Lakehurst
  Alma Dr.      Richardson

  Road          County
  Custer Pwy    Cooke
  15th St.      Collin
  Parker Rd.    Collin
  Alma Dr.      Collin

Further example tables:

  COUNTY        Destination
  SNOHOMISH     Mukilteo
  PIERCE        Point Defiance
  KITSAP        Southworth
  SNOHOMISH     Edmonds

  City           County
  Anacortes      Skagit
  Friday Harbor  San Juan
  Argyle         San Juan
  Kirkland       King

Labels on the slide: Road, Road

Problem Statement: Ontology Matching
Given 2 ontologies, O1 and O2, each of which is composed of a set of concepts where {C11, C12, C13…C1k…C1m} ∈ O1 and {C21, C22, C23…C2j…C2n} ∈ O2, with 1 <= k <= m and 1 <= j <= n, determine the similarity between C1k and C2j.

Motivating Scenarios
1. Making Complex Business Decisions
   “Should we invest in a new cholesterol drug for the Asia-Pacific region?”
   Diagram labels: R & D, Corporate Marketing, Regulatory Affairs, Manufacturing, Yes/No/Maybe?
2. Robust Semantic Web Applications
   “Find the group of friends around Jeff. Then find the most important person out of the group. Find out if this person was at an event of type Meeting that happened between 9AM-11AM within 5 miles of UTD.”
   Diagram labels: Jeff, Jeff’s friends; Within 5 miles of UTD; 9:00am-11:00am; Event of Type ‘Meeting’; Social Network; Geospatial Ontology; Temporal Logic; RDFS Lookup; Yes/No/Maybe?

Matching Approaches
Mappings may be generated in several ways. Some approaches are:
1) Name Matching
2) Structure Matching
3) Instance Matching

Examples shown on the slide:

  Email    emailAddress

  County    DSP
  Kitsap    Kingston
  Wahkiak   Puget Island

  COUNTYNAME      CID
  TRAIL RANGE DR  96
  KITSAP          97

  ?

Some Definitions
Definition 1 (attribute): An attribute of a table T, denoted as att(T), is defined as a property of T that further describes it.
Definition 2 (instance): An instance x of an attribute att(T) is defined as a data value associated with att(T).
Definition 3 (keyword): A keyword k of an instance x associated with attribute att(T) is defined as a meaningful word (not a stopword) representing a portion of the instance.

Some Definitions (cont)
Definition 4a (geographic type (GT)): A geographic type GT associated with attribute att(T) is defined as a class of instances of att(T) that represent the same geographic feature (e.g., “lake”, “road”).
Definition 4b (non-geographic type (NGT)): A non-geographic type NGT associated with attribute att(T) is defined as a group of keywords from instances of att(T) that are semantically related to each other.
Example keywords: Collin, Plano, Richardson, New Jersey, Trenton, Monmouth

Overview of Matching Algorithm
1. Select attribute pairs for comparison
2. Match instances between compared attributes
3. Determine final attribute similarity
Attributes shown: roadName, roadType, city; town, rType, rName, county
Compared pair: roadName and rName
Instances shown: K Ave., Jupiter Rd., Coit Rd., L Ave., LBJ Freeway, US 75
Run Sim algorithms… EBD = .98

Determining Semantic Similarity
• We use Entropy-Based Distribution (EBD)
• EBD is a measurement of type similarity between 2 attributes (or columns):
    EBD = H(C|T) / H(C)
• EBD takes values in the range [0,1]. Greater EBD corresponds to more similar type distributions between compared attributes (columns)

Applying EBD to Semantic Matching
att1: X X X Y Y Z
att2: X X Y Y Y Z
Additional type labels in the figure: XXXYYZYYYXXZYYXYYYXXXXZZ
Entropy = H(C) = [value not preserved in this preview]
Conditional Entropy = H(C|T) = [value not preserved in this preview]

Matching Using N-grams
• Use commonly occurring N-grams [2,3] in compared attributes to determine similarity (N = 2)

Table A (TA):

  StrName          FENAME        Status
  LOCUST-GROVE DR  LOCUST GROVE  BUILT
  TRAIL RANGE DR   TRAIL RANGE   BUILT

Table B (TB):

  Street            Laddress  Raddress
  LOUISE -DOVER DR  1600      1798
  CR45/MANET CT     2500      2598

Some N-grams extracted from A.StrName = {LO, OC, CU, ST, OV…..}
Some N-grams extracted from B.Street = {LO, OU, UI, OV,…..}
N-grams shown in the figure: LO, LO, OV, OV, ST, UI
Conditional Entropy = H(C|T) = [value not preserved in this preview]

[2] Jeffrey Partyka,
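
The N-gram matching idea and the EBD ratio from the slides above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes EBD = H(C|T) / H(C), that C is the column an N-gram came from, and that T is the N-gram itself. The function names `ngrams`, `entropy`, and `ebd`, and the choice to take N-grams within alphanumeric words only, are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Character n-grams taken within each alphanumeric word of `text`."""
    words = re.findall(r"[A-Z0-9]+", text.upper())
    return [w[i:i + n] for w in words for i in range(len(w) - n + 1)]


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def ebd(instances_a, instances_b, n=2):
    """Assumed form: EBD = H(C|T) / H(C), with C = column, T = n-gram."""
    counts = {"A": Counter(), "B": Counter()}
    for label, instances in (("A", instances_a), ("B", instances_b)):
        for inst in instances:
            counts[label].update(ngrams(inst, n))

    size_a = sum(counts["A"].values())
    size_b = sum(counts["B"].values())
    total = size_a + size_b
    h_c = entropy([size_a, size_b]) if total else 0.0
    if h_c == 0:
        return 0.0  # at least one column has no n-grams; nothing to compare

    # H(C|T): entropy of the column label within each n-gram,
    # weighted by how often that n-gram occurs overall.
    h_c_given_t = 0.0
    for gram in set(counts["A"]) | set(counts["B"]):
        in_a, in_b = counts["A"][gram], counts["B"][gram]
        h_c_given_t += (in_a + in_b) / total * entropy([in_a, in_b])
    return h_c_given_t / h_c


# Instances taken from the StrName and Street columns on the slide.
str_name = ["LOCUST-GROVE DR", "TRAIL RANGE DR"]
street = ["LOUISE -DOVER DR", "CR45/MANET CT"]
score = ebd(str_name, street)
print(round(score, 3))
```

Under these assumptions, two columns with identical N-gram counts score 1 and two columns with no N-grams in common score 0, which matches the slide's statement that a greater EBD means more similar type distributions.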

