UIC CS 583 - Information Integration and Synthesis

Outline:
- Chapter 10: Information Integration and Synthesis
- Information integration
- Global Query Interface
- Constructing global query interface (QI)
- Schema matching as correlation mining (He and Chang, KDD-04)
- Slide 6
- A clustering approach to schema matching (Wu et al. SIGMOD-04)
- Hierarchical Modeling
- Find 1:1 Mappings via Clustering
- “Bridging” Effect
- Complex Mappings
- Complex Mappings (Cont’d)
- Instance-based matching via query probing (Wang et al. VLDB-04)
- Query interface and result page
- Knowledge Synthesis
- Knowledge/Information Synthesis
- Bing search of “cell phone”
- Knowledge synthesis: a case study
- An example
- Exploiting information redundancy
- Each Web page is already organized
- Using language patterns to find sub-topics
- Put them together
- Additional techniques
- Some concept extraction results
- Finding concepts and sub-concepts
- PANKOW (Cimiano, Handschuh and Staab WWW-04)
- Steps
- Categorization step
- KnowItAll (Etzioni et al WWW-04 and AAAI-04)
- Syntactic patterns used in KnowItAll
- Main Modules of KnowItAll
- Summary

Chapter 10: Information Integration and Synthesis
CS583, Bing Liu

Information integration
- Many integration tasks:
  - Integrating Web query interfaces (search forms)
  - Integrating ontologies (taxonomies)
  - Integrating extracted data
  - Integrating textual information
  - …
- We only introduce the integration of query interfaces.
  - Many Web sites provide forms to query the deep Web.
  - Applications: meta-search and meta-query.

Global Query Interface
[Figure: query interfaces of united.com, airtravel.com, delta.com and hotwire.com]

Constructing global query interface (QI)
- A unified query interface should offer:
  - Conciseness: combine semantically similar fields across the source interfaces.
  - Completeness: retain source-specific fields.
  - User-friendliness: keep highly related fields close together.
- Two-phase integration:
  - Interface matching: identify semantically similar fields.
  - Interface integration: merge the source query interfaces.

Schema matching as correlation mining (He and Chang, KDD-04)
- Across many sources:
  - Synonym attributes are negatively correlated: they are semantic alternatives, and thus rarely co-occur in query interfaces.
  - Grouping attributes are positively correlated: they semantically complement each other, and thus often co-occur in query interfaces.
- Schema matching thus becomes a data mining problem (frequent itemset mining).

1. Positive correlation mining as potential groups
   Mining positive correlations: Last Name, First Name
2. Negative correlation mining as potential matchings
   Mining negative correlations: Author = {Last Name, First Name}
3. Matching selection as model construction
   Author (any) = {Last Name, First Name}
   Subject = Category
   Format = Binding

A clustering approach to schema matching (Wu et al. SIGMOD-04)
- Hierarchical modeling
- Bridging effect: “a2” and “c2” might not look similar themselves, but they might both be similar to “b3”.
- 1:m mappings: aggregate and is-a types
- User interaction helps in:
  - learning matching thresholds
  - resolving uncertain mappings

Hierarchical Modeling
[Figure: a source query interface and its ordered tree representation]
- The tree captures the ordering and grouping of fields.

Find 1:1 Mappings via Clustering
[Figure: the source interfaces, the initial similarity matrix, and the clusters after one merge]
- Similarity functions: linguistic similarity and domain similarity
- …, final clusters: {{a1, b1, c1}, {b2, c2}, {a2}, {b3}}

“Bridging” Effect
[Figure: fields A, B and C]
Observations:
- It is difficult to match the “vehicle” field, A, with the “make” field, B.
- But A’s instances are similar to C’s, and C’s label is similar to B’s.
- Thus, C might serve as a “bridge” connecting A and B.
Note: connections might also be made via labels.

Complex Mappings
- Aggregate type: the contents of the fields on the many side are parts of the content of the field on the one side.
- Commonalities: (1) field proximity, (2) parent label similarity, and (3) value characteristics.

Complex Mappings (Cont’d)
- Is-a type: the contents of the fields on the many side are the sum/union of the content of the field on the one side.
- Commonalities: (1) field proximity, (2) parent label similarity, and (3) value characteristics.

Instance-based matching via query probing (Wang et al. VLDB-04)
- Both the query interfaces and the returned results (called instances) are considered in matching.
- A global schema (GS) and a set of instances are assumed to be given.
- The method uses each instance value (IV) of every attribute in GS to probe the underlying database and obtain the count of IV appearing in the returned results. These counts are used to help matching.
- It performs three matchings: interface schema with global schema, result schema with global schema, and interface schema with result schema.

Query interface and result page
[Figure: a query interface and its result page]

Knowledge Synthesis
- Web search paradigm:
  - Given a query of a few words, a search engine returns a ranked list of pages.
  - The user then browses and reads the top-ranked pages to find what s/he wants.
- Sufficient for navigational queries, where one is looking for a specific piece of information, e.g., the homepage of a person or a paper.
- Not sufficient for informational queries, i.e., open-ended research or exploration, for which more can be done.

Knowledge/Information Synthesis
- A growing trend among Web search engines: go beyond the traditional paradigm of presenting a list of pages ranked by relevance, and provide more varied, comprehensive information about the search topic.
  - Example: categories, related searches
- Going further: can a system provide the “complete” information of a search topic, i.e., find and combine related bits and pieces to provide a coherent picture of the topic?

Bing search of “cell phone”
[Figure: Bing results page for the query “cell phone”]

Knowledge synthesis: a case study
- Motivation: traditionally, when one wants to learn about a topic, one reads a book or a survey paper. With the rapid expansion of the Web, this habit is changing.
- Learning in-depth knowledge of a topic from the Web is becoming increasingly popular.
Web’s convenienceRichness of information, diversity, and applicationsFor emerging topics, it may be essential - no book. Can we mine “a book” from the Web on a topic?Knowledge in a book is well organized: the authors have painstakingly synthesize and organize the knowledge about the topic and present it in a coherent

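The clustering-based 1:1 matching and the bridging effect from the Wu et al. slides can be illustrated with a small agglomerative clustering sketch. Assumptions: the fields in FIELDS are hypothetical, field_sim (the larger of label-token Jaccard and instance-value Jaccard) is a stand-in for the slides' linguistic and domain similarity functions, and single-link merging with a 0.4 threshold is chosen for simplicity rather than taken from the paper. Two clusters are never merged if they contain fields from the same interface, which keeps the resulting mappings 1:1.

```python
from itertools import combinations

# Hypothetical fields: field id -> (interface id, label, sample instance values).
FIELDS = {
    "a1": ("a", "vehicle", {"honda", "ford", "bmw"}),
    "a2": ("a", "zip code", set()),
    "b1": ("b", "make", set()),
    "b2": ("b", "zip", set()),
    "c1": ("c", "make", {"honda", "ford", "toyota"}),
}


def jaccard(x, y):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(x & y) / len(x | y) if x | y else 0.0


def field_sim(f, g):
    """Stand-in similarity: max of label-token and instance-value Jaccard."""
    _, label_f, values_f = FIELDS[f]
    _, label_g, values_g = FIELDS[g]
    return max(jaccard(set(label_f.split()), set(label_g.split())),
               jaccard(values_f, values_g))


def interfaces_of(cluster):
    """Interface ids of all fields in a cluster."""
    return {FIELDS[f][0] for f in cluster}


def cluster_fields(threshold=0.4):
    """Single-link agglomerative clustering under a 1:1 constraint."""
    clusters = [{f} for f in sorted(FIELDS)]
    while True:
        candidates = [
            (max(field_sim(f, g) for f in clusters[i] for g in clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
            # Skip merges that would put two fields of one interface together.
            if not interfaces_of(clusters[i]) & interfaces_of(clusters[j])
        ]
        if not candidates:
            break
        best, i, j = max(candidates)
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # j > i, so deleting j keeps index i valid
        del clusters[j]
    return clusters


print(sorted(sorted(c) for c in cluster_fields()))
# [['a1', 'b1', 'c1'], ['a2', 'b2']]
```

Here a1 ("vehicle") and b1 ("make") have zero direct similarity, yet they end up in the same cluster: c1 shares instances with a1 and its label with b1, so it acts as the bridge. Single-link merging is what lets one strong connection through c1 pull the three fields together.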

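The query-probing idea from the Wang et al. slide can be sketched for just one of its three matchings, interface schema with global schema. Everything here is hypothetical: RECORDS simulates the database behind a source, probe simulates submitting a value through one interface field, and the scoring rule (sum the counts of a GS attribute's instance values per field, then pick the field with the highest total) is a simplification, not the paper's exact matching measure.

```python
# Hypothetical stand-in for the database behind one deep-Web source.
RECORDS = [
    {"title": "Data Mining", "writer": "Han"},
    {"title": "Web Data Mining", "writer": "Liu"},
    {"title": "Information Retrieval", "writer": "Manning"},
]

# Hypothetical global schema (GS): attribute -> given instance values (IVs).
GS_INSTANCES = {
    "Title": ["data mining", "retrieval"],
    "Author": ["liu", "han"],
}


def probe(field, value):
    """Simulate querying `value` through interface field `field` and count
    how often the value appears in the returned result records."""
    returned = [r for r in RECORDS if value in r[field].lower()]
    return sum(" ".join(r.values()).lower().count(value) for r in returned)


def match_interface_to_gs(gs_instances, fields):
    """Map each GS attribute to the interface field whose probes return
    the highest total count (None if every probe returns nothing)."""
    mapping = {}
    for attr, values in gs_instances.items():
        counts = {f: sum(probe(f, v) for v in values) for f in fields}
        best = max(counts, key=counts.get)
        mapping[attr] = best if counts[best] > 0 else None
    return mapping


print(match_interface_to_gs(GS_INSTANCES, ["title", "writer"]))
# {'Title': 'title', 'Author': 'writer'}
```

Probing "writer" with title values returns nothing, and vice versa, so the counts alone separate the two attributes even though the labels "Author" and "writer" share no tokens.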