UIC CS 583 - Chapter 10 - Information-integration (40 pages)



Lecture Notes


Pages: 40
School: University of Illinois at Chicago
Course: CS 583 - Data Mining and Text Mining


Chapter 10: Information Integration

Bing Liu, UIC, ACL-07

Introduction

At the end of the last topic, we identified the problem of integrating extracted data: column match and instance value match. Unfortunately, limited research has been done in this specific context. Much of the Web information integration research has focused on the integration of Web query interfaces. In this part, we introduce some basic integration techniques and Web query interface integration.

Database integration (Rahm and Bernstein 2001)

Information integration started with database integration, which has been studied in the database community since the early 1980s.

Fundamental problem: schema matching, which takes two or more database schemas and produces a mapping between elements (or attributes) of the schemas that correspond semantically to each other.

Objective: merge the schemas into a single global schema.

Integrating two schemas

Consider two schemas, S1 and S2, representing two customer relations, Cust and Customer.

S1: Cust(CNo, CompName, FirstName, LastName)
S2: Customer(CustID, Company, Contact, Phone)

Integrating two schemas (contd)

Represent the mapping with a similarity relation, ≅, over the power sets of S1 and S2, where each pair in ≅ represents one element of the mapping. E.g.,

Cust.CNo ≅ Customer.CustID
Cust.CompName ≅ Customer.Company
{Cust.FirstName, Cust.LastName} ≅ Customer.Contact

Different types of matching

Schema-level only matching: only schema information is considered.

Domain and instance-level only matching: some instance data (data records) and possibly the domain of each attribute are used. This case is quite common on the Web.

Integrated matching of schema, domain and instance data: both schema and instance data (and possibly domain information) are available.

Pre-processing for integration (He and Chang SIGMOD-03, Madhavan et al. VLDB-01, Wu et al. SIGMOD-04)

Tokenization: break an item into atomic words using a dictionary, e.g. ...

Expansion ...
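To make the "Integrating two schemas" example concrete, here is a minimal Python sketch of one way to hold a mapping defined over the power sets of S1 and S2: each element of the mapping is a pair of attribute subsets. The attribute names come from the Cust and Customer relations in the notes; the helper names is_valid_mapping and unmatched are hypothetical, introduced only for illustration, and the notes do not prescribe this representation.

```python
# Schemas as sets of qualified attribute names (taken from the Cust/Customer example).
S1 = {"Cust.CNo", "Cust.CompName", "Cust.FirstName", "Cust.LastName"}
S2 = {"Customer.CustID", "Customer.Company", "Customer.Contact", "Customer.Phone"}

# The mapping is a set of pairs (subset of S1, subset of S2).
# frozenset is used so that subsets are hashable and can live inside a set.
mapping = {
    (frozenset({"Cust.CNo"}), frozenset({"Customer.CustID"})),
    (frozenset({"Cust.CompName"}), frozenset({"Customer.Company"})),
    # A many-to-one element: two S1 attributes correspond to one S2 attribute.
    (frozenset({"Cust.FirstName", "Cust.LastName"}), frozenset({"Customer.Contact"})),
}


def is_valid_mapping(mapping, s1, s2):
    """Hypothetical helper: every pair must relate a non-empty subset of s1
    to a non-empty subset of s2."""
    return all(
        bool(left) and bool(right) and left <= s1 and right <= s2
        for left, right in mapping
    )


def unmatched(mapping, schema, side):
    """Hypothetical helper: attributes of `schema` that no pair covers.
    `side` is 0 for the S1 half of each pair and 1 for the S2 half."""
    covered = set().union(*(pair[side] for pair in mapping))
    return schema - covered


if __name__ == "__main__":
    print(is_valid_mapping(mapping, S1, S2))
    print(unmatched(mapping, S2, 1))
```

Representing each side as a set rather than a single attribute is what allows the {Cust.FirstName, Cust.LastName} to Customer.Contact element to be expressed; the unmatched helper shows that Customer.Phone has no counterpart in S1 under this mapping.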


