Debugging Schema Mappings with Routes Laura Chiticariu Wang Chiew Tan UC Santa Cruz UC Santa Cruz laura cs ucsc edu wctan cs ucsc edu ABSTRACT a language that is based on tgds and egds for specifying or programming schema mappings has several advantages over lowerlevel languages such as XSLT scripts or Java programs 14 19 in that it is declarative and it has been widely used in the formal study of the semantics of data exchange and data integration Indeed the use of a higher level declarative language for programming schema mappings is similar to the goal of model management 4 15 One of the goals in model management is to reduce programming effort by allowing a user to manipulate higher level abstractions called models and mappings between models In this case models and mappings between models are schemas and mappings between schemas A recent example of a data exchange system that allows a user to program a schema mapping using a declarative language based on tgds and egds is Clio 11 However developmental support for programming schema mappings in this language is still lacking In particular to the best of our knowledge a tool for debugging schema mappings has not yet been developed It is for the same motivation as developing a debugger for a programming language that we wish to develop a debugger for the language of schema mappings In this paper we present a primary feature of our debugger called routes that allows a user to explore and understand a schema mapping Routes describe the relationships between source and target data with the schema mapping A user is able to select target or source data and our algorithms are able to compute all routes or one route for the selected data Our algorithms are based on the formalism of tgds and egds for specifying schema mappings and we have implemented these algorithms on top of Clio with Clio s language for programming schema mappings We emphasize that even though our implementation is built around Clio s language for schema mappings it is not specific to the execution engine of Clio In fact our implementation requires no changes to the underlying execution engine of Clio Hence we believe that the algorithms we have developed in this paper can be easily adapted for other data exchange systems based on a similar formalism Our debugger can also be used to understand the specification of a data integration system In this case we materialize test data under the target schema often called the global schema in the terminology of data integration for the purpose of debugging the schema mapping It is also worth mentioning that in Clio schema mappings are often not programmed directly by the user but rather they are generated from the result of matching the source and target schemas i e schema matching However it is often the case that the generated schema mapping needs to be further refined before it accurately reflects the user s intention Hence even though schema mappings are not usually programmed directly in Clio there is still a need for a debugger that would allow the user to understand the generated schema mapping A schema mapping is a high level declarative specification of the relationship between two schemas it specifies how data structured under one schema called the source schema is to be converted into data structured under a possibly different schema called the target schema Schema mappings are fundamental components for both data exchange and data integration To date a language for specifying or programming schema mappings exists However developmental support for programming schema mappings is still lacking In particular a tool for debugging schema mappings has not yet been developed In this paper we propose to build a debugger for understanding and exploring schema mappings We present a primary feature of our debugger called routes that describes the relationship between source and target data with the schema mapping We present two algorithms for computing all routes or one route for selected target data Both algorithms execute in polynomial time in the size of the input In computing all routes our algorithm produces a concise representation that factors common steps in the routes Furthermore every minimal route for the selected data can essentially be found in this representation Our second algorithm is able to produce one route fast if there is one and alternative routes as needed We demonstrate the feasibility of our route algorithms through a set of experimental results on both synthetic and real datasets 1 INTRODUCTION A schema mapping is a high level declarative specification of the relationship between two schemas it specifies how data structured under one schema called the source schema is to be converted into data structured under a possibly different schema called the target schema Schema mappings are fundamental components for both data exchange and data integration 12 13 A widely used formalism for specifying relational to relational schema mappings is that of tuple generating dependencies tgds and equality generating dependencies egds In the terminology of data integration tgds are equivalent to global and local as view assertions Using Supported in part by NSF CAREER Award IIS 0347065 and NSF grant IIS 0430994 Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage the VLDB copyright notice and the title of the publication and its date appear and notice is given that copying is by permission of the Very Large Data Base Endowment To copy otherwise or to republish to post on servers or to redistribute to lists requires a fee and or special permission from the publisher ACM VLDB 06 September 12 15 2006 Seoul Korea Copyright 2006 VLDB Endowment ACM 1 59593 385 9 06 09 79 f1 MANHATTAN CREDIT Cards cardNo limit ssn name maidenName salary location SupplementaryCards accNo ssn name address f2 FARGO BANK FBAccounts bankNo ssn name income address CreditCards cardNo creditLimit custSSN SOURCE INSTANCE I Cards cardNo limit ssn name s1 6689 15K 434 J Long FARGO FINANCE Accounts accNo limit accHolder m1 m2 m4 Clients ssn name maidenName income address m5 Source to target dependencies st m1 Cards cn l s n m sal loc A Accounts cn l s Clients s m m sal A m3 FBAccounts bn s n i a CreditCards cn cl cs M Accounts cn cl cs Clients cs n M i a Target dependencies t m4 Accounts a l s N M I A Clients s N M I A m5 Clients s n m i a N L
View Full Document