DOC PREVIEW
Duke CPS 296.1 - Efficient Evaluation of XML Middle-ware Queries

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Efficient Evaluation of XML Middle-ware QueriesMary FernandezAT&T Labs - [email protected] Morishima∗University of [email protected] Suciu†University of [email protected] address the problem of efficiently constructing ma-terialized XML views of relational databases. In oursetting, the XML view is specified by a query in thedeclarative query language of a middle-ware system,called SilkRoute. The middle-ware system evaluates aquery by sending one or more SQL queries to the tar-get relational database, integrating the resulting tuplestreams, and adding the XML tags. We focus on howto best choose the SQL queries, without having controlover the target RDBMS.1. INTRODUCTIONXML is the universal data-exchange format betweenapplications on the Web. Most existing data, however,is stored in non-XML database systems, so applicationstypically convert data into XML for exchange purposes.When received by a target application, XML data canbe re-mapped into the application’s data structures ortarget database system. Thus, XML often serves as alanguage for defining a view of non-XML data.We are interested in the case when the source datais relational, and the exchange of XML data is betweenseparate organizations or businesses on the Web. Thisscenario is common, because an important use of XMLis in business-to-business (B2B) applications, and mostbusiness-critical data is stored in relational databasesystems (RDBMS). This scenario is also challenging, be-cause the mapping from the relational model to XML isinherently complex and may be difficult to compute ef-ficiently. Relational data is flat, normalized (3NF), andits schema is often proprietary. For example, relationand attribute names may refer to a company’s internalorganization, and this information should not be ex-posed in the exported XML data. In contrast, XMLdata is nested, unnormalized, and its schema (e.g., a∗Research conducted as visitor at AT&T Labs.†Research conducted as employee at AT&T Labs.Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.ACM SIGMOD 2001 May 21-24, Santa Barbara, California, USACopyright 2001 ACM 1-58113-332-4/01/05 ...$5.00.DTD or XML Schema) is public. The mapping fromthe relational data to XML, therefore, usually requiresnested queries, joins of multiple relations, and possiblyintegration of disparate databases.In this work, we address the problem of evaluatingefficiently an XML view in the context of SilkRoute [5],a relational to XML middle-ware system. In SilkRoute,a relational to XML view is specified in the declar-ative query language RXL. An RXL query has con-structs for data extraction and for XML construction.We are interested in the special case of materializinglarge RXL views. In practice, large, materialized viewsmay be atypical: often the XML view is kept virtual,and users’ queries extract small fragments of the entireXML view. For example, SilkRoute supports composi-tion of user-defined queries in XML-QL [4] and virtualRXL views and translates the composed queries intoSQL. SilkRoute’s query composition algorithm is de-scribed elsewhere [5]. Our goal is to support data-exportor warehousing applications, which require a large XMLview of the entire database. In this case, computing theXML view may be costly, and query optimization canyield dramatic improvements.Shanmugasudaram et al. [9] evaluate experimentallya variety of approaches for publishing XML data in arelational query engine. In our scenario, the XML doc-ument defined by an RXL view typically exceeds thesize of main memory, therefore, the sorted, outer-unionapproach [9] best suits our needs. This approach con-structs one large, SQL query from the view query; readsthe SQL query’s resulting tuple stream; and then addsXML tags. The SQL query consists of several left-outerjoins, which are combined in outer unions. The result-ing tuples are sorted by the XML element in which theyoccur, so that the XML tagging algorithm can executein constant space [9]. SilkRoute initially used a morenaive approach, in which the view query was decom-posed into multiple SQL queries that do not containouter joins or outer unions. Each result is sorted topermit merging and tagging of the tuples in constantspace. We call this the fully partitioned strategy.This work makes two contributions. First, we showexperimentally that neither of the above approaches isoptimal. This is surprising for the sorted outer-unionstrategy, because only one SQL query is generated, andtherefore has the greatest potential for optimization bythe RDBMS. In experiments on a 100MB database, wefound that the outer-union query was slower than the103Supplier(*suppkey, name, addr, nationkey)PartSupp(*partkey, suppkey, availqty)Part(*partkey, name, mfgr, brand, size, retail)Customer(*custkey, name, addr, nationkey, ph)LineItem(*orderkey, partkey, suppkey, lno, qty, prc)Orders(*orderkey, custkey, status, price, date)Nation(*nationkey, name, regionkey)Region(*regionkey, name)Figure 1: Fragment of TPC-H Schemaqueries produced by the fully-partitioned strategy. Wefound that the optimal strategy generates multiple SQLqueries, but fewer than the fully partitioned strategy,therefore the optimal SQL queries may contain outerjoins and outer unions. XML tagging still uses constantspace, because it merges sorted tuple streams. The op-timal strategy executes 2.5 to 5 times faster than thesorted outer-union and fully-partitioned strategies.Given this finding, we want to devise an algorithmfor decomposing an RXL view query into an optimalset of SQL queries. This problem is complicated by twoissues. First, the RXL view query may be large, be-cause it constructs an XML document and, therefore, isas complex as the output schema. Public DTD’s haveup to several hundreds elements and several thousandattributes, therefore any program or query generatingXML documents for those DTDs must have a compa-rable complexity [7]. This rules out exhaustive-searchstrategies like the dynamic-programming algorithm ofSystem R [8]. Second, our algorithm must functionin a middle-ware system, and,


View Full Document

Duke CPS 296.1 - Efficient Evaluation of XML Middle-ware Queries

Documents in this Course
Lecture

Lecture

18 pages

Lecture

Lecture

6 pages

Lecture

Lecture

13 pages

Lecture

Lecture

5 pages

Load more
Download Efficient Evaluation of XML Middle-ware Queries
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Efficient Evaluation of XML Middle-ware Queries and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Efficient Evaluation of XML Middle-ware Queries 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?