
Querying XML Data

Alin Deutsch, Univ. of Pennsylvania, adeutsch@gradient.cis.upenn.edu
Mary Fernandez, AT&T Labs Research, mff@research.att.com
Daniela Florescu, INRIA Rocquencourt, France, Daniela.Florescu@inria.fr
Alon Levy, University of Washington, Seattle, alon@cs.washington.edu
David Maier, Oregon Graduate Institute, maier@cse.ogi.edu
Dan Suciu, AT&T Labs Research, suciu@research.att.com

1 Introduction

XML threatens to expand beyond its document markup origins to become the basis for data interchange on the Internet. One highly anticipated application of XML is the interchange of electronic data (EDI). Unlike existing Web documents, electronic data is primarily intended for computer, not human, consumption. For example, businesses could publish data about their products and services, and potential customers could compare and process this information automatically; business partners could exchange internal operational data between their information systems on secure channels; search robots could automatically integrate information from related sources that publish their data in XML format, like stock quotes from financial sites or sports scores from news sites. New opportunities will arise for third parties to add value by integrating, transforming, cleaning, and aggregating XML data. Once XML becomes pervasive, it's not hard to imagine that many information sources will structure their external view as a repository of XML data, no matter what their internal storage mechanisms. Data exchange between applications will then be in XML format.

What, then, is the role of a query language in this world? One could see it as a local adjunct to a browsing capability, providing a more expressive "find" command over one or more retrieved documents. Or it might serve as a souped-up version of XPointer, allowing richer forms of logical reference to portions of documents. Neither of these modes of use is very "databasey." From the database viewpoint, the enticing role of an XML query language is as a tool for structural and
content-based query that allows an application to extract precisely the information it needs from one or several XML data sources.

One salient question is: why not adapt SQL or OQL to query XML? The answer is that XML data is fundamentally different from relational and object-oriented data, and therefore neither SQL nor OQL is appropriate for XML. The key distinction between data in XML and data in traditional models is that XML is not rigidly structured. In the relational and object-oriented models, every data instance has a schema, which is separate from and independent of the data. In XML, the schema exists with the data. Thus XML data is self-describing and can naturally model irregularities that cannot be modeled by relational or object-oriented data. For example, data items may have missing elements or multiple occurrences of the same element; elements may have atomic values in some data items and structured values in others; and collections of elements can have heterogeneous structure. Even XML data that has an associated DTD is self-describing (the data can still be parsed even if the DTD is removed) and, except for restrictive forms of DTDs, may have all the irregularities described above. Most importantly, this flexibility is crucial for EDI applications.

Copyright 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

Self-describing data has been considered recently in the database research community. Researchers have found this data to be fundamentally different from relational or object-oriented data and called it semistructured data [1, 3, 18]. Semistructured data is motivated by the problems of integrating heterogeneous data
sources and modeling sources such as biological databases, Web data, and structured text documents such as SGML and XML. Research on semistructured data has addressed data models [16], query language design [2, 5, 10], query processing and optimization [13], schema languages [15, 4, 11], and schema extraction [14].

In this paper we address the problem of querying XML databases. We start by spelling out some requirements for an XML query language in Section 2. Next, we describe in some detail XML-QL (Section 3), a query language specially designed for XML, and also illustrate how it satisfies some of the requirements. Section 4 briefly reviews some other languages. We conclude in Section 5.

2 Requirements for a query language for XML

In this section we set forth characteristics for an XML query language that derive from its anticipated use as a net query language, along with an explanation of the need for each.

1. Precise Semantics: An XML query language should have a formal semantics. The formalization needs to be sufficient to support reasoning about XML queries, such as determining result structure, equivalence, and containment. Query equivalence is a prerequisite to query optimization, while query containment is useful for semantic caching or for determining if a push stream of data can be used to answer a particular query.

2. Rewritability/Optimizability: XML data will often be generated automatically from other formats (relational, object-oriented, special-purpose formats). Thus such XML data will be a view over data in these other models. It is desirable that an XML query over that view be translatable into the query language of the native data, rather than having to convert the native data to XML format and then apply a query. Alternatively, when the XML data is native and is processed by a query processor, XML queries need to be optimizable like SQL queries over relational data.

3. Query Operations: The different operations that have to be supported by an XML query language are: selection (choosing a document
element based on content, structure, or attributes), extraction (pulling out particular elements of a document), reduction (removing selected sub-elements of an element), restructuring (constructing a new set of element instances to hold queried data), and combination (merging two or more elements into one). These operations should all be possible in a single XML query. It should not be necessary to resort to another language or multiple XML queries to perform these operations. One reason is that an XML server might not
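
The irregularities listed in the Introduction (missing elements, repeated elements, elements that are atomic in one item and structured in another) can be made concrete with a small sketch. It uses Python's standard xml.etree.ElementTree on a hypothetical product catalog invented for illustration; no DTD is supplied, yet the document parses, because the tags travel with the data. The helper names price_of and summarize are ours, not the paper's. A single fixed relational table with one price column would need extra conventions (NULLs, a side table for suppliers, an encoding for structured prices) to hold these same records.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog (not from the paper) exhibiting the irregularities:
# a missing <price>, a repeated <supplier>, and a <price> that is atomic in
# some items but structured in another.
CATALOG = """<catalog>
  <product><name>Widget</name><price>9.50</price></product>
  <product><name>Gadget</name></product>
  <product>
    <name>Gizmo</name>
    <price>12.00</price>
    <supplier>Acme</supplier>
    <supplier>Globex</supplier>
  </product>
  <product>
    <name>Doohickey</name>
    <price><amount>4.25</amount><currency>EUR</currency></price>
  </product>
</catalog>"""


def price_of(product):
    """Return the price text, or None when the product has no <price>.

    No schema is consulted: the code inspects each element's own structure."""
    price = product.find("price")
    if price is None:
        return None          # missing element
    amount = price.find("amount")
    if amount is not None:
        return amount.text   # structured value
    return price.text        # atomic value


def summarize(xml_text):
    """Return (name, price, suppliers) for every <product>, however shaped."""
    root = ET.fromstring(xml_text)  # parses with no DTD at all
    rows = []
    for product in root.findall("product"):
        # <supplier> may occur zero, one, or many times.
        suppliers = [s.text for s in product.findall("supplier")]
        rows.append((product.findtext("name"), price_of(product), suppliers))
    return rows


for row in summarize(CATALOG):
    print(row)
```
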

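Requirement 2 (rewritability) can be illustrated with a sketch, assuming a hypothetical relational table product(name, price) held in an in-memory SQLite database through Python's sqlite3 module. The function xml_view publishes the table as XML; the same query, "names of products cheaper than a limit", is then answered once by materializing the XML view and filtering it, and once by SQL over the native table. The SQL translation is written by hand here; the requirement is that a system be able to derive such a translation from the view definition, which this sketch does not attempt.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Hypothetical native relational source (not from the paper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [("Widget", 9.5), ("Gizmo", 12.0), ("Doohickey", 4.25)],
    )
    return conn


def xml_view(conn):
    """XML view over the table: one <product> element per row."""
    root = ET.Element("catalog")
    for name, price in conn.execute("SELECT name, price FROM product ORDER BY name"):
        p = ET.SubElement(root, "product")
        ET.SubElement(p, "name").text = name
        ET.SubElement(p, "price").text = str(price)
    return root


def query_over_view(conn, limit):
    """Strategy 1: convert all native data to XML, then query the XML."""
    view = xml_view(conn)
    return sorted(
        p.findtext("name")
        for p in view.findall("product")
        if float(p.findtext("price")) < limit
    )


def query_rewritten(conn, limit):
    """Strategy 2: the same query translated (by hand) into SQL, so no XML
    is built and the relational engine evaluates it directly."""
    rows = conn.execute(
        "SELECT name FROM product WHERE price < ? ORDER BY name", (limit,)
    )
    return [name for (name,) in rows]


conn = make_db()
print(query_over_view(conn, 10.0))
print(query_rewritten(conn, 10.0))
```

Both strategies return the same names; the second avoids materializing the view, which is the point of requiring that queries over XML views be translatable.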

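The five operations of requirement 3 can be combined in one function, in the spirit of the single-query requirement. This is a sketch in Python's xml.etree.ElementTree, not XML-QL; the two sources (a bibliography and a set of reviews from a second source) and the function name recent_books are hypothetical. Comments mark where selection, extraction, reduction, restructuring, and combination each occur.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sources (not from the paper).
BIB = """<bib>
  <book year="1998"><title>Alpha</title><author>Lee</author><publisher>P1</publisher></book>
  <book year="2001"><title>Beta</title><author>Kim</author><author>Ng</author><publisher>P2</publisher></book>
  <book year="2003"><title>Gamma</title><author>Ng</author><publisher>P1</publisher></book>
</bib>"""

REVIEWS = """<reviews>
  <review><title>Beta</title><rating>4</rating></review>
  <review><title>Delta</title><rating>3</rating></review>
</reviews>"""


def recent_books(bib_xml, reviews_xml, min_year):
    """Books from min_year on, without publishers, merged with their ratings."""
    bib = ET.fromstring(bib_xml)
    reviews = ET.fromstring(reviews_xml)
    # Extraction: pull the <rating> element out of each <review>, keyed by title.
    ratings = {r.findtext("title"): r.find("rating") for r in reviews.findall("review")}

    result = ET.Element("recent")  # Restructuring: new element holding the answer.
    for book in bib.findall("book"):
        # Selection: keep a <book> based on its year attribute.
        if int(book.get("year")) < min_year:
            continue
        entry = copy.deepcopy(book)  # copy so the source tree is left unchanged
        entry.tag = "entry"          # Restructuring: rename the output element.
        entry.tail = None
        # Reduction: remove the <publisher> sub-elements.
        for publisher in entry.findall("publisher"):
            entry.remove(publisher)
        # Combination: merge the matching review's rating into the same element.
        rating = ratings.get(entry.findtext("title"))
        if rating is not None:
            entry.append(copy.deepcopy(rating))
        result.append(entry)
    return result


answer = recent_books(BIB, REVIEWS, 2000)
print(ET.tostring(answer, encoding="unicode"))
```
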