USC CSCI 585 - xml-query

Querying XML Data

Alin Deutsch, Univ. of [email protected]
Fernandez, AT&T Labs – [email protected]
Florescu, INRIA Rocquencourt, [email protected]
Levy, University of Washington, [email protected]
Maier, Oregon Graduate [email protected]
Suciu, AT&T Labs – [email protected]

1 Introduction

XML threatens to expand beyond its document markup origins to become the basis for data interchange on the Internet. One highly anticipated application of XML is the interchange of electronic data (EDI). Unlike existing Web documents, electronic data is primarily intended for computer, not human, consumption. For example, businesses could publish data about their products and services, and potential customers could compare and process this information automatically; business partners could exchange internal operational data between their information systems on secure channels; search robots could integrate automatically information from related sources that publish their data in XML format, like stock quotes from financial sites, sports scores from news sites. New opportunities will arise for third parties to add value by integrating, transforming, cleaning, and aggregating XML data.

Once it becomes pervasive, it’s not hard to imagine that many information sources will structure their external view as a repository of XML data, no matter what their internal storage mechanisms. Data exchange between applications will then be in XML format. What is then the role of a query language in this world? One could see it as a local adjunct to a browsing capability, providing a more expressive “find” command over one or more retrieved documents. Or it might serve as a souped-up version of XPointer, allowing richer forms of logical reference to portions of documents. Neither of these modes of use is very “databasey”.
From the database viewpoint, the enticing role of an XML query language is as a tool for structural and content-based query that allows an application to extract precisely the information it needs from one or several XML data sources.

One salient question is why not adapt SQL or OQL to query XML. The answer is that XML data is fundamentally different from relational and object-oriented data, and therefore, neither SQL nor OQL is appropriate for XML. The key distinction between data in XML and data in traditional models is that XML is not rigidly structured. In the relational and object-oriented models, every data instance has a schema, which is separate from and independent of the data. In XML, the schema exists with the data. Thus, XML data is self-describing and can naturally model irregularities that cannot be modeled by relational or object-oriented data. For example, data items may have missing elements or multiple occurrences of the same element; elements may have atomic values in some data items and structured values in others; and collections of elements can have heterogeneous structure. Even XML data that has an associated DTD is self-describing (the data can still be parsed, even if the DTD is removed) and, except for restrictive forms of DTDs, may have all the irregularities described above. Most importantly, this flexibility is crucial for EDI applications.

Copyright 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

Self-describing data has been considered recently in the database research community.
Researchers have found this data to be fundamentally different from relational or object-oriented data, and called it semistructured data [1, 3, 18]. Semistructured data is motivated by the problems of integrating heterogeneous data sources and modeling sources such as biological databases, Web data, and structured text documents, such as SGML and XML. Research on semistructured data has addressed data models [16], query-language design [2, 5, 10], query processing and optimization [13], schema languages [15, 4, 11], and schema extraction [14].

In this paper we address the problem of querying XML databases. We start by spelling out some requirements for an XML query language in Section 2. Next, in Section 3, we describe in some detail XML-QL, a query language specially designed for XML, and also illustrate how it satisfies some of the requirements. Section 4 briefly reviews some other languages. We conclude in Section 5.

2 Requirements for a query language for XML

In this section we set forth characteristics for an XML query language that derive from its anticipated use as a net query language, along with an explanation of the need for each.

1. Precise Semantics. An XML query language should have a formal semantics. The formalization needs to be sufficient to support reasoning about XML queries, such as determining result structure, equivalence and containment. Query equivalence is a prerequisite to query optimization, while query containment is useful for semantic caching, or for determining if a push stream of data can be used to answer a particular query.

2. Rewritability, Optimizability. XML data will often be generated automatically from other formats: relational, object-oriented, special-purpose formats. Thus, such XML data will be a view over data in these other models. It is desirable that an XML query over that view be translatable into the query language of the native data, rather than having to convert the native data to XML format and then apply a query.
Alternatively, when the XML data is native and is processed by a query processor, then XML queries need to be optimizable, like SQL queries over relational data.

3. Query Operations. The different operations that have to be supported by an XML query language are: selection (choosing a document element based on content, structure or attributes), extraction (pulling out particular elements of a document), reduction (removing selected sub-elements of an element), restructuring (constructing a new set of element instances to hold queried data) and combination (merging two or more elements into one). These operations should all be possible in a single XML query. It should not be necessary to resort to another
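
The five operations named in item 3 can be illustrated with a minimal sketch. It is written in Python with the standard xml.etree.ElementTree module, not in XML-QL or any language discussed in the paper, and the sample document, element names, and helper functions (without, combine) are hypothetical. The sample data is deliberately irregular: the second book has two author elements and no price. The sketch also performs each operation as a separate statement, which is exactly what a query language meeting this requirement would avoid by expressing them in a single query.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample data (not from the paper). The second <book> has two
# <author> elements and no <price>, an irregularity that XML permits.
DOC = """
<bib>
  <book year="1995">
    <title>Book A</title>
    <author>Smith</author>
    <price>30</price>
  </book>
  <book year="1999">
    <title>Book B</title>
    <author>Jones</author>
    <author>Lee</author>
  </book>
</bib>
"""

root = ET.fromstring(DOC)

# Selection: choose elements based on an attribute value.
recent = [b for b in root.findall("book") if int(b.get("year")) > 1996]

# Extraction: pull out particular sub-elements of the document.
titles = [t.text for t in root.findall("book/title")]


# Reduction: copy an element and remove selected sub-elements from the copy.
def without(elem, tag):
    reduced = copy.deepcopy(elem)
    for child in reduced.findall(tag):
        reduced.remove(child)
    return reduced


no_price = without(root.find("book"), "price")

# Restructuring: construct new elements to hold the queried data,
# here one <entry> per (author, title) pair.
result = ET.Element("result")
for book in root.findall("book"):
    for author in book.findall("author"):
        entry = ET.SubElement(result, "entry")
        ET.SubElement(entry, "name").text = author.text
        ET.SubElement(entry, "title").text = book.findtext("title")


# Combination: merge the children of two elements into one new element.
def combine(a, b, tag="merged"):
    merged = ET.Element(tag)
    merged.extend(list(copy.deepcopy(a)))
    merged.extend(list(copy.deepcopy(b)))
    return merged


both = combine(*root.findall("book"))
```

Note that the restructuring loop handles the book with two authors and the book with one author uniformly, without any schema declaring how many authors a book may have.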

