XMLDBMS
Computer Science 764
December 22, 1998
Kevin Beach, Vuk Ercegovac, Michael Henderson, Amy Rea, Suan Yong

Introduction:

XML-QL is a query language for obtaining data from XML documents on the World Wide Web. From a database viewpoint, an XML document serves as a database from which a query will extract results. While the semi-structured nature of XML lends itself to an object data model, the relational data model has been shown to perform well with queries posed over large data sets. Thus, we have designed and implemented a simple database system that executes relational-like queries over XML data sets that have been transformed into the relational model. Specifically, we execute XML-QL queries in a system that dynamically loads and transforms XML data sets into relations. The queries are transformed into intermediate execution plans from which an optimizer will produce a less costly plan to access the relations with RDBMS-like operators.

Since we are primarily interested in issues concerning the use of relations to store and query XML data sets, we do not handle issues relating to recovery, concurrency, or the use of secondary and non-volatile storage. This decision is also supported by the expected normal usage of such a system: the intended user is an XML “surfer” who, given a set of XML documents, poses queries in XML-QL via an applet in a browser that can display the results of the query. In essence, the system serves as an XML document filter that transforms XML data sets into relations to facilitate more efficient processing.

We have initially developed our system to support only a subset of the features provided by XML-QL. Supporting the complete XML-QL specification is not necessary to achieve our goals. With respect to the query language, we have implemented the features that demonstrate most completely the querying aspect of the language and not the data manipulation aspect.
As such, the optimizer will only be able to take advantage of operators for which language support has been added. Similarly, the GUI attempts to provide a clean interface for constructing queries and displaying results in a straightforward way. We do not deal with the problem of displaying XML graphically. Our goal is to build a system with which we can attain some insight into the design considerations that arise when using relations to store and query XML data sets.

Architecture Overview:

Figure 1 is a schematic of the XMLDBMS system, showing the steps involved in processing a query. Initially, the client applet submits to the server an XML-QL query (or a document with an embedded query). The server strips out the query and forwards it to the XML-QL to SQL translator. The translator identifies the URLs of the XML documents that the query needs and passes them to the storage manager. The storage manager will load the DTD document associated with each URL and convert it into an internal schema data structure. The storage manager will also get the catalog associated with the data in the XML document (at present, we load the document and build the catalog from scratch; in the future we envision having pre-computed catalog information stored in a separate file; see Future Work). The schema and catalog are returned to the translator, which uses the schema to verify the validity of the XML-QL query. The translator then produces an SQL query and combines the catalogs it has collected into a single catalog. The SQL query and catalog are passed on to the query optimizer, which generates the execution plan. The plan execution component obtains the tables from the storage manager (which fetches the XML document and translates it into an internal table data structure) and produces a resultant table that is returned to the translator.
The results are then converted into the desired XML formatting and returned to the server, which passes them along to the client applet (or embeds them into the document that contained the embedded XML-QL query).

[Figure 1 components: Client (applet), Server, Translator, Optimizer, Plan Execution, Storage Manager. Edge labels: XML-QL query; XML-QL; SQL, catalog; plan; result table; formatted results; XML results; URL; schema, catalog; table name; table. Storage Manager actions: fetch DTD document, DTD→schema, build catalog, fetch XML document, XML→tables.]
Figure 1 - flowchart of the XMLDBMS system

The Storage Manager

The XMLDBMS storage manager plays the role of a buffer manager for data that could potentially be scattered throughout the web. Specifically, it is responsible for acquiring, for a given XML document, a schema, a catalog, and a table containing the data in that document. It is also in charge of assigning to each XML document a unique page ID that is prepended to the name of each attribute in that document’s table. This is to ensure, for example, that two XML documents containing tables that happen to have the same name will have different “internal names”.

When the schema for a given document is needed, the storage manager will fetch the DTD for that document, and the DTD parser translates it into an internal schema data structure (which is actually just a table). At present we assume the DTD for a given document is in a separate file in the same directory, and has the filename of the document plus a “.dtd” suffix (e.g., the DTD for the file “http://domain/file.xml” is in “http://domain/file.xml.dtd”). When the table for a given XML document is needed, the storage manager fetches the document and gives it to the XML parser, which builds the tables associated with that document.

When the catalog for a given document is needed, the storage manager will get the tables associated with the document and build a catalog from scratch.
We treat the fetching of the catalog as a separate functionality of the storage manager because a possible extension of this project is to have the query execution distributed among multiple servers. In this case, it would be desirable to be able to obtain the catalog for a given XML document without having to fetch the document itself (the catalog information would, for example, be stored in a separate file, like the DTD). We describe this extension further in Future Work.

Our current implementation of the storage manager caches the schema, tables, and catalogs it has built. This is desirable if we assume that a client that queries a given XML document is likely to make more queries over the same document. This also assumes tables fit in memory. In the current implementation the cache is never
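
The storage manager bookkeeping described in this section (the “.dtd” naming convention, page-ID prefixes on attribute names, and caching of built tables) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the names `StorageManager`, `dtd_url`, and `fetch_table`, the `p<N>.` prefix format, and the representation of a table as a list of dictionaries are all hypothetical. The sketch performs no cache eviction, which is a simplification and not a statement about the original system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]   # assumed representation: attribute name -> value
Table = List[Row]      # assumed representation: a table is a list of rows


def dtd_url(document_url: str) -> str:
    """DTD location under the convention in the text: document URL plus '.dtd'."""
    return document_url + ".dtd"


@dataclass
class StorageManager:
    # Hypothetical stand-in for "fetch the document and run the XML parser".
    fetch_table: Callable[[str], Table]
    _page_ids: Dict[str, int] = field(default_factory=dict)
    _tables: Dict[str, Table] = field(default_factory=dict)

    def page_id(self, url: str) -> int:
        """Assign each document URL a unique page ID on first use."""
        if url not in self._page_ids:
            self._page_ids[url] = len(self._page_ids) + 1
        return self._page_ids[url]

    def table(self, url: str) -> Table:
        """Return the document's table, building and caching it on first request."""
        if url not in self._tables:
            # Prefix format "p<N>." is an illustrative assumption.
            prefix = f"p{self.page_id(url)}."
            raw = self.fetch_table(url)
            # Prepend the page ID to every attribute name so that equally
            # named attributes from different documents do not collide.
            self._tables[url] = [
                {prefix + name: value for name, value in row.items()}
                for row in raw
            ]
        return self._tables[url]
```

With a stub such as `fetch_table=lambda url: [{"title": "A"}]`, two different URLs yield attributes named `p1.title` and `p2.title`, and a repeated request for the same URL is served from the cache without calling `fetch_table` again.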

