CORNELL CS 632 - Brian

<XML>

Introduction
Section 1 What is XML
    DTDs
    Namespaces
    XSL
Section 2 XML as Semi-Structured Data
    The OEM model
    LORE
    Drawbacks of XML as Semi-Structured Data
Section 3 Alternatives to Semi-Structured Data
    STORED
Section 4 Toward a Query Language
Conclusion

</XML>

Brian Sabino
CS 632

<XML>

Introduction

It is estimated that today 90% of the world's data sits outside of any database system, usually in the file system or a custom application [PR 98]. Much of this data is unstructured or only semi-structured, and as such is not suitable for storage in today's RDBMSs. XML holds the promise of a common interchange format and, more importantly from our perspective, a method for giving data enough structure that it can be stored in a database. This will allow the database community to offer efficient storage, transaction processing, and querying in areas where databases have traditionally had little impact.

So the question now becomes how to deal with XML documents and data. Do we need an entirely new data model? Can we adapt relational databases to work with semi-structured XML data? And finally, what types of queries should we support, and in what language?

This paper starts with a very brief introduction to XML in section 1. In section 2 we examine XML as traditional semi-structured data and give an introduction to LORE. In section 3 we hear some opposing views and examine the possibility of storing XML in a traditional RDBMS, with a look at STORED. Finally, in section 4 we briefly examine the proposed query languages for XML.

Section 1 What is XML

For the purposes of this paper we don't really care what XML is; we are only interested in how we can apply database techniques to XML, and what opportunities XML offers the database community. Therefore we give only a brief introduction, starting with XML basics and then explaining DTDs and namespaces.

XML stands for Extensible Markup Language.
XML, like HTML, is derived from SGML (Standard Generalized Markup Language). Let's spend a moment untangling the alphabet soup. "Markup language" just means that the text is "marked" with tags (usually in matching pairs) like the familiar <html></html>. XML was designed by the W3C as a simple subset of the very complex SGML, in the hope that this simplicity will speed adoption.

So how is XML different from HTML? HTML defines how a document is to be displayed, but says little about the structure or contents of the document. XML, on the other hand, describes the structure and content of a document; how it is displayed is left to a stylesheet such as XSL. XML allows you to define your own tags and use them to give structure to a document. For example, you might write the following in HTML:

    <table><tr><td align=center>Panel on:<br>
    <b><font size=-1>Honey, I Shrunk the DBMS: Footprint, Mobility, and Beyond</font></b><br>
    <font size=-1>Praveen Seshadri</font></td></tr></table>

While an equivalent piece of XML might look like:

    <Panel startTime="1400" endTime="1530">
      <Title>Honey, I Shrunk the DBMS: Footprint, Mobility, and Beyond</Title>
      <Moderator>
        <firstName> Praveen </firstName>
        <lastName> Seshadri </lastName>
      </Moderator>
    </Panel>

DTDs

A DTD (Document Type Definition) can be thought of as a grammar defining the form of an XML document. A DTD looks like a series of regular expressions with the * (Kleene closure), + (one or more), ? (optional), and | (alternation) operators. Continuing the above example, we might define a DTD file conference.dtd as follows:

    <!ELEMENT conference (presentation | panel)+>
    <!ELEMENT panel (title, moderators)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT moderators (moderator)+>
    <!ELEMENT moderator (firstName, lastName)>
    …

To finish the example we would add <!DOCTYPE conference SYSTEM "conference.dtd"> to the original XML document.

A DTD provides a way to validate an XML document. A good example of this would be a company that created an "invoice.dtd".
Whenever they received an invoice, they could check it against the DTD and determine whether the invoice was well-formed (used syntactically correct XML) and valid (adhered to the DTD). The DTD is of particular interest to us because it can be seen, in some ways, as defining a schema that the XML documents must adhere to.

Namespaces

Namespaces are a simple addition to XML that allows multiple people to use the same XML tags. For example, instead of every conference defining its own tags, they can use a common set of tags like so: <Conference:Moderator>, meaning that this is the Moderator tag defined in the Conference namespace. Namespaces are important because they help ensure a common format for similar XML documents. Unfortunately we cannot always ensure that similar documents use the same namespace, and thus determining whether two similar documents are equivalent is a serious challenge for XML document processing.

XSL

XSL stands for eXtensible Stylesheet Language. This is a tree-based language very similar to CSS for HTML [W3CXSL]. It provides methods for manipulating the display of an XML document, for example if you wanted to display different pieces of one XML document in different places. As we will see later, extensions to XSL are being looked at as a possible query language for XML.

Section 2 XML as Semi-Structured Data

Some people see XML as something of a solved problem. If you view XML solely as semi-structured data, then you can simply apply the considerable research done on semi-structured data [Abi97, Bun97, PGMW95] to processing XML. An excellent example of this approach is LORE. Before examining LORE we will first look at what is the "de facto model for semistructured data" [Suc98], the OEM model.

The OEM model

The OEM model [PGMW95] treats semi-structured data as a labeled digraph with one unique root. Each node in this graph is either a value of a built-in data type (a possible leaf) or a node with edges to other nodes (an internal node).
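
As a rough illustration of this labeled-graph view, the Python sketch below turns the Panel element from Section 1 into a small OEM-style structure. It is only one possible mapping and not the one used by LORE or [PGMW95]: the names OemNode, to_oem, build_oem and lookup are hypothetical, attributes are assumed to become labeled edges to leaf nodes, and any text mixed in between child elements is ignored. An XML document always yields a tree, which is a special case of the general digraph that OEM allows.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The Panel example from Section 1, with attribute values quoted.
PANEL_XML = """
<Panel startTime="1400" endTime="1530">
  <Title>Honey, I Shrunk the DBMS: Footprint, Mobility, and Beyond</Title>
  <Moderator>
    <firstName> Praveen </firstName>
    <lastName> Seshadri </lastName>
  </Moderator>
</Panel>
"""


@dataclass
class OemNode:
    # A leaf carries an atomic value; an internal node carries labeled edges.
    value: Optional[str] = None
    edges: List[Tuple[str, "OemNode"]] = field(default_factory=list)


def to_oem(element: ET.Element) -> OemNode:
    """Map one XML element to an OEM-style node (hypothetical mapping)."""
    children = list(element)
    if not children and not element.attrib:
        # No sub-elements and no attributes: treat the text as an atomic value.
        return OemNode(value=(element.text or "").strip())
    node = OemNode()
    # Assumption: each attribute becomes a labeled edge to a leaf node.
    for name, val in element.attrib.items():
        node.edges.append((name, OemNode(value=val)))
    # Each child element becomes an edge labeled with the child's tag.
    for child in children:
        node.edges.append((child.tag, to_oem(child)))
    return node


def build_oem(xml_text: str) -> OemNode:
    """Return a unique root with one labeled edge to the document element."""
    doc = ET.fromstring(xml_text)
    root = OemNode()
    root.edges.append((doc.tag, to_oem(doc)))
    return root


def lookup(node: OemNode, *labels: str) -> List[OemNode]:
    """Follow a path of edge labels and return every node reached."""
    frontier = [node]
    for label in labels:
        frontier = [target for n in frontier
                    for (edge_label, target) in n.edges
                    if edge_label == label]
    return frontier


root = build_oem(PANEL_XML)
for node in lookup(root, "Panel", "Moderator", "lastName"):
    print(node.value)
```

Because nothing in the structure fixes which labels a node must have, following a path that does not exist, such as lookup(root, "Panel", "Moderator", "middleName"), simply returns an empty list rather than raising an error.
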
Each edge in this model represents a relationship between nodes. The example at right shows how the XML example above might be translated under the OEM model. A key point to note is that there is no schema that the data conforms to. Instead the data is in some sense "self describing" [MAG+97]. Another important observation is that the data can be irregular. For example, a panel might have zero or more moderators, or a moderator might have only a last name. The ability to effectively handle irregular data is an advantage of the OEM model over the relational model, but as we will

