USC CSCI 585 - icde-xml

Describing and Manipulating XML Data

Sudarshan S. Chawathe
Department of Computer Science
University of Maryland
College Park, MD [email protected]

This paper presents a brief overview of data management using the Extensible Markup Language (XML). It presents the basics of XML and the DTDs used to constrain XML data, and describes metadata management using RDF. It also discusses how XML data is queried, referenced, and transformed using stylesheet language XSLT and referencing mechanisms XPath and XPointer.

1 Describing XML Data

The Extensible Markup Language (XML) [BPSM98] models data as a tree of elements that contain character data and have attributes composed of name-value pairs. For example, here is an XML representation of catalog information for a book:

    <book>
      <title>The spy who came in from the cold</title>
      <author>John <lastname>Le Carre</lastname></author>
      <price currency="USD">5.59</price>
      <review><author>Ben</author>Perhaps one of the finest...</review>
      <review><author>Jerry</author>An intriguing tale of...</review>
      <bestseller authority="NY Times"/>
    </book>

Text delimited by angle brackets (<...>) is markup, while the rest is character data. (Here, and in the rest of this paper, we introduce concepts informally as needed for our discussion; for formal specifications, see [W3C99].) Elements may contain a mix of character data and other elements; e.g., the book element contains the text “Here are some...” in addition to elements such as title and price. The element named title contains character data denoting the book title and is contained in the book element. Similarly, the element price contains character data denoting the book’s price. This element also has an attribute named currency with value USD, represented using the syntax attribute-name="attribute-value" within the element’s start-tag.
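
As a rough illustration of the tree model described above, the following Python sketch parses the book example with the standard library's xml.etree.ElementTree module. The choice of module and all variable names are assumptions made for this illustration; they are not part of the paper.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
<title>The spy who came in from the cold</title>
<author>John <lastname>Le Carre</lastname></author>
<price currency="USD">5.59</price>
<review><author>Ben</author>Perhaps one of the finest...</review>
<review><author>Jerry</author>An intriguing tale of...</review>
<bestseller authority="NY Times"/>
</book>"""

book = ET.fromstring(BOOK_XML)

# Child elements of book in document order; names may repeat (two reviews).
child_names = [child.tag for child in book]

# Character data directly inside an element is exposed as .text.
title_text = book.find("title").text

# Attributes are name-value pairs taken from the element's start-tag.
price = book.find("price")
price_attrs = dict(price.attrib)

# Mixed content: author holds character data ("John ") and a lastname child.
author = book.find("author")
author_parts = (author.text, author.find("lastname").text)

# An empty element has no character data and no child elements.
bestseller = book.find("bestseller")
bestseller_is_empty = bestseller.text is None and len(bestseller) == 0
```

Note that find looks only at direct children, so book.find("author") returns the book's author and not the author elements nested inside the reviews.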
In general, element names are not unique; e.g., the book element in our example contains two review elements. However, attribute names are unique within an element; e.g., the price element cannot have another attribute named currency. The syntax permits an empty element <bestseller></bestseller> to be represented more concisely as <bestseller/>. XML documents are called well-formed if they satisfy simple syntactic constraints, such as proper delimiting of element names and attributes and proper nesting of start and end tags.

Copyright 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

1.1 DTD

As described above, XML provides a simple and general markup facility which is useful for data interchange. The simple tag-delimited structure of well-formed XML makes parsing extremely simple. However, applications that operate on XML data often need additional guarantees on the structure and content of such data. For example, a program that calculates the tax on the sale of a book may need to assume that each book element in its XML input includes a price subelement with a currency attribute and a numeric content. Such constraints on document structure can be expressed using a Document Type Definition (DTD). A DTD defines a class of XML documents using a language that is essentially a context-free grammar with several restrictions.
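
To make the tax-calculation scenario concrete, here is a minimal Python sketch of such a program when it can rely only on well-formedness: every structural assumption has to be checked by hand. The function name sales_tax and the 5% rate are hypothetical placeholders, not taken from the paper.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def sales_tax(book_xml: str, rate: Decimal = Decimal("0.05")) -> Decimal:
    """Return the tax on one book; the default rate is a made-up placeholder."""
    # The parser only guarantees well-formedness (ET.ParseError otherwise).
    book = ET.fromstring(book_xml)

    # Everything below is a structural assumption the parser does not check.
    price = book.find("price")
    if price is None:
        raise ValueError("book has no price subelement")
    if "currency" not in price.attrib:
        raise ValueError("price has no currency attribute")
    try:
        amount = Decimal((price.text or "").strip())
    except InvalidOperation:
        raise ValueError("price content is not numeric") from None
    if not amount.is_finite():
        raise ValueError("price content is not a finite number")
    return amount * rate
```

A DTD lets the first two checks be stated declaratively and enforced by a validating processor instead of being repeated in each application.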
For example, one may use the following DTD declaration to constrain XML documents such as those in our book example:

    <!ELEMENT book (title, author+, price, review*, bestseller?)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA|lastname|firstname|fullname)*>
    <!ELEMENT price (#PCDATA)>
    <!ATTLIST price currency CDATA "USD"
                    source (list|regular|sale) "list"
                    taxed CDATA #FIXED "yes">
    <!ELEMENT bestseller EMPTY>
    <!ATTLIST bestseller authority CDATA #REQUIRED>

The first line of this declaration is an element type declaration that constrains the contents of the book element. Following common convention, the declaration syntax uses commas for sequencing, parentheses for grouping, and the operators ?, *, and + to denote, respectively, zero or one, zero or more, and one or more occurrences of the preceding construct. Note that the declaration requires every book element to have a price sub-element. The second line declares the type for the title element to be parsed character data (implying an XML processor will parse the contents looking for markup). Note that the use of some element names (e.g., review, lastname) without a corresponding declaration is not an error; such elements are simply not constrained by this DTD. The last two lines declare bestseller to be an element that must be empty and that must have an authority attribute of type character data. The declaration also indicates that the price element may have attributes currency, of type character data and default value USD; source, with one of the three values shown (an enumerated type) and default value list; and taxed, with the fixed value yes. The fixed attribute type is a special case of the default attribute type; it mandates that the specified default value not be changed by an XML document conforming to the DTD. Fixed-value attributes are convenient for ensuring that data critical to processing an element type is available with the desired value without requiring it to be explicitly specified for each element of that type.
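
The content model and attribute declarations above can be mimicked by hand to see what they mean. The sketch below is not a DTD validator: it hard-codes the book content model as a regular expression over child element names and the price attribute declarations as a table, and it ignores character data and undeclared attributes. The names BOOK_MODEL, PRICE_ATTRS, children_match, and effective_price_attrs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model of book: (title, author+, price, review*, bestseller?),
# written as a regular expression over space-terminated child names.
BOOK_MODEL = re.compile(r"title (author )+price (review )*(bestseller )?")

# Declared attributes of price: name -> (default value, is_fixed).
PRICE_ATTRS = {
    "currency": ("USD", False),
    "source": ("list", False),
    "taxed": ("yes", True),
}
SOURCE_VALUES = {"list", "regular", "sale"}  # the enumerated type


def children_match(book: ET.Element) -> bool:
    """Check the sequence of book's child elements against BOOK_MODEL."""
    names = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(names) is not None


def effective_price_attrs(price: ET.Element) -> dict:
    """Fill in declared defaults and reject changes to fixed values."""
    attrs = dict(price.attrib)
    for name, (default, fixed) in PRICE_ATTRS.items():
        value = attrs.setdefault(name, default)
        if fixed and value != default:
            raise ValueError(f"{name} is fixed to {default!r}")
    if attrs["source"] not in SOURCE_VALUES:
        raise ValueError("source must be one of list, regular, sale")
    return attrs
```

For a price element written without any attributes, effective_price_attrs returns the three declared defaults, which is how a fixed attribute such as taxed becomes available without being spelled out in each document.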
Our example DTD specifies that the book in our XML example must be taxed.

An XML document that satisfies the constraints of a DTD is said to be valid with respect to that DTD. The DTD associated with an XML document may be specified using several methods, one of which is the inclusion of a document type declaration

    <!DOCTYPE BOOKCATALOG SYSTEM "http://tt.com/bookcatalog.dtd">

in a special section near the beginning of a document, called its prolog. This declaration indicates that the XML document claims validity with respect to the BOOKCATALOG DTD which may be found at the indicated location.

The data modeling facilities provided by DTDs are insufficient for many applications. For example, we cannot use DTDs to require that the value of the element price be a fixed-precision real number in the range zero through 10000 with two digits after the point. Thus our tax-calculation application cannot rely on XML validity with respect to its DTD for such simple error-checking. The XML
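
The price constraint discussed above (a number from zero through 10000 with exactly two digits after the point) is the kind of check that must live in application code when only a DTD is available. A minimal sketch, assuming a hypothetical helper named is_valid_price:

```python
import re
from decimal import Decimal

# One to five ASCII digits, a point, then exactly two digits.
PRICE_FORMAT = re.compile(r"[0-9]{1,5}\.[0-9]{2}")


def is_valid_price(text: str) -> bool:
    """True if text is a two-decimal number in the range 0 through 10000."""
    if PRICE_FORMAT.fullmatch(text) is None:
        return False
    return Decimal(text) <= Decimal("10000")
```

The regular expression handles the lexical form and rules out signs, and the Decimal comparison handles the upper bound without floating-point rounding.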

