Berkeley COMPSCI 186 - XML Background

“The reason that so many people are excited about XML is that so many people are excited about XML.”
  ANON

  <Course>
    <Title> CS 186 </Title>
    <Semester> Spring 2006 </Semester>
    <Lecture Number="26">
      <Topic> XML </Topic>
      <Topic> Databases </Topic>
    </Lecture>
  </Course>

XML Background
• eXtensible Markup Language
• Roots are HTML and SGML
  – HTML mixes formatting and semantics
  – SGML is cumbersome
• XML is focused on content
  – Designers (or others) can create their own sets of tags.
  – These tag definitions can be exchanged and shared among various groups (DTDs, XSchema).
  – XSL is a companion language for specifying presentation.
• <Opinion> XML is ugly </Opinion>
  – Intended to be generated and consumed by applications, not people!

From HTML to XML
HTML describes the presentation:

  <h1> Bibliography </h1>
  <p> <i> Foundations of Databases </i>
  Abiteboul, Hull, Vianu
  <br> Addison Wesley, 1995
  <p> <i> Data on the Web </i>
  Abiteboul, Buneman, Suciu
  <br> Morgan Kaufmann, 1999

Example in XML (XML describes the content):

  <bibliography>
    <book>
      <title> Foundations… </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
    …
  </bibliography>

XML as a Wire Format
• People quickly figured out that XML is a convenient way to exchange data among applications.
  – E.g. Ford’s purchasing app generates a purchase order in XML format and e-mails it to a billing app at Firestone.
  – Firestone’s billing app ingests the e-mail, generates a bill in XML format, and e-mails it to Ford’s bank.
• Emerging standards to get the “e-mail” out of the picture: SOAP, WSDL, UDDI…
• The basis of “Web Services”; the potential impact is tremendous.
• Why is it catching on? It’s just text, so…
  – Platform, language and vendor agnostic
  – Easy to understand, manipulate and extend
  – Compare this to data trapped in an RDBMS.

What’s this got to do with Databases?
• Given that apps will communicate by exchanging XML data, databases must at least be able to:
  – Ingest XML-formatted data
  – Publish their own data in XML format
• Thinking a bit harder:
  – XML is kind of a data model.
  – Why convert to/from relational if everyone wants XML?
• More cosmically:
  – Like the evolution from spoken language to written language!
• The (multi-)billion dollar question:
  – Will people really want to store XML data directly?
  – Current opinion: all major vendors say “Yes”, or at least “Maybe”.

Another (partial) Example

  <Invoice>
    <Buyer>
      <Name> ABC Corp. </Name>
      <Address> 123 ABC Way </Address>
    </Buyer>
    <Seller>
      <Name> Goods Inc. </Name>
      <Address> 17 Main St. </Address>
    </Seller>
    <ItemList>
      <Item> widget </Item>
      <Item> thingy </Item>
      <Item> jobber </Item>
    </ItemList>
  </Invoice>

Can View XML Document as a Tree
[Figure: the invoice as a tree. Invoice has children Buyer, Seller and ItemList; Buyer and Seller each have Name and Address children (ABC Corp., 123 ABC Way; Goods Inc., 17 Main St.); ItemList has three Item leaves (widget, thingy, jobber).]

Mapping to Relational
• Relational systems handle highly structured data.
  ≠ New splinters from XML
  ≠ Difficult to search trees that are broken into tables
  ≠ Very expensive to store variable document types
[Figure: an XML article stored as a table with columns including Parent ID and Label; sample rows include author E.F. Codd and pages 377-387.]

Mapping to Relational I
• Question: What is a relational schema for storing XML data?
• Answer: it depends on how “structured” the data is…
• If unstructured, use an “Edge Map”…
[Figure: an article tree with author (E.F. Codd), year (1970), journal (CACM), number and pages (377-387) children, next to a STORED table(author, year, journal, …) with overflow buckets.]

Mapping to Relational II
• Can leverage Schema (or DTD) information to create a relational schema.
• Sometimes called “shredding”.
• For semi-structured data, use a hybrid with an edge map for overflow.
[Figure: an article tree with author (E.F. Codd), year (1970), journal (CACM), pages (377-387) and cdrom (P377.pdf) children.]

Other XML features
• Elements can have “attributes” (not clear why).
    <Price currency="USD">1.50</Price>
• XML docs can have IDs and IDREFs, URIs
  – References to another document or document element
• Two APIs for interacting with/parsing XML docs:
  – Document Object Model (DOM)
    • A tree “object” API for traversing an XML doc
    • Typically for Java
  – SAX
    • Event-driven: fires an event for each tag encountered during the parse.
    • May not need to parse the entire document.

Document Type Definitions (DTDs)
• Grammar for describing the allowed structure of XML documents.
• Specify what elements can appear and in what order, nesting, etc.
• DTDs are optional (!)
• Many “standard” DTDs have been developed for all sorts of industries, groups, etc.
  – e.g. NITF for news article dissemination
• DTDs are being replaced by XSchema (more in a moment)

DTD Example (partial)
Here’s a DTD for a Contract:

  <?xml version="1.0" encoding="UTF-8"?>
  <!ENTITY % datetime.tz "CDATA">
  <!ENTITY % string "CDATA">
  <!ENTITY % nmtoken "CDATA"> <!-- Any combo of XML name chars. -->
  <!ENTITY % xmlLangCode "%nmtoken;">
  <!ELEMENT SupplierID (#PCDATA)>
  <!ATTLIST SupplierID
      domain %string; #REQUIRED>
  <!ELEMENT Comments (#PCDATA)>
  <!ELEMENT ItemSegment (ContractItem+)>
  <!ATTLIST ItemSegment
      segmentKey %string; #IMPLIED>
  <!ELEMENT Contract (SupplierID+, Comments?, ItemSegment+)>
  <!ATTLIST Contract
      effectiveDate %datetime.tz; #REQUIRED
      expirationDate %datetime.tz; #REQUIRED>

Elements contain others:
  ? = 0 or 1
  * = 0 or more
  + = 1 or more

XML Schemas, etc.
• XML documents can be described using XML Schema (XSchema)
  – Has a notion of types and type checking
  – Introduces some notions of integrity constraints (ICs)
  – Quite complicated and controversial, but will replace the simpler DTDs
• XML Namespaces
  – Can import tag names from others
  – Disambiguate by prefixing the namespace name
    • e.g. usa:price is different from eurozone:price

Querying XML
• XPath
  – A single-document language for “path expressions”
• XSLT
  – XPath plus a language for formatting output
• XQuery
  – An SQL-like proposal with XPath as a sub-language
  – Supports aggregates, duplicates, …
  – Data model is lists, not sets
  – “Reference implementations” have appeared, but the language is still not widely accepted.
• SQL/XML
  – The SQL standards community fights back

XPath
• Syntax for tree navigation and node selection
  – Navigation is defined by “paths”
  – Used by other standards: XSLT, XQuery, XPointer, XLink
• / = root node, or separator between steps in a path
• * matches any one element name
• @ references attributes of the current node
• // references any descendant of the current node
• [] allows specification of a filter (predicate)
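The “What’s this got to do with Databases?” slide says a database must at least ingest XML-formatted data and publish its own data as XML. The following is a minimal sketch of both directions in Python. It assumes a hypothetical book(title, year) table in SQLite and reuses the bibliography layout from the earlier example. The names publish_books and ingest_books are illustrative and are not part of any product’s API.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish_books(conn):
    """Serialize every row of the (hypothetical) book table as <bibliography> XML."""
    root = ET.Element("bibliography")
    for title, year in conn.execute("SELECT title, year FROM book ORDER BY year"):
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "year").text = str(year)
    return ET.tostring(root, encoding="unicode")


def ingest_books(conn, xml_text):
    """Parse <bibliography> XML and insert one book row per <book> element."""
    root = ET.fromstring(xml_text)
    rows = [
        (book.findtext("title").strip(), int(book.findtext("year")))
        for book in root.findall("book")
    ]
    conn.executemany("INSERT INTO book (title, year) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    source.executemany(
        "INSERT INTO book VALUES (?, ?)",
        [("Foundations of Databases", 1995), ("Data on the Web", 1999)],
    )
    xml_text = publish_books(source)
    print(xml_text)

    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    print(ingest_books(target, xml_text), "rows ingested")
```

In this sketch the two databases never share a schema object. They share only the XML text, which is the “wire format” role the slides describe.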
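The “Can View XML Document as a Tree” slide matches what an XML parser actually returns. This sketch uses Python’s standard xml.etree.ElementTree to parse the invoice from the slides, then prints one indented line per element. The helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICE = """<Invoice>
  <Buyer><Name> ABC Corp. </Name><Address> 123 ABC Way </Address></Buyer>
  <Seller><Name> Goods Inc. </Name><Address> 17 Main St. </Address></Seller>
  <ItemList><Item> widget </Item><Item> thingy </Item><Item> jobber </Item></ItemList>
</Invoice>"""


def outline(elem, depth=0):
    """Return one line per element, indented by depth, with any non-whitespace text after a colon."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (f": {text}" if text else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(INVOICE))))
```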
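One way to read the “Edge Map” idea from “Mapping to Relational I” is as follows: give every element a node id, then store one row per node as (id, parent_id, label, value). The column names, the document-order numbering and the small sample article are assumptions made for illustration, and attributes are ignored to keep the sketch short. Even a simple question becomes a self-join on parent_id. That is the “difficult to search trees that are broken into tables” point from “Mapping to Relational”.

```python
import sqlite3
import xml.etree.ElementTree as ET

ARTICLE = """<article>
  <author>E.F. Codd</author>
  <year>1970</year>
  <journal>CACM</journal>
  <pages>377-387</pages>
</article>"""


def edge_map(root):
    """Number elements in document order; emit (id, parent_id, label, value) rows."""
    rows = []

    def visit(elem, parent_id):
        node_id = len(rows)  # next id in document (pre-)order
        text = (elem.text or "").strip()
        rows.append((node_id, parent_id, elem.tag, text or None))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT, value TEXT)"
)
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)", edge_map(ET.fromstring(ARTICLE)))

# "Who wrote the article in CACM?" needs a self-join: author and journal are sibling rows.
QUERY = """
SELECT a.value
FROM edge AS a JOIN edge AS j ON j.parent_id = a.parent_id
WHERE a.label = 'author' AND j.label = 'journal' AND j.value = 'CACM'
"""
print(conn.execute(QUERY).fetchall())  # [('E.F. Codd',)]
```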
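“Mapping to Relational II” suggests a hybrid: use schema (or DTD) information to create real columns, and fall back to an edge map for overflow. This sketch assumes the schema for article lists exactly author, year, journal and pages. The stored and overflow table layouts and the shred function are hypothetical. The cdrom child from the slide’s figure has no column, so it lands in the overflow table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Columns assumed to be derived from a schema/DTD for <article> (illustrative only).
STORED_COLUMNS = ("author", "year", "journal", "pages")


def shred(conn, doc_id, xml_text):
    """Store schema-known children as columns; send everything else to overflow."""
    root = ET.fromstring(xml_text)
    values = dict.fromkeys(STORED_COLUMNS)  # every column starts as NULL
    overflow = []
    for child in root:
        text = (child.text or "").strip()
        if child.tag in values and values[child.tag] is None:
            values[child.tag] = text
        else:
            # Unknown tags, and repeats of a known tag, fall back to edge-style rows.
            overflow.append((doc_id, child.tag, text))
    conn.execute(
        "INSERT INTO stored (doc_id, author, year, journal, pages) VALUES (?, ?, ?, ?, ?)",
        (doc_id, *(values[col] for col in STORED_COLUMNS)),
    )
    conn.executemany("INSERT INTO overflow (doc_id, label, value) VALUES (?, ?, ?)", overflow)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stored (doc_id INTEGER, author TEXT, year TEXT, journal TEXT, pages TEXT)")
conn.execute("CREATE TABLE overflow (doc_id INTEGER, label TEXT, value TEXT)")
shred(
    conn,
    1,
    "<article><author>E.F. Codd</author><year>1970</year><journal>CACM</journal>"
    "<pages>377-387</pages><cdrom>P377.pdf</cdrom></article>",
)
print(conn.execute("SELECT * FROM stored").fetchall())  # [(1, 'E.F. Codd', '1970', 'CACM', '377-387')]
print(conn.execute("SELECT * FROM overflow").fetchall())  # [(1, 'cdrom', 'P377.pdf')]
```

The design choice here is to keep the common, predictable fields cheap to query as ordinary columns. Only the irregular part of each document pays the edge-map cost.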
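To contrast the two parsing APIs under “Other XML features”, this sketch uses Python’s standard xml.dom.minidom (a DOM implementation) and xml.sax.

- **DOM version:** builds the whole tree in memory before answering.
- **SAX handler:** reacts to start-tag, text and end-tag events. It stops the parse early by raising an exception once it has the first title, which illustrates “may not need to parse the entire document”.

The names FirstTitle and _Found are hypothetical.

```python
import xml.dom.minidom
import xml.sax

BIB = ("<bibliography>"
       "<book><title>Foundations of Databases</title><year>1995</year></book>"
       "<book><title>Data on the Web</title><year>1999</year></book>"
       "</bibliography>")


def titles_dom(text):
    """DOM: parse everything into a tree, then navigate it."""
    doc = xml.dom.minidom.parseString(text)
    return [t.firstChild.data for t in doc.getElementsByTagName("title")]


class _Found(Exception):
    """Raised from inside the SAX handler to abandon the rest of the parse."""


class FirstTitle(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.chars = []
        self.events = 0  # number of start-tag events seen before stopping

    def startElement(self, name, attrs):
        self.events += 1
        self.in_title = (name == "title")

    def characters(self, content):
        if self.in_title:
            self.chars.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            raise _Found("".join(self.chars))


def first_title_sax(text):
    """SAX: return (first title, start tags seen), stopping at the first </title>."""
    handler = FirstTitle()
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
    except _Found as hit:
        return str(hit), handler.events
    return None, handler.events


print(titles_dom(BIB))
print(first_title_sax(BIB))
```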
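The ?, * and + markers in a DTD content model behave like regular-expression quantifiers over the sequence of child element names, and “,” means “followed by”. Python’s standard library does not validate documents against a DTD. This sketch therefore hand-translates the Contract rule from the DTD example into a regex; a real validator would read the DTD itself. CONTRACT_MODEL and children_match are hypothetical names. The sketch checks only the order and count of children, not attributes or the children’s own content. A * in a content model would map to a regex * in the same way.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation (hypothetical) of the DTD rule
#   <!ELEMENT Contract (SupplierID+, Comments?, ItemSegment+)>
# into a regex over the child tag names, each followed by one space.
CONTRACT_MODEL = re.compile(r"(?:SupplierID )+(?:Comments )?(?:ItemSegment )+")


def children_match(xml_text, model):
    """True if the root's children, in order, fit the content-model regex."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag + " " for child in root)
    return model.fullmatch(sequence) is not None


for doc in [
    "<Contract><SupplierID/><SupplierID/><ItemSegment/></Contract>",               # fits
    "<Contract><SupplierID/><Comments/><Comments/><ItemSegment/></Contract>",      # two Comments
    "<Contract><Comments/><ItemSegment/></Contract>",                              # no SupplierID
]:
    print(children_match(doc, CONTRACT_MODEL), doc)
```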
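For the usa:price versus eurozone:price example: a namespace prefix is shorthand for a URI, and a namespace-aware parser compares the URIs, not the prefixes. This sketch uses xml.etree.ElementTree. The http://example.com/... URIs and the two prices are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs and prices are invented for this example.
DOC = """<order xmlns:usa="http://example.com/usa" xmlns:eurozone="http://example.com/eurozone">
  <usa:price>1.50</usa:price>
  <eurozone:price>1.40</eurozone:price>
</order>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to {uri}localname, so the two "price" tags differ.
tags = [child.tag for child in root]
print(tags)  # ['{http://example.com/usa}price', '{http://example.com/eurozone}price']

# Queries name the namespace through a prefix-to-URI mapping of the caller's choosing.
ns = {"usa": "http://example.com/usa", "eurozone": "http://example.com/eurozone"}
print(root.findtext("usa:price", namespaces=ns))       # 1.50
print(root.findtext("eurozone:price", namespaces=ns))  # 1.40

# A different prefix bound to the same URI finds the same element.
print(root.findtext("us:price", namespaces={"us": "http://example.com/usa"}))  # 1.50
```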
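Python’s xml.etree.ElementTree implements a limited subset of XPath, which is enough to try each operator from the XPath slide. Two limitations matter here:

- Paths are evaluated relative to the element they are called on, so //author is written .//author.
- Selecting an attribute node as a result is not supported, so the attribute value is read with get().

The bibliography below is adapted from the slides, with year moved into an attribute so that @ has something to match.

```python
import xml.etree.ElementTree as ET

BIB = """<bibliography>
  <book year="1995"><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author></book>
  <book year="1999"><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
</bibliography>"""

root = ET.fromstring(BIB)

# "/" separates steps: titles of the book children of the root.
titles = [t.text for t in root.findall("book/title")]
# "*" matches any one element name.
any_titles = [t.text for t in root.findall("*/title")]
# "//" (written ".//" in ElementTree) reaches descendants at any depth.
authors = [a.text for a in root.findall(".//author")]
# "[@attr='value']" filters on an attribute.
from_1999 = [t.text for t in root.findall("book[@year='1999']/title")]
# "[child='text']" filters on a child element's text.
with_hull = [t.text for t in root.findall("book[author='Hull']/title")]
# Attribute values themselves are read with .get(), not selected by the path.
years = [b.get("year") for b in root.findall("book")]

print(titles, any_titles, authors, from_1999, with_hull, years, sep="\n")
```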

