Lecture 8: XML Data
Wednesday, October 11 2000

Outline
• XML, DTDs (Data on the Web, 3.1)
• Semistructured data in XML (3.2)
• Exporting Relational Data in XML (8.3.1)

Facts About XML
• 132 books at Amazon
• 875,340 pages at www.altavista.com
• Every database vendor X has www.x.com/xml
• Many applications are just fancier Websites
• But, most importantly, XML enables data sharing on the Web, hence our interest

What is XML? From HTML to XML
HTML describes the presentation: easy for humans

HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

HTML is hard for applications

XML
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>

XML describes the content: easy for applications

XML
• eXtensible Markup Language
• Roots: comes from SGML (very nasty language).
• After the roots: a format for sharing data
• Emerging format for data exchange on the Web and between applications

XML Applications
• Sharing data between different components of an application.
• Format for storing all data in Office 2000.
• EDI (electronic data interchange):
  – Transactions between banks
  – Producers and suppliers sharing product data (auctions)
  – Extranets: building relationships between companies
  – Scientists sharing data about experiments.

XML Syntax
• Very simple:

<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>

XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• start tags must correspond to end tags, and conversely
• an element: everything between a start tag and its matching end tag, tags included
  – example element: <title>Complete Guide to DB2</title>
  – example element: <book> <title> Complete Guide to DB2 </title> <author>Chamberlin</author> </book>
• elements may be nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a unique root element
• well-formed XML document: one whose tags match

The XML Tree
db
  book
    title: "Complete Guide to DB2"
    author: "Chamberlin"
  book
    title: "Transaction Processing"
    author: "Bernstein"
    author: "Newcomer"
  publisher
    name: "Morgan Kaufmann"
    state: "CA"

Tags on nodes
Data values on leaves

More XML Syntax: Attributes
<book price="55" currency="USD">
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
</book>

price, currency are called attributes

Replacing Attributes with Elements
<book>
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
  <price> 55 </price>
  <currency> USD </currency>
</book>

Attributes are an alternative (worse) way to represent data

"Types" (or "Schemas") for XML
• Document Type Definition (DTD)
• Defines a grammar for the XML document, but we use it as a substitute for types/schemas
• Will be replaced by XML-Schema (will extend DTDs)

An Example DTD
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
]>

• PCDATA means Parsed Character Data (a mouthful for string)

DTDs as Grammars
db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= string

• A DTD is an EBNF (Extended BNF) grammar
• An XML tree is precisely a derivation tree

XML documents that have a DTD and conform to it are called valid

More on DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>

<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>

XML documents can be nested arbitrarily deep

XML for Representing Data
Relation persons:
  name   phone
  John   3634
  Sue    6343
  Dick   6363

XML:
<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>

Tree:
persons
  row
    name: "John"
    phone: 3634
  row
    name: "Sue"
    phone: 6343
  row
    name: "Dick"
    phone: 6363

XML vs Data Models
• XML is self-describing
• Schema elements become part of the data
  – Relational schema: persons(name,phone)
  – In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
• Consequence: XML is much more flexible
• XML = semistructured data

Semistructured Data Explained
• Missing attributes:
  – <person> <name>John</name> <phone>1234</phone> </person>
  – <person> <name>Joe</name> </person> (no phone!)
• Repeated attributes:
  – <person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>
• Attributes with different types in different objects:
  – <person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person>
• Nested collections (no 1NF)
• Heterogeneous collections:
  – <db> contains both <book>s and <publisher>s

XML Data vs. E/R, ODL, Relational
• Q: is XML better or worse?
• A: serves different purposes
  – E/R, ODL, Relational models:
    • For centralized processing, when
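
The earlier slides on well-formed documents and semistructured data (missing and repeated <phone> elements) can be tried out with a short script. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which the lecture itself does not mention. The helper names is_well_formed and phones_by_name are hypothetical and exist only for this illustration. The sketch checks well-formedness only; it does not validate against a DTD, which ElementTree does not do.

```python
import xml.etree.ElementTree as ET

# Sample data adapted from the "Semistructured Data Explained" slide:
# Joe has no phone, Mary has two.
PERSONS = """<persons>
  <person> <name>John</name> <phone>1234</phone> </person>
  <person> <name>Joe</name> </person>
  <person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>
</persons>"""


def is_well_formed(text):
    """Return True if text parses as XML (matching tags, a single root)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def phones_by_name(text):
    """Map each person's name to a list of phone strings (possibly empty)."""
    root = ET.fromstring(text)
    result = {}
    for person in root.findall("person"):
        name = person.findtext("name").strip()
        # findall returns every <phone> child, so zero, one, or many all work.
        result[name] = [p.text.strip() for p in person.findall("phone")]
    return result


if __name__ == "__main__":
    print(is_well_formed(PERSONS))
    print(is_well_formed("<row><name>Sue</name>"))  # <row> is never closed
    print(phones_by_name(PERSONS))
```

On the sample data, phones_by_name gives Joe an empty list and Mary two entries. That irregularity is what the slides call semistructured, and the code handles it without any schema declared up front.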