Lecture 24: XML Data Management
Nov 17, 2006
ChengXiang Zhai
(Most slides are from Ning Zhang's presentation: www2 cs uh edu ceick 3480 XML 3480 ppt)
CS511 Advanced Database Management Systems

What is XML?
- XML documents have elements and attributes
- Elements (indicated by begin and end tags)
  - can be nested but cannot interleave each other
  - can have an arbitrary number of sub-elements
  - can have free text as values

  <chap title="Introduction To XML">
    some free text
    <sect title="What is XML"> </sect>
    <sect title="Elements"> </sect>
    <sect title="Why XML"> </sect>
    possibly more free text
  </chap>

  (Slide callouts: attribute, begin element, end element.)
- Elements with the same name can be nested

XML

  <bibliography>
    <book>
      <title> Foundations </title>
      <author> Abiteboul </author>
      <author> Hull </author>
      <author> Vianu </author>
      <publisher> Addison Wesley </publisher>
      <year> 1995 </year>
    </book>
  </bibliography>

- XML describes the content: easy for applications

Document Type Definitions (DTDs) as Grammars

  <!DOCTYPE paper [
    <!ELEMENT paper (section*)>
    <!ELEMENT section ((title, section*) | text)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT text (#PCDATA)>
  ]>

  <paper>
    <section> <text> </text> </section>
    <section>
      <title> </title>
      <section> ... </section>
      <section> ... </section>
    </section>
  </paper>

- XML documents can be nested arbitrarily deep

XML for Representing Data

  persons
  name   phone
  John   3634
  Sue    6343
  Dick   6363

  [Figure: tree with root "persons", three "row" children, each with a "name" and a "phone" leaf]

XML:

  <persons>
    <row> <name>John</name> <phone>3634</phone> </row>
    <row> <name>Sue</name> <phone>6343</phone> </row>
    <row> <name>Dick</name> <phone>6363</phone> </row>
  </persons>

XML vs Data Models
- XML is self-describing
- Schema elements become part of the data
  - Relational schema: persons(name, phone)
  - In XML, <persons>, <name>, <phone> are part of the data and are repeated many times
- Consequence: XML is much more flexible
- XML = semistructured data

Semi-structured Data Explained
- Missing attributes:

  <person> <name>John</name> <phone>1234</phone> </person>
  <person> <name>Joe</name> </person>                          (no phone)

- Repeated attributes:

  <person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>   (two phones)

Semistructured Data Explained
- Attributes with different types in different objects:

  <person>
    <name> <first>John</first> <last>Smith</last> </name>      (structured name)
    <phone>1234</phone>
  </person>

- Nested collections
- Heterogeneous collections: <db> contains both <book>s and <publisher>s

Why XML?
- Relational databases organize data in tables
- XML documents organize data in ordered trees
  [Figure: tree with root "chap" and nested "sect" nodes]
- Database side: XML is a new way to organize data
- Document side: XML is a semantic markup language
  - HTML focuses on presentation, while plain text has no structure
  - XML focuses on semantics/structure in the data

  <html>
    <h1>Chapter 1</h1> some free text
    <h2>Section 1</h2> some more free text
    <h3>Section 1.1</h3>
  </html>

Data Management: Relational vs XML
- Relational data are well organized: fully structured (more strict)
  - E-R modeling to model the data structures in the application
  - E-R diagram is converted to relational tables and integrity constraints (relational schemas)
- XML data are semi-structured (more flexible)
  - Schemas may be unfixed or unknown (flexible: anyone can author a document)
  - Suitable for data integration, data
    on the web, data exchange between different enterprises

More about Relational vs XML
- XML is not meant to replace relational database systems
- RDBMSs are well suited for OLTP applications (e.g., electronic banking), which have 1000 small transactions per minute
- XML is suitable for data exchange over heterogeneous data sources (e.g., Web services) that allows them to "talk"

Uses of XML
- As a document representation language
  - XML can be transformed to other formats (e.g., by XSLT): XML -> HTML, XML -> LaTeX, bibTeX, XML -> PDF
  - DocBook (standard schema for authoring documents/books)

Uses of XML (cont.)
- As a data integration and exchange language
  - Web services (SOAP, WSDL, UDDI): Amazon.com, eBay, Microsoft MapPoint
  - Domain-specific data exchange schemas (1000): legal document exchange language, business information exchange
  - RSS XML news feed: CNN, slashdot, blogs

Uses of XML (cont.)
- In general, appropriate for any data having hierarchical structure
  - Email
    - Header: from, to, cc, bcc
    - Body: my message, replied email
  - Network log file: IP address, time, request type, error code

Exporting Relational Data to XML
  [Figure: E-R diagram, "product" and "company" connected by the relationship "makes"]
- Product(pid, name, weight)
- Company(cid, name, address)
- Makes(pid, cid, price)

Export data grouped by companies

  <db>
    <company>
      <name>GizmoWorks</name>
      <address>Tacoma</address>
      <product> <name>gizmo</name> <price>19.99</price> </product>
      <product> ... </product>
    </company>
    <company>
      <name>Bang</name>
      <address>Kirkland</address>
      <product> <name>gizmo</name> <price>22.99</price> </product>
    </company>
  </db>

- Redundant representation of products
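
The company-grouped export can be sketched in Python with the standard-library xml.etree.ElementTree module. This is only an illustrative sketch, not the method used in the lecture: the function name export_by_company, the pid/cid values, and the weight are hypothetical; only the company names, addresses, product name, and prices come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows: ids and the weight are made up for illustration.
products = [(1, "gizmo", 1.5)]                    # Product(pid, name, weight)
companies = [(10, "GizmoWorks", "Tacoma"),        # Company(cid, name, address)
             (20, "Bang", "Kirkland")]
makes = [(1, 10, "19.99"), (1, 20, "22.99")]      # Makes(pid, cid, price)


def export_by_company(products, companies, makes):
    """Build a <db> tree with one <company> per row, its products nested inside."""
    product_name = {pid: name for pid, name, _weight in products}
    db = ET.Element("db")
    for cid, cname, address in companies:
        company = ET.SubElement(db, "company")
        ET.SubElement(company, "name").text = cname
        ET.SubElement(company, "address").text = address
        # Nest every product this company makes, with the company-specific price.
        for pid, make_cid, price in makes:
            if make_cid != cid:
                continue
            product = ET.SubElement(company, "product")
            ET.SubElement(product, "name").text = product_name[pid]
            ET.SubElement(product, "price").text = price
    return db


db = export_by_company(products, companies, makes)
xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

In this sketch the product name "gizmo" is written once under each company that makes it, which is the redundancy the slide points out.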

The DTD

  <!ELEMENT db (company*)>
  <!ELEMENT company (name, address, product*)>
  <!ELEMENT product (name, price)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT price (#PCDATA)>

Export Data by Products

  <db>
    <product>
      <name>Gizmo</name>
      <manufacturer>
        <name>GizmoWorks</name>
        <price>19