NU EECS 317 - Semistructured Data, Extensible Markup Language, Document Type Definitions


Contents: XML; Semistructured Data; The Information-Integration Problem; Example; Two Approaches to Integration; Warehouse Diagram; A Mediator; Graphs of Semistructured Data; Example: Data Graph; XML; Well-Formed and Valid XML; Well-Formed XML; Tags; Example: Well-Formed XML; XML and Semistructured Data; Example; DTD Structure; DTD Elements; Example: DTD; Element Descriptions; Example: Element Description; Use of DTD’s; Example (a); Example (b); Attributes; Example: Attributes; Example: Attribute Use; ID’s and IDREF’s; Creating ID’s; Creating IDREF’s; Example: ID’s and IDREF’s; The DTD; Example Document; Empty Elements; Example: Empty Element

XML: Semistructured Data, Extensible Markup Language, Document Type Definitions

Semistructured Data
Another data model, based on trees.
Motivation: flexible representation of data.
  Often, data comes from multiple sources with differences in notation, meaning, etc.
Motivation: sharing of documents among systems and databases.

The Information-Integration Problem
Related data exists in many places and could, in principle, work together.
But different databases differ in:
  1. Model (relational, object-oriented?).
  2. Schema (normalized/unnormalized?).
  3. Terminology: are consultants employees? Retirees? Subcontractors?
  4. Conventions (meters versus feet?).

Example
Every bar has a database.
One may use a relational DBMS; another keeps the menu in an MS-Word document.
One stores the phones of distributors, another does not.
One distinguishes ales from other beers, another doesn’t.
One counts beer inventory by bottles, another by cases.

Two Approaches to Integration
  1. Warehousing: Make copies of the data sources at a central site and transform them to a common schema.
     Reconstruct the data daily or weekly, but do not try to keep it more up-to-date than that.
  2. Mediation: Create a view of all sources, as if they were integrated.
     Answer a view query by translating it to the terminology of the sources and querying them.

Warehouse Diagram
[Diagram: a warehouse connected to two wrappers, one over Source 1 and one over Source 2.]

A Mediator
[Diagram: a user query goes to the mediator, which sends a query through a wrapper to each of Source 1 and Source 2; results flow back along the same path to the user.]

Graphs of Semistructured Data
Nodes = objects.
Labels on arcs (attributes, relationships).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: no restriction on:
  Labels out of a node.

Example: Data Graph
[Figure: a data graph starting at root. Arc labels include bar, beer, name, addr, manf, servedAt, prize, year, and award; leaf values include Joe’s, Maple, Bud, Miller, A.B., 1995, and Gold. Callouts read "The bar object for Joe’s Bar", "The beer object for Bud", and "Notice a new kind of data."]

XML
XML = Extensible Markup Language.
While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).
Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

Well-Formed and Valid XML
Well-formed XML allows you to invent your own tags.
  Similar to labels in semistructured data.
Valid XML involves a DTD (Document Type Definition), a grammar for tags.

Well-Formed XML
Start the document with a declaration, surrounded by <?xml … ?>.
The normal declaration is:
    <?xml version="1.0" standalone="yes" ?>
standalone="yes" means, roughly, "no DTD provided."
The balance of the document is a root tag surrounding nested tags.

Tags
Tags, as in HTML, are normally matched pairs, as in <FOO> … </FOO>.
Tags may be nested to any depth, but pairs must nest properly (no overlapping).
XML tags are case-sensitive.

Example: Well-Formed XML
    <?xml version="1.0" standalone="yes" ?>
    <BARS>
        <BAR><NAME>Joe’s Bar</NAME>
            <BEER><NAME>Bud</NAME>
                <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME>
                <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> …
    </BARS>
Callouts mark a NAME subobject and a BEER subobject.

XML and Semistructured Data
Well-formed XML with nested tags is exactly the same idea as trees of semistructured data.
We shall see that XML also enables nontree structures, as does the
semistructured data model.

Example
The <BARS> XML document is:
[Figure: the <BARS> document drawn as a tree. The root BARS has BAR children, with further BARs elided. The first BAR has a NAME (Joe’s Bar) and two BEER children, each with a NAME and a PRICE: Bud, 2.50 and Miller, 3.00.]

DTD Structure
    <!DOCTYPE <root tag> [
        <!ELEMENT <name>(<components>)>
        . . . more elements . . .
    ]>

DTD Elements
The description of an element consists of its name (tag) and a parenthesized description of any nested tags.
  This includes the order of subtags and their multiplicity.
Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.

Example: DTD
    <!DOCTYPE BARS [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
    ]>
A BARS object has zero or more BARs nested within.
A BAR has one NAME and one or more BEER subobjects.
A BEER has a NAME and a PRICE.
NAME and PRICE are text.

Element Descriptions
Subtags must appear in the order shown.
A tag may be followed by a symbol to indicate its multiplicity.
  * = zero or more.
  + = one or more.
  ? = zero or one.
The symbol | can connect alternative sequences of tags.

Example: Element Description
A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:
    <!ELEMENT NAME ((TITLE?, FIRST, LAST) | IPADDR)>

Use of DTD’s
  1. Set standalone="no".
  2. Either:
     a) include the DTD as a preamble of the XML document, or
     b) follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

Example (a)
    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE BARS [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
    ]>
    <BARS>
        <BAR><NAME>Joe’s Bar</NAME>
            <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> …
    </BARS>
Callouts: the <!DOCTYPE … ]> part is the DTD; the <BARS> … </BARS> part is the document.

Example (b)
Assume the BARS DTD is in file bar.dtd.
    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE BARS SYSTEM "bar.dtd">
    <BARS>
        <BAR><NAME>Joe’s Bar</NAME>
            <BEER><NAME>Bud</NAME>
                <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME>
                <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> …
    </BARS>
Callout: get the DTD from the file bar.dtd.

Attributes
Opening tags in XML can have attributes.
In a DTD,
    <!ATTLIST E . . . >
declares an attribute for element E, along with its datatype.

Example: Attributes
Bars can have an attribute kind, a character string describing the bar.
    <!ELEMENT BAR (NAME, BEER*)>
    <!ATTLIST BAR kind CDATA #IMPLIED>
CDATA: character-string type; no tags.
#IMPLIED: the attribute is optional (the opposite is #REQUIRED).

Example: Attribute Use
In a document that allows BAR tags, we might see:
    <BAR kind =
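
The Two Approaches to Integration and A Mediator slides describe mediation as answering a query on a view of all sources by translating it for each source. A minimal Python sketch of that flow follows, assuming two invented sources that mirror the Example slide's bottles-versus-cases mismatch. Everything named here (source1, source2, wrapper1, wrapper2, mediator, Sue's Bar, and BOTTLES_PER_CASE = 24) is hypothetical and not taken from the slides.

```python
from typing import Callable

# Hypothetical conversion factor; the slides do not say how big a case is.
BOTTLES_PER_CASE = 24

# Hypothetical sources with different conventions for counting inventory.
source1 = [("Joe's Bar", "Bud", 120), ("Joe's Bar", "Miller", 48)]  # (bar, beer, bottles)
source2 = {("Sue's Bar", "Bud"): 3}                                  # (bar, beer) -> cases

Row = tuple[str, str, int]  # common schema of the integrated view: (bar, beer, bottles)


def wrapper1(beer: str) -> list[Row]:
    """Source 1 already counts bottles, so this wrapper only filters."""
    return [row for row in source1 if row[1] == beer]


def wrapper2(beer: str) -> list[Row]:
    """Source 2 counts cases; translate its rows into the common schema."""
    return [(bar, b, cases * BOTTLES_PER_CASE)
            for (bar, b), cases in source2.items() if b == beer]


def mediator(beer: str, wrappers: list[Callable[[str], list[Row]]]) -> list[Row]:
    """Answer a query on the view by querying every source through its wrapper."""
    result: list[Row] = []
    for wrap in wrappers:
        result.extend(wrap(beer))
    return result


print(mediator("Bud", [wrapper1, wrapper2]))
# [("Joe's Bar", 'Bud', 120), ("Sue's Bar", 'Bud', 72)]
```

A warehouse would instead run the same wrappers on a schedule (daily or weekly, as the slide suggests) and store the combined rows at a central site, answering queries from that copy rather than from the live sources.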

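The Graphs of Semistructured Data slide describes objects as nodes, attributes and relationships as arc labels, and atomic values at the leaves. The sketch below stores such a graph as a Python dictionary from node id to either a list of (label, target) arcs or an atomic value. The node ids are invented, and the arcs only loosely follow the labels in the Example: Data Graph figure; in particular, the servedAt arc from Bud back to Joe's bar object is an assumption. That back arc makes the structure a graph rather than a tree, yet following a fixed path of labels still terminates.

```python
from typing import Union

Arc = tuple[str, str]         # (label, target node id)
Node = Union[list[Arc], str]  # an object with outgoing arcs, or an atomic leaf value

# Hypothetical graph: node ids are invented; labels echo the data-graph figure.
graph: dict[str, Node] = {
    "root": [("bar", "joes"), ("beer", "bud")],
    "joes": [("name", "v1"), ("addr", "v2"), ("beer", "bud")],
    "bud": [("name", "v3"), ("manf", "v4"), ("servedAt", "joes")],  # arc back to the bar
    "v1": "Joe's", "v2": "Maple", "v3": "Bud", "v4": "A.B.",
}


def follow(start: str, *labels: str) -> list[str]:
    """Node ids reachable from start along the given sequence of arc labels."""
    frontier = [start]
    for label in labels:
        step = []
        for node in frontier:
            arcs = graph[node]
            if isinstance(arcs, list):  # atomic leaves have no arcs out
                step.extend(target for (lab, target) in arcs if lab == label)
        frontier = step
    return frontier


def values(node_ids: list[str]) -> list[str]:
    """Atomic values stored at the given leaf nodes."""
    return [v for v in (graph[n] for n in node_ids) if isinstance(v, str)]


print(values(follow("root", "bar", "beer", "name")))       # ['Bud']
print(values(follow("root", "beer", "servedAt", "name")))  # ["Joe's"]
```

Note the flexibility the slide mentions: the bar object has an addr arc while the beer object has manf, and nothing forces the two to share labels.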

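The Well-Formed XML and Tags slides give the syntactic rules: one root tag, matched and properly nested pairs, and case-sensitive names. Assuming Python's standard xml.etree.ElementTree module, a rough well-formedness test is whether the text parses without raising ParseError. This says nothing about validity against a DTD. The helper is_well_formed is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if text parses as XML: one root, matched and properly nested tags."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = ('<?xml version="1.0" standalone="yes" ?>'
        "<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")

print(is_well_formed(good))                           # True
print(is_well_formed("<BAR><NAME>Bud</name></BAR>"))  # False: tags are case-sensitive
print(is_well_formed("<BAR><NAME>Bud</BAR></NAME>"))  # False: overlapping tags
print(is_well_formed("<BAR></BAR><BAR></BAR>"))       # False: two root elements
```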

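The XML and Semistructured Data slide equates nested tags with a tree. Using the standard ElementTree API, the sketch below parses the <BARS> example (with the elided second BAR left out and straight apostrophes in place of curly ones) and walks the BARS, BAR, and BEER nodes to list each beer's price. The function name menu is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <BARS> example with the elided second BAR left out.
DOC = """<?xml version="1.0" standalone="yes" ?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def menu(doc: str) -> list[tuple[str, str, float]]:
    """Walk BARS -> BAR -> BEER and return (bar name, beer name, price) rows."""
    root = ET.fromstring(doc)          # the root element, BARS
    rows = []
    for bar in root.findall("BAR"):    # direct BAR children of BARS
        bar_name = bar.findtext("NAME")  # the BAR's own NAME child, not a BEER's
        for beer in bar.findall("BEER"):
            rows.append((bar_name, beer.findtext("NAME"), float(beer.findtext("PRICE"))))
    return rows


for row in menu(DOC):
    print(row)
# ("Joe's Bar", 'Bud', 2.5)
# ("Joe's Bar", 'Miller', 3.0)
```

Because the path "NAME" matches only direct children, the bar's name and the beers' names stay separate even though they share a tag, just as same-labeled arcs leave different nodes in the data graph.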

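The Element Descriptions slide gives DTD content models the shape of regular expressions: a comma means sequence in the order shown, *, + and ? set multiplicity, and | separates alternatives. The sketch below leans on that analogy by hand-translating a content model into a Python regular expression over the sequence of child tag names. It is an illustration under that assumption, not a DTD validator: it ignores #PCDATA, mixed content, and attributes. The names model_to_regex and children_ok are hypothetical.

```python
import re


def model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a DTD content model like "(NAME, BEER+)" into a regex over child
    tag names, where each child is written as its tag name followed by one space."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|*+?]", model):
        if tok == "(":
            parts.append("(?:")                     # group without capturing
        elif tok == ",":
            continue                                # sequence: plain concatenation
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)                       # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # exactly one child named tok
    return re.compile("".join(parts))


def children_ok(model: str, child_tags: list[str]) -> bool:
    """True if the child tags, in document order, match the content model."""
    seq = "".join(tag + " " for tag in child_tags)
    return model_to_regex(model).fullmatch(seq) is not None


print(children_ok("(NAME, BEER+)", ["NAME", "BEER", "BEER"]))  # True
print(children_ok("(NAME, BEER+)", ["BEER", "NAME"]))          # False: wrong order
print(children_ok("(NAME, BEER+)", ["NAME"]))                  # False: BEER+ needs one
print(children_ok("((TITLE?, FIRST, LAST) | IPADDR)", ["FIRST", "LAST"]))  # True
print(children_ok("((TITLE?, FIRST, LAST) | IPADDR)", ["IPADDR"]))         # True
```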

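The Example: Attributes slide declares kind as CDATA #IMPLIED, that is, optional, with #REQUIRED as the opposite. The sketch below, assuming ElementTree's Element.get (which returns None when an attribute is absent), reads kind from each BAR and reports attributes that a #REQUIRED declaration would demand but the element lacks. The dictionaries standing in for ATTLIST declarations, the missing_required helper, the bar names A and B, and the kind value are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for ATTLIST declarations: attribute name -> default keyword.
IMPLIED_KIND = {"kind": "#IMPLIED"}    # as declared on the Example: Attributes slide
REQUIRED_KIND = {"kind": "#REQUIRED"}  # the opposite declaration


def missing_required(elem: ET.Element, decls: dict[str, str]) -> list[str]:
    """Names of attributes declared #REQUIRED that elem does not carry."""
    return [name for name, default in decls.items()
            if default == "#REQUIRED" and elem.get(name) is None]


# Invented document: bar names and the kind value are illustrative only.
bars = ET.fromstring('<BARS>'
                     '<BAR kind="made-up description"><NAME>A</NAME></BAR>'
                     '<BAR><NAME>B</NAME></BAR>'
                     '</BARS>')

for bar in bars.findall("BAR"):
    print(bar.findtext("NAME"), bar.get("kind"), missing_required(bar, IMPLIED_KIND))
# A made-up description []
# B None []

print([bar.findtext("NAME") for bar in bars.findall("BAR")
       if missing_required(bar, REQUIRED_KIND)])
# ['B']
```

Under #IMPLIED both bars are acceptable; switching the declaration to #REQUIRED flags only the bar without a kind attribute.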