Jeff Ullman: Introduction to XML

XML
- Semistructured Data
- Extensible Markup Language
- Document Type Definitions

Semistructured Data
- Another data model, based on trees.
- Motivation: flexible representation of data.
  - Often, data comes from multiple sources with differences in notation, meaning, etc.
- Motivation: sharing of documents among systems and databases.

Graphs of Semistructured Data
- Nodes = objects.
- Labels on arcs (attributes, relationships).
- Atomic values at leaf nodes (nodes with no arcs out).
- Flexibility: no restriction on:
  - Labels out of a node.
  - Number of successors with a given label.

Example: Data Graph
[Figure: a data graph. Labels in the figure: root, bar, beer, manf, servedAt, name, addr, prize, year, award. Values in the figure: Bud, A.B., Gold, 1995, Maple, Joe’s, M’lob. Callouts: "The bar object for Joe’s Bar"; "The beer object for Bud"; "Notice a new kind of data."]

XML
- XML = Extensible Markup Language.
- While HTML uses tags for formatting (e.g., "italic"), XML uses tags for semantics (e.g., "this is an address").
- Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

Well-Formed and Valid XML
- Well-Formed XML allows you to invent your own tags.
  - Similar to labels in semistructured data.
- Valid XML involves a DTD (Document Type Definition), a grammar for tags.

Well-Formed XML
- Start the document with a declaration, surrounded by <?xml … ?>.
- Normal declaration is:
    <?xml version = "1.0" standalone = "yes" ?>
  - "Standalone" = "no DTD provided."
- Balance of document is a root tag surrounding nested tags.

Tags
- Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO>.
- Tags may be nested arbitrarily.
- XML tags are case sensitive.

Example: Well-Formed XML
    <?xml version = "1.0" standalone = "yes" ?>
    <BARS>
      <BAR>
        <NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callouts in the slide: "A NAME subobject"; "A BEER subobject".

XML and Semistructured Data
- Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data.
- We shall see that XML also enables nontree structures, as does the semistructured data model.

Example
The <BARS> XML document is:
[Figure: tree for the <BARS> document. The root BARS has BAR children (BAR, BAR, . . ., BAR). The first BAR has a NAME child with text "Joe’s Bar" and two BEER children, each with a NAME and a PRICE: Bud 2.50 and Miller 3.00.]

DTD Structure
    <!DOCTYPE <root tag> [
      <!ELEMENT <name> (<components>)>
      . . . more elements . . .
    ]>

DTD Elements
- The description of an element consists of its name (tag), and a parenthesized description of any nested tags.
  - Includes order of subtags and their multiplicity.
- Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.

Example: DTD
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
- A BARS object has zero or more BAR’s nested within.
- A BAR has one NAME and one or more BEER subobjects.
- A BEER has a NAME and a PRICE.
- NAME and PRICE are text.

Element Descriptions
- Subtags must appear in order shown.
- A tag may be followed by a symbol to indicate its multiplicity.
  - * = zero or more.
  - + = one or more.
  - ? = zero or one.
- Symbol | can connect alternative sequences of tags.

Example: Element Description
- A name is an optional title (e.g., "Prof."), a first name, and a last name, in that order, or it is an IP address:
    <!ELEMENT NAME ((TITLE?, FIRST, LAST) | IPADDR)>

Use of DTD’s
1. Set standalone = "no".
2. Either:
   a) Include the DTD as a preamble of the XML document, or
   b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

Example (a)
    <?xml version = "1.0" standalone = "no" ?>
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
    <BARS>
      <BAR>
        <NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callouts in the slide: "The DTD" (the <!DOCTYPE … ]> preamble); "The document" (the <BARS> element).

Example (b)
Assume the BARS DTD is in file bar.dtd.
    <?xml version = "1.0" standalone = "no" ?>
    <!DOCTYPE BARS SYSTEM "bar.dtd">
    <BARS>
      <BAR>
        <NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callout in the slide: "Get the DTD from the file bar.dtd" (the <!DOCTYPE … SYSTEM …> line).

Attributes
- Opening tags in XML can have attributes.
- In a DTD, <!ATTLIST E . . . > declares an attribute for element E, along with its datatype.

Example: Attributes
- Bars can have an attribute kind, a character string describing the bar.
    <!ELEMENT BAR (NAME, BEER*)>
    <!ATTLIST BAR kind CDATA #IMPLIED>
- CDATA: character string type; no tags.
- #IMPLIED: attribute is optional; opposite: #REQUIRED.

Example: Attribute Use
- In a document that allows BAR tags, we might see:
    <BAR kind = "sushi">
      <NAME>Akasaka</NAME>
      <BEER><NAME>Sapporo</NAME><PRICE>5.00</PRICE></BEER>
      ...
    </BAR>
- Note attribute values are quoted.

ID’s and IDREF’s
- Attributes can be pointers from one object to another.
  - Compare to HTML’s NAME = "foo" and HREF = "#foo".
- Allows the structure of an XML document to be a general graph, rather than just a tree.

Creating ID’s
- Give an element E an attribute A of type ID.
- When using tag <E > in an XML document, give its attribute A a unique
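Looking back at the BARS example, the slides make two points that a short program can illustrate: well-formed XML with nested tags is a tree, and a DTD element description fixes the order and multiplicity of subtags. The following Python sketch parses the BARS document with the standard library's xml.etree.ElementTree and walks the resulting tree. The names DOC, CONTENT_MODELS, conforms, and menu are illustrative assumptions, not part of the slides. The standard-library parser checks well-formedness only, so the check against the DTD is a hand translation of each element description into a regular expression over child tag names, not a real DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# The well-formed BARS document from the slides (the elided second BAR is left out).
DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

# Assumption: each DTD element description is rewritten by hand as a regex
# over the sequence of child tags, where every tag is followed by one space.
CONTENT_MODELS = {
    "BARS": r"(BAR )*",       # <!ELEMENT BARS (BAR*)>
    "BAR": r"NAME (BEER )+",  # <!ELEMENT BAR (NAME, BEER+)>
    "BEER": r"NAME PRICE ",   # <!ELEMENT BEER (NAME, PRICE)>
    "NAME": r"",              # <!ELEMENT NAME (#PCDATA)>: no nested tags
    "PRICE": r"",             # <!ELEMENT PRICE (#PCDATA)>: no nested tags
}


def conforms(element):
    """Return True if element and all its descendants match CONTENT_MODELS."""
    child_tags = "".join(child.tag + " " for child in element)
    model = CONTENT_MODELS.get(element.tag)
    if model is None or re.fullmatch(model, child_tags) is None:
        return False
    return all(conforms(child) for child in element)


# Parsing raises ET.ParseError if the document is not well-formed.
root = ET.fromstring(DOC)

# Walk the tree: one entry per BAR, mapping each beer name to its price.
menu = {
    bar.findtext("NAME"): {
        beer.findtext("NAME"): float(beer.findtext("PRICE"))
        for beer in bar.findall("BEER")
    }
    for bar in root.findall("BAR")
}

print(menu)
print(conforms(root))
```

Deleting both BEER elements from DOC leaves the document well-formed, but conforms(root) then returns False, because BEER+ requires at least one BEER inside each BAR.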