NU EECS 317 - XML: Semistructured Data, Extensible Markup Language, Document Type Definitions

Outline: XML; Framework; The Information-Integration Problem; Example; Two Approaches to Integration; Warehouse Diagram; A Mediator; Semistructured Data; Graphs of Semistructured Data; Example: Data Graph; XML; Well-Formed and Valid XML; Well-Formed XML; Tags; Example: Well-Formed XML; XML and Semistructured Data; Example; Document Type Definitions; DTD Structure; DTD Elements; Example: DTD; Element Descriptions; Example: Element Description; Use of DTD’s; Example (a); Example (b); Attributes; Example: Attributes; Example: Attribute Use; ID’s and IDREF’s; Creating ID’s; Creating IDREF’s; Example: ID’s and IDREF’s; The DTD; Example Document

XML
Semistructured Data
Extensible Markup Language
Document Type Definitions

Framework
1. Information Integration: making databases from various places work as one.
2. Semistructured Data: a new data model designed to cope with problems of information integration.
3. XML: a standard language for describing semistructured data schemas and representing data.

The Information-Integration Problem
Related data exists in many places and could, in principle, work together.
But different databases differ in:
1. Model (relational, object-oriented?).
2. Schema (normalized/unnormalized?).
3. Terminology: are consultants employees? Retirees? Subcontractors?
4. Conventions (meters versus feet?).

Example
Every bar has a database.
One may use a relational DBMS; another keeps the menu in an MS-Word document.
One stores the phones of distributors, another does not.
One distinguishes ales from other beers, another doesn’t.
One counts beer inventory by bottles, another by cases.

Two Approaches to Integration
1. Warehousing: make copies of the data from the sources at a central site and transform it to a common schema.
   Reconstruct the data daily or weekly, but do not try to keep it more up-to-date than that.
2. Mediation: create a view of all sources, as if they were integrated.
   Answer a view query by translating it to the terminology of the sources and querying them.

Warehouse Diagram
[Diagram: a warehouse fed by two wrappers, one in front of Source 1 and one in front of Source 2.]

A Mediator
[Diagram: a user query goes to the mediator, which sends queries through two wrappers to Source 1 and Source 2; results flow back along the same path to the user.]

Semistructured Data
Purpose: represent data from independent sources more flexibly than either relational or object-oriented models.
Think of objects, but with the type of each object its own business, not that of its “class.”
Labels indicate the meaning of substructures.

Graphs of Semistructured Data
Nodes = objects.
Labels on arcs (attributes, relationships).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: no restriction on:
   Labels out of a node.
   Number of successors with a given label.

Example: Data Graph
[Figure: data graph. Arcs labeled beer, beer, and bar leave the root; other arc labels are manf, servedAt, name, addr, prize, year, and award; leaf values are Bud, A.B., Gold, 1995, Maple, Joe’s, and M’lob. Callouts mark the bar object for Joe’s Bar and the beer object for Bud, and point out a new kind of data.]

XML
XML = Extensible Markup Language.
While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).
Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.

Well-Formed and Valid XML
Well-Formed XML allows you to invent your own tags.
   Similar to labels in semistructured data.
Valid XML involves a DTD (Document Type Definition), which limits the labels and gives a grammar for their use.

Well-Formed XML
Start the document with a declaration, surrounded by <? … ?>.
Normal declaration is:
    <?xml version="1.0" standalone="yes"?>
standalone="yes" indicates that the document does not depend on an external DTD.
Balance of document is a root tag surrounding nested tags.

Tags
Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO>.
Tags may be nested arbitrarily.
Tags that need no matching end tag, like <P> in HTML, are also permitted; XML writes such an empty element as <FOO/>.

Example: Well-Formed XML
    <?xml version="1.0" standalone="yes"?>
    <BARS>
        <BAR>
            <NAME>Joe’s Bar</NAME>
            <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> …
    </BARS>

XML and Semistructured Data
Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data.
We shall see that XML also enables nontree structures, as does the semistructured data model.

Example
The <BARS> XML document is:
[Figure: tree for the BARS document. The root BARS has BAR children; the first BAR has a NAME (Joe’s Bar) and two BEER children, each with a NAME and a PRICE (Bud, 2.50 and Miller, 3.00); further BAR children are elided.]

Document Type Definitions
Essentially a context-free grammar for describing XML tags and their nesting.
Each domain of interest (e.g., electronic components, bars-beers-drinkers) creates one DTD that describes all the documents this group will share.

DTD Structure
    <!DOCTYPE <root tag> [
        <!ELEMENT <name> ( <components> )>
        <more elements>
    ]>

DTD Elements
The description of an element consists of its name (tag), and a parenthesized description of any nested tags.
   Includes order of subtags and their multiplicity.
Leaves (text elements) have #PCDATA in place of nested tags.

Example: DTD
    <!DOCTYPE BARS [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
    ]>
A BARS object has zero or more BAR’s nested within.
A BAR has one NAME and one or more BEER subobjects.
A BEER has a NAME and a PRICE.
NAME and PRICE are text.

Element Descriptions
Subtags must appear in order shown.
A tag may be followed by a symbol to indicate its multiplicity.
   * = zero or more.
   + = one or more.
   ? = zero or one.
Symbol | can connect alternative sequences of tags.

Example: Element Description
A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:
    <!ELEMENT NAME ((TITLE?, FIRST, LAST) | IPADDR)>

Use of DTD’s
1. Set standalone = "no".
2. Either:
   a) Include the DTD as a preamble of the XML document, or
   b) Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

Example (a)
The DTD:
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE BARS [
        <!ELEMENT BARS (BAR*)>
        <!ELEMENT BAR (NAME, BEER+)>
        <!ELEMENT NAME (#PCDATA)>
        <!ELEMENT BEER (NAME, PRICE)>
        <!ELEMENT PRICE (#PCDATA)>
    ]>
The document:
    <BARS>
        <BAR>
            <NAME>Joe’s Bar</NAME>
            <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
        </BAR>
        <BAR> …
    </BARS>

Example (b)
Assume the BARS DTD is in file bar.dtd.
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE BARS SYSTEM "bar.dtd">
    <BARS><BAR><NAME>Joe’s
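
The well-formedness rules described above (one root tag, matched and properly nested tags) can be checked with any XML parser. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD. The helper names is_well_formed and beer_prices are hypothetical, and the sample document is the BARS example with the elided second bar left out.

```python
import xml.etree.ElementTree as ET

# The BARS document from the slides, with the elided second <BAR> omitted.
DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def is_well_formed(text):
    """Return True if text parses as XML: one root, matched and nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def beer_prices(text):
    """Walk the tree and map each bar name to a {beer name: price} dict."""
    root = ET.fromstring(text)
    result = {}
    for bar in root.findall("BAR"):
        beers = {
            beer.findtext("NAME"): float(beer.findtext("PRICE"))
            for beer in bar.findall("BEER")
        }
        result[bar.findtext("NAME")] = beers
    return result


if __name__ == "__main__":
    print(is_well_formed(DOC))                   # the sample document
    print(is_well_formed("<BARS><BAR></BARS>"))  # <BAR> is never closed
    print(beer_prices(DOC))
```

The nested dictionary returned by beer_prices mirrors the tree view of the document: each BAR node becomes a key, and its BEER children become the entries beneath it.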

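The element descriptions of the BARS DTD behave like regular expressions over the sequence of child tags, which is one way to see why a DTD is "essentially a grammar." The sketch below is a hand translation of that DTD into Python regular expressions, not a real DTD validator: the table CONTENT_MODELS and the function conforms are hypothetical names, and text between child elements is ignored for simplicity.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the BARS DTD. Each element description becomes a regex
# over the child tags, where every tag is followed by a single space.
# None stands for #PCDATA: text only, no child elements.
CONTENT_MODELS = {
    "BARS": r"(BAR )*",       # (BAR*)
    "BAR": r"NAME (BEER )+",  # (NAME, BEER+)
    "BEER": r"NAME PRICE ",   # (NAME, PRICE)
    "NAME": None,             # (#PCDATA)
    "PRICE": None,            # (#PCDATA)
}


def conforms(elem):
    """Return True if elem and all its descendants match CONTENT_MODELS."""
    if elem.tag not in CONTENT_MODELS:
        return False  # tag not declared in the DTD
    model = CONTENT_MODELS[elem.tag]
    children = list(elem)
    if model is None:
        return not children  # a text element may not contain subtags
    child_tags = "".join(child.tag + " " for child in children)
    if re.fullmatch(model, child_tags) is None:
        return False  # wrong order or wrong multiplicity of subtags
    return all(conforms(child) for child in children)


if __name__ == "__main__":
    good = ET.fromstring(
        "<BARS><BAR><NAME>Joe's Bar</NAME>"
        "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>"
    )
    no_beer = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")
    print(conforms(good))     # one NAME followed by one BEER
    print(conforms(no_beer))  # BEER+ requires at least one BEER
```

The trailing space after each tag keeps a name such as BARS from being mistaken for BAR followed by extra characters. Alternatives written with | in a DTD would map to | in the regex in the same way.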