CORNELL CS 632 - Database Systems and XML


Slide titles: Database Systems and XML; Researched Papers; Efficiently Publishing Relational Data as XML Documents; Motivation; What is Needed; SQL Based Language; Implementation Alternatives; Early Tagging, Early Structuring; Late Tagging, Late Structuring; Late Tagging, Early Structuring; Experiment; Breakdown of Construction; Summary of Results; Relational Databases for Querying XML Documents; Why Bother?; Basic Idea; Translating XML to Relational Schema; Techniques to translate XML DTD to relations; Basic Inlining Technique; Basic Inlining Technique (cont.); Tools used in creating relations; Creating a Relation; Problems with Basic; Shared Inlining Technique; Problems with Shared; Hybrid Inlining Technique; Evaluation Metric; Results for N=3; Semi-Structured Queries to SQL; Simple Path to SQL; Simple Recursive Path to SQL; Arbitrary Path to Simple Recursive Path; Relational Results to XML: Simple Structuring; Relational Results to XML: Tag Variables; Grouping; Other Cases; Conclusion

Database Systems and XML
David Wu
CS 632
April 23, 2001

Researched Papers
• J. Shanmugasundaram, et al. "Efficiently Publishing Relational Data as XML Documents," VLDB Conference, September 2000.
• J. Shanmugasundaram, et al. "Relational Databases for Querying XML Documents: Limitations and Opportunities," VLDB Conference, September 1999.

Efficiently Publishing Relational Data as XML Documents

Motivation
• Relational database systems and XML are both heavily used on the Web.
• We would like a way to publish relational data as XML.

What is Needed
• A language to specify the conversion from relational data to XML.
• An implementation that carries out the conversion efficiently.

SQL Based Language

Implementation Alternatives
Main differences between relations and XML:
• XML documents have tags.
• XML has nested structure.

Early Tagging, Early Structuring
• Stored Procedure Approach (outside the engine)
  – Performs a nested-loop join by issuing a query for each nested structure in the desired XML.
  – High overhead due to the number of queries.
  – Fixed join order.
• Correlated CLOB Approach (inside the engine)
  – One large query with sub-queries is run within the engine.
  – XML constructor support must be added to the engine.
  – XML fragments from the constructors are stored as CLOBs (Character Large Objects), which are costly to handle.
• De-Correlated CLOB Approach (inside the engine)
  – Performs query de-correlation to give the optimizer more flexibility.

Late Tagging, Late Structuring
Two phases:
1) Content creation
2) Tagging and structuring

Content creation: Redundant Relation Approach
  – Join all source tables.
  – Both content and processing redundancy.

Content creation: Outer Union Approach
  – Separate the children of the same parent (e.g. one tuple should represent either an account or a purchaseOrder).
  – At the end, outer union the results.
  – Still some data redundancy (e.g. parent info).

Outer Union Plan: (figure not preserved)

Structuring/Tagging: Hash-based Tagger
• Group by hashing.
• Extract tuples and tag them.

Late Tagging, Early Structuring
• Late Tagging, Late Structuring requires a lot of memory for the hash table.
• Fix by creating “structured content” first and then tagging.

Structured content: Sorted Outer Union Approach
  – Desired format:
    1. Parent information comes before or with its child.
    2. All information about a node and its descendants occurs together.
    3. The relative order of the tuples matches the user-specified order.
  – Achieved by sorting the result of the outer union on ids.

Tagging sorted data: Constant Space Tagger
  – Can append tags as soon as data is seen.
  – Only needs to remember the parent ids of the last tuple seen to know when to append closing tags.

Experiment
• Inside Engine
• Outside Engine

Breakdown of Construction

Summary of Results
• Constructing inside the relational engine is more efficient.
• When processing can be done in main memory, the Unsorted Outer Union approach wins.
• When main memory is not enough, the Sorted Outer Union approach is best.

Relational Databases for Querying XML Documents

Why Bother?
• XML is becoming the standard for data representation on the WWW.
• A query engine designed to tap information from XML documents is valuable.
• Relational database systems are a mature technology and could be used to support XML querying.

Basic Idea
Step 1: Generate a relational schema from the DTD.
Step 2: Parse the XML document and load the data into tuples of the relational tables.
Step 3: Translate the semi-structured XML queries into SQL over the relational data.
Step 4: Convert the result back to XML.

Translating XML to Relational Schema
Main issues:
1. DTD complexity
2. Arbitrary nesting of XML DTDs vs. the two-level nature of relational schemas
3. Set-valued attributes and recursion

1) Flattening transformation
2) Simplification transformation of unary operations
3) Grouping transformation

Techniques to translate XML DTD to relations
• Basic Inlining Technique
• Shared Inlining Technique
• Hybrid Inlining Technique

Basic Inlining Technique
• Inline as many descendants of an element as possible into a single relation (e.g. author: firstname, lastname, address).
• Every element will have a relation corresponding to it (firstname, lastname, and address will each have a relation as well).

Basic Inlining Technique (cont.)
Complications:
1) Set-valued attributes (e.g. Article)
   • Solve by using foreign keys and separate tables.
2) Recursion
   • Solve with relational keys and relational recursive processing to retrieve the relationship.

Tools used in creating relations
DTD Graph
  – Nodes are elements, attributes, and operators.
  – Each element appears once.
  – Attributes and operators appear as many times as they do in the DTD.
  – Cycles in the graph indicate recursion.
Element Graphs
  – Generated from the DTD graph.
  – Created by doing a DFS from an element node.

Creating a Relation
Given an element graph, the root is made into a relation with all descendants inlined into it, except:
1) Children directly below a “*” are made into separate relations;
2) Each node with a backpointer edge is made into a separate relation.
These additional relations are named by their path from the root and have parentID fields that serve as foreign keys (e.g. Article.author has the attribute article.author.parentID).

Problems with Basic
• Large number
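
The Sorted Outer Union and Constant Space Tagger slides above can be illustrated with a small Python sketch. It is not the paper's implementation: the two-level customer / account / purchaseOrder shape, the tuple layout, and the names `sorted_outer_union` and `constant_space_tagger` are assumptions made for illustration, and an in-memory sort stands in for the sort the relational engine would perform.

```python
from xml.sax.saxutils import escape

# Hypothetical type ids: the parent row sorts before its children.
PARENT, ACCOUNT, ORDER = 0, 1, 2
CHILD_TAG = {ACCOUNT: "account", ORDER: "purchaseOrder"}

def sorted_outer_union(customers, accounts, orders):
    """Build one tuple per customer, account, or order, then sort on ids.

    Each tuple is (customer_id, type_id, child_value, customer_name).
    Empty strings stand in for the NULL padding of an outer union.
    """
    rows = [(cid, PARENT, "", name) for cid, name in customers]
    rows += [(cid, ACCOUNT, acct, "") for cid, acct in accounts]
    rows += [(cid, ORDER, po, "") for cid, po in orders]
    rows.sort()  # parent first, then each customer's children together
    return rows

def constant_space_tagger(rows):
    """Stream XML fragments, remembering only the last parent id seen."""
    last_cid = None
    for cid, kind, child, name in rows:
        if cid != last_cid:
            # The parent id changed: close the previous customer, open a new one.
            if last_cid is not None:
                yield "</customer>"
            yield "<customer>"
            last_cid = cid
        if kind == PARENT:
            yield f"<name>{escape(name)}</name>"
        else:
            tag = CHILD_TAG[kind]
            yield f"<{tag}>{escape(child)}</{tag}>"
    if last_cid is not None:
        yield "</customer>"

rows = sorted_outer_union(
    customers=[(2, "Bo"), (1, "Al")],
    accounts=[(1, "A-7")],
    orders=[(1, "PO-3"), (2, "PO-9")],
)
xml_text = "".join(constant_space_tagger(rows))
```

Because the input is already sorted, the tagger keeps one id of state regardless of input size, whereas a hash-based tagger would have to hold the grouped tuples in memory.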
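
The rule on the Creating a Relation slide above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the element graph is modelled as a tree of hypothetical `Node` objects whose `starred` and `backpointer` flags are set by hand, and the `ID` column and dotted column names are illustrative choices that follow the slide's `article.author.parentID` example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a (simplified, hypothetical) element graph."""
    name: str
    children: list = field(default_factory=list)
    starred: bool = False      # node sits directly below a "*" operator
    backpointer: bool = False  # node has a backpointer edge (recursion)

def create_relations(root):
    """Return {relation_name: [column, ...]} for the element graph at root."""
    relations = {}

    def start_relation(node, path, is_root):
        columns = [path + ".ID"]
        if not is_root:
            # Separate relations point back to their parent tuple.
            columns.append(path + ".parentID")
        relations[path] = columns
        if not node.children:
            columns.append(path)  # a leaf keeps its own value column
        for child in node.children:
            inline(child, path + "." + child.name, columns)

    def inline(node, path, columns):
        if node.starred or node.backpointer:
            # Exceptions 1) and 2): these nodes become separate relations.
            start_relation(node, path, is_root=False)
        elif node.children:
            for child in node.children:
                inline(child, path + "." + child.name, columns)
        else:
            columns.append(path)  # inline the leaf into the current relation

    start_relation(root, root.name, is_root=True)
    return relations

article = Node("article", [
    Node("title"),
    Node("author", [Node("firstname"), Node("lastname")], starred=True),
])
relations = create_relations(article)
```

In this example `title` is inlined into `article`, while the set-valued `author` becomes its own relation named by its path, with a `parentID` column acting as the foreign key.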

