1XML-Relational MappingCPS 116Introduction to Database Systems2Announcements (November 1) Homework #3 due next Tuesday Yi will conduct a help session next Monday Time/location will be announced by this weekend Project milestone #2 due in one week3Approaches to XML processing Text files (!) Specialized XML DBMS Lore (Stanford), Strudel (AT&T), Timber (Michigan), MonetDB/XQuery (CWI, Netherlands), Tamino (Software AG), eXist, Sedna, Apache XIndice, XML:DB API initiative… Still a long way to go Object-oriented DBMS ObjectStore, ozone, … Not as mature as relational DBMS Relational (and object-relational) DBMS Middleware and/or object-relational extensions4Mapping XML to relational Store XML in a CLOB (Character Large OBject) column Simple, compact Full-text indexing can help (often provided by DBMS vendors as object-relational “extensions”) Poor integration with relational query processingUd iUpdates are expensive Alternatives? Schema-oblivious mapping:well-formed XML → generic relational schema• Node/edge-based mapping for graphs• Interval-based mapping for trees• Path-based mapping for trees Schema-aware mapping:valid XML → special relational schema based on DTD5Node/edge-based: schema Element(eid, tag) Attribute(eid, attrName, attrValue) Attribute order does not matter ElementChild(eid, pos, child)ifi h d i f hildKey: (eid, attrName)Keys: (eid, pos), (child) pos specifies the ordering of children child references either Element(eid) or Text(tid) Text(tid, value) tid cannot be the same as any eid) Need to “invent” lots of id’s) Need indexes for efficiency, e.g., Element(tag), Text(value)6Node/edge-based: example<bibliography><book ISBN=”ISBN-10” price=”80.00”><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>…</bibliography>Element ElementChildeid pos childe0 1 e1e1 1 e2e1 2 e3e1 3 e4e1 4 e5eid tage0 bibliographye1 booke2 titlee3 authore4 authorAttributeTe x te1 5 e6e1 6 e7e2 1 t0e3 1 t1e4 1 t2e5 1 t3e6 1 t4e7 1 t5e5 authore6 publishere7 yeartid valuet0 Foundations of Databasest1 Abiteboult2 Hullt3 Vianut4 Addison Wesleyt5 1995eid attrName attrValuee1 ISBN ISBN-10e1 price 8027Node/edge-based: simple paths //title SELECT eid FROM Element WHERE tag = ‘title’;//section/title SELECT e2.eidFROM Element e1, ElementChild c, Element e2WHERE e1.tag = ‘section’AND e2.tag = ‘title’AND e1.eid = c.eidAND c.child = e2.eid;)Path expression becomes joins! Number of joins is proportional to the length of the path expression8Node/edge-based: more complex paths //bibliography/book[author=“Abiteboul”]/@price SELECT a.attrValueFROM Element e1, ElementChild c1,Element e2, Attribute aWHERE e1.tag = ‘bibliography’AND e1.eid = c1.eid AND c1.child = e2.eidAND e2 tag =‘book’AND e2.tag = bookAND e2.eid = a.eidAND a.attrName = ‘price’;AND EXISTS (SELECT * FROM ElementChild c2,Element e3, ElementChild c3, Text tWHERE e2.eid = c2.eid AND c2.child = e3.eidAND e3.tag = ‘author’AND e2.eid = c3.eid AND c3.child = t.tidAND t.value = ‘Abiteboul’)9Node/edge-based: descendent-or-self //book//title Requires SQL3 recursion WITH ReachableFromBook(id) AS((SELECT eid FROM Element WHERE tag = ‘book’)UNION ALL((SELECT c.childFROM ReachableFromBook r, ElementChild cWHERE r.eid = c.eid))SELECT eidFROM ElementWHERE eid IN (SELECT * FROM ReachableFromBook)AND tag = ‘title’;10Interval-based: schema Element(left, right, level, tag) left is the start position of the element right is the end position of the element level is the nesting depth of the element (strictly speaking, unnecessary) Key is left Text(left, right, level, value) Key is left Attribute(left, attrName, attrValue) Key is (left, attrName)11Interval-based: example1<bibliography>2<book ISBN=”ISBN-10” price=”80.00”>3<title>4Foundations of Databases</title>56<author>7Abiteboul</author>89<author>10Hull</author>1112<author>13Vianu</author>1415<publisher>16Addison Wesley</publisher>1718<year>191995</year>20</book>21…</bibliography>999bibliographybook1,999,12,21,2) Where did ElementChild go? E1 is the parent of E2 iff:[E1.left, E1.right] ⊃ [E2.left, E2.right], andE1.level = E2.level –1title author author author publisher year3,5,3 6,8,3 9,11,3 12,14,3 15,17,3 18,20,312Interval-based: queries //section/title SELECT e2.leftFROM Element e1, Element e2WHERE e1.tag = ‘section’ AND e2.tag = ‘title’AND e1.left < e2.left AND e2.right < e1.rightAND e1.level = e2.level-1;;)Path expression becomes “containment” joins!• Number of joins is proportional to path expression length //book//title SELECT e2.leftFROM Element e1, Element e2WHERE e1.tag = ‘book’ AND e2.tag = ‘title’AND e1.left < e2.left AND e2.right < e1.right;)No recursion!313Summary of interval-based mapping Path expression steps become containment joins No recursion needed for descendent-or-self Comprehensive XQuery-SQL translation is possible14A path-based mappingLabel-path encoding Element(pathid, left, right, …), Path(pathid, path), … path is a label path starting from the root Why are left and right still needed?To preserve structureElementPathppathid left right ...1 1 999 ...2 2 21 ...3 3 5 ...4 6 8 ...4 9 11 ...4 12 14 ...... ... ... ...pathid path1 /bibliography2 /bibliography/book3 /bibliography/book/title4 /bibliography/book/author... ...15Label-path encoding: queries Simple path expressions with no conditions//book//title Perform string matching on Path Join qualified pathid’s with Element//book[publisher=‘Prentice Hall’]/title//book[publisher Prentice Hall ]/title Evaluate //book/title Evaluate //book/publisher[text()=‘Prentice Hall’] How to ensure title and publisher belong to the same book?)Path expression with attached conditions needs to be broken down, processed separately, and joined back16Another path-based mappingDewey-order encoding Each component of the id represents the order of the child within its parent Unlike label-path, this encoding is “lossless”bibliographybooktitle author author author publisher year11.11.1.1 1.1.2 1.1.3 1.1.4 1.1.5 1.1.6Element(dewey_pid, tag)Te x t (dewey_pid, value)Attribute(dewey_pid, attrName, attrValue)17Dewey-order
View Full Document