1XML-Relational MappingCPS 116Introduction to Database Systems2Announcements (October 28) Homework #2 has been graded Homework #3 out last Thursday; due next Thursday Project milestone #2 due in 2 weeksj Check your email for feedback3Approaches to XML processing Text files (!) Specialized XML DBMS Lore (Stanford), Strudel (AT&T), Timber (Michigan), MonetDB/XQuery (CWI, Netherlands), Tamino (Software AG), eXist, Sedna, Apache XIndice, XML:DB API initiative… Still a long way to go Object-oriented DBMS ObjectStore, ozone, … Not as mature as relational DBMS Relational (and object-relational) DBMS Middleware and/or object-relational extensions24Mapping XML to relational Store XML in a CLOB (Character Large OBject) column Simple, compact Full-text indexing can help (often provided by DBMS vendors as object-relational “extensions”) Alternatives? Schema-oblivious mapping:well-formed XML → generic relational schema• Node/edge-based mapping for graphs• Interval-based mapping for trees• Path-based mapping for trees Schema-aware mapping:valid XML → special relational schema based on DTD5Node/edge-based: schema Element(eid, tag) Attribute(eid, attrName, attrValue) Attribute order does not matter ElementChild(eid, pos, child)ifi h d i f hildKey:Keys: pos specifies the ordering of children child references either Element(eid) or Text(tid) Text(tid, value) tid cannot be the same as any eid) Need to “invent” lots of id’s) Need indexes for efficiency, e.g., Element(tag), Text(value)6Node/edge-based: example<bibliography><book ISBN=”ISBN-10” price=”80.00”><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>…</bibliography>Element ElementChildeid pos childe0 1 e1e1 1 e2e1 2 e3e1 3 e4e1 4 e5eid tage0 bibliographye1 booke2 titlee3 authore4 authorAttributeTe x te1 5 e6e1 6 e7e2 1 t0e3 1 t1e4 1 t2e5 1 t3e6 1 t4e7 1 t5e5 authore6 publishere7 yeartid valuet0 Foundations of Databasest1 Abiteboult2 Hullt3 Vianut4 Addison Wesleyt5 1995eid attrName attrValuee1 ISBN ISBN-10e1 price 8037Node/edge-based: simple paths //title SELECT eid FROM Element WHERE tag = ‘title’;//section/title SELECT e2.eidFROM Element e1, ElementChild c, Element e2WHERE e1.tag = ‘section’AND e2.tag = ‘title’AND e1.eid = c.eidAND c.child = e2.eid;)Path expression becomes8Node/edge-based: more complex paths //bibliography/book[author=“Abiteboul”]/@price SELECT a.attrValueFROM Element e1, ElementChild c1,Element e2, Attribute aWHERE e1.tag = ‘bibliography’AND e1.eid = c1.eid AND c1.child = e2.eidAND e2 tag =‘book’AND e2.tag = bookAND e2.eid = a.eidAND a.attrName = ‘price’;AND EXISTS (SELECT * FROM ElementChild c2,Element e3, ElementChild c3, Text tWHERE e2.eid = c2.eid AND c2.child = e3.eidAND e3.tag = ‘author’AND e2.eid = c3.eid AND c3.child = t.tidAND t.value = ‘Abiteboul’)9Node/edge-based: descendent-or-self //book//title Requires SQL3 recursion WITH ReachableFromBook(id) AS((SELECT eid FROM Element WHERE tag = ‘book’)UNION ALL((SELECT c.childFROM ReachableFromBook r, ElementChild cWHERE r.eid = c.eid))SELECT eidFROM ElementWHERE eid IN (SELECT * FROM ReachableFromBook)AND tag = ‘title’;410Interval-based: schema Element(left, right, level, tag) left is the start position of the element right is the end position of the element level is the nesting depth of the element (strictly speaking, unnecessary) Key is Text(left, right, level, value) Key is Attribute(left, attrName, attrValue) Key is11Interval-based: example1<bibliography>2<book ISBN=”ISBN-10” price=”80.00”>3<title>4Foundations of Databases</title>56<author>7Abiteboul</author>89<author>10Hull</author>1112<author>13Vianu</author>1415<publisher>16Addison Wesley</publisher>1718<year>191995</year>20</book>21…</bibliography>999bibliographybook1,999,12,21,2) Where did ElementChild go? E1 is the parent of E2 iff:title author author author publisher year3,5,3 6,8,3 9,11,3 12,14,3 15,17,3 18,20,312Interval-based: queries //section/title SELECT e2.leftFROM Element e1, Element e2WHERE e1.tag = ‘section’ AND e2.tag = ‘title’AND e1.left < e2.left AND e2.right < e1.rightAND e1.level = e2.level-1;;)Path expression becomes “containment” joins!• Number of joins is proportional to path expression length //book//title SELECT e2.leftFROM Element e1, Element e2WHERE e1.tag = ‘book’ AND e2.tag = ‘title’AND e1.left < e2.left AND e2.right < e1.right;)No recursion!513Summary of interval-based mapping Path expression steps become containment joins No recursion needed for descendent-or-self Comprehensive XQuery-SQL translation is possible14A path-based mappingLabel-path encoding Element(pathid, left, right, …), Path(pathid, path), … path is a label path starting from the root Why are left and right still needed?ElementPathpathid left right ...1 1 999 ...2 2 21 ...3 3 5 ...4 6 8 ...4 9 11 ...4 12 14 ...... ... ... ...pathid path1 /bibliography2 /bibliography/book3 /bibliography/book/title4 /bibliography/book/author... ...15Label-path encoding: queries Simple path expressions with no conditions//book//title Perform string matching on Path Join qualified pathid’s with Element//book[publisher=‘Prentice Hall’]/title//book[publisher Prentice Hall ]/title616Another path-based mappingDewey-order encoding Each component of the id represents the order of the child within its parent Unlike label-path, this encoding is “lossless”bibliographybooktitle author author author publisher year11.11.1.1 1.1.2 1.1.3 1.1.4 1.1.5 1.1.6Element(dewey_pid, tag)Te x t (dewey_pid, value)Attribute(dewey_pid, attrName, attrValue)17Dewey-order encoding: queries Examples://title//section/title//book//title//b k[ bli h ‘P ti H ll’]/titl//book[publisher=‘Prentice Hall’]/title Works similarly as interval-based mapping• Except parent/child and ancestor/descendant relationship are checked by prefix matching Serves a different purpose from label-path encoding Any advantage over interval-based mapping?18Schema-aware mapping Idea: use DTD to design a better schema Basic approach: elements of the same type go into one table Tag name → table name Attributes → columns•If one
View Full Document