1/19/20101CPS 296 1CPS 296.1Database and Programming Languages: Crossing the ChasmJun YangDuke UniversityJanuary 19, 2010† Thanks to contents/ideas borrowed from Hellerstein (http://redbook.cs.berkeley.edu/redbook3/lecs.html)and LeFevre (http://www.eecs.umich.edu/~klefevre/eecs584/Handouts/Stonebraker‐Hellerstein.pdf)Image from http://www.bccc.edu/887421129114724/blank/browse.asp?A=383&BMDRN=2000&BCOB=0&C=52764Announcementsy Sign up (by email) to lead discussions by Wednesday Student‐led discussions start next week!y Read the two papers on making DBMS object‐relational for Thursday Review for “Inclusion of New Types…” due by 9am on Thursday2 Submit your review on the Blackboard class forum by replying to my post Again, see course website for instructions on reviewing and submission, as well as tips on reading research papersOn leading discussionsy Three types of discussion: research papers, survey papers, or tutorials of systems/platformsy As leaders, you must finish reading/researching in advancey I will meet you to talk about the lecture By default, during office hours on the day of the lecture before 3the one you are leadingy Thursday lecture → meet on Tuesday; Tuesday lecture → meet on Thursday of the preceding weeky Besides providing summary, critique, and answering questions, strive to generate discussion from class A good way is to ask “facilitating” questionsOverviewy Stonebraker & Hellerstein. “What Goes Around Comes Around.” In Readings in Database Systems (aka “the red book”), 4thed., 2005 A retrospective survey of DBMS data models and query languagesLessons to learn from past experience4Lessons to learn from past experience And why XML is doomedÂWhat do you think?y Organize record types in hierarchies Each non‐root type has a single parent type Each record of a non‐root type has one parent of the parent typey Corollary: each record has a unique HSK (Hierarchical Sequence Key)y Model simplicity facilitates simple language & implementationHierarchical: IMS (1968)Part(50, Paint Brush, 3, white, 80, $5)Part(29, Power Drill, 10, red, 30, $60)5Part(pno, pname, psize, pcolor, qty, price)Supplier(15, General Supply, Boston, MA)Supplier(sno, sname, scity, sstate)Part(27, Power Saw, 7, silver, 100, $20)…Supplier(32, M&P Hardware, Durham, NC)Part(50, Paint Brush, 3, white, 40, $4)Part(27, Power Saw, 7, silver, 90, $19)……DB schema DB instanceIssues with modelPart(pno, pname, psize, pcolor, qty, price)Supplier(sno, sname, scity, sstate)DB schema #1Supplier(sno, sname, scity, sstate, qty, price)Part(pno, pname, psize, pcolor)DB schema #26y Information is repeated Schema #1: parts info repeated across suppliers Schema #2: supplier info repeated across partsy Existence depends on parent data Schema #1: what if nobody supplies a part? Schema #2: what if a supplier doesn’t supply anything?1/19/20102Issues with language (DL/1)y Conceptually, records are laid out in HSK order: depth‐first, left‐to‐right → get ingrained in language constructsy Record‐at‐a‐time languagey Programmer writes an algorithm for solving each query, e.g.: get unique Supplier with sno = 15until failure doget next within parent with pcolor= red7get next within parent with pcolor= redWhy is this bad?y Different underlying storage formats (sequential/B‐tree/hash)→ different restrictions on commands Heavy coupling between storage and client appsy Different data characteristics → different optimal algorithms Optimization is performed by programmer and DB designery Not declarative, poor physical data independenceHacking the modelSupply(pno, qty, price)Supplier(sno, sname, scity, sstate)Underlying physical schemaPart(pno, pname, psize, pcolor)“Logical” schemaSupply(pno, qty, price)Supplier(sno, sname, scity, sstate)Part(pno, pname, psize, pcolor)8y Store data in two physical databases No redundancyy IMS grafts together the two to present a logical view to progr ammers But lots of restrictions and complexity No complete transparencyLessonsy Lesson 1: physical/logical data independence is good ∆data À ∆app Changes to physical/logical representations of data should not require expensive changes to appsy Lesson 2: tree‐structured data models are restrictive9 Force navigation one way Need extensions/hacks to be generaly Lesson 3: logical reorganization of tree‐structured data is hardy Lesson 4: record‐at‐a‐time interface forces programmer to do manual query optimizationGraph/network: CODASYL (1969)y A directed graph where nodes are records types and arcs are “sets” (relationships) A type can have multiple owners (via incoming arcs) Owner‐child relationships are 1‐to‐manyPart(pno, pname, psize, pcolor)Is_supplied_by10Supplier(sno, sname, scity, sstate)Supply(qty, price)SuppliesDB schema(15, General Supply, Boston, MA) (100, $20)DB instance(29, Power Drill, 10, red)(27, Power Saw, 7, silver) (50, Paint Brush, 3, white)(32, M&P Hardware, Durham, NC)(30, $60) (80, $5)(90, $19) (40, $4)SupplierPartSupplyCODASYL programmingy Bachmann (Turing Award 1973): program by navigation get unique Supplier with sno = 15until failure doget next Supply in Suppliesget owner Part through Is_supplied_bycheck pcolor = red11py Alternatively, start navigating from PartsImprovements & limitationsy Model is powerful itself to avoid redundancy and dependency on owners’ existencey Arcs are just binary, though n‐ary relationships can be 12simulatedy Language is still record‐at‐a‐timey Programming over graphs is harder than over treesy Less logical data independence than IMSy Still no physical data independence1/19/20103Lessonsy Lesson 5: graphs are more flexible than trees but more complexy Lesson 6: loading and recovering graphs is harder than trees13Relational (1970)y Started with the 1970 proposal by Codd (Turing Award 1981) Motivated by heavy maintenance required with IMS applicationsy Data stored in flat tables—no nestingy High‐level, set‐oriented languagey Underlying physical stor age is completely up to vendors14y Example schema and querySupplier(sno, sname, scity, sstate)Part(pno, pname, psize, pcolor)Supplies(sno, pno, qty, price)SELECT * FROM Supplier, Supplies, PartWHERE Supplier.sno = 15 AND Part.pcolor = 'red'AND Supplier.sno = Supplies.sno AND Supplies.pno = Part.pno;The “Great Debate”y Ideological battle throughout the 1970’s Codd et
View Full Document