DOC PREVIEW
Duke CPS 296.1 - Lecture

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1/19/20101CPS 296 1CPS 296.1Database and Programming Languages: Crossing the ChasmJun YangDuke UniversityJanuary 19, 2010† Thanks to contents/ideas borrowed from Hellerstein (http://redbook.cs.berkeley.edu/redbook3/lecs.html)and LeFevre (http://www.eecs.umich.edu/~klefevre/eecs584/Handouts/Stonebraker‐Hellerstein.pdf)Image from http://www.bccc.edu/887421129114724/blank/browse.asp?A=383&BMDRN=2000&BCOB=0&C=52764Announcementsy Sign up (by email) to lead discussions by Wednesday Student‐led discussions start next week!y Read the two papers on making DBMS object‐relational for Thursday Review for “Inclusion of New Types…” due by 9am on Thursday2 Submit your review on the Blackboard class forum by replying to my post Again, see course website for instructions on reviewing and submission, as well as tips on reading research papersOn leading discussionsy Three types of discussion: research papers, survey papers, or tutorials of systems/platformsy As leaders, you must finish reading/researching in advancey I will meet you to talk about the lecture By default, during office hours on the day of the lecture before 3the one you are leadingy Thursday lecture → meet on Tuesday; Tuesday lecture → meet on Thursday of the preceding weeky Besides providing summary, critique, and answering questions, strive to generate discussion from class A good way is to ask “facilitating” questionsOverviewy Stonebraker & Hellerstein. “What Goes Around Comes Around.” In Readings in Database Systems (aka “the red book”), 4thed., 2005 A retrospective survey of DBMS data models and query languagesLessons to learn from past experience4Lessons to learn from past experience And why XML is doomedÂWhat do you think?y Organize record types in hierarchies Each non‐root type has a single parent type Each record of a non‐root type has one parent of the parent typey Corollary: each record has a unique HSK (Hierarchical Sequence Key)y Model simplicity facilitates simple language & implementationHierarchical: IMS (1968)Part(50, Paint Brush, 3, white, 80, $5)Part(29, Power Drill, 10, red, 30, $60)5Part(pno, pname, psize, pcolor, qty, price)Supplier(15, General Supply, Boston, MA)Supplier(sno, sname, scity, sstate)Part(27, Power Saw, 7, silver, 100, $20)…Supplier(32, M&P Hardware, Durham, NC)Part(50, Paint Brush, 3, white, 40, $4)Part(27, Power Saw, 7, silver, 90, $19)……DB schema DB instanceIssues with modelPart(pno, pname, psize, pcolor, qty, price)Supplier(sno, sname, scity, sstate)DB schema #1Supplier(sno, sname, scity, sstate, qty, price)Part(pno, pname, psize, pcolor)DB schema #26y Information is repeated Schema #1: parts info repeated across suppliers Schema #2: supplier info repeated across partsy Existence depends on parent data Schema #1: what if nobody supplies a part? Schema #2: what if a supplier doesn’t supply anything?1/19/20102Issues with language (DL/1)y Conceptually, records are laid out in HSK order: depth‐first, left‐to‐right → get ingrained in language constructsy Record‐at‐a‐time languagey Programmer writes an algorithm for solving each query, e.g.: get unique Supplier with sno = 15until failure doget next within parent with pcolor= red7get next within parent with pcolor= redWhy is this bad?y Different underlying storage formats (sequential/B‐tree/hash)→ different restrictions on commands Heavy coupling between storage and client appsy Different data characteristics → different optimal algorithms Optimization is performed by programmer and DB designery Not declarative, poor physical data independenceHacking the modelSupply(pno, qty, price)Supplier(sno, sname, scity, sstate)Underlying physical schemaPart(pno, pname, psize, pcolor)“Logical” schemaSupply(pno, qty, price)Supplier(sno, sname, scity, sstate)Part(pno, pname, psize, pcolor)8y Store data in two physical databases No redundancyy IMS grafts together the two to present a logical view to progr ammers But lots of restrictions and complexity No complete transparencyLessonsy Lesson 1: physical/logical data independence is good ∆data À ∆app Changes to physical/logical representations of data should not require expensive changes to appsy Lesson 2: tree‐structured data models are restrictive9 Force navigation one way Need extensions/hacks to be generaly Lesson 3: logical reorganization of tree‐structured data is hardy Lesson 4: record‐at‐a‐time interface forces programmer to do manual query optimizationGraph/network: CODASYL (1969)y A directed graph where nodes are records types and arcs are “sets” (relationships) A type can have multiple owners (via incoming arcs) Owner‐child relationships are 1‐to‐manyPart(pno, pname, psize, pcolor)Is_supplied_by10Supplier(sno, sname, scity, sstate)Supply(qty, price)SuppliesDB schema(15, General Supply, Boston, MA) (100, $20)DB instance(29, Power Drill, 10, red)(27, Power Saw, 7, silver) (50, Paint Brush, 3, white)(32, M&P Hardware, Durham, NC)(30, $60) (80, $5)(90, $19) (40, $4)SupplierPartSupplyCODASYL programmingy Bachmann (Turing Award 1973): program by navigation get unique Supplier with sno = 15until failure doget next Supply in Suppliesget owner Part through Is_supplied_bycheck pcolor = red11py Alternatively, start navigating from PartsImprovements & limitationsy Model is powerful itself to avoid redundancy and dependency on owners’ existencey Arcs are just binary, though n‐ary relationships can be 12simulatedy Language is still record‐at‐a‐timey Programming over graphs is harder than over treesy Less logical data independence than IMSy Still no physical data independence1/19/20103Lessonsy Lesson 5: graphs are more flexible than trees but more complexy Lesson 6: loading and recovering graphs is harder than trees13Relational (1970)y Started with the 1970 proposal by Codd (Turing Award 1981) Motivated by heavy maintenance required with IMS applicationsy Data stored in flat tables—no nestingy High‐level, set‐oriented languagey Underlying physical stor age is completely up to vendors14y Example schema and querySupplier(sno, sname, scity, sstate)Part(pno, pname, psize, pcolor)Supplies(sno, pno, qty, price)SELECT * FROM Supplier, Supplies, PartWHERE Supplier.sno = 15 AND Part.pcolor = 'red'AND Supplier.sno = Supplies.sno AND Supplies.pno = Part.pno;The “Great Debate”y Ideological battle throughout the 1970’s Codd et


View Full Document

Duke CPS 296.1 - Lecture

Documents in this Course
Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Lecture

Lecture

5 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?