DOC PREVIEW
Duke CPS 116 - Introduction

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1IntroductionCPS 116Introduction to Database Systems2A few words about myself (and databases) Have been doing (and enjoying) research in databases ever since grad school (1995) But didn’t take any database course as an undergrad Just didn’t appreciate it)No h o ld o nt to t ke 116?)Now, why would you want to take 116?) It’s not really about databases per se—it’s about principlesof data management E.g., Google won’t care whether you know SQL, but… They still ask you “big data” questions in interviews Brin was a grad student in the Stanford Database GroupTrend: Moore’s Law reversed Moore’s Law: Processing power doubles every 18 months Amount of data doubles every 9 months Disk sales (# of bits) doubles every 9 months Parkinson’s Law: Data expands to fill the space available for storage)M’Ld3)Moore s Law reversed:Time to process all data doubles every 18 months! Does your attention span double every 18 months? No, so we need smarter data management techniques4Misc. course information Course website: http://www.cs.duke.edu/courses/fall08/cps116/ Course information; tentative syllabus and reference sections in the book; lecture slides, assignments, programming notes Book: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom. 2ndEd. Gradiance (“Online Accelerated Learning”): see course website for purchase information Blackboard: for grades only Mailing list: [email protected] Messages of general interest only No “official” recitation sessions; help sessions for assignments, project, and exams to be scheduled5Grading[90%, 100%] A- / A / A+[80%, 90%) B- / B / B+[70%, 80%) C- / C / C+[60%, 70%) D[0% 60%)F[0%, 60%)F No curves Scale may be adjusted downwards (i.e., grades upwards) if, for example, an exam is too difficult Scale will not go upwards—mistake would be mine alone if I made an exam too easy6Course load Four homework assignments (35%) Including Gradiance as well as additional written and programming problems Course project (25%) Details to be given in the third week of class Midterm and final (20% each) Open book, open notes Final is comprehensive, but emphasizes the second half of the course27Example projects SensorDB: a system for managing, cleansing, and visualizing sensor data collected from the Duke Forest Ashley DeMass, Jonathan Jou, Jonathan Odom, 2007 SuperDatabase: GUI for creating schema with rich datatypes, as well as editing and querying such dataAd E i M R Li C iW dD idAndy Ewing, MacRae Linton, Congyi Wu, and David Zhang, 2007 yourTunes: social music networking Nick Patrick, 2006 Facebook+ Tyler Brock and Beth Trushkowsky, 2005 Web-based K-ville tenting management Zach Marshall, 20058So, what is a database system?From Oxford Dictionary: Database: an organized body of related information Database system, DataBase Management System (DBMS): a software system that facilitates the ()ycreation and maintenance and use of an electronic database9What do you want from a DBMS? Keep data around (persistent) Answer queries (questions) about data Update dataExample: a traditional banking applicationExample: a traditional banking application Data: Each account belongs to a branch, has a number, an owner, a balance, …; each branch has a location, a manager, … Persistency: Balance can’t disappear after a power outage Query: What’s the balance in Homer Simpson’s account? What’s the difference in average balance between Springfield and Capitol City accounts? Modification: Homer withdraws $100; charge account with lower than $500 balance with a $5 fee10Sounds simple!1001#Springfield#Mr. Morgan... ...00987-00654#Ned Flanders#2500.0000123-00456#Homer Simpson#400.0000142-00857#Montgomery Burns#1000000000.00... ... ASCII file Accounts/branches separated by newlines Fields separated by #’s11Query1001#Springfield#Mr. Morgan... ...00987-00654#Ned Flanders#2500.0000123-00456#Homer Simpson#400.0000142-00857#Montgomery Burns#1000000000.00... ... What’s the balance in Homer Simpson’s account? A simple script Scan through the accounts file Look for the line containing “Homer Simpson” Print out the balance12Query processing tricks Tens of thousands of accounts are not Homer’s)Cluster accounts by owner’s initial: those owned by “A...” go into file A; those owned by “B...” go into file B; etc. → decide which file to search using the initial)Keep accounts sorted by owner name → binary search?)Hash accounts using owner name → compute file offset directly)Index accounts by owner name: index entries have the formh owner_name, file_offset i → search index to get file offset)And the list goes on… What happens when the query changes to: What’s the balance in account 00142-00857?313Observations Tons of tricks (not only in storage and query processing, but also in concurrency control, recovery, etc.) Different tricks may work better in different usage scenarios (example?) Same tricks get used over and over again in different applications14The birth of DBMS – 1(Figure from Hans-J. Schek’s VLDB 2000 slides)15The birth of DBMS – 2(Figure from Hans-J. Schek’s VLDB 2000 slides)16The birth of DBMS – 3(Figure from Hans-J. Schek’s VLDB 2000 slides)17Early efforts “Factoring out” data management functionalities from applications and standardizing these functionalities is an important first step CODASYL standard (circa 1960’s))Bachman got a Turing award for this in 1973 But getting the abstraction right (the API between applications and the DBMS) is still tricky18CODASYL Query: Who have accounts with 0 balance managed by a branch in Springfield? Pseudo-code of a CODASYL application:Use index on account(balance) to get accounts with 0 balance;For each account recordFor each account record:Get the branch id of this account; Use index on branch(id) to get the branch record;If the branch record’s location field reads “Springfield”:Output the owner field of the account record. Programmer controls “navigation”: accounts → branches How about branches → accounts?419What’s wrong? The best navigation strategy & the best way of organizing the data depend on data/workload characteristics With the CODASYL approach To write correct code, application programmers need


View Full Document

Duke CPS 116 - Introduction

Documents in this Course
Part I

Part I

8 pages

XSLT

XSLT

4 pages

XSLT

XSLT

8 pages

Part I

Part I

8 pages

XSLT

XSLT

8 pages

Load more
Download Introduction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Introduction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Introduction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?