DOC PREVIEW
Duke CPS 116 - Introduction to Database Systems

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1IntroductionCPS 116Introduction to Database Systems2Random things to do after this course3Course roadmap Relational databases Relational algebra, database design, SQL, app programming XML Data model and query languages, app programming, interplay between XML and relational databases Database internals Storage, indexing, query processing and optimization, concurrency control and recovery Topics beyond traditional databases Data warehousing and data mining Web and keyword searches Streams and continuous queries24Misc. course information Book: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom Either get the “value pack” bundled with Gradiance, or buy Gradiance separately Web site: http://www.cs.duke.edu/courses/fall07/cps116/ Course information; tentative syllabus and reference sections in GMUW; lecture slides, assignments, programming notes Blackboard: for grades only Mailing list: [email protected] Messages of general interest only No “official” recitation sessions; help sessions for assignments, project, and exams to be scheduled5Grading[90%, 100%] A- / A / A+[80%, 90%) B- / B / B+[70%, 80%) C- / C / C+[60%, 70%) D[0% 60%)F[0%, 60%)F No curves Scale may be adjusted downwards (i.e., grades upwards) if, for example, an exam is too difficult Scale will not go upwards—mistake would be mine alone if I made an exam too easy6Course load Four homework assignments (35%) Including Gradiance as well as additional written and programming problems Course project (25%) Details to be given in the third week of class Midterm and final (20% each) Open book, open notes Final is comprehensive, but emphasizes the second half of the course37Example projects Facebook+ Tyler Brock and Beth Trushkowsky, 2005 Web-based K-ville tenting management Zach Marshall, 2005 yourTunes: social music networking Nick Patrick, 2006 ERS: a content management system for capturing experimental and computational workflows Collaboration with Duke immunologists Babase tools: for a baboon life history database Collaboration with Duke and Princeton biologists8So, what is a database system?From Oxford Dictionary: Database: an organized body of related information Database system, DataBase Management System (DBMS): a software system that facilitates the ()ycreation and maintenance and use of an electronic database9What do you want from a DBMS? Keep data around (persistent) Answer queries (questions) about data Update dataExample: a traditional banking applicationExample: a traditional banking application Data: Each account belongs to a branch, has a number, an owner, a balance, …; each branch has a location, a manager, … Persistency: Balance can’t disappear after a power outage Query: What’s the balance in Homer Simpson’s account? What’s the difference in average balance between Springfield and Capitol City accounts? Modification: Homer withdraws $100; charge account with lower than $500 balance with a $5 fee410Sounds simple!1001#Springfield#Mr. Morgan... ...00987-00654#Ned Flanders#2500.0000123-00456#Homer Simpson#400.0000142-00857#Montgomery Burns#1000000000.00... ... ASCII file Accounts/branches separated by newlines Fields separated by #’s11Query1001#Springfield#Mr. Morgan... ...00987-00654#Ned Flanders#2500.0000123-00456#Homer Simpson#400.0000142-00857#Montgomery Burns#1000000000.00... ... What’s the balance in Homer Simpson’s account? A simple script Scan through the accounts file Look for the line containing “Homer Simpson” Print out the balance12Query processing tricks Tens of thousands of accounts are not Homer’s What happens when the query changes to: What’s the balance in accounts 00142-00857?513Observations Tons of tricks (not only in storage and query processing, but also in concurrency control, recovery, etc.) Different tricks may work better in different usage scenarios (example?) Same tricks get used over and over again in different applications14The birth of DBMS – 1(Figure stolen from Hans-J. Schek’s VLDB 2000 slides)15The birth of DBMS – 2(Figure stolen from Hans-J. Schek’s VLDB 2000 slides)616The birth of DBMS – 3(Figure stolen from Hans-J. Schek’s VLDB 2000 slides)17Early efforts “Factoring out” data management functionalities from applications and standardizing these functionalities is an important first step CODASYL standard (circa 1960’s))Bachman got a Turing award for this in 1973 But getting the abstraction right (the API between applications and the DBMS) is still tricky18CODASYL Query: Who have accounts with 0 balance managed by a branch in Springfield? Pseudo-code of a CODASYL application:Use index on account(balance) to get accounts with 0 balance;For each account recordFor each account record:Get the branch id of this account; Use index on branch(id) to get the branch record;If the branch record’s location field reads “Springfield”:Output the owner field of the account record. Programmer controls “navigation”: accounts → branches719What’s wrong? The best navigation strategy & the best way of organizing the data depend on data/workload characteristics With the CODASYL approach20The relational revolution (1970’s) A simple data model: data is stored in relations (tables) A declarative query language: SQLSELECT Account.ownerFROM Account, BranchWHERE Account.balance = 0cco . a a ce 0AND Branch.location = ’Springfield’AND Account.branch_id = Branch.branch_id; Programmer specifies what answers a query should return, but not how the query is executed DBMS picks the best execution strategy based on availability of indexes, data/workload characteristics, etc.) Provides physical data independence21Physical data independence Applications should not need to worry about how data is physically structured and stored Applications should work with a logical data model and declarative query language Leave the implementation details and optimization to DBMS The single most important reason behind the success of DBMS today And a Turing Award for E. F. Codd in 1981822Modern DBMS features Persistent storage of data Logical data model; declarative queries and updates → physical data independence Relational model is the dominating technology today XML is a hot


View Full Document

Duke CPS 116 - Introduction to Database Systems

Documents in this Course
Part I

Part I

8 pages

XSLT

XSLT

4 pages

XSLT

XSLT

8 pages

Part I

Part I

8 pages

XSLT

XSLT

8 pages

Load more
Download Introduction to Database Systems
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Introduction to Database Systems and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Introduction to Database Systems 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?