Lecture 1: Overview of CSCI 585
  Prof. Shahram Ghandeharizadeh
  Director of USC Database Lab
  http://dblab.usc.edu
  Computer Science Department
  University of Southern California

Logistics
  Collection of technical papers.
  Pre-req for the course: CSCI 485 (Introduction to File and Database Management) and knowledge of the C programming language.
  Extensive use of Blackboard for homework and project submissions.
  Make sure to have access to the ACM, IEEE, and Springer digital libraries. URLs work from USC machines.
  http://den.usc.edu
  PowerPoint of presentations also available from http://dblab.usc.edu

Pre-Req
  585 assumes you know the following:
  Transactions and their ACID properties
  Concurrency control protocols such as locking and time-stamp based protocols
  Crash recovery techniques such as logging and shadow paging
  Physical characteristics of magnetic disks
  SQL
  Relational algebra operators
  ER data modeling
  Alternative normal forms
  Visit http://dblab.usc.edu/csci485 for an overview of this material.

Instructor Details
  Dr. Shahram Ghandeharizadeh
  Office: SAL 208
  E-mail: shahram@usc.edu
  Phone: (213) 740-4781
  Office Hours: Tuesday 12:30 to 2 pm, Thursday 4:30 to 5:30 pm
  Class URL: http://dblab.usc.edu/csci585
  TA: Shahin Shayandeh
  Office: SAL 200C
  E-mail: shayande@usc.edu
  Office Hours: Mondays 3:30 to 5 pm, Thursday 12:30 to 2 pm

Outline
  Motivation for DBMS
  An outline for the course material
  Grading
  Assignments and projects

Database Management Systems (DBMS)
  Used almost on a daily basis for either individual or business use.
  Relational database vendors were one of the fastest growing sectors during the .COM boom.

DATABASE, DBMS
  Database: An integrated collection of data, usually stored on secondary storage, typically describing the activities of one or more related organizations.
  Database management system (DBMS): A collection of software programs designed to assist in maintaining and utilizing large collections of data.

BEFORE DBMS
  Figure labels: User 1, Application programs, Data; User 2, Application programs, Data

AFTER DBMS
  Figure labels: User 1, Application programs, DBMS;
  User 2, Application programs; Data managed by DBMS

WHY A DBMS
  1. Reduced application development time.
  2. Data independence: application programs are not dependent on data representation and storage details.
  3. Data sharing: data is better utilized (discovered and reused), and redundancy of data is minimized.
  4. Data integrity and consistency: one may enforce consistency constraints on data, e.g., number of seats sold ≤ number of seats on the plane × 1.1.
  5. Centralized control: the DBA tunes the database to balance users' needs.
  6. Security: mechanisms to prevent unauthorized access. These mechanisms are based on content instead of a file-oriented approach.
  7. Concurrency control: avoids undesirable race conditions that arise with simultaneous access and updates to data.
  8. Crash recovery: ensures the integrity of data in the presence of failures.

DBMS ARCHITECTURE
  Figure labels: User 1 through User n; DBMS; Conceptual schema; Physical data; DB

An Emerging Phenomenon
  Figure labels: User 1, Application programs, DBMS; User 2, Application programs; Data managed by DBMS
  Example: F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. In OSDI, 2006.
  Last paragraph of the paper: "Finally, we have found that there are significant advantages to building our own storage solution at Google. We have gotten a substantial amount of flexibility from designing our own data model for Bigtable. In addition, our control over Bigtable's implementation, and the other Google infrastructure upon which Bigtable depends, means that we can remove bottlenecks and inefficiencies as they arise."

WHAT HAS CHANGED
  1. Relational database technology is now more than a quarter of a century old.
  2. While concepts such as concurrency control are extremely valuable, the performance loss attributed to their use is not justified for some non-banking applications. E.g., a social networking site is not a banking application.
  3. RDBMS vendors increased functionality for their own niche, increasing complexity. Each application used a decreasing fraction of the provided features. A deployment
  requires a specialist trained in database administration for maintenance.
  4. Availability of data is paramount. Cost of downtime is estimated at thousands of dollars per minute.
  5. SQL is too general and cumbersome to use with some applications.
  6. Storage has become larger and more economical: 10 cents per gigabyte of magnetic disk storage. Flash is a new layer in the storage hierarchy (DRAM, Flash, Disk). DRAM costs 7 to 8 dollars per gigabyte. A bank's data (TPC benchmark) becomes main-memory resident.

Crossroads
  Since 1998, database researchers have been aware of the limitations.
  More modular architecture based on simple component-based building blocks.
  One architecture will not satisfy all applications.

585 Syllabus
  Storage and Storage Management
  M. Seltzer. Beyond Relational Databases. Communications of the ACM, July 2008, Vol. 51, No. 7.
  D. A. Patterson, G. Gibson, and R. H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). ACM SIGMOD, 1988.
  G. Graefe. The five-minute rule twenty years later, and how flash memory changes the rules. Proceedings of the Third International Workshop on Data Management on New Hardware (DaMoN), 2007.
  Flash as a new storage medium.
  2-3 weeks.
  Start homework 1 using Berkeley DB.

585 Syllabus (Cont.)
  Parallel DBMS
  D. DeWitt et al. The Gamma Database Machine Project. IEEE Transactions on Knowledge and Data Engineering, Vol. 2, 1990.
  F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. In OSDI, 2006.
  J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Communications of the ACM, Vol. 51, No. 1, 2008.
  Data-intensive applications can be parallelized effectively.
  2 weeks.

585 Syllabus (Cont.)
  Spatial Index Structures
  A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. In ACM SIGMOD, 1984.
  P. E. O'Neil and D. Quass. Improved Query Performance with Variant Indexes. In ACM SIGMOD, 1997.
  No substitute for smart data indexing techniques. Brute-force approaches are not acceptable.
  2 weeks.
  Initiate your project to build a relational query processing software
  using Berkeley DB.

585 Syllabus (Cont.)
  Query optimizations
  P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, T. G. Price. Access Path Selection in a Relational Database Management System. In ACM SIGMOD, 1979.
  S. Chaudhuri. An Overview of Query Optimization in Relational Systems. PODS, 1998.
  Techniques to select index structures.
  Focus is on your project.
  2 weeks.

585 Syllabus (Cont.)
  Decision Support
  R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In VLDB, 1994.
  J. Gray et al. Data Cube: A