
CS186 Class Wrap-Up
R&G Chapters 1-28
Lecture 28

Administrivia
- Final exam: Friday 12/12, 5pm-8pm, Room 4 LeConte. You may bring 2 pages of notes (both sides). The exam is cumulative.
- Final exam review: Tuesday 12/9, 1pm-3pm, 306 Soda Hall.
- Homework 5 is due Monday 12/8.

News
- Winter Consulting's 2003 survey of the largest databases: http://mxtest.wintercorp.com/vldb/2003_TopTen_Survey/TopTenWinners.asp
- The largest single database is 29,232 GB. That's a single database, at France Telecom.
- Many companies have terabytes of data, but it is usually spread across multiple databases, file systems, etc.
- In 2001, the largest database was 10 TB.

News (cont.): Top decision support DBs
 1. France Telecom: 29.2 TB
 2. AT&T: 26.3 TB
 3. SBC: 24.8 TB
 4. Anonymous: 16.2 TB
 5. Amazon.com: 13.0 TB
 6. Kmart: 12.6 TB
 7. Claria Corp: 12.1 TB
 8. HIRA: 11.9 TB
 9. FedEx Services: 10.0 TB
10. Vodafone: 9.1 TB

News (cont.): Top transaction processing DBs
 1. Land Registry: 18.3 TB
 2. BT plc: 11.7 TB
 3. United Parcel Service: 9.0 TB
 4. Caixa Econômica Federal: 6.9 TB
 5. US Patent and Trademark Office: 5.4 TB
 6. Verizon Communications: 5.3 TB
 7. Bureau of Customs and Border Protection: 4.1 TB
 8. Hewlett-Packard: 3.2 TB
 9. Boeing: 3.1 TB
10. CheckFree Corp: 2.9 TB

Lessons from the survey and this course
- DBs are a huge part of business today.
- Companies have lots of data: imagine tuning UPS's database, with 41 billion rows.
- DBs are based on a theory of data modelling, with lots of practical data management on top: a nice mix of the theoretical and the practical.
- In most jobs, it is useful to understand how DBs work.

Today
- What topics did we cover?
- What topics did we not cover?

First, what topics did we not cover?
In the book:
- Chapter 21: Security and Authorization
- Chapter 22: Parallel and Distributed DBs
- Chapter 23: Object Database Systems
- Chapter 24: Deductive Databases
- Chapter 25: Data Warehousing and Decision Support
- Chapter 27: XML Data
- Chapter 28: Spatial Data Management
Not in the book:
- Federated databases
And what topics did we cover?
Chapters 1-20 and 26:
- Database and data model basics (Chapters 1-3): 4, 16
- Query languages (Chapters 4-5): 4, 16
- Integrating DBs with other systems (Chapters 6-7): 2, 8
- Storing data in memory and on disk (Chapters 8-9): 2, 8
- Tree and hash indexes (Chapters 10-11): 2, 8
- Joins, sort cost, query optimization (Chapters 12-15): 3, 12
- Concurrency control and recovery (Chapters 16-18): 5, 20
- Normal forms and database design (Chapter 19): 2, 8
- Database tuning (Chapter 20): 1, 4
- Data mining (Chapter 26): 1, 4

Chapter 1: Overview of Database Systems
- What is a database? What is a database system?
- What are the useful characteristics of DBs?
- When should you use a database? When is the file system better?

Chapter 2: Database Design (ER Models)
- Databases support many levels of abstraction: it is possible to design at an abstract level in one form and store the data in a very different form.
- The E-R model is useful for design and easier for humans to understand: specify entities, attributes, and relationships.
- It is possible to convert ER schemas to relational schemas.

Chapter 3: The Relational Model
- The most common data model for databases.
- Based on tables (rows and columns); tables are connected using keys and foreign keys.
- Integrity constraints:
  - Domain constraints for field values
  - Referential integrity for keys and foreign keys
  - Other constraints specified by the real world, e.g. $0.0 \le \text{gpa} \le 4.0$

Chapter 4: Relational Algebra and Calculus
- Relational algebra: operators that act on sets of tuples, etc. It is procedural. Example: $\pi_{sname,rating}(\sigma_{rating>8}(S2))$
- Relational calculus: uses first-order logic to describe the query result, without describing how to get the result (i.e., it is declarative). We studied tuple relational calculus, where variables range over tuples, e.g. $\{S \mid S \in Sailors \wedge S.rating > 7\}$

Chapter 5: SQL Queries, Constraints, Triggers
- Data Definition Language (DDL): CREATE TABLE, constraints, triggers
- Data Manipulation Language (DML):
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
- Set operations, subqueries, etc.

Chapter 6: Database Applications
- How to access DBs from programs: embedded SQL, SQLJ, dynamic APIs (ODBC, JDBC)
- Cursors: a way to iterate over relations
- Stored procedures in a database language
- Accessing other programs from databases: extending Postgres with C code

Chapter 7: Internet Applications
- Internet basics: URIs, HTTP (a stateless protocol)
- Web data formats: XML, HTML, DTD
- Different architectures: single tier; client-server (thick or thin client); three-tier
- Three-tier architecture: web browser (thin client), app server (running the business logic), database (maintaining the data)

Chapter 8: Storage and Indexing
- Different file organizations: heap files (unordered), sorted files, clustered files, unclustered tree index, unclustered hash index
- Tradeoffs in I/O costs for various operations

Chapter 9: Storing Data: Disks and Files
- Hierarchy of storage
- Keeping data in files on disk: how to arrange fields into records, records into pages, and pages into files
- Managing disk and memory: buffer management (LRU, MRU, Clock, etc.)

Chapter 10: Tree-Structured Indexes
- Trees are best for range queries and okay for equality.
- ISAM: less common; usually best for data that doesn't change. The index doesn't adjust; instead, it uses overflow pages if leaves fill.
- B+ trees: present in virtually all databases; the tree adjusts to stay balanced. You should understand these pretty well after Hw4.

Chapter 11: Hash-Based Indexes
- Hash indexes are best for equality and useless for range queries.
- Static hashing: only good when the data doesn't change; uses overflow buckets.
- Extendible hashing: uses a directory of buckets. When a bucket overflows, it is split, and the directory size is doubled if needed. It needs no overflow buckets (unless there are many duplicate keys).
- Linear hashing: no directory, just a number indicating which buckets have split. It may need overflow buckets, but it doesn't need a directory.

Chapter 12: Overview of Query Evaluation
- System catalogs: info about all tables, including statistics about field values
- Access paths: how to get at tuples (file scan, indexes)
- Query plan: a tree of relational operators

Chapter 13: External Sorting
- The database can sort any amount of data, even if it doesn't fit in memory: sort runs that fit in memory, then merge the sorted runs together. Used in Hw5.

Chapter 14: Evaluating Relational Operators
- How to implement selection, projection, and join
- Join algorithms: nested loops, indexed nested loops, sort-merge join, hash join

Chapter 15: A Typical Relational Query Optimizer
- Break the query into query blocks.
- Enumerate possible query plans, evaluate the cost of each, and choose the cheapest.

Chapters 16-17: Overview of Transactions; Concurrency Control
- Transactions: the unit of atomicity; ACID
- Anomalies; precedence graphs
- Schedule characteristics: serializable, view serializable, conflict serializable, recoverable, avoids cascading aborts, strict
- Locking approaches: 2PL, strict 2PL, dealing with deadlock, hierarchical locking, locking in B+ trees
- Non-locking approaches: optimistic CC, timestamp CC, multiversion CC
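To make the procedural flavor of the Chapter 4 relational algebra example concrete, here is a minimal Python sketch. It assumes relations are modeled as lists of dicts, which is an illustration and not how a DBMS stores tuples. `select` and `project` are hypothetical helper names standing for $\sigma$ and $\pi$, and the S2 tuples are sample data chosen for illustration. Projection eliminates duplicates because relational algebra operates on sets.

```python
from typing import Callable

Row = dict[str, object]  # a tuple, keyed by attribute name


def select(rel: list[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel: list[Row], attrs: list[str]) -> list[Row]:
    """pi: keep only the named attributes and drop duplicate tuples (set semantics)."""
    seen = set()
    out = []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Hypothetical sample instance of S2(sid, sname, rating, age).
S2 = [
    {"sid": 28, "sname": "yuppy", "rating": 9, "age": 35.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]

# pi_{sname,rating}(sigma_{rating>8}(S2))
result = project(select(S2, lambda t: t["rating"] > 8), ["sname", "rating"])
print(result)
```

Note how the expression is evaluated inside-out: the selection runs first and its output feeds the projection, which is exactly the "how to get the result" ordering that relational calculus leaves unspecified.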
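The Clock replacement policy listed under Chapter 9 buffer management can be sketched as below. This is a simplified model under stated assumptions: pages are integer ids, no disk I/O or dirty-page write-back is modeled, and `ClockBufferPool`, `pin`, and `unpin` are hypothetical names. Here a frame's reference bit is turned on when its pin count drops to 0 (the variant described in R&G; other descriptions set it on every access). The clock hand skips pinned frames and gives referenced frames a "second chance" by clearing their bit.

```python
from typing import Optional


class ClockBufferPool:
    """Sketch of Clock (second-chance) page replacement; no real I/O is performed."""

    def __init__(self, num_frames: int) -> None:
        self.pages: list[Optional[int]] = [None] * num_frames  # page id held by each frame
        self.ref_bit = [False] * num_frames
        self.pin_count = [0] * num_frames
        self.hand = 0
        self.page_table: dict[int, int] = {}  # page id -> frame index

    def pin(self, page_id: int) -> int:
        """Return the frame holding page_id, evicting a victim page if it is not resident."""
        if page_id in self.page_table:
            frame = self.page_table[page_id]
        else:
            frame = self._choose_victim()
            old = self.pages[frame]
            if old is not None:
                del self.page_table[old]  # a real system would first write it back if dirty
            self.pages[frame] = page_id
            self.page_table[page_id] = frame
        self.pin_count[frame] += 1
        return frame

    def unpin(self, page_id: int) -> None:
        frame = self.page_table[page_id]
        self.pin_count[frame] -= 1
        if self.pin_count[frame] == 0:
            self.ref_bit[frame] = True  # recently used: gets a second chance

    def _choose_victim(self) -> int:
        n = len(self.pages)
        for _ in range(2 * n):  # two sweeps suffice: the first one clears reference bits
            frame = self.hand
            self.hand = (self.hand + 1) % n
            if self.pages[frame] is None:
                return frame  # empty frame
            if self.pin_count[frame] > 0:
                continue  # pinned pages can never be replaced
            if self.ref_bit[frame]:
                self.ref_bit[frame] = False  # second chance
            else:
                return frame
        raise RuntimeError("all frames are pinned")
```

For example, with 3 frames, after pages 1, 2, 3 have each been pinned and unpinned, a request for page 4 sweeps once to clear all three reference bits and then evicts page 1. Clock approximates LRU while only keeping one bit per frame.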
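The Chapter 11 description of extendible hashing can be made concrete with a small in-memory sketch. Assumptions: keys are non-negative integers hashed by their low-order bits (so the key itself serves as its hash value), buckets hold at most `bucket_capacity` keys, and `ExtendibleHashIndex`, `Bucket`, and `max_global_depth` are hypothetical names. A bucket split doubles the directory only when the bucket's local depth equals the global depth. Instead of adding an overflow page for too many duplicates, this sketch simply raises an error once `max_global_depth` is reached.

```python
class Bucket:
    def __init__(self, local_depth: int) -> None:
        self.local_depth = local_depth
        self.keys: list[int] = []


class ExtendibleHashIndex:
    """Sketch of extendible hashing on integer keys, indexed by low-order bits."""

    def __init__(self, bucket_capacity: int = 2, max_global_depth: int = 16) -> None:
        self.capacity = bucket_capacity
        self.max_global_depth = max_global_depth
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _slot(self, key: int) -> int:
        return key & ((1 << self.global_depth) - 1)  # last global_depth bits

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._slot(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.global_depth:
                if self.global_depth == self.max_global_depth:
                    raise RuntimeError("too many colliding keys; would need an overflow page")
                # Doubling: slot i and slot i + 2^d initially share the same bucket.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)  # then retry the insert

    def _split(self, bucket: Bucket) -> None:
        bit = 1 << bucket.local_depth  # the bit that now tells the two halves apart
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)  # the new "split image" bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image

    def lookup(self, key: int) -> bool:
        return key in self.directory[self._slot(key)].keys
```

Inserting 0, 4, and 8 with a capacity of 2 shows why the directory can grow several times for one insert: all three keys end in `00`, so the directory doubles until the third bit separates 4 from 0 and 8, giving a global depth of 3.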
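The two phases of external sorting from Chapter 13 (sort runs that fit in memory, then merge the sorted runs) can be sketched in Python. To keep it short, the sketch assumes the unit of memory is one record rather than one page, and in-memory lists stand in for run files on disk; `buffer_size` and `fan_in` are hypothetical parameters for the memory size and the number of runs merged at once. `heapq.merge` performs the k-way merge of already-sorted inputs.

```python
import heapq
from typing import Iterable


def make_sorted_runs(records: Iterable[int], buffer_size: int) -> list[list[int]]:
    """Pass 0: fill the buffer, sort it in memory, and write it out as one sorted run."""
    runs: list[list[int]] = []
    chunk: list[int] = []
    for r in records:
        chunk.append(r)
        if len(chunk) == buffer_size:
            runs.append(sorted(chunk))
            chunk = []
    if chunk:  # last, partially filled buffer
        runs.append(sorted(chunk))
    return runs


def merge_passes(runs: list[list[int]], fan_in: int) -> list[int]:
    """Passes 1, 2, ...: merge up to fan_in runs at a time until one run remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []


def external_sort(records: Iterable[int], buffer_size: int = 3, fan_in: int = 2) -> list[int]:
    return merge_passes(make_sorted_runs(records, buffer_size), fan_in)
```

In the page-based version, with $B$ buffer pages and a file of $N$ pages, pass 0 produces $\lceil N/B \rceil$ runs and each later pass does a $(B-1)$-way merge, for a total of $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$ passes. This is why even a huge file sorts in very few passes.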
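Two of the Chapter 14 join algorithms can be contrasted in a few lines. The sketch below is a simple in-memory hash join (build a hash table on one input, then probe it with the other); the partitioning phase that a real hash join uses when the build input does not fit in memory is omitted. It also includes a simple nested loops join for comparison. Relations are modeled as lists of dicts, and the function names and sample Sailors/Reserves tuples are hypothetical.

```python
from collections import defaultdict

Row = dict[str, object]


def hash_join(R: list[Row], S: list[Row], r_key: str, s_key: str) -> list[Row]:
    # Build phase: hash the (ideally smaller) relation R on its join attribute.
    table: defaultdict = defaultdict(list)
    for r in R:
        table[r[r_key]].append(r)
    # Probe phase: each S tuple looks only at the R tuples with the same key.
    result = []
    for s in S:
        for r in table.get(s[s_key], []):
            result.append({**r, **s})  # same-named attributes take S's value
    return result


def nested_loops_join(R: list[Row], S: list[Row], r_key: str, s_key: str) -> list[Row]:
    # Simple nested loops: compare every R tuple with every S tuple (|R| * |S| comparisons).
    return [{**r, **s} for r in R for s in S if r[r_key] == s[s_key]]


# Hypothetical sample data.
sailors = [{"sid": 22, "sname": "dustin"}, {"sid": 58, "sname": "rusty"}]
reserves = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}, {"sid": 22, "bid": 102}]

print(hash_join(sailors, reserves, "sid", "sid"))
```

Both functions return the same set of joined tuples (possibly in a different order). The hash join does roughly $|R| + |S|$ work instead of $|R| \cdot |S|$ comparisons, but, like hash indexes, it only works for equality join conditions.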
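The precedence-graph test for conflict serializability from Chapters 16-17 is mechanical enough to sketch. Assumptions: a schedule is a list of `(transaction, action, item)` triples with action `"R"` or `"W"`, and commits and aborts are not modeled. There is an edge Ti → Tj when an action of Ti precedes and conflicts with an action of Tj (same item, different transactions, at least one write). The schedule is conflict serializable exactly when this graph is acyclic, and any topological order gives an equivalent serial schedule. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); `precedence_graph` and `serial_order` are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional

Action = tuple[str, str, str]  # (transaction, "R" or "W", data item)


def precedence_graph(schedule: list[Action]) -> dict[str, set[str]]:
    """Edge Ti -> Tj if an action of Ti precedes and conflicts with an action of Tj."""
    edges: dict[str, set[str]] = {t: set() for t, _, _ in schedule}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges[ti].add(tj)  # Ti must precede Tj in any equivalent serial order
    return edges


def serial_order(schedule: list[Action]) -> Optional[list[str]]:
    """Return an equivalent serial order if the schedule is conflict serializable, else None."""
    graph = precedence_graph(schedule)
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds: dict[str, set[str]] = {t: set() for t in graph}
    for ti, succs in graph.items():
        for tj in succs:
            preds[tj].add(ti)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None


# T1 reads A, T2 writes A, then T1 writes A: edges T1 -> T2 and T2 -> T1 form a cycle.
s1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(serial_order(s1))
```

The schedule `s1` is the classic lost-update pattern: the cycle in its precedence graph means no serial order of T1 and T2 produces the same conflicting-action ordering, so 2PL would never allow it.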


Berkeley COMPSCI 186 - 28 - WrapUp
