DOC PREVIEW
UMD CMSC 424 - Database Design

This preview shows page 1-2-3-4-5-38-39-40-41-42-43-77-78-79-80-81 out of 81 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 81 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC424 Database Design Instructor Amol Deshpande amol cs umd edu Databases Data Conceptual representation of the data Data Storage How where to store data how to access it Data Retrieval How to ask questions of the database How to answer those questions Data Models Integrity Manage crashes concurrency Manage semantic inconsistencies Outline Storage hierarchy Disks RAID File Organization Etc Storage Hierarchy Tradeoffs between speed and cost of access Volatile vs nonvolatile Volatile Loses contents when power switched off Sequential vs random access Sequential read the data contiguously Random read the data from anywhere at any time Storage Hierarchy Storage Hierarchy Cache Super fast volatile Typically on chip L1 vs L2 vs L3 caches Huge L3 caches available now a days Becoming more and more important to care about this Cache misses are expensive Similar tradeoffs as were seen between main memory and disks Cache coherency Storage Hierarchy Main memory 10s or 100s of ns volatile Pretty cheap and dropping 1GByte 100 Main memory databases feasible now a days Flash memory EEPROM Limited number of write erase cycles Non volatile slower than main memory especially writes Examples Question How does what we discuss next change if we use flash memory only Key issue Random access as cheap as sequential access Storage Hierarchy Magnetic Disk Hard Drive Non volatile Sequential access much much faster than random access Discuss in more detail later Optical Storage CDs DVDs Jukeboxes Used more as backups Why Very slow to write if possible at all Tape storage Backups super cheap painful to access IBM just released a secure tape drive storage solution Analogy How Far Away is the Data 10 9 Andromeda Tape Optical Robot 10 6 Disk 100 10 2 1 Memory On Board Cache On Chip Cache Registers 2 000 Years Pluto Sacramento 2 Years 1 5 hr This Hotel 10 min This Room My Head 1 min Storage Primary e g Main memory cache typically volatile fast Secondary e g Disks non volatile Tertiary e g Tapes Non volatile super cheap slow Outline Storage hierarchy Disks RAID File Organization Etc 1956 IBM RAMAC 24 platters 100 000 characters each 5 million characters 1979 SEAGATE 5MB 1998 SEAGATE 47GB 2004 Hitachi 400GB Height mm 25 4 Width mm 101 6 Depth mm 146 Weight max g 700 2006 Western Digital 500GB Weight max g 600g Latest Single hard drive Seagate Barracuda 7200 10 SATA 750 GB 7200 rpm weight 720g Uses perpendicular recording Microdrives IBM 1 GB Toshiba 80GB Typical Values Diameter 1 inch 15 inches Cylinders 100 2000 Surfaces 1 or 2 Tracks cyl 2 floppies 30 Sector Size 512B 50K Capacity 360 KB old floppy 300 GB Accessing Data Accessing a sector Time to seek to the track seek time Waiting for the sector to get under the head rotational latency very low About 10ms per access average 4 to 11ms Time to transfer the data transfer time average 4 to 10ms So if randomly accessed blocks can only do 100 block transfers 100 x 512bytes 50 KB s Data transfer rates Rate at which data can be transferred w o any seeks 30 50MB s Compare to above Seeks are bad Reliability Mean time to between failure MTTF MTBF 57 to 136 years Consider 1000 new disks 1 200 000 hours of MTTF each On average one will fail 1200 hours 50 days Disk Controller Interface between the disk and the CPU Accepts the commands checksums to verify correctness Remaps bad sectors Optimizing block accesses Typically sectors too small Block A contiguous sequence of sectors 512 bytes to several Kbytes All data transfers done in units of blocks Scheduling of block access requests Considerations performance and fairness Elevator algorithm Outline Storage hierarchy Disks RAID File Organization Etc RAID Redundant array of independent disks Goal Disks are very cheap Failures are very costly Use extra disks to ensure reliability If one disk goes down the data still survives Also allows faster access to data Many raid levels Different reliability and performance properties RAID Levels a No redundancy b Make a copy of the disks If one disk goes down we have a copy Reads Can go to either disk so higher data rate possible Writes Need to write to both disks RAID Levels c Memory style Error Correcting Keep extra bits around so we can reconstruct Superceeded by below d One disk contains parity for the main data disks Can handle a single disk failure Little overhead only 25 in the above case RAID Level 5 Distributed parity blocks instead of bits Subsumes Level 4 Normal operation Read directly from the disk Uses all 5 disks Write Need to read and update the parity block To update 9 to 9 read 9 and P2 compute P2 P2 xor 9 xor 9 write 9 and P2 RAID Level 5 Failure operation disk 3 has failed Read block 0 Read it directly from disk 2 Read block 1 which is on disk 3 Read P0 0 2 3 and compute 1 P0 xor 0 xor 2 xor 3 Write To update 9 to 9 read 9 and P2 Oh P2 is on disk 3 So no need to update it Write 9 Choosing a RAID level Main choice between RAID 1 and RAID 5 Level 1 better write performance than level 5 Level 5 2 block reads and 2 block writes to write a single block Level 1 only requires 2 block writes Level 1 preferred for high update environments such as log disks Level 5 lower storage cost Level 1 60 more disks Level 5 is preferred for applications with low update rate and large amounts of data Outline Storage hierarchy Disks RAID Buffer Manager File Organization Etc Buffer Manager Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK DB choice of frame dictated by replacement policy Data must be in RAM for DBMS to operate on it Buffer Mgr hides the fact that not all data is in RAM Buffer Manager Similar to virtual memory manager Buffer replacement policies What page to evict LRU Least Recently Used Throw out the page that was not used in a long time MRU Most Recently Used The opposite Why Clock An efficient implementation of LRU Buffer Manager Pinning a block Force output force write Force the contents of a block to be written to disk Order the writes Not allowed to write back to the disk This block must be written to disk before this block Critical for fault tolerant guarantees Otherwise the database has no control over whats on disk and whats not on disk Outline Storage hierarchy Disks RAID Buffer Manager File Organization Etc File Organization How are the relations mapped to the disk blocks Use a standard file system High end systems have their own OS file systems OS interferes more than helps in many cases Mapping of relations to file One to one


View Full Document

UMD CMSC 424 - Database Design

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Loading Unlocking...
Login

Join to view Database Design and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database Design and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?