Slide 1DatabasesOutlineStorage HierarchyStorage HierarchyStorage HierarchyStorage HierarchyStorage HierarchyJim Gray’s Storage Latency Analogy: How Far Away is the Data?Storage…OutlineSlide 12Slide 13Slide 14Slide 15Slide 16Slide 17Accessing DataReliabilityDisk ControllerOptimizing block accessesOutlineRAIDRAID LevelsRAID LevelsRAID Level 5RAID Level 5Choosing a RAID levelOutlineBuffer ManagerBuffer ManagerBuffer ManagerOutlineFile OrganizationFile OrganizationFixed Length RecordsVariable-length RecordsFile OrganizationSequential File OrganizationSequential File OrganizationOutlineIndexOrdered IndexesPrimary Sparse IndexSecondary IndexMulti-level IndexesMulti-level IndexesOutlineExample B+-Tree IndexB+-Tree Node StructureProperties of B+-TreesExample B+-Tree IndexPropertiesExample B+-Tree IndexB+-Trees - SearchingTuple InsertionTuple InsertionB+-Trees: InsertionUpdates on B+-Trees: DeletionExamples of B+-Tree DeletionExamples of B+-Tree DeletionExample of B+-tree DeletionAnother B+Tree Insertion ExampleAnother Example: INSERT (125)Another Example: INSERT (125)Another Example: INSERT (125)Another Example: INSERT (125)Another Example: INSERT (125)B+ Trees in PracticeB+ Trees: SummaryMore…Secondary IndexB+-Tree File OrganizationB-TreeHash-based File OrganizationSlide 76Hash IndexesHash IndexesGrid FilesR-TreesConclusionsCMSC424: Database DesignInstructor: Amol Deshpande [email protected]Data ModelsConceptual representation of the dataData RetrievalHow to ask questions of the databaseHow to answer those questionsData StorageHow/where to store data, how to access itData IntegrityManage crashes, concurrencyManage semantic inconsistenciesDatabasesOutlineStorage hierarchyDisksRAIDFile OrganizationEtc….Storage HierarchyTradeoffs between speed and cost of accessVolatile vs nonvolatileVolatile: Loses contents when power switched offSequential vs random accessSequential: read the data contiguouslyRandom: read the data from anywhere at any timeStorage HierarchyStorage HierarchyCache Super fast; volatileTypically on chipL1 vs L2 vs L3 caches ???Huge L3 caches available now-a-daysBecoming more and more important to care about thisCache misses are expensiveSimilar tradeoffs as were seen between main memory and disksCache-coherency ??Storage HierarchyMain memory 10s or 100s of ns; volatilePretty cheap and dropping: 1GByte < 100$Main memory databases feasible now-a-daysFlash memory (EEPROM)Limited number of write/erase cyclesNon-volatile, slower than main memory (especially writes)Examples ?QuestionHow does what we discuss next change if we use flash memory only ?Key issue: Random access as cheap as sequential accessStorage HierarchyMagnetic Disk (Hard Drive)Non-volatileSequential access much much faster than random accessDiscuss in more detail laterOptical Storage - CDs/DVDs; JukeboxesUsed more as backups… Why ?Very slow to write (if possible at all)Tape storage Backups; super-cheap; painful to accessIBM just released a secure tape drive storage solutionJim Gray’s Storage Latency Analogy: How Far Away is the Data?RegistersOn Chip CacheOn Board CacheMemory Disk1210100Tape /Optical Robot109106SacramentoThis HotelThis RoomMy Head10 min1.5 hr2 Years1 minPluto2,000 YearsAndromedaStorage…Primarye.g. Main memory, cache; typically volatile, fastSecondarye.g. Disks; non-volatileTertiarye.g. Tapes; Non-volatile, super cheap, slowOutlineStorage hierarchyDisksRAIDFile OrganizationEtc….1956IBM RAMAC24” platters100,000 characters each5 million characters1979SEAGATE5MB1998SEAGATE47GB2004Hitachi400GBHeight (mm): 25.4. Width (mm): 101.6. Depth (mm): 146. Weight (max. g): 700 2006Western Digital500GBWeight (max. g): 600gLatest: Single hard drive: Seagate Barracuda 7200.10 SATA 750 GB 7200 rpm weight: 720g Uses “perpendicular recording”Microdrives IBM 1 GBToshiba 80GB“Typical” ValuesDiameter: 1 inch 15 inchesCylinders: 100 2000Surfaces: 1 or 2(Tracks/cyl) 2 (floppies) 30Sector Size: 512B 50KCapacity: 360 KB (old floppy) 300 GBAccessing DataAccessing a sectorTime to seek to the track (seek time)average 4 to 10ms+ Waiting for the sector to get under the head (rotational latency)average 4 to 11ms+ Time to transfer the data (transfer time)very lowAbout 10ms per accessSo if randomly accessed blocks, can only do 100 block transfers100 x 512bytes = 50 KB/sData transfer ratesRate at which data can be transferred (w/o any seeks)30-50MB/s (Compare to above)Seeks are bad !ReliabilityMean time to/between failure (MTTF/MTBF):57 to 136 yearsConsider:1000 new disks1,200,000 hours of MTTF eachOn average, one will fail 1200 hours = 50 days !Disk ControllerInterface between the disk and the CPUAccepts the commandschecksums to verify correctnessRemaps bad sectorsOptimizing block accessesTypically sectors too smallBlock: A contiguous sequence of sectors512 bytes to several KbytesAll data transfers done in units of blocksScheduling of block access requests ?Considerations: performance and fairnessElevator algorithmOutlineStorage hierarchyDisksRAIDFile OrganizationEtc….RAIDRedundant array of independent disksGoal:Disks are very cheapFailures are very costlyUse “extra” disks to ensure reliabilityIf one disk goes down, the data still survivesAlso allows faster access to dataMany raid “levels”Different reliability and performance propertiesRAID Levels(b) Make a copy of the disks. If one disk goes down, we have a copy. Reads: Can go to either disk, so higher data rate possible. Writes: Need to write to both disks.(a) No redundancy.RAID Levels(c) Memory-style Error CorrectingKeep extra bits around so we can reconstruct. Superceeded by below.(d) One disk contains “parity” for the main data disks.Can handle a single disk failure.Little overhead (only 25% in the above case).RAID Level 5Distributed parity “blocks” instead of bitsSubsumes Level 4Normal operation:“Read” directly from the disk. Uses all 5 disks“Write”: Need to read and update the parity blockTo update 9 to 9’read 9 and P2compute P2’ = P2 xor 9 xor 9’write 9’ and P2’RAID Level 5Failure operation (disk 3 has failed)“Read block
View Full Document