Slide 1Today’s TopicsA Typical Magnetic Disk ControllerDisk CachingDisk Arm and HeadMechanical Component of A Disk DriveDisk SectorsDisks Were LargeThey Are Now Much SmallerAreal Density vs. Moore’s Law50 Years Later (Mark Kryder at SNW 2006)Sample Disk Specs (from Seagate)Disk PerformanceMore on PerformanceFIFO (FCFS) orderSSTF (Shortest Seek Time First)Elevator (SCAN)C-SCAN (Circular SCAN)DiscussionsRAID (Redundant Array of Independent Disks)Synopsis of RAID LevelsRAID Level 6 and BeyondDealing with Disk FailuresNext Generation: FLASHSome Current FLASH ParametersWhat’s Wrong With FLASH?Where Is Flash GoingSummaryCOS 318: Operating SystemsStorage Devices2Today’s TopicsMagnetic disksMagnetic disk performanceDisk arrays (RAID)Flash memory3A Typical Magnetic Disk ControllerExternal connectionIDE/ATA, SATASCSI, SCSI-2, Ultra SCSI, Ultra-160 SCSI, Ultra-320 SCSIFibre channelCacheBuffer data between disk and interfaceControllerRead/write operationCache replacement Failure detection and recoveryDRAMcacheInterfaceControllerExternal connectionDisk4Disk CachingMethodUse DRAM to cache recently accessed blocks•Most disk has 16MB•Some of the RAM space stores “firmware” (an embedded OS)Blocks are replaced usually in an LRU orderPros of disk cachingGood for reads if accesses have localityConsCostNeed to deal with reliable writesDisk Arm and HeadDisk armA disk arm carries disk headsDisk headMounted on an actuatorRead and write on disk surfaceRead/write operationDisk controller receives a command with <track#, sector#>Seek the right cylinder (tracks) Wait until the right sector comesPerform read/write6Mechanical Component of A Disk DriveTracksConcentric rings around disk surface, bits laid out serially along each trackCylinderA track of the platter, 1000-5000 cylinders per zone, 1 spare per zoneSectorsEach track is split into arc of track (min unit of transfer)7Disk SectorsWhere do they come from?Formatting processLogical maps to physical What is a sector?Header (ID, defect flag, …)Real space (e.g. 512 bytes)Trailer (ECC code)What about errors?Detect errors in a sectorCorrect them with ECCIf not recoverable, replace it with a spareSkip bad sectors in the futureHdrSector…512 bytes ECCi i+1 i+2defectdefect8Disks Were LargeFirst Disk: IBM 305 RAMAC (1956)5MB capacity50 disks, each 24”9They Are Now Much SmallerForm factor: .5-1” 4” 5.7”Storage: 120-750GB Form factor: .4-.7” 2.7” 3.9”Storage: 60-200GB Form factor: .2-.4” 2.1” 3.4”Storage: 1GB-8GB10Areal Density vs. Moore’s Law (Mark Kryder at SNW 2006)1150 Years Later (Mark Kryder at SNW 2006)IBM RAMAC (1956)Seagate Momentus(2006)DifferenceCapacity 5MB 160GB 32,000Areal Density 2K bits/in2130 Gbits/in265,000,000Disks 50 @ 24” diameter 2 @ 2.5” diameter 1 / 2,300Price/MB $1,000 $0.01 1 / 3,200,000Spindle Speed1,200 RPM 5,400 RPM 5Seek Time 600 ms 10 ms 1 / 60Data Rate 10 KB/s 44 MB/s 4,400Power 5000 W 2 W 1 / 2,500Weight ~ 1 ton 4 oz 1 / 9,00012Sample Disk Specs (from Seagate)Cheetah 15k.5 Barracuda ES.2CapacityFormatted capacity (GB) 300 1000Discs 4 4Heads 8 8Sector size (bytes) 512 512PerformanceExternal interface Ultra320 SCSI, FC, S. SCSI SATASpindle speed (RPM) 15,000 7,200Average latency (msec) 2.0 4.16Seek time, read/write (ms) 3.5/4.0 8.5/9.5Track-to-track read/write (ms) 0.2-0.4 0.8/1.0External transfer (MB/sec) 300-400 350Internal transfer rate (MB/sec) 89-150 105 (sustained)Cache size (MB) 16 32ReliabilityRecoverable read errors 1 per 1012 bits read 1 per 1010 bits readNon-recoverable read errors 1 per 1016 bits read 1 per 1015 bits readDisk PerformanceSeekPosition heads over cylinder, typically 3.5-9.5 msRotational delayWait for a sector to rotate underneath the headsTypically 8 4 ms (7,200 – 15,000RPM) or ½ rotation takes 4 - 2msTransfer bytes Transfer bandwidth is typically 40-125 Mbytes/secPerformance of transfer 1 KbytesSeek (4 ms) + half rotational delay (2ms) + transfer (0.013 ms)Total time is 6.01 ms or 167 Kbytes/sec (1/360 of 60MB/sec)!14More on PerformanceWhat transfer size can get 90% of the disk bandwidth?Assume Disk BW = 60MB/sec, ½ rotation = 2ms, ½ seek = 4msBW * 90% = size / (size/BW + rotation + seek)size = BW * (rotation + seek) * 0.9 / 0.1 = 60MB * 0.006 * 0.9 / 0.1 = 3.24MBSeek and rotational times dominate the cost of small accessesDisk transfer bandwidth are wastedNeed algorithms to reduce seek timeSpeed depends on which sectors to accessBlock Size (Kbytes) % of Disk Transfer Bandwidth1Kbytes 0.28%1Mbytes 73.99%3.24Mbytes 90%15FIFO (FCFS) orderMethodFirst come first serveProsFairness among requestsIn the order applications expectConsArrival may be on random spots on the disk (long seeks)Wild swings can happen0 19998, 183, 37, 122, 14, 124, 65, 675316SSTF (Shortest Seek Time First)MethodPick the one closest on diskRotational delay is in calculationProsTry to minimize seek timeConsStarvationQuestionCan we avoid the starvation?0 19998, 183, 37, 122, 14, 124, 65, 67(65, 67, 37, 14, 98, 122, 124, 183)5317Elevator (SCAN)MethodTake the closest request in the direction of travelReal implementations do not go to the end (called LOOK)ProsBounded time for each requestConsRequest at the other end will take a while0 19998, 183, 37, 122, 14, 124, 65, 67(37, 14, 65, 67, 98, 122, 124, 183)5318C-SCAN (Circular SCAN)MethodLike SCANBut, wrap aroundReal implementation doesn’t go to the end (C-LOOK)ProsUniform service timeConsDo nothing on the return0 19998, 183, 37, 122, 14, 124, 65, 67(65, 67, 98, 122, 124, 183, 14, 37)5319DiscussionsWhich is your favorite?FIFOSSTFSCANC-SCANDisk I/O request bufferingWhere would you buffer requests?How long would you buffer requests?20RAID (Redundant Array of Independent Disks)Main ideaStore the error correcting codes on other disksGeneral error correcting codes are too powerfulUse XORs or single parityUpon any failure, one can recover the entire block from the spare disk (or any disk) using XORsProsReliabilityHigh bandwidthConsThe controller is complexD1 D2 D3 D4 PRAID controllerP = D1 D2 D3 D4D3 = D1 D2 P D421Synopsis of RAID
View Full Document