CMSC 411 Computer Systems Architecture
Lecture 18: Storage Systems (cont.)
Alan Sussman, als@cs.umd.edu
CMSC 411 - 18 (some from Patterson, Sussman, others)

Administrivia
- Homework 4 due today
- Homework 5 posted today
- Exam 2 on Thursday, April 23
- Cache simulator project: questions?

How is the I/O bus connected?
- Do we connect it to the memory bus or to the cache?
- Typical solution: from Fig. 7.15, H&P 3ed

How does CPU get data from I/O bus?
Two solutions:
- Some (mostly older) machines have op codes that read or write to I/O devices.
- In memory-mapped I/O, certain physical addresses are reserved for I/O devices like disks, so reads and writes to those addresses are put on the I/O bus.
Usually I/O is interrupt driven, meaning that after the CPU requests a READ or WRITE, it goes on with other work until the I/O unit signals that it is finished.

DMA to make this work
To allow the CPU to proceed, need another controller to shepherd the READ or WRITE.
Direct memory access (DMA) hardware is used to
- record the address and the number of bytes to be transferred
- act as bus master, initiating each data transfer
- interrupt the CPU when the transfer is complete
In some cases, these controllers are really separate I/O processors.

Reliability, Availability, and RAID

Failure rate vs. Availability
- Failure rate concerns whether any of the hardware is broken.
- Availability concerns whether the system is usable, even if some pieces are broken.
- Example 1: Your bank can improve the availability of the ATM system by installing two ATM machines, so that one is available even if one breaks.
- Example 2: Your bank can reduce the failure rate of the ATM system by installing a machine that does not break as often. Also increases the availability.
- Generally hope that more complicated hardware improves availability and performance, but it also may increase the failure rate.

Example: Disk arrays
Suppose a machine has an array of 20 disks.
- Case 1: If distribute the data across the disks (striping), then all 20 disks must be working properly in order to access the data, but throughput can be improved.
- Case 2: If store 20 copies of the data, one copy per disk, have good availability: can access the data even if some disks fail.
- But reliability of the 20 disks is less than reliability of a single disk: the probability of one of the 20 disks failing is essentially 20 times the probability that a single disk will fail.

Disk arrays (cont.)
- In Case 2, store multiple copies on multiple disks: called RAID (redundant arrays of inexpensive disks).
- Typically store 2 copies, not 20.
- Used when availability is critical, in applications such as airline reservations, medical records, stock market.
- RAID is actually not inexpensive, because of the cost of the controllers, power supplies, and fans, so often the "I" is said to stand for "independent".
- More than 80% of non-PC disk drive sales are now RAID, a multi-billion dollar industry.

RAID (from Fig. 6.4)
There are various levels of RAID, depending on the relative importance of availability, accuracy, and cost.

RAID level                                  Faults    Example     Check   Companies
                                            survived  data disks  disks
0  Striped                                  0         8           0       widely used
1  Mirrored                                 1         8           8       EMC, HP (Tandem), IBM
2  Memory-style ECC                         1         8           4
3  Bit-interleaved parity                   1         8           1       Storage Concepts
4  Block-interleaved parity                 1         8           1       Network Appliance
5  Block-interleaved w/ distributed parity  1         8           1       widely used
6  P+Q redundancy                           2         8           2       Network Appliance

RAID levels 0 & 1
One copy of data (RAID 0)
- Data striped across a disk array
Two full copies of data (mirroring, RAID 1)
- If one disk fails, go to other.
- Can also use this to distribute the load of READs.
- Most expensive RAID option.
RAID 0 and 1 can be combined
- 1+0 (or 10): mirror pairs of disks, then stripe across pairs
- 0+1 (or 01): stripe across one set of half the disks, then mirror writes to both sets

RAID 3
Bit-interleaved parity (RAID 3)
- One copy of the data, stored among several disks, and one extra disk to hold a parity bit (checksum) for the others.
Example: Suppose have 4 data disks, and one piece of the data looks like this:
  Disk 1:  0 1 0 1 1 0 0 0
  Disk 2:  0 1 1 1 0 1 1 0
  Disk 3:  0 1 1 1 1 0 0 0
  Disk 4:  0 0 0 1 0 1 0 1
Then the parity bits are set by taking the sums mod 2:
  Disk 5:  0 1 0 0 0 0 1 1

RAID 3 (cont.)
- So if the data on one of the disks becomes corrupted, the parity bits on Disk 5 will be wrong, so can tell there has been a failure, and be able to fix it if know which disk failed.
- Disadvantage: Each data access must read from all 5 disks in order to retrieve the data and check for corruption; also can't always tell where the error is (could even be on the parity disk).

RAID 4
Block-interleaved parity (RAID 4)
- Same organization of data as RAID 3, but cheaper reads and writes.
- Read: Read one sector at a time and count on the disk's own error detection mechanisms for each sector.
- Write: In each write, note which bits are changing; this is enough information to change the parity bits without reading from the other disks.

RAID 4 example
If the original contents are
  Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
  01011000  01110110  01111000  00010101  01000011
And write Disk 2:
  Disk 2:  0 1 1 1 0 1 1 0  (old)
  Disk 2:  1 0 1 1 0 0 1 1  (new)
Then since bits 0, 1, 5 and 7 changed, need to flip those parity bits:
  Disk 5:  0 1 0 0 0 0 1 …

RAID 5
Fig. 7.19 from H&P 3ed
- Disadvantage of RAID 4: Parity disk is a bottleneck, so it is better to interleave the parity information across all of the disks (RAID 5).
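The parity arithmetic in the RAID 3 and RAID 4 examples can be checked with a short Python sketch. It is only an illustration, not how a real controller is implemented: each disk is modeled as a single 8-bit integer, and the helper names `parity` and `changed_positions` are made up for this example. "Sum mod 2" of bits is the same as bitwise XOR, and the slides' "bits 0, 1, 5 and 7" are taken here to be counted from the left, starting at 0, which is the only numbering consistent with the old and new Disk 2 values.

```python
from functools import reduce

def parity(blocks):
    """XOR all blocks together (bitwise sum mod 2)."""
    return reduce(lambda a, b: a ^ b, blocks, 0)

def changed_positions(mask, width=8):
    """Positions of the 1 bits in mask, counted from the left, starting at 0."""
    return [i for i in range(width) if (mask >> (width - 1 - i)) & 1]

# Contents of the four data disks from the RAID 3 example.
data = [0b01011000, 0b01110110, 0b01111000, 0b00010101]

# The check disk (Disk 5) holds the XOR of the data disks.
disk5 = parity(data)
print(f"Disk 5 parity:     {disk5:08b}")

# Recovery: if we know Disk 3 failed, XOR the surviving data disks
# with the parity disk to rebuild its contents.
survivors = data[:2] + data[3:]
rebuilt_disk3 = parity(survivors + [disk5])
print(f"Rebuilt Disk 3:    {rebuilt_disk3:08b}")

# RAID 4 small write: only old data, new data, and old parity are needed.
old_disk2, new_disk2 = data[1], 0b10110011
changed = old_disk2 ^ new_disk2      # 1 wherever a data bit flipped
new_disk5 = disk5 ^ changed          # flip the same parity bits
print("Changed bits:     ", changed_positions(changed))
print(f"Updated Disk 5:    {new_disk5:08b}")

# Cross-check: recomputing parity over all data disks gives the same result.
assert new_disk5 == parity([data[0], new_disk2, data[2], data[3]])
```

The final assertion shows why the RAID 4 write is cheaper: the shortcut (old parity XOR old data XOR new data) touches only Disk 2 and Disk 5, yet agrees with a full recomputation over every data disk.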
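The disk-array slide's claim that the chance of one of 20 disks failing is "essentially 20 times" the single-disk probability can be checked numerically. The sketch below assumes disks fail independently and uses a made-up per-disk failure probability `p = 0.001`; neither the independence assumption nor the value comes from the lecture, and the function names are hypothetical.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p) ** n

def prob_striped_data_available(p, n):
    """Case 1 (striping): data is accessible only if all n disks work."""
    return (1 - p) ** n

def prob_replicated_data_lost(p, n):
    """Case 2 (n full copies): data is lost only if all n disks fail."""
    return p ** n

p = 0.001   # assumed per-disk failure probability (illustrative only)
n = 20      # number of disks, as in the slide

exact = prob_any_failure(p, n)
approx = n * p   # the slide's "20 times" approximation

print(f"P(at least one of {n} fails): {exact:.6f}  (approx. n*p = {approx:.6f})")
print(f"Striped data available:       {prob_striped_data_available(p, n):.6f}")
print(f"Replicated data lost:         {prob_replicated_data_lost(p, n):.3e}")
```

For small `p` the exact value 1 - (1 - p)^n and the approximation n*p are close, which is what "essentially" means here; the approximation overestimates more as `p` grows.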