DISKS & Storage
CS6810, School of Computing, University of Utah

Today's topics:
• Faults & RAS
• RAID models
• Some underlying disk technology
  - very brief – more complicated than you might guess
  - more depth will appear in CS7810

Reliability

• RAS
  - reliability – absence of observable faults (hard, soft, human)
    » redundancy is always the key here
  - availability – a system-level concept
    » does it still supply the service?
    » how much degradation under certain fault models?
  - serviceability
    » can the system be repaired while it's running?
• lots of engineering issues to enable hot-swap

Faults

• Categories
  - HW
    » did something break?
      • several types: wire, component, connector, power supply, cooling, …
  - design
    » bug in either software or hardware
      • check the known errors in any current uP
        – software workarounds are key until the next fab run
  - operational
    » most common: a screw-up by operations/maintenance staff
  - environmental
    » power or network loss, fire, flood, sabotage, …

Fault Types

• Transient
  - non-recurring
    » causes
      • environmental noise event – lightning
      • alpha particle strike
    » basically impossible to find, so you need to compensate by design
      • parity, CRC, …, reboot
• Intermittent
  - recurring but somewhat rare
    » cross-talk
    » transistor malfunction at a certain temperature, which is rare
  - again, compensate by design
• Permanent
  - something just breaks and stays broken
  - finding these is typically easy
  - compensate & service to meet the RAS target

Failure Reality

• The system is what we care about
  - sum of its components – weakest-link theory applies
  - N components fail N times more often
    » think early multi-engine airplanes
  - today a small number of components has increased system reliability
  - somewhat surprising IC property
    » IC failure rate has remained fairly flat
      • even w/ Moore's-law growth of transistors
    » we are likely entering a different era
      • how to build reliable systems from flaky components?
      • a hot current research topic
• Metrics

FIT Metric

• 1 FIT = 1 failure in 10^9 hours
  - FIT ::= failures in time (per billion hours)
    » a billion hours = 114,155 years
    » 3-5 year expected lifetime
    » need ~10^-5 FIT reliability
• MTTF = MTBF
  - calculating MTBF
    » r_i = FIT rate of the ith component
    » q_i = quantity of the ith component
    » n = number of component types
    » MTBF = 10^9 / (q_1·r_1 + q_2·r_2 + … + q_n·r_n) hours
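The FIT-to-MTBF arithmetic above reduces to a few lines. The sketch below is illustrative only, not part of the lecture: it assumes the standard series (sum-of-FITs) model for MTBF, and the component quantities and FIT rates are made-up example values.

# Minimal sketch of the FIT / MTBF bookkeeping (illustrative values only).

HOURS_PER_YEAR = 24 * 365      # 8,760 hours
BILLION_HOURS = 1e9            # 1 FIT = 1 failure per 10^9 device-hours

# Sanity check of the number quoted on the FIT Metric slide:
print(BILLION_HOURS / HOURS_PER_YEAR)   # ~114,155 years

# Hypothetical bill of materials: (quantity q_i, FIT rate r_i) per component type.
components = [
    (1, 500.0),   # e.g. one disk drive
    (8,  50.0),   # e.g. eight DRAM DIMMs
    (1, 100.0),   # e.g. one power supply
]

# Series ("weakest link") model: system FIT = sum of q_i * r_i,
# and MTBF = 10^9 / system FIT hours.
system_fit = sum(q * r for q, r in components)
mtbf_hours = BILLION_HOURS / system_fit
print(f"system FIT = {system_fit:.0f}, "
      f"MTBF = {mtbf_hours:.0f} hours (~{mtbf_hours / HOURS_PER_YEAR:.1f} years)")

Note how the quantity term matters: the eight hypothetical DIMMs contribute almost as much to the system FIT as the single disk, which is the "N components fail N times more often" point from the Failure Reality slide.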
Improving Reliability

• Make better parts
  - doable in some cases & a huge cost adder
• Use fewer parts
  - a natural consequence of higher levels of integration
• Employ redundancy
  - common choices
    » 2x – OK as long as the copies agree
    » 3x – vote, and 1 can fail
    » Nx – vote, and (N/2)-1 can fail
  - duplicate what?
    » bits, components, wires, gates, …
    » huge choice set
      • bits and components are common choices today
      • wires and gates may be in our future – if intra-IC devices become flaky
• Bottom line – Pandora's box just opened
  - Dan Siewiorek's book is an excellent reference text

Failure Model

• No design makes sense without a reasonable failure model
  - amazing how many times this mistake is made
  - how reliable does your system have to be, and what are the consequences of failure?
    » note the difference between a PC and nuclear power plant monitors
  - characterize your components
    » the MTBF equation comes into play
• Examples
  - transistors and wires fail on a chip
    » highly localized
  - noise
    » burst errors in transmission
  - disk
    » oxide deterioration affects an area
      • the area is likely to expand over time

Reliability, Disks, and Modern Systems

• Think selfishly
  - what would be the bigger disaster?
    » losing your files
    » losing your PC
    » if they are the same, you really should fix this YESTERDAY
• The point
  - we view disk storage as archival in most cases
  - backups are increasingly on disk
    » commercial archives are often tape-based for "old stuff"
      • cheaper, but a pain in the tuckus to retrieve from the cave
  - checkpoints are always on disk
  - an NVRAM option may be cost-effective in the future
    » more on this next lecture
• So let's look at disk reliability
  - and then a brief glance at the underlying technology

RAID

• 1987 – Redundant Array of Inexpensive Disks
  - Patterson, Gibson, Katz @ UCB
    » Gibson is now at CMU
    » Katz made it happen while he was at DARPA
    » now it's everywhere
• Reliability through redundancy
  - the key idea is to stripe data over more than 1 disk
  - avoid disaster on a single-point failure
    » e.g. head crash, AWOL controller, …
    » even better
      • make sure the disks are physically separate – an EMP or earthquake can take out a whole warehouse
  - the striping model determines the RAID type
    » it also improves access time for large files
      • no additional seeks between tracks
    » it also impacts cost

RAID 0

• No redundancy
  - hence a bit of a misnomer
  - cheap, but unable to withstand a single failure
    » except for those correctable w/ block CRCs
• the access advantage is the only benefit (see the mapping sketch below)

[figure: RAID 0 striping diagram – source: Wikipedia]

RAID 1

• Mirroring
  - files are on both disks
  - a CRC check-block option means you'll know if one disk fails
    » you're betting that both won't fail concurrently
  - note an interesting option
    » read from whichever disk delivers first
      • if taken, this destroys arm synchronization, which will penalize writes
      • as usual – you want to optimize the common case, which is read access
  - most expensive
    » 2x disks for x capacity
    » w.r.t. RAID 0
      • read energy minimized – same as RAID 0
      • write energy doubles over RAID 0
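To make the striping and mirroring models concrete, here is a toy block-mapping sketch. It is not from the lecture: the disk count, block numbering, and function names are assumptions for illustration, and real arrays stripe in multi-block chunks and add CRC/rebuild machinery that is omitted here.

# Toy address mapping for RAID 0 (striping) and RAID 1 (mirroring).
# Hypothetical layout: N_DISKS spindles, one block per stripe unit.

N_DISKS = 4

def raid0_map(logical_block):
    """RAID 0: blocks are striped round-robin across all disks (no redundancy)."""
    disk = logical_block % N_DISKS       # which spindle holds the block
    offset = logical_block // N_DISKS    # block index on that spindle
    return disk, offset

def raid1_map(logical_block):
    """RAID 1: each block lives on both disks of the mirrored pair.
    A write must update both copies; a read can be served by whichever
    copy delivers first (the read optimization noted on the RAID 1 slide)."""
    return [(0, logical_block), (1, logical_block)]

for lb in range(8):
    print(lb, "RAID0 ->", raid0_map(lb), "RAID1 ->", raid1_map(lb))

The RAID 0 mapping shows where the large-file access advantage comes from: consecutive blocks land on different spindles, so a long sequential transfer keeps every disk busy. The RAID 1 mapping shows why capacity cost and write energy double while read energy stays the same.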


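Returning to the N-way redundancy options on the Improving Reliability slide: a majority voter is the usual way the "3x – vote, and 1 can fail" case is realized. The sketch below is illustrative only; the replica values and the voter function are assumptions, not anything from the lecture.

# Majority voting over N redundant copies (triple-modular redundancy when N = 3).
# With N replicas, a strict majority can out-vote a minority of failed copies.

from collections import Counter

def vote(replicas):
    """Return the majority value, or None if no strict majority exists."""
    value, count = Counter(replicas).most_common(1)[0]
    return value if count > len(replicas) / 2 else None

# Hypothetical example: copies of a stored word, one corrupted by a fault.
print(vote([0xCAFE, 0xCAFE, 0x0BAD]))   # 0xCAFE wins 2-to-1 -> 51966
print(vote([0xCAFE, 0x0BAD]))           # 2x copies disagree: detect but not correct -> None

This is also where the "Nx – vote, and (N/2)-1 can fail" line comes from: a strict majority needs more than N/2 agreeing copies, so for even N up to (N/2)-1 copies can fail and still be out-voted.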