Berkeley COMPSCI 294 - Recovery Oriented Computing - D2966502

School: University of California, Berkeley
Course: COMPSCI 294 - Special Topics
Pages: 61

This preview shows page 1-2-3-4-28-29-30-31-58-59-60-61 out of 61 pages.

Slide titles:
• Recovery Oriented Computing
• Outline
• Goals, Assumptions of last 15 years
• After 15 year improving Performance
• Downtime Costs (per Hour)
• Jim Gray: Trouble-Free Systems
• Lampson: Systems Challenges
• Hennessy: What Should the “New World” Focus Be?
• The real scalability problems: AME
• Total Cost of Ownership (IBM)
• Lessons learned from Past Projects for which might help AME
• Lessons learned from Past Projects for AME
• Lessons learned from Internet
• Lessons learned from Past Projects for AME
• Lessons learned from Past Projects for AME
• Learning from other fields: PSTN
• Lessons learned from Past Projects for AME
• Lessons Learned from Other Fields
• Lessons Learned from Other Fields
• Lessons Learned from Other Fields
• Human Error
• Human Error: Automation irony
• Other Fields
• Lessons Learned from Other Cultures
• Outline
• Recovery-Oriented Computing Hypothesis
• Tentative ROC Principles: #1 Isolation and Redundancy
• Tentative ROC Principles #2 Online verification
• Tentative ROC Principles #3 Undo support
• Tentative ROC Principles #4 Diagnosis Support
• Overview towards AME via ROC
• Rest of Talk
• What about claims of 5 9s?
• “Microsoft fingers technicians for crippling site outages”
• What is uptime of HP.com?
• Traditional HA vs. Internet reality
• How does ROC differ from Fault Tolerant Computing (FTC)?
• Benchmarking availability
• Example: single-fault in SW RAID
• Software RAID: QoS behavior
• Software RAID: QoS behavior
• Software RAID: maintainability
• Initial Applications
• Conclusion
• An Approach to Recovery-Oriented Computers (ROC)
• An Approach to ROC
• An Approach to ROC
• An Approach to ROC
• An Approach to ROC
• ISTORE-1 Brick
• Cost of Bandwidth, Safety
• Disk Limit: Bus Hierarchy
• Clusters and TPC Software 8/’00
• Clusters and TPC-C Benchmark
• Cost of Storage System v. Disks
• SCSI v. IDE $/GB
• Availability benchmark methodology
• Stage 4: Diagnosis aids
• Diagnosis aids
• Total Cost of Ownership

Slide 1
Recovery Oriented Computing
Dave Patterson
University of California at Berkeley
http://roc.CS.Berkeley.EDU/
September 2001

Slide 2
Outline
• What have we been doing
• Motivation for a new Challenge: making things work (including endorsements)
• What have we learned
• New Challenge: Recovery-Oriented Computer
• Examples: benchmarks, prototypes

Slide 3
Goals, Assumptions of last 15 years
• Goal #1: Improve performance
• Goal #2: Improve performance
• Goal #3: Improve cost-performance
• Assumptions
  – Humans are perfect (they don’t make mistakes during installation, wiring, upgrade, maintenance or repair)
  – Software will eventually be bug free (good programmers write bug-free code)
  – Hardware MTBF is already very large (~100 years between failures), and will continue to increase

Slide 4
After 15 year improving Performance
• Availability is now a vital metric for servers!
  – near-100% availability is becoming mandatory
    » for e-commerce, enterprise apps, online services, ISPs
  – but, service outages are frequent
    » 65% of IT managers report that their websites were unavailable to customers over a 6-month period
      • 25%: 3 or more outages
  – outage costs are high
    » social effects: negative press, loss of customers who “click over” to competitor
Source: InternetWeek 4/3/2000

Slide 5
Downtime Costs (per Hour)
• Brokerage operations: $6,450,000
• Credit card authorization: $2,600,000
• Ebay (1 outage 22 hours): $225,000
• Amazon.com: $180,000
• Package shipping services: $150,000
• Home shopping channel: $113,000
• Catalog sales center: $90,000
• Airline reservation center: $89,000
• Cellular service activation: $41,000
• On-line network fees: $25,000
• ATM service fees: $14,000
Source: InternetWeek 4/3/2000 + Fibre Channel: A Comprehensive Introduction, R. Kembel 2000, p.8.
“...based on a survey done by Contingency Planning Research.”

Slide 6
Jim Gray: Trouble-Free Systems
• Manager
  – Sets goals
  – Sets policy
  – Sets budget
  – System does the rest.
• Everyone is a CIO (Chief Information Officer)
• Build a system
  – used by millions of people each day
  – Administered and managed by a ½ time person.
    » On hardware fault, order replacement part
    » On overload, order additional equipment
    » Upgrade hardware and software automatically.
“What Next? A dozen remaining IT problems”
Turing Award Lecture, FCRC, May 1999
Jim Gray
Microsoft

Slide 7
Lampson: Systems Challenges
• Systems that work
  – Meeting their specs
  – Always available
  – Adapting to changing environment
  – Evolving while they run
  – Made from unreliable components
  – Growing without practical limit
• Credible simulations or analysis
• Writing good specs
• Testing
• Performance
  – Understanding when it doesn’t matter
“Computer Systems Research-Past and Future”
Keynote address, 17th SOSP, Dec. 1999
Butler Lampson
Microsoft

Slide 8
Hennessy: What Should the “New World” Focus Be?
• Availability
  – Both appliance & service
• Maintainability
  – Two functions:
    » Enhancing availability by preventing failure
    » Ease of SW and HW upgrades
• Scalability
  – Especially of service
• Cost
  – per device and per service transaction
• Performance
  – Remains important, but it’s not SPECint
“Back to the Future: Time to Return to Longstanding Problems in Computer Systems?”
Keynote address, FCRC, May 1999
John Hennessy
Stanford

Slide 9
The real scalability problems: AME
• Availability
  – systems should continue to meet quality of service goals despite hardware and software failures
• Maintainability
  – systems should require only minimal ongoing human administration, regardless of scale or complexity: Today, cost of maintenance = 10X cost of purchase
• Evolutionary Growth
  – systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded
• These are problems at today’s scales, and will only get worse as systems grow

Slide 10
Total Cost of Ownership (IBM)
• HW management: 3%
• Environmental: 14%
• Downtime: 20%
• Purchase: 20%
• Administration: 13%
• Backup Restore: 30%

• Administration: all people time
• Backup Restore: devices, media, and people time
• Environmental: floor space, power, air conditioning

Slide 11
Lessons learned from Past Projects for which might help AME
• Know how to improve performance (and cost)
  – Run system against workload, measure, innovate, repeat
  – Benchmarks standardize workloads, lead to competition, evaluate alternatives; turns debates into numbers
• Major improvements in Hardware Reliability
  – Disks: from 50,000-hour MTBF in 1990 to 1,200,000 hours in 2000
  – PC motherboards: from 100,000 to 1,000,000 hours
• Yet Everything has an error rate
  – Well designed and manufactured
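Several slides quote reliability and cost figures: MTBF in hours (slides 3 and 11), per-hour outage costs (slide 5), and "5 9s" of availability (in the slide titles). The minimal Python sketch below shows the back-of-the-envelope arithmetic that connects them. It assumes a 365-day year (8,760 hours), the standard steady-state formula A = MTBF / (MTBF + MTTR), and a flat per-hour outage cost. MTTR does not appear in the slides, and the function names and the 4-hour repair time are hypothetical, chosen only for illustration.

```python
"""Back-of-the-envelope availability arithmetic (illustrative sketch).

Assumptions (not from the slides): a 365-day year, the steady-state
formula A = MTBF / (MTBF + MTTR), and a flat per-hour outage cost.
The 4-hour MTTR in the example is hypothetical.
"""

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def mtbf_years(mtbf_hours: float) -> float:
    """Convert a mean time between failures from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of outage per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


def downtime_cost_per_year(avail: float, cost_per_hour: float) -> float:
    """Expected yearly outage cost, assuming every hour costs the same."""
    return downtime_hours_per_year(avail) * cost_per_hour


if __name__ == "__main__":
    # Disk MTBF figures from slide 11 (1990 vs. 2000), expressed in years.
    for hours in (50_000, 1_200_000):
        print(f"{hours:>9,} h MTBF ~ {mtbf_years(hours):6.1f} years")

    # "Five nines" availability at the brokerage rate from slide 5.
    five_nines = 0.99999
    minutes = downtime_hours_per_year(five_nines) * 60
    cost = downtime_cost_per_year(five_nines, 6_450_000)
    print(f"99.999% -> {minutes:.1f} min/year down, ~${cost:,.0f}/year")

    # Hypothetical 4-hour repair time for a 1,200,000-hour-MTBF disk.
    print(f"A = {availability(1_200_000, 4):.7f}")
```

Under these assumptions, five nines allows about 5.3 minutes of downtime per year, and the 1,200,000-hour disk MTBF on slide 11 works out to roughly 137 years, consistent with the "~100 years between failures" assumption on slide 3. A flat hourly rate is a simplification: slide 4 notes that outages also carry social costs (negative press, lost customers) that such a rate does not capture.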
