Berkeley COMPSCI 294 - Availability and Maintainability Benchmarks

Contents
• Overview
• Part I, Availability Benchmarks: Outline: Availability Benchmarks; Why benchmark availability?; Step 1: Availability metrics; Step 2: Measurement techniques; Step 3: Reporting results; Case study; Benchmark environment; Benchmark environment: faults; Single-fault experiments; Multiple-fault experiments; Comparison of systems; Transient error handling; Transient error handling (2); Reconstruction policy; Reconstruction policy: graphical view; Reconstruction policy (2); Double-fault handling; Availability Conclusions: Case study; Conclusions: Availability benchmarks; Availability: Future opportunities
• Part II, Maintainability Benchmarks: Outline: Maintainability Benchmarks; Motivation; Metrics & Approach; Methodology; 1) Build a task taxonomy; 2) Measure a task's cost; 3) Measure task frequencies; 4) Apply a cost function; Case Study; Experimental platform; Experimental procedure; Experimental procedure (2); Experimental procedure (3); Sample results: time; Analysis of time results; Analysis of time results (2); Learning curve results; Learning curve results (2); Summary of results; Discussion of methodology; Making the methodology practical; Early reactions; Looking for feedback...; Conclusions; Discussion topics?
• Backup Slides: Approaching availability benchmarks; Example Quality of Service metrics; System configuration; Single-fault results; Behavior A: no effect; Behavior B: lost redundancy; Behavior C: automatic reconstruction; Behavior D: system failure; System comparison: single-fault; Example multiple-fault result; Multi-fault results; Multi-fault results (2); Future Directions: Maintainability

Slide 1: Availability and Maintainability Benchmarks
A Case Study of Software RAID Systems
Aaron Brown, Eric Anderson, and David A. Patterson
Computer Science Division, University of California at Berkeley
CS294-8 Guest Lecture, 7 November 2000

Slide 2: Overview
• Availability and Maintainability are key goals for modern systems
  – and the focus of the ISTORE project
• How do we achieve these goals?
  – start by understanding them
  – figure out how to measure them
  – evaluate existing systems and techniques
  – develop new approaches based on what we've learned
    » and measure them as well!

Slide 3: Overview (build of Slide 2, adding one bullet)
• Benchmarks make these tasks possible!

Slide 4: Part I: Availability Benchmarks

Slide 5: Outline: Availability Benchmarks
• Motivation: why benchmark availability?
• Availability benchmarks: a general approach
• Case study: availability of software RAID
  – Linux (RH 6.0), Solaris (x86), and Windows 2000
• Conclusions

Slide 6: Why benchmark availability?
• System availability is a pressing problem
  – modern applications demand near-100% availability
    » e-commerce, enterprise apps, online services, ISPs
    » at all scales and price points
  – we don't know how to build highly available systems!
    » except at the very high end
• Few tools exist to provide insight into system availability
  – most existing benchmarks ignore availability
    » they focus on performance, and under ideal conditions
  – there are no comprehensive, well-defined metrics for availability

Slide 7: Step 1: Availability metrics
• Traditionally, the percentage of time the system is up
  – a time-averaged, binary view of system state (up/down)
• This metric is inflexible
  – it doesn't capture degraded states
    » a non-binary spectrum between "up" and "down"
  – time-averaging discards important temporal behavior
    » compare two systems with 96.7% traditional availability (worked out in the sketch below):
      • system A is down for 2 seconds per minute
      • system B is down for 1 day per month
• Our solution: measure variation in system quality-of-service (QoS) metrics over time
  – performance, fault tolerance, completeness, accuracy
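To make the inflexibility of the traditional metric concrete, here is a minimal arithmetic sketch of the Slide 7 comparison. It assumes a 30-day month, which the slide does not specify:

```python
# Two systems with identical time-averaged availability but very
# different temporal behavior (the Slide 7 comparison).

MONTH_SECONDS = 30 * 24 * 60 * 60   # assumption: a 30-day month

# System A: down 2 seconds out of every minute
avail_a = (60 - 2) / 60

# System B: down 1 full day per month
avail_b = (MONTH_SECONDS - 24 * 60 * 60) / MONTH_SECONDS

print(f"System A: {avail_a:.3%}")   # 96.667%
print(f"System B: {avail_b:.3%}")   # 96.667%

# The binary up/down average cannot distinguish thousands of 2-second
# glitches from one 24-hour outage, which is why the benchmark instead
# tracks QoS metrics over time.
```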
Slide 8: Step 2: Measurement techniques
• Goal: quantify variation in QoS metrics as events occur that affect system availability
• Leverage existing performance benchmarks
  – to measure and trace quality-of-service metrics
  – to generate fair workloads
• Use fault injection to compromise the system
  – hardware faults (disk, memory, network, power)
  – software faults (corrupt input, driver error returns)
  – maintenance events (repairs, SW/HW upgrades)
• Examine single-fault and multi-fault workloads
  – the availability analogues of performance micro- and macro-benchmarks

Slide 9: Step 3: Reporting results
• Results are most accessible graphically
  – plot the change in QoS metrics over time
  – compare to "normal" behavior
    » 99% confidence intervals calculated from no-fault runs (see the harness sketch after Slide 13)
• Graphs can be distilled into numbers
[Figure: a QoS metric plotted over time, showing the normal-behavior band (99% confidence), the injected fault, and the point where the system handles the fault]

Slide 10: Case study
• Availability of software RAID-5 plus a web server
  – Linux/Apache, Solaris/Apache, and Windows 2000/IIS
• Why software RAID?
  – well-defined availability guarantees
    » a RAID-5 volume should tolerate a single disk failure
    » reduced performance (degraded mode) after a failure
    » may automatically rebuild redundancy onto a spare disk
  – a simple system
  – easy to inject storage faults
• Why a web server?
  – an application with measurable QoS metrics that depend on RAID availability and performance

Slide 11: Benchmark environment
• RAID-5 setup
  – 3 GB volume: 4 active 1 GB disks, 1 hot-spare disk
• Workload generator and data collector
  – SPECWeb99 web benchmark
    » simulates a realistic high-volume user load
    » mostly static, read-only workload
    » modified to run continuously and to measure average hits per second over each 2-minute interval
• QoS metrics measured
  – hits per second
    » roughly tracks response time in our experiments
  – degree of fault tolerance in the storage system

Slide 12: Benchmark environment: faults
• Focus on faults in the storage system (disks)
• An emulated disk provides reproducible faults
  – a PC that appears as a disk on the SCSI bus
  – I/O requests are intercepted and reflected to a local disk
  – fault injection is performed by altering SCSI command processing in the emulation software
• Fault set chosen to match the faults observed in a long-term study of a large storage array
  – media errors, hardware errors, parity errors, power failures, disk hangs/timeouts
  – both transient and "sticky" faults

Slide 13: Single-fault experiments
• "Micro-benchmarks"
• Selected 15 fault types
  – 8 benign (retry required)
  – 2 serious (permanently unrecoverable)
  – 5 pathological (power failures and complete hangs)
• An …
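As a closing illustration of Steps 2 and 3, here is a minimal, hypothetical harness sketch in Python. It is not the authors' actual tooling: the function names (confidence_band, report), the z value of 2.576 (the two-sided 99% critical value of the normal distribution; the slides do not say how the intervals were computed), and the toy data are all assumptions. It takes hits-per-second values averaged over 2-minute intervals, derives a 99% confidence band for "normal" behavior from no-fault baseline runs, and flags intervals of a fault-injected run that leave the band:

```python
import statistics

INTERVAL = 120  # seconds: hits/sec are averaged over 2-minute intervals

def confidence_band(no_fault_runs, z=2.576):
    """Per-interval 99% confidence band for normal QoS,
    estimated from several no-fault baseline runs."""
    band = []
    for samples in zip(*no_fault_runs):  # all runs' values for one interval
        mean = statistics.mean(samples)
        sd = statistics.stdev(samples)
        band.append((mean - z * sd, mean + z * sd))
    return band

def report(fault_run, band):
    """Flag each interval of a fault-injected run that leaves the band."""
    for i, (hits, (lo, hi)) in enumerate(zip(fault_run, band)):
        status = "ok" if lo <= hits <= hi else "DEGRADED"
        print(f"t={i * INTERVAL:4d}s  hits/sec={hits:6.1f}  {status}")

# Toy data: three no-fault baseline runs, then a run where a disk
# fault is injected and the RAID volume operates in degraded mode.
baseline = [[100, 101,  99, 100],
            [ 99, 100, 101, 100],
            [101,  99, 100, 101]]
faulted = [100, 62, 63, 99]
report(faulted, confidence_band(baseline))
```

The flagged intervals correspond to the dips visible in the deck's QoS-over-time graphs; distilling them further (for example, counting degraded intervals or integrating the lost hits) yields the single-number summaries mentioned on Slide 9.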

