CS 501: Software Engineering, Fall 2000
Lecture 21: Dependable Systems I - Reliability

Contents: Administration; Software Reliability; Reliability Metrics; Reliability Metrics for Distributed Systems; User Perception of Reliability; Cost of Improved Reliability; Specification of System Reliability; Principles for Dependable Systems; Quality Management Processes; Design and Code Reviews; Benefits of Design and Code Reviews; Process (Plan) Reviews; Statistical Testing; Example: Dartmouth Time Sharing (1980); Factors for Fault Free Software; Error Avoidance; Defensive Programming; Defensive Programming Examples; Some Notable Bugs

Administration

Assignment 3
• Report due tomorrow at 5 p.m. (group design with individual parts)
• Presentations Wednesday through Friday; every group member must present during the semester

Software Reliability

• Failure: the software does not deliver the service expected by the user (e.g., a mistake in the requirements).
• Fault: a programming or design error whereby the delivered system does not conform to its specification.
• Reliability: the probability of a failure occurring in operational use.
• Perceived reliability: depends on user behavior, the set of inputs, and the pain of failure.

Reliability Metrics

• Probability of failure on demand
• Rate of failure occurrence (failure intensity)
• Mean time between failures (MTBF)
• Availability (up time)
• Mean time to repair (MTTR)
• Distribution of failures

Hypothetical example: cars are safer than airplanes in accidents (failures) per hour, but less safe in failures per mile; the metric chosen determines the conclusion.
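To make these metrics concrete, here is a minimal sketch, not from the lecture, that computes MTBF, MTTR, and availability from a hypothetical outage log; the Outage record and the figures in it are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class Outage:
        start_hour: float   # hour at which the failure occurred
        end_hour: float     # hour at which service was restored

    def reliability_metrics(outages, observation_hours):
        """Compute MTBF, MTTR, and availability over observation_hours of operation."""
        downtime = sum(o.end_hour - o.start_hour for o in outages)
        uptime = observation_hours - downtime
        n = len(outages)
        mtbf = uptime / n if n else float("inf")   # mean time between failures
        mttr = downtime / n if n else 0.0          # mean time to repair
        availability = uptime / observation_hours  # fraction of time in service
        return mtbf, mttr, availability

    # Hypothetical year of operation (8,760 hours) with three outages.
    log = [Outage(100.0, 102.0), Outage(4000.0, 4001.5), Outage(7200.0, 7206.0)]
    mtbf, mttr, avail = reliability_metrics(log, 8760)
    print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.2f} h, availability = {avail:.4%}")

Which metric matters depends on the system; the ATM card reader example later in the lecture sets different targets for different failure classes.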
Reliability Metrics for Distributed Systems

Traditional metrics are hard to apply to multi-component systems:
• In a large network, at any given moment something is giving trouble, but very few users see it.
• A system with excellent average reliability may give terrible service to certain users.
• There are so many components that system administrators rely on automatic reporting systems to identify problem areas.

User Perception of Reliability

1. A personal computer that crashes frequently vs. a machine that is out of service for two days.
2. A database system that crashes frequently but comes back quickly with no loss of data vs. a system that fails once in three years but whose data has to be restored from backup.
3. A system that never fails but has unpredictable periods when it runs very slowly.

Cost of Improved Reliability

[Graph: cost ($) against up time, rising steeply as up time approaches 100%.]

Will you spend your money on new functionality or on improved reliability?

Specification of System Reliability

Example: ATM card reader

Failure class                Example                                            Metric
Permanent, non-corrupting    System fails to operate with any card; reboot      1 per 1,000 days
Transient, non-corrupting    System cannot read an undamaged card               1 in 1,000 transactions
Corrupting                   A pattern of transactions corrupts the database    Never

Principles for Dependable Systems

The human mind can encompass only limited complexity:
=> Comprehensibility
=> Simplicity
=> Partitioning of complexity

High quality has to be built in:
=> Each stage of development must be done well.
=> Testing and correction do not, by themselves, lead to quality.
=> Changes should be incorporated into the structure.

Quality Management Processes

Assumption: good processes lead to good software.

The importance of routine:
• Standard terminology (requirements, specification, design, etc.)
• Software standards (naming conventions, etc.)
• Internal and external documentation
• Reporting procedures

Change management:
• Source code management and version control
• Tracking of change requests and bug reports
• Procedures for changing requirements, specifications, designs, and other documentation
• Release control

Design and Code Reviews

• Colleagues review each other's work; reviews can be applied to any stage of software development and can be formal or informal.
• The developer provides colleagues with documentation (e.g., a specification or design) or a code listing, and talks through the work while answering questions.
• Reviews are most effective when both developer and reviewers prepare well.

Benefits of Design and Code Reviews

• Extra eyes spot mistakes and suggest improvements.
• Colleagues share expertise, which helps with training.
• An occasion to tidy loose ends.
• Incompatibilities between modules can be identified.
• Helps scheduling and management control.

Fundamental requirements:
• Senior team members must show leadership.
• Reviews must be helpful, not threatening.

Process (Plan) Reviews

Objectives:
• To review progress against the plan (formal or informal)
• To adjust the plan (schedule, team assignments, functionality, etc.)

Impact on quality: good quality systems usually result from plans that are demanding but realistic. Good people like to be stretched and to work hard, but they must not be pressed beyond their capabilities.

Statistical Testing

• Determine the operational profile of the software.
• Select or generate a profile of test data.
• Apply the test data to the system and record failure patterns.
• Compute statistical values of the metrics under test conditions.

Advantages:
• Can test with very large numbers of transactions.
• Can test with extreme cases (high loads, restarts, disruptions).
• Can be repeated after system modifications.

Disadvantages:
• Uncertainty in the operational profile (unlikely inputs).
• Expensive.
• Can never prove high reliability.
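As an illustration of the first two steps, here is a minimal sketch, not from the lecture; the transaction classes, their frequencies, and the system_under_test callable are all assumptions.

    import random

    # Hypothetical operational profile for an ATM-like system: each
    # transaction class with its relative frequency in normal operation.
    OPERATIONAL_PROFILE = {
        "balance_enquiry": 0.55,
        "cash_withdrawal": 0.35,
        "deposit":         0.07,
        "pin_change":      0.03,
    }

    def generate_test_data(n, profile=OPERATIONAL_PROFILE, seed=42):
        """Sample n test transactions with the same mix as the operational profile."""
        rng = random.Random(seed)           # fixed seed so a run can be repeated
        kinds = list(profile)
        weights = list(profile.values())
        return rng.choices(kinds, weights=weights, k=n)

    def run_statistical_test(system_under_test, n=100_000):
        """Apply the generated transactions and estimate probability of failure on demand."""
        failures = 0
        for txn in generate_test_data(n):
            try:
                system_under_test(txn)
            except Exception:               # any unhandled exception counts as a failure
                failures += 1
        return failures / n

The main uncertainty, as the slide notes, is whether the assumed profile matches real use; rare inputs are easy to get wrong.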
Example: Dartmouth Time Sharing (1980)

A central computer serves the entire campus; any failure is serious.

Step 1. Gather data on every failure.
• Ten years of data in a simple database.
• Every failure analyzed and classified:
  hardware
  software (the default category)
  environment (e.g., power, air conditioning)
  human (e.g., operator error)

Step 2. Analyze the data.
• Weekly, monthly, and annual statistics: number of failures and interruptions, mean time to repair.
• Graphs of trends by component, e.g.:
  failure rates of disk drives
  hardware failures after power failures
  crashes caused by software bugs in each module

Step 3. Invest resources where the benefit will be greatest, e.g.:
• Orderly shutdown after power failure
• Priority order for software improvements
• Changed procedures for operators
• Replacement hardware
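A minimal sketch of the kind of tabulation Step 2 describes; the record layout, category names, and figures are assumptions for illustration, not Dartmouth's data.

    from collections import Counter
    from statistics import mean

    # Hypothetical failure records: (cause, component, hours_to_repair).
    failure_log = [
        ("hardware",    "disk_drive", 2.0),
        ("environment", "power",      0.5),
        ("software",    "scheduler",  1.0),
        ("hardware",    "disk_drive", 3.5),
        ("human",       "operator",   0.2),
    ]

    # Number of failures per cause, to see where effort would pay off most (Step 3).
    failures_by_cause = Counter(cause for cause, _, _ in failure_log)

    # Mean time to repair per component, to spot slow-to-fix components.
    repair_hours = {}
    for _, component, hours in failure_log:
        repair_hours.setdefault(component, []).append(hours)
    mttr_by_component = {c: mean(hs) for c, hs in repair_hours.items()}

    print("Failures by cause:", dict(failures_by_cause))
    print("MTTR by component (hours):", mttr_by_component)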
Factors for Fault Free Software

• Precise, unambiguous specification
• Organization culture that ...
