University of Toronto, Department of Computer Science
© Easterbrook 2004

Lecture 10: Risk
- General ideas about Risk
- Risk Management
- Identifying Risks
- Assessing Risks
- Case Study: Mars Polar Lander

Risk Management
- About Risk
  - Risk is “the possibility of suffering loss”
  - Risk itself is not bad; it is essential to progress
  - The challenge is to manage the amount of risk
- Two Parts:
  - Risk Assessment
  - Risk Control
- Useful concepts:
  - For each risk: Risk Exposure
      RE = p(unsat. outcome) × loss(unsat. outcome)
  - For each mitigation action: Risk Reduction Leverage
      RRL = (RE_before - RE_after) / cost of intervention

Principles of Risk Management
- Global Perspective
  - View software in context of a larger system
  - For any opportunity, identify both:
    - Potential value
    - Potential impact of adverse results
- Forward Looking View
  - Anticipate possible outcomes
  - Identify uncertainty
  - Manage resources accordingly
- Open Communications
  - Free-flowing information at all project levels
  - Value the individual voice
    - Unique knowledge and insights
- Integrated Management
  - Project management is risk management!
- Continuous Process
  - Continually identify and manage risks
  - Maintain constant vigilance
- Shared Product Vision
  - Everybody understands the mission
    - Common purpose
    - Collective responsibility
    - Shared ownership
  - Focus on results
- Teamwork
  - Work cooperatively to achieve the common goal
  - Pool talent, skills and knowledge
Source: Adapted from SEI Continuous Risk Management Guidebook

Continuous Risk Management
- Identify:
  - Search for and locate risks before they become problems
    - Systematic techniques to discover risks
- Analyse:
  - Transform risk data into decision-making information
  - For each risk, evaluate:
    - Impact
    - Probability
    - Timeframe
  - Classify and Prioritise Risks
- Plan
  - Choose risk mitigation actions
- Track
  - Monitor risk indicators
  - Reassess risks
- Control
  - Correct for deviations from the risk mitigation plans
- Communicate
  - Share information on current and emerging risks
Source: Adapted from SEI Continuous Risk Management Guidebook

Fault Tree Analysis
[Figure: fault tree for the top event "Wrong or inadequate treatment administered". Events shown in the tree:]
- Vital signs erroneously reported as exceeding limits
- Vital signs exceed critical limits but not corrected in time
- Frequency of measurement too low
- Vital signs not reported
- Computer fails to raise alarm
- Nurse does not respond to alarm
- Computer does not read within required time limits
- Human sets frequency too low
- Sensor failure
- Nurse fails to input them or does so incorrectly
- etc.
Legend: event that results from a combination of causes; basic fault event requiring no further elaboration; Or-gate; And-gate
Source: Adapted from Leveson, “Safeware”, p321

Risk Assessment
- Quantitative:
  - Measure risk exposure using standard cost & probability measures
  - Note: probabilities are rarely independent
- Qualitative:
  - Develop a risk classification matrix:

                            Likelihood of Occurrence
                            Very likely    Possible       Unlikely
  (5) Loss of Life          Catastrophic   Catastrophic   Severe
  (4) Loss of Spacecraft    Catastrophic   Severe         Severe
  (3) Loss of Mission       Severe         Severe         High
  (2) Degraded Mission      High           Moderate       Low
  (1) Inconvenience         Moderate       Low            Low

Top 10 Development Risks (+ Countermeasures)
- Personnel Shortfalls
  - use top talent
  - team building
  - training
- Unrealistic schedules/budgets
  - multisource estimation
  - designing to cost
  - requirements scrubbing
- Developing the wrong Software functions
  - better requirements analysis
  - organizational/operational analysis
- Developing the wrong User Interface
  - prototypes, scenarios, task analysis
- Gold Plating
  - requirements scrubbing
  - cost benefit analysis
  - designing to cost
- Continuing stream of reqts changes
  - high change threshold
  - information hiding
  - incremental development
- Shortfalls in externally furnished components
  - early benchmarking
  - inspections, compatibility analysis
- Shortfalls in externally performed tasks
  - pre-award audits
  - competitive designs
- Real-time performance shortfalls
  - targeted analysis
  - simulations, benchmarks, models
- Straining computer science capabilities
  - technical analysis
  - checking scientific literature
Source: Adapted from Boehm, 1989

Case Study: Mars Polar Lander
- Launched
  - 3 Jan 1999
- Mission
  - Land near South Pole
  - Dig for water ice with a robotic arm
- Fate:
  - Arrived 3 Dec 1999
  - No signal received after initial phase of descent
- Cause:
  - Several candidate causes
  - Most likely is premature engine shutdown due to noise on leg sensors

What happened?
- Investigation hampered by lack of data
  - spacecraft not designed to send telemetry during descent
  - This decision severely criticized by review boards
- Possible causes:
  - Lander failed to separate from cruise stage (plausible but unlikely)
  - Landing site too steep (plausible)
  - Heatshield failed (plausible)
  - Loss of control due to dynamic effects (plausible)
  - Loss of control due to center-of-mass shift (plausible)
  - Premature Shutdown of Descent Engines (most likely!)
  - Parachute drapes over lander (plausible)
  - Backshell hits lander (plausible but unlikely)

Premature Shutdown Scenario
- Cause of error
  - Magnetic sensor on each leg senses touchdown
  - Legs unfold at 1500m above surface
  - transient signals on touchdown sensors during unfolding
  - software accepts touchdown signals if they persist for 2 timeframes
  - transient signals likely to be long enough on at least one leg
- Factors
  - System requirement to ignore the transient signals
  - But the software requirements did not describe the effect
    - s/w designers didn’t understand the effect, so didn’t implement the requirement
  - Engineers present at code inspection didn’t understand the effect
  - Not caught in testing because:
    - Unit testing didn’t include the transients
    - Sensors improperly wired during integration tests (no touchdown
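
The persistence rule in the Premature Shutdown Scenario can be illustrated with a small sketch. This is not the flight software: the function names, the sample trace, and the idea of masking a fixed number of frames are all assumptions made for illustration. Only the 2-timeframe persistence rule and the existence of transients during leg unfolding come from the slide.

```python
from typing import Iterable, Sequence

# From the slide: a touchdown signal is accepted if it persists for 2 timeframes.
PERSISTENCE_FRAMES = 2


def touchdown_detected(samples: Iterable[bool],
                       required: int = PERSISTENCE_FRAMES) -> bool:
    """Return True once the sensor reads True for `required` consecutive frames."""
    run = 0
    for reading in samples:
        run = run + 1 if reading else 0
        if run >= required:
            return True
    return False


def touchdown_detected_ignoring(samples: Sequence[bool],
                                ignore_frames: int,
                                required: int = PERSISTENCE_FRAMES) -> bool:
    """Same check, but discard the first `ignore_frames` readings.

    Hypothetical way of honouring a requirement to ignore transients
    produced while the legs unfold.
    """
    return touchdown_detected(list(samples)[ignore_frames:], required)


# Hypothetical trace for one leg sensor: a 3-frame transient while the leg
# unfolds, followed by quiet descent (no real touchdown in this trace).
leg_trace = [False, True, True, True, False, False, False, False]

naive = touchdown_detected(leg_trace)                             # True
masked = touchdown_detected_ignoring(leg_trace, ignore_frames=5)  # False
print(naive, masked)
```

In this sketch the naive check reports touchdown because the transient outlasts the 2-frame filter, which is the failure pattern the slide describes. The masked variant shows one possible shape of the missing requirement, not the fix that was actually specified.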
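
Returning to the Risk Exposure and Risk Reduction Leverage definitions from the Risk Management slide, a minimal worked sketch may help make the arithmetic concrete. The function names and all the numbers (probabilities, loss, mitigation cost) are hypothetical and chosen only so the results are easy to check by hand.

```python
def risk_exposure(probability: float, loss: float) -> float:
    """RE = p(unsatisfactory outcome) x loss(unsatisfactory outcome)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * loss


def risk_reduction_leverage(re_before: float, re_after: float,
                            cost: float) -> float:
    """RRL = (RE_before - RE_after) / cost of intervention."""
    if cost <= 0:
        raise ValueError("cost of intervention must be positive")
    return (re_before - re_after) / cost


# Hypothetical risk: a 400,000 loss whose probability a mitigation action
# costing 30,000 would cut from 0.5 to 0.125.
re_before = risk_exposure(0.5, 400_000)    # 200000.0
re_after = risk_exposure(0.125, 400_000)   # 50000.0
rrl = risk_reduction_leverage(re_before, re_after, 30_000)  # 5.0
print(re_before, re_after, rrl)
```

Under these assumed figures the mitigation removes five units of exposure per unit spent. Comparing RRL values across candidate actions is one way to rank them, bearing in mind the lecture's caution that probabilities are rarely independent.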