iClickers!
How many CS classes are you planning on taking in the fall?
A. 1
B. 2
C. 3
D. 4
E. none :(

CS61A Lecture 27
Therac Case Study/Programming Practices
Hamilton Nguyen

Administrivia
● Review Session TONIGHT – 306 Soda, 6:30-9:30pm
● Homework 13 (last HW!) due TONIGHT – 11:59pm
● Wed/Thurs (8/12) Sections converted to office hours/general review
● Final THURSDAY 8/12, 155 Dwinelle, 7-10pm

Part I: Therac Case Study

Therac-25

What happened?
● 6 accidents – serious burns
● 4 deaths
● Otherwise effective – saved hundreds of lives

Lesson to be learned
● Social responsibility in engineering
● First real incident of fatal software failure
● What is good software engineering?

Lesson in Ethics?
● Not that simple...
● There were no bad guys
● They honestly believed there were no issues
● But something was clearly wrong...
...so why couldn't they see it?

“Software Rot”
● Other engineering fields: clear sense of degradation and decay
● Software doesn't become brittle or fractured... does it?
● Phenomenon of software degrading over time

A bigger picture
● All software is part of a bigger system
● Software degrades because:
  ● Other pieces of software change
  ● Hardware changes
Ex: Compatibility Issues

A bigger issue
● The makers of the Therac did not fully understand the complexity of their software
● Characterized by an intricate web of dependencies and relations
● Other engineering disciplines – the complexity of their creations is more apparent

A “simple” program
● One of my favorites...
● Spider Solitaire
source: CS161 Sp11

Complexity and You
● Hyper-technological modern society
● Limitless reach of software complexity
● Is every piece of software lethal?

Problems with Therac-25
● No atomic test-and-set
● No more hardware interlocks
● Abundant user interface issues

UI Problems
● Cursor position and form entry
● Default values
● Too many error messages
source: CS161 Sp11

How would you solve these?
● Cursor position and form entry
● Default values
● Too many error messages

Problems with Therac-25
● No atomic test-and-set
● No more hardware interlocks
● Abundant user interface issues
● Bad documentation
● Organization response

How do we solve these problems?
● One idea:
  ● Responsible programming
● Big idea:
  ● Redundancy

(define (mc-eval exp env)
  (cond ((self-evaluating? ...
        ((variable? ...
        ...
        (else (error "Unknown exp" ...

How do we solve these problems?
● Redundancy
● Know your user
● Fail-Soft (or Fail-Safe)
● Audit Trail
● Correctness from the start

Correctness from the start
● Edsger Dijkstra: “On the Cruelty of Really Teaching Computing Science”
● CS students shouldn't use computers
● Rigorously prove correctness of program

Verification Techniques
● Correctness proofs
● Compilation (pre-execution) analysis

Debugging Techniques
● Black box debugging
● Glass box debugging
● Don't break what works
● And the golden rule of debugging...

“Debug by subtraction, not by addition”
Prof. Brian Harvey

Can you think of examples?
● Redundancy
● Know your user
● Fail-Soft (or Fail-Safe)
● Audit Trail
● Correctness from the start

Break

iClickers!
Which project was your favorite?
A. 1 – Twenty-one
B. 2 – Painter language
C. 3 – Adventure game
D. 4 – Logo interpreter
E. All of them! :)

Big Ideas
● Redundancy
● Know your user
● Fail-Soft (or Fail-Safe)
● Audit Trail
● Correctness from the start

Flashback: MapReduce
source: blog.maxgarfinkel.com

Failure?
Is this even an issue?

Warehouse Scale Computing

iClickers!
Let's say you have... 50,000 servers. Each server has four disks. On average, how often do you get a disk failure?
A. Once per year
B. Once per month
C. Once per week
D. Once per day
E. Once per hour

iClickers!
Failure rate of a disk is 2%-10% per year. Let's assume 4%. In one year, 4% of 200,000 disks fail = 8,000 disks. There are 8,760 hours in a year, so that is about 0.9 failures per hour: roughly once per hour (E).

Warehouse Scale Computing
Google is estimated to have 900,000 servers.
Is failure even an issue? Yes.

Redundant redundancy
● How do they deal with a worker failing?
● Answer: redundancy
● When a worker fails, one of its “superiors” (a scheduler node) assigns a new worker to complete its task

Redundant redundancy
● How do they know a worker has failed?
● Answer: redundancy
● Workers are programmed to periodically report to their superiors
● If a worker falls “silent”, it is assumed to be no longer capable of operating

Redundant redundancy
● How can they always replace downed workers?
● Answer: redundancy
● Hundreds of thousands of possible replacements
● What is the result of all of this?
● Answer: When was the last time Google search was down?

Safe Browsing
● Let's say you visit some website... like Facebook
● How do you know it's really Facebook, and not some evil site that only looks like Facebook?

Safe Browsing
● Answer: Website certificates
● Verify through a trusted 3rd party that the website displays the correct certificate
● What if the website has been certified by a 3rd party that is not necessarily trusted?
● What if we can't receive the certification at all?

Fail-Safe
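
The certificate questions above can be read as a fail-safe decision: when the check does not clearly succeed, refuse rather than guess. Below is a minimal Python sketch of that idea, not how any real browser implements it. The names `CertStatus`, `should_connect`, and `describe` are hypothetical and were chosen only for illustration.

```python
from enum import Enum


class CertStatus(Enum):
    """Possible outcomes of a certificate check (hypothetical names)."""
    VALID_TRUSTED = "valid, signed by a trusted third party"
    VALID_UNTRUSTED = "valid, but signed by an untrusted third party"
    INVALID = "certificate does not match the site"
    UNAVAILABLE = "certificate could not be retrieved"


def should_connect(status: CertStatus) -> bool:
    """Fail-safe policy: allow only the one case known to be good.

    Every other outcome, including any status added later, is refused.
    """
    return status is CertStatus.VALID_TRUSTED


def describe(status: CertStatus) -> str:
    """Return a human-readable decision for the given check outcome."""
    action = "connect" if should_connect(status) else "block"
    return f"{action}: {status.value}"


if __name__ == "__main__":
    for status in CertStatus:
        print(describe(status))
```

The policy is written as an allow-list rather than a list of known failures, so an unanticipated outcome is blocked by default instead of slipping through.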