- CS514: Intermediate Course in Operating Systems
- Perspectives on Computing Systems and Networks
- Styles of Course
- Recent Trends
- Understanding Trends
- Ken’s bias
- Butler Lampson’s Insight
- Example: Air Traffic Control using Web technologies
- ATC systems divide country up
- More details on ATC
- Issues with old systems
- Concept of IBM’s 1994 system
- ATC Architecture
- So… how to build it?
- IBM: Independent consoles… backed by ultra-reliable components
- France: Multiple consoles… but in some ways they function like one
- Different emphasis
- Other technologies used
- IBM Project Was a Fiasco!!
- Where did IBM go wrong?
- ATC problem lingers in USA…
- Free Flight (cont.)
- Impact of technology trends
- Examples of mission-critical applications
- We depend on distributed systems!
- Critical Needs of Critical Applications
- So what makes it hard?
- End-to-End argument
- Saltzer et al. analysis
- Generalized End-to-End view?
- E2E is visible in J2EE and .NET
- Example: Server replication
- Split brain Syndrome…
- Split brain Syndrome
- Implication?
- Can we fix this problem?
- CS514 project
- You can work in small teams
- Not much homework or exams
- Planned coverage of topics
- Textbook and readings

CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Perspectives on Computing Systems and Networks
- CS314: Hardware and architecture
- CS414: Operating systems
- CS513: Security for operating systems and apps
- CS514: Emphasis on “middleware”: networks, distributed computing, and technologies for building reliable applications over the middleware
- CS519: Networks, aimed at builders and users
- CS614: A survey of current research frontiers in the operating systems and middleware space
- CS619: A reading course on research in networks

Styles of Course
- CS514 tries to be practical in emphasis:
  - We look at the tools used in real products and real systems
  - The focus is on technology one could build or buy
  - But not on specific products
- Our emphasis:
  - What’s out there?
  - How does it work?
  - What are its limits?
  - Can we find ways to hack around those
    limits?

Recent Trends
- The Internet boom is maturing
- We understand how to build big data centers, and we have a new architecture, Web Services, that lets computers talk directly to computers using XML and other Web standards
- There are more and more small devices, notably Web-compatible cell phones
- Object orientation and components have emerged as the prevailing structural option
  - CORBA, J2EE, .NET
- Transactions are widely used for reliability and atomicity

Understanding Trends
- Basically two options:
  - Study the fundamentals, then apply them to specific tools
  - Or study specific tools, then extract fundamental insights from the examples

Ken’s bias
- I work on reliable, secure distributed computing:
  - Air traffic control systems, stock exchanges, the electric power grid
  - Military “Information Grid” systems
  - Modern data centers
- To me, the question is:
  - How can we build systems that do what we need them to do, reliably, accurately, and securely?

Butler Lampson’s Insight
- Why computer scientists didn’t invent the Web:
  - CS researchers would have wanted it to “work”
  - The Web doesn’t really work
  - But it doesn’t really need to!
- This gives some reason to suspect that Ken’s bias isn’t widely shared!

Example: Air Traffic Control using Web technologies
- Assume a “private” network
- A Web browser could easily show planes, and it is natural for controller interactions
- What “properties” would the system need?
  - Clearly we need to know that trajectory and flight data are current and consistent
  - We expect it to give sensible advice on routing options (e.g.,
    not propose dangerous routes)
  - Continuous availability is vital: zero downtime
  - We expect a soft form of real-time responsiveness
  - Security and privacy are also required (post 9/11!)

ATC systems divide country up
[Figure: map of ATC sectors; label: France]

More details on ATC
- Each sector has a control center
- Centers may have a few controllers or many (50)
  - In the USA, a controller works alone
  - In France, a “controller” is a team of 3 to 5 people
- Data comes from a radar system that broadcasts updates every 10 seconds
- A database keeps other flight data
- Controllers each “own” smaller sub-sectors

Issues with old systems
- Overloaded computers that often crash
- An attempt to build a replacement system failed, expensively, back in 1994
- Getting slow as the volume of air traffic rises
- Inconsistent displays are a problem: phantom planes, missing planes, stale information
- Some major outages recently (and some near-miss stories associated with them)
- TCAS saved the day: a collision avoidance system of last resort… and it works…

Concept of IBM’s 1994 system
- Replace video terminals with workstations
- Build a highly available real-time system guaranteeing no more than 3 seconds of downtime per year
- Offer a much better user interface to ATC controllers, with intelligent course recommendations and warnings about future course changes that will be needed

ATC Architecture
[Figure: ATC architecture diagram; labels: NETWORK INFRASTRUCTURE, DATABASE]

So… how to build it?
- In fact, the IBM project was just one of two at the time; the French had one too
- The IBM approach was based on lock-step replication
  - Replace every major component of the system with a fault-tolerant component set
  - Replicate entire programs (the “state machine” approach)
- The French approach used replication selectively
  - As needed, replicate specific data items.
Program “hosts” a data replica but isn’t itself replicatedIBM: Independent consoles… backed by ultra-reliable componentsConsoleATCdatabaseATC database is really a high-availability clusterRadar processing system is redundantATCdatabaseFrance: Multiple consoles… but in some ways they function like oneConsole AConsole BConsole CATCdatabaseATC database only sees one connectionRadar updates sent with hardware broadcastsDifferent emphasisIBM imagined pipelines of processing with replication used throughout. “Services” did much of the work.French imagined selectively replicated data, for example “list of planes currently in sector A.17”E.g. controller interface programs could maintain replicas of certain data structures or variables with system-wide valuePrograms did computing on their own helped by databasesOther technologies usedBoth used standard off-the-shelf
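The IBM concept's target of no more than 3 seconds of downtime per year is easier to appreciate as an availability figure. The following is a back-of-the-envelope calculation that assumes a 365-day year. Here U is the fraction of the year the system is down and A is its availability:

```latex
\begin{aligned}
T_{\text{year}} &= 365 \times 24 \times 3600\ \text{s} = 31{,}536{,}000\ \text{s} \\
U &= \frac{3\ \text{s}}{T_{\text{year}}} \approx 9.5 \times 10^{-8} \\
A &= 1 - U \approx 0.99999990 \quad (\approx 99.99999\%,\ \text{``seven nines''})
\end{aligned}
```

For comparison, "seven nines" allows about 3.15 seconds of downtime per year, so the IBM goal sits just inside that budget.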
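The IBM design replicated entire programs using the "state machine" approach. Here is a minimal single-process sketch of the core idea: if every replica starts in the same state and applies the same deterministic commands in the same order, all replicas end up in the same state. A replacement replica can also catch up by replaying the ordered log. The names `TrackTable`, `Sequencer`, `update_track`, `remove_track`, and the flight IDs are hypothetical. The `Sequencer` is only a stand-in for a fault-tolerant ordered broadcast or consensus protocol, which this sketch does not implement.

```python
from dataclasses import dataclass, field


@dataclass
class TrackTable:
    """Hypothetical replicated state: flight id -> (x, y) position."""
    tracks: dict = field(default_factory=dict)

    def apply(self, command):
        # Commands must be deterministic: same state + same command -> same new state.
        op, flight, *args = command
        if op == "update_track":
            self.tracks[flight] = tuple(args)
        elif op == "remove_track":
            self.tracks.pop(flight, None)
        else:
            raise ValueError(f"unknown command: {op}")


class Sequencer:
    """Stand-in for an ordered broadcast: delivers every command to every
    replica in one agreed total order. Failures are NOT handled here."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the agreed order, kept so new replicas can catch up

    def submit(self, command):
        self.log.append(command)
        for replica in self.replicas:
            replica.apply(command)


replicas = [TrackTable() for _ in range(3)]
seq = Sequencer(replicas)
seq.submit(("update_track", "AF123", 10, 20))
seq.submit(("update_track", "UA456", 5, 7))
seq.submit(("remove_track", "UA456"))

# Lock-step: every replica holds an identical copy of the state.
assert all(r.tracks == replicas[0].tracks for r in replicas)

# A replacement replica catches up by replaying the log in the same order.
spare = TrackTable()
for command in seq.log:
    spare.apply(command)
print(spare.tracks)  # {'AF123': (10, 20)}
```

The approach depends on determinism. Any nondeterminism inside `apply`, such as reading the local clock or iterating over an unordered structure, would let replicas diverge even though they see identical inputs.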
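The French approach took a different route. It replicated selected data items, such as the "list of planes currently in sector A.17". Each console program hosted a copy but was not itself replicated. The sketch below illustrates that idea under stated assumptions. `Group` is a hypothetical in-memory stand-in for a broadcast group that delivers each update to every member in the same order. It also copies current state to late joiners. `ReplicatedSet` and `ConsoleProgram` are illustrative names, not taken from either real system.

```python
class Group:
    """Hypothetical broadcast group: every update reaches every member in the same order."""

    def __init__(self):
        self.members = []

    def join(self, replica):
        # State transfer: a late joiner starts from a copy of the current value.
        if self.members:
            replica.items = set(self.members[0].items)
        self.members.append(replica)

    def broadcast(self, op, item):
        for member in self.members:
            member.deliver(op, item)


class ReplicatedSet:
    """One replica of a selectively replicated data item (e.g. planes in a sector)."""

    def __init__(self, group):
        self.group = group
        self.items = set()
        group.join(self)

    def add(self, item):
        self.group.broadcast("add", item)

    def discard(self, item):
        self.group.broadcast("discard", item)

    def deliver(self, op, item):
        # Applied locally at every hosting program when the update arrives.
        if op == "add":
            self.items.add(item)
        else:
            self.items.discard(item)


class ConsoleProgram:
    """A non-replicated program that hosts a replica and computes on its own copy."""

    def __init__(self, name, group):
        self.name = name
        self.planes = ReplicatedSet(group)

    def display(self):
        return sorted(self.planes.items)


sector_a17 = Group()
console_a = ConsoleProgram("Console A", sector_a17)
console_b = ConsoleProgram("Console B", sector_a17)
console_a.planes.add("AF123")
console_b.planes.add("UA456")
console_a.planes.discard("UA456")
console_c = ConsoleProgram("Console C", sector_a17)  # joins late, receives state transfer
print(console_c.display())  # ['AF123']
```

Unlike the state machine sketch, the console programs here are free to run different code. Only the shared data item is kept consistent across them.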