CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Recap
  We started by thinking about Web Services
    Basically, a standardized architecture in which client systems talk to servers
    Uses XML and other Web protocols
    And will be widely popular (“ubiquitous”)
  Our goal is to build “trustworthy” systems using these standard, off-the-shelf techniques
  So we started to look at the issues top down

Recap
  First we looked at naming/discovery
  We asked what decisions need to be made
    Client needs to pick the right service
      I want this particular database, or display device
    Service may have a high-level routing decision
      Send “East Coast” requests to the New Jersey center
    Service also makes lower-level decisions
      John Smith is doing a transaction; send requests to the same node if possible to benefit from caching
    And finally the network does routing

Recap
  In the case of naming/discovery
    We observed that the architecture doesn’t really offer “slots” for the associated logic
    Developers can solve these problems
      For example, by using the DNS to redirect requests
    But the solutions feel like hacks
  Ideally one would wish that Web Services tackled such issues. One day they will! But not for a decade…

Recap
  Next we looked at performance issues
    We imagined that we’re building a service and want to increase load on it
    Led us to think about threading, staged event queuing (SEDA)
    Eventually leads us to a clustered architecture with load-balancers
  Again, found that WS lacks key features

Trustworthy Web Services
  To have confidence in solutions we need rigorous technical answers
    To questions like “tracking membership” or “data replication” or “recovery after crash”
  And we need these embodied into WS
    For example, would want best-of-breed answers in some sort of discovery “tool” that applications can exploit

Trustworthy Computing
  Overall, we want to feel confident that the systems we build are “trustworthy”
  But what should this mean, and how realistic a goal is it?
  Today
    Discuss some interpretations of the term
    Settle on the “model” within which we’ll work during the remainder of the term

Categories of systems…
  Roles computing systems play vary widely
    Most computing systems aren’t critical in a minute-by-minute sense
    … but some systems matter more; if they are down, the enterprise is losing money
    … and very rarely, we need to build ultra-reliable systems for mission-critical uses

Examples
  [Figure: example systems placed on two axes, one running from less “critical” to more “critical” and the other from benign threats to malicious attack, with a region marked “Our focus”. Systems shown: fly-by-wire control system for airplane; military weapons targeting system; Microsoft.com website; hospital billing system; control of electric power grid; authentication system of a campus network.]

Techniques vary!
  Less critical systems that face accident (not attack) lend themselves to cheaper solutions
    Particularly if we don’t mind outages when something crashes
  High or continuous availability is harder
  The mixture of time-critical, very secure, very high availability is particularly difficult
    Solutions don’t integrate well with standard tools
    “Secure and highly available” can also be slow

Importance of “COTS”
  The term means “commercial off the shelf”
  To understand the importance of COTS we need to understand the history of computing
    Prior to 1980, “roll your own” was common
    But then with CORBA (and its predecessors) well-supported standards won the day
    Productivity benefits of using standards are enormous: better development tools, better system management support, better feature sets
  Today, most projects mandate COTS

The dilemma
  But major products have been relaxed about:
    Many aspects of security
    Reliability
    Time-critical computing (not the same as “fast”)
  Jim Gray: “Microsoft is mostly interested in multi-billion dollar markets. And it isn’t feasible to make 100% of our customers happy. If we can make 80% of them happy 90% of the time, we’re doing just fine.”

Are COTS trustworthy?
  Security is improving but still pretty weak
    Data is rarely protected “on the wire”
    Systems are not designed with the threat of overt attack in mind
    Often limited to perimeter security; if the attacker gets past the firewall, she’s home free
    Auditing and system management functions are frequently inadequate

Are COTS trustworthy?
  Most COTS technologies do anticipate crashes and the need to restart
    You can usually ask the system to watch your application and relaunch after failure
    You can even ask for a restart on a different node
    … but there won’t be any protection against split-brain problems
    So-called “transactional” model can help
    Alternatively can make checkpoints, or replicate critical data, but without platform help

Is this enough?
  The way COTS systems provide restart is potentially slow
    Transactional “model” can’t offer high availability (we’ll see why later)
    Often must wait for failed machine to reboot, clean up its data structures, relaunch its main applications, etc.
    In big commercial systems could be minutes or even hours
  Not enough… if we want high availability

Are COTS trustworthy?
  Security… reliability… what about:
    Time-critical applications, where we want to guarantee a response within some bounded time (and know that the application is fast enough… but worry about platform overheads and