CS514: Intermediate Course in Operating Systems
Recap
Things data centers need
Slide 4
Slide 5
Slide 6
Trustworthy Web Services
Trustworthy Computing
Categories of systems…
Examples
Techniques vary!
Importance of “COTS”
The dilemma
Are COTS trustworthy?
Slide 15
Is this enough?
Slide 17
SoS and SOAs
Example: the Air Force JBI
Inside the Battlespace InfoSphere (circa 1999)
JBI Basics
Architectural Concept
A fusion of BIG systems
Observations?
Implications of bigness?
Trusting multi-component systems
CS514 threat model
Our model
Network model
Execution model: asynchronous
Synchronous and Asynchronous Executions
Reality: neither one
Failure model
Detecting failures
Thought problem
Sam and Jill
They eat inside! Sam reasons:
Sam had better send an Ack
Why didn’t this help?
New and improved protocol
How Sam and Jill’s romance ended
Things we just can’t do
Consistency
Does this matter in big systems?
Why is this important?
A bad news story?
Trust and Consistency
Looking ahead
Homework (don’t hand it in)

CS514: Intermediate Course in Operating Systems
Professor Ken Birman
Vivek Vishnumurthy: TA

Recap
- We started by thinking about Web Services
  - Basically, a standardized architecture in which client systems talk to servers
  - Uses XML and other Web protocols
  - And will be widely popular (“ubiquitous”)
- Our goal is to build “trustworthy” systems using these standard, off-the-shelf techniques
- So we started to look at the issues top down

Things data centers need
- Front ends to build pages and run business logic for both human and computer clients
- A means for clients to “discover” a good server (close by… not overloaded… affinity)
- Tools for building the data center itself: communication, replication, load-balancing, self-monitoring and management, etc.

Recap
- With this model in mind we looked at naming/discovery
- We asked what decisions need to be made
  - Client needs to pick the right service
    - I want this particular database, or display device
  - Service may have a high-level routing decision
    - Send “East Coast” requests to the New Jersey center
  - Service also makes lower-level decisions
    - John Smith is doing a transaction; send requests to the same node if possible to benefit from caching
  - And finally the network does routing

Recap
- In the case of naming/discovery
  - We observed that the architecture doesn’t really offer “slots” for the associated logic
- Developers can solve these problems
  - E.g., by using the DNS to redirect requests
  - But the solutions feel like hacks
- Ideally Web Services should address such issues. One day it will, by generalizing the content distribution “model” popularized by Akamai

Recap
- Next we looked at scalability issues
  - We imagined that we’re building a service and want to increase load on it
  - Led us to think about threading, staged event queuing (SEDA)
  - Eventually leads us to a clustered architecture with load-balancers
- Again, found that WS lacks key features

Trustworthy Web Services
- To have confidence in solutions we need rigorous technical answers
  - To questions like “tracking membership” or “data replication” or “recovery after crash”
- And we need these embodied into WS
  - For example, would want best-of-breed answers in some sort of discovery “tool” that applications can exploit

Trustworthy Computing
- Overall, we want to feel confident that the systems we build are “trustworthy”
- But what should this mean, and how realistic a goal is it?
- Today
  - Discuss some interpretations of the term
  - Settle on the “model” within which we’ll work during the remainder of the term

Categories of systems…
- Roles computing systems play vary widely
  - Most computing systems aren’t critical in a minute-by-minute sense
  - … but some systems matter more; if they are down, the enterprise is losing money
  - … and very rarely, we need to build ultra-reliable systems for mission-critical uses

Examples
[Figure: example systems placed on two axes, from less “critical” to more “critical” and from benign threats to malicious attack, with one region marked “Our focus”. Systems shown: fly-by-wire control system for airplane; military weapons targeting system; electronic medical healthcare records; hospital billing system; control of electric power grid; authentication system of a campus network.]

Techniques vary!
- Less critical systems that face accident (not attack) lend themselves to cheaper solutions
  - Particularly if we don’t mind outages when something crashes
- High or continuous availability is harder
- The mixture of time-critical, very secure, very high availability is particularly difficult
  - Solutions don’t integrate well with standard tools
  - “Secure and highly available” can also be slow

Importance of “COTS”
- The term means “commercial off the shelf”
- To understand the importance of COTS we need to understand the history of computing
  - Prior to 1980, “roll your own” was common
  - But then with CORBA (and its predecessors) well-supported standards won the day
  - Productivity benefits of using standards are enormous: better development tools, better system management support, better feature sets
- Today, most projects mandate COTS

The dilemma
- But major products have been relaxed about:
  - Many aspects of security
  - Reliability
  - Time-critical computing (not the same as “fast”)
- Jim Gray: “Microsoft is mostly interested in multi-billion dollar markets. And it isn’t feasible to make 100% of our customers happy. If we can make 80% of them happy 90% of the time, we’re doing just fine.”

Are COTS trustworthy?
- Security is improving but still pretty weak
  - Data is rarely protected “on the wire”
  - Systems are not designed with the threat of overt attack in mind
  - Often limited to perimeter security; if the attacker gets past the firewall, she’s home free
  - Auditing and system management functions are frequently inadequate

Are COTS trustworthy?
- Most COTS technologies do anticipate crashes and the need to restart
  - You can usually ask the system to watch your application and relaunch after failure
  - You can even ask for a restart on a different node
  - … but there won’t be any protection against split-brain problems
- So-called “transactional” model can help
- Alternatively can make checkpoints, or replicate critical data, but without platform help

Is this enough?
- The way COTS systems provide restart is potentially slow
  - Transactional “model” can’t offer high availability (we’ll see why later)
  - Often must wait for failed machine to reboot, clean up its data structures, relaunch its main applications,
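
The restart facility mentioned in the slides (ask the system to watch an application and relaunch it after failure) might look roughly like the sketch below. It is only an illustration: the `supervise` function, its parameters, and the choice of a child process are assumptions made here, not the interface of any particular COTS platform. It restarts on the same node only and does nothing about split-brain, which is the gap the slides point out.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=0.0):
    """Run cmd and relaunch it after each abnormal exit.

    Hypothetical helper for illustration; returns the exit codes
    observed, one per launch.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)  # blocks until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
        if attempt < max_restarts:
            # Restart is not instantaneous; a real platform may also
            # wait for a reboot or cleanup before relaunching.
            time.sleep(backoff_s)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a crashing application (assumption for the demo).
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))
```

Each relaunch here starts the application from scratch, so any state it held is lost unless the application itself checkpoints or replicates it, in line with the "without platform help" remark in the slides.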