Distributed OSes Continued
Andy Wang
COP 5611 Advanced Operating Systems

More Introductory Materials
- Important issues in distributed OSes
- Important distributed OS tools and mechanisms

More Important Issues in Distributed Operating Systems
- Autonomy
- Consistency and transactions

Autonomy
- To some degree, users need to control their own resources
- The more a system encourages interdependence, the less autonomy
- How to best trade off sharing and interdependence versus autonomy?

Problems with Too Much Interdependence
- Vulnerability to failures
- Global control
- Hard to pinpoint responsibility
- Hard security problems

Problems with Too Much Autonomy
- Redundancy of functions
- Heterogeneity
  - Especially in software
- Poor resource sharing

Methods to Improve Autonomy
(Without causing problems with sharing)
- Replicate vital services on each machine
- Don't export services that are unnecessary
- Provide strong security guarantees

Consistency
- Maintaining consistency is a major problem in distributed systems
- If more than one system accesses data, it can be hard to ensure consistency
- But if cooperating processes see inconsistent data, disasters are possible

A Sample Consistency Problem
[Figure: Sites A, B, and C, with Data Item 1]

Causes of Consistency Problems
- Failures and partitions
- Caching effects
- Replication of data

So why do this stuff?
- Note these problems arise because of what are otherwise desirable features
- Failures and partitions
  - Working in the face of failures
- Caching
  - Avoiding repetition of expensive operations
- Replication
  - Higher availability

Handling Consistency Problems
- Don't share data
  - Generally not feasible
- Callbacks
- Invalidations
- Ignore the problem
  - Sometimes OK, but not always

Callback Methods
- Check that your data view is consistent whenever there might be a problem
  - In the most general case, on every access
  - More practically, every so often
- Extremely expensive if a remote check is required
- High overheads if there's usually no problem

Invalidation Methods
- When situations change, inform those who know about the old situation
- Requires extensive bookkeeping
- Practical in some cases when changes are infrequent
- High overheads if there's usually no problem

Consistency and Atomicity
- Atomic actions are "all or nothing"
  - Either the entire set of actions occurs
  - Or none of them do
  - At all times, including while being performed
- Apparently indivisible and instantaneous
- Relatively easy to provide in single-machine systems

Atomic Actions in Single Processors
- Lock all associated resources (with semaphores or other synchronization mechanisms)
- Perform all actions without examining unlocked resources
- Unlock all resources
- The real trick is to provide atomicity even if the process is switched in the middle

Why are distributed atomic actions hard?
- Lack of centralized control
- What if multiple processes on multiple machines want to perform an atomic action?
- How do you properly lock everything?
- How do you properly unlock everything?
- Failure conditions are especially hard

Important Distributed OS Tools and Mechanisms
- Caching and replication
- Transactions and two-phase commit
- Hierarchical name space
- Optimistic methods

Caching and Replication
- Remotely accessing data is the pits
  - It almost always takes longer
  - It's less predictable
  - It clogs the network
  - It annoys other nodes
  - Other nodes annoy you
  - It's less secure

Caching vs. Replication

  Caching                             Replication
  Temporary                           Permanent
  Read-only                           Writable
  Improve performance                 Improve availability
  The notion of an original source    Equal peers
  Data                                Data and metadata
  Not aware of other caches           Aware of other replicas

But what else can you do?
- Data must be shared
  - And by off-machine processes
- If the data isn't local and you need it, you must get it
- So make sure the data you need is local
- The problem is that everyone else also wants their data local

Making Data Local
- Store what you need locally
- Make copies
- Migrate necessary data in
- Cache data
- Replicate data

Store It Locally
- Each site stores the data it needs on local media
- But what if two sites need to store the same data?
- Or if you don't have enough room for all your data?

Local Storage Example
[Figure: Sites A, B, and C, each storing a different item (Foo, Bar, Froz) locally]

Make Copies
- Each site stores its own copy of the data it needs
- Works well for rarely updated data
  - Like copies of system utility programs
- Works poorly for frequently written data
- Doesn't solve the problem of lack of local space

Copying Example
[Figure: Sites A, B, and C; Foo at one site, a copy of Foo at each of the other two]

Migrate the Data In
- When you need a piece of data, find it and bring it to your site
  - Taking it away from the old site
- Works poorly for highly shared data
- Can cause severe storage problems
- Can overburden the network
- Essentially how shared software licenses work

Migration Example
[Figure, two frames: Sites A, B, and C. Frame 1: Foo is stored at one site and another site says "I need Foo". Frame 2: Foo after migration.]

Caching
- When data is accessed remotely, temporarily store a copy of it locally
- Perhaps using callback or invalidation for consistency
  - Or perhaps not
- Avoids problems of storage
- Still not quite right for frequently written data

Caching Example
[Figure: Sites A, B, and C; Foo at one site, a cached copy of Foo at each of the other two]

Replication
- Maintain multiple local replicas of the data
- Changes made to one replica are automatically propagated to other replicas
- Logically connects copies of data into a single entity
- Doesn't answer the question of limited space

Replication Example
[Figure: Sites A, B, and C holding replicas Foo1, Foo2, and Foo3]

Replication Advantages
- Most accesses to data are purely local
  - So performance is good
- Fault tolerance
  - Failure of a single node doesn't lose data
  - Partitioned sites can access data
- Load balancing
  - Replicas can share the work

Replication and Updates
- When a data item is replicated, updates to that item must be propagated to all replicas
- Updates come to one replica
- Something must ensure they get to the others

Replication Update Example
[Figure: Sites A, B, and C holding replicas Foo1, Foo2, and Foo3; an "update Foo" arrives at one replica]

Update Propagation Methods
- Instant versus delayed
- Synchronous versus asynchronous
- Atomic versus non-atomic

Instant vs. Delayed Propagation
- "Instant" can't mean instant in a distributed system
  - But it can mean "quickly"
  - One update maps to one propagation
- Instant notification is not always possible
  - What if a site storing a replica is down?
- So some delayed version of update is also required
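
The instant-versus-delayed trade-off can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the lecture's own mechanism: `Replica`, `Propagator`, `update`, and `retry` are hypothetical names, all replicas live in one process, and a `ConnectionError` stands in for a site that is down.

```python
from collections import deque


class Replica:
    """Hypothetical in-memory replica of a single data item."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.up = True  # False simulates a crashed or partitioned site

    def apply(self, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.value = value


class Propagator:
    """Pushes each update right away; queues it for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        # One backlog of undelivered updates per replica (assumed design).
        self.pending = {r.name: deque() for r in replicas}

    def update(self, origin, value):
        origin.apply(value)  # the update arrives at one replica first
        for r in self.replicas:
            if r is origin:
                continue
            if self.pending[r.name]:
                # Keep updates in order: wait behind the existing backlog.
                self.pending[r.name].append(value)
                continue
            try:
                r.apply(value)  # "instant" path: one update, one propagation
            except ConnectionError:
                self.pending[r.name].append(value)  # delayed path

    def retry(self):
        """Deliver queued updates to every replica that is reachable again."""
        for r in self.replicas:
            queue = self.pending[r.name]
            while queue and r.up:
                r.apply(queue.popleft())


a, b, c = Replica("A"), Replica("B"), Replica("C")
p = Propagator([a, b, c])
c.up = False
p.update(a, "v1")  # B receives "v1" immediately; C's copy is queued
c.up = True
p.retry()          # C catches up to "v1"
```

Queuing behind an existing backlog is one possible design choice: it keeps each replica's updates in arrival order, at the cost of delaying a reachable replica until the next `retry` call.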