CS791 Aravind Elango

Maintenance-Free Global Data Storage
Sean Rhea, Chris Wells, Patrick Eaton, Dennis Geels, Ben Zhao, Hakim Weatherspoon and John Kubiatowicz

Goal of the paper
To describe a persistent, distributed data storage system called 'OceanStore'.
To explain aspects of its design and the efficiency of the methodologies used in the system.
OceanStore concentrates on the preservation of bits, the same aim as Intermemory.

OceanStore
It is based on the idea that data from a multitude of client devices can be stored collectively on a large number of servers on the Internet.
Unlike Intermemory, it does not guarantee infinite persistence, nor does it require a server to be available for a fixed amount of time.

How they plan to achieve it?
A self-organizing routing infrastructure
Erasure codes
Byzantine update commitment for recording changes
Introspective replica management

System Overview
A data unit is divided into a number of fragments that are dispersed through the network.
The correctness of each fragment, and of the data block it belongs to, is verified using its globally unique identifier (GUID).
The GUID is currently a 160-bit hash generated from the object's content.

Tapestry - The Routing Subsystem
An overlay on top of IP that uses UDP to check the validity of nodes.
Handles communication between nodes and the location of objects.
Nodes are identified by a unique NodeID.
Automatically detects the loss of nodes and corrects routing information.
http://oceanstore.cs.berkeley.edu/publications/papers/pdf/ieeeic.pdf

How does Tapestry route information?

What happens when a new node comes up?
The node randomly chooses a NodeID for itself and must know the NodeID of at least one other currently active node.
The new node uses the node it is connected to in order to find nodes that share incrementally longer prefixes.
Once it has been added, the new node can advertise its services to the world, but it needs a recommendation to become a member of an inner ring.

What happens when a replica is added?
Using the GUID of the replicated object, it is possible to find the root node of the object.
A pointer to the new replica is copied onto each node on the path between the node storing the replica and the object's root node.

Fault tolerance of Tapestry
Tapestry uses redundant neighbor pointers to route requests when an object is unavailable.
Even when half of the nodes are down, Tapestry still has a 10% chance of reaching the destination node.

Inner Ring
Each data unit is divided into fragments and distributed onto a small number of nodes, which act as the object's primary replica.
This set of nodes is called the inner ring of the object and is responsible for the object. Typically the inner ring has fewer than 10 nodes.
http://oceanstore.cs.berkeley.edu/publications/papers/pdf/ieeeic.pdf

Durability

Byzantine Update Commitment
With (3f+1) nodes, the system still functions as desired when up to f of them are faulty.
Fragments are verified and modifications are made using this strategy. It is assumed that no more than f nodes in the inner ring will fail.
The nodes of the inner ring can also be changed without affecting the hash code generated for the object.

Object Redundancy
Apart from being replicated in the inner ring, the object is divided and distributed among other servers as well.
These are secondary replicas.
The nodes holding secondary replicas receive consistency information from the nodes of the inner ring.

Auto Repair
Disks are constantly monitored for signs of impending failure.
Individual segments can be verified using their hash codes.
When the number of replicas of a given object falls below a threshold, new replicas are created and distributed.

Introspective Replica Mgmt.
If an object is heavily used, more replicas are created.
Clients may request more replicas when QoS factors fall below acceptable limits.
Tapestry can detect requests that travel long distances and suggest replicas at new locations.

Conclusion
The research effort seems more promising than Intermemory, offering more realistic promises and higher-level functions, including client-side cryptography.
Demonstration efforts are still at a nascent stage.

Thank You
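
Supplementary note for the System Overview and Auto Repair slides: because a GUID is a 160-bit hash of content, a retrieved fragment can be checked by hashing it again and comparing the result with the GUID it was stored under. The sketch below is only an illustration. It assumes SHA-1 as the 160-bit hash (the slides do not name the function), and the helper names `guid_of` and `verify_fragment` are hypothetical.

```python
import hashlib


def guid_of(data: bytes) -> str:
    """Return a 160-bit content hash as 40 hexadecimal characters.

    Assumption: SHA-1 stands in for the 160-bit hash; the slides do not
    say which hash function is used.
    """
    return hashlib.sha1(data).hexdigest()


def verify_fragment(data: bytes, expected_guid: str) -> bool:
    """Rehash a retrieved fragment and compare it with its expected GUID."""
    return guid_of(data) == expected_guid


# Hypothetical fragment contents, used only for illustration.
fragment = b"example fragment payload"
guid = guid_of(fragment)

print(len(guid) * 4)                           # number of bits in the GUID
print(verify_fragment(fragment, guid))         # intact fragment
print(verify_fragment(fragment + b"!", guid))  # corrupted fragment
```

Any change to the fragment's bytes changes the hash, so a corrupted or substituted fragment fails the comparison without the client having to trust the server that returned it.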
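
Supplementary note for the Byzantine Update Commitment and Inner Ring slides: the (3f+1) rule can be turned around to ask how many faulty nodes a ring of a given size tolerates. The sketch below only works through that arithmetic; it is not the commitment protocol itself, and the function names `min_ring_size` and `max_faulty` are hypothetical.

```python
def min_ring_size(f: int) -> int:
    """Smallest ring size that still functions with f faulty nodes (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that 3f + 1 <= n, for a ring of n nodes."""
    if n < 1:
        raise ValueError("a ring needs at least one node")
    return (n - 1) // 3


# Ring sizes below 10, matching the typical inner ring size on the slide.
for n in (4, 7, 9):
    print(f"ring of {n} nodes tolerates {max_faulty(n)} faulty node(s)")
```

For rings of fewer than 10 nodes this gives at most two tolerated faults; a ring would need 10 nodes to tolerate three.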