Persistence of Data in a Dynamic Unreliable NetworkOutlineThe Data ObjectMutable DataBranching and VersioningOverview of macrobranchingMacrobranching StoryMacrobranching DetailsApplication: NFS w/ Branching and Time TravelDynamic AccessDHT AdvantagesBasic AssumptionsBasic Cost ModelBW for Redundancy MaintenanceNeed Too Much BW to Maintain RedundancyHardware TrendsSolution IndirectionModels for Comparison I: DHT vs DDModels for Comparison II: DHT vs DDBW/N vs LifetimeBW/N vs Data/PtrReplication vs CodingProblems with DD IProblems with DD IIEfficient HeartbeatsWhen TriggersConclusionsPersistence of Data in a Dynamic Unreliable NetworkPresented by Rachel Rubin and Hakim WeatherspoonCS294-4: Peer-to-Peer SystemsFastestFlakySlowerFlakySlowFasterStableFast Slowtest10 .. 100s GB/Node of Idle Cheap DiskDistributed Data Store w/ all the *ilities:High AvailabilityGood ScalabilityHigh ReliabilityMaintainabilityFlexibilityReliable SubstrateP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:2Outline•Motivation/Desires–Reliable distributed data store with dynamic members–Harness aggregate power of system.•Questions about data store–How data is structured–How to access data–Amount of resources used to keep data durable? Storage? Bandwidth?•Branching•Cost of maintaining redundancy•Optimized implementation•ConclusionP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:3The Data ObjectDataBlocksVGUIDiVGUIDi + 1d2d4d3d8d7d6d5d9d1Data B -TreeIndirectBlocksMd'8d'9Mback pointercopy on writecopy on writeAGUID = hash{name+keys}P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:4Mutable Data•Need mutable data for real system.–Entity in network.–A-GUID to V-GUID mapping.–Point of serialization for integrity–Atomically applies update.•Versioning system–Each version is inherently read-only.P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:5Branching and Versioning•Modifying old versions of the data–Possible conflict merging modified old version with the current head•Branching –Provides different data threads•Makes time-travel more functional–Multiple data threads•Operational–Defer Conflicts –Not abort updates•Disconnected operationsP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:6Overview of macrobranching•Each branch is treated as its own object•Branches are created from the main objectP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:7Macrobranching StoryAGUID2AGUID3AGUID4AGUID1TimeP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:8Macrobranching Details•Writing–Create branch in serializer–Mark in main branch metadata new branch creation–Mark in new branch which object and version it was created from–New AGUID needs to be managed•Reading–From new AGUID•Close–Can no longer write to branch–Merge with main branch if specified•RecoveryP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:9Application: NFS w/ Branching and Time Travel•Accessing data from a point in the past and modifying from there•Directories can be rolled back and modified•Modifications are not automatically the main branch head•Organizationally more cleanP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:10 Dynamic Access•Access data reliably in a dynamic network.•How much does this cost?P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:11DHT Advantages•Spread storage burden evenly–Avoid hot spots•Tolerate unreliable participants•O(log N) algorithm•Simple–DHT automatically and autonomously maintains data•Decides who, what, when, where, why, and howP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:12Basic Assumptions • P2P Purist Ideals–Cooperation, Symmetry, Decentralized•DHT Assumptions–Simple redundancy maintenance mechanisms•enter and exit–Static data placement strategy (f: RB-> N)–Identical per-node space and bandwidth contributions–Constant rate of entering and exiting.–Independence of exit events–Constant steady-state number of nodes and total data size–Maintenance bandwidth•Average case analysisP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:13Basic Cost Model•N: number of hosts•D: data•S: data + redundancy (S = kD): entering rate: exiting rate ( = )•T: lifetime (T=N/)•B: bandwidth: Membership timeout–distinguish true departures from temporary downtime–delay its response to failures•a: availability–Hosts serve as a fraction of time–More redundancy is needed–Effective bandwidth is reducedP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:14BW for Redundancy Maintenance•maintenance BW 200 Kbps•lifetime = Median 2001-Gnutella session = 1 hour•served space = 90 MB/node << donatable storage!P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:15Need Too Much BW toMaintain Redundancy•Wait! It gets worse… HW trendsHighAvailabilityScalableStorageDynamicMembershipMust Pick TwoP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:16Hardware Trends•The picture only gets worse•Participation should be more stable to contribute meaningful fraction of disksP2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:17Solution Indirection•Distributed directory (DD)–Uses a level of indirection–Decouples networking layer from data layer–Controls the data placement–Exploits heterogeneity (availability, lifetime, and bandwidth)P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:18Models for Comparison I:DHT vs DD•Data extracted from [Bhagwan, Savage, and Voelker 2003]P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:19Models for Comparison II:DHT vs DD•Reliable nodes–Greater than 70% availability•Model 1–DHT•Model 2–Reliable nodes (Model 2.a): store data and DD pointers. –Unreliable nodes (Model 2.b): store DD pointers only•Model 3–Reliable nodes (Model 3.a): store all data and DD pointers.–Unreliable nodes (Model 3.b): do nothing (I.e. free loader)P2P Systems 2003 ©2003 Rachel Rubin and Hakim Weatherspoon Persistent Data:20BW/N vs LifetimeP2P Systems 2003 ©2003 Rachel
View Full Document