Berkeley COMPSCI 268 - Beyond Theory: DHTs in Practice

Unformatted text preview:

Beyond Theory: DHTs in PracticeTalk OutlineMaking DHTs Robust: The Problem of Membership ChurnHow Bad is Churn in Real Systems?Refresher: DHT Lookup/RoutingCan DHTs Handle Churn? A Simple TestTest ResultsThe Bamboo DHTHow Bamboo Handles Churn (Overview)Bamboo Basics: Partition Key SpaceBamboo Basics: Build Overlay NetworkBamboo Basics: Route Puts/Gets Thru OverlayRouting Around FailuresSlide 14Computing Good TimeoutsSlide 16Slide 17Timeout Estimation PerformanceRecovering From FailuresSlide 20Slide 21Slide 22The Problem with Reactive RecoverySlide 24Periodic RecoverySlide 26Slide 27Slide 28Periodic Recovery PerformanceProximity Neighbor Selection (PNS)Slide 31Slide 32How Important is PNS?PNS by Random SamplingPNS ResultsPlanetLab DeploymentExcellent AvailabilitySlide 38A Small Sample of DHT ApplicationsQuestions:Benefits of Sharing a DHTChallenges in Sharing a DHTThe DHT as a ServiceThe DHT as a ServiceSlide 45Slide 46Slide 47It’s not lookup()How are DHTs Used?What about put/get?Recursive Distributed RendezvousReDiRSlide 53Slide 54Slide 55Slide 56Slide 57Slide 58Slide 59Slide 60Slide 61ReDiR Performance (On PlanetLab)OpenDHT Service ModelSlide 64Protecting Against OveruseFair Storage AllocationThe Problem of StarvationPreventing StarvationSlide 69Slide 70Slide 71Slide 72Slide 73Defining “Most Under-Represented”Slide 75Slide 76FST PerformanceSlide 78Future Work: ThroughputSlide 80Future Work: UpcallsSlide 82For more information, see http://bamboo-dht.org/ http://opendht.org/Beyond Theory: DHTs in PracticeCS 268 - NetworksSean C. RheaApril 18, 2005In collaboration with: Dennis Geels, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Timothy Roscoe, Scott Shenker, Ion Stoica, and Harlan YuSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Talk Outline•Bamboo: a churn-resilient DHT–Churn resilience at the lookup layer [USENIX’04]–Churn resilience at the storage layer [Cates’03], [Unpublished]•OpenDHT: the DHT as a service–Finding the right interface [IPTPS’04]–Protecting against overuse [Under Submission]•Future workSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Making DHTs Robust:The Problem of Membership Churn•In a system with 1,000s of machines, some machines failing / recovering at all times•This process is called churn•Without repair, quality of overlay network degrades over time•A significant problem deployed peer-to-peer systemsSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005How Bad is Churn in Real Systems?Authors Systems Observed Session TimeSGG02 Gnutella, Napster 50% < 60 minutesCLL02 Gnutella, Napster 31% < 10 minutesSW02 FastTrack 50% < 1 minuteBSV03 Overnet 50% < 60 minutesGDS03 Kazaa 50% < 2.4 minutestimearrive depart arrive departSessionTimeLifetimeAn hour is an incredibly short MTTF!Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Refresher: DHT Lookup/RoutingK VK VK VK VK VK VK VK VK VK Vk1,v1put(k1,v1)Put and get must find the same machinek1v1get(k1)Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Can DHTs Handle Churn?A Simple Test•Start 1,000 DHT processes on a 80-CPU cluster–Real DHT code, emulated wide-area network–Models cross traffic and packet loss •Churn nodes at some rate•Every 10 seconds, each machine asks:“Which machine is responsible for key k?”–Use several machines per key to check consistency–Log results, process them after testSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Test Results•In Tapestry (the OceanStore DHT), overlay partitions–Leads to very high level of inconsistencies–Worked great in simulations, but not on more realistic network•And the problem isn’t limited to Tapestry:FreePastry MIT ChordSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005The Bamboo DHT•Forget about comparing Chord-Pastry-Tapestry–Too many differing factors–Hard to isolate effects of any one feature•Instead, implement a new DHT called Bamboo–Same overlay structure as Pastry–Implements many of the features of other DHTs–Allows testing of individual features independentlySean C. Rhea OpenDHT: A Public DHT Service April 14, 2005How Bamboo Handles Churn(Overview)1. Routes around suspected failures quickly–Abnormal latencies indicate failure or congestion–Route around them before we can tell difference2. Recovers failed neighbors periodically–Keeps network load independent of churn rate–Prevents overlay-induced positive feedback cycles3. Chooses neighbors for network proximity–Minimizes routing latency in non-failure case–Allows for shorter timeoutsSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Bamboo Basics: Partition Key Space•Each node in DHT will store some k,v pairs•Given a key space K, e.g. [0, 2160):–Choose an identifier for each node, idi  K, uniformly at random–A pair k,v is stored at the node whose identifier is closest to k0 2160Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Bamboo Basics: Build Overlay Network•Each node has two sets of neighbors•Immediate neighbors in the key space–Important for correctness•Long-hop neighbors–Allow puts/gets in O(log n) hops0 2160Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Bamboo Basics: Route Puts/Gets Thru Overlay•Route greedily, always making progress0 2160kget(k)Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Routing Around Failures•Under churn, neighbors may have failed•To detect failures, acknowledge each hop0 2160kACKACKSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Routing Around Failures•If we don’t receive an ACK, resend through different neighbor0 2160kTimeout!Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Computing Good Timeouts•Must compute timeouts carefully–If too long, increase put/get latency–If too short, get message explosion0 2160kTimeout!Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Computing Good Timeouts•Chord errs on the side of caution–Very stable, but gives long lookup latencies0 2160kTimeout!Sean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Computing Good Timeouts•Keep past history of latencies–Exponentially weighted mean, variance•Use to compute timeouts for new requests–timeout = mean + 4  variance •When a timeout occurs–Mark node “possibly down”: don’t use for now–Re-route through alternate neighborSean C. Rhea OpenDHT: A Public DHT Service April 14, 2005Timeout Estimation


View Full Document

Berkeley COMPSCI 268 - Beyond Theory: DHTs in Practice

Documents in this Course
Lecture 8

Lecture 8

33 pages

L-17 P2P

L-17 P2P

50 pages

Multicast

Multicast

54 pages

Load more
Download Beyond Theory: DHTs in Practice
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Beyond Theory: DHTs in Practice and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Beyond Theory: DHTs in Practice 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?