A Scalable, Content-Addressable NetworkOutlineInternet-scale hash tablesSlide 4Content-Addressable Network (CAN)Slide 6Slide 7Problem ScopeSlide 9CAN: basic ideaSlide 11Slide 12Slide 13Slide 14CAN: solutionCAN: simple exampleSlide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24Slide 25Slide 26Slide 27CANCAN: routing tableCAN: routingSlide 31Slide 33Slide 34Slide 35Slide 36Slide 37CAN: node failuresCAN: takeover algorithmSlide 40Design recapSlide 42EvaluationCAN: scalabilityCAN: low-latencySlide 46Slide 47CAN: load balancingUniform PartitioningSlide 50CAN: RobustnessSlide 52StrengthsWeaknessesSuggestionsSlide 56Ongoing WorkDistributed BinningDistributed BinningOngoing Work (cont’d)Slide 61SummarySylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott ShenkerA Scalable, Content-Addressable NetworkACIRIU.C.BerkeleyTahoe Networks12 31,2 3 11,2 1Outline•Introduction•Design•Evaluation•Strengths & Weaknesses•Ongoing WorkInternet-scale hash tables•Hash tables– essential building block in software systems•Internet-scale distributed hash tables– equally valuable to large-scale distributed systems?•Hash tables– essential building block in software systems•Internet-scale distributed hash tables– equally valuable to large-scale distributed systems?•peer-to-peer systems–Napster, Gnutella, Groove, FreeNet, MojoNation…•large-scale storage management systems–Publius, OceanStore, PAST, Farsite, CFS ...•mirroring on the WebInternet-scale hash tablesContent-Addressable Network(CAN)•CAN: Internet-scale hash table•Interface–insert(key,value)–value = retrieve(key)Content-Addressable Network(CAN)•CAN: Internet-scale hash table•Interface–insert(key,value)–value = retrieve(key) •Properties–scalable–operationally simple–good performance (w/ improvement)Content-Addressable Network(CAN)•CAN: Internet-scale hash table•Interface–insert(key,value)–value = retrieve(key) •Properties–scalable–operationally simple–good performance•Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...Problem Scope Design a system that provides the interface scalability robustness performance security Application-specific, higher level primitives keyword searching mutable content anonymityOutline•Introduction•Design•Evaluation•Strengths & Weaknesses•Ongoing WorkK VCAN: basic ideaK VK VK VK VK VK VK VK VK VK VCAN: basic ideainsert(K1,V1)K VK VK VK VK VK VK VK VK VK VK VCAN: basic ideainsert(K1,V1)K VK VK VK VK VK VK VK VK VK VK VCAN: basic idea(K1,V1)K VK VK VK VK VK VK VK VK VK VK VCAN: basic idearetrieve (K1)K VK VK VK VK VK VK VK VK VK VK VCAN: solution•virtual Cartesian coordinate space•entire space is partitioned amongst all the nodes –every node “owns” a zone in the overall space•abstraction–can store data at “points” in the space –can route from one “point” to another•point = node that owns the enclosing zoneCAN: simple example1CAN: simple example1 2CAN: simple example123CAN: simple example1234CAN: simple exampleCAN: simple example ICAN: simple examplenode I::insert(K,V) I(1) a = hx(K) CAN: simple examplex = anode I::insert(K,V) I(1) a = hx(K) b = hy(K)CAN: simple examplex = ay = bnode I::insert(K,V) I(1) a = hx(K) b = hy(K)CAN: simple example (2) route(K,V) -> (a,b)node I::insert(K,V) ICAN: simple example (2) route(K,V) -> (a,b) (3) (a,b) stores (K,V) (K,V)node I::insert(K,V) I(1) a = hx(K) b = hy(K)CAN: simple example (2) route “retrieve(K)” to (a,b) (K,V)(1) a = hx(K) b = hy(K)node J::retrieve(K) JData stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)CANCAN: routing tableCAN: routing(a,b)(x,y)A node only maintains state for its immediate neighboring nodesCAN: routingCAN: node insertionInew node1) discover some node “I” already in CANCAN: node insertion2) pick random point in spaceI(p,q)new nodeCAN: node insertion(p,q)3) I routes to (p,q), discovers node J IJnew nodeCAN: node insertionnewJ4) split J’s zone in half… new owns one halfInserting a new node affects only a single other node and its immediate neighborsCAN: node insertionCAN: node failures•Need to repair the space –recover database (weak point)•soft-state updates•use replication, rebuild database from replicas–repair routing •takeover algorithmCAN: takeover algorithm•Simple failures–know your neighbor’s neighbors–when a node fails, one of its neighbors takes over its zone•More complex failure modes–simultaneous failure of multiple adjacent nodes –scoped flooding to discover neighbors–hopefully, a rare eventOnly the failed node’s immediate neighbors are required for recoveryCAN: node failuresDesign recap•Basic CAN–completely distributed–self-organizing–nodes only maintain state for their immediate neighbors•Additional design features–multiple, independent spaces (realities)–background load balancing algorithm–simple heuristics to improve performanceOutline•Introduction•Design•Evaluation•Strengths & Weaknesses•Ongoing WorkEvaluation•Scalability•Low-latency•Load balancing•RobustnessCAN: scalability•For a uniformly partitioned space with n nodes and d dimensions –per node, number of neighbors is 2d–average routing path is (dn1/d)/4 hops–simulations show that the above results hold in practice•Can scale the network without increasing per-node state •Chord/Plaxton/Tapestry/Buzz–log(n) nbrs with log(n) hopsCAN: low-latency•Problem–latency stretch = (CAN routing delay) (IP routing delay)–application-level routing may lead to high stretch •Solution–increase dimensions, realities (reduce the path length)–Heuristics (reduce the per-CAN-hop latency)•RTT-weighted routing•multiple nodes per zone (peer nodes)•deterministically replicate entriesCAN: low-latency#nodesLatency stretch02040608010012014016018016K 32K 65K 131K#dimensions = 2w/o heuristicsw/ heuristics0246810CAN: low-latency#nodesLatency stretch16K 32K 65K 131K#dimensions = 10w/o heuristicsw/ heuristicsCAN: load balancing•Two pieces –Dealing with hot-spots•popular (key,value) pairs•nodes cache recently requested entries•overloaded node replicates popular entries at neighbors–Uniform coordinate space partitioning•uniformly spread (key,value) entries•uniformly spread out routing loadUniform
View Full Document