Distributed Hash Tables

DHTs
●Like it sounds – a distributed hash table
●Put(Key, Value)
●Get(Key) -> Value

Interface vs. Implementation
●Put/Get is an abstract interface
  –Very convenient to program to
  –Doesn't require a “DHT” in today's sense of the word
  –e.g., Amazon's S3 storage service
    ●/bucket-name/object-id -> data
●We'll mostly focus on the back-end log(n) lookup systems like Chord
  –But researchers have proposed alternate architectures that may work better, depending on assumptions!

Last time: Unstructured Lookup
●Pure flooding (Gnutella), TTL-limited
  –Send message to all nodes
●Supernodes (Kazaa)
  –Flood to supernodes only
●Adaptive “super”-nodes and other tricks (GIA)
●None of these scales well for searching for needles

Alternate Lookups
●Keep in mind contrasts to...
●Flooding (unstructured) from last time
●Hierarchical lookups
  –DNS
  –Properties? The root is critical. Today's DNS root is widely replicated, run in serious, secure datacenters, etc. Load is asymmetric.
●Not always bad – DNS works pretty well
●But not fully decentralized, if that's your goal

P2P Goal (general)
●Harness storage & computation across (hundreds, thousands, millions of) nodes across the Internet
●In particular:
  –Can we use them to create a gigantic, hugely scalable DHT?

P2P Requirements
●Scale to those sizes...
●Be robust to faults and malice
●Specific challenges:
  –Node arrival and departure – system stability
  –Freeloading participants
  –Malicious participants
  –Understanding the bounds of what systems can and cannot be built on top of p2p frameworks

DHTs
●Two options:
  –lookup(key) -> node ID
  –lookup(key) -> data
●When you know the node ID, you can ask it directly for the data, but specifying the interface as “-> data” provides more opportunities for caching and computation at intermediaries
●Different systems do either. We'll focus on the problem of locating the node responsible for the data; the solutions are basically the same.

Algorithmic Requirements
●Every node can find the answer
●Keys are load-balanced among nodes
  –Note: we're not talking about popularity of keys, which may be wildly different. Addressing this is a further challenge...
●Routing tables must adapt to node failures and arrivals
●How many hops must lookups take?
  –A trade-off is possible between state/maintenance traffic and the number of lookup hops...

Consistent Hashing
●How can we map a key to a node?
●Consider ordinary hashing
  –func(key) % N -> node ID
  –What happens if you add or remove a node?
●Consistent hashing:
  –Map node IDs to a (large) circular space
  –Map keys to the same circular space
  –A key “belongs” to the nearest node

DHT: Consistent Hashing
(Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80)
●A key is stored at its successor: the node with the next-higher ID

Consistent Hashing
●Very useful algorithmic trick outside of DHTs, too
  –Any time you want to avoid greatly changing the object distribution upon bucket arrival/departure
●Detail:
  –To have good load balance
  –Must represent each bucket by log(N) “virtual” buckets

DHT: Chord Basic Lookup
(Figure: ring with nodes N10, N32, N60, N90, N105, N120; the query “Where is key 80?” is forwarded around the ring until the answer “N90 has K80” comes back)

DHT: Chord “Finger Table”
(Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring)
•Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
•In other words, the i-th finger points 1/2^(n-i) of the way around the ring (here n is the number of bits in the ID space)
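To make the finger-table rule concrete, here is a minimal Python sketch (my own illustration, not code from the lecture; the tiny 3-bit identifier space, the node set, and names like successor and finger_table are assumptions). It builds each node's table as “the first node that succeeds or equals n + 2^i” and resolves a key to its successor.

```python
# Minimal sketch of Chord-style successor/finger tables.
# Illustrative only: the 3-bit ID space, node set, and helper names are
# assumptions, not code from the lecture.

M = 3            # number of ID bits -> ring of size 2**M = 8
RING = 2 ** M

def successor(ident, nodes):
    """Return the first node ID that succeeds or equals `ident`, wrapping around."""
    for k in range(RING):
        candidate = (ident + k) % RING
        if candidate in nodes:
            return candidate
    raise ValueError("ring has no nodes")

def finger_table(n, nodes):
    """Entry i is the first node that succeeds or equals n + 2^i."""
    return [successor((n + 2 ** i) % RING, nodes) for i in range(M)]

if __name__ == "__main__":
    nodes = {0, 1, 2, 6}          # node set used in the join example below
    for n in sorted(nodes):
        print(f"n{n} succ. table:", finger_table(n, nodes))
    # A key is stored at its successor, e.g. key 7 lives on node 0 here.
    print("key 7 ->", successor(7, nodes))
```

Running it on the node set {0, 1, 2, 6} reproduces the successor tables worked through on the join slides that follow.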
DHT: Chord Join
•Assume an identifier space [0..8), i.e., 3-bit IDs 0–7
•Node n1 joins
(Figure: ring with positions 0–7 and node n1)
Succ. Table for n1:
  i | id+2^i | succ
  0 |   2    |  1
  1 |   3    |  1
  2 |   5    |  1

DHT: Chord Join
•Node n2 joins
Succ. Table for n1:
  i | id+2^i | succ
  0 |   2    |  2
  1 |   3    |  1
  2 |   5    |  1
Succ. Table for n2:
  i | id+2^i | succ
  0 |   3    |  1
  1 |   4    |  1
  2 |   6    |  1

DHT: Chord Join
•Nodes n0, n6 join
Succ. Table for n1:
  i | id+2^i | succ
  0 |   2    |  2
  1 |   3    |  6
  2 |   5    |  6
Succ. Table for n2:
  i | id+2^i | succ
  0 |   3    |  6
  1 |   4    |  6
  2 |   6    |  6
Succ. Table for n0:
  i | id+2^i | succ
  0 |   1    |  1
  1 |   2    |  2
  2 |   4    |  6
Succ. Table for n6:
  i | id+2^i | succ
  0 |   7    |  0
  1 |   0    |  0
  2 |   2    |  2

DHT: Chord Join
•Nodes: n1, n2, n0, n6
•Items: f7, f2
(Figure: the same ring and successor tables, with each item stored at its successor – item 7 at n0, item 2 at n2)

DHT: Chord Routing
•Upon receiving a query for item id, a node:
•Checks whether it stores the item locally
•If not, forwards the query to the largest node in its successor table that does not exceed id
(Figure: the ring and successor tables again, with query(7) forwarded around the ring to n0, the node responsible for item 7)

DHT: Chord Summary
•Routing table size?
  –Log N fingers
•Routing time?
  –Each hop is expected to halve the distance to the desired id => expect O(log N) hops.

Alternate structures
●Chord is like a skip list: each step takes you half of the remaining way toward the destination. Other topologies do this too...

Tree-like structures
●Pastry, Tapestry, Kademlia
●Pastry:
  –Nodes maintain a “Leaf Set” of size |L|
    ●|L|/2 nodes above & below the node's ID
    ●(Like Chord's successors, but bi-directional)
  –Pointers to log_2(N) nodes: at each level i, a node that shares an i-bit prefix with this node's ID but differs in bit i+1
    ●e.g., node id 01100101
    ●keeps pointers to neighbors whose IDs start with 1, 00, 010, 0111, ...
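As a rough illustration of the prefix-based routing state just described, here is a small Python sketch (again my own, not from the lecture; the single-bit digits and the helper name neighbor_prefixes are assumptions). For a node ID such as 01100101 it lists, level by level, the prefix a routing-table neighbor must carry: the first i bits shared, bit i+1 flipped.

```python
# Sketch of Pastry/tree-style neighbor prefixes.
# Illustrative only: assumes 1-bit digits (b = 1) and plain bit-string IDs.

def neighbor_prefixes(node_id):
    """For each level i, return the prefix a routing-table neighbor must have:
    the first i bits of node_id, followed by the opposite of bit i."""
    prefixes = []
    for i, bit in enumerate(node_id):
        flipped = "1" if bit == "0" else "0"
        prefixes.append(node_id[:i] + flipped)
    return prefixes

if __name__ == "__main__":
    # Mirrors the slide's example: 1, 00, 010, 0111, ...
    print(neighbor_prefixes("01100101"))
```

Real Pastry typically uses b-bit digits (often b = 4), so each routing-table row has 2^b - 1 entries rather than one; single bits are used here only to mirror the slide's example.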
Hypercubes
●The CAN DHT
●Each node has an ID
●Maintains pointers to neighbors that differ from it in one bit position
●Only one possible neighbor in each direction
●But can route to the receiver by changing any bit (fix the differing bits in any order)

So many DHTs...
●Compare along two axes:
  –How many neighbors can you choose from when forwarding? (Forwarding Selection)
  –How many nodes can you choose from when selecting neighbors? (Neighbor Selection)
●Failure resilience: forwarding choices help
●Picking low-latency neighbors: both help

Proximity
●Ring:
  –Forwarding: log(N) choices for the next hop when going around the ring
  –Neighbor selection: