CS162
Operating Systems and Systems Programming
Lecture 24
DHTs and Cloud Computing
April 25, 2011
Ion Stoica
http://inst.eecs.berkeley.edu/~cs162

Lec 24.2  4/25  Ion Stoica CS162 ©UCB Spring 2011

Distributed Hash Tables (DHTs)
• Distribute (partition) a hash table data structure across a large number of servers
  – Also called a key-value store
• Two operations
  – put(key, data); // insert “data” identified by “key”
  – data = get(key); // get the data associated with “key”

Distributed Hash Tables (DHTs) (cont'd)
• Just need a lookup service, i.e., given a key (ID), map it to a machine n
    n = lookup(key);
• Invoking put() and get() at node m
    m.put(key, data) {
      n = lookup(key);        // get node “n” mapping “key”
      n.store(key, data);     // store data at node “n”
    }
    data = m.get(key) {
      n = lookup(key);        // get node “n” storing data associated with “key”
      return n.retrieve(key); // get data stored at “n” associated with “key”
    }

Distributed Hash Tables (DHTs) (cont'd)
• Many lookup proposals: CAN, Chord, Pastry, Tapestry, Kademlia, …
• Used in practice:
  – p2p: eDonkey (based on Kademlia)
  – Dynamo (Amazon)
  – Cassandra (Facebook)
  – …

Challenges
• System churn: machines can fail or exit the system at any time
• Scalability: need to scale to tens or hundreds of thousands of machines
• Heterogeneity:
  – Latency: 1 ms to 1000 ms
  – Bandwidth: 32 Kb/s to 100 Mb/s
  – Nodes stay in the system from 10s to a year
  – …
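The put()/get() pseudocode on the DHT slides above can be tried out with a small in-process sketch. Everything here is illustrative: `ToyDHT`, `Node`, and the hash-modulo placement inside `lookup` are assumptions standing in for a real lookup service such as Chord, not part of the lecture.

```python
import hashlib


class Node:
    """One server holding its share of the table (hypothetical helper)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.table = {}

    def store(self, key, data):
        self.table[key] = data

    def retrieve(self, key):
        return self.table.get(key)


class ToyDHT:
    """In-process stand-in for a DHT; any node could run put()/get()."""

    def __init__(self, node_ids):
        self.nodes = [Node(i) for i in node_ids]

    def lookup(self, key):
        # Assumption: hash-modulo placement, used only to keep the sketch short.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def put(self, key, data):
        n = self.lookup(key)      # get node n mapping key
        n.store(key, data)        # store data at node n

    def get(self, key):
        n = self.lookup(key)      # get node n storing data for key
        return n.retrieve(key)    # fetch the data from n


dht = ToyDHT(node_ids=[0, 1, 2, 3])
dht.put("song.mp3", b"example bytes")
print(dht.get("song.mp3"))
```

Hash-modulo placement remaps most keys whenever the number of nodes changes, which is one reason the churn challenge listed above calls for a different lookup scheme.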
Chord Lookup Service
• Associate with each node and item a unique id/key in a one-dimensional space 0..2^m − 1
  – Partition this space across N machines
  – Each id is mapped to the node with the smallest ID greater than or equal to it, wrapping around the ring (consistent hashing)
• Key design decision
  – Decouple correctness from efficiency
• Properties
  – Routing table size O(log(N)), where N is the total number of nodes
  – Guarantees that a file is found in O(log(N)) steps

Identifier to Node Mapping Example (Consistent hashing)
[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]
• Node 8 maps [5, 8]
• Node 15 maps [9, 15]
• Node 20 maps [16, 20]
• …
• Node 4 maps [59, 4]
• Each node maintains a pointer to its successor

Lookup
[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; lookup(37) forwarded from node 4]
• Each node maintains a pointer to its successor
• Route packet (ID, data) to the node responsible for ID using successor pointers
• E.g., node=4 looks up the node responsible for ID=37
  – node=44 is responsible for ID=37

Stabilization Procedure
• Periodic operation performed by each node n to maintain its successor when new nodes join the system

    n.stabilize()
      x = succ.pred;
      if (x ∈ (n, succ))
        succ = x;        // if x is a better successor, update
      succ.notify(n);    // n tells its successor about itself

    n.notify(n')
      if (pred = nil or n' ∈ (pred, n))
        pred = n';       // if n' is a better predecessor, update

Joining Operation
[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; node 50 joining]
• Node with id=50 joins the ring
• Node 50 needs to know at least one node already in the system
  – Assume the known node is 15
• State: node 44: succ=58, pred=35; node 50: succ=nil, pred=nil; node 58: succ=4, pred=44

Joining Operation
• n=50 sends join(50) to node 15
• n=44 returns node 58
• n=50 updates its successor to 58
• State: node 44: succ=58, pred=35; node 50: succ=58, pred=nil; node 58: succ=4, pred=44
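The identifier-to-node mapping and the successor-pointer lookup from the Chord slides above can be checked with a short sketch. The ring contents come from the slides; `M = 6` (identifiers 0..63) is an assumption chosen to fit the example, and the function names are hypothetical.

```python
from bisect import bisect_left

M = 6                                    # assumption: identifier space 0 .. 2**M - 1
RING = [4, 8, 15, 20, 32, 35, 44, 58]    # node IDs from the slides, sorted


def responsible_node(key_id, ring=RING):
    """Return the smallest node ID >= key_id, wrapping to the first node."""
    i = bisect_left(ring, key_id % (2 ** M))
    return ring[i % len(ring)]


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b], allowing wraparound."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def lookup_via_successors(start, key_id, ring=RING):
    """Follow successor pointers from `start`; return (owner, hops)."""
    succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
    node, hops = start, 0
    # Stop at the node whose successor owns the key.
    while not in_half_open(key_id, node, succ[node]):
        node = succ[node]
        hops += 1
    return succ[node], hops


print(responsible_node(37))          # 44
print(responsible_node(60))          # 4, since node 4 covers [59, 4]
print(lookup_via_successors(4, 37))  # (44, 5)
```

Walking successors one at a time takes a number of hops linear in the ring size. The O(log(N)) bound quoted on the Chord Lookup Service slide needs the additional routing state, which this sketch leaves out.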
Joining Operation
• n=50 executes stabilize()
• n's successor (58) returns x = 44
• State: node 44: succ=58, pred=35; node 50: succ=58, pred=nil; node 58: succ=4, pred=44

    n.stabilize()
      x = succ.pred;
      if (x ∈ (n, succ))
        succ = x;
      succ.notify(n);

Joining Operation
• n=50 executes stabilize()
  – x = 44
  – succ = 58
  – 44 is not in (50, 58), so succ stays 58

Joining Operation
• n=50 executes stabilize()
  – x = 44
  – succ = 58
• n=50 sends notify(50) to its successor (58)

Joining Operation
• n=58 processes notify(50)
  – pred = 44
  – n' = 50

    n.notify(n')
      if (pred = nil or n' ∈ (pred, n))
        pred = n'

Joining Operation
• n=58 processes notify(50)
  – pred = 44
  – n' = 50
  – 50 is in (44, 58), so set pred = 50
• State: node 44: succ=58, pred=35; node 50: succ=58, pred=nil; node 58: succ=4, pred=50

Joining Operation
• n=44 runs stabilize()
• n's successor (58) returns x = 50
• State: node 44: succ=58, pred=35; node 50: succ=58, pred=nil; node 58: succ=4, pred=50

Joining Operation
• n=44 runs stabilize()
  – x = 50
  – succ = 58
Joining Operation
• n=44 runs stabilize()
  – x = 50
  – succ = 58
  – 50 is in (44, 58), so n=44 sets succ=50
• State: node 44: succ=50, pred=35; node 50: succ=58, pred=nil; node 58: succ=4, pred=50

    n.stabilize()
      x = succ.pred;
      if (x ∈ (n, succ))
        succ = x;
      succ.notify(n);

Joining Operation
[Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50]
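The join walk-through above can be replayed with a minimal sketch of stabilize() and notify(). The class and helper names (`ChordNode`, `between`, `build_ring`) are hypothetical, and the remote calls of a real deployment are replaced by direct object references.

```python
class ChordNode:
    def __init__(self, node_id):
        self.id = node_id
        self.succ = None
        self.pred = None


def between(x, a, b):
    """True if x is in the open ring interval (a, b), allowing wraparound."""
    if a < b:
        return a < x < b
    return x > a or x < b


def notify(n, candidate):
    # candidate believes it may be n's predecessor.
    if n.pred is None or between(candidate.id, n.pred.id, n.id):
        n.pred = candidate


def stabilize(n):
    x = n.succ.pred
    if x is not None and between(x.id, n.id, n.succ.id):
        n.succ = x            # x is a better successor
    notify(n.succ, n)         # n tells its successor about itself


def build_ring(ids):
    """Wire up a consistent ring of nodes (hypothetical setup helper)."""
    ordered = sorted(ids)
    nodes = {i: ChordNode(i) for i in ordered}
    for k, i in enumerate(ordered):
        nodes[i].succ = nodes[ordered[(k + 1) % len(ordered)]]
        nodes[i].pred = nodes[ordered[k - 1]]
    return nodes


nodes = build_ring([4, 8, 15, 20, 32, 35, 44, 58])
n50 = ChordNode(50)
n50.succ = nodes[58]      # result of join(50), as in the slides

stabilize(n50)            # 44 not in (50, 58); node 58 sets pred = 50
stabilize(nodes[44])      # 50 in (44, 58); node 44 sets succ = 50

print(nodes[44].succ.id, n50.pred.id, n50.succ.id, nodes[58].pred.id)
```

The second stabilize() call also ends with succ.notify(n), so in this sketch node 50 learns that its predecessor is 44 in the same round.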