1 15-744: Computer Networking L-16 Naming 2 Overview • P2P Lookup Overview • Centralized/Flooded Lookups • Routed Lookups – Chord • Comparison of DHTs 3 Peer-to-Peer Networks • Typically each member stores/provides access to content • Has quickly grown in popularity • Bulk of traffic from/to CMU is P2P! • Basically a replication system for files • Always a tradeoff between possible location of files and searching difficulty • Peer-to-peer allow files to be anywhere searching is the challenge • Dynamic member list makes it more difficult • What other systems have similar goals? • Routing, DNS 4 The Lookup Problem Internet N1 N2 N3 N6 N5 N4 Publisher Key=“title” Value=MP3 data… Client Lookup(“title”) ?2 5 Centralized Lookup (Napster) Publisher@ Client Lookup(“title”) N6 N9 N7 DB N8 N3 N2 N1 SetLoc(“title”, N4) Simple, but O(N) state and a single point of failure Key=“title” Value=MP3 data… N4 6 Flooded Queries (Gnutella) N4 Publisher@ Client N6 N9 N7 N8 N3 N2 N1 Robust, but worst case O(N) messages per lookup Key=“title” Value=MP3 data… Lookup(“title”) 7 Routed Queries (Chord, etc.) N4 Publisher Client N6 N9 N7 N8 N3 N2 N1 Lookup(“title”) Key=“title” Value=MP3 data… 8 Overview • P2P Lookup Overview • Centralized/Flooded Lookups • Routed Lookups – Chord • Comparison of DHTs3 9 Centralized: Napster • Simple centralized scheme motivated by ability to sell/control • How to find a file: •On startup, client contacts central server and reports list of files •Query the index system return a machine that stores the required file •Ideally this is the closest/least-loaded machine •Fetch the file directly from peer 10 Centralized: Napster • Advantages: •Simple •Easy to implement sophisticated search engines on top of the index system • Disadvantages: •Robustness, scalability •Easy to sue! 11 Flooding: Old Gnutella • On startup, client contacts any servent (server + client) in network • Servent interconnection used to forward control (queries, hits, etc) • Idea: broadcast the request • How to find a file: • Send request to all neighbors • Neighbors recursively forward the request • Eventually a machine that has the file receives the request, and it sends back the answer • Transfers are done with HTTP between peers 12 Flooding: Old Gnutella • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with request (to alleviate this problem, each request has a TTL) • Especially hard on slow clients • At some point broadcast traffic on Gnutella exceeded 56kbps – what happened? • Modem users were effectively cut off!4 13 Flooding: Old Gnutella Details • Basic message header • Unique ID, TTL, Hops • Message types • Ping – probes network for other servents • Pong – response to ping, contains IP addr, # of files, # of Kbytes shared • Query – search criteria + speed requirement of servent • QueryHit – successful response to Query, contains addr + port to transfer from, speed of servent, number of hits, hit results, servent ID • Push – request to servent ID to initiate connection, used to traverse firewalls • Ping, Queries are flooded • QueryHit, Pong, Push reverse path of previous message 14 Flooding: Old Gnutella Example Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;… A B C D E F m1 m2 m3 m4 m5 m6 E? E? E? E? E E E 15 Flooding: Gnutella, Kazaa • Modifies the Gnutella protocol into two-level hierarchy • Hybrid of Gnutella and Napster • Supernodes • Nodes that have better connection to Internet • Act as temporary indexing servers for other nodes • Help improve the stability of the network • Standard nodes • Connect to supernodes and report list of files • Allows slower nodes to participate • Search • Broadcast (Gnutella-style) search across supernodes • Disadvantages • Kept a centralized registration allowed for law suits 16 Overview • P2P Lookup Overview • Centralized/Flooded Lookups • Routed Lookups – Chord • Comparison of DHTs5 17 Routing: Structured Approaches • Goal: make sure that an item (file) identified is always found in a reasonable # of steps • Abstraction: a distributed hash-table (DHT) data structure • insert(id, item); • item = query(id); • Note: item can be anything: a data object, document, file, pointer to a file… • Proposals • CAN (ICIR/Berkeley) • Chord (MIT/Berkeley) • Pastry (Rice) • Tapestry (Berkeley) • … 18 Routing: Chord •Associate to each node and item a unique id in an uni-dimensional space •Properties •Routing table size O(log(N)) , where N is the total number of nodes •Guarantees that a file is found in O(log(N)) steps 19 Aside: Hashing • Advantages • Let nodes be numbered 1..m • Client uses a good hash function to map a URL to 1..m • Say hash (url) = x, so, client fetches content from node x • No duplication – not being fault tolerant. • One hop access • Any problems? • What happens if a node goes down? • What happens if a node comes back up? • What if different nodes have different views? 20 Robust hashing • Let 90 documents, node 1..9, node 10 which was dead is alive again • % of documents in the wrong node? • 10, 19-20, 28-30, 37-40, 46-50, 55-60, 64-70, 73-80, 82-90 • Disruption coefficient = • Unacceptable, use consistent hashing – idea behind Akamai!6 21 Consistent Hash • “view” = subset of all hash buckets that are visible • Desired features • Balanced – in any one view, load is equal across buckets • Smoothness – little impact on hash bucket contents when buckets are added/removed • Spread – small set of hash buckets that may hold an object regardless of views • Load – across all views # of objects assigned to hash bucket is small 22 Consistent Hash – Example • Smoothness addition of bucket does not cause much movement between existing buckets • Spread & Load small set of buckets that lie near object • Balance no
View Full Document