EECS 122: Introduction to Computer Networks
CDNs and Peer-to-Peer

Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776

Katz, Stoica F04

Today's Lecture: 18
[Figure: protocol stack (Application, Transport, Network (IP), Link, Physical) annotated with lecture numbers]

This Lecture
- This will be a "why" lecture, not a "how to" one
- Emphasis is on why these developments are important and where they fit into the broader picture
- TAs will fill in the technical details

Outline
- Motivation: information sharing
  - what's the role of peer-to-peer (P2P)?
- Centralized P2P networks
  - Napster
- Decentralized but unstructured P2P networks
  - Gnutella
- Decentralized but structured P2P networks
  - Distributed Hash Tables
- Implications for the Internet (speculative)

Information Sharing in the Internet
- The Internet contains a vast collection of information: documents, web pages, media, etc.
- One goal of the Internet is to make it easy to share this information
- There are many different ways this can be done

In the beginning, there was FTP
- People put files on a server and allowed anonymous FTP
  - does anyone here remember anonymous FTP?
- Only people who were explicitly told about the file would know to retrieve it
- But it was a painful command-line interface

The Early Web
- The early web was essentially a GUI for anon-ftp
  - URLs were easily distributed pointers to files
  - Browsers allowed one to easily retrieve files
- Web pages could contain pointers to other files
  - not all downloads were the result of being explicitly told
- But information sharing was still mostly explicitly arranged: someone sent you a URL and you bookmarked it

The Current Web
- Search engines changed the web (long before your time!)
- Now one can proactively find the desired information
  - not just wait for someone to tell you about it
- In the process, it became less important who was hosting the information
  - because they don't need to tell you
  - the nature of the content is all that matters now

Two Transitions
- From push to pull:
  - old: people would tell others about information (push)
  - new: people can find information via Google (pull)
- From hosts to servers:
  - anonymous FTP could run on anyone's desktop, then migrated to specialized servers
  - the web almost exclusively uses servers
  - popular sites have to use big server farms
- What about pull with hosts? That's peer-to-peer networking

Why Is Pull/Host Relevant?
- There are many pieces of content that:
  - are already widely replicated on many machines
  - people want, but don't know where it is
- Setting up a web site for all such content would:
  - attract a huge amount of traffic
  - require sizable investment in server farm and bandwidth
- If we could harness the hosts that already have the content, we wouldn't need a server farm
- But how do users know which host to contact?

Peer-to-Peer (P2P) Networking
- Aims to use the bandwidth and storage of the many hosts
  - sum of access line speeds and disk space
- But using this collection of machines effectively requires coordination on a massive scale
  - key challenge: who has the content you are looking for?
- Moreover, the hosts are very flaky
  - behind slow links
  - often connected only a few minutes
  - so the system must be very robust

Napster
- Centralized search engine:
  - all hosts with songs register them with central site
  - users do keyword search on site to find desired song
  - site then lists the hosts that have the song
  - user then downloads content
- What makes this work:
  - central site only has to handle searches (little bandwidth)
  - vast collection of hosts can supply huge aggregate bandwidth
  - system is self-scaling: more users means more resources

What Happened to Napster?
- Fastest growing Internet application ever
  - P2P traffic became, and remains, one of the biggest sources of traffic on the Internet
- But legal issues shut the site down
- Centralized system was vulnerable to legal attacks, and the system couldn't function without the central site
- Can one still do pull without a central site?
  - that's the hard question in peer-to-peer networking

Gnutella
- An example of an unstructured, decentralized P2P system
- Context:
  - many hosts join a system
  - each offers to share its own content
  - in return, each can make queries for others' content
- Goal: enable users to find desired content on other hosts

Basic Gnutella
- Step one: form an overlay network
  - each host, when it joins, connects to several existing Gnutella members
  - an overlay link is merely the fact that the nodes know each other's IP address, and thus can send each other packets

Unstructured Overlay
- Gnutella is unstructured in two senses:
  - Links between nodes are essentially random
  - The content of each node is random (at least from the perspective of Gnutella)
- Implications:
  - Can't route on Gnutella
  - Wouldn't know where to route even if you could

Querying in Gnutella
- Queries are typically keyword searches
- Each query is flooded within some scope
  - TTL is used to limit scope of flood
  - flooding means you don't need any routing infrastructure
- All responses to queries are forwarded back along the path the query came from
  - path marked with "breadcrumbs"
  - gives a degree of privacy to requester

Gnutella Performance
- Tradeoff:
  - if TTL is small, then searches won't find desired content
  - if TTL is large, network will get overloaded
- Either Gnutella overloads the network or it doesn't provide good search results

Gnutella Enhancements
- Supernodes:
  - normal nodes attach to supernodes, who search for them
  - only flood among well-connected supernodes
- Random walk rather than flooding
  - provides "correct" TTL automatically
- Proactive replication
  - replicate content that is frequently queried to make it easier to find

In Reality
- Gnutella works well enough (KaZaA, etc.)
- Why?
  - enhancements (supernodes)
  - query distribution
- Most downloads are for widely replicated content
- Gnutella is good at finding the hay
- But how would you find needles?

Finding Objects by Name
- Assume you know the name of an object (song title, file name, etc.)
- Assume that there is one copy of this object in the
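Looking back at the "Querying in Gnutella" and "Gnutella Performance" slides, the TTL-scoped flood and the breadcrumb return path can be sketched roughly as follows. This is a toy in-memory simulation under stated assumptions, not the Gnutella wire protocol: the names `flood_query`, `OVERLAY`, and `CONTENT` are hypothetical, the overlay is a hand-made six-node graph, and a "keyword search" is reduced to exact set membership.

```python
from collections import deque

# Hypothetical overlay: each host lists the peers whose address it knows.
OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}

# Hypothetical content: "hay" is widely replicated, "needle" has one copy.
CONTENT = {"B": {"hay"}, "C": {"hay"}, "E": {"hay"}, "F": {"needle"}}


def flood_query(neighbors, content, source, keyword, ttl):
    """Flood a query up to `ttl` hops from `source`.

    Returns (hits, messages): `hits` maps each matching host to the
    breadcrumb path its response follows back to the source, and
    `messages` counts query messages sent over overlay links.
    """
    # breadcrumb[n] is the peer from which n first received the query.
    breadcrumb = {source: None}
    frontier = deque([(source, ttl)])
    hits = {}
    messages = 0
    while frontier:
        node, remaining = frontier.popleft()
        if node != source and keyword in content.get(node, ()):
            # The response retraces the breadcrumbs hop by hop, so the
            # responder never needs to learn who issued the query.
            path = [node]
            while breadcrumb[path[-1]] is not None:
                path.append(breadcrumb[path[-1]])
            hits[node] = path
        if remaining == 0:
            continue  # TTL expired: do not forward any further
        for peer in neighbors[node]:
            if peer == breadcrumb[node]:
                continue  # do not send the query back where it came from
            messages += 1
            if peer not in breadcrumb:  # duplicate copies are dropped
                breadcrumb[peer] = node
                frontier.append((peer, remaining - 1))
    return hits, messages


if __name__ == "__main__":
    for keyword, ttl in [("hay", 1), ("needle", 1), ("needle", 3)]:
        print(keyword, ttl, flood_query(OVERLAY, CONTENT, "A", keyword, ttl))
```

On this toy overlay, a TTL of 1 already finds two copies of the widely replicated "hay" with 2 messages, but misses the single "needle"; raising the TTL to 3 finds it at the cost of 8 messages. That is the small-TTL versus large-TTL tradeoff from the slides in miniature.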