Peer-to-Peer FilesystemsAnnouncementsGoals for TodayNature of P2P SystemsSlide 5Slide 6Case study: NapsterNapster ArchitectureNapster ProtocolSlide 10Slide 11Slide 12Napster DiscussionOther P2P File Sharing SystemsOverlaysOverlays: UnstructuredSlide 17Overlays: StructuredDistributed Hash TablesSlide 20Slide 21Applications of DHTsPastry: Node statePastry: Node JoinsSlide 25Pastry: RoutingHow Does Routing/Lookup Work?PAST: Pastry FilesystemPAST: File ReplicationPAST: TradeoffsRationale and ValidationPAST: comparsion to CFSPAST: comparison to CFSDHT Deployment TodayWhy so many DHT implementations?DHT Deployment Tomorrow?IssuesConclusionsReferencesSlide 40Peer-to-Peer FilesystemsSlides originally created by Tom RoederAnnouncementsWelcome back from Spring break!We are in the final stretchHomework 4 and Project 4 design doc this weekHomework 5 and 6 due in AprilProject 5 and 6 due in April and beginning of MayPrelim II, Thursday April 26th???Final, Wednesday May 17thGoals for TodayWhat is P2P?P2P File Sharing SystemsCase study: NapsterOverlaysStructuredUnstructuredDistributed hash tables (DHTs)Hash table ontop of structured overlayCase studies: PAST, CFS, OpenDHTContent-Addressable Storage (CAS)Nature of P2P SystemsP2P: communicating peers in the systemnormally an overlay in the networkHot topic because of Napster, Gnutella, etcIn some sense, P2P is older than the namemany protocols used symmetric interactionsnot everything is client-serverWhat’s the real definition?no-one has a good one, yetdepends on what you want to fit in the classNature of P2P SystemsStandard definitionsymmetric interactions between peersno distinguished serverMinimally: is the Web a P2P system?We don’t want to say that it isbut it is, under this definitionI can always run a server if I want: no asymmteryThere must be more structure than thisLet’s try againNature of P2P SystemsRecent definitionNo distinguished initial stateEach server has the same codeservers cooperate to handle requestsclients don’t matter: servers are the P2P systemTry again: is the Web P2P?No, not under this def: servers don’t interactIs the Google server farm P2P?Depends on how it’s set up? Probably not.Case study: NapsterFile Sharing service created by Shawn Fanning in 1999Flat FS: single-level FS with no hierarchyMultiple files can have the same nameAll storage done at edges:Hosts export set of files stored locallyHost is registered with centralized directoryUses keepalive messages to check for connectivityCentralized directory notified of file names exported by the hostFile lookup: client sends request to central directoryDirectory server sends 100 files matching the request to clientClient pings each host, computes RTT and displays resultsClient transfers files from the closest hostFile transfers are peer-to-peer; central directory not partNapster ArchitectureNetworkFirewallIP Sprayer/RedirectorNapsterDirectoryServer 1NapsterDirectoryServer 2NapsterDirectoryServer 3Napster.comH1H2H3Napster ProtocolNetworkFirewallIP Sprayer/RedirectorNapsterDirectoryServer 1NapsterDirectoryServer 2NapsterDirectoryServer 3Napster.comH1H2H3I have “metallica / enter sandman”Napster ProtocolNetworkFirewallIP Sprayer/RedirectorNapsterDirectoryServer 1NapsterDirectoryServer 2NapsterDirectoryServer 3Napster.com“who has metallica ?”“check H1, H2”H1H2H3I have “metallica / enter sandman”Napster ProtocolNetworkFirewallIP Sprayer/RedirectorNapsterDirectoryServer 1NapsterDirectoryServer 2NapsterDirectoryServer 3Napster.com“who has metallica ?”“check H1, H2”pingpingH1H2H3I have “metallica / enter sandman”Napster ProtocolNetworkFirewallIP Sprayer/RedirectorNapsterDirectoryServer 1NapsterDirectoryServer 2NapsterDirectoryServer 3Napster.com“who has metallica ?”“check H1, H2”pingpingH1H2H3transferI have “metallica / enter sandman”Napster DiscussionIssues:Centralized file location directoryLoad balancingRelies on keepalive messagesScalability an issue!Success: ability to create and foster an online communityBuilt in ethicsBuilt in faultsCommunication mediumHad upto 40 million users in June 2000!May actually be lower, 26.4 million by February 2001Ultimately found to be illegal serviceOther P2P File Sharing SystemsNapster has a central database! Removing it will make regulating file transfers harderFreenet, gnutella, kazaa … all are decentralizedFreenet: anonymous, files encryptedSo not known which files stored locally, which file searchedKazaa: allows parallel downloads(Bit)Torrents for faster downloadLegality. Are there any good legal uses for P2P systems?OverlaysP2P systems possess some degree of self-organization each node finds its peers and helps maintain the system structureTwo types of overlays connect most P2P systemsUnstructuredNo infrastructure set up for routingRandom walks, flood searchStructuredSmall World Phenomenon: KleinbergSet up enough structure to get fast routingWe will see O(log n)For special tasks, can get O(1)Overlays: UnstructuredFrom Gribblea common unstructured overlaylook at connectivitymore structure than it seems at firstOverlays: UnstructuredGossip: state synchronization techniqueInstead of forced flooding, share stateDo so infrequently with one neighbor at a timeOriginal insight from epidemic theoryConvergence of state is reasonably fastwith high probability for almost all nodesgood probabilistic guaranteesTrivial to implementSaves bandwidth and energy consumptionOverlays: StructuredNeed to build up long distance pointersthink of routing within levels of a namespaceeg. namespace is 10 digit numbers base 40112032101then you can hop levels to find other nodesThis is the most common structure imposedDistributed Hash TablesOne way to do this structured routingAssign each node each node an id from spaceeg. 128 bits: SHA-1 salted hash of IP addressbuild up a ring: circular hashingassign nodes into this spaceValuediversity of neighborseven coverage of spaceless chance of attack?Distributed Hash TablesWhy “hash tables”?Stored named objects by hash codeRoute the object to the nearest location in spacekey idea: nodes and objects share id
View Full Document