DOC PREVIEW
CORNELL CS 514 - Study Notes

This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS514: Intermediate Course in Computer SystemsLecture 25: Nov 17, 2003“Peer-to-peer protocols for file and data replication: file sharing”CS514P2P file sharing| File sharing dominates traffic usage for University of Washingtonz Recent study, presented recently at Cornell by Hank Levyz Kazaa| This is a recent sea change, so P2P phenomenon worth looking at2CS514File sharing is nothing new| Has been going on over IRC (Internet Relay Chat) for yearsz Chat groups focusing on certain artists or genres| Upload servers were popular for a whilez Users would upload (FTP) a songz This allowed them to download N songsz Performance typically suckedz The record industry shut these down| In all the above it took user effort to find what was wantedCS514Napster changed everything| Central search engine, but peer-to-peer file transfer| 160+ search engines at peekz User attaches to one of themz Engine would index collections of its own active usersz Search on a given engine returned results from that enginez Unless not enough results, then would ask other engines• This is what I understand from Saroiu et.al. U-Wash measurement paper| Peer-to-peer file transfer improved scalability3CS514Napster problem| As we all know, the problem with Napster is that it is a single point of litigation| Gnutella designed as a lawyer-resilient Napsterz Nothing more, nothing less (in my opinion)CS514In the meanwhile…| Ian Clarke (Edinburgh, 1999*) was thinking about P2P from an anonymity perspectivez He was interested in free speechz Whistleblowers, political dissent, etc.| Ian designed Freenet as a class project| Freenet is not so much file sharing as it is a publishing medium* Before Gnutella4CS514Freenet, Gnutella, and DHTs: One out of three…GnutellaFreenetDHTAnonymityKeyword searchScalabilityCS514Get-em-out-quick projects| Both Gnutella and Freenet were quick-and-dirty prototypesz Neither really worked through all the issues| As a result, both are deeply flawed| Nevertheless, both captured the popular imagination, and so are worth talking aboutz And both have spawned new workz For instance, FastTrack (Kazaa, Grokster …) spawned from Gnutella5CS514Lets look at these aspects| System Semantics| Bootstrap| Neighbor discovery and selection| Network (neighbor) maintenance| File insertion and storage| File search and retrieval| File deletionCS514System Semantics| DHT:z Hash Table (general purpose)| Gnutella:z Keyword search (with wildcard) of file name and metadata| Freenetz Hash of file contentsz Hash of file-owner public key plus file name• Allows for owner-write, anyone-read “subspaces”6CS514Bootstrap| DHT: z Have to know at least one active member| Gnutella: z Have to know at least one active member| Freenet: z Have to know at least one active member| Scaling issue for everybody| But fundamental operational issue for Gnutella (which cannot have any central point of control)CS514Bootstrap| Various possible ways to know a current memberz Email from friends, web site with list of addresses, rendezvous server with active knowledge of P2P networkz Pastry folks suggest using a universal DHT to bootstrap other DHTs| Gnutella uses rendezvous server approach, called “pong server” or “host cache”!!!z Host cache lists are distributed with software• They may change, so also listed on various websites, forums, etc.z Single point of litigation!z Scaling issue also7CS514Neighbor discovery and selection| DHT:z Search network, download neighbor tables, adjust as needed| Gnutella:z “Pong” message truncated flooded through networkz “Ping” messages returned from each node via reverse path of pong| Freenet:z Not sure about initial discovery and selectionCS514Network (neighbor) maintenance| DHT:z Nodes ping each other, do repair when neighbor lost (or in background)| Gnutella:z Nodes only need to have enough active neighbors … no structure to maintain| Freenet:z Loose structure…learns of neighbors near itself in the node ID space over time through process of insertion and search8CS514File insertion and storage| DHT:z File (or file pointer) stored at hashed nodez may be replicated or cached| Gnutella:z File stored at owner node| Freenet:z File stored in “vicinity” of hashed nodez may be cachedCS514File search and retrieval| DHT:z Search for hashed node or replicaz Opportunistically may find cache| Gnutella:z Truncated flood of search query| Freenet:z “Soft” search based on “steepest-ascent hill-climbing”9CS514File deletion| DHT:z Send delete message to hashed node| Gnutella:z Delete from local directory| Freenet:z LRU replacement policyz No explicit deletez (No update either, because don’t know how to flush old copies)CS514Freenet structure| Every node gets a GUIDz Globally Unique IDz Same hash space as filesz Assigned by some cryptographic distributed random number generation protocol run among nodes discovered with a truncated random walk• This is supposed to prevent a newcomer from making up its own GUID, though seems easy to break to me…10CS514Freenet routing trains itself| Searches route through network, and answer follows reverse path of searchz Same search used for both insert and find| From these answers, every node learns, over time, some keys that other nodes have| When routing a search, pick node with key (or node?) ID closest to the key being searched| If search reaches a dead-end or loops, back track and try next best node, etc.CS514Freenet routing trains itself| Idea is this:z Files with certain keys will tend to exist on nodes with nearby GUIDz Nodes will tend to learn of other nodes with nearby keys and GUIDsz These two trends reinforce each otherz Searches get more and more efficient11CS514How Freenet supports anonymity| All messages are encrypted (hop-by-hop)| Source of search (find and insert) is hiddenz Each node remembers previous hop in chain back to source| Originator of file does not store file in network| Nodes in search path occasionally (randomly) claim to be the holder of a file| Search can start with initial random walk to hide “location” of searcher (partly deducible through TTL value)CS514Freenet scalability: Fits curve of N0.28That’s pretty good! But . . .(simulation, no deletes)12CS514Distribution of node degreeArtificial max node degree of 250CS514Distribution of node degree| Reinforcing nature of learning tends to concentrate knowledge on a small percentage of nodes| Freenet inventors consider this a feature, but . . .z “Small world


View Full Document

CORNELL CS 514 - Study Notes

Documents in this Course
LECTURE

LECTURE

29 pages

LECTURE

LECTURE

28 pages

Load more
Download Study Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?