Unformatted text preview:

Adaptive Peer Selection and Ivy A Read Write Peer to Peer File System Lidan Wang April 19 2007 4 18 2007 Peer Selection Issues Data is replicated among peers but how to decide from which peer to download Current techniques not general enough Some hosts are not likely to be encountered more than once so policy based on prior experience with specific hosts will not apply This paper applies machine learning to solve the problem of peer selection 4 18 2007 Adaptive Peer Selection Problem statement given a set of peers who have the desired file switch among them and finally settle on an optimal one in terms of total time Two stages for solving this problem 4 18 2007 1 Rate each peer on the list in terms of expected transfer time Decision tree is used to do this 2 Perform partial downloads from the most promising peers and settle on a final peer Markov decision process MDP is a framework which is used to decide how to perform partial downloads Stage 1 Decision Tree Two steps to get a decision tree Collect training data Train a decision tree that fits the training data Training data in our setting Data collected separately for each client Peer search response message 4 18 2007 Speed field connection speed Busy flag if upload slots are full Uploaded flag successfully uploaded at least one file Firewall flag if peer is firewalled Etc Stage 1 Decision Tree contin Taken from Adaptive Peer Selection Bernstein et al Input a set of attributes that describes a potential peer to get the file from Output whether the download from this potential peer will be fast or slow 4 18 2007 Use Decision Tree to Rate Peers Each leaf is associated with a subset of training instances If a leaf does not have many instances associated with it mark it as uncertain It can not provide reliable information If a leaf s entropy is too large i e E 0 918 too much uncertainty Mark it uncertain For each of the remaining leaves if majority of instances are above the median speed and entropy 0 65 mark it likely fast If below the median speed and entropy 0 65 mark it likely slow For each of the remaining leaves if majority of instances are above the median speed and entropy 0 65 mark it very likely fast If below the median speed and entropy 0 65 mark it very likely slow 4 18 2007 Attribute Download Speed vs Rating Taken from Adaptive Peer Selection Bernstein et al The download speed attribute of a peer appears to be highly correlated with the peer s rating 4 18 2007 Stage 2 Peer Selection After stage 1 each peer is now associated with a rating Stage 2 deals with how to perform partial file downloads and finally settle on a peer need a policy for this Policy when the client should abort the current download and start over with the next peer on the list Markov decision process MDP can be used to derive this policy Each client just solves the MDP to get a selection policy that is optimal with respect to its MDP model 4 18 2007 Markov Decision Processes for Peer Selection Problem formulation given a set of states S a set of actions A a transition function T and a cost function C find an optimal policy for reaching the end goal state so the expected sum of future costs total time to receive a file upon executing a policy from any given state is minimized Define S A T C in our context and the problem is solved with dynamic programming State S Pre connecting states P B VLS LS U LF VLF Connecting states N B VLS LS U LF VLF x 0 5 1 0 1 5 2 0 2 5 3 0 Numeric values indicate time spent on connecting so far Max is 3 0 seconds Downloading states D B VLS LS U LF VLF x 1 0 2 0 3 0 x 0 1 1 2 2 4 4 8 8 16 16 32 32 64 64 128 128 infinity First set of numeric values indicates time spent downloading the second set of numeric values indicates the average speed so far 4 18 2007 MDP contin Action A continue with the downloading or start over with a new peer Chosen according to a policy Transition T the probability of transitioning from state s to s These probabilities are obtained through training Cost C time For transitions into a pre connecting state 0 cost into connection and download states 0 5 and 1 0 respectively Now the MDP problem is well defined and it can be solved for an optimal policy for peer selection 4 18 2007 The Resulting Policies The clients policies decide whether to continue or abort based on the state of the current download in order to minimize the total time to receive the file Each client s MDP is solved Their resulting policies are similar Most of the times the policy aborts if no connection has been made in 0 5 seconds or if connection speed is too low 4 18 2007 Ivy an Overview A distributed multi user read write peer to peer file system Single file system image like NFS only when the underlying network is fully connected No dedicated server All data and meta data is stored in the DHash a peerto peer storage system Logs are used to re construct file content Each participant gets data by consulting all logs but can only modify its own log records Hence no locking is necessary Logs are stored in the DHash distributed hash table One log per participant DHash can distribute and replicate blocks Improves availability 4 18 2007 Ivy an Overview contin Ivy prevents attacks from non participants and corrupted servers by the following means Encrypts the data log records stored in DHash An Ivy user has the freedom to choose what logs it wants to read from Hence ignore the logs of untrustworthy participants If network is partitioned DHash replication lets Ivy participants modify files in multiple partitions Logs contain version vectors that can be used to detect conflicting updates after partitions merge Performance is within a factor of 2 to 3 of NFS Bottleneck network latency and cost of generating digital signatures on log records stored in DHash 4 18 2007 DHash and Logs Log a list of immutable log records Contains all of one participant s changes to file system data and meta data Each participants can only append to its own log but can read from all logs Log head mutable a mutable log record that points to the most recent log record of the participant it stores the content hash key of the most recent log record DHash all the data is stored in DHash A block is a key log record content pair Ivy participants communicate through DHash Integrity of Dhash content is ensured by Content hash block immutable i e log records except log heads block s key is a cryptographic hash of the block s value Public key …


View Full Document

UMD CMSC 818S - Adaptive Peer Selection

Loading Unlocking...
Login

Join to view Adaptive Peer Selection and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Adaptive Peer Selection and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?