Berkeley COMPSCI 252 - Distributed Cluster Repair for OceanStore

Distributed Cluster Repair for OceanStore
Irena Nadjakova and Arindam Chakrabarti
Acknowledgements: Hakim Weatherspoon, John Kubiatowicz
Dec 9, 2003

Outline
•OceanStore Overview
•Where our project fits in
•The internet
•Choosing locations for storing a fragment
•Clustering
•OceanStore solution
•Cluster creation
•Distributed clustering
•Initial idea
•Initial idea (cont)
•Evaluation
•Simulating Network Evolution Dynamics
•NS Algo 1: Sanity Check 1
•NS Algo 2: Acid Test 1
•NS Algo 3: Acid Test 2
•NS Algo 4: Acid Test 3
•NS Algo 5: Acid Test 4
•Still problems
•How to fix this?
•Under development…
•How to improve
•Summary of achievements
•Thanks for listening!

OceanStore Overview
Data Storage Utility
•Robustness
•Security
•Durability
•High availability
•Global-scale

Where our project fits in
Durability
•Automatic version management
•Highly redundant erasure coding
•Massive dissemination of fragments on machines with highly uncorrelated availability

The internet
[figure slide]

Choosing locations for storing a fragment
[three figure slides]

Clustering
[two figure slides]

OceanStore solution
•The availability of each machine is tracked over time.
•Machines that have very little availability are not used for fragment storage.
•The distance between each pair of machines is computed (high mutual information ⇒ close).
•The machines are clustered into chunks based on this distance using normalized cuts.
•All the computation is done on one central computer (the Cluster Server).

OceanStore solution
•Machines that are highly correlated in availability are in the same cluster.
•Machines in separate clusters have low correlation in availability.
•When a node needs to store replica fragments, it requests cluster information from the cluster server and uses it to send each fragment to k nodes: one from each of k different clusters.

Cluster creation
•Needs centralized computation.
•Can we do it in a distributed manner?
•NCuts is one stumbling block. It seems to need the entire graph.
•Having to pull the cluster info from one central cluster server is a single point of failure.
•Can we have a “Distributed NCuts” algorithm that looks at subgraphs? How do we make the subgraphs? Do we need to know the entire graph to decide how to divide it into pieces?

Distributed clustering
[five figure slides]

Initial idea
•We run the centralized algorithm once over some time period (we chose 73 days) to generate an initial clustering (expensive!).
•We distribute the machines among some f cluster servers.
–Each has a smaller subset, of size num, of the initial machines.
–Keeping the initial clustering proportions for each node.
–Each machine occurs in an approximately equal number of cluster servers.

Initial idea (cont)
•Now we can afford to recluster the machines on each server frequently to keep up with network changes.
–We chose to do it once every 30 days for simulation purposes, but it can easily be done a lot more often.

Evaluation
•To see how well this does, we want to compare it with the original global algorithm, run over the same time period.
•Metric: the average mutual information, $I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$
–The average MI for a single server is just the average of the mutual information between pairs of machines in different clusters on that server.
–On multiple servers, we compute the above on every server, then average across servers.

Simulating Network Evolution Dynamics
•We have availability data for 1000 machines for a period of 73 days.
•We use it to simulate the behavior of a network with 1000 machines over a period of 730 days = 2 years.
•We simulate networks with varying evolution characteristics to evaluate the robustness of our distributed cluster repair algorithm.

Simulating Network Evolution Dynamics
Qualities of a good network:
•Maybe server availability (AV) should not vary drastically in the future?
•Maybe average server repair time (MTTR) should not vary drastically?
•Maybe mean time to failure (MTTF) should not vary drastically?
•Maybe failure correlations (FCOR) should also not vary drastically?

NS Algo 1: Sanity Check 1
Global déjà vu
•Maintains AV, MTTF, MTTR, FCOR.
•Simulates a well-behaved network.
•Our distributed update algorithm should do very well on this.

NS Algo 2: Acid Test 1
Local déjà vu
•Maintains AV, MTTF, MTTR, but not FCOR.

NS Algo 3: Acid Test 2
Births and Deaths
•Maintains AV, MTTF, MTTR, and FCOR, but only for some nodes, and for some time.
•Nodes are taken off the network (die) or added to it (are born) at certain times. While they are actually on the network, they maintain their AV, MTTF, MTTR, FCOR.

NS Algo 4: Acid Test 3
Noisy Global déjà vu
•Maintains AV, MTTF, MTTR, FCOR to a large extent, but adds some Gaussian noise, representing the variations that may be observed in a real network.

NS Algo 5: Acid Test 4
Noisy Local déjà vu
•Maintains AV, MTTF,
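
As an illustration of the evaluation metric described on the Evaluation slide, here is a minimal sketch of the average cross-cluster mutual information for one cluster server. It is not the authors' code: it assumes each machine's availability is a 0/1 trace sampled at the same instants, uses base-2 logarithms (the slides do not state the base), and the names `mutual_information`, `average_cross_cluster_mi`, `traces`, and `cluster_of` are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(a, b):
    """Mutual information, in bits, between two equal-length 0/1 traces.

    Probabilities are estimated as empirical frequencies (an assumption).
    """
    n = len(a)
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    pa = {v: a.count(v) / n for v in set(a)}
    pb = {v: b.count(v) / n for v in set(b)}
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Only observed (x, y) pairs are summed, so pxy is never zero here.
        mi += pxy * math.log2(pxy / (pa[x] * pb[y]))
    return mi


def average_cross_cluster_mi(traces, cluster_of):
    """Average MI over pairs of machines that sit in different clusters."""
    pairs = [
        (i, j)
        for i, j in combinations(sorted(traces), 2)
        if cluster_of[i] != cluster_of[j]
    ]
    if not pairs:
        return 0.0
    total = sum(mutual_information(traces[i], traces[j]) for i, j in pairs)
    return total / len(pairs)
```

A lower value suggests that machines placed in different clusters fail more independently. For the multi-server case described on the slide, this function would be called once per server and the results averaged.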

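
The fragment placement rule from the OceanStore solution slide (send each fragment to k nodes, one from each of k different clusters) can be sketched as follows. This is an illustration under stated assumptions rather than OceanStore's actual placement code: the slides do not say how clusters or nodes within a cluster are chosen, so uniform random choice is assumed, and `pick_fragment_holders` and the `clusters` mapping are hypothetical names.

```python
import random


def pick_fragment_holders(clusters, k, rng=None):
    """Pick k nodes for one fragment, each from a different cluster.

    clusters: dict mapping a cluster id to a non-empty list of node names.
    Random selection is an assumption; the slides only require that the
    k nodes come from k distinct clusters.
    """
    rng = rng or random.Random()
    if k > len(clusters):
        raise ValueError("need at least k clusters to place k copies")
    chosen = rng.sample(sorted(clusters), k)  # k distinct cluster ids
    return [rng.choice(clusters[c]) for c in chosen]
```

Because clusters group machines whose availability is highly correlated, drawing the k holders from k different clusters aims to keep their failures as uncorrelated as the clustering allows.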
