Parallel and distributed databases II

Some interesting recent systems
- MapReduce
- Dynamo
- Peer-to-peer

Then and now
A modern search engine

MapReduce

How do I write a massively parallel, data-intensive program?
- Develop the algorithm
- Write the code to distribute work to machines
- Write the code to distribute data among machines
- Write the code to retry failed work units
- Write the code to redistribute data for a second stage of processing
- Write the code to start the second stage after the first finishes
- Write the code to store intermediate result data
- Write the code to reliably store final result data

MapReduce: two phases
- Map: take input data and map it to zero or more key/value pairs
- Reduce: take key/value pairs with the same key and reduce them to a result
- The MapReduce framework takes care of the rest: partitioning data, repartitioning data, handling failures, tracking completion, ...
(Figures: MapReduce data flow; a failed worker, marked X, is retried by the framework.)

Example: count the number of times each word appears on the web.
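In a single process, the two phases of the word-count example can be sketched as follows. This is a minimal illustration only; a real MapReduce framework distributes these same steps across machines and handles the partitioning and retries itself:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word occurrence."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Group pairs by key -- the framework's repartitioning step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return (key, sum(values))

docs = ["apple banana apple", "grape apple apple grape"]
result = dict(reduce_phase(k, vs) for k, vs in shuffle(map_phase(docs)).items())
print(result)  # {'apple': 4, 'banana': 1, 'grape': 2}
```

Note that the word-count logic itself is only a few lines; everything else (grouping, distribution, fault handling) is what the framework takes off the programmer's hands.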
(Figure: the map phase emits a pair such as (apple, 1) for every word occurrence; the shuffle groups pairs by key; the reduce phase sums each group, yielding totals such as (apple, 5), (grape, 5), (banana, 5).)

Other MapReduce uses
- Grep
- Sort
- Analyze the web graph
- Build inverted indexes
- Analyze access logs
- Document clustering
- Machine learning

Dynamo
- Always-writable data store
- Do I need ACID for this?

Eventual consistency
- Weak consistency guarantee for replicated data
- Updates initiated at any replica
- Updates eventually reach every replica
- If updates cease, eventually all replicas will have the same state
- Tentative versus stable writes: tentative writes are applied in a per-server partial order; stable writes are applied in the global commit order
- Bayou system at PARC
(Figure: inconsistent copies are visible at the storage units, but local write order is preserved and a global commit order is eventually established.)

(Figure: starting from (Joe, 22, Arizona), the updates 22 -> 32, Arizona -> Montana, and Joe -> Bob reach the replicas in different orders, yet all replicas end up in the same state, (Bob, 32, Montana).)

Mechanisms
- Epidemics/rumor mongering: updates are "gossiped" to random sites; gossip slows when (almost) every site has heard it
- Anti-entropy: pairwise sync of whole replicas, via log-shipping and occasional DB snapshots; ensures everyone has heard all updates
- Primary copy: one replica determines the final commit order of updates

Epidemic replication
(Figure: nodes are susceptible (no update), infective (spreading the update), or removed (updated, no longer spreading); an update gossiped to an already-infected node is wasted.)

Anti-entropy
(Figure: pairwise sync delivers the update even to susceptible nodes that gossip alone would miss.)

What if I get a conflict?
How to detect it? Keep a version vector: (A's count, B's count).
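The dominance test on version vectors can be sketched as follows. This is a minimal two-replica illustration; the function names are mine, not from any particular system:

```python
def dominates(v1, v2):
    """v1 dominates v2 if every component of v1 is >= the matching component of v2."""
    return all(a >= b for a, b in zip(v1, v2))

def check(v_a, v_b):
    """Two versions conflict exactly when neither vector dominates the other."""
    if dominates(v_a, v_b) or dominates(v_b, v_a):
        return "no conflict"
    return "conflict"

# Only A has written: A's vector (1, 0) dominates B's (0, 0).
print(check((1, 0), (0, 0)))  # no conflict
# A and B wrote concurrently: (1, 0) vs (0, 1) -- neither dominates.
print(check((1, 0), (0, 1)))  # conflict
```

The same pairwise comparison generalizes to one counter per replica: a write increments the local counter, and concurrent writes always produce incomparable vectors.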
(Figure: replica A writes Brian = 'Good Professor' while replica B writes Brian = 'Bad Professor'.)
- Initially the vector is (0,0) at both replicas
- If only A writes, it sets its version vector to (1,0); (1,0) dominates B's version (0,0), so there is no conflict
- If A writes, setting its vector to (1,0), and B also writes, setting its vector to (0,1), neither vector dominates the other: conflict!

How to resolve conflicts?
- Commutative operations: allow both. Adding "Fight Club" to the shopping cart and adding "Legends of the Fall" to the shopping cart can occur in either order
- Thomas write rule: take the last update; that's the one we "meant" to have stick
- Let the application cope with it: expose the possible alternatives to the application, which must write back one answer

Peer-to-peer
- Great technology
- Shady business model
- Focus on the technology for now

Peer-to-peer origins
Where can I find songs for download?
(Figures: a centralized web interface answers the query; Napster keeps a central index but transfers files peer-to-peer; Gnutella floods the query among the peers themselves.)

Characteristics
- Peers both generate and process messages: server + client = "servent"
- Massively parallel
- Distributed
- Data-centric: route queries and data, not packets

Gnutella

Joining the network
(Figure: a new node's ping messages are flooded through the network; each reachable node replies with a pong.)

Search
(Figure: a query Q is flooded to neighbors with TTL = 4.)

Download
(Figure: the file is then transferred directly between the two peers.)

Failures
(Figure: a failed node simply stops forwarding; the flood routes around it.)

Scalability!
- Messages flood the network
- Example: the Gnutella meltdown of 2000

How to make it more scalable?
- Search more intelligently
- Replicate information
- Reorganize the topology

Smarter search strategies:
- Iterative deepening (Yang and Garcia-Molina 2002; Lv et al 2002)
- Directed breadth-first search (Yang and Garcia-Molina 2002)
- Random walk (Adamic et al 2001)
- Random walk with replication (Cohen and Shenker 2002; Lv et al 2002)
- Supernodes, as in Kazaa (Yang and Garcia-Molina 2003)

Some interesting observations
- Most peers are short-lived: average up-time is 60 minutes; for a 100K-node network, this implies a churn rate of 1,600 nodes per minute (Saroiu et al 2002)
- Most peers are "freeloaders": 70 percent of peers share no files, and most results come from 1 percent of peers (Adar and Huberman 2000)
- The network tends toward a power-law topology, in which the nth most connected peer has k/n connections: a few peers have many connections, while most peers have few (Ripeanu and Foster 2002)

"Structured" networks
- Idea: form a structured topology that gives certain performance guarantees
- Number of hops needed for
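The Gnutella-style flooded search with a TTL described above, including the duplicate-dropping that peers perform, can be sketched as follows. The topology, peer names, and file names here are hypothetical:

```python
from collections import deque

def flood_search(neighbors, files, start, wanted, ttl):
    """Flood a query from `start` with a hop limit (TTL).
    `neighbors` maps peer -> list of peers; `files` maps peer -> set of files.
    Returns the peers that answered the query, in the order reached."""
    hits = []
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, ttl_left = queue.popleft()
        if wanted in files.get(peer, set()):
            hits.append(peer)              # peer sends a query-hit back
        if ttl_left == 0:
            continue                       # TTL exhausted: do not forward
        for nbr in neighbors.get(peer, []):
            if nbr not in seen:            # duplicate queries are dropped
                seen.add(nbr)
                queue.append((nbr, ttl_left - 1))
    return hits

# Hypothetical 5-peer network: A-B, A-C, B-D, D-E.
net = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
shared = {"D": {"song.mp3"}, "E": {"song.mp3"}}
print(flood_search(net, shared, "A", "song.mp3", ttl=2))  # ['D']
print(flood_search(net, shared, "A", "song.mp3", ttl=3))  # ['D', 'E']
```

The two runs show why TTL matters: with TTL 2 the query dies one hop short of E, while raising the TTL finds more copies at the cost of flooding more of the network — the trade-off behind the scalability problems above.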