Concurrent Algorithms

Summing the elements of an array

A reduction tree sums the eight leaf values in pairs, level by level:

        7    3   15   10   13   18    6    4
          10       25       31       10
               35                41
                        76

Parallel sum and parallel prefix sum

- It's relatively easy to see how to sum the elements of an array in a parallel fashion.
- This is a special case of a reduce operation: combining a number of values into a single value.
- It's harder to see how to do a prefix (cumulative) sum, for example turning the list [3, 1, 4, 1, 6] into [3, 4, 8, 9, 15].
- This is a special case of what is sometimes called a scan operation.
- An example is shown on the next slide. The algorithm makes two passes: the first pass goes "up" the tree, retaining the summands; the second pass goes "down" the tree.
- Note: these two examples are from Principles of Parallel Programming by Calvin Lin and Lawrence Snyder.

Summing the elements of an array (prefix sum)

Up pass (pairwise sums, retaining the summands):

    10 = 7 + 3     25 = 15 + 10    31 = 13 + 18    10 = 6 + 4
    35 = 10 + 25   41 = 31 + 10
    76 = 35 + 41

Down pass (the root receives 0; each node passes its incoming value to its left child, and its incoming value plus its left child's summand to its right child):

    0
    0              35 (0 + 35)
    0   10 (0+10)  35   66 (35 + 31)
    0   7   10  25  35  48  66  72

Adding each element's own value to its incoming value gives the prefix sums:

    7   10  25  35  48  66  72  76

Batcher's Bitonic sort

- Batcher's bitonic sort is a sorting algorithm with the following characteristics:
  - It's a variation of mergesort.
  - It's designed for 2^n processors.
  - It fully occupies all 2^n processors, unlike array sum, which uses fewer processors on each pass.
- I'm not going to go through this algorithm; I just want you to be able to say you've heard of it.

MapReduce

- MapReduce is a technique, patented by Google, for dealing with huge data sets on clusters of computers.
- From Wikipedia:
  "Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure.
The worker node processes that smaller problem, and passes the answer back to its master node.
  "Reduce" step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output, the answer to the problem it was originally trying to solve.
- Hadoop is a free Apache implementation of MapReduce.

Basic idea of MapReduce

- In MapReduce, the programmer has to write only two functions; the framework takes care of everything else.
- The Map function is applied (in parallel) to each item of data, producing a list of key-value pairs.
- The framework collects all the lists and groups the key-value pairs by key.
- The Reduce function is applied (in parallel) to each group, returning either a single value or nothing.
- The framework collects all the returned values.

Example: Counting words (Python)

The following Python program counts how many times each word occurs in a set of data, and returns the list of words and their counts.

    def mapper(key, value):
        words = key.split()
        for word in words:
            Wmr.emit(word, '1')

    def reducer(key, iter):
        sum = 0
        for s in iter:
            sum = sum + int(s)
        Wmr.emit(key, str(sum))

Example: Counting words (Java)

    /* Mapper for word count */
    class Mapper {
        public void mapper(String key, String value) {
            String[] words = key.split(" ");
            for (int i = 0; i < words.length; i++)
                Wmr.emit(words[i], "1");
        }
    }

    /* Reducer for word count */
    class Reducer {
        public void reducer(String key, WmrIterator iter) {
            int sum = 0;
            while (iter.hasNext()) {
                sum += Integer.parseInt(iter.next());
            }
            Wmr.emit(key, Integer.toString(sum));
        }
    }

Example: Average movie ratings

    #!/usr/bin/env python
    def mapper(key, value):
        avgRating = float(value)
        binRating = 0.0
        if 0 < avgRating < 1.25:
            binRating = 1.0
        elif 1.25 <= avgRating < 1.75:
            binRating = 1.5
        elif 1.75 <= avgRating < 2.25:
            binRating = 2.0
        elif 2.25 <= avgRating < 2.75:
            binRating = 2.5
        elif 2.75 <= avgRating < 3.25:
            binRating = 3.0
        elif 3.25 <= avgRating < 3.75:
            binRating = 3.5
        elif 3.75 <= avgRating < 4.25:
            binRating = 4.0
        elif 4.25 <= avgRating < 4.75:
            binRating = 4.5
        elif 4.75 <= avgRating <= 5.0:  # <= 5.0, so a perfect rating is not binned as 99.0
            binRating = 5.0
        else:
            binRating = 99.0
        Wmr.emit(str(binRating), key)

    #!/usr/bin/env python
    def reducer(key, iter):
        count = 0
        for s in iter:
            count = count + 1
        Wmr.emit(key, str(count))

The End
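The two-pass prefix-sum algorithm from the earlier slides can be written out in Python. This is a minimal sequential simulation of the parallel algorithm (the function names are mine, not from the lecture); in a real parallel setting, all the pair operations at one tree level would run concurrently.

```python
def exclusive_scan(a):
    """Two-pass (up/down the tree) prefix sum; len(a) must be a power of 2.

    Returns the exclusive prefix sums: result[i] = a[0] + ... + a[i-1].
    """
    n = len(a)
    t = list(a)  # working array; internal tree nodes live at pair boundaries

    # Up pass: pairwise sums, retaining the left summands in place.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):          # these pairs sum in parallel
            t[i + 2 * d - 1] += t[i + d - 1]
        d *= 2

    # Down pass: the root receives 0; each node passes its incoming value
    # left, and incoming value + left summand right.
    t[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):          # these swaps also parallelize
            left = t[i + d - 1]
            t[i + d - 1] = t[i + 2 * d - 1]
            t[i + 2 * d - 1] += left
        d //= 2
    return t

def inclusive_scan(a):
    """Inclusive prefix sums, as shown on the slide (7, 10, 25, ...)."""
    return [e + x for e, x in zip(exclusive_scan(a), a)]
```

Running `inclusive_scan([7, 3, 15, 10, 13, 18, 6, 4])` reproduces the slide's final row, `[7, 10, 25, 35, 48, 66, 72, 76]`.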
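The word-count mapper and reducer rely on the framework to collect the emitted pairs and group them by key before the reducers run. A single-process sketch of that plumbing (the `run_mapreduce` driver and plain-function signatures are illustrative, not the WMR API used in the slides):

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy MapReduce driver: map each (key, value) record, group the
    emitted pairs by key, then reduce each group. A real framework would
    run the map and reduce calls in parallel across worker nodes."""
    # Map phase: each mapper call returns a list of (key, value) pairs.
    emitted = []
    for key, value in records:
        emitted.extend(mapper(key, value))

    # Shuffle phase: the framework groups all emitted values by key.
    groups = defaultdict(list)
    for k, v in emitted:
        groups[k].append(v)

    # Reduce phase: one reducer call per key group.
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Word count, restated with functions that return pairs instead of
# calling Wmr.emit:
def wc_mapper(key, value):
    return [(word, 1) for word in key.split()]

def wc_reducer(key, values):
    return sum(values)
```

For example, `run_mapreduce([("the cat sat on the mat", None)], wc_mapper, wc_reducer)` yields `{"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}`.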