Slide 1What next?The prefix-sum problemThe Parallel Prefix-Sum AlgorithmSlide 5First passSecond passThe algorithm, part 1The algorithm, part 2Sequential cut-offParallel prefix, generalizedSlide 12FilterParallel prefix to the rescueFilter commentsQuicksort reviewQuicksortDoing betterParallel partition (not in place)ExampleNow mergesortParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeThe RecursionAnalysisCSE332: Data AbstractionsLecture 21: Parallel Prefix and Parallel SortingTyler RobisonSummer 20101What next?2Done:Simple ways to use parallelism for counting, summing, finding elementsAnalysis of running time and implications of Amdahl’s LawNow:Clever ways to parallelize more effectively than is intuitively possibleParallel prefix: This “key trick” typically underlies surprising parallelizationEnables other things like filtersParallel sorting: quicksort (not in place) and mergesortEasy to get a little parallelismWith cleverness can get a lotThe prefix-sum problem3Given int[] input, produce int[] output where output[i] is the sum of input[0]+input[1]+…input[i]Sequential is easy enough for a CSE142 exam:int[] prefix_sum(int[] input){ int[] output = new int[input.length]; output[0] = input[0]; for(int i=1; i < input.length; i++) output[i] = output[i-1]+input[i]; return output;}This does not appear to be parallelizable; each cell depends on previous cell–Work: O(n), Span: O(n)–This algorithm is sequential, but we can design a different algorithm with parallelism for the same probleminout6 4 16 10 16 14 2 86 10 26 36 52 66 68 76The Parallel Prefix-Sum Algorithm4The parallel-prefix algorithm has O(n) work but a span of 2log nSo span is O(log n) and parallelism is n/log n, an exponential speedup just like array summingThe 2 is because there will be two “passes” on the tree – more laterHistorical note / local bragging:Original algorithm due to R. Ladner and M. Fischer in 1977Richard Ladner joined the UW faculty in 1971 and hasn’t left1968? 1973? recent5inputoutput6 4 16 10 16 14 2 8 range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s fThe (completely non-obvious) idea:Do an initial pass to gather information, enabling us to do a second pass to get the answerFirst we’ll gather the ‘sum’ for each recursive blockFirst pass6inputoutput6 4 16 10 16 14 2 8 range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 10364076For each node, get the sum of all values in its range; propagate sum up from leavesWill work like parallel sum, but recording intermediate informationSecond pass7inputoutput6 4 16 10 16 14 2 86 10 26 36 52 66 68 76range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 103640760000361036 666 26526810 6636Using ‘sum’, get the sum of everything to the left of this range (call it ‘fromleft’); propagate down from rootThe algorithm, part 181. Propagate ‘sum’ up: Build a binary tree where Root has sum of input[0]..input[n-1]Each node has sum of input[lo]..input[hi-1] Build up from leaves; parent.sum=left.sum+right.sumA leaf’s sum is just it’s value; input[i]This is an easy fork-join computation: combine results by actually building a binary tree with all the sums of rangesTree built bottom-up in parallelCould be more clever; ex: heap-like ‘array as tree’ representationAnalysis of this step: O(n) work, O(log n) spanThe algorithm, part 292. Propagate ‘fromleft’ down:Root given a fromLeft of 0Node takes its fromLeft value andPasses its left child the same fromLeftPasses its right child its fromLeft plus its left child’s sum (as stored in part 1)At the leaf for array position i, output[i]=fromLeft+input[i]This is another fork-join computation: traverse the tree built in step 1 and assign to output at leaves (don’t return a result)Analysis of this step: O(n) work, O(log n) spanTotal for algorithm: O(n) work, O(log n) spanSequential cut-of10Adding a sequential cut-off isn’t too bad:Step One: Propagating Up: Sequentially compute sum for rangeThe tree itself will be shallowerStep Two: Propagating Down: output[lo] = fromLeft + input[lo]; for(i=lo+1; i < hi; i++) output[i] = output[i-1] + input[i]Parallel prefix, generalized11Just as sum-array was the simplest example of a pattern that matches many problems, so is prefix-sumArray that stores minimum/maximum of all elements to the left of i, for any iIs there an element to the left of i satisfying some property?Count of all elements to the left of i satisfying some propertyWe did an inclusive sum, but exclusive is just as easy12inputoutput6 4 16 10 16 14 2 8 range 0,8minfromleftrange 0,4minfromleftrange 4,8minfromleftrange 6,8minfromleftrange 4,6minfromleftrange 2,4minfromleftrange 0,2minfromleftr 0,1m fr 1,2m fr 2,3m fr 3,4m fr 4,5m fr 5,6m fr 6,7m fr 7.8m f‘Min to the left of i‘: Step One: Find ‘min’ of each rangeStep Two: Find ‘fromleft’Filter13[Non-standard terminology]Given an array input, produce an array output containing only elements such that f(elt) is trueExample: input [17, 4, 6, 8, 11, 5, 13, 19, 0, 24] f: is elt > 10 output [17, 11, 13, 19, 24]Looks hard to parallelizeDetermining whether an element belongs in the output is easyBut getting them in the right place in the output is hard; seems to depend on previous resultsParallel prefix to the rescue141. Use a parallel map to compute a bit-vector for true elementsinput [17, 4, 6, 8, 11, 5, 13, 19, 0, 24]bits [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]2. Do parallel-prefix sum on the bit-vectorbitsum [1, 1, 1, 1, 2, 2, 3, 4, 4, 5]3. Allocate an output array with size bitsum[input.length-1]4. Use a parallel map on input; if element i passes test, put it in
View Full Document