DOC PREVIEW
UW CSE 332 - Lecture Notes

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1What next?The prefix-sum problemThe Parallel Prefix-Sum AlgorithmSlide 5First passSecond passThe algorithm, part 1The algorithm, part 2Sequential cut-offParallel prefix, generalizedSlide 12FilterParallel prefix to the rescueFilter commentsQuicksort reviewQuicksortDoing betterParallel partition (not in place)ExampleNow mergesortParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeThe RecursionAnalysisCSE332: Data AbstractionsLecture 21: Parallel Prefix and Parallel SortingTyler RobisonSummer 20101What next?2Done:Simple ways to use parallelism for counting, summing, finding elementsAnalysis of running time and implications of Amdahl’s LawNow:Clever ways to parallelize more effectively than is intuitively possibleParallel prefix: This “key trick” typically underlies surprising parallelizationEnables other things like filtersParallel sorting: quicksort (not in place) and mergesortEasy to get a little parallelismWith cleverness can get a lotThe prefix-sum problem3Given int[] input, produce int[] output where output[i] is the sum of input[0]+input[1]+…input[i]Sequential is easy enough for a CSE142 exam:int[] prefix_sum(int[] input){ int[] output = new int[input.length]; output[0] = input[0]; for(int i=1; i < input.length; i++) output[i] = output[i-1]+input[i]; return output;}This does not appear to be parallelizable; each cell depends on previous cell–Work: O(n), Span: O(n)–This algorithm is sequential, but we can design a different algorithm with parallelism for the same probleminout6 4 16 10 16 14 2 86 10 26 36 52 66 68 76The Parallel Prefix-Sum Algorithm4The parallel-prefix algorithm has O(n) work but a span of 2log nSo span is O(log n) and parallelism is n/log n, an exponential speedup just like array summingThe 2 is because there will be two “passes” on the tree – more laterHistorical note / local bragging:Original algorithm due to R. Ladner and M. Fischer in 1977Richard Ladner joined the UW faculty in 1971 and hasn’t left1968? 1973? recent5inputoutput6 4 16 10 16 14 2 8 range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s fThe (completely non-obvious) idea:Do an initial pass to gather information, enabling us to do a second pass to get the answerFirst we’ll gather the ‘sum’ for each recursive blockFirst pass6inputoutput6 4 16 10 16 14 2 8 range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 10364076For each node, get the sum of all values in its range; propagate sum up from leavesWill work like parallel sum, but recording intermediate informationSecond pass7inputoutput6 4 16 10 16 14 2 86 10 26 36 52 66 68 76range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 103640760000361036 666 26526810 6636Using ‘sum’, get the sum of everything to the left of this range (call it ‘fromleft’); propagate down from rootThe algorithm, part 181. Propagate ‘sum’ up: Build a binary tree where Root has sum of input[0]..input[n-1]Each node has sum of input[lo]..input[hi-1] Build up from leaves; parent.sum=left.sum+right.sumA leaf’s sum is just it’s value; input[i]This is an easy fork-join computation: combine results by actually building a binary tree with all the sums of rangesTree built bottom-up in parallelCould be more clever; ex: heap-like ‘array as tree’ representationAnalysis of this step: O(n) work, O(log n) spanThe algorithm, part 292. Propagate ‘fromleft’ down:Root given a fromLeft of 0Node takes its fromLeft value andPasses its left child the same fromLeftPasses its right child its fromLeft plus its left child’s sum (as stored in part 1)At the leaf for array position i, output[i]=fromLeft+input[i]This is another fork-join computation: traverse the tree built in step 1 and assign to output at leaves (don’t return a result)Analysis of this step: O(n) work, O(log n) spanTotal for algorithm: O(n) work, O(log n) spanSequential cut-of10Adding a sequential cut-off isn’t too bad:Step One: Propagating Up: Sequentially compute sum for rangeThe tree itself will be shallowerStep Two: Propagating Down: output[lo] = fromLeft + input[lo]; for(i=lo+1; i < hi; i++) output[i] = output[i-1] + input[i]Parallel prefix, generalized11Just as sum-array was the simplest example of a pattern that matches many problems, so is prefix-sumArray that stores minimum/maximum of all elements to the left of i, for any iIs there an element to the left of i satisfying some property?Count of all elements to the left of i satisfying some propertyWe did an inclusive sum, but exclusive is just as easy12inputoutput6 4 16 10 16 14 2 8 range 0,8minfromleftrange 0,4minfromleftrange 4,8minfromleftrange 6,8minfromleftrange 4,6minfromleftrange 2,4minfromleftrange 0,2minfromleftr 0,1m fr 1,2m fr 2,3m fr 3,4m fr 4,5m fr 5,6m fr 6,7m fr 7.8m f‘Min to the left of i‘: Step One: Find ‘min’ of each rangeStep Two: Find ‘fromleft’Filter13[Non-standard terminology]Given an array input, produce an array output containing only elements such that f(elt) is trueExample: input [17, 4, 6, 8, 11, 5, 13, 19, 0, 24] f: is elt > 10 output [17, 11, 13, 19, 24]Looks hard to parallelizeDetermining whether an element belongs in the output is easyBut getting them in the right place in the output is hard; seems to depend on previous resultsParallel prefix to the rescue141. Use a parallel map to compute a bit-vector for true elementsinput [17, 4, 6, 8, 11, 5, 13, 19, 0, 24]bits [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]2. Do parallel-prefix sum on the bit-vectorbitsum [1, 1, 1, 1, 2, 2, 3, 4, 4, 5]3. Allocate an output array with size bitsum[input.length-1]4. Use a parallel map on input; if element i passes test, put it in


View Full Document

UW CSE 332 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?