DOC PREVIEW
UW CSE 332 - Lecture Notes

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1What next?The prefix-sum problemParallel prefix-sumExampleExampleThe algorithm, part 1The algorithm, part 2Sequential cut-offParallel prefix, generalizedFilterParallel prefix to the rescueFilter commentsQuicksort reviewQuicksortDoing betterParallel partition (not in place)ExampleNow mergesortParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeParallelizing the mergeThe RecursionAnalysisAnalysis continuedCSE332: Data AbstractionsLecture 20: Parallel Prefix and Parallel SortingDan GrossmanSpring 2010What next?Done:–Simple ways to use parallelism for counting, summing, finding–Even though in practice getting speed-up may not be simple–Analysis of running time and implications of Amdahl’s LawNow:–Clever ways to parallelize more than is intuitively possible–Parallel prefix: •This “key trick” typically underlies surprising parallelization•Enables other things like filters–Parallel sorting: quicksort (not in place) and mergesort•Easy to get a little parallelism•With cleverness can get a lotSpring 2010 2CSE332: Data AbstractionsThe prefix-sum problemGiven int[] input, produce int[] output where output[i] is the sum of input[0]+input[1]+…input[i]Sequential is easy enough for a CSE142 exam:Spring 2010 3CSE332: Data Abstractionsint[] prefix_sum(int[] input){ int[] output = new int[input.length]; output[0] = input[0]; for(int i=1; i < input.length; i++) output[i] = output[i-1]+input[i]; return output;}This does not appear to be parallelizable–Work: O(n), Span: O(n)–This algorithm is sequential, but we can design a different algorithm with parallelism (surprising)Parallel prefix-sumThe parallel-prefix algorithm has O(n) work but a span of 2log n–So span is O(log n) and parallelism is n/log n, an exponential speedup just like array summing•The 2 is because there will be two “passes” one “up” one “down”•Historical note / local bragging:–Original algorithm due to R. Ladner and M. Fischer in 1977–Richard Ladner joined the UW faculty in 1971 and hasn’t leftSpring 2010 4CSE332: Data Abstractions1968? 1973? recentExampleSpring 2010 5CSE332: Data Abstractionsinputoutput6 4 16 10 16 14 2 8 range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 10364076ExampleSpring 2010 6CSE332: Data Abstractionsinputoutput6 4 16 10 16 14 2 86 10 26 36 52 66 68 76range 0,8sumfromleftrange 0,4sumfromleftrange 4,8sumfromleftrange 6,8sumfromleftrange 4,6sumfromleftrange 2,4sumfromleftrange 0,2sumfromleftr 0,1s fr 1,2s fr 2,3s fr 3,4s fr 4,5s fr 5,6s fr 6,7s fr 7.8s f6 4 16 10 16 14 2 810 26 30 103640760000361036 666 26 526810 6636The algorithm, part 11. Up: Build a binary tree where –Root has sum of input[0]..input[n-1]–If a node has sum of input[lo]..input[hi] and hi>lo, •Left child has sum of input[lo]..input[middle]•Right child has sum of input[middle]..input[hi]–A leaf has sum of input[i]..input[i], i.e., input[i]This is an easy fork-join computation: combine results by actually building a binary tree with all the sums of ranges–Tree built bottom-up in parallel–Could be more clever with an array like with heapsAnalysis: O(n) work, O(log n) spanSpring 2010 7CSE332: Data AbstractionsThe algorithm, part 22. Down: Pass down a value fromLeft–Root given a fromLeft of 0–Node takes its fromLeft value and•Passes its left child the same fromLeft•Passes its right child its fromLeft plus its left child’s sum (as stored in part 1)–At the leaf for array position i, output[i]=fromLeft+input[i]This is an easy fork-join computation: traverse the tree built in step 1 and produce no result (leafs assign to output)–Invariant: fromLeft is sum of elements left of the node’s rangeAnalysis: O(n) work, O(log n) spanSpring 2010 8CSE332: Data AbstractionsSequential cut-offAdding a sequential cut-off is easy as always:•Up: just a sum, have leaf node hold the sum of a range•Down: output[lo] = fromLeft + input[lo]; for(i=lo+1; i < hi; i++) output[i] = output[i-1] + input[i]Spring 2010 9CSE332: Data AbstractionsParallel prefix, generalizedJust as sum-array was the simplest example of a pattern that matches many, many problems, so is prefix-sum•Minimum, maximum of all elements to the left of i•Is there an element to the left of i satisfying some property?•Count of all elements to the left of i satisfying some property–This last one is perfect for an efficient parallel filter…–Perfect for building on top of the “parallel prefix trick”•We did an inclusive sum, but exclusive is just as easySpring 2010 10CSE332: Data AbstractionsFilter[Non-standard terminology]Given an array input, produce an array output containing only elements such that f(elt) is trueExample: input [17, 4, 6, 8, 11, 5, 13, 19, 0, 24] f: is elt > 10 output [17, 11, 13, 19, 24]Looks hard to parallelize–Finding elements for the output is easy–But getting them in the right place is hardSpring 2010 11CSE332: Data AbstractionsParallel prefix to the rescue1. Use a parallel map to compute a bit-vector for true elementsinput [17, 4, 6, 8, 11, 5, 13, 19, 0, 24]bits [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]2. Do parallel-prefix sum on the bit-vectorbitsum [1, 1, 1, 1, 2, 2, 3, 4, 4, 5]3. Use a parallel map to produce the outputoutput [17, 11, 13, 19, 24]Spring 2010 12CSE332: Data Abstractionsoutput = new array of size bitsum[n-1]if(bitsum[0]==1) output[0] = input[0];FORALL(i=1; i < input.length; i++) if(bitsum[i] > bitsum[i-1]) output[bitsum[i]-1] = input[i];Filter comments•First two steps can be combined into one pass–Just using a different base case for the prefix sum–Has no effect on asymptotic complexity•Parallelized filters will help us parallelize quicksort•Analysis: O(n) work, O(log n) span –2 or 3 passes, but 3 is a constantSpring 2010 13CSE332: Data AbstractionsQuicksort reviewRecall quicksort was sequential, in-place, expected time O(n log n)Spring 2010 14CSE332: Data Abstractions Best / expected case work1. Pick a pivot element O(1)2. Partition all the data into: O(n)A. The elements less than the pivotB. The pivotC. The elements greater than the pivot3. Recursively sort A and C


View Full Document

UW CSE 332 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?