Unformatted text preview:

Parallel Prefix Algorithms, or Tricks with TreesParallel Vector OperationsBroadcast and reductionParallel Prefix AlgorithmsExample of a prefixWhat do you think?A clue?Slide 8Slide 9Slide 10Slide 11Non-recursive view of parallel prefix scanSlide 13Scan (Parallel Prefix) OperationsApplications of scansE.g., Using Scans for Array CompressionE.g., Fibonacci via Matrix Multiply PrefixCarry-Look Ahead Addition (Babbage 1800’s)Slide 19Slide 20Slide 21Slide 22Adding two n-bit integers in O(log n) timeSegmented OperationsSlide 25Slide 26Copy Prefix: x y = x (is associative)Multiplying n-by-n matrices in O(log n) timeInverting dense n-by-n matrices in O(log2 n) timeEvaluating arbitrary expressionsThe myth of log nSummary of tree algorithmsParallel Prefix Algorithms,orTricks with TreesSome slides from Jim Demmel, Kathy Yelick, Alan Edelman, and a cast of thousands …Parallel Vector Operations•Vector add: z = x + y •Embarrassingly parallel if vectors are aligned•DAXPY: z = a*x + y (a is scalar)•Broadcast a, followed by independent * and +•DDOT: s = xTy = j x[j] * y[j]•Independent * followed by + reductionBroadcast and reduction•Broadcast of 1 value to p processors in log p time•Reduction of p values to 1 in log p time•Takes advantage of associativity in +, *, min, max, etc.a8 1 3 1 0 4 -6 3 2Add-reductionBroadcast•A theoretical secret for turning serial into parallel•Surprising parallel algorithms: If “there is no way to parallelize this algorithm!” …•… it’s probably a variation on parallel prefix!Parallel Prefix AlgorithmsExample of a prefixSum Prefix Input x = (x1, x2, . . ., xn) Output y = (y1, y2, . . ., yn) yi = Σj=1:i xjExample x = ( 1, 2, 3, 4, 5, 6, 7, 8 ) y = ( 1, 3, 6, 10, 15, 21, 28, 36)Prefix Functions-- outputs depend upon an initial stringWhat do you think?•Can we really parallelize this?•It looks like this kind of code: y(0) = 0; for i = 1:n y(i) = y(i-1) + x(i); •The ith iteration of the loop depends completely on the (i-1)st iteration. •Work = n, span = n, parallelism = 1.•Impossible to parallelize, right?A clue? x = ( 1, 2, 3, 4, 5, 6, 7, 8 ) y = ( 1, 3, 6, 10, 15, 21, 28, 36)Is there any value in adding, say, 4+5+6+7?If we separately have 1+2+3, what can we do?Suppose we added 1+2, 3+4, etc. pairwise -- what could we do?8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 3 7 11 15 19 23 27 31 (Recursively compute prefix sums) 3 10 21 36 55 78 105 136 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136Prefix sum in parallelAlgorithm: 1. Pairwise sum 2. Recursive prefix 3. Pairwise sum01/15/199• What’s the total work? 1 2 3 4 5 6 7 8 Pairwise sums 3 7 11 15 Recursive prefix 3 10 21 36 Update “odds” 1 3 6 10 15 21 28 36•T1(n) = n/2 + n/2 + T1 (n/2) = n + T1 (n/2) = 2n – 1at the cost of more work!Parallel prefix cost01/15/1910• What’s the total work? 1 2 3 4 5 6 7 8 Pairwise sums 3 7 11 15 Recursive prefix 3 10 21 36 Update “odds” 1 3 6 10 15 21 28 36•T1(n) = n/2 + n/2 + T1 (n/2) = n + T1 (n/2) = 2n – 1 Parallelism at the cost of more work!Parallel prefix cost01/15/19• What’s the total work? 1 2 3 4 5 6 7 8 Pairwise sums 3 7 11 15 Recursive prefix 3 10 21 36 Update “odds” 1 3 6 10 15 21 28 36•T1(n) = n/2 + n/2 + T1 (n/2) = n + T1 (n/2) = 2n – 1•T∞(n) = 2 log n Parallelism at the cost of more work!11Parallel prefix cost: Work and Span12Non-recursive view of parallel prefix scan•Tree summation: two phases•up sweep•get values L and R from left and right child•save L in local variable Mine•compute Tmp = L + R and pass to parent•down sweep•get value Tmp from parent•send Tmp to left child•send Tmp+Mine to right child6543 2 4 1Up sweep: mine = left tmp = left + right46 95 43 1 2 0 4 1 1 36543 2 4 10600 33 4 6 6 10 11 12 15+X = 3 1 2 0 4 1 1 344 6 6 10 116 1112Down sweep: tmp = parent (root is 0) right = tmp + mineAll (and) Any ( or)Input: Bits (Boolean)Sum (+)Product (*)MaxMinInput: RealsAny associative operation worksAssociative:(a  b)  c = a  (b  c)MatMulInput: Matrices01/15/1914Scan (Parallel Prefix) Operations•Definition: the parallel prefix operation takes a binary associative operator , and an array of n elements [a0, a1, a2, … an-1] and produces the array [a0, (a0 a1), … (a0 a1 ... an-1)]•Example: add scan of [1, 2, 0, 4, 2, 1, 1, 3] is [1, 3, 3, 7, 9, 10, 11, 14]01/15/1915Applications of scans•Many applications, some more obvious than others•lexically compare strings of characters•add multi-precision numbers•add binary numbers fast in hardware•graph algorithms•evaluate polynomials•implement bucket sort, radix sort, and even quicksort•solve tridiagonal linear systems•solve recurrence relations•dynamically allocate processors•search for regular expression (grep)•image processing primitives01/15/1916E.g., Using Scans for Array Compression•Given an array of n elements [a0, a1, a2, … an-1] and an array of flags [1,0,1,1,0,0,1,…] compress the flagged elements into [a0, a2, a3, a6, …]•Compute an add scan of [0, flags] :[0,1,1,2,3,3,4,…]•Gives the index of the ith element in the compressed array•If the flag for this element is 1, write it into the result array at the given position01/15/1917E.g., Fibonacci via Matrix Multiply PrefixFn+1 = Fn + Fn-11-nnn1nFF 0111FFCan compute all Fn by matmul_prefix on [ , , , , , , , , ]then select the upper left entry 011101110111011101110111011101110111Carry-Look Ahead Addition (Babbage 1800’s)Goal: Add Two n-bit Integers Example 1 0 1 1 1 Carry 1 0 1 1 1 First Int 1 0 1 0 1 Second Int 1 0 1 1 0 0


View Full Document

UCSB CS 240A - Parallel Vector Operations

Download Parallel Vector Operations
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Parallel Vector Operations and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parallel Vector Operations 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?