DOC PREVIEW
Berkeley COMPSCI C267 - Tricks with Trees

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 267 Tricks with TreesOutlineSlide 3Slide 4A log n lower bound to compute any function of n variablesBroadcasts and Reductions on TreesParallel Prefix, or ScanMapping Parallel Prefix onto a Tree - DetailsE.g., Fibonacci via Matrix Multiply PrefixAdding two n-bit integers in O(log n) timeOther applications of scansMultiplying n-by-n matrices in O(log n) timeInverting triangular n-by-n matrices in O(log2 n) timeInverting Dense n-by-n matrices in O(log n) timeEvaluating arbitrary expressionsEvaluating recurrencesImage Segmentation (1/4)Image Segmentation (2/4)Image Segmentation (3/4)Image Segmentation (4/4)Sparse-Matrix-Vector-Multiply (SpMV) y = A*x Using Segmented Scan (SegScan)Page layout in a browserSummary of tree algorithms102/11/2009 CS267 Lecture 7+CS 267Tricks with TreesJames Demmelwww.cs.berkeley.edu/~demmel/cs267_Spr09202/11/2009 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time302/11/2009 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time°“2D parallel prefix”, for image segmentation (Bryan Catanzaro, Kurt Keutzer)°Sparse-Matrix-Vector-Multiply (SpMV) using Segmented Scan°Parallel page layout in a browser (Leo Meyerovich, Ras Bodik)402/11/2009 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time°“2D parallel prefix”, for image segmentation (Bryan Catanzaro, Kurt Keutzer)°Sparse-Matrix-Vector-Multiply (SpMV) using Segmented Scan°Parallel page layout in a browser (Leo Meyerovich, Ras Bodik)°Solving n-by-n tridiagonal matrices in O(log n) time°Traversing linked lists °Computing minimal spanning trees°Computing convex hulls of point sets502/11/2009 CS267 Lecture 7+A log n lower bound to compute any function of n variables°Assume we can only use binary operations, one per time unit°After 1 time unit, an output can only depend on two inputs°Use induction to show that after k time units, an output can only depend on 2k inputs•After log2 n time units, output depends on at most n inputs°A binary tree performs such a computation602/11/2009 CS267 Lecture 7+Broadcasts and Reductions on Trees702/11/2009 CS267 Lecture 7+Parallel Prefix, or Scan°If “+” is an associative operator, and x[0],…,x[p-1] are input data then parallel prefix operation computes°Notation: j:k mean x[j]+x[j+1]+…+x[k], blue is final valuey[j] = x[0] + x[1] + … + x[j] for j=0,1,…,p-1802/11/2009 CS267 Lecture 7+Mapping Parallel Prefix onto a Tree - Details°Up-the-tree phase (from leaves to root)°By induction, Lsave = sum of all leaves in left subtree°Down the tree phase (from root to leaves)°By induction, S = sum of all leaves to left of subtree rooted at the parent1) Get values L and R from left and right children2) Save L in a local register Lsave3) Pass sum L+R to parent1) Get value S from parent (the root gets 0)2) Send S to the left child3) Send S + Lsave to the right child902/11/2009 CS267 Lecture 7+E.g., Fibonacci via Matrix Multiply PrefixFn+1 = Fn + Fn-11-nnn1nFF 0111FFCan compute all Fn by matmul_prefix on [ , , , , , , , , ]then select the upper left entry 011101110111011101110111011101110111Slide source: Alan Edelman, MIT1002/11/2009 CS267 Lectur+e 7Adding two n-bit integers in O(log n) time°Let a = a[n-1]a[n-2]…a[0] and b = b[n-1]b[n-2]…b[0] be two n-bit binary numbers°We want their sum s = a+b = s[n]s[n-1]…s[0]°Challenge: compute all c[i] in O(log n) time via parallel prefix°Used in all computers to implement addition - Carry look-aheadc[-1] = 0 … rightmost carry bitfor i = 0 to n-1 c[i] = ( (a[i] xor b[i]) and c[i-1] ) or ( a[i] and b[i] ) ... next carry bit s[i] = ( a[i] xor b[i] ) xor c[i-1] for all (0 <= i <= n-1) p[i] = a[i] xor b[i] … propagate bit for all (0 <= i <= n-1) g[i] = a[i] and b[i] … generate bit c[i] = ( p[i] and c[i-1] ) or g[i] = p[i] g[i] * c[i-1] = C[i] * c[i-1] 1 1 0 1 1 1 … 2-by-2 Boolean matrix multiplication (associative) = C[i] * C[i-1] * … C[0] * 0 1 … evaluate each P[i] = C[i] * C[i-1] * … * C[0] by parallel prefix1102/11/2009 CS267 Lecture 7+Other applications of scans°There are many applications of scans, some more obvious than others•add multi-precision numbers (represented as array of numbers)•evaluate recurrences, expressions •solve tridiagonal systems (numerically unstable!)•implement bucket sort and radix sort•to dynamically allocate processors•to search for regular expression (e.g., grep)°Names: +\ (APL), cumsum (Matlab), MPI_SCAN°Note: 2n operations used when only n-1 needed1202/11/2009 CS267 Lecture 7+Multiplying n-by-n matrices in O(log n) time°For all (1 <= i,j,k <= n) P(i,j,k) = A(i,k) * B(k,j)•cost = 1 time unit, using n3 processors°For all (1 <= i,j <= n) C(i,j) = P(i,j,k)•cost = O(log n) time, using a tree with n3 / 2 processorsk =1n1302/11/2009 CS267 Lecture 7+Inverting triangular n-by-n matrices


View Full Document

Berkeley COMPSCI C267 - Tricks with Trees

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Tricks with Trees
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Tricks with Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Tricks with Trees 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?