Berkeley COMPSCI C267 - Tricks with Trees - D1336163

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI C267> Tricks with Trees

DOC PREVIEW

Berkeley COMPSCI C267 - Tricks with Trees

School name University of California, Berkeley

Course Compsci C267- Applications of Parallel Computers

Pages 23

This preview shows page 1-2-22-23 out of 23 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 23 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CS 267 Tricks with TreesOutlineSlide 3Slide 4A log n lower bound to compute any function of n variablesBroadcasts and Reductions on TreesParallel Prefix, or ScanMapping Parallel Prefix onto a Tree - DetailsE.g., Fibonacci via Matrix Multiply PrefixAdding two n-bit integers in O(log n) timeOther applications of scansMultiplying n-by-n matrices in O(log n) timeInverting triangular n-by-n matrices in O(log2 n) timeInverting Dense n-by-n matrices in O(log n) timeEvaluating arbitrary expressionsEvaluating recurrencesImage Segmentation (1/4)Image Segmentation (2/4)Image Segmentation (3/4)Image Segmentation (4/4)Sparse-Matrix-Vector-Multiply (SpMV) y = A*x Using Segmented Scan (SegScan)Page layout in a browserSummary of tree algorithms102/09/2010 CS267 Lecture 7+CS 267Tricks with TreesJames Demmel and Horst Simonwww.cs.berkeley.edu/~demmel/cs267_Spr10202/09/2010 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time302/09/2010 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time°“2D parallel prefix”, for image segmentation (Bryan Catanzaro, Kurt Keutzer)°Sparse-Matrix-Vector-Multiply (SpMV) using Segmented Scan°Parallel page layout in a browser (Leo Meyerovich, Ras Bodik)402/09/2010 CS267 Lecture 7+Outline°A log n lower bound to compute any function in parallel°Reduction and broadcast in O(log n) time°Parallel prefix (scan) in O(log n) time°Adding two n-bit integers in O(log n) time°Multiplying n-by-n matrices in O(log n) time°Inverting n-by-n triangular matrices in O(log2 n) time°Inverting n-by-n dense matrices in O(log2 n) time°Evaluating arbitrary expressions in O(log n) time°Evaluating recurrences in O(log n) time°“2D parallel prefix”, for image segmentation (Bryan Catanzaro, Kurt Keutzer)°Sparse-Matrix-Vector-Multiply (SpMV) using Segmented Scan°Parallel page layout in a browser (Leo Meyerovich, Ras Bodik)°Solving n-by-n tridiagonal matrices in O(log n) time°Traversing linked lists °Computing minimal spanning trees°Computing convex hulls of point sets502/09/2010 CS267 Lecture 7+A log n lower bound to compute any function of n variables°Assume we can only use binary operations, one per time unit°After 1 time unit, an output can only depend on two inputs°Use induction to show that after k time units, an output can only depend on 2k inputs•After log2 n time units, output depends on at most n inputs°A binary tree performs such a computation602/09/2010 CS267 Lecture 7+Broadcasts and Reductions on Trees702/09/2010 CS267 Lecture 7+Parallel Prefix, or Scan°If “+” is an associative operator, and x[0],…,x[p-1] are input data then parallel prefix operation computes°Notation: j:k mean x[j]+x[j+1]+…+x[k], blue is final valuey[j] = x[0] + x[1] + … + x[j] for j=0,1,…,p-1802/09/2010 CS267 Lecture 7+Mapping Parallel Prefix onto a Tree - Details°Up-the-tree phase (from leaves to root)°By induction, Lsave = sum of all leaves in left subtree°Down the tree phase (from root to leaves)°By induction, S = sum of all leaves to left of subtree rooted at the parent1) Get values L and R from left and right children2) Save L in a local register Lsave3) Pass sum L+R to parent1) Get value S from parent (the root gets 0)2) Send S to the left child3) Send S + Lsave to the right child902/09/2010 CS267 Lecture 7+E.g., Fibonacci via Matrix Multiply PrefixFn+1 = Fn + Fn-11-nnn1nFF 0111FFCan compute all Fn by matmul_prefix on [ , , , , , , , , ]then select the upper left entry 011101110111011101110111011101110111Slide source: Alan Edelman, MIT1002/09/2010 CS267 Lectur+e 7Adding two n-bit integers in O(log n) time°Let a = a[n-1]a[n-2]…a[0] and b = b[n-1]b[n-2]…b[0] be two n-bit binary numbers°We want their sum s = a+b = s[n]s[n-1]…s[0]°Challenge: compute all c[i] in O(log n) time via parallel prefix°Used in all computers to implement addition - Carry look-aheadc[-1] = 0 … rightmost carry bitfor i = 0 to n-1 c[i] = ( (a[i] xor b[i]) and c[i-1] ) or ( a[i] and b[i] ) ... next carry bit s[i] = ( a[i] xor b[i] ) xor c[i-1] for all (0 <= i <= n-1) p[i] = a[i] xor b[i] … propagate bit for all (0 <= i <= n-1) g[i] = a[i] and b[i] … generate bit c[i] = ( p[i] and c[i-1] ) or g[i] = p[i] g[i] * c[i-1] = C[i] * c[i-1] 1 1 0 1 1 1 … 2-by-2 Boolean matrix multiplication (associative) = C[i] * C[i-1] * … C[0] * 0 1 … evaluate each P[i] = C[i] * C[i-1] * … * C[0] by parallel prefix1102/09/2010 CS267 Lecture 7+Other applications of scans°There are many applications of scans, some more obvious than others•add multi-precision numbers (represented as array of numbers)•evaluate recurrences, expressions •solve tridiagonal systems (numerically unstable!)•implement bucket sort and radix sort•to dynamically allocate processors•to search for regular expression (e.g., grep)°Names: +\ (APL), cumsum (Matlab), MPI_SCAN°Note: 2n operations used when only n-1 needed1202/09/2010 CS267 Lecture 7+Multiplying n-by-n matrices in O(log n) time°For all (1 <= i,j,k <= n) P(i,j,k) = A(i,k) * B(k,j)•cost = 1 time unit, using n3 processors°For all (1 <= i,j <= n) C(i,j) = P(i,j,k)•cost = O(log n) time, using a tree with n3 / 2 processorsk =1n1302/09/2010 CS267 Lecture 7+Inverting triangular

View Full Document

Berkeley COMPSCI C267 - Tricks with Trees

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-22-23 out of 23 pages.

Berkeley COMPSCI C267 - Tricks with Trees

Sign up for free to view:

Please select your school