Berkeley COMPSCI C267 - Data Parallel Architectures and Programming


CS267 L6 Data Parallel Programming — Demmel, Sp 1999

CS 267 Applications of Parallel Computers
Lecture 6: Distributed Memory (continued) — Data Parallel Architectures and Programming
James Demmel
http://www.cs.berkeley.edu/~demmel/cs267_Spr99

Recap of Last Lecture
° Distributed memory machines
• Each processor has independent memory
• Connected by network
- topology, other properties
° Cost = #messages * α + #words_sent * β + #flops * f + delay
° Distributed memory programming
• MPI
• Send/Receive
• Collective communication
• Sharks and Fish under gravity as example

Outline
° Distributed Memory Programming (continued)
• Review gravity algorithms
• Look at Sharks and Fish code
° Data Parallel Programming
• Evolution of machines
• Fortran 90 and Matlab
• HPF (High Performance Fortran)

Example: Sharks and Fish
° N fish on P procs, N/P fish per processor
• At each time step, compute forces on fish and move them
° Need to compute gravitational interaction
• In the usual N^2 algorithm, every fish depends on every other fish:
  force on j = Σ_{k=1:N, k != j} (force on j due to k)
• Every fish needs to "visit" every processor, even if it "lives" on one
° What is the cost?

2 Algorithms for Gravity: What are their costs?

Algorithm 1
  Copy local Fish array of length N/P to Tmp array
  for j = 1 to N
    for k = 1 to N/P, compute force from Tmp(k) on Fish(k)
    "Rotate" Tmp by 1
      for k = 2 to N/P, Tmp(k) <= Tmp(k-1)
      recv(my_proc - 1, Tmp(1))
      send(my_proc + 1, Tmp(N/P))

Algorithm 2
  Copy local Fish array of length N/P to Tmp array
  for j = 1 to P
    for k = 1 to N/P
      for m = 1 to N/P, compute force from Tmp(k) on Fish(m)
    "Rotate" Tmp by N/P
      recv(my_proc - 1, Tmp(1:N/P))
      send(my_proc + 1, Tmp(1:N/P))

What could go wrong?
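Not part of the original slides: a serial Python sketch that simulates Algorithm 2's block rotation across P simulated processors, to show why P rotations of Tmp let every fish interact with all N-1 others. All names here (`simulate_ring`, `blocks`, `seen`) are illustrative, and the "send/recv" is modeled by building a fresh rotated list, which sidesteps the overwrite hazard the slide asks about.

```python
# Serial sketch of Algorithm 2's ring rotation: P simulated processors each
# own N/P fish (identified by global index). After P rotations of the Tmp
# blocks, every resident fish has "seen" every other fish exactly once.

def simulate_ring(nprocs, fish_per_proc):
    # Each "processor" p owns a contiguous block of fish indices.
    blocks = [list(range(p * fish_per_proc, (p + 1) * fish_per_proc))
              for p in range(nprocs)]
    seen = {f: set() for block in blocks for f in block}

    tmp = [block[:] for block in blocks]      # local copies that will rotate
    for step in range(nprocs):
        # Each processor interacts its resident fish with the visiting block.
        for p in range(nprocs):
            for visitor in tmp[p]:
                for resident in blocks[p]:
                    if visitor != resident:
                        seen[resident].add(visitor)
        # "Rotate" Tmp by N/P: processor p's block moves to p+1 (mod P).
        # Building a new list avoids overwriting Tmp before it is "sent".
        tmp = [tmp[(p - 1) % nprocs] for p in range(nprocs)]
    return seen

seen = simulate_ring(nprocs=4, fish_per_proc=3)
# After P rotations, each of the N=12 fish has interacted with all N-1 others.
assert all(len(s) == 4 * 3 - 1 for s in seen.values())
```

In a real MPI version, the rotation would be a send/recv pair (or MPI_Sendrecv) per step, and Tmp must be sent, or copied to a second buffer, before the incoming block overwrites it.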
(Be careful of overwriting Tmp before it has been used or sent.)

More Algorithms for Gravity
° Algorithm 3 (in sharks and fish code)
• All processors send their Fish to Proc 0
• Proc 0 broadcasts all Fish to all processors
° Tree algorithms
• Barnes-Hut, Greengard-Rokhlin, Anderson
• O(N log N) instead of O(N^2)
• Parallelizable with cleverness
• "Just" an approximation, but as accurate as you like (often only a few digits are needed, so why pay for more?)
• Same idea works for other problems where the effect of distant objects becomes "smooth" or "compressible"
- electrostatics, vorticity, …
- radiosity in graphics
- anything satisfying the Poisson equation or something like it
• Will talk about it in detail later in the course

Examine Sharks and Fish Code
° www.cs.berkeley.edu/~demmel/cs267_Spr99/Lectures/fish.c

Data Parallel Machines

Data Parallel Architectures
° Programming model
• Operations are performed on each element of a large (regular) data structure in a single step
• arithmetic, global data transfer
° A processor is logically associated with each data element
• A = B + C means: for all j, A(j) = B(j) + C(j) in parallel
° General communication
• A(j) = B(k) may communicate
° Global synchronization
• implicit barrier between statements
° SIMD: Single Instruction, Multiple Data
[Figure: a control processor driving a grid of processor-memory (P-M) pairs]

Vector Machines
° The Cray-1 and its successors (www.sgi.com/t90)
• Load/store into 64-word vector registers, with strides: vr(j) = Mem(base + j*s)
• Instructions operate on entire vector registers: for j = 1:N, vr1(j) = vr2(j) + vr3(j)
[Figure: vector registers feeding pipelined function units, backed by highly interleaved semiconductor (SRAM) memory]
° No cache, but very fast (expensive) memory
° Scatter [Mem(Pnt(j)) = vr(j)] and Gather [vr(j) = Mem(Pnt(j))]
° Flag registers [vf(j) = (vr3(j) != 0)]
° Masked operations [vr1(j) = vr2(j)/vr3(j) where vf(j) == 1]
° Fast scalar unit too

Use of SIMD Model on Vector Machines
[Figure: 64 virtual processors VP0..VP63; 32 general-purpose vector registers vr0..vr31, 64 bits per element; 32 flag registers vf0..vf31, 1 bit per element; 16 control registers vcr0..vcr15, 32 bits each]

Evolution of Vector Processing
° Cray (now SGI), Convex, NEC, Fujitsu, Hitachi, …
° Pro: Very fast memory makes it easy to program
• Don't worry about the cost of loads/stores or where data is (except for memory banks)
° Pro: Compilers automatically convert loops to use vector instructions
• for j = 1 to n, A(j) = x*B(j) + C(k,j) becomes a sequence of vector instructions that breaks the operation into groups of 64
° Pro: Easy to compile languages like Fortran 90
° Con: Much more expensive than a bunch of micros on a network
° Relatively few customers, but powerful ones
° New application: multimedia
• New microprocessors have fixed-point vector instructions (MMX, VIS)
• VIS (Sun's Visual Instruction Set) (www.sun.com/sparc/vis)
- 8-, 16-, and 32-bit integer ops
- Short vectors only (2 or 4)
- Good for operating on arrays of pixels, video

Data Parallel Programming

Evolution of Data Parallel Programming
° Early machines had a single control unit for multiple arithmetic units, so data parallel programming was necessary
° Also a natural fit to vector machines
° Can be compiled to run on any parallel machine, on top of shared memory or MPI
° Fortran 77 -> Fortran 90 -> HPF (High Performance Fortran)

Fortran 90 Execution Model (also Matlab)
• Sequential composition of parallel (or scalar) statements
• Parallel operations on arrays
• Arrays have rank (# dimensions), shape (extents), type (elements)
- HPF adds layout
• Communication implicit in array operations
• Hardware configuration independent
[Figure: a Main program calling Subr(…)]

Example: gravitational fish

  integer, parameter :: nfish = 10000
  complex fishp(nfish), fishv(nfish), force(nfish), accel(nfish)
  real fishm(nfish)
  . . .
  do while (t < tfinal)
    t = t + dt
    fishp = fishp +

[Preview ends mid-statement.]
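The Fortran 90 fragment above is cut off by the preview, so what follows is not the course's actual code: it is a serial Python analogue of the whole-array style, where each statement conceptually updates every fish "at once" (as in fishp = fishp + dt*fishv). The force law (simple attraction toward the origin) and all constants are invented for the sketch.

```python
# Serial analogue of a data parallel time step: positions and velocities are
# complex numbers, and each comprehension stands in for one whole-array
# Fortran 90 statement. Force law and constants are illustrative only.
import cmath

nfish = 8
dt, t, tfinal = 0.01, 0.0, 0.1
fishp = [cmath.exp(2j * cmath.pi * k / nfish) for k in range(nfish)]  # positions on a circle
fishv = [0j] * nfish    # velocities
fishm = [1.0] * nfish   # masses

while t < tfinal:
    t += dt
    # force: attraction toward the origin, a stand-in for pairwise gravity
    force = [-p * m for p, m in zip(fishp, fishm)]
    accel = [f / m for f, m in zip(force, fishm)]
    # the data parallel updates: fishp = fishp + dt*fishv; fishv = fishv + dt*accel
    fishp = [p + dt * v for p, v in zip(fishp, fishv)]
    fishv = [v + dt * a for v, a in zip(fishv, accel)]

assert len(fishp) == nfish
```

On a vector or SIMD machine, each of these array statements would compile to vector instructions (or a loop over 64-element groups), with an implicit barrier between statements, as described on the earlier slides.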

