Berkeley COMPSCI C267 - Distributed Memory Programming (MPI) and Tree-Based Algorithms


CS 267: Distributed Memory Programming (MPI) and Tree-Based Algorithms
Kathy Yelick
www.cs.berkeley.edu/~yelick/cs267_sp07

Contents (slide titles):
• Recap of Last Lecture
• Outline
• MPI Basic (Blocking) Send
• A Simple MPI Program
• Collective Operations in MPI
• Example: PI in C - 1
• Example: PI in C - 2
• MPI Collective Routines
• Buffers
• Avoiding Buffering
• Sources of Deadlocks
• Some Solutions to the "unsafe" Problem
• More Solutions to the "unsafe" Problem
• MPI's Non-blocking Operations
• Other MPI Point-to-Point Features
• Synchronization
• Tree-Based Computation
• A log n lower bound to compute any function of n variables
• Broadcasts and Reductions on Trees
• Parallel Prefix, or Scan
• Mapping Parallel Prefix onto a Tree - Details
• E.g., Fibonacci via Matrix Multiply Prefix
• Adding two n-bit integers in O(log n) time
• Other applications of scans
• Evaluating arbitrary expressions
• Multiplying n-by-n matrices in O(log n) time
• Evaluating recurrences
• Summary
• Extra Slides
• Inverting Dense n-by-n matrices in O(log n) time
• Summary of tree algorithms

Recap of Last Lecture
• Distributed memory multiprocessors
• Several possible network topologies
• On current systems, the illusion is that all nodes are directly connected to all others (performance may vary)
• Key performance parameters: latency (α), bandwidth (1/β), LogP for overlap details
• Message passing programming
  • Single Program Multiple Data (SPMD) model
  • Communication is explicit: send/receive
  • Collective communication
  • Synchronization with barriers
Continued today.

Outline
Tree-Based Algorithms (after MPI wrap-up):
• A log n lower bound to compute any function in parallel
• Reduction and broadcast in O(log n) time
• Parallel prefix (scan) in O(log n) time
• Adding two n-bit integers in O(log n) time
• Multiplying n-by-n matrices in O(log n) time
• Inverting n-by-n triangular matrices in O(log n) time
• Evaluating arbitrary expressions in O(log n) time
• Evaluating recurrences in O(log n) time
• Inverting n-by-n dense matrices in O(log n) time
• Solving n-by-n tridiagonal matrices in O(log n) time
• Traversing linked lists
• Computing minimal spanning trees
• Computing convex hulls of point sets
There are online HTML lecture notes for this material from the 1996 course taught by Jim Demmel:
http://www.cs.berkeley.edu/~demmel/cs267/lecture14.html

MPI Basic (Blocking) Send

MPI_SEND(start, count, datatype, dest, tag, comm)
• The message buffer is described by (start, count, datatype).
• The target process is specified by dest (a rank within comm).
• When this function returns, the buffer (A) can be reused, but the message may not have been received by the target process.

MPI_RECV(start, count, datatype, source, tag, comm, status)
• Waits until a matching (source and tag) message is received.
• source is a rank in the communicator specified by comm, or MPI_ANY_SOURCE.
• tag is a tag to be matched, or MPI_ANY_TAG.
• Receiving fewer than count elements is OK, but receiving more is an error.
• status contains further information (e.g., the size of the message).

Example: process 0 has an array A(10), process 1 has an array B(20):

MPI_Send( A, 10, MPI_DOUBLE, 1, … )
MPI_Recv( B, 20, MPI_DOUBLE, 0, … )

Slide source: Bill Gropp, ANL
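For instance, the receiver may post a receive for more elements than will arrive and then query the status to learn how many elements actually came. A minimal sketch (not from the slides) using the standard MPI_Get_count call, matching the A(10)/B(20) example above:

/* Sketch: rank 0 sends 10 doubles; rank 1 receives into a 20-element buffer
   and asks the status how many elements actually arrived. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, n;
    double A[10] = {0.0}, B[20];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(A, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(B, 20, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_DOUBLE, &n);        /* n is 10 here */
        printf("received %d doubles from rank %d\n", n, status.MPI_SOURCE);
    }
    MPI_Finalize();
    return 0;
}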
A Simple MPI Program

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, buf;
    MPI_Status status;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* Process 0 sends and Process 1 receives */
    if (rank == 0) {
        buf = 123456;
        MPI_Send( &buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );
    }
    else if (rank == 1) {
        MPI_Recv( &buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );
        printf( "Received %d\n", buf );
    }

    MPI_Finalize();
    return 0;
}

Note: Fortran and C++ versions are in the online lecture notes.
Slide source: Bill Gropp, ANL

Collective Operations in MPI
• Collective operations are called by all processes in a communicator.
• MPI_BCAST distributes data from one process (the root) to all others in the communicator.
• MPI_REDUCE combines data from all processes in the communicator and returns it to one process.
  • Operators include MPI_MAX, MPI_MIN, MPI_PROD, MPI_SUM, …
• In many numerical algorithms, SEND/RECEIVE can be replaced by BCAST/REDUCE, improving both simplicity and efficiency.
  • MPI can use a more efficient algorithm than the one you might choose for simplicity (e.g., better than P-1 send/receive pairs for a broadcast or reduce).
  • It may use special hardware support on some systems.
Slide source: Bill Gropp, ANL

Example: PI in C - 1

#include "mpi.h"
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i, rc;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, a;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    while (!done) {
        if (myid == 0) {
            printf("Enter the # of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0) break;

Slide source: Bill Gropp, ANL

Example: PI in C - 2 (continues the loop body from the previous slide)

        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x*x);
        }
        mypi = h * sum;

        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }

    MPI_Finalize();
    return 0;
}

Slide source: Bill Gropp, ANL

MPI Collective Routines
• Many routines: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, Reduce_scatter, Scan, Scatter, Scatterv
• The "All" versions deliver results to all participating processes.
• The "v" versions allow the chunks to have different sizes.
• Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combiner functions.
• MPI-2 adds Alltoallw, Exscan, and intercommunicator versions of most routines.

Buffers
• Message passing has a small set of primitives, but there are subtleties:
  • Buffering and deadlock
  • Deterministic execution
  • Performance
• When you send data, where does it go? One possibility:
  [Diagram: the user data on Process 0 is copied into a local buffer, crosses the network into a local buffer on Process 1, and is then copied into the user data there.]
Derived from: Bill Gropp, ANL

Avoiding Buffering
• It is better to avoid copies.
• This requires that MPI_Send wait on delivery, or that MPI_Send return before the transfer is complete, and we wait later.
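"Wait later" is where MPI's non-blocking operations (listed in the contents above) come in: each process starts its sends and receives without blocking and completes them afterwards, which also avoids the deadlock that two processes issuing blocking MPI_Send to each other can cause. A minimal sketch of that pattern, not taken from the slides, using the standard MPI_Isend/MPI_Irecv/MPI_Waitall calls:

/* Sketch, assuming exactly 2 ranks: each rank exchanges a small array with
   its partner.  Posting the non-blocking receive and send first and waiting
   afterwards avoids relying on buffering inside a blocking MPI_Send. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, other, i;
    double sendbuf[4], recvbuf[4];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                        /* partner rank (2-rank job) */
    for (i = 0; i < 4; i++) sendbuf[i] = rank;

    MPI_Irecv(recvbuf, 4, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 4, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &reqs[1]);
    /* ... overlap computation here if desired ... */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   /* the "wait later" step */

    printf("rank %d got %g from rank %d\n", rank, recvbuf[0], other);
    MPI_Finalize();
    return 0;
}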
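The tree-based half of the lecture centers on parallel prefix (scan), and MPI exposes the same pattern directly through the MPI_Scan collective listed under "MPI Collective Routines". As a small illustration, not from the slides, here is an inclusive prefix sum across ranks; good implementations perform it in O(log P) communication steps, in the tree-based spirit the outline describes:

/* Sketch: each rank contributes one integer; after MPI_Scan, rank i holds
   the sum of the contributions of ranks 0..i (an inclusive prefix sum). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = rank + 1;                            /* this rank's contribution */
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: inclusive prefix sum = %d\n", rank, prefix);

    MPI_Finalize();
    return 0;
}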