UMD CMSC 714 - Lecture Slides

CMSC 714 – F06 (lect 3)
copyright 2006 Jeffrey K. Hollingsworth

Announcements
- No class next Thursday
- Class will meet on Friday (9/15) in room 3450 AV Williams
  - 9:30-10:45

Seismic Code
- Given echo data, compute an undersea map
- Computation model
  - designed for a collection of workstations
  - uses a variation of the RPC model
  - workers are given an independent trace to compute
    - requires little communication
    - supports load balancing (1,000 traces is typical)
- Performance
  - max MFLOPS = O((F * nz * B*)^(1/2))
  - F – single-processor MFLOPS
  - nz – linear dimension of the input array
  - B* – effective communication bandwidth
    - B* = B / (1 + B·L/w) ≈ B/7 for Ethernet (10 msec latency, w = 1400)
  - the real limit to performance was latency, not bandwidth

Database Applications
- Too much data to fit in memory (or sometimes disk)
  - data mining applications (K-Mart has a 4-5 TB database)
  - imaging applications (NASA has a site with 0.25 petabytes)
    - uses a forklift to load tapes by the pallet
- Sources of parallelism
  - within a large transaction
  - among multiple transactions
- Join operation
  - form a single table from two tables based on a common field
  - try to split the join attribute into disjoint buckets
    - if the data distribution is known to be uniform, it's easy
    - if not, try hashing

Speedup in Join parallelism
- Book claims a speedup of 1/p^2 is possible
  - split each relation into p buckets
    - each bucket is a disjoint subset of the join attribute
  - each processor only has to consider N/p tuples per relation
    - join is O(N^2), so each processor does O((N/p)^2) work
    - so speedup is O(N^2/p^2) / O(N^2) = O(1/p^2)
- This is a lie!
  - could split into p buckets on one processor
  - time would then be O(p * (N/p)^2) = O(N^2/p)
  - so speedup is O(N^2/p^2) / O(N^2/p) = O(1/p)
  - Amdahl's law is not violated

Parallel Search (TSP)
- may appear to be faster than 1/n
  - but this is not really the case either
- Algorithm
  - compute a path on a processor
    - if our path is shorter than the shortest one, send it to the others
    - stop searching a path when it is longer than the shortest
  - before computing the next path, check for word of a new minimum path
  - stop when all paths have been explored
- Why it appears to be faster than 1/n speedup
  - we found a path that was shorter sooner
  - however, the reason for this is a different search order!

Ensuring a fair speedup
- T_serial = fastest of
  - best known serial algorithm
  - simulation of the parallel computation
    - use the parallel algorithm
    - run all processes on one processor
  - parallel algorithm run on one processor
- If speedup appears to be super-linear
  - check the memory hierarchy
    - increased cache or real memory may be the reason
  - verify that the order of operations is the same in the parallel and serial cases

Quantitative Speedup
- Consider master-worker
  - one master and n worker processes
  - communication time increases as a linear function of n
- T_p = T_comp,p + T_comm,p
- T_comp,p = T_s / P
- 1/S_p = T_p / T_s = 1/P + T_comm,p / T_s
- T_comm,p is P * T_comm,1
- 1/S_p = 1/P + P * T_comm,1 / T_s = 1/P + P/r_1, where r_1 = T_s / T_comm,1
- d(1/S_p)/dP = 0  -->  P_opt = r_1^(1/2) and S_opt = 0.5 * r_1^(1/2)
- For a hierarchy of masters
  - T_comm,p = (1 + log P) * T_comm,1
  - P_opt = r_1 and S_opt = r_1 / (1 + log r_1)
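
The optimum quoted on the Quantitative Speedup slide comes from minimizing 1/S_p = 1/P + P/r_1 over P: d(1/S_p)/dP = -1/P^2 + 1/r_1 = 0 gives P_opt = r_1^(1/2), and substituting back gives S_opt = 0.5 * r_1^(1/2). A minimal C sketch, not from the slides, that checks this numerically (r1 = 400 is an arbitrary example value):

    /* Sweep P and compare the numerical optimum of S_p = 1/(1/P + P/r1)
     * against the slide's closed form P_opt = sqrt(r1), S_opt = 0.5*sqrt(r1).
     * Compile with: cc speedup.c -lm */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double r1 = 400.0;                  /* assumed example: Ts / Tcomm1 */
        double best_S = 0.0, best_P = 0.0;
        for (double P = 1.0; P <= 100.0; P += 1.0) {
            double S = 1.0 / (1.0 / P + P / r1);
            if (S > best_S) { best_S = S; best_P = P; }
        }
        printf("numerical optimum: P = %.0f, S = %.2f\n", best_P, best_S);
        printf("slide formula:     P = %.0f, S = %.2f\n",
               sqrt(r1), 0.5 * sqrt(r1));   /* both should report P = 20, S = 10 */
        return 0;
    }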
MPI
- Goals:
  - standardize previous message passing: PVM, P4, NX
  - support copy-free message passing
  - portable to many platforms
- Features:
  - point-to-point messaging
  - group communications
  - profiling interface: every function has a name-shifted version
- Buffering
  - no guarantee that there are buffers
  - possible that a send will block until the receive is called
- Delivery Order
  - two sends from the same process to the same destination will arrive in order
  - no guarantee of fairness between processes on receive

MPI Communicators
- Provide a named set of processes for communication
- All processes within a communicator can be named
  - numbered from 0…n-1
- Allow libraries to be constructed
  - the application creates communicators
  - the library uses them
  - prevents problems with posting wildcard receives
    - adds a communicator scope to each receive
- All programs start with MPI_COMM_WORLD

Non-Blocking Functions
- Two parts
  - post the operation
  - wait for results
- Also includes a poll option
  - checks if the operation has finished
- Semantics
  - must not alter the buffer while the operation is pending

MPI Misc.
- MPI Types
  - all messages are typed
    - base types are pre-defined: int, double, real, {,unsigned}{short, char, long}
    - can construct user-defined types
      - includes non-contiguous data types
- Processor Topologies
  - allows construction of Cartesian & arbitrary graphs
  - may allow some systems to run faster
- What's not in MPI-1
  - process creation
  - I/O
  - one-sided communication

MPI Housekeeping Calls
- Include <mpi.h> in your program
- If using mpich, …
- First call MPI_Init(&argc, &argv)
- MPI_Comm_rank(MPI_COMM_WORLD, &myrank)
  - myrank is set to the id of this process
- MPI_Wtime
  - returns wall time
- At the end, call MPI_Finalize()

MPI Communication Calls
- Parameters
  - var – a variable
  - num – number of elements in the variable to use
  - type – {MPI_INT, MPI_REAL, MPI_BYTE}
  - root – rank of processor at root of collective operation
  - dest – rank of destination processor
  - status – variable of type MPI_Status
- Calls (all return a code – check for MPI_SUCCESS)
  - MPI_Send(var, num, type, dest, tag, MPI_COMM_WORLD)
  - MPI_Recv(var, num, type, dest, MPI_ANY_TAG, MPI_COMM_WORLD, &status)
  - MPI_Bcast(var, num, type, root, MPI_COMM_WORLD)
  - MPI_Barrier(MPI_COMM_WORLD)
  - (a minimal program combining these calls appears after the PVM slide below)

PVM
- Provide a simple, free, portable parallel environment
- Runs on everything
  - parallel hardware: SMPs, MPPs, vector machines
  - networks of workstations: ATM, Ethernet,
    - UNIX machines and PCs running Win*
  - works on a heterogeneous …
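
A minimal sketch, not part of the original slides, that ties the MPI housekeeping and communication calls above together in C: MPI_Init, MPI_Comm_rank, MPI_Wtime, MPI_Bcast, a non-blocking MPI_Isend/MPI_Irecv pair completed with MPI_Waitall, MPI_Barrier, and MPI_Finalize. Return codes are checked against MPI_SUCCESS as the slides recommend. The ring-exchange pattern is only an illustration; compile with mpicc and launch with mpirun or mpiexec.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank = id of this process */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double t0 = MPI_Wtime();                /* wall-clock timer from the slides */

        if (rank == 0) value = 42;              /* root fills the buffer before the broadcast */
        int rc = MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, rc);

        /* Non-blocking ring exchange: post both operations, then wait.  Per the
         * slides, the buffers must not be altered while the operations are pending. */
        int to = (rank + 1) % size, from = (rank + size - 1) % size;
        int out = rank, in = -1;
        MPI_Request reqs[2];
        MPI_Status  stats[2];
        MPI_Irecv(&in,  1, MPI_INT, from, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&out, 1, MPI_INT, to,   0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);

        MPI_Barrier(MPI_COMM_WORLD);
        printf("rank %d of %d: bcast value=%d, left neighbor=%d, elapsed=%.6f s\n",
               rank, size, value, in, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }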

