Parallel Programs Conditions of Parallelism Data Dependence Control Dependence Resource Dependence Bernstein s Conditions Asymptotic Notations for Algorithm Analysis Parallel Random Access Machine PRAM Example sum algorithm on P processor PRAM Network Model of Message Passing Multicomputers Example Asynchronous Matrix Vector Product on a Ring Levels of Parallelism in Program Execution Hardware Vs Software Parallelism Parallel Task Grain Size Example Motivating Problems With high levels of concurrency Limited Concurrency Amdahl s Law Parallel Performance Metrics Degree of Parallelism DOP Concurrency Profile Steps in Creating a Parallel Program Decomposition Assignment Orchestration Mapping Program Partitioning Example Static Multiprocessor Scheduling Example EECC756 Shaaban 1 lec 3 Spring2003 3 Conditions of Parallelism Data Dependence 1 True Data or Flow Dependence A statement S2 is data dependent on statement S1 if an execution path exists from S1 to S2 and if at least one output variable of S1 feeds in as an input operand used by S2 denoted by S1 S2 2 Antidependence Statement S2 is antidependent on S1 if S2 follows S1 in program order and if the output of S2 overlaps the input of S1 denoted by S1 S2 3 Output dependence Two statements are output dependent if they produce the same output variable denoted by S1 S2 EECC756 Shaaban 2 lec 3 Spring2003 3 Conditions of Parallelism Data Dependence 4 I O dependence Read and write are I O statements I O dependence occurs not because the same variable is involved but because the same file is referenced by both I O statements 5 Unknown dependence Subscript of a variable is subscribed indirect addressing The subscript does not contain the loop index A variable appears more than once with subscripts having different coefficients of the loop variable The subscript is nonlinear in the loop index variable EECC756 Shaaban 3 lec 3 Spring2003 3 Data and I O Dependence Examples A S1 S2 S3 S4 Load R1 A Add R2 R1 Move R1 R3 Store B R1 S1 S4 S2 Dependence graph B S1 S2 S3 S4 Read 4 A I Rewind 4 Write 4 B I Rewind 4 S3 Read array A from tape unit 4 Rewind tape unit 4 Write array B into tape unit 4 Rewind tape unit 4 I O dependence caused by accessing the same file by the read and write statements S1 I O S3 EECC756 Shaaban 4 lec 3 Spring2003 3 Conditions of Parallelism Control Dependence Order of execution cannot be determined before runtime due to conditional statements Resource Dependence Concerned with conflicts in using shared resources including functional units integer floating point memory areas among parallel tasks Bernstein s Conditions Two processes P1 P2 with input sets I1 I2 and output sets O1 O2 can execute in parallel denoted by P1 P2 if I1 O2 I2 O1 O1 O2 EECC756 Shaaban 5 lec 3 Spring2003 3 Bernstein s Conditions An Example For the following instructions P1 P2 P3 P4 P5 in program order and Instructions are in program order Each instruction requires one step to execute Two adders are available P1 C D x E Using Bernstein s Conditions after checking statement pairs P2 M G C P1 P5 P2 P3 P3 A B C P2 P5 Time P4 C L M G P1 X P2 1 3 2 P4 P3 Dependence graph Data dependence solid lines Resource dependence dashed lines P5 B L E Sequential execution X P4 P5 D E D E P5 F G E P5 P3 P1 X G C G E C 1 P2 2 P3 L M A 3 P1 B 1 P2 3 P4 C P4 C P G 2 P3 A P 5 F Parallel execution in three steps assuming two adders are available per step 5 F EECC756 Shaaban 6 lec 3 Spring2003 3 Asymptotic Notations for Algorithm Analysis Asymptotic analysis of computing time of an algorithm f n ignores constant execution factors and concentrates on determining the order of magnitude of algorithm performance Upper bound Used in worst case analysis of algorithm performance f n O g n iff there exist two positive constants c and n 0 such that f n c g n for all n n0 i e g n an upper bound on f n O 1 O log n O n O n log n O n 2 O n3 O 2n EECC756 Shaaban 7 lec 3 Spring2003 3 Asymptotic Notations for Algorithm Analysis Lower bound Used in the analysis of the lower limit of algorithm performance f n g n if there exist positive constants c n0 such that f n c g n for all n n0 i e g n is a lower bound on f n Tight bound Used in finding a tight limit on algorithm performance f n g n if there exist constant positive integers c 1 c2 and n0 such that c1 g n f n c2 g n for all n n0 i e g n is both an upper and lower bound on f n EECC756 Shaaban 8 lec 3 Spring2003 3 The Growth Rate of Common Computing Functions log n n n log n 0 1 2 3 4 5 1 2 4 8 16 32 0 2 8 24 64 160 n2 n3 1 4 16 64 256 1024 1 8 64 512 4096 32768 2n 2 4 16 256 65536 4294967296 EECC756 Shaaban 9 lec 3 Spring2003 3 Theoretical Models of Parallel Computers Parallel Random Access Machine PRAM n processor global shared memory model Models idealized parallel computers with zero synchronization or memory access overhead Utilized parallel algorithm development and scalability and complexity analysis PRAM variants More realistic models than pure PRAM EREW PRAM Simultaneous memory reads or writes to from the same memory location are not allowed CREW PRAM Simultaneous memory writes to the same location is not allowed ERCW PRAM Simultaneous reads from the same memory location are not allowed CRCW PRAM Concurrent reads or writes to from the same memory location are allowed EECC756 Shaaban 10 lec 3 Spring2003 Example sum algorithm on P processor PRAM Input Array A of size n 2k in shared memory Initialized local variables the order n number of processors p 2q n the processor number s Output The sum of the elements of A stored in shared memory begin 1 for j 1 to l n p do Set B l s 1 j A l s 1 j 2 for h 1 to log n do 2 1 if k h q 0 then for j 2k h q s 1 1 to 2k h qS do Set B j B 2j 1 B 2s 2 2 else if s 2k h then Set B s B 2s 1 B 2s 3 if s 1 then set S B 1 end Running time analysis Step 1 takes O n p each processor executes n p operations The hth of step 2 takes O n 2hp since each processor has to perform n 2hp operations n log n n Step three takes O 1 n n O O log n Tp p h p Total Running time p h 1 2 EECC756 Shaaban 11 lec 3 Spring2003 Example Sum …
View Full Document
Unlocking...