Parallel ProgramsConditions of Parallelism: Data DependenceConditions of Parallelism: Data DependenceData and I/O Dependence: ExamplesConditions of ParallelismBernstein’s Conditions: An ExampleAsymptotic Notations for Algorithm AnalysisSlide 8The Growth Rate of Common Computing FunctionsTheoretical Models of Parallel ComputersExample: sum algorithm on P processor PRAMExample: Sum Algorithm on P Processor PRAMThe Power of The PRAM ModelPerformance of Parallel AlgorithmsNetwork Model of Message-Passing MulticomputersNetwork Model of MulticomputersA Four-Dimensional HypercubeExample: Asynchronous Matrix Vector Product on a RingCreating a Parallel ProgramSome Important ConceptsLevels of Parallelism in Program ExecutionHardware and Software ParallelismComputational Parallelism and Grain SizeSlide 24Example Motivating Problems: Simulating Ocean CurrentsExample Motivating Problems: Simulating Galaxy EvolutionExample Motivating Problems: Rendering Scenes by Ray TracingLimited Concurrency: Amdahl’s LawAmdahl’s Law Example: A Pictorial DepictionParallel Performance Metrics Degree of Parallelism (DOP)Example: Concurrency Profile of A Divide-and-Conquer AlgorithmConcurrency Profile & SpeedupParallel Performance ExampleParallel Performance Example (continued)Steps in Creating a Parallel ProgramDecompositionAssignmentOrchestrationMappingProgram Partitioning ExampleStatic Multiprocessor SchedulingSlide 42EECC756 - ShaabanEECC756 - Shaaban#1 lec # 3 Spring2003 3-18-2003Parallel ProgramsParallel Programs•Conditions of Parallelism:Conditions of Parallelism:–Data DependenceData Dependence–Control DependenceControl Dependence–Resource DependenceResource Dependence–Bernstein’s ConditionsBernstein’s Conditions•Asymptotic Notations for Algorithm AnalysisAsymptotic Notations for Algorithm Analysis•Parallel Random-Access Machine (PRAM)–Example: sum algorithm on P processor PRAMExample: sum algorithm on P processor PRAM•Network Model of Message-Passing MulticomputersNetwork Model of Message-Passing Multicomputers–Example: Asynchronous Matrix Vector Product on a Ring•Levels of Parallelism in Program ExecutionLevels of Parallelism in Program Execution•Hardware Vs. Software ParallelismHardware Vs. Software Parallelism•Parallel Task Grain SizeParallel Task Grain Size•Example Motivating Problems With high levels of concurrencyExample Motivating Problems With high levels of concurrency•Limited Concurrency: Amdahl’s LawLimited Concurrency: Amdahl’s Law•Parallel Performance Metrics: Degree of Parallelism (DOP)Parallel Performance Metrics: Degree of Parallelism (DOP)•Concurrency ProfileConcurrency Profile•Steps in Creating a Parallel Program: Steps in Creating a Parallel Program: –Decomposition, Assignment, Orchestration, Mapping Decomposition, Assignment, Orchestration, Mapping –Program Partitioning ExampleProgram Partitioning Example–Static Multiprocessor Scheduling ExampleStatic Multiprocessor Scheduling ExampleEECC756 - ShaabanEECC756 - Shaaban#2 lec # 3 Spring2003 3-18-2003Conditions of Parallelism: Conditions of Parallelism: Data DependenceData Dependence1True Data or Flow Dependence: A statement S2 is data dependent on statement S1 if an execution path exists from S1 to S2 and if at least one output variable of S1 feeds in as an input operand used by S2 denoted by S1 S22Antidependence: Statement S2 is antidependent on S1 if S2 follows S1 in program order and if the output of S2 overlaps the input of S1 denoted by S1 S23Output dependence: Two statements are output dependent if they produce the same output variable denoted by S1 S2EECC756 - ShaabanEECC756 - Shaaban#3 lec # 3 Spring2003 3-18-2003Conditions of Parallelism: Data DependenceConditions of Parallelism: Data Dependence4I/O dependence: Read and write are I/O statements. I/O dependence occurs not because the same variable is involved but because the same file is referenced by both I/O statements.5Unknown dependence:•Subscript of a variable is subscribed (indirect addressing)•The subscript does not contain the loop index.•A variable appears more than once with subscripts having different coefficients of the loop variable.•The subscript is nonlinear in the loop index variable.EECC756 - ShaabanEECC756 - Shaaban#4 lec # 3 Spring2003 3-18-2003Data and I/O Dependence: ExamplesData and I/O Dependence: Examples A -B -S1: Load R1,AS2: Add R2, R1S3: Move R1, R3S4: Store B, R1S1: Read (4),A(I) /Read array A from tape unit 4/S2: Rewind (4) /Rewind tape unit 4/S3: Write (4), B(I) /Write array B into tape unit 4/S4: Rewind (4) /Rewind tape unit 4/ S1 S3 S4 S2Dependence graph S1 S3I/OI/O dependence caused by accessing thesame file by the read and write statementsEECC756 - ShaabanEECC756 - Shaaban#5 lec # 3 Spring2003 3-18-2003Conditions of ParallelismConditions of Parallelism•Control Dependence:–Order of execution cannot be determined before runtime due to conditional statements.•Resource Dependence:–Concerned with conflicts in using shared resources including functional units (integer, floating point), memory areas, among parallel tasks.•Bernstein’s Conditions: Two processes P1 , P2 with input sets I1, I2 and output sets O1, O2 can execute in parallel (denoted by P1 || P2) if:I1 O2 = I2 O1 = O1 O2 = EECC756 - ShaabanEECC756 - Shaaban#6 lec # 3 Spring2003 3-18-2003Bernstein’s Conditions: An ExampleBernstein’s Conditions: An Example•For the following instructions P1, P2, P3, P4, P5 in program order and –Instructions are in program order–Each instruction requires one step to execute–Two adders are available P1 : C = D x E P2 : M = G + C P3 : A = B + C P4 : C = L + M P5 : F = G E Using Bernstein’s Conditions after checking statement pairs: P1 || P5 , P2 || P3 , P2 || P5 , P5 || P3 , P4 || P5XP1D E+3P4+2P3+1P2CBGLP5G EFACXP1D E+1P2+3P4P5GBFC+2P3ALEGCMParallel execution in three stepsassuming two adders are available per stepSequential executionTime XP1 P5 +2 +3 +1P2P4P3Dependence graph:Data dependence (solid lines)Resource dependence (dashed lines)EECC756 - ShaabanEECC756 - Shaaban#7 lec # 3 Spring2003 3-18-2003Asymptotic Notations for Algorithm AnalysisAsymptotic Notations for Algorithm Analysis Asymptotic analysis of computing time of an algorithm f(n) ignores constant execution
View Full Document