Slide 1Several possible models!Work / Span ModelWork / Span ModelWork / Span ModelWork / Span ModelSeries CompositionParallel CompositionSpeedupParallelismCommunication Volume ModelComplexity Measures for Parallel ComputationLaws of Parallel ComplexitySlide 14Slide 15CS 240A:Complexity Measures for Parallel ComputationSeveral possible models!•Execution time and parallelism: •Work / Span Model•Total cost of moving data: •Communication Volume Model•Detailed models that try to capture time for moving data:•Latency / Bandwidth Model (for message-passing)•Cache Memory Model (for hierarchical memory)•Other detailed models we won’t discuss: LogP, UMH, ….tp = execution time on p processorsWork / Span Modeltp = execution time on p processorst1 = workWork / Span Modeltp = execution time on p processors*Also called critical-path lengthor computational depth.t1 = work t∞ = span *Work / Span Modeltp = execution time on p processorst1 = work t∞ = span **Also called critical-path lengthor computational depth.WORK LAW∙tp ≥t1/pSPAN LAW∙tp ≥ t∞Work / Span ModelWork: t1(A∪B) =Series CompositionAABBWork: t1(A∪B) = t1(A) + t1(B)Span: t∞(A∪B) = t∞(A) +t∞(B)Span: t∞(A∪B) =Parallel CompositionAABBSpan: t∞(A∪B) = max{t∞(A), t∞(B)}Work: t1(A∪B) = t1(A) + t1(B)Def. t1/tP = speedup on p processors.If t1/tP= (p), we have linear speedup,= p, we have perfect linear speedup,> p, we have superlinear speedup, (which is not possible in this model, because of the Work Law tp ≥ t1/p)SpeedupParallelismBecause the Span Law requires tp ≥ t∞, the maximum possible speedup ist1/t∞ = (potential) parallelism = the average amount of work per step along the span.Communication Volume Model•Network of p processors•Each with local memory•Message-passing •Communication volume (v)•Total size (words) of all messages passed during computation•Broadcasting one word costs volume p (actually, p-1)•No explicit accounting for communication time•Thus, can’t really model parallel efficiency or speedup; for that, we’d use the latency-bandwidth model (see later slides)Complexity Measures for Parallel ComputationProblem parameters:•n index of problem size•p number of processorsAlgorithm parameters:•tprunning time on p processors•t1time on 1 processor = sequential time = “work”•t∞time on unlimited procs = critical path length = “span”•v total communication volumePerformance measures•speedup s = t1 / tp•efficiency e = t1 / (p*tp) = s / p•(potential) parallelism pp = t1 / t∞Laws of Parallel Complexity•Work law: tp ≥ t1 / p•Span law: tp ≥ t∞ •Amdahl’s law:•If a fraction f, between 0 and 1, of the work must be done sequentially, then speedup ≤ 1 / f•Exercise: prove Amdahl’s law from the span law.Detailed complexity measures for data movement I: Latency/Bandwith ModelMoving data between processors by message-passing•Machine parameters:• a or tstartup latency (message startup time in seconds) • b or tdata inverse bandwidth (in seconds per word)•between nodes of Triton, a ~ 2.2 × 10-6 and b ~ 6.4 × 10-9•Time to send & recv or bcast a message of w words: a + w*b•tcomm total commmunication time•tcomp total computation time•Total parallel time: tp = tcomp + tcommMoving data between cache and memory on one processor:•Assume just two levels in memory hierarchy, fast and slow•All data initially in slow memory•m = number of memory elements (words) moved between fast and slow memory •tm = time per slow memory operation•f = number of arithmetic operations•tf = time per arithmetic operation, tf << tm•q = f / m average number of flops per slow element access•Minimum possible time = f * tf when all data in fast memory•Actual time •f * tf + m * tm = f * tf * (1 + tm/tf * 1/q) •Larger q means time closer to minimum f * tf Detailed complexity measures for data movement II: Cache Memory
View Full Document