Sangyeun Cho, Rami Melhem, Michael Moeng
University of Pittsburgh
Presented By: Michael Moeng

Amdahl's Law Power Applications Problem Formulation Derivations Conclusions

Maximum speedup for a program given
◦ Sequential fraction s, parallel fraction p = 1 - s
◦ N processors
Speedup = 1 / (s + p/N)
Assumes parallel portion can be perfectly parallelized

Keep execution time constant and improve dynamic energy
Assumes dynamic power p ~ f^α
Find optimal serial and parallel frequency by taking derivative

Keep execution time constant and improve energy
Now we set speedup to x
Set fs, fp
What are optimal serial and parallel frequencies?
◦ Trivial: minimize frequencies
Must consider static power!

Serial work s completes in time t
Parallel work p = 1 - s completes in time 1/x - t
◦ Recall that x is speedup, so 1/x is total time
Energy is composed of
◦ Serial dynamic energy
◦ Parallel dynamic energy
◦ Static energy

First consider special case, x = 1 (same execution time as serial case)
Take derivative with respect to t
Relation between fs* and fp* is dependent on N, but not s!
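The slides state the x = 1 result without the intermediate formulas, so the following is only a sketch under an assumed energy model, not the presenters' own derivation. It assumes fs = s/t and fp = p/(N(1/x - t)), serial dynamic energy t·fs^α, parallel dynamic energy N·(1/x - t)·fp^α, and static energy that is constant once x is fixed (so it drops out of the minimization). All function names are hypothetical. Setting the derivative with respect to t to zero under these assumptions gives a closed-form t, which the code compares against a numerical minimum and against the ratio fs*/fp* = N^(1/α).

```python
# Sketch of the fixed-time case under an ASSUMED energy model (the slides
# do not preserve the formulas):
#   serial dynamic energy   = t * f_s**alpha
#   parallel dynamic energy = N * (T - t) * f_p**alpha
# with f_s = s / t and f_p = p / (N * (T - t)), where T = 1/x is total time.
# Static energy is assumed constant for fixed T, so it is omitted here.


def frequencies(t, s, n, total_time=1.0):
    """Frequencies needed to finish serial work in t and parallel work in T - t."""
    p = 1.0 - s
    f_s = s / t
    f_p = p / (n * (total_time - t))
    return f_s, f_p


def dynamic_energy(t, s, n, alpha, total_time=1.0):
    """Total dynamic energy for a given serial time t (assumed model)."""
    f_s, f_p = frequencies(t, s, n, total_time)
    return t * f_s ** alpha + n * (total_time - t) * f_p ** alpha


def optimal_serial_time(s, n, alpha, total_time=1.0):
    """Closed-form t from dE/dt = 0 under the assumed model."""
    p = 1.0 - s
    k = n ** ((alpha - 1.0) / alpha)
    return total_time * s * k / (p + s * k)


def argmin_ternary(fn, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


if __name__ == "__main__":
    alpha, n = 3.0, 4  # illustrative values, not taken from the slides
    for s in (0.1, 0.5, 0.9):
        t_star = optimal_serial_time(s, n, alpha)
        f_s, f_p = frequencies(t_star, s, n)
        t_num = argmin_ternary(
            lambda t: dynamic_energy(t, s, n, alpha), 1e-6, 1.0 - 1e-6
        )
        print(f"s={s}: t*={t_star:.4f} (numeric {t_num:.4f}), "
              f"fs/fp={f_s / f_p:.4f}")
    print(f"N**(1/alpha) = {n ** (1.0 / alpha):.4f}")
```

Running the script for several values of s shows the ratio fs/fp staying fixed while t* changes, which is the sense in which the relation depends on N but not on s.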
Note that static energy has no effect in this case
fp = Fmax, fs = Fmax

Relax execution time constraint
◦ Set derivatives with respect to both x and t to 0
◦ Solve equations
Previous solution necessitates λN ≤ α-1
◦ fs = Fmax

Energy Delay
Same relation between fs* and fp*
Implications?

Synchronization cost
◦ Parallel architectures have some overhead for communication
◦ As N increases, more total work is required, which can be expressed as a function
◦ Total work is now s + p(1 + σ(N))
◦ Total work is now s + (p + σ(N))
Optimal serial and parallel frequencies have the same relationship
◦ Still differ by N^(1/α)

Mark D. Hill, University of Wisconsin-Madison and Michael R. Marty, Google
Presented By: Michael Moeng

Chip has a total resource budget, defined in the number of 'baseline' cores that can be supported
Assume performance for an r-BCE core grows by