UC Regents Spring 2005 © UCB
CS 152 L7: Performance
2005-2-8
John Lazzaro (www.cs.berkeley.edu/~lazzaro)

CS 152 Computer Architecture and Engineering
Lecture 7 – Performance
www-inst.eecs.berkeley.edu/~cs152/
TAs: Ted Hong and David Marquardt


Last Time: Tips for Teamwork

Example: 3 members want to do the design one way; member number 4 does not agree.

Solution #1: Voting. "Fair". But what if the "loser" was technically correct?

Solution #2: Consensus. Keeping in mind the goal (a correctly working CPU on the board, on schedule), which option brings the group closer to the goal?

Never lose sight of the goal!


Today's Lecture - Performance

Measurement: what, why, how
The performance equation
How energy limits performance
Amdahl's law
Also: news about PlayStation 3 "Cell" processor


Performance Measurement (as seen by the customer)

Who (sensibly) upgrades CPUs often?

A professional who turns CPU cycles into money, and who is cycle-limited.
Artist tool: animation, video special effects.


How to decide to buy a new machine?

Measure After Effects "execution time" on a representative render "workload".

[Figure: "Night flight" render. City map and clouds computed "on the fly" with fractals. CPU intensive, trivial I/O.]


Interpreting Execution Time

$$\text{Performance} = \frac{1}{\text{Execution Time}} = 2.85 \text{ renders/hour}$$

1.5 GHz PB (Y) is N times faster than 1.25 GHz PB (X). N is ?

$$N = \frac{\text{Performance}(Y)}{\text{Performance}(X)} = \frac{\text{Execution Time}(X)}{\text{Execution Time}(Y)} = 1.19$$

PB 1.5 GHz: 3.4 renders/hour.
PB 1.25 GHz: 2.85 renders/hour.
Does artist productivity really increase?
Execution Time: 1265 seconds (PowerBook G4, 1.25 GHz)


2 CPUs: Execution Time vs Throughput

Execution Time: time for 1 job to complete.
Throughput: number of parallel jobs completed per hour.

Could G5 and Opteron have similar throughput? Why?
Assume G5 MP execution time is faster because AE does not use both Opteron CPUs.

2 CPUs vs 1 CPU, otherwise similar: 1.8x faster. What does this imply?


Performance Measurement (as seen by a CPU designer)

Q. Why do we care about After Effects' performance?
A. We want the CPU we are designing to run it well!


Step 1: Analyze the right measurement!

CPU Time: Time the CPU spends running the program under measurement. Guides CPU design.
Response Time: Total time: CPU Time + time spent waiting (for disk, I/O, ...). Guides system design.

How to measure CPU time?

% time <program name>
25.77u 0.72s 0:29.17 90.8%


CPU time: Proportional to Instruction Count

$$\frac{\text{CPU time}}{\text{Program}} \propto \frac{\text{Machine Instructions}}{\text{Program}}$$

Rationale: Every additional instruction you execute takes time.

Q. Static count (lines of program printout)? Or dynamic count (trace of execution)?
A. Dynamic.

Q. What type of computer architect influences the number of instructions a given program needs?
A. Instruction set architect.

Q. Once the ISA is set, who can influence instruction count?
A. Compiler writer, application developer.


CPU time: Proportional to Clock Period

$$\frac{\text{Time}}{\text{Program}} \propto \frac{\text{Time}}{\text{One Clock Period}}$$

Q. What ultimately limits an architect's ability to reduce clock period?
A. Clock-to-Q, setup times.

Q. How can architects (not technologists) reduce clock period?
A. Shorten the machine critical path.

Rationale: We measure each instruction's execution time in "number of cycles".
By shortening the period for each cycle, we shorten execution time.


Completing the performance equation

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

We need all three terms, and only these terms, to compute CPU Time!

"CPI": the average number of clock cycles per instruction for the program.

When is it OK to compare clock rates?

What factors make the CPI for a program differ from the underlying CPI of a CPU implementation?
Instruction mix varies.
Cache behavior varies.
Branch prediction varies.


CPI as an analytical tool to guide design

Instruction   Machine CPI   Program instruction mix   Where program spends its time
Multiply      5             30%                       56%
Other ALU     1             20%                       7%
Load          2             20%                       15%
Store         2             10%                       7%
Branch        2             20%                       15%

$$\text{CPI} = \frac{5 \times 30 + 1 \times 20 + 2 \times 20 + 2 \times 10 + 2 \times 20}{100} = 2.7 \text{ cycles/instruction}$$


Amdahl's Law (of Diminishing Returns)

If enhancement "E" speeds up multiply, but other instructions are unchanged, what is the maximum speedup S?

Where program spends its time: Multiply 52%, Other ALU 8%, Load 16%, Store 8%, Branch 16%.

$$S_{max} = \frac{1}{\text{un-enhanced \%} / 100\%} = \frac{1}{48\% / 100\%} = 2.08$$

Attributed to Gene Amdahl: "Amdahl's Law".

What is the lesson of Amdahl's Law?
Must enhance computers in a balanced way!


Invented the "one ISA, many implementations" business model.


Amdahl's Law in Action

Program we wish to run on N CPUs: 30% serial, 70% parallel.

The program spends 30% of its time running code that cannot be recoded to run in parallel.

Compute speedup for N = 2, 3, 4, 5, and ∞.


A law of diminishing returns ...

Program we wish to run on N CPUs: 30% serial, 70% parallel.

$$S = \frac{1}{(30\% + (70\% / N)) / 100\%}$$

CPUs      2      3      4     5     ∞
Speedup   1.54   1.88   2.1   2.3   3.3

[Figure: speedup vs. # CPUs, approaching S(∞).]


Final thoughts: Performance Equation

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Goal is to optimize execution time, not individual equation terms.

Cycles/Instruction: The CPI of the program. Reflects the program's instruction mix.
Machines are optimized with respect to program workloads.
Seconds/Cycle: Clock period. Optimize jointly with machine CPI.


Administrivia: Upcoming deadlines ...

Thursday 2/17: Lab 2 peer evaluations and Lab 3 preliminary design document due via email at 11:59 PM. (More details on Lab 3 on Thursday.)
Monday 2/14: Lab 2 final report due via the submit program, 11:59 PM.
Friday 2/11: "Xilinx Checkoff", 12-1, 119 Cory. For 61(c)
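

Worked example: Amdahl's Law on N CPUs

The speedup formula from the "Amdahl's Law in Action" slides can be checked with a short script. This is only a sketch: the function name `amdahl_speedup` and its parameters are illustrative and do not come from the lecture, and it assumes the parallel part of the program scales perfectly with the number of CPUs.

```python
import math


def amdahl_speedup(serial_fraction: float, n_cpus: float) -> float:
    """Speedup when only the non-serial part of the run time is spread over n_cpus.

    serial_fraction: fraction of run time that cannot be sped up (0 < f <= 1).
    n_cpus: number of CPUs; math.inf gives the limiting speedup S(inf).
    """
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # S = 1 / (serial + parallel / N), the slide's formula with percentages
    # written as fractions.
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


if __name__ == "__main__":
    # 30% serial, 70% parallel, as in the lecture's example program.
    for n in (2, 3, 4, 5, math.inf):
        print(f"N = {n}: speedup = {amdahl_speedup(0.30, n):.3f}")
```

The same function covers the multiply example: treating the 48% of time that is not multiply as the un-enhanced part and letting the enhancement be arbitrarily large, `amdahl_speedup(0.48, math.inf)` is about 2.08, the maximum speedup quoted earlier.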
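

Worked example: CPI and the performance equation

The CPI arithmetic from the "CPI as an analytical tool" slide, and the three-term performance equation, can be reproduced as follows. This is a rough sketch rather than anything from the course materials: the names `program_cpi`, `time_share`, and `cpu_time_seconds` are made up for illustration, and the instruction count and clock rate in the last step are assumed values chosen only to exercise the equation.

```python
# Cycles per instruction for each instruction type (from the slide).
MACHINE_CPI = {"Multiply": 5, "Other ALU": 1, "Load": 2, "Store": 2, "Branch": 2}

# Program instruction mix, in percent of dynamic instruction count (from the slide).
INSTRUCTION_MIX = {"Multiply": 30, "Other ALU": 20, "Load": 20, "Store": 10, "Branch": 20}


def program_cpi(machine_cpi: dict, mix_percent: dict) -> float:
    """Average cycles per instruction, weighted by the instruction mix."""
    total_percent = sum(mix_percent.values())
    weighted_cycles = sum(machine_cpi[kind] * pct for kind, pct in mix_percent.items())
    return weighted_cycles / total_percent


def time_share(machine_cpi: dict, mix_percent: dict) -> dict:
    """Percent of execution cycles spent in each instruction type."""
    cycles = {kind: machine_cpi[kind] * pct for kind, pct in mix_percent.items()}
    total_cycles = sum(cycles.values())
    return {kind: 100.0 * c / total_cycles for kind, c in cycles.items()}


def cpu_time_seconds(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    return instruction_count * cpi * (1.0 / clock_hz)


if __name__ == "__main__":
    cpi = program_cpi(MACHINE_CPI, INSTRUCTION_MIX)
    print(f"Program CPI: {cpi:.2f} cycles/instruction")
    for kind, share in time_share(MACHINE_CPI, INSTRUCTION_MIX).items():
        print(f"  {kind}: {share:.0f}% of cycles")
    # Assumed values, not from the lecture: 1e9 instructions on a 1.5 GHz clock.
    print(f"CPU time: {cpu_time_seconds(1e9, cpi, 1.5e9):.2f} s")
```

Separating `program_cpi` from `cpu_time_seconds` mirrors the lecture's point that CPI depends on the program's instruction mix, while clock period is a property of the implementation; execution time only falls when the product of all three terms falls.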