2/14/11%1%CS%61C:%Great%Ideas%in%Computer%Architecture%(Machine%Structures)%Performance*Instructors:%Randy%H.%Katz%David%A.%PaGerson%hGp://inst.eecs.Berkeley.edu/~cs61c/sp11%1%Spring%2011%NN%Lecture%#10%2/14/11%2/14/11% Spring%2011%NN%Lecture%#10% 2%NewNSchool%Machine%Structures%(It’s%a%bit%more%complicated!)%• Parallel%Requests%Assigned%to%computer%e.g.,%Search%“Katz”%• Parallel%Threads%Assigned%to%core%e.g.,%Lookup,%Ads%• Parallel%Instruc[ons%>1%instruc[on%@%one%[me%e.g.,%5%pipelined%instruc[ons%• Parallel%Data%>1%data%item%@%one%[me%e.g.,%Add%of%4%pairs%of%words%• Hardware%descrip[ons%All%gates%@%one%[me%2/14/11% Spring%2011%NN%Lecture%#10% 3%Smart%Phone%Warehouse%Scale%Computer%So,ware********Hardware*Harness*Parallelism*&*Achieve*High*Performance*Logic%Gates%Core% Core%…%%%%%%Memory%%%%%%%%%%%%%%%(Cache)%Input/Output%Computer%Main%Memory%Core%%%%%%%%%%Instruc[on%Unit(s)%%%%%%%%Func[onal%Unit(s)%A3+B3%A2+B2%A1+B1%A0+B0%How%do%we%know?%Agenda%• Defining%Performance%• Administrivia%• Wo rkloads%an d%Benchmarks%• Technology%Break%• Measuring%Performance%• Summary%2/14/11% Spring%2011%NN%Lecture%#10% 4%Agenda%• Defining%Performance%• Administrivia%• Wo rkloads%an d%Benchmarks%• Technology%Break%• Measuring%Performance%• Summary%2/14/11% Spring%2011%NN%Lecture%#10% 5%What%is%Performance?%• Latency*(or%response*<me*or%execu<on*<me)%– %Time%to%complete%one%task%• Bandwidth*(or%throughput)%– %Tasks%completed%per%unit%[me%2/14/11% Spring%2011%NN%Lecture%#10% 6%2/14/11%2%Running%Systems%to%100%%U[liza[on%• Implica[on%of%the%graph%at%the%right?%• Can%you%explain%why%this%happens?%%2/14/11% Spring%2011%NN%Lecture%#10% 7%U[liza[on%100%%Service%Time%aka%Latency%or%Responsiveness%“Knee”%Student%RouleGe?%The%Iron%Law%of%Queues%(aka%LiGle’s%Law)%2/14/11% Spring%2011%NN%Lecture%#10% 8%L*=*λ*W*Average%number%of%customers%in%system%(L)%=%%average%interarrival%rate%(λ)%x%average%service%[me%(W)%%%Cloud%Performance:%Why%Applica[on%Latency%MaGers%• Key%figure%of%merit:%applica[on%responsiveness%– Longer%the%delay,%the%fewer%the%user%clicks,%the%less%the%user%happiness,%and%the%lower%the%revenue%per%user%2/14/11% Spring%2011%NN%Lecture%#10% 9%Google%Instant%Search%“Instant%Efficiency”%2/14/11% Spring%2011%NN%Lecture%#10% 10%Typical%search%takes%24%seconds,%Google’s%search%algorithm%is%only%300%ms%of%this%“It’s%not%search%‘as%you%type’,%but%‘search%before%you%type’!”%“We%can%predict%what%you%are%likely%to%type%and%give%you%those%results%in%real%[me”%Defining%CPU%Performance%• What%does%it%mean%to%say%%X%is%faster%than%Y?%• Ferrari%vs.%School%Bus?%• 2009%Ferrari%599%GTB%%%– 2%passengers,%11.1%secs%in%quarter%mile%• 2009%Type%D%scho o l %b us%– 54%passengers,%quarter%mile%[me?%%%hGp://www.youtube.com/watch?v=KwyCoQuhUNA%%• Response*Time/Latency:%e.g.,%[me%to%travel%¼%mile%• Throughput/Bandwidth:%e.g.,%passengerNmi%%in%1%hour%2/14/11% Spring%2011%NN%Lecture%#10% 11%Defining%Rela[ve%CPU%Performance%• PerformanceX%=%1/Program%Execu[on%TimeX%• PerformanceX%>%PerformanceY%=>%1/Execu[on%TimeX%>%1/Execu[on%Timey%=>%Execu[on%TimeY%>%Execu[on%TimeX%• Computer%X%is%N%[mes%faster%than%Computer%Y%PerformanceX%/%PerformanceY%=%N%or%Execu[on%TimeY%/%Execu[on%TimeX%=%N%• Bus%is%to%Ferrari%as%12%is%to%11.1:%Ferrari%is%1.08%[mes%faster%than%the%bus!%2/14/11% Spring%2011%NN%Lecture%#10% 12%2/14/11%3%Measuring%CPU%Performance%• Computers%use%a%clock%to%determine%when%events%takes%place%within%hardware%• Clock*cycles:%discrete%[me%intervals%– aka%clocks,%cycles,%clock%periods,%clock%[cks%%• Clock*rate*or%clock*frequency:%clock%cycles%per%second%(inverse%of%clock%cycle%[me)%• 3%GigaHertz%clock%rate%%=>%clock%cycle%[me%=%1/(3x109)%seconds%%%%%%%%clock%cycle%[me%=%333%picoseconds%(ps)%2/14/11% Spring%2011%NN%Lecture%#10% 13%CPU%Performance%Factors%• To%dis[nguish%between%processor%[me%and%I/O,%CPU*<me*is%[me%spent%in%processor%• CPU Time/Program = Clock Cycles/Program x Clock Cycle Time • Or%%CPU Time/Program = Clock Cycles/Program ÷ Clock Rate 2/14/11% Spring%2011%NN%Lecture%#10% 14%CPU%Performance%Factors%• But%a%program%executes%instruc[ons%• CPU Time/Program% = Clock Cycles/Program x Clock Cycle Time%%%= Instructions/Program x Average Clock Cycles/Instruction x Clock Cycle Time • 1st%term%called%Instruc<on*Count*• 2nd%term%abbreviated%CPI*for%average%%Clock*Cycles*Per*Instruc<on**• 3rd%term%is%1%/%Clock%rate%2/14/11% Spring%2011%NN%Lecture%#10% 15%Resta[ng%Performance%Equa[on%• Time%=%%Seconds%% %Program%%%%Instruc[ons % %Clock%cycles % %Seconds%% %%Program % %Instruc[on %%Clock%Cycle%% %%2/13/11% Spring%2011%NN%Lecture%#10% 16%×%×%=%What%Affects%Each%Component?%Instruc[on%Count,%CPI,%Clock%Rate%Hardware'or'so*ware'component?'Affects'What?'Algorithm%Programming%%Language%Compiler%Instruc[on%Set%Architecture%2/13/11% Spring%2011%NN%Lecture%#10% 17%Student%RouleGe?%Peer%Instruc[on%Ques[on%• Computer%A%clock%cycle%[me%250%ps,%CPIA%=%2%• Computer%B%clock%cycle%[me%500%ps,%CPIB%=%1.2%• Assume%A%and%B%have%same%instruc[on%set%• Which%statement%is%true?%Red.% % %Computer%A%is%~1.2%[mes%faster%than%B%Orange.%%Computer%A%is%~4.0%[mes%faster%than%B%Green.% %Computer%B%is%~1.7%[mes%faster%than%A%Yellow.% %Computer%B%is%~3.4%[mes%faster%than%A%Pink.%% %None%of%the%above%2/13/11% Spring%2011%NN%Lecture%#10% 19%2/14/11%4%Agenda%• Defining%Performance%• Administrivia%• Wo rkloads%an d%Benchmarks%• Technology%Break%• Measuring%Performance%• Summary%2/13/11% Spring%2011%NN%Lecture%#10% 21%Administrivia%• Lab%#5%posted%• Project%#2.1%Due%Sunday%@%11:59:59%• HW%#4%Due%Sunday%@%11:59:59%• Midterm%in%less%than%three%weeks:%– No%discussion%during%exam%week%– TA%Review:%Su,%Mar%6,%2N5%PM,%2050%VLSB%– Exam:%Tu,%Mar%8,%6N9%PM,%145/155%Dwinelle%– Small%number%of%special%considera[on%cases,%due%to%class%conflicts,%etc.—contact%Dave%or%Randy%2/14/11% Spring%2011%NN%Lecture%#7% 22%Agenda%• Defining%Performance%• Administrivia%• Wo rkloads%an d%Benchmarks%• Technology%Break%• Measuring%Performance%• Summary%2/13/11% Spring%2011%NN%Lecture%#10% 23%Workload%and%Benchmark%•
View Full Document