Performance Evaluation I November 5, 1998Performance expressed as a timePerformance expressed as a ratePerformance expressed as a rate(cont)Timing mechanismsTiming mechanisms (cont)Slide 7Measurement pitfallsThe nature of timeAnatomy of a timerMeasurement pitfall #1: Discretization errorExamples of discretization errorExamples of discretization error (cont)Estimating the timer period dtModeling discretization errorRelative error analysisSlide 17Measurement pitfall #2: Unexpected cache effectsMeasurement summaryProfilingProfiling toolsCase study: DEC Continuous Profiling Infrastructure (DCPI)DCPI case studyDCPI architectureSlide 25Slide 26Slide 27Performance Evaluation INovember 5, 1998Topics•Performance measures (metrics)•Timing•Profiling15-213class22.pptCS 213 F’98– 2 –class22.pptPerformance expressed as a timeAbsolute time measures (metrics)•difference between start and finish of an operation•synonyms: running time, elapsed time, response time, latency, completion time, execution time•most straightforward performance measureRelative (normalized) time measures•running time normalized to some reference time •(e.g. time/reference time)Guiding principle: Choose performance measures that track running time.CS 213 F’98– 3 –class22.pptPerformance expressed as a rateRates are performance measures expressed in units of work per unit time.Examples:•millions of instructions / sec (MIPS)•millions of fp instructions / sec (Mflops/sec) Mflop = 10^6 flops•millions of bytes / sec (MBytes/sec) MByte = 2^20 bytes•millions of bits / sec (Mbits/sec) Mbit = 10^6 bits•images / sec•samples / sec•transactions / sec (TPS)CS 213 F’98– 4 –class22.pptPerformance expressed as a rate(cont)Key idea: Report rates that track execution time.Example: Suppose we are measuring a program that convolves a stream of images from a video camera. Bad performance measure: MFLOPS•number of floating point operations depends on the particular convolution algorithm: n^2 matix-vector product vs nlogn fast Fourier transform. An FFT with a bad MFLOPS rate may run faster than a matrix-vector product with a good MFLOPS rate.Good performance measure: images/sec•a program that runs faster will convolve more images per second.CS 213 F’98– 5 –class22.pptTiming mechanismsClocks•returns elapsed time since epoch (e.g., Jan 1, 1970)•Unix getclock() command•coarse grained (e.g., us resolution on Alpha)long int secs, ns;struct timespec *start, *stop;getclock(TIMEOFDAY, start);P();getclock(TIMEOFDAY, stop);secs = (stop->tv_sec - start->tv_sec);ns = (stop->tv_nsec - start->tv_nsec);printf(“%ld ns\n”, secs*1e9 + ns);CS 213 F’98– 6 –class22.pptTiming mechanisms (cont)Interval (count-down) timers•set timer to some initial value•timer counts down to zero, then sends Unix signal•course grained (e.g., us resolution on Alphas)void init_etime() { first.it_value.tv_sec = 86400; setitimer(ITIMER_VIRTUAL, &first, NULL);}init_etime();secs = get_etime();P();secs = get_etime() - secs;printf(“%lf secs\n”, secs); double get_etime() { struct itimerval curr; getitimer(ITIMER_VIRTUAL,&curr); return(double)( (first.it_value.tv_sec - curr.it_value.tv_sec) + (first.it_value.tv_usec - curr.it_value.tv_usec)*1e-6);Using theinterval timerCS 213 F’98– 7 –class22.pptTiming mechanisms (cont)Performance counters•counts system events (CYCLES, IMISS, DMISS, BRANCHMP)•very fine grained•short time span (e.g., 9 seconds on 450 MHz Alpha)unsigned int counterRoutine[] = { /* Alpha cycle counter */ 0x601fc000u, 0x401f0000u, 0x6bfa8001u};unsigned int (*counter)(void) = (void *)counterRoutine; cycles = counter();P();cycles = counter() - cycles;printf(“%d cycles\n”, cycles);Using the Alphacycle counterCS 213 F’98– 8 –class22.pptMeasurement pitfallsDiscretization errors•need to measure large enough chunks of work•but how large is large enough?Unexpected cache effects•artificial hits or misses•cold start misses due to context swappingCS 213 F’98– 9 –class22.pptThe nature of timereal (wall clock) time= user time (time executing instructing instructions in the user process)+ = real (wall clock) timeWe will use the word “time” to refer to user time.= system time (time executing instructing instructions in kernel on behalf of user process+CS 213 F’98– 10 –class22.pptAnatomy of a timertimer period: dt secs/ticktimer resolution: 1/dt ticks/sectime dtclock interrupt (tick)T1T2TnTstartTfinishprogram execution timeinterval 2Assume here that Tk = Tk-1 + dt TkCS 213 F’98– 11 –class22.pptMeasurement pitfall #1:Discretization errortime dtT1T2TnTstartTfinishactual program execution timemeasured time: (Tn - T1)actual time: (Tn - T1) + (Tfinish - Tn) - (Tstart - T1)absolute error = measured time - actual timefstart = (Tstart - T1)/dt fraction of interval overreported ffinish = (Tfinish - Tn)/dt fraction of interval underreportedabsolute error = dt fstart - dt ffinish = dt (fstart - ffinish) max absolute error = +/- dtCS 213 F’98– 12 –class22.pptExamples of discretization errortimeactual running timeActual time = near zeromeasured time = dtAbsolute measurement error = +dtCS 213 F’98– 13 –class22.pptExamples of discretization error (cont)timeactual running timeActual time = near 2dtmeasured time = dtAbsolute measurement error = -dtCS 213 F’98– 14 –class22.pptEstimating the timer period dtstart = 0;while (start == (end = get_etime()))) ;dt = end - start;printf(“dt = %lf\n”, dt);Digital Unix Alpha systems: dt = 1msCS 213 F’98– 15 –class22.pptModeling discretization errorKey idea: need to measure long enough to hide the discretization error.Example:start = get_etime();for (i=0; i<n; i++) { P();}tprime = get_etime() - start;Question: how big must tprime be in order to get a “good” estimate of the running time of the loop?CS 213 F’98– 16 –class22.pptRelative error analysisLet t and t’ be the actual and measured running times of the loop, respectively, and let dt be the timer period.Also, let t’-t be the absolute error and let |t’-t|/t be the relative error.Problem: What value of t’ will result in a relative error less than or equal to Emax?Fact (1): |t’-t| <= dtFact (2): t’ - dt <= t We want |t’-t|/t <= Emaxdt/t <= Emax (1)dt/ Emax <= t (algebra)dt/ Emax <= t’ - dt (2)dt/ Emax + dt <= t’CS
View Full Document