Performance EvaluationMay 2, 2000Topics• Getting accurate measurements• Amdahl’s Law15-213class29.pptCS 213 S’00– 2 –class29.ppt“Time” on a Computer Systemreal (wall clock) time= user time (time executing instructinginstructions in the user process)+ = real (wall clock) timeWe will use the word “time” to refer to user time.= system time (time executing instructinginstructions in kernel on behalf of user process)+= some other user’s time (time executing instructinginstructions in different user’s process)CS 213 S’00– 3 –class29.pptTime of Day Clock• return elapsed time since some reference time (e.g., Jan 1, 1970)• example: Unix gettimeofday() command• coarse grained (e.g., ~3µsec resolution on Linux, 10 msec resolution on Windows NT)–Lots of overhead making call to OS–Different underlying implementations give different resolutions#include <sys/time.h>#include <unistd.h>struct timeval tstart, tfinish;double tsecs;gettimeofday(&tstart, NULL);P();gettimeofday(&tfinish, NULL);tsecs = (tfinish.tv_sec - tstart.tv_sec) + 1e6 * (tfinish.tv_usec - tstart.tv_usec);CS 213 S’00– 4 –class29.pptInterval (Count-Down) Timers• set timer to some initial value• timer counts down toward zero• coarse grained (e.g., 10 msec resolution on Linux)void init_etime() {first.it_value.tv_sec = 86400;setitimer(ITIMER_VIRTUAL, &first, NULL);}init_etime();secs = get_etime();P();secs = get_etime() - secs;printf(“%lf secs\n”, secs); double get_etime() {struct itimerval curr;getitimer(ITIMER_VIRTUAL,&curr);return(double)((first.it_value.tv_sec -curr.it_value.tv_sec) +(first.it_value.tv_usec -curr.it_value.tv_usec)*1e-6);Using theinterval timerCS 213 S’00– 5 –class29.pptCycle Counters• Most modern systems have built in registers that are incrementedevery clock cycle–Very fine grained–Maintained as part of process state»Save & restore with context switches»Counter will reflect time spent by user process• Special assembly code instruction to access• On (recent model) Intel machines:–64 bit counter.–RDTSC instruction sets %edx to high order 32-bits, %eax to low order 32-bitsWrap Around Times for 550 MHz machine• Low order 32-bits wrap around every 232/ (550 * 106) = 7.8 seconds• High order 64-bits wrap around every 264/ (550 * 106) = 33539534679 seconds–1065.3 yearsCS 213 S’00– 6 –class29.pptUsing the Cycle Counter• Example–Function that returns number of cycles elapsed since previous call to function–Express as double to avoid overflow problems/* Keep track of most recent reading of cycle counter */static unsigned cyc_hi = 0;static unsigned cyc_lo = 0;static double delta_cycles(){unsigned ncyc_hi, ncyc_lo;double result;/* Get cycle counter as ncyc_hi and ncyc_lo */. . ./* Do double precision subtraction */. . .cyc_hi = ncyc_hi; cyc_lo = ncyc_lo;return result;}CS 213 S’00– 7 –class29.pptAccessing the Cycle Counter (cont.)• GCC allows inline assembly code with mechanism for matching registers with program variables• Code only works on x86 machine compiling with GCC• Emit assembly with rdtsc and two movl instructions• Code generates two outputs:–Symbolic register %0 should be used for ncyc_hi–Symbolic register %1 should be used for ncyc_lo• Code has no inputs• Registers %eax and %edx will be overwrittenunsigned ncyc_hi, ncyc_lo;/* Get cycle counter */asm("rdtsc\nmovl %%edx,%0\nmovl %%eax,%1" : "=r" (ncyc_hi), "=r" (ncyc_lo): /* No input */ : "%edx", "%eax");CS 213 S’00– 8 –class29.pptAccessing the Cycle Counter (cont.)Emitted Assembly Codedelta_cycles:pushl %ebp # Stack stuffmovl %esp,%ebppushl %esipushl %ebx#APPrdtsc # Result of ASM Statementmovl %edx,%esi # Uses %esi for ncyc_himovl %eax,%ecx # Uses %ecx for ncyc_lo#NO_APPmovl %ecx,%ebx # ncyc_losubl cyc_lo,%ebxcmpl %ecx,%ebxseta %alxorl %edx,%edxmovb %al,%dlmovl %esi,%eax # ncyc_hiCS 213 S’00– 9 –class29.pptUsing the Cycle Counter (cont.)/* Keep track of most recent reading of cycle counter */static unsigned cyc_hi = 0;static unsigned cyc_lo = 0;static double delta_cycles(){unsigned ncyc_hi, ncyc_lo;unsigned hi, lo, borrow;double result;. . ./* Do double precision subtraction */lo = ncyc_lo - cyc_lo;borrow = lo > ncyc_lo;hi = ncyc_hi - cyc_hi - borrow;result = (double) hi * (1 << 30) * 4 + lo;. . .}CS 213 S’00– 10 –class29.pptTiming with Cycle Counterdouble tsecs;delta_cycles();P();tsecs = delta_cycles() / (MHZ * 1e6);CS 213 S’00– 11 –class29.pptMeasurement PitfallsOverhead• Calling delta_cycles() incurs small amount of overhead• Want to measure long enough code sequence to compensateUnexpected Cache Effects• artificial hits or misses• e.g., these measurements were taken with the Alpha cycle counter:foo1(array1, array2, array3); /* 68,829 cycles */foo2(array1, array2, array3); /* 23,337 cycles */vs.foo2(array1, array2, array3); /* 70,513 cycles */foo1(array1, array2, array3); /* 23,203 cycles */CS 213 S’00– 12 –class29.pptDealing with Overhead & Cache Effects• Keep doubling number of times execute P() until reach some threshold–Used CMIN = 50000int cnt = 1;double cmeas = 0;double cycles;do {int c = cnt;P(); /* Warm up cache */(void) delta_cycles();while (c-- > 0)P();cmeas = delta_cycles();cycles = cmeas / cnt;cnt += cnt;} while (cmeas < CMIN); /* Make sure have enough */return cycles / (1e6 * MHZ);CS 213 S’00– 13 –class29.pptContext SwitchingContext switches can also affect cache performance• e.g., (foo1, foo2) cycles on an unloaded timing server:»71,002, 23,617»67,968, 23,384»68,840, 23,365»68,571, 23,492»69,911, 23,692Why Do Context Switches Matter?• Cycle counter only accumulates when running user process• Some amount of overhead• Caches polluted by OS and other user’s code & data–Cold misses as restart processMeasurement Strategy• Try to measure uninterrupted code executionCS 213 S’00– 14 –class29.pptDetecting Context SwitchesClock Interrupts• Processor clock causes interrupt every ∆t seconds–Typically ∆t = 10 ms–Same as interval timer resolution• Can detect by seeing if interval timer has advanced during measurementtime∆tstart = get_etime();/* Perform Measurement */. . .if (get_etime() - start > 0)/* Discard measurement */Measurement takes place withoutinterval timer advancingCS 213 S’00– 15 –class29.pptDetecting Context Switches (Cont.)External Interrupts• E.g., due to completion of disk operation• Occur at unpredictable times but generally
View Full Document