111CMSC 212 – S06 (lect 25)AnnouncementsProgram #6 – is due in a week from ThursdayReading– Notes2CMSC 212 – S06 (lect 25)Improving Code PerformanceThis lecture is not about algorithms, but rather– how to convert algorithms into efficient code– how to refine code to make it run fasterHelpful to know a bit about – what a compiler can and can't do– what takes time on this specific hardwareKey Ideas– Be systematic and data driven in what you do• Easy to get into making random code changes– Programs spend their time in• loops (or recursion)– a sequence of code executed once is fast• doing I/O223CMSC 212 – S06 (lect 25)Know Your CompilerCompilers can be asked to "optimize" your code– -O flag (and -On) enables the optimizer– they are supposed to never break your code• only safe changes to the program are permittedLimits on compiler optimizations– should never alter correct programs• may discover latent bugs though– limited understanding of the program• you are much smarter than the best compiler– need to compile program quickly4CMSC 212 – S06 (lect 25)Understanding Modern ProcessorsPipelined– parts of multiple instructions running at once– decode instruction, load from memorySuperscalar processors– Can execute 2 or more instructions at onceBranch prediction– Computer guess which way a branch will go– Allows pipeline to stay fullCache memory– keep copies of recently accessed memory in fast storage• recently accessed memory is much fasterFloating point takes much longer than integer– typically 20-30x for the same operation335CMSC 212 – S06 (lect 25)Sample Compiler OptimizationOriginal Codeint i, j, k;….for (i=0; i < 200; i++) {a[2*j+i] = j * k + i;}Optimized Codeint i, j, k;….int temp1 temp2;temp1= 2 * j;temp2 = j * k;for (i=0; i < 200; i++) {a[temp1+i] = temp2 + i;}6CMSC 212 – S06 (lect 25)Typical Types of Compiler OptimizationsCode motion– previous exampleLoop Unrolling– often faster if a loop body is a bit bigger• modern computers can do several instructions at once– convert: for (i=0; i < n; i++) c[i] = a[i] + b[i]; – to: for (i=0; i < n; i+=2) { c[i] = a[i] + b[i]; c[i+1] = a[i+1] +b[i+1]; }Dead-Code Elimination– if compiler can infer code is never run, remove it447CMSC 212 – S06 (lect 25)Limits to Compiler OptimizationPointer aliasesvoid func1(int *xp, int *yp) {*xp += *yp;*xp += *yp;}– problem if xp and yp point to same locationFunction Side effects– given:• return f(x) + f(x) + f(x) + f(x);– can't simplify to • return 4 * f(x);Some transformations can only be made by programmer8CMSC 212 – S06 (lect 25)Common Code Changes -Loop Bound ChecksConsider for (i=0; i < strlen(data); i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}strlen is called each time, but doesn't change in loop– results in O(N2) performanceChange toint limit = strlen(data); for (i=0; i < limit; i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}Compiler could not figure this one out559CMSC 212 – S06 (lect 25)Common Code Changes -Use PointersRecallint limit = strlen(data); for (i=0; i < limit; i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}With a function callint limit = strlen(data); for (i=0; i < limit; i++) {makelower(&data[i]);}Can eliminate strlen callfor (ptr = data; *ptr; ++ptr) {if (*ptr > 'A' && *ptr < 'Z') *ptr -= ('A' - 'a');}Can move one subtraction outsideint diff = ‘A’ –’a’;for (ptr = data; *ptr; ++ptr) {if (*ptr > 'A' && *ptr < 'Z') *ptr -= diff;}10CMSC 212 – S06 (lect 25)Common Code Transformation -Don't write small functionsFunctions calls take time– Recall last week's code to setup calls stacks1-2 line functions – take more time for the call than work– provide relatively little isolation of ideasSolution:– Define abstractions so work is larger than a few lines6611CMSC 212 – S06 (lect 25)Approach To Applying These TechniquesFor a big program– Hard to know where to start to optimizeStart with gprof data– concentrate in parts of the program that take the most timeAhmdals Law– if tuning a function that takes α share of the time– and make it k times faster– Tnew= ( 1 - α) Told+ (αTold)/k– Basically are limited by how big α
View Full Document