Unformatted text preview:

111CMSC 212 – S06 (lect 25)AnnouncementsProgram #6 – is due in a week from ThursdayReading– Notes2CMSC 212 – S06 (lect 25)Improving Code PerformanceThis lecture is not about algorithms, but rather– how to convert algorithms into efficient code– how to refine code to make it run fasterHelpful to know a bit about – what a compiler can and can't do– what takes time on this specific hardwareKey Ideas– Be systematic and data driven in what you do• Easy to get into making random code changes– Programs spend their time in• loops (or recursion)– a sequence of code executed once is fast• doing I/O223CMSC 212 – S06 (lect 25)Know Your CompilerCompilers can be asked to "optimize" your code– -O flag (and -On) enables the optimizer– they are supposed to never break your code• only safe changes to the program are permittedLimits on compiler optimizations– should never alter correct programs• may discover latent bugs though– limited understanding of the program• you are much smarter than the best compiler– need to compile program quickly4CMSC 212 – S06 (lect 25)Understanding Modern ProcessorsPipelined– parts of multiple instructions running at once– decode instruction, load from memorySuperscalar processors– Can execute 2 or more instructions at onceBranch prediction– Computer guess which way a branch will go– Allows pipeline to stay fullCache memory– keep copies of recently accessed memory in fast storage• recently accessed memory is much fasterFloating point takes much longer than integer– typically 20-30x for the same operation335CMSC 212 – S06 (lect 25)Sample Compiler OptimizationOriginal Codeint i, j, k;….for (i=0; i < 200; i++) {a[2*j+i] = j * k + i;}Optimized Codeint i, j, k;….int temp1 temp2;temp1= 2 * j;temp2 = j * k;for (i=0; i < 200; i++) {a[temp1+i] = temp2 + i;}6CMSC 212 – S06 (lect 25)Typical Types of Compiler OptimizationsCode motion– previous exampleLoop Unrolling– often faster if a loop body is a bit bigger• modern computers can do several instructions at once– convert: for (i=0; i < n; i++) c[i] = a[i] + b[i]; – to: for (i=0; i < n; i+=2) { c[i] = a[i] + b[i]; c[i+1] = a[i+1] +b[i+1]; }Dead-Code Elimination– if compiler can infer code is never run, remove it447CMSC 212 – S06 (lect 25)Limits to Compiler OptimizationPointer aliasesvoid func1(int *xp, int *yp) {*xp += *yp;*xp += *yp;}– problem if xp and yp point to same locationFunction Side effects– given:• return f(x) + f(x) + f(x) + f(x);– can't simplify to • return 4 * f(x);Some transformations can only be made by programmer8CMSC 212 – S06 (lect 25)Common Code Changes -Loop Bound ChecksConsider for (i=0; i < strlen(data); i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}strlen is called each time, but doesn't change in loop– results in O(N2) performanceChange toint limit = strlen(data); for (i=0; i < limit; i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}Compiler could not figure this one out559CMSC 212 – S06 (lect 25)Common Code Changes -Use PointersRecallint limit = strlen(data); for (i=0; i < limit; i++) {if (data[i] > 'A' && data[i] < 'Z') data[i] -= ('A' - 'a');}With a function callint limit = strlen(data); for (i=0; i < limit; i++) {makelower(&data[i]);}Can eliminate strlen callfor (ptr = data; *ptr; ++ptr) {if (*ptr > 'A' && *ptr < 'Z') *ptr -= ('A' - 'a');}Can move one subtraction outsideint diff = ‘A’ –’a’;for (ptr = data; *ptr; ++ptr) {if (*ptr > 'A' && *ptr < 'Z') *ptr -= diff;}10CMSC 212 – S06 (lect 25)Common Code Transformation -Don't write small functionsFunctions calls take time– Recall last week's code to setup calls stacks1-2 line functions – take more time for the call than work– provide relatively little isolation of ideasSolution:– Define abstractions so work is larger than a few lines6611CMSC 212 – S06 (lect 25)Approach To Applying These TechniquesFor a big program– Hard to know where to start to optimizeStart with gprof data– concentrate in parts of the program that take the most timeAhmdals Law– if tuning a function that takes α share of the time– and make it k times faster– Tnew= ( 1 - α) Told+ (αTold)/k– Basically are limited by how big α


View Full Document

UMD CMSC 212 - Lecture Slides

Download Lecture Slides
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Slides and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Slides 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?