Unformatted text preview:

November 14, 2007 ISA's, Compilers, and Assembly 1CS232 roadmapIn the first 3 quarters of the class, we have covered1. Understanding the relationship between HLL and assembly code2. Processor design, pipelining, and performance3. Memory systems, caches, virtual memory, I/O, and ECCThe next major topic is: performance tuning How can I, as a programmer, make my programs run fast? The first step is figuring out where/why the program is slow?— Program profiling How does one go about optimizing a program?— Use better algorithms (do this first!)— Exploit the processor better (3 ways)1. Write hand-tuned assembly versions of hot spots2. Getting more done with every instruction3. Using more than one processorNovember 14, 2007 ISA's, Compilers, and Assembly 2Performance Optimization Until you are an expert, first write a working version of the program Then, and only then, begin tuning, first collecting data, and iterate— Otherwise, you will likely optimize what doesn’t matter“We should forget about small efficiencies, say about 97% of the time:premature optimization is the root of all evil.” -- Sir Tony HoareNovember 14, 2007 ISA's, Compilers, and Assembly 3Building a benchmark You need something to gauge your progress.— Should be representative of how the program will be usedNovember 14, 2007 ISA's, Compilers, and Assembly 4Instrumenting your program We can do this by hand. Consider: test.c --> test2.c— Let’s us know where the program is spending its time.— But implementing it is tedious; consider instrumenting 130k lines ofcodeNovember 14, 2007 ISA's, Compilers, and Assembly 5Using tools to do instrumentation Two GNU tools integrated into the GCC C compiler Gprof: The GNU profiler— Compile with the -pg flag• This flag causes gcc to keep track of which pieces of source codecorrespond to which chunks of object code and links in a profilingsignal handler.— Run as normal; program requests the operating system to periodicallysend it signals; the signal handler records what instruction wasexecuting when the signal was received in a file called gmon.out— Display results using gprof command• Shows how much time is being spent in each function.• Shows the calling context (the path of function calls) to the hotspot.November 14, 2007 ISA's, Compilers, and Assembly 6Example gprof outputEach sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 81.89 4.16 4.16 37913758 0.00 0.00 cache_access 16.14 4.98 0.82 1 0.82 5.08 sim_main 1.38 5.05 0.07 6254582 0.00 0.00 update_way_list 0.59 5.08 0.03 1428644 0.00 0.00 dl1_access_fn 0.00 5.08 0.00 711226 0.00 0.00 dl2_access_fn 0.00 5.08 0.00 256830 0.00 0.00 yylexOver 80% of time spent in one functionindex % time self children called name 0.82 4.26 1/1 main [2][1] 100.0 0.82 4.26 1 sim_main [1] 4.18 0.07 36418454/36484188 cache_access <cycle 1> [4] 0.00 0.01 10/10 sys_syscall [9] 0.00 0.00 2935/2967 mem_translate [16] 0.00 0.00 2794/2824 mem_newpage [18]Provides calling context (main calls sim_main calls cache_access) of hot spotNovember 14, 2007 ISA's, Compilers, and Assembly 7Using tools for instrumentation (cont.) Gprof didn’t give us information on where in the function we werespending time. (cache_access is a big function; still needle inhaystack) Gcov: the GNU coverage tool— Compile/link with the -fprofile-arcs -ftest-coverage options• Adds code during compilation to add counters to every controlflow edge (much like our by hand instrumentation) to computehow frequently each block of code gets executed.— Run as normal— For each xyz.c file an xyz.gdna and xyz.gcno file are generated— Post-process with gcov xyz.c• Computes execution frequency of each line of code• Marks with ##### any lines not executedUseful for making sure that you tested your whole programNovember 14, 2007 ISA's, Compilers, and Assembly 8Example gcov output 14282656: 540: if (cp->hsize) { #####: 541: int hindex = CACHE_HASH(cp, tag); -: 542: #####: 543: for (blk=cp->sets[set].hash[hindex]; -: 544: blk; -: 545: blk=blk->hash_next) -: 546: { #####: 547: if (blk->tag == tag && (blk->status & CACHE_BLK_VALID)) #####: 548: goto cache_hit; -: 549: } -: 550: } else { -: 551: /* linear search the way list */753030193: 552: for (blk=cp->sets[set].way_head; -: 553: blk; -: 554: blk=blk->way_next) {751950759: 555: if (blk->tag == tag && (blk->status & CACHE_BLK_VALID))738747537: 556: goto cache_hit; -: 557: } -: 558: }Loop executed over 50 interations on average (751950759/14282656)Code never executedNovember 14, 2007 ISA's, Compilers, and Assembly 9Conclusion The second step to making a fast program is finding out why it is slow— The first step is making a working program— Your intuition where it is slow is probably wrong• So don’t guess, collect data! Many tools already exist for automatically instrumenting your code— Identify the “hot spots” in your code where time is being spent— Two example tools:• Gprof: periodically interrupts program• Gcov: inserts counters into code— We’ll see Vtune in section, which explains why the code is slow If you’ve never tuned your program, there is probably “low hanging fruit”— Most of the time is spent in one or two functions— Try using better algorithms to speed these up (CS473)November 14, 2007 ISA's, Compilers, and Assembly 10Key Idea: Iterative Refinement1. Build simplest possible implementation2. Does it meet criteria? If so, stop.Else, what can be improved?3. Generate ideas on how to improve it4. Select best ideas, based on benefit/cost5. Modify implementation based on best ideas6. Goto step 2.It is very tempting to go straight to an “optimized” solution. Pitfalls:1. You never get anything working2. Incomplete problem knowledge leads to selection of wrong optimizationsWith


View Full Document

U of I CS 232 - Lecture notes

Documents in this Course
Goal

Goal

2 pages

Exam 1

Exam 1

5 pages

Exam 1

Exam 1

6 pages

Exam 2

Exam 2

6 pages

Exam 1

Exam 1

5 pages

Load more
Download Lecture notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?