UMD CMSC 714 - Performance Debugging Shared Memory Multiprocessor Programs with MTOOL

A. Goldberg, J. Hennessy
Presented by Sam Angiuoli

What is MTOOL?
- Performance profiler
  - Shared memory bottlenecks, synchronization overhead, parallelization overhead
- At least 2 profiled executions required
- Supported platforms
  - MIPS based architectures (+ others?)
  - SGI 380 (8x33 MHz processors and 256M shared mem)
  - C + ANL macros
  - Fortran with loop level parallelism

Overview of paper
- Instrumentation
  - Timers
  - Basic block counters
  - Efforts to minimize instrumentation overhead
- Description of memory/synchronization bottlenecks
- 2 case studies

Timers
- start_timer/stop_timer added to begin/end of procedures
- Bloat is minimized by scanning initial execution profile to exclude fast/frequently executed regions
  - Minimum of 5x the overhead of start/stop timer
- Alternative to timers is pc-sampling

Basic block
- A sequence of one or more consecutive, executable statements containing no branches

[Figure: control-flow graph of the loop below. Nodes: i=0; i<10; F(i) != 0; x=1/F(i); x=0; increment of i; return. Edges labeled T/F.]

    for (i = 0; i < 10; i++) {
        if (f(i) != 0)
            x = 1 / f(i);
        else
            x = 0;
    }

Minimum Cost Basic Block Counting
- Minimize overhead while collecting block counts during program execution
- Only place counters on independent control paths
  - Derive dependent counts during post processing
  - Eg: Don't count both blocks of if/then/else
- Use loop counters to avoid counting each iteration

Basic block counting
- Capture block counts during initial execution
  - Counting cost 379
- Eliminate edges on maximal path {(a,b),(b,d),(e,b),(a,f)}
  - Counting cost 125
- Examine loop variables {(a,b),(e,f)}
  - Counting cost 4

Memory bottlenecks
- Identify bottlenecks by comparing actual execution time to an estimated execution time that assumes optimal memory access
- Use initial profile run to select target regions
  - Contain large amount of global memory access
  - Low timer overhead
  - Reasonable number of lines of code

Estimating optimal memory
- Estimated compute time for basic block * basic block count
- RISC architecture allows for estimation of compute time except in
  - Data dependent stalls
  - Memory accesses
  - Stalls between instructions

Synchronization bottlenecks
- Overhead is any time spent idle/spin-waiting
- Low perturbation timers used
- Bottlenecks examined
  - Load imbalance
  - Waiting at barrier
  - Critical sections
  - Lock contention
  - Starvation
  - Sequential executions in master process
- User defined locks are ignored but can be specified in a config file

Case study 1
- Significant memory bottleneck
- Suspect subroutine contains pointer swap that is replaced with a copy to take advantage of cache
  → 50% decrease in memory overhead

Case study 2
- Shared vector (Ready) used to synchronize processes exchanging computed values
- Non-linear speedup indicates a bottleneck
- MTOOL displays code block responsible for the bottleneck
- UI allows for reclassification of user spin-wait as synchronization overhead
- Code indicates that numerous global memory references may be saturating the shared bus and causing the bottleneck

Summary
- MTOOL profiling can identify memory and synchronization bottlenecks on a shared memory architecture with as few as 2 program executions
- MTOOL timer and basic block count instrumentations minimize overhead and program
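
Example: deriving dependent block counts

The counter-placement idea from the "Minimum Cost Basic Block Counting" slide can be sketched in C using the slide's own loop. This is an illustrative sketch only, not MTOOL's actual instrumentation: f, struct block_counts, and run_instrumented are hypothetical names introduced here. Only one arm of the if/else carries a counter; the loop-body count is taken from the trip count, and the other arm is derived afterwards.

```c
#include <assert.h>

/* Hypothetical stand-in for the slide's f(i): returns 0 for multiples of 3. */
static int f(int i) { return i % 3; }

/* Execution counts for the blocks of the example loop (illustrative names). */
struct block_counts {
    long loop_body;  /* taken from the trip count, not incremented per iteration */
    long then_block; /* the only counter incremented inside the loop */
    long else_block; /* derived in post-processing */
};

struct block_counts run_instrumented(int n)
{
    struct block_counts c = {0, 0, 0};
    int x = 0;

    for (int i = 0; i < n; i++) {
        if (f(i) != 0) {
            c.then_block++;   /* counter on one arm of the if/else only */
            x = 1 / f(i);
        } else {
            x = 0;            /* no counter on this arm */
        }
    }
    (void)x;

    /* "Loop counter": the trip count follows from the loop bound. */
    c.loop_body = (n > 0) ? n : 0;
    /* Dependent count: every iteration takes exactly one arm. */
    c.else_block = c.loop_body - c.then_block;
    assert(c.else_block >= 0);
    return c;
}
```

With this stand-in f, run_instrumented(10) reports 10 loop-body, 6 then-block, and 4 else-block executions while incrementing a single counter 6 times.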
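
Example: estimating memory overhead from block counts

The comparison described on the "Memory bottlenecks" and "Estimating optimal memory" slides can be sketched as follows. The slides give only "estimated compute time for basic block * basic block count", so everything else here is an assumption: the struct and function names are hypothetical, times are expressed in cycles, and a negative difference is clamped to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Profile data for one basic block (field names are illustrative). */
struct block_profile {
    double ideal_cycles; /* estimated cycles per execution, assuming no memory stalls */
    long   count;        /* executions, from the basic block counters */
};

/* Ideal compute time in cycles: sum of (per-block estimate * block count). */
double ideal_compute_cycles(const struct block_profile *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += blocks[i].ideal_cycles * (double)blocks[i].count;
    return total;
}

/* Memory overhead: measured cycles minus the ideal estimate.
 * Clamping at 0 is a choice made for this sketch. */
double memory_overhead_cycles(double measured_cycles,
                              const struct block_profile *blocks, size_t n)
{
    double overhead = measured_cycles - ideal_compute_cycles(blocks, n);
    return overhead > 0.0 ? overhead : 0.0;
}
```

For a region with two blocks estimated at 4 and 10 cycles and executed 100 and 25 times, the ideal estimate is 650 cycles; a measured time of 1000 cycles would then attribute 350 cycles to the memory system.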

