
4/22/09
Heike Jagode (jagode@eecs.utk.edu)
With slides from Karl Fuerlinger, Andreas Knuepfer, Shirley Moore, Sameer Shende, Felix Wolf, and others

As long as we don't know what "performance" is, it will produce confusion and frustration.
  o Simply getting a job done. Producing the results that you aimed at. Nothing else matters.
  o Basically solving a problem, themed "hack now, fix later".
Think carefully about the prevalent idea that only delivering results counts as acceptable performance. If you don't reach the objectives, maybe you have not performed well enough.
The design of software is NOT simply implementing it. In practice, however, it consists of designing the functional aspects first, then implementing, and finally trying to improve the low-quality product of this procedure.

Domenico Ferrari (1986): "The study of performance evaluation as an independent subject has sometimes caused researchers in the area to lose contact with reality."
Why is it that performance evaluation is by no means an integrated and natural part of software development?
  o The primary duty of software developers is to create functionally correct programs.
  o Performance evaluation tends to be optional. Some people compare it to the freestyle event in ice skating.
Raj Jain (1991): "Contrary to common belief, performance evaluation is an art. Like an artist, each analyst has a unique style. Given the same problem, two analysts may choose different performance metrics and evaluation methodologies."
... but even they need tools.

Performance analysis is important
Large investments in HPC systems:
  o Procurement costs: 40 Mio
  o Operational costs: 5 Mio per year
  o Electricity costs: 1 MW year, 1 Mio
Efficient usage is important because resources are expensive and limited.
Scalability is important to achieve the next bigger simulation.
Performance analysis: get the highest performance for a given cost.
Performance analyst: anyone who is associated with computer systems, i.e., system engineers, computer scientists, and, of course, users.

Performance optimization cycle
  o Have an optimization phase, just like the testing/debugging phase.
  o Do profiling and tracing.
  o Use tools! Avoid do-it-yourself-with-printf solutions, seriously.
[Figure: optimization cycle. Code development produces a functionally complete and correct program; instrumentation, measure, analyze, and modify/tune produce a complete, correct, and well-performing program; then usage/production.]

Profiling
  o Records aggregated information of performance metrics
  o Number of times a routine was invoked
  o Exclusive and inclusive time (or counts) spent executing it
  o Number of instrumented child routines invoked, etc.
  o Structure of invocations (call trees, call graphs)
  o Memory and message communication sizes
Tracing
  o When and where events took place along a global timeline
  o Time-stamped events (points of interest)
  o Message communication events (sends/receives) are tracked
  o Shows when and from/to where messages were sent
  o Event trace: collection of all events of a process/program, sorted by time

Profiling: recording of summary information during execution
  o inclusive and exclusive time, number of calls, hardware counter statistics
Reflects the performance behavior of program entities
  o functions, loops, basic blocks
  o user-defined "semantic" entities
Very good for low-cost performance assessment.
Helps to expose performance bottlenecks and hotspots.
Implemented through either
  o sampling: periodic OS interrupts or hardware counter traps
  o measurement: direct insertion of measurement code

Inclusive vs. exclusive time:

    int main()        /* takes 100 secs */
    {
        f1();         /* takes 20 secs */
        /* other work */
        f2();         /* takes 50 secs */
        f1();         /* takes 20 secs */
        /* other work */
    }

  o Inclusive time for main: 100 secs
  o Exclusive time for main: 100 - 20 - 50 - 20 = 10 secs
  o Exclusive time is sometimes called "self" time
  o Similar for other metrics, such as hardware performance counters, etc.

Tracing: recording of information about significant points (events) during program execution
  o entering/exiting a code region (function, loop, block, ...)
  o thread/process interactions (e.g., send/receive message)
Save information in an event record:
  o Timestamp
  o CPU identifier, thread identifier
  o Event type and event-specific information
Event trace: collection of all events of a process/program, sorted by time.
Can be used to reconstruct dynamic program behavior.
  o Profiles can be calculated from traces
Tracing disadvantages:
  o traces can become very large
  o instrumentation and tracing is complicated
  o event buffering, clock synchronization, ...

Typical event records:
Enter/leave of function/routine/region
  o time stamp, process/thread, function ID
Send/receive of P2P message (MPI)
  o time stamp, sender, receiver, length, tag, communicator
Collective communication (MPI)
  o time stamp, process, root, communicator, number of bytes
Hardware performance counter values
  o time stamp, process, counter ID, value
etc.

Example trace files:

    Definitions
    DEF TIMERRES 1000000000
    DEF PROCESS 1 Master
    DEF PROCESS 2 Slave
    DEF FUNCTION 5 main
    DEF FUNCTION 6 foo
    DEF FUNCTION 9 bar
    DEF FUNCTION 12 MPI_Send
    DEF FUNCTION 13 MPI_Recv

    Process 1                        Process 2
    10010 P 1 ENTER 5                10020 P 2 ENTER 5
    10090 P 1 ENTER 6                10095 P 2 ENTER 6
    10110 P 1 ENTER 12               10120 P 2 ENTER 13
    10110 P 1 SEND TO 3 LEN 1024     10300 P 2 RECV FROM 3 LEN 1024
    10330 P 1 LEAVE 12               10350 P 2 LEAVE 13
    10400 P 1 LEAVE 6                10450 P 2 LEAVE 6
    10520 P 1 ENTER 9                10620 P 2 ENTER 9
    10550 P 1 LEAVE 9                10650 P 2 LEAVE 9

Instrumentation and monitoring example:

    CPU A:                           CPU B:
    void master {                    void slave {
        trace(ENTER, 1);                 trace(ENTER, 2);
        trace(SEND, B);                  recv(A, tag, buf);
        send(B, tag, buf);               trace(RECV, A);
        trace(EXIT, 1);                  trace(EXIT, 2);
    }                                }

    Event definition: 1 master, 2 slave, 3 ...

    MONITOR output (timestamp, CPU, event):
    58 A ENTER 1
    60 B ENTER 2
    62 A SEND B
    64 A EXIT 1
    68 B RECV A
    69 B EXIT 2

[Figure: timeline visualization of the trace above. Rows for CPUs A and B, time axis from 58 to 70, regions main, master, and slave.]

Analysis: draw conclusions from measured performance data.
Manual analysis
  o Visualization
  o Interactive exploration
  o Statistical analysis
  o Modeling
  o Examples: TAU, Vampir Suite, Paraver, Intel Trace Collector/Analyzer
Automated analysis
  o Try to cope with huge amounts of performance data by automation
  o Examples: Paradyn, KOJAK, Scalasca

Reason for automation
Size of systems: several tens of thousands of processors
  o ORNL's Jaguar (Cray XT5): 150,152 cores (2009)
  o LLNL Sequoia (IBM, based on future Blue Gene architecture): 1.6 million cores (2011-2012)
  o Trend to multi-core
Large amounts of performance data when tracing
  o Several gigabytes or even terabytes
  o Overwhelms the user
Not all programmers are performance experts
  o Scientists want to focus on their domain
  o Need to keep up with new machines
Automation can solve some of these issues.

This is a situation that can be
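The slides state that profiles can be calculated from traces, and define exclusive time as inclusive time minus the time spent in child routines. The following is a minimal sketch of that calculation, not the method of any particular tool. It assumes a trace is a list of (timestamp, kind, function ID) tuples; the function name `profile_from_trace` and the tuple layout are hypothetical. The first data set is copied from the process 1 excerpt (the SEND record is left out because only ENTER/LEAVE pairs matter here); the timestamps in the second data set are invented so that they match the main/f1/f2 example.

```python
from collections import defaultdict

# ENTER/LEAVE records of process 1 from the trace excerpt in the slides.
TRACE_P1 = [
    (10010, "ENTER", 5),   # main (never left within the excerpt)
    (10090, "ENTER", 6),   # foo
    (10110, "ENTER", 12),  # MPI_Send
    (10330, "LEAVE", 12),
    (10400, "LEAVE", 6),
    (10520, "ENTER", 9),   # bar
    (10550, "LEAVE", 9),
]

# Hypothetical timestamps reproducing the main/f1/f2 example:
# main runs 100 secs and calls f1 (20), f2 (50), f1 (20).
TRACE_MAIN = [
    (0, "ENTER", "main"),
    (0, "ENTER", "f1"), (20, "LEAVE", "f1"),
    (25, "ENTER", "f2"), (75, "LEAVE", "f2"),
    (75, "ENTER", "f1"), (95, "LEAVE", "f1"),
    (100, "LEAVE", "main"),
]


def profile_from_trace(events):
    """Aggregate a time-sorted ENTER/LEAVE trace into a flat profile.

    Returns {function: {"calls", "inclusive", "exclusive"}}. Regions that
    are still open when the trace ends are not reported. Recursive calls
    would have their inclusive time counted once per nesting level.
    """
    stats = defaultdict(lambda: {"calls": 0, "inclusive": 0, "exclusive": 0})
    stack = []  # each entry: [function, enter timestamp, time in children]
    for timestamp, kind, func in events:
        if kind == "ENTER":
            stack.append([func, timestamp, 0])
        elif kind == "LEAVE":
            top_func, entered, child_time = stack.pop()
            if top_func != func:
                raise ValueError(f"unbalanced trace at timestamp {timestamp}")
            inclusive = timestamp - entered
            stats[func]["calls"] += 1
            stats[func]["inclusive"] += inclusive
            stats[func]["exclusive"] += inclusive - child_time
            if stack:
                # Charge this call's time to the caller's child time.
                stack[-1][2] += inclusive
    return dict(stats)


if __name__ == "__main__":
    for name, trace in (("process 1", TRACE_P1), ("main example", TRACE_MAIN)):
        print(name)
        for func, s in profile_from_trace(trace).items():
            print(f"  {func}: {s}")
```

For the main/f1/f2 data this yields an inclusive time of 100 and an exclusive time of 10 for main, in line with the arithmetic on the slide. The stack mirrors the call nesting, which is why a single pass over the time-sorted events is enough.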
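An event trace is described in the slides as the collection of all events of a program, sorted by time. As a rough illustration, the sketch below merges the per-CPU records of the master/slave example into one global timeline. It assumes that each local buffer is already sorted and that the clocks of both CPUs are synchronized, which the slides list as one of the hard parts of tracing. The tuple layout and the names `merge_traces` and `format_event` are hypothetical.

```python
import heapq

# Local event buffers (timestamp, CPU, event, argument), taken from the
# master/slave example in the slides.
LOCAL_A = [(58, "A", "ENTER", "1"), (62, "A", "SEND", "B"), (64, "A", "EXIT", "1")]
LOCAL_B = [(60, "B", "ENTER", "2"), (68, "B", "RECV", "A"), (69, "B", "EXIT", "2")]


def merge_traces(*local_traces):
    """Merge time-sorted local traces into one global, time-sorted trace."""
    return list(heapq.merge(*local_traces, key=lambda event: event[0]))


def format_event(event):
    """Render one record the way the slides print it, e.g. '58 A ENTER 1'."""
    timestamp, cpu, kind, arg = event
    return f"{timestamp} {cpu} {kind} {arg}"


if __name__ == "__main__":
    for event in merge_traces(LOCAL_A, LOCAL_B):
        print(format_event(event))
```

`heapq.merge` consumes the inputs lazily, so the same approach works when the local traces are too large to sort in memory as a whole, provided each one is sorted on its own.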



UTK CS 594 - Performance Analysis Tools
