Berkeley COMPSCI C267 - Tools for Performance Debugging HPC Applications

Slide outline: Slide 1; Tools for Performance Debugging; One Slide about NERSC; Slide 4; Performance is more than a single number; Performance is Relative; Specific Facets of Performance; Performance is Hierarchical; Slide 9; Tools are Hierarchical; HPC Perf Tool Mechanisms; Typical Tool Use Requirements; Performance Tools @ NERSC; What can HPC tools tell us?; Using the right tool; Introduction to CrayPat; Using CrayPat @ Hopper; Guidelines for Optimization; Perf Debug and Production Tools; IPM: Let’s See; IPM: IPM_PROFILE=full; Advice: Develop (some) portable approaches to performance; Slide 23; Scaling: definitions; Conducting a scaling study; Slide 26; The scalability landscape; Not always so tricky; Load Imbalance: Pitfall 101; Load Balance: cartoon; Too much communication; Simple Stuff: What’s wrong here?; Not so simple: Comm. topology; Performance in Batch Queue Space; A few notes on queue optimization; Marshalling your own workflow; Thanks!

Tools for Performance Debugging HPC Applications
David ([email protected])

Tools for Performance Debugging
•Practice
  –Where to find tools
  –Specifics to NERSC and Hopper
•Principles
  –Topics in performance scalability
  –Examples of areas where tools can help
•Scope & Audience
  –Budding simulation scientist / app dev
  –Compiler/middleware dev, YMMV

One Slide about NERSC
•Serving all of DOE Office of Science
  –domain breadth
  –range of scales
•Lots of users
  –~4K active
  –~500 logged in
  –~300 projects
•Science driven
  –sustained performance
•Architecture aware
  –procurements driven by workload needs

Big Picture of Performance and Scalability

Performance is more than a single number
•Plan where to put effort
•Optimization in one area can de-optimize another
•Timings come from timers, and also from your calendar: time spent coding counts
•Sometimes a slower algorithm is simpler to verify for correctness

Performance is Relative
•To your goals
  –Time to solution, Tq+Twall, …
  –Your research agenda
  –Efficient use of allocation
•To the
  –application code
  –input deck
  –machine type/state
Suggestion: Focus on specific use cases, as opposed to making everything perform well. Bottlenecks can shift.

Specific Facets of Performance
•Serial
  –Leverage ILP on the processor
  –Feed the pipelines
  –Reuse data in cache
  –Exploit data locality
•Parallel
  –Expose task-level concurrency
  –Minimize latency effects
  –Maximize work vs. communication

Performance is Hierarchical
instructions & operands → lines → pages → messages → blocks, files

…on to specifics about HPC tools
Mostly at NERSC, but fairly general

Tools are Hierarchical
PAPI, valgrind, CrayPat, IPM, TAU, SAR, PMPI

HPC Perf Tool Mechanisms
•Sampling
  –Regularly interrupt the program and record where it is
  –Build up a statistical profile
•Tracing / Instrumenting
  –Insert hooks into the program to record and time events
•Hardware event counters (see the sketch below)
  –Special registers count events on the processor
  –E.g. floating-point instructions
  –Many possible events
  –Only a few counters (~4)
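To make the hardware-counter mechanism concrete, here is a minimal C sketch that reads PAPI preset counters around a compute region, the same low-level mechanism CrayPat builds on. It is illustrative only and not taken from the slides: the kernel() routine is a stand-in, and which presets (here PAPI_FP_OPS and PAPI_L1_DCM) are available depends on the processor.

    /* Minimal PAPI hardware-counter sketch (illustrative; not from the slides). */
    #include <stdio.h>
    #include <papi.h>

    /* Stand-in for the region of interest in a real application. */
    static void kernel(double *a, int n) {
        for (int i = 1; i < n; i++)
            a[i] += 0.5 * a[i - 1];
    }

    int main(void) {
        int events[2] = { PAPI_FP_OPS, PAPI_L1_DCM };   /* flops, L1 data-cache misses */
        long long counts[2];
        static double a[100000] = { 1.0 };

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            return 1;                                   /* PAPI unavailable */

        int set = PAPI_NULL;
        PAPI_create_eventset(&set);
        PAPI_add_events(set, events, 2);

        PAPI_start(set);                                /* count only around the kernel */
        kernel(a, 100000);
        PAPI_stop(set, counts);

        printf("FP ops: %lld   L1 D-cache misses: %lld\n", counts[0], counts[1]);
        return 0;
    }

Build with something like cc papi_demo.c -lpapi (the exact compiler wrapper and module depend on the system). The point is simply that counters are started and stopped around the code you care about, which is what the tools below automate for you.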
Typical Tool Use Requirements
•(Sometimes) Modify your code with macros, API calls, timers
•Compile your code
•Transform your binary for profiling/tracing with a tool
•Run the transformed binary
  –A data file is produced
•Interpret the results with a tool

Performance Tools @ NERSC
•Vendor tools:
  –CrayPat
•Community tools:
  –TAU (U. Oregon via ACTS)
  –PAPI (Performance Application Programming Interface)
  –gprof
•IPM: Integrated Performance Monitoring

What can HPC tools tell us?
•CPU and memory usage
  –FLOP rate
  –Memory high-water mark
•OpenMP
  –OMP overhead
  –OMP scalability (finding the right # of threads)
•MPI
  –% of wall time in communication
  –Detecting load imbalance
  –Analyzing message sizes

Using the right tool
•Tools can add overhead to code execution
  –What level can you tolerate?
•Tools can add overhead to scientists
  –What level can you tolerate?
•Scenarios:
  –Debugging a code that is “slow”
  –Detailed performance debugging
  –Performance monitoring in production

Introduction to CrayPat
•Suite of tools that provides a wide range of performance-related information
•Can be used for both sampling and tracing user codes
  –with or without hardware or network performance counters
  –Built on PAPI
•Supports Fortran, C, C++, UPC, MPI, Coarray Fortran, OpenMP, Pthreads, SHMEM
•Man pages
  –intro_craypat(1), intro_app2(1), intro_papi(1)

Using CrayPat @ Hopper
1. Access the tools
   –module load perftools
2. Build your application; keep the .o files
   –make clean
   –make
3. Instrument the application
   –pat_build ... a.out
   –Result is a new file, a.out+pat
4. Run the instrumented application to find the top time-consuming routines
   –aprun ... a.out+pat
   –Result is a new file XXXXX.xf (or a directory containing .xf files)
5. Run pat_report on that new file; view the results
   –pat_report XXXXX.xf > my_profile
   –vi my_profile
   –Result is also a new file: XXXXX.ap2

Guidelines for Optimization
Derived metric                       Optimization needed when*   PAT_RT_HWPC
Computational intensity              < 0.5 ops/ref               0, 1
L1 cache hit ratio                   < 90%                       0, 1, 2
L1 cache utilization (misses)        < 1 avg hit                 0, 1, 2
L1+L2 cache hit ratio                < 92%                       2
L1+L2 cache utilization (misses)     < 1 avg hit                 2
TLB utilization                      < 0.9 avg use               1
(FP Multiply / FP Ops) or
  (FP Add / FP Ops)                  < 25%                       5
Vectorization                        < 1.5 for dp; 3 for sp      12 (13, 14)
* Suggested by Cray

Perf Debug and Production Tools
•Integrated Performance Monitoring
•MPI profiling, hardware counter metrics, POSIX I/O profiling
•IPM requires no code modification and no instrumented binary
  –Only a “module load ipm” before running your program, on systems that support dynamic libraries
  –Else link with the IPM library
•IPM uses hooks already in the MPI library to intercept your MPI calls and wrap them with timers and counters

IPM: Let’s See
1) Do “module load ipm”, link with $IPM, then run normally
2) Upon completion you get the summary below

Maybe that’s enough. If so, you’re done. Have a nice day.

##IPM2v0.xx####################################################
# command   : ./fish -n 10000
# start     : Tue Feb 08 11:05:21 2011   host      : nid06027
# stop      : Tue Feb 08 11:08:19 2011   wallclock : 177.71
# mpi_tasks : 25 on 2 nodes              %comm     : 1.62
# mem [GB]  : 0.24                       gflop/sec : 5.06
…

IPM: IPM_PROFILE=full
# host   : s05601/006035314C00_AIX       mpi_tasks : 32 on 2 nodes
# start  : 11/30/04/14:35:34             wallclock : 29.975184 sec
# stop   : 11/30/04/14:36:00             %comm     : 27.72
# gbytes : 6.65863e-01 total             gflop/sec : 2.33478e+00 total
# [total]        <avg>
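As a closing illustration of how a %comm figure like the ones above can be gathered without touching the application, here is a minimal sketch of the MPI profiling (PMPI) interface that IPM-style tools rely on. It is not IPM’s actual source, just the idea: the tool library redefines an MPI routine, times it, and forwards to the real implementation through its PMPI_ alias.

    /* Sketch of a PMPI wrapper (illustrative; not IPM's real code). */
    #include <stdio.h>
    #include <mpi.h>

    static double barrier_seconds = 0.0;   /* time spent inside MPI_Barrier */
    static long   barrier_calls   = 0;

    /* The application still calls MPI_Barrier; this definition intercepts it. */
    int MPI_Barrier(MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Barrier(comm);       /* forward to the real MPI_Barrier */
        barrier_seconds += MPI_Wtime() - t0;
        barrier_calls   += 1;
        return rc;
    }

    /* Report per-rank totals when the program shuts MPI down. */
    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "# rank %d : MPI_Barrier calls %ld, time %.3f s\n",
                rank, barrier_calls, barrier_seconds);
        return PMPI_Finalize();
    }

A real tool wraps every MPI entry point this way and adds hardware counters and I/O hooks; linking (or pre-loading) such a library ahead of the MPI library is why IPM needs neither source changes nor a re-instrumented binary.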

