Debugging parallel programs

Breakpoint debugging

Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method you specify locations in your program (breakpoints) where execution will stop, giving you the opportunity to examine the program's state. You can then either let the program execute one or more instructions at a time, or allow it to continue until another breakpoint, and examine the state again.

Breakpoint debugging works very well for serial programs that do not interact with any other dynamic entities (other programs or real-world devices). However, programs in the parallel and real-time domains may have their behavior and results altered if interrupted by a debugger. Events may go undetected, message queues may overflow, and moving parts may fail to stop in time, causing real-world damage to machines or people.

One solution is to instrument the code, but the most frequently used way to do this is to insert print statements by hand, which has numerous disadvantages and limited power.

A tool to instrument a program at runtime would need many of the capabilities of a debugger, and indeed a typical debugger has most of the capabilities both to perform the instrumenting and to help analyze the resulting trace data. A debugger could easily plant tracing instrumentation in the executing program, and just as easily could display the values of program data and arbitrary expressions collected, together with the associated source code, and it could do it all interactively.

The Cygnus approach uses the popular GNU debugger (GDB) both to set up and to analyze trace experiments. In a trace experiment, the user specifies program locations to trace and what data to collect at each one, using the full power of the source language's symbolic expressions. A simplified, non-symbolic description of the trace experiment is downloaded to a separate trace collection program. Then the program is run while the specially written trace collection program collects the data. Finally, GDB is used again to review the traced events, stepping from one tracepoint execution to the next and displaying the recorded data values just as if debugging the program in real time; alternatively, GDB's scripting language is used to produce a report of the collected data, formatted to the user's specification.

From http://www.redhat.com/support/wpapers/cygnus/ (cygnus heinsenberg)

TotalView

Most of the time, MPI programs are debugged using print statements. The most popular breakpoint debugger is TotalView.

What is TotalView?

- TotalView is a sophisticated software debugger product of Etnus LLC.
- Used for debugging, analyzing, and tuning program performance.
- Especially designed for use with complex, multi-process and/or multi-threaded applications.
- Has been selected as the Department of Energy's Advanced Simulation and Computing (ASC) program's debugger.

Key Features of TotalView

- Provides source-level and assembler-level debugging for serial, parallel, multi-process, and multi-threaded codes.
- Portable: able to be used in a variety of UNIX environments, including those with distributed, clustered, uniprocessor, and SMP machines.
- Supports most popular parallel programming models/libraries, such as MPI, OpenMP, Threads, PVM, SHMEM, and hybrid.
- Provides all debugging facilities through an easy to learn and use X Windows-based Graphical User Interface.
- Also provides a command line interpreter for non-GUI debugging.
- Can be used to debug a specified program, an unattached running process, or a core file.
- On a per process/thread basis, permits you to view:
    - Source code, assembler code, or both
    - Source for called functions
    - The execution stack trace (procedure calling stack)
    - Stack variables and registers
    - Program data (variables, arrays)
    - MPI message queues
- Provides for the insertion and execution of code fragments within the current process context.
- Provides several types of action points, as well as the ability to set, delete, suppress, unsuppress, and save them:
    - process breakpoint on a source line basis
    - multi-process barrier (blocking breakpoint for parallel processes)
    - conditional breakpoint, where the breakpoint occurs only if a code fragment expression is satisfied
    - evaluation points, where code fragments are evaluated
- Allows you to easily modify program data (addresses, arrays, array slices, variables) while debugging.
- Provides special features for memory-related debugging.
- Provides graphical visualization of array data during a debugging session.
- Includes an extensive web browser-based online help system.

Detecting races with trace analysis

The objective of trace analysis techniques is to identify races in parallel programs. The strategy consists of:

- Conceptually executing the program, generating a trace of all memory accesses and synchronization operations.
- Building a graph of orderings (solid arrows) and conflicting memory references (dashed lines).
- Detecting races when two nodes connected by dashed lines are not ordered by solid arrows.

Example: Intel Thread Checker

Doacross synchronization

Replay

Races are possible in MPI programs. For debugging, we want to keep a history of events so that every time we run the program during debugging we get the same behavior.

See: R. H. B. Netzer and B. P. Miller, "Optimal tracing and replay for debugging message-passing parallel programs," Proceedings of the 1992 ACM/IEEE Conference on Supercomputing, Minneapolis, Minnesota, United States, pages 502-511, 1992. ISBN 0-8186-2630-5.
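
The trace-analysis strategy for detecting races described above can be illustrated with a small sketch. It is not how Intel Thread Checker or any other tool is implemented. It only assumes a hypothetical trace format of (thread, operation, target) tuples, with "read"/"write" for memory accesses and "post"/"wait" for synchronization (as in doacross-style loops). The function name find_races and the event names are illustrative assumptions.

```python
from collections import defaultdict


def find_races(trace):
    """Return index pairs (i, j) of conflicting accesses that are unordered.

    trace: list of (thread, op, target) tuples in observed order
    (hypothetical format; op is "read", "write", "post", or "wait").
    """
    succ = defaultdict(set)    # ordering edges (the "solid arrows")
    last_in_thread = {}        # index of the latest event seen per thread
    posts = defaultdict(list)  # sync object -> indices of earlier posts

    for i, (thread, op, target) in enumerate(trace):
        # Program order: each event follows the previous one in its thread.
        if thread in last_in_thread:
            succ[last_in_thread[thread]].add(i)
        last_in_thread[thread] = i
        # Synchronization order: a wait follows every earlier post
        # on the same sync object (a simplifying assumption).
        if op == "post":
            posts[target].append(i)
        elif op == "wait":
            for p in posts[target]:
                succ[p].add(i)

    def ordered(a, b):
        # True if b is reachable from a along ordering edges.
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return False

    accesses = [(i, e) for i, e in enumerate(trace) if e[1] in ("read", "write")]
    races = []
    for x, (i, (t1, op1, v1)) in enumerate(accesses):
        for j, (t2, op2, v2) in accesses[x + 1:]:
            # Conflict (a "dashed line"): same location, different threads,
            # at least one write.
            conflict = v1 == v2 and t1 != t2 and "write" in (op1, op2)
            # All edges point forward in the trace, so only i -> j is checked.
            if conflict and not ordered(i, j):
                races.append((i, j))
    return races


trace = [
    ("T0", "write", "a"),  # 0
    ("T0", "post", "f"),   # 1
    ("T1", "wait", "f"),   # 2
    ("T1", "read", "a"),   # 3: ordered after event 0 through post/wait
    ("T1", "write", "b"),  # 4
    ("T0", "read", "b"),   # 5: no ordering path from event 4
]
print(find_races(trace))
```

In this example the accesses to "a" are ordered by the post/wait pair, while the accesses to "b" are not, so only the pair (4, 5) is reported. The pairwise reachability check is quadratic and chosen for readability; a practical tool would typically use a more efficient ordering representation.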