DOC PREVIEW
UW-Madison CS 739 - Partial Orders for Parallel Debugging

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Partial Orders for Parallel Debugging C. J. Fidge Department of Computer Science Australian National University Canberra, ACT, Australia ABSTRACT Parallel programs differ from sequential programs primarily in that the temporal rela- tionsbips between events are ouly partially defined. However, for a given distributed computation, debugging utilities typically linearize the observed set of events into a total ordering, thus losing information and allowing potentially capturable temporal errors to escape detection. We explore use of the partially ordered relation “happened before” to augment both centralized and distributed parallel debuggers to ensure that such errors are always detected and that the results produced by the debugger are unaffected by the non-determinism inherent in the partial ordering. This greatly reduces the number of tests required during debugging. Assertions are based on time intervals, rather than treating events as dimensionless points. 1. INTRODUCTION This paper presents techniques for reliably detecting temporal errors in a distributed program, where a “temporal error” is defined to be a violation of the intended partial ordering of events. This is proposed as a supplement to existing debugging utilities as a way of increasing their consistency in the face of non-determinism, and thus reduce the number of tests required for a programmer to be satisfied that a program is correct. As a compromise between efficieucy and usefulness the approach is developed methodically from first principles by starting with the simplest possible temporal relationship, “happened before”, for atomic, dimensionless eve&s, and extended to finite time intervals. The presentation is motivated by two styles of assertion, one event and one state-based. Two different implementations are considered. Firstly, a centralized system is discussed, based on existing debugging systems. A more attractive distributed approach is then presented which attempts to minimize the impact of the debugging code on the program under test. It is assumed that message-passing is the only form of inter-process communication. Both syn- chronous and asynchronous communication are discussed, although this work was originally motivated by consideration of synchronous systems. Note that we are only concerned with relative event orderings; no consideration is made of the issues associated with real-time or “performance” debugging. 2. MOTIVATION Faced with a black-box parallel program and a set of assertions specifying the intended behaviour of the program, a natural approach to debugging is to insert an observer (or monitor) process which receives reports from the program under test and checks that the assertions hold [2]. The observer may or may not delay the program while checkiug the assertions. This philosophy also underlies those systems which display significant events on a graphics termiua.1 (where the onus is on the user to ensure that the program is seen to behave correctly) or those in which a linear truce is generated for post-mortem analysis [7]. This work was motivated by the observation that any such system implicitly totally orders the events of interest. This has a number of associat.ed problems. Firstly it is impossible for an observer process to distinguish between the case where the temporal ordering between two events is, and is not, enforced by a causal relationship: 183Indeed it is even possible for an observer in an asynchronous system to perceive an incorrect total ordering due to “overtaking”, even if FIFO queueiug is guaranteed for each inter-process channel: event a i assume b precedes a (Admittedly this can not occur in the synchronous case, or if an asynchronous observer blocks the observed processes by the use of “ack” signals.) Additionally, the debugger itself may be non-deterministically affected by the (global time) inter- leaving of events. For exactly the same computation, with the same inputs, following the same control paths, and generating the same results, the debugger may produce different results due to differences in interleaving, and delays in reporting events. For example, assuming synchronous communication: b event a event a This forces the user to execute the same test again and again to ensure that the debugger has not missed a temporal error simply due to a fortuitous interleaving of events. Experience in this area has shown that it is necessary to run a particular test several times with different process priorities in the hopes of forcing an error to manifest itself [7]. It is important to note that this situation exists even with the use of reproducibility tools that do not totally order event traces (section 3.1). Also, the perceived ordering differs for each process in the system. This problem is inherent in any attempt to add avzriliary communications to a parallel system to distribute information about past events. As a degenerate example, two processes simply informing each other of events will inevitably infer different temporal orderings in the absence of any additional information: assume a precedes b assume b precedes a eventa f 1 event” Finally, the observer itself may alter the behaviour of the system being studied [9]. This so-called probe eflect [6] is particularly noticeable when t,he observer blocks processes. The stand taken here is that the interleaving of concurrent events should not affect the results produced by a parallel debugging system. Any pronouncement on the correctness, or otherwise, of a particular test must be valid for any possible event ordering defined by that computation. Additionally, the perceived event ordering must be consistent


View Full Document

UW-Madison CS 739 - Partial Orders for Parallel Debugging

Documents in this Course
Load more
Download Partial Orders for Parallel Debugging
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Partial Orders for Parallel Debugging and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Partial Orders for Parallel Debugging 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?