Guardians and Actions: Linguistic Support for Robust, Distributed Programs

BARBARA LISKOV and ROBERT SCHEIFLER
Massachusetts Institute of Technology

An overview is presented of an integrated programming language and system designed to support the construction and maintenance of distributed programs: programs in which modules reside and execute at communicating, but geographically distinct, nodes. The language is intended to support a class of applications concerned with the manipulation and preservation of long-lived, on-line, distributed data. The language addresses the writing of robust programs that survive hardware failures without loss of distributed information and that provide highly concurrent access to that information while preserving its consistency. Several new linguistic constructs are provided; among them are atomic actions and modules called guardians that survive node failures.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems - distributed applications, distributed databases; D.1.3 [Programming Techniques]: Concurrent Programming; D.3.3 [Programming Languages]: Language Constructs - abstract data types, concurrent programming structures, modules, packages; D.4.5 [Operating Systems]: Reliability - checkpoint/restart, fault-tolerance; H.2.4 [Database Management]: Systems - distributed systems, transaction processing

General Terms: Languages, Reliability

Additional Key Words and Phrases: Atomicity, nested atomic actions, remote procedure call

1. INTRODUCTION

Technological advances have made it cost-effective to construct large systems from collections of computers connected via networks. To support such systems, there is a growing need for effective ways to organize and maintain distributed programs: programs in which modules reside and execute at communicating, but geographically distinct,
locations. In this paper we present an overview of an integrated programming language and system, called ARGUS, that was designed for this purpose.

A preliminary version of this paper appeared in the Conference Record of the Ninth Annual Symposium on Principles of Programming Languages, January 1982 [18]. This research was supported in part by the Advanced Research Projects Agency of the Department of Defense, monitored by the Office of Naval Research under contract N00014-75-C-0661, and in part by the National Science Foundation under grant MCS 79-23769.
Authors' address: Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1983 ACM 0164-0925/83/0700-0381 $00.75
ACM Transactions on Programming Languages and Systems, Vol. 5, No. 3, July 1983, Pages 381-404.

Distributed programs run on nodes connected only via a communications network. A node consists of one or more processors, one or more levels of memory, and any number of external devices. Different nodes may contain different kinds of processors and devices. The network may be long haul or short haul, or any combination, connected by gateways. Neither the network nor any nodes need be reliable. However, we do assume that all failures can be detected, as explained in [15]. We also assume that message delay is long relative to the time needed to access local memory, and therefore that access to nonlocal data is significantly more expensive than access to local data.

The applications that can make effective use of a distributed organization
differ in their requirements. We have concentrated on a class of applications concerned with the manipulation and preservation of long-lived, on-line data. Examples of such applications are banking systems, airline reservation systems, office automation systems, database systems, and various components of operating systems. In these systems, real-time constraints are not severe, but reliable, available, distributed data is of primary importance. The systems may serve a geographically distributed organization. Our language is intended to support the implementation of such systems.

The application domain, together with our hardware assumptions, imposes a number of requirements:

Service. A major concern is to provide continuous service of the system as a whole in the face of node and network failures. Failures should be localized so that a program can perform its task as long as the particular nodes it needs to communicate with are functioning and reachable. Adherence to this principle permits an application program to use replication of data and processing to increase availability.

Reconfiguration. An important reason for wanting a distributed implementation is to make it easy to add and reconfigure hardware to increase processing power, decrease response time, or increase the availability of data. It also must be possible to implement logical systems that can be reconfigured. To maintain continuous service, it must be possible to make both logical and physical changes dynamically, while the system continues to operate.

Autonomy. We assume that nodes are owned by individuals or organizations that want to control how the node is used. For example, the owner may want to control what runs at the node, or to control the availability of services provided at the node. Further, a node might contain data that must remain resident at that node; for example, a multinational organization must abide by laws governing information flow among countries. The important point here is that the need for distribution
arises not only from efficiency considerations, but from political and sociological considerations as well.

Distribution. The distribution of data and processing can have a major impact on overall efficiency, in terms of both responsiveness and cost-effective use of hardware. Distribution also affects availability. To create efficient, available systems while retaining autonomy, the programmer needs explicit control over the placement of modules in the system. However, to support a reasonable degree of modularity