6.826—Principles of Computer Systems, 2004. Handout 21: Distributed Systems.

21. Distributed Systems

The rest of the course is about distributed computing systems. In the next four lectures we will characterize distributed systems and study how to specify and code communication among the components of a distributed system. Later lectures consider higher-level system issues: distributed transactions, replication, security, management, and caching.

The lectures on communication are organized bottom-up. Here is the plan:

1. Overview.
2. Links. Broadcast networks.
3. Switching networks.
4. Reliable messages.
5. Remote procedure call and network objects.

Overview

An underlying theme in computer systems as a whole, and especially in distributed systems, is the tradeoff between performance and complexity. Consider the problem of carrying railroad traffic across a mountain range.1 The minimal system involves a single track through the mountains. This solves the problem, and no smaller system can do so. Furthermore, trains can travel from East to West at the full bandwidth of the track. But there is one major drawback: if it takes 10 hours for a train to traverse the single track, then it takes 10 hours to switch from E–W traffic to W–E traffic, and during these 10 hours the track is idle. The scheme for switching can be quite simple: the last E–W train tells the W–E train that it can go. But there is a costly failure mode: the East end forgets that it sent a 'last' E–W train and sends another one; the result is either a collision or a lot of backing up.

The simplest way to solve both problems is to put in a second track. Now traffic can flow at full bandwidth in both directions, and the two-track system is even simpler than the single-track system, since we can dedicate one track to each direction and don't have to keep track of which way traffic is running. However, the second track is quite expensive. If it has to be retrofitted, it may be as expensive as the first one.
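The single-track switching scheme described above is essentially a token protocol: whichever end holds the token has the right of way, and the 'last train' message passes the token to the other end. The following sketch (not from the handout; all names are invented for illustration) models both the protocol and its costly failure mode.

```python
class Track:
    """The single track: trains in only one direction may occupy it."""
    def __init__(self):
        self.direction = None  # 'E-W', 'W-E', or None when the track is empty

    def enter(self, direction):
        if self.direction not in (None, direction):
            raise RuntimeError("collision: trains running in both directions")
        self.direction = direction

    def clear(self):
        self.direction = None


class End:
    """One end of the track; it may send trains only while it holds the token."""
    def __init__(self, name, has_token):
        self.name, self.has_token = name, has_token

    def send_train(self, track, direction):
        if not self.has_token:
            raise RuntimeError(f"{self.name} sent a train without the token")
        track.enter(direction)

    def pass_token(self, other):
        # The last train in this direction carries the "you can go" message.
        self.has_token, other.has_token = False, True


east, west = End("East", True), End("West", False)
track = Track()

east.send_train(track, "E-W")  # East holds the token: allowed
track.clear()
east.pass_token(west)          # last E-W train tells West it can go
west.send_train(track, "W-E")  # West now holds the token: allowed
# If a forgetful East now called east.send_train(track, "E-W"), the check in
# send_train would raise, modeling the collision described in the text.
```

The point of the sketch is that the protocol's safety rests entirely on each end remembering whether it holds the token; losing that one bit of state is exactly the failure mode the text describes.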
A much cheaper solution is to add sidings: short sections of double track, at which trains can pass each other. But now the signaling system must be much more complex to ensure that traffic between sidings flows in only one direction at a time, and that no siding fills up with trains.

1 This example is due to Mike Schroeder.

What makes a system distributed?

One man's constant is another man's variable. (Alan Perlis)

A distributed system is a system where I can't get my work done because a computer has failed that I've never even heard of. (Leslie Lamport)

There is no universally accepted definition of a distributed system. It's like pornography: you recognize one when you see it. And like everything in computing, it's in the eye of the beholder. In the current primitive state of the art, Lamport's definition has a lot of truth. Nonetheless, there are some telltale signs that help us to recognize a distributed system:

- It has concurrency, usually because there are multiple general-purpose computing elements. Distributed systems are closely related to multiprocessors.

- Communication costs are an important part of the total cost of solving a problem on the system, and hence you try to minimize them. This is not the same as saying that the cost of communication is an important part of the system cost. In fact, it is more nearly the opposite: a system in which communication is good enough that the programmer doesn't have to worry about it (perhaps because the system builder spent a lot of money on communication) is less like a distributed system. Distributed systems are closely related to telephone systems; indeed, the telephone system is by far the largest example of a distributed system, though its functionality is much simpler than that of most systems in which computers play a more prominent role.

- It tolerates partial failures. If some parts break, the rest of the system keeps doing useful work.
We usually don’t think of a system as distributed if every failure causes the entire system to go down. It is scaleable: you can add more components to increase capacity without making any qualitative changes in the system or its clients. Scalability is especially important for many web servers, since a successful one like eBay or Google sees millions or billions of requests per day. It is heterogeneous. This means that you can add components that implement the system’s internal interfaces in different ways: different telephone switches, different computers sending and receiving E-mail, different NFS clients and servers, or whatever. It also means that components may be autonomous, that is, owned by different organizations and managed according to different policies. It doesn’t mean that you can add arbitrary components with arbitrary interfaces, because then what you have is chaos, not a system. Hence the useful reminder: “There’s no such thing as a heterogeneous system.”6.826—Principles of Computer Systems 2004 Handout 21. Distributed Systems 3 Layers Any idea in computing is made better by being made recursive. Brian Randell There are three rules for writing a novel. Unfortunately, no one knows what they are. Somerset Maugham You can look at a computer system at many different scales. At each scale you see the same basic components: computing, storage, and communications. The bigger system is made up of smaller ones. Figure 1 illustrates this idea over about 10 orders of magnitude (we have seen it before, in the handout on performance). But Figure 1 is misleading, because it doesn’t suggest that different levels of the system may have quite different interfaces. When this happens, we call the level a layer. Here is an example of different interfaces that transport bits or messages from a sender to a receiver. Each layer is motivated by different functionality or performance than the one below it. This stack is ten layers deep. 
Note that in most cases the motivation for separate layers is either compatibility or the fact that a layer has other clients or other code.

[Fig. 1. Scales of interconnection. Relative speed and size are in italics. The figure tabulates, for the Internet, a LAN, a multiprocessor, and a processor chip, how fast each one is (100 ms, 1 ms, 75 ns, and 1 ns respectively), how many there are and their total size, and the slowdown relative to the chip.]
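The idea that each layer presents a different interface while being a client of the layer below can be sketched concretely. The three-layer stack below is a hypothetical illustration (invented for this note, not the handout's ten-layer stack): a raw byte transport, a framing layer that turns the byte stream into discrete messages, and a text layer that sends strings.

```python
class Wire:
    """Bottom layer: transports raw bytes, with no message boundaries."""
    def __init__(self):
        self.buf = b""

    def send(self, data: bytes):
        self.buf += data

    def receive(self) -> bytes:
        data, self.buf = self.buf, b""
        return data


class Framing:
    """Middle layer: adds a length header so the stream carries messages."""
    def __init__(self, below: Wire):
        self.below = below

    def send(self, msg: bytes):
        self.below.send(len(msg).to_bytes(4, "big") + msg)

    def receive(self) -> list:
        data, msgs = self.below.receive(), []
        while data:
            n = int.from_bytes(data[:4], "big")
            msgs.append(data[4:4 + n])
            data = data[4 + n:]
        return msgs


class Text:
    """Top layer: a different interface again -- strings, not bytes."""
    def __init__(self, below: Framing):
        self.below = below

    def send(self, s: str):
        self.below.send(s.encode("utf-8"))

    def receive(self) -> list:
        return [m.decode("utf-8") for m in self.below.receive()]


stack = Text(Framing(Wire()))
stack.send("hello")
stack.send("world")
```

Each layer here exists for exactly the reason the text gives: it offers its client different functionality (messages instead of a stream, strings instead of bytes) than the layer below.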

