CS352 Lecture: Distributed Database Systems          last revised 11/26/08

Materials: Projectable showing horizontal and vertical fragmentation

I. What is a Distributed Database System?
-  ---- -- - ----------- -------- ------

   A. In a distributed database system, the database is stored on several
      computers located at multiple physical sites.

      1. This distinguishes it from a parallel system, in which the database
         is stored on multiple computers at the same physical site.

      2. This distinguishes it from a client-server system, in which the
         database is only stored at one site, though it may be accessed from
         multiple sites. (Of course, the server in a client-server system
         may itself be a distributed database!)

   B. There are two major variants of distributed systems:

      1. A distributed system can be HOMOGENEOUS. All sites run the same
         brand of DBMS software (and generally the same general kind of
         hardware and operating system.)

      2. A distributed system can be HETEROGENEOUS. Different sites run
         different DBMS software, perhaps on different kinds of hardware
         running different operating systems as well. The databases at the
         different sites may have different schemes, and the individual
         sites may not even be aware of each other.

      3. Naturally, coordination of activities between sites is much easier
         in the homogeneous case - and most actual implementations are of
         this type.

II. What are the Advantages and Disadvantages of a Distributed System?
--  ---- --- --- ---------- --- ------------- -- - ----------- ------

   A. We will basically consider the advantages and disadvantages of a
      remotely distributed system compared with a large, centralized
      system. This accords with the trend in many companies to move away
      from large central computer centers toward networks of PCs and
      minicomputers.

   B. Advantages

      1. One of the major advantages of a distributed system is sharing of
         data generated at the different sites, without requiring that all
         the data be moved to a single central site.

      2. Another advantage is the possibility of LOCAL CONTROL and AUTONOMY.
         Within boundaries established by the need for sharing, each site
         can control its own data, determine what is stored and how, etc.

      3. A third advantage is reliability and availability.

         a. In a centralized system, the failure of the main system shuts
            down all processing activity until the problem is fixed.

         b. In a distributed system, the failure of a site may reduce
            performance and/or make some data unavailable. But processing
            of most kinds of requests can continue. We say that the system
            has improved availability.

      4. A fourth advantage is the possibility of improved response times to
         queries. This can come about in two ways:

         a. Compared with a centralized system, a distributed system that
            stores data at the site(s) that use it the most allows those
            sites to access the data more quickly than they would if they
            had to get the data from a central site via a communication
            link.

         b. A multi-processor system can speed response to queries by the
            use of parallel processing. Two or more nodes can work on
            different parts of the same query at the same time, thus
            reducing the delay between the issuing of the query and the
            response to the user.

      5. A fifth advantage is the possibility of upgrading system capacity
         or performance incrementally.

         a. If more capacity or speed is needed on a centralized system,
            usually the only option is to replace it wholesale with a new,
            larger or faster system.

         b. However, a distributed system can be upgraded either by adding
            one or more new nodes, or by upgrading one or more existing
            nodes. The rest of the system continues to function as it is.

   C. Disadvantages

      1. One major disadvantage of a distributed system is the cost and time
         required for communication between sites.

         a. This is not necessarily a disadvantage, if the alternatives are
            a centralized system where ALL queries require communication vs
            a distributed system where SOME queries can be processed
            locally at the requesting site.

         b. But operations requiring access to data at multiple sites will
            almost always involve more communication between sites than
            would be required if all the data involved were at one
            location.

         c. The performance impact of communication depends a great deal on
            what kind of communication links are used. In particular, note
            that the performance of a network is determined by the slowest
            link. If the Internet is used, this is often the "last mile"
            connection between a DBMS site and the ISP.

         d. The time cost of any given message is given by:

               Access delay + (message length) / (data rate)

            i. Access delay is the overhead time needed to set up for a
               message between sites. This varies greatly from system to
               system, but will tend to be a constant that is independent
               of message length.

            ii. For short messages, access delay may be the dominant cost.

            iii. For longer messages, (message length) / (data rate) may be
                 the dominant cost.

         e. Depending on the configuration, communication cost may dominate
            disk access cost, in which case a distributed system might need
            to be optimized to minimize the number and volume of messages,
            rather than disk accesses.

      2. A second disadvantage is increased complexity. As we shall see,
         choosing a query processing strategy, performing updates, dealing
         with crashes, and concurrency control are all much more complex in
         a distributed system.

      3. A third disadvantage, related to the second, is that distributed
         systems are much harder to debug. In fact, the algorithms used must
         be bug-proof; discovering the cause of a problem that arises only
         under certain circumstances of operation timing is not possible
         using conventional debugging techniques.

III. Fragmentation and Replication of Data
---  ------------- --- ----------- -- ----

   A. At the
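
Looking back at the message time cost formula in II.C.1.d (access delay + message length / data rate), the following is a minimal sketch of how the two terms trade off. The function name, parameter names, and the link figures (0.1 s access delay, 1,000,000 bits/s) are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def message_time(access_delay_s: float, length_bits: float, rate_bps: float) -> float:
    """Time cost of one message: access delay + (message length) / (data rate)."""
    if rate_bps <= 0:
        raise ValueError("data rate must be positive")
    return access_delay_s + length_bits / rate_bps


# Hypothetical link (assumed numbers, not from the lecture).
ACCESS_DELAY_S = 0.1        # fixed setup overhead per message, in seconds
DATA_RATE_BPS = 1_000_000   # link data rate, in bits per second

# Short message: the access delay term dominates the total.
short_msg = message_time(ACCESS_DELAY_S, 1_000, DATA_RATE_BPS)

# Long message: the (message length) / (data rate) term dominates the total.
long_msg = message_time(ACCESS_DELAY_S, 10_000_000, DATA_RATE_BPS)

print(f"short message: {short_msg:.3f} s")
print(f"long message:  {long_msg:.3f} s")
```

With these assumed numbers, the two terms are equal when the message length is access delay times data rate, i.e. 100,000 bits. Below that length, reducing the number of messages matters most; above it, reducing the volume of data sent matters most.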


Gordon CPS 352 - distributed
