Distributed Databases
John Ortiz
Lecture 24 Distributed Databases

Outline: Distributed Databases; Advantages of a DDB; Disadvantages of a DDB; Data Fragmentation; Data Replication; Synchronization; US Air Force Email; Query Processing in DDB; Query Processing using Semijoin; Concurrency Control and Recovery; Distinguished Copy; Distributed Recovery; Summary; Primary Site Technique

Distributed Databases
- A Distributed Database (DDB) is a collection of interrelated databases interconnected by a computer network
- A Distributed Database Management System (DDBMS) is software that manages a distributed database
- World Wide Web technology does not yet constitute a DDB by our definition

Advantages of a DDB
- Supports various levels of transparency
  - Distribution (network) transparency: degree to which the user is unaware of the networked nature of the DB
  - Replication transparency: degree to which the user is unaware of copies of the DB
  - Fragmentation transparency: degree to which the user is unaware the DB is broken into pieces
- Increased reliability and availability
  - Reliability: probability a system is running at a particular point in time
  - Availability: probability a system is continuously available during a time interval
- Improved performance
  - Supports data localization: data is kept near where it is most often used, to reduce the effects of network delay
- Easier expansion
  - Adding more data, increasing DB size, and adding resources are all easier
- Reduced operation costs (when considering a mainframe system)
  - It is cheaper to add workstations than a new mainframe computer
- No single point of failure
  - When one computer fails, others can take its place

Disadvantages of a DDB
- Significant increase in complexity
  - Normalization, query optimization, security, transaction processing, concurrency control, crash recovery, etc. ALL become much more difficult to handle
- Increased storage requirements
  - Since multiple copies of various portions of the DB exist, more storage space is required

Data Fragmentation
- Fragmentation is the division of the database into pieces stored at different sites
- Horizontal fragmentation: a subset of the tuples in a particular relation
  - The result of a query that SELECTs some tuples, but not others, is a horizontal “fragment”
  - In a DDB, the output of such a query may be stored as a separate DB at a separate site
  - Requires a UNION to recombine the information
- Vertical fragmentation: a subset of the attributes of a particular relation
  - The result of a query that PROJECTs certain, specific attributes
  - Requires an outer join (or an outer union) to recombine the information
- Hybrid fragmentation: can you guess? Includes both horizontal and vertical fragmentation
- Complete fragmentation simply means all tuples/attributes are in the result
- A fragmentation schema is a definition of the set of fragments that includes all attributes and tuples sufficient to reconstruct the DB
- An allocation schema describes which fragments are at what sites

Data Replication
- Replication is the creation of copies of the DB
- A DDB may be fully replicated (a copy of the entire DB is made at each site)
  - Why would you want to make a full copy of a DDB?
- A DDB may have no replication (each fragment is stored at one and only one site)
- Naturally, a DDB may be partially replicated
- A replication schema is a description of what pieces are copied at which sites
- Replication creates new consistency and redundancy problems
  - Every piece of data that is replicated is redundant, and therefore liable to become inconsistent
  - The copies may be updated separately, which causes inconsistency
  - How much inconsistency is acceptable?

Synchronization
- Synchronization is the process of updating the individual replicas
- Since pieces are stored in different places, the DDB must periodically be made consistent
- Synchronization can be expensive in terms of network resources and time
- It is not simply copying one replica to another: the most recent updates on both copies being synchronized must be accounted for
- Pp. 775-778 in the text have an example of a DDB

US Air Force Email
- We have noted in the past that there are many types of databases, such as spreadsheets, address books, and even documents (such as MS Word)
- Consider the AF, with approximately 500,000 people who all have email addresses and need to communicate
- They have constructed a global email address book and make use of replication
- The AF is divided into levels: global, command, base
- Initially the bases were each set up with email and interconnected via the network
- However, you had to know the email address of anyone at a different base
- Eventually, each command (a group of related bases) set up an address book covering all of its bases
- Each base maintains a complete replica of the entire command's address book
  - Why not just a piece?
- The DB is synchronized each night
- So, when someone moves, their email address is removed from the local copy
- All the other bases will still have that “old” email address until the next day, at which point the DDB is consistent again
- I believe that now the entire AF address book is available at each base
  - Not sure how often it is synchronized, perhaps weekly
- Search for an email address is quick since a local copy is kept
- This reduces network traffic considerably compared with everyone having to search a centralized DB for email addresses

Query Processing in DDB
- When we looked at query processing before, the largest delay was with the disk
- Now, that same concept is extended to include network delay, which can be much longer
- Suppose the EMPLOYEE
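
The Data Fragmentation slides say that horizontal fragments are recombined with a UNION and vertical fragments with an outer join. A minimal Python sketch of that round trip follows. The EMPLOYEE rows, the attribute names (ssn, name, dept, salary), and the helper function names are all hypothetical and chosen only for illustration; a real DDBMS would do this inside its query processor.

```python
# Hypothetical EMPLOYEE relation; attribute names and values are illustrative only.
EMPLOYEE = [
    {"ssn": "111", "name": "Ann", "dept": 5, "salary": 40000},
    {"ssn": "222", "name": "Bob", "dept": 4, "salary": 30000},
    {"ssn": "333", "name": "Cal", "dept": 5, "salary": 25000},
]


def horizontal_fragment(rows, predicate):
    """SELECT: keep only the tuples that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, attrs):
    """PROJECT: keep only the named attributes (include the key in each fragment)."""
    return [{a: r[a] for a in attrs} for r in rows]


def union(*fragments):
    """Recombine horizontal fragments, dropping duplicate tuples."""
    seen, out = set(), []
    for frag in fragments:
        for r in frag:
            fingerprint = tuple(sorted(r.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                out.append(r)
    return out


def outer_join(left, right, key):
    """Recombine vertical fragments on a shared key (full outer join).

    Rows present in only one fragment are kept, with None for missing attributes.
    """
    merged = {}
    for r in list(left) + list(right):
        merged.setdefault(r[key], {}).update(r)
    all_attrs = {a for row in merged.values() for a in row}
    return [{a: row.get(a) for a in all_attrs} for row in merged.values()]


# Horizontal: one fragment per department, e.g. stored at two different sites.
dept5 = horizontal_fragment(EMPLOYEE, lambda r: r["dept"] == 5)
dept4 = horizontal_fragment(EMPLOYEE, lambda r: r["dept"] == 4)
rebuilt_h = sorted(union(dept5, dept4), key=lambda r: r["ssn"])

# Vertical: personal data at one site, pay data at another; both carry the key.
personal = vertical_fragment(EMPLOYEE, ["ssn", "name"])
pay = vertical_fragment(EMPLOYEE, ["ssn", "dept", "salary"])
rebuilt_v = outer_join(personal, pay, "ssn")
```

Note that each vertical fragment repeats the key attribute; without it there would be nothing to join on when reconstructing the relation.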

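The Synchronization slide notes that synchronizing is not simply copying one replica over another, because recent updates on both copies must be accounted for. One simple way to illustrate this is a per-entry "newest timestamp wins" merge, sketched below. This policy is an assumption made for the example and is not stated in the slides; the address-book entries, timestamps, and the function name are likewise hypothetical.

```python
def synchronize(replica_a, replica_b):
    """Merge two replicas of an address book.

    Each replica maps a person to (timestamp, address). An address of None
    marks a removed entry, so a removal is not undone by an older copy.
    For each person the entry with the newer timestamp is kept; on a tie
    the entry from replica_a is kept.
    """
    merged = dict(replica_a)
    for person, (ts, address) in replica_b.items():
        if person not in merged or ts > merged[person][0]:
            merged[person] = (ts, address)
    return merged


# Base A recorded that alice moved (time 5) and that carol left (time 2).
base_a = {
    "alice": (5, "alice@base2.example"),
    "carol": (2, None),
}
# Base B still holds older entries but has a new entry for bob (time 3).
base_b = {
    "alice": (1, "alice@base1.example"),
    "bob": (3, "bob@base1.example"),
    "carol": (1, "carol@base1.example"),
}

merged = synchronize(base_a, base_b)
# After the nightly run, both bases would store the same merged copy.
base_a, base_b = dict(merged), dict(merged)
```

Copying base A over base B would have lost bob's entry, and copying B over A would have restored alice's old address; merging entry by entry keeps the most recent update from each side.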

UTSA CS 3743 - Distributed Databases
