DOC PREVIEW
Stanford CS 347 - Distributed Databases

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Expected Background Basic SQL Relational algebra Following aspects of centralized DB Distributed Databases Query processing query plans cost estimation optimization Concurrency control techniques Recovery methods CS347 Lecture 13 May 23 2001 1 2 Centralized DBMS Reading Material Primarily lecture notes No required textbook Some lecture material drawn from M Tamer Ozsu and Patrick Valduriez Principles of Distributed Database Systems Second Edition Prentice Hall 1999 P Software M Application SQL Front End Query Processor Transaction Proc File Access Simplifications single front end one place to keep locks if processor fails system fails 3 4 Distributed DB Heterogeneity Multiple processors memories and disks Opportunity for parallelism Opportunity for enhanced reliability Synchronization issues Application Heterogeneity and autonomy of components Autonomy example may not get statistics for query optimization from a site 5 Stock ticker tape RDBMS Portfolio Select new investments Files History of dividends ratios 6 1 Big Picture Today s topics Introductory topics Data management with multiple processors and possible autonomy heterogeneity Impacts Database architectures Distributed versus Parallel DB systems Distributed database design Data organization Query processing Access structures Concurrency control Recovery Fragmentation Allocation 7 8 Common DB architectures Common DB architectures Shared nothing P P P P M M P M P P P M M M P M Shared memory Shared disk Number of other hybrid architectures are possible 9 Selecting the right architecture 10 Parallel vs Distributed DB system Typically parallel DBs Reliability Scalability Geographic distribution of data Performance Cost Fast interconnect Homogeneous software Goals High performance and Transparency Typically distributed DBs Geographically distributed Disconnected operation possible Goal Data sharing heterogeneity autonomy 11 12 2 Distributed DB Design Typical query processing scenarios Parallel DB Distribute partition sort data to make certain DB operations e g Join fast Distributed DB Given data distribution find query processing strategy to minimize cost e g communication cost Top down approach have a database how to split and allocate to individual sites Multi databases or bottom up combine existing databases how to deal with heterogeneity autonomy 13 14 Two issues in top down design Example Fragmentation Allocation Employee relation E name loc sal 40 of queries 40 of queries Qa select Qb select from E from E where loc Sa where loc Sb and and Note issues not independent but studied separately for simplicity Motivation Two sites Sa Sb Qa Sa Sb Qb 15 16 Name Loc Sal E Joe Sally Tom Sa Sb Sa Fragmentation 10 25 15 Horizontal R F F1 F2 Sa 10 Sa 15 7 Sally Sb 25 Joe Tom F1 loc Sa E Derived depends on foreign relation Vertical At Sb At Sa 5 8 Primary depends on local attributes 5 7 8 R F2 loc Sb E primary horizontal fragmentation 17 18 3 Round robin Horizontal partitioning techniques Round robin Hash partitioning Range partitioning R t1 t2 t3 t4 t1 F0 F1 t2 t4 F2 t3 t5 Evenly distributes data Good for scanning full relation Not good for point or range queries 19 20 Range partitioning Hash partitioning R t1 h k1 2 t2 h k2 0 t3 h k3 0 t4 h k4 1 F0 F1 R t1 A 5 t2 A 8 t3 A 2 t4 A 3 F2 t1 t2 t3 t4 F0 partitioning vector 4 7 V0 V1 F1 t1 t3 t4 21 Which are good fragmentations F F1 F2 F1 sal 10 E Prefer to deal with replication explicitly F5 sal 5 E F6 5 sal 10 E F7 sal 10 E F2 sal 5 E Tuples with 5 sal 10 are duplicated 22 Example F F5 F6 F7 F2 sal 20 E Problem Some tuples lost Example 2 F F3 F4 F1 sal 10 E t2 Good for some range queries on A Need to select good vector else create imbalance data skew execution skew Good for point queries on key also for joins Not good for range queries point queries not on key Good hash function even distribution Example 1 F2 Then replicate F6 if desired as part of allocation 23 24 4 Horizontal Fragmentation Desiderata Generating horizontal fragments Given simple predicates Pr p1 p2 pm and relation R Generate minterm predicates R F F1 F2 1 Completeness t R Fi F such that t Fi 2 Disjointness Fi Fj i j such that i j M m m pk 1 k m where pk is either pk or pk Eliminate useless minterms and simplify M to get M 3 Reconstruction i such that R Fi Generate fragments m R for each m M 25 5 A 10 Example 26 More on Horizontal Fragmentation Example say queries use predicates Elimination of useless fragments predicates depends on application semantics A 10 A 5 Loc SA Loc SB e g if Loc SA and SB is possible must retain fragments such as 5 A 10 Loc SA Loc SB Eliminate and simplify minterms A 10 A 5 Loc SA Loc SB A 10 A 5 Loc SA Loc SB Final set of fragments 5 A 10 Loc SA 5 A 10 Loc SB A 5 Loc SA A 5 Loc SB A 10 Loc SA A 10 Loc SB Work out details for all minterms Minterm based fragmentation generates complete disjoint and reconstructible Justify this fragments statement 27 28 Choosing simple predicates Qb Select loc SB Loc SA sal 10 Three choices for Pr and hence F Pr F1 Pr F1 F Pr E Pr Loc SA Loc SB F2 F Pr loc SA E loc SB E Pr Loc SA Loc SB Sal 10 Loc SB sal 10 F2 F3 Prefer F2 to F1 and F3 Loc SB sal 10 F3 F Pr loc SA sal 10 E loc SB sal 10 E loc SA sal 10 E loc SA sal 10 E Qa Select loc SA Loc SA sal 10 E name loc sal with common queries Qa select from E where loc SA and Qb select from E where loc SB and 29 30 5 Desiderata for simple predicates Derived horizontal fragmentation Different from completeness of fragmentation Completeness Set of predicates Pr is complete if for every Fi F Pr every t Fi has equal probability of access by every major application Minimality Set of predicates Pr is minimal if no Pr Pr is complete Example Two relations Employee and Jobs E NAME SAL LOC J DES Fragment E into E1 E2 by LOC Common query Given employee name list projects s he works in To get complete and minimal Pr use predicates that are 31 relevant in frequent queries E1 5 8 NM Joe Tom Loc Sa Sa Sal 10 15 E2 7 12 NM Sally Fred at Sa J Loc Sb Sb Sal 25 15 32 E1 5 8 Loc Sa Sa Sal 10 15 E2 7 12 NM Sally Fred at Sa at Sb 5 7 5 12 NM Joe Tom Description work on 347 hw go to moon build table rest J1 5 5 J2 E1 7 12 Des go to moon rest J2 J E2 33 Derived horizontal fragmentation R fragmented as F F1 F2 Fn 34 Completeness of derived fragmentation 33 Example Say J is S derive D D1 D2 Dn where Di S …


View Full Document

Stanford CS 347 - Distributed Databases

Download Distributed Databases
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Distributed Databases and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Distributed Databases and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?