DOC PREVIEW
Stanford CS 347 - Distributed Databases

This preview shows page 1-2-3-22-23-24-44-45-46 out of 46 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 46 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Distributed Databases CS347 Lecture 13 May 23 2001 1 Expected Background Basic SQL Relational algebra Following aspects of centralized DB Query processing query plans cost estimation optimization Concurrency control techniques Recovery methods 2 Reading Material Primarily lecture notes No required textbook Some lecture material drawn from M Tamer Ozsu and Patrick Valduriez Principles of Distributed Database Systems Second Edition Prentice Hall 1999 3 Centralized DBMS P Software M Application SQL Front End Query Processor Transaction Proc File Access Simplifications single front end one place to keep locks if processor fails system fails 4 Distributed DB Multiple processors memories and disks Opportunity for parallelism Opportunity for enhanced reliability Synchronization issues Heterogeneity and autonomy of components Autonomy example may not get statistics for query optimization from a site 5 Heterogeneity Application Stock ticker tape RDBMS Portfolio Select new investments Files History of dividends ratios 6 Big Picture Data management with multiple processors and possible autonomy heterogeneity Impacts Data organization Query processing Access structures Concurrency control Recovery 7 Today s topics Introductory topics Database architectures Distributed versus Parallel DB systems Distributed database design Fragmentation Allocation 8 Common DB architectures P P P M P M P M P M Shared memory Shared disk 9 Common DB architectures Shared nothing P P M M P M Number of other hybrid architectures are possible 10 Selecting the right architecture Reliability Scalability Geographic distribution of data Performance Cost 11 Parallel vs Distributed DB system Typically parallel DBs Fast interconnect Homogeneous software Goals High performance and Transparency Typically distributed DBs Geographically distributed Disconnected operation possible Goal Data sharing heterogeneity autonomy 12 Typical query processing scenarios Parallel DB Distribute partition sort data to make certain DB operations e g Join fast Distributed DB Given data distribution find query processing strategy to minimize cost e g communication cost 13 Distributed DB Design Top down approach have a database how to split and allocate to individual sites Multi databases or bottom up combine existing databases how to deal with heterogeneity autonomy 14 Two issues in top down design Fragmentation Allocation Note issues not independent but studied separately for simplicity 15 Example Employee relation E name loc sal 40 of queries 40 of queries Qa select Qb select from E from E where loc Sa where loc Sb and and Motivation Two sites Sa Sb Qa Sa Sb Qb 16 Name Loc Sal E 5 Joe Sa 10 7 Sally Sb 25 8 Tom Sa 15 F F1 F2 Sa 10 Sa 15 7 Sally Sb 25 Joe Tom At Sb At Sa 5 8 F1 loc Sa E F2 loc Sb E primary horizontal fragmentation 17 Fragmentation Horizontal Primary depends on local attributes R Derived depends on foreign relation Vertical R 18 Horizontal partitioning techniques Round robin Hash partitioning Range partitioning 19 Round robin R F0 F1 t1t1 t2 t2 t3t3 t4t4 t5 F2 Evenly distributes data Good for scanning full relation Not good for point or range queries 20 Hash partitioning R F0 F1 F2 t1 h k1 2 t2 h k2 0 t3 h k3 0 t4 h k4 1 t1 t2 t3 t4 Good for point queries on key also for joins Not good for range queries point queries not on key Good hash function even distribution 21 Range partitioning R t1 A 5 t2 A 8 t3 A 2 t4 A 3 F0 partitioni ng vector 4 7 V0 V1 F1 t1 t3 t4 F2 t2 Good for some range queries on A Need to select good vector else create imbalance data skew execution skew 22 Which are good fragmentations Example 1 F F1 F2 F1 sal 10 E F2 sal 20 E Problem Some tuples lost Example 2 F F3 F4 F1 sal 10 E F2 sal 5 E Tuples with 5 sal 10 are duplicated 23 Prefer to deal with replication explicitly Example F F5 F6 F7 F5 sal 5 E F6 5 sal 10 E F7 sal 10 E Then replicate F6 if desired as part of allocation 24 Horizontal Fragmentation Desiderata R F F1 F2 1 Completeness t R Fi F such that t Fi 2 Disjointness Fi Fj i j such that i j 3 Reconstruction i such that R Fi 25 Generating horizontal fragments Given simple predicates Pr p1 p2 pm and relation R Generate minterm predicates M m m pk 1 k m where pk is either pk or pk Eliminate useless minterms and simplify M to get M Generate fragments m R for each m M 26 5 A 10 Example Example say queries use predicates A 10 A 5 Loc SA Loc SB Eliminate and simplify minterms A 10 A 5 Loc SA Loc SB A 10 A 5 Loc SA Loc SB Final set of fragments 5 A 10 Loc SA 5 A 10 Loc SB A 5 Loc SA A 5 Loc SB A 10 Loc SA A 10 Loc SB Work out details for all minterms 27 More on Horizontal Fragmentation Elimination of useless fragments predicates depends on application semantics e g if Loc SA and SB is possible must retain fragments such as 5 A 10 Loc SA Loc SB Minterm based fragmentation generates complete disjoint and reconstructible Justify this fragments statement 28 Choosing simple predicates E name loc sal with common queries Qa select from E where loc SA and Qb select from E where loc SB and Three choices for Pr and hence F Pr Pr F1 F Pr E Pr Loc SA Loc SB F2 F Pr loc SA E loc SB E Pr Loc SA Loc SB Sal 10 F3 F Pr loc SA sal 10 E loc SB sal 10 E loc SA sal 10 E loc SA sal 10 E 29 Qa Select loc SA Loc SA sal 10 Qb Select loc SB Loc SA sal F1 10 Loc SB sal 10 Loc SB sal F2 F3 Prefer F2 to F1 and F3 10 30 Desiderata for simple predicates Different from completeness of fragmentation Completeness Set of predicates Pr is complete if for every Fi F Pr every t Fi has equal probability of access by every major application Minimality Set of predicates Pr is minimal if no Pr Pr is complete To get complete and minimal Pr use predicates that are 31 relevant in frequent queries Derived horizontal fragmentation Example Two relations Employee and Jobs E NAME SAL LOC J DES Fragment E into E1 E2 by LOC Common query Given employee name list projects s he works in 32 E1 5 8 NM Joe Tom Loc Sa Sa Sal 10 15 E2 7 12 at Sa J NM Sally Fred Loc Sb Sb Sal 25 15 at Sb 5 7 5 12 Description work on 347 hw go to moon build table rest 33 E1 5 8 NM Joe Tom Loc Sa Sa Sal 10 15 E2 7 12 NM Sally Fred at Sa J1 5 5 Des work on 347 hw build table J1 J E1 Loc Sb Sb Sal 25 15 at Sb J2 7 12 Des go to moon rest J2 J E2 34 Derived horizontal fragmentation R fragmented as F F1 F2 Fn S derive D D1 D2 Dn where Di …


View Full Document

Stanford CS 347 - Distributed Databases

Download Distributed Databases
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Distributed Databases and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Distributed Databases and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?