Unformatted text preview:

CS347 Lecture 14 May 30 2001 1 Query processing in distributed databases Localization Distributed query operators Cost based optimization 2 Decomposition Given SQL query generate one or more algebraic query trees Localization Rewrite query trees replacing relations by fragments Optimization Given cost model one or more localized query trees Produce minimum cost query execution plan 3 Same as in a centralized DBMS Normalization usually into relational algebra Select A C From R Natural Join S Where R B 1 and S D 2 or R C 3 and S D 2 R B 1 v R C 3 S D 2 R S Conjunctive normal form 4 Redundancy elimination S A 1 S A 5 False S A 10 S A 5 S A 5 Algebraic Rewriting Example pushing conditions down cond3 cond S T cond1 S T cond2 5 1 Start with query tree 2 Replace relations by fragments 3 Push up down CS245 rules 4 Simplify eliminating unnecessary operations Note To denote fragments in query trees R cond Relation that fragment belongs to Condition its tuples satisfy 6 E 3 E 3 R E 3 R E 10 R E 10 R E 10 E 3 R E 10 E 3 R E 10 7 A R A S R A 5 R1 R 5 A 10 R A 10 R2 R3 S A 5 S A 5 S1 S2 8 A R1 A S1 R1 A S2 R2 A A S1 R2 S2 R3 A S1 R3 S2 A A A R A 5 S A 5 R 5 A 10 S A 5 R A 10 S A 5 9 C1 R C2 R C1 C2 R False S C1 C2 R A S A R C1 S C2 R In Example 1 E 3 R2 E 10 R2 E 3 E 10 R2 False In Example 2 R A 5 A S A 5 R A S R A 5 S A 5 R A S A R A S False A A 10 S s fragmentation is derived from that of R K R S K R A 10 R A 10 R1 R2 S K R K R A 10 S K R K R A 10 S1 S2 11 K K R1 S1 R1 K S2 R2 K R2 S2 S1 K K R A 10 S K R K R A 10 R A 10 S K R K R A 10 12 A A K R R1 K A B R2 K C D A A R1 K A B K K A K A R1 K A B R2 K C D 13 Given vertical fragmentation of R A Ri Ai R Ai A For any B A B R B i Ri B Ai 14 Sort Basic sort Range partitioning sort Parallel external sort merge Join Partitioned join Asymmetric fragment and replicate join General fragment and replicate join Semi join programs Aggregation and duplicate removal 15 Input relation R on single site disk fragmented partitioned by sort attribute fragmented partitioned by some other attribute Output sorted relation R single site disk individual sorted fragments partitions 16 Given R A range partitioned on attribute A sort R on A 7 3 3 7 10 11 17 14 10 11 14 17 20 27 22 20 22 27 Each fragment is sorted independently Results shipped elsewhere if necessary 17 Given R A located at one or more sites not fragmented on A sort R on A Algorithm range partition on A and then do basic sort Ra Rb R1 a0 R2 Local sort R1s Local sort R2s Result a1 R3 Local sort R3s 18 Possible centralized approach using a coordinator Each site sends statistics about its fragment to coordinator Coordinator decides of sites to use for local sort Coordinator computes and distributes partitioning vector For example Statistics could be min sort key max sort key of tuples Coordinator tries to choose vector that equally partitions relation 19 Coordinator receives From site 1 Min 5 Max 9 10 tuples From site 2 Min 7 Max 16 10 tuples Assume sort keys distributed uniformly within min max in each fragment Partition R into two fragments What is k0 5 10 k0 15 20 20 Different kinds of statistics Local partitioning vector Histogram Site 1 3 5 4 6 3 8 10 of tuples local vector Multiple rounds between coordinator and sites Sites send statistics Coordinator computes and distributes initial vector V Sites tell coordinator the number of tuples that fall in each range of V Coordinator computes final partitioning vector Vf 21 Local sort Compute partition vector Merge sorted streams at final sites In order Ra Rb Local sort Local sort Ras Rbs R1 a0 R2 Result a1 R3 22 Input Output Relations R S May or may not be partitioned R S Result at one or more sites 23 Local join Join attribute A Ra Rb f A R1 S1 R2 S2 R3 S3 Result Sa Sb f A Note Works only for equi joins 24 Same partition function f for both relations f can be range or hash partitioning Any type of local join nested loop hash merge etc can be used Several possible scheduling options Example partition R partition S join partition R build local hash table for R partition S and join Good partition function important Distribute join load evenly among sites 25 Local join Join attribute A Ra Rb R1 S R2 S R3 S f Partition function Sa Sb union Result Any partition function f can be used even round robin Can be used for any kind of join not just equi joins 26 Ra Rb Partition Sa Sb Partition R1 R2 R1 Replicate m copies R2 Rn Rn S1 S1 S2 Sm Replicate n copies S2 Sm 27 R1 S1 R1 Sm R2 S1 R2 Sm Rn Sm All n x m pairings of R S fragments Rn S1 Result Asymmetric F R join is a special case of general F R Asymmetric F R is useful when S is small 28 Used to reduce communication traffic during join processing R S R S S R S R S R S R 29 S A B A C 2 10 25 30 a b c d 3 10 15 25 32 x y z w x Compute S R R A S 2 10 25 30 S R S 10 y 25 w Using semi join communication cost 4 A 2 A C result Directly joining R and S communication cost 4 A B result 30 Say R is the smaller of the two relations R and S R S S is cheaper than R S if size AS size R S size R Similar comparisons for other types of semi joins Common implementation trick Encode AS or AR as a bit vector 1 bit per domain of attribute A 001101000010100 31 To compute R S T Semi join program 1 R S T where R R S S S S T Semi join program 2 R where R R S S S Several other options T T In general number of options is exponential …


View Full Document

Stanford CS 347 - Distributed Databases

Download Distributed Databases
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Distributed Databases and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Distributed Databases and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?