Distributed Databases CS347 Lecture 14 May 30 2001 1 Topics for the Day Query processing in distributed databases Localization Distributed query operators Cost based optimization 2 Query Processing Steps Decomposition Given SQL query generate one or more algebraic query trees Localization Rewrite query trees replacing relations by fragments Optimization Given cost model one or more localized query trees Produce minimum cost query execution plan 3 Decomposition Same as in a centralized DBMS Normalization usually into relational algebra Select A C From R Natural Join S Where R B 1 and S D 2 or R C 3 and S D 2 R B 1 v R C 3 S D 2 R S Conjunctiv e normal form 4 Decomposition Redundancy elimination S A 1 S A 5 False S A 10 S A 5 S A 5 Algebraic Rewriting Example pushing conditions down cond S T cond1 S cond3 T cond2 5 Localization Steps 1 Start with query tree 2 Replace relations by fragments 3 Push up down CS245 rules 4 Simplify eliminating unnecessary operations Note To denote fragments in query trees R cond Relation that fragment belongs to Condition its tuples satisfy 6 Example 1 E 3 R E 3 R E 10 E 3 R E 10 E 3 R E 10 R E 10 E 3 R E 10 7 Example 2 A R A S R A 5 R 5 A 10 R A 10 R1 R2 R3 S A 5 S A 5 S1 S2 8 A R1 S2 A S1 R1 A S2 R2 A A S1 R2 S2 R3 A S1 R3 A A A R A 5 S A 5 R 5 A 10 S A 5 R A 10 9 S A 5 Rules for Horiz Fragmentation C1 R C2 R C1 C2 R False R C1 A S C2 R A S C1 C2 R A S A In Example 1 E 3 R2 E 10 R2 E 3 E 10 R2 False In Example 2 R A 5 A S A 5 R A S R A 5 S A 5 R A S A R A S False 10 Example 3 Derived Fragmentation S s fragmentation is derived from that of R K R S K R A 10 R A 10 R1 R2 S K R K R A 10 S K R K R A 10 S1 S2 11 K K R1 S1 R1 K S2 R2 K R2 S2 S1 K R A 10 S K R K R A 10 K R A 10 S K R K R A 10 12 Example 4 Vertical Fragmentation A A K R R1 K A B R2 K C D A A R1 K A B K K A K A R1 K A B R2 K C D 13 Rule for Vertical Fragmentation Given vertical fragmentation of R A Ri Ai R Ai A For any B A B R B i Ri B Ai 14 Parallel Distributed Query Operations Sort Basic sort Range partitioning sort Parallel external sort merge Join Partitioned join Asymmetric fragment and replicate join General fragment and replicate join Semi join programs Aggregation and duplicate removal 15 Parallel distributed sort Input relation R on single site disk fragmented partitioned by sort attribute fragmented partitioned by some other attribute Output sorted relation R single site disk individual sorted fragments partitions 16 Basic sort Given R A range partitioned on attribute A sort R on A 7 3 3 7 10 11 17 14 10 11 14 17 20 27 22 20 22 27 Each fragment is sorted independently Results shipped elsewhere if necessary 17 Range partitioning sort Given R A located at one or more sites not fragmented on A sort R on A Algorithm range partition on A and then do basic sort Ra Rb R1 a0 R2 Local sort R1s Local sort R2s Local sort R3s Result a1 R3 18 Selecting a partitioning vector Possible centralized approach using a coordinator Each site sends statistics about its fragment to coordinator Coordinator decides of sites to use for local sort Coordinator computes and distributes partitioning vector For example Statistics could be min sort key max sort key of tuples Coordinator tries to choose vector that equally partitions relation 19 Example Coordinator receives From site 1 Min 5 Max 9 10 tuples From site 2 Min 7 Max 16 10 tuples Assume sort keys distributed uniformly within min max in each fragment Partition R into two fragments What is k0 5 10 k0 15 20 20 Variations Different kinds of statistics Local partitioning vector Histogram Site 1 3 5 6 4 3 8 10 of tuples local vector Multiple rounds between coordinator and sites Sites send statistics Coordinator computes and distributes initial vector V Sites tell coordinator the number of tuples that fall in each range of V Coordinator computes final partitioning vector Vf 21 Parallel external sort merge Local sort Compute partition vector Merge sorted streams at final sites In order Ra Rb Local sort Local sort Ras Rbs R1 a0 R2 Result a1 R3 22 Parallel distributed join Input Output Relations R S May or may not be partitioned R S Result at one or more sites 23 Partitioned Join Local join Join attribute A Ra Rb f A R1 S1 R2 S2 R3 S3 Result Sa Sb f A Note Works only for equi joins 24 Partitioned Join Same partition function f for both relations f can be range or hash partitioning Any type of local join nested loop hash merge etc can be used Several possible scheduling options Example partition R partition S join partition R build local hash table for R partition S and join Good partition function important Distribute join load evenly among sites 25 Asymmetric fragment replicate join Local join Join attribute A Ra Rb R1 S Sa R2 S Sb R3 S f Partition function union Result Any partition function f can be used even round robin Can be used for any kind of join not just equi joins 26 General fragment replicate join Ra Rb Partition Sa Sb Partition R1 R2 R1 Replicate m copies R2 Rn Rn S1 S1 S2 Sm Replicate n copies S2 Sm 27 R1 S1 R1 Sm R2 S1 R2 Sm Rn Sm All n x m pairings of R S fragments Rn S1 Result Asymmetric F R join is a special case of general F R Asymmetric F R is useful when S is small 28 Semi join programs Used to reduce communication traffic during join processing R S R S S R S R S R S R 29 S A B 2 10 25 30 a b c d Compute S R Example R A S 2 10 25 30 S R A C 3 10 15 25 32 x y z w x S 10 y 25 w Using semi join communication cost 4 A 2 A C result Directly joining R and S communication cost 4 A B result 30 Comparing communication costs Say R is the smaller of the two relations R and S R S S is cheaper than R S if …
View Full Document