Topics for the Day Query processing in distributed databases Localization Distributed query operators Cost based optimization Distributed Databases CS347 Lecture 14 May 30 2001 1 2 Query Processing Steps Decomposition Decomposition Same as in a centralized DBMS Normalization usually into relational algebra Given SQL query generate one or more algebraic query trees Select A C From R Natural Join S Where R B 1 and S D 2 or R C 3 and S D 2 Localization Rewrite query trees replacing relations by fragments Optimization R B 1 v R C 3 S D 2 Given cost model one or more localized query trees Produce minimum cost query execution plan R 3 S Conjunctive normal form 4 1 Decomposition Localization Steps Redundancy elimination 1 Start with query tree 2 Replace relations by fragments S A 1 S A 5 False S A 10 S A 5 S A 5 3 Push Algebraic Rewriting Example pushing conditions down R E 10 4 Simplify eliminating unnecessary operations T cond1 S E 3 E 3 down CS245 rules Note To denote fragments in query trees Example 1 R cond3 cond S up R cond cond2 T 5 Relation that fragment belongs to Condition its tuples satisfy 6 Example 2 E 3 R E 10 A R R E 10 R E 10 E 3 A S E 3 R E 10 7 R A 5 R1 R 5 A 10 R A 10 R2 R3 S A 5 S A 5 S1 S2 8 2 A A S1 R1 S2 R2 A R1 Rules for Horiz Fragmentation A A S1 R2 S2 R3 A C1 R C2 R C1 C2 R False R C1 S C2 R S C1 C2 R A S A In Example 1 E 3 R2 E 10 R2 E 3 E 10 R2 False In Example 2 R A 5 A S A 5 R A S R A 5 S A 5 R A S A R A S False A A S1 R3 S2 A A R A 5 S A 5 R 5 A 10 S A 5 R A 10 S A 5 A 9 10 Example 3 Derived Fragmentation S s fragmentation is derived from that of R K R K K S R1 S1 R1 K K R1 R2 S2 R2 K S1 R2 S2 R A 10 R A 10 K S K R K R A 10 S K R K R A 10 S1 S2 11 K R A 10 S K R K R A 10 R A 10 S K R K R A 10 12 3 Example 4 Vertical Fragmentation A A Given vertical fragmentation of R A Ri Ai R Ai A For any B A K R R1 K A B Rule for Vertical Fragmentation R2 K C D B R B i Ri B Ai A A R1 K A B K K A K A R1 K A B R2 K C D 13 Parallel Distributed Query Operations Sort 14 Parallel distributed sort Input relation R on Basic sort Range partitioning sort Parallel external sort merge single site disk fragmented partitioned by sort attribute fragmented partitioned by some other attribute Join Partitioned join Asymmetric fragment and replicate join General fragment and replicate join Semi join programs Output sorted relation R single site disk individual sorted fragments partitions Aggregation and duplicate removal 15 16 4 Basic sort Range partitioning sort Given R A range partitioned on attribute A sort R on A 7 3 11 10 17 20 14 27 Given R A located at one or more sites not fragmented on A sort R on A Algorithm range partition on A and then do basic sort 22 R1 Ra 3 7 11 10 14 17 20 a0 Rb 22 R2 27 Each fragment is sorted independently Results shipped elsewhere if necessary Local sort R1s Local sort R2s Result a1 R3 Local sort R3s 17 18 Selecting a partitioning vector Example Coordinator receives Possible centralized approach using a coordinator Each site sends statistics about its fragment to coordinator Coordinator decides of sites to use for local sort Coordinator computes and distributes partitioning vector For example From site 1 Min 5 Max 9 10 tuples From site 2 Min 7 Max 16 10 tuples Assume sort keys distributed uniformly within min max in each fragment Partition R into two fragments Statistics could be min sort key max sort key of tuples Coordinator tries to choose vector that equally partitions relation What is k0 5 19 10 k0 15 20 20 5 Variations Parallel external sort merge Different kinds of statistics Local partitioning vector Histogram Site 1 3 5 6 4 of tuples 3 8 10 local vector Local sort Compute partition vector Merge sorted streams at final sites Multiple rounds between coordinator and sites Sites send statistics Coordinator computes and distributes initial vector V Sites tell coordinator the number of tuples that fall in each range of V Coordinator computes final partitioning vector Vf In order Ra Local sort Ras Rb Local sort Rbs R1 a0 Result R2 a1 R3 21 22 Parallel distributed join Partitioned Join Local join Join attribute A Input Output Relations R S May or may not be partitioned R S Result at one or more sites Ra Rb f A R1 S1 R2 S2 R3 S3 Result Sa Sb f A Note Works only for equi joins 23 24 6 Partitioned Join Asymmetric fragment replicate join Same partition function f for both relations f can be range or hash partitioning Any type of local join nested loop hash merge etc can be used Several possible scheduling options Example partition R partition S join partition R build local hash table for R partition S and join Good partition function important Local join Join attribute A Ra Rb R1 S R2 S R3 S f Partition function Sa Sb union Result Distribute join load evenly among sites Any partition function f can be used even round robin 25 General fragment replicate join Ra Rb Partition Sa Sb R1 R2 R1 Replicate m copies R2 Rn Rn S1 S1 S2 Replicate n copies Can be used for any kind of join not just equi joins 26 R1 S1 R1 Sm R2 S1 R2 Sm Rn Sm All n x m pairings of R S fragments Rn S2 S1 Result Asymmetric F R join is a special case of general F R Partition Sm Sm 27 Asymmetric F R is useful when S is small 28 7 Semi join programs Used to reduce communication traffic during join processing R S R S S R S R S S B 2 a S R Example R 10 b R A C 3 x 10 y 25 c 15 z 30 d 25 w Compute R S A A S 2 10 25 30 S R 32 x S 10 y 25 w Using semi join communication cost 4 A 2 A C result Directly joining R and S communication cost 4 A B result 29 30 n way joins Comparing communication costs Say R is the smaller of the two relations R and S …
View Full Document