Lecture 25: Query OptimizationOutlineIndex Based SelectionSlide 4Index Based JoinSlide 6OptimizationConverting from SQL to Logical PlansSlide 9Converting Nested QueriesSlide 11Slide 12Slide 13Slide 14Algebraic LawsSlide 16Slide 17Slide 18Heuristic Based OptimizationsQuery Rewriting: Predicate PushdownQuery Rewrites: Predicate Pushdown (through grouping)Query Rewrite: Pushing predicates upSlide 23Slide 24Lecture 25: Query OptimizationWednesday, November 22, 2000Outline•Finish index-based algorithms•From SQL to logical plans (7.3)•Algebraic laws (7.2)Index Based Selection•Selection on equality: a=v(R)•Clustered index on a: cost B(R)/V(R,a)•Unclustered index on a: cost T(R)/V(R,a)Index Based Selection•Example: B(R) = 2000, T(R) = 100,000, V(R, a) = 20, compute the cost of a=v(R)•Cost of table scan:–If R is clustered: B(R) = 2000 I/Os–If R is unclustered: T(R) = 100,000 I/Os•Cost of index based selection:–If index is clustered: B(R)/V(R,a) = 100–If index is unclustered: T(R)/V(R,a) = 5000•Notice: when V(R,a) is small, then unclustered index is uselessIndex Based Join•R S•Assume S has an index on the join attribute•Iterate over R, for each tuple fetch corresponding tuple(s) from S•Assume R is clustered. Cost:–If index is clustered: B(R) + T(R)B(S)/V(S,a)–If index is unclustered: B(R) + T(R)T(S)/V(S,a)Index Based Join•Assume both R and S have a sorted index (B+ tree) on the join attribute•Then perform a merge join (called zig-zag join)•Cost: B(R) + B(S)Optimization•Start: convert SQL to Logical Plan•Algebraic laws: are the foundation for every optimization•Two approaches to optimizations:–Heuristics: apply laws that seem to result in cheaper plans–Cost based: estimate size and cost of intermediate results, search systematically for best planConverting from SQL to Logical PlansSelect a1, …, anFrom R1, …, RkWhere Ca1,…,an( C(R1 R2 … Rk))Converting from SQL to Logical PlansSelect a1, …, anFrom R1, …, RkWhere CGroup by b1, …, bla1,…,an( b1, …, bm, aggs ( C(R1 R2 … Rk)))Converting Nested QueriesSelect product.nameFrom productWhere product.maker in (Select company.name From company where company.city=“seattle”)Select product.nameFrom product, companyWhere product.maker = company.name AND company.city=“seattle”Converting Nested QueriesSelect x.name, x.makerFrom product xWhere x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”•How do we convert this one to logical plan ?Converting Nested Queries•Let’s compute the complement first:Select x.name, x.makerFrom product xWhere x.color= “blue” AND x.price < SOME (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”Converting Nested Queries•This one becomes a SFW query:Select x.name, x.makerFrom product x, product yWhere x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price•This returns exactly the products we DON’T want, so…Converting Nested Queries(Select x.name, x.maker From product x Where x.color = “blue”)EXCEPT(Select x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price)Algebraic Laws•Commutative and Associative Laws–R U S = S U R, R U (S U T) = (R U S) U T–R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T–R S = S R, R (S T) = (R S) T•Distributive Laws–R (S U T) = (R S) U (R T)Algebraic Laws•Laws involving selection:– C AND C’(R) = C( C’(R)) = C(R) ∩ C’(R)– C OR C’(R) = C(R) U C’(R)– C (R S) = C (R) S •When C involves only attributes of R– C (R – S) = C (R) – S– C (R U S) = C (R) U C (S)– C (R ∩ S) = C (R) ∩ SAlgebraic Laws•Example: R(A, B, C, D), S(E, F, G)– F=3 (R S) = ?– A=5 AND G=9 (R S) = ?D=ED=EAlgebraic Laws•Laws involving projections– M(R S) = N(P(R) Q(S))•Where N, P, Q are appropriate subsets of attributes of M– M(N(R)) = M,N(R)•Example R(A,B,C,D), S(E, F, G)– A,B,G(R S) = ? (?(R) ?(S)) D=ED=EHeuristic Based Optimizations•Query rewriting based on algebraic laws•Result in better queries most of the time•Heuristics number 1:–PUSH SELECTIONS DOWN !Query Rewriting: Predicate PushdownProductCompanymaker=name price>100 AND city=“Seattle” pnameProductCompanymaker=nameprice>100 pnamecity=“Seattle”The earlier we process selections, less tuples we need to manipulatehigher up in the tree (but may cause us to loose an important orderingof the tuples).Query Rewrites: Predicate Pushdown (through grouping)Select y.name, Max(x.price)From product x, company yWhere x.maker = y.name GroupBy y.nameHaving Max(x.price) > 100Select y.name, Max(x.price)From product x, company yWhere x.maker=y.name and x.price > 100GroupBy y.nameHaving Max(x.price) > 100• For each company, find the maximal price of its products.•Advantage: the size of the join will be smaller.• Requires transformation rules specific to the grouping/aggregation operators.• Won’t work if we replace Max by Min.Query Rewrite:Pushing predicates upCreate View V1 ASSelect x.category, Min(x.price) AS pFrom product xWhere x.price < 20GroupBy x.categoryCreate View V2 ASSelect y.name, x.category, x.priceFrom product x, company yWhere x.maker=y.nameSelect V2.name, V2.priceFrom V1, V2Where V1.category = V2.category and V1.p = V2.priceBargain view V1: categories with some price<20, and the cheapest priceQuery Rewrite:Pushing predicates upCreate View V1 ASSelect x.category, Min(x.price) AS pFrom product xWhere x.price < 20GroupBy x.categoryCreate View V2 ASSelect y.name, x.category, x.priceFrom product x, company yWhere x.maker=y.nameSelect
View Full Document