Lecture 20: Query Optimization (2) Wednesday, May 19, 2010 Dan Suciu -- 444 Spring 2010 1Dan Suciu -- 444 Spring 2010 2 Outline • Search space • Algorithms for enumerating query plans • Estimating the cost of a query planKey Decisions Logical plan • What logical plans do we consider (left-deep, bushy ?); Search Space • Which algebraic laws do we apply, and in which context(s) ?; Optimization rules • In what order to we explore the search space ?; Optimization algorithm 3Key Decisions Physical plan • What physical operators to use? • What access paths to use (file scan or index)? 4Optimizers • Heuristic-based optimizers: – Apply greedily rules that always improve • Typically: push selections down – Very limited: no longer used today • Cost-based optimizers – Use a cost model to estimate the cost of each plan – Select the “cheapest” plan Dan Suciu -- 444 Spring 2010 5The Search Space • Complete plans • Bottom-up plans • Top-down plans Dan Suciu -- 444 Spring 2010 6Complete Plans Dan Suciu -- 444 Spring 2010 7 SELECT * FROM R, S, T WHERE R.B=S.B and S.C=T.C and R.A<40 ⨝ S σA<40 R ⨝ T ⨝ S σA<40 R ⨝ T Why is this search space inefficient ? R(A,B) S(B,C) T(C,D)Bottom-up Partial Plans 8 SELECT * FROM R, S, T WHERE R.B=S.B and S.C=T.C and R.A<40 R(A,B) S(B,C) T(C,D) ⨝ σA<40 R S T ⨝ S σA<40 R ⨝ R S ⨝ S σA<40 R ⨝ T ….. Why is this better ?Top-down Partial Plans 9 SELECT * FROM R, S, T WHERE R.B=S.B and S.C=T.C and R.A<40 R(A,B) S(B,C) T(C,D) ⨝ σA<40 T ⨝ S ⨝ T ….. SELECT R.A, T.D FROM R, S, T WHERE R.B=S.B and S.C=T.C SELECT * FROM R, S WHERE R.B=S.B and R.A < 40 SELECT * FROM R WHERE R.A < 40Plan Enumeration Algorithms • Dynamic programming (in class) – Classical algorithm [1979] – Limited to joins: join reordering algorithm – Bottom-up • Rule-based algorithm (will not discuss) – Database of rules (=algebraic laws) – Usually: dynamic programming – Usually: top-down Dan Suciu -- 444 Spring 2010 1011!Dynamic Programming Originally proposed in System R [1979] • Only handles single block queries: • Heuristics: selections down, projections up SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!Dan Suciu -- 444 Spring 2010Dynamic Programming • Search space = join trees • Algebraic laws = commutativity, associativity • Algorithm = dynamic programming Dan Suciu -- 444 Spring 2010 1213!Join Trees • R1 ⨝ R2 ⨝ …. ⨝ Rn • Join tree: • A plan = a join tree • A partial plan = a subtree of a join tree R3 R1 R2 R4 Dan Suciu -- 444 Spring 201014!Types of Join Trees • Left deep: R3 R1 R5 R2 R4 Dan Suciu -- 444 Spring 201015!Types of Join Trees • Bushy: R3 R1 R2 R4 R5 Dan Suciu -- 444 Spring 201016!Types of Join Trees • Right deep: R3 R1 R5 R2 R4 Dan Suciu -- 444 Spring 201017!Dynamic Programming Join ordering: • Given: a query R1 ⨝ R2 ⨝ . . . ⨝ Rn • Find optimal order • Assume we have a function cost() that gives us the cost of every join tree Dan Suciu -- 444 Spring 2010 SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!18!Dynamic Programming • For each subquery Q ⊆{R1, …, Rn} compute the following: – Size(Q) = the estimated size of Q – Plan(Q) = a best plan for Q – Cost(Q) = the estimated cost of that plan Dan Suciu -- 444 Spring 2010 SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!19!Dynamic Programming • Step 1: For each {Ri} do: – Size({Ri}) = B(Ri) – Plan({Ri}) = Ri – Cost({Ri}) = (cost of scanning Ri) Dan Suciu -- 444 Spring 2010 SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!20!Dynamic Programming • Step 2: For each Q ⊆{R1, …, Rn} of cardinality i do: – Size(Q) = estimate it recursively – For every pair of subqueries Q’, Q’’ s.t. Q = Q’ ∪ Q’’ compute cost(Plan(Q’) ⨝ Plan(Q’’)) • Cost(Q) = the smallest such cost • Plan(Q) = the corresponding plan Dan Suciu -- 444 Spring 2010 SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!21!Dynamic Programming • Step 3: Return Plan({R1, …, Rn}) Dan Suciu -- 444 Spring 2010 SELECT list"FROM R1, …, Rn"WHERE cond1 AND cond2 AND . . . AND condk!22!Example To illustrate, ad-hoc cost model (from the book ): • Cost(P1 ⨝ P2) = Cost(P1) + Cost(P2) + size(intermediate results for P1, P2) • Cost of a scan = 0 Dan Suciu -- 444 Spring 201023!Example • R ⨝ S ⨝ T ⨝ U • Assumptions: Dan Suciu -- 444 Spring 2010 SELECT *"FROM R, S, T, U"WHERE cond1 AND cond2 AND . . . !T(R) = 2000 T(S) = 5000 T(T) = 3000 T(U) = 1000 T(R ⨝ S) = 0.01*T(R)*T(S) T(S ⨝ T) = 0.01*T(S)*T(T) etc. All join selectivities = 1%24!Subquery! Size! Cost! Plan!RS!RT!RU!ST!SU!TU!RST!RSU!RTU!STU!RSTU!T(R) = 2000 T(S) = 5000 T(T) = 3000 T(U) = 100025!Subquery! Size! Cost! Plan!RS! 100k! 0! RS!RT! 60k! 0! RT!RU! 20k! 0! RU!ST! 150k! 0! ST!SU! 50k! 0! SU!TU! 30k! 0! TU!RST! 3M! 60k! (RT)S!RSU! 1M! 20k! (RU)S!RTU! 0.6M! 20k! (RU)T!STU! 1.5M! 30k! (TU)S!RSTU! 30M!60k+50k=110k!(RT)(SU)!T(R) = 2000 T(S) = 5000 T(T) = 3000 T(U) = 100026!Reducing the Search Space • Restriction 1: only left linear trees (no bushy) • Restriction 2: no trees with cartesian product Dan Suciu -- 444 Spring 2010 R(A,B) ⨝ S(B,C) ⨝ T(C,D) Plan: (R(A,B)⨝T(C,D)) ⨝ S(B,C) has a cartesian product. Most query optimizers will not consider it27!Dynamic Programming: Summary • Handles only join queries: – Selections are pushed down (i.e. early) – Projections are pulled up (i.e. late) • Takes exponential time in general, BUT: – Left linear joins may reduce time – Non-cartesian products may reduce time further Dan Suciu -- 444 Spring 201028!Rule-Based Optimizers • Extensible collection of rules Rule = Algebraic law with a direction • Algorithm for firing these rules Generate many alternative plans, in some order Prune by cost • Volcano (later SQL Sever) • Starburst (later DB2) Dan Suciu -- 444 Spring 201029!Completing the Physical Query Plan • Choose algorithm for each operator – How much memory do we have ? – Are the
View Full Document