1Lecture 24:Query Execution Monday, November 27, 20062Outline• Query optimization: algebraic laws 16.23ExampleProduct(pname, maker), Company(cname, city)• How do we execute this query ?Select Product.pnameFrom Product, CompanyWhere Product.maker=Company.cnameand Company.city = “Seattle”Select Product.pnameFrom Product, CompanyWhere Product.maker=Company.cnameand Company.city = “Seattle”4ExampleProduct(pname, maker), Company(cname, city)Assume:Clustered index: Product.pname, Company.cnameUnclustered index: Product.maker, Company.city5σcity=“Seattle”Product(pname,maker)Company(cname,city)maker=cnameLogical Plan:6σcity=“Seattle”Product(pname,maker)Company(cname,city)cname=makerPhysical plan 1:Index-basedselectionIndex-basedjoin7σcity=“Seattle”Product(pname,maker)Company(cname,city)maker=cnamePhysical plans 2a and 2b:Index-scanMerge-joinScan and sort (2a)index scan (2b)Which one is better ??Which one is better ??8σcity=“Seattle”Product(pname,maker)Company(cname,city)cname=makerPhysical plan 1:Index-basedselectionIndex-basedjoinT(Company) / V(Company, city) × T(Product) / V(Product, maker)Total cost:T(Company) / V(Company, city) × T(Product) / V(Product, maker)Total cost:T(Company) / V(Company, city) × T(Product) / V(Product, maker)9σcity=“Seattle”Product(pname,maker)Company(cname,city)maker=cnamePhysical plans 2a and 2b:Table-scanMerge-joinScan and sort (2a)index scan (2b)B(Company) 3B(Product)T(Product)No extra cost(why ?)Total cost:(2a): 3B(Product) + B(Company) (2b): T(Product) + B(Company)Total cost:(2a): 3B(Product) + B(Company) (2b): T(Product) + B(Company)10Plan 1: T(Company)/V(Company,city) ×T(Product)/V(Product,maker)Plan 2a: B(Company) + 3B(Product)Plan 2b: B(Company) + T(Product)Plan 1: T(Company)/V(Company,city) ×T(Product)/V(Product,maker)Plan 2a: B(Company) + 3B(Product)Plan 2b: B(Company) + T(Product)Which one is better ??Which one is better ??It depends on the data !!It depends on the data !!11Example•Case 1: V(Company, city) ≈ T(Company) •Case 2: V(Company, city) << T(Company)T(Company) = 5,000 B(Company) = 500 M = 100T(Product) = 100,000 B(Product) = 1,000We may assume V(Product, maker) ≈ T(Company) (why ?)T(Company) = 5,000 B(Company) = 500 M = 100T(Product) = 100,000 B(Product) = 1,000We may assume V(Product, maker) ≈ T(Company) (why ?)V(Company,city) = 2,000V(Company,city) = 2,000V(Company,city) = 20V(Company,city) = 2012Which Plan is Best ?Plan 1: T(Company)/V(Company,city) × T(Product)/V(Product,maker)Plan 2a: B(Company) + 3B(Product)Plan 2b: B(Company) + T(Product)Plan 1: T(Company)/V(Company,city) × T(Product)/V(Product,maker)Plan 2a: B(Company) + 3B(Product)Plan 2b: B(Company) + T(Product)Case 1:Case 2:13Lessons• Need to consider several physical plan– even for one, simple logical plan• No magic “best” plan: depends on the data• In order to make the right choice– need to have statistics over the data– the B’s, the T’s, the V’s14Query Optimzation• Have a SQL query Q• Create a plan P• Find equivalent plans P = P’ = P’’ = …• Choose the “cheapest”. HOW ??15Logical Query PlanSELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name ANDP.city=‘seattle’ ANDQ.phone > ‘5430000’SELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name ANDP.city=‘seattle’ ANDQ.phone > ‘5430000’PurchasePersonBuyer=nameCity=‘seattle’ phone>’5430000’buyerσIn class:find a “better” plan P’P=Purchasse(buyer, city)Person(name, phone)16Logical Query PlanSELECT city, sum(quantity)FROM salesGROUP BY cityHAVING sum(quantity) < 100SELECT city, sum(quantity)FROM salesGROUP BY cityHAVING sum(quantity) < 100sales(product, city, quantity)γ city, sum(quantity)→pσ p < 100T1(city,p)T2(city,p)In class:find a “better” plan P’Q=P=17The three components of an optimizerWe need three things in an optimizer:• Algebraic laws• An optimization algorithm• A cost estimator18Algebraic Laws• Commutative and Associative LawsR ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ TR |×|S = S |×|R, R |×|(S |×|T) = (R |×|S) |×|TR |×|S = S |×|R, R |×|(S |×|T) = (R |×|S) |×|T• Distributive LawsR |×|(S ∪ T) = (R |×|S) ∪ (R |×|T)19Algebraic Laws• Laws involving selection:σC AND C’(R) = σC(σC’(R)) = σC(R) ∩σC’(R)σC OR C’(R) = σC(R) ∪σC’(R)σC (R |×|S) = σC (R) |×|S • When C involves only attributes of RσC (R – S) = σC (R) – SσC (R ∪ S) = σC (R) ∪σC (S)σC (R |×|S) = σC (R) |×|S20Algebraic Laws• Example: R(A, B, C, D), S(E, F, G)σF=3 (R |×|D=ES) = ?σA=5 AND G=9 (R |×|D=ES) = ?21Algebraic Laws• Laws involving projectionsΠM(R |×|S) = ΠM(ΠP(R) |×| ΠQ(S))ΠM(ΠN(R)) = ΠM,N(R)• Example R(A,B,C,D), S(E, F, G)ΠA,B,G(R |×|D=ES) = Π? (Π?(R) |×|D=EΠ?(S))22Algebraic Laws• Laws involving grouping and aggregation:δ(γA, agg(B)(R)) = γA, agg(B)(R)γA, agg(B)(δ(R)) = γA, agg(B)(R) if agg is “duplicate insensitive”• Which of the following are “duplicate insensitive” ?sum, count, avg, min, maxγA, agg(D)(R(A,B) |×|B=CS(C,D)) = γA, agg(D)(R(A,B) |×|B=C(γC, agg(D)S(C,D)))23Optimizations Based on SemijoinsTHIS IS ADVANCED STUFF; NOT ON THE FINAL• R S = ΠA1,…,An(R S)• Where the schemas are:– Input: R(A1,…An), S(B1,…,Bm)– Output: T(A1,…,An)24Optimizations Based on SemijoinsSemijoins: a bit of theory (see [AHV])• Given a query:• A full reducer for Q is a program:• Such that no dangling tuples remain in any relationQ :- Π (σ (R1 |x| R2|x| . . . |x| Rn))Q :- Π (σ (R1 |x| R2|x| . . . |x| Rn))Ri1:= Ri1Rj1Ri2:= Ri2Rj2. . . . .Rip:= RipRjpRi1:= Ri1Rj1Ri2:= Ri2Rj2. . . . .Rip:= RipRjp25Optimizations Based on Semijoins•Example:• A full reducer is:Q(A,E) :- R1(A,B) |x| R2(B,C) |x| R3(C,D,E)Q(A,E) :- R1(A,B) |x| R2(B,C) |x| R3(C,D,E)R2(B,C) := R2(B,C) |x R1(A,B)R3(C,D,E) := R3(C,D,E) |x R2(B,C)R2(B,C) := R2(B,C) |x R3(C,D,E)R1(A,B) := R1(A,B) |x R2(B,C)R2(B,C) := R2(B,C) |x R1(A,B)R3(C,D,E) := R3(C,D,E) |x R2(B,C)R2(B,C) := R2(B,C) |x R3(C,D,E)R1(A,B) := R1(A,B) |x R2(B,C)The new tables have only the tuples necessary to compute Q(E)26Optimizations Based on Semijoins• Example: • Doesn’t have a full reducer (we can reduce forever)Theorem a query has a full reducer iff it is “acyclic”Q(E) :- R1(A,B) |x| R2(B,C) |x|
View Full Document