1Query OptimizationCPS 116Introduction to Database Systems2Announcements (November 20) Homework #4 (last one and short) assigned today Due next Thursday3Query optimization One logical plan → “best” physical plan Questions How to enumerate possible plans How to estimate costs How to pick the “best” one Often the goal is not getting the optimum plan, but instead avoiding the horrible ones1 second 1 hour1 minuteAny of these will do4Plan enumeration in relational algebra Apply relational algebra equivalences) Join reordering: × and ! are associative and commutative (except column ordering, but that is unimportant)!!R ST!!S RT!!R TS…===5More relational algebra equivalences Convert σp-× to/from !p: σp(R × S) = R !pS Merge/split σ’s: σp1(σp2R) = σp1 ∧ p2R Merge/split π’s: πL1(πL2R) = πL1R, where L1 ⊆ L2 Push down/pull up σ:σp∧pr∧ps(R!p’S)=(σprR)!p∧p’(σpsS),whereσp∧pr∧ps(R!p’S) (σprR) !p∧p’(σpsS), where pr is a predicate involving only R columns ps is a predicate involving only S columns p and p’ are predicates involving both R and S columns Push down π: πL(σpR) = πL(σp(πL L’R)), where L’ is the set of columns referenced by p that are not in L Many more (seemingly trivial) equivalences… Can be systematically used to transform a plan to new ones6Relational query rewrite exampleπtitleσStudent.name=“Bart” ∧ Student.SID = Enroll.SID ∧ Enroll.CID = Course.CID×EnrollCourse×StudentπtitleσEnroll.CID = Course.CIDCtσ×t!×EnrollCourse×StudentσStudent.SID = Enroll.SIDσStudent.name = “Bart”Push down σπtitle!Enroll.CID = Course.CIDEnrollCourseStudent!Student.SID = Enroll.SIDσname = “Bart”Convert σp-×to !p27Heuristics-based query optimization Start with a logical plan Push selections/projections down as much as possible Why? Reduce the size of intermediate results Why not? May be expensive; maybe joins filter better Join smaller relations first, and avoid cross product Why? Reduce the size of intermediate results Why not? Size depends on join selectivity too Convert the transformed logical plan to a physical plan (by choosing appropriate physical operators)8SQL query rewrite More complicated—subqueries and views divide a query into nested “blocks” Processing each block separately forces particular join methods and join order Even if the plan is optimal for each block, it may not be optimal for the entire query Unnest query: convert subqueries/views to joins) We can just deal with select-project-join queries Where the clean rules of relational algebra apply9SQL query rewrite example SELECT nameFROM StudentWHERE SID = ANY (SELECT SID FROM Enroll); SELECT nameFROM Student, EnrollWHERE St d t SID E ll SIDWHERE Student.SID = Enroll.SID; Wrong—consider two Bart’s, each taking two classes SELECT nameFROM (SELECT DISTINCT Student.SID, nameFROM Student, EnrollWHERE Student.SID = Enroll.SID); Right—assuming Student.SID is a key10Dealing with correlated subqueries SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); SELECT CIDFROM Course (SELECT CID COUNT(*) AS cntFROM Course, (SELECT CID, COUNT(*) AS cntFROM Enroll GROUP BY CID) tWHERE t.CID = Course.CID AND min_enroll > t.cntAND title LIKE ’CPS%’; New subquery is inefficient (computes enrollment for all courses) Suppose a CPS class is empty?11“Magic” decorrelation SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); CREATE VIEW Supp_Course ASSELECT * FROM Course WHERE title LIKE ’CPS%’;CREATE VIEW Magic ASProcess the outer querywithout the subqueryCll bidigSELECT DISTINCT CID FROM Supp_Course;CREATE VIEW DS AS(SELECT Enroll.CID, COUNT(*) AS cntFROM Magic, Enroll WHERE Magic.CID = Enroll.CIDGROUP BY Enroll.CID) UNION(SELECT Magic.CID, 0 AS cnt FROM MagicWHERE Magic.CID NOT IN (SELECT CID FROM Enroll);SELECT Supp_Course.CID FROM Supp_Course, DSWHERE Supp_Course.CID = DS.CIDAND min_enroll > DS.cnt;Collect bindingsEvaluate the subquerywith bindingsFinally, refinethe outer query12Heuristics- vs. cost-based optimization Heuristics-based optimization Apply heuristics to rewrite plans into cheaper ones Cost-based optimization Rewrite logical plan to combine “blocks” as much as possible Optimize query block by block• Enumerate logical plans (already covered)• Estimate the cost of plans• Pick a plan with acceptable cost Focus: select-project-join blocks313Cost estimationPROJECT (title)MERGE-JOIN (CID)SCAN (Course)SORT (CID)MERGE-JOIN (SID)SORT (SID)Physical plan example:Input to SORT(CID): We have: cost estimation for each operator Example: SORT(CID) takes 2 × B(input)•But what is B(input)?We need: size of intermediate resultsSCAN (Enroll)SORT (SID)SCAN (Student)FILTER (name = “Bart”)14Selections with equality predicates Q: σA = vR Suppose the following information is available Size of R: |R| Number of distinct A values in R: |πAR| Assumptions Values of A are uniformly distributed in R Values of v in Q are uniformly distributed over all R.Avalues |Q| ≈ |R| ⁄ |πAR| Selectivity factor of (A = v) is 1 ⁄ |πAR|15Conjunctive predicates Q: σA = u and B = vR Additional assumptions (A = u) and (B = v) are independent• Counterexample: major and advisor No “over”-selection• Counterexample: A is the key|Q| ≈ |R| ⁄ (|πAR| · |πBR|) Reduce total size by all selectivity factors16Negated and disjunctive predicates Q: σA ≠ vR |Q| ≈ |R| · (1 – 1 ⁄ |πAR|)• Selectivity factor of ¬ p is (1 – selectivity factor of p)Q: σA = u or B = vR |Q| ≈ |R| · (1 ⁄ |πAR| + 1 ⁄ |πBR|)?• No! Tuples satisfying (A = u) and (B = v) are counted twice |Q| ≈ |R| · (1 –(1 –1 ⁄ |πAR|) · (1 – 1 ⁄ |πBR|))• Intuition: (A = u) or (B = v) is equivalent to¬ ( ¬ (A = u) AND ¬ (B = v))17Range predicates Q: σA > vR Not enough information! Just pick, say, |Q| ≈ |R| · 1 ⁄ 3 With more information Largest R.A value: high(R.A) Smallest R.A value: low(R.A) |Q| ≈ |R| · (high(R.A) – v) ⁄ (high(R.A) – low(R.A)) In practice: sometimes the second highest and lowest are used instead• The highest and the lowest are
View Full Document