Duke CPS 116 - Query Optimization - D2366947

Home> Schools> Duke University> (CPS) > CPS 116> Query Optimization

DOC PREVIEW

Duke CPS 116 - Query Optimization

School name Duke University

Course Cps 116- Introduction to Database Systems

Pages 5

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Query OptimizationCPS 116Introduction to Database Systems2Announcements (November 20) Homework #4 (last one and short) assigned today Due next Thursday3Query optimization One logical plan → “best” physical plan Questions How to enumerate possible plans How to estimate costs How to pick the “best” one Often the goal is not getting the optimum plan, but instead avoiding the horrible ones1 second 1 hour1 minuteAny of these will do4Plan enumeration in relational algebra Apply relational algebra equivalences) Join reordering: × and ! are associative and commutative (except column ordering, but that is unimportant)!!R ST!!S RT!!R TS…===5More relational algebra equivalences Convert σp-× to/from !p: σp(R × S) = R !pS Merge/split σ’s: σp1(σp2R) = σp1 ∧ p2R Merge/split π’s: πL1(πL2R) = πL1R, where L1 ⊆ L2 Push down/pull up σ:σp∧pr∧ps(R!p’S)=(σprR)!p∧p’(σpsS),whereσp∧pr∧ps(R!p’S) (σprR) !p∧p’(σpsS), where pr is a predicate involving only R columns ps is a predicate involving only S columns p and p’ are predicates involving both R and S columns Push down π: πL(σpR) = πL(σp(πL L’R)), where L’ is the set of columns referenced by p that are not in L Many more (seemingly trivial) equivalences… Can be systematically used to transform a plan to new ones6Relational query rewrite exampleπtitleσStudent.name=“Bart” ∧ Student.SID = Enroll.SID ∧ Enroll.CID = Course.CID×EnrollCourse×StudentπtitleσEnroll.CID = Course.CIDCtσ×t!×EnrollCourse×StudentσStudent.SID = Enroll.SIDσStudent.name = “Bart”Push down σπtitle!Enroll.CID = Course.CIDEnrollCourseStudent!Student.SID = Enroll.SIDσname = “Bart”Convert σp-×to !p27Heuristics-based query optimization Start with a logical plan Push selections/projections down as much as possible Why? Reduce the size of intermediate results Why not? May be expensive; maybe joins filter better Join smaller relations first, and avoid cross product Why? Reduce the size of intermediate results Why not? Size depends on join selectivity too Convert the transformed logical plan to a physical plan (by choosing appropriate physical operators)8SQL query rewrite More complicated—subqueries and views divide a query into nested “blocks” Processing each block separately forces particular join methods and join order Even if the plan is optimal for each block, it may not be optimal for the entire query Unnest query: convert subqueries/views to joins) We can just deal with select-project-join queries Where the clean rules of relational algebra apply9SQL query rewrite example SELECT nameFROM StudentWHERE SID = ANY (SELECT SID FROM Enroll); SELECT nameFROM Student, EnrollWHERE St d t SID E ll SIDWHERE Student.SID = Enroll.SID; Wrong—consider two Bart’s, each taking two classes SELECT nameFROM (SELECT DISTINCT Student.SID, nameFROM Student, EnrollWHERE Student.SID = Enroll.SID); Right—assuming Student.SID is a key10Dealing with correlated subqueries SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); SELECT CIDFROM Course (SELECT CID COUNT(*) AS cntFROM Course, (SELECT CID, COUNT(*) AS cntFROM Enroll GROUP BY CID) tWHERE t.CID = Course.CID AND min_enroll > t.cntAND title LIKE ’CPS%’; New subquery is inefficient (computes enrollment for all courses) Suppose a CPS class is empty?11“Magic” decorrelation SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); CREATE VIEW Supp_Course ASSELECT * FROM Course WHERE title LIKE ’CPS%’;CREATE VIEW Magic ASProcess the outer querywithout the subqueryCll bidigSELECT DISTINCT CID FROM Supp_Course;CREATE VIEW DS AS(SELECT Enroll.CID, COUNT(*) AS cntFROM Magic, Enroll WHERE Magic.CID = Enroll.CIDGROUP BY Enroll.CID) UNION(SELECT Magic.CID, 0 AS cnt FROM MagicWHERE Magic.CID NOT IN (SELECT CID FROM Enroll);SELECT Supp_Course.CID FROM Supp_Course, DSWHERE Supp_Course.CID = DS.CIDAND min_enroll > DS.cnt;Collect bindingsEvaluate the subquerywith bindingsFinally, refinethe outer query12Heuristics- vs. cost-based optimization Heuristics-based optimization Apply heuristics to rewrite plans into cheaper ones Cost-based optimization Rewrite logical plan to combine “blocks” as much as possible Optimize query block by block• Enumerate logical plans (already covered)• Estimate the cost of plans• Pick a plan with acceptable cost Focus: select-project-join blocks313Cost estimationPROJECT (title)MERGE-JOIN (CID)SCAN (Course)SORT (CID)MERGE-JOIN (SID)SORT (SID)Physical plan example:Input to SORT(CID): We have: cost estimation for each operator Example: SORT(CID) takes 2 × B(input)•But what is B(input)?We need: size of intermediate resultsSCAN (Enroll)SORT (SID)SCAN (Student)FILTER (name = “Bart”)14Selections with equality predicates Q: σA = vR Suppose the following information is available Size of R: |R| Number of distinct A values in R: |πAR| Assumptions Values of A are uniformly distributed in R Values of v in Q are uniformly distributed over all R.Avalues |Q| ≈ |R| ⁄ |πAR| Selectivity factor of (A = v) is 1 ⁄ |πAR|15Conjunctive predicates Q: σA = u and B = vR Additional assumptions (A = u) and (B = v) are independent• Counterexample: major and advisor No “over”-selection• Counterexample: A is the key|Q| ≈ |R| ⁄ (|πAR| · |πBR|) Reduce total size by all selectivity factors16Negated and disjunctive predicates Q: σA ≠ vR |Q| ≈ |R| · (1 – 1 ⁄ |πAR|)• Selectivity factor of ¬ p is (1 – selectivity factor of p)Q: σA = u or B = vR |Q| ≈ |R| · (1 ⁄ |πAR| + 1 ⁄ |πBR|)?• No! Tuples satisfying (A = u) and (B = v) are counted twice |Q| ≈ |R| · (1 –(1 –1 ⁄ |πAR|) · (1 – 1 ⁄ |πBR|))• Intuition: (A = u) or (B = v) is equivalent to¬ ( ¬ (A = u) AND ¬ (B = v))17Range predicates Q: σA > vR Not enough information! Just pick, say, |Q| ≈ |R| · 1 ⁄ 3 With more information Largest R.A value: high(R.A) Smallest R.A value: low(R.A) |Q| ≈ |R| · (high(R.A) – v) ⁄ (high(R.A) – low(R.A)) In practice: sometimes the second highest and lowest are used instead• The highest and the lowest are

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 5 pages.

Duke CPS 116 - Query Optimization

Sign up for free to view:

Please select your school