DOC PREVIEW
Duke CPS 116 - Query Optimization

This preview shows page 1-2 out of 5 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Query OptimizationCPS 116Introduction to Database Systems2Announcements (November 20) Homework #4 (last one and short) assigned today Due next Thursday3Query optimization One logical plan → “best” physical plan Questions How to enumerate possible plans How to estimate costs How to pick the “best” one Often the goal is not getting the optimum plan, but instead avoiding the horrible ones1 second 1 hour1 minuteAny of these will do4Plan enumeration in relational algebra Apply relational algebra equivalences) Join reordering: × and ! are associative and commutative (except column ordering, but that is unimportant)!!R ST!!S RT!!R TS…===5More relational algebra equivalences Convert σp-× to/from !p: σp(R × S) = R !pS Merge/split σ’s: σp1(σp2R) = σp1 ∧ p2R Merge/split π’s: πL1(πL2R) = πL1R, where L1 ⊆ L2 Push down/pull up σ:σp∧pr∧ps(R!p’S)=(σprR)!p∧p’(σpsS),whereσp∧pr∧ps(R!p’S) (σprR) !p∧p’(σpsS), where pr is a predicate involving only R columns ps is a predicate involving only S columns p and p’ are predicates involving both R and S columns Push down π: πL(σpR) = πL(σp(πL L’R)), where L’ is the set of columns referenced by p that are not in L Many more (seemingly trivial) equivalences… Can be systematically used to transform a plan to new ones6Relational query rewrite exampleπtitleσStudent.name=“Bart” ∧ Student.SID = Enroll.SID ∧ Enroll.CID = Course.CID×EnrollCourse×StudentπtitleσEnroll.CID = Course.CIDCtσ×t!×EnrollCourse×StudentσStudent.SID = Enroll.SIDσStudent.name = “Bart”Push down σπtitle!Enroll.CID = Course.CIDEnrollCourseStudent!Student.SID = Enroll.SIDσname = “Bart”Convert σp-×to !p27Heuristics-based query optimization Start with a logical plan Push selections/projections down as much as possible Why? Reduce the size of intermediate results Why not? May be expensive; maybe joins filter better Join smaller relations first, and avoid cross product Why? Reduce the size of intermediate results Why not? Size depends on join selectivity too Convert the transformed logical plan to a physical plan (by choosing appropriate physical operators)8SQL query rewrite More complicated—subqueries and views divide a query into nested “blocks” Processing each block separately forces particular join methods and join order Even if the plan is optimal for each block, it may not be optimal for the entire query Unnest query: convert subqueries/views to joins) We can just deal with select-project-join queries Where the clean rules of relational algebra apply9SQL query rewrite example SELECT nameFROM StudentWHERE SID = ANY (SELECT SID FROM Enroll); SELECT nameFROM Student, EnrollWHERE St d t SID E ll SIDWHERE Student.SID = Enroll.SID; Wrong—consider two Bart’s, each taking two classes SELECT nameFROM (SELECT DISTINCT Student.SID, nameFROM Student, EnrollWHERE Student.SID = Enroll.SID); Right—assuming Student.SID is a key10Dealing with correlated subqueries SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); SELECT CIDFROM Course (SELECT CID COUNT(*) AS cntFROM Course, (SELECT CID, COUNT(*) AS cntFROM Enroll GROUP BY CID) tWHERE t.CID = Course.CID AND min_enroll > t.cntAND title LIKE ’CPS%’; New subquery is inefficient (computes enrollment for all courses) Suppose a CPS class is empty?11“Magic” decorrelation SELECT CID FROM CourseWHERE title LIKE ’CPS%’AND min_enroll > (SELECT COUNT(*) FROM EnrollWHERE Enroll.CID = Course.CID); CREATE VIEW Supp_Course ASSELECT * FROM Course WHERE title LIKE ’CPS%’;CREATE VIEW Magic ASProcess the outer querywithout the subqueryCll bidigSELECT DISTINCT CID FROM Supp_Course;CREATE VIEW DS AS(SELECT Enroll.CID, COUNT(*) AS cntFROM Magic, Enroll WHERE Magic.CID = Enroll.CIDGROUP BY Enroll.CID) UNION(SELECT Magic.CID, 0 AS cnt FROM MagicWHERE Magic.CID NOT IN (SELECT CID FROM Enroll);SELECT Supp_Course.CID FROM Supp_Course, DSWHERE Supp_Course.CID = DS.CIDAND min_enroll > DS.cnt;Collect bindingsEvaluate the subquerywith bindingsFinally, refinethe outer query12Heuristics- vs. cost-based optimization Heuristics-based optimization Apply heuristics to rewrite plans into cheaper ones Cost-based optimization Rewrite logical plan to combine “blocks” as much as possible Optimize query block by block• Enumerate logical plans (already covered)• Estimate the cost of plans• Pick a plan with acceptable cost Focus: select-project-join blocks313Cost estimationPROJECT (title)MERGE-JOIN (CID)SCAN (Course)SORT (CID)MERGE-JOIN (SID)SORT (SID)Physical plan example:Input to SORT(CID): We have: cost estimation for each operator Example: SORT(CID) takes 2 × B(input)•But what is B(input)?We need: size of intermediate resultsSCAN (Enroll)SORT (SID)SCAN (Student)FILTER (name = “Bart”)14Selections with equality predicates Q: σA = vR Suppose the following information is available Size of R: |R| Number of distinct A values in R: |πAR| Assumptions Values of A are uniformly distributed in R Values of v in Q are uniformly distributed over all R.Avalues |Q| ≈ |R| ⁄ |πAR| Selectivity factor of (A = v) is 1 ⁄ |πAR|15Conjunctive predicates Q: σA = u and B = vR Additional assumptions (A = u) and (B = v) are independent• Counterexample: major and advisor No “over”-selection• Counterexample: A is the key|Q| ≈ |R| ⁄ (|πAR| · |πBR|) Reduce total size by all selectivity factors16Negated and disjunctive predicates Q: σA ≠ vR |Q| ≈ |R| · (1 – 1 ⁄ |πAR|)• Selectivity factor of ¬ p is (1 – selectivity factor of p)Q: σA = u or B = vR |Q| ≈ |R| · (1 ⁄ |πAR| + 1 ⁄ |πBR|)?• No! Tuples satisfying (A = u) and (B = v) are counted twice |Q| ≈ |R| · (1 –(1 –1 ⁄ |πAR|) · (1 – 1 ⁄ |πBR|))• Intuition: (A = u) or (B = v) is equivalent to¬ ( ¬ (A = u) AND ¬ (B = v))17Range predicates Q: σA > vR Not enough information! Just pick, say, |Q| ≈ |R| · 1 ⁄ 3 With more information Largest R.A value: high(R.A) Smallest R.A value: low(R.A) |Q| ≈ |R| · (high(R.A) – v) ⁄ (high(R.A) – low(R.A)) In practice: sometimes the second highest and lowest are used instead• The highest and the lowest are


View Full Document

Duke CPS 116 - Query Optimization

Documents in this Course
Part I

Part I

8 pages

XSLT

XSLT

4 pages

XSLT

XSLT

8 pages

Part I

Part I

8 pages

XSLT

XSLT

8 pages

Load more
Download Query Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?