DOC PREVIEW
UW CSE 444 - Query Optimization

This preview shows page 1-2-23-24 out of 24 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 24 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Lecture 21: Query Optimization (3) Friday, May 21, 2010 Dan Suciu -- 444 Spring 2010 1Announcement • No Homework 4 • BUT PLEASE: – Study the remaining material – Do good job on Project 4 Dan Suciu -- 444 Spring 2010 2Dan Suciu -- 444 Spring 2010 3 Outline • Search space • Algorithms for enumerating query plans • Estimating the cost of a query planDan Suciu -- 444 Spring 2010 4 Computing the Cost of a Plan • Collect statistical summaries of stored data • Estimate size in a bottom-up fashion • Estimate cost by using the estimated sizeDan Suciu -- 444 Spring 2010 5 Statistics on Base Data • Collected information for each relation – Number of tuples (cardinality) – Indexes, number of keys in the index – Number of physical pages, clustering info – Statistical information on attributes • Min value, max value, number distinct values • Histograms – Correlations between columns (hard) • Collection approach: periodic, using samplingSize Estimation Problem Dan Suciu -- 444 Spring 2010 6 S = SELECT list! FROM R1, …, Rn ! WHERE cond1 AND cond2 AND . . . AND condk"Given T(R1), T(R2), …, T(Rn) Estimate T(S) How can we do this ? Note: doesn’t have to be exact.Size Estimation Problem Dan Suciu -- 444 Spring 2010 7 S = SELECT list! FROM R1, …, Rn ! WHERE cond1 AND cond2 AND . . . AND condk"Remark: T(S) ≤ T(R1) × T(R2) × … × T(Rn)Selectivity Factor • Each condition cond reduces the size by some factor called selectivity factor • Assuming independence, multiply the selectivity factors Dan Suciu -- 444 Spring 2010 8Example Dan Suciu -- 444 Spring 2010 9 SELECT * FROM R, S, T WHERE R.B=S.B and S.C=T.C and R.A<40 R(A,B) S(B,C) T(C,D) T(R) = 30k, T(S) = 200k, T(T) = 10k Selectivity of R.B = S.B is 1/3 Selectivity of S.C = T.C is 1/10 Selectivity of R.A < 40 is ½ What is the estimated size of the query output ?Rule of Thumb • If selectivities are unknown, then: selectivity factor = 1/10 [System R, 1979] Dan Suciu -- 444 Spring 2010 1011"Selectivities from Statistics • Condition is A = c /* value selection on R */ – Selectivity = 1/V(R,A) • Condition is A < c /* range selection on R */ – Selectivity = (c - Low(R, A))/(High(R,A) - Low(R,A))T(R) • Condition is A = B /* R ⨝A=B S */ – Selectivity = 1 / max(V(R,A),V(S,A)) – (will explain next) Dan Suciu -- 444 Spring 201012"Selectivity of R ⨝A=B S Assumptions: • Containment of values: if V(R,A) <= V(S,B), then the set of A values of R is included in the set of B values of S – Note: this indeed holds when A is a foreign key in R, and B is a key in S • Preservation of values: for any other attribute B, V(R ⨝A S, B) = V(R, B) (or V(S, B)) Dan Suciu -- 444 Spring 201013"Selectivity of R ⨝A=B S Assume V(R,A) <= V(S,B) • Each tuple t in R joins with T(S)/V(S,B) tuple(s) in S • Hence T(R ⨝A=B S) = T(R) T(S) / V(S,B) In general: T(R ⨝A=B S) = T(R) T(S) / max(V(R,A),V(S,B)) Dan Suciu -- 444 Spring 201014"Size Estimation for Join Example: • T(R) = 10000, T(S) = 20000 • V(R,A) = 100, V(S,B) = 200 • How large is R ⨝A=B S ? Dan Suciu -- 444 Spring 201015 Histograms • Statistics on data maintained by the RDBMS • Makes size estimation much more accurate (hence, cost estimations are more accurate) Dan Suciu -- 444 Spring 2010Histograms Dan Suciu -- 444 Spring 2010 16 Employee(ssn, name, age) T(Employee) = 25000, V(Empolyee, age) = 50 min(age) = 19, max(age) = 68 σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?Histograms Dan Suciu -- 444 Spring 2010 17 Employee(ssn, name, age) T(Employee) = 25000, V(Empolyee, age) = 50 min(age) = 19, max(age) = 68 σage=48(Empolyee) = ? Estimate = 25000 / 50 = 500 Estimate = 25000 * 6 / 60 = 2500 σage>28 and age<35(Empolyee) = ?Histograms Dan Suciu -- 444 Spring 2010 18 Age:" 0..20" 20..29" 30-39" 40-49" 50-59" > 60"Tuples" 200" 800" 5000" 12000" 6500" 500"Employee(ssn, name, age) T(Employee) = 25000, V(Empolyee, age) = 50 min(age) = 19, max(age) = 68 σage=48(Empolyee) = ? σage>28 and age<35(Empolyee) = ?Histograms 19 Employee(ssn, name, age) T(Employee) = 25000, V(Empolyee, age) = 50 min(age) = 19, max(age) = 68 σage=48(Empolyee) = ? Estimate = 1200 Estimate = 2*80 + 5*500 = 2660 σage>28 and age<35(Empolyee) = ? Age:" 0..20" 20..29" 30-39" 40-49" 50-59" > 60"Tuples" 200" 800" 5000" 12000" 6500" 500"Types of Histograms • How should we determine the bucket boundaries in a histogram ? Dan Suciu -- 444 Spring 2010 20Types of Histograms • How should we determine the bucket boundaries in a histogram ? • Eq-Width • Eq-Depth • Compressed Dan Suciu -- 444 Spring 2010 21Histograms Dan Suciu -- 444 Spring 2010 22 Age:" 0..20" 20..29" 30-39" 40-49" 50-59" > 60"Tuples" 200" 800" 5000" 12000" 6500" 500"Employee(ssn, name, age) Age:" 0..20" 20..29" 30-39" 40-49" 50-59" > 60"Tuples" 1800" 2000" 2100" 2200" 1900" 1800"Eq-width: Eq-depth: Compressed: store separately some highly frequent values: (48,1900)Difficult Questions on Histograms • Small number of buckets – Hundreds, or thousands, but not more – WHY ? • Not updated during database update, but recomputed periodically – WHY ? • Multidimensional histograms rarely used – WHY ? Dan Suciu -- 444 Spring 2010 23Summary of Query Optimization • Three parts: – search space, algorithms, size/cost estimation • Ideal goal: find optimal plan. But – Impossible to estimate accurately – Impossible to search the entire space • Goal of today’s optimizers: – Avoid very bad plans Dan Suciu -- 444 Spring 2010


View Full Document

UW CSE 444 - Query Optimization

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Query Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Query Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Query Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?