DOC PREVIEW
UW CSE 444 - Overview of Query Optimization

This preview shows page 1-2-3-27-28-29 out of 29 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 29 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Introduction to Database Systems CSE 444 Lecture 20 Overview of Query Optimization CSE 444 Summer 2010 1 Where We Are Back to how a DBMS executes a query What we learned so far How data is stored and indexed lectures 15 and 16 Logical query plans relational algebra lecture 17 Steps involved in processing a query lecture 18 Operator algorithms lecture 19 Today How to select logical physical query plans Chapter Ch t 16 iin th the b book k recommended d d nott required i d CSE 444 Summer 2010 2 Query Optimization Goal For a query There exists many logical and physical query plans Query optimizer needs to pick a good one CSE 444 Summer 2010 3 Query Optimization Algorithm Enumerate alternative plans Compute estimated cost of each plan Compute number of I Os Compute CPU cost Choose plan with lowest cost This is called cost based optimization CSE 444 Summer 2010 4 Outline Search space Algorithm for enumerating query plans Estimating the cost of a query plan CSE 444 Summer 2010 5 Relational Algebra Equivalences Selections Commutative c1 c2 R same as c2 c1 R Cascading c1 c2 R same as c2 c1 R Projections Cascading Joins Commutative R S same as S R Associative R S T same as R S T CSE 444 Summer 2010 6 Left Deep Left Deep Plans and Bushy Plans R2 R4 R3 R1 Left deep Left deep plan R3 R1 R2 R4 Bushy plan CSE 444 Summer 2010 7 Relational Algebra Equivalences Selects projects and joins We can commute and combine all three types of operators We just have to be careful that the fields we need are available when we apply pp y the operator p Relatively straightforward See book 16 2 CSE 444 Summer 2010 8 Search Space Challenges Search space is huge Many possible equivalent trees Many implementations for each operator Many access paths for each relation File Fil scan or index i d matching t hi selection l ti condition diti Cannot consider ALL plans Want search space that includes low cost plans CSE 444 Summer 2010 9 Outline Search space Algorithms for enumerating query plans Estimating the cost of a query plan CSE 444 Summer 2010 10 Key Decisions When selecting a plan some of the most important decisions include Logical plan Can we push selections down Can we push projections or aggregations down What order to use for joins Physical plan What join algorithms to use What access paths to use file scan or index CSE 444 Summer 2010 11 Plan Enumeration Algorithms Rule based vs cost based algorithms g p plans Logical Heuristic based algorithms Use size of intermediate results as cost measure Physical plans Top down algorithms or Bottom up dynamic programming approaches Also called Selinger style optimizers Use heuristics to limit search space CSE 444 Summer 2010 12 Outline Search space Algorithms for enumerating query plans Estimating the cost of a query plan CSE 444 Summer 2010 13 Computing the Cost of a Plan Collect statistical summaries of stored data Compute cost in a bottom up fashion For each operator compute Estimate cost of executing g the operation p Estimate statistical summary of the output data CSE 444 Summer 2010 14 Statistics on Base Data Collected information for each relation Number of tuples cardinality Indexes number of keys in the index Number of physical pages clustering info Statistical information on attributes Min value max value number distinct values Histograms g Correlations between columns hard Collection approach pp p periodic using g sampling p g CSE 444 Summer 2010 15 Retrieving data from Storage Access path a way to retrieve tuples from a table A file scan An index plus a matching selection condition Index matches selection condition if it can be used to retrieve just tuples that satisfy the condition Example Supplier sid sname scity sstate B tree index on scity sstate matches scity scity Seattle Seattle does not match sid 3 does not match sstate WA CSE 444 Summer 2010 16 Access Path Selection Supplier sid sname scity sstate Selection condition sid 300 scity Seattle Indexes I d B t B tree on sid id and d B B tree t on scity it Which access path should we use We should p pick the most selective access p path CSE 444 Summer 2010 17 Access Path Selectivity Access path selectivity is the number of pages retrieved if we use this access path Most selective retrieves fewest pages As we saw earlier for equality predicates Selection on equality a v R V R a of distinct values of attribute a 1 V R a is thus the reduction factor Cl stered inde Clustered index on a a cost B R V R B R V R a a Unclustered index on a cost T R V R a we are ignoring I O cost of index pages for simplicity CSE 444 Summer 2010 18 Selectivity for Range Predicates Selection on range a v R How to compute the selectivity Assume values are uniformly distributed R d ti ffactor Reduction t X X Max R a v Max R a Min R a Clustered index on a cost is B R X Unclustered index on a cost is T R X CSE 444 Summer 2010 19 Back to Our Example Selection condition sid 300 scity Seattle Index I1 B tree on sid clustered Index I2 B B tree tree on scity unclustered Let s assume V Supplier scity 20 Max Supplier sid 1000 Min Supplier sid 1 B Supplier 100 100 T Supplier 1000 Cost I1 B R Max v Max Min 100 700 999 70 Cost I2 T R 1 V Supplier scity 1 V Supplier scity 1000 20 50 CSE 444 Summer 2010 20 Selectivity with Multiple Conditions What if we have an index on multiple attributes Example selection a v1 b v2 R and index on a b How to compute the selectivity Assume A attributes tt ib t are independent i d d t X 1 V R a V R b Clustered index on a b cost B R X cost T R X Unclustered index on a b CSE 444 Summer 2010 21 Computing Cost of an Operator The cost of executing an operator depends On the operator implementation On the input data We learned how to compute this in the previous lecture so we do not repeat it here CSE 444 Summer 2010 22 Statistics on the Output Data Most important piece of information Size of operator result I e I e the number of output tuples Projection output size same as input size Selection multiply input size by reduction factor Si Similar il to t what h t we did for f estimating ti ti access path th selectivity l ti it Assume independence between conditions in the predicate use product of the reduction factors for the terms CSE 444 Summer 2010 23 Estimating Result Sizes For joins R S Take product of cardinalities of relations R and S Apply reduction factors for each term in join condition Terms are of the form column1 column2 Reduction 1 MAX V R column1 V S …


View Full Document

UW CSE 444 - Overview of Query Optimization

Documents in this Course
XML

XML

48 pages

SQL

SQL

25 pages

SQL

SQL

42 pages

Recovery

Recovery

30 pages

SQL

SQL

36 pages

Indexes

Indexes

35 pages

Security

Security

36 pages

Wrap-up

Wrap-up

6 pages

SQL

SQL

37 pages

More SQL

More SQL

48 pages

SQL

SQL

35 pages

XML

XML

46 pages

Triggers

Triggers

26 pages

Load more
Download Overview of Query Optimization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Overview of Query Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Overview of Query Optimization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?