CORNELL CS 432 - Query Optimization - D2093609


Course Cs 432- Intro to Database Systems


Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke

Query Optimization

Schema for Examples

Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)

Similar to old schema; rname added for variations.
Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Motivating Example

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

RA Tree: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$
Plan: the join uses Page Nested Loops; the selection and the projection are both applied on-the-fly.

Cost: 500+500*1000 I/Os.
By no means the worst plan!
But can do better (how?)

Alternative Plans 1 (No Indexes)

Plan: scan Reserves, apply bid=100, and write to temp T1; scan Sailors, apply rating > 5, and write to temp T2; Sort-Merge Join T1 and T2 on sid=sid; project sname on-the-fly.

Main difference: push selects.
With 5 buffers, cost of plan:
- Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
- Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
- Sort T1 (2*2*10), sort T2 (2*3*250), merge (10+250).
- Total: 3560 page I/Os.
If we used BNL join, join cost = 10+4*250, total cost = 2770.
If we 'push' projections, T1 has only sid, T2 only sid and sname: T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.
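The cost figures for the motivating plan and for Alternative Plans 1 can be re-derived with a few lines of arithmetic. The sketch below is only a check of the slide's numbers, not an optimizer: the function names are made up for illustration, the sort pass counts (2 for T1, 3 for T2) are taken from the slide rather than computed, and the block nested loops formula assumes that B-2 of the B buffer pages hold the outer block, which reproduces the slide's factor of 4.

```python
import math

# Catalog figures from the schema slide.
RESERVES_PAGES = 1000
SAILORS_PAGES = 500
BUFFERS = 5

# Temp sizes after pushing selections (slide assumptions:
# 100 boats and 10 ratings, both uniformly distributed).
T1_PAGES = 10    # Reserves tuples with bid=100
T2_PAGES = 250   # Sailors tuples with rating>5


def page_nested_loops(outer_pages, inner_pages):
    """Read the outer once, and scan the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


def external_sort(pages, passes):
    """Each pass reads and writes every page once."""
    return 2 * passes * pages


def block_nested_loops(outer_pages, inner_pages, buffers):
    """Assumption: buffers-2 pages hold the outer block."""
    block = buffers - 2
    return outer_pages + math.ceil(outer_pages / block) * inner_pages


# Motivating plan, using the slide's expression 500 + 500*1000.
motivating = page_nested_loops(SAILORS_PAGES, RESERVES_PAGES)

# Scan both relations and write the filtered temps.
materialize = (RESERVES_PAGES + T1_PAGES) + (SAILORS_PAGES + T2_PAGES)

# Sort-merge join: pass counts 2 and 3 are copied from the slide.
sort_merge = (external_sort(T1_PAGES, 2)
              + external_sort(T2_PAGES, 3)
              + (T1_PAGES + T2_PAGES))
smj_total = materialize + sort_merge

# Same plan with a block nested loops join instead.
bnl_join = block_nested_loops(T1_PAGES, T2_PAGES, BUFFERS)
bnl_total = materialize + bnl_join

print(motivating, smj_total, bnl_join, bnl_total)
```

Keeping the pass counts as explicit arguments makes it easy to see how sensitive the sort-merge total is to the number of sort passes, while the scan-and-materialize cost of 1760 I/Os is shared by both join choices.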
Alternative Plans 2 (With Indexes)

Plan: select bid=100 on Reserves (use hash index; do not write result to temp); join with Sailors on sid=sid using Index Nested Loops, with pipelining; apply rating > 5 on-the-fly; project sname on-the-fly.

With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
INL with pipelining (outer is not materialized).
Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
Join column sid is a key for Sailors.
- At most one matching tuple, unclustered index on sid OK.
- Projecting out unnecessary fields from outer doesn't help.

Overview of Query Optimization

Plan: Tree of R.A. ops, with choice of alg for each op.
- Each operator typically implemented using a 'pull' interface: when an operator is 'pulled' for the next output tuples, it 'pulls' on its inputs and computes them.
Two main issues:
- For a given query, what plans are considered?
  • Algorithm to search plan space for cheapest (estimated) plan.
- How is the cost of a plan estimated?
Ideally: Want to find best plan. Practically: Avoid worst plans!
We will study the System R approach.

Outline

Relational algebra equivalences
Statistics and size estimation
Plan enumeration and cost estimation
Nested queries

Relational Algebra Equivalences

Allow us to choose different join orders and to 'push' selections and projections ahead of joins.

Selections:
$\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R)))$ (Cascade)
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commute)

Projections:
$\pi_{a_1}(R) \equiv \pi_{a_1}(\dots(\pi_{a_n}(R)))$ (Cascade)

Joins:
$R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative)
$(R \bowtie S) \equiv (S \bowtie R)$ (Commute)

Show that: $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$
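The equivalences above can be sanity-checked on a toy instance. The sketch below is an illustration under assumptions, not part of the slides: relations are modelled as Python sets of rows, the helper names (rel, select, project, join) are invented here, the sample rows are made up, and Boats(bid, color) is a hypothetical third relation added only so that associativity has three operands. Passing on one instance does not prove an equivalence; it only shows what each rule claims.

```python
def rel(rows):
    """A relation as a set of rows; each row is a frozenset of (attr, value)."""
    return {frozenset(r.items()) for r in rows}


def select(pred, r):
    """Keep the rows for which pred(row_as_dict) is true."""
    return {row for row in r if pred(dict(row))}


def project(attrs, r):
    """Keep only the named attributes; duplicates collapse (set semantics)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}


def join(r, s):
    """Natural join: combine rows that agree on every shared attribute."""
    out = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out


# Made-up sample data (assumption, for illustration only).
sailors = rel([{"sid": 22, "sname": "dustin", "rating": 7},
               {"sid": 31, "sname": "lubber", "rating": 8},
               {"sid": 58, "sname": "rusty", "rating": 3}])
reserves = rel([{"sid": 22, "bid": 100},
                {"sid": 58, "bid": 100},
                {"sid": 31, "bid": 101}])
boats = rel([{"bid": 100, "color": "red"},
             {"bid": 101, "color": "green"}])  # hypothetical relation


def c1(t):
    return t["bid"] == 100


def c2(t):
    return t["rating"] > 5


rs = join(reserves, sailors)

checks = {
    "select cascade": select(lambda t: c1(t) and c2(t), rs)
                      == select(c1, select(c2, rs)),
    "select commute": select(c1, select(c2, rs)) == select(c2, select(c1, rs)),
    "project cascade": project({"sname"}, rs)
                       == project({"sname"}, project({"sname", "sid"}, rs)),
    "join commute": join(reserves, sailors) == join(sailors, reserves),
    "join associative": join(reserves, join(sailors, boats))
                        == join(join(reserves, sailors), boats),
    "exercise": join(reserves, join(sailors, boats))
                == join(join(boats, reserves), sailors),
    "push select": select(c1, rs) == join(select(c1, reserves), sailors),
}

answer = project({"sname"}, select(lambda t: c1(t) and c2(t), rs))
print(checks, answer)
```

Representing each row as a frozenset of attribute/value pairs makes the comparison independent of attribute order, which matters because commuting a join changes the order in which columns are produced but not the relation itself.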
More Equivalences

A projection commutes with a selection that only uses attributes retained by the projection.
Selection between attributes of the two arguments of a cross-product converts cross-product to a join.
A selection on just attributes of R commutes with $R \bowtie S$ (i.e., $\sigma(R \bowtie S) \equiv \sigma(R) \bowtie S$).
Similarly, if a projection follows a join $R \bowtie S$, we can 'push' it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection.

Example Plan

Plan: scan Reserves, apply bid=100, and write to temp T1; scan Sailors, apply rating > 5, and write to temp T2; Sort-Merge Join on sid=sid; project sname on-the-fly.

Statistics and Catalogs

Need information about the relations and indexes involved. Catalogs typically contain at least:
- # tuples (NTuples) and # pages (NPages) for each relation.
- # distinct key values (NKeys) and NPages for each index.
- Index height, low/high key values (Low/High) for each tree index.
Catalogs updated periodically.
- Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
More detailed information (e.g., histograms of the values in some field) is sometimes stored.

Size Estimation and Reduction Factors

Consider a query block: what is the maximum # tuples possible in the result?
Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size.
Result cardinality = Max # tuples * product of all RF's.
- Implicit assumption that terms are independent!
- Term col=value has RF 1/NKeys(I), given index I on col
- Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
- Term col>value has RF (High(I)-value)/(High(I)-Low(I))

Query block:

    SELECT attribute list
    FROM relation list
    WHERE term1 AND ... AND termk

Reduction Factors & Histograms

For better estimation, use a histogram.

Equiwidth histogram:

    Value           0-.99   1-1.99   2-2.99   3-3.99   4-4.99
    No. of Values   2       3        3        1        8        2    1
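The three reduction-factor rules and the result-cardinality formula can be applied to the running query (R.sid=S.sid AND R.bid=100 AND S.rating>5) as a small worked example. This is a sketch under stated assumptions: the function names are invented for illustration, ratings are assumed to range from Low=1 to High=10, and the sid index on Reserves is assumed to have at most 40,000 distinct keys. Only the tuple counts (100 tuples/page * 1000 pages and 80 tuples/page * 500 pages) and the 100 boats come from the slides.

```python
from fractions import Fraction
from math import prod


def rf_equals_value(nkeys):
    """Term col=value: RF = 1/NKeys(I)."""
    return Fraction(1, nkeys)


def rf_equals_column(nkeys1, nkeys2):
    """Term col1=col2: RF = 1/MAX(NKeys(I1), NKeys(I2))."""
    return Fraction(1, max(nkeys1, nkeys2))


def rf_greater_than(value, low, high):
    """Term col>value: RF = (High(I)-value)/(High(I)-Low(I))."""
    return Fraction(high - value, high - low)


def result_cardinality(max_tuples, rfs):
    """Max # tuples times the product of all RFs (terms assumed independent)."""
    return max_tuples * prod(rfs, start=Fraction(1))


# Tuple counts derived from the schema slide.
reserves_tuples = 100 * 1000   # 100 tuples/page * 1000 pages
sailors_tuples = 80 * 500      # 80 tuples/page * 500 pages

# Max # tuples for the block is the size of the cross product.
max_tuples = reserves_tuples * sailors_tuples

rfs = [
    rf_equals_value(100),              # bid=100, 100 boats (from the slides)
    rf_equals_column(40_000, 40_000),  # sid=sid; Reserves NKeys is an assumption
    rf_greater_than(5, 1, 10),         # rating>5; Low=1, High=10 are assumptions
]

estimate = result_cardinality(max_tuples, rfs)
print(estimate, float(estimate))
```

Exact fractions are used so the estimate is not blurred by floating-point rounding. Note that the range rule gives 5/9 for rating>5 under these assumed bounds, slightly more than the one-half implied by the 250-page T2 figure, which illustrates how much such estimates depend on the catalog statistics used.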
