Lecture 21: Query Optimization (1) November 17, 2010 Bill Howe -- 444 Fall 2010 1Administrivia (Preview for Friday) • For project 4, students are expected (but not required) to work in pairs. • Ideally you should pair up by end of day Monday. • That way, Michael can give each group their shared Amazon AWS grant code by Tuesday. • Once you run out of money on your AWS grant, your *personal* credit cards will be charged! • Because students are asked to do interactive rather than batch jobs on AWS, they should remember to explicitly kill every job. 2 Bill Howe -- 444 Fall 2010Where We Are • We are learning how a DBMS executes a query • What we learned so far – How data is stored and indexed – Logical query plans and physical operators • This week: – How to select logical & physical query plans 3 Bill Howe -- 444 Fall 20104 Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ Give a relational algebra expression for this query Review Bill Howe -- 444 Fall 20105 sname(scity=‘Seattle’ sstate=‘WA’ pno=2 (Supplier sid = sid Supply)) Relational Algebra Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Bill Howe -- 444 Fall 20106 Supplier Supply sid = sid scity=‘Seattle’ sstate=‘WA’ pno=2 sname Relational Algebra Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity)Key Idea: Algebraic Optimization N = ((z*2)+((z*3)+y))/x Given x = 1, y = 0, and z = 4, solve for N What order did you perform the operations? Bill Howe -- 444 Fall 2010Key Idea: Algebraic Optimization N = ((z*2)+((z*3)+0))/1 Given x = 1, y = 0, and z = 4, solve for N again, but now assume: * costs 10 units + costs 2 units / costs 50 units Which execution plan offers the lowest cost? Bill Howe -- 444 Fall 2010Key Idea: Algebraic Optimization N = ((z*2)+((z*3)+0))/1 Algebraic Laws: 1. (+) identity: x+0 = x 2. (/) identity: x/1 = x 3. (*) distributes: (n*x+n*y) = n*(x+y) 4. (*) commutes: x*y = y*x Apply rules 1, 3, 4, 2: N = (2+3)*z two operations instead of five, no division operator Bill Howe -- 444 Fall 201010 Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA ’ Give a different relational algebra expression for this query sname(scity=‘Seattle’ sstate=‘WA’ pno=2 (Supplier sid = sid Supply)) Bill Howe -- 444 Fall 201011 Query Optimization Goal • For a query – There exist many logical and physical query plans – Query optimizer needs to pick a good one Bill Howe -- 444 Fall 201012 Example • Some statistics – T(Supplier) = 1000 records – T(Supply) = 10,000 records – B(Supplier) = 100 pages – B(Supply) = 100 pages – V(Supplier,scity) = 20, V(Supplier,state) = 10 – V(Supply,pno) = 2,500 – Both relations are clustered • M = 10 Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ Bill Howe -- 444 Fall 201013 Physical Query Plan 1 Supplier Supply sid = sid scity=‘Seattle’ sstate=‘WA’ pno=2 sname (File scan) (File scan) (Block-nested loop) (On the fly) (On the fly) Selection and project on-the-fly -> No additional cost. Total cost of plan is thus cost of join: = B(Supplier)+B(Supplier)*B(Supply)/M = 100 + 10 * 100 = 1,100 I/Os B(Supplier) = 100 B(Supply) = 100 T(Supplier) = 1000 T(Supply) = 10,000 V(Supplier,scity) = 20 V(Supplier,state) = 10 V(Supply,pno) = 2,500 M = 10 Bill Howe -- 444 Fall 201014 Supplier Supply sid = sid ascity=‘Seattle’ sstate=‘WA’ sname (File scan) (File scan) (Sort-merge join) (Scan write to T2) (On the fly) b pno=2 (Scan write to T1) Physical Query Plan 2 Total cost = 100 + 100 * 1/20 * 1/10 (a) + 100 + 100 * 1/2500 (b) + 2 (c) + 0 (d) Total cost 204 I/Os c d B(Supplier) = 100 B(Supply) = 100 T(Supplier) = 1000 T(Supply) = 10,000 V(Supplier,scity) = 20 V(Supplier,state) = 10 V(Supply,pno) = 2,500 M = 10 Bill Howe -- 444 Fall 2010Supply Supplier sid = sid scity=‘Seattle’ sstate=‘WA’ sname (Index nested loop) (Index lookup on sid) Doesn’t matter if clustered or not (On the fly) a pno=2 (Index lookup on pno ) Assume: clustered Physical Query Plan 3 Total cost = 1 (a) + 4 (b) + 0 (c) + 0 (d) Total cost 5 I/Os (Use index) b c d (On the fly) 4 tuples 15 B(Supplier) = 100 B(Supply) = 100 T(Supplier) = 1000 T(Supply) = 10,000 V(Supplier,scity) = 20 V(Supplier,state) = 10 V(Supply,pno) = 2,500 M = 1016 Simplifications • In the previous examples, we assumed that all index pages were in memory • When this is not the case, we need to add the cost of fetching index pages from disk Bill Howe -- 444 Fall 201017 Query Optimization Goal • For a query – There exist many logical and physical query plans – Query optimizer needs to pick a good one Bill Howe -- 444 Fall 2010 How do we choose a good one?18 Query Optimization Algorithm • Enumerate alternative plans • Compute estimated cost of each plan – Compute number of I/Os – Compute CPU cost • Choose plan with lowest cost – This is called cost-based optimization Bill Howe -- 444 Fall 201019 Lessons • Need to consider several physical plan – even for one, simple logical plan • No magic “best” plan: depends on the data • In order to make the right choice – need to have statistics over the data – the B’s, the T’s, the V’s Bill Howe -- 444 Fall 201020 Outline • Search space (Today) • Algorithm for enumerating query plans • Estimating the cost of a query plan Bill Howe -- 444 Fall 201021 Relational Algebra Equivalences • Selections – Commutative: c1(c2(R)) same as c2(c1(R)) – Cascading: c1c2(R) same as c2(c1(R)) • Projections • Joins – Commutative : R ⋈ S same as S ⋈ R – Associative: R ⋈ (S ⋈ T) same as (R ⋈ S) ⋈ T Bill Howe -- 444 Fall 2010Left-Deep Plans and Bushy Plans 22 R3 R1 R2 R4 R3 R1 R4 R2 Left-deep plan Bushy plan Bill Howe -- 444 Fall 2010Commutativity, Associativity, Distributivity 23
View Full Document