Relational algebra and query executionCSE 444, fall 2010 — section 8 worksheetNovember 18, 20101 Relational algebra warm-up1. Given this database schema:• Product (pid, name, price)• Purchase (pid, cid, store)• Customer (cid, name, city)draw the logical query plan for each of the following SQL queries.(a) SELECT DISTINCT x.storeFROM Purchase x, Customer yWHERE x.cid = y.cidand y.city = ’Seattle’1(b) SELECT z.city, sum(x.price)FROM Product x, Purchase y, Customer zWHERE x.pid = y.pid and y.cid = z.cidand y.store = ’Wal-Mart’GROUP BY z.cityHAVING count(*) > 10022. Write a SQL query that is equivalent to the logical plan below:! "#$%&!'()*+!,!-./!01+(2!*3,*!)4!+01)5,6+7*!*8!*3+!689):,6 !;6,7!%+68<=! !!!%>!V,?@!A$,B%B>&! -$%B:B3 & C$:BDB 9&!%::9!V9EF!AG%H-G%-G:HCG:>9G!33. Consider two tables R(A, B, C) and S (D, E, F), and the logical query plan P:P = σA>9γA,sum(F)(R 1C=DS)Indicate which of the following three query plans are equivalent to P.P1= γA,sum(F)(σA>9R)1C=DγD,sum(F)SP2= γA,sum(F)(σA>9R)1C=DγD,E,sum(F)SP3= γA,sum(F)(σA>9R)1C=DγD,E,sum(F)((σA>9R)1C=D(S))Hint: Draw out each plan in tree form before solving.42 Spring 2009 final, problem 3 (parts a-b)Consider four tables R(a, b, c), S(d, e, f ), T(g, h) , U(i, j, k).(a) Consider the following SQL query:SELECT R.b, avg(U.k) as avgFROM R, S, T, UWHERE R.a = S.dAND S.e = T.gAND T.h = U.iAND U.j = 5AND (R.c + S.f) < 10GROUP BY R.bDraw a logical plan for the query. You may choose any plan as long as it is correct (i.e. no need toworry about efficiency).5(b) Consider the following two physical query plans. Give two reasons why plan B may be faster thanplan A. Explain each reason.A B A.b = B.b σ B.c < 2000 π A.d,B.d (Index nested loop) (B+ tree index on B.b) (On the fly) σ a>10 (B+ tree index on A.a) (Use B+ tree index) (On the fly) A B A.b = B.b σ B.c < 2000 π A.d,B.d (Hash join) (File scan) (On the fly) σ a>10 (File scan) Plan A Plan B63 Summer 2009 final, problem 2This problem and the next concern two relations R(a, b, c) and S(x, y, z) that have the following character-istics:B(R) = 600 B(S) = 800T(R) = 3000 T(S) = 4000V(R, a) = 300 V(S, x) = 100V(R, b) = 100 V(S, y) = 400V(R, c) = 50 V(S, z) = 40We also have M = 1000 (number of memory blocks).Relation R has a clustered index on attribute a and an unclustered index on attribute c. Relation S hasa clustered index on attribute x and an unclustered index on attribute z. All indices are B+ trees.Now consider the following logical query plan for a query involving these two relations.!"# $$$ %&'()* +,-,./ 01* 0223 4(-5 6 78 12 !"#$%&'( )* 97-&:() ;,5<= >)('. ?0@ >7&'/.A BC&. ><7D)5E ('F /C5 '5G/ :7':5<' /H7 <5)(/&7'. I?(* D* :A ('F "?G* =* JA /C(/ C(K5 /C5 87))7H&'- :C(<(:/5<&./&:.L M?IA N O22 M?"A N P22 B?IA N 6222 B?"A N $222 Q?I* (A N 622 Q?"* GA N 122 Q?I* DA N 122 Q?"* =A N $22 Q?I* :A N @2 Q?"* JA N $2 R5 ().7 C(K5 S N 1222 ?',ED5< 78 E5E7<= D)7:T.AU I5)(/&7' I C(. ( :),./5<5F &'F5G 7' (//<&D,/5 ( ('F (' ,':),./5<5F &'F5G 7' (//<&D,/5 :U I5)(/&7' " C(. ( :),./5<5F &'F5G 7' (//<&D,/5 G ('F (' ,':),./5<5F &'F5G 7' (//<&D,/5 JU +)) &'F&:5. (<5 MV /<55.U W7H :7'.&F5< /C5 87))7H&'- )7-&:() ;,5<= >)(' 87< ( ;,5<= &'K7)K&'- /C5.5 /H7 <5)(/&7'.U ?+'.H5< /C5 ;,5./&7'. (D7,/ /C&. ;,5<= 7' /C5 87))7H&'- >(-5A!"#$%&'(J!"#$%()*+,"-.o&'(V!"#%/%01%&'2%,"3%/%04վ%!"5%/%,"6 ! ,(Answer the questions about this query on the following page.)7(a) Write a SQL query that is equivalent to the logical query plan on the previous page.(b) Change or rearrange the original logical query plan to produce one that is equivalent (has the samefinal results), but which is estimated to be significantly faster, if possible. Recall that logical queryoptimization does not consider the final physical operators used to execute the query, but only thingsat the logical level, such as the sizes of relations and estimated sizes of intermediate results.You should include a brief but specific explanation of how much you expect your changes to improvethe speed of the query and why.Draw your new query plan and write your explanation below. If you can clearly and easily showthe changes on the original diagram, you may do so, otherwise draw a new diagram here.84 Summer 2009 final, problem 3This problem uses the same two relations and statistics as the previous one, but for a different operationnot related to the previous query.As before, the relations are R(a, b, c) and S(x, y, z). Here are the statistics again:B(R) = 600 B(S) = 800T(R) = 3000 T(S) = 4000V(R, a) = 300 V(S, x) = 100V(R, b) = 100 V(S, y) = 400V(R, c) = 50 V(S, z) = 40M = 1000 (number of memory blocks)Also as before, relation R has a clustered index on attribute a and an unclustered index on attribute c.Relation S has a clustered index on attribute x and an unclustered index on attribute z. All indices are B+trees.Your job for this problem is to specify and justify a good physical plan for performing the joinR 1a=zS (i.e., join R and S using the condition R.a = S.z)Your answer should specify the physical join operator used (hash, nested loop, sort-merge, or other)and the access methods used to read relations R and S (sequential scan, index, etc.). Be sure to giveessential details: i.e., if you use a hash join, which relations(s) are included in hash tables; if you usenested loops, which relation(s) are accessed in the inner and outer loops, etc.Give the estimated cost of your solution in terms of number of disk I/O operations needed (you shouldignore CPU time and the cost to read any index blocks).You should give a brief justification why this is the best (cheapest) way to implement the join. You donot need to exhaustively analyze all the other possible join implementations and access methods, but youshould give a brief discussion of why your solution is the preferred one compared to the other possibilities.(Write your answer on the next page. You may remove this page for reference if you wish.)9Give your solution and explanation
View Full Document