Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1Evaluating Relational Operators: Part IIDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2Relational Operators Select Project Join Set operations (union, intersect, except) AggregationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3Example No indices on Sailor or ReservesSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4Tuple Nested Loop Joinforeach tuple r in R doforeach tuple s in S doif r.sid == s.sid then add <r, s> to result R is “outer” relation S is “inner” relationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5Analysis Assume M pages in R, pRtuples per page• M = 1000, pR= 100 N pages in S, pStuples per page Select• N = 500, pS= 80 Total cost = M + pR* M * N Main problem: depends on # tuples per page Ignore cost of writing out result Same for all join methodsDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6Page Nested Loop Joinforeach page p1 in R doforeach page p2 in S doforeach r in p1 doforeach s in p2 do if r.sid == s.sid then add <r, s> to result R is “outer” relation S is “inner” relationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7Analysis Assume M pages in R, pRtuples per page• M = 1000, pR= 100 N pages in S, pStuples per page Select• N = 500, pS= 80 Total cost = M + M * N Main problem: does not use all buffer pages Note: Smaller relation should be “outer” Better for S to be “outer” in this case!Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8Block Nested Loops Join Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ``block’’ of outer R. For each matching tuple r in R-block, s in S-page, add <r, s> to result. Then read next R-block, scan S, etc.. . .. . .R & SHash table for block of R(k < B-1 pages)Input buffer for SOutput buffer. . .Join ResultDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9Examples of Block Nested Loops Cost: Scan of outer + #outer blocks * scan of inner #outer blocks = With Reserves (R) as outer, and 100 page blocks: Cost of scanning R is 1000 I/Os; a total of 10 blocks. Per block of R, we scan Sailors (S); 10*500 I/Os. With 100-page block of Sailors as outer: Cost of scanning S is 500 I/Os; a total of 5 blocks. Per block of S, we scan Reserves; 5*1000 I/Os. With sequential reads considered, analysis changes: may be best to divide buffers evenly between R and S. # /of pages of outer blocksizeDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10Example Hash index on Sailor.sidSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner and exploit the index. Cost: M + ( (M*pR) * cost of finding matching S tuples) Cost of finding matching tuples depends on type of index B+-tree or hash Clustered or unclusteredforeach tuple r in R doforeach tuple s in S where ri== sj doadd <r, s> to resultDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12Example B+-tree index on Sailor.sidSELECT *FROM Reserves R, Sailor S,WHERE R.sid > S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13Example No indices on Sailor or ReservesSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14Sort-Merge Join Sort R on the join attributes Sort S on the join attributes Merge sorted relations to produce join result Advance r in R until r.sid >= s.sid Advance s in S until s.sid >= r.sid If r.sid = s.sid• All R tuples with same value as r.sid is current R group• All S tuples with same value as s.sid is current S group• Output all <rg, sg> pairs, where rg is in current R group, sg is in current S group RepeatDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15Example of Sort-Merge Joinsid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0sid bid day rname28 103 12/4/96 guppy28 103 11/3/96 yuppy31 101 10/10/96 dustin31 102 10/12/96 lubber31 101 10/11/96 lubber58 103 11/12/96 dustinDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16Analysis Assume M pages in R, pRtuples per page N pages in S, pStuples per page Select Total cost = M log M + N log N + (M + N) Note: (M + N) could be (M * N) in worst case Unlikely! With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes Total join cost: 7500 Equivalent BNL cost: 2500 to 15000Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17Refinement of Sort-Merge Join We can combine the merging phases in the sorting of R and S with the merging required for the join. Assume B > , where L is the size of larger relation Use refinement that produces runs of length 2B in Phase 1 #runs of each relation is < B/2. Allocate 1 page per run of each relation, and `merge’ while checking the join condition. Cost: read+write each relation in Pass 0 + read each relation (only) in merging pass = 3 (M + N) In example, cost goes down from 7500 to 4500 I/Os. In practice, cost of sort-merge join, like the cost of external sorting, is linear.LDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18Hash-Join Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i. Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.Partitionsof R & SInput bufferfor SiHash table for partitionRi (k < B-1 pages)B main memory buffersDiskOutput bufferDiskJoin Resulthashfnh2h2B main memory buffersDiskDiskOriginal RelationOUTPUT2INPUT1hashfunctionhB-1Partitions12B-1. . .Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19Analysis (without recursive paritioning) Assumptions # partitions = B –1 B-2 > size of largest partition (to avoid partitioning again) Required memory M/(B-1) < B-2, i.e., B must be > M corresponds to smaller relation In partitioning
View Full Document