CORNELL CS 432 - Evaluating Relational Operators: Part II

Unformatted text preview:

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1Evaluating Relational Operators: Part IIDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2Relational Operators Select Project Join Set operations (union, intersect, except) AggregationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3Example No indices on Sailor or ReservesSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4Tuple Nested Loop Joinforeach tuple r in R doforeach tuple s in S doif r.sid == s.sid then add <r, s> to result R is “outer” relation S is “inner” relationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5Analysis Assume M pages in R, pRtuples per page• M = 1000, pR= 100 N pages in S, pStuples per page Select• N = 500, pS= 80 Total cost = M + pR* M * N Main problem: depends on # tuples per page Ignore cost of writing out result Same for all join methodsDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6Page Nested Loop Joinforeach page p1 in R doforeach page p2 in S doforeach r in p1 doforeach s in p2 do if r.sid == s.sid then add <r, s> to result R is “outer” relation S is “inner” relationDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7Analysis Assume M pages in R, pRtuples per page• M = 1000, pR= 100 N pages in S, pStuples per page Select• N = 500, pS= 80 Total cost = M + M * N Main problem: does not use all buffer pages Note: Smaller relation should be “outer” Better for S to be “outer” in this case!Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8Block Nested Loops Join Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ``block’’ of outer R. For each matching tuple r in R-block, s in S-page, add <r, s> to result. Then read next R-block, scan S, etc.. . .. . .R & SHash table for block of R(k < B-1 pages)Input buffer for SOutput buffer. . .Join ResultDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9Examples of Block Nested Loops Cost: Scan of outer + #outer blocks * scan of inner #outer blocks = With Reserves (R) as outer, and 100 page blocks: Cost of scanning R is 1000 I/Os; a total of 10 blocks. Per block of R, we scan Sailors (S); 10*500 I/Os. With 100-page block of Sailors as outer: Cost of scanning S is 500 I/Os; a total of 5 blocks. Per block of S, we scan Reserves; 5*1000 I/Os. With sequential reads considered, analysis changes: may be best to divide buffers evenly between R and S. # /of pages of outer blocksizeDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10Example Hash index on Sailor.sidSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner and exploit the index. Cost: M + ( (M*pR) * cost of finding matching S tuples)  Cost of finding matching tuples depends on type of index B+-tree or hash Clustered or unclusteredforeach tuple r in R doforeach tuple s in S where ri== sj doadd <r, s> to resultDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12Example B+-tree index on Sailor.sidSELECT *FROM Reserves R, Sailor S,WHERE R.sid > S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13Example No indices on Sailor or ReservesSELECT *FROM Reserves R, Sailor S,WHERE R.sid = S.sidDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14Sort-Merge Join Sort R on the join attributes Sort S on the join attributes Merge sorted relations to produce join result Advance r in R until r.sid >= s.sid Advance s in S until s.sid >= r.sid If r.sid = s.sid• All R tuples with same value as r.sid is current R group• All S tuples with same value as s.sid is current S group• Output all <rg, sg> pairs, where rg is in current R group, sg is in current S group RepeatDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15Example of Sort-Merge Joinsid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0sid bid day rname28 103 12/4/96 guppy28 103 11/3/96 yuppy31 101 10/10/96 dustin31 102 10/12/96 lubber31 101 10/11/96 lubber58 103 11/12/96 dustinDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16Analysis Assume M pages in R, pRtuples per page N pages in S, pStuples per page Select Total cost = M log M + N log N + (M + N) Note: (M + N) could be (M * N) in worst case Unlikely! With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes Total join cost: 7500 Equivalent BNL cost: 2500 to 15000Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17Refinement of Sort-Merge Join We can combine the merging phases in the sorting of R and S with the merging required for the join. Assume B > , where L is the size of larger relation Use refinement that produces runs of length 2B in Phase 1 #runs of each relation is < B/2. Allocate 1 page per run of each relation, and `merge’ while checking the join condition. Cost: read+write each relation in Pass 0 + read each relation (only) in merging pass = 3 (M + N) In example, cost goes down from 7500 to 4500 I/Os. In practice, cost of sort-merge join, like the cost of external sorting, is linear.LDatabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18Hash-Join Partition both relations using hash fn h: R tuples in partition i will only match S tuples in partition i. Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.Partitionsof R & SInput bufferfor SiHash table for partitionRi (k < B-1 pages)B main memory buffersDiskOutput bufferDiskJoin Resulthashfnh2h2B main memory buffersDiskDiskOriginal RelationOUTPUT2INPUT1hashfunctionhB-1Partitions12B-1. . .Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19Analysis (without recursive paritioning) Assumptions # partitions = B –1 B-2 > size of largest partition (to avoid partitioning again) Required memory M/(B-1) < B-2, i.e., B must be > M corresponds to smaller relation  In partitioning


View Full Document

CORNELL CS 432 - Evaluating Relational Operators: Part II

Download Evaluating Relational Operators: Part II
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Evaluating Relational Operators: Part II and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Evaluating Relational Operators: Part II 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?