Unformatted text preview:

Quick Review of Apr 22 material Sections 13 1 through 13 3 in text Query Processing take an SQL query and parse translate it into an internal representation optimize it choose an efficient form for the query evaluate it Metadata for query processing Operations and their costs Sel for equality one particular value Range selection projection all with and without indices Complex selections Sorting What has sorting to do with query processing SQL queries can specify the output be sorted several relational operations such as joins can be implemented very efficiently if the input data is sorted first as a result query processing is often concerned with sorting temporary intermediate and final results creating a secondary index on the active relation logical sorting isn t sufficient sequential scans through the data on secondary indices are very inefficient We often need to sort the data physically into order Sorting We differentiate two types of sorting internal sorting the entire relation fits in memory external sorting the relation is too large to fit in memory Internal sorting can use any of a large range of wellestablished sorting algorithms e g Quicksort In databases the most commonly used method for external sorting is the sort merge algorithm based upon Mergesort Sort merge Algorithm create runs phase Load in M consecutive blocks of the relation M is number of blocks that will fit easily in main memory Use some internal sorting algorithm to sort the tuples in those M blocks Write the sorted run to disk Continue with the next M blocks etcetera until finished merge runs phase assuming that the number of runs N is less than M load the first block of each run into memory grab the first tuple lowest value from all the runs and write it to an output buffer page when the last tuple of a block is read grab a new block from that run when the output buffer page is full write it to disk and start a new one continue until all buffer pages are empty Sort merge Algorithm 2 Merge runs phase N M operate on M runs at a time creating runs of length M2 and continue in multiple passes of the Merge operation Cost of sorting b r is the number of blocks occupied by relation r runs phase does one read one write on each block of r cost 2b r total number of runs N b r M number of passes in merge operation 1 if N M otherwise logM 1 b r M during each pass in the merge phase we read the whole relation and write it all out again cost 2b r per pass total cost of merge phase is therefore 2b r logM 1 b r M 1 if only one merge pass is required N M the cost is 4b r if M b r then there is only one run internal sorting and the cost is b r Join Operation Join is perhaps the most important operation combining two relations Algorithms computing the join efficiently are crucial to the optimization phase of query processing We will examine a number of algorithms for computing joins An important metric for estimating efficiency is the size of the result as mentioned last class the best algorithms on complex multi relation queries is to cut down the size of the intermediate results as quickly as possible Join Operation Size Estimation 0 size n r n s between 0 and size of cartesian product If A R S is a key of R then size n s If A R S is a key of R and a foreign key of S then size n s If A R S is not a key then each value of A in R appears no more than n s V A s times in S so n r tuples of R produce size n r n s V A s symmetrically size n s n r V A r if the two values are different we use size min n s n r V A r n r n s V A s Join Methods Nested Loop Simplest algorithm to compute a join nested for loops requires no indices tuple oriented for each tuple t1 in r do begin for each tuple t2 in s do begin join t1 t2 and append the result to the output block oriented for each block b1 in r do begin for each block b2 in s do begin join b1 b2 and append the result to the output reverse inner loop as above but we alternate counting up and down in the inner loop Why Cost of Nested Loop Join Cost depends upon the number of buffers and the replacement strategy pin 1 block from the outer relation k for the inner and LRU cost b r b r b s assuming b s k pin 1 block from the outer relation k for the inner and MRU cost b r b s b s k 1 b r 1 b r 2 k k 1 b r b s assuming b s k pin k blocks from the outer relation 1 for the inner read k from the outer for each block of s join 1xk blocks repeat with next k blocks of r untildone cost k cost b s repeated b r k times cost k b s b r k b r b r b s k which relation should be the outer one Join Methods Sort Merge Join Two phases Sort both relations on the join attributes Merge the sorted relations sort R on joining attribute sort S on joining attribute merge sorted R sorted S cost with k buffers b r 2 log b r k 1 to sort R b s 2 log b s k 1 to sort S b r b s to merge total b r 2 log b r k 1 b s 2 log b s k 1 b r b s Join Methods Hash Join Two phases Hash both relations into hashed partitions Bucket wise join join tuples of the same partitions only Hash R on joining attribute into H R buckets Hash S on joining attribute into H S buckets nested loop join of corresponding buckets cost assuming pairwise buckets fit in the buffers 2b r to hash R read and write 2b s to hash S as above b r b s to merge total 3 b r b s Join Methods Indexed Join Inner relation has an index clustering or not for each block b r in R do begin for each tuple t in b r do begin search the index on S with the value t A of the joining attribute and join with the resulting tuples of S cost b r n r cost select S A c where cost select S A c is as described before for indexed selection What if R is sorted on A hint use V A r in the above 3 way Join Suppose we want to compute R A B X S B C X T C D 1st …


View Full Document

UMD CMSC 424 - Class Notes

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Loading Unlocking...
Login

Join to view Class Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Class Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?