Lecture 7 Query Execution Efficient Join Processing Sept 14 2007 ChengXiang Zhai Most slides are adapted from Jun Yang s and AnHai Doan s lectures CS511 Advanced Database Management Systems 1 DBMS Architecture User Web Forms Applications DBA query transaction Query Parser Transaction Manager Query Rewriter Today s lecture Query Optimizer Lock Manager Logging Recovery Query Executor Files Access Methods Buffer Manager Buffers Main Memory Storage Manager Storage CS511 Advanced Database Management Systems Lock Tables Past lectures 2 Query Execution Plans buyer item SELECT P buyer P item FROM Purchase P Person Q WHERE P buyer Q name AND Q city urbana AND Q age 20 Query Plan logical plan declarative physical plan procedural City urbana age 20 buyer name Purchase Table scan Simple nested loops Person Index scan procedural implementation of each logical operator scheduling of operations CS511 Advanced Database Management Systems 3 Logical v s Physical Operators Logical operators what they do e g union selection project join grouping Physical operators how they do it e g nested loop join sort merge join hash join index join Other differences and relations between them How should we design the physical operators CS511 Advanced Database Management Systems 4 How do We Combine Operations The iterator model Each operation is implemented by 3 functions Open sets up the data structures and performs initializations GetNext returns the the next tuple of the result Close ends the operations Cleans up the data structures Enables pipelining Contrast with data driven materialize model Can be generalized to object oriented DB CS511 Advanced Database Management Systems 5 Typical Physical Operators Table Scan Sorting While Scanning Tables Various kinds of algorithms for implementing the relational operations Set union intersection Selection Projection Join Join is most important why CS511 Advanced Database Management Systems 6 Notation Relations R S Tuples r s Number of tuples R S Number of disk blocks B R B S Number of memory blocks available M Cost metric Number of I O s Memory requirement CS511 Advanced Database Management Systems 7 Internal Memory Join Solutions Nested loop join check for every record in R and every record in S cost O R S Sort merge join sort R S followed by merging cost O S log S if R S Hash join build a hash table for R for every record in S probe the hash table cost O S CS511 Advanced Database Management Systems 8 External Memory Solutions Issues to Consider Disk buffer manager Cost model I O based Indexing Sort hash CS511 Advanced Database Management Systems 9 Nested Loop Join p S R For each block of R and for each r in the block For each block of S and for each s in the block Output rs if p evaluates to true over r and s R is called the outer table S is called the inner table I O s B R R B S Memory requirement 4 double buffering More reasonable block based nested loop join For each block of R and for each block of S For each r in the R block and for each s in the S block I O s B R B R B S Memory requirement same as before CS511 Advanced Database Management Systems 10 Index Nested Loop Join R R A S B S Idea use the value of R A to probe the index on S B For each block of R and for each r in the block Use the index on S B to retrieve s with s B r A Output rs I O s B R R index lookup Typically the cost of an index lookup is 2 4 I O s Beats other join methods if R is not too big Better pick R to be the smaller relation Memory requirement 2 CS511 Advanced Database Management Systems 11 Stop early Improvements of Nested Loop Join If the key of the inner table is being matched May reduce half of the I O s less for block based Make use of available memory Stuff memory with as much of R as possible stream S by and join every S tuple with all R tuples in memory I O s B R B R M 2 B S roughly B R B S M Memory requirement M as much as possible CS511 Advanced Database Management Systems 12 Zig Zag Join Using Ordered Indexes R R A S B S Idea use the ordering provided by the indexes on R A and S B to eliminate the sorting step of sortmerge join Trick use the larger key to probe the other index Possibly skipping many keys that do not match CS511 Advanced Database Management Systems 13 External Merge Sort Problem sort R but R does not fit in memory Pass 0 read M blocks of R at a time sort them and write out a level 0 run There are B R M level 0 sorted runs Pass i merge M 1 level i 1 runs at a time and write out a level i run M 1 memory blocks for input 1 to buffer output of level i runs of level i 1 runs M 1 Final pass produces 1 sorted run CS511 Advanced Database Management Systems 14 Example of External Merge Sort Input 1 7 4 5 2 8 9 6 3 0 Each block holds one number and memory has 3 blocks Pass 0 1 7 4 1 4 7 5 2 8 2 5 8 9 6 3 3 6 9 0 0 Pass 1 1 4 7 2 5 8 1 2 4 5 7 8 3 6 9 0 0 3 6 9 Pass 2 final 1 2 4 5 7 8 0 3 6 9 0 1 2 3 4 5 6 7 8 9 CS511 Advanced Database Management Systems 15 Performance of external merge sort log M 1 B R M 1 Number of passes I O s Multiply by 2 B R each pass reads the entire relation once and writes it once Subtract B R for the final pass Roughly this is O B R log M B R Memory requirement M as much as possible CS511 Advanced Database Management Systems 16 Some Tricks for Sorting Double buffering Allocate an additional block for each run Trade off smaller fan in more passes Blocked I O Instead of reading writing one disk block at a time read write a bunch cluster Trade off more sequential I O s smaller fan in more passes CS511 Advanced Database Management Systems 17 Sort Merge Join R A S BS R Sort R and S by their join attributes and then merge r s the first tuples in sorted R and S Repeat until one of R and S is exhausted If r A s B then s next tuple in S else if r A s B then r next tuple in R else output all matching tuples and r s next in R and S I O s sorting 2 B R 2 B S In most cases e g join of key and foreign key Worst case is B …
View Full Document