CS 511 Design of Database Management System Homework 2 Due 2 00 PM CST on September 26 2007 NOTE Please submit a hard copy of your homework Bring it to the lecture table at the beginning of the lecture on 26th September 2007 The hard copy should be as clearly readable as possible You may be subtracted points for unreadability and ugly presentation I2CS Students You should email your solutions to the TA termehch uiuc edu in the pdf or the word format Please send your email with subject CS511 HW2 and solution file attached by 2PM UIUC time CST Problem 1 GiST 25 Pts This problem asks you to build a GiST search tree for the date data type Each date object can be either a one day or a period For simplicity we assume all dates are in the year of 2007 The following are some example date object d1 01 02 d2 01 20 02 25 d3 01 01 01 10 d4 01 01 02 10 a one day example a period example We want to use GiST to build a search tree for supporting the following query predicate in which x and y are two date objects That is given a predicate below we want to search the index tree for some given y to find all x that satisfy the predicate Overlap x y True if x and y overlap and False otherwise e g Overlap d1 d2 True Overlap d1 d3 False Before x y True if x ends before y starts and False otherwise e g Before d3 d2 True Before d1 d2 False Describe how Consistent can be implemented for each of the above query predicates Solution Key predicate Contains d v where d is the minimal bounding date of all dates v in the subtree i e True if v falls within the data period d and False otherwise p Contains d v q Overlap v y True if d and y overlap and False otherwise p Contains d v q Before v y True if d starts before y starts and False otherwise Problem 2 Query Optimization 25 Pts Consider the following relational schema and SQL query Suppliers sid sname address category Supply sid pid Parts pid pname brand SELECT S sname P pname FROM Suppliers S Parts P Supply Y WHERE sS sid Y sid AND Y pid P pid a Enumerate all joins orderings that System R considers Assume that the optimizer follows the heuristic of never considering orderings that require the computation of cross products Solution As system orderings 1 S 2 Y 3 P 4 Y R only considers left deep joins they consider the following join Y P S P Y S P S b Suppose relation Supply has a hash index on attribute sid What plan would you choose Explain why and how this index might be helpful Solution Wee choose the first plan from a as it can take advantage of the hash index on sid to selective retrieve joinable pairs for S Y c Give two examples orderings that System R will not consider How can you change the optimizer to consider such ordering If not possible explain why not Solution System R considers only left deep joins and thus do not support the following right deep joins 1 S Y P Y S 2 P We could change the optimizer to enumerate right deep orderings in dynamic programming Examples with cross product are given complete points as well Problem 3 Query Optimization 15 Pts Consider relations r A B and s B C Assume that r contains 2 000 tuples and that s contains 5 000 tuples We want to compute v r r B s B s a Without any further assumptions what is the maximum number of tuples that v may contain Solution 2000 5000 10 000 000 b Now assume that we know that V B r 500 That is in r the attribute B takes on 500 different values What is now a reasonable estimate on the size of v Solution 2000 500 500 20 000 d Finally assume we know that s satisfies the functional dependency B C What is now a reasonable estimate on the size of v Solution 2000 since B is a key Problem 4 Query Execution 35 Pts There are other types of join in addition to equi join where the predicate is smaller than or greater than For instance consider a query like For each manager show employees whose salaries are more than the manager s Database management system has to join table Manager with table Employee on Salary attribute where join predicate is less than a Explain how the hybrid hash join algorithm could be extended to implement join with above predicates Solution In order to implement non equi join join using hybrid hash join algorithm one has to find an order preserving hash function The rest will be similar to the original version of the algorithm except for the step that each bucket from outer relation must combined by any bucket in the inner relation that has a tuples with larger or smaller join values b Compare the memory usage in the extended algorithm of part a with the case of equi join Solution The memory usage will be the same as original algorithm c Compare the extended algorithm of part a with sort merge algorithm for new join operators in terms of speed and memory usage Solution Sort merge algorithm could be extended to cover these new join predicates without any important change Performance of extending hybrid hash join depends on finding an order preserving hash function that also distributes tuples among buckets evenly If such a function could be found extended hybrid hash join will have better memory usage with the same I O cost Otherwise sort merge algorithm will have better memory usage and I O cost Note Such hash functions could be determined using statistical information about each relation stored in query optimization module Sampling could help finding these functions as well
View Full Document