Lecture 21: Query ExecutionOutlineArchitecture of a Database EngineLogical Algebra OperatorsLogical Query PlanSlide 6Physical Query PlanQuestion in ClassSlide 9Cost ParametersSlide 11Selection and ProjectionHash TablesHash Table ExampleSearching in a Hash TableInsertion in Hash TableSlide 17Hash Table PerformanceMain Memory Hash JoinDuplicate EliminationGroupingNested Loop JoinsSlide 23Slide 24Slide 25Slide 26Index Based JoinSlide 28Zig-zag Index Based JoinIndex Based SelectionSlide 31Operations on Very Large TablesPartitioned Hash AlgorithmsSlide 34Slide 35Partitioned Hash JoinHash-JoinSlide 38External SortingExternal Merge-Sort: Step 1External Merge-Sort: Step 2Cost of External Merge SortSlide 43Slide 44Merge-JoinSlide 46Two-Pass Algorithms Based on SortingSummary of External Join Algorithms1Lecture 21:Query Execution Monday, November 20, 20062Outline•Hash-tables (13.4)•Query execution: 15.1 – 15.53Architecture of a Database EngineParse QuerySelect Logical PlanSelect Physical PlanQuery ExecutionSQL queryQueryoptimizationLogicalplanPhysicalplan4Logical Algebra Operators•Union, intersection, difference•Selection •Projection •Join |x| •Duplicate elimination •Grouping •Sorting 5Logical Query PlanSELECT city, count(*)FROM salesGROUP BY cityHAVING sum(price) > 100SELECT city, count(*)FROM salesGROUP BY cityHAVING sum(price) > 100sales(product, city, price)city, sum(price)→p, count(*) → c p > 100city, cT1(city,p,c)T2(city,p,c)T3(city, c)T1, T2, T3 = temporary tables6Logical Query PlanSELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND P.city=‘seattle’ AND Q.phone > ‘5430000’ SELECT P.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND P.city=‘seattle’ AND Q.phone > ‘5430000’ PurchasePersonBuyer=nameCity=‘seattle’ phone>’5430000’buyerPurchase(buyer, city)Person(name, phone)7Physical Query PlanSELECT S.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ SELECT S.buyerFROM Purchase P, Person QWHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Query Plan:• logical tree• implementation choice at every node• scheduling of operations.PurchasePersonBuyer=nameCity=‘seattle’ phone>’5430000’buyer(Simple Nested Loops)(Table scan) (Index scan)Some operators are from relationalalgebra, and others (e.g., scan)are not.8Question in ClassLogical operator: Product(pname, cname) || Company(cname, city)Propose three physical operators for the join, assuming the tables are in main memory:1. 2. 3.9Question in ClassProduct(pname, cname) |x| Company(cname, city)•1000000 products•1000 companiesHow much time do the following physical operators take if the data is in main memory ?•Nested loop join time = •Sort and merge = merge-join time =•Hash join time =10Cost ParametersThe cost of an operation = total number of I/Os result assumed to be delivered in main memoryCost parameters:•B(R) = number of blocks for relation R•T(R) = number of tuples in relation R•V(R, a) = number of distinct values of attribute a•M = size of main memory buffer pool, in blocks11Cost Parameters•Clustered table R:–Blocks consists only of records from this table–B(R) << T(R)•Unclustered table R:–Its records are placed on blocks with other tables–B(R) T(R)•When a is a key, V(R,a) = T(R)•When a is not a key, V(R,a)12Selection and ProjectionSelection (R), projection (R)•Both are tuple-at-a-time algorithms•Cost: B(R)Input buffer Output bufferUnaryoperator13Hash Tables•Key data structure used in many operators•May also be used for indexes, as alternative to B+trees•Recall basics:–There are n bucket s–A hash function f(k) maps a key k to {0, 1, …, n-1}–Store in bucket f(k) a pointer to record with key k•Secondary storage: bucket = block, use overflow blocks when needed14•Assume 1 bucket (block) stores 2 keys + pointers•h(e)=0•h(b)=h(f)=1•h(g)=2•h(a)=h(c)=3Hash Table Exampleebfgac0123Here: h(x) = x mod 4Here: h(x) = x mod 415•Search for a:•Compute h(a)=3•Read bucket 3•1 disk accessSearching in a Hash Tableebfgac012316•Place in right bucket, if space•E.g. h(d)=2Insertion in Hash Tableebfgdac012317•Create overflow block, if no space•E.g. h(k)=1•More over-flow blocksmay be neededInsertion in Hash Tableebfgdac0123k18Hash Table Performance•Excellent, if no overflow blocks•Degrades considerably when number of keys exceeds the number of buckets (I.e. many overflow blocks).19Main Memory Hash JoinHash join: R |x| S•Scan S, build buckets in main memory•Then scan R and join•Cost: B(R) + B(S)•Assumption: B(S) <= M20Duplicate EliminationDuplicate elimination (R)•Hash table in main memory•Cost: B(R)•Assumption: B((R)) <= M21GroupingGrouping:Product(name, department, quantity)department, sum(quantity) (Product) Answer(department, sum)Main memory hash tableQuestion: How ?22Nested Loop Joins•Tuple-based nested loop R ⋈ S•Cost: T(R) B(S) when S is clustered•Cost: T(R) T(S) when S is unclusteredfor each tuple r in R do for each tuple s in S do if r and s join then output (r,s)for each tuple r in R do for each tuple s in S do if r and s join then output (r,s)23Nested Loop Joins•We can be much more clever•Question: how would you compute the join in the following cases ? What is the cost ?–B(R) = 1000, B(S) = 2, M = 4–B(R) = 1000, B(S) = 3, M = 4–B(R) = 1000, B(S) = 6, M = 424Nested Loop Joins•Block-based Nested Loop Joinfor each (M-2) blocks bs of S do for each block br of R do for each tuple s in bs for each tuple r in br do if “r and s join” then output(r,s)for each (M-2) blocks bs of S do for each block br of R do for each tuple s in bs for each tuple r in br do if “r and s join” then output(r,s)25Nested Loop Joins. . .. . .R & SHash table for block of S(M-2 pages)Input buffer for ROutput buffer. . .Join Result26Nested Loop Joins•Block-based Nested Loop Join•Cost:–Read S once: cost B(S)–Outer loop runs B(S)/(M-2) times, and each time need to read R: costs B(S)B(R)/(M-2)–Total cost: B(S) + B(S)B(R)/(M-2)•Notice: it is better to iterate over the smaller relation first•R |x| S: R=outer relation, S=inner relation27Index Based Join•R S•Assume S has an index on the join attributefor each tuple
View Full Document