Chapter 13: Query ProcessingSlide 2Basic Steps in Query ProcessingBasic Steps in Query Processing (Cont.)Basic Steps in Query Processing : OptimizationBasic Steps: Optimization (Cont.)Measures of Query CostMeasures of Query Cost (Cont.)Selection OperationSelection Operation (Cont.)Selections Using IndicesSelections Involving ComparisonsImplementation of Complex SelectionsAlgorithms for Complex SelectionsSortingExternal Sort-MergeExternal Sort-Merge (Cont.)Example: External Sorting Using Sort-MergeExternal Merge Sort (Cont.)Join OperationNested-Loop JoinNested-Loop Join (Cont.)Block Nested-Loop JoinBlock Nested-Loop Join (Cont.)Indexed Nested-Loop JoinExample of Nested-Loop Join CostsMerge-JoinMerge-Join (Cont.)Hash-JoinHash-Join (Cont.)Slide 31Hash-Join AlgorithmHash-Join algorithm (Cont.)Handling of OverflowsCost of Hash-JoinExample of Cost of Hash-JoinHybrid Hash–JoinComplex JoinsOther OperationsOther Operations : AggregationOther Operations : Set OperationsOther Operations : Outer JoinEvaluation of ExpressionsMaterializationMaterialization (Cont.)PipeliningPipelining (Cont.)Slide 48Evaluation Algorithms for PipeliningSlide 50Database System Concepts©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use Chapter 13: Query ProcessingChapter 13: Query Processing©Silberschatz, Korth and Sudarshan13.2Database System Concepts, 5th Ed.Chapter 13: Query ProcessingChapter 13: Query ProcessingOverview Measures of Query CostSelection Operation Sorting Join Operation Other OperationsEvaluation of Expressions©Silberschatz, Korth and Sudarshan13.3Database System Concepts, 5th Ed.Basic Steps in Query ProcessingBasic Steps in Query Processing1. Parsing and translation2. Optimization3. Evaluation©Silberschatz, Korth and Sudarshan13.4Database System Concepts, 5th Ed.Basic Steps in Query Processing (Cont.)Basic Steps in Query Processing (Cont.)Parsing and translationtranslate the query into its internal form. This is then translated into relational algebra.Parser checks syntax, verifies relationsEvaluationThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.©Silberschatz, Korth and Sudarshan13.5Database System Concepts, 5th Ed.Basic Steps in Query Processing : OptimizationBasic Steps in Query Processing : OptimizationA relational algebra expression may have many equivalent expressions balance2500(balance(account)) is equivalent to balance(balance2500(account))Each relational algebra operation can be evaluated using one of several different algorithmsCorrespondingly, a relational-algebra expression can be evaluated in many ways. Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.can use an index on balance to find accounts with balance < 2500,or can perform complete relation scan and discard accounts with balance 2500©Silberschatz, Korth and Sudarshan13.6Database System Concepts, 5th Ed.Basic Steps: Optimization (Cont.)Basic Steps: Optimization (Cont.)Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost. Cost is estimated using statistical information from the database catalogExample: number of tuples in each relation, size of tuples, etc.In this module we studyHow to measure query costsAlgorithms for evaluating relational algebra operationsHow to combine algorithms for individual operations in order to evaluate a complete expressionIn Chapter 14We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost©Silberschatz, Korth and Sudarshan13.7Database System Concepts, 5th Ed.Measures of Query CostMeasures of Query CostCost is generally measured as total elapsed time for answering queryMany factors contribute to time costdisk accesses, CPU, or even network communicationTypically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into accountNumber of seeks * average-seek-costNumber of blocks read * average-block-read-costNumber of blocks written * average-block-write-costCost to write a block is greater than cost to read a block –data is read back after being written to ensure that the write was successful©Silberschatz, Korth and Sudarshan13.8Database System Concepts, 5th Ed.Measures of Query Cost (Cont.)Measures of Query Cost (Cont.)For simplicity we just use number of block transfers from disk as the cost measureWe also ignore (for simplicity) the difference in cost between sequential and random I/O.We also ignore (for simplicity) CPU costs.Costs depends on the size of the buffer in main memoryHaving more memory reduces need for disk accessAmount of real memory available to buffer depends on other concurrent OS processes, and hard to determine ahead of actual execution We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is availableReal systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into accountWe do not include cost to writing output to disk in our cost formulae©Silberschatz, Korth and Sudarshan13.9Database System Concepts, 5th Ed.Selection OperationSelection OperationFile scan – search algorithms that locate and retrieve records that fulfill a selection condition.Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.Cost estimate (number of disk blocks scanned) = br br denotes number of blocks containing records from relation rIf selection is on a key attribute, average cost = (br /2) stop on finding recordLinear search can be applied regardless of selection condition orordering of records in the file, or availability of indices©Silberschatz, Korth and Sudarshan13.10Database System Concepts, 5th Ed.Selection Operation (Cont.)Selection Operation (Cont.)A2 (binary search). Applicable if selection is an equality comparison on the attribute on which file is ordered. Assume that the blocks of a relation are stored contiguously Cost estimate (number of disk blocks to be scanned):log2(br) — cost of locating the first tuple by a binary search on the blocksPlus number of blocks containing records that satisfy selection condition –Will
View Full Document