Unformatted text preview:

Chapter 13: Query ProcessingSlide 2Basic Steps in Query ProcessingBasic Steps in Query Processing (Cont.)Basic Steps in Query Processing : OptimizationBasic Steps: Optimization (Cont.)Measures of Query CostMeasures of Query Cost (Cont.)Selection OperationSelection Operation (Cont.)Selections Using IndicesSelections Involving ComparisonsImplementation of Complex SelectionsAlgorithms for Complex SelectionsSortingExternal Sort-MergeExternal Sort-Merge (Cont.)Example: External Sorting Using Sort-MergeExternal Merge Sort (Cont.)Join OperationNested-Loop JoinNested-Loop Join (Cont.)Block Nested-Loop JoinBlock Nested-Loop Join (Cont.)Indexed Nested-Loop JoinExample of Nested-Loop Join CostsMerge-JoinMerge-Join (Cont.)Hash-JoinHash-Join (Cont.)Slide 31Hash-Join AlgorithmHash-Join algorithm (Cont.)Handling of OverflowsCost of Hash-JoinExample of Cost of Hash-JoinHybrid Hash–JoinComplex JoinsOther OperationsOther Operations : AggregationOther Operations : Set OperationsOther Operations : Outer JoinEvaluation of ExpressionsMaterializationMaterialization (Cont.)PipeliningPipelining (Cont.)Slide 48Evaluation Algorithms for PipeliningSlide 50Database System Concepts©Silberschatz, Korth and SudarshanSee www.db-book.com for conditions on re-use Chapter 13: Query ProcessingChapter 13: Query Processing©Silberschatz, Korth and Sudarshan13.2Database System Concepts, 5th Ed.Chapter 13: Query ProcessingChapter 13: Query ProcessingOverview Measures of Query CostSelection Operation Sorting Join Operation Other OperationsEvaluation of Expressions©Silberschatz, Korth and Sudarshan13.3Database System Concepts, 5th Ed.Basic Steps in Query ProcessingBasic Steps in Query Processing1. Parsing and translation2. Optimization3. Evaluation©Silberschatz, Korth and Sudarshan13.4Database System Concepts, 5th Ed.Basic Steps in Query Processing (Cont.)Basic Steps in Query Processing (Cont.)Parsing and translationtranslate the query into its internal form. This is then translated into relational algebra.Parser checks syntax, verifies relationsEvaluationThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.©Silberschatz, Korth and Sudarshan13.5Database System Concepts, 5th Ed.Basic Steps in Query Processing : OptimizationBasic Steps in Query Processing : OptimizationA relational algebra expression may have many equivalent expressions balance2500(balance(account)) is equivalent to balance(balance2500(account))Each relational algebra operation can be evaluated using one of several different algorithmsCorrespondingly, a relational-algebra expression can be evaluated in many ways. Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.can use an index on balance to find accounts with balance < 2500,or can perform complete relation scan and discard accounts with balance  2500©Silberschatz, Korth and Sudarshan13.6Database System Concepts, 5th Ed.Basic Steps: Optimization (Cont.)Basic Steps: Optimization (Cont.)Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.  Cost is estimated using statistical information from the database catalogExample: number of tuples in each relation, size of tuples, etc.In this module we studyHow to measure query costsAlgorithms for evaluating relational algebra operationsHow to combine algorithms for individual operations in order to evaluate a complete expressionIn Chapter 14We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost©Silberschatz, Korth and Sudarshan13.7Database System Concepts, 5th Ed.Measures of Query CostMeasures of Query CostCost is generally measured as total elapsed time for answering queryMany factors contribute to time costdisk accesses, CPU, or even network communicationTypically disk access is the predominant cost, and is also relatively easy to estimate. Measured by taking into accountNumber of seeks * average-seek-costNumber of blocks read * average-block-read-costNumber of blocks written * average-block-write-costCost to write a block is greater than cost to read a block –data is read back after being written to ensure that the write was successful©Silberschatz, Korth and Sudarshan13.8Database System Concepts, 5th Ed.Measures of Query Cost (Cont.)Measures of Query Cost (Cont.)For simplicity we just use number of block transfers from disk as the cost measureWe also ignore (for simplicity) the difference in cost between sequential and random I/O.We also ignore (for simplicity) CPU costs.Costs depends on the size of the buffer in main memoryHaving more memory reduces need for disk accessAmount of real memory available to buffer depends on other concurrent OS processes, and hard to determine ahead of actual execution We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is availableReal systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into accountWe do not include cost to writing output to disk in our cost formulae©Silberschatz, Korth and Sudarshan13.9Database System Concepts, 5th Ed.Selection OperationSelection OperationFile scan – search algorithms that locate and retrieve records that fulfill a selection condition.Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.Cost estimate (number of disk blocks scanned) = br br denotes number of blocks containing records from relation rIf selection is on a key attribute, average cost = (br /2) stop on finding recordLinear search can be applied regardless of selection condition orordering of records in the file, or availability of indices©Silberschatz, Korth and Sudarshan13.10Database System Concepts, 5th Ed.Selection Operation (Cont.)Selection Operation (Cont.)A2 (binary search). Applicable if selection is an equality comparison on the attribute on which file is ordered. Assume that the blocks of a relation are stored contiguously Cost estimate (number of disk blocks to be scanned):log2(br) — cost of locating the first tuple by a binary search on the blocksPlus number of blocks containing records that satisfy selection condition –Will


View Full Document

UMBC CMSC 461 - Chapter 13: Query Processing

Download Chapter 13: Query Processing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Chapter 13: Query Processing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Chapter 13: Query Processing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?