Chapter 19 Query Processing and Optimization CS 6360 Database Design Chris Irwin Davis Ph D Email chrisirwindavis utdallas edu Phone 972 883 3574 O ce ECSS 4 705 Introduction to Query Processing Query optimization The process of choosing a suitable execution strategy for processing a query Two internal representations of a query Query Tree Query Graph 3 19 1 Translating SQL Queries into Relational Algebra Translating SQL Queries into Relational Algebra In practice SQL is the query language that is used in most commercial RDBMSs An SQL query is first translated into an equivalent extended relational algebra expression represented as a query tree data structure that is then optimized 5 Translating SQL Queries into Relational Algebra Query block The basic unit that can be translated into the algebraic operators and optimized A query block contains a single SELECT FROM WHERE expression as well as GROUP BY and HAVING clause if these are part of the block Nested queries within a query are identified as separate query blocks Aggregate operators in SQL must be included in the extended algebra 6 Translating SQL Queries into Relational Algebra SELECT Lname Fname FROM EMPLOYEE WHERE Salary SELECT MAX Salary FROM EMPLOYEE WHERE Dno 5 Nested query without correlation with the outer query 7 Translating SQL Queries into Relational Algebra SELECT LNAME FNAME FROM EMPLOYEE WHERE SALARY SELECT LNAME FNAME FROM EMPLOYEE WHERE SALARY C SELECT MAX SALARY FROM EMPLOYEE WHERE DNO 5 SELECT MAX SALARY FROM EMPLOYEE WHERE DNO 5 Must be a single value i e exactly one column and one row 8 Translating SQL Queries into Relational Algebra SELECT LNAME FNAME FROM EMPLOYEE WHERE SALARY SELECT LNAME FNAME FROM EMPLOYEE WHERE SALARY C Lname Fname Salary C EMPLOYEE SELECT MAX SALARY FROM EMPLOYEE WHERE DNO 5 SELECT MAX SALARY FROM EMPLOYEE WHERE DNO 5 MAX Salary Dno 5 EMPLOYEE 9 19 2 Algorithms for External Sorting External Access Files Data access occurs in main memory It is necessary to copy from data stored in files into main memory Available memory is usually less than the size of data files How to access data in files nB buffer size in disk blocks int b file size in disk blocks int nR number of runs to access the entire file int 11 Algorithms for External Sorting How to sort records in a file Sort Merge Sort the records in each block Recursively merge the blocks 12 Algorithms for External Sorting Sorting Phase The size of each run and the available buffer space nB For example if the number of available main memory buffers nB 5 disk blocks and the size of the file b 1024 disk blocks then nR b nB or 205 initial runs each of size 5 blocks except the last run which will have only 4 blocks Hence after the sorting phase 205 sorted runs or 205 sorted subfiles of the original file are stored as temporary subfiles on disk 13 Algorithms for External Sorting Merging Phase The sorted runs are merged during one or more merge passes Each merge pass can have one or more merge steps The degree of merging dM is the number of sorted subfiles that can be merged in each merge step Performance The performance of the sort merge algorithm can be measured in the number of disk block reads and writes between the disk and main memory before the sorting of the whole file is completed The following formula approximates this cost 14 19 3 Algorithms for SELECT and JOIN Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation SELECT Examples OP1 SSN 123456789 EMPLOYEE OP2 DNUMBER 5 DEPARTMENT OP3 DNO 5 EMPLOYEE OP4 DNO 5 AND SALARY 30000 AND SEX F EMPLOYEE OP5 ESSN 123456789 AND PNO 10 WORKS ON 16 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation contd Search Methods for Simple Selection S1 Linear search brute force Retrieve every record in the file and test whether its attribute values satisfy the selection condition S2 Binary search If the selection condition involves an equality comparison on a key attribute on which the file is ordered binary search which is more efficient than linear search can be used See OP1 S3 Using a primary index or hash key to retrieve a single record If the selection condition involves an equality comparison on a key attribute with a primary index or a hash key use the primary index or the hash key to retrieve the record 17 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation cont d Search Methods for Simple Selection S4 Using a primary index to retrieve multiple records If the comparison condition is or on a key field with a primary index use the index to find the record satisfying the corresponding equality condition then retrieve all subsequent records in the ordered file S5 Using a clustering index to retrieve multiple records If the selection condition involves an equality comparison on a non key attribute with a clustering index use the clustering index to retrieve all the records satisfying the selection condition S6 Using a secondary B tree index On an equality comparison this search method can be used to retrieve a single record if the indexing field has unique values is a key or to retrieve multiple records if the indexing field is not a key In addition it can be used to retrieve records on conditions involving or FOR RANGE QUERIES 18 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation cont d Search Methods for Simple Selection S7 Conjunctive selection If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6 use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition S8 Conjunctive selection using a composite index If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index or hash structure exists on the combined field we can use the index directly 19 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation cont d Search Methods for Complex Selection S9 Conjunctive selection by intersection of record pointers This method is possible if secondary indexes are available on all or some of the fields involved in equality comparison conditions in the conjunctive condition and if the indexes include record pointers rather than block pointers Each index can be used to retrieve the record pointers that satisfy the individual condition The intersection
View Full Document