Access Path Selection in a Relational Database Management System Selinger P Griffiths M M Astrahan D D Chamberlin It A Lorie T G Price 4 IBM Research Division San Jose 95193 retrieval Nor does a user specify in what order joins are to be performed The System R optimizer chooses both join order and an access path for each table in the SQL statement Of the many possible choices the optimizer the chooses one which minimizes total access cost for performing the entire statement ABSTRACT In a high level query and data manipulation language such as SQL requests stated non procedurally without are reference to access paths This paper describes how System R chooses access paths both simple single relation and for such as joins complex queries given a specification of desired data as a user of predicates System R boolean expression database management is an experimental system developed to carry out research on the relational model of data System R was designed and built by members of the IBM San Jose Research Laboratory 1 California This paper issues of will address the access path selection for queries Retrieval for data manipulation UPDATE DELETE is treated similarly Section 2 will describe the place of the optimizer in processing the of a SQL statement and section 3 will describe the storage component access paths that are available on a single physically stored table In section optimizer cost intro4 the formulas are duced for single table queries and section more 5 discusses the joining of two or and their corresponding costs tables Nested queries queries in predicates are covered in section 6 Introduction System R is an experimental database based on the relational management system model of data which has been under development at the IBM San Jose Research Laborato1975 Cl The software was since ry developed as a research vehicle in relaand is tional database not generally outside the IBM Research available Division 2 processi Bg B B u statement four A SQL statement is subjected to phases of Depending on the processing origin and contents of the statement these phases may be separated by arbitrary time In System intervals of RI these arbitrary time intervals are transparent to a SQL components which process the system These mechanisms and a descripstatement the processing tion of of SQL statements terminals are both programs and from Only an overview further discussed in 2 of those processing steps that are relevant to access path selection will be discussed here assumes familiarity This with paper data model terminology as relational a The described in Codd 7 and Date in System R is the unified user interface data definition and manipulation query Statements in SQL can be language SQL 5 issued both from an on line casual user oriented terminal interface and from programming languages such as PL I and COBOL In System R a user need not know how are physically stored and what the tuples are available e g which access paths indexes SQL statements do columns have the user not require to specify anything about the access path to be used for tuple The four phases of statement processing code generation optimization are parsing is sent and execution Each SQL statement is checked for to the parser where it reprecorrect syntax A guery block is sented by a SELECT list a FROM list and a the respectively containing WHERE tree list of items to be retrieved the table s boolean combination of referenced and the simple predicates specified by the user A may have many query single SQL statement one because a predicate may have blocks Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct couunercial advantage the ACMcopyright notice and the title of the publication and its date appear and notice is given that copying is by permission of the Association for Computing Machinery To copy otherwise or to republish requires a fee end or specific permission 01979 ACM0 89791 001 X 79 0500 0023 00 75 23 operand which is itself a query 3 the parser returns without If any the OPTIMIZER component is errors detected accumulates the The OPTIMIZER called tables and columns referenced in names of the query and looks them up in the System R their existence and to catalogs to verify retrieve information about them T he Research Storaae System The Research Storage System RSSI is the storage subsystem of System R It is responsible for maintaining Physical storage of relations access paths on these relations locking in a multi user environment and logging and recovery facilities The RSS presents a tuple oriented interface RSII to its users Although the RSS may be used R independently of System we are concerned here with its use for executing the code generated by the processing of SQL statements in System R as described in the previous section For a complete description of the RSS see l lookup portion of the The catalog OPTIMIZER also obtains statistics about the referenced relations and the access paths These will be available on each of them After used later in access path selection lookup has obtained the datatype catalog OPTIMIZER of each column the and length SELECT list and WHERE tree to rescans the check for semantic errors and type compatiexpressions and predicate in both bility comparisons Relations are stored in the RSS as a collection of tuples whose columns are physically contiguous These tuples are stored on 4K byte pages no tuple spans a Pages are organized page into logical units called segments Segments may contain one or more relations but no relation rn y span a segment Tuples from two or more relations may occur on the same Each tuple is page tagged with the identification of the relation to which it belongs access OPTIMIZER performs Finally the It first determines the path selection evaluation order among the query blocks in the statement Then for each query block are relations in the FROM list the processed If there is more than one a block permutations of the relation in join order and of the method of joining are The access paths that minimize evaluated are chosen from a total cost for the block alternate path This tree of choices solution is represented minimum cost by a structural modification of the parse tree plan in the The result is an execution Access Specification Language ASLI lo The primary way of accessing tuples in a relation is via an RSS scan A scan returns a tuple at a time along a given access path
View Full Document
Unlocking...