Berkeley COMPSCI 262A - Access Path Selection in a Relational Database Management System

Unformatted text preview:

Access Path Selection in a Relational Database Management System P. Griffiths Selinger M. M. Astrahan D. D. Chamberlin ‘, : It. A. Lorie .: ' T. G. Price 4: IBM Research Division, San Jose, California 95193 ABSTRACT: In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research'Laboratory. 1. Introduction System' R is an experimental database management system based on the relational model of data which has been under develop- ment at the IBM San Jose Research Laborato- ry since 1975 Cl>. The software was . developed as a research vehicle in rela- tional database, and is not generally available outside the IBM Research Divi- sion. This paper assumes familiarity with relational data model terminology as described in Codd <7> and Date <a>. The user interface in System R is the unified query, data definition, and manipulation language SQL <5>. Statements in SQL can be issued both from an on-line casual-user-or- iented terminal interface and from program- ming languages such as PL/I and COBOL. In System R a user need not know how the tuples are physically stored and what access paths are available (e.g. which columns have indexes). SQL statements do not require the user to specify anything about the access path to be used for tuple Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct couunercial ad- vantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Associa- tion for Computing Machinery. To copy otherwise, or to republish, requires a fee end/or specific permission. 01979 ACM 0-89791-001-X/79/0500-0023 $00.75 retrieval. Nor does a user specify in what order joins are to be performed. The System R optimizer .chooses both join order and an access path for each table in the SQL statement. Of the many possible choices, the optimizer chooses the one which minimizes "total access cost" for performing the entire statement. This paper will address the issues of access path selection for queries. Retrieval for data manipulation (UPDATE, DELETE) is treated similarly. Section 2 will describe the place of the optimizer in the processing of a SQL statement, and section 3 will describe the storage compo- nent access paths that are available on a single physically stored table. In section 4 the optimizer cost formulas are intro- duced for single table queries, and section 5 discusses the joining of two or more tables, and their corresponding costs. Nested queries (queries in predicates) are covered in section 6. 2. processi.Bg & B.B u statement A SQL statement is subjected to four phases of processing. Depending on 'the origin and contents of the statement., these phases may be separated by arbitrary intervals. of time. In System RI these arbitrary time intervals are transparent to the system components which process a SQL statement. These mechanisms and a descrip- tion of the processing of SQL statements from both programs and terminals are further discussed in <2>. Only an overview of those processing steps that are relevant to access path selection will be discussed here. The four phases of statement processing are-parsing, optimization. code generation. and execution. Each SQL statement is sent to , the parser. where it is checked for correct syntax. A guery block is repre- sented by a SELECT list, a FROM list, and a WHERE tree, containing, respectively the list of .items to be retrieved, the table(s) referenced, and the boolean combination of simple predicates specified by the user. A single SQL statement may have many query blocks because a predicate may have one 23operand which is itself a query. If the parser returns without any errors detected, the OPTIMIZER component is called. The OPTIMIZER accumulates the names of tables and columns referenced in the query and looks them up in the System R catalogs to verify their existence and to retrieve information about them. The catalog lookup portion of the OPTIMIZER also obtains statistics about the referenced relations, and the access paths ‘available on each of them. These will be used later in access path selection. After catalog lookup has obtained the datatype and length of each column, the OPTIMIZER rescans the SELECT-list and WHERE-tree to check for semantic errors and type compati- bility in both expressions and predicate comparisons. Finally the OPTIMIZER performs access path selection. It first determines the evaluation order among the query blocks in the statement. Then for each query block, the relations in the FROM list are processed. If there is more than one relation in a block, permutations of the join order and of the method of joining are evaluated. The access paths that minimize total cost for the block are chosen from a tree of alternate path choices. This minimum cost .solution is represented by a structural modification of the parse tree. The result is an execution plan in the Access Specification Language (ASLI <lo>. After a plan is chosen for each query block and represented in the parse tree, the CODE GENERATOR is called. The CODE GENERATOR is a table-driven program which translates ASL trees into machine language code to execute the plan chosen by the OPTIMIZER. In doing this it uses a rela- tively small number of code templates, one for each type of


View Full Document

Berkeley COMPSCI 262A - Access Path Selection in a Relational Database Management System

Download Access Path Selection in a Relational Database Management System
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Access Path Selection in a Relational Database Management System and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Access Path Selection in a Relational Database Management System 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?