Unformatted text preview:

The Ubiquitous B Tree DOUGLAS COMER Computer Sctence Department Purdue Untverstty West Lafayette Indiana 47907 B trees have become de facto a standard for file organization File indexes of users dedicated database systems and general purpose access methods have all been proposed and nnplemented using B trees This paper reviews B trees and shows why they have been so successful It discusses the major variations of the B tree especially the B tree contrasting the relatwe merits and costs of each implementatmn It illustrates a general purpose access method whmh uses a B tree Keywords and Phrases B tree B tree B tree file organization index CR Categorws 3 73 3 74 4 33 4 34 INTRODUCTION The secondary storage facilities available on large computer systems allow users to store update and recall data from large collections of information called files A computer must retrieve an item and place it in main memory before it can be processed In order to make good use of the computer resources one must organize files intelligently making the retrieval process efficient The choice of a good file organization depends on the kinds of retrieval to be performed There are two broad classes of retrieval commands which can be illustrated by the following examples Sequential From our employee file prepare a list of all employees names and addresses and Random From our employee file extract the information about employee J Smith We can imagine a filing cabinet with three drawers of folders one folder for each employee The drawers might be labeled AG H R and S Z while the folders might be labeled with the employees last names A sequential request requires the searcher to examine the entire file one folder at a time On the other hand a random request implies that the searcher guided by the labels on the drawers and folders need only extract one folder Associated with a large randomly accessed file in a computer system is an index which like the labels on the drawers and folders of the file cabinet speeds retrieval by directing the searcher to the small part of the file containing the desired item Figure 1 depicts a file and its index An index may be physically integrated with the file like the labels on employee folders or physically separate like the labels on the drawers Usually the index itself is a file If the index file is large another index may be built on top of it to speed retrieval further and so on The resulting hierarchy is similar to the employee file where the topmost index consists of labels on drawers and the next level of index consists of labels on folders Natural hierarchies like the one formed by considering last names as index entries do not always produce the best perform Permission to copy without fee all or part of this material is granted provided that the copras are not made or distributed for direct commercial advantage the ACM copyright notice and the title of the pubhcation and its date appear and notice is given that copying is by permission of the Association for Computing Machinery To copy otherwme or to republish requires a fee and or specific permission 1979 ACM 0010 4892 79 0600 0121 00 75 Computing Surveys Vol ll No 2 June 1979 122 D Comer CONTENTS INTRODUCTION OperaUons on a t de I THE BASIC B TREE Balancing Insertion Deletion 2 THE COST OF OPERATIONS Retrmva Costs Insertion and Deletmn Costs Sequentm Processing 3 B TREE VARIANTS B Trees B Trees Prefix B Trees Virtual B Trees Compression Variable Length Entries Binary B Trees 2 3 Trees and Theoreucal Results 4 B TREES 1N A MULTIUSER ENVIRONMENT Security 5 A G E N E R A L P U R P O S E ACCESS M E T H O D USING B TREES PerformanceEnhancements Tree StructuredFde Directory Other V S A M Facdltms SUMMARY ACKNOWLEDGMENTS REFERENCES A T ance when used in a computer system Usually a unique key is assigned to each item in the file and all retrieval is requested by specifying the key For example each employee might be assigned a unique employee number which would identify that employee s record Instead of labeling the drawers of the cabinet A G etc one would use ranges of employee numbers like 0001 3000 Many techniques for organizing a file and its index have been proposed Knuth KNuT73 provides a survey of the basics While no single scheme can be optimum for all applications the technique of organizing a file and its index called the B tree has become widely used The B tree is de facto the standard organization for indexes in a database system This paper intended for computer professionals who have heard of B trees and want some explanation or direction for further reading compares several variations of the B tree especially the C o m p u t i n g S u r v e y s Vol 11 N o 2 J u n e 1979 B tree showing why it has become popular It surveys the literature on B trees including recent papers not mentioned in textbooks In addition it discusses a general purpose file access method based on the Btree The starting point of our discussion is an internal storage structure called the binary search tree In particular we begin with balanced binary search trees because of their guaranteed low retrieval cost For a survey of binary search trees and other internal storage mechanisms the reader is referred to SEVE74 and NIEV74 NIEV74 also explains the graph theoretic terms tree node edge root path and leaf which will be used throughout the discussion The remainder of this Introduction presents a model of the retrieval process and outlines the file operations to be considered Section 1 presents the basic B tree as proposed by Bayer and McCreight giving the methods for inserting deleting and locating items Then for each type of operation Section 2 examines the cost and concludes that sequential processing can be expensive In many cases changes in implementation can lower the costs Section 3 shows variations of the B tree which have been developed to do so Extending the variations of B trees Section 4 reviews the problems of maintaining a B tree in a multiple user environment and outlines solutions for concurrency and security problems Finally Section 5 presents IBM s general purpose file access method which is based on the Btree Operations on a File For purposes of this paper we think of a file as a set of n records each of the form r k a in which k is called the key for the ith record and a the associated information For example the key for a record in an employee file might be a five digit employee number while the associated information might


View Full Document

UMD CMSC 818S - The Ubiquitous B-Tree

Loading Unlocking...
Login

Join to view The Ubiquitous B-Tree and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Ubiquitous B-Tree and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?