Generalized Search Trees for Database Systems Extended Abstract Joseph M Hellerstein University of Wisconsin Madison jmh cs berkeley edu Jeffrey F Naughton University of Wisconsin Madison naughton cs wisc edu developing domain specific search trees is problematic The effort required to implement and maintain such data structures is high As new applications need to be supported new tree structures have to be developed from scratch requiring new implementations of the usual tree facilities for search maintenance concurrency control and recovery Abstract This paperintroducesthe GeneralizedSearchTree GiST an index structuresupporting an extensible setof queriesand datatypes The GiST allows new data types to be indexed in a manner supporting queries natural to the types this is in contrast to previous work on tree extensibility which only supportedthe traditional setof equality and range predicates In a single data structure the GiST provides all the basic searchtree logic required by a databasesystem thereby unifying disparatestructuressuch asB treesand R treesin a single pieceof code and opening the application of searchtreesto general extensibility To illustrate the flexibility of the GiST we provide simple method implementationsthat allow it to behavelike a B tree an R tree and an RD tree a new index for data with set valuedattributes We also presenta preliminary performanceanalysisof RD trees which leads to discussionon the nature of tree indices and how they behavefor various datasets 1 Introduction An efficient implementation of search trees is crucial for any database system In traditional relational systems B trees Corn791 were sufficient for the sorts of queries posed on the usual set of alphanumeric data types Today database systems are increasingly being deployed to support new applications such as geographic information systems multimedia systems CAD tools document libraries sequence databases fingerprint identification systems biochemical databases etc To support the growing set of applications search trees must be extended for maximum flexibility This requirement has motivated two major research approaches in extending search tree technology 1 Specialized Search Trees A large variety of search trees has been developed to solve specific problems Among the best known of these trees are spatial search trees such as R trees Gut84 While some of this work has had significant impact in particular domains the approach of HellersteinandNaughtonweresupported by NSFgrantIRI 9157357 Permission to copy withoutfee all or part of this material is granted provided that the copies are not made or distributed for direct commercial udvantage the VLDB copyright notice and the title of the publication and its date appear and notice is given that copying is by permission of the Very Lurge Data Base Endowment To copy otherwise or to republish requires a fee and or special permission from the Endowment Proceedings of the 21st VLDB Conference Zurich Switzerland 1995 562 Avi Pfeffer University of California Berkeley avi cs berkeley edu 2 Search Trees For Extensible Data Types As an alternative to developing new data structures existing data structures such as B trees and R trees can be made extensible in the data types they support Sto86 For example B trees can be used to index any data with a linear ordering supporting equality or linear range queries over that data While this provides extensibility in the data that can be indexed it does not extend the set of queries which can be supported by the tree Regardless of the type of data stored in a B tree the only queries that can benefit from the tree are those containing equality and linear range predicates Similarly in an R tree the only queries that can use the tree are those containing equality overlap and containment predicates This inflexibility presents significant problems for new applications since traditional queries on linear orderings and spatial location are unlikely to be apropos for new data types In this paper we present a third direction for extending search tree technology We introduce a new data structure called the Generalized Search Tree GiST which is easily extensible both in the data types it can index and in the queries it can support Extensibility of queries is particularly important since it allows new data types to be indexed in a manner that supports the queries natural to the types In addition to providing extensibility for new data types the GiST unifies previously disparate structures used for currently common data types For example both B trees and R trees can be implemented as extensions of the GiST resulting in a single code base for indexing multiple dissimilar applications The GiST is easy to configure adapting the tree for different uses only requires registering six methods with the database system which encapsulate the structure and behavior of the object class used for keys in the tree As an illustration of this flexibility we provide method implementations that allow the GiST to be used as a B tree an Rtree and an RD tree a new index for data with set valued attributes The GiST can be adapted to work like a variety of other known search tree structures e g partial sum trees WEgO k D B trees Robgl Ch trees KKD89 Exodus large objects CDG 90 hB trees LS90 V trees MCD94 TV trees LJF94 etc Implementing a new set of methods for the GiST is a significantly easiertaskthan implementing a new tree packagefrom scratch for example the POSTGRES Gro94 and SHORE CDF 94 implementationsof R trees and B trees are on the order of 3000 lines of C or C code each while our method implementationsfor the GiST are on the order of 500 lines of C code each In addition to providing an unified highly extensible data structure our generaltreatmentof searchtreesshedssomeinitial light on a more fundamentalquestion if any datasetcan be indexed with a GiST doesthe resulting treealwaysprovide efficient lookup The answerto this question is no and in our discussionwe illustrate someissuesthat can affect the efficiency of a searchtree This leads to the interesting question of how and when one can build an efficient searchtree for queriesover non standarddomains a question that can now be further explored by experimenting with the GiST 1 1 Structure of the Paper In Section 2 we illustrate and generalizethe basic nature of databasesearchtrees Section 3 introduces the Generalized SearchTree object with its structure properties and behavior In
View Full Document