UH COSC 3480 - B+-Trees and Hashing Techniques for Storage and Index Structures

Unformatted text preview:

B+-Trees and Hashing Techniques for Storage and Index StructuresAlternative File OrganizationsPhysical Database Design for Relational DatabasesFocus of the Current DiscussionExample of an Internal Schema for the Library ExampleExample: Stored DatabaseIndex ClassificationClustered vs. Unclustered IndexIndex Classification (Contd.)Slide 10Introduction Indexing TechniquesStatic HashingStatic Hashing (Contd.)Range SearchesB+ Tree: The Most Widely Used IndexExample B+ Tree (order p=5, m=4)B+ Trees in PracticeInserting a Data Entry into a B+ TreeInserting 4* into Example B+ TreeExample B+ Tree After Inserting 4*Deleting a Data Entry from a B+ TreeExample Tree (after inserting 4*, and deleting 19* and 20*) before deleting 24... And Then Deleting 24*Example of Non-leaf Re-distributionAfter Re-distributionClarifications B+ TreeWhy is the minimum number of pointers in an intermediate node (p+1)/2 and not p/2 + 1??What order p and leaf entry maximum m should I chose?Choosing p and m (continued)Coping with Duplicate Keys in B+ TreesSummary B+ TreeExtendible HashingExampleInsert h(r)=20 (Causes Doubling)Points to NoteDirectory DoublingComments on Extendible HashingLinear HashingB+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick1B+-Trees and Hashing Techniques for Storage and Index StructuresCovers Chapters [[8]], 10, 11 Third EditionLast updated: October 31, 2005B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick2Alternative File OrganizationsMany alternatives exist, each ideal for some situation , and not so good in others:•Heap files: Suitable when typical access is a file scan retrieving all records.•Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed.•Hashed Files: Good for equality selections.•File is a collection of buckets. Bucket = primary page plus zero or more overflow pages.•Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick3Physical Database Designfor Relational Databases1. Select Storage Structures (determine how the particular relation is physically stored)2. Select Index Structures (to speed up certain queries)3. Select ……to minimize the runtime for a certain workload (e.g a given set of queries)B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick4Focus of the Current Discussion•Introduce popular storage structures / file organizations•Introduce popular index structures•Discuss advantages and disadvantages of the introduced storage and index structures•Obtain a clear understanding how storage structure and index structures relate to query processingB+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick5Example of an Internal Schemafor the Library ExampleINTERNAL Schema Library12 references Library. Book is stored sequentially, index on B# using hashing, index on Author using hashing. Person is stored using hashing on ssn. Check_out is stored sequentially, index on since using B+-tree.B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick6Example: Stored Database Relation Book(11, Y,W)(30, Z, B)(20, Y,W)(51, C, B)(1, C,W)11151Index on B#Block#= B# mod 10012030(101,…)Relation PersonBlock#= sss mod 1001(200,…)(500,…)W,…Index on AuthorRelationCheckoutIndex on sincestorage structureindex structureB+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick7Index Classification•Primary vs. secondary: If search key contains primary key, then called primary index.•Unique index: Search key contains a candidate key.•Clustered vs. unclustered: If order of data records is the same as, or `close to’, order of data entries, then called clustered index.•Alternative 1 implies clustered, but not vice-versa.•A file can be clustered on at most one search key.•Cost of retrieving data records through index varies greatly based on whether index is clustered or not!B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick8Clustered vs. Unclustered Index•Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.• To build clustered index, first sort the Heap file (with some free space on each page for future inserts). •Overflow pages may be needed for inserts. (Thus, order of data recs is `close to’, but not identical to, the sort order.)Index entriesData entriesdirect search for (Index File)(Data file)Data Recordsdata entriesData entriesData RecordsCLUSTEREDUNCLUSTEREDB+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick9Index Classification (Contd.)•Dense vs. Sparse: If there is at least one data entry per search key value (in some data record), then dense.•Alternative 1 always leads to dense index.•Every sparse index is clustered!•Sparse indexes are smaller; however, some useful optimizations are based on dense indexes.Ashby, 25, 3000Smith, 44, 3000AshbyCassSmith22253040444450Sparse IndexonNameData FileDense IndexonAge33Bristow, 30, 2007Basu, 33, 4003Cass, 50, 5004Tracy, 44, 5004Daniels, 22, 6003Jones, 40, 6003B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick10Index Classification (Contd.)•Composite Search Keys: Search on a combination of fields.•Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:•age=20 and sal =75•Range query: Some field value is not a constant. E.g.:•age =20; or age=20 and sal > 10•Data entries in index sorted by search key to support range queries.•Lexicographic order, or•Spatial order.sue 13 75bobcaljoe 121020801112name age sal<sal, age><age, sal> <age><sal>12,2012,1011,8013,7520,1210,1275,1380,111112121310207580Data recordssorted by nameData entries in indexsorted by <sal,age>Data entriessorted by <sal>Examples of composite keyindexes using lexicographic order.B+-Trees and Hashing, R. Ramakrishnan and J. Gehrke; extended and significantly revised by Ch. Eick11Introduction Indexing Techniques•As for any index, 3 alternatives for data entries k*:1. Data record with key value


View Full Document

UH COSC 3480 - B+-Trees and Hashing Techniques for Storage and Index Structures

Download B+-Trees and Hashing Techniques for Storage and Index Structures
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view B+-Trees and Hashing Techniques for Storage and Index Structures and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view B+-Trees and Hashing Techniques for Storage and Index Structures 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?