DOC PREVIEW
UMD CMSC 424 - Class Notes

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Quick Review of material covered Apr 8• B+-Tree Overview and some definitions– balanced tree– multi-level– reorganizes itself on insertion and deletion– built so each node fits on a single disk page• Examined mechanics of B+-tree Insertion and Deletion– looked at several examples• We’ll finish up B+-trees with two more concepts:– B+-tree File Organization– B-tree index filesB+-tree File Organization• B+-Tree Indices solve the problem of index file degradation. The original data file will still degrade upon a stream of insert/delete operations.• Solve data-file degradation by using a B+-tree file organization• Leaf nodes in a B+-tree file organization store records, not pointers into a separate original datafile– since records are larger than pointers, the maximum number of recrods that can be stored in a leaf node is less than the number of pointers in a non-leaf node– leaf nodes must still be maintained at least half full– insert and delete are handled in the same was as insert and delete for entries in a B+-tree indexB+-tree File Organization Example• Records are much bigger than pointers, so good space usage is important• To improve space usage, involve more sibling nodes in redistribution during splits and merges (to avoid split/merge when possible) – involving one sibling guarantees 50% space use– involving two guarantees at least 2/3 space use, etc.B-tree Index Files• B-trees are similar to B+-trees, but search-key values appear only once in the index (eliminates redundant storage of key values)– search keys in non-leaf nodes don’t appear in the leaf nodes, so an additional pointer field for each search key in a non-leaf node must be stored to point to the bucket or record for that key value– leaf nodes look like B+-tree leaf nodes:(P1, K1, P2, K2, …, Pn)– non-leaf nodes look like so:(P1, B1, K1, P2, B2, K2, …, Pn)where the Biare pointers to buckets or file records.B-tree Index File ExampleB-treeandB+-treeB-tree Index Files (cont.)• Advantages of B-tree Indices (vs. B+-trees)– May use less tree nodes than a B+-tree on the same data– Sometimes possible to find a specific key value before reaching a leaf node• Disadvantages of B-tree Indices– Only a small fraction of key values are found early– Non-leaf nodes are larger, so fanout is reduced, and B-trees may be slightly taller than B+-trees on the same data– Insertion and deletion are more complicated than on B+-trees– Implementation is more difficult than B+-trees• In general, advantages don’t outweigh disadvantagesHashing• We’ve examined Ordered Indices (design based upon sorting or ordering search key values); the other type of major indexing technique is Hashing• Underlying concept is very simple: – observation: small files don’t require indices or complicated search methods– use some clever method, based upon the search key, to split a large file into a lot of little buckets– each bucket is sufficiently small– use the same method to find the bucket for a given search keyHashing Basics– A bucket is a unit of storage containing one or more records (typically a bucket is one disk block in size)– In a hash file organization we find the bucket for a record directly from its search-key value using a hash function– A hash function is a function that maps from the set of all search-key values K to the set of all bucket addresses B– The hash function is used to locate records for access, insertion, and deletion– Records with different search-key values may be mapped to the same bucket• the entire bucket must be searched to find a record• buckets are designed to be small, so this task is usually not onerousHashed File Example– So we:• divide the set of disk blocks that make up the file into buckets• devise a hash function that maps each key value into a bucketV: set of key valuesB: number of bucketsH: hashing function H: V--> (0, 1, 2, 3, …, B-1)Example: V= 9 digit SS#; B=1000; H= key modulo 1000Hash Functions• To search/insert/delete/modify a key do:– compute H(k) to get the bucket number– search sequentially in the bucket (heap organization within eachbucket)• Choosing H: almost any function that generates “random” numbers in the range [0, B-1]– try to distribute the keys evenly into the B buckets– one rule of thumb when using MOD -- use a prime numberHash Functions (2)• Collision is when two or more key values go to the same bucket– too many collisions increases search time and degrades performance– no or few collisions means that each bucket has only one (or very few) key(s)• Worst-case hash functions map all search keys to the same bucketHash Functions (3)• Ideal hash functions are uniform– each bucket is assigned the same number of search-key values from the set of all possible values• Ideal hash functions are random– each bucket has approximately the same number of records assigned to it irrespective of the actual distribution of search-key values in the file• Finding a good hash function is not always easyExamples of Hash Functions• Given 26 buckets and a string-valued search key, consider the following possible hash functions:– Hash based upon the first letter of the string– Hash based upon the last letter of the string– Hash based upon the middle letter of the string– Hash based upon the most common letter in the string– Hash based upon the “average” letter in the string: the sum of the letters (using A=0, B=1, etc) divided by the number of letters– Hash based upon the length of the string (modulo 26)• Typical hash functions perform computation on the internal binary representation of the search key– example: searching on a string value, hash based upon the binary sum of the characters in the string, modulo the number of


View Full Document

UMD CMSC 424 - Class Notes

Documents in this Course
Lecture 2

Lecture 2

36 pages

Databases

Databases

44 pages

Load more
Download Class Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Class Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Class Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?