DOC PREVIEW
CORNELL CS 211 - B-Trees

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

B-TreesIntroductionA B-tree is a specialized multiway tree designed especially for use on disk. In a B-tree each node may contain alarge number of keys. The number of subtrees of each node, then, may also be large. A B-tree is designed tobranch out in this large number of directions and to contain a lot of keys in each node so that the height of thetree is relatively small. This means that only a small number of nodes must be read from disk to retrieve an item.The goal is to get fast access to the data, and with disk drives this means reading a very small number ofrecords. Note that a large node size (with lots of keys in the node) also fits with the fact that with a disk driveone can usually read a fairly large amount of data at once (perhaps 1024 bytes). DefinitionsA multiway tree of order m is an ordered tree where each node has at most m children. For each node, if k is theactual number of childen in the node, then k - 1 is the number of keys in the node. If the keys and subtrees arearranged in the fashion of a search tree, then this is called a multiway search tree of order m. For example, thefollowing is a multiway search tree of order 4. Note that the first row in each node shows the keys, while thesecond row shows the pointers to the child nodes. Of course, in any useful application there would be a recordof data associated with each key, so that the first row in each node might be an array of records where eachrecord contains a key and its associated data. Another approach is to have the first row of each node contain anarray of records where each record contains a key and a record number for the associated data record, which isfound in another file. This last method is often used when the data records are large. The example software willuse the first method. What does it mean to say that the keys and subtrees are "arranged in the fashion of a search tree"? Then a multiway search tree of order 4 has to fulfill the following conditions related to the ordering of the keys: The keys in each node are in ascending order. At every given node (call it Node) the following is true: The subtree starting at record Node.Branch[0] has only keys that are less than Node.Key[0]. The subtree starting at record Node.Branch[1] has only keys that are greater than Node.Key[0] and1 of 10B-Treesat the same time less than Node.Key[1]. The subtree starting at record Node.Branch[2] has only keys that are greater than Node.Key[1] andat the same time less than Node.Key[2]. The subtree starting at record Node.Branch[3] has only keys that are greater than Node.Key[2]. Note that if less than the full number of keys are in the Node, these 4 conditions are truncated so that theyspeak of the appropriate number of keys and branches. This generalizes in the obvious way to multiway search trees with other orders. A B-tree of order m is a multiway search tree of order m such that: All leaves are on the bottom level. All internal nodes (except the root node) have at least ceil(m / 2) (nonempty) children. The root node can have as few as 2 children if it is an internal node, and can obviously have no children ifthe root node is a leaf (that is, the whole tree consists only of the root node). Each leaf node must contain at least ceil(m / 2) - 1 keys. Note that ceil(x) is the so-called ceiling function. It's value is the smallest integer that is greater than or equal tox. Thus ceil(3) = 3, ceil(3.35) = 4, ceil(1.98) = 2, ceil(5.01) = 6, ceil(7) = 7, etc. A B-tree is a fairly well-balanced tree by virtue of the fact that all leaf nodes must be at the bottom. Condition(2) tries to keep the tree fairly bushy by insisting that each node have at least half the maximum number ofchildren. This causes the tree to "fan out" so that the path from root to leaf is very short even in a tree thatcontains a lot of data. Example B-TreeThe following is an example of a B-tree of order 5. This means that (other that the root node) all internal nodeshave at least ceil(5 / 2) = ceil(2.5) = 3 children (and hence at least 2 keys). Of course, the maximum number ofchildren that a node can have is 5 (so that 4 is the maximum number of keys). According to condition 4, eachleaf node must contain at least 2 keys. In practice B-trees usually have orders a lot bigger than 5. Operations on a B-Tree2 of 10B-TreesQuestion: How would you search in the above tree to look up S? How about J? How would you do a sort-of"in-order" traversal, that is, a traversal that would produce the letters in ascending order? (One would only dosuch a traversal on rare occasion as it would require a large amount of disk activity and thus be very slow!) Inserting a New ItemThe insertion algorithm proceeds as follows: When inserting an item, first do a search for it in the B-tree. If theitem is not already in the B-tree, this unsuccessful search will end at a leaf. If there is room in this leaf, justinsert the new item here. Note that this may require that some existing keys be moved one to the right to makeroom for the new item. If instead this leaf node is full so that there is no room to add the new item, then thenode must be "split" with about half of the keys going into a new node to the right of this one. The median(middle) key is moved up into the parent node. (Of course, if that node has no room, then it may have to be splitas well.) Note that when adding to an internal node, not only might we have to move some keys one position tothe right, but the associated pointers have to be moved right as well. If the root node is ever split, the mediankey moves up into a new root node, thus causing the tree to increase in height by one. An example: Insert the following letters into what is originally an empty B-tree of order 5: A G F B K D H M JE S I R X C L N T U P. Order 5 means that a node can have a maximum of 5 children and 4 keys. All nodesother than the root must have a minimum of 2 keys. The first 4 letters get inserted into the same node, resultingin this picture: When we try to insert the K, we find no room in this node, so we split it into 2 nodes, moving the median item Fup into a new root node. Note that in practice we just leave the A and B in the current node and place the G andK into a new node to the right of the old one. Inserting D, H, and M proceeds without requiring any splits: 3 of …


View Full Document

CORNELL CS 211 - B-Trees

Documents in this Course
Hashing

Hashing

3 pages

Load more
Download B-Trees
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view B-Trees and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view B-Trees 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?