CSE332: Data AbstractionsLecture 6: Dictionaries; Binary Search TreesTyler RobisonSummer 20101Where we areADTs so far:1. Stack: push, pop, isEmpty2. Queue: enqueue, dequeue, isEmpty3. Priority queue: insert, deleteMinNext:4. Dictionary: associate keys with values probably the most common, way more than priority queue Ex: Binary Search Tree, HashMap2LIFOFIFOMinThe Dictionary (a.k.a. Map, a.k.a. Associative Array) ADT Data: set of (key, value) pairs keys must be comparable (< or > or =) Primary Operations: insert(key,val): places (key,val) in map If key already used, overwrites existing entry find(key): returns val associated with key delete(key)3Comparison: Set ADT vs. Dictionary ADTThe Set ADT is like a Dictionary without any values A key is present or not (no repeats)For find, insert, delete, there is little difference In dictionary, values are “just along for the ride” So same data-structure ideas work for dictionaries and sets Java HashSet implemented using a HashMap, for instanceSet ADT may have other important operations union, intersection, is_subset notice these are operators on 2 sets4Dictionary data structuresWill spend the next week or two looking at three important dictionary data structures:1. AVL trees Binary search trees with guaranteed balancing2. B-Trees Also always balanced, but different and shallower B!=Binary; B-Trees generally have large branching factor3. Hashtables Not tree-like at allSkipping: Other balanced trees (red-black, splay)5A Modest Few UsesAny time you want to store information according to some key and be able to retrieve it efficiently Lots of programs do that! Networks: router tables Compilers: symbol tables Databases, phone directories, associating username with profile, …6Some possible data structuresWorst case for dictionary with n key/value pairsinsert find delete Unsorted linked-list Unsorted array Sorted linked list Sorted arrayWe‟ll see a Binary Search Tree (BST) probably does better…But not in the worst case unless we keep it balanced*Correction: Given our policy of „no duplicates‟, we would need to do O(n) work to check for a key‟s existence before insertion7O(1)* O(n) O(n)O(1)* O(n) O(n)O(n) O(n) O(n)O(n) O(log n) O(n)Some tree terms (review… again) A tree can be balanced or not A balanced tree with n nodes has a height of O(log n) Different tree data structures have different “balance conditions” to achieve this8ABD ECF GBalanced:n=7h=2BCDEFGAUnbalanced:n=7h=6Binary Trees Binary tree is empty or a node (with data), and with a left subtree (maybe empty) a right subtree (maybe empty) Representation:ABD ECFHGJIDataright pointerleftpointer• For a dictionary, data will include key and a valueDitched this representation for binary heaps, but it’s useful for BST9Binary Trees: Some NumbersRecall: height of a tree = longest path from root to leaf (counting # of edges)Operations tend to be a function of heightFor binary tree of height h: max # of leaves: max # of nodes: min # of leaves: min # of nodes:For n nodes, we cannot do better than O(log n) height, and we want to avoid O(n) height2h2(h+1)– 11h+110Calculating heightHow do we find the height of a tree with root r?int treeHeight(Node root) {???}11Calculating heightHow do we find the height of a tree with root r?int treeHeight(Node root) {if(root == null)return -1;return 1 + max(treeHeight(root.left),treeHeight(root.right));}Running time for tree with n nodes: O(n) – single pass over treeNote: non-recursive is painful – need your own stack of pending nodes; much easier to use recursion‟s call stack12Tree TraversalsA traversal is an order for visiting all the nodes of a tree Pre-order: root, left subtree, right subtree+*245 In-order: left subtree, root, right subtree2*4+5 Post-order: left subtree, right subtree, root24*5++*2 45Expression tree13More on traversalsvoid inOrdertraversal(Node t){if(t != null) {traverse(t.left);process(t.element);traverse(t.right);}}Sometimes order doesn‟t matter• Example: sum all elementsSometimes order matters• Example: print tree with parent above indented children (pre-order)• Example: print BST values in order (in-order)ABDECFGABD ECF G14Binary Search Tree Structural property (“binary”) each node has 2 children Order property all keys in left subtree smallerthan node‟s key all keys in right subtree largerthan node‟s key result: easy to find any given key4121062115814137 915Are these BSTs?3117184541810621158202171516Are these BSTs?3117184541810621158202171517Yep NopeFind in BST, Recursive2092155123071710Data find(Key key, Node root){if(root == null)return null;if(key < root.key)return find(key,root.left);if(key > root.key)return find(key,root.right);return root.data;}18Run-time (for worst-case)?Find in BST, Iterative2092155123071710Data find(Key key, Node root){while(root != null&& root.key != key) {if(key < root.key)root = root.left;else if(key > root.key)root = root.right;}if(root == null)return null;return root.data;}19For iteratively calculating height & doing traversals, we needed a stack. Why do we not need one here?Other “finding operations” Find minimum node Find maximum node Find predecessor Find successor209215512307171020Insert in BST20921551230717insert(13)insert(8)insert(31)108 311321How do we insert k elements to get a completely unbalanced tree?How do we insert k elements to get a balanced tree?Lazy DeletionA general technique for making delete as fast as find: Instead of actually removing the item just mark it deleted“Uh, I‟ll do it later”Plusses: Simpler Can do removals later in batches If re-added soon thereafter, just unmark the deletionMinuses: Extra space for the “is-it-deleted” flag Data structure full of deleted nodes wastes space Can hurt run-times of other operationsWe‟ll see lazy deletion in use later10 12 24 30 41 42 44 45 50 22(Non-lazy) Deletion in BST20921551230717Why might deletion be harder than insertion?1023Deletion Removing an item disrupts the tree structure Basic idea: find the node to be removed, then “fix” the tree so that it is still a binary search tree Three cases: node has no children (leaf) node
View Full Document