6.897: Advanced Data Structures                                   Spring 2003

Lecture 14 — Monday, April 14, 2003

Prof. Erik Demaine            Scribes: Kunal Agrawal, Vladimir Kiriansky

1 Overview

Today we will first see a file-maintenance algorithm that allows random inserts and deletes in O(lg² n) time in a sequential file taking linear space.

Then we will move on to algorithms that care about the memory hierarchy. In modern computers, memory is not flat: it has multiple levels of progressively slower and larger memory, so it is not equally costly to access all items in memory. For example, if you store a matrix in row-major order, then scanning it in row order is much faster than scanning it in column order. Our goal is to develop models that capture this behavior and to analyze why some algorithms perform so well. We will see two memory-hierarchy models that will be used later to analyze various algorithms and data structures.

The first algorithm we see is an ordered-file-maintenance algorithm, which could be implemented in a file system to support random insertions and deletions in a file. This basic algorithm will later be used as a black box for more complicated algorithms.

2 Ordered-File Maintenance [IKR81, BDFC00]

2.1 Problem Statement

Given n elements in some order, store them in that order in an array of size O(n). The following operations must be supported:

• Insert an element between two given elements.
• Delete any given element.

2.2 Naïve Solution

Store all elements in an array with no gaps, as shown in Figure 1. For insert and delete, we have to move all elements after the position where the insert or delete took place.

Cost: O(n) in the worst case for both inserts and deletes.

Figure 1: Insertion into an array

2.3 Better Solution

In the following sections we discuss a better solution for this problem, with cost O(lg² n) amortized for both inserts and deletes. It is possible to get O(lg² n) worst case [Wil92, BCD+02], but the analysis is more complex and is not covered in this lecture. It is conjectured that these bounds are the best possible with linear space; see [DSZ94] for a matching lower bound, albeit in a weak model.

2.4 Rough Idea

The idea is to keep gaps in the array so that elements can usually be inserted without shifting the entire array. These gaps should be roughly evenly distributed, and should remain evenly distributed in spite of insertions and deletions. So the rough idea of the algorithm is to ensure that, on insertion, the area around the inserted element doesn't get too crowded. If it does, we find an interval around it that has enough blank spaces in total, and redistribute the blank spaces evenly within that interval. The interval is grown exponentially, so that many inserts must occur before it gets full again, and thus we can charge the redistribution to these inserts.

2.5 Tree View

For analysis purposes, we take a conceptual tree view of the array. Divide the array into chunks of Θ(lg n) slots; these chunks are the leaves of a binary tree. The height of the tree is h = lg n − lg lg n. Conceptually, each node of the tree represents the interval that is the concatenation of its children's intervals. The intervals of the leaves have size Ω(lg n), and as we walk up the tree the interval size grows exponentially.

Thus, when any interval is too full, we try to find an ancestor that is not too full (on average) and redistribute the spaces. If we reach the root of the tree and its interval is still too full, we double the size of the array and redistribute the gaps.

Figure 2: Conceptual Tree Representation

2.6 Density

The density of a node measures how full or how empty the interval represented by that node is:

Definition 1 (Density). density(node) = (number of elements present in the interval) / (number of slots in the interval).

Density is always a number between 0 (completely empty) and 1 (completely full).

2.6.1 Density Constraints

The density constraints define lower and upper bounds on the density of a node at depth d:

• lower density threshold and constraint: density ≥ 1/2 − (1/4)(d/h) = ldt(d), where ldt(d) ∈ [1/4, 1/2];
• upper density threshold and constraint: density ≤ 3/4 + (1/4)(d/h) = udt(d), where udt(d) ∈ [3/4, 1];
• density threshold interval: thresholds(node) = [ldt(depth(node)), udt(depth(node))].

Figure 3: Density Constraints

Notice that the constraints get tighter as we go higher up the tree. In particular, at depth 0 (the root), 1/2 ≤ density ≤ 3/4, and at depth h (the leaves), 1/4 ≤ density ≤ 1.

These constraints are not satisfied by all nodes at all times; they are violated occasionally. But whenever we see a violation, we fix it, and if a violation is never seen, it doesn't affect the algorithm.

The density constraints can be tuned to use less space. With these constraints, the array size is about twice the number of elements; this constant factor can be made arbitrarily small (1 + ε) by changing the constraints.

2.7 Algorithm

To insert a new element x between y and z, given pointers to y and z:

Insert(x, y, z):
    insert x into the same chunk as y (or z) using naïve insertion
    node ← leaf node representing that chunk
    if density(node) ∉ thresholds(node):
        while density(node) ∉ thresholds(node):
            node ← parent(node)
        rebalance node by evenly redistributing all elements in its interval

Delete(x) is exactly the same except for the first step, which is to delete the element x from the chunk containing it using naïve deletion.
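To make the procedure concrete, here is a minimal Python sketch of the structure (insertion only; Delete would be symmetric). This is an illustration rather than the lecture's implementation: the class name OrderedFile, the fixed chunk size of 4, the initial capacity, and the growth-by-doubling policy are assumptions made for this example, and it assumes stored elements are distinct so that list.index can locate them.

    import math

    class OrderedFile:
        """Sketch of an ordered file with gaps; NOT the lecture's reference code."""

        def __init__(self, capacity=32):
            self.slots = [None] * capacity     # None marks a gap
            self.chunk = 4                     # leaf size; stands in for Theta(lg n)

        def _density(self, lo, hi):
            used = sum(1 for s in self.slots[lo:hi] if s is not None)
            return used / (hi - lo)

        def _within(self, lo, hi, depth, height):
            # Depth-d thresholds from Section 2.6.1: ldt(d) and udt(d).
            d = depth / height
            return 0.5 - 0.25 * d <= self._density(lo, hi) <= 0.75 + 0.25 * d

        def _redistribute(self, lo, hi):
            # Spread the interval's elements evenly, recreating the gaps.
            items = [s for s in self.slots[lo:hi] if s is not None]
            self.slots[lo:hi] = [None] * (hi - lo)
            step = (hi - lo) / max(1, len(items))
            for k, item in enumerate(items):
                self.slots[lo + int(k * step)] = item

        def insert_after(self, y, x):
            # Insert x just after y (y=None inserts at the front): naive shift.
            i = self.slots.index(y) + 1 if y is not None else 0
            gap = i
            while gap < len(self.slots) and self.slots[gap] is not None:
                gap += 1
            if gap == len(self.slots):         # no gap to the right: grow, retry
                self.slots.extend([None] * len(self.slots))
                self._redistribute(0, len(self.slots))
                return self.insert_after(y, x)
            self.slots[i + 1:gap + 1] = self.slots[i:gap]
            self.slots[i] = x
            self._rebalance(i)

        def _rebalance(self, i):
            n = len(self.slots)
            height = max(1, int(math.log2(n // self.chunk)))
            size, depth = self.chunk, height
            lo = (i // size) * size
            if self._within(lo, lo + size, depth, height):
                return                         # leaf within threshold: done
            while size < n:
                size, depth = size * 2, depth - 1
                lo = (i // size) * size        # parent's (aligned) interval
                if self._within(lo, lo + size, depth, height):
                    self._redistribute(lo, lo + size)
                    return
            # Even the root is outside threshold: double the array and rebuild.
            self.slots.extend([None] * n)
            self._redistribute(0, len(self.slots))

For example:

    f = OrderedFile()
    f.insert_after(None, 10)
    f.insert_after(10, 20)
    f.insert_after(10, 15)                        # squeezes in between
    print([s for s in f.slots if s is not None])  # [10, 15, 20]

Note that the tree is never materialized: the interval of a depth-d node is just an aligned window of 2^(h−d) chunks, which is why the sketch only tracks (lo, size).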
2.8 Description

It is guaranteed that there is at least one gap, so inserting into the leaf always succeeds in Θ(lg n) time by the naïve shifting algorithm. Now, if the leaf is within threshold, we are done. If the leaf is outside threshold, we do not have enough gaps within this interval, so we need to find a larger interval with more gaps. When we reach a level at which there are enough gaps, i.e., at which the interval is within threshold, we redistribute the gaps evenly over that entire interval. Because parents have tighter constraints than their children, all of the children will then also have enough gaps (be within threshold). The cost of redistribution can be amortized by charging it to the inserts and deletes that brought a child node outside threshold. If we reach the root without finding any node within threshold, we rebuild the whole tree, doubling the size of the array as described above.

2.9 Analysis

The important observation is that the thresholds get weaker as we go deeper into the tree. The consequence of this observation is simple but powerful.

Whenever we rebalance a node, it is because one of its children was outside its thresholds while the node itself is within its thresholds. After rebalancing, the densities of the node's descendants all equal the density of the node itself, which satisfies the node's thresholds and is therefore at least 1/(4h) inside each child's (weaker) thresholds. Hence, if a child's interval has s slots, Ω(s/h) inserts or deletes must fall within that child before it can violate its thresholds again. Charging the O(s) cost of the next redistribution to those updates charges each update O(h) = O(lg n) at this level; since an update can be charged by at most one ancestor per level, over the O(lg n) levels this gives the claimed O(lg² n) amortized bound per insert or delete.
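As a quick numerical check of this nesting, a few lines of Python (with an arbitrary height h = 4, an assumption for the demo) print the density window at each depth; each window sits 1/(4h) inside the next deeper one on both sides, and that margin is exactly the slack the charging argument consumes:

    h = 4                                # tree height, chosen arbitrarily for the demo
    for d in range(h + 1):
        ldt = 0.5 - 0.25 * d / h         # lower density threshold at depth d
        udt = 0.75 + 0.25 * d / h        # upper density threshold at depth d
        print(f"depth {d}: {ldt:.4f} <= density <= {udt:.4f}")

    # Output:
    # depth 0: 0.5000 <= density <= 0.7500   (root: tightest window)
    # depth 1: 0.4375 <= density <= 0.8125
    # depth 2: 0.3750 <= density <= 0.8750
    # depth 3: 0.3125 <= density <= 0.9375
    # depth 4: 0.2500 <= density <= 1.0000   (leaves: loosest window)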

