6.897: Advanced Data Structures, Spring 2003
Project Presentations — May 7, 12, and 14, 2003
Prof. Erik Demaine
Scribes: Ray Jones and Mihai Pătraşcu

1 Classical Data Structures

1.1 Partial sums data structures — Mihai Pătraşcu

The problem is to maintain an array A of size n, containing b-bit integers (b ≥ lg n), subject to the following operations:

update(k, Δ) replaces A[k] ← A[k] + Δ. Here Δ must be a δ-bit integer, where δ ≤ b is a parameter of the problem.

sum(k) returns the partial sum A[1] + A[2] + · · · + A[k].

select(σ) returns the index i such that sum(i − 1) < σ ≤ sum(i).

Several new results were presented. First, an upper bound of Θ(1 + lg n / lg(b/δ)) was obtained for the RAM model. Second, a matching lower bound on the cell-probe complexity of update and sum was derived. Finally, a tight lower bound of Ω(lg n) was obtained for the group model of computation.

Open problems include proving a tight bound on the complexity of sequences of update and select (without the sum operation), and analyzing the off-line problem.

Update: This work will appear in the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, 2004.
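
For context, the classical point of comparison for these bounds is the O(lg n) solution that stores partial sums in a balanced tree, equivalently a Fenwick (binary indexed) tree. The Python sketch below is a minimal illustration of that baseline, supporting all three operations; it is not the Θ(1 + lg n / lg(b/δ)) structure from the project, and the class and method names are illustrative.

    class PartialSums:
        """Fenwick (binary indexed) tree: O(lg n) update, sum, and select."""

        def __init__(self, n):
            self.n = n
            self.tree = [0] * (n + 1)      # 1-indexed internal array

        def update(self, k, delta):
            """A[k] <- A[k] + delta."""
            while k <= self.n:
                self.tree[k] += delta
                k += k & (-k)              # jump to the next node covering index k

        def sum(self, k):
            """Return A[1] + ... + A[k]."""
            total = 0
            while k > 0:
                total += self.tree[k]
                k -= k & (-k)              # strip the lowest set bit
            return total

        def select(self, sigma):
            """Return the index i with sum(i - 1) < sigma <= sum(i).
            Assumes all A[i] are non-negative (monotone prefix sums)
            and 1 <= sigma <= sum(n)."""
            pos = 0
            bit = 1 << self.n.bit_length()
            while bit > 0:
                nxt = pos + bit
                if nxt <= self.n and self.tree[nxt] < sigma:
                    sigma -= self.tree[nxt]
                    pos = nxt
                bit >>= 1
            return pos + 1

All three operations walk one root-to-leaf path of an implicit tree of depth lg n, which is exactly the Θ(lg n) behavior that the Θ(1 + lg n / lg(b/δ)) upper bound improves on whenever b/δ is large.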

1.2 Nearest neighbor data structures — Ray Jones, Mihai Pătraşcu

This project investigated the predecessor problem in an algebraic model of computation with random access. It is known that interpolation search achieves an expected O(lg lg n) bound if elements are drawn independently from a uniform distribution. Interpolation search trees attain this bound for the dynamic problem, but impose additional restrictions, such as deletes being uniformly random. It is questionable whether such ideal distributions appear in practice. It would be desirable to obtain good bounds in terms of a more measurable parameter; one such parameter is the spread factor D, defined as the ratio between the maximum and minimum difference between consecutive elements.

The main result of this project is an O(lg D · lg lg n) bound for insert, delete, and predecessor queries. The key idea is to generate random samples from a distribution defined in terms of the input values. These samples are stored in an interpolation search tree; regular binary search trees hold the values between consecutive samples. If the number of samples is chosen appropriately, the binary search trees have polylogarithmic size with high probability.

It would be interesting to analyze the performance of interpolation search in terms of D. It should also be investigated whether regular interpolation search can be used instead of interpolation search trees.

Update: This work (with an improved bound of O(lg D)) will appear in the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, 2004.

1.3 Faster hashing — Vladimir Kiriansky

Hashing is one of the oldest and most important data-structural problems. The collision-chaining technique yields good (optimal) average-case performance, and represents a good reference point for evaluating alternative implementations. In many contexts, however, we want constant worst-case running times for lookups. This is achieved in the static case by the classical Fredman-Komlós-Szemerédi perfect hashing scheme, which was later generalized to the dynamic case by Dietzfelbinger et al. Unfortunately, perfect hashing has rather large constants hidden by the asymptotic analysis, especially in the space consumption.

The cuckoo hashing of Pagh and Rodler is another scheme that achieves constant worst-case bounds for lookups; in fact, the lookup routine makes only two memory probes. The space consumption is much lower, roughly 2n. It can be reduced even further by d-ary cuckoo hashing, but at the price of increasing the number of memory probes needed during a lookup.

However, experimental evidence suggests that cache performance is more important than the number of memory probes. In this context, it was proposed that every element in the hash table be replaced by a group of k elements that fits in a cache line. This reduces the number of misses in the primary search location, and does not increase the number of cache lines that must be loaded. Experiments found this scheme to be effective. An interesting open problem is to give some theoretical results along these lines.
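
As a concrete point of reference for the discussion above, the following is a minimal Python sketch of standard (binary) cuckoo hashing: every key has two possible cells, a lookup probes at most those two cells, and an insertion evicts residents along a bounded chain, rebuilding the table if the chain grows too long. The table size, seeding scheme, and eviction limit are illustrative choices rather than parameters from the project; the cache-conscious variant proposed above would replace each single cell with a bucket of k keys occupying one cache line.

    import random

    class CuckooHash:
        """Cuckoo hashing (Pagh-Rodler style): lookups make at most two probes."""

        def __init__(self, capacity=16):
            self.capacity = capacity
            self.table = [None] * capacity
            self._new_seeds()

        def _new_seeds(self):
            self.seed1 = random.randrange(1 << 30)
            self.seed2 = random.randrange(1 << 30)

        def _h1(self, key):
            return hash((self.seed1, key)) % self.capacity

        def _h2(self, key):
            return hash((self.seed2, key)) % self.capacity

        def contains(self, key):
            # Worst-case constant time: at most two memory probes.
            return self.table[self._h1(key)] == key or self.table[self._h2(key)] == key

        def insert(self, key):
            if self.contains(key):
                return
            cur = key
            pos = self._h1(cur)
            for _ in range(32):                              # bound the eviction chain
                if self.table[pos] is None:
                    self.table[pos] = cur
                    return
                cur, self.table[pos] = self.table[pos], cur  # evict the resident
                # Send the evicted key to the other of its two possible cells.
                a, b = self._h1(cur), self._h2(cur)
                pos = b if pos == a else a
            self._rehash(cur)                                # likely cycle: rebuild

        def _rehash(self, pending):
            keys = [k for k in self.table if k is not None] + [pending]
            self.capacity *= 2                               # also grow, for simplicity
            self.table = [None] * self.capacity
            self._new_seeds()
            for k in keys:
                self.insert(k)

In this form each probe touches a single cell; in the cache-line variant, contains would scan the k-element bucket at each of the two locations, so a lookup still loads at most two cache lines.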

1.4 Offline linear search — Ilya Baran, Denis Cebikins, Lev Teytelman

Self-organizing linear search is a classic computer-science problem; several data structures for the online version were discussed in class. The offline version of the problem has also been considered. It is known to be NP-hard, and a 1.6-approximation algorithm can be obtained by derandomizing the COMB algorithm, which is the best known algorithm for the online problem. This project attempted to give a better approximation algorithm.

It was noted that, in the offline case, there exists an optimal solution using no free swaps. Projective algorithms were discussed, and it was remarked that the projective optimum only gives a good approximation of the real optimum for lists of 3 elements. A new algorithm for the offline case was presented; it was shown to be always at least as good as move-to-front, but unfortunately it only gives a factor-2 approximation in the worst case.

The main open problem is to find an approximation algorithm that improves on the 1.6 bound.
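
To make the comparison against move-to-front concrete, the sketch below computes the cost of serving a request sequence with move-to-front under the standard list-update cost model (an access to the item at position i costs i, and the accessed item may then be moved toward the front for free). This is only the baseline that the presented offline algorithm was shown never to exceed; the function name and example sequence are illustrative.

    def move_to_front_cost(initial_list, requests):
        """Serve the requests with move-to-front; return the total access cost."""
        lst = list(initial_list)
        total = 0
        for x in requests:
            i = lst.index(x)      # 0-based position of the requested item
            total += i + 1        # pay its 1-based depth in the list
            lst.pop(i)            # then move the accessed item to the front
            lst.insert(0, x)
        return total

    # Example usage on a short request sequence.
    print(move_to_front_cost(['a', 'b', 'c'], ['c', 'c', 'b', 'a', 'c']))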

2 Text Indexing Data Structures

2.1 String data structures — Alexandr Andoni and Cristian Cadar

While suffix trees give an optimal solution for static text indexing, no such solution is known for the case when characters can be inserted into and deleted from the text dynamically. Three solutions for this problem were proposed, with different running times in terms of |T|, |P|, and k (the number of updates seen so far):

1. The first solution maintains a static suffix tree and uses exhaustive search to find additional matches straddling update points. This gives O(lg k) updates and O(k · |P|) searches.

2. The second solution achieves O(|T|^{2/3}) updates and O(|P|^2) searches. This is done by maintaining a tree of prefixes and a tree of suffixes, in conjunction with a table that gives matches defined by two nodes in these trees. The whole structure is rebuilt when k = Θ(n^{1/3}).

3. The third solution achieves O(|T|^{2/3}) updates and O(|P| lg k) searches, by augmenting the nodes in the suffix and prefix trees with some additional links.

2.2 Dynamic text indexing — Hans Robertson

The first part of this project concerned text indexing data structures for dynamic texts. It was shown