LOW REDUNDANCY IN STATIC DICTIONARIES WITH CONSTANT QUERY TIME∗

RASMUS PAGH†

SIAM J. COMPUT., Vol. 31, No. 2, pp. 353–363. © 2001 Society for Industrial and Applied Mathematics.

∗ Received by the editors April 3, 2000; accepted for publication (in revised form) November 27, 2000; published electronically August 8, 2001. A preliminary version of this paper appeared at ICALP 1999 [12]. http://www.siam.org/journals/sicomp/31-2/36990.html

† BRICS (Basic Research in Computer Science, Centre of the Danish National Research Foundation), University of Aarhus, Aarhus, Denmark ([email protected]).

Abstract. A static dictionary is a data structure storing subsets of a finite universe U, answering membership queries. We show that on a unit cost RAM with word size Θ(log |U|), a static dictionary for n-element sets with constant worst case query time can be obtained using B + O(log log |U|) + o(n) bits of storage, where B = ⌈log₂ (|U| choose n)⌉ is the minimum number of bits needed to represent all n-element subsets of U.

Key words. information retrieval, dictionary, hashing, redundancy, compression

AMS subject classifications. 68P10, 68P20, 68P30

PII. S0097539700369909

1. Introduction. Consider the problem of storing a subset S of a finite set U, such that membership queries, "u ∈ S?", can be answered in worst case constant time on a unit cost RAM. We are interested only in membership queries, so we assume that U = {0, ..., m−1}. Also, we restrict attention to the case where the RAM has word size Θ(log m). In particular, elements of U can be represented within a constant number of machine words, and the usual RAM operations (including multiplication) on numbers of size m^{O(1)} can be done in constant time.

Our goal will be to solve this, the static dictionary problem, using little memory, measured in consecutive bits. We express the complexity in terms of m = |U| and n = |S|, and often consider the asymptotics when n is a function of m. Since the queries can distinguish any two subsets of U, we need at least (m choose n) different memory configurations, that is, at least B = ⌈log (m choose n)⌉ bits. (Log is base 2 throughout this paper.) We will focus on the case n ≤ m/2 and leave the symmetry implications to the reader. Using Stirling's approximation to the factorial function, one can derive the following (where e = 2.718... denotes the base of the natural logarithm):

(1.1)    B = n log(em/n) − Θ(n²/m) − O(log n).

It should be noted that using space very close to B is only possible if elements of S are stored implicitly, since explicitly representing all elements requires n log m = B + Ω(n log n) bits.

Previous work. The static dictionary is a very fundamental data structure and has been much studied. We focus on the development in space consumption for schemes with worst case constant query time. A bit vector is the simplest possible solution to the problem, but the space complexity of m bits is poor compared to B unless n ≈ m/2. By the late 1970s, known dictionaries with a space complexity of O(n) words (i.e., O(n log m) bits) either had nonconstant query time or worked only for restricted universe sizes [4, 16, 17].

The breakthrough paper of Fredman, Komlós, and Szemerédi [7] described a general constant time hashing scheme, from now on referred to as the FKS dictionary, using O(n) words. A refined solution in the paper uses B + O(n log n + log log m) bits, which is O(B) when n = m^{1−Ω(1)}. Brodnik and Munro [2] constructed the first static dictionary using O(B) bits with no restrictions on m and n. They later improved the bound to B + O(B/log log log m) bits [3].

No nontrivial space lower bound is known in a general model of computation. However, various restrictions on the data structure and the query algorithm have been successfully studied.
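To make the bound B and approximation (1.1) concrete, the following short Python check (not from the paper; function names are illustrative) compares the exact value B = ⌈log₂ (m choose n)⌉ with the leading term n log₂(em/n) of (1.1), and with the n log₂ m bits required to list the elements of S explicitly:

```python
import math

def exact_B(m: int, n: int) -> int:
    """B = ceil(log2 of (m choose n)): minimum bits for an n-subset of {0..m-1}."""
    return math.ceil(math.log2(math.comb(m, n)))

def leading_term(m: int, n: int) -> float:
    """Leading term n * log2(e*m/n) of approximation (1.1)."""
    return n * math.log2(math.e * m / n)

m, n = 1 << 20, 1 << 10          # universe of 2^20 elements, n = 1024
b = exact_B(m, n)
explicit = n * math.log2(m)      # bits to store all n elements explicitly
print(b, leading_term(m, n), explicit)
```

For these parameters the exact B falls short of the leading term only by the lower-order Θ(n²/m) + O(log n) correction of (1.1), while an explicit listing costs noticeably more bits.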
Yao [17] showed that if words of the data structure must contain elements of S, the number of words necessary for o(log n) time queries cannot be bounded by a function of n. Fich and Miltersen [6] studied a RAM with standard unit cost arithmetic operations but without division and bit operations, and showed that query time o(log n) requires Ω(m/n^ε) words of memory for any ε > 0. Miltersen [11] showed that on a RAM with bit operations but without multiplication, one needs m^ε words, for some ε > 0, when n = m^{o(1)}.

This paper. In this paper we show that it is possible to achieve space usage very close to the information theoretical minimum of B bits. The additional term of the space complexity, which we will call the redundancy, will be o(n) + O(log log m) bits. More precisely we show the following.

Theorem 1.1. The static dictionary problem with worst case constant query time can be solved using B + O(n (log log n)²/log n + log log m) bits of storage.

Theorem 1.1 improves the redundancy of Ω(min(n log log m, m/(log n)^{o(1)})) obtained by Brodnik and Munro [3] by a factor of Ω(min(n, log log m (log n)^{1−o(1)})). For example, when n = Ω(m) we obtain space B + B/(log B)^{1−o(1)} as compared to B + B/(log B)^{o(1)}. For n = Θ(log m) our space usage is B + n/(log n)^{1−o(1)} rather than B + Ω(n log n).

We will also show how to associate satellite data from a finite domain to each element of S, with nearly the same redundancy as above.

Our main observation is that one can save space by "compressing" the hash table part of data structures based on (perfect) hashing, storing in each cell not an element of S but only a quotient — information that distinguishes the element from the part of U that hashes to the cell.¹ This technique, referred to as quotienting, is presented in section 2, where a B + O(n + log log m) bit dictionary is exhibited. Section 3 outlines how to improve the dependency on n to that of Theorem 1.1. The construction uses a dictionary supporting rank and predecessor queries, described in section 4.
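The quotienting idea can be sketched in a few lines of Python. This is a toy illustration, not the paper's construction: it assumes the trivial hash function h(x) = x mod r happens to be perfect (injective) on S, whereas the real construction draws h from a perfect hash family. Each of the r cells then stores only the quotient x div r, which fits in ⌈log₂⌈m/r⌉⌉ bits instead of the ⌈log₂ m⌉ bits of a full element:

```python
def build(S, r):
    """Hash table storing quotients: cell h keeps x // r for the unique
    x in S with x % r == h. Assumes x -> x % r is injective on S."""
    table = [None] * r                   # None marks an empty cell
    for x in S:
        q, h = divmod(x, r)              # x = q*r + h
        assert table[h] is None, "h(x) = x mod r is not perfect on this S"
        table[h] = q                     # store only the quotient, not x
    return table

def member(table, x, r):
    q, h = divmod(x, r)
    return table[h] == q                 # q and h together determine x

S = {2, 10, 25}                          # residues mod 7 are 2, 3, 4: injective
t = build(S, r=7)
```

Since x = q·r + h determines x uniquely, a matching quotient never yields a false positive; with a perfect hash function into r = O(n) cells, the table costs roughly n log₂(m/n) bits, close to the leading term of B.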
Section 5 gives the details of the construction and an analysis of the redundancy. The sizes of the data structures described are not computed explicitly. Rather, indirect means are employed to determine the number of redundant bits. While direct summation and comparison with (1.1) would be possible, it is believed that the proofs given here contain more information about the "nature" of the redundancy.

Without loss of generality, we will assume that n is greater than some sufficiently large constant. This is to avoid worrying about special cases for small values of n.

2. First solution. This section presents a static dictionary with a space consumption of B + O(n + log log m) bits.