New version page

Notes

Upgrade to remove ads

This preview shows page 1-2 out of 5 pages.

Save
View Full Document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Upgrade to remove ads
Unformatted text preview:

HashingReview:• Goal: map s items from size m universe to table of size n• some items get mapped to same place: “collision”• Problem: any function has bad set mapping m/n items to same bucket• Solution: build family of functions, choose one that works wellHash Families:• Random function has good behavior, but hard to compute efficiently• Goal: O(1) access time• So can only look at constant number of cells.• Each holds value in range 1, . . . , m (log m bits)• So, fixed number of cells can only distinguish poly(m) functions• This bounds size of hash family we can choose fromRecall random function analysis:• set S of s items– what is expected time for i access?– Cij= 1 if i, j collide– Time to find i isPjCij– expected value (s − 1)/n ≤ 1 for s ≤ n (and optimal for s)2-universal family:• how much independence was used above? pairwise (search item versus each other item)• so: OK if items land pairwise independent• pick p in range m, . . . , 2m (not random)• pick random a, b• map x to (ax + b mod p) mod n– pairwise independent, uniform before mod m– So pairwise independent, near-uniform after mod m• argument above holds: O(1) expected search time.1• represent with two O(log m)-bit integers: hash family of poly size.• em max load?– expected load in a bin is 1– so O(√n) with prob. 1-1/n (chebyshev).– this bounds expected max-load– some item may have bad load, but unlikely to be the requested oneperfect hash families• perfect hash function: no collisions• for any S of s ≤ n, perfect h in family• eg, set of all functions• but hash choice in table: mO(1)size family.• exists iff m = 2Ω(n)(probabilistic method) (hard computationally)– random function. Pr(perfect)= n!/nn– So take nn/n! ≈ enfunctions. Pr(all bad)= 1/e– Number of subsets: at most mn– So take en· ln mn= nenln m functions. Pr(all bad)≤ 1/mn– So with nonzero probability, no set has all bad functions (union)– number of functions: nenln m = mO(1)if m = 2Ω(n)• Too bad: only fit sets of log m items• also, hard computationallyAlternative try: use more space:• How big can s be for random s to n without collisions?– Expected number of collisions is E[PCij] =s2(1/n) ≈ s2/2n– So s =√n works with prob. 1/2• Is this best possible?– Birthday problem: (1 −1/n) ···(1 − s/n) ≈ e−s2/2n– So, when s =√n has Ω(1) chance of collision– 23 for birthdaysTwo level hashing solves problem2• Hash s items into O(s) space• Build quadratic size hash table on contents of each bucket• boundPb2k=Pi[i ∈ bk] =PCi+ Cij= O(s)• expected O(s).• So try till get• Then build collision-free quadratic tables inside• Try till get• Polynomial time in s, Las-vegas algorithm• Easy: 6s cells• Hard: s + o(s) cells (bit fiddling)Derandomization• Probability 1/2 top-level function works• Only m2top-level functions• Try them all!• Polynomial in m, deterministic algorithmTreapsDictionaries for ordered sets• New Operations.– enumerate in order– successor-of, predecessor-of (even if not in set)– join(S, k, T ), split, paste(S, T )Binary tree.• child and parent pointers• endogenous: leaf nodes empty.• balanced if depth O(log n)• average case.• worst case3Tree balancing• rotations• implementing operations.• red/black, AVL• splay trees.– drawbacks in geometry:– auxiliary structure on nodes in subtree– rebuild on rotationReturning to average case:• Assign random “arrival orders” to keys• Build tree as if arrived in that order• Average case applies• No rotations on searchesChoosing priorities• define arrival by random priorities• assume continuous distribution, fix.• eg, use 2 log n bits, w.h.p. no collisionsTreaps.• tree has keys in heap order of priorities• unique tree given priorities—follows from insertion order• implement insert/delete etc.• rotations to maintain heap propertyDepth d(x) analysis• Tree is trace of a quicksort• We proved O(log n) w.h.p.• for x rank k, E[d(x)] = Hk+ Hn−k+1− 1• S−= {y ∈ S | y ≤ x}• Qx= ancestors of x4• Show E[Q−x] = Hk.• to show: y ∈ Q−xiff inserted before all z, y < z ≤ x.• deduce: item j away has prob 1/j. Add.• Suppose y ∈ Q−x.– The inserted before x– Suppose some z between inserted before y– Then y in left subtree of z, x in right, so not ancestor– Thus, y before every z• Suppose y first– then x follows y on all comparisons (no z splits– So ends up in subtree of yRotation analysis• Insert/Delete time– define spines– equal left spine of right sub plus right spine of left sub– proof: when rotate up, on spine increments, other stays fixed.• Rxlength of right spine of left subtree• E[Rx] = 1 − 1/k if rank k• To show: y ∈ Rxiff– inserted after x– all z, y < z < x, arrive after y.– if z before y, then y goes left, so not on spine• deduce: if r elts between, r! of (r + 2)! permutations work.• So probability 1/r2.• ExpectationP1/(1 · 2) + 1/(2 · 3) + ··· = 1 − 1/k• subtle: do analysis only on elements inserted in real-time before x, but now assumethey arrive in random order in virtual


Download Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?