DOC PREVIEW
MIT 6 838 - Algorithms for Streaming Data

This preview shows page 1-2-20-21 out of 21 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

April 12, 2005 Lecture 16: Algorithms for Streaming DataAlgorithms for Streaming DataPiotr IndykApril 12, 2005 Lecture 16: Algorithms for Streaming DataStreaming Data• Problems defined over points P={p1,…,pn}• The algorithm sees p1, then p2, then p3,…• Key fact: it has limited storage– Can store only s<<n points– Can store only s<<n bits (need to assume finite precision)p1....p2....p3….p4....p5….p6….p7….April 12, 2005 Lecture 16: Algorithms for Streaming DataExample - diameterApril 12, 2005 Lecture 16: Algorithms for Streaming DataProblems• Diameter• Minimum enclosing ball•l2norm of a high-dimensional vectorApril 12, 2005 Lecture 16: Algorithms for Streaming DataDiameter in ld’∞• Assume we measure distances according to the l∞norm• What can we do ?April 12, 2005 Lecture 16: Algorithms for Streaming DataDiameter in l∞, ctd.• From previous lecture we know thatDiam∞(P)=maxi=1…d’[maxp∈Ppi -minp∈Ppi]• Can maintain max/min in constant space• Total space = O(d’)• What about l1?April 12, 2005 Lecture 16: Algorithms for Streaming DataDiameter in l1• Let f:l1d→l∞2^dbe an isometric embedding• We will maintain Diam∞(f(P))– For each point p, we compute f(p) and feed it to the previous algorithm– Return the pair p,q that maximizes ||f(p)-f(q)||∞• This gives O(2d) space for l1d• What about l2?April 12, 2005 Lecture 16: Algorithms for Streaming DataDiameter in l2• Let f:l2d→l∞d’, d’=O(1/ε)(d-1)/2, be a (1+ε)-distortion embedding• Apply the same algorithm as before• Parameters:– Space: O(1/ε)(d-1)/2April 12, 2005 Lecture 16: Algorithms for Streaming DataMinimum Enclosing Ball• Problem: given P={p1…pn}, find center oand radius r>0 such that –P ⊆ B(o,r)–ris as small as possible• Solve the problem in l∞• Generalize to l1and l2via embeddingsApril 12, 2005 Lecture 16: Algorithms for Streaming DataMEB in l∞• Let C be the hyper-rectangle defined by max/min in every dimension• Easy to see that min radius ball B(o,r) is a min size hypercube that contains C• Min radius = min side length/2• How to solve it in l2?April 12, 2005 Lecture 16: Algorithms for Streaming DataMEB in l2• Firstly, assume (1+ε)≈1• Let f:l2d→l∞d’be an “almost” isometric embedding• Algorithm:– For each point p, compute f(p)– Maintain MEB∞B’(o’,r) of f(p1)…f(pn)– Compute o such that f(o)=o’– Report B(o,r)April 12, 2005 Lecture 16: Algorithms for Streaming DataProblem• There might be NO o such that f(o)=o’• If it was the case, then we would always have MEB radius=Diameter/2, which is not true:• The problem is that f is into, not ontoApril 12, 2005 Lecture 16: Algorithms for Streaming DataThe Correct Version• Algorithm:– Maintain the min/max points f(p1)…f(p2d’) , two points per dimension– Compute MEB B(o,r) of p1…p2d’– Report B(o,r(1+ε))April 12, 2005 Lecture 16: Algorithms for Streaming DataCorrectnessMEB radius for P= Min r s.t. ∃o P ⊆ B(o,r)≈ Min r s.t. ∃o f(P) ⊆ B(f(o),r)= Min r s.t. ∃o {f(p1)…f(p2d’)} ⊆ B(f(o),r) ≈ Min r s.t. ∃o {p1…p2d’} ⊆ B(o,r)= MEB radius for {p1…p2d’}• Total error at most (1+ε)2• In reality, at most (1+ε)April 12, 2005 Lecture 16: Algorithms for Streaming DataDigression: Core Sets• In the previous slide we use the fact that in l∞, for any set P of points, there is a subset P’ of P, |P|=2d’ , such that MEB(P’)=MEB(P)•P’is called a “core-set” for the MEB of P in l∞• For more on core-sets, see the web page by Sariel Har-PeledApril 12, 2005 Lecture 16: Algorithms for Streaming DataMaintaining l2norm of a vector• Implicit vector x=(x1,…,xn)• Start with x=0• Stream: sequence of pairs (i,b) , meaningxi=xi+b• Goal: maintain (approximately) ||x||2April 12, 2005 Lecture 16: Algorithms for Streaming DataMotivation• Consider a set of web pages, stored in some order• Two pages are “similar” if they link to the same page• Note that each page is similar to itself• Want to know the number of pairs of similar web pages • Web pages stored sequentially on a diskApril 12, 2005 Lecture 16: Algorithms for Streaming DataConnection to l2norm• Let –In(i)be the # in-links to page i–Out(i)be the # out-links of page i•Out(i)is easy to compute, In(i) is not• We want to compute½*∑i In(i) (In(i)+1) = ½ [∑i In(i)2+ ∑i In(i)]• Every time we see link to i: In(i):=In(i)+1April 12, 2005 Lecture 16: Algorithms for Streaming DataApproximate Algorithm• Algorithm:– Computes a (1+ε)-approximation to ||x||2with probability 1-P–Stores O(log(1/P)/ε2) numbersApril 12, 2005 Lecture 16: Algorithms for Streaming DataAlgorithm• From JL lemma, it suffices to maintain Axfor “random” A, since ||Ax|| ≈ ||x||• Assume –we have Ax– Need to compute Ay, where y=x except for yi=xi+b• Use linearity: Ay = A(y-x)+Ax = A(bei)+Ax = b ai+ AxApril 12, 2005 Lecture 16: Algorithms for Streaming DataPseudo-randomness• In practice: use A[i,j]=Normal( RND(i,j) )• In theory: one can use bounded space random generators to generate A using only O( log n * log(1/P)/ε2) random


View Full Document

MIT 6 838 - Algorithms for Streaming Data

Download Algorithms for Streaming Data
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Algorithms for Streaming Data and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Algorithms for Streaming Data 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?