BU CS 565 - Database-friendly random projections

Database-friendly random projections: Johnson–Lindenstrauss with binary coins

Dimitris Achlioptas
Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA

Journal of Computer and System Sciences 66 (2003) 671–687 (http://www.elsevier.com/locate/jcss). Received 28 August 2001; revised 19 July 2002.

Contents: Introduction (Johnson and Lindenstrauss [9]; Our contribution; Projecting onto random lines; Randomization; Derandomization; Previous work; Frankl and Maehara [6]; Some intuition) · Preliminaries and the spherically symmetric case (Preliminaries; The spherically symmetric case) · Tail bounds · Moment bounds · Discussion (Database-friendliness; Further work) · Acknowledgements · References

Abstract

A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space, where k is logarithmic in n and independent of d, so that all pairwise distances are maintained within an arbitrarily small factor. All known constructions of such embeddings involve projecting the n points onto a spherically random k-dimensional hyperplane through the origin. We give two constructions of such embeddings with the property that all elements of the projection matrix belong in {−1, 0, +1}. Such constructions are particularly well suited for database environments, as the computation of the embedding reduces to evaluating a single aggregate over k random partitions of the attributes.

© 2003 Elsevier Science (USA). All rights reserved.

1. Introduction

Consider projecting the points of your favorite sculpture first onto the plane and then onto a single line. The result amply demonstrates the power of dimensionality.

In general, given a high-dimensional pointset it is natural to ask if it could be embedded into a lower-dimensional space without suffering great distortion. In this paper, we consider this question for finite sets of points in Euclidean space.
It will be convenient to think of n points in R^d as an n × d matrix A, each point represented as a row (vector). Given such a matrix representation, one of the most commonly used embeddings is the one suggested by the singular value decomposition of A. That is, in order to embed the n points into R^k we project them onto the k-dimensional space spanned by the singular vectors corresponding to the k largest singular values of A. If one rewrites the result of this projection as a (rank k) n × d matrix A_k, we are guaranteed that any other k-dimensional pointset (represented as an n × d matrix D) satisfies

  ‖A − A_k‖_F ≤ ‖A − D‖_F,

where, for any matrix Q, ‖Q‖_F² = Σ_{i,j} Q_{i,j}². To interpret this result, observe that if moving a point by z takes energy proportional to z², then A_k represents the k-dimensional configuration reachable from A by expending the least energy.

In fact, A_k is an optimal rank-k approximation of A under many matrix norms. In particular, it is well known that for any rank-k matrix D and for any rotationally invariant norm,

  ‖A − A_k‖ ≤ ‖A − D‖.

At the same time, this optimality implies no guarantees regarding local properties of the resulting embedding. For example, it is not hard to devise examples where the new distance between a pair of points is arbitrarily smaller than their original distance. For a number of problems where dimensionality reduction is clearly desirable, the absence of such local guarantees can make it hard to exploit embeddings algorithmically.

In a seminal paper, Linial et al. [12] were the first to consider algorithmic applications of embeddings that respect local properties. By now, embeddings of this type have become an important tool in algorithmic design. A real gem in this area has been the following result of Johnson and Lindenstrauss [9].

Lemma 1.1 (Johnson and Lindenstrauss [9]).
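The SVD-based embedding described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper: the shapes, the seed, and the random rank-k comparison matrix D are arbitrary choices made to exercise the Frobenius-optimality claim.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 50, 5

A = rng.standard_normal((n, d))  # n points in R^d, one per row

# Truncated SVD: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k, still n x d

# Frobenius optimality (Eckart-Young): A_k is at least as close to A
# as any other rank-k matrix, e.g. a random one of the same shape.
D = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # random rank <= k
assert np.linalg.norm(A - A_k) <= np.linalg.norm(A - D)

# The k-dimensional coordinates themselves are the projections of the
# rows of A onto the top-k right singular vectors.
coords = A @ Vt[:k, :].T   # n x k
```

Note that the optimality holds for the pointset as a whole; as the text goes on to explain, it says nothing about any individual pairwise distance.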
Given ε > 0 and an integer n, let k be a positive integer such that k ≥ k₀ = O(ε⁻² log n). For every set P of n points in R^d there exists a map f : R^d → R^k such that for all u, v ∈ P,

  (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖².

We will refer to embeddings providing a guarantee akin to that of Lemma 1.1 as JL-embeddings. In the last few years, such embeddings have been used in solving a variety of problems. The idea is as follows. By providing a low-dimensional representation of the data, JL-embeddings speed up certain algorithms dramatically, in particular algorithms whose run-time depends exponentially on the dimension of the working space. (For a number of practical problems the best-known algorithms indeed have such behavior.) At the same time, the provided guarantee regarding pairwise distances often allows one to establish that the solution found by working in the low-dimensional space is a good approximation to the solution in the original space. We give a few examples below.

Papadimitriou et al. [13] proved that embedding the points of A in a low-dimensional space can significantly speed up the computation of a low-rank approximation to A, without significantly affecting its quality. In [8], Indyk and Motwani showed that JL-embeddings are useful in solving the ε-approximate nearest-neighbor problem, where (after some preprocessing of the pointset P) one is to answer queries of the following type: "Given an arbitrary point x, find a point y ∈ P such that for every point z ∈ P, ‖x − z‖ ≥ (1 − ε)‖x − y‖." In a different vein, Schulman [14] used JL-embeddings as part of an approximation algorithm for the version of clustering where we seek to minimize the sum of the squares of intracluster distances. Recently, Indyk [7] showed that JL-embeddings can also be used in the context of "data-stream" computation, where one has limited memory and is allowed only a single pass over the data (stream).
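The guarantee of Lemma 1.1 is easy to probe empirically. The sketch below uses a spherically random (Gaussian) projection matrix, the standard randomized construction; the values of n, d, and k are illustrative choices, not the constant k₀ from the lemma, and the final bound on the distortion is a loose empirical check rather than the lemma's guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 1000, 300

P = rng.standard_normal((n, d))            # n points in R^d

# Random projection: f(u) = (1/sqrt(k)) * R u, with R Gaussian.
R = rng.standard_normal((k, d))
f = P @ R.T / np.sqrt(k)                   # the same n points in R^k

# Largest relative distortion of squared pairwise distances,
# i.e. the smallest eps for which the JL guarantee holds here.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((P[i] - P[j]) ** 2)
        proj = np.sum((f[i] - f[j]) ** 2)
        worst = max(worst, abs(proj / orig - 1))

print(f"max distortion eps ~ {worst:.3f}")
```

The point of the experiment: k depends only on n and ε, so the same k works no matter how large d is.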
1.1. Our contribution

Over the years, the probabilistic method has allowed the original proof of Johnson and Lindenstrauss to be greatly simplified and sharpened, while at the same time giving conceptually simple randomized algorithms for constructing the embedding [5,6,8]. Roughly speaking, all such algorithms project the input points onto a spherically random hyperplane through the origin. While this is conceptually simple, in practical terms it amounts to multiplying the input matrix A with a
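The paper's remedy, announced in the abstract, is to replace that dense random matrix with one whose entries lie in {−1, 0, +1}. The preview cuts off before the formal statement, so the following is a hedged sketch of the two constructions this paper is known for (±1 coin flips, and a two-thirds-sparse ternary variant), not a transcription of its theorem:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 1000, 300

# Construction 1: each entry is +1 or -1, with probability 1/2 each.
R1 = rng.choice([-1.0, 1.0], size=(k, d))

# Construction 2: each entry is sqrt(3) times +1 (prob 1/6),
# 0 (prob 2/3), or -1 (prob 1/6). Two thirds of the entries vanish,
# so only a third of the attributes touch each projected coordinate.
R2 = np.sqrt(3) * rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1/6, 2/3, 1/6])

# Embedding as before: f(u) = (1/sqrt(k)) * R u.
u = rng.standard_normal(d)
fu = R2 @ u / np.sqrt(k)

# Each coordinate of fu is a signed sum over a random subset of
# attributes -- a single aggregate per random partition, which is
# what makes the scheme "database-friendly".
```

No floating-point multiplications are needed to form R u: entries of ±1 (or 0) reduce the matrix product to additions and subtractions, which is the practical point of the title.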

