CS369M: Algorithms for Modern Massive Data Set Analysis          Lecture 1 - 09/23/2009

The Johnson-Lindenstrauss Lemma

Lecturer: Michael Mahoney          Scribes: Ben Newhouse and Gourab Mukherjee

*Unedited notes

1 Background and Motivation

Data can be modeled in many different ways. Two examples:

• A matrix A ∈ R^{m×n}, with m data points and n features.
• A graph G with vertices V and edges E, which can be represented as (among others):
  1. An adjacency matrix A with A_ij ∈ {0, 1}.
  2. A Laplacian matrix L = D − A, where D is the degree matrix (a diagonal matrix whose entries count the edges incident to each vertex) and A is the adjacency matrix above.
  3. A normalized Laplacian L_norm = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}.

All of the aforementioned representations are based on linear algebra. Modern data-mining algorithms take care to employ a number of techniques to ensure that the problems posed do not become intractable. We can characterize the above data representations in the context of algorithmic complexity using the following parameters: for matrices, m data points, n features, and M nonzero elements; for graphs, p = |V| vertices and q = |E| edges.

2 Taking advantage of dimensionality

In shaping our algorithms to be feasibly computable, we can examine our problem space in terms of dimensionality:

• Low dimensionality - good: computation is inexpensive.
• High dimensionality - good: we can use measure concentration and the law of large numbers (i.e., randomness).
• Medium dimensionality - hard: neither of the above advantages applies.

This raises the goal:

Goal: Given n points P = {x_1, ..., x_n} ⊂ R^d, describe P in fewer dimensions k << n, d. That is, define a function f such that for all x_i, x_j ∈ P,

    ||f(x_i) − f(x_j)||_2 ≈ ||x_i − x_j||_2    (1)

This dimensionality reduction is important in many different fields, namely:

• Latent Semantic Indexing (LSI) and Information Retrieval (IR) - in addition to reducing storage and computation needs, dimensionality reduction can also be used for clustering of related features (terms) [2].
• Finding nearest neighbors.
• Learning mixtures of Gaussians.
• Compressed sensing applications.

An important part of understanding dimensionality reduction is the Johnson-Lindenstrauss lemma, which states that any n points in high-dimensional Euclidean space can be mapped into k dimensions, where k ≥ O(log n / ε^2), without distorting the Euclidean distance between any two points by more than a factor of 1 ± ε [1].

While the remaining part of these notes pertains to proving that such a mapping exists (rather than how to find it), it is good to keep in mind that the function f that maps between dimensions must itself be efficiently computable.

3 Johnson-Lindenstrauss Lemma

Lemma. For any 0 < ε < 1 and any integer n, let k be a positive integer such that

    k ≥ 24 / (3ε^2 − 2ε^3) · log n    (2)

Then for any set A of n points in R^d there exists a map f : R^d → R^k such that for all x_i, x_j ∈ A,

    (1 − ε) ||x_i − x_j||^2 ≤ ||f(x_i) − f(x_j)||^2 ≤ (1 + ε) ||x_i − x_j||^2    (3)

Further, this map can be found in randomized polynomial time: a single random projection succeeds with probability at least 1/n, so repeating the projection O(n) times boosts the success probability to any desired constant, giving a randomized polynomial-time algorithm [1].

General proof outline: Construct a random projection onto a k-dimensional subspace. Prove that the expected squared Euclidean length of a projected vector equals its squared Euclidean length in the original space. Then prove that the projected squared length deviates from this expectation by more than the specified error factor with probability at most 2/n^2, so that the union bound of this failure probability across all pairs of points is at most 1 − 1/n.
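Before the formal proof, it may help to see the construction in code. The following is a minimal sketch, not part of the original notes: it assumes Python with NumPy, and the function name jl_project is our own. It builds the map f of the lemma as multiplication by a k × d matrix of i.i.d. N(0, 1) entries, scaled by 1/√k, with k chosen per equation (2).

    import numpy as np

    def jl_project(points, eps, seed=0):
        """Randomly project n points in R^d down to R^k.

        k follows equation (2): k >= 24 / (3 eps^2 - 2 eps^3) * log n,
        and f(x) = (1/sqrt(k)) R x with R_ij i.i.d. N(0, 1).
        """
        n, d = points.shape
        k = int(np.ceil(24.0 / (3 * eps**2 - 2 * eps**3) * np.log(n)))
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((k, d))    # R_ij ~ N(0, 1), i.i.d.
        return points @ R.T / np.sqrt(k)   # row i is f(x_i) in R^k

Note that k depends only on n and ε, not on the original dimension d, so the projection only helps when d is large relative to log n / ε^2.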
3.1 Formal Proof

Definition. Let R be a random k × d matrix with entries R_ij i.i.d. ~ N(0, 1), and let u be any fixed vector in R^d. Define v = (1/√k) R·u. Thus v ∈ R^k and v_i = (1/√k) Σ_j R_ij u_j.

Lemma. E[||v||^2] = ||u||^2.

Proof.

    E[||v||^2] = E[Σ_{i=1}^k v_i^2]                                   (break out the summation)
               = Σ_{i=1}^k E[v_i^2]                                   (linearity of expectation)
               = Σ_{i=1}^k (1/k) E[(Σ_j R_ij u_j)^2]
               = Σ_{i=1}^k (1/k) Σ_{1≤j,l≤d} u_j u_l E[R_ij R_il]     (introduce a dummy index l)
               = Σ_{i=1}^k (1/k) Σ_{1≤j,l≤d} u_j u_l δ_jl             (E[R_ij R_il] = δ_jl, where δ_jl = 1 if j = l and 0 otherwise)
               = Σ_{i=1}^k (1/k) Σ_{1≤j≤d} u_j^2
               = Σ_{1≤j≤d} u_j^2
               = ||u||^2.

Definition. Let X = (√k/||u||) v, i.e., X_i = (1/||u||) R_i^T·u for i = 1, ..., k, and let

    x = Σ_{i=1}^k X_i^2 = ||X||^2 = k ||v||^2 / ||u||^2.

Note: {v_i : i = 1, ..., k} are i.i.d. ~ N(0, ||u||^2/k), so X ~ N_k(0, I) and x follows a χ^2 distribution with k degrees of freedom.

Lemma. P[||v||^2 ≥ (1 + ε)||u||^2] ≤ n^{−2}.

Proof.

    P[||v||^2 ≥ (1 + ε)||u||^2] = P[(||u||^2/k) x ≥ (1 + ε)||u||^2]
                                = P[x ≥ (1 + ε)k]
                                = P[e^{λx} ≥ e^{λ(1+ε)k}]             (for any λ ≥ 0)

By Markov's inequality, P[Y ≥ a] ≤ E[Y]/a for a nonnegative random variable Y, so

    P[e^{λx} ≥ e^{λ(1+ε)k}] ≤ E[e^{λx}] / e^{λ(1+ε)k}
                            = Π_{i=1}^k E[e^{λX_i^2}] / e^{λ(1+ε)k}   (the X_i are i.i.d.)
                            = (E[e^{λX_1^2}] / e^{λ(1+ε)})^k
                            = ((1 − 2λ)^{−1/2} e^{−λ(1+ε)})^k         (for 0 < λ < 1/2, using the m.g.f. of the χ^2 distribution)

Setting λ = ε/(2(1 + ε)) gives 1 − 2λ = 1/(1 + ε) and λ(1 + ε) = ε/2, so the bound becomes

    ((1 + ε) e^{−ε})^{k/2}.

Using the inequality log(1 + ε) < ε − ε^2/2 + ε^3/3 together with (2),

    ((1 + ε) e^{−ε})^{k/2} = e^{(log(1+ε) − ε) k/2} ≤ e^{−(ε^2/2 − ε^3/3) k/2} ≤ e^{−2 log n} = n^{−2}.

Lemma. P[||v||^2 ≤ (1 − ε)||u||^2] ≤ n^{−2}.

Proof.

    P[||v||^2 ≤ (1 − ε)||u||^2] = P[(||u||^2/k) x ≤ (1 − ε)||u||^2]
                                = P[x ≤ (1 − ε)k]
                                = P[e^{−λx} ≥ e^{−λ(1−ε)k}]           (for any λ ≥ 0)

By Markov's inequality,

    P[e^{−λx} ≥ e^{−λ(1−ε)k}] ≤ E[e^{−λx}] / e^{−λ(1−ε)k}
                              = Π_{i=1}^k E[e^{−λX_i^2}] / e^{−λ(1−ε)k}   (the X_i are i.i.d.)
                              = (E[e^{−λX_1^2}] e^{λ(1−ε)})^k
                              = ((1 + 2λ)^{−1/2} e^{λ(1−ε)})^k        (using the m.g.f. of the χ^2 distribution)

Setting λ = ε/(2(1 − ε)) gives 1 + 2λ = 1/(1 − ε) and λ(1 − ε) = ε/2, so the bound becomes

    ((1 − ε) e^{ε})^{k/2}.

Using the inequality log(1 − ε) < −ε − ε^2/2 together with (2),

    ((1 − ε) e^{ε})^{k/2} = e^{(log(1−ε) + ε) k/2} ≤ e^{−(ε^2/2) k/2} ≤ e^{−(ε^2/2 − ε^3/3) k/2} ≤ n^{−2}.

Combining the two lemmas,

    P(||v||^2 ∉ [(1 − ε)||u||^2, (1 + ε)||u||^2]) ≤ 2/n^2.

Since u is an arbitrary d-dimensional vector, this bound holds for u = x_i − x_j, where x_i, x_j are any two points in A and f is defined as multiplication by the scaled random k × d matrix (1/√k) R. Using the Bonferroni inequality (the union bound) over all n(n − 1)/2 pairs of points, with E_i the event that pair i falls outside the desired error,

    P(∪_i E_i) ≤ Σ_i P(E_i) ≤ (n(n − 1)/2) · (2/n^2) = 1 − 1/n.

Thus f satisfies (3) for every pair simultaneously with probability at least 1/n, completing our proof.
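As a sanity check on the guarantee (3) and the union-bound argument, one can verify the distortion empirically. This is again a sketch, not part of the original notes, under the same assumptions as the earlier snippet (Python with NumPy, plus SciPy's pdist for pairwise distances; jl_project is the hypothetical helper defined above).

    import numpy as np
    from scipy.spatial.distance import pdist   # condensed pairwise Euclidean distances

    eps = 0.5
    n, d = 100, 10_000
    rng = np.random.default_rng(1)
    points = rng.standard_normal((n, d))       # n points in R^d

    projected = jl_project(points, eps)        # map to R^k with k per equation (2)

    orig_sq = pdist(points) ** 2               # ||x_i - x_j||^2 for all n(n-1)/2 pairs
    proj_sq = pdist(projected) ** 2            # ||f(x_i) - f(x_j)||^2 for the same pairs
    ratio = proj_sq / orig_sq
    print(ratio.min(), ratio.max())            # typically well inside [1 - eps, 1 + eps]

In practice the ratios concentrate well inside [1 − ε, 1 + ε]: the per-trial success probability of 1/n in the proof is a worst-case bound, and a failed draw can simply be detected and redrawn, which is exactly the O(n) repetition argument above.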
4 References

1. S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003.
2. M. W. Berry, Z. Drmac, and E. R. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41(2):335-362, 2001.
3. D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins.
