BU CS 565 - Dimensionality reduction


Dimensionality reduction

Outline
• From distances to points:
  – Multidimensional Scaling (MDS)
  – FastMap
• Dimensionality reductions / data projections
• Random projections
• Principal Component Analysis (PCA)

Multi-Dimensional Scaling (MDS)
• So far we assumed that we know both the data points X and the distance matrix D between these points
• What if the original points X are not known, but only the distance matrix D is known?
• Can we reconstruct X, or some approximation of X?

Problem
• Given a distance matrix D between n points
• Find a k-dimensional representation xi of every point i
• So that d(xi, xj) is as close as possible to D(i,j)

Why do we want to do that?
How can we do that? (Algorithm)

High-level view of the MDS algorithm
• Randomly initialize the positions of the n points in a k-dimensional space
• Compute the pairwise distances D' for this placement
• Compare D' to D
• Move points to better adjust their pairwise distances (make D' closer to D)
• Repeat until D' is close to D

The MDS algorithm
• Input: n×n distance matrix D
• Place n random points in the k-dimensional space (x1, …, xn)
• stop = false
• while not stop:
  – totalerror = 0.0
  – For every pair i, j compute:
    • D'(i,j) = d(xi, xj)
    • error = (D(i,j) - D'(i,j)) / D(i,j)
    • totalerror += error
    • For every dimension m: xim = xim + (xim - xjm)/D'(i,j) * error
  – If totalerror is small enough, stop = true
(a code sketch of this iterative scheme appears after the FastMap section below)

Questions about MDS
• Running time of the MDS algorithm
  – O(n^2 * I), where I is the number of iterations of the algorithm
• MDS does not guarantee that the metric property is maintained in D'
• Faster? Guarantee of the metric property?

Problem (revisited)
• Given a distance matrix D between n points
• Find a k-dimensional representation xi of every point i
• So that:
  – d(xi, xj) is as close as possible to D(i,j)
  – d(xi, xj) is a metric
  – the algorithm works in time linear in n

FastMap
• Select two pivot points xa and xb that are far apart.
• Compute a pseudo-projection of the remaining points along the "line" xa-xb.
• "Project" the points to a subspace orthogonal to the "line" xa-xb and recurse.

Selecting the pivot points
The pivot points should lie along the principal axes, and hence should be far apart.
  – Select any point x0
  – Let x1 be the point furthest from x0
  – Let x2 be the point furthest from x1
  – Return (x1, x2)
[Figure: pivot selection with points x0, x1, x2]

Pseudo-projections
Given pivots (xa, xb), for any third point y, we use the law of cosines to determine the position cy of y along the line xa-xb:
  d(b,y)^2 = d(a,y)^2 + d(a,b)^2 - 2 * cy * d(a,b)
The pseudo-projection for y is
  cy = [d(a,y)^2 + d(a,b)^2 - d(b,y)^2] / (2 * d(a,b))
This is the first coordinate.
[Figure: triangle with vertices xa, xb, y and distances d(a,y), d(b,y), d(a,b); cy marked along xa-xb]

"Project" to the orthogonal plane
Given the positions cy along xa-xb, compute the distances within the "orthogonal hyperplane":
  d'(y',z')^2 = d(y,z)^2 - (cz - cy)^2
Recurse using d'(.,.) until k features have been chosen.
[Figure: points y, z mapped to y', z' in the orthogonal hyperplane; cz - cy marked along xa-xb]

The FastMap algorithm
• D: distance function, Y: n×k output array of coordinates
• f = 0  // global variable: index of the coordinate currently being computed
• FastMap(k, D)
  – If k <= 0, return
  – (xa, xb) ← chooseDistantObjects(D)
  – If D(xa,xb) == 0, set Y[i,f] = 0 for every i and return
  – For every i: Y[i,f] = [D(a,i)^2 + D(a,b)^2 - D(b,i)^2] / (2 * D(a,b))
  – D'(i,j)^2 = D(i,j)^2 - (Y[i,f] - Y[j,f])^2  // new distance function on the projection
  – f++
  – FastMap(k-1, D')

FastMap algorithm
• Running time
  – Linear number of distance computations
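The iterative MDS scheme above translates almost directly into code. Below is a minimal sketch in Python/NumPy, assuming Euclidean distances in the embedding space and all off-diagonal entries of D positive; the slides do not specify a step size, so a small learning rate lr is introduced here, and the function name mds and all parameter names are illustrative rather than from the slides.

```python
import numpy as np

def mds(D, k, iters=200, lr=0.05, tol=1e-3, seed=0):
    """Iterative MDS sketch: move points until their pairwise distances approach D."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))               # random initial placement in k dimensions
    for _ in range(iters):
        total_error = 0.0
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = X[i] - X[j]
                d_ij = np.linalg.norm(diff) + 1e-12   # D'(i,j) for the current placement
                error = (D[i, j] - d_ij) / D[i, j]     # relative error against the target distance
                total_error += abs(error)
                # move x_i along the direction (x_i - x_j), scaled by the error
                X[i] += lr * error * diff / d_ij
        if total_error < tol * n * n:                  # "totalerror small enough"
            break
    return X
```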
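Similarly, a minimal FastMap sketch, assuming the input is an n×n distance matrix rather than a distance function, and using the pivot heuristic from the pivot-selection slide. The names fastmap and choose_distant_objects are illustrative.

```python
import numpy as np

def choose_distant_objects(D2):
    """Pivot heuristic: take any point, the point furthest from it, then the point furthest from that."""
    x0 = 0
    a = int(np.argmax(D2[x0]))
    b = int(np.argmax(D2[a]))
    return a, b

def fastmap(D, k):
    """FastMap sketch: D is an n x n distance matrix, returns an n x k embedding Y."""
    n = D.shape[0]
    Y = np.zeros((n, k))
    D2 = D.astype(float) ** 2                  # work with squared distances throughout
    for f in range(k):
        a, b = choose_distant_objects(D2)
        if D2[a, b] == 0.0:                    # all remaining distances are zero
            break                              # remaining columns of Y stay 0
        d_ab = np.sqrt(D2[a, b])
        # pseudo-projection: first coordinate from the law of cosines
        Y[:, f] = (D2[a, :] + D2[a, b] - D2[b, :]) / (2.0 * d_ab)
        # squared distances in the hyperplane orthogonal to the line x_a x_b
        diff = Y[:, f][:, None] - Y[:, f][None, :]
        D2 = np.maximum(D2 - diff ** 2, 0.0)   # clamp for numerical safety
    return Y
```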
The Curse of Dimensionality
• Data in only one dimension is relatively packed
• Adding a dimension "stretches" the points across that dimension, making them further apart
• Adding more dimensions makes the points even further apart: high-dimensional data is extremely sparse
• Distance measures become meaningless
(graphs from Parsons et al., KDD Explorations 2004)

The curse of dimensionality
• The efficiency of many algorithms depends on the number of dimensions d
  – Distance/similarity computations are at least linear in the number of dimensions
  – Index structures fail as the dimensionality of the data increases

Goals
• Reduce the dimensionality of the data
• Maintain the meaningfulness of the data

Dimensionality reduction
• Dataset X consisting of n points in a d-dimensional space
• Data point xi ∈ R^d (a d-dimensional real vector): xi = [xi1, xi2, …, xid]
• Dimensionality reduction methods:
  – Feature selection: choose a subset of the features
  – Feature extraction: create new features by combining the existing ones
• Both methods map a vector xi ∈ R^d to a vector yi ∈ R^k (k << d)
• F: R^d → R^k

Linear dimensionality reduction
• The function F is a linear projection
• yi = A xi
• Y = A X
• Goal: Y is as close to X as possible

Closeness: pairwise distances
• Johnson–Lindenstrauss lemma: Given ε > 0 and an integer n, let k be a positive integer such that k ≥ k0 = O(ε^-2 log n). For every set X of n points in R^d there exists F: R^d → R^k such that for all xi, xj ∈ X
  (1-ε) ||xi - xj||^2 ≤ ||F(xi) - F(xj)||^2 ≤ (1+ε) ||xi - xj||^2
• What is the intuitive interpretation of this statement?

JL Lemma: intuition
• Vectors xi ∈ R^d are projected onto a k-dimensional space (k << d): yi = R xi
• If ||xi|| = 1 for all i, then ||xi - xj||^2 is approximated by (d/k) ||yi - yj||^2
• Intuition:
  – The expected squared norm of a projection of a unit vector onto a random subspace through the origin is k/d
  – The probability that it deviates from this expectation is very small

JL Lemma: more intuition
• x = (x1, …, xd), with d independent Gaussian N(0,1) random variables; y = (1/|x|) (x1, …, xd)
• z: projection of y onto its first k coordinates
  – L = |z|^2, μ = E[L] = k/d
• Pr(L ≥ (1+ε)μ) ≤ 1/n^2 and Pr(L ≤ (1-ε)μ) ≤ 1/n^2
• f(y) = sqrt(d/k) z
• What is the probability that for a pair (y, y') the ratio |f(y) - f(y')|^2 / |y - y'|^2 does not lie in the range [(1-ε), (1+ε)]?
• What is the probability that some pair suffers?

Finding random projections
• Vectors xi ∈ R^d are projected onto a k-dimensional space (k << d)
• Random projections can be represented by a linear transformation matrix R
• yi = R xi
• What is the matrix R? (see the code sketches at the end of this section)

Finding the matrix R
• The elements R(i,j) can be Gaussian distributed
• Achlioptas has shown that the Gaussian distribution can be replaced by
    R(i,j) = sqrt(3) × ( +1 with probability 1/6, 0 with probability 2/3, -1 with probability 1/6 )
• All zero-mean, unit-variance distributions for R(i,j) would give a mapping that satisfies the JL lemma
• Why is Achlioptas' result useful?
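To make the JL-style projection concrete, here is a minimal sketch with a Gaussian projection matrix. It uses the row-vector convention Y = X R (rather than yi = R xi as on the slides), and the 1/sqrt(k) scaling makes expected squared distances match; the function name random_projection and the small numerical check are illustrative, not part of the slides.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project n x d data X down to n x k columns with a Gaussian random matrix (JL-style)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, k)) / np.sqrt(k)   # entries with mean 0 and variance 1/k
    return X @ R

# tiny check: pairwise squared distances are roughly preserved
X = np.random.default_rng(1).standard_normal((50, 1000))
Y = random_projection(X, k=200)
orig = np.sum((X[0] - X[1]) ** 2)
proj = np.sum((Y[0] - Y[1]) ** 2)
print(orig, proj)   # proj should be within a small relative error of orig
```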
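And a sketch of the Achlioptas-style matrix as reconstructed above (entries sqrt(3) × {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}, which are zero-mean and unit-variance). Its practical appeal is that roughly two thirds of the entries are zero and the rest need only additions and subtractions, so the projection is cheap to compute. The function name is illustrative.

```python
import numpy as np

def achlioptas_projection_matrix(d, k, seed=0):
    """Sparse, database-friendly projection matrix:
    entries are sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}, scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    entries = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0) * entries / np.sqrt(k)

# usage, with the same row-vector convention as the Gaussian sketch:
# Y = X @ achlioptas_projection_matrix(X.shape[1], k=200)
```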

