UCSD CSE 291 - Locally Linear Embedding - D2114112

Home> Schools> University of California, San Diego> Computer Science & Engineering (CSE) > CSE 291> Locally Linear Embedding

UCSD CSE 291 - Locally Linear Embedding

School name University of California, San Diego

Course Cse 291-

Pages 39

Download Save

Unformatted text preview:

Locally Linear Embedding Paul Ruvolo Raghava N Swamy Aim Dimensionality Reduction Non Linear PCA Non Parametric Non Probabilistic No generative model for data Why care Long standing problem in geometry and statistics to compute a low dimensional embedding of high dimensional data Without using prior knowledge about data Applications in Machine learning Pattern recognition Comp Neuroscience etc Other methods PCA Project data along the eigenvectors of covariance matrix You all know MDS Preserve pair wise distances similar to PCA Their drawbacks Both are Linear not suitable for non linear manifolds create distortions in non linear manifolds LLE doesn t have local minima in optimization just like PCA The Algorithm Over to Paul LLE and its Extensions Outline Basic LLE Algorithm Convex Reconstructions Enforcing dimensionality From Embeddings to mapping LLE in One Slide Step 1 Compute Neighbors Many ways can be used to define the notion of neighborhood K nearest Epsilon size ball Requirement need a decent margin between the number of neighbors per point K and the embedding dimension d d K Effect of Neighborhood Size Weakly or Disconnected Components Verify that neighborhood graph is connected If it is not or is weakly so then treat the data as lying on two submanifolds Find Reconstruction Weights Motivation capture the intrinsic geometry of the neighborhood Creates a locally linear mapping from the high dimensional coordinates to the manifold coordinates Scissors analogy Find Reconstruction Weights Invariances scale rotation translation Comparison to ISOMAP Similarities Both use nearest neighbor to achieve nonlinearity Differences Preserving different things reconstruction weights versus geodesic distances LLE involves no dynamic programming LLE tends to emphasize preserving purely local geometry and ignores faraway inputs whereas Isomap is dominated by preserving geodesics of far away points Example Convex Reconstructions Basic LLE weights sum to 1 Convex Reconstruction weights sum to 1 and are non negative Geometrically reconstruction lies in the convex hull instead of the subspace of NN Pros More robust to outliers Cons More computationally expensive May do poorly for points that lie out of the convex hull of the neighbors Enforcing Intrinsic Dimensionality If we know the manifold dimension a priori we can remove noise before computing reconstruction weights Simply artificially limit the rank of G by doing PCA on the neighborhood first Creating a Mapping nonParametric Map from X to Y Compute K nearest neighbors of x find optimal reconstruction weights Y is computed by using the weights found above and the embeddings of the nearest neighbors computed previously Evaluating Non Parametric Mapping Creating a Mapping Parametric Model the joint distribution of original and embedded coordinates Training data available from the embedding Mapping given by E y x and E x y Can be seen as a mixture of conditional linear models with Gaussian noise Parametric Mapping cont Joint distribution p x y z p z p y z p x y z Intuitively p z is the density of points in a neighborhood p y z is models the local behavior of the embedded coordinates with respect to the embedded mean of the neighborhood z p x y z models the transformation from embedded points to original points by a linear mapping with Gaussian noise Learning the Parametric Mapping Generative model is learned using E M with the added twist that we do not treat y as a hidden variable but rather clamp it to the values we obtain from our embedding Issues with LLE Rank deficient local gram matrix can yield uninformative solutions By limiting ourselves to one reconstruction weight vector we are ignoring many other almost optimal vectors MLLE Modified LLE MLLE Basic Idea Instead of having a single vector wi for each point have a set of vectors Wi wi1 wi2 wis Why care Shortcomings of LLE The singularity Gram Matrix and the regularization with parameter creates problems Here is one problem Consider the rare case when GTG is near singular In LLE we regularize the Gram matrix using a parameter and proceed If the G was exactly singular usually happens when k D not d then there is a problem with convergence behavior of lim 0 Another problem When GTG is not full rank the learned solutions are brittle multiple solutions exist as the Null space is multidimensional Example below N 20 k 4 D 2 optimal affine transformation Solution Problems occur because some singular values of Gi say s of them are small the solution is There exists s approximately optimal weight vectors optimal if singular values are zero which are linearly ind Of each other Proof A problem w is large when G is approximately singular so we replace w by w else all the solutions will be more or less similar This improves the condition number of W which is now bounded by Modified embedding Find s vectors with the below equations To find the embedding Determining si Not always k d because the background may not be settled This brings out the trade off between accuracy and stability s k d Adaptive strategy Reorder Set So when there is curvature i is more hence is more and s is less allowing for lesser number of weight vectors and hence combination error decreases Summary Results 1 Swiss Roll 2 Shape with peaks 3 Digit embedding in a 2 D space 4 Face embedding into a 2 D space

View Full Document


School:
Email:
New Password:
Confirm Password:

UCSD CSE 291 - Locally Linear Embedding

Sign up for free to view:

Please select your school