MERL – A Mitsubishi Electric Research Laboratory
http://www.merl.com

Charting a manifold
Matthew Brand
TR-2003-13, March 2003

Presented at NIPS-15, December 2002. Appears in Proceedings, Neural Information Processing Systems, volume 15.
MIT Press, March 2003.

Charting a manifold

Matthew Brand
Mitsubishi Electric Research Labs
201 Broadway, Cambridge MA 02139 USA
www.merl.com/people/brand/

Abstract

We construct a nonlinear mapping from a high-dimensional sample space to a low-dimensional vector space, effectively recovering a Cartesian coordinate system for the manifold from which the data is sampled. The mapping preserves local geometric relations in the manifold and is pseudo-invertible. We show how to estimate the intrinsic dimensionality of the manifold from samples, decompose the sample data into locally linear low-dimensional patches, merge these patches into a single low-dimensional coordinate system, and compute forward and reverse mappings between the sample and coordinate spaces. The objective functions are convex and their solutions are given in closed form.

1 Nonlinear dimensionality reduction (NLDR) by charting

Charting is the problem of assigning a low-dimensional coordinate system to data points in a high-dimensional sample space. It is presumed that the data lies on or near a low-dimensional manifold embedded in the sample space, and that there exists a 1-to-1 smooth nonlinear transform between the manifold and a low-dimensional vector space. The data modeler's goal is to estimate smooth continuous mappings between the sample and coordinate spaces. Often this analysis will shed light on the intrinsic variables of the data-generating phenomenon, for example, revealing perceptual or configuration spaces.

Our goal is to find a mapping—expressed as a kernel-based mixture of linear projections—that minimizes information loss about the density and relative locations of sample points. This constraint is expressed in a posterior that combines a standard Gaussian mixture model (GMM) likelihood function with a prior that penalizes uncertainty due to inconsistent projections in the mixture.
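The patchwork of locally linear models described above can be pictured with a simple stand-in: cluster the samples, then run a local PCA in each cluster so that every patch gets a center and a low-dimensional basis. This is only an illustrative sketch (plain Lloyd iterations plus per-cluster SVD), not the paper's closed-form posterior maximization; the function name and parameters are invented for the example.

```python
import numpy as np

def local_linear_patches(X, n_patches=8, d=1, seed=0):
    """Decompose samples X (n x D) into locally linear patches.

    Generic sketch: k-means-style assignment followed by a local PCA per
    patch. Stands in for the paper's closed-form GMM estimator, which is
    not reproduced here.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    centers = X[rng.choice(n, n_patches, replace=False)]  # init on samples
    for _ in range(20):                                   # Lloyd iterations
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_patches):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    patches = []
    for k in range(n_patches):
        Xk = X[labels == k] - centers[k]
        # local PCA: top-d right singular vectors span the patch subspace
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        patches.append((centers[k], Vt[:d]))
    return labels, patches

# noisy 1-D curve (a helix) embedded in 3-D
t = np.linspace(0, 4 * np.pi, 400)
noise = 0.01 * np.random.default_rng(1).normal(size=(400, 3))
X = np.c_[np.cos(t), np.sin(t), 0.3 * t] + noise
labels, patches = local_linear_patches(X, n_patches=10, d=1)
print(len(patches))  # one (center, basis) pair per patch
```

Each patch here is a tangent-plane approximation of the manifold around its center; the paper's contribution is getting such patches, and their merger, from a single convex objective rather than a heuristic pipeline like this one.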
Section 3 develops a special case where this posterior is unimodal and maximizable in closed form, yielding a GMM whose covariances reveal a patchwork of overlapping locally linear subspaces that cover the manifold. Section 4 shows that for this (or any) GMM and a choice of reduced dimension d, there is a unique, closed-form solution for a minimally distorting merger of the subspaces into a d-dimensional coordinate space, as well as a reverse mapping defining the surface of the manifold in the sample space. The intrinsic dimensionality d of the data manifold can be estimated from the growth process of point-to-point distances. In analogy to differential geometry, we call the subspaces "charts" and their merger the "connection." Section 5 considers example problems where these methods are used to untie knots, unroll and untwist sheets, and visualize video data.

1.1 Background

Topology-neutral NLDR algorithms can be divided into those that compute mappings, and those that directly compute low-dimensional embeddings. The field has its roots in mapping algorithms: DeMers and Cottrell [3] proposed using auto-encoding neural networks with a hidden layer "bottleneck," effectively casting dimensionality reduction as a compression problem. Hastie defined principal curves [5] as nonparametric 1D curves that pass through the center of "nearby" data points. A rich literature has grown up around properly regularizing this approach and extending it to surfaces. Smola and colleagues [10] analyzed the NLDR problem in the broader framework of regularized quantization methods.

More recent advances aim for embeddings: Gomes and Mojsilovic [4] treat manifold completion as an anisotropic diffusion problem, iteratively expanding points until they connect to their neighbors.
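The growth-process estimate of intrinsic dimension mentioned in the overview can be illustrated with a correlation-dimension-style sketch: on a d-dimensional manifold, the number of samples within radius r of a point grows roughly as r^d, so d is the slope of log(count) versus log(r). This is only an assumption-laden stand-in for the paper's own estimator, and the function name and radii are invented for the example.

```python
import numpy as np

def dimension_from_growth(X, radii):
    """Estimate intrinsic dimension from how neighbor counts grow with radius.

    Rough sketch: the fraction of pairs within radius r scales like r**d,
    so d is the log-log slope of counts against radii.
    """
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                 # exclude self-pairs
    counts = [np.mean(D2 < r * r) for r in radii]
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
U = rng.uniform(size=(1000, 2))
X = np.c_[U, np.zeros((1000, 3))]   # a 2-D sheet embedded flat in 5-D
d_hat = dimension_from_growth(X, radii=[0.05, 0.1, 0.15, 0.2])
print(round(d_hat, 2))  # close to 2, biased slightly low by boundary effects
```

The radii must sit in the scaling regime: much smaller than the manifold's extent but large enough that each ball contains several samples, which is why such estimates degrade at low sample density.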
The ISOMAP algorithm [12] represents remote distances as sums of a trusted set of distances between immediate neighbors, then uses multidimensional scaling to compute a low-dimensional embedding that minimally distorts all distances. The locally linear embedding algorithm (LLE) [9] represents each point as a weighted combination of a trusted set of nearest neighbors, then computes a minimally distorting low-dimensional barycentric embedding. They have complementary strengths: ISOMAP handles holes well but can fail if the data hull is nonconvex [12]; and vice versa for LLE [9]. Both offer embeddings without mappings. It has been noted that trusted-set methods are vulnerable to noise because they consider the subset of point-to-point relationships that has the lowest signal-to-noise ratio; small changes to the trusted set can induce large changes in the set of constraints on the embedding, making solutions unstable [1].

In a return to mapping, Roweis and colleagues [8] proposed global coordination—learning a
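The first stage of LLE described above can be sketched concretely: each point is written as an affine combination of its K nearest neighbors, with weights summing to one, found by solving a small regularized linear system per point. This follows the standard formulation of [9], but the code is a minimal illustration, not the reference implementation, and it omits LLE's second stage (the eigenproblem that produces the embedding from the weight matrix).

```python
import numpy as np

def lle_weights(X, K=5, reg=1e-3):
    """Compute LLE reconstruction weights: each point expressed as an
    affine (sum-to-one) combination of its K nearest neighbors.

    Sketch of LLE's first stage only; the embedding stage is omitted.
    """
    n = X.shape[0]
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)              # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[:K]
        Z = X[nbrs] - X[i]                    # neighbors centered on x_i
        C = Z @ Z.T                           # local Gram matrix
        C += reg * np.trace(C) * np.eye(K)    # regularize for stability
        w = np.linalg.solve(C, np.ones(K))
        W[i, nbrs] = w / w.sum()              # enforce the sum-to-one constraint
    return W

# points on a unit circle embedded in 3-D
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
W = lle_weights(X, K=4)
print(np.allclose(W.sum(axis=1), 1.0))  # each row is an affine combination
```

The instability noted in the text shows up exactly here: if noise swaps which points land in the K-neighbor set, the row of W changes discontinuously, perturbing the constraints that the embedding stage must satisfy.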

