Random projection trees and low dimensional manifolds

Sanjoy Dasgupta, UC San Diego, [email protected]
Yoav Freund, UC San Diego, [email protected]

We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data without having to explicitly learn this structure.

1. INTRODUCTION

A k-d tree [4] is a spatial data structure that partitions R^D into hyperrectangular cells. It is built in a recursive manner, splitting along one coordinate direction at a time (Figure 1, left). The succession of splits corresponds to a binary tree whose leaves contain the individual cells in R^D.

These trees are among the most widely used spatial partitionings in machine learning and statistics. To understand their application, consider Figure 1 (left), and suppose that the dots are points in a database, while the cross is a query point q. The cell containing q, henceforth denoted cell(q), can quickly be identified by moving q down the tree. If the diameter of cell(q) is small (where the diameter is taken to mean the distance between the furthest pair of data points in the cell), then the points in it can be expected to have similar properties, for instance similar labels. In classification, q is assigned the majority label in its cell, or the label of its nearest neighbor in the cell. In regression, q is assigned the average response value in its cell. In vector quantization, q is replaced by the mean of the data points in the cell. Naturally, the statistical theory around k-d trees is centered on the rate at which the diameter of individual cells drops as you move down the tree; for details, see page 320 of [8].

It is an empirical observation that the usefulness of k-d trees diminishes as the dimension D increases. This is easy to explain in terms of cell diameter; specifically, we will show that there is a data set in R^D for which a k-d tree requires D levels in order to halve the cell diameter.
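The k-d tree construction and cell lookup just described can be sketched as follows. This is a minimal illustration, not the paper's code; the `leaf_size` parameter and the dictionary-based tree representation are illustrative choices.

```python
import numpy as np

def build_kdtree(points, depth=0, leaf_size=5):
    """Recursively partition points, cycling through coordinate axes
    and splitting at the median along the chosen axis (the standard
    k-d tree rule). leaf_size is an illustrative stopping parameter."""
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % points.shape[1]         # split coordinate cycles with depth
    threshold = np.median(points[:, axis])
    left = points[points[:, axis] <= threshold]
    right = points[points[:, axis] > threshold]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return {"leaf": True, "points": points}
    return {"leaf": False, "axis": axis, "threshold": threshold,
            "left": build_kdtree(left, depth + 1, leaf_size),
            "right": build_kdtree(right, depth + 1, leaf_size)}

def find_cell(tree, q):
    """Route a query point q down the tree and return the data points
    in cell(q)."""
    while not tree["leaf"]:
        branch = "left" if q[tree["axis"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["points"]
```

Given `cell = find_cell(tree, q)`, the uses mentioned above follow directly: for regression one averages the responses of the points in `cell`, and for vector quantization one replaces q by `cell.mean(axis=0)`.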
In other words, if the data lie in R^1000, it could take 1000 levels of the tree to bring the diameter of cells down to half that of the entire data set. This would require 2^1000 data points!

Thus k-d trees are susceptible to the same curse of dimensionality that has been the bane of other nonparametric statistical methods. However, a recent positive development in machine learning has been the realization that a lot of data which superficially lie in a very high-dimensional space R^D actually have low intrinsic dimension, in the sense of lying close to a manifold of dimension d ≪ D. There has been significant interest in algorithms which learn this manifold from data, with the intention that future data can then be transformed into this low-dimensional space, in which standard methods will work well. This field is quite recent and yet the literature on it is already voluminous; early foundational work includes [24, 23, 3].

Figure 1: Left: A spatial partitioning of R^2 induced by a k-d tree with three levels. The dots are data points; the cross marks a query point q. Right: Partitioning induced by an RP tree.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. STOC'08, May 17–20, 2008, Victoria, British Columbia, Canada. Copyright 2008 ACM 978-1-60558-047-0/08/05 ...$5.00.

In this paper, we are interested in techniques that automatically adapt to intrinsic low dimensional structure without having to explicitly learn this structure. The most obvious first question is: do k-d trees adapt to intrinsic low dimension? The answer is no: the bad example mentioned above has an intrinsic dimension of just O(log D).
But we introduce a simple variant of k-d trees that does possess this property. Instead of splitting along coordinate directions at the median, we split along a random direction in S^{D-1} (the unit sphere in R^D), and instead of splitting exactly at the median, we add a small amount of "jitter". We call these random projection trees (Figure 1, right), or RP trees for short, and we show the following.

Pick any cell C in the RP tree. If the data in C have intrinsic dimension d, then all descendant cells ≥ d log d levels below will have at most half the diameter of C.

There is no dependence on the extrinsic dimensionality D of the data.

2. DETAILED OVERVIEW

In what follows, we always assume the data lie in R^D.

2.1 Low-dimensional manifolds

The increasing ubiquity of massive, high-dimensional data sets has focused the attention of the statistics and machine learning communities on the curse of dimensionality. A large part of this effort is based on exploiting the observation that many high-dimensional data sets have low intrinsic dimension. This is a loosely defined notion, which is typically used to mean that the data lie near a smooth low-dimensional manifold.

For instance, suppose that you wish to create realistic animations by collecting human motion data and then fitting models to it. A common method for collecting motion data is to have a person wear a skin-tight suit with high-contrast reference points printed on it. Video cameras are used to track the 3D trajectories of the reference points as the person is walking or running. In order to ensure good coverage, a typical suit has about N = 100 reference points. The position and posture of the body at a particular point in time is represented by a (3N)-dimensional vector. However, despite this seemingly high dimensionality, the number of degrees of freedom is small, corresponding to the dozen-or-so joint angles in the body.
The positions of the reference points are more or less deterministic functions of these joint angles.

To take another example, a speech signal is commonly represented by a high-dimensional time series: the signal is broken into overlapping windows, and a variety of filters are applied within each window. Even richer representations can be obtained by using more filters, or by concatenating vectors corresponding to consecutive windows. Through all this, the intrinsic dimensionality remains small, because the system can be described by a few physical parameters describing the configuration of the speaker's vocal apparatus.

2.2 Intrinsic dimensionality

In this paper we explore three
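The RP tree split rule from Section 1 (project the data onto a direction drawn uniformly from the unit sphere, then split near the median with a small amount of jitter) can be sketched as follows. This is an illustrative sketch only: `jitter_frac` is a hypothetical parameter, not the jitter distribution used in the paper's analysis.

```python
import numpy as np

def rp_split(points, rng, jitter_frac=0.05):
    """One split of an RP tree (a sketch). A standard Gaussian vector,
    normalized, gives a direction distributed uniformly on the unit
    sphere S^{D-1}. jitter_frac is an illustrative parameter: the split
    threshold is the median projection perturbed by a small random
    amount proportional to the spread of the projections."""
    D = points.shape[1]
    u = rng.standard_normal(D)
    u /= np.linalg.norm(u)                 # uniform direction on S^{D-1}
    proj = points @ u                      # project data onto the direction
    spread = proj.max() - proj.min()
    threshold = np.median(proj) + rng.uniform(-jitter_frac, jitter_frac) * spread
    left = points[proj <= threshold]
    right = points[proj > threshold]
    return u, threshold, left, right
```

Recursing on `left` and `right` in the same way as a k-d tree yields the full RP tree; note that, unlike the k-d tree split, nothing here depends on the coordinate axes of R^D.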