BU CS 565 - Dimensionality reduction

Lecture outline
• Dimensionality reduction
  – SVD/PCA
  – CUR decompositions
• Nearest-neighbor search in low dimensions
  – kd-trees

Datasets in the form of matrices
We are given n objects and d features describing the objects (each object has d numeric values describing it).
Dataset: an n-by-d matrix A, where A_ij shows the "importance" of feature j for object i. Every row of A represents an object.
Goal:
1. Understand the structure of the data, e.g., the underlying process generating the data.
2. Reduce the number of features representing the data.

Market-basket matrices
• n customers, d products (e.g., milk, bread, wine, etc.)
• A_ij = quantity of the j-th product purchased by the i-th customer
• Find a subset of the products that characterizes customer behavior.

Social-network matrices
• n users, d groups (e.g., BU group, opera, etc.)
• A_ij = participation of the i-th user in the j-th group
• Find a subset of the groups that accurately clusters social-network users.

Document matrices
• n documents, d terms (e.g., theorem, proof, etc.)
• A_ij = frequency of the j-th term in the i-th document
• Find a subset of the terms that accurately clusters the documents.

Recommendation systems
• n customers, d products
• A_ij = frequency with which the j-th product is bought by the i-th customer
• Find a subset of the products that accurately describes the behavior of the customers.

The Singular Value Decomposition (SVD)
Data matrices have n rows (one for each object) and d columns (one for each feature). The rows are vectors in a Euclidean space; two objects are "close" if the angle between their corresponding vectors is small. (The slide illustrates this with two objects plotted against feature 1 and feature 2.)

SVD: Example
Input: 2-dimensional points.
Output:
• 1st (right) singular vector: direction of maximal variance.
• 2nd (right) singular vector: direction of maximal variance, after removing the projection of the data along the first singular vector.
Singular values:
• σ_1 measures how much of the data variance is explained by the first singular vector.
• σ_2 measures how much of the data variance is explained by the second singular vector.
(The slide shows a 2-d scatter plot of the points with the two singular vectors drawn on top of them.)

SVD decomposition
A = U Σ V^T, where A is n×d, U is n×ℓ, Σ is ℓ×ℓ, and V^T is ℓ×d.
• U (V): orthogonal matrix containing the left (right) singular vectors of A.
• Σ: diagonal matrix containing the singular values of A, with σ_1 ≥ σ_2 ≥ … ≥ σ_ℓ.
• In the picture on the slide (objects × features), the leading singular vectors/values capture the significant structure of A; the trailing ones mostly capture noise.
• Exact computation of the SVD takes O(min{n d², n² d}) time. The top k left/right singular vectors/values can be computed faster using Lanczos/Arnoldi methods.

SVD and rank-k approximations
Rank-k approximation: A_k = U_k Σ_k V_k^T, where A_k is n×d, U_k is n×k, Σ_k is k×k, and V_k^T is k×d.
• U_k (V_k): orthogonal matrix containing the top k left (right) singular vectors of A.
• Σ_k: diagonal matrix containing the top k singular values of A.
• A_k is an approximation of A; in fact, A_k is the best rank-k approximation of A.

SVD as an optimization problem
Find C to minimize
  min_C || A_{n×d} − C_{n×k} X_{k×d} ||_F²
where the Frobenius norm is ||A||_F² = Σ_{i,j} A_ij².
Given C it is easy to find X by standard least squares. However, the fact that we can also find the optimal C is fascinating!

PCA and SVD
• PCA is SVD done on centered data.
• PCA looks for the direction such that the data projected onto it has maximal variance.
• PCA/SVD continues by seeking the next direction that is orthogonal to all previously found directions.
• All directions are orthogonal.

How to compute the PCA
• Data matrix A: rows = data points, columns = variables (attributes, features, parameters).
1. Center the data by subtracting the mean of each column.
2. Compute the SVD of the centered matrix A' (i.e., find the first k singular values/vectors): A' = U Σ V^T.
3. The principal components are the columns of V; the coordinates of the data in the basis defined by the principal components are UΣ.

Singular values tell us something about the variance
• The variance in the direction of the k-th principal component is given by the corresponding singular value σ_k².
• Singular values can be used to estimate how many components to keep.
• Rule of thumb: keep enough components to explain 85% of the variation:
  (σ_1² + … + σ_k²) / (σ_1² + … + σ_n²) ≥ 0.85
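To make the recipe above concrete, here is a minimal sketch of PCA via SVD in Python with NumPy (the library choice, function name, and threshold parameter are illustrative, not prescribed by the lecture). It centers the data, computes the SVD of the centered matrix, and applies the 85% rule of thumb.

```python
import numpy as np

def pca_via_svd(A, variance_threshold=0.85):
    """PCA computed through the SVD of the centered data matrix.

    A: n-by-d data matrix (rows = objects, columns = features).
    Returns the principal components (columns of V), the coordinates
    of the data in that basis (U * Sigma), and the number of components
    needed to explain `variance_threshold` of the variance.
    """
    # Step 1: center the data by subtracting the mean of each column.
    A_centered = A - A.mean(axis=0)

    # Step 2: compute the SVD of the centered matrix A' = U Sigma V^T.
    U, sigma, Vt = np.linalg.svd(A_centered, full_matrices=False)

    # Step 3: the principal components are the columns of V;
    # the coordinates of the data in that basis are U * Sigma.
    components = Vt.T
    coordinates = U * sigma

    # Rule of thumb: keep enough components so that the cumulative
    # sum of sigma_j^2 covers 85% of the total.
    explained = np.cumsum(sigma**2) / np.sum(sigma**2)
    k = int(np.searchsorted(explained, variance_threshold) + 1)
    return components, coordinates, k
```

In this sketch, `components[:, 0]` is the first (right) singular vector of the centered data, i.e., the direction of maximal variance in the 2-d example above.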
SVD is "the Rolls-Royce and the Swiss Army Knife of Numerical Linear Algebra."*
* Dianne O'Leary, MMDS '06

The CX decomposition
Find C that contains a subset of the columns of A to minimize
  min_C || A_{n×d} − C_{n×k} X_{k×d} ||_F²
with the Frobenius norm ||A||_F² = Σ_{i,j} A_ij² as before.
Given C it is easy to find X by standard least squares. However, finding C is now hard!

Why the CX decomposition?
• If A is an object-feature matrix, then selecting "representative" columns is equivalent to selecting "representative" features.
• This leads to easier interpretability; compare to eigenfeatures, which are linear combinations of all features.

Algorithms for the CX decomposition
• The SVD-based algorithm
  – Do the SVD first.
  – Map k columns of A to the left singular vectors.
• The greedy algorithm
  – Greedily pick the k columns of A that minimize the error.
• The k-means-based algorithm
  – Find k centers (by clustering the columns).
  – Map the k centers to columns of A.

Discussion on the CX decomposition
• The vectors in C are not orthogonal – they do not form an orthogonal basis.
• It maintains the sparsity of the data.

Nearest neighbor in low dimensions

Definition
• Given: a set X of n points in R^d.
• Nearest neighbor: for any query point q ∈ R^d, return the point x ∈ X minimizing L_p(x, q).

Motivation
• Learning: the nearest-neighbor rule.
• Databases: retrieval.
• Donald Knuth, in vol. 3 of The Art of Computer Programming, called it the post-office problem, referring to the application of assigning a resident to the nearest post office.

Nearest-neighbor rule
(The slide illustrates the nearest-neighbor rule on the MNIST dataset of handwritten digits, e.g., an image of a "2".)

Methods for computing the NN
• Linear scan: O(nd) time.
• This is pretty much all that is known for exact algorithms with theoretical guarantees.
• In practice:
  – kd-trees work "well" in "low-to-medium" dimensions.

2-dimensional kd-trees
• A data structure to support range queries in R².
  – Not the most efficient solution in theory.
  – Everyone uses it in practice.
• Preprocessing time: O(n log n)
• Space complexity: O(n)
• Query time: O(n^(1/2) + k)
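To make the kd-tree idea concrete, here is a minimal Python sketch (not from the lecture; all names are illustrative) of a 2-dimensional kd-tree supporting nearest-neighbor queries under the L2 distance: the splitting coordinate alternates with depth, the median point becomes the node, and a subtree is visited only if the splitting line could hide a closer point. In practice one would more likely rely on a library implementation such as scipy.spatial.KDTree.

```python
import math

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build_kdtree(points, depth=0):
    """Build a 2-d kd-tree by splitting on the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2                      # cycle through the 2 coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                # median point becomes the node
    return KDNode(points[mid], axis,
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))

def nearest_neighbor(node, query, best=None):
    """Return the point in the tree closest to `query` (L2 distance)."""
    if node is None:
        return best
    if best is None or math.dist(node.point, query) < math.dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest_neighbor(near, query, best)
    # Search the far subtree only if the splitting line is closer than
    # the best distance found so far (pruning step).
    if abs(diff) < math.dist(best, query):
        best = nearest_neighbor(far, query, best)
    return best

# Example usage with a handful of 2-d points.
pts = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree(pts)
print(nearest_neighbor(tree, (9.0, 2.0)))   # -> (8.0, 1.0)
```

Median splitting keeps the tree balanced, which is what gives the O(n log n) preprocessing and O(n) space mentioned above; the pruning step is what lets typical queries avoid scanning all n points.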

