Rutgers University CS 536 - Dimensionality Reduction

Lecture Slides for INTRODUCTION TO Machine Learning, Ethem Alpaydın, The MIT Press, 2004.
Edited for CS 536, Fall 2005, Rutgers University, by Ahmed Elgammal.
alpaydin@boun.edu.tr, http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 6: Dimensionality Reduction

Why Reduce Dimensionality?
1. Reduces time complexity: less computation.
2. Reduces space complexity: fewer parameters.
3. Saves the cost of observing the feature.
4. Simpler models are more robust on small datasets.
5. More interpretable: simpler explanation.
6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions.

Feature Selection vs. Extraction
Feature selection: choose k < d important features and ignore the remaining d - k (subset selection algorithms).
Feature extraction: project the original d dimensions x_i, i = 1, ..., d, onto new k < d dimensions z_j, j = 1, ..., k. Examples: principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA).

Subset Selection
There are 2^d possible subsets of d features.
Forward search: add the best feature at each step.
- Start with the set of selected features F = {} (empty).
- At each iteration, find the best new feature: j = argmin_i E(F ∪ {x_i}).
- Add x_j to F if E(F ∪ {x_j}) < E(F).
This is a hill-climbing, O(d^2) algorithm.
Backward search: start with all features and remove one at a time, if possible.
Floating search: add k features, remove l.

Principal Components Analysis (PCA)
Find a low-dimensional space such that, when x is projected onto it, information loss is minimized.
The projection of x on the direction of w is z = w^T x.
Find w such that Var(z) is maximized:
  Var(z) = Var(w^T x) = E[(w^T x - w^T μ)^2]
         = E[(w^T x - w^T μ)(w^T x - w^T μ)]
         = E[w^T (x - μ)(x - μ)^T w]
         = w^T E[(x - μ)(x - μ)^T] w = w^T Σ w,
where Var(x) = E[(x - μ)(x - μ)^T] = Σ.

Maximize Var(z) subject to ||w|| = 1:
  max_{w1}  w1^T Σ w1 - α (w1^T w1 - 1).
Setting the derivative with respect to w1 to zero gives Σ w1 = α w1, that is, w1 is an eigenvector of Σ. Choose the eigenvector with the largest eigenvalue for Var(z) to be maximum.
Second principal component: maximize Var(z2) subject to ||w2|| = 1 and w2 orthogonal to w1:
  max_{w2}  w2^T Σ w2 - α (w2^T w2 - 1) - β (w2^T w1 - 0),
which gives Σ w2 = α w2, that is, w2 is another eigenvector of Σ, and so on.

What PCA Does
z = W^T (x - m), where the columns of W are the eigenvectors of Σ and m is the sample mean.
PCA centers the data at the origin and rotates the axes.

How to Choose k?
Proportion of variance (PoV) explained:
  PoV(k) = (λ1 + λ2 + ... + λk) / (λ1 + λ2 + ... + λk + ... + λd),
where the λi are sorted in descending order. Typically stop at PoV > 0.9.
Scree graph: plot PoV versus k and stop at the "elbow".

Principal Component Analysis (PCA): Reconstruction View
Given a set of points x1, x2, ..., xN with xi ∈ R^d, we look for a linear projection, i.e. a linear combination of orthogonal basis vectors:
  x ≈ A c, with A ∈ R^{d×m}, c = (c1, c2, ..., cm)^T ∈ R^m, m << d.
What is the projection that minimizes the reconstruction error E = Σ_i ||x_i - A c_i||^2 ?

PCA Algorithm
Given a set of points x1, x2, ..., xN, xi ∈ R^d:
- Center the points: compute μ = (1/N) Σ_i x_i and form P = [x1 - μ, x2 - μ, ..., xN - μ].
- Compute the covariance matrix Q = P P^T.
- Compute the eigenvectors of Q: Q e_k = λ_k e_k. These eigenvectors are the orthogonal basis we are looking for.
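The eigendecomposition procedure above is easy to sketch in NumPy. The following is a minimal illustration, not part of the original slides: it centers the data, forms the covariance matrix, takes its eigenvectors, chooses k by the PoV > 0.9 rule from the earlier slide, and projects. The synthetic data and all variable names are assumptions made for the example.

```python
import numpy as np

# Synthetic data: N points in d dimensions (illustrative, not from the slides).
rng = np.random.default_rng(0)
N, d = 200, 10
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))  # correlated features

# Center the points: m is the sample mean.
m = X.mean(axis=0)
P = X - m                                   # N x d, rows are x_i - m

# Sample covariance matrix (d x d).
Sigma = P.T @ P / N

# Eigendecomposition; eigh returns eigenvalues in ascending order, so sort descending.
lam, W_all = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, W_all = lam[order], W_all[:, order]

# Choose k by proportion of variance explained, PoV > 0.9 as on the slides.
pov = np.cumsum(lam) / lam.sum()
k = int(np.searchsorted(pov, 0.9)) + 1

# Project: z = W^T (x - m), with the top-k eigenvectors as the columns of W.
W = W_all[:, :k]
Z = P @ W                                   # N x k matrix of projections
print("k =", k, " PoV =", round(float(pov[k - 1]), 3), " Z shape:", Z.shape)
```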
Singular Value Decomposition (SVD)
If A is a real m-by-n matrix, then there exist orthogonal matrices U (m×m) and V (n×n) such that
  U^T A V = diag(σ1, σ2, ..., σp),  p = min(m, n),
that is, A = U Σ V^T.
Singular values: the non-negative square roots of the eigenvalues of A^T A, denoted σi, i = 1, ..., n.
A^T A is symmetric, so its eigenvalues (and hence the singular values) are real; the singular values are arranged in decreasing order.
  A^T A = (U Σ V^T)^T (U Σ V^T) = V Σ^T U^T U Σ V^T = V Σ^T Σ V^T = V Σ^2 V^T,
so A^T A v = σ^2 v.
Dimensions: A (m×n) = U (m×m) Σ (m×n) V^T (n×n).

SVD for PCA
SVD can be used to efficiently compute the image basis:
  P P^T = (U Σ V^T)(U Σ V^T)^T = U Σ V^T V Σ^T U^T = U Σ Σ^T U^T = U Σ^2 U^T,
so P P^T u = σ^2 u.
The columns of U are the eigenvectors, i.e. the basis we are looking for (a NumPy sketch of this computation is given after the factor analysis slides below).
Most important thing to notice: distance in the eigenspace is an approximation to the distance in the original space,
  ||x_i - x_j|| ≈ ||c_i - c_j||.

PCA
x ≈ A c with x ∈ R^d, and c = A^T x with c ∈ R^m, m << d.
Again, distance in the eigenspace approximates distance in the original space: ||x_i - x_j|| ≈ ||c_i - c_j||.

Eigenfaces
Use PCA and subspace projection to perform face recognition: how can a face be described as a linear combination of face basis images?
Matthew Turk and Alex Pentland, "Eigenfaces for Recognition", 1991.

Face Recognition: Eigenface
MIT Media Lab Face Recognition demo page: http://vismod.media.mit.edu/vismod/demos/facerec

Factor Analysis
Find a small number of factors z which, when combined, generate x:
  x_i - μ_i = v_i1 z1 + v_i2 z2 + ... + v_ik zk + ε_i,
where the z_j, j = 1, ..., k, are latent factors with E[z_j] = 0, Var(z_j) = 1, Cov(z_i, z_j) = 0 for i ≠ j;
the ε_i are noise sources with E[ε_i] = ψ_i, Cov(ε_i, ε_j) = 0 for i ≠ j, Cov(ε_i, z_j) = 0;
and the v_ij are the factor loadings.

PCA vs. …
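As a companion to the "SVD for PCA" slides above, here is a hedged NumPy sketch of computing the PCA basis from the SVD of the centered data matrix P (columns are the centered points, as on the slides). The data, the subspace size m, and the printed distance and reconstruction checks are illustrative assumptions, not anything specified in the lecture.

```python
import numpy as np

# Hypothetical data playing the role of the image/point set (not from the slides).
rng = np.random.default_rng(1)
N, d = 200, 10
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))
mu = X.mean(axis=0)
P = (X - mu).T                          # d x N: columns are the centered points x_i - mu

# Thin SVD of the centered data matrix: P = U S V^T.
# The columns of U are eigenvectors of P P^T and S**2 are the corresponding
# eigenvalues, so U gives the same basis as the eigendecomposition without
# ever forming P P^T explicitly.
U, S, Vt = np.linalg.svd(P, full_matrices=False)

m_dim = 3                               # illustrative subspace dimension
A = U[:, :m_dim]                        # orthogonal basis, d x m
C = A.T @ P                             # coefficients c_i = A^T (x_i - mu), m x N

# Distance in the eigenspace approximates distance in the original space.
i, j = 0, 1
print(np.linalg.norm(P[:, i] - P[:, j]), np.linalg.norm(C[:, i] - C[:, j]))

# Reconstruction x_i ~ mu + A c_i; the relative error shrinks as m_dim grows.
X_rec = (A @ C).T + mu
print(np.linalg.norm(X - X_rec) / np.linalg.norm(X - mu))
```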

