Rutgers University CS 536 - Introduction to Machine Learning


INTRODUCTIONTOMachineLearningETHEM ALPAYDIN© The MIT Press, [email protected]://www.cmpe.boun.edu.tr/~ethem/i2mlLecture Slides forCHAPTER6:DimensionalityReductionLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)3Why Reduce Dimensionality?1. Reduces time complexity: Less computation2. Reduces space complexity: Less parameters3. Saves the cost of observing the feature4. Simpler models are more robust on small datasets5. More interpretable; simpler explanation6. Data visualization (structure, groups, outliers, etc) if plotted in 2 or 3 dimensionsLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)4Feature Selection vs Extraction Feature selection: Choosing k<d important features, ignoring the remaining d – kSubset selection algorithms Feature extraction: Project the original xi, i =1,...,d dimensions to new k<d dimensions, zj, j =1,...,kPrincipal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)5Subset Selection There are 2dsubsets of d features Forward search: Add the best feature at each step Set of features F initially Ø. At each iteration, find the best new featurej = argminiE ( F ∪ xi) Add xjto F if E ( F ∪ xj) < E ( F )  Hill-climbing O(d2) algorithm Backward search: Start with all features and remove one at a time, if possible. Floating search (Add k, remove l)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)6Principal Components Analysis (PCA) Find a low-dimensional space such that when x is projected there, information loss is minimized. The projection of x on the direction of w is: z = wTx Find w such that var(z) is maximizedvar(z) = var(wTx) = E[(wTx – wTµ)2] = E[(wTx – wTµ)(wTx – wTµ)]= E[wT(x – µ)(x – µ)Tw]= wT E[(x – µ)(x –µ)T]w = wT∑ wwhere var(x)= E[(x – µ)(x –µ)T] = ∑Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)7 Maximize var(z) subject to ||w||=1∑ w1= α w1that is, w1is an eigenvector of ∑Choose the one with the largest eigenvalue for var(z) to be max Second principal component: Max var(z2), s.t., ||w2||=1 and orthogonal to w1∑ w2= α w2that is, w2is another eigenvector of ∑ and so on.Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)8What PCA doesz = WT(x – m)where the columns of W are the eigenvectors of ∑, and m is sample meanCenters the data at the origin and rotates the axesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)9How to choose k ? Proportion of variance (PoV) explainedwhen λi are sorted in descending order  Typically, stop at PoV>0.9 Scree graph plots of PoV vs k, stop at “elbow”Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)10Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)11Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)12Factor Analysis Find a small number of factors z, which when combined generate x :xi– µi= vi1z1+ vi2z2+ ... 
Factor Analysis
- Find a small number of factors z which, when combined, generate x:
      x_i - μ_i = v_i1 z_1 + v_i2 z_2 + ... + v_ik z_k + ε_i
  where z_j, j = 1,...,k, are the latent factors with E[z_j] = 0, Var(z_j) = 1, and Cov(z_i, z_j) = 0 for i ≠ j;
  ε_i are the zero-mean noise sources with Var(ε_i) = ψ_i, Cov(ε_i, ε_j) = 0 for i ≠ j, and Cov(ε_i, z_j) = 0;
  and the v_ij are the factor loadings. (A short fitting sketch is given at the end of these notes.)

PCA vs FA
- PCA goes from x to z:   z = W^T (x - μ)
- FA goes from z to x:    x - μ = V z + ε

Factor Analysis (continued)
- In FA, the factors z_j are stretched, rotated and translated to generate x.

Multidimensional Scaling (MDS)
- Given the pairwise distances d_ij, i, j = 1,...,N, between N points, place the points on a low-dimensional map such that the distances are preserved: z = g(x | θ).
- Find the θ that minimizes the Sammon stress,
      E(θ | X) = Σ_{r,s} (||z^r - z^s|| - ||x^r - x^s||)^2 / ||x^r - x^s||^2
  (a small code sketch appears at the end of these notes).

Map of Europe by MDS
[Figure: map of European cities reconstructed by MDS from pairwise distances. Map from CIA - The World Factbook: http://www.cia.gov/]

Linear Discriminant Analysis (LDA)
- Find a low-dimensional space such that when x is projected, the classes are well separated.
- For two classes, find the w that maximizes the Fisher criterion
      J(w) = (w^T S_B w) / (w^T S_W w)
- Between-class scatter:
      S_B = (m_1 - m_2)(m_1 - m_2)^T
- Within-class scatter:
      S_W = S_1 + S_2, where S_i = Σ_t r_i^t (x^t - m_i)(x^t - m_i)^T and r_i^t = 1 if x^t belongs to class i, 0 otherwise.
- Fisher's linear discriminant: the LDA solution is
      w = c · S_W^{-1} (m_1 - m_2)
  and the parametric solution, for Gaussian classes sharing a common covariance Σ, is
      w = Σ^{-1} (μ_1 - μ_2)
  (a NumPy sketch follows at the end of these notes).

K > 2 Classes
- Within-class scatter:  S_W = S_1 + S_2 + ... + S_K
- Between-class scatter: S_B = Σ_{i=1..K} N_i (m_i - m)(m_i - m)^T, where m is the overall mean
- Find the W that maximizes
      J(W) = |W^T S_B W| / |W^T S_W W|
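To make the factor-analysis model concrete, here is a minimal sketch that generates data from x - μ = Vz + ε and fits it back. The use of scikit-learn's FactorAnalysis, the sizes N, d, k, and the noise levels are my assumptions; the slides do not prescribe any library.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Generate data according to the FA model: x - mu = V z + eps.
N, d, k = 500, 6, 2
V = rng.normal(size=(d, k))               # factor loadings v_ij
z = rng.normal(size=(N, k))               # latent factors: zero mean, unit variance
psi = 0.1 * rng.uniform(1, 2, size=d)     # per-dimension noise variances psi_i
X = z @ V.T + rng.normal(size=(N, d)) * np.sqrt(psi)

fa = FactorAnalysis(n_components=k)
Z = fa.fit_transform(X)                   # estimated factors, shape (N, k)
print(fa.components_.shape)               # (k, d): estimated loadings (V transposed)
print(fa.noise_variance_)                 # estimated psi_i
```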
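A small MDS illustration under my own assumptions: compute pairwise Euclidean distances in the original space, embed them in 2-D with scikit-learn's metric MDS (which minimizes a closely related stress criterion, not Sammon's exactly), and evaluate the Sammon stress defined above on the result.

```python
import numpy as np
from sklearn.manifold import MDS

def sammon_stress(D_high, Z):
    """Sammon stress between original pairwise distances and map distances."""
    D_low = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    iu = np.triu_indices_from(D_high, k=1)        # each pair once, skip the diagonal
    return np.sum((D_low[iu] - D_high[iu]) ** 2 / D_high[iu] ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))                     # 30 points in 10-D
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)                          # 2-D map that tries to preserve D
print(sammon_stress(D, Z))
```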
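Finally, the two-class Fisher direction w ∝ S_W^{-1}(m_1 - m_2) is a one-liner in NumPy once the scatter matrices are built. This is a sketch only; the toy Gaussian data and the unit-norm scaling of w are illustrative choices of mine.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Two-class Fisher LDA: w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # within-class scatter of class 1
    S2 = (X2 - m2).T @ (X2 - m2)          # within-class scatter of class 2
    S_W = S1 + S2
    w = np.linalg.solve(S_W, m1 - m2)     # direction maximizing J(w)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3, 1], scale=1.0, size=(100, 2))
w = fisher_lda_direction(X1, X2)
z1, z2 = X1 @ w, X2 @ w                   # 1-D projections; the class means separate
print(w, z1.mean(), z2.mean())
```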

