Machine Learning 10-701/15-781, Spring 2008

Principal Components Analysis

Eric Xing
Lecture 24, April 16, 2008
Reading: Chap. 12.1, CB book
(Modified from www.cs.princeton.edu/picasso/mats/Lecture1-jps.ppt)

Factor or Component Analysis: Why?
- We study phenomena that cannot be directly observed
  - ego, personality, intelligence in psychology
  - underlying factors that govern the observed data
- We want to identify and operate with underlying latent factors rather than the observed data
  - e.g., topics in news articles
  - transcription factors in genomics
- We want to discover and exploit hidden relationships
  - "beautiful car" and "gorgeous automobile" are closely related
  - so are "driver" and "automobile"
  - but does your search engine know this?
  - reduces noise and error in results
- We have too many observations and dimensions
  - to reason about or obtain insights from
  - to visualize
  - too much noise in the data
  - need to reduce them to a smaller set of factors
  - better representation of the data without losing much information
  - can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
- Combinations of observed variables may be more effective bases for insights, even if their physical meaning is obscure

The goal
- Discover a new set of factors/dimensions/axes with which to represent, describe, or evaluate the data
  - for more effective reasoning, insights, or better visualization
  - reduce noise in the data
  - typically a smaller set of factors: dimension reduction
  - better representation of the data without losing much information
  - can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
- Factors are combinations of observed variables
  - may be more effective bases for insights even if their physical meaning is obscure
  - observed data are described in terms of these factors rather than in terms of the original variables/dimensions

Basic Concept
- Areas of variance in data are where items can be best discriminated and key underlying phenomena observed
  - areas of greatest "signal" in the data
- If two items or dimensions are highly correlated or dependent
  - they are likely to represent highly related phenomena
  - if they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable
    - parsimony
    - reduction in error
- So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance
- We want a smaller set of variables that explain most of the variance in the original data, in a more compact and insightful form
- What if the dependences and correlations are not so strong or direct? And suppose you have 3 variables, or 4, or 5, or 10,000?
- Look for the phenomena underlying the observed covariance/co-dependence in a set of variables
  - once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance
- These phenomena are called "factors," "principal components," or "independent components," depending on the method used
  - Factor analysis: based on variance/covariance/correlation
  - Independent Component Analysis: based on independence

An example
[Figure omitted in the extracted text.]

Principal Component Analysis
- Most common form of factor analysis
- The new variables/dimensions
  - are linear combinations of the original ones
  - are uncorrelated with one another (orthogonal in the original dimension space)
  - capture as much of the original variance in the data as possible
  - are called Principal Components
- Orthogonal directions of greatest variance in the data
  - projections along PC1 discriminate the data most along any one axis
- The first principal component is the direction of greatest variability (covariance) in the data
- The second is the next orthogonal (uncorrelated) direction of greatest variability
  - so first remove all the variability along the first component, and then find the next direction of greatest variability
- And so on
[Figure: data scattered over Original Variable A (x-axis) and Original Variable B (y-axis), with the orthogonal directions PC1 and PC2 drawn through the point cloud.]
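A small illustrative sketch (not part of the original slides) of the geometric picture above: it sweeps unit directions u in 2-D and measures the variance of the data projected onto each one; the best direction plays the role of PC1 and the orthogonal one of PC2. The data and all names (A, B, thetas, proj_var) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=1000)                      # "Original Variable A"
B = 0.8 * A + 0.3 * rng.normal(size=1000)      # "Original Variable B", correlated with A
X = np.column_stack([A, B])
X = X - X.mean(axis=0)                         # center at the centroid of the data

# Candidate unit directions u(theta) in the plane
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
dirs = np.column_stack([np.cos(thetas), np.sin(thetas)])

# Average squared projection E[(u^T x)^2] for each direction
proj_var = ((X @ dirs.T) ** 2).mean(axis=0)

best = np.argmax(proj_var)
pc1 = dirs[best]                               # direction of greatest variability
pc2 = np.array([-pc1[1], pc1[0]])              # orthogonal direction
print("PC1 direction:", pc1, "variance:", proj_var[best])
print("PC2 variance :", ((X @ pc2) ** 2).mean())
```

For the strongly correlated A and B above, PC1 comes out close to the 45-degree diagonal and captures most of the variance, while the orthogonal PC2 captures only the residual spread.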
Computing the Components
- Data points are vectors in a multidimensional space
- The projection of a vector x onto an axis (dimension) u is u^T x
- The direction of greatest variability is the one in which the average square of the projection is greatest
  - i.e., the u such that E[(u^T x)^2] over all x is maximized
  - (for simplicity, we subtract the mean along each dimension, i.e., center the original axis system at the centroid of all data points)
  - this direction of u is the direction of the first Principal Component
- Matrix representation:
  - E_i[(u^T x_i)^2] = E[(u^T X)(u^T X)^T] = E[u^T X X^T u]
  - the covariance matrix C = X X^T contains the correlations (similarities) of the original axes, based on how the data values project onto them
  - so we are looking for the u that maximizes u^T C u, subject to u being unit length
  - it is maximized when u is the principal eigenvector of the matrix C, in which case
    - u^T C u = u^T λ u = λ if u is unit length, where λ is the principal eigenvalue of the correlation matrix C
    - the eigenvalue denotes the amount of variability captured along that dimension

Why the Eigenvectors?
- Maximize u^T X X^T u subject to u^T u = 1
- Construct the Lagrangian: u^T X X^T u − λ u^T u
- Set the vector of partial derivatives to zero:
  X X^T u − λ u = (X X^T − λ I) u = 0
- Since u ≠ 0, u must be an eigenvector of X X^T with eigenvalue λ

Eigenvalues and Eigenvectors
- Eigenvectors (for a square m×m matrix S): S v = λ v, where v is a (right) eigenvector and λ is the corresponding eigenvalue
- How many eigenvalues are there at most?
  - S v = λ v, i.e., (S − λ I) v = 0, has a non-zero solution v only if det(S − λ I) = 0
  - this is an m-th order equation in λ, which can have at most m distinct solutions (the roots of the characteristic polynomial); they can be complex even though S is real
- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
  S v_{1,2} = λ_{1,2} v_{1,2} and λ_1 ≠ λ_2  ⇒  v_1 · v_2 = 0
- All eigenvalues of a real symmetric matrix are real:
  if |S − λ I| = 0 and S = S^T, then λ ∈ ℝ
- All eigenvalues of a positive semidefinite matrix are non-negative:
  if w^T S w ≥ 0 for all w ∈ ℝ^n, then S v = λ v  ⇒  λ ≥ 0

Eigen/diagonal Decomposition
- Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix)
- Theorem: there exists an eigen decomposition S = U Λ U^{-1}, with Λ diagonal (cf. the matrix diagonalization theorem)
  - unique for distinct eigenvalues
- Columns of U are the eigenvectors of S
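The eigenvalue/eigenvector facts above are easy to check numerically. Below is a minimal sketch (my own, not from the slides) that builds a symmetric positive semidefinite matrix S = X X^T from centered data and verifies with numpy that its eigenvalues are real and non-negative, that the returned eigenvectors are orthonormal, and that S = U Λ U^T (so U^{-1} = U^T in the symmetric case).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 50))
X = X - X.mean(axis=1, keepdims=True)   # center each dimension

S = X @ X.T                             # symmetric, positive semidefinite 3x3 matrix

lam, U = np.linalg.eigh(S)              # eigh is the solver for symmetric matrices
print("eigenvalues (real, non-negative):", lam)

# Eigenvectors of a symmetric matrix are orthogonal; eigh returns them orthonormal
print("U^T U = I ?", np.allclose(U.T @ U, np.eye(3)))

# Eigen decomposition S = U Lambda U^{-1}; here U^{-1} = U^T because S = S^T
print("S = U diag(lam) U^T ?", np.allclose(U @ np.diag(lam) @ U.T, S))
```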
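Finally, a short sketch (again my own, with invented names) tying the pieces together: the principal eigenvector u1 of the covariance matrix C attains u1^T C u1 = λ1, and no other unit direction captures more of the variance, which is the claim behind the Lagrangian argument on the "Why the Eigenvectors?" slide.

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated 5-D data: 200 samples mixed through a random linear map
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                  # center the data
C = Xc.T @ Xc / len(Xc)                  # covariance matrix

lam, U = np.linalg.eigh(C)               # eigenvalues in ascending order
u1, lam1 = U[:, -1], lam[-1]             # principal eigenvector / eigenvalue
print("u1^T C u1 =", u1 @ C @ u1, " lambda_1 =", lam1)

# No random unit direction captures more variance than the first PC
for _ in range(5):
    w = rng.normal(size=5)
    w /= np.linalg.norm(w)
    assert w @ C @ w <= lam1 + 1e-9
print("random unit directions all capture <= lambda_1 of the variance")
```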