Machine Learning 10-701/15-781, Spring 2008
Principal Components Analysis
Eric Xing
Lecture 24, April 16, 2008
Reading: Chap. 12.1, CB book
(Modified from www.cs.princeton.edu/picasso/mats/Lecture1.jps.ppt)

Factor or Component Analysis: Why?
- We study phenomena that cannot be directly observed:
  - ego, personality, intelligence in psychology
  - underlying factors that govern the observed data
- We want to identify and operate with underlying latent factors rather than the observed data:
  - e.g., topics in news articles
  - transcription factors in genomics
- We want to discover and exploit hidden relationships:
  - "beautiful car" and "gorgeous automobile" are closely related, and so are "driver" and "automobile"
  - but does your search engine know this?
  - exploiting such relationships reduces noise and error in results

Factor or Component Analysis: Why? (cont.)
- We have too many observations and dimensions:
  - to reason about or obtain insights from
  - to visualize
  - too much noise in the data
  - we need to reduce them to a smaller set of factors
  - a better representation of the data without losing much information
  - we can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
- Combinations of observed variables may be more effective bases for insights, even if their physical meaning is obscure.

The goal
- Discover a new set of factors/dimensions/axes on which to represent, describe, or evaluate the data:
  - for more effective reasoning, insights, or better visualization
  - to reduce noise in the data
  - typically a smaller set of factors: dimension reduction
  - a better representation of the data without losing much information
  - more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
- Factors are combinations of observed variables:
  - they may be more effective bases for insights, even if their physical meaning is obscure
  - observed data are described in terms of these factors rather than in terms of the original variables/dimensions

Basic Concept
- Areas of variance in the data are where items can best be discriminated and key underlying phenomena observed; they are the areas of greatest "signal" in the data.
- If two items or dimensions are highly correlated or dependent:
  - they are likely to represent highly related phenomena
  - if they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable (parsimony, reduction in error)
- So we want to combine related variables and focus on uncorrelated or independent ones, especially those along which the observations have high variance.
- We want a smaller set of variables that explains most of the variance in the original data, in a more compact and insightful form.

Basic Concept (cont.)
- What if the dependences and correlations are not so strong or direct? And suppose you have 3 variables, or 4, or 5, or 10,000?
- Look for the phenomena underlying the observed covariance/co-dependence in a set of variables: once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance.
- These phenomena are called "factors", "principal components", or "independent components", depending on the method used:
  - factor analysis: based on variance/covariance/correlation
  - Independent Component Analysis: based on independence

An example
- [Figure only in the original slide; not reproduced in this preview.]

Principal Component Analysis
- The most common form of factor analysis.
- The new variables/dimensions:
  - are linear combinations of the original ones
  - are uncorrelated with one another (orthogonal in the original dimension space)
  - capture as much of the original variance in the data as possible
  - are called Principal Components
- [Figure: data plotted against Original Variable A and Original Variable B, with PC 1 and PC 2 drawn as the orthogonal directions of greatest variance.]
- Projections along PC 1 discriminate the data most along any one axis.

Principal Component Analysis (cont.)
- The first principal component is the direction of greatest variability (covariance) in the data.
- The second is the next orthogonal (uncorrelated) direction of greatest variability: first remove all the variability along the first component, and then find the next direction of greatest variability.
- And so on.

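A minimal NumPy sketch of the picture above (not part of the original slides; the toy data and variable names are illustrative): center the data, take the eigenvectors of the sample covariance as PC 1 and PC 2, and project onto them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: two correlated "original variables" A and B.
A = rng.normal(size=500)
B = 0.8 * A + 0.3 * rng.normal(size=500)
X = np.column_stack([A, B])                # shape (n, 2)

# Center the data (subtract the mean along each dimension).
Xc = X - X.mean(axis=0)

# Sample covariance matrix and its eigen-decomposition.
C = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order

# PC 1 = eigenvector with the largest eigenvalue, PC 2 the next one.
order = np.argsort(eigvals)[::-1]
pcs = eigvecs[:, order]                    # columns are PC 1, PC 2

# Project each centered point onto the principal components: u^T x.
scores = Xc @ pcs

print("variance captured by PC 1, PC 2:", eigvals[order])
print("variance of the projections    :", scores.var(axis=0))
```

The variances of the projections match the eigenvalues, which is the sense in which PC 1 captures as much of the original variance as possible.
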
Computing the Components
- Data points are vectors in a multidimensional space.
- The projection of a vector x onto an axis (dimension) u is u^T x.
- The direction of greatest variability is the one in which the average square of the projection is greatest:
  - i.e., the u such that E[(u^T x)^2] over all x is maximized
  - (we subtract the mean along each dimension and center the original axis system at the centroid of all data points, for simplicity)
  - this direction of u is the direction of the first Principal Component.

Computing the Components (cont.)
- E[(u^T x_i)^2] = E[(u^T X)(u^T X)^T] = E[u^T X X^T u]
- The covariance matrix C = X X^T contains the correlations (similarities) of the original axes, based on how the data values project onto them.
- So we are looking for the u that maximizes u^T C u, subject to u being unit length.
- This is maximized when u is the principal eigenvector of the matrix C, in which case
  u^T C u = u^T λ u = λ   (if u is unit length),
  where λ is the principal eigenvalue of the correlation matrix C.
- The eigenvalue denotes the amount of variability captured along that dimension.

Why the Eigenvectors?
- Maximize u^T X X^T u subject to u^T u = 1.
- Construct the Lagrangian u^T X X^T u - λ u^T u and set the vector of partial derivatives to zero:
  X X^T u - λ u = (X X^T - λ I) u = 0.
- Since u ≠ 0, u must be an eigenvector of X X^T with eigenvalue λ.

Eigenvalues & Eigenvectors
- Eigenvectors (for a square m×m matrix S): a (right) eigenvector v with eigenvalue λ satisfies S v = λ v.
- How many eigenvalues are there at most?
  S v = λ v  ⇔  (S - λ I) v = 0
  has a non-zero solution v only if det(S - λ I) = 0. This is an m-th order equation in λ, which has at most m distinct solutions (the roots of the characteristic polynomial); these can be complex even though S is real.

Eigenvalues & Eigenvectors (cont.)
- For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
  S v_{1,2} = λ_{1,2} v_{1,2} with λ_1 ≠ λ_2  ⇒  v_1 · v_2 = 0.
- All eigenvalues of a real symmetric matrix are real:
  if |S - λ I| = 0 and S = S^T, then λ ∈ ℝ.
- All eigenvalues of a positive semidefinite matrix are non-negative:
  if w^T S w ≥ 0 for all w ∈ ℝ^n and S v = λ v, then λ ≥ 0.

Eigen/diagonal Decomposition
- Let S be a square matrix with m linearly independent eigenvectors (a "non-defective" matrix).
- Theorem: there exists an eigen-decomposition S = U Λ U^{-1} with Λ diagonal (cf. the matrix diagonalization theorem); it is unique for distinct eigenvalues.
- Columns of U are the eigenvectors of S.

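A quick numerical check of the eigenvector claim in "Computing the Components" (again a sketch, not from the slides; the data and names are made up): for a centered data matrix X with columns as data points, the top eigenvector of C = X X^T / n attains a larger u^T C u than any random unit vector, and the value it attains is the top eigenvalue λ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Centered data matrix with columns as data points, as in the slides.
X = rng.normal(size=(3, 1000))
X -= X.mean(axis=1, keepdims=True)
C = X @ X.T / X.shape[1]                   # (scaled) covariance matrix

# Principal eigenvector u1 and eigenvalue lam1 of the symmetric matrix C.
eigvals, eigvecs = np.linalg.eigh(C)
lam1, u1 = eigvals[-1], eigvecs[:, -1]

# u^T C u over many random unit vectors never exceeds lam1,
# and u1 itself attains it.
U = rng.normal(size=(3, 10000))
U /= np.linalg.norm(U, axis=0)             # normalize each column to unit length
quad = np.einsum('ij,ik,kj->j', U, C, U)   # u^T C u for each column u of U

print("top eigenvalue lambda:", lam1)
print("u1^T C u1            :", u1 @ C @ u1)
print("max over random u    :", quad.max())   # <= lam1
```
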

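The symmetric-matrix facts on the "Eigenvalues & Eigenvectors" slides can also be verified numerically. A sketch under the same caveat (illustrative code, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# A random real symmetric matrix S = A + A^T.
A = rng.normal(size=(4, 4))
S = A + A.T

# 1) All eigenvalues of a real symmetric matrix are real: the general
#    eigenvalue routine finds (numerically) zero imaginary parts.
lam_general = np.linalg.eigvals(S)
print("imaginary parts ~ 0 :", np.allclose(np.imag(lam_general), 0))

# 2) Eigenvectors for distinct eigenvalues are orthogonal
#    (a random S has distinct eigenvalues with probability 1): V^T V = I.
eigvals, V = np.linalg.eigh(S)
print("V^T V = I           :", np.allclose(V.T @ V, np.eye(4)))

# 3) All eigenvalues of a positive semidefinite matrix are non-negative.
#    S^T S is PSD, since w^T (S^T S) w = ||S w||^2 >= 0 for all w.
psd = S.T @ S
print("PSD eigenvalues >= 0:", np.all(np.linalg.eigvalsh(psd) >= -1e-10))
```
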

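Finally, the eigen/diagonal decomposition S = U Λ U^{-1} from the last slide can be checked directly; one more illustrative sketch with a made-up matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# A random square matrix is non-defective with probability 1,
# i.e. it has m linearly independent eigenvectors.
S = rng.normal(size=(4, 4))

lam, U = np.linalg.eig(S)                  # columns of U are eigenvectors of S
Lam = np.diag(lam)                         # diagonal matrix of eigenvalues

# Eigen-decomposition: S = U Lam U^{-1}.
S_rebuilt = U @ Lam @ np.linalg.inv(U)
print("S = U Lam U^{-1}:", np.allclose(S, S_rebuilt))
```
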