Principal Component Analysis (CS498)

Today's lecture
• Adaptive feature extraction
• Principal Component Analysis
  – How, why, when, which

A dual goal
• Find a good representation
  – The features part
• Reduce redundancy in the data
  – A side effect of "proper" features

Example case
• Describe this input

What about now?

A "good feature"
• "Simplifies" the explanation of the input
  – Represents repeating patterns
  – When defined, makes the input simpler
• How do we define these abstract qualities?
  – On to the math …

Linear features
• Z = WX
  – X: input matrix (dimensions × samples)
  – W: feature matrix (features × dimensions)
  – Z: weight matrix (features × samples)

A 2D case
• Matrix representation of the data, and a scatter plot of the same data (axes x1, x2)
• Z = WX:
\[ \begin{bmatrix} \mathbf{z}_1^T \\ \mathbf{z}_2^T \end{bmatrix} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \end{bmatrix} \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \end{bmatrix} \]

Defining a goal
• Desirable features
  – Give "simple" weights
  – Avoid feature similarity
• How do we define these?

One way to proceed
• "Simple weights"
  – Minimize the relation between the two dimensions
  – Decorrelate: $\mathbf{z}_1^T \mathbf{z}_2 = 0$
• "Feature similarity"
  – Same thing!
  – Decorrelate: $\mathbf{w}_1^T \mathbf{w}_2 = 0$

Diagonalizing the covariance
• Covariance matrix:
\[ \mathrm{Cov}(\mathbf{z}_1,\mathbf{z}_2) = \frac{1}{N}\begin{bmatrix} \mathbf{z}_1^T\mathbf{z}_1 & \mathbf{z}_1^T\mathbf{z}_2 \\ \mathbf{z}_2^T\mathbf{z}_1 & \mathbf{z}_2^T\mathbf{z}_2 \end{bmatrix} \]
• Diagonalizing the covariance suppresses cross-dimensional co-activity
  – if z1 is high, z2 won't be, etc.
\[ \mathrm{Cov}(\mathbf{z}_1,\mathbf{z}_2) = \frac{1}{N}\begin{bmatrix} \mathbf{z}_1^T\mathbf{z}_1 & \mathbf{z}_1^T\mathbf{z}_2 \\ \mathbf{z}_2^T\mathbf{z}_1 & \mathbf{z}_2^T\mathbf{z}_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]

Problem definition
• For a given input X
• Find a feature matrix W
• So that the weights decorrelate:
\[ (WX)(WX)^T = N\,I \;\Rightarrow\; ZZ^T = N\,I \]

How do we solve this?
• Any ideas?
\[ (WX)(WX)^T = N\,I \]

Solving for diagonalization
\[ (WX)(WX)^T = N\,I \;\Rightarrow\; WXX^TW^T = N\,I \;\Rightarrow\; W\,\mathrm{Cov}(X)\,W^T = I \]

Solving for diagonalization
• Covariance matrices are symmetric and positive semi-definite
  – They therefore have orthogonal eigenvectors and real, non-negative eigenvalues
  – and are factorizable by: $U^T A U = \Lambda$
  – where U has the eigenvectors of A in its columns
  – and Λ = diag(λi), where the λi are the eigenvalues of A

Solving for diagonalization
• The solution is a function of the eigenvectors U and eigenvalues Λ of Cov(X):
\[ W\,\mathrm{Cov}(X)\,W^T = I \;\Rightarrow\; W = \begin{bmatrix} \sqrt{\lambda_1} & 0 \\ 0 & \sqrt{\lambda_2} \end{bmatrix}^{-1} U^T = \Lambda^{-1/2}\,U^T \]

So what does it do?
• Input data covariance:
\[ \mathrm{Cov}(X) \approx \begin{bmatrix} 14.9 & 0.05 \\ 0.05 & 1.08 \end{bmatrix} \]
• Extracted feature matrix:
\[ W \approx \begin{bmatrix} 0.12 & -30.4 \\ -8.17 & -0.03 \end{bmatrix} / \sqrt{N} \]
• Weights covariance:
\[ \mathrm{Cov}(WX) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

Another solution
• This is not the only solution to the problem
• Consider this one:
\[ (WX)(WX)^T = N\,I \;\Rightarrow\; WXX^TW^T = N\,I \;\Rightarrow\; W = (XX^T)^{-1/2} \]
  – which can be computed as
\[ W = US^{-1/2}V^T, \qquad [U,S,V] = \mathrm{SVD}(XX^T) \]
• Similar, but out of scope for now

Decorrelation in pictures
• An implicit Gaussian assumption
  – N-D data has N directions of variance
[Figure: scatter plot of the input data]

Undoing the variance
• The decorrelating matrix W contains two vectors that normalize the input's variance
[Figure: input data with the variance-normalizing vectors]

Resulting transform
• The input gets scaled to a well-behaved Gaussian with unit variance in all dimensions
[Figure: input data and transformed data (feature weights)]

A more complex case
• Correlation between the two dimensions
  – We still find the directions of maximal variance
  – But we also rotate in addition to scaling
[Figure: input data and transformed data (feature weights)]

One more detail
• So far we considered zero-mean inputs
  – The transforming operation was a rotation
• If the input mean is not zero, bad things happen!
  – Make sure that your data is zero-mean!
[Figure: input data and transformed data (feature weights)]
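The whitening recipe derived above (eigendecompose Cov(X), then apply W = Λ^{-1/2}Uᵀ) is short enough to try in code. Below is a minimal NumPy sketch of that idea; the toy data and all variable names are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Toy zero-mean 2-D data: 1000 samples with very different variance per dimension
# (dimensions x samples, matching the convention for X used in the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 1000)) * np.array([[4.0], [1.0]])
X = X - X.mean(axis=1, keepdims=True)          # PCA assumes zero-mean input

N = X.shape[1]
cov_X = (X @ X.T) / N                          # Cov(X), a symmetric 2x2 matrix

# Eigendecomposition: Cov(X) = U diag(lam) U^T
lam, U = np.linalg.eigh(cov_X)

# Decorrelating (whitening) matrix: W = Lambda^(-1/2) U^T
W = np.diag(1.0 / np.sqrt(lam)) @ U.T

Z = W @ X                                      # feature weights
print(np.round((Z @ Z.T) / N, 3))              # ~ identity: decorrelated, unit variance
```

`np.linalg.eigh` is used rather than `eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.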
Principal Component Analysis
• This transform is known as PCA
  – The features are the principal components
    • They are orthogonal to each other
    • And they produce orthogonal (white) weights
  – Major tool in statistics
    • Removes dependencies from multivariate data
• Also known as the KLT
  – Karhunen-Loève transform

A better way to compute PCA
• The Singular Value Decomposition way:
\[ [U,S,V] = \mathrm{SVD}(A) \;\Rightarrow\; A = USV^T \]
• Relationship to the eigendecomposition
  – In our case (covariance input A), U and S will hold the eigenvectors/eigenvalues of A
• Why the SVD?
  – More stable, more robust, fancy extensions

PCA through the SVD
• [Diagram: SVD of the input matrix (dimensions × samples) = feature matrix × √eigenvalue matrix × weight matrix]
• [Diagram: SVD of the input covariance (dimensions × dimensions) = feature matrix × eigenvalue matrix × weight matrix]

Dimensionality reduction
• PCA is great for high-dimensional data
• It allows us to perform dimensionality reduction
  – Helps us find relevant structure in the data
  – Helps us throw away things that won't matter

A simple example
• Two very correlated dimensions
  – e.g. size and weight of fruit
  – One effective variable
• The PCA matrix here is:
\[ W = \begin{bmatrix} -0.2 & -0.13 \\ -13.7 & 28.2 \end{bmatrix} \]
  – Large variance between the two components: about two orders of magnitude

A simple example
• The second principal component needs to be super-boosted to whiten the weights
  – maybe it is useless?
• Keep only the high-variance components
  – Throw away components with minor contributions

What is the number of dimensions?
• If the input was M-dimensional, how many dimensions do we keep?
  – No solid answer (estimators exist, but they are flaky)
• Look at the singular/eigenvalues
  – They show the variance of each component; at some point it becomes small

Example
• Eigenvalues of 1200-dimensional video data
  – Little variance after component 30
  – We don't need to keep the rest of the data
[Figure: eigenvalue magnitude vs. component index]

So where are the features?
• We strayed off-subject
  – What …
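To make the SVD route and the keep-only-high-variance idea concrete, here is a small NumPy sketch; the synthetic data, the cutoff k = 3, and the variable names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic high-dimensional data that really lives in a few directions:
# 50 observed dimensions, 500 samples, driven by 3 latent variables plus noise.
latent = rng.normal(size=(3, 500))
mixing = rng.normal(size=(50, 3))
X = mixing @ latent + 0.05 * rng.normal(size=(50, 500))   # dimensions x samples
X = X - X.mean(axis=1, keepdims=True)                     # zero-mean, as required

# SVD of the data matrix: the columns of U are the principal components,
# and S**2 / N gives the variance of each component (eigenvalues of Cov(X)).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
variances = S**2 / X.shape[1]
print(np.round(variances[:6], 3))              # variance collapses after the first 3

# Dimensionality reduction: keep only the k high-variance components.
k = 3
W_reduced = U[:, :k].T                         # k x dimensions feature matrix
Z = W_reduced @ X                              # k x samples reduced representation
```

In practice the cutoff k is chosen by inspecting the singular values (a scree plot), exactly as the 1200-dimensional video example suggests.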