STEVENS CS 559 - CS 559 6th Set of Notes - D2481833

Home> Schools> Stevens Institute of Technology> (CS) > CS 559> CS 559 6th Set of Notes

STEVENS CS 559 - CS 559 6th Set of Notes

School name Stevens Institute of Technology

Course Cs 559- Machine Learning Syllabus

Pages 21

Download Save

Unformatted text preview:

1CS 559: Machine Learning Fundamentals and Applications6thSet of Notes1Instructor: Philippos MordohaiWebpage: www.cs.stevens.edu/~mordohaiE-mail: [email protected]: Lieb 215Overview• Principal Component Analysis (DHS Sec. 3.8.1 – essentially notes only)– Intro–Singular Value Decompositiongp• Dimensionality Reduction - PCA in practice (Notes based on Carlos Guestrin’s)• Eignefaces (notes by SrinivasaNarasimhan, CMU)2Principal Component Analysis(PCA)Pattern Classification, Chapter 3 32PCA Resources• A Tutorial on Principal Component Analysis– by Jonathon Shlens (Salk Institute for Biological Sciences), 2005– http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf•Singular Value Decomposition TutorialSingular Value Decomposition Tutorial – by Michael Elad (Technion, Israel), 2005– http://webcourse.cs.technion.ac.il/234299/Spring2005/ho/WCFiles/Tutorial7.ppt• Dimensionality Reduction (lecture notes)– by Carlos Guestrin (CMU), 2006– http://www.cs.cmu.edu/~guestrin/Class/10701-S06/Slides/tsvms-pca.pdf4A Tutorial on Principal Component AnalysisJonathon Shlens5A Toy Problem• Ball of mass mattached to massless, frictionless spring• Ball moved away from equilibrium results in spring oscillating indefinitely along x-axis• All dynamics are a function of a single variable x6J. Shlens3• We do not know which or how many axes and dimensions are important to measure• Place three video cameras that capture 2-D measurements at 120Hz– Camera optical axes are not orthogonal to each other• If we knew what we need to measure, one camera measuring displacement along xwould be sufficient7J. ShlensGoal of PCA• Compute the most meaningful basis to re-express a noisy data set• Hope that this new basis will filter out the noise and reveal hidden structure• In toy example: – Determine that the dynamics are along a single axis– Determine the important axis8J. ShlensNaïve Basis• At each point in time, record 2 coordinates of ball position in each of the 3 images• After 10 minutes at 120Hz, we have 10×60×120=7200 6D vectors• These vectors can be represented in arbitrary coordinate systems• Naïve basis is formed by the image axis– Reflects the method with gathered the data 9J. Shlens4Change of Basis• PCA: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?• Assumption: linearity–Restricts set of potential bases–Restricts set of potential bases– Implicitly assumes continuity in data (superposition and interpolation are possible)10J. ShlensChange of Basis• X is original data (m×n, m=6, n=7200)• Let Y be another m×n matrix such that Y=PX• P is a matrix that transforms X into Y– Geometrically it is a rotation and stretch– The rows of P {p1,…, pm} are the new basis vectors for the columns of X– Each element of yiis a dot product of xiwith the corresponding row of P (a projection of xionto pj)11J. ShlensHow to find anAppropriate Change of Basis?• The row vectors {p1,…, pm} will become the principal componentsof X• What is the best way to re-express X?• What features would we like Y to exhibit?• If we call X “garbled data”, garbling in a linear system can refer to three things:– Noise– Rotation– Redundancy12J. Shlens5Noise and Rotation• Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a system can be extracted• Signal-to-Noise Ratio (SNR)13J. Shlens• Ball travels in straight line– Any deviation must be noise•Variance due to signal andVariance due to signal andnoise are indicated in diagram• SNR: ratio of the two lengths– “Fatness” of data corresponds to noise• Assumption: directions of largest variance in measurement vector space contain dynamics of interest14J. Shlens• Neither xA, not yAhowever are directionswith maximum variance• Maximizing the variance corresponds to finding the appropriate rotation of the naive basis• In 2D this is equivalent to finding best fitting line– How to generalize?15J. Shlens6Redundancy• Is it necessary to record 2 variables for the ball-spring system?• Is it necessary to use 3 cameras?Redundancy spectrum for 2 variables16J. ShlensCovariance Matrix• Assume zero-mean measurements– Subtract mean from all vectors in X• Each column of X is a set of measurements at a point in time•Each row ofXcorresponds to all measurements of a•Each row of Xcorresponds to all measurements of a particular type (e.g. x-coordinate in image B)• Covariance matrix CX=XXT(biased estimate)• ijthelement of CXis the dot product between the ithmeasurement type and the jthmeasurement type– Covariance between two measurement types17J. ShlensCovariance Matrix• Diagonal elements of CX– Large  interesting dynamics– Small  noise• Off-diagonal elements of CX–Largehigh redundancy–Large high redundancy– Small  low redundancy• We wish to maximize signal and minimize redundancy– Off-diagonal elements should be zero• CY must be diagonal18J. Shlens7Sketch of Algorithm• Pick vector in m-D space along which variance is maximal and save as p1• Pick another direction along which variance is maximized among directions perpendicular to p1• Repeat until m principal components have been selectedpppp• From linear algebra: a square matrix can be diagonalizedusing its eigenvectors as new basis• X is not square in general (m>n in our case), but Cxalways is• Solution: Singular Value Decomposition (SVD)19J. ShlensSingular Value Decomposition Tutorial Michael Elad20The eigenvectors of a matrix A form a basis for working with AHowever, for rectangular matrices A (m x n), dim(Ax) ≠ dim(x) and the concept of eigenvectors does not existYet, ATA (n x n) is a symmetric, real matrix (A is real) and therefore, there is an orthonormal basis of eigenvectors {uK} for ATA.Singular Value DecompositionNote: here each row of A is a measurement in time and each column a measurement type21Consider the vectors {vK}They are also orthonormal, since:kkkuvA)(2jkuukkTTjAAM. Elad, 20068Singular Value DecompositionSince ATA is positive semidefinite, its eigenvalues are non-negative {λk≥0}Define the singular values of A asand order them in a non-increasing order:Motivation: One can see, that if A itself is square and symmetric, kk0...n2122then {uk, σk} are the set of its own

View Full Document


School:
Email:
New Password:
Confirm Password:

STEVENS CS 559 - CS 559 6th Set of Notes

Sign up for free to view:

Please select your school