UMD CMSC 878R - Fast Kernel Principal Component Analysis (KPCA)


Fast Kernel Principal Component Analysis (KPCA) for the Polynomial and Gaussian kernels
Vikas Chandrakant Raykar, Ankur Ankur
Perceptual Interfaces and Reality Laboratory
Institute for Advanced Computer Studies
University of Maryland, College Park
February 1, 2003

Outline: Principal Component Analysis (PCA) · Kernel Principal Component Analysis (KPCA) · Different Kernels · Computational Complexity of KPCA · Iterative methods for Eigenvalue Problems · Lanczos iteration · Polynomial kernel · Gaussian kernel · Other Applications · Conclusions

1. Principal Component Analysis (PCA)

• PCA is a statistical dimensionality reduction technique. Given N points in d dimensions, PCA projects the data points onto p directions (p < d) which capture the maximum variance of the data.
• These directions correspond to the eigenvectors of the covariance matrix of the training data points.
• Intuitively, PCA fits an ellipsoid in d dimensions and uses the projections of the data points on the first p major axes of the ellipsoid.

Let \alpha_{nm} be the projection of the n-th data point on the m-th basis vector e_m. Using these projections and the basis vectors we can reconstruct a c-th order approximation of x_n as

\hat{x}_n = \sum_{m=1}^{c} \alpha_{nm} e_m

With E = [e_1 \; e_2 \; \ldots \; e_c] and \alpha_n^T = [\alpha_{n1} \; \alpha_{n2} \; \ldots \; \alpha_{nc}], this approximation can be written as \hat{x}_n = E \alpha_n. We then minimize the reconstruction error

\min_{E, \alpha_n} \frac{1}{2N} \sum_{n=1}^{N} \| x_n - E \alpha_n \|^2

subject to the constraint E^T E = I. This function can be minimized using the method of Lagrange multipliers, giving

\alpha_k = E^T x_k    (1)

while E is obtained by solving the eigenvalue problem

S E = E \Lambda    (2)

where S is the covariance matrix

S = \sum_{n=1}^{N} x_n x_n^T    (3)

The PCA algorithm is therefore (a numerical sketch of these steps, together with the dual formulation below, is given at the end of this section):

1. Subtract the mean from all the data points: x_n \leftarrow x_n - \frac{1}{N} \sum_{i=1}^{N} x_i.
2. Compute the covariance matrix S = \sum_{n=1}^{N} x_n x_n^T.
3. Diagonalize S to get its eigenvalues and eigenvectors, i.e. obtain E and \Lambda.
4. Retain the c eigenvectors corresponding to the c largest eigenvalues, chosen so that \sum_{j=1}^{c} \lambda_j / \sum_{j=1}^{N} \lambda_j equals the desired fraction of variance to be captured.
5. Project the data points on the eigenvectors, \alpha_n = E^T ( x_n - \frac{1}{N} \sum_{i=1}^{N} x_i ), and use the projections instead of the data points.

PCA: alternate approach. There are applications where we have a few samples of very high dimension (e.g. face recognition). In such cases, where N < d, we can formulate the problem in terms of an N \times N matrix called the dot product matrix. Let X = [x_1 \; x_2 \; \ldots \; x_N]. According to the previous formulation, for one eigenvalue we have to solve S e = \lambda e, i.e. X X^T e = \lambda e. Premultiplying by X^T gives

X^T X X^T e = \lambda X^T e
K \alpha = \lambda \alpha    (4)

where K = X^T X is called the dot product matrix and \alpha = X^T e. Working backwards, X \alpha = X X^T e = S e = \lambda e, so

e = \frac{1}{\lambda} X \alpha    (5)

Therefore e lies in the span of the data points X. Conversely, given an eigenvector \alpha of K with eigenvalue \lambda, e = X \alpha is an eigenvector of S; requiring \| e \|^2 = 1 gives

\| X \alpha \|^2 = \alpha^T X^T X \alpha = \alpha^T K \alpha = \lambda \, \alpha^T \alpha = 1, \quad \text{i.e.} \quad \|\alpha\|^2 = \frac{1}{\lambda}    (6)

Finally, the principal component y, which is the projection of x on e, can be written as

y = e^T x = \alpha^T X^T x = \sum_{i=1}^{N} \alpha_i \langle x_i, x \rangle    (7)

One thing to be noted is that we do not need e explicitly; we need only the dot products.

(Figure: two-dimensional toy example illustrating PCA.)

PCA is linear:
• It uses only second-order statistics, in the form of the covariance matrix.
• The best it can do is to fit an ellipsoid around the data.
• Kernel Principal Component Analysis (KPCA) is an attractive method for extracting nonlinear features from a given set of multivariate data.
• While PCA finds the best ellipsoidal fit for the data, KPCA has the capability of extracting nonlinear features, which can be a more natural and compact representation of the data.
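The sketch below is a minimal NumPy illustration of the two routes above, not code from the report: the function names pca and pca_dual, the toy data, and the column-per-point convention are assumptions made here. It uses the unnormalized covariance S = \sum_n x_n x_n^T of eq. (3) and checks that the primal and dual (dot product matrix) formulations give the same projections, up to sign, when N < d.

```python
import numpy as np

def pca(X, c):
    """Linear PCA on a d x N data matrix X (one column per data point).

    Returns the leading c eigenvectors E (d x c), their eigenvalues,
    and the projections alpha_n = E^T (x_n - mean) as a c x N array."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                              # step 1: subtract the mean
    S = Xc @ Xc.T                            # step 2: covariance S = sum_n x_n x_n^T
    lam, E = np.linalg.eigh(S)               # step 3: diagonalize S
    order = np.argsort(lam)[::-1][:c]        # step 4: keep the c largest eigenvalues
    lam, E = lam[order], E[:, order]
    alpha = E.T @ Xc                         # step 5: project the data points
    return E, lam, alpha

def pca_dual(X, c):
    """Same projections via the N x N dot product matrix K = X^T X
    (eq. 4), useful when N < d, e.g. a few high-dimensional face images."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    K = Xc.T @ Xc                            # dot product matrix, eq. (4)
    lam, A = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1][:c]
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(lam)                     # enforce ||alpha||^2 = 1/lambda, eq. (6)
    proj = A.T @ K                           # y_n = sum_i alpha_i <x_i, x_n>, eq. (7)
    return A, lam, proj

# Tiny check that the two routes agree (up to sign) on a toy N < d case.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))             # d = 10 dimensions, N = 5 points
E, lam, alpha = pca(X, c=2)
A, lam_dual, proj = pca_dual(X, c=2)
assert np.allclose(lam, lam_dual)            # nonzero eigenvalues of S and K coincide
assert np.allclose(np.abs(alpha), np.abs(proj))
```

The point of the dual route is that only an N \times N matrix is diagonalized and the eigenvector e is never formed explicitly; this is exactly the property that KPCA exploits in the next section.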
2. Kernel Principal Component Analysis (KPCA)

• The essential idea of KPCA is that if we apply a nonlinear mapping of the data points to a higher dimensional (possibly infinite-dimensional) space, we can obtain nonlinear features that are a more natural and compact representation of the data.
• The computational cost arising from the high-dimensional mapping is mitigated by using the kernel trick.
• KPCA belongs to a more general class of methods that applies to any algorithm which can be written purely in terms of dot products, and not in terms of the variables themselves.
• Consider a nonlinear mapping \phi(\cdot) : \mathbb{R}^d \to \mathbb{R}^h from \mathbb{R}^d, the space of d-dimensional data points, to some higher dimensional space \mathbb{R}^h.
• Once we have this mapping, KPCA is nothing but linear PCA performed on the points in the higher dimensional space.
• For now, assume that the mapped data are zero centered (i.e. \sum_{i=1}^{N} \phi(x_i) = 0). The dot product matrix K then becomes

[K]_{ij} = \langle \phi(x_i), \phi(x_j) \rangle    (8)

K is called the Gram matrix.
• Solving the eigenvalue problem K \alpha = \lambda \alpha gives the corresponding N eigenvectors. Note that each eigenvector needs to be normalized to satisfy \|\alpha\|^2 = \frac{1}{\lambda}. Finally, the principal component of any data point x, which is the projection of \phi(x) on e, is given by

y = \sum_{i=1}^{N} \alpha_i \langle \phi(x_i), \phi(x) \rangle    (9)

As mentioned earlier, we do not need e explicitly; we need only the dot products.

The kernel trick makes use of this fact and replaces the dot product by a kernel function which is easier to compute than the dot product itself:

k(x, y) = \langle \phi(x), \phi(y) \rangle    (10)

This allows us to compute the dot product without having to carry out the mapping. The most commonly used kernels are the polynomial, Gaussian, and tanh kernels. For the more general case where the data is not centered in the higher dimensional space, we have to use the modified Gram matrix (derivation in the report).

The KPCA algorithm is therefore (a sketch combining these steps with the kernels of Section 3 is given at the end of that section):

• Given N data points in d dimensions, let X = [x_1 \; x_2 \; \ldots \; x_N], where each column represents one data point.
• Subtract the mean from all the data points.
• Choose an appropriate kernel k(\cdot, \cdot).
• Form the N \times N Gram matrix [K]_{ij} = k(x_i, x_j).
• Form the modified Gram matrix \tilde{K} = (I - \frac{1_{N \times N}}{N})^T K (I - \frac{1_{N \times N}}{N}), where 1_{N \times N} is an N \times N matrix with all entries equal to 1.
• Diagonalize \tilde{K} to get its eigenvalues \lambda_n and eigenvectors \alpha_n.
• Normalize \alpha_n \leftarrow \frac{\alpha_n}{\sqrt{\lambda_n}}.
• Retain the c eigenvectors corresponding to the c largest eigenvalues, chosen so that \sum_{j=1}^{c} \lambda_j / \sum_{j=1}^{N} \lambda_j equals the desired fraction of variance to be captured.
• Project the data points on the eigenvectors,

y = \alpha^T \left(I - \frac{1_{N \times N}}{N}\right) \left( [k(x_1, x) \; \ldots \; k(x_N, x)]^T - \frac{K \, 1_{N \times 1}}{N} \right)

where 1_{N \times 1} is an N \times 1 matrix with all entries equal to 1, and use the projections instead of the data points.

3. Different Kernels

• The polynomial kernel of degree d is given by

k(x, y) = (x \cdot y + c)^d    (11)

• The Gaussian or radial basis function kernel is given by

k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2 \sigma^2} \right)    (12)

where the parameter \sigma controls the support region of the kernel.
• The tanh kernel, which is mostly used in neural-network-type applications, is given by

k(x, y) = \tanh( x \cdot y + b )    (13)

(Figure: two-dimensional toy examples illustrating KPCA.)
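The sketch below puts the KPCA algorithm of Section 2 together with the polynomial and Gaussian kernels of eqs. (11) and (12). It is a minimal NumPy illustration under assumed conventions (columns are data points; the names kpca, kpca_project, polynomial_kernel, and gaussian_kernel are ours), not the fast algorithm the report develops: it forms and diagonalizes the full N x N Gram matrix directly, which is precisely the cost that the fast polynomial and Gaussian KPCA methods aim to avoid.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=2, c=1.0):
    """k(x, y) = (x.y + c)^degree, eq. (11); columns of X and Y are points."""
    return (X.T @ Y + c) ** degree

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), eq. (12)."""
    sq = (X * X).sum(axis=0)[:, None] + (Y * Y).sum(axis=0)[None, :] - 2.0 * (X.T @ Y)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kpca(X, kernel, c):
    """Kernel PCA on a d x N data matrix X (one column per data point).

    Returns the eigenvalues, the normalized eigenvectors alpha_n (N x c),
    and the c x N projections of the training points."""
    N = X.shape[1]
    K = kernel(X, X)                          # Gram matrix, eq. (8)
    J = np.eye(N) - np.ones((N, N)) / N       # centering matrix I - 1_{NxN}/N
    K_tilde = J @ K @ J                       # modified Gram matrix K~
    lam, A = np.linalg.eigh(K_tilde)          # diagonalize K~
    order = np.argsort(lam)[::-1][:c]         # retain the c largest eigenvalues
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(lam)                      # normalize so that ||alpha||^2 = 1/lambda
    return lam, A, A.T @ K_tilde              # training projections, eq. (9)

def kpca_project(X, X_new, kernel, A):
    """Project new points X_new (d x M) with the projection formula above."""
    N = X.shape[1]
    K = kernel(X, X)
    k_new = kernel(X, X_new)                  # [k(x_i, x_new_j)], an N x M array
    J = np.eye(N) - np.ones((N, N)) / N
    return A.T @ (J @ (k_new - K @ np.ones((N, 1)) / N))

# Tiny check: the new-point formula reproduces the training projections.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 50))              # 50 two-dimensional points
kern = lambda a, b: gaussian_kernel(a, b, sigma=1.0)
lam, A, Y = kpca(X, kern, c=2)
assert np.allclose(Y[:, :5], kpca_project(X, X[:, :5], kern, A))
```

The kernel width sigma, the polynomial degree, and the offset c are free parameters; the values above are placeholders for the toy check, and swapping in polynomial_kernel changes only the kernel argument.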

