UCSD ECE 271A - Dimensionality and Dimensionality Reduction

ECE-271A Statistical Learning I: Dimensionality and dimensionality reduction
Nuno Vasconcelos, ECE Department, UCSD

Outline: Motivation · Example · Curse of dimensionality · Principal component analysis · Principal component analysis (learning) · PCA by SVD · Limitations of PCA · Fisher's linear discriminant · Linear discriminant analysis · Quadratic discriminant analysis

Motivation
recall that in Bayesian decision theory we have
• a world with states Y in {1, ..., M}
• observations X
• class-conditional densities P_{X|Y}(x|y)
• class probabilities P_Y(i)
• the Bayes decision rule (BDR)
we have hinted that the dimension of the observation space can play a significant role in the quality of the BDR

Example
cheetah Gaussian classifier, DCT space:

  features           prob. of error
  8 best features    4%
  all 64 features    8%

more features = higher error!

Curse of dimensionality
the problem is the quality of the density estimates
• everything we have seen so far assumes perfect estimation of the BDR
• as we have seen in PS2, when the estimates differ from the true densities the BDR can be quite poor
• DHS give a perfect example of this where, by tweaking the location of the delta functions (and assuming a Gaussian model), you can make the error go to 100%!
[figure: the two class-conditional densities, Y = 1 and Y = 2, plotted over the interval −10 to 10]

Curse of dimensionality
we have seen that the variance of an estimator tends to be inversely proportional to the number of points n
• e.g. the ML estimate of the mean of a Gaussian has variance σ²/n
hence, we need a large n to have good estimates
Q: what does "large" mean? this depends on the dimension of the space
the best way to see this is to think of a histogram
• suppose you have 100 points and you need at least 10 bins per axis in order to get a reasonable quantization
for uniform data you get, on average,

  dimension     1     2     3
  points/bin    10    1     0.1

decent in 1D, bad in 2D, terrible in 3D (9 out of every 10 bins empty); a minimal numerical check of this table is sketched below
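To make the table above concrete, here is a minimal numpy sketch (not part of the original slides) that draws 100 uniform points, bins them with 10 bins per axis in 1, 2, and 3 dimensions, and reports the average occupancy and the fraction of empty bins; the sample size and bin count simply mirror the example above.

```python
# Minimal sketch (not from the slides): points per bin and empty-bin fraction
# for 100 uniform points with 10 bins per axis, in d = 1, 2, 3 dimensions.
import numpy as np

rng = np.random.default_rng(0)
n, bins_per_axis = 100, 10

for d in (1, 2, 3):
    total_bins = bins_per_axis ** d
    avg_points_per_bin = n / total_bins          # the quantity in the table

    # simulate: uniform points in [0, 1)^d, assign each to its bin
    x = rng.uniform(size=(n, d))
    bin_idx = np.minimum((x * bins_per_axis).astype(int), bins_per_axis - 1)
    flat = np.ravel_multi_index(bin_idx.T, (bins_per_axis,) * d)
    empty_frac = 1.0 - np.unique(flat).size / total_bins

    print(f"d={d}: {avg_points_per_bin:.2f} points/bin on average, "
          f"{100 * empty_frac:.0f}% of bins empty")
```

With this setup, roughly 0%, 35%, and 90% of the bins come out empty in 1D, 2D, and 3D, matching the "decent / bad / terrible" progression above.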
Principal component analysis
basic idea:
• if the data lives in a subspace, it is going to look very flat when viewed from the full space
• this means that if we fit a Gaussian to the data, the equiprobability contours are going to be highly skewed ellipsoids
[figure: a 1D subspace embedded in 2D and a 2D subspace embedded in 3D]

Principal component analysis
if y is Gaussian with covariance Σ, the equiprobability contours are the ellipses whose
• principal components φ_i are the eigenvectors of Σ
• principal lengths λ_i are the eigenvalues of Σ
by computing the eigenvalues we know whether the data is flat:
λ_1 >> λ_2: flat; λ_1 = λ_2: not flat
[figure: equiprobability ellipses in the (y_1, y_2) plane, with principal components φ_1, φ_2 and principal lengths λ_1, λ_2]

Principal component analysis (learning)
[the learning-algorithm slides contain equations and figures that were not captured in this text preview]

Principal component analysis
there is an alternative way to compute the principal components, based on the singular value decomposition (SVD)
SVD: any real n × m matrix A (n > m) can be decomposed as

  A = M Π N^T,   M^T M = I,   N^T N = I

where
• M is an n × m column-orthonormal matrix of left singular vectors (the columns of M)
• Π is an m × m diagonal matrix of singular values
• N^T is an m × m row-orthonormal matrix of right singular vectors (the columns of N)

PCA by SVD
to relate this to PCA, we consider the data matrix

  X = [ x_1 ... x_n ]   (one example per column)

the sample mean is

  μ = (1/n) Σ_i x_i = (1/n) [ x_1 ... x_n ] [ 1 ... 1 ]^T = (1/n) X 1

PCA by SVD
we can center the data by subtracting the mean from each column of X; this gives the centered data matrix

  X_c = [ x_1 − μ ... x_n − μ ] = X − μ 1^T = X − (1/n) X 1 1^T = X ( I − (1/n) 1 1^T )

PCA by SVD
the sample covariance is

  Σ = (1/n) Σ_i (x_i − μ)(x_i − μ)^T = (1/n) Σ_i x_i^c (x_i^c)^T

where x_i^c is the i-th column of X_c; this can be written as

  Σ = (1/n) [ x_1^c ... x_n^c ] [ x_1^c ... x_n^c ]^T = (1/n) X_c X_c^T

(a short numpy check of these identities follows)
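As a quick check on the matrix identities above, the following numpy snippet (an illustration, not from the slides; the dimensions and sample size are arbitrary) builds a data matrix with one example per column and verifies that μ = (1/n) X 1, that X(I − (1/n) 1 1^T) equals column-wise mean subtraction, and that (1/n) X_c X_c^T matches the sample covariance.

```python
# Sketch (illustration only): the data-matrix identities for the sample mean,
# the centered data matrix, and the sample covariance.
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 200
X = rng.normal(size=(d, n))                 # one example per column

ones = np.ones((n, 1))
mu = X @ ones / n                           # mu = (1/n) X 1
assert np.allclose(mu[:, 0], X.mean(axis=1))

Xc = X @ (np.eye(n) - ones @ ones.T / n)    # Xc = X (I - (1/n) 1 1^T)
assert np.allclose(Xc, X - mu)              # same as subtracting mu per column

Sigma = Xc @ Xc.T / n                       # Sigma = (1/n) Xc Xc^T
assert np.allclose(Sigma, np.cov(X, bias=True))   # bias=True also normalizes by 1/n
```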
PCA by SVD
the matrix

  X_c^T = [ (x_1^c)^T ; ... ; (x_n^c)^T ]   (one centered example per row)

is real n × d. Assuming n > d, it has the SVD decomposition

  X_c^T = M Π N^T,   M^T M = I,   N^T N = I

and therefore

  Σ = (1/n) X_c X_c^T = (1/n) N Π M^T M Π N^T = (1/n) N Π² N^T

PCA by SVD

  Σ = N ( Π²/n ) N^T

noting that N is d × d and orthonormal, and that Π² is diagonal, shows that this is just the eigenvalue decomposition of Σ; it follows that
• the eigenvectors of Σ are the columns of N
• the eigenvalues of Σ are λ_i = π_i²/n
this gives an alternative algorithm for PCA

PCA by SVD
computation of PCA by SVD, given X with one example per column:
• 1) create the centered data matrix X_c^T = ( I − (1/n) 1 1^T ) X^T
• 2) compute its SVD, X_c^T = M Π N^T
• 3) the principal components are the columns of N and the eigenvalues are λ_i = π_i²/n
(a numpy sketch of this recipe appears at the end of these notes)

Limitations of PCA
PCA is not optimal for classification
• note that there is no mention of the class label in the definition of PCA
• keeping the dimensions of largest energy (variance) is a good idea, but not always enough
• it certainly improves the density estimation, since the space has smaller dimension
• but it could be unwise from a classification point of view: the discriminant dimensions could be thrown out
it is not hard to construct examples where PCA is the worst possible thing we could do

Example
consider a problem with
• two n-D Gaussian classes with covariance Σ = σ²I, σ² = 10, i.e. X ~ N(μ_i, 10 I)
• an extra variable which is the class label itself, X' = [X, Y]
• assuming that P_Y(0) = P_Y(1) = 0.5,

  E[Y] = 0.5 × 0 + 0.5 × 1 = 0.5
  var[Y] = 0.5 (0 − 0.5)² + 0.5 (1 − 0.5)² = 0.25 < 10

dimension n+1 has the smallest variance and is the first to be discarded!

Example
this is
• a very contrived example
• but it shows that PCA can throw away all of the discriminant information
does this mean you should never use PCA?
• no, typically it is a good method to find a suitable subset of variables, as long as you are not too greedy
• e.g. if you start with n = 100 and know that there are only 5 variables of interest, picking the top 20 PCA components is likely to keep the desired 5
• your classifier will be much better than for n = 100, and probably not much worse than the one with the best 5 features
is there a rule of thumb for finding the number of PCA components?

Principal component analysis
a natural ...
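The three-step recipe above, and the contrived class-label example, translate almost directly into numpy. The sketch below is an illustration, not code from the course: the helper name pca_by_svd, the dimensions, the sample sizes, and the mean shift of 2 along the first coordinate are all arbitrary choices. It first checks that the SVD route reproduces the eigendecomposition of the sample covariance (λ_i = π_i²/n), then shows that when the class label is appended as an extra feature its variance (about 0.25) is by far the smallest, so it lands in the last principal component and would be the first dimension discarded.

```python
# Sketch (illustration only): PCA by SVD, per the three-step recipe above,
# plus the contrived example in which the class label is appended as a feature.
import numpy as np

def pca_by_svd(X):
    """X has one example per column (d x n); returns (components, eigenvalues)."""
    n = X.shape[1]
    Xc_T = X.T - X.mean(axis=1)                   # centered data, one example per row
    M, pi, N_T = np.linalg.svd(Xc_T, full_matrices=False)
    return N_T.T, pi**2 / n                       # columns of N, lambda_i = pi_i^2 / n

rng = np.random.default_rng(2)

# 1) the SVD route agrees with the eigendecomposition of the sample covariance
X = rng.normal(size=(4, 500))
_, lam = pca_by_svd(X)
assert np.allclose(lam, np.linalg.eigvalsh(np.cov(X, bias=True))[::-1])

# 2) contrived example: two Gaussian classes with covariance 10*I, plus the label
n_per_class, dim, var = 1000, 5, 10.0
y = np.repeat([0.0, 1.0], n_per_class)            # the class label, P(0) = P(1) = 0.5
x = rng.normal(scale=np.sqrt(var), size=(dim, 2 * n_per_class))
x[0] += 2.0 * y                                   # arbitrary mean shift between classes
X_aug = np.vstack([x, y[None, :]])                # label appended as dimension dim + 1

N, lam = pca_by_svd(X_aug)
print("eigenvalues:", np.round(lam, 2))           # smallest is ~0.25, the label's variance
print("last component:", np.round(N[:, -1], 2))   # essentially the label axis
```

In a typical run the printed eigenvalues are roughly [11, 10, 10, 10, 10, 0.23], so a greedy cut on variance would throw away the one perfectly discriminant dimension, exactly the failure mode described above.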

