Machine Learning 10-701/15-781, Fall 2006
Dimensionality Reduction II: Factor Analysis and Metric Learning
Eric Xing
Lecture 20, November 22, 2006
Reading: Chap. ??, C.B. book

Outline
- Probabilistic PCA (brief)
- Factor analysis (in some detail)
- ICA (will skip)
- Distance metric learning from very little side information (a very cool method)

Recap of PCA
- Popular dimensionality reduction technique.
- Project data onto the directions of greatest variation:
  u_1 = \arg\max_u \frac{1}{m}\sum_{i=1}^m (u^T y_i)^2 = \arg\max_u u^T \Big(\frac{1}{m}\sum_i y_i y_i^T\Big) u = \arg\max_u u^T \mathrm{Cov}(y)\, u
  x_i = [u_1^T y_i, u_2^T y_i, \dots, u_q^T y_i]^T = U_q^T y_i \in \mathbb{R}^q
- Consequences:
  - The x_i are uncorrelated, i.e., their covariance matrix is diagonal: \frac{1}{m}\sum_i x_i x_i^T = \mathrm{diag}(\gamma_1, \dots, \gamma_q)
  - Truncation error: \Sigma_y = \sum_{k=1}^K \gamma_k u_k u_k^T \approx \sum_{k=1}^q \gamma_k u_k u_k^T = \Sigma_x

Recap of PCA (continued)
- Popular dimensionality reduction technique.
- Projects data onto the directions of greatest variation.
- A useful tool for visualizing patterns and clusters within the data set, but:
  - it needs centering, and
  - it does not explicitly model data noise.

Probabilistic interpretation?
(Figure: a two-node graphical model X -> Y with both variables continuous corresponds to regression; what probabilistic model corresponds to PCA?)

Probabilistic PCA
- PCA can be cast as a probabilistic model with q-dimensional latent variables:
  y_n = \Lambda x_n + \mu + \epsilon_n, \quad x_n \sim \mathcal{N}(0, I), \quad \epsilon_n \sim \mathcal{N}(0, \sigma^2 I)
- The resulting data distribution is y_n \sim \mathcal{N}(\mu, \Lambda\Lambda^T + \sigma^2 I).
- The maximum likelihood solution is equivalent to PCA:
  \mu_{ML} = \frac{1}{N}\sum_n y_n, \qquad \Lambda_{ML} = U_q (\Gamma_q - \sigma^2 I)^{1/2},
  where the diagonal matrix \Gamma_q contains the top q sample-covariance eigenvalues and U_q contains the associated eigenvectors.
- Tipping and Bishop, J. Royal Stat. Soc. B 61, 611 (1999).

Factor analysis
- An unsupervised linear regression model (graphical model X -> Y):
  p(x) = \mathcal{N}(x; 0, I), \qquad p(y \mid x) = \mathcal{N}(y; \mu + \Lambda x, \Psi),
  where \Lambda is called the factor loading matrix and \Psi is diagonal.
- Geometric interpretation: to generate data, first generate a point within the manifold, then add noise. The coordinates of the point are the components of the latent variable.

Relationship between PCA and FA
- Probabilistic PCA is equivalent to factor analysis with equal noise in every dimension, i.e., \epsilon_n \sim \mathcal{N}(0, \sigma^2 I) (isotropic Gaussian).
- In factor analysis \epsilon_n \sim \mathcal{N}(0, \Psi) for a diagonal covariance matrix \Psi.
- An iterative algorithm (e.g., EM) is required to find the parameters if the precisions are not known in advance.

Marginal data distribution
- A marginal Gaussian (e.g., p(x)) times a conditional Gaussian (e.g., p(y|x)) is a joint Gaussian.
- Any marginal (e.g., p(y)) of a joint Gaussian (e.g., p(x, y)) is also a Gaussian.
- Since the marginal is Gaussian, we can determine it by just computing its mean and variance (assume the noise is uncorrelated with the data). Writing Y = \mu + \Lambda X + W with W \sim \mathcal{N}(0, \Psi):
  E[Y] = E[\mu + \Lambda X + W] = \mu + \Lambda E[X] + E[W] = \mu
  Var[Y] = E[(Y - \mu)(Y - \mu)^T] = E[(\Lambda X + W)(\Lambda X + W)^T] = \Lambda E[X X^T]\Lambda^T + E[W W^T] = \Lambda\Lambda^T + \Psi
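To make the generative story above concrete, here is a minimal NumPy sketch (not part of the original slides) that samples from the factor-analysis model y = \mu + \Lambda x + w and checks that the empirical covariance of y approaches \Lambda\Lambda^T + \Psi; the dimensions and the particular \Lambda, \Psi values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

p, k, n = 5, 2, 100_000           # observed dim, latent dim, sample count (arbitrary choices)
Lambda = rng.normal(size=(p, k))  # factor loading matrix (random illustrative values)
Psi = np.diag(rng.uniform(0.1, 0.5, size=p))  # diagonal observation-noise covariance
mu = np.zeros(p)                  # data mean (taken as zero for simplicity)

# Generative model from the slides: x ~ N(0, I_k), w ~ N(0, Psi), y = mu + Lambda x + w
X = rng.normal(size=(n, k))
W = rng.multivariate_normal(np.zeros(p), Psi, size=n)
Y = mu + X @ Lambda.T + W

# The marginal covariance of y should be Lambda Lambda^T + Psi
emp_cov = np.cov(Y, rowvar=False)
model_cov = Lambda @ Lambda.T + Psi
print(np.max(np.abs(emp_cov - model_cov)))  # small, and shrinks as n grows
```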
FA = constrained-covariance Gaussian
- Marginal density for factor analysis (y is p-dimensional, x is k-dimensional):
  p(y \mid \theta) = \mathcal{N}(y; \mu, \Lambda\Lambda^T + \Psi)
- So the effective covariance is the low-rank outer product of two long, skinny matrices plus a diagonal matrix.
- In other words, factor analysis is just a constrained Gaussian model. (If \Psi were not diagonal then we could model any Gaussian and it would be pointless.)

Review: a primer on the multivariate Gaussian
- Multivariate Gaussian density:
  p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\{-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\}
- A joint Gaussian:
  p(x_1, x_2 \mid \mu, \Sigma) = \mathcal{N}\!\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix};\; \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},\; \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)
- How do we write down p(x_1), p(x_1 \mid x_2) or p(x_2 \mid x_1) using the block elements of \mu and \Sigma?
- Formulas to remember:
  p(x_2) = \mathcal{N}(x_2; m_2^m, V_2^m), with m_2^m = \mu_2 and V_2^m = \Sigma_{22}
  p(x_1 \mid x_2) = \mathcal{N}(x_1; m_{1|2}, V_{1|2}), with
  m_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)
  V_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}

Review: some matrix algebra
- Trace: \mathrm{tr}[A] \stackrel{\mathrm{def}}{=} \sum_i a_{ii}
- Cyclical permutations: \mathrm{tr}[ABC] = \mathrm{tr}[CAB] = \mathrm{tr}[BCA]
- Derivatives: \frac{\partial}{\partial A}\mathrm{tr}[BA] = B^T, \qquad \frac{\partial}{\partial A}\mathrm{tr}[x^T A x] = \frac{\partial}{\partial A}\mathrm{tr}[x x^T A] = x x^T
- Determinants and derivatives: \frac{\partial}{\partial A}\log|A| = A^{-T}

FA joint distribution
- Model: p(x) = \mathcal{N}(x; 0, I), \quad p(y \mid x) = \mathcal{N}(y; \mu + \Lambda x, \Psi)
- Covariance between x and y (assuming the noise W is uncorrelated with the data and the latent variables):
  Cov[X, Y] = E[(X - 0)(Y - \mu)^T] = E[X(\mu + \Lambda X + W - \mu)^T] = E[X X^T \Lambda^T + X W^T] = \Lambda^T
- Hence the joint distribution of x and y:
  p\!\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \mathcal{N}\!\left( \begin{bmatrix} x \\ y \end{bmatrix};\; \begin{bmatrix} 0 \\ \mu \end{bmatrix},\; \begin{bmatrix} I & \Lambda^T \\ \Lambda & \Lambda\Lambda^T + \Psi \end{bmatrix} \right)

Inference in factor analysis
- Apply the Gaussian conditioning formulas to the joint distribution derived above, where
  \Sigma_{11} = I, \quad \Sigma_{12} = \Lambda^T, \quad \Sigma_{21} = \Lambda, \quad \Sigma_{22} = \Lambda\Lambda^T + \Psi.
  We can now derive the posterior of the latent variable x given the observation y, p(x \mid y) = \mathcal{N}(x; m_{1|2}, V_{1|2}), where
  m_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y - \mu_2) = \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}(y - \mu)
  V_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = I - \Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}\Lambda
- Applying the matrix inversion lemma:
  V_{1|2} = (I + \Lambda^T \Psi^{-1} \Lambda)^{-1}
  m_{1|2} = V_{1|2}\Lambda^T\Psi^{-1}(y - \mu)
- Here we only need to invert a matrix of size |x| x |x|, instead of |y| x |y|.

Geometric interpretation: inference is linear projection
- The posterior is p(x \mid y) = \mathcal{N}(x; m_{1|2}, V_{1|2}), with V_{1|2} = (I + \Lambda^T\Psi^{-1}\Lambda)^{-1} and m_{1|2} = V_{1|2}\Lambda^T\Psi^{-1}(y - \mu).
- The posterior covariance does not depend on the observed data y!
- Computing the posterior mean is just a linear operation.

EM for factor analysis
- Incomplete-data log-likelihood (marginal density of y):
  \ell(\theta; D) = -\frac{N}{2}\log|\Lambda\Lambda^T + \Psi| - \frac{1}{2}\sum_n (y_n - \mu)^T(\Lambda\Lambda^T + \Psi)^{-1}(y_n - \mu)
                 = -\frac{N}{2}\log|\Lambda\Lambda^T + \Psi| - \frac{N}{2}\mathrm{tr}\big[(\Lambda\Lambda^T + \Psi)^{-1} S\big],
  where S = \frac{1}{N}\sum_n (y_n - \mu)(y_n - \mu)^T.
- Estimating \mu is trivial: \hat\mu_{ML} = \frac{1}{N}\sum_n y_n.
- The parameters \Lambda and \Psi are coupled nonlinearly in the log-likelihood.
- Complete-data log-likelihood (with the data centered at \mu):
  \ell_c(\theta; D) = \sum_n \log p(x_n, y_n) = \sum_n \log p(x_n) + \sum_n \log p(y_n \mid x_n)
                   = -\frac{N}{2}\log|I| - \frac{1}{2}\sum_n x_n^T x_n - \frac{N}{2}\log|\Psi| - \frac{1}{2}\sum_n (y_n - \Lambda x_n)^T \Psi^{-1}(y_n - \Lambda x_n)
                   = -\frac{1}{2}\sum_n \mathrm{tr}[x_n x_n^T] - \frac{N}{2}\log|\Psi| - \frac{N}{2}\mathrm{tr}[S\Psi^{-1}],
  where S = \frac{1}{N}\sum_n (y_n - \Lambda x_n)(y_n - \Lambda x_n)^T.

E-step for factor analysis
- Compute \langle \ell_c(\theta; D) \rangle_{p(x|y)}.
- Recall that we have derived
  \ell_c(\theta; D) = -\frac{1}{2}\sum_n \mathrm{tr}[x_n x_n^T] - \frac{N}{2}\log|\Psi| - \frac{N}{2}\mathrm{tr}[S\Psi^{-1}],
  so taking the expectation under p(x_n \mid y_n) only requires the posterior moments \langle x_n \rangle and \langle x_n x_n^T \rangle (expanding \langle S \rangle produces y_n y_n^T, y_n\langle x_n^T\rangle and \langle x_n x_n^T\rangle terms); from the posterior derived above, \langle x_n \rangle = m_{1|2}(y_n) and \langle x_n x_n^T \rangle = V_{1|2} + \langle x_n \rangle\langle x_n \rangle^T.
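As a companion to the inference slides above, the following self-contained sketch (again not from the lecture) computes the posterior p(x | y) both with the direct formula that inverts the |y| x |y| matrix \Lambda\Lambda^T + \Psi and with the matrix-inversion-lemma form that only inverts a |x| x |x| matrix, then forms the E-step moments \langle x \rangle and \langle x x^T \rangle; all dimensions and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 5, 2                                   # observed / latent dimensions (arbitrary)
Lambda = rng.normal(size=(p, k))              # factor loadings (illustrative values)
psi_diag = rng.uniform(0.1, 0.5, size=p)      # diagonal of Psi
Psi = np.diag(psi_diag)
mu = np.zeros(p)
y = rng.normal(size=p)                        # a single observation

# Direct conditioning formulas: invert the p x p matrix Lambda Lambda^T + Psi
Sigma_yy = Lambda @ Lambda.T + Psi
V_direct = np.eye(k) - Lambda.T @ np.linalg.solve(Sigma_yy, Lambda)
m_direct = Lambda.T @ np.linalg.solve(Sigma_yy, y - mu)

# Matrix-inversion-lemma form: only a k x k inverse is needed
Psi_inv = np.diag(1.0 / psi_diag)
V = np.linalg.inv(np.eye(k) + Lambda.T @ Psi_inv @ Lambda)   # posterior covariance V_{1|2}
m = V @ Lambda.T @ Psi_inv @ (y - mu)                        # posterior mean m_{1|2}

print(np.allclose(V, V_direct), np.allclose(m, m_direct))    # True True

# E-step moments for this data point: <x> and <x x^T>
Ex = m
Exx = V + np.outer(m, m)
```

The two routes agree (the matrix inversion lemma guarantees it), and the second one only ever inverts the k x k matrix I + \Lambda^T\Psi^{-1}\Lambda, which is what makes inference cheap when p is much larger than k.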


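The preview stops before the M-step, but since the lecture is building toward a full EM algorithm for factor analysis, a compact end-to-end sketch may still be useful. The E-step below uses the posterior moments derived in the slides; the M-step uses what I believe are the standard factor-analysis updates (those pages are not shown in this preview, so treat them as an assumption); the function name fa_em, the initialization, and all settings are illustrative.

```python
import numpy as np

def fa_em(Y, k, n_iter=100, seed=0):
    """Toy EM for factor analysis: Y is an (n, p) data matrix, k is the latent dimension.

    E-step: posterior moments from the lecture's formulas.
    M-step: standard FA updates (assumed; not part of this preview).
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    mu = Y.mean(axis=0)                      # ML estimate of the mean (trivial, as noted)
    Yc = Y - mu                              # centered data
    Lam = rng.normal(size=(p, k))            # initial loadings (random)
    psi = np.var(Yc, axis=0)                 # initial diagonal noise variances

    for _ in range(n_iter):
        # E-step: <x_n> and <x_n x_n^T>; the posterior covariance V is shared by all n
        Psi_inv = 1.0 / psi
        V = np.linalg.inv(np.eye(k) + (Lam.T * Psi_inv) @ Lam)   # (I + Lam^T Psi^-1 Lam)^-1
        Ex = Yc @ (Psi_inv[:, None] * Lam) @ V                   # (n, k): row n is <x_n>
        sum_Exx = n * V + Ex.T @ Ex                              # sum_n <x_n x_n^T>

        # M-step (standard FA updates): maximize the expected complete log-likelihood
        Lam = (Yc.T @ Ex) @ np.linalg.inv(sum_Exx)
        psi = np.mean(Yc**2, axis=0) - np.mean(Yc * (Ex @ Lam.T), axis=0)
        psi = np.maximum(psi, 1e-6)          # guard against numerical underflow

    return mu, Lam, np.diag(psi)

# Usage (illustrative): mu_hat, Lam_hat, Psi_hat = fa_em(Y, k=2)
# where Y is an (n, p) data matrix, e.g., the samples drawn in the first sketch.
```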