Eric Xing
Machine Learning 10-701/15-781, Fall 2006
Dimensionality Reduction II: Factor Analysis and Metric Learning
Lecture 20, November 22, 2006
Reading: Chap. ??, C.B. book

Outline
- Probabilistic PCA (brief)
- Factor analysis (in some detail)
- ICA (will skip)
- Distance metric learning from very little side info (a very cool method)

Recap of PCA
- A popular dimensionality reduction technique: project the data onto the directions of greatest variation. The first principal direction is
      u_1 = argmax_{‖u‖=1} (1/m) Σ_{i=1}^m (u^T y_i)^2 = argmax_{‖u‖=1} u^T [ (1/m) Σ_{i=1}^m y_i y_i^T ] u = argmax_{‖u‖=1} u^T Cov(y) u,
  and in general the q-dimensional projection is
      x_i = U_q^T y_i = [u_1^T y_i, ..., u_q^T y_i]^T ∈ R^q.
- Consequence: the components of x_i are uncorrelated, so their covariance matrix is diagonal, diag(γ_1, ..., γ_q).
- Truncation error: keeping only the top q of the K eigenvalue/eigenvector pairs approximates the covariance,
      Cov(y) = Σ_{k=1}^K γ_k u_k u_k^T ≈ Σ_{k=1}^q γ_k u_k u_k^T.
- PCA is a useful tool for visualising patterns and clusters within the data set, but ...
  - it needs centering of the data, and
  - it does not explicitly model data noise.

Probabilistic interpretation?
[Diagram: a two-node graphical model X → Y with both variables continuous, labeled "regression", next to the same X → Y structure labeled "?" — what is the analogous probabilistic model behind PCA?]

Probabilistic PCA
- PCA can be cast as a probabilistic model with q-dimensional latent variables x_n:
      y_n = Λ x_n + µ + ε_n,   x_n ~ N(0, I),   ε_n ~ N(0, σ²I).
- The resulting data distribution is
      y_n ~ N(µ, ΛΛ^T + σ²I).
- The maximum likelihood solution is equivalent to PCA:
      µ_ML = (1/N) Σ_n y_n,   Λ_ML = U_q (Γ_q − σ²I)^{1/2},
  where the diagonal matrix Γ_q contains the top q sample-covariance eigenvalues and U_q contains the associated eigenvectors.
  [Tipping and Bishop, J. Royal Stat. Soc. B 61, 611 (1999)]

Factor analysis
- An unsupervised linear regression model (latent X → observed Y). Λ is called the factor loading matrix, and Ψ is diagonal.
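The closed-form ML solution for probabilistic PCA above can be checked numerically. The following is a minimal sketch assuming NumPy; the function name `ppca_ml` and the fixed (known) noise variance `sigma2` are illustrative choices, not from the lecture:

```python
import numpy as np

def ppca_ml(Y, q, sigma2):
    """ML estimates for probabilistic PCA with known noise variance sigma2.

    Y: (N, p) data matrix. Returns (mu, Lam), Lam = U_q (Gamma_q - sigma2 I)^{1/2}.
    """
    N, p = Y.shape
    mu = Y.mean(axis=0)                      # mu_ML = (1/N) sum_n y_n
    S = (Y - mu).T @ (Y - mu) / N            # sample covariance
    evals, evecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:q]        # indices of the top-q eigenvalues
    U_q = evecs[:, top]                      # associated eigenvectors
    Gamma_q = np.diag(evals[top])            # top-q eigenvalues on a diagonal
    Lam = U_q @ np.sqrt(Gamma_q - sigma2 * np.eye(q))
    return mu, Lam
```

As a sanity check on the model y_n ~ N(µ, ΛΛ^T + σ²I): when q = p and every eigenvalue exceeds σ², the reconstruction ΛΛ^T + σ²I recovers the sample covariance exactly.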
- The model is
      p(x) = N(x; 0, I),   p(y|x) = N(y; µ + Λx, Ψ).
- Geometric interpretation: to generate data, first generate a point within the (low-dimensional) manifold, then add noise. The coordinates of the point are the components of the latent variable.

Relationship between PCA and FA
- Probabilistic PCA is equivalent to factor analysis with equal noise in every dimension, i.e., isotropic Gaussian noise ε_n ~ N(0, σ²I).
- In factor analysis the noise has a general diagonal covariance matrix: ε_n ~ N(0, Ψ).
- An iterative algorithm (e.g., EM) is required to find the parameters if the precisions are not known in advance.

Marginal data distribution
- A marginal Gaussian (e.g., p(x)) times a conditional Gaussian (e.g., p(y|x)) is a joint Gaussian.
- Any marginal (e.g., p(y)) of a joint Gaussian (e.g., p(x, y)) is also a Gaussian.
- Since the marginal is Gaussian, we can determine it by just computing its mean and variance. Writing Y = µ + ΛX + W with W ~ N(0, Ψ), and assuming the noise is uncorrelated with the data (E[XW^T] = 0):
      E[Y] = E[µ + ΛX + W] = µ + Λ E[X] + E[W] = µ,
      Var[Y] = E[(Y − µ)(Y − µ)^T] = E[(ΛX + W)(ΛX + W)^T] = Λ E[XX^T] Λ^T + E[WW^T] = ΛΛ^T + Ψ.

FA = constrained-covariance Gaussian
- The marginal density for factor analysis (y is p-dimensional, x is k-dimensional) is
      p(y | θ) = N(y; µ, ΛΛ^T + Ψ).
- So the effective covariance is the low-rank outer product of two long skinny matrices (p × k times k × p) plus a diagonal matrix.
- In other words, factor analysis is just a constrained Gaussian model.
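The marginal-covariance derivation above can be verified by simulation: sample from the FA generative model and compare the empirical covariance of y against ΛΛ^T + Ψ. A minimal sketch assuming NumPy; all dimensions and parameter values below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, N = 5, 2, 200_000                      # y is p-dim, x is k-dim

Lam = rng.normal(size=(p, k))                # factor loading matrix (p x k)
psi = rng.uniform(0.1, 0.5, size=p)          # diagonal entries of Psi
mu = rng.normal(size=p)

X = rng.normal(size=(N, k))                  # x_n ~ N(0, I)
W = rng.normal(size=(N, p)) * np.sqrt(psi)   # w_n ~ N(0, Psi), Psi diagonal
Y = mu + X @ Lam.T + W                       # y_n = mu + Lam x_n + w_n

emp_cov = np.cov(Y, rowvar=False)            # empirical covariance of y
model_cov = Lam @ Lam.T + np.diag(psi)       # predicted: Lam Lam^T + Psi
max_err = np.max(np.abs(emp_cov - model_cov))
```

With N large, `max_err` shrinks toward zero, illustrating that the marginal of y is N(µ, ΛΛ^T + Ψ).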
  (If Ψ were not diagonal, we could model any Gaussian and the model would be pointless.)

Review: a primer on the multivariate Gaussian
- Multivariate Gaussian density:
      p(x | µ, Σ) = (2π)^{−n/2} |Σ|^{−1/2} exp{ −(1/2)(x − µ)^T Σ^{−1} (x − µ) }.
- A joint Gaussian over a partitioned vector:
      p(x1, x2 | µ, Σ) = N( [x1; x2] | [µ1; µ2], [Σ11 Σ12; Σ21 Σ22] ).
- How do we write down p(x2), p(x1|x2), or p(x2|x1) using the block elements of µ and Σ?
- Formulas to remember:
      marginal:    p(x2) = N(x2 | m2, V2),  with  m2 = µ2  and  V2 = Σ22;
      conditional: p(x1|x2) = N(x1 | m_{1|2}, V_{1|2}),  with
                   m_{1|2} = µ1 + Σ12 Σ22^{−1} (x2 − µ2),
                   V_{1|2} = Σ11 − Σ12 Σ22^{−1} Σ21.

Review: some matrix algebra
- Trace (of a square matrix):  tr[A] = Σ_i a_ii.
- Cyclical permutations:  tr[ABC] = tr[CAB] = tr[BCA].
- Derivatives:  ∂tr[BA]/∂A = B^T,  and  ∂[x^T A x]/∂A = ∂tr[xx^T A]/∂A = xx^T.
- Determinants:  ∂log|A|/∂A = A^{−T}.

FA joint distribution
- Model:  p(x) = N(x; 0, I),  p(y|x) = N(y; µ + Λx, Ψ);  assume the noise W is uncorrelated with the data and the latent variables.
- Covariance between x and y:
      Cov[X, Y] = E[(X − 0)(Y − µ)^T] = E[X(µ + ΛX + W − µ)^T] = E[XX^T]Λ^T + E[XW^T] = Λ^T.
- Hence the joint distribution of x and y:
      p([x; y]) = N( [x; y] | [0; µ], [I Λ^T; Λ ΛΛ^T + Ψ] ).

Inference in factor analysis
- Apply the Gaussian conditioning formulas to the joint distribution we derived above, where
      Σ11 = I,  Σ12 = Λ^T,  Σ21 = Λ,  Σ22 = ΛΛ^T + Ψ.
  We can now derive the posterior of the latent variable x given an observation y, p(x|y) = N(x; m_{1|2}, V_{1|2}), where
      m_{1|2} = µ1 + Σ12 Σ22^{−1} (y − µ2) = Λ^T (ΛΛ^T + Ψ)^{−1} (y − µ),
      V_{1|2} = Σ11 − Σ12 Σ22^{−1} Σ21 = I − Λ^T (ΛΛ^T + Ψ)^{−1} Λ.
- Applying the matrix inversion lemma, (G + FEF^T)^{−1} = G^{−1} − G^{−1}F(E^{−1} + F^T G^{−1}F)^{−1}F^T G^{−1}, we get
      V_{1|2} = (I + Λ^T Ψ^{−1} Λ)^{−1},
      m_{1|2} = V_{1|2} Λ^T Ψ^{−1} (y − µ).
- Here we only need to invert a matrix of size |x| × |x|, instead of |y| × |y|.

Geometric interpretation: inference is a linear projection
- The posterior is
      p(x|y) = N(x; m_{1|2}, V_{1|2}),  with  V_{1|2} = (I + Λ^T Ψ^{−1} Λ)^{−1}  and  m_{1|2} = V_{1|2} Λ^T Ψ^{−1} (y − µ).
- The posterior covariance does not depend on the observed data y!
- Computing the posterior mean is just a linear operation on y.

EM for factor analysis
- The incomplete-data log likelihood (from the marginal density of y) is given below.
- Estimating µ is trivial (the sample mean), but the parameters Λ and Ψ are coupled nonlinearly in the log likelihood, so we turn to EM.
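Before deriving the EM updates, the two posterior-inference identities from the matrix inversion lemma above can be verified numerically. A minimal sketch assuming NumPy; dimensions and parameter draws are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 6, 2                                   # |y| = p, |x| = k

Lam = rng.normal(size=(p, k))                 # factor loading matrix
psi = rng.uniform(0.2, 1.0, size=p)           # diagonal entries of Psi
mu = rng.normal(size=p)
y = rng.normal(size=p)                        # one observation

# Direct Gaussian conditioning: inverts the p x p matrix Lam Lam^T + Psi
S22_inv = np.linalg.inv(Lam @ Lam.T + np.diag(psi))
V_direct = np.eye(k) - Lam.T @ S22_inv @ Lam  # I - Lam^T (Lam Lam^T + Psi)^{-1} Lam
m_direct = Lam.T @ S22_inv @ (y - mu)

# After the matrix inversion lemma: only a k x k inverse is needed
Psi_inv = np.diag(1.0 / psi)
V = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)   # (I + Lam^T Psi^{-1} Lam)^{-1}
m = V @ Lam.T @ Psi_inv @ (y - mu)
```

Both routes give the same posterior moments; the second only inverts a |x| × |x| matrix (plus the trivially invertible diagonal Ψ), which is the computational point of the slide.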
- Incomplete-data log likelihood:
      ℓ(θ; D) = Σ_n log N(y_n; µ, ΛΛ^T + Ψ)
              = −(N/2) log|ΛΛ^T + Ψ| − (1/2) Σ_n (y_n − µ)^T (ΛΛ^T + Ψ)^{−1} (y_n − µ)
              = −(N/2) log|ΛΛ^T + Ψ| − (N/2) tr[(ΛΛ^T + Ψ)^{−1} S],
  where S = (1/N) Σ_n (y_n − µ)(y_n − µ)^T, and
      µ̂_ML = (1/N) Σ_n y_n.
- Complete-data log likelihood (up to constants):
      ℓ_c(θ; D) = Σ_n log p(x_n, y_n) = Σ_n log p(x_n) + Σ_n log p(y_n | x_n)
                = −(1/2) Σ_n x_n^T x_n − (N/2) log|Ψ| − (1/2) Σ_n (y_n − Λx_n)^T Ψ^{−1} (y_n − Λx_n)
                = −(N/2) tr[S1] − (N/2) log|Ψ| − (N/2) tr[Ψ^{−1} S2],
  where S1 = (1/N) Σ_n x_n x_n^T and S2 = (1/N) Σ_n (y_n − Λx_n)(y_n − Λx_n)^T.

E-step for factor analysis
- Compute the expected complete-data log likelihood ⟨ℓ_c(θ; D)⟩ under the posterior p(x|y):
      ⟨ℓ_c(θ; D)⟩ = −(N/2) tr[⟨S1⟩] − (N/2) log|Ψ| − (N/2) tr[Ψ^{−1} ⟨S2⟩].
- Recall the posterior p(x_n | y_n) derived above: the required expected sufficient statistics are ⟨x_n⟩ and ⟨x_n x_n^T⟩, entering through the terms y_n y_n^T, y_n ⟨x_n⟩^T, ⟨x_n⟩ y_n^T, and ⟨x_n x_n^T⟩.
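The E-step above, combined with the posterior formulas from the inference slide, gives a complete EM loop. This is a minimal sketch assuming NumPy; note that the excerpt stops at the E-step, so the M-step updates below are the standard textbook ones, filled in as an assumption rather than taken from these slides, and the function name `fa_em` is illustrative:

```python
import numpy as np

def fa_em(Y, k, n_iter=50, seed=0):
    """A minimal EM sketch for factor analysis.

    Y: (N, p) data; k: latent dimension. Returns (mu, Lam, psi) with psi the
    diagonal of Psi. The M-step is the standard update, not derived above.
    """
    rng = np.random.default_rng(seed)
    N, p = Y.shape
    mu = Y.mean(axis=0)                        # mu_ML: trivial, estimated once
    Yc = Y - mu                                # work with centered data
    Lam = rng.normal(size=(p, k))              # random initialization
    psi = np.ones(p)
    for _ in range(n_iter):
        # E-step: p(x_n | y_n) = N(m_n, V); V is shared across all n
        Psi_inv = np.diag(1.0 / psi)
        V = np.linalg.inv(np.eye(k) + Lam.T @ Psi_inv @ Lam)
        M = Yc @ Psi_inv @ Lam @ V             # row n is the posterior mean m_n
        Exx = N * V + M.T @ M                  # sum_n <x_n x_n^T>
        # M-step (standard updates, assumed): maximize <l_c> w.r.t. Lam, Psi
        Lam = Yc.T @ M @ np.linalg.inv(Exx)
        psi = np.diag(Yc.T @ Yc - Lam @ (M.T @ Yc)) / N
    return mu, Lam, psi
```

On data generated from the FA model, the fitted ΛΛ^T + Ψ approaches the true marginal covariance (Λ itself is only identifiable up to a rotation of the latent space).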