10-701 Recitation 10: PCA and Topic Models

Latent Space Methods
• PCA and Topic Models are latent space methods
  – Unsupervised
  – Reduce data to fewer dimensions
  – Easier to visualize and understand

Principal Component Analysis
• Key idea: find vectors that capture most of the variability in the data X
  – These vectors are the principal components

Principal Component Analysis
• Let's say the data X has n rows and p columns
  – Every row x_i is a p-dimensional data point
  – We assume that X has zero mean
• What quantity represents the variability of X?
  – Answer: the p-by-p covariance matrix XᵀX (up to a factor of 1/n), which is equal to Σ_i x_i x_iᵀ

Principal Component Analysis
• What does it mean for a vector to "capture" variability in the data?
• Let C = XᵀX be the covariance matrix. We want a unit-length vector u that maximizes uᵀCu
  – Why do we want this?
  – Intuition: let v = Cu, so we are maximizing uᵀv
  – uᵀv is large when
    • the magnitude of v is large
    • the angle between u and v is small
  – In other words, we want to find a u such that
    • C makes u longer
    • C does not change the direction of u
• uᵀCu is maximized when u is the principal eigenvector of C
  – Hence Cu = λu, where λ is the principal (largest) eigenvalue of C (this is what maximizing uᵀCu subject to uᵀu = 1 with a Lagrange multiplier λ gives)
  – Graphically, the principal eigenvector gives the direction of highest variability in the data

Principal Component Analysis
• We found the first principal component u_1 (the principal eigenvector)
  – How do we find the other principal components?
• Again, find a unit-length u that maximizes uᵀCu, this time subject to u being perpendicular to u_1
  – The solution is the second eigenvector u_2
  – Next, maximize uᵀCu s.t. u is perpendicular to both u_1 and u_2, which gives the third eigenvector u_3
  – And so on…

Finding the eigenvectors
• MATLAB code to find the top k eigenvectors:
  – [V,D] = eigs(X'*X,k);
• V is p-by-k, D is k-by-k
  – V contains the top k eigenvectors as columns
  – D contains the top k eigenvalues on its diagonal

Principal Component Analysis
• So far we have talked about eigenvectors of C
  – What is the connection with a latent space?
• Let's project the data points onto the 1st PC/eigenvector
  – Notice how the projected points barely move

Principal Component Analysis
• If we pick the top k PCs and project the data X onto them, we get a lower-dimensional latent space representation of the data
  – Note that k < p (the original data dimensionality)
  – By picking the top k PCs, we ensure the latent space distorts the data as little as possible

Principal Component Analysis
• How do we project the data onto the top k PCs?
• MATLAB code:
  – [V,D] = eigs(X'*X,k);
  – W = X*V;
• W is n-by-k, and is the projection of X onto the top k PCs
  – W_ij represents how much x_i depends on the j-th PC
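Putting the two snippets above together, here is a minimal end-to-end sketch of the projection step. It is a sketch only: it assumes a data matrix X (n-by-p) and a target dimension k are already in the workspace, it adds an explicit mean-centering line (the slides simply assume X has zero mean), and the `explained` line is an extra sanity check that is not on the slides.

    % PCA projection sketch (MATLAB); assumes X (n-by-p) and k are already defined
    Xc = bsxfun(@minus, X, mean(X, 1));          % center each column so Xc has zero mean
    [V, D] = eigs(Xc' * Xc, k);                  % V: top k eigenvectors (p-by-k), D: eigenvalues (k-by-k)
    W = Xc * V;                                  % W: n-by-k projection of the data onto the top k PCs
    explained = sum(diag(D)) / trace(Xc' * Xc);  % fraction of total variability captured by the k PCs

The closer `explained` is to 1, the less the projection distorts the data.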
Principal Component Analysis
• The projection W can be thought of as a "compressed" version of X
  – In some cases, we can even interpret the columns of W as "concepts"
• How do we reconstruct the data from W?
• MATLAB code:
  – [V,D] = eigs(X'*X,k);
  – W = X*V;
  – Y = W*V';
• Y is n-by-p, and is the reconstruction of X from the top k PCs
  – If the top k PCs capture most of the data variability, then Y will be similar to X

Summary of PCA
• PCA projects p-dimensional data X onto a k-dimensional latent space W, where k < p
• Properties of the latent space W:
  – Captures most of the variability in X
  – Easier to visualize and understand than X
  – Requires less storage than X
  – We can reconstruct X from W with minimal error

Topic Models
• Setting: we want to organize documents, represented as (high-dimensional) bags of words
• Key idea: find K topics (collections of words) so we can describe documents as combinations of topics
  – Note that topics may overlap

Topic Models
• A Topic Model is a Bayesian generative model
  – Observed data depends on hidden variables, which can depend on other hidden variables, etc.
  – It can be graphically depicted as a Bayes Net
  – You already know another generative model: the K-Gaussians Mixture Model

Topic Models
• The Topic Model "generative process" describes the model in a compact form:
  – Draw topic vocabularies k = 1…K:
    • β_k ~ Dirichlet(η)
  – Draw document topic vectors i = 1…N:
    • θ_i ~ Dirichlet(α)
  – For each document i = 1…N, draw words j = 1…M_i:
    • z_ij ~ Multinomial(θ_i)   (word-topic indicator)
    • w_ij ~ Multinomial(β_{z_ij})   (the word itself)

Topic Models
• "Graphical Model" illustration: a plate diagram with nodes α, η, β_k, θ_i, z_ij, w_ij and plates of size K, M_i, N, shown side by side with the generative process:
  – For k = 1…K: β_k ~ Dirichlet(η)
  – For i = 1…N: θ_i ~ Dirichlet(α)
  – For i = 1…N, j = 1…M_i: z_ij ~ Multinomial(θ_i), w_ij ~ Multinomial(β_{z_ij})

Reading Plate Notation
• A plate is a box around part of the graph; variables inside the plate are duplicated (K times for the β plate), and their dependencies (arrows) are duplicated with them
  – For example, η → β_k inside a plate of size K is the same as drawing β_1, β_2, …, β_K separately, each with an arrow from η
• In the diagram, α and η are (non-random) parameters, β_k, θ_i, and z_ij are hidden random variables, and w_ij is an observed random variable

Correspondence with Gen. Process
• Each statement of the generative process corresponds to one node of the graphical model, together with its incoming arrows and enclosing plates:
  – For k = 1…K: β_k ~ Dirichlet(η)
  – For i = 1…N: θ_i ~ Dirichlet(α)
  – For i = 1…N, j = 1…M_i: z_ij ~ Multinomial(θ_i), w_ij ~ Multinomial(β_{z_ij})
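To make the generative process concrete, the following is a minimal MATLAB sketch that samples a small synthetic corpus from it. Everything beyond the model itself is an illustrative assumption: the sizes K, N, V and the fixed document length Mi are made up, Dirichlet draws are obtained by normalizing independent Gamma samples, and gamrnd / randsample come from the Statistics Toolbox.

    % Toy sampler for the topic model generative process (illustrative sizes)
    K = 3; N = 10; V = 20;                        % number of topics, documents, vocabulary size
    eta = 0.1; alpha = 0.5;                       % Dirichlet hyperparameters
    beta = gamrnd(eta, 1, K, V);                  % beta_k ~ Dirichlet(eta): normalized Gamma draws
    beta = bsxfun(@rdivide, beta, sum(beta, 2));
    theta = gamrnd(alpha, 1, N, K);               % theta_i ~ Dirichlet(alpha), same trick
    theta = bsxfun(@rdivide, theta, sum(theta, 2));
    docs = cell(N, 1);
    for i = 1:N
        Mi = 50;                                  % words per document (fixed here for simplicity)
        z = randsample(K, Mi, true, theta(i, :));                          % z_ij ~ Multinomial(theta_i)
        docs{i} = arrayfun(@(zj) randsample(V, 1, true, beta(zj, :)), z);  % w_ij ~ Multinomial(beta_{z_ij})
    end

Each docs{i} is then a bag of word indices for document i; fitting a topic model goes the other way, inferring β and θ from observed documents.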