Review
• Parallel importance sampling
  – bias due to 1/normalizer
  – particle filter = recursive parallel IS
• MCMC
  – randomized search for high P(x)
  – burn-in, mixing
  – approximately i.i.d.: { X_t, X_{t+k}, X_{t+2k}, X_{t+3k}, … }
  – use to construct an estimator of E_P(g(X))

Review
• Metropolis–Hastings
  – a way to design a chain with stationary distribution P(X)
  – proposal distribution Q(X′ | X)
  – e.g., random walk N(X′ | X, σ²I)
  – accept w.p. min(1, [P(X′) Q(X | X′)] / [P(X) Q(X′ | X)])
  – tension between long moves and a high acceptance rate

MH algorithm
• Initialize X_1 arbitrarily
• For t = 1, 2, …:
  – Sample X′ ~ Q(X′ | X_t)
  – Compute p = [P(X′) Q(X_t | X′)] / [P(X_t) Q(X′ | X_t)]
  – With probability min(1, p), set X_{t+1} := X′; else X_{t+1} := X_t
• Note: the sequence X_1, X_2, … will usually contain duplicates

MH example
[figure]

MH example
[figure]

In example
• g(x) = x²
• True E(g(X)) = 0.28…
• Proposal: Q(x′ | x) = N(x′ | x, 0.25²I)
• Acceptance rate 55–60%
• After 1000 samples, minus burn-in of 100:
  final estimate 0.282361
  final estimate 0.271167
  final estimate 0.322270
  final estimate 0.306541
  final estimate 0.308716

Gibbs sampler
• Special case of MH
• Divide X into blocks of r.v.s B(1), B(2), …
• Proposal Q:
  – pick a block i uniformly
  – sample X_{B(i)} ~ P(X_{B(i)} | X_{¬B(i)})
• Useful property: acceptance rate p = 1

Gibbs example
[figure]

Gibbs example
[figure]

Gibbs failure example
[figure]

Relational learning
• Linear regression, logistic regression: attribute-value learning
  – a set of i.i.d. samples from P(X, Y)
• Not all data is like this
  – an attribute is a property of a single entity
  – what about properties of sets of entities?
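The random-walk Metropolis–Hastings loop reviewed above is short enough to sketch end to end. The snippet below is a minimal pure-Python version, not the code behind the slides' numbers: the slides do not specify their target density, so a standard normal (for which E[x²] = 1) stands in, with a symmetric random-walk proposal so the Q terms cancel in the acceptance ratio.

```python
import math
import random

random.seed(0)

def log_p(x):
    # Unnormalized log-density of the target.
    # Assumed stand-in: standard normal (the slides' target is unspecified).
    return -0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, x0=0.0):
    """Random-walk MH: propose x' ~ N(x, step^2), accept w.p. min(1, p).

    Q is symmetric, so p reduces to P(x') / P(x).
    """
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        x_prop = random.gauss(x, step)
        p = math.exp(min(0.0, log_p(x_prop) - log_p(x)))
        if random.random() < p:
            x = x_prop
            accepted += 1
        samples.append(x)  # duplicates appear whenever the proposal is rejected
    return samples, accepted / n_samples

samples, rate = metropolis_hastings(50000)
burned = samples[1000:]  # discard burn-in, as on the slides
est = sum(x * x for x in burned) / len(burned)  # estimator of E[g(X)], g(x) = x^2
```

With this step size the chain accepts roughly two-thirds of its moves, and the post-burn-in average of x² lands near the true value of 1, illustrating the same estimator construction as the slides' 0.28… example.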
Application: document clustering
[figure]

Application: recommendations
[figure]

Latent-variable models
[figure]

Best-known LVM: PCA
• Suppose X_ij, U_ik, V_jk are all ~ Gaussian
  – yields principal components analysis
  – or probabilistic PCA
  – or Bayesian PCA

PCA: the picture
[figure]

PCA: cartoon example

          Movie
          1  2  3  4  5  6
  User A  1  1  0  0  1  0
       B  0  1  1  0  0  0
       C  1  1  0  1  1  0
       D  1  0  0  1  1  0
       E  0  1  0  1  0  0
       F  0  1  1  1  0  1

PCA: cartoon example
• Data matrix X: rows x_1, x_2, x_3, …, x_n
• Compressed matrix U: rows u_1, u_2, u_3, …, u_n
• Basis matrix Vᵀ: rows v_1, …, v_k
• rows of Vᵀ span the low-rank space

Interpreting PCA
• rows of U correspond to users; rows of Vᵀ correspond to movies
• U holds the basis weights; Vᵀ holds the basis vectors
• Basis vectors represent movies that vary together
• Weights say how much each user cares about each type of movie

Mean subtraction
• U_ik ~ N(0, τ²)
• V_jk ~ N(0, τ²)
• X_ij ~ N(U_i · V_j, σ²)

>> mu = mean(X(:));
>> colmu = mean(X - mu);
>> rowmu = mean(X' - mu)';
>> X = X - mu - repmat(colmu, size(X,1), 1) - repmat(rowmu, 1, size(X,2));

Data weights
• Let W_ij = …
• Likelihood × prior = …
• More generally, W_ij ≥ 0

Another use of PCA
[figure]
face images from Groundhog Day, extracted by the Cambridge face DB project

Image matrix
• rows x_1, x_2, x_3, …, x_n are images; columns are pixels

Result of factoring
• U (rows u_1, …, u_n): basis weights over images; Vᵀ (rows v_1, …, v_k): basis vectors over pixels
• Basis vectors are often called “eigenfaces”

Eigenfaces
[figure]
image credit: AT&T Labs Cambridge

PCA: finding the MLE
• PCA:
  – U_ik ~ N(0, τ²)
  – V_jk ~ N(0, τ²)
  – X_ij ~ N(U_i · V_j, σ²)
  – σ/τ → 0

PCA & SVD
• The singular value decomposition is
  – X = R Λ Sᵀ
  – R, S orthonormal; Λ ≥ 0 diagonal
  – All matrices can be expressed this way
  – See svd, svds in Matlab
• So, PCA is U = RΛ, V = S
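The PCA–SVD connection above can be checked numerically. The sketch below is a toy pure-Python stand-in for the `svd`/`svds` calls the slides mention: it recovers the leading singular triple (σ, u, v) of a small matrix by power iteration on XᵀX, then verifies that the rank-1 factorization σ·u·vᵀ reconstructs a rank-1 X exactly, mirroring X ≈ U Vᵀ with U = σu and V = v.

```python
import math

def top_singular_triple(X, iters=200):
    """Power iteration on X^T X: returns (sigma, u, v) with X ~ sigma * u v^T
    at the leading singular value. A toy stand-in for svd/svds."""
    n, m = len(X), len(X[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        # One step: v <- normalize(X^T (X v))
        Xv = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    # sigma = |X v|, u = X v / sigma
    Xv = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
    sigma = math.sqrt(sum(t * t for t in Xv))
    u = [t / sigma for t in Xv]
    return sigma, u, v

# Rank-1 test matrix X = a b^T, so the rank-1 reconstruction should be exact
# (sigma = |a| |b| = sqrt(70) here).
a = [1.0, 2.0, 3.0]
b = [2.0, 1.0]
X = [[ai * bj for bj in b] for ai in a]
sigma, u, v = top_singular_triple(X)
err = max(abs(X[i][j] - sigma * u[i] * v[j]) for i in range(3) for j in range(2))
```

In practice one would keep the top k singular values rather than just the first; the example uses k = 1 only to keep the check self-contained.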