Review
• Models that use SVD or eigen-analysis
  – PageRank: eigen-analysis of the random-surfer transition matrix
    – usually uses only the first eigenvector
  – Spectral embedding: eigen-analysis (or equivalently SVD) of the random-surfer model on a symmetric graph
    – usually uses the 2nd through Kth eigenvectors (small K)
    – the first eigenvector is uninformative
  – Spectral clustering = spectral embedding followed by clustering
[Figure: dolphin friendships]

Review: PCA
• The good: simple, successful
• The bad: linear, Gaussian
  – $E(X) = UV^T$
  – $X$, $U$, $V$ ~ Gaussian
• The ugly: failure to generalize to new entities
  – Partial answer: hierarchical PCA

What about the second rating for a new user?
• MLE/MAP of $U_i$ (row $i$ of $U$) from one rating:
  – knowing $\mu_U$:
  – result:
• How should we fix this?
• Note: we often have only a few ratings per user

MCMC for PCA
• Can do Bayesian inference by Gibbs sampling; for simplicity, assume the $\sigma$s are known
• Need:

Recognizing a Gaussian
• Suppose $X \sim N(X \mid \mu, \sigma^2)$
• $L = -\log P(X = x \mid \mu, \sigma^2) = \frac{(x - \mu)^2}{2\sigma^2} + \text{const}$
  – $dL/dx = (x - \mu)/\sigma^2$
  – $d^2L/dx^2 = 1/\sigma^2$
• So: if we see $d^2L/dx^2 = a$ and $dL/dx = a(x - b)$, then
  – $\mu = b$, $\sigma^2 = 1/a$

Gibbs step for an element of $\mu_U$
• $L =$

Gibbs: element of $U$
• $L =$
• $dL/dU_{ik} =$
• $d^2L/dU_{ik}^2 =$
  – posterior mean = ; posterior variance =

In reality
• Above, blocks are single elements of $U$ or $V$
• Better: blocks are entire rows of $U$ or $V$
  – take the gradient and Hessian to get the mean and covariance
  – the formulas look a lot like linear regression (normal equations)
• And we want to fit $\sigma_U$, $\sigma_V$ too
  – sample $1/\sigma^2$ from a Gamma (or $\Sigma^{-1}$ from a Wishart) distribution

Nonlinearity: conjunctive features
[Figure: P(rent) as a function of Comedy and Foreign]

Disjunctive features
[Figure: P(rent) as a function of Comedy and Foreign]

Non-Gaussian
• $X$, $U$, and $V$ could each be non-Gaussian
  – e.g., binary
    – rents(U, M), comedy(M), female(U)
• For $X$: under a Gaussian model, predicting −0.1 instead of 0 is only as bad as predicting +0.1 instead of 0, even though a negative value is impossible for binary data
• For $U$, $V$: we might infer −17% comedy or 32% female

Logistic PCA
• Regular PCA: $X_{ij} \sim N(U_i \cdot V_j, \sigma^2)$
• Logistic PCA: $X_{ij} \sim \text{Bernoulli}(\text{logistic}(U_i \cdot V_j))$, where $\text{logistic}(z) = 1/(1 + e^{-z})$
• Might expect learning and inference to be hard
  – but MH works well, using the first and second derivatives of $L$
• Generalization: exponential-family PCA
  – with optional hierarchy, Bayesianism

Application: fMRI
• Augmented brain imaging: a (word + picture) stimulus such as "dog" evokes brain activity measured by fMRI; a text corpus supplies word co-occurrences
  – co-occurs(dog, cat) = 1, co-occurs(dog, walk) = 1
  – co-occurs(dog, physics) = 0, co-occurs(dog, cupcake) = 0
• Two matrices indexed by stimulus: X (stimulus × words, binary co-occurrences) and Y (stimulus × voxels, fMRI activity); example stimuli: "dog", "cat", "hammer"
• Data: Mitchell et al. (2008), Predicting Human Brain Activity Associated with the Meaning of Nouns, Science
(credit: Ajit Singh)

2-matrix model
[Figure: plate diagram of the two-matrix model]
• $X_{ij}$ (co-occurrences, logistic PCA) depends on $U_i$ and $V_j$; $Y_{jp}$ (fMRI voxels, linear PCA) depends on $V_j$ and $Z_p$; $i = 1 \ldots n$, $j = 1 \ldots m$, $p = 1 \ldots r$
• Hyperparameters: $\mu_U, \Sigma_U$; $\mu_V, \Sigma_V$; $\mu_Z, \Sigma_Z$

Results (logistic PCA)
[Figure: predictive accuracy on Y (fMRI data), mean squared error (lower is better), in two panels: hold-out and fold-in. Each panel compares HB-CMF (hierarchical Bayesian model), H-CMF (hierarchical maximum a posteriori), and CMF (maximum a posteriori, fixed hyperparameters), using just fMRI data (Voxels) or fMRI data augmented with word co-occurrence (Words + Voxels).]
(credit: Ajit Singh)
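The row-wise Gibbs step from "In reality" (blocks are entire rows of $U$ or $V$, with normal-equation-like formulas and a Gamma draw for $1/\sigma^2$) can be sketched in NumPy as below. This is a minimal sketch under assumptions not stated in the slides: zero-mean isotropic priors $U_i \sim N(0, \sigma_U^2 I)$ and $V_j \sim N(0, \sigma_V^2 I)$ with $\sigma_U, \sigma_V$ fixed (no hierarchical $\mu_U$), and a Gamma(a0, rate b0) prior on the noise precision. The names `sample_rows` and `gibbs_pca` are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of blocked Gibbs sampling for PCA, not the lecture's code.


def sample_rows(X, mask, F, tau, prior_prec, rng):
    """Draw every row of one factor matrix from its Gaussian full conditional.

    X: (n, m) data (unobserved entries are ignored), mask: (n, m) bool of observed
    entries, F: (m, k) the other factor (held fixed), tau: noise precision 1/sigma^2,
    prior_prec: 1/sigma_U^2 for an ASSUMED zero-mean N(0, sigma_U^2 I) prior.
    """
    n, k = X.shape[0], F.shape[1]
    rows = np.empty((n, k))
    for i in range(n):
        F_obs = F[mask[i]]          # factor rows for the entries observed in row i
        x_obs = X[i, mask[i]]
        # Posterior precision and mean: Bayesian linear regression (ridge normal equations)
        prec = tau * F_obs.T @ F_obs + prior_prec * np.eye(k)
        mean = np.linalg.solve(prec, tau * F_obs.T @ x_obs)
        # If prec = C C^T, then C^{-T} z has covariance prec^{-1}
        C = np.linalg.cholesky(prec)
        rows[i] = mean + np.linalg.solve(C.T, rng.standard_normal(k))
    return rows


def gibbs_pca(X, mask, k, n_iter=200, sigma_u=1.0, sigma_v=1.0, a0=1.0, b0=1.0, seed=0):
    """Blocked Gibbs sampler for X_ij ~ N(U_i . V_j, sigma^2) on the observed entries.

    Returns the final sample of U, V and of the noise precision tau = 1/sigma^2.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    tau = 1.0
    n_obs = int(mask.sum())
    for _ in range(n_iter):
        U = sample_rows(X, mask, V, tau, 1.0 / sigma_u**2, rng)
        V = sample_rows(X.T, mask.T, U, tau, 1.0 / sigma_v**2, rng)
        # Conjugate update of 1/sigma^2 under the ASSUMED Gamma(a0, rate=b0) prior
        resid = (X - U @ V.T)[mask]
        tau = rng.gamma(a0 + n_obs / 2, 1.0 / (b0 + resid @ resid / 2))
    return U, V, tau
```

Unobserved entries of `X` may hold anything (e.g. NaN), since only masked entries are read. In this sketch a row with no observed entries (a new user) is simply drawn from the fixed zero-mean prior; learning $\mu_U$, as in hierarchical PCA, would let such rows borrow strength from other users.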
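For logistic PCA, the slides note that Metropolis-Hastings works well when it uses the first and second derivatives of $L$. One way to do that, shown here as an assumption rather than the lecture's exact scheme, is to apply the "Recognizing a Gaussian" trick at the current point: take $a = d^2L/du^2$, write $dL/du = a(u - b)$, propose from $N(b, 1/a)$, and correct with the full MH ratio because the proposal depends on the current state. The sketch updates a single element $U_{ik}$ under an assumed zero-mean Gaussian prior with variance `prior_var`; all function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: MH for one element U_ik in logistic PCA with a Newton-style proposal.


def logistic(eta):
    # Numerically stable form of 1 / (1 + exp(-eta))
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def eta_with(u, k, Ui, V):
    # Linear predictors V_j . U_i for every j, with entry U_ik replaced by u
    return V @ Ui + (u - Ui[k]) * V[:, k]


def neg_log_post(u, k, Ui, x, V, prior_var):
    """L(u) = -log p(U_ik = u | x, V, rest of U_i), up to an additive constant."""
    eta = eta_with(u, k, Ui, V)
    # log(1 + e^eta) - x * eta is the Bernoulli negative log-likelihood with a logistic link;
    # u^2 / (2 prior_var) comes from the ASSUMED N(0, prior_var) prior
    return np.sum(np.logaddexp(0.0, eta) - x * eta) + u**2 / (2 * prior_var)


def newton_proposal(u, k, Ui, x, V, prior_var):
    """Mean b and variance 1/a read off from d2L/du2 = a, dL/du = a (u - b)."""
    p = logistic(eta_with(u, k, Ui, V))
    g = np.sum((p - x) * V[:, k]) + u / prior_var                # dL/du
    a = np.sum(p * (1.0 - p) * V[:, k] ** 2) + 1.0 / prior_var    # d2L/du2, always > 0
    return u - g / a, 1.0 / a


def log_normal(y, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - (y - mean) ** 2 / (2 * var)


def mh_update(Ui, k, x, V, prior_var, rng):
    """One Metropolis-Hastings update of U_ik (in place); returns True if accepted."""
    u = Ui[k]
    m_fwd, v_fwd = newton_proposal(u, k, Ui, x, V, prior_var)
    u_new = rng.normal(m_fwd, np.sqrt(v_fwd))
    # The proposal depends on the current state, so the reverse move enters the ratio
    m_rev, v_rev = newton_proposal(u_new, k, Ui, x, V, prior_var)
    log_alpha = (neg_log_post(u, k, Ui, x, V, prior_var)
                 - neg_log_post(u_new, k, Ui, x, V, prior_var)
                 + log_normal(u, m_rev, v_rev)
                 - log_normal(u_new, m_fwd, v_fwd))
    if np.log(rng.random()) < log_alpha:
        Ui[k] = u_new
        return True
    return False
```

Here `x` is row $i$ of the binary matrix $X$ and `V` stacks the $V_j$ as rows. Because the conditional of $U_{ik}$ is close to Gaussian when row $i$ has many entries, the proposal nearly matches it and most moves are accepted. A row-wise version would use the gradient vector and Hessian matrix instead, mirroring the "In reality" slide.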