Review: Models that use SVD or eigen-analysis
• PageRank: eigen-analysis of the random-surfer transition matrix
  ‣ usually uses only the first eigenvector
• Spectral embedding: eigen-analysis (or, equivalently, SVD) of the random-surfer model on a symmetric graph
  ‣ usually uses the 2nd–Kth eigenvectors (small K)
  ‣ the first eigenvector is boring
• Spectral clustering = spectral embedding followed by clustering!
[Figure: 2-D spectral embedding of the dolphin-friendships graph]

Review: PCA
• The good: simple, successful
• The bad: linear, Gaussian
  ‣ E(X) = U Vᵀ
  ‣ X, U, V ~ Gaussian
• The ugly: failure to generalize to new entities
  ‣ Partial answer: hierarchical PCA

What about the second rating for a new user?
• MLE/MAP of Ui⋅ from one rating:
  ‣ knowing μU:
  ‣ result:
• How should we fix this?
• Note: we often have only a few ratings per user

MCMC for PCA
• Can do Bayesian inference by Gibbs sampling; for simplicity, assume the σs are known
• Need: the conditional distribution of each unknown given all the others

Recognizing a Gaussian
• Suppose X ~ N(X | μ, σ²)
• L = −log P(X = x | μ, σ²) = (x − μ)²/(2σ²) + const
  ‣ dL/dx = (x − μ)/σ²
  ‣ d²L/dx² = 1/σ²
• So: if we see d²L/dx² = a and dL/dx = a(x − b)
  ‣ μ = b, σ² = 1/a

Gibbs step for an element of μU
• L = Σi (Uik − μUk)²/(2σU²) + (negative log prior of μUk) + const

Gibbs: element of U
• L = Σj (Xij − Ui ⋅ Vj)²/(2σ²) + (Uik − μUk)²/(2σU²) + const
• dL/dUik = −Σj (Xij − Ui ⋅ Vj) Vjk/σ² + (Uik − μUk)/σU²
• d²L/(dUik)² = Σj Vjk²/σ² + 1/σU²
  ‣ post. mean = Uik − (dL/dUik) / (d²L/(dUik)²), post. var. = 1/(d²L/(dUik)²)
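The "recognize a Gaussian" trick above turns directly into code. Below is a minimal NumPy sketch of one Gibbs update for a single element U[i, k], assuming the model Xij ~ N(Ui ⋅ Vj, σ²) with prior Uik ~ N(μUk, σU²); the function name and argument layout are illustrative, not from the source:

```python
import numpy as np

def gibbs_step_U_element(X, U, V, i, k, sigma2, mu_U, sigma2_U, rng):
    """One Gibbs update for U[i, k] in Bayesian PCA with X_ij ~ N(U_i . V_j, sigma2)
    and prior U_ik ~ N(mu_U[k], sigma2_U).  Uses the 'recognize a Gaussian' trick:
    the conditional negative log posterior L is quadratic in U[i, k], so
    a = d2L/dU2 is the posterior precision, and a Newton step from the current
    value gives the posterior mean exactly."""
    resid = X[i] - U[i] @ V.T            # residuals for row i, shape (m,)
    # second derivative of L (posterior precision): does not depend on U[i, k]
    a = np.sum(V[:, k] ** 2) / sigma2 + 1.0 / sigma2_U
    # first derivative of L at the current U[i, k]
    dL = -np.sum(resid * V[:, k]) / sigma2 + (U[i, k] - mu_U[k]) / sigma2_U
    post_var = 1.0 / a
    post_mean = U[i, k] - dL / a         # exact because L is quadratic
    U[i, k] = post_mean + np.sqrt(post_var) * rng.standard_normal()
    return U
```

Because L is exactly quadratic in U[i, k], the Newton step recovers the exact conditional posterior, so the draw is a true Gibbs step and needs no Metropolis correction.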
In reality
• Above, the blocks are single elements of U or V
• Better: blocks are entire rows of U or V
  ‣ take the gradient and Hessian to get the mean and covariance
  ‣ the formulas look a lot like linear regression (normal equations)
• And we want to fit σU, σV too
  ‣ sample 1/σ² from a Gamma (or Σ⁻¹ from a Wishart) distribution

Nonlinearity: conjunctive features
[Figure: surface plot of P(rent) vs. the Comedy and Foreign features]

Disjunctive features
[Figure: surface plot of P(rent) vs. the Comedy and Foreign features]

Non-Gaussian
• X, U, and V could each be non-Gaussian
  ‣ e.g., binary!
  ‣ rents(U, M), comedy(M), female(U)
• For X: predicting −0.1 instead of 0 is only as bad as predicting +0.1 instead of 0
• For U, V: might infer −17% comedy or 32% female

Logistic PCA
• Regular PCA: Xij ~ N(Ui ⋅ Vj, σ²)
• Logistic PCA: Xij ~ Bernoulli(p), p = 1/(1 + exp(−Ui ⋅ Vj))
• Might expect learning and inference to be hard
  ‣ but MH works well, using dL/dθ and d²L/dθ²
• Generalization: exponential-family PCA
  ‣ with optional hierarchy and Bayesianism

Application: fMRI
[Figure: augmented brain imaging. A (word + picture) stimulus such as "dog" produces fMRI brain activity; the same word's appearances in a text corpus give binary co-occurrence features, e.g. co-occurs(dog, cat) = 1, co-occurs(dog, walk) = 1, co-occurs(dog, physics) = 0, co-occurs(dog, cupcake) = 0. This yields two matrices over stimuli such as "dog", "cat", "hammer": X (stimulus × words, co-occurrences) and Y (stimulus × voxels, fMRI). Credit: Ajit Singh.]
Mitchell et al. (2008), Predicting Human Brain Activity Associated with the Meaning of Nouns, Science.

2-matrix model
[Figure: plate diagram of the two-matrix model. Xij (co-occurrences, logistic PCA) has factors Ui and Vj; Yjp (fMRI voxels, linear PCA) has factors Vj and Zp; i = 1…n, j = 1…m, p = 1…r; factor priors with means μU, μV, μZ and covariances ΣU, ΣV, ΣZ. The shared factor V ties the two matrices together.]

Results (logistic PCA)
[Figure: predictive accuracy. Bar charts of mean squared error on Y (fMRI data), lower is better, in hold-out and fold-in settings, comparing HB-CMF (hierarchical Bayesian model), H-CMF (hierarchical maximum a posteriori), and CMF (maximum a posteriori with fixed hyperparameters), each when just using fMRI data vs. when augmenting fMRI data with word co-occurrence. Credit: Ajit Singh.]
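The row-at-a-time update and the Gamma step for the noise precision described in the "In reality" slide can be sketched as follows, again assuming the Gaussian model Xij ~ N(Ui ⋅ Vj, σ²); the Gamma prior parameters a0, b0 (shape and rate) and the function names are illustrative:

```python
import numpy as np

def gibbs_row_update_U(X, V, i, sigma2, Sigma_U_inv, mu_U, rng):
    """Sample the whole row U[i] at once.  The conditional posterior is a
    multivariate Gaussian whose mean solves ridge-regression-style normal
    equations: (V'V/sigma2 + Sigma_U_inv) m = V'X_i/sigma2 + Sigma_U_inv mu_U."""
    prec = V.T @ V / sigma2 + Sigma_U_inv      # posterior precision (Hessian of L)
    rhs = V.T @ X[i] / sigma2 + Sigma_U_inv @ mu_U
    cov = np.linalg.inv(prec)
    mean = cov @ rhs
    return rng.multivariate_normal(mean, cov)

def gibbs_sigma2(X, U, V, a0, b0, rng):
    """Sample sigma2 by drawing the precision 1/sigma2 from its conjugate
    Gamma conditional with shape a0 + N/2 and rate b0 + (sum of squared
    residuals)/2.  NumPy's rng.gamma takes shape and SCALE, hence 1/rate."""
    resid = X - U @ V.T
    a = a0 + resid.size / 2.0
    b = b0 + np.sum(resid ** 2) / 2.0
    return 1.0 / rng.gamma(a, 1.0 / b)
```

Updating a full row costs one K×K solve instead of K scalar updates, mixes faster, and is exactly the normal-equations computation from ridge regression with the prior acting as the regularizer.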
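For the logistic-PCA slide, here is a minimal sketch of the Bernoulli log-likelihood and its gradients, assuming Xij ~ Bernoulli(logistic(Ui ⋅ Vj)) with binary X; these are the dL/dθ quantities an MH or gradient-based fitter would use. The function names are illustrative:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_pca_nll_grad(X, U, V):
    """Negative log-likelihood and gradients for logistic PCA with binary X,
    X_ij ~ Bernoulli(sigmoid(U_i . V_j)).  As in logistic regression, the
    derivative of the nll with respect to Theta = U V' is simply (p - X)."""
    Theta = U @ V.T
    P = sigmoid(Theta)
    eps = 1e-12                      # guard against log(0)
    nll = -np.sum(X * np.log(P + eps) + (1 - X) * np.log(1 - P + eps))
    G = P - X                        # d(nll)/dTheta
    return nll, G @ V, G.T @ U       # nll, gradient w.r.t. U, gradient w.r.t. V
```

The same (p − X) structure is what makes the exponential-family generalization work: swapping the link and likelihood changes P and nll but leaves the gradient computation pattern intact.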