Machine Learning 10-701
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
March 31, 2011

Today: Learning representations III
Readings:
- Deep Belief Networks
- ICA, CCA
- Neuroscience example
- Latent Dirichlet Allocation

Deep Belief Networks [Hinton & Salakhutdinov, Science 2006]
Problem: training networks with many hidden layers doesn't work very well
- local minima; very slow training if initialized with zero weights
Deep belief networks:
- autoencoder networks to learn low-dimensional encodings
- but more layers, to learn better encodings

Deep Belief Networks [Hinton & Salakhutdinov 2006]
[Figure: original images; reconstructions from a 2000-1000-500-30 DBN (logistic transformations); reconstructions from 2000-300 linear PCA (linear transformations)]

Encoding of digit images in two dimensions [Hinton & Salakhutdinov 2006]
[Figure: 784-2 linear encoding (PCA) versus 784-1000-500-250-2 DBN encoding]

Restricted Boltzmann Machine [Hinton & Salakhutdinov 2006]
- Bipartite graph, logistic activation
- Inference: fill in any subset of nodes, estimate the other nodes
- Consider the case where the v_i, h_j are boolean variables
[Figure: bipartite graph of hidden units h_1, h_2, h_3, ... and visible units v_1, v_2, ..., v_n]

Deep Belief Networks: Training
[Figure from Hinton & Salakhutdinov 2006: greedy layer-by-layer pretraining of a stack of RBMs, then unrolling into an autoencoder and fine-tuning with backpropagation; sketched in code below]
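The slides contain no code, but the RBM update is easy to make concrete. Below is a minimal NumPy sketch of an RBM with boolean units and logistic activations, trained with one step of contrastive divergence (CD-1), the approximate update Hinton & Salakhutdinov use to pre-train each layer. Class and variable names and the learning rate are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine with binary (boolean) units.

    Bipartite graph: visible units v connect only to hidden units h,
    with logistic (sigmoid) activation, as on the slide.
    """
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def p_h_given_v(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def p_v_given_h(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch v0 of shape (batch, n_visible)."""
        ph0 = self.p_h_given_v(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden
        pv1 = self.p_v_given_h(h0)                        # reconstruct visible
        ph1 = self.p_h_given_v(pv1)
        # approximate gradient of the log-likelihood
        self.W   += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
```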
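Given that RBM, the greedy layer-by-layer pretraining on the training slide reduces to a short loop: train an RBM on the data, feed its hidden-unit probabilities to the next RBM as input, and repeat. A sketch, with placeholder layer sizes, epoch counts, and batch size; the final unrolling and backpropagation fine-tuning is omitted.

```python
# Greedy pretraining of a stack such as 784-1000-500-250-30:
# each RBM's hidden probabilities become "data" for the next RBM.
def pretrain_stack(X, layer_sizes, epochs=10, batch=100):
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(data), batch):
                rbm.cd1_step(data[i:i + batch])
        rbms.append(rbm)
        data = rbm.p_h_given_v(data)  # input to the next layer
    return rbms

# e.g. rbms = pretrain_stack(images, [1000, 500, 250, 30])
```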
Independent Components Analysis (ICA)
- PCA seeks orthogonal directions Y_1 ... Y_M in feature space X that minimize reconstruction error.
- ICA seeks directions Y_1 ... Y_M that are most statistically independent, i.e., that minimize I(Y), the mutual information between the Y_j (equivalently, I(Y) = sum_j H(Y_j) - H(Y)). A small code sketch follows the next slide.
[Figure: 2-D scatter plots contrasting the directions found by PCA and ICA]

Dimensionality reduction across multiple datasets [slide courtesy of Indra Rustandi]
- Given data sets A and B, find linear projections of each into a common lower-dimensional space.
- Generalized SVD: minimize the squared reconstruction errors of both.
- Canonical correlation analysis: maximize the correlation of A and B in the projected space.
[Figure: data set A and data set B each projected into a learned shared representation]
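To make the PCA/ICA contrast concrete, here is a small scikit-learn sketch (not part of the lecture): two independent non-Gaussian sources are mixed linearly, then recovered. PCA only finds uncorrelated orthogonal directions minimizing reconstruction error; FastICA searches for maximally independent ones. The mixing matrix and source distribution are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent, non-Gaussian sources mixed linearly.
S = rng.laplace(size=(2000, 2))           # independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])    # mixing matrix
X = S @ A.T                               # observed data

Y_pca = PCA(n_components=2).fit_transform(X)       # decorrelated components
Y_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

# PCA components are merely uncorrelated; ICA components should also be
# (approximately) statistically independent, recovering S up to
# permutation and scale.
```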
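Likewise, a minimal CCA sketch for the two-dataset setting above, again with scikit-learn and synthetic data standing in for data sets A and B: CCA finds paired linear projections whose correlation is maximal in the shared space.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Data sets A and B: different feature spaces, shared latent structure.
Z = rng.standard_normal((500, 2))                  # shared low-dim signal
A = Z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((500, 10))
B = Z @ rng.standard_normal((2, 20)) + 0.1 * rng.standard_normal((500, 20))

cca = CCA(n_components=2).fit(A, B)
A_c, B_c = cca.transform(A, B)   # projections into the common space

# Each pair of columns (A_c[:, k], B_c[:, k]) is maximally correlated.
for k in range(2):
    r = np.corrcoef(A_c[:, k], B_c[:, k])[0, 1]
    print(f"canonical correlation {k}: {r:.3f}")
```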
An Example Use of CCA
[Diagram: arbitrary word -> generative theory of word representation -> predicted brain activity]

[Figure: fMRI activation for "bottle"; mean activation averaged over 60 different stimuli; "bottle" minus mean activation; scale: high / average / below average]
Idea: predict neural activity from corpus statistics of the stimulus word [Mitchell et al., Science 2008]

Generative theory
[Figure: predicted activity for "telephone"]
- Statistical features from a trillion-word text corpus
- Mapping learned from fMRI data

Semantic feature values: "celery"
  0.8368  eat
  0.3461  taste
  0.3153  fill
  0.2430  see
  0.1145  clean
  0.0600  open
  0.0586  smell
  0.0286  touch
  0.0000  drive
  0.0000  wear
  0.0000  lift
  0.0000  break
  0.0000  ride

Semantic feature values: "airplane"
  0.8673  ride
  0.2891  see
  0.2851  say
  0.1689  near
  0.1228  open
  0.0883  hear
  0.0771  run
  0.0749  lift
  0.0049  smell
  0.0010  wear
  0.0000  taste
  0.0000  rub
  0.0000  manipulate

Predicted Activation is the Sum of Feature Contributions
- Predicted "celery" image = 0.84 x (contribution of "eat") + 0.35 x ("taste") + 0.32 x ("fill") + ...; that is, the predicted activation at voxel v for word w is sum_i c_{v,i} f_i(w).
- Feature values f_i("celery") come from corpus statistics; the per-voxel contribution of each feature (e.g., c_{14382,eat}) is learned from fMRI data.
- ~500,000 learned parameters. (A code sketch of this linear model appears at the end of these notes.)

[Figure: predicted and observed fMRI images for "celery" and "airplane" after training on 58 other words; scale: high / average / below average]

Evaluating the Computational Model
- Train it using 58 of the 60 word stimuli.
- Apply it to predict fMRI images for the other 2 words.
- Test: show it the observed images for the 2 held-out words and make it predict which is which (e.g., "celery" vs. "airplane").
- 1770 test pairs in leave-2-out (C(60,2) = 60*59/2 = 1770).
- Random guessing: 0.50 accuracy.
- Accuracy above 0.61 is significant at p < 0.05. (See the leave-2-out sketch at the end of these notes.)

Q4: What are the actual semantic primitives from which neural encodings are composed?
[Diagram: word -> 25 verb co-occurrence counts (verb co-occurrence features) -> predicted neural representation]

Alternative semantic feature sets:

  PREDEFINED corpus features                                Mean Acc.
    25 verb co-occurrences                                    .79
    486 verb co-occurrences                                   .79
    50,000 word co-occurrences                                .76
    300 Latent Semantic Analysis features                     .73
    50 corpus features from Collobert & Weston [ICML 2008]    .78
  218 features collected using Mechanical Turk                .83   (developed by Dean Pomerleau)
  20 features discovered from the data                        .87   (developed by Indra Rustandi)

Discovering a shared semantic basis [Rustandi et al. 2009]
[Diagram: word w -> 218 base features -> 20 learned latent features -> predicted representations for subj 1 ... subj 9 (word + picture stimuli) and subj 10 ... subj 20 (word-only stimuli). The learned intermediate semantic features are independent of study and subject; the final mappings are specific to study and subject. Trained using Canonical Correlation Analysis: multi-study (WP, WO), multi-subject (9 + 11) CCA.]

CCA Top Stimulus Words (most active stimuli per component):

  component 1 ("shelter"):              apartment, church, closet, house, barn
  component 2 ("manipulation"):         screwdriver, pliers, refrigerator, knife, hammer
  component 3:                          telephone, butterfly, bicycle, beetle, dog
  component 4 ("things that touch me"): pants, dress, glass, coat, chair

[Figures: multi-study (WP, WO), multi-subject (9 + 11) CCA Component 1, shown for Subject 1 under word + picture stimuli and for Subject 1 under word-only stimuli]
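As promised above, a sketch of the linear forward model: one regression per voxel, so the predicted image for a word is the feature-weighted sum of learned per-feature images. The shapes and random stand-in data are hypothetical; with 25 features and 20,000 voxels the coefficient matrix has the 500,000 parameters mentioned on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_feats, n_voxels = 60, 25, 20000
F = rng.random((n_words, n_feats))            # semantic feature values f_i(w)
Y = rng.standard_normal((n_words, n_voxels))  # stand-in for observed images

# Fit one linear regression per voxel: y_v(w) = sum_i c_{v,i} f_i(w).
# Solving all voxels at once gives C with n_feats x n_voxels entries
# (25 x 20,000 = 500,000 learned parameters).
C, *_ = np.linalg.lstsq(F, Y, rcond=None)

def predict(f_w):
    """Predicted image for a word = weighted sum of per-feature images."""
    return f_w @ C
```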
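And the leave-2-out evaluation protocol as code: train on 58 words, predict the 2 held-out images, and check whether the correct pairing scores higher than the swapped one. Cosine similarity is used here for concreteness; the paper's exact matching score and voxel selection differ in details. With 60 words there are C(60,2) = 1770 test pairs, so chance accuracy is 0.50.

```python
from itertools import combinations
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def leave_two_out_accuracy(F, Y):
    """Fraction of held-out pairs whose predicted images match the
    correct observed images better than the swapped assignment."""
    n = len(F)                           # 60 words -> 1770 pairs
    pairs = list(combinations(range(n), 2))
    correct = 0
    for i, j in pairs:
        train = [k for k in range(n) if k not in (i, j)]
        C, *_ = np.linalg.lstsq(F[train], Y[train], rcond=None)
        pi, pj = F[i] @ C, F[j] @ C      # predictions for held-out words
        right = cosine(pi, Y[i]) + cosine(pj, Y[j])
        wrong = cosine(pi, Y[j]) + cosine(pj, Y[i])
        correct += right > wrong
    return correct / len(pairs)
```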