MIT 9.520 - Learning Deep Generative Models

Learning Deep Generative Models
9.520 Class 19
Ruslan Salakhutdinov
BCS and CSAIL, MIT

Talk Outline
1. Introduction.
2. Autoencoders, Boltzmann Machines.
3. Deep Belief Networks (DBN's).
4. Learning Feature Hierarchies with DBN's.
5. Deep Boltzmann Machines (DBM's).
6. Extensions.

Long-term Goal
[Figure: a feature hierarchy running from raw pixel values through progressively higher-level representations up to the description "Tiger rests on the grass".]
• Learn progressively complex high-level representations.
• Use bottom-up + top-down cues.
• Build high-level representations from large unlabeled datasets.
• Labeled data is used only to slightly adjust the model for a specific task.

Challenges
• Deep models are composed of several layers of nonlinear modules.
• The associated loss functions are almost always non-convex.
• Many bad local optima make deep models very difficult to optimize.
• Idea: learn one layer at a time.

Key Requirements
• Online learning: training should scale to large datasets containing millions or billions of examples.
• Inferring a high-level representation should be fast: a fraction of a second.

Autoencoders
[Figure: an encoder maps the visible units v to a code layer h through weights W, and a decoder maps h back to a reconstruction of v.]
Consider having D binary visible units v and K binary hidden units h.
Idea: transform the data into a (low-dimensional) code and then reconstruct the data from the code.

Encoder: $h_j = \frac{1}{1 + \exp(-\sum_i v_i W_{ij})}$, for $j = 1, \ldots, K$.
Decoder: $\hat{v}_i = \frac{1}{1 + \exp(-\sum_j h_j W_{ij})}$, for $i = 1, \ldots, D$.

Minimize the reconstruction error:
$\min_W \; \mathrm{Loss}(v, \hat{v}, W) + \mathrm{Penalty}(h, W)$
Loss functions: cross-entropy or squared loss. Typically, one imposes $\ell_1$ regularization on the hidden units h and $\ell_2$ regularization on the parameters W (related to sparse coding).

Building Block: RBM's
Probabilistic analog: Restricted Boltzmann Machines. Visible stochastic binary units v are connected to hidden stochastic binary feature detectors h:
$P(v, h) = \frac{1}{Z(W)} \exp\Big( \sum_{ij} v_i h_j W_{ij} \Big)$
Related frameworks: Markov Random Fields, log-linear models, Boltzmann machines.
Here $Z(W)$ is known as the partition function:
$Z(W) = \sum_{h,v} \exp\Big( \sum_{ij} v_i h_j W_{ij} \Big)$

Inference with RBM's
The conditional distributions over hidden and visible units are given by logistic functions:
$p(h_j = 1 \mid v) = \frac{1}{1 + \exp(-\sum_i v_i W_{ij})}$
$p(v_i = 1 \mid h) = \frac{1}{1 + \exp(-\sum_j h_j W_{ij})}$
Key observation: given v, we can easily infer the distribution over the hidden units.

Learning with RBM's
$P_{\mathrm{model}}(v) = \sum_h P(v, h) = \frac{1}{Z} \sum_h \exp\Big( \sum_{ij} v_i h_j W_{ij} \Big)$
Maximum likelihood learning:
$\frac{\partial \log P(v)}{\partial W_{ij}} = E_{P_{\mathrm{data}}}[v_i h_j] - E_{P_{\mathrm{model}}}[v_i h_j],$
where $P_{\mathrm{data}}(h, v) = P(h \mid v)\, P_{\mathrm{data}}(v)$, with $P_{\mathrm{data}}(v)$ representing the empirical distribution. However, computing $E_{P_{\mathrm{model}}}$ is difficult due to the presence of the partition function $Z$.

Contrastive Divergence
[Figure: a Gibbs chain initialized at the data and run toward equilibrium, with $\langle v_i h_j \rangle$ measured at the data, at the one-step reconstruction, and at the fantasy (infinite-step) sample.]
Maximum likelihood learning:
$\Delta W_{ij} = E_{P_{\mathrm{data}}}[v_i h_j] - E_{P_{\mathrm{model}}}[v_i h_j]$
Contrastive divergence learning:
$\Delta W_{ij} = E_{P_{\mathrm{data}}}[v_i h_j] - E_{P_T}[v_i h_j]$
$P_T$ represents the distribution defined by running a Gibbs chain, initialized at the data, for T full steps.
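To make the CD update concrete, here is a minimal NumPy sketch of CD-1 (T = 1) for a binary RBM. It is a sketch under assumptions rather than code from the slides: the function names, learning rate, batch shapes, and the use of mean-field probabilities in the negative phase are all illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v_data, rng, lr=0.1):
    """One CD-1 step for a binary RBM with P(v, h) proportional to
    exp(sum_ij v_i h_j W_ij).

    W      : (D, K) weight matrix
    v_data : (n, D) batch of binary visible vectors
    """
    n = v_data.shape[0]
    # Positive phase: p(h = 1 | v) at the data, then sample binary h.
    ph_data = sigmoid(v_data @ W)                         # (n, K)
    h_sample = (rng.random(ph_data.shape) < ph_data) * 1.0
    # Negative phase: one full Gibbs step gives the reconstruction.
    pv_recon = sigmoid(h_sample @ W.T)                    # (n, D)
    ph_recon = sigmoid(pv_recon @ W)                      # (n, K)
    # Delta W = E_data[v h] - E_{P_1}[v h], estimated on this batch.
    pos = v_data.T @ ph_data
    neg = pv_recon.T @ ph_recon
    return W + lr * (pos - neg) / n

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 500))   # e.g., an MNIST-sized RBM
v = (rng.random((64, 784)) < 0.5) * 1.0      # stand-in binary batch
W = cd1_update(W, v, rng)
```

Running the chain for more steps (CD-T) trades computation for a better estimate of the negative phase; as noted above, the exact $E_{P_{\mathrm{model}}}$ term is intractable because of $Z$.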
Learning with RBM's
[Figure: learned weights W, i.e., filters trained on MNIST digits and on NORB 3D objects.]
[Figure: input images, the extracted features, and their logistic reconstructions.]

Modeling Documents
Restricted Boltzmann Machines as 2-layer modules:
• Visible units are multinomials over word counts.
• Hidden units are topic detectors.

Extracted Latent Topics
[Figure: 20 Newsgroups documents embedded in a 2-D topic space, with clusters for comp.graphics, rec.sport.hockey, sci.cryptography, soc.religion.christian, talk.politics.guns, and talk.politics.mideast.]

Collaborative Filtering
A form of matrix factorization:
• Visible units are multinomials over rating values.
• Hidden units are user-preference detectors.
Used in the Netflix competition.

Deep Belief Networks (DBN's)
• There are limitations on the types of structure that can be represented efficiently by a single layer of hidden variables.
• We would like to efficiently learn multi-layer models using a large supply of high-dimensional, highly-structured, unlabeled sensory input.

Learning DBN's
[Figure: a stack of RBMs trained one on top of another.]
Greedy, layer-by-layer learning (see the code sketch at the end of this document):
• Learn and freeze $W^1$.
• Sample $h^1 \sim P(h^1 \mid v; W^1)$ and treat $h^1$ as if it were data.
• Learn and freeze $W^2$.
• ...
This procedure learns high-level representations. Under certain conditions, adding an extra layer always improves a lower bound on the log probability of the data. Each layer of features captures high-order correlations between the activities of units in the layer below.
[Figure: learned 1st-layer and 2nd-layer features.]

Density Estimation
[Figure: samples from a DBN next to samples from a mixture of Bernoullis.]
MoB test log-probability: -137.64 per digit. DBN test log-probability: -85.97 per digit. A difference of over 50 nats is quite large.

Learning Deep Autoencoders
Pretraining consists of learning a stack of RBMs; each RBM has only one layer of feature detectors. The learned feature activations of one RBM are used as the data for training the next RBM in the stack.
After pretraining multiple layers, the model is unrolled to create a deep autoencoder. Initially the encoder and decoder networks use the same weights, and the global fine-tuning then uses backpropagation through the whole autoencoder to tune the weights for optimal reconstruction.
[Figure: the three stages: pretraining a stack of RBMs with weights $W_1$-$W_4$ over layers of size 2000, 1000, and 500 down to a 30-unit code; unrolling the stack into an encoder-decoder whose decoder uses the transposed weights $W^T$; and fine-tuning with small weight adjustments $W + \epsilon$.]
We used a 25 × 25-2000-1000-500-30 autoencoder to extract 30-D real-valued codes for Olivetti face patches (a network with 7 hidden layers is usually hard to train).
[Figure: top, random samples from the test dataset; middle, reconstructions by the 30-dimensional deep autoencoder; bottom, reconstructions by 30-dimensional PCA.]

Dimensionality Reduction
[Figure: 2-D codes for Reuters newswire stories, with clusters labeled Legal/Judicial, Leading Economic Indicators, European Community Monetary/Economic, Accounts/Earnings, Interbank Markets, Government Borrowings, Disasters and Accidents, and Energy Markets.]
The Reuters Corpus: 804,414 newswire stories, using a simple "bag-of-words" representation and a 2000-500-250-125-2 autoencoder.

Document Retrieval
Precision-recall curves when a 10-D query document
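The greedy, layer-by-layer procedure from the Learning DBN's section above is short enough to state in code. This is a minimal sketch continuing the earlier CD-1 example (it reuses the illustrative sigmoid, cd1_update, v, and rng defined there); the layer sizes, epoch count, and the use of mean-field activations rather than binary samples between layers are assumptions, not prescriptions from the slides.

```python
def pretrain_dbn(v_data, layer_sizes, rng, epochs=10):
    """Greedy layer-wise pretraining: learn and freeze W^1, infer h^1 and
    treat it as data, learn and freeze W^2, and so on up the stack."""
    weights = []
    data = v_data
    for n_hidden in layer_sizes:
        W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
        for _ in range(epochs):
            W = cd1_update(W, data, rng)   # train this RBM ...
        weights.append(W)                  # ... then freeze it
        data = sigmoid(data @ W)           # h from P(h | v); data for next RBM
    return weights

# e.g., a 784-1000-500-30 stack for MNIST-sized inputs
dbn_weights = pretrain_dbn(v, [1000, 500, 30], rng)
```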

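Following the Learning Deep Autoencoders section, the pretrained stack can then be unrolled so the decoder starts from the transposes of the encoder weights. The sketch below shows only the forward pass; leaving the code layer linear to produce real-valued 30-D codes is an assumption consistent with the slides' description, and the global fine-tuning by backpropagation is omitted.

```python
def autoencode(v_in, weights):
    """Forward pass of the unrolled deep autoencoder: encode with W^1..W^L,
    decode with the transposes W^L.T .. W^1.T (initially tied weights)."""
    x = v_in
    for W in weights[:-1]:                 # lower encoder layers: logistic
        x = sigmoid(x @ W)
    code = x @ weights[-1]                 # linear top layer: real-valued code
    x = sigmoid(code @ weights[-1].T)      # decoder mirrors the encoder
    for W in reversed(weights[:-1]):
        x = sigmoid(x @ W.T)
    return code, x

code, v_recon = autoencode(v, dbn_weights)
print("mean squared reconstruction error:", ((v - v_recon) ** 2).mean())
```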
