CMU 11722 Grammar Formalism - Diffusion Models + Variational Inference


10-423/10-623 Generative AI
Machine Learning Department, School of Computer Science, Carnegie Mellon University

Diffusion Models + Variational Inference
Matt Gormley, Lecture 8, Feb 9, 2024

Reminders: Homework 2 (Generative Models of Images) is out Thu, Feb 8 and due Mon, Feb 19 at 11:59pm.

U-NET

Semantic Segmentation: given an image, predict a label for every pixel in the image. This is not merely a classification problem, because there are strong correlations between pixel-specific labels. (Figure from https://openaccess.thecvf.com/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf)

Instance Segmentation: predict per-pixel labels as in semantic segmentation, but differentiate between different instances of the same label. For example, if there are two people in the image, one should be labeled "person 1" and the other "person 2". (Figure from https://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)

U-Net: a contracting path block consists of 3x3 convolution, 3x3 convolution, ReLU, and max pooling with a stride of 2 (downsample); the block is repeated N times, doubling the number of channels. An expanding path block consists of 2x2 convolution (upsampling), concatenation with the contracting path features, 3x3 convolution, 3x3 convolution, ReLU; the block is repeated N times, halving the number of channels.

U-Net was originally designed for applications to biomedical segmentation. The key observation is that the output layer has the same dimensions as the input image (possibly with a different number of channels).

UNSUPERVISED LEARNING

Unsupervised Learning. Assumptions: (1) our data comes from some distribution q(x_0); (2) we choose a distribution p_θ(x_0) for which sampling x_0 ~ p_θ(x_0) is tractable. Goal: learn θ s.t. p_θ(x_0) ≈ q(x_0).

Example: autoregressive LMs. The true q(x_0) is the human process that produced text on the web. We choose p_θ(x_0)
to be an autoregressive language model. The autoregressive structure means that p_θ(x_t | x_1, ..., x_{t-1}) is Categorical, and ancestral sampling is exact and efficient. We learn by finding θ = argmax_θ log p_θ(x_0) using gradient-based updates on ∇_θ log p_θ(x_0).

Example: GANs. The true q(x_0) is the distribution over photos taken and posted to Flickr. We choose p_θ(x_0) to be an expressive model (e.g., noise fed into an inverted CNN) that can generate images. Sampling is typically easy: z ~ N(0, I) and x_0 = f_θ(z). Can we learn by finding argmax_θ log p_θ(x_0)? No, because we can't even compute log p_θ(x_0) or its gradient. Why not? Because the integral

  p_θ(x_0) = ∫_z p_θ(x_0 | z) p(z) dz

is intractable, even for a simple 1-hidden-layer neural network with a nonlinear activation. So we optimize a minimax loss instead.

Example: Diffusion Models. The true q(x_0) is the distribution over photos taken and posted to Flickr. We choose p_θ(x_0) to be an expressive model (e.g., noise fed into an inverted CNN) that can generate images. Sampling is (will be) easy. Can we learn by finding argmax_θ log p_θ(x_0)? Sort of. We can't compute the gradient ∇_θ log p_θ(x_0), so we instead optimize a variational lower bound (more on that later). (Figure from Ho et al., 2020)

Latent Variable Models: for GANs, we assume that there are unknown (latent) variables which give rise to our observations. The noise vector z contains those latent variables. After learning a GAN, we can interpolate between images in latent z-space. (Figure from Radford et al., 2016)

DIFFUSION MODELS

Next we will consider (1) diffusion models and (2) variational autoencoders (VAEs). Although VAEs came first, we're going to dive into diffusion models, since they will receive more of our attention. The standard presentation of diffusion models requires an understanding of
variational inference (we'll do that next time); today we'll do an alternate presentation, without variational inference. The steps in defining these models are roughly: define a probability distribution involving Gaussian noise; use a variational lower bound as an objective function; and learn the parameters of the probability distribution by optimizing the objective function. So what is a variational lower bound?

Diffusion Model. The model is a chain x_T, x_{T-1}, ..., x_t, x_{t-1}, ..., x_1, x_0, with reverse-process factors p(x_T), p(x_{T-1} | x_T), ..., p(x_{t-1} | x_t), ..., and forward-process factors q(x_t | x_{t-1}), ..., q(x_1 | x_0), q(x_0).

Forward process (adds noise to the image):

  q(x_{1:T} | x_0) = ∏_{t=1}^T q(x_t | x_{t-1})

Learned reverse process (removes noise; the goal is to learn this):

  p(x_{0:T}) = p(x_T) ∏_{t=1}^T p(x_{t-1} | x_t)

Exact reverse process (if we could sample from this, we'd be done):

  q(x_{0:T}) = q(x_T) ∏_{t=1}^T q(x_{t-1} | x_t)

The exact reverse process requires inference. And even though q(x_t | x_{t-1}) is simple, computing q(x_{t-1} | x_t) is intractable. Why? Because q(x_0) might be not so simple. (Figure from Ho et al., 2020)

Diffusion Model Analogy

Denoising Diffusion Probabilistic Model (DDPM) …
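The U-Net dimension bookkeeping described above can be checked with a small sketch. This is a minimal stand-in, not a real U-Net: there are no learned convolutions (a mean over the concatenated channels stands in for the 3x3 convolutions), the input size and N = 2 are invented for illustration, and nearest-neighbor upsampling stands in for the 2x2 up-convolution. It only demonstrates that after N downsamples and N upsamples with skip connections, the output spatial dimensions match the input.

```python
import numpy as np

def max_pool2x2(x):
    # Downsample by 2 via max pooling with stride 2, as in the contracting path.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    # Nearest-neighbor upsampling by 2, standing in for the 2x2 up-convolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(64.0).reshape(8, 8)   # stand-in for one input channel
skips = []
h = x
for _ in range(2):                  # N = 2 contracting blocks
    skips.append(h)                 # save features for the skip connection
    h = max_pool2x2(h)              # 8x8 -> 4x4 -> 2x2

for _ in range(2):                  # N = 2 expanding blocks
    h = upsample2x2(h)              # 2x2 -> 4x4 -> 8x8
    skip = skips.pop()
    h = np.stack([h, skip])         # concatenation with contracting-path features
    h = h.mean(axis=0)              # stand-in for the 3x3 convolutions

print(h.shape)                      # output spatial dims match the input
```

This is exactly the "key observation" above: the output has the same spatial dimensions as the input, which is what makes per-pixel prediction possible.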

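The autoregressive-LM example above can be made concrete. Below is a minimal sketch in which p(x_t | x_{t-1}) is an explicit Categorical given by a toy bigram table (the 3-symbol vocabulary and all probabilities are invented for illustration). It shows the two properties claimed above: ancestral sampling is exact and efficient, and log p(x) is computable term by term from the autoregressive factorization.

```python
import numpy as np

rng = np.random.default_rng(42)

p_x1 = np.array([0.5, 0.3, 0.2])        # p(x_1), hypothetical
p_next = np.array([[0.1, 0.6, 0.3],     # p(x_t | x_{t-1} = 0)
                   [0.4, 0.2, 0.4],     # p(x_t | x_{t-1} = 1)
                   [0.3, 0.3, 0.4]])    # p(x_t | x_{t-1} = 2)

def ancestral_sample(T):
    # Draw x_1 ~ p(x_1), then x_t ~ p(x_t | x_{t-1}) for t = 2..T, in order.
    x = [rng.choice(3, p=p_x1)]
    for _ in range(T - 1):
        x.append(rng.choice(3, p=p_next[x[-1]]))
    return x

def log_prob(x):
    # Exact log p(x): tractable because p factorizes autoregressively.
    lp = np.log(p_x1[x[0]])
    for prev, cur in zip(x, x[1:]):
        lp += np.log(p_next[prev, cur])
    return lp

seq = ancestral_sample(5)
print(seq, log_prob(seq))
```

Because log p_θ(x_0) is exactly computable here, gradient-based maximum likelihood is straightforward; this is precisely what fails for the GAN example, where p_θ(x_0) is an intractable integral over z.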

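The forward process above can also be sketched numerically. This assumes the standard DDPM Gaussian transition q(x_t | x_{t-1}) = N(sqrt(1 - β_t) x_{t-1}, β_t I), under which q(x_t | x_0) has the closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ᾱ_t = ∏_{s≤t} (1 - β_s); the linear β schedule and T = 1000 are illustrative choices, not prescribed by the text above.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s <= t} alpha_s

def q_sample(x0, t, rng):
    # Sample x_t ~ q(x_t | x_0) in closed form:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))     # stand-in for an image
x_small_t = q_sample(x0, 10, rng)    # early step: mostly signal
x_big_t = q_sample(x0, T - 1, rng)   # final step: nearly pure noise

# By t = T, alpha_bar_T is close to 0, so x_T is approximately N(0, I);
# that is why sampling the learned reverse process can start from pure noise.
print(alpha_bars[-1])
```

Note that sampling forward is this easy precisely because each q(x_t | x_{t-1}) is a simple Gaussian; the hard part, as stated above, is the exact reverse conditional q(x_{t-1} | x_t), which is what the learned p(x_{t-1} | x_t) approximates.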