95-891 Introduction to Artificial Intelligence
Session 8: Image Synthesis and Applications of Computer Vision
David Steier (dmsteier@andrew.cmu.edu)
September 19, 2024

Agenda
- Quiz
- Contrastive learning
- Image synthesis using DALL-E 2 and other models
- Case study: Computer vision for parasite detection

Contrastive Learning
Learn the general features of a data set without labels by training a network to recognize which points are similar.
Three steps:
1. Augment the data to create two similar points (for images: cropping, resizing, recoloring, etc.)
2. Create vector representations for each data point in the augmented set
3. Maximize the similarity of the augmented data points by minimizing a contrastive loss function
E. Tiu, "Understanding Contrastive Learning," January 7, 2021, https://towardsdatascience.com/understanding-contrastive-learning-d5b19fd96607

Remember Triplets in Facial Recognition
- Use a CNN to compute face embeddings as 128-element vectors
- Need to do this millions of times for millions of images
- Build on pretrained networks such as OpenFace, https://github.com/cmusatyalab/openface
A. Geitgey, "Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning," July 24, 2016, https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78

Self-Supervised Learning Without Labels
- Use available images to generate labels based on similarity
- Represent images as vectors
- Quantify similarity of images
Chaudhary, A., "The Illustrated SimCLR2 Framework," March 2020, https://amitness.com/2020/03/illustrated-simclr/

SimCLR Framework for Contrastive Learning
Chaudhary, A., "The Illustrated SimCLR2 Framework," March 2020, https://amitness.com/2020/03/illustrated-simclr/
Based on https://arxiv.org/abs/2002.05709
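
The three contrastive learning steps listed earlier (augment, embed, minimize a contrastive loss) can be made concrete with a small sketch of the final step. The code below assumes the normalized temperature-scaled cross-entropy (NT-Xent) loss described in the SimCLR paper. The temperature of 0.5, the convention that consecutive entries form a positive pair, and the toy 2-D embeddings are all illustrative assumptions, not values from the slides. It uses only the Python standard library and is not a definitive implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(embeddings, temperature=0.5):
    """Average contrastive loss over a batch of 2N embeddings.

    Assumption: embeddings[2k] and embeddings[2k + 1] are the two
    augmented views of the same original image (a positive pair).
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        # Softmax denominator: similarity to every other point in the batch.
        denom = sum(
            math.exp(cosine_similarity(embeddings[i], embeddings[k]) / temperature)
            for k in range(n)
            if k != i
        )
        pos = math.exp(cosine_similarity(embeddings[i], embeddings[j]) / temperature)
        total += -math.log(pos / denom)
    return total / n

# Made-up embeddings for two images with two views each.
# "aligned": the two views of each image point in nearly the same direction.
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# "shuffled": the same vectors, but paired so that positives are dissimilar.
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_aligned = nt_xent_loss(aligned)
loss_shuffled = nt_xent_loss(shuffled)
print("aligned loss is lower:", loss_aligned < loss_shuffled)
```

In this sketch the loss is small when each view is closest to its own partner and far from the other images in the batch. That is the behavior an encoder is trained toward when the loss is averaged over all pairs.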

Getting Similar Images
- Augment (color, resize, rotate) data sets so that augmented images are known to be similar
- Use cosine similarity on vector representations
Chaudhary, A., "The Illustrated SimCLR2 Framework," March 2020, https://amitness.com/2020/03/illustrated-simclr/

Encoding the Images
Chaudhary, A., "The Illustrated SimCLR2 Framework," March 2020, https://amitness.com/2020/03/illustrated-simclr/

Minimizing the Loss Function
- The goal is to represent the images so as to maximize the probability that the augmented second image is the one most similar to the first image
- The loss function is smallest when the augmented image is most similar to the original image and most dissimilar from the remaining images
- Minimize the overall loss as the average across all pairs
Chaudhary, A., "The Illustrated SimCLR2 Framework," March 2020, https://amitness.com/2020/03/illustrated-simclr/

Google's SimCLRv2 for Contrastive Learning
E. Tiu, "Understanding Contrastive Learning," January 7, 2021, https://towardsdatascience.com/understanding-contrastive-learning-d5b19fd96607
Also see https://amitness.com/2020/03/illustrated-simclr/

SimCLRv2 Performance
- Self-supervised, it performs equivalently to AlexNet
- When supervised learning is used to fine-tune a SimCLRv2 model, it outperforms the supervised learner AlexNet, which has 100x as many labeled examples
https://arxiv.org/abs/2002.05709

Contrastive Language-Image Pre-Training (CLIP)
- Encodes images and natural language descriptions to create image embeddings
A. Radford, "Learning Transferable Visual Models From Natural Language Supervision," 26 Feb 2021, https://arxiv.org/abs/2103.00020

How CLIP Learns Text-Image Pairings
- Standard object recognition jointly trains an image feature extractor and a linear classifier to predict some label
- During pre-training, given a batch of $N$ (image, text) pairs, CLIP predicts which of the $N \times N$ possible (image, text) pairings actually occurred
- CLIP jointly trains an image encoder and a text encoder to maximize the cosine similarity of the image and text embeddings of the $N$ real pairs in the batch while minimizing the cosine similarity of the embeddings of the $N^2 - N$ incorrect pairings
- At test time, the text encoder creates a zero-shot linear classifier by embedding the natural language descriptions of the target classes
- Pre-trained on 400M image-text pairs, CLIP can perform OCR, activity recognition from videos, and geo-localization, and outperforms single-task classifiers on supervised ImageNet recognition based on ResNet-50
A. Radford, "Learning Transferable Visual Models From Natural Language Supervision," 26 Feb 2021, https://arxiv.org/abs/2103.00020

DALL-E 2.0
https://openai.com/dall-e-2
- Combines pre-training on CLIP's 400M image-text pairs (curated to exclude harmful content) with image synthesis to produce variations
- Prompt: "Vibrant portrait of Salvador Dali with a robotic half face"
- DALL-E 2.0 was released in summer 2022 to a million users, but source code and models have not been released
A. Ramesh, "Hierarchical Text-Conditional Image Generation with CLIP Latents," 13 April 2022, https://arxiv.org/abs/2204.06125

DALL-E 2.0 Combines CLIP and unCLIP
A. Ramesh, "Hierarchical Text-Conditional Image Generation with CLIP Latents," 13 April 2022, https://arxiv.org/abs/2204.06125

Diffusion Models
J. Ho, "Denoising Diffusion Probabilistic Models," 16 Dec 2020, https://arxiv.org/pdf/2006.11239.pdf

DALL-E 2.0 Captures Semantics and Style
Two components:
- A prior $P(z_i \mid y)$ that produces CLIP image embeddings $z_i$ conditioned on captions $y$
- A decoder $P(x \mid z_i, y)$ that produces images $x$ conditioned on CLIP image embeddings $z_i$ (and optionally text captions $y$)
Stacking the two components yields the generative model $P(x \mid y) = P(x \mid z_i, y)\,P(z_i \mid y)$
A. Ramesh, "Hierarchical Text-Conditional Image Generation with CLIP Latents," 13 April 2022, https://arxiv.org/abs/2204.06125

DALL-E 3.0 Adds More Detail
https://openai.com/dall-e-3

Re-captioning to Train DALL-E 3.0
Betker, K., et al., "Improving Image Generation with Better Captions," https://cdn.openai.com/papers/dall-e-3.pdf

StableDiffusion
- Open source
- Built
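
Returning to the slide on how CLIP learns text-image pairings: the $N \times N$ pairing objective and the zero-shot classifier can be sketched in a few lines. This is a minimal illustration using only the Python standard library. The function names, the fixed temperature of 0.07, and the toy 2-D embeddings are assumptions made for the example and do not come from the slides or from any CLIP codebase. A real system would learn the embeddings with image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs):
    """Cosine similarities: row i is image i, column j is text j."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        log_sum = math.log(sum(math.exp(x) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over the N x N similarity matrix."""
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    # Images pick their text (rows) and texts pick their image (columns).
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

def zero_shot_classify(image_emb, class_text_embs):
    """Index of the class description whose embedding is closest to the image."""
    sims = similarity_matrix([image_emb], class_text_embs)[0]
    return max(range(len(sims)), key=lambda j: sims[j])

# Made-up embeddings: image k and text k are meant to describe the same thing.
images = [[1.0, 0.1], [0.1, 1.0]]
texts = [[0.9, 0.2], [0.2, 0.9]]

matched_loss = clip_style_loss(images, texts)
mismatched_loss = clip_style_loss(images, texts[::-1])  # captions swapped
predicted_class = zero_shot_classify([0.8, 0.3], texts)
print("matched pairs give lower loss:", matched_loss < mismatched_loss)
print("predicted class index:", predicted_class)
```

The diagonal of the similarity matrix holds the $N$ real pairs and the off-diagonal entries hold the $N^2 - N$ incorrect pairings, so minimizing this loss raises the former and lowers the latter. Zero-shot classification then reuses the same similarity computation with one text embedding per target class.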