Lecture 16: Bag-of-words models
CS6670: Computer Vision
Noah Snavely

Announcements
• Project 3: Eigenfaces
  – due Wednesday, November 11 at 11:59pm
  – solo project
• Final project presentations:
  – during the final exam period

Recap: Skin detection results [figure]

Recap: Eigenfaces
PCA extracts the eigenvectors of A
• Gives a set of vectors v1, v2, v3, ...
• Each one of these vectors is a direction in face space
  – what do these look like?

Recap: Viola-Jones Face Detector: Results [slides: K. Grauman, B. Leibe]

Moving forward
• Faces are pretty well-behaved
  – mostly the same basic shape
  – lie close to a low-dimensional subspace
• Not all objects are as nice
  – different appearance, similar parts

Bag of Words Models
Adapted from slides by Rob Fergus and Svetlana Lazebnik

Object → Bag of "words" [figure: an image represented as an unordered collection of local patches]

Origin 1: Texture recognition
• Example textures (from Wikipedia)
• Texture is characterized by the repetition of basic elements, or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters
• Each texture can therefore be summarized as a histogram over a universal texton dictionary
[Julesz 1981; Cula & Dana 2001; Leung & Malik 2001; Mori, Belongie & Malik 2001; Schmid 2001; Varma & Zisserman 2002, 2003; Lazebnik, Schmid & Ponce 2003]

Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary [Salton & McGill 1983]
• Example: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/

Bags of features for object recognition
• Works pretty well for image-level classification (e.g., face, flowers, building) and for recognizing object instances
• [figure: recognition accuracy on the Caltech6 dataset – bag of features vs. a parts-and-shape model]
[Csurka et al. 2004; Willamowski et al. 2005; Grauman & Darrell 2005; Sivic et al. 2003, 2005]

Bag of features
• First, take a bunch of images, extract features, and build up a "dictionary" or "visual vocabulary" – a list of common features
• Given a new image, extract features and build a histogram – for each feature, find the closest visual word in the dictionary

Bag of features: outline
1. Extract features
2. Learn "visual vocabulary"
3. Quantize features using the visual vocabulary
4. Represent images by frequencies of "visual words"

1. Feature extraction
• Regular grid
  – Vogel & Schiele, 2003
  – Fei-Fei & Perona, 2005
• Interest point detector
  – Csurka et al., 2004
  – Fei-Fei & Perona, 2005
  – Sivic et al., 2005
• Other methods
  – Random sampling (Vidal-Naquet & Ullman, 2002)
  – Segmentation-based patches (Barnard et al., 2003)
• A typical pipeline: detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03], normalize each patch, then compute a SIFT descriptor [Lowe '99] (slide credit: Josef Sivic)
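To make step 1 concrete, here is a minimal sketch of both sampling strategies using OpenCV's SIFT implementation (cv2.SIFT_create, available in opencv-python 4.4+). The function name and the grid_step parameter are illustrative choices, not from the lecture:

```python
# Step 1 sketch: extract SIFT descriptors, either at detected interest points
# or on a regular grid. Assumes opencv-python >= 4.4 (cv2.SIFT_create).
import cv2

def extract_descriptors(image_path, grid_step=None):
    """Return an (N, 128) array of SIFT descriptors for one image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    if grid_step is None:
        # Interest point detector (cf. Csurka et al., Sivic et al.)
        _, desc = sift.detectAndCompute(img, None)
    else:
        # Regular grid / dense sampling (cf. Vogel & Schiele, Fei-Fei & Perona)
        h, w = img.shape
        keypoints = [cv2.KeyPoint(float(x), float(y), float(grid_step))
                     for y in range(0, h, grid_step)
                     for x in range(0, w, grid_step)]
        _, desc = sift.compute(img, keypoints)
    return desc  # one 128-dimensional descriptor per patch
```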
2. Learning the visual vocabulary
• Cluster the descriptors extracted from the training images; the cluster structure defines the visual vocabulary (slide credit: Josef Sivic)

K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

  D(X, M) = \sum_{k=1}^{K} \sum_{x_i \in \text{cluster } k} \| x_i - m_k \|^2

• Algorithm:
  – Randomly initialize K cluster centers
  – Iterate until convergence:
    • Assign each data point to the nearest center
    • Recompute each cluster center as the mean of all points assigned to it

From clustering to vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
  – Unsupervised learning process
  – Each cluster center produced by k-means becomes a codevector
  – The codebook can be learned on a separate training set
  – Provided the training set is sufficiently representative, the codebook will be "universal"
• The codebook is used for quantizing features
  – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook
  – Codebook = visual vocabulary
  – Codevector = visual word

Example visual vocabulary [Fei-Fei et al. 2005]
Image patch examples of visual words [Sivic et al. 2005]

Visual vocabularies: issues
• How to choose vocabulary size?
  – Too small: visual words are not representative of all patches
  – Too large: quantization artifacts, overfitting
• Generative or discriminative learning?
• Computational efficiency
  – Vocabulary trees (Nister & Stewenius, 2006)

3. Image representation
• [figure: each image becomes a histogram – codewords on the x-axis, frequency on the y-axis]

Image classification
• Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Uses of the BoW representation
• Treat it as a feature vector for a standard classifier
  – e.g., k-nearest neighbors, support vector machine
• Cluster BoW vectors over an image collection
  – Discover visual themes

K-nearest neighbors (source: D. Lowe)
• For a new point, find the k closest points from the training data
• Labels of the k points "vote" to classify
• Works well provided there is lots of data and the distance function is good
• [figure: classification example with k = 5]

Linear classifiers
• Find a linear function (hyperplane) to separate positive and negative examples:

  positive: x_i \cdot w + b \geq 0
  negative: x_i \cdot w + b < 0

• Which hyperplane is best?

Code sketches for steps 2–4 and for both classifiers follow below.
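First, step 2: the k-means algorithm exactly as stated above, written in NumPy. The fixed iteration count is a simplification of "iterate until convergence", and the names (kmeans, n_iters) are illustrative; in practice a library implementation such as scikit-learn's KMeans would be the usual choice:

```python
# Step 2 sketch: learn the visual vocabulary by k-means clustering.
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """Cluster descriptors X (N, d) into K visual words; returns (K, d) centers."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers (here: K distinct data points)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(np.float64)
    for _ in range(n_iters):  # fixed count stands in for a convergence test
        # Assign each data point to the nearest center; the (N, K, d)
        # intermediate is fine for a sketch, not for millions of descriptors
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of the points assigned to it
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers  # the codebook: one codevector (visual word) per row
```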
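Steps 3 and 4 in one function: a vector quantizer that maps each descriptor to the index of its nearest codevector, followed by a histogram of visual-word frequencies. Again an illustrative sketch, reusing the hypothetical helpers above:

```python
# Steps 3-4 sketch: quantize descriptors against the codebook and build the
# bag-of-words histogram that represents the whole image.
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Map (N, d) descriptors to a K-bin visual-word frequency histogram."""
    # Vector quantization: nearest codevector index for each descriptor
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    # Count word frequencies per bin of the vocabulary
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / hist.sum()

# Usage (hypothetical): image_vec = bow_histogram(extract_descriptors("img.jpg"), vocab)
```

Normalizing by the total count keeps histograms comparable across images with different numbers of detected features.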
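A sketch of the k-nearest-neighbor rule from the slide, with k = 5 as in the example figure; knn_classify and its argument names are my own:

```python
# Classifier sketch 1: k-nearest neighbors on bag-of-words histograms.
import numpy as np
from collections import Counter

def knn_classify(query, train_X, train_y, k=5):
    """Classify one BoW histogram by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training histogram
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]          # the label with the most votes
```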
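The notes break off just as support vector machines are introduced as the answer to "which hyperplane is best?". As an assumption about where the lecture was heading, not its actual content, here is a linear SVM on toy stand-in data, using scikit-learn's LinearSVC:

```python
# Classifier sketch 2: a linear classifier (here a linear SVM). The toy data
# below stands in for real BoW histograms and labels.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
train_X = rng.random((100, 50))            # 100 fake histograms, 50-word vocab
train_y = rng.integers(0, 2, size=100)     # fake binary labels

clf = LinearSVC()                          # learns w, b; predicts sign(x.w + b)
clf.fit(train_X, train_y)
print(clf.predict(train_X[:5]))            # classify five "images"
```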