Using Multiple Segmentations to Discover Objects and their Extent in Image Collections
Carolina Galleguillos

Introduction
Goal: given a collection of unlabelled images, discover visual object categories and their segmentations automatically.
Approach:
1. Produce multiple segmentations of each image.
2. Discover clusters of similar segments.
3. Score all segments by how well they fit their object cluster.

Background
The task of discovering object and scene categories has been studied by [Fei-Fei & Perona, 2005], [Quelhas et al., 2005] and [Sivic et al., 2005], borrowing tools from the statistical text analysis community (pLSA and LDA) that use a bag-of-words approach.
Mapping onto the visual domain:
§ Images are treated as documents.
§ Affine-invariant point descriptors are clustered into "visual words".
§ Each image is represented by a histogram of visual words.
Issue: visual words are not always as descriptive as text words; they are closer to visual phonemes or visual letters.

Background: Bag-of-words Approaches
Represent an image as a histogram of "visual words":
• Detect affine covariant regions.
• Represent each region by a SIFT descriptor.
• Build a visual vocabulary by k-means clustering (K ≈ 1,000).
• Assign each region to the nearest cluster centre.

Visual word shortcomings
Visual polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual synonyms: two different visual words representing a similar part of an object (e.g. the wheel of a motorbike).
If the object and its background are highly correlated, modelling the entire image can actually help recognition.

(Figure: input images and several candidate segmentations of each; example categories are cars and buildings.)
Intuition #1: all segmentations are wrong, but some segments are good.
Intuition #2: all good segments are alike; each bad segment is bad in its own way.
Multiple segmentations are therefore used to produce groups of visual words.

The Algorithm
Given a large collection of unlabeled images:
1. For each image, compute multiple candidate segmentations using Normalized Cuts.
2. For each segment, compute a histogram of visual words.
3. Perform topic discovery, treating each segment as a document, using LDA over all segments in the collection.
4. For each topic, sort segments by KL divergence.

Multiple segmentations
We use Normalized Cuts, varying two parameter settings: the number of segments and the image scale.

Discovering Objects
Representing segments: find visual words in each segment and form a histogram per segment.
Finding coherent segment clusters (topics): discover topics ("objects") from the histograms.
Notation: w are visual words, d are documents (images), z are topics ("objects"); P(w|d), P(z|d) and P(w|z) are multinomial distributions.
Candidate techniques from statistical text analysis include Latent Semantic Analysis (LSA), probabilistic LSA [Hofmann, 1999] and Latent Dirichlet Allocation (LDA) [Blei et al., 2003]. Here we chose LDA.

Latent Dirichlet Allocation [Blei et al., 2003]
A generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.
Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.
The slides annotate the components of the LDA graphical model:
• α: the Dirichlet prior.
• θ: the multinomial distribution of topics (the topic mixture).
• z: a topic.
• w: a word.
• β: the matrix of word probabilities, with entries β_ij = P(w^j = 1 | z^i = 1).
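The generative process behind this model can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the vocabulary size, topic count and hyperparameter values below are made up, and β is sampled at random rather than learned from data.

```python
# Toy sketch of LDA's generative process (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)

n_topics = 3      # z: number of topics ("objects")
vocab_size = 10   # w: number of visual words
alpha = 0.5       # Dirichlet prior on per-document topic mixtures

# beta[k] is the multinomial P(w | z = k), one row per topic.
# Here it is random; in practice it is inferred from the corpus.
beta = rng.dirichlet(np.ones(vocab_size), size=n_topics)

def generate_document(n_words):
    """Sample one document: theta ~ Dir(alpha), then for each word
    draw a topic z ~ Mult(theta) and a word w ~ Mult(beta[z])."""
    theta = rng.dirichlet(alpha * np.ones(n_topics))
    topics = rng.choice(n_topics, size=n_words, p=theta)
    words = np.array([rng.choice(vocab_size, p=beta[z]) for z in topics])
    return theta, topics, words

theta, topics, words = generate_document(100)
```

In the visual setting of these slides, a "document" is a segment and a "word" is a visual word, so each discovered topic is a distribution over visual words that ideally corresponds to one object category.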
A further annotation marks the free variational parameters (γ, φ) used for approximate inference in LDA.

Segment scoring
Compare each segment's distribution over visual words against the learned topic distribution using KL divergence.
(Figure: histograms over visual words for the learned topic distribution and two candidate segments, with KL divergences of 1.89 and 2.90; the lower divergence indicates the better-fitting segment.)

Segmentations and their KL divergence
(Figure: example segmentations ranked by their KL divergence to the discovered topics.)

Results
Retrieval accuracy is reported as average precision on the MSRC dataset; segmentation accuracy as average overlap area.
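The scoring step can be sketched as follows. The vocabulary, topic distribution and segment histograms are made-up toy values, and the direction chosen for the (asymmetric) KL divergence is one plausible reading of the slides, not necessarily the paper's exact formulation.

```python
# Sketch of scoring segments against a learned topic distribution.
# All numbers are illustrative, not from the paper.
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions; eps avoids log(0)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Learned topic distribution over a toy 5-word vocabulary.
topic = np.array([0.4, 0.3, 0.1, 0.1, 0.1])

# Two candidate segments' visual-word histograms (raw counts).
good_segment = np.array([8, 6, 2, 2, 2])   # similar shape to the topic
bad_segment  = np.array([1, 1, 1, 1, 16])  # dominated by one word

scores = {name: kl_divergence(seg, topic)
          for name, seg in [("good", good_segment), ("bad", bad_segment)]}

# Lower divergence means a better fit to the object topic.
best = min(scores, key=scores.get)
```

Sorting all segments of a topic by this score is what lets the algorithm pick out, from the many candidate segmentations, the segments that actually cover the object.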
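To close the loop with the bag-of-words background slides, here is a minimal sketch of vocabulary building and histogram formation. Real systems use 128-D SIFT descriptors and K ≈ 1,000; the 2-D "descriptors" and K = 4 below are stand-ins, and `kmeans` and `to_histogram` are helpers written purely for illustration.

```python
# Toy sketch of building a visual vocabulary by k-means and
# representing a region set as a histogram of visual words.
import numpy as np

rng = np.random.default_rng(1)

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means: returns (centres, assignments)."""
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):          # skip empty clusters
                centres[j] = points[assign == j].mean(axis=0)
    return centres, assign

# Stand-in "descriptors": four well-separated 2-D blobs of 50 points.
descriptors = np.concatenate([rng.normal(loc, 0.1, size=(50, 2))
                              for loc in ([0, 0], [5, 0], [0, 5], [5, 5])])
vocab, _ = kmeans(descriptors, k=4)

def to_histogram(region_descriptors, vocab):
    """Assign each descriptor to its nearest centre and count occurrences."""
    dists = np.linalg.norm(region_descriptors[:, None, :] - vocab[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(vocab))

hist = to_histogram(descriptors, vocab)
```

In the slides' pipeline, `to_histogram` would be applied per segment (step 2 of the algorithm), and the resulting histograms would be the "documents" handed to LDA.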