UT CS 395T - Clustering appearance and shape by learning jigsaws - D481751

School: University of Texas at Austin

Course: CS 395T - Multicore Operating Systems Implementation

Pages: 8

Clustering appearance and shape by learning jigsaws

Anitha Kannan, John Winn, Carsten Rother
Microsoft Research Cambridge
[ankannan, jwinn, carrot]@microsoft.com

Abstract

Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales, such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes.

1 Introduction

Many computer vision tasks require the use of appearance and shape models to represent objects in the scene. The choices for appearance models range from histogram-based representations that throw away spatial information, to template-based representations that try to capture the entire spatial layout of the objects but cope poorly with articulation, deformation or variation in appearance. In the middle of this spectrum lie patch-based models that aim to find the right balance between the two extremes.

However, a central problem with existing patch-based models is that there is no way to choose the shape and size of a patch; typically a predefined set of patch sizes and shapes (often rectangles or circles) is used.
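
To make the fixed-shape approach concrete, here is a minimal sketch that extracts patch features using a small family of masks chosen by hand before any data is seen: squares and discs of two sizes each. The mask family (`PATCH_FAMILY`), the helper names (`square_mask`, `disc_mask`, `extract_patch`, `extract_all`) and the NumPy-based layout are all hypothetical illustrations and are not taken from any particular patch-based system. The only point it shows is that the set of shapes is fixed in advance, which is the choice the jigsaw model instead learns from data.

```python
import numpy as np


def square_mask(side):
    """Square patch: every pixel of a side x side window is included."""
    return np.ones((side, side), dtype=bool)


def disc_mask(radius):
    """Circular patch: pixels of a (2r+1) x (2r+1) window within `radius` of its centre."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2


# Hypothetical hand-chosen family of patch shapes, fixed before training.
PATCH_FAMILY = {
    "square5": square_mask(5),
    "square9": square_mask(9),
    "disc2": disc_mask(2),
    "disc4": disc_mask(4),
}


def extract_patch(image, top, left, mask):
    """Return the pixels of `image` covered by `mask` when the mask's bounding
    box has its top-left corner at (top, left), as a flat feature vector."""
    h, w = mask.shape
    window = image[top:top + h, left:left + w]
    if window.shape != mask.shape:
        raise ValueError("patch extends beyond the image border")
    return window[mask]  # boolean indexing returns the selected pixels in row-major order


def extract_all(image, top, left):
    """Extract one feature vector per shape in the fixed family."""
    return {name: extract_patch(image, top, left, m) for name, m in PATCH_FAMILY.items()}


if __name__ == "__main__":
    image = np.arange(100, dtype=float).reshape(10, 10)
    features = extract_all(image, 0, 0)
    print({name: vec.size for name, vec in features.items()})
    # {'square5': 25, 'square9': 81, 'disc2': 13, 'disc4': 49}
```

Whichever mask is used, an object part whose outline matches none of the predefined shapes is covered either only partially or together with pixels from outside the part.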

We believe that natural images can provide enough cues to allow patches of varying shape and size to be discovered, corresponding to the shape and size of the object parts present in the images. Indeed, we will show that the patches discovered by the jigsaw model can become strongly associated with semantic object parts.

With this motivation, we introduce a generative model for a set of images that learns to extract irregularly shaped and sized patches from a latent image; these patches are combined to generate each training image. We call this latent image a jigsaw, as it contains all the necessary ‘jigsaw pieces’ that can be used to generate the target image set. We present an inference algorithm for learning the jigsaw and for finding the jigsaw pieces that make up each image.

As our proposed jigsaw model is a generative model for an image, it can readily be used as a component in many computer vision applications for both image understanding and image synthesis. These include object recognition, detection, image segmentation and image classification, object synthesis, image de-noising, super-resolution, texture transfer between images and image in-painting. In fact, the jigsaw model is likely to be usable as a direct replacement for a fixed patch model in any existing patch-based system.

2 Related work

The closest work to ours is the epitome model of Jojic et al. [1]. This is a generative model for image patches, or alternatively a model for images if patches that share coordinates in the image are averaged together (although this averaging often leads to a blurry result). Epitomes are learned using a set of fixed-shape patches over a small range of sizes. In contrast, in the jigsaw model, the inference process chooses appropriately shaped and sized pieces from the training images when learning the jigsaw. The difference between these two models is illustrated in section 4.

Our work also closely relates to the seminal work of Freeman et al. [2], which proposed a general machinery for inferring underlying scenes from images, with goals such as optical flow estimation and super-resolution. They define a Markov random field over image patches and infer the hidden scene representation using belief propagation. Again, they use a set of fixed-size image patches, hoping to reach a reasonable trade-off between capturing sufficient statistics in each patch and disambiguating different kinds of features. Along these lines, Markov random fields with larger cliques have also been used to capture the statistics of natural images, such as the field of experts model proposed in [3], which represents the field potentials as non-linear functions of linear filters. Again, the underlying linear filters are applied to patches of a fixed size.

In the domain of image synthesis, the work of Freeman et al. [2] has inspired many patch-based synthesis algorithms, including super-resolution, texture transfer, image in-painting and photo synthesis. They can be viewed as a data-driven way of sampling from the Markov random field with high-order cliques given by the overlapping patches. The texture synthesis and transfer algorithm of Efros et al. [4] constructs a new image by greedily selecting overlapping patches so that the seam transition is not visible. Whilst this work does allow different patch shapes, it does not learn patch appearance, since it works from a supplied texture image. Recently, a similar approach has been proposed in [5] for synthesising a collage image from a given set of input images, although in this case a probabilistic model is defined and optimised.

Patch-based models are also widely applied in object recognition research [6, 7, 8, 9, 10]. These models use hand-selected patch shapes (typically rectangles), which can lead to poor results given that different object parts have different sizes and shapes. In fact, the use of fixed patches reduces accuracy when the object part has a different size and shape from the chosen patch; in this case, the patch model has to cope with the variability outside the object part. This effect is particularly evident when the part is at the edge of the object, as the model then has to try to capture the variability of the background. In addition, such models ignore the shape of the object part, which is frequently much more discriminative than appearance alone.

The paper is structured as follows: In section 3 we introduce the probabilistic model and describe a method for performing learning and inference in the model. In section 4 we show results for synthetic and real data and
