UT CS 395T - Clustering appearance and shape by learning jigsaws

Clustering appearance and shape by learning jigsaws

Anitha Kannan, John Winn, Carsten Rother
Microsoft Research Cambridge
[ankannan, jwinn, carrot]@microsoft.com

Abstract

Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes.

1 Introduction

Many computer vision tasks require the use of appearance and shape models to represent objects in the scene. The choices for appearance models range from histogram-based representations that throw away spatial information, to template-based representations that try to capture the entire spatial layout of the objects but cope poorly with articulation, deformation or variation in appearance. In the middle of this spectrum lie patch-based models that aim to find the right balance between the two extremes.

However, a central problem with existing patch-based models is that there is no way to choose the shape and size of a patch; typically a predefined set of patch sizes and shapes (often rectangles or circles) is used.
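
To make the two extremes of the appearance-model spectrum concrete, here is a toy sketch; the tiny integer images and the function names are illustrative assumptions, not part of the paper. A histogram discards where each intensity occurs, so two images that are spatial rearrangements of one another look identical to it, whereas a pixel-wise template comparison tells them apart.

```python
from collections import Counter

def histogram(image):
    """Histogram appearance model: count each intensity value, discarding its position."""
    return Counter(value for row in image for value in row)

def template_distance(a, b):
    """Template comparison: sum of squared pixel-wise differences (images must be the same size)."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

# Two toy 2x3 images; b holds the same pixel values as a, rearranged in space.
a = [[0, 0, 9],
     [0, 9, 9]]
b = [[9, 9, 0],
     [9, 0, 0]]

print(histogram(a) == histogram(b))  # True: the histogram cannot tell them apart
print(template_distance(a, b))       # 486: every pixel differs by 9, and 6 * 81 = 486
```

Patch-based models sit between these two: each patch keeps the spatial layout within its own support, which is why the choice of that support (its shape and size) matters.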

We believe that natural images can provide enough cues to allow the discovery of patches whose shape and size correspond to those of the object parts present in the images. Indeed, we will show that the patches discovered by the jigsaw model can become strongly associated with semantic object parts.

With this motivation, we introduce a generative model for a set of images that learns to extract irregularly shaped and sized patches from a latent image; these patches are combined to generate each training image. We call this latent image a jigsaw as it contains all the necessary ‘jigsaw pieces’ that can be used to generate the target image set. We present an inference algorithm for learning the jigsaw and for finding the jigsaw pieces that make up each image.

As our proposed jigsaw model is a generative model for an image, it can be readily used as a component in many computer vision applications for both image understanding and image synthesis. These include object recognition, detection, image segmentation and image classification, object synthesis, image de-noising, super resolution, texture transfer between images and image in-painting. In fact, the jigsaw model is likely to be usable as a direct replacement for a fixed patch model in any existing patch-based system.

2 Related work

The closest work to ours is the epitome model of Jojic et al. [1]. This is a generative model for image patches, or alternatively a model for images if patches that share coordinates in the image are averaged together (although this averaging often leads to a blurry result). Epitomes are learned using a set of fixed-shape patches over a small range of sizes. In contrast, in the jigsaw model, the inference process chooses appropriately shaped and sized pieces from the training images when learning the jigsaw. The difference between these two models is illustrated in section 4.

Our work also closely relates to the seminal work of Freeman et al. [2], who proposed general machinery for inferring underlying scenes from images, with goals such as optical flow estimation and super-resolution. They define a Markov random field over image patches and infer the hidden scene representation using belief propagation. Again, they use a set of fixed-size image patches, hoping to reach a reasonable trade-off between capturing sufficient statistics in each patch and disambiguating different kinds of features. Along these lines, Markov random fields with larger cliques have also been used to capture the statistics of natural images, such as the field of experts model proposed in [3], which represents the field potentials as non-linear functions of linear filters. Again, the underlying linear filters are applied to patches of a fixed size.

In the domain of image synthesis, the work of Freeman et al. [2] has inspired many patch-based synthesis algorithms, including super resolution, texture transfer, image in-painting and photo synthesis. They can be viewed as a data-driven way of sampling from the Markov random field with high-order cliques given by the overlapping patches. The texture synthesis and transfer algorithm of Efros et al. [4] constructs a new image by greedily selecting overlapping patches so that the seam transition is not visible. Whilst this work does allow different patch shapes, it does not learn patch appearance since it works from a supplied texture image. Recently a similar approach has been proposed in [5] for synthesising a collage image from a given set of input images, although in this case a probabilistic model is defined and optimised.

Patch-based models are also widely applied in object recognition research [6, 7, 8, 9, 10]. These models use hand-selected patch shapes (typically rectangles), which can lead to poor results given that different object parts have different sizes and shapes.
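
The cost of a hand-selected rectangular patch at an object boundary can be illustrated with a toy matching score. This is a minimal sketch, not the jigsaw model itself: the 3x3 arrays, the hypothetical part mask and the sum-of-squared-differences score are illustrative assumptions. A rectangular patch scores the same part differently depending on the background behind it, whereas a score restricted to the part's shape ignores the background.

```python
def ssd(window, template, mask=None):
    """Sum of squared differences between a window and a template.

    If a binary mask is given, only pixels where mask is 1 contribute,
    i.e. the comparison is restricted to the part's shape.
    """
    total = 0
    for r, (w_row, t_row) in enumerate(zip(window, template)):
        for c, (w, t) in enumerate(zip(w_row, t_row)):
            if mask is None or mask[r][c]:
                total += (w - t) ** 2
    return total

# Hypothetical irregular part at an object edge: 1 = part, 0 = background.
part_mask = [[1, 1, 0],
             [1, 1, 0],
             [1, 0, 0]]
# Template from an example where the part has intensity 5 and the background is 0.
template = [[5, 5, 0],
            [5, 5, 0],
            [5, 0, 0]]
# The same part seen in front of two different backgrounds (intensities 2 and 8).
window_a = [[5, 5, 2],
            [5, 5, 2],
            [5, 2, 2]]
window_b = [[5, 5, 8],
            [5, 5, 8],
            [5, 8, 8]]

for name, window in [("a", window_a), ("b", window_b)]:
    # Rectangular score first, then the shape-restricted score.
    print(name, ssd(window, template), ssd(window, template, part_mask))
# prints:
# a 16 0
# b 256 0
```

The rectangular score changes sixteen-fold with the background although the part itself is unchanged. A learned jigsaw piece plays a role loosely analogous to part_mask here; how such shapes are learned is the subject of section 3 and is not modelled in this sketch.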

In fact, the use of fixed patches reduces accuracy when the object part is of a different size and shape than the chosen patch; in this case, the patch model has to cope with the variability outside the object part. This effect is particularly evident when the part is at the edge of the object, as the model then has to try to capture the variability of the background. In addition, such models ignore the shape of the object part, which is frequently much more discriminative than appearance alone.

The paper is structured as follows. In section 3 we introduce the probabilistic model and describe a method for performing learning and inference in the model. In section 4 we show results for synthetic and real data and