
Learning Class-Specific Affinities for Image Labelling


Dhruv Batra¹   Rahul Sukthankar²,¹   Tsuhan Chen¹
www.ece.cmu.edu/~dbatra   [email protected]   [email protected]
¹Carnegie Mellon University   ²Intel Research Pittsburgh

Abstract

Spectral clustering and eigenvector-based methods have become increasingly popular in segmentation and recognition. Although the choice of the pairwise similarity metric (or affinities) greatly influences the quality of the results, this choice is typically specified outside the learning framework. In this paper, we present an algorithm to learn class-specific similarity functions. Mapping our problem into a Conditional Random Field (CRF) framework enables us to pose the task of learning affinities as parameter learning in undirected graphical models. There are two significant advances over previous work. First, we learn the affinity between a pair of data-points as a function of a pairwise feature and (in contrast with previous approaches) the classes to which these two data-points were mapped, allowing us to work with a richer class of affinities. Second, our formulation provides a principled probabilistic interpretation for learning all of the parameters that define these affinities. Using ground-truth segmentations and labellings for training, we learn the parameters with the greatest discriminative power (in an MLE sense) on the training data. We demonstrate the power of this learning algorithm in the setting of joint segmentation and recognition of object classes. Specifically, even with very simple appearance features, the proposed method achieves state-of-the-art performance on standard datasets.

1. Introduction

Spectral clustering and eigenvector-based methods have become the focus of significant recent research in computer vision, particularly in clustering and image segmentation [2, 4, 18, 21, 26, 29]. An important benefit of these methods is that they offer good, computationally-efficient approximations to combinatorial problems.
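As an illustrative sketch of what such an approximation looks like (the function name and toy graph are assumptions for exposition, not part of this paper), the normalized-cut relaxation [26] replaces a combinatorial bipartition of a weighted graph with a generalized eigenproblem:

```python
import numpy as np

def ncut_bipartition(W):
    """Relax the Ncut bipartition: threshold the second-smallest
    generalized eigenvector of (D - W) x = lambda D x at zero."""
    d = W.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: D^{-1/2} (D - W) D^{-1/2}
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    fiedler = D_inv_sqrt @ vecs[:, 1]        # map back: x = D^{-1/2} y
    return fiedler >= 0                      # boolean partition labels

# Toy affinity matrix: two tightly-connected cliques, weakly linked.
W = np.full((6, 6), 0.01)
W[:3, :3] = W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = ncut_bipartition(W)                 # splits nodes {0,1,2} from {3,4,5}
```

The exact combinatorial cut is NP-hard; thresholding the relaxed eigenvector gives the efficient approximation referred to above.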
The typical approach can be summarized as follows. First, a weighted graph is constructed, where each node corresponds to either a data element (in clustering) or a pixel (in segmentation), and the undirected edge weight between two nodes is defined by a pairwise similarity metric between the nodes. For example, Felzenszwalb and Huttenlocher [6] employ the Euclidean (L2) distance between the colors of connected pixels as a measure of their dissimilarity.¹ On the other hand, in their work on normalized cuts (Ncut), Shi and Malik [26] define the edge affinity using an exponential kernel of the distance between two nodes in feature space. As has been observed by several authors [2, 4, 25], the quality of results achieved by such methods is strongly dependent on the choice of the affinity function. Thus, it is natural to seek principled ways of selecting these affinities.

Figure 1: The need for class-specific affinities [best viewed in colour]: the affinity between "blue" and "white" regions should be high for images in the top row (those colors occur together in street signs); the same affinities should be low for images in the bottom row to enable white buildings and birds to be segmented from blue sky.

For some problems, such as unsupervised, task-independent bottom-up segmentation, it may be impossible to propose an optimal affinity function for all possible images; indeed, the current trend in computer vision is to treat this type of segmentation simply as a pre-processing step that generates over-segmentations [5, 9]. However, in cases where segmentation is more closely tied to a specific task (such as object category recognition), could we exploit the availability of ground-truth segmentations by placing segmentation within a supervised learning framework? Meila and Shi [18] explore this problem in the random-walk interpretation of Ncuts, by minimizing the KL-divergence between the transition probabilities derived from the affinity matrix and the ground-truth segmentation. Bach and Jordan [2] define a cost function measuring the error between the Ncut eigenvector and the ground-truth partition. They use a differentiable approximation of the eigenvector to derive the affinity matrix that is optimal under this cost function. Cour et al. [4] derive an analytic form for the derivative of the Ncut eigenvector and select the affinity matrix that minimizes the L2 distance between the Ncut eigenvector and the target partition. Shental et al. [25] reformulate the typical cuts criterion [3, 7] as inference in an undirected graphical model, and learn the affinities as the "best" (in an MLE sense) linear combination of features. This last work is the one most closely related to our proposed method.

¹Dissimilarity and similarity are both employed in related work. This paper consistently defines affinities as similarity.

Figure 2: Overview of our approach [best viewed in colour]: (a) an input image; (b) superpixels extracted from this image; (c) region graph G constructed over those superpixels; (d) optimal labelling of the image; (e) visualization of raw feature space F; (f) visual words extracted in this feature space; (g) the complete graph Gf over these visual words, along with weights on nodes and edges. Unlike previous work, we employ class-specific edge weights.

Fundamentally, all of these methods attempt to learn a mapping from the features derived at a pair of data points (or pixels) to the affinity that best mimics the segmentation provided in the training set. However, this is inherently an ill-posed task. Consider, for example, the images shown in Figure 1, where the ground-truth segmentation separates the foreground object (sign, bird or building) from the background. Suppose that our (weak) features are colour, and that we would like to learn the affinity between "blue" and "white". The images in the top row suggest that we should keep the affinity between "blue" and "white" high in order to penalize any cuts that separate the two. On the other hand, the images in the bottom row suggest that the affinity between "blue" and "white" should be low in order to encourage cuts that separate the two. We claim that these conflicting notions can both be simultaneously incorporated by learning class-specific affinities, i.e., affinities that are not just a function of the features measured at the data-points, but also of the classes to which these two data-points were mapped. In our framework, we no
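The contrast above can be made concrete with a small sketch. The function names, the multiplicative form, and the toy class-pair weights `mu` below are illustrative assumptions; the paper learns its affinity parameters inside a CRF, which this sketch does not reproduce.

```python
import numpy as np

def feature_affinity(f_i, f_j, sigma=1.0):
    """Ncut-style affinity: exponential kernel of feature distance [26]."""
    d2 = np.sum((np.asarray(f_i, float) - np.asarray(f_j, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def class_specific_affinity(f_i, f_j, c_i, c_j, mu, sigma=1.0):
    """Affinity that also depends on the hypothesized class labels:
    a learned weight mu[(c_i, c_j)] modulates the feature term."""
    return mu[(c_i, c_j)] * feature_affinity(f_i, f_j, sigma)

# Toy colour features for "blue" and "white", in [0, 1] RGB.
blue, white = [0.1, 0.2, 0.9], [1.0, 1.0, 1.0]

# Illustrative learned weights: high within a street sign (blue and white
# co-occur there), low between sky and building (a cut is desirable).
mu = {("sign", "sign"): 5.0,
      ("sky", "building"): 0.1, ("building", "sky"): 0.1}

a_sign = class_specific_affinity(blue, white, "sign", "sign", mu)
a_sky = class_specific_affinity(blue, white, "sky", "building", mu)
assert a_sign > a_sky  # same colour pair, opposite preferences
```

A purely feature-based affinity must assign the blue/white pair a single value; conditioning on the class pair resolves the conflict between the two rows of Figure 1.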

