Shape Matching and Object Recognition using Low Distortion Correspondences

Alexander C. Berg, Tamara L. Berg, Jitendra Malik
Department of Electrical Engineering and Computer Science
U.C. Berkeley
{aberg,millert,malik}@eecs.berkeley.edu

Abstract

We approach recognition in the framework of deformable shape matching, relying on a new algorithm for finding correspondences between feature points. This algorithm sets up correspondence as an integer quadratic programming problem, where the cost function has terms based on the similarity of corresponding geometric blur point descriptors as well as the geometric distortion between pairs of corresponding feature points. The algorithm handles outliers, and thus enables matching of exemplars to query images in the presence of occlusion and clutter. Given the correspondences, we estimate an aligning transform, typically a regularized thin plate spline, resulting in a dense correspondence between the two shapes. Object recognition is then handled in a nearest neighbor framework where the distance between exemplar and query is the matching cost between corresponding points. We show results on two datasets. One is the Caltech 101 dataset (Fei-Fei, Fergus and Perona), an extremely challenging dataset with large intraclass variation. Our approach yields a 48% correct classification rate, compared to Fei-Fei et al.'s 16%. We also show results for localizing frontal and profile faces that are comparable to special-purpose approaches tuned to faces.

1. Introduction

Our thesis is that recognizing object categories, be they fish or bicycles, is fundamentally a problem of deformable shape matching. Back in the 1970s, at least three different research groups working in different communities initiated such an approach: in computer vision, Fischler and Elschlager [10]; in statistical image analysis, Grenander ([12] and earlier); and in neural networks, von der Malsburg ([15] and earlier).
The core idea that related but not identical shapes can be deformed into alignment using simple coordinate transformations dates even further back, to D'Arcy Thompson in the 1910s with On Growth and Form [30].

The basic subroutine in deformable matching takes as input an image with an unknown object (shape) and compares it to a model in three steps: solving the correspondence problem between the model and the object; using the correspondences to estimate and perform an aligning transformation; and computing a similarity based on both the aligning transform and the residual difference after applying the aligning transformation. This subroutine can be used for object recognition by using stored exemplars for different object categories as models, possibly with multiple exemplars for different 2D aspects of a 3D object.

Practically speaking, the most difficult step is the correspondence problem: how do we algorithmically determine which points on two shapes correspond? The correspondence problem in this setting is more difficult than in the setting of binocular stereopsis, for a number of reasons:

1. Intra-category variation: the aligning transform between instances of a category is not a simple parameterized transform. It is reasonable to assume that the mapping is smooth, but it may be difficult to characterize by a small number of parameters, as in a rigid or affine transform.
2. Occlusion and clutter: while we may assume that the stored prototype shapes are present in a clean, isolated version, the shape that we have to recognize in an image is in the context of multiple other objects, possibly occluding each other.
3. 3D pose changes: since the stored exemplars represent multiple 2D views of a 3D object, we could have variation in image appearance which is purely pose-related, even when the 3D shapes are identical.

The principal contribution of this paper is a novel algorithm for solving the correspondence problem for shape matching.

We represent shape by a set of points sampled from contours on the shape. Typically 50-100 pixel locations sampled from the output of an edge detector are used; as we use more samples we get better approximations. Note that there is nothing special about these points: they are not required to be keypoints such as those found using a Harris/Förstner type of operator or scale-space extrema of a Laplacian of Gaussian operator, such as used by Lowe [18].

We exploit three kinds of constraints to solve the correspondence problem between shapes:

1. Corresponding points on the two shapes should have similar local descriptors. There are several choices here: SIFT [18], shape contexts [3], and geometric blur [4]. We use geometric blur.
2. Minimizing geometric distortion: if $i$ and $j$ are points on the model corresponding to $i'$ and $j'$ respectively, then the vector from $i$ to $j$, $\vec{r}_{ij}$, should be consistent with the vector from $i'$ to $j'$, $\vec{r}_{i'j'}$. If the transformation from one shape to another is a translation accompanied by pure scaling, then these vectors must be scalar multiples. If the transformation is a pure Euclidean motion, then the lengths must be preserved, and so on.
3. Smoothness of the transformation from one shape to the other. This enables us to interpolate the transformation to the entire shape, given just the knowledge of the correspondences for a subset of the sample points. We use regularized thin plate splines to characterize the transformations.

The similarity of point descriptors and the geometric distortion are encoded in a cost function defined over the space of correspondences. We purposely construct this to be an integer quadratic programming problem (cf. Maciel and Costeira [19]) and solve it using fast approximate techniques.

We address two object recognition problems: multiclass recognition and face detection. In the multiple object class recognition problem, given an image of an object we must identify the class of the object and find a correspondence with an exemplar. We use the Caltech 101 object class dataset, consisting of images from 101 classes of objects, from accordion to kangaroo to yin-yang, available at [1]. This dataset includes significant intra-class variation, a wide variety of classes, and clutter. On average we achieve 48% accuracy on object classification, with quite good localization on the correctly classified objects. This compares favorably with the state of the art of 16% from [8].

We also consider face detection for large faces, suitable for face recognition experiments. Here the task is to detect and localize a number of faces in an image. The face dataset we use is sampled from the very
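The cost function described above combines descriptor similarity with pairwise geometric distortion over a discrete assignment of model points to query points. The sketch below shows that cost structure under stated assumptions. P holds n×2 model points and Q holds m×2 query points, both as NumPy arrays. D is an n×m array of precomputed descriptor dissimilarities; geometric blur itself is not implemented. `pair_distortion` is a hypothetical measure that compares $\vec{r}_{ij}$ with $\vec{r}_{i'j'}$ relative to $\|\vec{r}_{ij}\|$, and it is not the paper's definition. The solver is iterated conditional modes (ICM), a simple local search used in place of the paper's fast approximate technique. Outlier handling is omitted, and several model points may map to the same query point.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero for coincident model points


def pair_distortion(P, Q, sigma):
    """Hypothetical distortion H[i, j]: change of the model vector P[j] - P[i]
    versus the query vector Q[sigma[j]] - Q[sigma[i]], relative to the model length."""
    R_model = P[None, :, :] - P[:, None, :]      # R_model[i, j] = P[j] - P[i]
    Qs = Q[sigma]
    R_query = Qs[None, :, :] - Qs[:, None, :]    # R_query[i, j] = Qs[j] - Qs[i]
    num = np.linalg.norm(R_query - R_model, axis=2)
    return num / (np.linalg.norm(R_model, axis=2) + EPS)


def total_cost(P, Q, D, sigma, w_d=1.0):
    """Descriptor term plus w_d times the distortion summed over unordered pairs."""
    unary = D[np.arange(len(sigma)), sigma].sum()
    return unary + w_d * np.triu(pair_distortion(P, Q, sigma), k=1).sum()


def icm_match(P, Q, D, w_d=1.0, max_iter=50):
    """Approximately minimize total_cost with iterated conditional modes (ICM).

    sigma[i] is the index of the query point assigned to model point i.
    """
    n = D.shape[0]
    sigma = D.argmin(axis=1)                     # start from the best descriptor match
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            r_model = P - P[i]                   # (n, 2): P[j] - P[i]
            scale = np.linalg.norm(r_model, axis=1) + EPS
            # r_query[a, j] = Q[sigma[j]] - Q[a] for every candidate query point a
            r_query = Q[sigma][None, :, :] - Q[:, None, :]
            H = np.linalg.norm(r_query - r_model[None, :, :], axis=2) / scale[None, :]
            H[:, i] = 0.0                        # a point is not paired with itself
            local = D[i] + w_d * H.sum(axis=1)   # every cost term that involves point i
            best = int(local.argmin())
            if local[best] < local[sigma[i]] - 1e-12:   # accept strict improvements only
                sigma[i] = best
                changed = True
        if not changed:
            break
    return sigma
```

Because the distortion is symmetric in the pair, each ICM update changes the total cost by exactly the change in `local`. Since only strict improvements are accepted, the cost never increases. The search can still stop in a local minimum.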
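The abstract describes a two-stage recognition step. First, fit a regularized thin plate spline to the correspondences. Then classify with a nearest neighbor rule whose distance is the exemplar-to-query matching cost. A minimal sketch follows, under these assumptions:

- Correspondences are already known and given by index order, so `ex_pts[i]` matches `q_pts[i]`.
- The kernel is written as $r^2 \log r^2$, with a smoothing weight `lam` added to its diagonal.
- The cost in `aligned_residual` (mean residual after warping) is a hypothetical stand-in, not the paper's matching cost.

All function names are illustrative.

```python
import numpy as np


def tps_kernel(r2):
    """Thin plate spline radial basis on squared distances: U = r^2 log r^2, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0, u, 0.0)


def fit_tps(src, dst, lam=0.0):
    """Fit a 2D thin plate spline taking src (n, 2) onto dst (n, 2).

    lam >= 0 is a smoothing weight added to the kernel diagonal; lam = 0 interpolates.
    Returns a function that warps arbitrary (k, 2) point arrays.
    """
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    affine_basis = np.hstack([np.ones((n, 1)), src])   # rows [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(d2) + lam * np.eye(n)
    A[:n, n:] = affine_basis
    A[n:, :n] = affine_basis.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(A, rhs)
    w, a = params[:n], params[n:]                      # kernel weights, affine part

    def warp(pts):
        d2p = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
        return a[0] + pts @ a[1:] + tps_kernel(d2p) @ w

    return warp


def aligned_residual(ex_pts, q_pts, lam=1.0):
    """Hypothetical matching cost: mean distance left after a regularized TPS warp,
    assuming ex_pts[i] already corresponds to q_pts[i]."""
    warp = fit_tps(ex_pts, q_pts, lam)
    return float(np.linalg.norm(warp(ex_pts) - q_pts, axis=1).mean())


def nn_classify(q_pts, exemplars, match=aligned_residual):
    """Nearest neighbor over labeled exemplars [(label, pts), ...] by matching cost."""
    costs = [match(pts, q_pts) for _, pts in exemplars]
    best = int(np.argmin(costs))
    return exemplars[best][0], costs[best]
```

With `lam > 0`, the fitted spline reproduces the targets exactly only when they are an affine image of the source points. This sketched cost therefore charges only for non-affine deformation. Whether that matches the paper's matching cost is not established by this section.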

