Local Feature View Clustering for 3D Object Recognition

David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., V6T 1Z4, Canada
[email protected]

Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii (December 2001)

Abstract

There have been important recent advances in object recognition through the matching of invariant local image features. However, the existing approaches are based on matching to individual training images. This paper presents a method for combining multiple images of a 3D object into a single model representation. This provides for recognition of 3D objects from any viewpoint, the generalization of models to non-rigid changes, and improved robustness through the combination of features acquired under a range of imaging conditions. The decision of whether to cluster a training image into an existing view representation or to treat it as a new view is based on the geometric accuracy of the match to previous model views. A new probabilistic model is developed to reduce the false positive matches that would otherwise arise due to loosened geometric constraints on matching 3D and non-rigid models. A system has been developed based on these approaches that is able to robustly recognize 3D objects in cluttered natural images in sub-second times.

1. Introduction

There has recently been considerable progress in developing real-world object recognition systems based on the use of invariant local features [12, 6]. The local features are of intermediate complexity, which means that they are distinctive enough to determine likely matches in a large database of features but are sufficiently local to be insensitive to clutter and occlusion. Such features can be densely sampled over the image, clustered with a Hough transform, and verified with model fitting, leading to efficient and robust recognition in complex real-world scenes.

The existing work in this area has been based upon taking single training images of objects to be recognized and storing their features in a database for future recognition. The local feature approach can be made invariant to image rotation, translation, and scaling, but can only tolerate moderate object rotation in depth (typically about 20 degrees in each direction from the training view). One approach to generalizing to full 3D recognition might be to simply store training images acquired around the view sphere and select the best match. However, this means that new views may have features matching any of several nearby training images without any ability to integrate the information. As importantly, robustness can be greatly improved by combining features from multiple images taken under differing conditions of illumination or object variation, so that each view model contains many more of the features likely to be seen in a new image.

This paper describes an approach to combining features from multiple views to provide for full 3D object recognition and better modeling of object and imaging variations. The feature combinations are performed by measuring the closeness of the geometric fit to previous views, and views that are similar are combined into view clusters. For nearby views that are not combined, matching features are linked across the views so that a match in one view is automatically propagated as a potential match in neighboring views. The result is that additional training images continue to contribute to the robustness of the system by modeling feature variation without leading to a continuous increase in the number of view models. The goal is to eventually use this approach for on-line learning in which object models are continuously updated and refined as recognition is performed.
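To make the clustering decision concrete, the sketch below shows one way it could be organized: the new training image's feature matches to each existing model view are fit with a 2D similarity transform by least squares, and the image is merged into the best-fitting view only if the residual error is small. This is an illustrative reading of the description above, not the paper's implementation; the function names, the choice of a similarity transform as the geometric fit, and the pixel threshold are all assumptions.

```python
# Illustrative sketch of the view-clustering decision (assumed structure,
# not the paper's code). Each existing model view has supplied a set of
# feature correspondences (model_xy -> image_xy) for the new training image.
import numpy as np

def fit_similarity(model_pts, image_pts):
    """Least-squares 2D similarity transform (scale, rotation, translation)
    mapping model_pts onto image_pts; returns (parameters, RMS residual)."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, -y, 1.0, 0.0]); b.append(u)
        A.append([y,  x, 0.0, 1.0]); b.append(v)
    A, b = np.asarray(A, dtype=float), np.asarray(b, dtype=float)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    rms = float(np.sqrt(np.mean((A @ params - b) ** 2)))
    return params, rms

def choose_view(matches_per_view, error_threshold=5.0):
    """matches_per_view: {view_id: (model_pts, image_pts)} for one training image.
    Returns the view to merge into, or None if the image should start a new view.
    The 5-pixel threshold is an assumed value, not taken from the paper."""
    best_view, best_err = None, float("inf")
    for view_id, (mpts, ipts) in matches_per_view.items():
        if len(mpts) < 3:          # too few matches for a trustworthy fit
            continue
        _, err = fit_similarity(mpts, ipts)
        if err < best_err:
            best_view, best_err = view_id, err
    return best_view if best_err < error_threshold else None
```

In this reading, when choose_view returns a view the new image's features would be added to that view cluster; when it returns None the image would be kept as a separate view, whose matching features can still be linked to neighboring views as described above.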
Another possible approach to the problem of 3D object recognition would be to solve for the explicit 3D structure of the object from matches between multiple views. This would have the advantage of leading to a more accurate fit between a rigid model and the image, leading to more accurate determination of pose and more reliable verification. However, the approach given in this paper has the advantage of not making rigidity assumptions, and therefore being able to model non-rigid object deformations. It also is able to perform recognition starting with just single training images, whereas a 3D model approach would likely require at least several images for an accurate 3D solution. It is likely that the ultimate performance would be achieved through a combination of these methods, but we show that view clustering is sufficient in many cases.

The view clustering approach allows for substantial variation in feature position during matching to account for 3D view change as well as non-rigid object variation. One consequence is that the final least-squares solution for model parameters is less effective at discarding false positive sets of feature matches than would be the case for a tightly constrained solution. Therefore, this paper develops a new probabilistic model for determining valid instances of recognition that has proved successful for these less-constrained models.

2. Related research

There is a long history of research in object recognition that has modeled 3D objects using multiple 2D views. This includes the use of aspect graphs [4], which represent topologically distinct views of image contours; eigenspace matching [8], which measures distance from a basis set of eigenvalue images; and histogram matching [11, 14], which summarizes image appearance with histograms of selected properties. The work in this paper follows most closely from [10], in which the appearance of a set of images was modeled as a probability distribution, which in turn was represented as a conjunction of simpler distributions of independent features. This paper uses a different type of feature that provides more specific matches to a model database, which allows for a simpler and much more efficient model representation.

Another approach has been to use linear interpolation between edge contours that have been matched between 3 views under an orthographic viewing assumption [17]. While this can produce more accurate geometric constraints for edge contours of rigid objects, it cannot handle non-rigid objects and does not incorporate the many features that do not match between all 3 views.

3. Feature detection and matching

To allow for efficient matching between models and images, all images are first represented as a set of SIFT (Scale Invariant Feature Transform) features,
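As a concrete illustration of this representation step, the sketch below extracts SIFT features and finds putative matches by nearest-neighbor descriptor distance with a ratio test. It assumes OpenCV's SIFT implementation and hypothetical file names; the paper uses its own feature detector and matching database, so this is only a rough stand-in.

```python
# Illustrative stand-in for the SIFT representation and matching step,
# using OpenCV rather than the detector described in the paper.
import cv2

def sift_features(path):
    """Detect SIFT keypoints and compute their descriptors for one image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    return sift.detectAndCompute(img, None)

def match_features(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with a ratio test: keep a match only when its
    best distance is clearly better than the second-best distance."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)
    return [m for m, n in candidates if m.distance < ratio * n.distance]

# Hypothetical file names for a training view and a new test image.
kp_train, desc_train = sift_features("training_view.png")
kp_test, desc_test = sift_features("test_image.png")
matches = match_features(desc_train, desc_test)
print(f"{len(matches)} putative feature matches")
```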

