
MIT 6.870 - Object Recognition and Scene Understanding


6.870 Object Recognition and Scene Understanding
Student presentation: Template matching and histograms
Nicolas Pinto

Introduction. Hosts: Antonio T... (who knows a lot about vision); a frog... (who has big eyes, and thus should know a lot about vision); a guy... (who has big arms).

Object Recognition from Local Scale-Invariant Features
David G. Lowe
Computer Science Department, University of British Columbia
Vancouver, B.C., V6T 1Z4, [email protected]
Proceedings of the International Conference on Computer Vision, Corfu (Sept. 1999)

Abstract

An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.

1. Introduction

Object recognition in cluttered real-world scenes requires local image features that are unaffected by nearby clutter or partial occlusion. The features must be at least partially invariant to illumination, 3D projective transforms, and common object variations. On the other hand, the features must also be sufficiently distinctive to identify specific objects among many alternatives.
The difficulty of the object recognition problem is due in large part to the lack of success in finding such image features. However, recent research on the use of dense local features (e.g., Schmid & Mohr [19]) has shown that efficient recognition can often be achieved by using local image descriptors sampled at a large number of repeatable locations.

This paper presents a new method for image feature generation called the Scale Invariant Feature Transform (SIFT). This approach transforms an image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine or 3D projection. Previous approaches to local feature generation lacked invariance to scale and were more sensitive to projective distortion and illumination change. The SIFT features share a number of properties in common with the responses of neurons in inferior temporal (IT) cortex in primate vision. This paper also describes improved approaches to indexing and model verification.

The scale-invariant features are efficiently identified by using a staged filtering approach. The first stage identifies key locations in scale space by looking for locations that are maxima or minima of a difference-of-Gaussian function. Each point is used to generate a feature vector that describes the local image region sampled relative to its scale-space coordinate frame. The features achieve partial invariance to local variations, such as affine or 3D projections, by blurring image gradient locations. This approach is based on a model of the behavior of complex cells in the cerebral cortex of mammalian vision. The resulting feature vectors are called SIFT keys.
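The first-stage filtering described above can be sketched in a few lines of NumPy/SciPy. This is a simplified illustration, not the paper's implementation: the scale levels and contrast threshold below are our own illustrative choices, and Lowe's pipeline additionally works on a resampled image pyramid.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.02):
    """Find scale-space extrema of a difference-of-Gaussian stack.

    `sigmas` and `thresh` are illustrative, not the paper's parameters.
    Returns (scale_index, y, x) triples of candidate key locations.
    """
    # Blur the image at a series of increasing scales.
    blurred = np.stack([gaussian_filter(image.astype(float), s) for s in sigmas])
    # Differences of adjacent scales approximate a Laplacian-of-Gaussian.
    dog = blurred[1:] - blurred[:-1]
    # A candidate key is a maximum or minimum of its 3x3x3 neighbourhood
    # in (scale, y, x), with magnitude above a small contrast threshold.
    maxima = (dog == maximum_filter(dog, size=3)) & (dog > thresh)
    minima = (dog == minimum_filter(dog, size=3)) & (dog < -thresh)
    scale_idx, ys, xs = np.nonzero(maxima | minima)
    return list(zip(scale_idx, ys, xs))
```

On a blob-like image, the detected extremum lands near the blob centre at a scale matched to the blob's size, which is what makes the resulting keys scale-covariant.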
In the current implementation, each image generates on the order of 1000 SIFT keys, a process that requires less than 1 second of computation time.

The SIFT keys derived from an image are used in a nearest-neighbour approach to indexing to identify candidate object models. Collections of keys that agree on a potential model pose are first identified through a Hough transform hash table, and then through a least-squares fit to a final estimate of model parameters. When at least 3 keys agree on the model parameters with low residual, there is strong evidence for the presence of the object. Since there may be dozens of SIFT keys in the image of a typical object, it is possible to have substantial levels of occlusion in the image and yet retain high levels of reliability.

The current object models are represented as 2D locations of SIFT keys that can undergo affine projection. Sufficient variation in feature location is allowed to recognize perspective projection of planar shapes at up to a 60 degree rotation away from the camera or to allow up to a 20 degree rotation of a 3D object.

Lowe (1999)

Histograms of Oriented Gradients for Human Detection
Navneet Dalal and Bill Triggs
INRIA Rhône-Alps, 655 avenue de l'Europe, Montbonnot 38334, France
{Navneet.Dalal,Bill.Triggs}@inrialpes.fr, http://lear.inrialpes.fr

Abstract

We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results.
The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

1 Introduction

Detecting humans in images is a challenging task owing to their variable appearance and the wide range of poses that they can adopt. The first need is a robust feature set that allows the human form to be discriminated cleanly, even in cluttered backgrounds under difficult illumination. We study the issue of feature sets for human detection, showing that locally normalized Histogram of Oriented Gradient (HOG) descriptors provide excellent performance relative to other existing feature sets including wavelets [17, 22].
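The per-cell orientation histograms at the heart of the HOG descriptor can be sketched in NumPy. This is a simplified sketch under our own assumptions (function name, cell size, and bin count are illustrative); the full pipeline in the paper additionally interpolates votes between bins and applies contrast normalization over overlapping blocks of cells.

```python
import numpy as np

def hog_cells(image, cell=8, bins=9):
    """Per-cell histograms of oriented gradients (simplified sketch).

    Each pixel votes into an unsigned-orientation bin, weighted by its
    gradient magnitude; votes are accumulated over cell x cell regions.
    """
    img = image.astype(float)
    # Fine-scale gradients via centered differences (rows = y, cols = x).
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, binned coarsely.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h, w = img.shape
    cy, cx = h // cell, w // cell
    hist = np.zeros((cy, cx, bins))
    for i in range(cy):
        for j in range(cx):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # Magnitude-weighted vote of each pixel into its orientation bin.
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist
```

On an image containing a single vertical edge, all of the gradient energy falls into the horizontal-orientation bin, which illustrates why the binned histogram is a compact summary of local edge structure.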

