UCSD CSE 291 - Robust Real-time Object Detection - D2663156




Robust Real-time Object Detection
by Paul Viola and Michael Jones
ICCV 2001 Workshop on Statistical and Computational Theories of Vision

Presentation by Gyozo Gidofalvi
Computer Science and Engineering Department, University of California, San Diego
gyozo@cs.ucsd.edu
October 25, 2001

Outline

- Object detection task
- Definition and rapid evaluation of simple features for object detection
- Method for classification and feature selection: a variant of AdaBoost
- Speed-up through the Attentional Cascade
- Experiments and results
- Conclusions

Object detection task

Object detection framework: given a set of images, find the regions in these images that contain instances of a certain kind of object.

Task: develop an algorithm that learns a fast and accurate method for object detection.

To capture ad hoc domain knowledge, classifiers for images do not operate on raw grayscale pixel values but on values obtained by applying simple filters to the pixels.

Definition of simple features for object detection

Three rectangular feature types:
- two-rectangle feature type (horizontal / vertical)
- three-rectangle feature type
- four-rectangle feature type

Using a 24x24 pixel base detection window, with all possible combinations of horizontal and vertical location and scale of these feature types, the full set contains 45,396 features.

The motivation for using rectangular features rather than more expressive steerable filters is their extreme computational efficiency.

Integral image

Def: The integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive:

\[ ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') \]

Using the following two recurrences, where i(x, y) is the pixel value of the original image at the given location and s(x, y) is the cumulative column sum, we can calculate the integral image representation of the image in a single pass:

\[ s(x, y) = s(x, y - 1) + i(x, y) \]
\[ ii(x, y) = ii(x - 1, y) + s(x, y) \]

with s(x, -1) = 0 and ii(-1, y) = 0.

Rapid evaluation of rectangular features

Using the integral image representation, one can compute the value of any rectangular sum in constant time. For example, the sum inside rectangle D can be computed as

\[ ii(4) + ii(1) - (ii(2) + ii(3)) \]

where 1, 2, 3 and 4 are the integral image locations at the top-left, top-right, bottom-left and bottom-right corners of D.

As a result, two-, three- and four-rectangle features can be computed with 6, 8 and 9 array references respectively.

Challenges for learning a classification function

Given a feature set and a labeled training set of images, one can apply any of a number of machine learning techniques. Recall, however, that there are 45,396 features associated with each image sub-window, so computing all of them is prohibitively expensive.

Hypothesis: a combination of only a small number of these features can yield an effective classifier.

Challenge: find these discriminant features.

A variant of AdaBoost for aggressive feature selection

Given example images (x_1, y_1), ..., (x_n, y_n), where y_i = 0, 1 for negative and positive examples respectively.

Initialize weights w_{1,i} = 1/(2m) for negative and 1/(2l) for positive training examples i, where m and l are the number of negatives and positives respectively.

For t = 1, ..., T:
1. Normalize the weights so that w_t is a distribution: \( w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j} \).
2. For each feature j, train a classifier h_j and evaluate its error \( \epsilon_j \) with respect to w_t.
3. Choose the classifier h_t with the lowest error \( \epsilon_t \).
4. Update the weights according to \( w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i} \), where e_i = 0 if x_i is classified correctly, e_i = 1 otherwise, and \( \beta_t = \epsilon_t / (1 - \epsilon_t) \).

The final strong classifier is

\[ h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } \alpha_t = \log \frac{1}{\beta_t} \]

Performance of the 200-feature face detector

The ROC curve of the constructed classifier indicates that a reasonable detection rate of 0.95 can be achieved while maintaining an extremely low false positive rate of approximately 10^{-4}.

The first features selected by AdaBoost are meaningful and have high discriminative power.

By varying the threshold of the final classifier, one can construct a two-feature classifier that has a detection rate of 1 and a false positive rate of 0.4.

Speed-up through the Attentional Cascade

Simple boosted classifiers can reject many of the negative sub-windows while detecting all positive instances.

A series of such simple classifiers can achieve good detection performance while eliminating the need for further processing of negative sub-windows.

Processing and training in the Attentional Cascade

Processing is essentially identical to the processing performed by a degenerate decision tree: only a positive result from a previous classifier triggers the evaluation of the subsequent classifier.

Training is also much like the training of a decision tree: subsequent classifiers are trained only on examples that pass through all the previous classifiers. Hence the task faced by classifiers further down the cascade is more difficult.

To achieve an efficient cascade for a given false positive rate F and detection rate D, we would like to minimize the expected number of features evaluated, N:

\[ N = n_0 + \sum_{i=1}^{K} \Bigl( n_i \prod_{j<i} p_j \Bigr) \]

where K is the number of classifiers, n_i is the number of features in the ith classifier and p_j is the positive rate of the jth classifier.

Since this optimization is extremely difficult, the usual framework is to choose a maximum acceptable false positive rate and a minimum acceptable detection rate per layer.

Algorithm for training a cascade of classifiers

- User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
- User selects the target overall false positive rate F_target.
- P = set of positive examples; N = set of negative examples.
- F_0 = 1.0; D_0 = 1.0; i = 0.
- While F_i > F_target:
    - i <- i + 1
    - n_i = 0; F_i = F_{i-1}
    - While F_i > f x F_{i-1}:
        o n_i <- n_i + 1
        o Use P and N to train a classifier with n_i features using AdaBoost.
        o Evaluate the current cascaded classifier on the validation set to determine F_i and D_i.
        o Decrease the threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d x D_{i-1} (this also affects F_i).
    - N <- empty set
    - If F_i > F_target, evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N.

Experiments: dataset for training

4916 positive training examples were hand picked, aligned, normalized, and scaled to a base resolution of 24x24.

10,000 negative examples were selected by randomly picking sub-windows from 9500 images that did not contain faces.

Experiments (cont.): structure of the detector cascade

The final detector had 32 layers and 4297 features in total.

Layer number   Number of features   Detection rate   Rejection rate
1              2                    100%             60%
2              5                    100%             80%
3 to 5         20
6 and 7        50
8 to 12        100
13 to 32       200

The speed of the detector depends on the total number of features evaluated. On the MIT+CMU test set, the average number of features evaluated is 8 out of 4297.

The processing time of a 384 by 288 pixel image on a conventional personal computer is about 0.067 seconds.

Processing time
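The integral image recurrences and the constant-time rectangle sum described in the presentation can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the list-of-rows image layout, and the sign convention of the two-rectangle feature (right half minus left half) are assumptions made for the example.

```python
def integral_image(img):
    """Return ii, where ii[y][x] is the sum of img over rows 0..y and columns 0..x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0  # cumulative column sum s(x, y), starting from s(x, -1) = 0
        for y in range(h):
            s += img[y][x]                       # s(x, y) = s(x, y-1) + i(x, y)
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1, y) = 0
            ii[y][x] = left + s                  # ii(x, y) = ii(x-1, y) + s(x, y)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x <= x1 and y0 <= y <= y1, using four lookups."""
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0
    # ii(4) + ii(1) - (ii(2) + ii(3)) for the corners of the rectangle
    return at(x1, y1) + at(x0 - 1, y0 - 1) - at(x1, y0 - 1) - at(x0 - 1, y1)


def two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: two adjacent w-by-h rectangles, right minus left
    (the sign convention is an assumption)."""
    left = rect_sum(ii, x, y, x + w - 1, y + h - 1)
    right = rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)
    return right - left


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

For clarity this sketch calls `rect_sum` twice per two-rectangle feature, which costs eight lookups; the six array references quoted in the slides come from reusing the two corners that the adjacent rectangles share.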
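The AdaBoost variant described in the presentation (normalize the weights, pick the single-feature classifier with the lowest weighted error, down-weight the correctly classified examples) can be sketched as follows. The slides do not specify the form of the weak classifier h_j, so the threshold-and-polarity stump used here is an assumption, as are the function names, the `features[j][i]` data layout, and the small clamp that keeps beta finite when a feature separates the data perfectly.

```python
import math


def best_stump(values, labels, weights):
    """Best (error, theta, p) for one feature, with h(x) = 1 if p*v < p*theta else 0.
    The stump form is an assumption; the slides only say 'train a classifier h_j'."""
    best = (float("inf"), 0.0, 1)
    for theta in sorted(set(values)):
        for p in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if (1 if p * v < p * theta else 0) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best


def adaboost(features, labels, T):
    """features[j][i] is the value of feature j on example i; labels are 0 or 1.
    Assumes both classes are present. Returns a list of (alpha, j, theta, p)."""
    m, l = labels.count(0), labels.count(1)
    w = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in labels]
    strong = []
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]                    # step 1: normalize
        candidates = []
        for j, vals in enumerate(features):             # step 2: one classifier per feature
            err, theta, p = best_stump(vals, labels, w)
            candidates.append((err, j, theta, p))
        err, j, theta, p = min(candidates)              # step 3: lowest error
        err = min(max(err, 1e-10), 1 - 1e-10)           # clamp (assumption) so beta stays finite
        beta = err / (1 - err)
        for i, y in enumerate(labels):                  # step 4: w <- w * beta^(1 - e_i)
            pred = 1 if p * features[j][i] < p * theta else 0
            if pred == y:                               # e_i = 0: multiply by beta
                w[i] *= beta
        strong.append((math.log(1 / beta), j, theta, p))
    return strong


def classify(strong, x):
    """Strong classifier: 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t), else 0."""
    score = sum(a for a, j, theta, p in strong if p * x[j] < p * theta)
    return 1 if score >= 0.5 * sum(a for a, *_ in strong) else 0
```

Because each weak classifier looks at exactly one feature, the T rounds double as feature selection: the indices j stored in `strong` are the selected features.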

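The cascade's degenerate-decision-tree processing and the expected number of features evaluated, N = n_0 + sum_i (n_i * prod_{j<i} p_j), can be illustrated with a short sketch. The function names and the representation of a stage as a Python callable are assumptions for the example, and the pass rates in the usage lines are illustrative values, not measurements from the presentation.

```python
def cascade_classify(stages, window):
    """Run a sub-window through the cascade. Each stage is a callable that
    returns True (pass to the next stage) or False (reject)."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: later stages are never evaluated
    return True           # accepted by every stage


def expected_features(n, p):
    """Expected number of features evaluated per sub-window.
    n[i] is the number of features in stage i and p[i] is the fraction of
    sub-windows that stage i passes on (its positive rate)."""
    total = 0.0
    reach = 1.0  # probability that a sub-window reaches the current stage
    for n_i, p_i in zip(n, p):
        total += n_i * reach
        reach *= p_i
    return total


# Illustrative numbers only: three stages with 2, 5 and 20 features that
# pass on 40%, 20% and 10% of the sub-windows they see.
print(expected_features([2, 5, 20], [0.4, 0.2, 0.1]))
```

Since most sub-windows are rejected by the first few small stages, the later, larger stages contribute little to the average, which is consistent with the slides' report of about 8 features evaluated on average out of 4297.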