Simple Real-Time Human Detection Using a Single Correlation Filter


David S. Bolme, Yui Man Lui, Bruce A. Draper, J. Ross Beveridge
Computer Science Department
Colorado State University
Fort Collins, CO 80521, USA
{bolme,lui,draper,ross}@cs.colostate.edu

Abstract

This paper presents an extremely simple human detection algorithm based on correlating edge magnitude images with a filter. The key is the technology used to train the filter: Average of Synthetic Exact Filters (ASEF). The ASEF-based detector can process images at over 25 frames per second and achieves a 94.5% detection rate with less than one false detection per frame for sparse crowds. Filter training is also fast, taking only 12 seconds to train the detector on 32 manually annotated images. Evaluation is performed on the PETS 2009 dataset, and results are compared to the OpenCV cascade classifier and a state-of-the-art deformable parts based person detector.

1. Introduction

One of the simplest ways to detect targets in images is to convolve an image with a filter or template that responds to the target. The output of the convolution should produce a large response where the target is present and a suppressed response over the background. Targets are then detected where the convolution output exceeds a threshold. The primary advantages of this approach are that it is extremely simple and very fast.

The success of filter-based object detection depends on the ability of the filter to distinguish between targets and background. A typical way to produce a filter is to crop a template of the target from a training image. Unfortunately, templates based on one image often do not capture appearance variation adequately and therefore perform well only in highly controlled object detection scenarios. To compensate, there are a number of techniques that produce filters from large numbers of templates and therefore represent the target's appearance more accurately. For example, a filter can be produced by averaging templates.
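The pipeline outlined in the abstract and above can be sketched in a few lines of Python with NumPy. It correlates an edge magnitude image with a filter, then keeps responses above a threshold. This is a minimal illustration under stated assumptions, not the authors' implementation. The central-difference edge operator, the Gaussian shape and width of the synthetic desired output, the regularization constant `eps`, and the 3x3 local-maximum test are our own choices. The function names are hypothetical, and all images are assumed to share one size.

Training follows the ASEF formulation of [3] as we understand it. Each training image f_i with desired output g_i yields an exact filter whose conjugate spectrum is H_i* = G_i / F_i, and the exact filters are averaged. The sketch computes correlation rather than convolution, which differs only by a flip of the filter.

```python
import numpy as np

def edge_magnitude(image):
    """Gradient magnitude from central differences (an assumed edge operator)."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    return np.hypot(gx, gy)

def desired_output(shape, centers, sigma=2.0):
    """Synthetic desired output: a Gaussian peak at each annotated target (assumed form)."""
    rows, cols = np.indices(shape)
    g = np.zeros(shape)
    for r, c in centers:
        g += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    return g

def train_asef(images, centers_per_image, sigma=2.0, eps=1e-3):
    """Average the per-image exact filters H_i* = G_i / F_i in the Fourier domain."""
    total = np.zeros(np.shape(images[0]), dtype=np.complex128)
    for img, centers in zip(images, centers_per_image):
        F = np.fft.fft2(img)
        G = np.fft.fft2(desired_output(np.shape(img), centers, sigma))
        # G * conj(F) / (|F|^2 + eps) equals G / F when eps == 0;
        # the assumed eps keeps near-zero frequencies from blowing up.
        total += G * np.conj(F) / (np.abs(F) ** 2 + eps)
    return total / len(images)

def detect(image, H_conj, threshold):
    """Correlate via the FFT, then keep 3x3 local maxima above threshold."""
    resp = np.real(np.fft.ifft2(np.fft.fft2(image) * H_conj))
    # Maximum over each pixel's circular 3x3 neighborhood (matches circular correlation).
    local_max = resp.copy()
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            local_max = np.maximum(local_max, np.roll(resp, (dr, dc), axis=(0, 1)))
    peaks = (resp >= local_max) & (resp > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(peaks)]
```

In use, `train_asef` would be called on edge magnitude images of the annotated training frames, for example `[edge_magnitude(f) for f in frames]`, with one list of person centers per frame. `detect` would then be run on the edge magnitude of each new frame. The threshold trades detection rate against false detections. Passing `np.conj(np.fft.fft2(template))` as `H_conj` instead reduces `detect` to plain single-template correlation.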

Unfortunately, a filter produced by averaging templates often fails to adequately discriminate between targets and background. More sophisticated methods based on Synthetic Discriminant Functions (SDFs) [13] can also be used to produce filters that respond well to the training templates and produce sharp and stable peaks. One problem with SDFs is that they do not consider the entire convolution output during training. Instead, they emphasize only one point in the output: the point where the filter is aligned with the target. These techniques emphasize good peaks for targets but have much less control when it comes to suppressing peaks for background objects with similar appearances.

Recently, a new concept for training filters called Average of Synthetic Exact Filters (ASEF) was introduced [3]. ASEF considers the entire output of the filter under a full convolution operation. By exploiting the Convolution Theorem, ASEF provides a mechanism by which the entire output for a full training image can be specified. Producing an ASEF filter is much more like deconvolution than prior techniques. In [3] it was shown that ASEF filters were much better at locating eyes on a face because the filters were much better at suppressing the response of other facial features. This study will show that ASEF filters are able to produce good target/background separation on a more general detection problem, namely the PETS 2009 dataset [8].

Detectors based on ASEF filters have many advantages. Training requires only a small number of hand-annotated images and a few seconds of computation time. The resulting detector is tuned specifically to the camera setup. Detection is much simpler than with competing techniques and is based on the highly regular convolution operation, which makes it ideally suited for embedded systems or existing signal processing chips. Filter-based detection is many times faster than competing techniques, while its accuracy is comparable or better.

The rest of this paper is organized as follows.
Section 2 discusses other person detection techniques and how they relate to the work presented here. Section 3 discusses the process of creating a filter-based detector and the method used to learn the ASEF filter. Section 4 compares the filter-based detector to a morphable parts based approach and a cascade-based classifier. Section 5 summarizes the findings.

2. Related Work

This paper compares the ASEF filter to two publicly available detectors. The first detector is based on the Viola and Jones cascade classifier. This classifier is interesting because it is a good object detection algorithm and is fast enough for real-time systems [14]. The original context of this work was face detection. Viola et al. also adapted this algorithm to the problem of people detection [15]. In that study, detection was based on both visual features and motion features computed between video frames. The detector was also fast enough for real-time detection, reporting a speed of 4 frames per second. In this paper, the OpenCV [16] implementation of the cascade detector was retrained on the PETS data with good results.

The second detector, a deformable parts model, is based on the work of Felzenszwalb et al. [7]. This detector adopts many ideas from [5], such as Histogram of Oriented Gradients (HOG) based features and a Support Vector Machine (SVM)-like classifier. The primary improvement of this method is that it uses deformable part models in addition to holistic matching to improve detection accuracy. While accurate, this detector is too slow for real-time detection and takes a few seconds to process each frame.

We also briefly investigated the person detector from [5]. This method is simpler than the parts-based model and carefully investigated HOG-based features as a basis for person detection. Its performance seemed to be similar to that of [7], but it was also slower.

In [9], the problem of accurate object detection in crowded scenarios is discussed.
Leibe et al. point out that many pedestrian detection techniques have been evaluated on isolated people and, as a result, those detectors often fail in crowded or complex real-world situations. They propose an iterative detection system that both detects and segments people in a crowded scene. They also suggest that partial occlusions in crowded scenes may be too difficult for detectors based on simple features or models. In this work, we have seen evidence to the contrary. The simple ASEF filter-based detector handled partial occlusion better than the more complex Part