
CALTECH EE 148A - Robust Real-Time Face Detection


International Journal of Computer Vision 57(2), 137–154, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Robust Real-Time Face Detection

PAUL VIOLA
Microsoft Research, One Microsoft Way, Redmond, WA 98052, [email protected]

MICHAEL J. JONES
Mitsubishi Electric Research Laboratory, 201 Broadway, Cambridge, MA 02139, [email protected]

Received September 10, 2001; Revised July 10, 2003; Accepted July 11, 2003

Abstract. This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

Keywords: face detection, boosting, human sensing

1. Introduction

This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid visual detection.
Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results (Sung and Poggio, 1998; Rowley et al., 1998; Osuna et al., 1997a; Schneiderman and Kanade, 2000; Roth et al., 2000). This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, have been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our face detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.

The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al. (1998) our detection system does not work directly with image intensities. Like these authors we use a set of features which are reminiscent of Haar Basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images (the integral image is very similar to the summed area table used in computer graphics (Crow, 1984) for texture mapping). The integral image can be computed from an image using a few operations per pixel.
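
As a rough illustration of the integral image idea, here is a minimal Python sketch, not the authors' implementation. It assumes a grey scale image stored as a list of rows, and it pads the table with an extra zero row and column to simplify the lookups, which is a convenience chosen here. The names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical, and the left-minus-right feature is only one plausible Haar-like layout.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Build a (h+1) x (w+1) table where ii[y][x] is the sum of img rows < y, cols < x."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            # one addition for the row sum, one to extend the column total
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Hypothetical two-rectangle feature: left half minus right half (even width assumed)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


# Toy 3 x 4 image, made up for illustration.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 3, 4)            # whole image
inner = rect_sum(ii, 1, 1, 2, 2)            # 2 x 2 block starting at row 1, col 1
feature = two_rect_feature(ii, 0, 0, 3, 4)  # left two columns minus right two columns
print(total, inner, feature)
```

In this sketch the cost of `rect_sum` does not depend on the rectangle's size, so the same table can serve features at any scale or position.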
Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a simple and efficient classifier that is built by selecting a small number of important features from a huge library of potential features using AdaBoost (Freund and Schapire, 1995). Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola (2000) feature selection is achieved using the AdaBoost learning algorithm by constraining each weak classifier to depend on only a single feature. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance (Schapire et al., 1998).

The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image a face might occur (Tsotsos et al., 1995; Itti et al., 1998; Amit and Geman, 1999; Fleuret and Geman, 2001). More complex processing is reserved only for these promising regions. The key measure of such an approach is the “false negative” rate of the attentional process.
It must be the case that all, or almost all, face instances are selected by the attentional filter.

We will describe a process for training an extremely simple and efficient classifier which can be used as a “supervised” focus of attention operator.¹ A face detection attentional operator can be learned which will filter out over 50% of the image while preserving 99% of the faces (as evaluated over a large dataset). This filter is exceedingly efficient; it can be evaluated in 20 simple operations per location/scale (approximately 60 microprocessor instructions).

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed. The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Fleuret and Geman (2001) and Amit and Geman (1999).

The complete face detection cascade has 38 classifiers, which total over 80,000 operations. Nevertheless the cascade structure results in extremely rapid average detection times. On a
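
The early-rejection behaviour of the cascade described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the trained detector. Each stage is assumed to be a weighted vote of single-feature threshold tests compared against a stage threshold. The tuple layout, the function names `stage_passes` and `cascade_accepts`, and every number below are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout of a weak classifier: (feature_index, threshold, polarity, weight).
# Assumed layout of a stage: (weak classifiers, stage threshold).
Weak = Tuple[int, float, int, float]
Stage = Tuple[List[Weak], float]


def stage_passes(features: Sequence[float], stage: Stage) -> bool:
    """Weighted vote of single-feature tests compared against the stage threshold."""
    weak_classifiers, stage_threshold = stage
    score = 0.0
    for index, threshold, polarity, weight in weak_classifiers:
        # polarity +1 votes when the value is below the threshold, -1 when above
        if polarity * features[index] < polarity * threshold:
            score += weight
    return score >= stage_threshold


def cascade_accepts(features: Sequence[float], cascade: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, stages evaluated); stop at the first stage that rejects."""
    for evaluated, stage in enumerate(cascade, start=1):
        if not stage_passes(features, stage):
            return False, evaluated
    return True, len(cascade)


# Toy two-stage cascade with made-up parameters: a cheap first stage, then a
# slightly more complex second stage that needs both of its tests to fire.
cascade: List[Stage] = [
    ([(0, 10.0, -1, 1.0)], 1.0),
    ([(1, 5.0, 1, 0.6), (2, 0.0, -1, 0.7)], 1.0),
]

background = cascade_accepts([2.0, 1.0, 3.0], cascade)   # rejected by stage 1
borderline = cascade_accepts([20.0, 9.0, 3.0], cascade)  # rejected by stage 2
face_like = cascade_accepts([20.0, 1.0, 3.0], cascade)   # passes every stage
print(background, borderline, face_like)
```

The second element of each result shows how many stages ran. A rejected background window costs only the first, cheapest stage, which is why the average cost per sub-window can stay far below the cost of the full cascade.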
