Sharing features: efficient boosting procedures for multiclass object detection

Antonio Torralba, Kevin P. Murphy, William T. Freeman
Computer Science and Artificial Intelligence Lab., MIT
Cambridge, MA 02139

Abstract

We consider the problem of detecting a large number of different object classes in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, which can be slow and require much training data. We present a multi-class boosting procedure that reduces both the computational and sample complexity by finding common features that can be shared across the classes. The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required is observed to scale approximately logarithmically with the number of classes. In addition, we find that the features selected by independently trained classifiers are often specific to the class, whereas the features selected by the jointly trained classifiers are more generic features, such as lines and edges.

1. Introduction

A long-standing goal of machine vision has been to build a system that is able to recognize many different kinds of objects in cluttered scenes. Progress has been made on restricted versions of this goal. In particular, it is now possible to recognize instances of highly textured objects, such as magazine covers or toys, despite clutter, occlusion and affine transformations, by using object-specific features [12, 19]. In addition, it is possible to recognize many classes of objects, generalizing over intra-class variation, but only when the objects are presented against simple backgrounds [14, 15, 10, 4, 13].

The problem of detecting classes of objects in cluttered images has proved more challenging. Most current approaches slide a window across the image and apply a binary classifier to each such window; the classifier, which discriminates between the class and the background, is trained using standard machine learning techniques, such as boosting [22] or support vector machines [16]. However, such approaches seem unlikely to scale up to the detection of hundreds or thousands of different object classes, because each classifier is trained and run independently.

In this paper, we develop a new object classification architecture that explicitly learns to share features across multiple object classes (classifiers). The basic idea is an extension of the boosting algorithm [17, 6], which has been shown to be useful for detecting individual object classes in cluttered scenes [22]. Rather than training C binary classifiers independently, we train them jointly. The result is that many fewer features are needed to achieve a desired level of performance than if we were to train the classifiers independently. This results in a faster classifier (since there are fewer features to compute) and one which works better (since the features are fit to larger, shared data sets).

[Figure 1: All possible ways to share features amongst 3 classifiers. Each classifier H(v, c) is constructed by adding, only once, all the nodes that connect to each of the leaves. The leaves correspond to single classes.]

2. Sharing features

Boosting [17, 6] provides a simple way to sequentially fit additive models of the form

  H(v, c) = \sum_{m=1}^{M} h_m(v, c),

where c is the class label, v is the input feature vector, and M is the number of rounds. In the boosting literature, the h_m are often called weak learners. It is common to define these to be simple decision or regression stumps of the form h_m(v) = a \delta(v_f > \theta) + b, where v_f denotes the f'th component (dimension) of the feature vector v, \theta is a threshold, \delta is the indicator function, and a and b are regression parameters (note that b does not contribute to the final classification).
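To make the weak learners concrete, the following is a minimal NumPy sketch (ours, not code from the paper) of how such a regression stump could be fitted by weighted least squares, the per-round criterion that gentleboost uses (see Sec. 2.1). The function names, the quantile-based threshold grid, and the small-denominator guards are our own choices, not anything specified in the paper.

```python
import numpy as np

def fit_stump(vf, z, w, theta):
    """Weighted least-squares fit of h(v) = a*[v_f > theta] + b on one feature.

    vf : (N,) values of feature f for the N training windows
    z  : (N,) labels in {-1, +1}
    w  : (N,) non-negative boosting weights
    Returns (a, b, weighted squared error).
    """
    above = vf > theta
    w_above, w_below = w[above].sum(), w[~above].sum()
    # Each side of the threshold gets the weighted mean of its labels.
    b = np.dot(w[~above], z[~above]) / max(w_below, 1e-12)
    a = np.dot(w[above], z[above]) / max(w_above, 1e-12) - b
    pred = a * above + b
    err = np.dot(w, (z - pred) ** 2)
    return a, b, err

def best_stump(V, z, w, n_thresholds=20):
    """Search all features and a grid of thresholds for the lowest-error stump.

    V : (N, F) feature matrix.  Returns (f, theta, a, b).
    """
    best = (None, None, 0.0, 0.0, np.inf)
    for f in range(V.shape[1]):
        for theta in np.quantile(V[:, f], np.linspace(0.05, 0.95, n_thresholds)):
            a, b, err = fit_stump(V[:, f], z, w, theta)
            if err < best[-1]:
                best = (f, theta, a, b, err)
    return best[:4]
```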
We propose to share weak-learners across classes. For example, if we have 3 classes, we might define the following classifiers:

  H(v, 1) = G_{1,2,3}(v) + G_{1,2}(v) + G_{1,3}(v) + G_1(v)
  H(v, 2) = G_{1,2,3}(v) + G_{1,2}(v) + G_{2,3}(v) + G_2(v)
  H(v, 3) = G_{1,2,3}(v) + G_{1,3}(v) + G_{2,3}(v) + G_3(v)

where each G_{S(n)}(v) is itself an additive model of the form G_{S(n)}(v) = \sum_{m=1}^{M_n} h_m^n(v). The n refers to a node in the "sharing graph", which specifies which functions can be shared between classifiers (Fig. 1). S(n) is the subset of classes that share the node n.

The decomposition is not unique (different choices of functions G_{S(n)}(v) give the same functions H(v, c)). But we are interested in the choices of G_{S(n)}(v) that minimize the computational cost. We impose the constraint that \sum_n M_n = M, where M is the total number of functions that have to be learned. If the classifiers are trained independently, we find we need O(C) functions, whereas if the classifiers are trained jointly, the required number of features grows sub-linearly with the number of classes.

To gain some intuition as to why sharing might help, suppose we have C classes and, for each class, the feature vectors reside within some sphere in a D-dimensional space. Further, suppose the weak classifiers are hyperplanes in the D-dimensional space. If the classes are arranged into a regular grid, then, by sharing features, we need M = 2 D C^{1/D} hyperplanes to approximate the hyperspherical decision boundary with hypercubes.

2.1. The joint Boosting algorithm

The idea of the algorithm is that at each boosting round, we examine various subsets of classes, S ⊆ C, and consider fitting a weak classifier to distinguish that subset from the background. We pick the subset that maximally reduces the error on the weighted training set for all the classes. The best weak learner h(v, c) is then added to the strong learners H(v, c) for all the classes c ∈ S, and their weight distributions are updated so as to optimize the following multiclass cost function:

  J = \sum_{c=1}^{C} E[e^{-z_c H(v, c)}]    (1)

where z_c is the membership label (±1) for class c. The term z_c H(v, c) is called the "margin", and is related to the generalization error (out-of-sample error rate).

We chose to base our algorithm on the version of boosting called "gentleboost" [7], because it is simple to implement, numerically robust, and has been shown experimentally [11] to outperform other boosting variants for the face detection task. The optimization of J is done using adaptive Newton steps [7], which corresponds to minimizing a weighted squared error at each step. Specifically, at step m, the function H is updated as H(v, c) := H(v, c) + h_m(v, c), where h_m is the weak learner that minimizes that weighted squared error.
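The sketch below (again ours, not the authors' code) illustrates one round of the subset search described above: every non-empty subset S of classes is tried, a single shared stump is fitted to the pooled, weighted examples of the classes in S, and the winning weak learner is added to H(v, c) for all c ∈ S. It reuses best_stump from the earlier sketch, enumerates subsets exhaustively (feasible only for small C), and simply leaves classes outside S unchanged within the round, which is a simplification of the full algorithm; the details of the per-round fit lie beyond this excerpt.

```python
from itertools import combinations
import numpy as np

def joint_boost_round(V, Z, H):
    """One round of the subset-sharing search of Sec. 2.1 (illustrative sketch).

    V : (N, F) feature vectors for the N training windows
    Z : (N, C) membership labels z_c in {-1, +1}, one column per class
    H : (N, C) current strong-classifier outputs H(v_i, c); updated in place
    """
    C = Z.shape[1]
    W = np.exp(-Z * H)          # per-class boosting weights implied by the cost J (Eq. 1)
    best_err, best_fit, best_S = np.inf, None, None

    # Exhaustive search over the 2^C - 1 non-empty subsets of classes; a greedy
    # or heuristic search would be needed in practice when C is large.
    for size in range(1, C + 1):
        for S in combinations(range(C), size):
            idx = list(S)
            # Pool the subset's examples: a window is positive if it belongs to
            # any class in S; its weight is the sum of those classes' weights.
            w = W[:, idx].sum(axis=1)
            z = np.where((Z[:, idx] == 1).any(axis=1), 1.0, -1.0)
            f, theta, a, b = best_stump(V, z, w)   # stump search from the previous sketch
            pred = a * (V[:, f] > theta) + b
            err = sum(np.dot(W[:, c], (Z[:, c] - pred) ** 2) for c in idx)
            if err < best_err:
                best_err, best_fit, best_S = err, (f, theta, a, b), idx

    # Add the winning shared weak learner to the strong learner of every class in S.
    f, theta, a, b = best_fit
    h = a * (V[:, f] > theta) + b
    for c in best_S:
        H[:, c] += h
    return best_S, best_fit
```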

