Unsupervised Improvement of Visual Detectors using Co-Training
Anat Levin, Paul Viola, Yoav Freund
ICCV 2003

UCSD CSE252c, presented by William Beaver, October 25, 2005

Task
• Traffic camera taking video
• You want to identify the cars
• Simple:
  • pick a detector
  • gather training data
  • train

Figure 1: Example images used to test and train the car detection system. On the left are the original images. On the right are background-subtracted images.

Figure 2: Left: A scatter plot of the joint distribution of margins for the two classifiers. These results are shown on test data, and therefore represent the distribution on unlabeled data (positive examples are circles, negatives are grey/green). For each classifier two thresholds are also shown: the threshold above which no negative example is found, and the threshold below which no positive example is found. The regions labeled A, B, C, and D contain informative examples. Right: Particular examples taken from A, B, C, or D; images which are mislabeled by one classifier (or have a small margin) but are confidently labeled by the other classifier. E.g., set B contains images confidently labeled positive by the Grey classifier but misclassified by the BackSub classifier. These examples are added to the training set of the BackSub classifier during co-training.

The classifiers are trained using the logistic regression algorithm of Collins et al. (which we will call LogAdaBoost in this paper). In each round, the feature selected is the one with the lowest weighted error. Each feature is a simple linear function made up of rectangular sums followed by a threshold. In the final classifier, the selected feature is assigned a weight based on its performance on the current task. As in all variants of AdaBoost, examples are also assigned a weight. In subsequent rounds, incorrectly labeled examples are given a higher weight while correctly labeled examples are given a lower weight.

In order to reduce the false positive rate while preserving efficiency, classification is divided into a cascade of classifiers. The early classifiers are constrained to use few features (and are therefore efficient) while achieving a very high detection rate. Constraints on the later classifiers are relaxed: they contain more features and have a lower detection rate. Later cascade stages are trained only on the true and false positives of earlier stages.

LogAdaBoost is used to train each stage in the cascade to achieve low error on a training set. Due to the asymmetric structure of the detection cascade, each stage in the cascade must achieve a very low false negative rate. The false negative rate of the trained classifier is adjusted, post hoc, using a set of validation images in which positives have been identified. These images are scanned, and the threshold is set so that the required detection rate is achieved on these validation positives.

In order to train a full cascade to achieve very low false positive rates, a large number of examples are required, both positive and negative. The number of required negative examples is especially large. After 5 stages the false positive rate is often well below 1%. Therefore over 99% of the negative data is rejected and is unavailable for training subsequent stages.

5 Experiments and Algorithms

Data was acquired from a Washington State Department of Transit web site. The selected cameras provide 15-second video clips once every 5 minutes. Data from a total of 8 cameras was used for the experiments. The cameras were similar, in that they were placed by the same authority.
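The boosting loop described above (pick the feature with the lowest weighted error, weight the feature by its performance, then re-weight the examples) can be sketched in a few lines. This is a minimal discrete-AdaBoost-style sketch rather than the exact confidence-rated LogAdaBoost of Collins et al.; the array names and the {-1, +1} feature encoding are assumptions for illustration.

```python
import numpy as np

def boosting_round(features, labels, weights):
    """One round of an AdaBoost-style training loop (sketch).

    features: (n_examples, n_features) matrix of thresholded feature
              outputs in {-1, +1}; labels: array in {-1, +1};
    weights:  current example-weight distribution (sums to 1).
    """
    # Select the feature with the lowest weighted error.
    errors = np.array([
        weights[features[:, j] != labels].sum()
        for j in range(features.shape[1])
    ])
    best = int(np.argmin(errors))
    eps = errors[best]

    # Weight assigned to the selected feature based on its performance.
    alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))

    # Re-weight the examples: incorrectly labeled examples gain weight,
    # correctly labeled examples lose weight.
    weights = weights * np.exp(-alpha * labels * features[:, best])
    weights /= weights.sum()
    return best, alpha, weights
```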
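The cascade evaluation itself reduces to a chain of early exits. A minimal sketch, assuming each stage is a (features, alphas, threshold) triple produced by rounds like the one above; the exact stage representation is an assumption, not the paper's data structure.

```python
def cascade_detect(window, stages):
    """Evaluate a detection cascade on one image window (sketch).

    stages: list of (features, alphas, threshold) triples, where each
    feature maps a window to {-1, +1}.  A window is reported as a
    detection only if every stage accepts it, so the cheap early
    stages reject most negatives before the later, larger stages run.
    """
    for features, alphas, threshold in stages:
        score = sum(a * f(window) for f, a in zip(features, alphas))
        if score < threshold:
            return False  # rejected early; later stages never run
    return True
```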
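The post-hoc threshold adjustment amounts to an order statistic over a stage's scores on the validation positives. A sketch under that reading; the function name and the illustrative 0.99 target rate are assumptions, not values from the paper.

```python
import numpy as np

def set_stage_threshold(validation_scores, target_detection_rate=0.99):
    """Choose a stage threshold post hoc from validation positives.

    validation_scores: the stage's real-valued scores on windows
    containing identified positives.  The threshold is lowered until
    the required fraction of these positives is accepted.
    """
    scores = np.sort(np.asarray(validation_scores))
    # Largest number of validation positives we may still reject
    # while meeting the target detection rate.
    cutoff = int(np.floor((1.0 - target_detection_rate) * len(scores)))
    return scores[cutoff]
```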
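The negative-data problem follows from a quick calculation: if each stage passes only a fraction of the negative windows, the number of raw windows that must be scanned to collect a fixed number of training negatives for the next stage grows like the inverse product of those fractions. The rates below are made-up illustrations, not measurements from the paper.

```python
def windows_needed(n_negatives, stage_fp_rates):
    """Back-of-the-envelope count of raw windows to scan (sketch).

    stage_fp_rates: assumed fraction of negative windows each earlier
    stage passes.  Only windows surviving all earlier stages are
    available as negatives for the next stage.
    """
    pass_rate = 1.0
    for f in stage_fp_rates:
        pass_rate *= f
    return n_negatives / pass_rate

# E.g. five stages each passing 40% of negatives: overall pass rate
# ~1%, so ~100x more windows must be scanned than negatives kept.
print(windows_needed(10_000, [0.4] * 5))  # ~976,562 windows
```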
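The co-training exchange in the Figure 2 caption, where examples confidently labeled by one classifier but mislabeled (or given only a small margin) by the other are added to the other's training set, can be expressed as a simple selection rule. All names and the small-margin cutoff here are hypothetical; in the paper the confidence thresholds are the per-classifier values above which no negative, and below which no positive, was observed.

```python
def cotrain_example_for_b(score_a, score_b, theta_pos_a, theta_neg_a,
                          small_margin=0.1):
    """Return a label to add to classifier B's training set, or None.

    theta_pos_a: threshold above which classifier A saw no negatives.
    theta_neg_a: threshold below which classifier A saw no positives.
    """
    b_unsure = abs(score_b) < small_margin
    # A confidently positive, B mislabels or is unsure
    # (e.g. region B in Fig. 2 for the Grey/BackSub pair).
    if score_a >= theta_pos_a and (score_b < 0 or b_unsure):
        return +1
    # A confidently negative, B mislabels or is unsure.
    if score_a <= theta_neg_a and (score_b > 0 or b_unsure):
        return -1
    return None  # not informative for B
```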