# A New Boosting Algorithm Using Input-Dependent Regularizer


The Twentieth International Conference on Machine Learning (ICML-03), Washington, DC, August 21-24, 2003.

Rong Jin (rong@cs.cmu.edu), Yan Liu (yanliu@cs.cmu.edu), Luo Si (lsi@cs.cmu.edu), Jaime Carbonell (jgc@cs.cmu.edu), Alexander G. Hauptmann (alex@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-8213, USA

## Abstract

AdaBoost has proved to be an effective method for improving the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost might suffer from overfitting, especially on noisy data. In addition, most current work on boosting assumes that the combination weights are fixed constants and therefore does not take particular input patterns into consideration. In this paper, we present a new boosting algorithm, "WeightBoost", which tries to solve these two problems by introducing an input-dependent regularization factor into the combination weight. Similarly to AdaBoost, we derive a learning procedure for WeightBoost, which is guaranteed to minimize training errors. Empirical studies on eight different UCI data sets and one text categorization data set show that WeightBoost almost always achieves considerably better classification accuracy than AdaBoost. Furthermore, experiments on data with artificially controlled noise indicate that WeightBoost is more robust to noise than AdaBoost.

## 1. Introduction

As a generally effective way to create a "strong" classifier out of a weak classifier, boosting has gained popularity recently. Boosting works by repeatedly running the weak classifier on various training examples sampled from the original training pool, and combining the base classifiers produced by the weak learner into a single composite classifier. AdaBoost has been theoretically proved and empirically shown to be an effective method for improving classification accuracy. Through its particular weight-updating procedure, AdaBoost is able
to focus on those data points that have been misclassified in previous training iterations, and thereby minimizes the training errors. Since AdaBoost is a greedy algorithm that intentionally focuses on the minimization of training errors, there have been many studies on the issue of overfitting for AdaBoost (Quinlan, 1996; Grove & Schuurmans, 1998; Ratsch et al., 1998). The general conclusion from early studies appears to be that, in practice, AdaBoost seldom overfits the training data; namely, even though the AdaBoost algorithm greedily minimizes the training errors via gradient descent, the test error usually goes down accordingly. Recent studies have implied that this phenomenon might be related to the fact that the greedy search procedure used in AdaBoost implicitly maximizes the classification margin (Onoda et al., 1998; Friedman et al., 1998). However, other studies (Opitz & Maclin, 1999; Jiang, 2000; Ratsch et al., 2000; Grove & Schuurmans, 1998; Dietterich, 2000) have shown that AdaBoost can overfit, particularly when the data are noisy. In fact, noise in the data can be introduced by two factors: mislabeled data, or the limitations of the hypothesis space of the base classifier. When the noise level is high, there can be data patterns that are difficult for the classifiers to capture. The boosting algorithm is then forced to focus on those noisy data patterns, which distorts the optimal decision boundary; as a result, the decision boundary is suitable only for those difficult data patterns and is not necessarily general enough for the remaining data.

The overfitting problem can also be analyzed from the viewpoint of the generalization error bound (Ratsch et al., 2000). As discussed in Ratsch et al. (2000), the margin maximized by AdaBoost is actually a "hard margin", namely the smallest margin of the noisy data patterns. As a consequence, the margin of the other data
points may decrease significantly when we maximize the hard margin, thus forcing the generalization error bound (Schapire, 1999) to increase. In order to deal with the overfitting problem in AdaBoost, several strategies have been proposed, such as smoothing (Schapire & Singer, 1998), GentleBoost (Friedman et al., 1998), BrownBoost (Freund, 2001), Weight Decay (Ratsch et al., 1998), and regularized AdaBoost (Ratsch et al., 2000). The main ideas of these methods fall into two groups: one is to change the cost function, for example by introducing regularization factors into it; the other is to introduce a soft margin. The overfitting problem of AdaBoost may be related to its exponential cost function, which makes the weights of the noisy data grow exponentially and leads to the overemphasis of those data patterns. The solution can be to introduce a different cost function, such as the logistic regression function in Friedman et al. (1998); to regularize the exponential cost function with a penalty term, such as the weight decay method used in Ratsch et al. (1998); or to introduce a different weighting function, as in BrownBoost (Freund, 2001). A more general solution is to replace the hard margin in AdaBoost with a soft margin. Similar to the strategy used in the support vector machine (SVM) algorithm (Cortes & Vapnik, 1995), a boosting algorithm with a soft margin is able to allow a larger margin at the expense of some misclassification errors. This idea leads to work such as the regularized boosting algorithms using linear programming and quadratic programming (Ratsch et al., 1999).

However, there is another problem with AdaBoost that has been overlooked in previous studies. The AdaBoost algorithm employs a linear combination strategy, that is, it combines different base classifiers with a set of constants. As illustrated in previous studies, one advantage of the ensemble approach over a single model is that an ensemble allows each sub-model to cover a different aspect of
the data set. By combining them, we are able to explain the whole data set thoroughly. Therefore, in order to take full advantage of each sub-model, a good combination strategy should be able to examine the input pattern and invoke only the sub-models that are appropriate for that pattern. For example, in the Hierarchical Mixtures of Experts model (Jordan & Jacobs, 1994), the sub-models are organized into a tree structure, with leaf nodes acting as experts and internal nodes as gates. Those internal nodes
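As background to the introduction's description of AdaBoost, its weight-updating procedure (misclassified points gain weight, and each base classifier receives a fixed combination constant) can be sketched as follows. This is a minimal illustrative implementation using decision stumps as the base learner; the function names `adaboost_train` and `adaboost_predict` are our own, not from the paper.

```python
import numpy as np

def adaboost_train(X, y, n_rounds=10):
    """Train AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # start with uniform weights over training points
    stumps, alphas = [], []
    for _ in range(n_rounds):
        best = None
        # exhaustive search for the stump with the lowest weighted error
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = np.clip(err, 1e-12, 1 - 1e-12)
        # the combination weight is a fixed constant, independent of the input
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        # weight update: misclassified points gain weight, so later
        # rounds focus on the points misclassified earlier
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Linear combination of the base classifiers with constant weights."""
    f = sum(a * s * np.where(X[:, j] <= t, 1, -1)
            for (j, t, s), a in zip(stumps, alphas))
    return np.sign(f)
```

Note how `alpha` depends only on the stump's weighted training error, not on the test input; WeightBoost's point of departure is precisely to make this combination weight input-dependent.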
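The introduction's claim that the exponential cost makes the weights of noisy data grow exponentially, while a logistic cost does not, can be checked numerically. The sketch below is our own illustration (the variable names are ours): as a point's margin y*f(x) becomes more negative over boosting rounds, its unnormalized weight exp(-margin) grows without bound, whereas the logistic weight 1/(1 + exp(margin)) saturates at 1.

```python
import numpy as np

# Margins y*f(x) of a training point; a noisy, repeatedly
# misclassified point drifts toward a large negative margin.
margins = np.array([1.0, 0.0, -2.0, -5.0, -10.0])

# AdaBoost's exponential cost: the point's unnormalized weight
# grows exponentially as the margin falls.
exp_weight = np.exp(-margins)

# Logistic alternative (cf. Friedman et al., 1998): the weight
# saturates at 1, so noisy points cannot dominate the distribution.
logistic_weight = 1.0 / (1.0 + np.exp(margins))

for m, e, l in zip(margins, exp_weight, logistic_weight):
    print(f"margin {m:6.1f}: exp weight {e:12.3f}, logistic weight {l:.3f}")
```

A point at margin -10 carries an exponential weight four orders of magnitude above a correctly classified point, which is the overemphasis the text describes.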