CMU CS 10701 - The Boosting Approach to Machine Learning: An Overview

MSRI Workshop on Nonlinear Estimation and Classification, 2002.

The Boosting Approach to Machine Learning: An Overview

Robert E. Schapire
AT&T Labs – Research, Shannon Laboratory
180 Park Avenue, Room A203
Florham Park, NJ 07932 USA
www.research.att.com/schapire

December 19, 2001

Abstract

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting, including analyses of AdaBoost's training error and generalization error; boosting's connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.

1 Introduction

Machine learning studies automatic techniques for learning to make accurate predictions based on past observations. For example, suppose that we would like to build an email filter that can distinguish spam (junk) email from non-spam. The machine-learning approach to this problem would be the following: start by gathering as many examples as possible of both spam and non-spam emails. Next, feed these examples, together with labels indicating whether or not they are spam, to your favorite machine-learning algorithm, which will automatically produce a classification or prediction rule. Given a new, unlabeled email, such a rule attempts to predict whether it is spam. The goal, of course, is to generate a rule that makes the most accurate predictions possible on new test examples.

Building a highly accurate prediction rule is certainly a difficult task. On the other hand, it is not hard at all to come up with very rough rules of thumb that are only moderately accurate. An example of such a rule is something like the following: "If the phrase 'buy now' occurs in the email, then predict it is spam." Such a rule will not even come close to covering all spam messages; for instance, it really says nothing about what to predict if 'buy now' does not occur in the message. On the other hand, this rule will make predictions that are significantly better than random guessing.

Boosting, the machine-learning method that is the subject of this chapter, is based on the observation that finding many rough rules of thumb can be a lot easier than finding a single, highly accurate prediction rule. To apply the boosting approach, we start with a method or algorithm for finding the rough rules of thumb. The boosting algorithm calls this "weak" or "base" learning algorithm repeatedly, each time feeding it a different subset of the training examples (or, to be more precise, a different distribution or weighting over the training examples¹). Each time it is called, the base learning algorithm generates a new weak prediction rule, and after many rounds, the boosting algorithm must combine these weak rules into a single prediction rule that, hopefully, will be much more accurate than any one of the weak rules.

To make this approach work, there are two fundamental questions that must be answered: first, how should each distribution be chosen on each round, and second, how should the weak rules be combined into a single rule? Regarding the choice of distribution, the technique that we advocate is to place the most weight on the examples most often misclassified by the preceding weak rules; this has the effect of forcing the base learner to focus its attention on the "hardest" examples. As for combining the weak rules, simply taking a (weighted) majority vote of their predictions is natural and effective.

There is also the question of what to use for the base learning algorithm, but this question we purposely leave unanswered so that we will end up with a general boosting procedure that can be combined with any base learning algorithm.

Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb in a manner similar to that suggested above. This chapter presents an overview of some of the recent work on boosting, focusing especially on the AdaBoost algorithm, which has undergone intense theoretical study and empirical testing.

¹ A distribution over training examples can be used to generate a subset of the training examples simply by sampling repeatedly from the distribution.

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
Initialize $D_1(i) = 1/m$.
For $t = 1, \ldots, T$:
  - Train base learner using distribution $D_t$.
  - Get base classifier $h_t : X \to \mathbb{R}$.
  - Choose $\alpha_t \in \mathbb{R}$.
  - Update:
      $D_{t+1}(i) = \dfrac{D_t(i)\,\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
    where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ will be a distribution).
Output the final classifier:
      $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).$

Figure 1: The boosting algorithm AdaBoost.

2 AdaBoost

Working in Valiant's PAC (probably approximately correct) learning model [75], Kearns and Valiant [41, 42] were the first to pose the question of whether a "weak" learning algorithm that performs just slightly better than random guessing can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [66] came up with the first provable polynomial-time boosting algorithm in 1989. A year later, Freund [26] developed a much more efficient boosting algorithm which, although optimal in a certain sense, nevertheless suffered like Schapire's algorithm from certain practical drawbacks. The first experiments with these early boosting algorithms were carried out by Drucker, Schapire and Simard [22] on an OCR task.

The AdaBoost algorithm, introduced in 1995 by Freund and Schapire [32], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper. Pseudocode for AdaBoost is given in Fig. 1 in the slightly generalized form given by Schapire and Singer [70]. The algorithm takes as input a training set $(x_1, y_1), \ldots, (x_m, y_m)$ where each $x_i$ belongs to some domain or instance space $X$, and each label $y_i$ is in some label set $Y$. For most of this paper, we assume $Y = \{-1, +1\}$; in Section 7, we discuss extensions to the multiclass case. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds $t = 1, \ldots, T$. One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set. The weight of this distribution on training example $i$ on round $t$ is denoted $D_t(i)$. Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set.
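To make the pseudocode in Figure 1 concrete, the following is a minimal Python/NumPy sketch of the AdaBoost loop. The paper deliberately leaves the base learner and the choice of $\alpha_t$ open; this sketch assumes a one-feature decision-stump base learner and the common choice $\alpha_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$, where $\epsilon_t$ is the weighted error of $h_t$ under $D_t$. The function names (stump_learner, adaboost) and all parameters are illustrative, not taken from the paper.

import numpy as np

def stump_learner(X, y, D):
    # Hypothetical base learner: exhaustively pick the one-feature
    # threshold stump with the smallest weighted error under D.
    m, n = X.shape
    best = None
    for j in range(n):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thresh, sign, -sign)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    _, j, thresh, sign = best
    return lambda Z: np.where(Z[:, j] <= thresh, sign, -sign)

def adaboost(X, y, T=50):
    # AdaBoost as in Figure 1; labels y must take values in {-1, +1}.
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    m = len(y)
    D = np.full(m, 1.0 / m)                # D_1(i) = 1/m
    hs, alphas = [], []
    for t in range(T):
        h = stump_learner(X, y, D)         # train base learner with distribution D_t
        pred = h(X)
        eps = D[pred != y].sum()           # weighted training error of h_t
        if eps >= 0.5:                     # no better than random guessing; stop early
            break
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))  # one common choice of alpha_t
        D = D * np.exp(-alpha * y * pred)  # D_{t+1}(i) proportional to D_t(i) exp(-alpha_t y_i h_t(x_i))
        D = D / D.sum()                    # divide by Z_t so D_{t+1} is a distribution
        hs.append(h)
        alphas.append(alpha)
    def H(Z):                              # final classifier: sign of the weighted vote
        return np.sign(sum(a * h(np.asarray(Z, dtype=float)) for a, h in zip(alphas, hs)))
    return H

# Example usage (illustrative): learn to predict the sign of x1 + x2.
# X = np.random.randn(200, 5); y = np.sign(X[:, 0] + X[:, 1])
# H = adaboost(X, y, T=20); preds = H(X)

Note that this sketch uses the reweighting view of AdaBoost, computing the weighted error directly under $D_t$; as footnote 1 observes, one could instead draw a subset of the training examples by sampling repeatedly from $D_t$ on each round.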

