DOC PREVIEW
CMU CS 10701 - The Boosting Approach to Machine Learning An Overview

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

MSRI Workshop on Nonlinear Estimation and Classification 2002 The Boosting Approach to Machine Learning An Overview Robert E Schapire AT T Labs Research Shannon Laboratory 180 Park Avenue Room A203 Florham Park NJ 07932 USA www research att com schapire December 19 2001 Abstract Boosting is a general method for improving the accuracy of any given learning algorithm Focusing primarily on the AdaBoost algorithm this chapter overviews some of the recent work on boosting including analyses of AdaBoost s training error and generalization error boosting s connection to game theory and linear programming the relationship between boosting and logistic regression extensions of AdaBoost for multiclass classification problems methods of incorporating human knowledge into boosting and experimental and applied work using boosting 1 Introduction Machine learning studies automatic techniques for learning to make accurate predictions based on past observations For example suppose that we would like to build an email filter that can distinguish spam junk email from non spam The machine learning approach to this problem would be the following Start by gathering as many examples as posible of both spam and non spam emails Next feed these examples together with labels indicating if they are spam or not to your favorite machine learning algorithm which will automatically produce a classification or prediction rule Given a new unlabeled email such a rule attempts to predict if it is spam or not The goal of course is to generate a rule that makes the most accurate predictions possible on new test examples 1 Building a highly accurate prediction rule is certainly a difficult task On the other hand it is not hard at all to come up with very rough rules of thumb that are only moderately accurate An example of such a rule is something like the following If the phrase buy now occurs in the email then predict it is spam Such a rule will not even come close to covering all spam messages for instance it really says nothing about what to predict if buy now does not occur in the message On the other hand this rule will make predictions that are significantly better than random guessing Boosting the machine learning method that is the subject of this chapter is based on the observation that finding many rough rules of thumb can be a lot easier than finding a single highly accurate prediction rule To apply the boosting approach we start with a method or algorithm for finding the rough rules of thumb The boosting algorithm calls this weak or base learning algorithm repeatedly each time feeding it a different subset of the training examples or to be more precise a different distribution or weighting over the training examples1 Each time it is called the base learning algorithm generates a new weak prediction rule and after many rounds the boosting algorithm must combine these weak rules into a single prediction rule that hopefully will be much more accurate than any one of the weak rules To make this approach work there are two fundamental questions that must be answered first how should each distribution be chosen on each round and second how should the weak rules be combined into a single rule Regarding the choice of distribution the technique that we advocate is to place the most weight on the examples most often misclassified by the preceding weak rules this has the effect of forcing the base learner to focus its attention on the hardest examples As for combining the weak rules simply taking a weighted majority vote of their predictions is natural and effective There is also the question of what to use for the base learning algorithm but this question we purposely leave unanswered so that we will end up with a general boosting procedure that can be combined with any base learning algorithm Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb in a manner similar to that suggested above This chapter presents an overview of some of the recent work on boosting focusing especially on the AdaBoost algorithm which has undergone intense theoretical study and empirical testing 1 A distribution over training examples can be used to generate a subset of the training examples simply by sampling repeatedly from the distribution 2 Given x1 y1 xm ym where xi Initialize D1 i 1 m For t 1 T 2 X yi 2 Y f 1 1g Train base learner using distribution Dt Get base classifier ht X R Choose t 2 R Update Dt 1 i Dt i exp Z t yiht xi t where Zt is a normalization factor chosen so that Dt 1 will be a distribution Output the final classifier H x sign T X t 1 t ht x Figure 1 The boosting algorithm AdaBoost 2 AdaBoost Working in Valiant s PAC probably approximately correct learning model 75 Kearns and Valiant 41 42 were the first to pose the question of whether a weak learning algorithm that performs just slightly better than random guessing can be boosted into an arbitrarily accurate strong learning algorithm Schapire 66 came up with the first provable polynomial time boosting algorithm in 1989 A year later Freund 26 developed a much more efficient boosting algorithm which although optimal in a certain sense nevertheless suffered like Schapire s algorithm from certain practical drawbacks The first experiments with these early boosting algorithms were carried out by Drucker Schapire and Simard 22 on an OCR task The AdaBoost algorithm introduced in 1995 by Freund and Schapire 32 solved many of the practical difficulties of the earlier boosting algorithms and is the focus of this paper Pseudocode for AdaBoost is given in Fig 1 in the slightly generalized form given by Schapire and Singer 70 The algorithm takes as input a training set x1 y1 xm ym where each xi belongs to some domain or instance space X and each label yi is in some label set Y For most of this paper we assume Y f 1 1g in Section 7 we discuss extensions to the multiclass case AdaBoost calls a given weak or base learning algorithm repeatedly in a series 3 of rounds t 1 T One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set The weight of this distribution on training example i on round t is denoted Dt i Initially all weights are set equally but on each round the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set The base learner s


View Full Document

CMU CS 10701 - The Boosting Approach to Machine Learning An Overview

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download The Boosting Approach to Machine Learning An Overview
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The Boosting Approach to Machine Learning An Overview and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Boosting Approach to Machine Learning An Overview and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?