AdaBoost Freund and Schaipre Experiments with a new Boosting Algorithm Schapire and Singer Improved Boosting Algorithms using Confidence rated Predictions Ian Fasel October 23 2001 Vision and Learning Freund Schapire Singer AdaBoost 1 Resampling for Classifier Design Arcing adaptive reweighting and combining reusing or selecting data in order to improve classification Two most popular Bagging Breiman 1994 AdaBoost Freund and Schapire 1996 Combine the results of multiple weak classifiers into a single strong classifier Vision and Learning Freund Schapire Singer AdaBoost The horse track problem 2 Gambler wants to write a computer program to accurately predict winner of horse races Gathers Historical horse race data input vector Odds dry or muddy jockey output label win or lose Discovers easy to find rules of thumb that are often correct hard to find a single rule that is very highly accurate Vision and Learning Freund Schapire Singer AdaBoost The gambler s plan 3 The gambler has decided intuitively that he wants to use arcing and he wants his final solution to be some simple combination of simply derived rules Repeat T times 1 Examine small sample of races 2 Derive rough rule of thumb e g bet on horse with most favorable odds 3 Select new sample derive 2nd rule of thumb If track is muddy then bet on the lightest horse otherwise choose randomly end Vision and Learning Freund Schapire Singer AdaBoost 4 Questions How to choose samples Select multiple random samples Concentrate only on the errors How to combine rules of thumb into a single accurate rule Vision and Learning Freund Schapire Singer AdaBoost 5 More Formally Given training data x1 y1 xm ym where xi X yi Y 1 1 For t 1 T 1 Train Weak Learner on the training set Let ht X 1 1 represent the classifier obtained after training 2 Modify the training set somehow The final hypothesis H x is some combination of all the weak hypotheses H x f h x The question is how to modify the training set and how to combine the weak classifiers 1 Vision and Learning Freund Schapire Singer AdaBoost 6 Bagging The simplest algorithm is called Bagging used by Breiman 1994 Algorithm Given m training examples repeat for t 1 T Select at random with replacement m training examples Train learning algorithm on selected examples to generate hypothesis ht Final hypothesis is simple vote H x M AJ h1 x hT x Vision and Learning Freund Schapire Singer AdaBoost Bias Variance Dilemma 7 Vision and Learning Freund Schapire Singer AdaBoost 8 Bagging Pros and Cons Bagging reduces variance Helps improve unstable classifiers i e small changes in training data lead to significantly different classifiers and large changes in accuracy no proof for this however does not reduce bias Vision and Learning Freund Schapire Singer AdaBoost 9 Boosting Two modifications 1 instead of a random sample of the training data use a weighted sample to focus learning on most difficult examples 2 instead of combining classifiers with equal vote use a weighted vote Several previous methods Schapire 1990 Freund 1995 were effective but had limitations We skip ahead to Freund and Schapire 1996 Vision and Learning Freund Schapire Singer AdaBoost 10 AdaBoost Freund and Schapire 1996 Initialize distribution over the training set D1 i 1 m For t 1 T 1 Train Weak Learner using distribution Dt 2 Choose a weight or confidence value t R 3 Update the distribution over the training set Dt i e t yi ht xi Dt 1 i Zt 2 Where Zt is a normalization factor chosen so that Dt 1 will be a distribution Final vote H x is a weighted sum H x sign f x sign T X t 1 t ht x 3 Vision and Learning Freund Schapire Singer AdaBoost 11 If the underlying classifiers are linear networks then AdaBoost builds multilayer perceptrons one node at a time n Example 6 Example 5 Example 4 Example 3 Example 2 Example 1 Example 6 Example 5 Example 4 Example 3 Example 2 Example 1 Importance 2 Importance Example 6 Example 5 Example 4 Example 3 Example 2 Example 1 Importance 1 However underlying classifier can be anything Decision trees neural networks hidden Markov models Vision and Learning Freund Schapire Singer AdaBoost 12 How do we pick To decide how to pick the s we have to understand what the relationship is between the distribution the t and the training error Later we can show that reducing training error this way should reduce test error as well Vision and Learning Freund Schapire Singer AdaBoost 13 Theorem 1 Error is minimized by minimizing Zt Proof 1 e yi 1 h1 xi e yi T hT xi DT 1 i m Z1 ZT e t yi t ht xi e yi t t ht xi Q Q m t Zt m t Zt 4 e yi f xi Q m t Zt But if H xi 6 yi then yi f xi 0 implying that e yi f xi 1 Thus H xi 6 yi e yi f xi 1 X yi f xi 1 X H xi 6 yi e m i m i 5 Vision and Learning Freund Schapire Singer AdaBoost Combining these results 1 X 1 X yi f xi H xi 6 yi e m i m i X Y Zt DT 1 i 6 t i Y 14 Zt since DT 1 sums to 1 t Thus we can see that minimizing Zt will minimize this error bound Consequences 1 We should choose t to minimize Zt 2 We should modify the weak learner to minimize Zt instead of say squared error Vision and Learning Freund Schapire Singer AdaBoost 15 Finding t analytically Define ui yi h xi Then ui is positive if h xi is correct and negative if it is incorrect Magnitude is confidence P r i Di ui r 1 1 this is a measure of the overall error rate If we restrict h xi 1 1 we can approximate Z X D i e ui Z i X i D i 1 ui ui 1 ui ui e e 2 2 Note if h xi 1 1 this approximation is exact 7 Vision and Learning Freund Schapire Singer AdaBoost We can find to minimize Z analytically by finding which gives us 1 1 r ln 2 1 r dZ d 16 0 a Plugging this into equation 7 this gives the upper bound p Z 1 r2 for a particular t or plugging into equation 6 we have an upper bound for the training error of H t Y Y 1 H xi 6 yi Zt m t t 1 q 1 rt2 a For the careful it can be shown that chosen this way is guaranteed to minimize Z and that it is unique Vision and Learning Freund Schapire Singer AdaBoost Finding numerically 17 When using real valued hypotheses h xi 1 1 the best choice of to minimize Z must be found numerically If we do this and find such that Z 0 0 then d X dZ D i e ui Z d d i X D i ui e …
View Full Document
Unlocking...