DOC PREVIEW
CMU CS 10701 - lecture

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Machine Learning 1010 701 15701 15 781 Spring 2008 Boosting from Weak Learners Eric Xing Lecture 10 February 18 2008 Reading Chap 14 3 C B book Rationale Combination of methods z There is no algorithm that is always the most accurate z We can select simple weak classification or regression methods and combine them into a single strong method z Different learners use different z z Algorithms z Hyperparameters z Representations Modalities z Training sets z Subproblems The problem how to combine them 1 Some early algorithms z z z Boosting by filtering Schapire 1990 z Run weak learner on differently filtered example sets z Combine weak hypotheses z Requires knowledge on the performance of weak learner Boosting by majority Freund 1995 z Run weak learner on weighted example set z Combine weak hypotheses linearly z Requires knowledge on the performance of weak learner Bagging Breiman 1996 z Run weak learner on bootstrap replicates of the training set z Average weak hypotheses z Reduces variance Combination of classifiers z Suppose we have a family of component classifiers generating 1 labels such as decision stumps h x sign wxk b where k w b z Each decision stump pays attention to only a single component of the input vector 2 Combination of classifiers con d z We d like to combine the simple classifiers additively so that the final classifier is the sign of h x 1 h x 1 K m h x m where the votes i emphasize component classifiers that make more reliable predictions than others z Important issues z what is the criterion that we are optimizing measure of loss z we would like to estimate each new component classifier in the same manner modularity Measurement of error z Loss function e g I y h x y h x z Generalization error L h E y h x z Objective find h with minimum generalization error z Main boosting idea minimize the empirical error 1 L h N N y h x i 1 i i 3 Exponential Loss z Empirical loss 1 L h N z N y h x i i 1 i Another possible measure of empirical loss is n L h exp yi h m x i i 1 Exponential Loss z One possible measure of empirical loss is n Recall that L h exp yi h m x i i 1 n h m x 1 h x 1 K m h x m exp yi h m 1 x i yi am h x i m i 1 n exp yi h m 1 x i exp yi am h x i m i 1 n Wi m 1 exp yi am h x i m Wi m 1 exp yi h m 1 x i i 1 z The combined classifier based on m 1 iterations defines a weighted loss criterion for the next simple classifier to add z each training sample is weighted by its classifiability or difficulty seen by the classifier we have built so far 4 Linearization of loss function z We can simplify a bit the estimation criterion for the new component classifiers assuming is small exp yi am h x i m 1 yi am h x i m z Now our empirical loss criterion reduces to exp y h n i m i 1 x i n Wi m 1 1 yi am h x i m i 1 n Wi m 1 i 1 z n am Wi i 1 m 1 Wi m 1 exp yi h m 1 x i yi h x i m We could choose a new component classifier to optimize this weighted agreement A possible algorithm z At stage m we find that maximize or at least give a sufficiently high weighted agreement n W i 1 z i m 1 yi h x i m z each sample is weighted by its difficulty under the previously combined m 1 classifiers z more difficult samples received heavier attention as they dominates the total loss Then we go back and find the votes m associated with the new classifier by minimizing the original weighted exponential loss n L h Wi m 1 exp yi am h x i m i 1 n n i 1 i 1 Wi m 1 am Wi m 1 yi h x i m 5 Boosting z We have basically derived a Boosting algorithm that sequentially adds new component classifiers each trained on reweighted training examples z z each component classifier is presented with a slightly different problem AdaBoost preliminaries z we work with normalized weights Wi on the training examples initially uniform Wi 1 n z the weight reflect the degree of difficulty of each datum on the latest classifier The AdaBoost algorithm z Wi m 1 exp yi h m 1 x i At the kth iteration we find any classifier h x k for which the weighted classification error 1 n 2 i 1 k 0 5 Wi k 1 yi h x i k is better than change z z Determine how many votes to assign to the new component classifier k 0 5 log 1 k k z z This is meant to be easy weak classifier stronger classifier gets more votes Update the weights on the training examples Wi k Wi k 1 exp yi ak h x i k 6 The AdaBoost algorithm cont d z The final classifier after m boosting iterations is given by the sign of h x 1 K m h x m h x 1 1 K m z the votes here are normalized for convenience AdaBoost summary z z z Input z N examples SN x1 y1 xN yN z a weak base learner h h x Initialize equal example weights wi 1 N for all i 1 N Iterate for t 1 T 1 z train base learner according to weighted example set wt x and obtain hypothesis ht h x t 2 compute hypothesis error t 3 compute hypothesis weight t 4 update example weights for next iteration wt 1 Output final hypothesis as a linear combination of ht 7 AdaBoost dataflow diagram w1 A w S w2 A w S wT A w S Boosting examples 8 Boosting example cont d Boosting example cont d 9 Base Learners z z Weak learners used in practice z Decision stumps axis parallel splits z Decision trees e g C4 5 by Quinlan 1996 z Multi layer neural networks z Radial basis function networks Can base learners operate on weighted examples z In many cases they can be modified to accept weights along with the examples z In general we can sample the examples with replacement according to the distribution defined by the weights Boosting performance z The error rate of component classifier the decision stumps does not improve much if at all over time z But both training and testing error improve over time z Even after the training error of the combined classifier goes to zero boosting iterations can still improve the generalization error 10 Why it is working z You will need some learning theory …


View Full Document

CMU CS 10701 - lecture

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view lecture and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?