Boosting

• Main idea:
  – Train classifiers (e.g. decision trees) in a sequence.
  – Each new classifier should focus on the cases that were incorrectly classified in the previous round.
  – Combine the classifiers by letting them vote on the final prediction (like bagging).
  – Each classifier could be (should be) very "weak", e.g. a decision stump.

Boosting Intuition

• We adaptively weight each data case.
• Data cases that are wrongly classified get high weight (so the algorithm will focus on them).
• Each boosting round learns a new (simple) classifier on the weighted dataset.
• These classifiers are weighted to combine them into a single powerful classifier.
• Classifiers that obtain a low training error rate get high weight.
• We stop by monitoring a hold-out set (cross-validation).

Boosting in a Picture

[Figure: over successive boosting rounds, correctly classified training cases keep small weight, a misclassified training case gets large weight in the next round, and a decision tree with low weighted error gets a strong vote.]

And in Animation

[Animation: the original training set starts with equal weights on all training samples. Taken from "A Tutorial on Boosting" by Yoav Freund and Rob Schapire.]

AdaBoost (Example)

[Figures: AdaBoost rounds 1–3 on the example training set, followed by the combined final classifier.]

AdaBoost

Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$.

Initialise $D_1(i) = \frac{1}{m}$ for $i = 1, \ldots, m$.

For $t = 1, \ldots, T$:

• Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$:
  $h_t = \arg\min_{h \in \mathcal{H}} \epsilon_h$, where $\epsilon_h = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h(x_i)]$.
• Prerequisite: $\epsilon_t < 0.5$, otherwise stop.
• Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$, where $\epsilon_t$ is the weighted error rate of classifier $h_t$.
• Update:
  $D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$,
  where $Z_t$ is a normalisation factor (chosen so that $D_{t+1}$ will be a distribution).

Output the final classifier:
  $H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$

The equation to update the distribution $D_t$ is constructed so that:
  $e^{-\alpha_t y_i h_t(x_i)} \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{cases}$

Thus, after selecting an optimal classifier $h_t$ for the distribution $D_t$, the examples that the classifier identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm tests the classifiers on the distribution $D_{t+1}$, it will select a classifier that better identifies the examples that the previous classifier missed.
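As a concrete illustration of the update (the numbers here are chosen for illustration, not taken from the slides): if $\epsilon_t = 0.2$, then
  $\alpha_t = \frac{1}{2} \ln \frac{1 - 0.2}{0.2} = \frac{1}{2} \ln 4 = \ln 2 \approx 0.693$,
so the weight of each correctly classified example is multiplied by $e^{-\alpha_t} = \frac{1}{2}$ and the weight of each misclassified example by $e^{\alpha_t} = 2$, before dividing by $Z_t$ to renormalise.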
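The steps above translate almost line for line into code. The following is a minimal sketch of AdaBoost with decision stumps as the weak learners, assuming labels in $\{-1, +1\}$; the function names (fit_stump, adaboost, predict) are illustrative choices, not from the slides.

```python
import numpy as np

def fit_stump(X, y, D):
    """Find the decision stump (feature, threshold, sign) with the
    lowest weighted error under the distribution D."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thresh, sign, -sign)
                err = np.sum(D[pred != y])      # weighted error eps_h
                if err < best_err:
                    best_err, best_params = err, (j, thresh, sign)
    return best_err, best_params

def stump_predict(params, X):
    j, thresh, sign = params
    return np.where(X[:, j] <= thresh, sign, -sign)

def adaboost(X, y, T=10):
    m = len(y)
    D = np.full(m, 1.0 / m)                     # D_1(i) = 1/m
    ensemble = []                               # list of (alpha_t, stump_t)
    for t in range(T):
        eps, stump = fit_stump(X, y, D)
        if eps >= 0.5:                          # prerequisite: eps_t < 0.5
            break
        # alpha_t = 0.5 * ln((1 - eps_t) / eps_t); clip eps to avoid log(inf)
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        pred = stump_predict(stump, X)
        D = D * np.exp(-alpha * y * pred)       # down-weight hits, up-weight misses
        D /= D.sum()                            # normalise (this is Z_t)
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Final classifier: sign of the alpha-weighted vote."""
    scores = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(scores)

# Toy usage on made-up 1D data that no single stump classifies perfectly:
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, -1, 1])
model = adaboost(X, y, T=5)
print(predict(model, X))
```

Note that the triple loop in fit_stump is the brute-force search over all stumps; it keeps the sketch self-contained, but any weak learner that accepts case weights could be substituted.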