
Bagging and Boosting: Resampling for Classifier Design
Sargur Srihari
[email protected]

Bagging
• Arcing (adaptive re-weighting and combining) refers to reusing or selecting data to improve classification
• Arcing includes both bagging and boosting
• The simplest arcing method is bagging
• The name is derived from bootstrap aggregation
• Bagging uses multiple versions of the training set

Bagging Method
• Draw n' < n samples from D with replacement
• Each bootstrap set is used to train a different component classifier
• The final classification decision is based on the vote of the component classifiers
• The component classifiers are all of the same form
– all HMMs, all neural networks, or all decision trees
– their parameter values differ because their training sets differ

Bagging Overcomes Classifier Instability
• A classifier is unstable if small changes in the training data lead to significantly different classifiers or large changes in accuracy
• Decision tree algorithms can be unstable
– a slight change in the position of a training point can lead to a radically different tree
• Bagging improves recognition for unstable classifiers by smoothing over such discontinuities

Boosting
• A powerful technique for combining multiple "base" classifiers to form a committee whose performance can be significantly better than that of any of the base classifiers
• AdaBoost (adaptive boosting) can give good results even if the base classifiers perform only slightly better than random
• Hence the base classifiers are known as weak learners
• Designed for classification; can be used for regression as well

Boosting versus Bagging
• The base classifiers are trained in sequence
• Each base classifier is trained on a weighted form of the dataset
– the weighting coefficients depend on the performance of the previous classifiers
– points misclassified by one of the classifiers are given more weight when used to train the next classifier
• Decisions are combined using a weighted majority voting scheme

Boosting with a 2D Two-Category Problem
• The component classifiers are trained using LMS
• The training sets are selected by the basic boosting procedure
• (Figure: the classification task and the voting of the component classifiers)

Creating a Classifier Sequence
• Randomly select n1 < n patterns from the full training set D to form D1
• Train the first classifier C1 with D1
• C1 need only be a weak learner, with accuracy only slightly better than chance
• Next, seek a second training set D2 that is most "informative"

Creating the Second Training Set
• The informative set is taken from the samples remaining in D
– such that half are classified correctly and half are classified incorrectly by C1
• This can be done by flipping a coin
– if heads, choose a sample misclassified by C1
– if tails, choose a correctly classified sample
• Now train a second component classifier C2 with D2

Creating the Third Data Set
• Seek a data set D3 of samples that are not well classified by the vote of C1 and C2
• Select the samples remaining in D and include them only if C1 and C2 disagree
• Use of the ensemble of three classifiers:
– if C1 and C2 agree, use that label
– if they disagree, use the label given by C3
• Choice of n1: aim for roughly equal numbers of samples for the three classifiers

Boosting
• Boosting can be recursively applied to the component classifiers themselves, creating still more classifiers
– with 9 or 27 classifiers, very low training error can be achieved
• Continue adding weak learners until some desired low error has been achieved
• Boosting focuses on the difficult patterns by giving them more weight

Adaptive Boosting
• Continue adding weak learners until some desired low error has been achieved
• Each training pattern receives a weight that determines its probability of being selected
• Patterns that are classified correctly receive lower weight
• Initially the weights are uniform across the training set
• On iteration k, draw a training set according to the weights and train classifier Ck
• Repeat to obtain Ck+1
• Continue until a sufficient number kmax of classifiers has been generated

Algorithm AdaBoost
• The pattern weights W are initially uniform
• The error Ek is the weighted fraction of patterns whose class label Ck gets wrong, measured with respect to the distribution W
• The coefficient αk decreases as the error Ek increases
• The weights are divided by a normalizing constant to ensure W remains a distribution
• hk(x) denotes the class decision of Ck

AdaBoost Ensemble Decision
• The final classification is given by g(x) = Σ_{k=1}^{kmax} α_k h_k(x)
• In the two-class case this is the same as taking sgn[g(x)]
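The weight update and the ensemble decision above can be made concrete with a short sketch. Below is a minimal two-class AdaBoost implementation (labels +1/-1) consistent with these slides; the use of decision stumps as the weak learners, the stopping test, and all function and variable names are illustrative assumptions rather than part of the original algorithm figure.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, k_max):
    """X: (n, d) array of patterns; y: array of +1/-1 class labels."""
    n = len(X)
    W = np.full(n, 1.0 / n)                        # weights initially uniform
    learners, alphas = [], []
    for _ in range(k_max):
        # train the next weak learner Ck on the weighted training set
        C_k = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=W)
        h = C_k.predict(X)
        E_k = np.sum(W[h != y])                    # error measured w.r.t. distribution W
        if E_k == 0.0 or E_k >= 0.5:               # stop if perfect or no better than chance
            break
        alpha_k = 0.5 * np.log((1.0 - E_k) / E_k)  # alpha decreases as E increases
        W *= np.exp(-alpha_k * y * h)              # raise weight of misclassified patterns
        W /= W.sum()                               # normalize so W remains a distribution
        learners.append(C_k)
        alphas.append(alpha_k)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    # g(x) = sum_k alpha_k h_k(x); the two-class decision is sgn[g(x)]
    g = sum(a * C.predict(X) for a, C in zip(alphas, learners))
    return np.sign(g)

Because misclassified patterns have their weights raised at every round, later weak learners concentrate on the difficult patterns, which is the behaviour the slides describe.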
Illustration of AdaBoost
• The weak learner is a threshold applied to one or the other axis
• The ensemble decision boundary is shown in green
• The current base learner is the dashed black line
• The size of each circle indicates the weight assigned to that point

AdaBoost Error Rates
• The ensemble error rate keeps decreasing
• The individual classifiers perform worse on the weighted training sets

Learning with Queries
• The data set is unlabeled
• A process for labeling exists, but it is costly
– e.g., handwritten numeral classification
• Present a pattern to an oracle (teacher), who will label the pattern without error

Learning with Queries
• Active learning can result in more accurate classifiers than i.i.d. sampling
• (Figure panels:)
1. Bayes classifier for two Gaussians
2. Nearest-neighbor classifier with 30 samples and i.i.d. sampling
3. Queries posed for the extremes of feature space, midway between two points already selected

References
• Duda, Hart and Stork, Pattern Classification, 2nd ed., Wiley, 2001
• Bishop, Pattern Recognition and Machine Learning, Springer, 2006
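As a rough illustration of the query strategy on the Learning with Queries slides, here is a minimal sketch of active learning with a costly oracle, in which each new query is posed midway between two already-labeled points from opposite classes; the oracle callable and all other names are hypothetical placeholders, not from the original lecture.

import numpy as np

def active_learn(X_pool, oracle, n_queries, seed=0):
    """X_pool: (n, d) array of unlabeled patterns; oracle(x) is assumed to
    return the true class label (0 or 1) without error, at some cost."""
    rng = np.random.default_rng(seed)
    # seed the process with a few randomly chosen, oracle-labeled points
    labeled = list(rng.choice(len(X_pool), size=4, replace=False))
    labels = {i: oracle(X_pool[i]) for i in labeled}

    for _ in range(n_queries):
        pos = [i for i in labeled if labels[i] == 1]
        neg = [i for i in labeled if labels[i] == 0]
        if not pos or not neg:                 # need one example of each class first
            break
        # pose the query midway between a point from each class
        midpoint = (X_pool[rng.choice(pos)] + X_pool[rng.choice(neg)]) / 2.0
        dists = np.linalg.norm(X_pool - midpoint, axis=1)
        dists[labeled] = np.inf                # never re-query an already labeled point
        j = int(np.argmin(dists))
        labels[j] = oracle(X_pool[j])          # costly call to the oracle (teacher)
        labeled.append(j)
    return labeled, labels

Queries chosen this way concentrate the costly labels near the boundary between the two classes, which is why, as the slides note, active learning can give a more accurate classifier than i.i.d. sampling of the pool.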

