Outline
• Inefficiency with Bagging
• Improve the Efficiency of Bagging
• Intuition: Education in China
• AdaBoost Algorithm
• AdaBoost Example: αt = ln 2
• How To Choose αt in AdaBoost?
• Optimization View for Choosing αt
• AdaBoost: A Greedy Approach to Optimize the Exponential Function
• Empirical Study of AdaBoost
• Bias-Variance Tradeoff for AdaBoost

Boosting
Rong Jin

Inefficiency with Bagging
[Diagram: training set D → bootstrap sampling → D1, D2, …, Dk → classifiers h1, h2, …, hk, combined as Pr(c | hi, x)]
• Inefficiency with bootstrap sampling:
  – Every example has an equal chance to be sampled
  – No distinction between "easy" examples and "difficult" examples
• Inefficiency with model combination:
  – A constant weight for each classifier
  – No distinction between accurate classifiers and inaccurate classifiers

Improve the Efficiency of Bagging
• Better sampling strategy
  – Focus on the examples that are difficult to classify correctly
• Better combination strategy
  – Accurate models should be assigned larger weights

Intuition: Education in China
• Training examples: (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4)
• Classifier 1 → mistakes on (X1, Y1) and (X3, Y3)
• Classifier 1 + Classifier 2 → mistake on (X1, Y1)
• Classifier 1 + Classifier 2 + Classifier 3 → no training mistakes!
• But this may be overfitting to the training data!

AdaBoost Algorithm
[Figure: pseudocode of the AdaBoost algorithm]

AdaBoost Example: αt = ln 2
• D0: uniform weights (1/5, 1/5, 1/5, 1/5, 1/5) over (x1, y1), …, (x5, y5)
• Sample {(x5, y5), (x3, y3), (x1, y1)} → train h1 → update weights
  (the weights of misclassified examples, here x1 and x3, are multiplied by e^{αt} = 2 and then renormalized)
• D1: (2/7, 1/7, 2/7, 1/7, 1/7)
• Sample {(x3, y3), (x1, y1)} → train h2 → update weights (h2 misclassifies x3)
• D2: (2/9, 1/9, 4/9, 1/9, 1/9)
• Sample … (and so on)

How To Choose αt in AdaBoost?
• Consider how to construct the best distribution Dt+1(i), given Dt(i) and ht:
  1. Dt+1(i) should be significantly different from Dt(i)
  2. Dt+1(i) should create a situation in which classifier ht performs poorly

$$D_{t+1} = \arg\max_{D \in [0,1]^m} \mathrm{KL}(D \,\|\, D_t) = \arg\max_{D \in [0,1]^m} \sum_{i=1}^m D(i)\,\ln\frac{D(i)}{D_t(i)}$$
$$\text{s.t.}\quad \sum_{i=1}^m D(i) = 1, \qquad \sum_{i=1}^m D(i)\, I\big(h_t(x_i) = y_i\big) = \tfrac{1}{2}$$

• Solution:
$$\alpha_t = \frac{1}{2}\ln\frac{1+r_t}{1-r_t}, \qquad r_t = \sum_i D_t(i)\, y_i\, h_t(x_i)$$

Optimization View for Choosing αt
• ht(x): X → {1, −1}; a basis (weak) classifier
• HT(x): a linear combination of basis classifiers
• Goal: minimize the training error
• Approximate the training error with an exponential function

$$H_T(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \cdots + \alpha_T h_T(x) = H_{T-1}(x) + \alpha_T h_T(x)$$
$$err = \frac{1}{N}\sum_{i=1}^N I\big(H_T(x_i) \neq y_i\big) \;\le\; \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_T(x_i)\big)$$

AdaBoost: A Greedy Approach to Optimize the Exponential Function
• Exponential cost function
• Use the inductive form HT(x) = HT−1(x) + αT hT(x):

$$\begin{aligned}
err &\le \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_T(x_i)\big) \\
&= \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_{T-1}(x_i)\big)\,\exp\big(-\alpha_T\, y_i\, h_T(x_i)\big) \\
&= \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_{T-1}(x_i)\big)\Big[\exp(-\alpha_T)\, I\big(h_T(x_i) = y_i\big) + \exp(\alpha_T)\, I\big(h_T(x_i) \neq y_i\big)\Big]
\end{aligned}$$

• Minimize the exponential function: the first term covers the data points that hT(x) predicts correctly, the second term the data points that hT(x) predicts incorrectly. The minimizer gives

$$\alpha_t = \frac{1}{2}\ln\frac{1+r_t}{1-r_t} \quad\text{and}\quad D_{t+1}(x_i) = \frac{D_t(x_i)\,\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}$$

• AdaBoost is a greedy approach → overfitting?
  – Empirical studies show that AdaBoost is robust in general
  – AdaBoost tends to overfit with noisy data

Empirical Study of AdaBoost
• AdaBoosting decision trees
  – Generate 50 decision trees through the AdaBoost procedure
  – Linearly combine the decision trees using the weights computed by the AdaBoost algorithm
• In general:
  – AdaBoost = Bagging > C4.5
  – AdaBoost usually needs fewer classifiers than Bagging

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both model variance and model bias
[Figure: bias and variance for a single decision tree, bagged decision trees, and AdaBoosted decision trees]
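The update rule on the slides, αt = ½ ln((1+rt)/(1−rt)) together with Dt+1(xi) = Dt(xi) exp(−αt yi ht(xi)) / Zt, can be turned into a short program. Below is a minimal NumPy sketch, not the lecture's code: it assumes decision stumps as the weak learners (the slides boost decision trees) and reweights examples directly instead of resampling them as in the worked example; `train_stump`, `adaboost`, and `predict` are illustrative names.

```python
import numpy as np

def train_stump(X, y, D):
    """Pick the single-feature threshold stump with the lowest D-weighted error."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thresh, sign, -sign)
                err = np.sum(D[pred != y])          # weighted training error
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    _, j, thresh, sign = best
    return lambda X: np.where(X[:, j] >= thresh, sign, -sign)

def adaboost(X, y, T=10):
    """Return (alphas, stumps) defining H(x) = sum_t alpha_t h_t(x)."""
    n = X.shape[0]
    D = np.full(n, 1.0 / n)              # D_0: uniform weights
    alphas, stumps = [], []
    for _ in range(T):
        h = train_stump(X, y, D)
        pred = h(X)
        r = np.sum(D * y * pred)         # r_t = sum_i D_t(i) y_i h_t(x_i)
        if r >= 1.0:                     # weak learner already perfect
            alphas.append(1.0)
            stumps.append(h)
            break
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                     # normalize by Z_t
        alphas.append(alpha)
        stumps.append(h)
    return alphas, stumps

def predict(alphas, stumps, X):
    H = sum(a * h(X) for a, h in zip(alphas, stumps))
    return np.sign(H)
```

On a toy 1-D set such as X = [[0],[1],[2],[3],[4],[5]], y = [1, 1, −1, −1, 1, 1], which no single stump can separate, three boosting rounds already let the combined H(x) fit the training set.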
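The constraint used in "How To Choose αt in AdaBoost?" (Σi D(i) I(ht(xi) = yi) = 1/2) can be checked numerically: after the weight update with αt = ½ ln((1+rt)/(1−rt)), classifier ht has weighted error exactly 1/2 under Dt+1, i.e. it performs no better than chance on the reweighted data. A small self-contained check with made-up labels and predictions (one mistake among four points):

```python
import numpy as np

# Toy round: 4 points, h_t makes one mistake (on point 0).
D = np.full(4, 0.25)                        # D_t: uniform
y    = np.array([ 1, -1,  1, -1])
pred = np.array([-1, -1,  1, -1])           # h_t misclassifies point 0

r = np.sum(D * y * pred)                    # r_t = 0.5
alpha = 0.5 * np.log((1 + r) / (1 - r))     # alpha_t = 0.5 * ln 3

D_next = D * np.exp(-alpha * y * pred)
D_next /= D_next.sum()                      # normalize by Z_t

# Under D_{t+1}, h_t's weighted error is 1/2 (up to float rounding):
err_next = np.sum(D_next[pred != y])
print(err_next)
```

The misclassified point's weight rises from 1/4 to 1/2, so the single mistake alone carries half of the new distribution's mass, which is exactly what forces the next weak learner to do something different.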