Outline
• Inefficiency with Bagging
• Improve the Efficiency of Bagging
• Intuition: Education in China
• AdaBoost Algorithm
• AdaBoost Example: αt = ln 2
• How To Choose αt in AdaBoost?
• Optimization View for Choosing αt
• AdaBoost: A Greedy Approach to Optimize the Exponential Function
• Empirical Study of AdaBoost
• Bias-Variance Tradeoff for AdaBoost

Boosting
Rong Jin

Inefficiency with Bagging
[Diagram: training set D → bootstrap sampling → D1, D2, …, Dk → classifiers h1, h2, …, hk, combined as Pr(c | hi, x)]
• Inefficiency with bootstrap sampling:
  – Every example has an equal chance to be sampled
  – No distinction between "easy" examples and "difficult" examples
• Inefficiency with model combination:
  – A constant weight for each classifier
  – No distinction between accurate classifiers and inaccurate classifiers

Improve the Efficiency of Bagging
• Better sampling strategy
  – Focus on the examples that are difficult to classify correctly
• Better combination strategy
  – Accurate models should be assigned larger weights

Intuition: Education in China
• Training examples: (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4)
• Classifier 1 → mistakes on (X1, Y1) and (X3, Y3)
• Classifier 1 + Classifier 2 → mistake on (X1, Y1)
• Classifier 1 + Classifier 2 + Classifier 3 → no training mistakes!
• But this may be overfitting to the training data!

AdaBoost Algorithm
[Figure: pseudocode of the AdaBoost algorithm]

AdaBoost Example: αt = ln 2
• D0: uniform weights (1/5, 1/5, 1/5, 1/5, 1/5) over (x1, y1), …, (x5, y5)
• Sample {(x5, y5), (x3, y3), (x1, y1)} → train h1 → update weights
  (the weights of misclassified examples, here x1 and x3, are multiplied by e^{αt} = 2 and then renormalized)
• D1: (2/7, 1/7, 2/7, 1/7, 1/7)
• Sample {(x3, y3), (x1, y1)} → train h2 → update weights (h2 misclassifies x3)
• D2: (2/9, 1/9, 4/9, 1/9, 1/9)
• Sample … (and so on)

How To Choose αt in AdaBoost?
• Consider how to construct the best distribution Dt+1(i), given Dt(i) and ht:
  1. Dt+1(i) should be significantly different from Dt(i)
  2. Dt+1(i) should create a situation in which classifier ht performs poorly

$$D_{t+1} = \arg\max_{D \in [0,1]^m} \mathrm{KL}(D \,\|\, D_t) = \arg\max_{D \in [0,1]^m} \sum_{i=1}^m D(i)\,\ln\frac{D(i)}{D_t(i)}$$
$$\text{s.t.}\quad \sum_{i=1}^m D(i) = 1, \qquad \sum_{i=1}^m D(i)\, I\big(h_t(x_i) = y_i\big) = \tfrac{1}{2}$$

• Solution:
$$\alpha_t = \frac{1}{2}\ln\frac{1+r_t}{1-r_t}, \qquad r_t = \sum_i D_t(i)\, y_i\, h_t(x_i)$$

Optimization View for Choosing αt
• ht(x): X → {1, −1}; a basis (weak) classifier
• HT(x): a linear combination of basis classifiers
• Goal: minimize the training error
• Approximate the training error with an exponential function

$$H_T(x) = \alpha_1 h_1(x) + \alpha_2 h_2(x) + \cdots + \alpha_T h_T(x) = H_{T-1}(x) + \alpha_T h_T(x)$$
$$err = \frac{1}{N}\sum_{i=1}^N I\big(H_T(x_i) \neq y_i\big) \;\le\; \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_T(x_i)\big)$$

AdaBoost: A Greedy Approach to Optimize the Exponential Function
• Exponential cost function
• Use the inductive form HT(x) = HT−1(x) + αT hT(x):

$$\begin{aligned}
err &\le \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_T(x_i)\big) \\
&= \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_{T-1}(x_i)\big)\,\exp\big(-\alpha_T\, y_i\, h_T(x_i)\big) \\
&= \frac{1}{N}\sum_{i=1}^N \exp\big(-y_i H_{T-1}(x_i)\big)\Big[\exp(-\alpha_T)\, I\big(h_T(x_i) = y_i\big) + \exp(\alpha_T)\, I\big(h_T(x_i) \neq y_i\big)\Big]
\end{aligned}$$

• Minimize the exponential function: the first term covers the data points that hT(x) predicts correctly, the second term the data points that hT(x) predicts incorrectly. The minimizer gives

$$\alpha_t = \frac{1}{2}\ln\frac{1+r_t}{1-r_t} \quad\text{and}\quad D_{t+1}(x_i) = \frac{D_t(x_i)\,\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}$$

• AdaBoost is a greedy approach → overfitting?
  – Empirical studies show that AdaBoost is robust in general
  – AdaBoost tends to overfit with noisy data

Empirical Study of AdaBoost
• AdaBoosting decision trees
  – Generate 50 decision trees through the AdaBoost procedure
  – Linearly combine the decision trees using the weights computed by the AdaBoost algorithm
• In general:
  – AdaBoost = Bagging > C4.5
  – AdaBoost usually needs fewer classifiers than Bagging

Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both model variance and model bias
[Figure: bias and variance for a single decision tree, bagged decision trees, and AdaBoosted decision trees]
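The update rule on the slides, αt = ½ ln((1+rt)/(1−rt)) together with Dt+1(xi) = Dt(xi) exp(−αt yi ht(xi)) / Zt, can be turned into a short program. Below is a minimal NumPy sketch, not the lecture's code: it assumes decision stumps as the weak learners (the slides boost decision trees) and reweights examples directly instead of resampling them as in the worked example; `train_stump`, `adaboost`, and `predict` are illustrative names.

```python
import numpy as np

def train_stump(X, y, D):
    """Pick the single-feature threshold stump with the lowest D-weighted error."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thresh, sign, -sign)
                err = np.sum(D[pred != y])          # weighted training error
                if best is None or err < best[0]:
                    best = (err, j, thresh, sign)
    _, j, thresh, sign = best
    return lambda X: np.where(X[:, j] >= thresh, sign, -sign)

def adaboost(X, y, T=10):
    """Return (alphas, stumps) defining H(x) = sum_t alpha_t h_t(x)."""
    n = X.shape[0]
    D = np.full(n, 1.0 / n)              # D_0: uniform weights
    alphas, stumps = [], []
    for _ in range(T):
        h = train_stump(X, y, D)
        pred = h(X)
        r = np.sum(D * y * pred)         # r_t = sum_i D_t(i) y_i h_t(x_i)
        if r >= 1.0:                     # weak learner already perfect
            alphas.append(1.0)
            stumps.append(h)
            break
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()                     # normalize by Z_t
        alphas.append(alpha)
        stumps.append(h)
    return alphas, stumps

def predict(alphas, stumps, X):
    H = sum(a * h(X) for a, h in zip(alphas, stumps))
    return np.sign(H)
```

On a toy 1-D set such as X = [[0],[1],[2],[3],[4],[5]], y = [1, 1, −1, −1, 1, 1], which no single stump can separate, three boosting rounds already let the combined H(x) fit the training set.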
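The constraint used in "How To Choose αt in AdaBoost?" (Σi D(i) I(ht(xi) = yi) = 1/2) can be checked numerically: after the weight update with αt = ½ ln((1+rt)/(1−rt)), classifier ht has weighted error exactly 1/2 under Dt+1, i.e. it performs no better than chance on the reweighted data. A small self-contained check with made-up labels and predictions (one mistake among four points):

```python
import numpy as np

# Toy round: 4 points, h_t makes one mistake (on point 0).
D = np.full(4, 0.25)                        # D_t: uniform
y    = np.array([ 1, -1,  1, -1])
pred = np.array([-1, -1,  1, -1])           # h_t misclassifies point 0

r = np.sum(D * y * pred)                    # r_t = 0.5
alpha = 0.5 * np.log((1 + r) / (1 - r))     # alpha_t = 0.5 * ln 3

D_next = D * np.exp(-alpha * y * pred)
D_next /= D_next.sum()                      # normalize by Z_t

# Under D_{t+1}, h_t's weighted error is 1/2 (up to float rounding):
err_next = np.sum(D_next[pred != y])
print(err_next)
```

The misclassified point's weight rises from 1/4 to 1/2, so the single mistake alone carries half of the new distribution's mass, which is exactly what forces the next weak learner to do something different.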