CS 5751 Machine Learning
Ensemble Learning

Ensemble Learning
• what is an ensemble?
• why use an ensemble?
• selecting component classifiers
• selecting a combining mechanism
• some results

A Classifier Ensemble
[diagram: the input features feed Classifier 1, Classifier 2, ..., Classifier N; each produces a class prediction, and a Combiner turns those predictions into the final class prediction]

Key Ensemble Questions
Which components to combine?
• different learning algorithms
• same learning algorithm trained in different ways
• same learning algorithm trained the same way
How to combine classifications?
• majority vote
• weighted (confidence of classifier) vote
• weighted (confidence in classifier) vote
• learned combiner
What makes a good (accurate) ensemble?

Why Do Ensembles Work?
Hansen and Salamon, 1990
If we can assume the classifiers err independently and each has accuracy > 50%, we can push ensemble accuracy arbitrarily high by combining more classifiers (for example, a majority vote over 21 independent classifiers that are each 70% accurate errs only about 2.6% of the time).
Key assumption: classifiers are independent in their predictions
• not a very reasonable assumption
• more realistic: for the data points where classifiers predict with > 50% accuracy, accuracy can be pushed arbitrarily high (some data points are just too hard)

What Makes a Good Ensemble?
Krogh and Vedelsby, 1995
Can show that the error of an ensemble is mathematically related to the error and diversity of its components:
  E = Ê − D̂
where E is the error of the entire ensemble, Ê is the average error of the component classifiers, and D̂ is a term measuring the diversity of the components.
Effective ensembles have accurate and diverse components.

Ensemble Mechanisms - Components
• Separate learning methods
  – not often used
  – very effective in certain problems (e.g., protein folding; Rost and Sander; Zhang)
• Same learning method
  – generally still need to vary something externally
    • exception: some good results with neural networks
  – most often, the data set used for training is varied:
    • Bagging (Bootstrap Aggregating), Breiman
    • Boosting, Freund & Schapire
      – Ada, Freund & Schapire
      – Arcing, Breiman

Ensemble Mechanisms - Combiners
• Voting
• Averaging (if predictions are not just 0/1)
• Weighted averaging
  – base the weights on confidence in the component
• Learned combiner
  – Stacking, Wolpert: a general combiner
  – RegionBoosting, Maclin: a piecewise combiner

Bagging
Varies the data set: each training set is a bootstrap sample.
bootstrap sample - a set of examples selected (with replacement) from the original sample
Algorithm:
  for k = 1 to #classifiers
    train′ = bootstrap sample of the train set
    create classifier_k using train′ as its training set
  combine classifications using simple voting

Weak Learning
Schapire showed that a set of weak learners (learners with accuracy > 50%, but not much greater) can be combined into a strong learner.
Idea: weight the data set based on how well we have predicted the data points so far
  – data points predicted accurately - low weight
  – data points mispredicted - high weight
Result: focuses the components on the portion of the data space not previously well predicted

Boosting - Ada
Varies the weights on the training data.
Algorithm:
  for each data point i: set weight w_i to 1/#datapoints
  for k = 1 to #classifiers
    generate classifier_k with the current weighted train set
    ε_k = sum of the w_i of the misclassified points
    β_k = (1 − ε_k) / ε_k
    multiply the weights of all misclassified points by β_k
    normalize the weights to sum to 1
  combine: weighted vote, where the weight for classifier_k is log(β_k)
Q: what to do if ε_k = 0.0 or ε_k > 0.5?

Boosting - Arcing
Samples the data set (like Bagging), but the probability of a data point being chosen is weighted (like Boosting).
m_i = the number of mistakes made on point i by the previous classifiers
Probability of selecting point i:
  prob_i = (1 + m_i^4) / Σ_{j=1..N} (1 + m_j^4)
The value 4 was chosen empirically.
Combine using voting.
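To make the three data-varying schemes above concrete, here is a minimal sketch of bagging. It assumes scikit-learn-style components with fit/predict, integer class labels, and decision trees as the base learner; the function names are illustrative, not from the slides.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging(X, y, n_classifiers=10, seed=0):
        """Train each component on a bootstrap sample of the training set."""
        rng = np.random.default_rng(seed)
        n = len(X)
        classifiers = []
        for _ in range(n_classifiers):
            idx = rng.integers(0, n, size=n)   # n draws with replacement
            classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return classifiers

    def simple_vote(classifiers, X):
        """Combine classifications by majority vote (assumes integer labels)."""
        preds = np.stack([clf.predict(X) for clf in classifiers])  # shape (k, n)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)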
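Next, a sketch of the Ada weighting loop in the same style. The decision-stump component and the handling of the slide's open question (simply stop when ε_k is 0 or exceeds 0.5) are assumptions for illustration.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def ada_boost(X, y, n_classifiers=10):
        """Reweight the data: multiply misclassified points' weights by beta_k."""
        n = len(X)
        w = np.full(n, 1.0 / n)                    # w_i = 1/#datapoints
        classifiers, vote_weights = [], []
        for _ in range(n_classifiers):
            clf = DecisionTreeClassifier(max_depth=1)  # a weak (stump) component
            clf.fit(X, y, sample_weight=w)
            miss = clf.predict(X) != y
            eps = w[miss].sum()                    # eps_k = summed weight of misclassified points
            if eps == 0.0 or eps > 0.5:            # the slide's open question; here we just stop
                break
            beta = (1.0 - eps) / eps
            w[miss] *= beta                        # raise the weight of misclassified points
            w /= w.sum()                           # normalize the weights to sum to 1
            classifiers.append(clf)
            vote_weights.append(np.log(beta))      # classifier_k votes with weight log(beta_k)
        return classifiers, vote_weights

    def weighted_vote(classifiers, vote_weights, X, classes):
        """Each classifier adds log(beta_k) to the score of the class it predicts."""
        scores = np.zeros((len(X), len(classes)))
        for clf, vw in zip(classifiers, vote_weights):
            pred = clf.predict(X)
            for ci, c in enumerate(classes):
                scores[pred == c, ci] += vw
        return np.asarray(classes)[scores.argmax(axis=1)]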
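Finally, a sketch of arcing's weighted resampling, using the selection probability defined above (again with illustrative names).

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def arcing(X, y, n_classifiers=10, seed=0):
        """Resample like bagging, but draw point i with probability ~ (1 + m_i^4)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        mistakes = np.zeros(n)                   # m_i: mistakes by previous classifiers
        classifiers = []
        for _ in range(n_classifiers):
            weights = 1.0 + mistakes ** 4        # the 4 was chosen empirically (Breiman)
            idx = rng.choice(n, size=n, p=weights / weights.sum())
            clf = DecisionTreeClassifier().fit(X[idx], y[idx])
            mistakes += (clf.predict(X) != y)    # update the mistake counts on all points
            classifiers.append(clf)
        return classifiers                       # combine using simple voting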
Some Results - BP, C4.5 Components
Error rates (%); a trailing - apparently marks an ensemble that failed to improve on the corresponding single classifier:

  Dataset     C4.5   BP     BagC4  BagBP  AdaC4  AdaBP  ArcC4  ArcBP
  letter      14.0   18.0    7.0   10.5    4.1    5.7    3.9    4.6
  segment      3.7    6.6    3.0    5.4    1.7    3.5    1.5    3.3
  promoter    12.8    5.3   10.6    4.0    6.8    4.5    6.4    4.6
  kr-vs-kp     0.6    2.3    0.6    0.8    0.3    0.4    0.4    0.3
  splice       5.9    4.7    5.4    3.9    5.1    4.0    5.3    4.2
  breastc      5.0    3.4    3.7    3.4-   3.5    3.8-   3.5    4.0-
  housev       3.6    4.9    3.6    4.1    5.0-   5.1-   4.8-   5.3-

Some Theories on Bagging/Boosting
Error = Bayes optimal error + bias + variance
(Bayes optimal error = the noise error)
Theories:
• Bagging can reduce the variance part of the error
• Boosting can reduce both the variance AND the bias parts of the error
• Bagging will hardly ever increase error
• Boosting may increase error
• Boosting is susceptible to noise
• Boosting increases margins

Combiner - Stacking
Idea: generate the component (level 0) classifiers with part of the data (half, three quarters)
train a combiner (level 1) classifier to combine the predictions of the components, using the remaining data
retrain the component classifiers with all of the training data
In practice, often equivalent to voting (a sketch follows the next slide)

Combiner - RegionBoost
• Train a "weight" classifier for each component classifier
• The "weight" classifier predicts how likely it is that a point will be predicted correctly by its component
• "weight" classifiers: k-Nearest Neighbor, Backprop
• Combiner: generate each component classifier's prediction and weight it using the corresponding "weight" classifier
• Small gains in
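A minimal sketch of the stacking recipe above. The three-quarters/one-quarter split, the particular component learners, and the logistic-regression combiner are all assumptions for illustration; feeding the combiner the components' raw class predictions is the simplest possible level-1 representation.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression

    def stacking(X, y, seed=0):
        # level 0: train the components with part of the data (three quarters here)
        X0, X1, y0, y1 = train_test_split(X, y, test_size=0.25, random_state=seed)
        components = [DecisionTreeClassifier(random_state=seed), KNeighborsClassifier()]
        for clf in components:
            clf.fit(X0, y0)
        # level 1: train the combiner on the components' predictions for the held-out data
        held_out_preds = np.column_stack([clf.predict(X1) for clf in components])
        combiner = LogisticRegression().fit(held_out_preds, y1)
        # finally, retrain the components with all of the training data
        for clf in components:
            clf.fit(X, y)
        return components, combiner

    def stacked_predict(components, combiner, X):
        return combiner.predict(np.column_stack([clf.predict(X) for clf in components]))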

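And a sketch of the RegionBoost idea: pair each component (here, trained on bootstrap samples) with a k-NN "weight" model that predicts where that component is correct, then weight each component's vote by its predicted correctness at the query point. The bagged components, k = 5, and all names are assumptions, not details from the slides.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def region_boost(X, y, n_classifiers=10, seed=0):
        """Pair each component with a k-NN model of where it predicts correctly."""
        rng = np.random.default_rng(seed)
        n = len(X)
        pairs = []
        for _ in range(n_classifiers):
            idx = rng.integers(0, n, size=n)                 # bootstrap sample
            clf = DecisionTreeClassifier().fit(X[idx], y[idx])
            correct = (clf.predict(X) == y).astype(int)      # 1 where the component is right
            weigher = KNeighborsClassifier(n_neighbors=5).fit(X, correct)
            pairs.append((clf, weigher))
        return pairs

    def region_boost_predict(pairs, X, classes):
        """Weight each component's vote by its estimated correctness at each point."""
        scores = np.zeros((len(X), len(classes)))
        for clf, weigher in pairs:
            pred = clf.predict(X)
            # P(component correct at x); assumes the component is right on at
            # least one training point, so class 1 exists in the weigher
            p_correct = weigher.predict_proba(X)[:, list(weigher.classes_).index(1)]
            for ci, c in enumerate(classes):
                scores[pred == c, ci] += p_correct[pred == c]
        return np.asarray(classes)[scores.argmax(axis=1)]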
