UT Dallas CS 6375 - 13.ensemble

Machine Learning
CS6375 --- Spring 2015
Ensemble Learning
Instructor: Yang Liu

Ensembles
• Old goal
  – learn one good model (decision tree, Bayes, neural net, kNN, etc.)
• New goal
  – learn a good set of models
  – Can we achieve better performance by combining classifiers?
• A good example of the interplay between theory and practice in machine learning

Ensemble
• Given training samples S
• Generate multiple hypotheses h_l
• Optionally: determine corresponding weights w_l
• Classify new points according to: Σ_l w_l h_l > θ

Ensembles of Neural Networks (or any supervised learner)
• Ensembles often produce accuracy gains
• Can combine "classifiers" of various types, e.g., decision trees, rule sets, neural networks, etc.
[Figure: several networks receive the same INPUT; a combiner merges their outputs into a single OUTPUT]

Combining Multiple Models
Some simple ideas:
• Simple (unweighted) votes
• Weighted votes, e.g., weighted by tuning-set accuracy
• Train a combining function
Examples:
• Netflix prize
• Many state-of-the-art systems for speech recognition, machine translation, computer vision, etc.

This Lecture
Focus on two methods:
• Bagging
• Boosting

Some Methods for Producing "Uncorrelated" Members of an Ensemble
• For effective combination, the systems should be different
• k times, randomly choose (with replacement) N examples from a training set of size N
  – give each training set to a standard ML algorithm
  – "Bagging" by Breiman (MLJ, 1996)
• Reweight the examples each cycle (if wrong, increase the weight; else decrease it)
  – "AdaBoost" by Freund & Schapire (1995, 1996)

Bagging: Bootstrap Aggregation [Leo Breiman (1994)]
Take repeated bootstrap samples from the training set D.
Bootstrap sampling: given a set D containing N training examples, create D' by drawing N examples at random with replacement from D.
Bagging algorithm (a code sketch follows below):
Training:
  – Create k bootstrap samples D_1, ..., D_k (each drawn with replacement).
  – Train a distinct classifier on each D_i.
Testing:
  – Classify a new instance using each of the k classifiers.
  – Make the final decision by majority vote or averaging.
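A minimal Python sketch of this recipe (not from the lecture; it assumes NumPy arrays, scikit-learn's DecisionTreeClassifier as the base learner, and integer class labels):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_train(X, y, k=50, seed=0):
    """Train k classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)              # draw N examples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classify new instances by majority vote over the k classifiers."""
    votes = np.stack([m.predict(X) for m in models])  # shape (k, n_test)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Any supervised learner could replace the decision tree here; a tree is used only because, as discussed below, bagging helps most with unstable base learners.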
Bagging Results
Breiman, "Bagging Predictors", Berkeley Statistics Department TR #421, 1994
Misclassification rate (%):

  Data set        single   bagged   decrease
  waveform         29.0     19.4      33%
  heart            10.0      5.3      47%
  breast cancer     6.0      4.2      30%
  ionosphere       11.2      8.6      23%
  diabetes         23.4     18.8      20%
  glass            32.0     24.9      22%
  soybean          14.5     10.6      27%

How Many Bootstrap Samples?
Breiman, "Bagging Predictors", Berkeley Statistics Department TR #421, 1994
Bagged misclassification rate (%):

  No. bootstrap replicates   Misclassification rate
  10                         21.8
  25                         19.5
  50                         19.4
  100                        19.4

When Will Bagging Improve Accuracy?
• It depends on the stability of the base-level classifiers.
• A learner is unstable if a small change to the training set D causes a large change in the output hypothesis φ.
  – If small changes in D cause large changes in φ, then bagging will improve performance.
• Bagging helps unstable procedures, but can hurt the performance of stable procedures.
• Neural nets and decision trees are unstable.
• k-NN and naïve Bayes classifiers are stable.

Boosting
• Weight all training samples equally
• Train a model on the training set
• Compute the model's error on the training set
• Increase the weights of the training cases the model gets wrong
• Train a new model on the re-weighted training set
• Re-compute the errors on the weighted training set
• Increase the weights again on the cases the model gets wrong
• Repeat until tired (100+ iterations)
• Final model: weighted prediction of each model

A Formal View of Boosting
AdaBoost
Example: Round 1, Round 2, Round 3, Final Hypothesis
Boosting Performance
[These slides give the formal AdaBoost algorithm, a worked example over three rounds, and a performance plot; the equations and figures are not preserved in this text preview.]

Dealing with Weighted Examples in an ML Algorithm
Two approaches:
1. Sample from this probability distribution and train as normal (i.e., create a probability distribution from the weights, then sample from it to create an unweighted training set).
2. Alter the learning algorithm so that it counts weighted examples rather than just examples,
   e.g., change accuracy = (# correct) / (# total)
   to weighted accuracy = (Σ w_i over correct examples) / (Σ w_i over all examples).
#2 is preferred – it avoids sampling effects.

Reweighting vs. Resampling
• Example weights can be hard to deal with: some learning methods cannot use weights on examples.
• We can resample instead: draw a bootstrap sample from the data, with the probability of drawing each example proportional to its weight.
• Reweighting usually works better, but resampling is easier to implement.

Empirical Studies (from Freund & Schapire; reprinted in Dietterich's AI Magazine paper)
[Scatter plots: error rate of C4.5 vs. error rate of bagged (boosted) C4.5, and error rate of AdaBoost vs. error rate of bagging; each point is one data set.]
• Boosting and bagging helped almost always!
• On average, boosting was slightly better(?)

Large Empirical Study of Bagging vs. Boosting
Opitz & Maclin (UW CS PhDs), JAIR Vol. 11, pp. 169-198, 1999, www.jair.org/abstracts/opitz99a.html
• Bagging is almost always better than a single decision tree or ANN
• Boosting can be much better than bagging
• However, boosting can sometimes be harmful (too much emphasis on "outliers"?)

Boosting often gives better test-set results, even when (and after) the training error reaches 0
• Freund & Schapire: theory for "weak learners" in the late 1980s
  – weak learner: performance on any training set is slightly better than chance prediction
[Plot labeled "Boosting": error on unweighted examples vs. number of cycles, with separate train and test curves.]
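The boosting loop and the weighted-error bookkeeping above can be made concrete with a short AdaBoost sketch (not the lecture's code; it assumes NumPy arrays, binary labels in {-1, +1}, scikit-learn decision stumps as the weak learner, and handles the example weights via sample_weight, i.e., approach #2 above):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, rounds=100):
    """AdaBoost for binary labels y in {-1, +1}, using depth-1 trees ("stumps")."""
    n = len(X)
    w = np.full(n, 1.0 / n)                       # start with equal example weights
    stumps, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = max(w[pred != y].sum(), 1e-12)      # weighted training error
        if err >= 0.5:                            # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)   # weight of this model in the final vote
        w *= np.exp(-alpha * y * pred)            # increase weights of misclassified cases
        w /= w.sum()                              # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Final hypothesis: sign of the weighted vote of all weak learners."""
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))

With approach #1 (resampling) instead, each round would draw a bootstrap sample with probabilities proportional to w and fit the stump on that unweighted sample.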
Boosting vs. Bagging
• Bagging does not work as well with stable models; boosting might still help.
• Boosting might hurt performance on noisy datasets; bagging does not have this problem.
• On average, boosting helps more than bagging, but it is also more common for boosting to hurt performance.
• Bagging is easier to parallelize.

Boosting/Bagging/Etc. Wrap-up
• An easy-to-use and usually highly effective technique
  – consider it when applying ML to practical problems
• Does reduce the "comprehensibility" of the models
• Also increases runtime, but cycles are usually much cheaper than examples

Other System Combination
• Train different classifiers
• Train classifiers using different feature sets
Other approaches:
• Stacked learning (meta-classifier)
  – Train a classifier that uses the hypotheses of the base classifiers as its features
• Error-correcting output codes

Random Forests (Breiman, MLJ 2001; related to Ho, 1995)
A variant of bagging. Let N = # of examples, F = # of features, and i = some number << F.
Repeat k times (see the sketch after this list):
• Draw N examples with replacement and put them in the training set
• Build a decision tree, but in each recursive call:
  – Choose (without replacement) i features
  – Choose the best of these i features to split on
• Do NOT prune

More on Random Forests
• Increasing i
  – Increases …
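A minimal sketch of this recipe (an illustration under stated assumptions, not the lecture's code: scikit-learn's max_features option on an unpruned DecisionTreeClassifier performs the "choose i features at each split" step, and i = sqrt(F) is used as a common default for the unspecified i << F):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_train(X, y, k=100, i=None, seed=0):
    """Train k unpruned trees; each split considers only i randomly chosen features."""
    rng = np.random.default_rng(seed)
    n, F = X.shape
    if i is None:
        i = max(1, int(np.sqrt(F)))                    # common heuristic for i << F
    trees = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)               # N examples drawn with replacement
        tree = DecisionTreeClassifier(max_features=i)  # i random features per split; no pruning
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

Prediction is by majority vote over the trees, exactly as in the bagging_predict sketch earlier.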

