STEVENS CS 559 - CS 559 12th Set of Notes

CS 559: Machine Learning Fundamentals and Applications
12th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Outline
• Ensemble methods
  – Bagging
  – Boosting
  – Random forests
(Pattern Classification, Chapter 5)

Ensemble Methods
• Bagging (Breiman 1994, ...)
• Boosting (Freund and Schapire 1995, Friedman et al. 1998, ...)
• Random forests (Breiman 2001, ...)
• Predict the class label for unseen data by aggregating a set of predictions (classifiers learned from the training data)

General Idea
• Training data S → multiple data sets S1, S2, ..., Sn → multiple classifiers C1, C2, ..., Cn → combined classifier H

Ensemble Classifiers
• Basic idea: build different "experts" and let them vote
• Advantages:
  – Improved predictive performance
  – Different types of classifiers can be directly included
  – Easy to implement
  – Not too much parameter tuning
• Disadvantages:
  – The combined classifier is not transparent (a black box)
  – Not a compact representation

Why do they work?
• Suppose there are 25 base classifiers
• Each classifier has error rate ε = 0.35
• Assume independence among the classifiers
• The majority vote is wrong only when at least 13 of the 25 base classifiers are wrong, so the probability that the ensemble makes a wrong prediction is
  Σ_{i=13}^{25} C(25, i) ε^i (1 - ε)^(25-i) ≈ 0.06

Bagging
• Training
  – Given a dataset S, at each iteration i, a training set Si is sampled with replacement from S (i.e., bootstrapping)
  – A classifier Ci is learned for each Si
• Classification: given an unseen sample X
  – Each classifier Ci returns its class prediction
  – The bagged classifier H counts the votes and assigns the class with the most votes to X

Bagging
• Bagging works because it reduces variance by voting/averaging
  – In some pathological hypothetical situations the overall error might increase
  – Usually, the more classifiers the better
• Problem: we only have one dataset
• Solution: generate new datasets of size n by bootstrapping, i.e., sampling with replacement
• Can help a lot if the data is noisy

The Challenge: Reducing Churn
• U.S. wireless telecom company
• Goal: discover whether a customer will churn or not during the next few months
• Churn is the response variable, such that:
  – Churn = 1 if the customer is churning
  – Churn = -1 if the customer is not churning

Opportunity for this Challenge
• Bagging and boosting: aggregating classifiers into a final rule
• Breiman (1996) found gains in accuracy by aggregating predictors built from reweighed versions of the learning set

Bagging
• Bagging = Bootstrap Aggregating
• Reweighing of the learning sets is done by drawing at random with replacement from the learning sets
• Predictors are aggregated by plurality voting

The Bagging Algorithm
• Draw B bootstrap samples, from which we derive:
  – B classifiers c_1, c_2, c_3, ..., c_B ∈ {-1, +1}
  – B estimated probabilities p_1, p_2, p_3, ..., p_B ∈ [0, 1]
• The aggregate classifier (sketched in code below) becomes
  c_bag(x) = sign( (1/B) Σ_{b=1}^{B} c_b(x) )   or   p_bag(x) = (1/B) Σ_{b=1}^{B} p_b(x)

[Diagram: the initial set is reweighed by drawing with replacement; drawing 1 trains Classifier 1, drawing 2 trains Classifier 2, and so on]

Aggregation
[Diagram: Classifiers 1, 2, 3, ..., T built from the initial set are combined through a sign function into the final rule]
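Both computations above are easy to check numerically. The following is a minimal Python sketch, not code from the notes: ensemble_error evaluates the binomial sum from the "Why do they work?" slide, and bag_predict implements the plurality-voting aggregate c_bag. The make_base_learner factory and its scikit-learn-style fit/predict interface are assumptions made for illustration.

    import math
    import numpy as np

    def ensemble_error(n=25, eps=0.35):
        # Majority vote of n independent classifiers, each with error rate eps,
        # fails only when a majority of them (i >= 13 for n = 25) fail:
        # sum_{i=13}^{25} C(25, i) * eps^i * (1 - eps)^(25 - i)
        k = n // 2 + 1
        return sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
                   for i in range(k, n + 1))

    print(f"{ensemble_error():.2f}")  # 0.06, as on the slide

    def bag_predict(X_train, y_train, X_test, make_base_learner, B=100, seed=0):
        # Bagging: learn a classifier c_b on each of B bootstrap samples of S,
        # then aggregate by plurality vote: c_bag(x) = sign((1/B) sum_b c_b(x)).
        # Assumes labels in {-1, +1}; a vote sum of 0 yields 0 (an unbroken tie).
        rng = np.random.default_rng(seed)
        n = len(X_train)
        votes = np.zeros(len(X_test))
        for _ in range(B):
            idx = rng.integers(0, n, size=n)     # S_b: n points drawn with replacement
            clf = make_base_learner()
            clf.fit(X_train[idx], y_train[idx])  # classifier c_b learned from S_b
            votes += clf.predict(X_test)         # accumulate the {-1, +1} votes
        return np.sign(votes)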
Boosting

Resources
• Slides based on:
  – Tutorial by Rob Schapire
  – Tutorial by Yoav Freund
  – Slides by Carlos Guestrin
  – Tutorial by Paul Viola
  – Tutorial by Ron Meir
  – Slides by Aurélie Lemmens
  – Slides by Zhuowen Tu
• Not in the DHS book

Code
• Antonio Torralba (object detection)
  – http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html
• GML AdaBoost
  – http://graphics.cs.msu.ru/ru/science/research/machinelearning/adaboosttoolbox
• Guy Leshem (AdaBoost and more)
  – http://shum.huji.ac.il/~gleshem/

Boosting
• Invented independently by Schapire (1989) and Freund (1990)
  – They later joined forces
• Main idea: train a strong classifier by combining weak classifiers
  – Practically useful
  – Theoretically interesting

Boosting
• Idea: given a set of weak learners, run them multiple times on (reweighted) training data, then let the learned classifiers vote
• At each iteration t:
  – Weight each training example by how incorrectly it was classified
  – Learn a hypothesis h_t (the one with the smallest error)
  – Choose a strength α_t for this hypothesis
• Final classifier: a weighted combination of the weak learners

Learning from Weighted Data
• Sometimes not all data points are equal
  – Some data points are more equal than others
• Consider a weighted dataset
  – D(i): the weight of the i-th training example (x_i, y_i)
  – Interpretations:
    • The i-th training example counts as D(i) examples
    • If we were to "resample" the data, we would get more samples of the "heavier" data points
• Now, in all calculations, the i-th training example counts as D(i) "examples" (see the weighted-stump sketch after the next slide)

Definition of Boosting
• Given a training set (x_1, y_1), ..., (x_m, y_m)
• y_i ∈ {-1, +1} is the correct label of instance x_i ∈ X
• For t = 1, ..., T:
  – construct a distribution D_t on {1, ..., m}
  – find a weak hypothesis h_t: X → {-1, +1} with small error ε_t on D_t
• Output the final hypothesis H_final
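To make "counting the i-th example D(i) times" concrete, here is a small sketch of one common weak learner, a decision stump trained on weighted data. It is an illustration under assumed conventions (labels in {-1, +1}, hypothetical function names), not code from the notes; each candidate stump is scored by its weighted error ε = Σ_i D(i) · 1[h(x_i) ≠ y_i].

    import numpy as np

    def weighted_error(preds, y, D):
        # eps = sum_i D(i) * 1[h(x_i) != y_i]: a misclassified point
        # contributes its weight D(i), not a flat 1/m.
        return np.sum(D * (preds != y))

    def stump_predict(X, stump):
        j, theta, s = stump
        return s * np.where(X[:, j] > theta, 1, -1)  # h(x) = s * sign(x_j - theta)

    def fit_stump(X, y, D):
        # Exhaustive search over feature j, threshold theta, and polarity s
        # for the stump with minimum weighted error under the distribution D.
        best_err, best_stump = np.inf, None
        for j in range(X.shape[1]):
            for theta in np.unique(X[:, j]):
                for s in (+1, -1):
                    stump = (j, theta, s)
                    err = weighted_error(stump_predict(X, stump), y, D)
                    if err < best_err:
                        best_err, best_stump = err, stump
        return best_err, best_stump  # eps_t and h_t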
AdaBoost
• Constructing D_t:
  – D_1(i) = 1/m
  – Given D_t and h_t:
    D_{t+1}(i) = D_t(i) · exp(-α_t y_i h_t(x_i)) / Z_t
    where Z_t is a normalization constant (chosen so that D_{t+1} sums to 1)
• Final hypothesis:
  H_final(x) = sign( Σ_t α_t h_t(x) )

The AdaBoost Algorithm
[Algorithm figure: the AdaBoost pseudocode; at each round t, the weak hypothesis h_t with minimum weighted error ε_t on D_t is selected]
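Putting the pieces together, here is a compact sketch of the AdaBoost loop just described, reusing fit_stump and stump_predict from the previous sketch. The strength α_t = ½ ln((1 - ε_t)/ε_t) is the standard choice from Freund and Schapire's algorithm, which the (lost) algorithm figure presumably spells out.

    import numpy as np

    def adaboost_fit(X, y, T=50):
        # X: (m, d) array of features; y: (m,) array of labels in {-1, +1}.
        m = len(y)
        D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
        ensemble = []                              # pairs (alpha_t, h_t)
        for t in range(T):
            eps, stump = fit_stump(X, y, D)        # h_t with minimum error on D_t
            if eps >= 0.5:                         # no better than chance: stop
                break
            eps = max(eps, 1e-12)                  # guard the log when eps = 0
            alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t = 1/2 ln((1-eps_t)/eps_t)
            margins = y * stump_predict(X, stump)  # +1 if correct, -1 if wrong
            D *= np.exp(-alpha * margins)          # up-weight the mistakes
            D /= D.sum()                           # dividing by Z_t: D_{t+1} sums to 1
            ensemble.append((alpha, stump))
        return ensemble

    def adaboost_predict(X, ensemble):
        # H_final(x) = sign( sum_t alpha_t h_t(x) )
        scores = sum(a * stump_predict(X, stump) for a, stump in ensemble)
        return np.sign(scores)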

