Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
Edited for CS 536 Fall 2005 – Rutgers University, Ahmed [email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 15: Combining Multiple Learners

Rationale
- No Free Lunch theorem: there is no single algorithm that is always the most accurate.
- Instead, generate a group of base-learners which, when combined, has higher accuracy.
- Different learners can differ in their
  - algorithms
  - hyperparameters
  - representations (modalities)
  - training sets
  - subproblems

Voting
- Linear combination of the base-learner outputs:
  y = \sum_{j=1}^{L} w_j d_j,   where w_j \ge 0 and \sum_{j=1}^{L} w_j = 1
- For classification: y_i = \sum_{j=1}^{L} w_j d_{ji}

Bayesian perspective
- Voting approximates Bayesian model combination:
  P(C_i | x) = \sum_{all models M_j} P(C_i | x, M_j) P(M_j)
- If the d_j are iid, the bias does not change and the variance decreases by a factor of L:
  E[y] = E[(1/L) \sum_j d_j] = (1/L) \sum_j E[d_j] = E[d_j]
  Var(y) = Var((1/L) \sum_j d_j) = (1/L^2) Var(\sum_j d_j) = (1/L^2) \cdot L \cdot Var(d_j) = (1/L) Var(d_j)

Bagging
- Use bootstrapping to generate L training sets, each drawn at random with replacement, and train one base-learner on each (Breiman, 1996); a minimal sketch follows below.
- Combine by voting (average or median for regression).
- Unstable algorithms profit from bagging. An algorithm is unstable if small changes in the training set cause large differences in the generated learner, i.e., the algorithm has high variance. Examples: decision trees, multilayer perceptrons.
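To make the bagging procedure above concrete, here is a minimal sketch in Python. It is not from the slides: the function names are illustrative, and the use of scikit-learn's DecisionTreeClassifier as the unstable base-learner is an assumption; the combination rule is a plain majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # an unstable base-learner

def bagging_fit(X, y, L=25, seed=0):
    """Train L base-learners, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(L):
        idx = rng.integers(0, n, size=n)          # bootstrap sample of size n
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Combine the base-learners by unweighted majority vote (w_j = 1/L).
    Assumes integer class labels 0..K-1."""
    votes = np.stack([t.predict(X) for t in learners])          # shape (L, n)
    maj = [np.bincount(col).argmax() for col in votes.T.astype(int)]
    return np.array(maj)
```

For regression, the majority vote would be replaced by the average or median of the base-learner outputs, as noted on the Bagging slide.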
Boosting
- In bagging, generating complementary base-learners is left to chance and to the instability of the learning method.
- In boosting, we actively try to generate complementary base-learners, by training the next learner on the mistakes of the previous learners.
- Schapire (1990): combine three weak learners to obtain a strong learner, where a weak learner is one whose error probability is less than 1/2.

AdaBoost
- Adaptive Boosting: generate a sequence of base-learners, each focusing on the errors of the previous one (Freund and Schapire, 1996); the sampling probability of an example is increased when it is misclassified and decreased otherwise (a minimal sketch is given at the end of these notes).
- AdaBoost works because it increases the margin at each step as the sample probabilities change.
- Not all algorithms benefit from boosting: the base-learner has to be simple and not too accurate (high variance).

Mixture of Experts
- Voting where the weights are input-dependent (gating) (Jacobs et al., 1991):
  y = \sum_{j=1}^{L} w_j d_j,   with the weights w_j depending on the input x
- The experts or the gating can be nonlinear.
- In an RBF network, each local fit is a constant w_{ih}, the second-layer weight; in MoE, each local fit is a linear function of x, a local expert:
  w_{ih}^t = v_{ih}^T x^t

MoE as Models Combined
- Radial gating:
  g_h^t = exp[-||x^t - m_h||^2 / (2 s_h^2)] / \sum_l exp[-||x^t - m_l||^2 / (2 s_l^2)]
- Softmax gating:
  g_h^t = exp[m_h^T x^t] / \sum_l exp[m_l^T x^t]

Stacking
- The combiner f() is another learner, trained on the outputs of the base-learners (Wolpert, 1992).

Cascading
- Use d_j only if the preceding base-learners are not confident.
- Cascade the learners in order of complexity, so that a more complex (and costlier) learner is consulted only when the simpler ones are uncertain; a minimal sketch follows below.
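The following is a minimal sketch of the cascading rule just described; it is not from the slides. It assumes fitted scikit-learn-style classifiers ordered from simplest to most complex, takes the confidence of learner j to be its highest posterior probability, and uses illustrative per-stage thresholds theta_j.

```python
import numpy as np

def cascade_predict(learners, thresholds, x):
    """Query learners in order of increasing complexity; stop at the first one
    whose confidence (max posterior) reaches its threshold theta_j."""
    x = np.asarray(x).reshape(1, -1)
    for learner, theta in zip(learners, thresholds):
        probs = learner.predict_proba(x)[0]
        if probs.max() >= theta:                  # confident enough: decide here
            return learner.classes_[probs.argmax()]
    # no learner was confident enough: fall back to the last (most complex) one
    return learners[-1].classes_[probs.argmax()]
```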
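Finally, to make the AdaBoost scheme described earlier concrete, here is a minimal sketch for binary labels in {-1, +1}. It is not the book's pseudocode: it uses the common log-weight formulation of the learner weights, and the choice of scikit-learn decision stumps as weak learners is an illustrative assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, L=50):
    """AdaBoost for y in {-1, +1}: re-weight examples the current learner gets wrong."""
    n = len(X)
    p = np.full(n, 1.0 / n)                        # sample probabilities, initially uniform
    learners, alphas = [], []
    for _ in range(L):
        stump = DecisionTreeClassifier(max_depth=1)        # a weak learner
        stump.fit(X, y, sample_weight=p)
        pred = stump.predict(X)
        err = np.sum(p[pred != y])                 # weighted error; must stay below 1/2
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1.0 - err) / (err + 1e-12))
        p *= np.exp(-alpha * y * pred)             # shrink correct, grow misclassified
        p /= p.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """Weighted vote of the base-learners."""
    agg = sum(a * l.predict(X) for l, a in zip(learners, alphas))
    return np.sign(agg)
```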