Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
Edited for CS 536 Fall 2005 – Rutgers University, Ahmed [email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 15: Combining Multiple Learners

Rationale
- No Free Lunch theorem: there is no single algorithm that is always the most accurate.
- Instead, generate a group of base-learners which, when combined, has higher accuracy.
- Different learners can differ in their
  - algorithms
  - hyperparameters
  - representations (modalities)
  - training sets
  - subproblems

Voting
- Linear combination of the base-learner outputs:
  y = \sum_{j=1}^{L} w_j d_j,   where w_j \ge 0 and \sum_{j=1}^{L} w_j = 1
- For classification: y_i = \sum_{j=1}^{L} w_j d_{ji}

Bayesian perspective
- Voting approximates Bayesian model combination:
  P(C_i | x) = \sum_{all models M_j} P(C_i | x, M_j) P(M_j)
- If the d_j are iid, the bias does not change and the variance decreases by a factor of L:
  E[y] = E[(1/L) \sum_j d_j] = (1/L) \sum_j E[d_j] = E[d_j]
  Var(y) = Var((1/L) \sum_j d_j) = (1/L^2) Var(\sum_j d_j) = (1/L^2) \cdot L \cdot Var(d_j) = (1/L) Var(d_j)

Bagging
- Use bootstrapping to generate L training sets, each drawn at random with replacement, and train one base-learner on each (Breiman, 1996); a minimal sketch follows below.
- Combine by voting (average or median for regression).
- Unstable algorithms profit from bagging. An algorithm is unstable if small changes in the training set cause large differences in the generated learner, i.e., the algorithm has high variance. Examples: decision trees, multilayer perceptrons.
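To make the bagging procedure above concrete, here is a minimal sketch in Python. It is not from the slides: the function names are illustrative, and the use of scikit-learn's DecisionTreeClassifier as the unstable base-learner is an assumption; the combination rule is a plain majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # an unstable base-learner

def bagging_fit(X, y, L=25, seed=0):
    """Train L base-learners, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(L):
        idx = rng.integers(0, n, size=n)          # bootstrap sample of size n
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    """Combine the base-learners by unweighted majority vote (w_j = 1/L).
    Assumes integer class labels 0..K-1."""
    votes = np.stack([t.predict(X) for t in learners])          # shape (L, n)
    maj = [np.bincount(col).argmax() for col in votes.T.astype(int)]
    return np.array(maj)
```

For regression, the majority vote would be replaced by the average or median of the base-learner outputs, as noted on the Bagging slide.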
Boosting
- In bagging, generating complementary base-learners is left to chance and to the instability of the learning method.
- In boosting, we actively try to generate complementary base-learners, by training the next learner on the mistakes of the previous learners.
- Schapire (1990): combine three weak learners to obtain a strong learner, where a weak learner is one whose error probability is less than 1/2.

AdaBoost
- Adaptive Boosting: generate a sequence of base-learners, each focusing on the errors of the previous one (Freund and Schapire, 1996); the sampling probability of an example is increased when it is misclassified and decreased otherwise (a minimal sketch is given at the end of these notes).
- AdaBoost works because it increases the margin at each step as the sample probabilities change.
- Not all algorithms benefit from boosting: the base-learner has to be simple and not too accurate (high variance).

Mixture of Experts
- Voting where the weights are input-dependent (gating) (Jacobs et al., 1991):
  y = \sum_{j=1}^{L} w_j d_j,   with the weights w_j depending on the input x
- The experts or the gating can be nonlinear.
- In an RBF network, each local fit is a constant w_{ih}, the second-layer weight; in MoE, each local fit is a linear function of x, a local expert:
  w_{ih}^t = v_{ih}^T x^t

MoE as Models Combined
- Radial gating:
  g_h^t = exp[-||x^t - m_h||^2 / (2 s_h^2)] / \sum_l exp[-||x^t - m_l||^2 / (2 s_l^2)]
- Softmax gating:
  g_h^t = exp[m_h^T x^t] / \sum_l exp[m_l^T x^t]

Stacking
- The combiner f() is another learner, trained on the outputs of the base-learners (Wolpert, 1992).

Cascading
- Use d_j only if the preceding base-learners are not confident.
- Cascade the learners in order of complexity, so that a more complex (and costlier) learner is consulted only when the simpler ones are uncertain; a minimal sketch follows below.
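The following is a minimal sketch of the cascading rule just described; it is not from the slides. It assumes fitted scikit-learn-style classifiers ordered from simplest to most complex, takes the confidence of learner j to be its highest posterior probability, and uses illustrative per-stage thresholds theta_j.

```python
import numpy as np

def cascade_predict(learners, thresholds, x):
    """Query learners in order of increasing complexity; stop at the first one
    whose confidence (max posterior) reaches its threshold theta_j."""
    x = np.asarray(x).reshape(1, -1)
    for learner, theta in zip(learners, thresholds):
        probs = learner.predict_proba(x)[0]
        if probs.max() >= theta:                  # confident enough: decide here
            return learner.classes_[probs.argmax()]
    # no learner was confident enough: fall back to the last (most complex) one
    return learners[-1].classes_[probs.argmax()]
```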
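Finally, to make the AdaBoost scheme described earlier concrete, here is a minimal sketch for binary labels in {-1, +1}. It is not the book's pseudocode: it uses the common log-weight formulation of the learner weights, and the choice of scikit-learn decision stumps as weak learners is an illustrative assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, L=50):
    """AdaBoost for y in {-1, +1}: re-weight examples the current learner gets wrong."""
    n = len(X)
    p = np.full(n, 1.0 / n)                        # sample probabilities, initially uniform
    learners, alphas = [], []
    for _ in range(L):
        stump = DecisionTreeClassifier(max_depth=1)        # a weak learner
        stump.fit(X, y, sample_weight=p)
        pred = stump.predict(X)
        err = np.sum(p[pred != y])                 # weighted error; must stay below 1/2
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1.0 - err) / (err + 1e-12))
        p *= np.exp(-alpha * y * pred)             # shrink correct, grow misclassified
        p /= p.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """Weighted vote of the base-learners."""
    agg = sum(a * l.predict(X) for l, a in zip(learners, alphas))
    return np.sign(agg)
```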