CS 446: Machine Learning, Fall 2017
Lecture 10: Bagging, Random Forests, Surrogate Losses
Lecturer: Sanmi Koyejo          Scribe: Yiming Gao (yimingg2), Sept 28th, 2017

Outline: Boosting recap, Bagging, ERM.

1 Recap

1.1 Error of a random classifier

Let $h$ predict $h(x_i) = +1$ with probability $0.5$ and $-1$ otherwise, independently of the data. The empirical error rate is
\[
\mathrm{Err}(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[y_i \neq h(x_i)],
\]
and, since each prediction is wrong with probability $1/2$ regardless of $y_i$,
\[
\mathbb{E}[\mathrm{Err}(h)] = 0.5.
\]

1.2 Double CV (in-the-loop CV)

Figure 1: Double CV / in-the-loop CV.

1.3 A boosting example

Recall that the goal is
\[
f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x),
\]
where $\alpha_m$ is the weight at step $m$ and $\phi_m(x)$ is the weak classifier fit at step $m$.

Figure 2: Step 1.
Figure 3: Step 2.

The process is the same at every step: at step $m$ we fit a weighted loss function
\[
L_{m,n} = \frac{1}{n} \sum_{i=1}^{n} w_i^{(m)} \, \ell(y_i, \phi(x_i)), \qquad w_i^{(m)} \geq 0.
\]
When our base classifier cannot use a weighted loss, i.e., it can only minimize the unweighted loss
\[
L_n = \sum_{i=1}^{n} \ell(y_i, \phi(x_i)),
\]
we can instead make the probability of selecting example $i$ proportional to $w_i^{(m)}$ and train on a sample drawn with replacement, as in the sketch below.

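The following is a minimal sketch, not from the lecture, of that resampling workaround; the helper name fit_with_resampling, the scikit-learn-style base_learner (any object with a fit method), and the numpy arrays X, y, w are all illustrative assumptions.

```python
import numpy as np

def fit_with_resampling(base_learner, X, y, w, rng=None):
    """Approximate weighted-loss fitting by resampling: draw n examples
    with replacement, picking example i with probability proportional to
    its boosting weight w_i, then fit the base learner on that sample."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    p = np.asarray(w, dtype=float)
    p = p / p.sum()                                  # weights -> probabilities
    idx = rng.choice(n, size=n, replace=True, p=p)   # sampling with replacement
    return base_learner.fit(X[idx], y[idx])          # ordinary unweighted fit
```
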



2 Multi-class Boosting: AdaBoost (SAMME)

SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss) handles labels $\phi(x) \in \{1, 2, \dots, C\}$ with the same steps as binary boosting:

- Start with $w_i = 1/n$ for all $i$.
- For $m = 1, \dots, K$:
  - Fit $\phi_m(x)$ using the weights $w^{(m)}$.
  - $\mathrm{Error}_m = \dfrac{\sum_{i=1}^{n} w_i \, \mathbf{1}[y_i \neq \phi_m(x_i)]}{\sum_{i=1}^{n} w_i}$
  - $\alpha_m = \log \dfrac{1 - \mathrm{Error}_m}{\mathrm{Error}_m} + \log(C - 1)$
  - $w_i \leftarrow w_i \exp\!\big(\alpha_m \, \mathbf{1}[y_i \neq \phi_m(x_i)]\big)$
  - Normalize the $w_i$.
- Return $f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x)$.

Predict using a weighted majority vote. Boosting is an ensemble learner, i.e., a combination of multiple sub-learners. A code sketch of these updates follows.

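Here is a compact sketch of the SAMME loop above, under assumptions not stated in the notes: labels are encoded as integers $0, \dots, C-1$, the weak learner is a depth-1 decision tree from scikit-learn, and an early-exit guard (not part of the pseudocode above) stops when a round is perfect or no better than chance. The names samme_fit and samme_predict are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_fit(X, y, n_classes, K=50):
    """SAMME boosting sketch: returns a list of (alpha_m, classifier_m)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start with uniform weights
    ensemble = []
    for _ in range(K):
        clf = DecisionTreeClassifier(max_depth=1)  # weak learner (a stump)
        clf.fit(X, y, sample_weight=w)
        miss = clf.predict(X) != y                 # indicator of mistakes
        err = np.dot(w, miss) / w.sum()
        if err <= 0.0 or err >= 1.0 - 1.0 / n_classes:
            break                                  # perfect, or no better than chance
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1)
        w = w * np.exp(alpha * miss)               # up-weight the mistakes
        w = w / w.sum()                            # normalize
        ensemble.append((alpha, clf))
    return ensemble

def samme_predict(ensemble, X, n_classes):
    """Weighted majority vote over the boosted classifiers."""
    votes = np.zeros((len(X), n_classes))
    for alpha, clf in ensemble:
        votes[np.arange(len(X)), clf.predict(X)] += alpha
    return np.argmax(votes, axis=1)
```

For comparison, scikit-learn's AdaBoostClassifier implements essentially this multi-class update.
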

3 Bagging (Bootstrap Aggregating)

3.1 Bootstrap sampling

Given $D_n = \{(x_i, y_i)\}_{i=1}^{n}$, create a bootstrap sample by drawing $n$ pairs uniformly from $D_n$ with replacement; repeating this gives bootstrap samples $D^{(1)}, D^{(2)}, \dots, D^{(b)}$.

Aggregate step:
- For $m = 1, \dots, b$:
  - Draw a bootstrap sample $D^{(m)}$ from $D_n$.
  - Learn a model $\phi_m$ on $D^{(m)}$.
- Combine the models,
\[
f(x) = \frac{1}{b} \sum_{m=1}^{b} \phi_m(x)
\]
  (for classification, take a majority vote over the $\phi_m$).

Bagging works by reducing variance; a code sketch of the aggregate step follows.

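Below is a minimal sketch of the aggregate step, assuming a classification setting with non-negative integer labels, numpy arrays, and a scikit-learn-style base learner; the factory argument make_learner (e.g., lambda: DecisionTreeClassifier()) is a placeholder and not part of the notes.

```python
import numpy as np

def bag_fit(make_learner, X, y, b=25, rng=None):
    """Train b models, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    models = []
    for _ in range(b):
        idx = rng.integers(0, n, size=n)   # bootstrap: n indices, with replacement
        models.append(make_learner().fit(X[idx], y[idx]))
    return models

def bag_predict(models, X):
    """Aggregate by majority vote over the bagged models."""
    preds = np.stack([m.predict(X) for m in models])        # shape (b, n_test)
    majority = lambda column: np.bincount(column).argmax()  # most common label
    return np.apply_along_axis(majority, 0, preds)
```

For regression, the vote would simply be replaced by the mean of the $b$ predictions.
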

3.2 Jensen's Inequality

For a convex function $g$ we have
\[
g(\mathbb{E}[X]) \leq \mathbb{E}[g(X)].
\]

Figure 4: Jensen's inequality.

Jensen's inequality generalizes the statement that a secant line of a convex function lies above the graph. For example,
\[
\mathrm{Var}(\mathrm{Avg}) \leq \mathrm{Avg}(\mathrm{Var}),
\]
i.e., the variance of the averaged predictor is at most the average variance of the individual predictors (a short derivation is given below).

Bagging:
- works best if the base classifiers are (approximately) unbiased;
- works best if the $D^{(m)}$ are approximately uncorrelated;
- unlike boosting, has no need for weak classifiers. Weak classifiers have high bias, which conflicts with the goal of bagging: bagging reduces variance but does not reduce bias.

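To make the claim concrete, here is a short derivation, not in the original notes, for the case the notes emphasize: $b$ uncorrelated predictors $\phi_1, \dots, \phi_b$ with variances $\sigma_m^2$. Then
\[
\mathrm{Var}\!\left(\frac{1}{b} \sum_{m=1}^{b} \phi_m\right)
= \frac{1}{b^2} \sum_{m=1}^{b} \sum_{l=1}^{b} \mathrm{Cov}(\phi_m, \phi_l)
= \frac{1}{b^2} \sum_{m=1}^{b} \sigma_m^2
= \frac{1}{b} \left( \frac{1}{b} \sum_{m=1}^{b} \sigma_m^2 \right)
\leq \frac{1}{b} \sum_{m=1}^{b} \sigma_m^2,
\]
where the second equality uses $\mathrm{Cov}(\phi_m, \phi_l) = 0$ for $m \neq l$, and the right-hand side is exactly $\mathrm{Avg}(\mathrm{Var})$. So averaging $b$ approximately uncorrelated predictors not only satisfies the inequality but shrinks the average variance by a factor of $b$.
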
3.3 Bagging for Decision Trees: Random Forests

One way to reduce the variance of an estimate is to average together many estimates. For example, we can train $M$ different trees on different subsets of the data, chosen randomly with replacement, and then compute the ensemble average of their predictions.

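The following is a brief usage sketch of the random-forest idea above, using scikit-learn's off-the-shelf RandomForestClassifier rather than anything from the lecture; the iris dataset and the hyperparameter values are placeholders chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Each tree is trained on a bootstrap sample of the training data (bagging);
# limiting the features considered at each split further decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
```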