# ILLINOIS CS 446 - 092817.2 (9 pages)

- Pages: 9
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning


CS 446 Machine Learning, Fall 2017
Lecture 10: Bagging, Random Forests, Surrogate Losses
Lecturer: Sanmi Koyejo. Scribe: Yiming Gao (yimingg2). Sept 28th, 2017.

Outline: boosting recap, bagging, ERM.

## 1 Recap

### 1.1 Error of a random classifier

When $h(x_i) = \pm 1$ with probability $0.5$ each, the empirical error rate is

$$\mathrm{Err}(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i \neq h(x_i)],$$

and $\mathbb{E}[\mathrm{Err}(h)] = 0.5$.

### 1.2 Double CV / in-the-loop CV

Figure 1: Double CV and in-the-loop CV.

### 1.3 A boosting example

Recall that the goal is

$$f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x),$$

where $\alpha_m$ is the weight at step $m$ and $\phi_m(x)$ is the weak classifier.

Figure 2: Step 1. Figure 3: Step 2.

The process goes similarly at every step: we fit a weighted loss

$$L_m = \frac{1}{n} \sum_{i=1}^{n} w_i^{(m)} \, \ell(y_i, \phi(x_i)), \qquad w_i^{(m)} \geq 0.$$

When our classifier cannot use a weighted loss, i.e., can only minimize

$$L_n = \sum_{i=1}^{n} \ell(y_i, \phi(x_i)),$$

we can instead let the probability of selecting example $i$ be proportional to $w_i$, sampling with replacement.

## 2 Multi-class Boosting: AdaBoost / SAMME

SAMME (Stage-wise Additive Modeling using a Multi-class Exponential loss) handles labels $y \in \{1, 2, \ldots, C\}$ with the same steps as binary boosting:

- Start with $w_i = 1/n$.
- For $m = 1, \ldots, K$:
  - Fit $\phi_m(x)$ using the weights $w^{(m)}$.
  - $\mathrm{Error}_m = \sum_{i=1}^{n} w_i \, \mathbb{1}[y_i \neq \phi_m(x_i)] \,\big/\, \sum_{i=1}^{n} w_i$
  - $\alpha_m = \log\frac{1 - \mathrm{Error}_m}{\mathrm{Error}_m} + \log(C - 1)$
  - $w_i \leftarrow w_i \exp\!\left(\alpha_m \, \mathbb{1}[y_i \neq \phi_m(x_i)]\right)$
  - Normalize the $w_i$.
- Return $f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x)$ and predict by weighted majority vote.

Boosting is an ensemble learner, i.e., a combination of multiple sub-learners.

## 3 Bagging (Bootstrap Aggregating)

### 3.1 Bootstrap sampling

Given $D_n$, create a bootstrap sample by drawing $(x_i, y_i)_{i=1}^{n}$ uniformly with replacement from $D_n$, yielding $D^{(1)}, D^{(2)}, \ldots, D^{(k)}$.

Aggregate step: for $m = 1, \ldots, b$, draw a bootstrap sample $D^{(m)} \sim D_n$ and learn a model $\phi_m$ on $D^{(m)}$; then

$$f(x) = \sum_{m=1}^{b} \phi_m(x).$$

Bagging works by reducing variance.

### 3.2 Jensen's inequality

For a convex function $\phi$ we have $\phi(\mathbb{E}[X]) \leq \mathbb{E}[\phi(X)]$.

Figure 4: Jensen's inequality.

Jensen's inequality generalizes the statement that a secant line of a convex function lies above the graph. For example, $\mathrm{Var}(\mathrm{Avg}) \leq \mathrm{Avg}(\mathrm{Var})$.

Bagging works best if the classifiers are unbiased and if the $D^{(m)}$ are approximately uncorrelated.
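As a toy sketch of the bootstrap-aggregation step (illustrative code, not from the lecture; the Gaussian data, the `bootstrap_sample` helper, and the choice of the sample mean as each sub-model are all assumptions made for the example):

```python
import random
import statistics

random.seed(0)

def bootstrap_sample(data):
    # One D^(m): n draws, uniformly with replacement, from D_n.
    return [random.choice(data) for _ in data]

# Toy data set: noisy observations of a quantity whose true mean is 5.0.
data = [random.gauss(5.0, 2.0) for _ in range(200)]

# Each "model" phi_m is simply the mean of its bootstrap sample.
b = 50
estimates = [statistics.mean(bootstrap_sample(data)) for _ in range(b)]

# Aggregate step: average the b bootstrap estimates.
bagged = statistics.mean(estimates)
```

Averaging the $b$ estimates does not change what is being estimated, but it shrinks the variability of the aggregate relative to any single estimate, which is the variance-reduction effect bagging relies on.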
Unlike boosting, bagging has no need for weak classifiers: weak classifiers have high bias and high variance, which conflicts with the goal of bagging, namely reducing variance.

### 3.3 Bagging for decision trees: Random Forests

One way to reduce the variance of an estimate is to average together many estimates. For example, we can train $M$ different trees on different subsets of the data, chosen randomly with replacement, and then compute the ensemble.
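A minimal sketch of this tree-bagging idea (illustrative only, not the lecture's code; the one-dimensional data and the `fit_stump` / `bagged_ensemble` helpers are invented for the example, with depth-1 "stumps" standing in for full decision trees):

```python
import random

random.seed(1)

def fit_stump(points):
    # Depth-1 tree on 1-D inputs: pick the (threshold, sign) pair
    # with the fewest training mistakes.
    best = None
    for t in sorted({x for x, _ in points}):
        for sign in (+1, -1):
            errs = sum(1 for x, y in points
                       if (sign if x >= t else -sign) != y)
            if best is None or errs < best[0]:
                best = (errs, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def bagged_ensemble(points, n_trees=25):
    # Train each stump on its own bootstrap sample, then majority-vote.
    stumps = [fit_stump([random.choice(points) for _ in points])
              for _ in range(n_trees)]
    return lambda x: 1 if sum(s(x) for s in stumps) >= 0 else -1

# Toy labeled data: label is +1 exactly when x >= 0.
points = [(i / 50 - 1, 1 if i >= 50 else -1) for i in range(100)]
f = bagged_ensemble(points)
```

A real random forest additionally samples a random subset of features at each split; with a single feature there is nothing to subsample, so this sketch shows only the bagging half.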

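The SAMME loop from Section 2 can also be sketched end to end (again illustrative: the deterministic three-class data and the weighted-stump weak learner `fit_weighted_stump` are assumptions, not part of the notes):

```python
import math

C = 3  # number of classes, labeled 0..C-1

def fit_weighted_stump(points, w):
    # Weak learner: 1-D stump predicting the weighted-majority
    # class on each side of a threshold.
    best = None
    for t in sorted({x for x, _ in points}):
        left, right = [0.0] * C, [0.0] * C
        for (x, y), wi in zip(points, w):
            (left if x < t else right)[y] += wi
        l = max(range(C), key=lambda c: left[c])
        r = max(range(C), key=lambda c: right[c])
        err = sum(wi for (x, y), wi in zip(points, w)
                  if (l if x < t else r) != y)
        if best is None or err < best[0]:
            best = (err, t, l, r)
    _, t, l, r = best
    return lambda x: l if x < t else r

def samme(points, K=10):
    n = len(points)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(K):
        phi = fit_weighted_stump(points, w)
        err = sum(wi for (x, y), wi in zip(points, w) if phi(x) != y) / sum(w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard the logs
        alpha = math.log((1 - err) / err) + math.log(C - 1)
        w = [wi * math.exp(alpha * (phi(x) != y))
             for (x, y), wi in zip(points, w)]
        total = sum(w)
        w = [wi / total for wi in w]            # normalize
        ensemble.append((alpha, phi))
    def predict(x):
        votes = [0.0] * C                       # weighted majority vote
        for alpha, phi in ensemble:
            votes[phi(x)] += alpha
        return max(range(C), key=lambda c: votes[c])
    return predict

# Three classes on the line: 0, 1, 2 from left to right.
points = [(i / 45 - 1, 0 if i < 30 else 1 if i < 60 else 2)
          for i in range(90)]
f = samme(points)
```

No single stump can carve out three intervals with one threshold, so each round's $\alpha_m$ up-weights the class the previous stump missed and the weighted vote combines the stumps into a multi-class predictor.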