# ILLINOIS CS 446 - 092817.2 (9 pages)


- Pages: 9
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning


CS446 Machine Learning, Fall 2017

**Lecture 10: Bagging, Random Forests, Surrogate Losses**

Lecturer: Sanmi Koyejo. Scribe: Yiming Gao (yimingg2). Sept 28th, 2017.

Outline: Boosting recap, Bagging, ERM.

## 1 Recap

### 1.1 Error of a random classifier

When $h(x_i) = \pm 1$, each with probability 0.5, the error rate is

$$\mathrm{Err}(h) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[y_i \neq h(x_i)]$$

and $\mathbb{E}[\mathrm{Err}(h)] = 0.5$.

### 1.2 Double CV / In-the-loop CV

(Figure 1: Double CV, in-the-loop CV.)

### 1.3 A Boosting Example

Recall that the goal is

$$f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x),$$

where $\alpha_m$ is the weight at step $m$ and $\phi_m(x)$ is the weak classifier.

(Figure 2: Step 1. Figure 3: Step 2.)

The process goes similarly at every step: we fit a weighted loss function

$$L_m = \frac{1}{n}\sum_{i=1}^{n} w_i^{(m)}\, l(y_i, \phi(x_i)), \qquad w_i^{(m)} \geq 0.$$

When our classifier cannot use a weighted loss, i.e. can only minimize

$$L_n = \sum_{i=1}^{n} l(y_i, \phi(x_i)),$$

we can instead let the probability of selecting example $i$ be proportional to $w_i$ (sampling with replacement).

## 2 Multi-class Boosting (AdaBoost / SAMME)

SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss) handles $\phi(x) \in \{1, 2, \ldots, C\}$ with the same steps as binary boosting:

- Start with $w_i = 1/n$ for all $i$.
- For $m = 1, \ldots, K$:
  - Fit $\phi_m(x)$ using the weights $w^{(m)}$.
  - $\mathrm{Error}_m = \dfrac{\sum_{i=1}^{n} w_i \mathbb{1}[y_i \neq \phi_m(x_i)]}{\sum_{i=1}^{n} w_i}$
  - $\alpha_m = \log\dfrac{1 - \mathrm{Error}_m}{\mathrm{Error}_m} + \log(C - 1)$
  - $w_i \leftarrow w_i \exp\big(\alpha_m \mathbb{1}[y_i \neq \phi_m(x_i)]\big)$
  - Normalize the $w_i$.
- Return $f(x) = \sum_{m=1}^{K} \alpha_m \phi_m(x)$ and predict using a weighted majority vote.

Boosting is an ensemble learner, i.e. a combination of multiple sub-learners.

## 3 Bagging (Bootstrap Aggregating)

### 3.1 Bootstrap Sampling

Given $D_n$, create a bootstrap sample by sampling $\{x_i, y_i\}_{i=1}^{n}$ with replacement, uniformly from $D_n$, giving resamples $D^{(1)}, D^{(2)}, \ldots, D^{(k)}$.

Aggregate step: for $j = 1, \ldots, b$, draw a bootstrap sample $D^{(j)} \sim D_n$ and learn a model $\phi_j$ on $D^{(j)}$. Then combine:

$$f(x) = \sum_{j=1}^{b} \phi_j(x).$$

This works for reducing variance.

### 3.2 Jensen's Inequality

For a convex function $\varphi$ we have

$$\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)].$$

(Figure 4: Jensen's inequality.)

Jensen's inequality generalizes the statement that a secant line of a convex function lies above the graph. For example,

$$\mathrm{Var}(\mathrm{Avg}) \leq \mathrm{Avg}(\mathrm{Var}).$$

Bagging:

- works best if the classifiers are unbiased;
- works best if the $D^{(m)}$ are approximately uncorrelated;
- unlike boosting, has no need for weak classifiers.
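A minimal numerical sketch of the variance-reduction claim (not from the notes; the toy "model" here is just the mean of a bootstrap resample, so the effect of aggregation is easy to measure):

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_estimate(data, n_boot, rng):
    """Average the estimates from n_boot bootstrap resamples (the aggregate step)."""
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choice(data, size=n, replace=True)  # bootstrap sample D^(j)
        estimates.append(resample.mean())                  # "model" fit on D^(j)
    return np.mean(estimates)

# Repeat the whole experiment many times to estimate the variance of each estimator.
single, bagged = [], []
for _ in range(500):
    data = rng.normal(loc=1.0, scale=2.0, size=30)
    single.append(rng.choice(data, size=30, replace=True).mean())  # one bootstrap model
    bagged.append(bagged_estimate(data, n_boot=25, rng=rng))

print("var(single) =", np.var(single))
print("var(bagged) =", np.var(bagged))
```

Averaging 25 bootstrap models washes out the resampling noise of any single model, so the bagged estimator's variance comes out smaller, matching $\mathrm{Var}(\mathrm{Avg}) \leq \mathrm{Avg}(\mathrm{Var})$.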
Weak classifiers have high bias and high variance, which conflicts with bagging's goal of reducing variance.

### 3.3 Bagging for Decision Trees: Random Forests

One way to reduce the variance of an estimate is to average together many estimates. For example, we can train $M$ different trees on different subsets of the data, chosen randomly with replacement, and then compute the ensemble

$$f(x) = \frac{1}{M}\sum_{m=1}^{M} f_m(x),$$

where $f_m$ is the $m$-th tree. This technique is called bagging. In practice, random forests subsample the $N$ examples and also subsample the dimensions (features).

(Figure 5: Bagging for decision trees.)

## 4 Surrogate Loss Functions

Error rate: $\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[y_i \neq h(x_i)]$. Consider the binary case (see Figure 6) with $h(x) = \mathrm{sign}(f(x))$:

$$\mathrm{ErrorRate}(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[y_i f(x_i) \leq 0],$$

which means $y f > 0$ is correct and $y f < 0$ is incorrect.

Surrogate: another loss function $l : \mathbb{R} \to \mathbb{R}$ with the property that

$$\min_f \frac{1}{n}\sum_{i=1}^{n} l(y_i f(x_i)) \;\Rightarrow\; \min_h \mathbb{E}\big[\mathbb{1}[y_i h(x_i) \leq 0]\big], \qquad h = \mathrm{sign}(f).$$

Examples:

- log loss: $\log(1 + e^{-y f(x)})$ (see Figure 7)
- exp loss: $\exp(-y f(x))$

(Figure 6: 0-1 loss. Figure 7: log loss.)

## 5 Algorithms

Classification so far:

- Heuristic:
  - Decision Tree
  - Nearest Neighbors
- Probabilistic:
  - Decision Tree (Murphy 2012, Chapter 16.4)
  - Nearest Neighbors (Murphy 2012, Chapter 1.4.2)
  - Logistic Regression
  - Linear Regression
  - Naive Bayes
  - Boosting (Murphy 2012, Chapter 16.9)
- ERM (Empirical Risk Minimization):
  - Logistic Regression
  - Linear Regression
  - Boosting

## Bibliography

Murphy, K. P. (2012). *Machine Learning: A Probabilistic Perspective*. MIT Press.
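As a small illustration of the surrogate losses from Section 4 (a sketch, not part of the original notes), the 0-1, log, and exp losses can be evaluated as functions of the margin $m = y f(x)$:

```python
import math

def zero_one(margin):
    """0-1 loss on the margin y*f(x): 1 if misclassified, else 0."""
    return 1.0 if margin <= 0 else 0.0

def log_loss(margin):
    """Log (logistic) surrogate: log(1 + e^{-y f(x)})."""
    return math.log1p(math.exp(-margin))

def exp_loss(margin):
    """Exponential surrogate: e^{-y f(x)} (the AdaBoost loss)."""
    return math.exp(-margin)

for m in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"margin {m:+.1f}: 0-1 {zero_one(m):.0f}, "
          f"log {log_loss(m):.3f}, exp {exp_loss(m):.3f}")
```

Both surrogates are convex and decreasing in the margin, so they penalize confident mistakes most heavily; the exp loss in particular upper-bounds the 0-1 loss pointwise.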
