# ILLINOIS CS 446 - 092617.1 (7 pages)


- Pages: 7
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning


CS 446 Machine Learning, Fall 2017. Lecture 9: AdaBoost. Lecturer: Sanmi Koyejo. Scribe: Chaoyun Chen. September 26th, 2017.

## Recap

### 1. Decision Trees

**Idea.** Construct learners by splitting the input space into regions with axis-aligned splits.

**How to build a tree.** At every node, choose the split dimension and threshold by

$$(j^*, t^*) = \underset{j \in \{1, \dots, d\}}{\arg\min}\; \min_{t \in \mathcal{T}_j} \Big[ \operatorname{cost}\big(\{(x_i, y_i) : x_{ij} \le t\}\big) + \operatorname{cost}\big(\{(x_i, y_i) : x_{ij} > t\}\big) \Big],$$

where $j$ and $t$ range over dimensions and thresholds, and $\mathcal{T}_j$ is the set of candidate thresholds for dimension $j$. That is, we jointly minimize the cost of the left split and the right split. Example cost functions: entropy, Gini index, error rate.

**How to make predictions.** In the binary case we take a majority vote within each region $R_m$.

*Figure 1: Decision Tree Region Split.*

For multi-class problems we can still take a majority vote, labeling each region with its most popular class. For regression we can use the square loss.

**How to optimize a decision tree.** Decision trees tend to have low bias and high variance, similar to the nearest-neighbor classifier. Two approaches can improve these properties:

- **Pruning:** obtain a smaller tree by removing leaves.

  *Figure 2: Decision Tree Pruning.*

  *Why?* Pruning reduces overfitting (at the price of increased bias) by merging regions together. Test and validation error measure the degree of overfitting, which helps us decide whether removing an edge will improve prediction.

  *How?* Choose an evaluation metric. Since we care about accuracy and error rate for the overall model, we can compare them on a validation set with and without the merge.

- **Randomized averaging (bagging):** use a random forest to reduce variance; this will be covered in the next few weeks.

### 2. Random Classifiers

Let $h$ be a random classifier that ignores its input:

$$h(x) = \begin{cases} +1 & \text{with probability } 0.5 \\ -1 & \text{with probability } 0.5. \end{cases}$$

Its error rate is $0.5$, which we can prove as follows:

$$\operatorname{Err}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i \ne h(x_i)] = 1 - \operatorname{Acc}_n(h),$$

$$\operatorname{Acc}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \Big( \mathbb{1}[y_i = 1]\,\mathbb{1}[h(x_i) = 1] + \mathbb{1}[y_i = -1]\,\mathbb{1}[h(x_i) = -1] \Big) = \pi (0.5) + (1 - \pi)(0.5) = 0.5,$$

where $\pi = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i = 1]$. Hence $\operatorname{Err}_n(h) = 1 - 0.5 = 0.5$.

### 3. Boosting

**Idea.** Learn a strong classifier by stage-wise combining weak classifiers. The keywords
here are *stage-wise combining* and *weak classifier*.

**Weak classifier.** By a weak classifier we usually mean one whose error rate is below, but close to, $0.5$, the error rate of the random classifier. A standard example is the *decision stump*, a decision tree with only one level: a very simple classifier that predicts slightly better than random.

*Figure 3: Example of Decision Stump.*

**Stage-wise.** We accumulate classifiers one by one, each with a weight:

$$f(x) = \sum_{m=1}^{k} \alpha_m \phi_m(x),$$

where the $\alpha_m$ are the weights and the $\phi_m$ are the weak classifiers.
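The stage-wise combination of decision stumps can be sketched in code. This is a minimal sketch, not the lecture's own implementation: the function names (`fit_stump`, `adaboost`) are mine, and the sample-weight update and the formula for $\alpha_m$ are the standard AdaBoost ones, which this excerpt has not yet derived.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search over (dimension j, threshold t, polarity s)
    for the stump minimizing the weighted 0/1 error under weights w."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, j, t, s)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def stump_predict(X, j, t, s):
    # A decision stump: one axis-aligned split, constant label on each side
    return np.where(X[:, j] <= t, s, -s)

def adaboost(X, y, k=10):
    """Stage-wise combination f(x) = sum_m alpha_m * phi_m(x) of k stumps.
    Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)          # start from uniform sample weights
    model = []
    for _ in range(k):
        err, j, t, s = fit_stump(X, y, w)
        err = max(err, 1e-12)        # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, s)
        w *= np.exp(-alpha * y * pred)  # upweight mistakes, downweight hits
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model

def predict(model, X):
    # Sign of the weighted vote over all stumps
    f = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.sign(f)
```

On a 1-D "interval" dataset such as `y = [-1, -1, +1, +1, -1, -1]`, no single stump is consistent, but a few boosting rounds reach zero training error, which is exactly the strong-from-weak behavior the recap describes.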

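The split rule recapped above, $(j^*, t^*) = \arg\min_j \min_t [\operatorname{cost}(\text{left}) + \operatorname{cost}(\text{right})]$, can likewise be sketched as an exhaustive search. Following the notes, the two costs are simply summed (not size-weighted); the names `gini` and `best_split` are mine.

```python
import numpy as np

def gini(y):
    """Gini index of a set of integer class labels; empty sets cost 0."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, cost=gini):
    """Search every (dimension j, threshold t) pair, minimizing
    cost(left split) + cost(right split) as in the recap's rule."""
    best_j, best_t, best_c = None, None, np.inf
    for j in range(X.shape[1]):
        # splitting at the maximum value leaves the right side empty, so skip it
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            c = cost(y[left]) + cost(y[~left])
            if c < best_c:
                best_j, best_t, best_c = j, t, c
    return best_j, best_t, best_c
```

Swapping `gini` for an entropy or error-rate function changes only the `cost` argument, matching the recap's list of interchangeable cost functions.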
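The recap's claim that the coin-flip classifier has error rate $0.5$ regardless of the class proportion $\pi$ is easy to check empirically. A quick simulation (variable names and the skewed $\pi = 0.8$ are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Labels with a deliberately skewed class proportion: pi = P(y = +1) = 0.8
y = rng.choice([-1, 1], size=n, p=[0.2, 0.8])

# The random classifier flips a fair coin, ignoring the input entirely
h = rng.choice([-1, 1], size=n, p=[0.5, 0.5])

err = np.mean(y != h)
print(f"empirical error: {err:.3f}")  # close to 0.5 despite pi = 0.8
```

This matches the derivation above: the accuracy is $\pi(0.5) + (1 - \pi)(0.5) = 0.5$ for every $\pi$, so no class imbalance can make coin-flipping look better than $0.5$ error.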