# ILLINOIS CS 446 - 100317.3 (7 pages)


## 100317.3


- Pages: 7
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446: Machine Learning, Fall 2017

Lecture 11: SVMs, Kernel methods

Lecturer: Sanmi Koyejo. Scribe: Yiting Pan, Oct. 3rd, 2017

### Recap: Output Representation

When making predictions, we have several ways to represent the targets we are trying to predict.

**Regression.** The output space is $y \in \mathbb{R}$.

**Binary classification.** The idea is to tell apart two choices (objects). The standard representation is $\{-1, 1\}$ or $\{0, 1\}$. The mapping from $\{0, 1\}$ to $\{-1, 1\}$ is $2y - 1$, and the reverse mapping is $\frac{y}{2} + \frac{1}{2}$.

**Multiclass classification.** We want to tell apart $c$ choices, so the representation is $\{1, 2, 3, \dots, c\} = [c]$, which is equivalent to $\{0, 1, 2, \dots, c - 1\}$.

**One-hot encoding.** An example with $c = 4$ demonstrates the representation:

| Class | One-hot vector |
| --- | --- |
| 1 | $[1, 0, 0, 0]$ |
| 2 | $[0, 1, 0, 0]$ |
| 3 | $[0, 0, 1, 0]$ |
| 4 | $[0, 0, 0, 1]$ |

In most cases, instead of $[c]$ we can use $\{0, 1\}^c$ and get the same answer either way. More generally, class $k$ is represented by a vector such as $[0, 0, 0, 1, 0, 0]$, where the 1 is in the $k$-th position and all other entries are 0, so that

$$\sum_{i=1}^{c} y_i = 1.$$

We can also write this algebraically as

$$y \in \{0, 1\}^c \cap \Delta^{c-1},$$

where the triangle denotes the simplex, the set of nonnegative vectors whose entries sum to 1:

$$x \in \Delta^{c-1} \iff x \in \mathbb{R}^c,\quad \sum_{i=1}^{c} x_i = 1,\quad x_i \ge 0.$$

**Multilabel classification.** Pick a subset from $c$ choices. Suppose we predict $k$ things out of $c$:

$$y_i = [0, 1, 0, 1, 1] \;\leftrightarrow\; \{2, 4, 5\}$$

Each 0 or 1 indicates whether item $i$ is in the set; here we want to predict the 2nd, 4th, and 5th items associated with this example. Equivalently, we can write

$$y \subseteq [c] = \{1, 2, 3, \dots, c\},$$

where $y$ can pick any subset of entries from $[c]$. Mathematically, saying that $y$ is a subset of 1 through $c$ means $y \in \text{powerset}([c])$.

For the binary representation, $|\{0, 1\}^c| = 2^c$. The alternative is to treat the problem as multiclass classification with $2^c$ classes. Because exponentials grow very quickly, $2^c$ can get very big. In a document labeling problem, for example, the number of potential topics for documents can easily be a thousand, and $2^{1000}$ classes is infeasible to handle as one large multiclass classification problem. In the binary representation it is feasible, because we only need to solve a thousand binary classification problems.

### Bagging (bootstrap aggregating)

- Resample with replacement.
- Train a model with each sample.
- Predict using the average over models.

Key ideas:

- Averaging reduces variance vs. a single predictor.
- Minimal effect on bias.
- Tends to reduce overall error.
- Example of an ensemble algorithm.

**Bias-variance using darts.** In this example, a bunch of different players are trying to throw darts to hit the bull's-eye. Each player
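The three bagging steps listed above (resample with replacement, train a model on each sample, predict with the average over models) can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the base model is a 1-nearest-neighbor regressor on 1-D inputs, chosen here because it is a high-variance predictor, and the function names (`fit_1nn`, `predict_1nn`, `bagging_fit`, `bagging_predict`) and the synthetic sine data are hypothetical and do not come from the lecture.

```python
import numpy as np

def fit_1nn(X, y):
    """'Train' a 1-nearest-neighbor regressor by storing the data (assumed base model)."""
    return X.copy(), y.copy()

def predict_1nn(model, X_query):
    """Predict the target of the closest stored training point (1-D inputs)."""
    X_train, y_train = model
    dists = np.abs(X_query[:, None] - X_train[None, :])  # shape (n_query, n_train)
    return y_train[dists.argmin(axis=1)]

def bagging_fit(X, y, n_models=25, seed=0):
    """Steps 1 and 2: draw bootstrap samples and train one model per sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n indices drawn with replacement
        models.append(fit_1nn(X[idx], y[idx]))
    return models

def bagging_predict(models, X_query):
    """Step 3: average the predictions of all models."""
    return np.mean([predict_1nn(m, X_query) for m in models], axis=0)

# Hypothetical usage on synthetic data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=50)
y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.3, size=50)

X_query = np.linspace(0.0, 1.0, 5)
models = bagging_fit(X, y, n_models=25, seed=1)
preds = bagging_predict(models, X_query)
```

Each bootstrap model sees a slightly different dataset, so the individual 1-NN predictions differ; averaging them is what is expected to reduce variance relative to a single predictor while leaving bias largely unchanged.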
