# ILLINOIS CS 446 - 100317.2 (7 pages)

## 100317.2

- Pages: 7
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


### CS446: Machine Learning (Fall 2017), Lecture 10: SVMs, Kernel methods

Lecturer: Sanmi Koyejo. Scribe: Xingyu Xiang. Oct 3rd, 2017.

### Agenda

- Recap
- Kernels
- Support Vector Machines (SVM)

### Output Representation

**Regression:** $y \in \mathbb{R}$.

**Binary classification:** tell apart two choices, $y \in \{-1, 1\}$. You can use $\frac{1 + y}{2}$ to map $\{-1, 1\}$ to $\{0, 1\}$, and $2y - 1$ to map $\{0, 1\}$ to $\{-1, 1\}$. Both $\{-1, 1\}$ and $\{0, 1\}$ are valid representations.

**Multiclass classification:** tell apart $c$ choices, $y \in [c] = \{1, 2, 3, \dots, c\}$, where $\{1, 2, 3, \dots, c\}$ is equivalent to $\{0, 1, 2, \dots, c - 1\}$.

One-hot encoding representation, e.g. for $c = 4$:

- $1 \to [1\ 0\ 0\ 0]$
- $2 \to [0\ 1\ 0\ 0]$
- $3 \to [0\ 0\ 1\ 0]$
- $4 \to [0\ 0\ 0\ 1]$

That is, $y \in \{0, 1\}^c$ and $\sum_{i=1}^{c} y_i = 1$. More generally, you can think of it as $e_i = [0\ 0\ 0\ 1\ 0\ 0]$, where the 1 exists only in the $i$-th position. In mathematical notation, $y \in \{0, 1\}^c \cap \Delta^{c-1}$, where $x \in \Delta^{c-1}$ means $x \in \mathbb{R}^c$, $\sum_i x_i = 1$ and $x_i \geq 0$.

**Multilabel classification:** pick a subset from $c$ choices. In a binary representation, $y_i \in \{0, 1\}$ indicates whether item $i$ is in the set, e.g. $[0\ 1\ 0\ 1\ 1] = \{2, 4, 5\}$.

- $y \subseteq [c] = \{1, 2, 3, 4, \dots, c\}$. Mathematically, $y \in$ powerset of $[c]$.
- Binary representation: $y \in \{0, 1\}^c$, with $|\{0, 1\}^c| = 2^c$.
- Alternative: multiclass classification with $2^c$ classes. Note that $2^c$ can be very big.

### Recap

**Bagging** (also called bootstrap aggregating):

- Resample with replacement.
- Train a model with each sample.
- Predict using the average over the models.

Key ideas:

- Averaging reduces variance compared with a single predictor.
- Minimal effect on bias.
- Tends to reduce overall error.

This is an example of what we call an ensemble algorithm.

**Bias and variance using darts**

For the upper left dartboard in Figure 1 we observe:

- High bias: the average prediction is far from the bull's eye.
- Low variance.

Figure 1: Illustration of bias and variance using darts (dar).

For the bottom right dartboard in Figure 1 we observe:

- Low bias: the average prediction is closer to the bull's eye.
- High variance.

In class we do the bias-variance analysis for risk functions. In the homework we do the bias-variance analysis for parameters.

**Surrogate loss functions**

A surrogate loss is an alternative loss function that, with large samples, achieves the same risk as optimizing the true loss (risk) function. Writing $f_i = f(x_i)$:

- 0-1 loss: $h = \operatorname{sign}(f)$, with loss $\mathbb{1}[y_i f(x_i) < 0]$
- Log loss: $\log(1 + e^{-y_i f_i})$
- Exp loss: $e^{-y_i f_i}$

Figure 2: Illustration of various loss functions for binary classification. The horizontal axis is the margin $y f(x)$, the vertical axis is the loss. The log loss uses log base 2. Figure generated by hingeLossPlot (Murphy, 2012).

**From logistic regression to log loss**

Logistic regression:

$$p(y \mid x) = f(x)^{\mathbb{1}[y = 1]} \, (1 - f(x))^{\mathbb{1}[y = -1]}$$

where
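As a numerical companion to the surrogate loss functions listed above, the following Python sketch evaluates the 0-1 loss, the log loss (base 2, as in the Figure 2 caption), and the exp loss at a few margin values. The function names and the choice to count a margin of exactly zero as correct are assumptions made for illustration; they are not taken from the lecture.

```python
import math


def zero_one_loss(margin):
    # Assumption: a margin of exactly 0 is counted as correct (loss 0).
    return 1.0 if margin < 0 else 0.0


def log_loss(margin):
    # Log loss with log base 2, matching the Figure 2 caption.
    return math.log2(1.0 + math.exp(-margin))


def exp_loss(margin):
    # Exponential loss of the margin y * f(x).
    return math.exp(-margin)


def loss_table(margins):
    # One row per margin: (margin, 0-1 loss, log loss, exp loss).
    return [(m, zero_one_loss(m), log_loss(m), exp_loss(m)) for m in margins]


if __name__ == "__main__":
    for m, z, l, e in loss_table([-2.0, -0.5, 0.0, 0.5, 2.0]):
        print(f"margin={m:+.1f}  0-1={z:.0f}  log={l:.4f}  exp={e:.4f}")
```

With the base-2 logarithm, both surrogates pass through 1 at margin 0 and lie on or above the 0-1 loss everywhere, which is what makes them usable as smooth upper bounds on it.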
