# ILLINOIS CS 446 - 103117.1 (9 pages)


## 103117.1


- Pages: 9
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS446: Machine Learning, Fall 2017

Lecture 18: Convolutional Neural Network

Lecturer: Sanmi Koyejo. Scribe: Yayi Ning. Oct 31st, 2017

### Introduction

This lecture recaps feed-forward neural networks and introduces convolutional neural networks.

### Recap: Neural Network

A typical feed-forward neural net with two hidden layers, also called a multilayer perceptron (McCullock, 2012).

The number of layers = the number of hidden layers + 1 (the output layer).

The final prediction function is

$$f(x) = u\big(W_l^T\, g\big(W_{l-1}^T\, g(\cdots g(W_1^T x))\big)\big) \tag{1}$$

where $g$ is the activation function and $u$ is the output layer function. There is a bias term $b_i$ at every layer. Including the bias term can be very useful in implementation.

Loss function:

$$L = \sum_{i=1}^{n} \ell(y_i, f(x_i)) \tag{2}$$

An overview of neural network construction, with the entries of each column in the order listed:

- $u$ (output nonlinearity): sigmoid function, softmax function, linear function, sigmoid function, linear function
- $\ell$ (loss function): log loss, cross entropy, hinge loss, square loss, square loss
- Problem type: binary classification, multiclass, binary classification, multiclass, multilabel, $[0, 1]$ regression, linear regression ($\mathbb{R}^k$)

### Feature mapping

For simplicity, consider the nonlinear function $f$ as a linear function applied to a feature representation $\phi(x)$:

$$f(x) = u\big(W_l^T \phi(x)\big) \tag{3}$$

$$\phi(x) = g\big(W_{l-1}^T\, g(\cdots g(W_1^T x))\big) \tag{4}$$

where $\phi$ is the feature representation function, which maps the input into a space in which the nonlinear problem becomes linear.

### Regularization

L1 regularization:

$$w^* = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f_w(x_i)) + \lambda \|w\|_1 \tag{5}$$

L2 regularization:

$$w^* = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f_w(x_i)) + \lambda \|w\|_2^2 \tag{6}$$

### Optimization

Stochastic Gradient Descent (SGD)

Q: Does SGD find the optimal model, that is, the model that actually minimizes the loss function?

$$w^* = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f(x_i)) \tag{7}$$

where $f$ is a neural network.

Answer: Not in general. Even with a linear activation function, the objective of a neural network is not convex in its weights, and this does not depend on the choice of loss function or activation function. For example, for

$$f(x) = u\big(w_2\, g(w_1\, g(w_0^T x))\big), \qquad w = (w_2, w_1, w_0),$$

$f$ is not convex in $w$.

Exercise: show that $f(w_1, w_2) = w_1 w_2$ is not convex.

The reason behind this is that the composition of two convex functions is not necessarily convex. If $a$ and $b$ are two convex functions, additional properties are needed to make $a(b(x))$ convex, such as the outer function $a$ being monotonically nondecreasing.

This non-convexity implies that SGD generally gives us only a local optimum. In practice, a local optimum is often good enough for the final prediction. We are still trying to understand the underlying mechanics of what makes this work in practice. However, evaluation is the most important part of constructing a model: we need a good sense of how accurately our model predicts.

### Convolutional Neural Networks (CNN)

#### Introduction

We have already seen classification models. However, what should we do when we want to detect a certain pattern, such as a shape in a graph? For example, suppose our data looks like the data in this figure. Assume we know that the pattern we are looking for is a rectangular shape.
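Returning to the exercise on $f(w_1, w_2) = w_1 w_2$ from the optimization recap, one way to see the non-convexity is to find two points whose midpoint lies strictly above the chord between them. The sketch below is only a numerical illustration, not a proof; the helper name `violates_convexity` and the two test points are choices made for this example.

```python
def f(w1, w2):
    """The exercise function f(w1, w2) = w1 * w2."""
    return w1 * w2


def violates_convexity(func, p, q, t=0.5):
    """Return True if func at t*p + (1-t)*q lies strictly above the chord.

    A convex function satisfies func(t*p + (1-t)*q) <= t*func(p) + (1-t)*func(q)
    for all p, q and t in [0, 1], so a single violation rules out convexity.
    """
    point = tuple(t * a + (1 - t) * b for a, b in zip(p, q))
    chord = t * func(*p) + (1 - t) * func(*q)
    return func(*point) > chord


# Hypothetical test points chosen for illustration.
p = (1.0, -1.0)
q = (-1.0, 1.0)

print(f(*p), f(*q))                 # both endpoints evaluate to -1.0
print(f(0.0, 0.0))                  # the midpoint (0, 0) evaluates to 0.0
print(violates_convexity(f, p, q))  # True: 0.0 > -1.0
```

The same check returns `False` at these points for a convex function such as $w_1^2 + w_2^2$, which is a quick sanity test of the helper.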
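The prediction function from the recap, $f(x) = u(W_l^T g(W_{l-1}^T g(\cdots g(W_1^T x))))$ with a bias at every layer, can be sketched in a few lines of plain Python. The notes do not fix $g$ or $u$, so this sketch assumes ReLU for the activation $g$ and a sigmoid for the output function $u$; the weights and the helper names `affine` and `forward` are made up for illustration.

```python
import math


def relu(v):
    """Assumed activation g (the notes do not specify one)."""
    return [max(0.0, a) for a in v]


def sigmoid(v):
    """Assumed output function u, one of the options in the overview."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, b, x):
    """Compute W^T x + b, where W has one row per input and one column per output."""
    return [sum(W[i][k] * x[i] for i in range(len(x))) + b[k]
            for k in range(len(b))]


def forward(layers, x, g=relu, u=sigmoid):
    """Evaluate u(W_l^T g(... g(W_1^T x))) with a bias term at every layer."""
    *hidden, (W_out, b_out) = layers
    h = x
    for W, b in hidden:
        h = g(affine(W, b, h))
    return u(affine(W_out, b_out, h))


# Two hidden layers plus one output layer, so three layers in total.
# All weights here are hypothetical values for illustration.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]),  # W_1, b_1: 2 inputs -> 2 hidden
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # W_2, b_2: 2 hidden -> 2 hidden
    ([[1.0], [-1.0]], [0.0]),                 # W_3, b_3: 2 hidden -> 1 output
]

y = forward(layers, [1.0, 2.0])
print(y)  # a single value equal to sigmoid(-1), roughly 0.269
```

Storing each layer as a `(W, b)` pair keeps the nesting in the formula explicit: the loop applies $g$ after every hidden layer, and only the last layer is passed through $u$.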
