# ILLINOIS CS 446 - 103117.2 (6 pages)


- Pages: 6
- School: University of Illinois Urbana-Champaign
- Course: CS 446 - Machine Learning

## CS 446 Machine Learning, Fall 2017, Lecture 18: Convolutional NNs and Project Discussion

Lecturer: Sanmi Koyejo. Scribe: Dhruv Agarwal. October 31st, 2017.

### Announcements

- Homework 3 is due on Wednesday at 5pm.
- Microsoft Azure tutorial: Wednesday, 6:30pm, Siebel Center 1109.
- The project page is live. The final project submission is due on 19th December; the literature review submission is also due on the final day, 19th December.

### Recap: Feed-Forward Neural Networks (Multi-Layer Perceptron)

Figure 1: Feed-forward neural network.

A feed-forward neural network can have as many hidden layers as desired. During training we add a loss function after the output layer, and during prediction the network can be seen as a function f(x) evaluated from the input through the last layer. The size of the network is defined as the number of hidden layers plus the output layer.

There are multiple ways to visualize a neural network. In compositional form, the network can be written as

f(x) = u(W_l g(W_{l-1} g(... g(W_1 x))))

where

- W_i: weight matrices (x, the input vector, also includes the bias term)
- g: non-linear activation function
- u: non-linear function at the output layer (optional)

Another way to look at neural networks is in terms of feature extraction:

f(x) = u(W_l φ(x)), where φ(x) = g(W_{l-1} g(... g(W_1 x)))

Here φ(x) can be considered a learned feature representation. A practical approach to training neural networks is to take this learned feature representation and learn only the last layer again for a new problem.

A mapping between the output non-linearity u, the loss function l, and the kind of classification problem can be defined as follows:

| Non-linear function (u) | Loss function (l) | Problem type |
| --- | --- | --- |
| sigmoid | log loss | binary classification |
| softmax | cross-entropy | multi-class classification |
| linear | hinge loss | binary classification |
| sigmoid | squared loss | multi-class, multi-label (0/1) |
| linear | squared loss | regression (linear regression) |

Following the above mapping, a general approach to classification problems can be adopted. Depending upon the type of the problem, we can decide upon a loss
function that is easy to optimize, i.e. a surrogate loss function. For example, for binary classification we may care about accuracy (or error rate), so we would choose an appropriate surrogate loss function and then map our input features to that loss function.

### Regularization

Regularization helps to control the complexity of the network, allowing us to fit a better predictive model. Different approaches to regularization, discussed in the previous lecture, include L1, L2, and dropout.

### Optimization

Different approaches to optimizing the weights of a neural network, discussed in the previous lecture, were as follows:

- Stochastic Gradient Descent (SGD)
- Minibatch SGD
- SGD variants such as RMSProp, etc.

An important note about SGD is that it does not find the most optimal model, i.e. it does not find the global optimum, and instead converges to a local optimum. One way to look at it is that multi-layer …
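The compositional form from the recap, f(x) = u(W_l g(... g(W_1 x))), can be sketched in a few lines of NumPy. The layer sizes, the random weights, and the choice of ReLU for g are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np

def relu(z):
    # One common choice for the non-linear activation g.
    return np.maximum(0.0, z)

def sigmoid(z):
    # One common choice for the optional output non-linearity u.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, u=sigmoid):
    """Compute f(x) = u(W_l g(W_{l-1} g(... g(W_1 x)))).

    `weights` is a list [W_1, ..., W_l]; x is assumed to already
    include the bias term, as in the notes above.
    """
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)          # hidden layers: linear map then g
    return u(weights[-1] @ h)    # output layer with non-linearity u

# Tiny illustrative network: 3 inputs (incl. bias) -> 4 hidden -> 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
x = np.array([0.5, -1.2, 1.0])   # last entry plays the role of the bias term
print(forward(x, weights))       # a probability-like value in (0, 1)
```

With a sigmoid u this output can be read as a class probability for binary classification, matching the first row of the mapping table.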
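The "learn only the last layer again" idea above, which treats φ(x) as a fixed learned representation, can also be sketched. The frozen hidden weights, the synthetic data, and the use of a least-squares solve for the new last layer are all illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def features(X, hidden_weights):
    # phi(x) = g(W_{l-1} g(... g(W_1 x))): the network up to the last layer.
    H = X
    for W in hidden_weights:
        H = relu(H @ W.T)
    return H

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))      # 100 examples, 3 inputs (incl. bias)
y_new = rng.normal(size=(100,))    # targets for the *new* problem

hidden_weights = [rng.normal(size=(4, 3))]  # "pretrained", kept frozen
Phi = features(X, hidden_weights)           # learned feature representation

# Re-learn only the last (linear) layer for the new task via least squares.
W_last, *_ = np.linalg.lstsq(Phi, y_new, rcond=None)
preds = Phi @ W_last
print(preds.shape)
```

Only `W_last` is fit to the new data; everything inside `features` is reused unchanged, which is the practical shortcut the notes describe.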
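As a concrete instance of one row of the mapping table above, a sigmoid output paired with the log loss for binary classification can be written out directly; the score value here is made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    # y in {0, 1}; p is the predicted probability from the sigmoid output.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

score = 2.0            # raw model output before the non-linearity u
p = sigmoid(score)     # predicted probability of class 1
print(float(log_loss(1, p)))   # small loss: confident and correct
print(float(log_loss(0, p)))   # large loss: confident but wrong
```

The log loss is a smooth surrogate for the 0/1 error the notes mention: it is easy to optimize, and penalizes confident mistakes much more than confident correct predictions.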
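The minibatch SGD procedure listed under Optimization can be sketched for a linear model with squared loss; the learning rate, batch size, and synthetic data are illustrative assumptions, and the local-vs-global optimum caveat does not bite here because this particular objective is convex.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)   # noisy linear targets

w = np.zeros(d)
lr, batch_size = 0.1, 20

for epoch in range(100):
    idx = rng.permutation(n)                 # reshuffle each epoch
    for start in range(0, n, batch_size):
        b = idx[start:start + batch_size]
        # Gradient of the mean squared loss on this minibatch only.
        grad = 2.0 / len(b) * X[b].T @ (X[b] @ w - y[b])
        w -= lr * grad                       # SGD update step

print(np.round(w, 2))   # should be close to w_true
```

Setting `batch_size = 1` recovers plain SGD; variants such as RMSProp replace the fixed `lr` with a per-coordinate adaptive step size.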
