# ILLINOIS CS 446 - 103117.2 (6 pages)


- Pages: 6
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


## CS446 Machine Learning, Fall 2017, Lecture 18: Convolutional NNs and Project Discussion

Lecturer: Sanmi Koyejo. Scribe: Dhruv Agarwal. Oct 31st, 2017.

### Announcements

- Homework 3 is due on Wednesday at 5pm.
- Microsoft Azure tutorial on Wednesday at 6:30pm, Siebel Center 1109.
- The project page is live.
- The final project submission is due on December 19th. The literature review submission is also due on the final day, December 19th.

### Recap: Feed-Forward Neural Networks (Multi-Layer Perceptron)

Figure 1: Feed-forward neural network.

A feed-forward network can have as many hidden layers as desired. During training we add a loss function after the output layer; during prediction the network can be seen as a function f(x) evaluated from the input through the last layer. The size of the network is defined as the number of hidden layers plus the output layer.

There are multiple ways to visualize a neural network. In compositional form, the network can be written as

    f(x) = u(W_l g(W_{l-1} g(... g(W_1 x))))

where

- W_i: weight matrices (x is the input vector, which also includes the bias term),
- g: non-linear activation function,
- u: non-linear function at the output layer (optional).

Another way to look at neural networks is in terms of feature extraction:

    f(x) = u(W_l phi(x)), where phi(x) = g(W_{l-1} g(... g(W_1 x)))

Here phi(x) can be considered a learned feature representation. A practical approach to training neural networks is to take this learned feature representation and re-learn only the last layer for a new problem.

A mapping between the output non-linearity u, the loss function l, and the kind of classification problem can be defined as follows:

| Non-linear function u | Loss function l | Problem type |
| --- | --- | --- |
| sigmoid | log loss | binary classification |
| softmax | cross entropy | multi-class classification |
| linear | hinge loss | binary classification |
| sigmoid | squared loss | multiclass/multilabel (0/1 regression) |
| linear | squared loss | linear regression |

Following the above mapping, a general approach to classification problems can be adopted: depending on the type of problem, we decide on a surrogate loss function that is easy to optimize. For example, for binary classification we may care about accuracy (or error rate), so we choose the appropriate surrogate loss and then map our input features to that loss function.

### Regularization

Regularization helps control the complexity of the network, allowing us to fit a better predictive model. Approaches to regularization, discussed in the previous lecture, include L1, L2, L-infinity, and dropout.

### Optimization

Approaches to optimizing the weights of a neural network, discussed in the previous lecture:

- Stochastic gradient descent (SGD)
- Minibatch SGD
- SGD variants such as RMSProp

An important note about SGD: it does not find the most optimal model, i.e., it does not find the global optimum, and instead converges to a local optimum. One way to see this is that the multi-layer perceptron is not a convex function. For example, suppose a neural network is given as

    f(x) = u(W_2 g(W_1 g(W_0 x)))

Then even if the activation functions are linear, the resulting function is still not convex, since it becomes a product of the weight matrices. In general, neural networks can be seen as compositions of functions, and even if the inner functions are convex, the composition is convex only under very strong conditions. In this class the focus is on evaluation, i.e., how the prediction function performs on data not seen, and finding a local minimum has proven to be good enough in practice.

### Convolution

Suppose we are given a 1-dimensional input space, and the prediction should be 1 if the input contains a specific pattern and 0 otherwise.

Figure 2: 1-d input space examples.

Figure 3: Prediction model.

For the set of inputs shown in Figure 2, the predictive model can be written so that the weight matrix stacks copies of the input pattern vector, each shifted by one position.
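The two formulations described here, a full weight matrix built from shifted copies of the pattern versus a single small filter scanned over the input, can be sketched as follows. This is a minimal illustration, not code from the notes: the pattern `[1, 1]`, the binary example inputs, and the function names are all hypothetical.

```python
import numpy as np

def detect_with_matrix(x, pattern):
    """Weight matrix whose rows are copies of the pattern, each
    shifted by one position; storage is O(d^2) in input dimension d.
    Assumes binary inputs, so a window matches the pattern exactly
    when its dot product equals pattern . pattern."""
    d, k = len(x), len(pattern)
    W = np.zeros((d - k + 1, d))
    for i in range(d - k + 1):
        W[i, i:i + k] = pattern
    # Predict 1 if any output dimension signals a full match.
    return int(np.any(W @ x == np.dot(pattern, pattern)))

def detect_with_filter(x, pattern):
    """Scan one small filter across the input; storage is O(k)."""
    k = len(pattern)
    for i in range(len(x) - k + 1):
        if np.dot(x[i:i + k], pattern) == np.dot(pattern, pattern):
            return 1
    return 0

x1 = np.array([0, 1, 1, 0, 0])   # contains the pattern [1, 1]
x2 = np.array([0, 1, 0, 1, 0])   # does not
pattern = np.array([1, 1])
```

Both detectors agree on every input; the matrix version stores (d - k + 1) x d weights while the filter version stores only k, which is exactly the storage saving the lecture points out.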
As shown above (Figure 3), the prediction rule becomes: if any dimension of the output is 1, predict 1; otherwise predict 0. In this case the prediction is invariant to location, and the memory required to store the model is proportional to O(d^2).

Another, more efficient approach to this problem, equivalent to the above but requiring far less storage for the model:

- Fix a single weight filter w of size 2.
- Scan the weight filter over the entire input space for each input example.

This approach is known as convolution, and the memory required to store the model is proportional to O(2), i.e., constant in the input dimension.

Convolution shows up quite often in computer vision applications, where the object to be detected can be anywhere in the image: we take an object template and scan it over the entire image, which acts as a good object detector. This is also known as the filter approach. If the application requires detection of multiple patterns, we use multiple filters in a single layer; if the pattern is compositional, i.e., a combination of multiple parts, we use multiple layers of filters, known as a Convolutional Neural Network (CNN).

### CNN Structure

The general convolutional neural network structure is shown in Figure 4, where each convolution layer can be expanded as in Figure 5.

Figure 4: CNN network.

Figure 5: Expanded view of a single convolution layer.

A general formula for the output dimensions when a filter is applied to an image (Figure 6) is:

Figure 6: Filter application to an image.

    W2 = (W1 - F + 2P) / S + 1
    H2 = (H1 - F + 2P) / S + 1
    D2 = K

where

- K: number of filters,
- F: size of the filter,
- S: stride (step size),
- P: zero padding.

### Project Discussion

The goal of the project is to predict a set of labels associated with each fMRI image. Students' performance will be evaluated via two Kaggle competitions against certain baselines, which will be decided soon. The project submission is due on the final day, December 19th, but it can be completed and submitted earlier.
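The output-dimension formula from the CNN Structure section can be turned into a quick sanity-check helper. This is a sketch under assumptions not in the notes: the function name and the 32x32 example configuration are made up for illustration.

```python
def conv_output_shape(W1, H1, D1, K, F, S, P):
    """Output shape after applying K FxF filters with stride S and
    zero padding P, per W2 = (W1 - F + 2P)/S + 1 (and likewise for H).
    The input depth D1 does not affect the spatial size; the output
    depth is simply K, the number of filters."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    return W2, H2, K

# Example: a 32x32x3 image with ten 5x5 filters, stride 1, padding 2.
# Padding of (F - 1)/2 at stride 1 preserves the spatial size.
print(conv_output_shape(32, 32, 3, K=10, F=5, S=1, P=2))  # -> (32, 32, 10)
```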
