CS446: Machine Learning, Fall 2017
Lecture 18: Convolutional NN's, Project discussion
Lecturer: Sanmi Koyejo        Scribe: Dhruv Agarwal, Oct. 31st, 2017

Announcements

• Homework 3 is due on Wednesday at 5pm.
• Microsoft Azure tutorial: Wednesday, 6:30pm, Siebel Center 1109.
• The project page is live. The final project submission is due on 19th December.
• The literature review is also due on the final day, 19th December.

Recap

Feedforward Neural Networks (Multilayer Perceptron)

Figure 1: Feedforward neural network

A feedforward network can have as many hidden layers as desired. During training we add a loss function after the output layer; during prediction the loss is removed and the network is used as the function f(x) given by the last layer and everything before it. The size of the network is defined as the number of hidden layers plus the output layer.

There are multiple ways to view a neural network. In compositional form, the network can be written as

    f(x) = u(W_l · g(W_{l-1} · g( ··· g(W_1 · x))))

where
W_i = the weight matrix of layer i,
x = the input vector, which also includes the bias term,
g = a nonlinear activation function,
u = the output-layer nonlinearity, which is optional.

Another way to look at neural networks is in terms of feature extraction:

    f(x) = u(W_l · φ(x)),   where   φ(x) = g(W_{l-1} · g( ··· g(W_1 · x)))

Here φ(x) can be considered a learned feature representation. A practical approach to training neural networks is to take this learned feature representation and retrain only the last layer for a new problem.

A mapping between the output nonlinearity u, the loss function l, and the kind of prediction problem can be defined as follows:

    u (nonlinearity)    l (loss function)    Problem type
    sigmoid             log loss             binary classification
    softmax             cross entropy        multiclass classification
    linear              hinge loss           binary classification
    sigmoid             squared loss         multiclass, multilabel, [0,1] regression
    linear              squared loss         linear regression

Following this mapping, a general approach to classification problems can be adopted: depending on the type of problem, we choose a loss function that is easy to optimize (a surrogate loss). For example, in binary classification we may ultimately care about accuracy or error rate, so we choose an appropriate surrogate loss and then map our input features to that loss function.
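To make the compositional and feature-extraction views concrete, here is a minimal numpy sketch of a forward pass through a two-hidden-layer network. The ReLU and sigmoid choices, the layer sizes, and all variable names are illustrative assumptions, not fixed by the lecture:

    import numpy as np

    def g(z):
        # Nonlinear activation; ReLU is one common (illustrative) choice
        return np.maximum(0.0, z)

    def u(z):
        # Optional output nonlinearity; sigmoid pairs with log loss for binary classification
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    d, h1, h2 = 5, 4, 3                      # illustrative layer sizes
    W1 = rng.normal(size=(h1, d))
    W2 = rng.normal(size=(h2, h1))
    Wl = rng.normal(size=(1, h2))            # last-layer weights

    x = rng.normal(size=d)                   # input (bias assumed folded into x)
    phi = g(W2 @ g(W1 @ x))                  # phi(x): the learned feature representation
    f_x = u(Wl @ phi)                        # f(x) = u(W_l · phi(x))

In this picture, reusing the network for a new problem amounts to holding W1 and W2 fixed and fitting only Wl on top of phi(x).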
Regularization

Regularization helps control the complexity of the network, allowing us to fit a better predictive model. Approaches to regularization include L1, L2, L∞, and dropout, which were discussed in the previous lecture.

Optimization

Different approaches to optimizing the weights of a neural network, also discussed in the previous lecture:

• Stochastic gradient descent (SGD)
• Minibatch SGD
• SGD variants such as RMSProp, etc.

An important note about SGD is that it does not find the globally optimal model; instead it converges to a local optimum. One way to see why is that the multilayer perceptron is not a convex function. For example, suppose a neural network is given by

    f(x) = u(W_2 · g(W_1 · g(W_0 · x)))

Even if the activation functions are linear, the resulting function is still not convex in the weights, since it becomes a product of matrices. In general, neural networks can be seen as compositions of functions, and even if the inner functions are convex, the composition is convex only under very strong conditions. In this class we focus on evaluation, i.e. how our prediction function performs on unseen data, and finding a local minimum has proven to be good enough in practice.
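As a minimal sketch of minibatch SGD, here is the procedure on a linear model with squared loss, where the mechanics are easy to check; the data, learning rate, and batch size are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)    # noisy linear targets

    w = np.zeros(d)
    lr, batch_size, epochs = 0.1, 20, 50
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]  # indices of one minibatch
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of mean squared loss
            w -= lr * grad                       # descent step

For a neural network the minibatch gradient is computed by backpropagation instead, and, as noted above, the loss surface is non-convex, so the procedure finds a local rather than a global optimum.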
Convolution

Suppose we are given a one-dimensional input space, and the prediction should be 1 if the input contains a specific pattern and 0 otherwise. For the set of inputs shown in the figures below, the predictive model can be written so that the weight matrix consists of the input pattern vector shifted by one position in each row.

Figure 2: 1-d input space examples

Figure 3: Prediction model

The prediction rule is then: if any dimension of the output is 1, predict 1; otherwise predict 0. This prediction is invariant to the location of the pattern, but the memory required to store the model is O(d^2).

A more efficient approach, equivalent to the above but requiring far less storage:

• Fix a single weight vector (filter) w, of size 2.
• Scan the filter over the entire input, for each input example.

This approach is known as convolution, and the memory required to store the model is O(2), i.e. constant in the input dimension; a sketch of the scan appears at the end of this section.

Convolution shows up often in computer vision applications, where the object to be detected can be anywhere in the image: we take an object template and scan it over the entire image, which acts as a good object detector. This is also known as the filter approach.

If the application requires detecting multiple patterns, we use multiple filters (a single layer); if the pattern is compositional, i.e. a combination of multiple parts, we use multiple layers of filters, known as a convolutional neural network (CNN).
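Here is a sketch of the filter-scanning idea for the 1-d example, assuming binary inputs and a size-2 pattern; the matching threshold and all names are illustrative:

    import numpy as np

    def detect(x, w):
        # Slide the size-2 filter w across x and record the response at each position
        responses = np.array([w @ x[i:i + len(w)] for i in range(len(x) - len(w) + 1)])
        # Predict 1 if the pattern matches anywhere (response reaches w · w), else 0
        return int(np.any(responses >= w @ w))

    w = np.array([1, 1])                       # the pattern, stored once: O(2) memory
    print(detect(np.array([0, 1, 1, 0]), w))   # -> 1 (pattern present)
    print(detect(np.array([1, 0, 0, 1]), w))   # -> 0 (pattern absent)

The same responses can be computed with np.convolve(x, w[::-1], mode='valid'). The equivalent matrix formulation stores every shifted copy of w as a row of the weight matrix, which is exactly the O(d^2) model above.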
CNN Structure

The general convolutional neural network structure is shown in Figure 4, where each convolution layer can be expanded as in Figure 5.

Figure 4: CNN network

Figure 5: Expanded view of a single convolution layer

A general formula for the output dimensions when a bank of filters is applied to an image is given as follows:

Figure 6: Filter application to an image

    W_2 = (W_1 - F + 2P)/S + 1
    H_2 = (H_1 - F + 2P)/S + 1
    D_2 = K

where
K = the number of filters,
F = the size of each filter,
S = the stride (step size),
P = the amount of zero padding.
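A quick sanity check of the output-size formula; the function name and the example sizes are hypothetical:

    def conv_output_dims(W1, H1, F, S, P, K):
        # Spatial size from the formula above; integer division assumes the sizes divide evenly
        W2 = (W1 - F + 2 * P) // S + 1
        H2 = (H1 - F + 2 * P) // S + 1
        D2 = K                              # output depth equals the number of filters
        return W2, H2, D2

    # e.g. a 32x32 image with ten 5x5 filters, stride 1, zero padding 2 keeps the spatial size:
    print(conv_output_dims(32, 32, F=5, S=1, P=2, K=10))   # -> (32, 32, 10)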
Project Discussion

The goal of the project is to predict the set of labels associated with each fMRI image. Students' performance will be evaluated via two Kaggle competitions against certain baselines, which will be decided soon. The project submission is due on the final day, 19th December, but it can be completed and submitted earlier.
