CS 446: Machine Learning, Fall 2017
Lecture 18: Convolutional Neural Networks
Lecturer: Sanmi Koyejo    Scribe: Yayi Ning, Oct. 31, 2017

Introduction

This lecture includes a recap of feedforward neural networks and introduces the topic of convolutional neural networks.

Recap

Neural Network

A typical feedforward neural network with two hidden layers (also called a multilayer perceptron):

[Figure: a two-hidden-layer feedforward network; McCullock (2012)]

The number of layers = the number of hidden layers + 1 output layer.

The final prediction function is

    f(x) = u(W_l \, g(W_{l-1} \, g(\cdots g(W_1^T x) \cdots)))        (1)

where g is the activation function and u is the output-layer function. There is also a bias term b_i at every layer; adding the bias terms is very useful in implementation.

The loss function is

    L = \sum_{i=1}^{n} \ell(y_i, f(x_i))        (2)

An overview of neural-network construction:

    u (output nonlinearity)   \ell (loss function)   Problem type
    -----------------------   --------------------   -----------------------------------------
    Sigmoid                   Log loss               Binary classification
    Softmax                   Cross entropy          Multiclass classification
    Linear                    Hinge loss             Binary classification
    Sigmoid                   Square loss            Multiclass, multilabel, [0, 1] regression
    Linear                    Square loss            Linear regression (y \in R^k)

Feature mapping

For simplicity, view the network as a linear function applied to a learned feature map:

    f(x) = u(W_l^T \varphi(x))        (3)

    \varphi(x) = g(W_{l-1} \, g(\cdots g(W_1^T x) \cdots))        (4)

where \varphi is the feature-representation function: it absorbs all of the nonlinearity, so that the final layer is linear in \varphi(x).

Regularization

L1 regularization:

    w^* = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f(W, x_i)) + \lambda \|W\|_1        (5)

L2 regularization:

    w^* = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f(W, x_i)) + \lambda \|W\|_2^2        (6)

Optimization

Stochastic gradient descent (SGD).

Q: Does SGD find the optimal model, i.e., the minimizer

    \hat{w} = \arg\min_w \sum_{i=1}^{n} \ell(y_i, f(x_i))        (7)

where f is a neural network?

A: In general, no. Even with a linear activation function, the neural-network objective is not convex in the weights, and this holds regardless of the choice of loss function or activation function.

For example, consider

    f(x) = u(w_2 \, g(w_1 \, g(w_0^T x)))

With linear activations this collapses to a product of weights, w = w_2 \cdot w_1 \cdot w_0, which is not convex in the individual weights. Exercise: show that f(w_1, w_2) = w_1 \cdot w_2 is not convex.

The reason behind this: the composition of two convex functions is not necessarily convex. If a and b are two convex functions, then additional properties are needed to make a(b(x)) convex, such as a being monotone increasing.

This nonconvexity implies that SGD gives us a local optimum. In practice, a local optimum is often good enough for the final prediction, and we are still trying to understand the underlying mechanics of what makes this work so well. In any case, evaluation is the most important step when we construct a model: we need a good sense of how accurately our model predicts.
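As a quick numerical check of the exercise above (an added illustration, not part of the original notes; the two points are hand-picked), we can exhibit a midpoint that violates the defining inequality of convexity, f((a + b)/2) <= (f(a) + f(b))/2:

    import numpy as np

    # f(w1, w2) = w1 * w2, the function from the exercise.
    def f(w):
        return w[0] * w[1]

    # Two hand-picked points and their midpoint (0, 0).
    a = np.array([1.0, -1.0])
    b = np.array([-1.0, 1.0])
    mid = (a + b) / 2.0

    print(f(mid))               # 0.0
    print((f(a) + f(b)) / 2.0)  # -1.0

    # Convexity would require f(mid) <= -1.0, but f(mid) = 0.0,
    # so f is not convex.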
Convolutional Neural Networks (CNN)

Introduction

We have already seen classification models. But what should we do when we want to detect a certain pattern in the input, such as a shape in a signal?

For example, suppose our data looks like the figure below, and assume we know that the pattern we are looking for is a rectangle shape.

[Figure: a 1-D data vector containing a rectangle-shaped bump]

How could we build the model?

Possible solution 1: Build the weight matrix by listing rectangles at all possible locations:

[Figure: one weight vector per possible rectangle location]

Let z be the resulting vector of scores, one per location. The prediction function thresholds the maximum score:

    h(x) = 1  if max(z) > threshold
    h(x) = 0  otherwise

An alternative solution: We can instead keep just one weight vector (a filter) and shift it around the data vector (shown in the figure below). This approach is much more space-efficient.

[Figure: a single filter shifted across the data vector]

Q: What if we do not know W ahead of time?
A: We can use backpropagation to learn W.

Q: What if we have multiple patterns to search for?
A: We can use multiple filters.

Q: What if the patterns are compositional (combinations of parts)?
A: We can use multiple layers.

This is the basic idea of convolution.
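The sliding-filter computation can be sketched in a few lines (an added illustration; the signal, filter, and threshold values below are made up):

    import numpy as np

    # A 1-D signal containing a rectangle-shaped bump, and a single
    # rectangle-shaped filter that we slide across it.
    x = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0], dtype=float)
    w = np.array([1, 1, 1], dtype=float)

    # One dot product per location (stride 1, no padding):
    # z[i] = <w, x[i : i + len(w)]>.
    z = np.array([w @ x[i:i + len(w)] for i in range(len(x) - len(w) + 1)])

    # h(x) = 1 if max(z) > threshold, else 0, as defined above.
    threshold = 2.5
    print(z)                         # scores peak where the filter aligns with the bump
    print(int(z.max() > threshold))  # 1 -> pattern detected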


Overview of Convolutional Neural Networks

A convolutional neural network is constructed much like an ordinary neural network, but it is composed of convolutional layers. CNNs are designed specifically for image prediction. Inspired by how the human brain recognizes images, a CNN processes the image in small pieces and evaluates them with filters. CNNs are in fact easier to train than ordinary neural networks, since they have fewer parameters.

The left side of the figure below shows an ordinary three-layer neural network; the right side shows a three-layer convolutional neural network. Instead of flat (2-D) hidden layers, the CNN arranges its hidden convolutional layers as 3-D volumes.

[Figure: an ordinary three-layer network (left) vs. a CNN with layers arranged as volumes (right)]

The figure below shows a sketch of the CNN pipeline.

[Figure: CNN pipeline; pacocp/github (2016)]

Usually our input image is 3-D. If we have a 32 × 32 × 3 image, then the image width and height are 32 by 32, and 3 is the number of RGB color channels. Assume we want to construct 10 filters (for example, to recognize the digits 0 to 9). Then our first convolutional layer will have dimension 32 × 32 × 10. Pooling then shrinks the spatial dimensions, for example to 16 × 16 × 10. Finally, fully connected layers compute the class scores, in this case how likely the image is each of [0, ..., 9].

Let

    W = input volume size
    F = filter size
    S = stride
    P = number of zero paddings

The output spatial size, which tells us whether our choice of hyperparameters fits together, is

    W_2 = (W - F + 2P)/S + 1        (8)

If W_2 turns out not to be an integer, then our hyperparameters will not give us a good fit.
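A minimal sketch of this integrality check (an added illustration; the filter size, stride, and padding values below are assumptions, not from the notes):

    def conv_output_size(W, F, S, P):
        """Spatial output size W2 = (W - F + 2P) / S + 1, equation (8)."""
        W2 = (W - F + 2 * P) / S + 1
        if W2 != int(W2):
            raise ValueError("hyperparameters do not fit: W2 is not an integer")
        return int(W2)

    # For the 32 x 32 x 3 input above, a filter with F = 5, S = 1, P = 2
    # preserves the spatial size:
    print(conv_output_size(W=32, F=5, S=1, P=2))   # 32

    # A bad fit: (32 - 3 + 0) / 2 + 1 = 15.5, not an integer.
    # conv_output_size(W=32, F=3, S=2, P=0)        # raises ValueError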
Next class: we will continue with convolutional neural networks.

Bibliography

McCullock, J. (2012). Introduction: The XOR Problem.

pacocp/github (2016). Convolutional Neural Network.