# ILLINOIS CS 446 - Lecture 19: Recurrent Neural Networks, LSTM (7 pages)

- Pages: 7
- School: University of Illinois - Urbana
- Course: CS 446 - Machine Learning


CS 446 Machine Learning, Fall 2017. Lecture 19: Recurrent Neural Networks, LSTM. Lecturer: Sanmi Koyejo. Scribe: Ren Wu. Nov 11th, 2017.

## Recap: CNN

Figure 1: CNN

Basic idea: scan the image in both directions; this produces multiple new images, one corresponding to each kernel.

### Size calculations

Start with an image of size $W_1 \times H_1 \times D_1$. The other parameters are: $F$, the filter size; $P$, the zero padding; $S$, the stride; and $K$, the number of filters. The new image size is

$$W_2 = \frac{W_1 - F + 2P}{S} + 1, \qquad H_2 = \frac{H_1 - F + 2P}{S} + 1, \qquad D_2 = K.$$

### Tuning hyperparameters in a CNN

- Small network and small data set: standard hyperparameter search, cross-validation, Bayesian optimization.
- Larger network (days or weeks to train): start with a polished architecture and tweak its parameters.

### Architecture of a CNN

Figure 2: CNN architecture

Inside the convolution layer there are three processes: convolution, a non-linearity, and pooling. We have already talked about the first two; we now talk about pooling.

### Pooling

Main idea: capture some location or scale invariance. The process is very similar to convolution, except that instead of multiplying by weights we apply other operations:

- Max pooling: take the max value of each block.
- Average pooling: take the average value of each block.

Pooling has three parameters: the pooling size, the stride (usually set equal to the pooling size so that blocks do not overlap), and the zero padding. Pooling is essentially a simple, fixed version of convolution: the operation does not change during training.

Figure 3: Pooling

### 1-D example

Figure 4: 1-D example

We are looking for the square pattern. We slide the filter $w$ along the input $X$ and compute the convolution $z = X * w$ at every position. For prediction we use $\max z$. For more complicated patterns we need a more complicated structure.

### Output layer

We usually use a fully connected layer: reshape the image into a vector and apply a feed-forward neural network. One can use more hidden layers before the output.

Figure 5: Output layer

### Tips for computer vision

- Invariance to rotation: data augmentation. Create new data with random rotations of the same size (the same approach works for any invariance).
- Size invariance: random rescaling and patches; combine the different predictions for the different patches to make a better prediction.

## Recurrent Neural Networks

Example: we want character-level (letters from an alphabet) sequence prediction. As shown in Figure 6, the inputs $x_i$ and outputs $y_i$ are all one-hot vectors of characters.

Figure 6: RNN

Question: How about word-level sequences?
Answer: Use a word embedding layer (word vectors).

### Calculation of the RNN

$$y^{(2)} = g\!\left(V^{(2)\top} z^{(2)} + a^{(2)}\right)$$
$$z^{(2)} = g\!\left(U^{(2)\top} x^{(2)} + W^{(2)\top} z^{(1)} + b^{(2)}\right)$$
$$y^{(t)} = g\!\left(V^{(t)\top} z^{(t)} + a^{(t)}\right)$$

where $z^{(m)}$ incorporates the information of $z^{(m-1)}$ and $x^{(m)}$.

Comments: we can have weight sharing, $V^{(t)} = V$, $U^{(t)} = U$, $W^{(t)} = W$ for all $t$, which allows you to model arbitrary sequence lengths.

### Calculation within the box

Figure 7: Calculation within the box

When it is unrolled through time, it becomes Figure 6.

### Backpropagation (character-level model)

Figure 8: Backpropagation of an RNN

We have weight sharing, so $W$ is updated by all the nodes. Take node 3 as an example:

$$\frac{\partial\,\mathrm{cost}(y^{(3)})}{\partial W} = \frac{\partial\,\mathrm{cost}(y^{(3)})}{\partial y^{(3)}} \cdot \frac{\partial y^{(3)}}{\partial z^{(3)}} \cdot \frac{\partial g}{\partial a} \cdot \frac{\partial a}{\partial W}, \qquad a = U^{\top} x^{(3)} + W^{\top} z^{(2)} + b.$$

However, $z^{(2)}$ also depends on $W$. One can show that $\frac{\partial L_t}{\partial W}$ contains products of factors of the form $g'(a)\,W$, one per time step:

- When $\|W\| < 1$: $\frac{\partial L_t}{\partial W} \to 0$ very quickly, which is called the vanishing gradient.
- When $\|W\| > 1$: $\frac{\partial L_t}{\partial W}$ grows quickly, which is called the exploding gradient.
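As a sanity check on the convolution/pooling size formula above, here is a minimal helper (the function name is my own, not from the lecture):

```python
def conv_output_size(w_in, f, p, s):
    """Output width of a convolution or pooling layer:
    W2 = (W1 - F + 2P) / S + 1.
    Raises if the filter does not tile the input evenly."""
    out, rem = divmod(w_in - f + 2 * p, s)
    if rem != 0:
        raise ValueError("filter does not tile the input evenly")
    return out + 1

# A 32-wide image with a 5-wide filter, no padding, stride 1 -> width 28;
# then 2-wide max pooling with stride 2 (non-overlapping) -> width 14.
w = conv_output_size(32, 5, 0, 1)   # 28
w = conv_output_size(w, 2, 0, 2)    # 14
```

The same formula applies per spatial dimension, so a $32 \times 32$ input gives a $28 \times 28$ map, and the depth of the output is simply the number of filters $K$.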

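The shared-weight RNN recurrence described in the notes can be sketched in a few lines of NumPy. This is an illustrative sketch, not the lecture's code: the notes leave the non-linearity $g$ unspecified, so tanh is assumed for the hidden state and a softmax for the character-level output.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over a vector.
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, a):
    """z_t = tanh(U^T x_t + W^T z_{t-1} + b), y_t = softmax(V^T z_t + a).
    U, W, V are shared across time, so any sequence length works."""
    z = np.zeros(W.shape[0])          # z_0 initialized to zeros
    ys = []
    for x in xs:
        z = np.tanh(U.T @ x + W.T @ z + b)
        ys.append(softmax(V.T @ z + a))
    return ys

# One-hot characters from a 4-letter alphabet, hidden size 8 (toy values).
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 4))
b, a = np.zeros(8), np.zeros(4)
xs = [np.eye(4)[i] for i in [0, 2, 1]]
ys = rnn_forward(xs, U, W, V, b, a)   # one distribution over characters per step
```

Because the same $U$, $W$, $V$ are reused at every step, the unrolled network in Figure 6 is just this loop written out through time.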
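The vanishing/exploding behaviour can be seen with a scalar toy model: dropping the $g'(a)$ factors, the backpropagated gradient picks up one factor of $W$ per time step, so after $t$ steps it scales like $|W|^t$ (a simplification of the result above, not the lecture's derivation):

```python
def gradient_scale(w, t):
    """Magnitude of the backpropagated factor after t steps
    in a scalar linear RNN: |w|**t."""
    return abs(w) ** t

vanish = gradient_scale(0.9, 100)    # about 2.7e-5: the gradient vanishes
explode = gradient_scale(1.1, 100)   # about 1.4e4: the gradient explodes
```

This is exactly why LSTM-style architectures, which replace the repeated multiplication by $W$ with a gated additive cell state, were introduced.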