# ILLINOIS CS 446 - 110217.3


## Lecture 19: CNN and Recurrent Neural Network

CS446 Machine Learning, Fall 2017. Lecturer: Sanmi Koyejo. Scribe: Zhiyu Zhu. November 2nd, 2017.

### Announcements

- Midterm Two is next Thursday.
- There will be a review section next Tuesday.
- The project is now out and is due on December 19th.

### Topics Today

- Recap: Convolutional Neural Network
  - Pooling
  - Output layer
  - Demo
- New topic: Recurrent Neural Network 1 (not covered in Midterm Two)
  - Introduction to Recurrent Neural Networks

Next Tuesday:

- Recurrent Neural Network 2: Long Short-Term Memory (LSTM)
- Unsupervised Learning
- Midterm Review

### Convolutional Neural Network

**Input image.** This is a model of a neural network with multiple filters. Let $P$ be the zero padding, $S$ the stride, $W_1$ the input width, $H_1$ the input height, $F$ the filter size, and $K$ the number of filters. The result-size formula is then:

$$W_2 = \frac{W_1 - F + 2P}{S} + 1, \qquad H_2 = \frac{H_1 - F + 2P}{S} + 1, \qquad D_2 = K \ (\text{number of filters})$$

**Model parameter selection:**

- Smaller network: hyper-parameter search (cross-validation, Bayesian optimization).
- Larger network (may take days to weeks to train): start with a published architecture and tweak the parameters.

**Recall: CNN architecture.** (Figure: diagram of the CNN architecture.)

### Pooling

Main idea: capture some location or scale invariance.

Recall stride and zero padding:

- Stride 1 with a 7×7 input (and a 3×3 filter) gives a 5×5 output.
- Stride 2 with the same 7×7 input gives a 3×3 output.
- Stride 3 with the same 7×7 input has a spacing problem: the filter does not tile the input evenly.
- Zero padding fixes this by adding layers of zeros outside the current layer. Example: a 5×5 filter on a 32×32 input with zero padding 2 (stride 1) keeps the output at 32×32.

Pooling takes a filter and a stride of the same size:

- Max pooling: the max of the values in each block.
- Average pooling: the average of the values in each block.

Parameters: pooling size, stride, zero padding.

Pooling-size example: pattern matching in 1-D to find a good filter; for prediction, use $\max z$.

### Output Layer

Usually fully connected.

Tips (computer vision):

- Invariance to rotation: data augmentation, i.e., create new data with random rotations. The same trick works for any invariance.
- Size invariance: random re-scaling.

### Recurrent Neural Network

**Goal: predict sequences.** In the previous neural networks, all inputs and outputs are independent of each other, but with a Recurrent Neural Network we want to predict each element of a sequence from the elements before it. A Recurrent Neural Network performs the same task for every element of a sequence, with the output depending on the previous computations: it has a memory which captures information about what has been computed so far.

Example of usage: character (letter/alphabet) level sequence prediction.

Aside: is it possible to operate at the word or sentence level? Yes, we can use word embeddings to change words into vectors.

**Character-level model.** Generally the Recurrent Neural Network has a structure like this (figure omitted), with the formulas

$$y_2 = g(V_2^T z_2 + a_2), \qquad z_2 = g(U_2^T x_2 + W_2^T z_1 + b_2)$$

and in general

$$y_t = g(V_t^T z_t + a_t), \qquad z_t = g(U_t^T x_t + W_t^T z_{t-1} + b_t).$$

Why train like this? To find patterns.

**Weight sharing:** $V_t = V$, $U_t = U$, and $W_t = W$ for all $t$. The advantage of sharing weights is that it allows modeling arbitrary sequence lengths.

**Model in a box** (another representation); we can also unroll it over time. (Figures omitted.)

**Back-propagation of the character-level sequence model.** For the cost at step 3,

$$\frac{\partial \,\text{Cost}(y_3)}{\partial W} = \frac{\partial \,\text{Cost}(y_3)}{\partial y_3} \cdot \frac{\partial y_3}{\partial z_3} \cdot \frac{\partial g(a_3)}{\partial W}, \qquad a_3 = U^T x_3 + W^T z_2 + b,$$

and $a_3$ depends on $W$. You can show that

$$\frac{\partial L_t}{\partial W} \propto \prod_k W^T g'(a_k).$$

- When $\|W\| < 1$, the product $\prod_k W^T g'(a_k) \to 0$: a vanishing gradient.
- When $\|W\| > 1$, the product grows without bound: an exploding gradient.
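The result-size formula can be checked with a small helper. This is an illustrative sketch; `conv_output_size` is a name chosen here, not something from the lecture:

```python
def conv_output_size(w1, f, p, s):
    """Spatial output size of a conv layer: W2 = (W1 - F + 2P) / S + 1.

    Raises if the filter does not tile the padded input evenly,
    which is exactly the "spacing problem" from the notes.
    """
    out, rem = divmod(w1 - f + 2 * p, s)
    if rem != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return out + 1

# 7x7 input, 3x3 filter, no padding: stride 1 -> 5x5, stride 2 -> 3x3.
# A 5x5 filter with padding 2 and stride 1 preserves a 32x32 input.
```

Trying stride 3 on the 7×7 input raises, reproducing the spacing problem that zero padding fixes.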
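The hyper-parameter search suggested for smaller networks can be sketched as an exhaustive grid search; `score_fn` stands in for a cross-validation score, and all names here are illustrative assumptions:

```python
from itertools import product

def grid_search(score_fn, grid):
    """Try every combination of candidate hyper-parameter values in
    `grid` (a dict of name -> list of candidates) and return the
    combination with the highest score."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best[1]
```

For a larger network this exhaustive search is too expensive, which is why the notes suggest starting from a published architecture instead.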
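Max and average pooling with a filter and stride of the same size (non-overlapping blocks, as described above) can be sketched in NumPy; `pool2d` is an illustrative helper, not code from the lecture:

```python
import numpy as np

def pool2d(x, size, mode="max"):
    """Non-overlapping pooling: filter and stride of the same size.

    x is a 2-D array whose height and width are divisible by `size`.
    Each (size x size) block is reduced to its max or its average.
    """
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))
```

On a 4×4 input with pooling size 2, this produces a 2×2 output, halving each spatial dimension.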
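The 1-D pattern-matching example ("for prediction, use max z") can be sketched as a sliding dot product between the signal and a filter; the function name is an assumption:

```python
import numpy as np

def match_1d(signal, filt):
    """Slide `filt` over `signal` (1-D cross-correlation) to get the
    responses z, then return max z as the prediction score."""
    z = np.array([signal[i:i + len(filt)] @ filt
                  for i in range(len(signal) - len(filt) + 1)])
    return z.max()
```

A signal containing the filter's pattern scores higher than one that does not, which is why the max response is a useful detector.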
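The rotation-augmentation tip can be illustrated with a minimal NumPy sketch. Real pipelines rotate by arbitrary angles; restricting to 90-degree multiples via `np.rot90` just keeps this demo dependency-free, and the function name is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_rotations(image, n_copies):
    """Data-augmentation sketch: create new training images by random
    rotation (limited here to 90-degree multiples)."""
    return [np.rot90(image, k=rng.integers(4)) for _ in range(n_copies)]
```

The same trick applies to any desired invariance, e.g. random re-scaling for size invariance.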
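The shared-weight recurrence z_t = g(U^T x_t + W^T z_{t-1} + b), y_t = g(V^T z_t + a) can be written directly in code. This is a sketch; the dimension conventions and names are assumptions, not from the lecture:

```python
import numpy as np

def rnn_forward(xs, U, W, V, b, a, g=np.tanh):
    """Forward pass of a shared-weight RNN.

    xs: list of input vectors x_t (length d each).
    U: (d, h), W: (h, h), V: (h, o); b: (h,), a: (o,).
    The same U, W, V are reused at every step, so any sequence
    length works (the advantage of weight sharing).
    """
    z = np.zeros(W.shape[0])          # z_0 = 0
    ys = []
    for x in xs:
        z = g(U.T @ x + W.T @ z + b)  # hidden state carries the memory
        ys.append(g(V.T @ z + a))
    return ys
```

Because the weights do not depend on t, the same function handles sequences of length 5 or 9 without any change.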
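The vanishing/exploding-gradient claim can be demonstrated numerically: back-propagation through time multiplies one Jacobian factor `W^T diag(g'(a_k))` per step, so the product's norm collapses when the weights are small and blows up when they are large. A sketch with g = tanh and random stand-in activations (the setup is illustrative, not from the lecture):

```python
import numpy as np

def bptt_factor_norm(w_scale, steps, hidden=4, seed=0):
    """Norm of the product of per-step backprop Jacobians
    W^T * diag(g'(a_k)) over `steps` time steps, with g = tanh.

    W is w_scale * I so its eigenvalues are exactly w_scale; the
    activations a_k are random stand-ins for illustration only.
    """
    rng = np.random.default_rng(seed)
    W = w_scale * np.eye(hidden)
    prod = np.eye(hidden)
    for _ in range(steps):
        a = rng.normal(size=hidden)
        gprime = 1 - np.tanh(a) ** 2          # tanh'(a)
        prod = prod @ (W.T * gprime)          # equals W.T @ diag(gprime)
    return np.linalg.norm(prod)
```

With weights of scale 0.5 the gradient factor is numerically zero after 50 steps (vanishing); with scale 5 it is enormous (exploding), which is the motivation for LSTMs in the next lecture.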
