CS446: Machine Learning, Fall 2017
Lecture 19: CNN and Recurrent Neural Network
Lecturer: Sanmi Koyejo        Scribe: Zhiyu Zhu, November 2nd, 2017

Announcements
• Midterm Two is next Thursday.
• There will be a review session next Tuesday.
• The project is now out and is due on December 19th.

Topics Today
• Recap
• Convolutional Neural Network
  – Pooling
  – Output Layer
  – Demo
• New topic: Recurrent Neural Network #1 (not covered in Midterm Two)
  – Introduction to Recurrent Neural Networks
• Next Tuesday:
  – Recurrent Neural Network #2 → Long Short-Term Memory (LSTM)
  – Unsupervised Learning
  – Midterm Review

Convolutional Neural Network
• Input Image
  This is a model of a neural network with multiple filters.
  If we use P for the zero padding, S for the stride, W_1 × H_1 × D_1 for the input width, height, and depth, and F for the filter size, the output size of a convolutional layer is:

      W_2 = (W_1 - F + 2P)/S + 1
      H_2 = (H_1 - F + 2P)/S + 1
      D_2 = K = number of filters

  (A small numeric sketch of this formula and of pooling follows at the end of this section.)
• Model Parameter Selection
  – Smaller networks:
    ∗ Hyperparameter search
    ∗ Cross-validation
    ∗ Bayesian optimization
  – Larger networks (which may take days to weeks to train):
    ∗ Start with a published architecture
    ∗ Tweak its parameters
• Recall the CNN architecture: [figure: diagram of the full CNN architecture]
• Pooling
  – Main idea: capture some location or scale invariance.
  – Recall stride and zero padding:
    ∗ With a 3×3 filter and stride 1, a 7×7 input gives a 5×5 output.
    ∗ If we change the stride to 2 with the same 7×7 input, the output becomes 3×3.
    ∗ If we change the stride to 3 with the same 7×7 input, the spacing no longer works out. In that case we can use zero padding, which adds layers of zeros around the border of the current layer.
      Example: stride 5 with a 32×32 input and zero padding of 2.
  – Pooling takes a filter and a stride of the same size:
    ∗ Max pooling: the max of the values in each block.
    ∗ Average pooling: the average of the values in each block.
  – Parameters:
    ∗ Pooling size
    ∗ Stride
    ∗ Zero padding
  – Example: pattern matching in 1-D to find a good filter. For prediction, use max(Z).
• Output Layer
  Fully connected (usually).
• Tips (Computer Vision):
  – Invariance to rotation:
    ∗ Data augmentation: create new data with random rotations (the same trick works for any invariance you want).
  – Size invariance: random rescaling + the same pattern.
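To make the output-size formula and the pooling step concrete, here is a minimal NumPy sketch (not from the lecture; the integer division, the 3×3 and 5×5 filter sizes, and the toy 4×4 feature map are illustrative assumptions):

    import numpy as np

    def conv_output_size(w1, h1, f, p, s):
        # Output width/height from the notes: (input - filter + 2*padding) / stride + 1
        return (w1 - f + 2 * p) // s + 1, (h1 - f + 2 * p) // s + 1

    def max_pool(x, size):
        # Max pooling with pool size equal to the stride, as described above:
        # take the max of each non-overlapping size-by-size block.
        h, w = x.shape
        x = x[:h - h % size, :w - w % size]      # drop any ragged border
        return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

    # 7x7 input, 3x3 filter, stride 2, no padding -> 3x3 output (matches the stride-2 example)
    print(conv_output_size(7, 7, f=3, p=0, s=2))

    # 32x32 input, 5x5 filter, stride 1, zero padding 2 -> output stays 32x32 (illustrative)
    print(conv_output_size(32, 32, f=5, p=2, s=1))

    # 2x2 max pooling on a toy 4x4 feature map
    z = np.arange(16).reshape(4, 4)
    print(max_pool(z, 2))    # [[ 5  7]
                             #  [13 15]]

Average pooling would be the same sketch with .max replaced by .mean over the block axes.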
Recurrent Neural Network (Predicting Sequences)
• General concept:
  – In the previous neural networks, all inputs (and outputs) were independent of each other; with a Recurrent Neural Network we want to predict each element of a sequence based on the elements that come before it.
  – A Recurrent Neural Network performs the same task for every element of a sequence, with the output depending on the previous computations.
  – A Recurrent Neural Network has a memory that captures information about what has been computed so far.
• Example of usage: character-level (letters of the alphabet) sequence prediction.
• Aside: Is it possible to operate at the word level of a sentence? Yes, we can use word embeddings (which map words to vectors).
• Character-Level Model:
  – Generally, the Recurrent Neural Network has a structure like this: [figure: RNN structure]
  – And the formulas (a forward-pass sketch in code follows at the end of this section):

      y^{(2)} = g(V^{(2)T} z^{(2)} + a^{(2)})
      z^{(2)} = g(U^{(2)T} X^{(2)} + W^{(2)T} z^{(1)} + b^{(2)})
      ...
      y^{(t)} = g(V^{(t)T} z^{(t)} + a^{(t)})
      z^{(t)} = g(U^{(t)T} X^{(t)} + W^{(t)T} z^{(t-1)} + b^{(t)})
      ...

    (Why train like this? To find patterns.)
  – Weight sharing:
    ∗ V^{(t)} = V, U^{(t)} = U, and W^{(t)} = W for all t.
    ∗ Advantage of sharing weights: it allows modeling arbitrary sequence lengths.
    ∗ Model in a box (another representation): [figure]
    ∗ If we unroll it over time: [figure]
  – Back-propagation for the character-level sequence model:
    ∗ ∂Cost(y^{(3)})/∂W = ∂Cost(y^{(3)})/∂y^{(3)} · ∂y^{(3)}/∂z^{(3)} · ∂z^{(3)}/∂g · ∂g/∂a · ∂a/∂W,
      where a = U^T X^{(3)} + W^T z^{(2)} + b, so a depends on W.
    ∗ You can show that

        ∂L_t/∂W ∝ |W|^T (∂g/∂a)^T.

      When |W| < 1, ∂L_t/∂W → 0: a vanishing gradient.
      When |W| > 1, ∂L_t/∂W → ∞: an exploding gradient.
      (A small numeric sketch of this |W|^T scaling follows the forward-pass sketch below.)
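The character-level recurrence above maps almost line-for-line onto code. Below is a minimal sketch of the shared-weight forward pass, assuming one-hot character inputs, tanh for the hidden nonlinearity g, a softmax on the outputs, and small illustrative dimensions; none of these specific choices come from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, hidden_size = 26, 16                      # illustrative sizes (assumption)

    U = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # input  -> hidden
    W = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden, shared over all t
    V = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # hidden -> output
    b, a = np.zeros(hidden_size), np.zeros(vocab_size)

    def softmax(s):
        e = np.exp(s - s.max())
        return e / e.sum()

    def one_hot(c):
        v = np.zeros(vocab_size)
        v[ord(c) - ord('a')] = 1.0
        return v

    def forward(chars):
        z = np.zeros(hidden_size)                 # z^(0)
        ys = []
        for c in chars:                           # same U, W, V, b, a reused at every step
            x = one_hot(c)
            z = np.tanh(U.T @ x + W.T @ z + b)    # z^(t) = g(U^T x^(t) + W^T z^(t-1) + b)
            ys.append(softmax(V.T @ z + a))       # y^(t) = g(V^T z^(t) + a), with g = softmax here
        return ys

    ys = forward("cab")
    print(len(ys), ys[-1].shape)                  # 3 time steps, each a distribution over 26 letters

Because the same U, W, and V are reused at every time step, the loop runs for as many steps as the input has characters, which is exactly the arbitrary-sequence-length advantage of weight sharing noted above.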

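To see why the |W|^T factor causes vanishing or exploding gradients, it is enough to raise a scalar stand-in for |W| to the power of the sequence length; this toy calculation (scalars instead of the full Jacobian product, values chosen only for illustration) shows the two regimes:

    # Gradient magnitude scales roughly like |W|^T over T time steps.
    for w in (0.5, 1.0, 1.5):                 # |W| < 1, |W| = 1, |W| > 1
        for T in (10, 50, 100):
            print(f"|W| = {w}:  |W|^{T:<3d} = {w ** T:.3e}")
    # 0.5^100 is about 8e-31 (vanishing); 1.5^100 is about 4e+17 (exploding).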
