CS446: Machine Learning, Fall 2017
Lecture 19: CNN and Recurrent Neural Network
Lecturer: Sanmi Koyejo        Scribe: Zhiyu Zhu, November 2nd, 2017

Announcements
• Midterm Two is next Thursday.
• There will be a review session next Tuesday.
• The project is now out and is due on December 19th.

Topics Today
• Recap
• Convolutional Neural Network
  – Pooling
  – Output Layer
  – Demo
• New topic: Recurrent Neural Network #1 (not covered in Midterm Two)
  – Introduction to Recurrent Neural Networks
• Next Tuesday:
  – Recurrent Neural Network #2 → Long Short-Term Memory (LSTM)
  – Unsupervised Learning
  – Midterm Review

Convolutional Neural Network
• Input Image
  This is a model of a neural network with multiple filters.
  If we use P for the zero padding, S for the stride, W_1 × H_1 × D_1 for the input width, height, and depth, and F for the filter size, the output size of a convolutional layer is:

      W_2 = (W_1 - F + 2P)/S + 1
      H_2 = (H_1 - F + 2P)/S + 1
      D_2 = K = number of filters

  (A small numeric sketch of this formula and of pooling follows at the end of this section.)
• Model Parameter Selection
  – Smaller networks:
    ∗ Hyperparameter search
    ∗ Cross-validation
    ∗ Bayesian optimization
  – Larger networks (which may take days to weeks to train):
    ∗ Start with a published architecture
    ∗ Tweak its parameters
• Recall the CNN architecture: [figure: diagram of the full CNN architecture]
• Pooling
  – Main idea: capture some location or scale invariance.
  – Recall stride and zero padding:
    ∗ With a 3×3 filter and stride 1, a 7×7 input gives a 5×5 output.
    ∗ If we change the stride to 2 with the same 7×7 input, the output becomes 3×3.
    ∗ If we change the stride to 3 with the same 7×7 input, the spacing no longer works out. In that case we can use zero padding, which adds layers of zeros around the border of the current layer.
      Example: stride 5 with a 32×32 input and zero padding of 2.
  – Pooling takes a filter and a stride of the same size:
    ∗ Max pooling: the max of the values in each block.
    ∗ Average pooling: the average of the values in each block.
  – Parameters:
    ∗ Pooling size
    ∗ Stride
    ∗ Zero padding
  – Example: pattern matching in 1-D to find a good filter. For prediction, use max(Z).
• Output Layer
  Fully connected (usually).
• Tips (Computer Vision):
  – Invariance to rotation:
    ∗ Data augmentation: create new data with random rotations (the same trick works for any invariance you want).
  – Size invariance: random rescaling + the same pattern.
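To make the output-size formula and the pooling step concrete, here is a minimal NumPy sketch (not from the lecture; the integer division, the 3×3 and 5×5 filter sizes, and the toy 4×4 feature map are illustrative assumptions):

    import numpy as np

    def conv_output_size(w1, h1, f, p, s):
        # Output width/height from the notes: (input - filter + 2*padding) / stride + 1
        return (w1 - f + 2 * p) // s + 1, (h1 - f + 2 * p) // s + 1

    def max_pool(x, size):
        # Max pooling with pool size equal to the stride, as described above:
        # take the max of each non-overlapping size-by-size block.
        h, w = x.shape
        x = x[:h - h % size, :w - w % size]      # drop any ragged border
        return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

    # 7x7 input, 3x3 filter, stride 2, no padding -> 3x3 output (matches the stride-2 example)
    print(conv_output_size(7, 7, f=3, p=0, s=2))

    # 32x32 input, 5x5 filter, stride 1, zero padding 2 -> output stays 32x32 (illustrative)
    print(conv_output_size(32, 32, f=5, p=2, s=1))

    # 2x2 max pooling on a toy 4x4 feature map
    z = np.arange(16).reshape(4, 4)
    print(max_pool(z, 2))    # [[ 5  7]
                             #  [13 15]]

Average pooling would be the same sketch with .max replaced by .mean over the block axes.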
Recurrent Neural Network (Predicting Sequences)
• General concept:
  – In the previous neural networks, all inputs (and outputs) were independent of each other; with a Recurrent Neural Network we want to predict each element of a sequence based on the elements that come before it.
  – A Recurrent Neural Network performs the same task for every element of a sequence, with the output depending on the previous computations.
  – A Recurrent Neural Network has a memory that captures information about what has been computed so far.
• Example of usage: character-level (letters of the alphabet) sequence prediction.
• Aside: Is it possible to operate at the word level of a sentence? Yes, we can use word embeddings (which map words to vectors).
• Character-Level Model:
  – Generally, the Recurrent Neural Network has a structure like this: [figure: RNN structure]
  – And the formulas (a forward-pass sketch in code follows at the end of this section):

      y^{(2)} = g(V^{(2)T} z^{(2)} + a^{(2)})
      z^{(2)} = g(U^{(2)T} X^{(2)} + W^{(2)T} z^{(1)} + b^{(2)})
      ...
      y^{(t)} = g(V^{(t)T} z^{(t)} + a^{(t)})
      z^{(t)} = g(U^{(t)T} X^{(t)} + W^{(t)T} z^{(t-1)} + b^{(t)})
      ...

    (Why train like this? To find patterns.)
  – Weight sharing:
    ∗ V^{(t)} = V, U^{(t)} = U, and W^{(t)} = W for all t.
    ∗ Advantage of sharing weights: it allows modeling arbitrary sequence lengths.
    ∗ Model in a box (another representation): [figure]
    ∗ If we unroll it over time: [figure]
  – Back-propagation for the character-level sequence model:
    ∗ ∂Cost(y^{(3)})/∂W = ∂Cost(y^{(3)})/∂y^{(3)} · ∂y^{(3)}/∂z^{(3)} · ∂z^{(3)}/∂g · ∂g/∂a · ∂a/∂W,
      where a = U^T X^{(3)} + W^T z^{(2)} + b, so a depends on W.
    ∗ You can show that

        ∂L_t/∂W ∝ |W|^T (∂g/∂a)^T.

      When |W| < 1, ∂L_t/∂W → 0: a vanishing gradient.
      When |W| > 1, ∂L_t/∂W → ∞: an exploding gradient.
      (A small numeric sketch of this |W|^T scaling follows the forward-pass sketch below.)
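The character-level recurrence above maps almost line-for-line onto code. Below is a minimal sketch of the shared-weight forward pass, assuming one-hot character inputs, tanh for the hidden nonlinearity g, a softmax on the outputs, and small illustrative dimensions; none of these specific choices come from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, hidden_size = 26, 16                      # illustrative sizes (assumption)

    U = rng.normal(0.0, 0.1, (vocab_size, hidden_size))   # input  -> hidden
    W = rng.normal(0.0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden, shared over all t
    V = rng.normal(0.0, 0.1, (hidden_size, vocab_size))   # hidden -> output
    b, a = np.zeros(hidden_size), np.zeros(vocab_size)

    def softmax(s):
        e = np.exp(s - s.max())
        return e / e.sum()

    def one_hot(c):
        v = np.zeros(vocab_size)
        v[ord(c) - ord('a')] = 1.0
        return v

    def forward(chars):
        z = np.zeros(hidden_size)                 # z^(0)
        ys = []
        for c in chars:                           # same U, W, V, b, a reused at every step
            x = one_hot(c)
            z = np.tanh(U.T @ x + W.T @ z + b)    # z^(t) = g(U^T x^(t) + W^T z^(t-1) + b)
            ys.append(softmax(V.T @ z + a))       # y^(t) = g(V^T z^(t) + a), with g = softmax here
        return ys

    ys = forward("cab")
    print(len(ys), ys[-1].shape)                  # 3 time steps, each a distribution over 26 letters

Because the same U, W, and V are reused at every time step, the loop runs for as many steps as the input has characters, which is exactly the arbitrary-sequence-length advantage of weight sharing noted above.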

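To see why the |W|^T factor causes vanishing or exploding gradients, it is enough to raise a scalar stand-in for |W| to the power of the sequence length; this toy calculation (scalars instead of the full Jacobian product, values chosen only for illustration) shows the two regimes:

    # Gradient magnitude scales roughly like |W|^T over T time steps.
    for w in (0.5, 1.0, 1.5):                 # |W| < 1, |W| = 1, |W| > 1
        for T in (10, 50, 100):
            print(f"|W| = {w}:  |W|^{T:<3d} = {w ** T:.3e}")
    # 0.5^100 is about 8e-31 (vanishing); 1.5^100 is about 4e+17 (exploding).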
