CS446: Machine Learning, Fall 2017
Lecture 15: Multi-layer Perceptron and Backpropagation
Lecturer: Sanmi Koyejo        Scribe: Yihui Cui        Oct. 19th, 2017

Agenda
- Recap of SGD
- Perceptron
- Backpropagation
- Multi-Layer Perceptron

Recap: SGD

Recall the empirical loss

    \ell(w) = \frac{1}{n} \sum_{i=1}^{n} \ell_i(w) \approx R_f(w, P) = E_P[\ell(h_w)].

Gradient descent performs the update

    w_{t+1} = w_t - \eta \nabla_w R_f(w, P).

Under weak conditions, \nabla E_P[\ell(w)] = E_P[\nabla_w \ell(w)], so the single-example gradient \nabla_w \ell_i(w) is an unbiased estimator of the full gradient, which is what we need.

Algorithm. One way to optimize a stochastic objective such as E_P[\ell(h_w)] is to perform the update one example at a time; see Algorithm 1 for pseudocode.

Algorithm 1: Stochastic Gradient Descent
    Initialize w, \eta
    repeat
        Randomly permute the data
        for i = 1 to n do
            g = \nabla f(w, z_i)
            w = proj(w - \eta g)
            update \eta
        end for
    until converged

Mini-Batch. SGD has high variance. To reduce the variance we use multiple samples per update, also known as a mini-batch: we compute the gradient on a mini-batch of k data cases and then take the average. If k = 1 this is SGD; if k = n this is standard steepest descent.

Comparison / Tradeoff (\kappa denotes the cost of computing one gradient):

    Method                Cost per update    Notes
    Gradient descent      n\kappa            high memory; generally converges fast
    SGD                   \kappa             less likely to get stuck in flat regions
    Mini-batch SGD (k)    k\kappa            less likely to get stuck in flat regions

Finding the parameter estimate. After T steps, either pick the final value w_T, or average the iterates, \bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t, or average only the last s iterates, \frac{1}{s} \sum_{t=T-s}^{T} w_t.

Setting the step size. To guarantee convergence of SGD there are sufficient conditions on the learning rate, known as the Robbins-Monro conditions:

    \sum_{k=1}^{\infty} \eta_k = \infty,    \qquad    \sum_{k=1}^{\infty} \eta_k^2 < \infty.

Choice of step size. The set of values of \eta over time is called the learning rate schedule. There are different ways to choose the learning rate, for example:
- \eta_k = (\tau_0 + k)^{-1}, where \tau_0 \ge 0 slows down the early iterations of the algorithm;
- \eta_k = k^{-m}, where m \in (0.5, 1] controls the rate at which old values are forgotten;
- an exponentially decaying schedule, e.g. \eta_k \propto e^{-k}.

The need to adjust these tuning parameters is one of the main drawbacks of stochastic optimization. One simple heuristic is to store an initial subset of the data and try a range of \eta values on this subset.
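To make Algorithm 1 and the mini-batch variant above concrete, here is a minimal NumPy sketch. The squared per-example loss, the function names, and the fixed step size eta are illustrative assumptions, not part of the lecture; the convergence test is replaced by a fixed epoch count to keep the sketch short.

```python
import numpy as np

# Illustrative per-example loss l_i(w) = 0.5 * (x_i @ w - y_i)**2 (an assumption
# for this sketch; the lecture keeps the loss abstract).
def grad_li(w, xi, yi):
    return (xi @ w - yi) * xi

def minibatch_sgd(X, y, k=1, eta=0.01, epochs=50):
    """k = 1 recovers SGD; k = n recovers standard (full-batch) steepest descent."""
    n, d = X.shape
    w = np.zeros(d)                           # Initialize
    for _ in range(epochs):                   # repeat ... until "converged"
        perm = np.random.permutation(n)       # Randomly permute the data
        for start in range(0, n, k):
            batch = perm[start:start + k]
            # Average the per-example gradients over the mini-batch of size k
            g = np.mean([grad_li(w, X[i], y[i]) for i in batch], axis=0)
            w = w - eta * g                   # unconstrained, so no projection step
    return w
```

For example, `minibatch_sgd(X, y, k=32)` runs mini-batch SGD with batches of 32, while `k=1` gives plain SGD on the same data.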

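The two ways of extracting a final parameter estimate after T steps can be written as a short sketch as well; here `iterates` is a hypothetical list [w_1, ..., w_T] collected while running SGD, not something produced by the code above.

```python
import numpy as np

def final_iterate(iterates):
    """Return the last iterate w_T."""
    return iterates[-1]

def averaged_iterate(iterates, s=None):
    """Average the last s iterates; s = None averages all T of them."""
    tail = iterates if s is None else iterates[-s:]
    return np.mean(tail, axis=0)
```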

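Finally, a small sketch of a polynomial-decay schedule of the form \eta_k = (\tau_0 + k)^{-m}; the particular values tau0 = 1 and m = 0.75 are arbitrary choices for illustration. With m in (0.5, 1] this schedule satisfies the Robbins-Monro conditions: the sum of eta_k diverges while the sum of eta_k^2 converges.

```python
import numpy as np

def eta(k, tau0=1.0, m=0.75):
    """Polynomial decay schedule eta_k = (tau0 + k)**(-m), with m in (0.5, 1]."""
    return (tau0 + k) ** (-m)

# Finite-horizon check of the Robbins-Monro conditions for this schedule:
ks = np.arange(1, 100_001)
etas = eta(ks)
print(etas.sum())         # partial sums keep growing as the horizon increases
print((etas ** 2).sum())  # partial sums level off, since 2*m = 1.5 > 1
```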
