UB CSE 574 - Error Backpropagation!

Error Backpropagation
Sargur Srihari

Topics
- Neural network learning problem
- Need for computing derivatives of the error function
- Forward propagation of activations
- Backward propagation of errors
- Statement of the backprop algorithm
- Use of backprop in computing the Jacobian matrix

Neural Network Learning Problem
The goal is to learn the weights $\mathbf{w}$ from a labelled set of training samples. The learning procedure has two stages:
1. Evaluate the derivatives of the error function $E(\mathbf{w})$ with respect to the weights $w_1, \ldots, w_T$.
2. Use the derivatives to compute adjustments to the weights:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w})$
The number of weights is $T = (D+1)M + (M+1)K = M(D+K+1) + K$, where $D$ is the number of inputs, $M$ the number of hidden units, and $K$ the number of outputs.

Backpropagation Terminology
The goal is an efficient technique for evaluating the gradient of an error function $E(\mathbf{w})$ for a feed-forward neural network. "Backpropagation" refers to the derivative computation only; in a subsequent stage, the derivatives are used to make adjustments to the weights. The computation is achieved by a local message-passing scheme in which information is sent forwards and backwards through the network alternately.

Overview of the Backprop Algorithm
1. Choose random weights for the network.
2. Feed in an example and obtain a result.
3. Calculate the error for each node, starting from the last stage and propagating the error backwards.
4. Update the weights.
5. Repeat with other examples until the network converges on the target output.
How to divide up the errors needs a little calculus.

Wide Use of Backpropagation
Backpropagation can be applied to error functions other than the sum of squared errors, and it is also used to evaluate other matrices such as the Jacobian and Hessian. The second stage, in which the computed derivatives are used to adjust the weights, can be tackled using a variety of optimization schemes substantially more powerful than gradient descent.

Evaluation of Error-Function Derivatives
We derive the backpropagation algorithm for an arbitrary feed-forward topology, arbitrary differentiable nonlinear activation functions, and a broad class of error functions. Error functions of practical interest are sums of errors associated with the individual training data points:
$E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w})$
We consider the problem of evaluating $\nabla E_n(\mathbf{w})$ for the $n$th term in the error function; the derivatives are taken with respect to the weights $w_1, \ldots, w_T$. They can be used directly for sequential optimization or accumulated over the training set for batch methods.

A Simple Linear Model
The outputs $y_k$ are linear combinations of the input variables $x_i$:
$y_k = \sum_i w_{ki} x_i$
The error function for a particular input $\mathbf{x}_n$ has the form
$E_n = \tfrac{1}{2} \sum_k (y_{nk} - t_{nk})^2$, where $y_{nk} = y_k(\mathbf{x}_n, \mathbf{w})$.
The gradient of this error function with respect to a weight $w_{ji}$ is
$\dfrac{\partial E_n}{\partial w_{ji}} = (y_{nj} - t_{nj})\, x_{ni}$
a local computation involving the product of an error signal $y_{nj} - t_{nj}$, associated with the output end of the link $w_{ji}$, and the variable $x_{ni}$, associated with its input end. (The subscript $n$, which refers to the particular input $\mathbf{x}_n$, is dropped below, giving $\partial E / \partial w_{ji} = (y_j - t_j)\, x_i$.)

General Feed-Forward Network: Forward Propagation
Each unit computes a weighted sum of its inputs,
$a_j = \sum_i w_{ji} z_i$
where $z_i$ is the activation of a unit, or an input, that sends a connection to unit $j$, and $w_{ji}$ is the weight associated with that connection. The sum is transformed by a nonlinear activation function:
$z_j = h(a_j)$

Evaluation of the Derivative of $E_n$ with Respect to a Weight $w_{ji}$
By the chain rule for partial derivatives,
$\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{\partial E_n}{\partial a_j} \dfrac{\partial a_j}{\partial w_{ji}}$
Define $\delta_j \equiv \partial E_n / \partial a_j$. Substituting $a_j = \sum_i w_{ji} z_i$ gives $\partial a_j / \partial w_{ji} = z_i$, so the required derivative
$\dfrac{\partial E_n}{\partial w_{ji}} = \delta_j z_i$
is obtained by multiplying the value of $\delta$ for the unit at the output end of the weight by the value of $z$ for the unit at the input end of the weight.
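As an aside (not part of the original slides), the two equations above drop straight into NumPy. In this minimal sketch the array shapes, the choice $h = \tanh$, and the placeholder deltas are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

z_in = rng.standard_normal(5)    # activations z_i feeding the layer
W = rng.standard_normal((4, 5))  # weights w_ji, one row per unit j

a = W @ z_in          # a_j = sum_i w_ji z_i  (weighted sums)
z_out = np.tanh(a)    # z_j = h(a_j), assuming h = tanh for illustration

# Once backprop has supplied delta_j = dE_n/da_j for every unit j,
# the derivative w.r.t. every weight in the layer is one outer product:
delta = rng.standard_normal(4)   # placeholder deltas (shapes only)
dE_dW = np.outer(delta, z_in)    # dE_n/dw_ji = delta_j * z_i
```

The outer product makes the locality visible: entry $(j, i)$ of the gradient involves only the $\delta$ at the output end of weight $w_{ji}$ and the $z$ at its input end.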
It remains to work out how to calculate $\delta_j \equiv \partial E_n / \partial a_j$ for each unit.

Calculation of the Error $\delta$ for a Hidden Unit $j$
For an output unit $k$, since $E_n = \tfrac{1}{2} \sum_k (y_k - t_k)^2$ and $y_k = a_k = \sum_i w_{ki} z_i$, the error is simply
$\delta_k = y_k - t_k$
For a hidden unit $j$, we sum the partial derivatives over all units $k$ to which unit $j$ sends connections. By the chain rule,
$\delta_j \equiv \dfrac{\partial E_n}{\partial a_j} = \sum_k \dfrac{\partial E_n}{\partial a_k} \dfrac{\partial a_k}{\partial a_j}$
Since $a_k = \sum_i w_{ki} z_i = \sum_i w_{ki} h(a_i)$, we have $\partial a_k / \partial a_j = w_{kj} h'(a_j)$. Substituting gives the backpropagation formula for the error derivatives at stage $j$:
$\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$
The input to the activation comes from earlier units, while the error derivatives $\delta_k$ come from the later units $k$.
[Figure: a blue arrow shows forward propagation; red arrows indicate the direction of information flow during error backpropagation.]

Error Backpropagation Algorithm
1. Apply the input vector $\mathbf{x}_n$ to the network and forward propagate through the network using $a_j = \sum_i w_{ji} z_i$ and $z_j = h(a_j)$.
2. Evaluate $\delta_k$ for all output units using $\delta_k = y_k - t_k$.
3. Backpropagate the $\delta$'s using $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$ to obtain $\delta_j$ for each hidden unit.
4. Use $\partial E_n / \partial w_{ji} = \delta_j z_i$ to evaluate the required derivatives.
The value of $\delta$ for a particular hidden unit is obtained by propagating the $\delta$'s backward from units higher up in the network.

A Simple Example
Consider a two-layer network with a sum-of-squares error, output units with linear activation functions $y_k = a_k$, and hidden units with the sigmoidal activation function $h(a) = \tanh(a)$, where
$\tanh(a) = \dfrac{e^a - e^{-a}}{e^a + e^{-a}}$
which has the simple derivative
$h'(a) = 1 - h(a)^2$
The standard sum-of-squares error is
$E_n = \tfrac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2$
where $y_k$ is the activation of output unit $k$ and $t_k$ is the corresponding target for input $\mathbf{x}_n$.

Simple Example: Forward and Backward Propagation
For each input in the training set (see the runnable sketch after this list):
- Forward propagation: $a_j = \sum_{i=0}^{D} w_{ji}^{(1)} x_i$, $z_j = \tanh(a_j)$, $y_k = \sum_{j=0}^{M} w_{kj}^{(2)} z_j$
- Output differences: $\delta_k = y_k - t_k$
- Backward propagation ($\delta$'s for the hidden units), using $h'(a) = 1 - h(a)^2$: $\delta_j = (1 - z_j^2) \sum_{k=1}^{K} w_{kj} \delta_k$
- Derivatives with respect to the first-layer and second-layer weights: $\partial E_n / \partial w_{ji}^{(1)} = \delta_j x_i$ and $\partial E_n / \partial w_{kj}^{(2)} = \delta_k z_j$
- Weight updates: $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n(\mathbf{w})$

Efficiency of Backpropagation
Computational efficiency is the main aspect of backprop. The number of operations needed to compute the derivatives of the error function scales with the total number $W$ of weights and biases: a single evaluation of the error function for a single input requires $O(W)$ operations for large $W$. This is in contrast to $O(W^2)$ for numerical differentiation, as seen next.

Another Approach: Numerical Differentiation
Compute the derivatives using the method of finite differences: perturb each weight in turn and approximate the derivatives by
$\dfrac{\partial E_n}{\partial w_{ji}} = \dfrac{E_n(w_{ji} + \epsilon) - E_n(w_{ji})}{\epsilon} + O(\epsilon)$ …
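To tie the pieces together, here is a short runnable sketch of the two-layer example, with a forward-difference gradient check in the spirit of the truncated last slide. This is my own NumPy rendering rather than code from the deck; the bias handling (a constant $x_0 = z_0 = 1$ absorbed as column 0 of each weight matrix), the layer sizes, and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, K = 3, 4, 2                      # inputs, hidden units, outputs (assumed sizes)
W1 = rng.standard_normal((M, D + 1))   # first-layer weights; column 0 multiplies x_0 = 1
W2 = rng.standard_normal((K, M + 1))   # second-layer weights; column 0 multiplies z_0 = 1

def forward(x):
    """Forward propagation: a_j = sum_i w1_ji x_i, z_j = tanh(a_j), y_k = sum_j w2_kj z_j."""
    x = np.concatenate(([1.0], x))            # prepend the assumed bias input x_0 = 1
    a = W1 @ x                                # hidden pre-activations a_j
    z = np.concatenate(([1.0], np.tanh(a)))   # prepend the assumed bias unit z_0 = 1
    y = W2 @ z                                # linear outputs: y_k = a_k
    return x, z, y

def backprop(x, t):
    """Gradients of E_n = 0.5 * sum_k (y_k - t_k)^2 w.r.t. W1 and W2."""
    x, z, y = forward(x)
    delta_k = y - t                                         # delta_k = y_k - t_k
    # delta_j = (1 - z_j^2) * sum_k w_kj delta_k  (bias column excluded)
    delta_j = (1.0 - z[1:] ** 2) * (W2[:, 1:].T @ delta_k)
    return np.outer(delta_j, x), np.outer(delta_k, z)       # delta_j x_i and delta_k z_j

def error(x, t):
    """E_n for one input, used by the finite-difference check."""
    return 0.5 * np.sum((forward(x)[2] - t) ** 2)

x, t = rng.standard_normal(D), rng.standard_normal(K)
g1, g2 = backprop(x, t)

# Finite-difference check: perturb each weight in turn and compare
# the forward-difference estimate against the backprop gradient.
eps = 1e-6
base = error(x, t)
num = np.zeros_like(W1)
for idx in np.ndindex(*W1.shape):
    W1[idx] += eps
    num[idx] = (error(x, t) - base) / eps
    W1[idx] -= eps
print("max |backprop - finite difference| =", np.max(np.abs(g1 - num)))  # small, O(eps)
```

A gradient-descent step then follows the update rule above: `W1 -= eta * g1` and `W2 -= eta * g2` for a small learning rate `eta`. The nested loop over every weight is what makes the finite-difference approach $O(W^2)$ overall: each of the $W$ weights needs its own $O(W)$ error evaluation, whereas backprop delivers all the derivatives in a single $O(W)$ backward pass.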

