UB CSE 574 - The Hessian Matrix

Machine Learning                                        Srihari

The Hessian Matrix
Sargur Srihari

Hessian of the Neural Network Error Function
• Backpropagation can be used to obtain the first derivatives of the error function with respect to the weights in the network
• It can also be used to evaluate second derivatives
• If all weight and bias parameters are elements w_i of a single vector w, then the second derivatives form the elements H_ij of the Hessian matrix H, where i, j ∈ {1, ..., W}:

    ∂²E / (∂w_ji ∂w_lk)

  (each index pair (j, i) and (l, k) labels one element of w)

Hessian Matrix Definition
If f is a real-valued function f(x_1, ..., x_n), and all of its second partial derivatives exist, then the Hessian matrix of f is the matrix

    H(f)_ij(x) = D_i D_j f(x)

where x = (x_1, ..., x_n) and D_i is the differential operator with respect to the i-th variable.
Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function:

    f(x + Δx) ≈ f(x) + J(x) Δx + (1/2) Δxᵀ H(x) Δx

where J is the Jacobian matrix, which is a vector (the gradient) for scalar-valued functions; for a vector-valued function y = (y_1, ..., y_n) it is a matrix. In backprop we deal with a scalar error function E.

Role of the Hessian in Neural Computing
1. Several nonlinear optimization algorithms for neural networks are based on second-order derivatives of the error surface
2. It is the basis of a fast procedure for retraining a network after a small change in the training data
3. Identifying the least significant weights for network pruning requires the inverse of the Hessian
4. Bayesian neural networks: the Hessian plays a central role in the Laplace approximation
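The pointwise definition H(f)_ij(x) = D_i D_j f(x) can be checked numerically. Below is a minimal sketch (the function names and the quadratic test function are illustrative assumptions, not from the slides) that approximates each Hessian element with a central finite-difference formula; for a quadratic function the formula is exact up to rounding, so the result matches the known constant Hessian.

```python
import numpy as np

def hessian_fd(f, x, eps=1e-4):
    """Approximate H(f)_ij(x) = D_i D_j f(x) by central finite differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Four-point stencil for the mixed second partial derivative
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

# Test function: f(x) = x0^2 + 3*x0*x1 has constant Hessian [[2, 3], [3, 0]]
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
H = hessian_fd(f, np.array([1.0, 2.0]))
```

Note that the computed matrix is symmetric, as the definition implies when the second partials are continuous.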
Evaluating the Hessian Matrix
• The full Hessian matrix can be difficult to compute in practice
• quasi-Newton algorithms have been developed that use approximations to the Hessian
• Various approximation techniques have been used to evaluate the Hessian for a neural network
• It can also be calculated exactly using an extension of backpropagation
• An important consideration is efficiency
• With W parameters (weights and biases) the matrix has dimension W × W
• Efficient methods have O(W²) complexity

Methods for Evaluating the Hessian Matrix
• Diagonal approximation
• Outer product approximation
• Inverse Hessian
• Finite differences
• Exact evaluation using backpropagation
• Fast multiplication by the Hessian

Diagonal Approximation
• In many cases the inverse of the Hessian is needed
• If the Hessian approximation is diagonal, its inverse is trivially computed
• Complexity is O(W) rather than O(W²) for the full Hessian

Outer Product Approximation
• Neural networks commonly use a sum-of-squares error function

    E = (1/2) Σ_{n=1}^{N} (y_n − t_n)²

• The Hessian matrix can then be written in the approximate form

    H ≈ Σ_{n=1}^{N} b_n b_nᵀ    where b_n = ∇y_n = ∇a_n

• Elements can be found in O(W²) steps

Inverse Hessian
• Use the outer product approximation to obtain a computationally efficient procedure for approximating the inverse of the Hessian

Finite Differences
• Using backprop to supply the first derivatives, complexity is reduced from O(W³) to O(W²)

Exact Evaluation of the Hessian
• Uses an extension of backprop
• Complexity is O(W²)

Fast Multiplication by the Hessian
• Many applications of the Hessian involve multiplication by the Hessian
• The vector vᵀH has only W elements
• Instead of computing H as an intermediate step, find an efficient method to compute vᵀH directly
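The idea of multiplying by the Hessian without ever forming it can be sketched numerically: a Hessian-vector product equals a directional derivative of the gradient, so it can be approximated by a central difference of two gradient evaluations. The sketch below (a finite-difference stand-in for the exact R-operator technique the slides allude to; the linear least-squares error and all names are illustrative assumptions) computes Hv in O(W) memory, since H itself is never built.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # N=50 inputs, W=3 weights
t = rng.normal(size=50)        # targets

def grad_E(w):
    """Gradient of E(w) = 0.5 * sum_n (w.x_n - t_n)^2 (what backprop would supply)."""
    return X.T @ (X @ w - t)

def hessian_vector_product(w, v, eps=1e-6):
    """Approximate H v via a central difference of the gradient:
    H v ~= (grad E(w + eps*v) - grad E(w - eps*v)) / (2*eps),
    never forming the W x W matrix H."""
    return (grad_E(w + eps * v) - grad_E(w - eps * v)) / (2 * eps)

w = rng.normal(size=3)
v = rng.normal(size=3)
Hv = hessian_vector_product(w, v)
# For this quadratic error the exact Hessian is X^T X, so Hv should match (X^T X) v
```

For a quadratic error the difference quotient is exact up to rounding; for a general network it is accurate to O(eps²), which is why exact methods such as Pearlmutter's R-operator are preferred in practice.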

