UB CSE 574 - The Hessian Matrix

Machine Learning                                        Srihari

The Hessian Matrix
Sargur Srihari

Hessian of the Neural Network Error Function
• Backpropagation can be used to obtain the first derivatives of the error function with respect to the weights in the network
• It can also be used to evaluate second derivatives
• If all weight and bias parameters are elements w_i of a single vector w, then the second derivatives form the elements H_ij of the Hessian matrix H, where i, j ∈ {1, ..., W}:

    ∂²E / (∂w_ji ∂w_lk)

  (each index pair (j, i) and (l, k) labels one element of w)

Hessian Matrix Definition
If f is a real-valued function f(x_1, ..., x_n), and all of its second partial derivatives exist, then the Hessian matrix of f is the matrix

    H(f)_ij(x) = D_i D_j f(x)

where x = (x_1, ..., x_n) and D_i is the differential operator with respect to the i-th variable.
Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function:

    f(x + Δx) ≈ f(x) + J(x) Δx + (1/2) Δxᵀ H(x) Δx

where J is the Jacobian matrix, which is a vector (the gradient) for scalar-valued functions; for a vector-valued function y = (y_1, ..., y_n) it is a matrix. In backprop we deal with a scalar error function E.

Role of the Hessian in Neural Computing
1. Several nonlinear optimization algorithms for neural networks are based on second-order derivatives of the error surface
2. It is the basis of a fast procedure for retraining a network after a small change in the training data
3. Identifying the least significant weights for network pruning requires the inverse of the Hessian
4. Bayesian neural networks: the Hessian plays a central role in the Laplace approximation
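The pointwise definition H(f)_ij(x) = D_i D_j f(x) can be checked numerically. Below is a minimal sketch (the function names and the quadratic test function are illustrative assumptions, not from the slides) that approximates each Hessian element with a central finite-difference formula; for a quadratic function the formula is exact up to rounding, so the result matches the known constant Hessian.

```python
import numpy as np

def hessian_fd(f, x, eps=1e-4):
    """Approximate H(f)_ij(x) = D_i D_j f(x) by central finite differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Four-point stencil for the mixed second partial derivative
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

# Test function: f(x) = x0^2 + 3*x0*x1 has constant Hessian [[2, 3], [3, 0]]
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
H = hessian_fd(f, np.array([1.0, 2.0]))
```

Note that the computed matrix is symmetric, as the definition implies when the second partials are continuous.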
Evaluating the Hessian Matrix
• The full Hessian matrix can be difficult to compute in practice
• quasi-Newton algorithms have been developed that use approximations to the Hessian
• Various approximation techniques have been used to evaluate the Hessian for a neural network
• It can also be calculated exactly using an extension of backpropagation
• An important consideration is efficiency
• With W parameters (weights and biases) the matrix has dimension W × W
• Efficient methods have O(W²) complexity

Methods for Evaluating the Hessian Matrix
• Diagonal approximation
• Outer product approximation
• Inverse Hessian
• Finite differences
• Exact evaluation using backpropagation
• Fast multiplication by the Hessian

Diagonal Approximation
• In many cases the inverse of the Hessian is needed
• If the Hessian approximation is diagonal, its inverse is trivially computed
• Complexity is O(W) rather than O(W²) for the full Hessian

Outer Product Approximation
• Neural networks commonly use a sum-of-squares error function

    E = (1/2) Σ_{n=1}^{N} (y_n − t_n)²

• The Hessian matrix can then be written in the approximate form

    H ≈ Σ_{n=1}^{N} b_n b_nᵀ    where b_n = ∇y_n = ∇a_n

• Elements can be found in O(W²) steps

Inverse Hessian
• Use the outer product approximation to obtain a computationally efficient procedure for approximating the inverse of the Hessian

Finite Differences
• Using backprop to supply the first derivatives, complexity is reduced from O(W³) to O(W²)

Exact Evaluation of the Hessian
• Uses an extension of backprop
• Complexity is O(W²)

Fast Multiplication by the Hessian
• Many applications of the Hessian involve multiplication by the Hessian
• The vector vᵀH has only W elements
• Instead of computing H as an intermediate step, find an efficient method to compute vᵀH directly
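The idea of multiplying by the Hessian without ever forming it can be sketched numerically: a Hessian-vector product equals a directional derivative of the gradient, so it can be approximated by a central difference of two gradient evaluations. The sketch below (a finite-difference stand-in for the exact R-operator technique the slides allude to; the linear least-squares error and all names are illustrative assumptions) computes Hv in O(W) memory, since H itself is never built.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # N=50 inputs, W=3 weights
t = rng.normal(size=50)        # targets

def grad_E(w):
    """Gradient of E(w) = 0.5 * sum_n (w.x_n - t_n)^2 (what backprop would supply)."""
    return X.T @ (X @ w - t)

def hessian_vector_product(w, v, eps=1e-6):
    """Approximate H v via a central difference of the gradient:
    H v ~= (grad E(w + eps*v) - grad E(w - eps*v)) / (2*eps),
    never forming the W x W matrix H."""
    return (grad_E(w + eps * v) - grad_E(w - eps * v)) / (2 * eps)

w = rng.normal(size=3)
v = rng.normal(size=3)
Hv = hessian_vector_product(w, v)
# For this quadratic error the exact Hessian is X^T X, so Hv should match (X^T X) v
```

For a quadratic error the difference quotient is exact up to rounding; for a general network it is accurate to O(eps²), which is why exact methods such as Pearlmutter's R-operator are preferred in practice.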

