Neural Networks
10-701 / 15-781 Recitation, February 12, 2008
Parts of these slides are from previous years' 10-701 recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.

Recall: Linear Regression
- Prediction of continuous variables: learn the mapping $f: X \to Y$, with $f(x) = \sum_i w_i x_i$ plus some noise.
- The model is linear in the parameters $w$; assume the noise is Gaussian.
- The maximum-likelihood estimate then has the closed form $\hat{w}_{MLE} = (X^T X)^{-1} X^T Y$. (A numpy sketch of this closed form appears at the end of these notes.)

Neural Network
- Neural nets are also models with parameters $w$ in them; the parameters are now called weights.
- As before, we want to compute the weights that minimize the sum of squared residuals, which under a Gaussian i.i.d. noise assumption is again the maximum-likelihood solution.
- Instead of explicitly solving for the maximum-likelihood weights, we use gradient descent.

Perceptrons
- Input $x = (x_1, \ldots, x_n)$ and target value $t$.
- Output $o(x) = f\!\left(w_0 + \sum_{i=1}^n w_i x_i\right)$, where $net = w_0 + \sum_{i=1}^n w_i x_i$.
- E.g. the step activation, $o(x) = \mathrm{sign}(net)$: $+1$ if $net \ge 0$, $-1$ otherwise; or the sigmoid activation, $o(x) = \sigma(net) = \frac{1}{1 + e^{-net}}$.
- Given training data $\{(x^{(l)}, t^{(l)})\}_{l=1}^L$, find $w$ which minimizes
  $E(w) = \frac{1}{2} \sum_{l=1}^L \left(t^{(l)} - o(x^{(l)})\right)^2$.

Gradient Descent
- General framework for finding a minimum of a continuous, differentiable function $f(w)$.
- Start with some initial value $w^{(1)}$ and compute the gradient vector $\nabla f(w^{(1)})$.
- The next value $w^{(2)}$ is obtained by moving some distance from $w^{(1)}$ in the direction of steepest descent, i.e. along the negative of the gradient:
  $w^{(k+1)} = w^{(k)} - \eta_k \, \nabla f(w^{(k)})$.
  (A toy numerical example appears at the end of these notes.)

Gradient Descent on a Perceptron
- The sigmoid perceptron update rule:
  $w_j \leftarrow w_j + \eta \sum_{l=1}^L \delta^{(l)} x_j^{(l)}$,
  where $\delta^{(l)} = \left(t^{(l)} - o^{(l)}\right) o^{(l)} \left(1 - o^{(l)}\right)$, $o^{(l)} = \sigma\!\left(\sum_{j=0}^n w_j x_j^{(l)}\right)$, and $x_0^{(l)} = 1$.
- This uses the identity $\sigma'(net) = \sigma(net)\,(1 - \sigma(net))$. (A full training loop is sketched at the end of these notes.)

Boolean Functions
- E.g. using the step activation function with threshold 0, can we learn the functions $X_1$ AND $X_2$? $X_1$ OR $X_2$? $X_1$ AND NOT $X_2$? $X_1$ XOR $X_2$?
- AND, OR, and AND-NOT are linearly separable, so a single perceptron can learn them; XOR is not, which motivates multilayer networks.

Multilayer Networks
- The class of functions representable by a single perceptron is limited. Think of nonlinear functions:
  $o(x) = h\!\left(\sum_j W_j \, f\!\left(\sum_i w_{ji} x_i\right)\right)$
- [Figure: a 1-hidden-layer net with $N_{input} = 2$, $N_{hidden} = 3$, $N_{output} = 1$.]

Backpropagation (HW2, Problem 2)
- Output of the $k$-th output unit given input $x$:
  $o_k(x) = f\!\left(\sum_j W_{kj} \, f\!\left(\sum_i w_{ji} x_i\right)\right)$
- With bias: add a constant term for every non-input unit.
- Learn $w$ to minimize $E = \frac{1}{2} \sum_{k=1}^K \left(t_k - o_k(x)\right)^2$.

Backpropagation
Initialize all weights. Do until convergence:
1. Input a training example to the network and compute the output $o_k$.
2. Update each hidden-to-output weight $W_{kj}$ by $W_{kj} \leftarrow W_{kj} + \eta \, \delta_k \, y_j$, where $\delta_k = (t_k - o_k)\, f'(net_k)$ and $y_j$ is the output from hidden unit $j$.
3. Update each input-to-hidden weight $w_{ji}$ by $w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, y_i$, where $\delta_j = \left(\sum_k W_{kj} \, \delta_k\right) f'(net_j)$.
(A numpy sketch of this loop for the 2-3-1 network above follows below.)
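Code sketch: linear regression MLE. The closed form from the "Recall: Linear Regression" slide can be checked numerically. A minimal sketch, assuming synthetic data; the variable names, data sizes, and noise level are illustrative and not from the slides:

```python
import numpy as np

# Closed-form MLE from the linear regression recap: w_MLE = (X^T X)^{-1} X^T Y.
# Synthetic data: the sizes and noise level here are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
w_true = np.array([1.5, -2.0, 0.5])
Y = X @ w_true + 0.1 * rng.normal(size=100)    # linear model plus Gaussian noise

# Solving the normal equations is numerically safer than forming the inverse.
w_mle = np.linalg.solve(X.T @ X, X.T @ Y)      # should be close to w_true
```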
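Code sketch: generic gradient descent. The update $w^{(k+1)} = w^{(k)} - \eta_k \nabla f(w^{(k)})$ from the "Gradient Descent" slide, shown on a toy quadratic whose minimizer is known so convergence is easy to verify. The fixed step size and iteration count are assumptions:

```python
import numpy as np

# Gradient descent on f(w) = ||w - c||^2, whose gradient is 2(w - c) and
# whose minimizer is c. eta and iters are arbitrary illustrative choices.
def gradient_descent(grad_f, w0, eta=0.1, iters=100):
    w = w0
    for _ in range(iters):
        w = w - eta * grad_f(w)   # step along the negative gradient
    return w

c = np.array([3.0, -1.0])
w_min = gradient_descent(lambda w: 2 * (w - c), np.zeros(2))   # approaches c
```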
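Code sketch: sigmoid perceptron training. The batch update rule from the "Gradient Descent on a Perceptron" slide, applied to the linearly separable AND function from the "Boolean Functions" slide. The learning rate and iteration count are arbitrary choices, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent with the sigmoid perceptron update rule.
def train_perceptron(X, t, eta=0.5, iters=5000):
    L, n = X.shape
    Xb = np.hstack([np.ones((L, 1)), X])   # x_0 = 1 carries the bias weight w_0
    w = np.zeros(n + 1)
    for _ in range(iters):
        o = sigmoid(Xb @ w)                # o(x^(l)) for all l at once
        delta = (t - o) * o * (1 - o)      # delta^(l) = (t - o) * sigma'(net)
        w += eta * (Xb.T @ delta)          # w_j += eta * sum_l delta^(l) x_j^(l)
    return w

# X1 AND X2 is linearly separable, so this converges; XOR would not.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w = train_perceptron(X, t)   # sigmoid(Xb @ w) approaches t
```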
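Code sketch: backpropagation. The three-step loop from the final slide, written for the 2-3-1 sigmoid network of the "Multilayer Networks" slide. The weight layout (bias in the first column), learning rate, initialization scale, and epoch count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One stochastic-gradient epoch of backpropagation for a 2-3-1 sigmoid network.
def backprop_epoch(W1, W2, X, T, eta=0.5):
    # W1: (3, 3) input-to-hidden weights, first column is the bias;
    # W2: (1, 4) hidden-to-output weights, first column is the bias.
    for x, t in zip(X, T):
        x_b = np.append(1.0, x)            # constant term for the input layer
        y = sigmoid(W1 @ x_b)              # hidden unit outputs y_j
        y_b = np.append(1.0, y)            # constant term for the hidden layer
        o = sigmoid(W2 @ y_b)              # network outputs o_k

        delta_k = (t - o) * o * (1 - o)                    # (t_k - o_k) f'(net_k)
        delta_j = (W2[:, 1:].T @ delta_k) * y * (1 - y)    # (sum_k W_kj delta_k) f'(net_j)

        W2 += eta * np.outer(delta_k, y_b)   # W_kj += eta * delta_k * y_j
        W1 += eta * np.outer(delta_j, x_b)   # w_ji += eta * delta_j * y_i
    return W1, W2

# XOR, which a single perceptron cannot represent, is typically learnable here.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))
W2 = rng.normal(scale=0.5, size=(1, 4))
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(10000):
    W1, W2 = backprop_epoch(W1, W2, X, T)
```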