Neural Networks
Joseph E. Gonzalez

We are going to go through neural networks and review the process of back propagation. (An experimental Mathematica-based presentation.)

Single Perceptron

The perceptron computes

    out(x) = g\left( \sum_{i=0}^{2} w_i X_i \right), \quad X_0 = 1

[Figure: perceptronPlot, a single perceptron with inputs X_0 = 1, X_1, X_2 feeding an output unit g through weights w_0, w_1, w_2.]

There are several parts:
1. A link function g(u)
2. Weights w_i
3. A bias term X_0 = 1

Link Function g[x]

    g = Function[x, 1/(1 + Exp[-x])];
    Plot[g[x], {x, -8, 8}]

[Figure: plot of the sigmoid g[x] = 1/(1 + e^{-x}) for -8 <= x <= 8, rising from 0 toward 1 with g[0] = 0.5.]

Demo

    Manipulate[
      g = Function[x, 1/(1 + Exp[-x])];
      Plot3D[g[w0 + w1 x1 + w2 x2], {x1, -3, 3}, {x2, -3, 3}],
      {{w0, 0}, -3, 3}, {{w1, 2}, -3, 3}, {{w2, 2}, -3, 3}]

[Interactive output: sliders for w0, w1, w2 above a 3D surface of the sigmoid.]

Neural Network with Multiple Hidden Layers

Let's consider what this network looks like:

    out(x) = g\left( \sum_{i=0}^{3} u_i Z_i \right), \quad Z_0 = 1

    Z_j = g\left( \sum_{i=0}^{2} w_{ji} X_i \right), \quad j = 1, 2, 3, \quad X_0 = 1

[Figure: plt, a network diagram with inputs X_0 = 1, X_1, X_2, hidden units Z_0 = 1, Z_1, Z_2, Z_3, hidden weights w_{10}, ..., w_{32}, and output weights u_0, ..., u_3.]

Matlab Style Forward Propagation

Let's define a matrix W as

    W = \begin{pmatrix} w_{10} & w_{11} & w_{12} \\ w_{20} & w_{21} & w_{22} \\ w_{30} & w_{31} & w_{32} \end{pmatrix}    (2)

We can multiply this matrix by X, where we have prepended a 1:

    W \begin{pmatrix} 1 \\ X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} w_{10} + w_{11} X_1 + w_{12} X_2 \\ w_{20} + w_{21} X_1 + w_{22} X_2 \\ w_{30} + w_{31} X_1 + w_{32} X_2 \end{pmatrix}    (3)

Let's define function application as element-wise. Then we obtain

    \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix} = g\left[ W \begin{pmatrix} 1 \\ X_1 \\ X_2 \end{pmatrix} \right]    (4)

We can then prepend a 1 (Z_0 = 1) to the result to obtain

    out(X) = g\left[ (u_0\ u_1\ u_2\ u_3) \begin{pmatrix} 1 \\ Z_1 \\ Z_2 \\ Z_3 \end{pmatrix} \right] = g\left[ u_0 + \sum_{i=1}^{3} u_i Z_i \right]    (5)

Forward Propagation Example

The following slides step through the network on the input X_0 = 1, X_1 = 3, X_2 = 2, filling in one activation at a time.

What is the value of Z_1?

[Figure: plotTree, the network with X_0 = 1, X_1 = 3, X_2 = 2 and every activation still an unevaluated g[...].]

What is the value of Z_2?

[Figure: plotTree, as above with Z_1 = g[-1 + 6 - 3] = g[2] ≈ 0.88 filled in.]

What is the value of Z_3?

[Figure: plotTree, as above with Z_2 = g[0] = 0.5 filled in.]

What is the value of out(X)?

[Figure: plotTree, as above with Z_3 = g[3 + 2 + 3] = g[8] ≈ 1 filled in.]

Done

[Figure: plotTree, the completed pass with out(X) = g[-0.62] ≈ 0.35.]

Demo

[Interactive demo: dynamicDemo.]

Generalized Back Propagation

[Figure: plt, the same two-layer network diagram as above.]

Suppose we want to find the best model out(x; U, W) with respect to the parameters W and U. How can we quantify "best"? Let's consider the mean squared error:

    E = \sum_{i=1}^{n} \left( out(X_i) - Y_i \right)^2    (6)
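As a concrete companion to the matrix form above, here is a minimal Wolfram Language sketch of the forward pass and the error in equation (6). The names forwardProp and squaredError, and the random example weights, are illustrative assumptions rather than part of the original slides.

    g[x_] := 1/(1 + Exp[-x]);

    (* Forward pass: W is the 3x3 hidden weight matrix (row j holds w_j0, w_j1, w_j2),
       u is the length-4 output weight vector, and x = {X1, X2}. *)
    forwardProp[W_, u_, x_] := Module[{z},
      z = g /@ (W . Prepend[x, 1]);  (* hidden activations Z1, Z2, Z3 *)
      g[u . Prepend[z, 1]]           (* prepend Z0 = 1, apply the output unit *)
    ];

    (* Squared error over a list of {x, y} pairs, as in equation (6). *)
    squaredError[W_, u_, data_] :=
      Total[(forwardProp[W, u, #[[1]]] - #[[2]])^2 & /@ data];

    (* Illustrative call with random weights: *)
    SeedRandom[0];
    W = RandomReal[{-1, 1}, {3, 3}];
    u = RandomReal[{-1, 1}, 4];
    forwardProp[W, u, {3, 2}]

Mapping g with /@ makes the element-wise application explicit, mirroring the element-wise convention defined above; prepending 1 to x and to z implements the bias terms X_0 and Z_0.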
There are many ways to minimize this error. One of the most common (if not the most effective) methods is gradient descent, which corresponds to the update rules

    u_i^{(t+1)} = u_i^{(t)} - \eta \left. \frac{\partial E}{\partial u_i} \right|_{u_i^{(t)}}    (7)

    w_{ij}^{(t+1)} = w_{ij}^{(t)} - \eta \left. \frac{\partial E}{\partial w_{ij}} \right|_{w_{ij}^{(t)}}    (8)

Recall we have the following graph:

[Figure: plt, the two-layer network diagram repeated.]

Let's first derive the update rule for U. For a single example,

    E = \left( out(X) - Y \right)^2    (9)

Taking the derivative directly, we get stuck:

    \frac{\partial E}{\partial u_k} = \frac{\partial}{\partial u_k} \left( out(X) - Y \right)^2    (10)

Applying the infamous chain rule,

    \frac{\partial}{\partial x} f(g(x)) = f'(g(x)) \, g'(x)    (11)

with the outer function f(u) = u^2 (so f'(u) = 2u, just as \frac{\partial}{\partial x} x^2 = 2x) and the inner function out(X) - Y, we get

    \frac{\partial E}{\partial u_k} = 2 \left( out(X) - Y \right) \frac{\partial}{\partial u_k} out(X)    (12)

Now we need to take the derivative of the neural network itself. Let's first replace out with the function computed by the top perceptron:

    \frac{\partial E}{\partial u_k} = 2 \left( out(X) - Y \right) \frac{\partial}{\partial u_k} g\left[ \sum_{i=0}^{3} u_i Z_i \right]    (13)

Chain rule again:

    \frac{\partial E}{\partial u_k} = 2 \left( out(X) - Y \right) g'\left[ \sum_{i=0}^{3} u_i Z_i \right] \frac{\partial}{\partial u_k} \sum_{i=0}^{3} u_i Z_i    (14)

Only the i = k term of the remaining sum depends on u_k, and its derivative is Z_k:

    \frac{\partial E}{\partial u_k} = 2 \left( out(X) - Y \right) g'\left[ \sum_{i=0}^{3} u_i Z_i \right] Z_k    (15)

Done.
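To make the result concrete, here is a minimal Wolfram Language sketch of equation (15) together with the gradient step from equation (7). The names gPrime and updateU and the step size eta are illustrative assumptions; the sketch uses the identity g'(a) = g(a)(1 - g(a)) for the logistic link.

    g[x_] := 1/(1 + Exp[-x]);
    gPrime[a_] := g[a] (1 - g[a]);  (* derivative of the logistic link *)

    (* One gradient-descent step on the output weights u for a single
       training pair (x, y), using dE/du_k = 2 (out - y) g'[u.Z] Z_k. *)
    updateU[W_, u_, x_, y_, eta_] := Module[{z, a, grad},
      z = Prepend[g /@ (W . Prepend[x, 1]), 1];  (* Z with Z0 = 1 prepended *)
      a = u . z;                                 (* input to the output unit *)
      grad = 2 (g[a] - y) gPrime[a] z;           (* vector of dE/du_k, eq. (15) *)
      u - eta grad                               (* update rule, eq. (7) *)
    ];

    (* Illustrative step with the weights from the earlier sketch,
       a hypothetical target y = 1, and step size eta = 0.1: *)
    updateU[W, u, {3, 2}, 1, 0.1]

Because the same scalar factor 2 (out - y) g'[u.Z] multiplies every Z_k, the whole gradient vector for U comes from one forward pass through the network.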
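As a quick sanity check on the derivation, equation (15) should agree with a finite-difference estimate of the derivative. This sketch reuses forwardProp, W, u, and gPrime from the blocks above, with a hypothetical target y = 1, and compares the two for dE/du_1 (the second component of u):

    (* Squared error on one example with hypothetical target y = 1. *)
    err[uu_] := (forwardProp[W, uu, {3, 2}] - 1)^2;

    z = Prepend[g /@ (W . Prepend[{3, 2}, 1]), 1];
    analytic = 2 (g[u . z] - 1) gPrime[u . z] z[[2]];             (* equation (15), k = 1 *)
    numeric  = (err[u + 10^-6 UnitVector[4, 2]] - err[u])/10^-6;  (* finite difference *)
    {analytic, numeric}  (* the two values should agree to several digits *)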