Neural Networks
Joseph E. Gonzalez

We are going to go through neural networks and review the process of back propagation. An experimental Mathematica-based presentation.

Single Perceptron

The Perceptron

[Figure: perceptronPlot — a single perceptron with inputs X_0 = 1, X_1, X_2, weights w_0, w_1, w_2, and output g(\sum_{i=0}^{2} w_i X_i).]

There are several parts:
1. A link function g(x)
2. Weights w_i
3. A bias term X_0 = 1

Link Function

(1)  g(x) = \frac{1}{1 + e^{-x}}

g = Function[x, 1/(1 + Exp[-x])];
Plot[g[x], {x, -8, 8}]

[Plot: the sigmoid g(x) for -8 <= x <= 8, rising from 0 toward 1.]

Demo

Manipulate[
 g = Function[x, 1/(1 + Exp[-x])];
 Plot3D[g[w0 + w1 x1 + w2 x2], {x1, -3, 3}, {x2, -3, 3}],
 {{w0, 0}, -3, 3}, {{w1, 2}, -3, 3}, {{w2, -2}, -3, 3}]

[Interactive 3D plot of g(w_0 + w_1 x_1 + w_2 x_2) with sliders for w_0, w_1, w_2.]

Neural Network with Multiple Hidden Layers

Let's consider what this network looks like:

[Figure: plt — a two-layer network. Inputs: X_0 = 1, X_1, X_2. Hidden units: Z_0 = 1 and Z_j = g(\sum_{i=0}^{2} w_{ij} X_i) for j = 1, 2, 3, with hidden weights w_{01}, ..., w_{23}. Output: out(x) = g(\sum_{i=0}^{3} u_i Z_i) with output weights u_0, ..., u_3.]

Matlab-Style Forward Propagation

Let's define a matrix W as:

(2)  W = \begin{pmatrix} w_{01} & w_{11} & w_{21} \\ w_{02} & w_{12} & w_{22} \\ w_{03} & w_{13} & w_{23} \end{pmatrix}

We can multiply this matrix by X, to which we have prepended a 1:

(3)  W \cdot \begin{pmatrix} 1 \\ X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} w_{01} + w_{11} X_1 + w_{21} X_2 \\ w_{02} + w_{12} X_1 + w_{22} X_2 \\ w_{03} + w_{13} X_1 + w_{23} X_2 \end{pmatrix}

Let's define function application as element-wise. Then we obtain:

(4)  g\left[ W \cdot (1, X_1, X_2)^T \right] = \begin{pmatrix} g[w_{01} + w_{11} X_1 + w_{21} X_2] \\ g[w_{02} + w_{12} X_1 + w_{22} X_2] \\ g[w_{03} + w_{13} X_1 + w_{23} X_2] \end{pmatrix} = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix}

We can then prepend a 1 to the result to obtain:

(5)  out(X) = g\left[ (u_0, u_1, u_2, u_3) \cdot (1, Z_1, Z_2, Z_3)^T \right] = g\left[ u_0 + \sum_{i=1}^{3} u_i Z_i \right]

Forward Propagation (Example) #1

plotTree[{"out(x)=g[?]", "Z0=1", "Z1=g[?]", "Z2=g[?]", "Z3=g[?]", "X0=1", "X2=2", "X1=3"}]

[Figure: the network with inputs X_1 = 3, X_2 = 2; hidden weights (w_{01}, w_{11}, w_{21}) = (1, 1, -3), (w_{02}, w_{12}, w_{22}) = (2, -2, 2), (w_{03}, w_{13}, w_{23}) = (3, 1, 1); output weights (u_0, u_1, u_2, u_3) = (2, 1, -3, -2). Successive slides fill in the question marks.]

What is the value of Z_1?

Z_1 = g[1 - 6 + 3] = g[-2] ≈ 0.12

What is the value of Z_2?

Z_2 = g[2 + 4 - 6] = g[0] = 0.5

What is the value of Z_3?

Z_3 = g[3 + 2 + 3] = g[8] ≈ 1

What is the value of out(X)? Done!

out(x) = g[2 + 0.12 - 1.5 - 2] = g[-1.38] ≈ 0.20

Demo

dynamicDemo[...]
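To check the worked example end to end, here is a minimal Wolfram Language sketch of the matrix-style forward pass from equations (2)-(5). It uses the weights from the example above; the name forwardPass is a hypothetical helper, not something defined in the original notebook.

(* Sigmoid link function, eq. (1). *)
g = Function[x, 1/(1 + Exp[-x])];

(* Hidden-layer weights: row j holds {w0j, w1j, w2j}, as in eq. (2). *)
W = {{1, 1, -3},
     {2, -2, 2},
     {3, 1, 1}};

(* Output weights {u0, u1, u2, u3}. *)
U = {2, 1, -3, -2};

(* Forward pass: eq. (4) applies g element-wise to W.(1, X1, X2);
   eq. (5) prepends Z0 = 1 and applies the output perceptron. *)
forwardPass[x1_, x2_] := Module[{z},
  z = g /@ (W . {1, x1, x2});
  g[U . Prepend[z, 1]]]

forwardPass[3, 2] // N

Running this reproduces the example: the hidden activations come out as roughly (0.12, 0.5, 1.0) and out(X) ≈ 0.20.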
Generalized Back Propagation

[Figure: the same two-layer network as above.]

Suppose we want to find the best model out(x; U, W) with respect to the parameters W and U. How can we quantify "best"? Let's consider mean squared error:

(6)  E = \sum_{i=1}^{n} \left( out(X_i) - Y_i \right)^2

There are many ways to minimize this. One of the most common (and least effective) methods is gradient descent. This corresponds to the update rules:

(7)  u_i^{(t+1)} \leftarrow u_i^{(t)} - \eta \left. \frac{\partial E}{\partial u_i} \right|_{u_i^{(t)}}

(8)  w_{ij}^{(t+1)} \leftarrow w_{ij}^{(t)} - \eta \left. \frac{\partial E}{\partial w_{ij}} \right|_{w_{ij}^{(t)}}

Recall we have the following graph:

[Figure: the network again, with out(x) = g(\sum_{i=0}^{3} u_i Z_i) and Z_j = g(\sum_{i=0}^{2} w_{ij} X_i).]

Let's first derive the update rule for U. For a single example,

(9)  E = (out(X) - Y)^2

Taking the derivative we get (stuck?):

(10)  \frac{\partial E}{\partial u_k} = \frac{\partial}{\partial u_k} (out(X) - Y)^2

Applying the infamous chain rule,

(11)  \frac{\partial}{\partial x} f(g(x)) = \left. \frac{\partial f(u)}{\partial u} \right|_{u = g(x)} \cdot \frac{\partial}{\partial x} g(x) = f'(g(x)) \, g'(x),

with the outer function x^2, whose derivative is 2x, we get:

(12)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \frac{\partial}{\partial u_k} out(X)

Now we need to take the derivative of the neural network itself. Let's first replace out with the function computed by the top perceptron:

(13)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \frac{\partial}{\partial u_k} g\left[ \sum_{i=0}^{3} u_i Z_i \right]

Chain rule again:

(14)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \, g'\left[ \sum_{i=0}^{3} u_i Z_i \right] \sum_{i=0}^{3} \frac{\partial}{\partial u_k} u_i Z_i

Only one term of the sum survives, the one with i = k (the Z_i do not depend on u_k):

(15)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \, g'\left[ \sum_{i=0}^{3} u_i Z_i \right] Z_k

Done, that's it! Sort of. Let's look at the derivative of g(x) = \frac{1}{1 + e^{-x}}:

(16)  g'(x) = \frac{\partial}{\partial x} (1 + e^{-x})^{-1}

(17)  g'(x) = -(1 + e^{-x})^{-2} \frac{\partial}{\partial x} (1 + e^{-x})

(18)  g'(x) = -(1 + e^{-x})^{-2} \left( \frac{\partial}{\partial x} 1 + \frac{\partial}{\partial x} e^{-x} \right)

(19)  g'(x) = -(1 + e^{-x})^{-2} \left( 0 + e^{-x} \frac{\partial}{\partial x} (-x) \right)

(20)  g'(x) = -(1 + e^{-x})^{-2} (0 - e^{-x})

(21)  g'(x) = (1 + e^{-x})^{-2} \, e^{-x}

With some manipulation we get:

(22)  g'(x) = \frac{e^{-x}}{1 + e^{-x}} \cdot \frac{1}{1 + e^{-x}}

(23)  g'(x) = \frac{e^{-x}}{1 + e^{-x}} \, g(x)

(24)  g'(x) = \frac{1 + e^{-x} - 1}{1 + e^{-x}} \, g(x)

(25)  g'(x) = \left( \frac{1 + e^{-x}}{1 + e^{-x}} + \frac{-1}{1 + e^{-x}} \right) g(x)

(26)  g'(x) = \left( 1 - \frac{1}{1 + e^{-x}} \right) g(x)

(27)  g'(x) = (1 - g(x)) \, g(x)

Recall that we earlier had:

(28)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \, g'\left[ \sum_{i=0}^{3} u_i Z_i \right] Z_k

We can make a simple substitution to get:

(29)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) \left( 1 - g\left[ \sum_{i=0}^{3} u_i Z_i \right] \right) g\left[ \sum_{i=0}^{3} u_i Z_i \right] Z_k

(30)  \frac{\partial E}{\partial u_k} = 2 (out(X) - Y) (1 - out(X)) \, out(X) \, Z_k

[Figure: the network again.]

Gradient of W

That wasn't too bad. How about the next layer? We again start with:

(31)  E = (out(X) - Y)^2

Taking the derivative with respect to w_{kr} (and applying the chain rule):

(32)  \frac{\partial E}{\partial w_{kr}} = \frac{\partial E}{\partial out(X)} \cdot \frac{\partial out(X)}{\partial w_{kr}} = 2 (out(X) - Y) \frac{\partial}{\partial w_{kr}} out(X)

Expanding out we get:

(33)  \frac{\partial E}{\partial w_{kr}} = 2 (out(X) - Y) \frac{\partial}{\partial w_{kr}} g\left[ \sum_{i=0}^{3} u_i Z_i \right]

Chain rule:

(34)  \frac{\partial E}{\partial w_{kr}} = 2 (out(X) - Y) \, g'\left[ \sum_{i=0}^{3} u_i Z_i \right] \sum_{i=0}^{3} \frac{\partial}{\partial w_{kr}} u_i Z_i

Recall that g'(x) = (1 - g(x)) g(x):

(35)  \frac{\partial E}{\partial w_{kr}} = 2 (out(X) - Y) (1 - out(X)) \, out(X) \sum_{i=0}^{3} \frac{\partial}{\partial w_{kr}} u_i Z_i

Remember that each of the Z_i is connected to all of the perceptrons in the lower level, so we must take the derivative of each Z_i with respect to w_{kr}.
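As a quick sanity check on the identity (27), which the step from (34) to (35) reuses, Mathematica can verify the sigmoid derivative symbolically; this one-liner is an added check, not part of the original slides.

(* Verify eq. (27): g'(x) = (1 - g(x)) g(x) for the sigmoid. *)
g = Function[x, 1/(1 + Exp[-x])];
Simplify[D[g[x], x] == (1 - g[x]) g[x]]  (* returns True *)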
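The slides break off before the W-layer gradient is finished, but equation (30) is already enough to implement the update rule (7) for the output weights. Below is a minimal sketch under stated assumptions: it reuses g, W, and U from the forward-pass sketch, and the target Y = 1 and step size eta = 0.1 are made-up illustration values, not from the original.

(* Gradient of E = (out(X) - Y)^2 with respect to U, per eq. (30). *)
gradU[{x1_, x2_}, y_] := Module[{z, out},
  z = Prepend[g /@ (W . {1, x1, x2}), 1];  (* {Z0, Z1, Z2, Z3} with Z0 = 1 *)
  out = g[U . z];
  2 (out - y) (1 - out) out z]             (* one component per u_k *)

(* One gradient-descent step, eq. (7), on the example input X1 = 3, X2 = 2. *)
eta = 0.1;   (* learning rate, an arbitrary choice *)
U = U - eta gradU[{3, 2}, 1] // N

With Y = 1 and out(X) ≈ 0.20, every gradient component with positive Z_k is negative, so this step increases those u_k and pushes out(X) toward the target.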