Error Back PropagationThe algorithm is described with respect to a single node in the network. The network is initializedby assigning small random values to each weight. It is trained by repeated epoches.Vm+11. . . Vm+1qVmiwm+11ibbwm+1qi<<Temporary variables: δmi, hmiVm−11wmi1<<. . . Vm−1pwmipbbForward propagation:Given the example x1, . . . , xnwith the desired outputs y1, . . . , yk, the values of all nodes are com-puted recursively as follows.Initialization: V0i= xii = 1, . . . , nRecursive step: hmi=pXj=1wmijVm−1j, Vmi= g(hmi) for all i, mThis produces values for the top nodes VMi, i = 1, . . . k. Produce the output: Oi= VMi, i = 1, . . . k.Error back propagation:The values of the temporary variables δmiare computed recursively as follows:Initialization: δMj= g0(hMj)(yj− VMj) j = 1, . . . , kRecursive step: δmi= g0(hmi)qXj=1wm+1jiδm+1j, for all i, mWeights update:wmij= wmij+ δmiVm−1jRelated formulas:If g is a sigmoid function:g(h) =11 + e−2βh, g0= 2βg(1 − g)If g is a hyperbolic tangent:g(h) =eβh− e−βheβh+ e−βh, g0= β(1 − g2)Correctness proof for BPThe Error-Back-Propagation algorithm updates the weights according to:wmij= wmij+ δmiVm−1jwhere δmi= g0(hmi)qXj=1wm+1jiδm+1jWe need to show that it is a proper steepest-descent step: wmij= wmij+ (−∂E∂wmij)Here E is the network error:E =Xj(yj− Oj)2=Xj(yj− g(hMj))2(1)The tough part is the following theorem:Theorem: In BP: δmi= −12∂E∂hmifor m = M, M − 1, . . . , 1Proof: for m = M we have: δMj= g0(hMj)(yj− Oj)∂E∂hMi= −2(yi− g(hMi))g0(hMi) = −2δMiNow assume that the theorem holds for m + 1:∂E∂hm+1i= −2δm+1iProve that the theorem holds for m:∂E∂hmi= −2δmiObserve that the values of hm+1jfor all j completely determines E. Therefore, applying the chainrule:∂E∂hmi=Xj∂E∂hm+1j∂hm+1j∂hmiBut hm+1j=Piwm+1jig(hmi) and therefore:∂hm+1j∂hmi= wm+1jig0(hmi). Combining this with the induc-tive assumption we have:∂E∂hmi=Xj−2δm+1jwm+1jig0(hmi) = −2g0(hmi)Xjwm+1jiδm+1j= −2δmiend of proofThe other part that we need for the proof that BP is proper steepest descent is:∂E∂wmij=∂E∂hmiVm−1j(2)This follows from the fact that hmi=PjwmijVm−1jso that∂hmi∂wmij= Vm−1j. Now:∂E∂wmij=∂E∂hmi∂hmi∂wmij=∂E∂hmiVm−1jBP is exactly steepest descent if: δmiVm−1j= −∂Ewmijup to a positive constant. Indeed, combiningthe theorem and the result in (2) we have:∂E∂wmij=∂E∂hmiVm−1j=
View Full Document