UMD CMSC 421 - Learning with Neural Networks

Learning with Neural Networks
Reiner ([email protected])
And many others whose copyright I shamelessly infringed by including their illustrations in this presentation.

Outline
• Biology: Neurons and the Brain
• Computational Model of the Neuron
• Logical Neuron
• Perceptron
• ADAptive LINear Element (ADALINE)
• Multi-Layer Perceptron
• Error Back-Propagation

Biological Neuron
[Figure: biological neuron. An action potential (AP) travels along the axon to a synapse, where neurotransmitter release (excitatory or inhibitory; learning via LTP or LTD) produces an EPSP or IPSP on the dendrite of the next cell; the soma integrates these inputs and may fire an AP down its own axon.]
• Slow (≈ 10 ms), but MASSIVELY REDUPLICATED (10^11 neurons, 10^14 synapses)

Computational Neuron Model (Node)
[Figure: node diagram. Inputs x_i are weighted by w_i and summed; the constant bias input x_0 = 1 carries the weight w_0 = θ.]
• a = F(Σ_i w_i x_i) = F(w^T x)
• typical activation functions F: step, piecewise-linear, linear, sigmoid, Gaussian

Logical Neuron
McCulloch & Pitts, 1943
• w_i ∈ {−Φ, 1} for i > 0 (fixed; NO learning), w_0 = θ ∈ {0} ∪ Z^−, F = step function
• synchronous updates: a(t + 1) = 1 if w^T x(t) ≥ 0, and 0 otherwise
[Figure: example circuits, including a NAND gate, a SET/RESET flip-flop, and a network computing a_3(t) = a_1(t − 2) AND a_2(t − 1).]

Model of the Brain?
• N nodes, p(0) ∈ [0, 1] fraction of initially active nodes
• NO external inputs, i.e., x = a
• n = n_e (excitatory) + n_i (inhibitory) incoming connections per node from random (other) nodes
• n_e ≫ n_i, Φ ≫ 0, θ ≈ −1 ⇔ limit cycles: ..., a(t), a(t + 1), ..., a(t + k) = a(t), ...

Perceptron
Rosenblatt, 1958 vs. Minsky & Papert, 1969
• more complex, but subsumed by the following 'modern' perceptron
[Figure: perceptron. Binary inputs x_1, ..., x_d with real weights; output a = 1 if w^T x > 0, and 0 otherwise (w_0 = θ is the threshold weight on x_0 = 1).]
• two sets of input patterns X+ = {x+_1, x+_2, ...}, X− = {x−_1, x−_2, ...}, with X+ ∩ X− = ∅
• GOAL: find w (initially random) so that a = 1 if x ∈ X+ and a = 0 if x ∈ X−
• Solution: w_i = w_i + µ x_i (t − a), where t (a) is the target (actual) output for x and µ ∈ ]0, 1] is the learning rate
• guaranteed to find a 'correct' w, provided X+ and X− are linearly separable
• linear separability is very rare for |X| = |X+ ∪ X−| ≫ d + 1: P(|X|, d) = (1 / 2^(|X|−1)) Σ_{i=0}^{d} C(|X|−1, i), where C(n, i) is the binomial coefficient
• P(|X| = 4, d = 2) = 14/16 (XOR and EQU are not linearly separable)

ADAptive LINear Element (ADALINE)
Widrow & Hoff, 1960
[Figure: ADALINE. Real inputs x ∈ R^m, real weight matrix W ∈ R^(m×n), linear output a = W^T x ∈ R^n.]
• paired input and target output patterns (x ∈ R^m, t ∈ R^n)
• GOAL: weight matrix W so that a = W^T x = t for each pair (x, t)
• Solution: Gradient Descent
• Error: E = (1/2) Σ_l (t_l − a_l)^2
• Gradient: ∂E/∂w_{i,j} = (t_i − a_i) (−∂a_i/∂w_{i,j})
• ∂a_i/∂w_{i,j} = ∂(w_i^T x)/∂w_{i,j} = ∂(Σ_l w_{i,l} x_l)/∂w_{i,j} = x_j
• Least Mean Square (LMS) or ∆-rule: w_{i,j} = w_{i,j} − µ ∂E/∂w_{i,j} = ... = w_{i,j} + µ ∆_i x_j, where ∆_i = t_i − a_i (see the code sketch below)
• finds the (global) minimum of E
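Both learning rules above, the perceptron rule and the LMS/∆-rule, are a single line of arithmetic per weight, so a short sketch makes them concrete. The following NumPy code is an illustration only and is not part of the slides; the toy OR and regression data, learning rates, and epoch counts are assumptions chosen for the example.

```python
import numpy as np

def train_perceptron(X, t, mu=0.5, epochs=100):
    """Perceptron rule: w_i <- w_i + mu * x_i * (t - a), with a = 1 if w^T x > 0 else 0.
    X carries a leading bias column x_0 = 1, so w_0 plays the role of the threshold theta."""
    w = np.random.randn(X.shape[1]) * 0.01      # initially random weight vector
    for _ in range(epochs):
        for x, target in zip(X, t):
            a = 1 if w @ x > 0 else 0           # step activation
            w += mu * x * (target - a)          # no change when the output is already correct
    return w

def train_adaline(X, T, mu=0.05, epochs=200):
    """LMS / Delta-rule: w_ij <- w_ij + mu * Delta_i * x_j with Delta_i = t_i - a_i,
    where a = W^T x is the (linear) output vector."""
    W = np.random.randn(X.shape[1], T.shape[1]) * 0.01
    for _ in range(epochs):
        for x, target in zip(X, T):
            a = W.T @ x                         # linear output a = W^T x
            W += mu * np.outer(x, target - a)   # gradient step on E = 1/2 * sum_l (t_l - a_l)^2
    return W

if __name__ == "__main__":
    # Toy, linearly separable task: OR of two binary inputs (first column is the bias x_0 = 1).
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
    t = np.array([0, 1, 1, 1])
    w = train_perceptron(X, t)
    print("perceptron outputs:", [int(w @ x > 0) for x in X])

    # ADALINE fitting real-valued targets t = x_1 + x_2, which the linear unit can match exactly.
    T = np.array([[0.0], [1.0], [1.0], [2.0]])
    W = train_adaline(X, T)
    print("ADALINE outputs:", np.round(X @ W, 3).ravel())
```

Note the difference in behaviour: the perceptron only updates when the output is wrong (t − a ≠ 0), whereas ADALINE always takes a gradient step on the squared error, which is what lets the ∆-rule settle on a least-squares solution even when the targets cannot be matched exactly.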
Multi-Layer Perceptron
Rosenblatt, 1958 (the original Perceptron had multiple layers)
[Figure: two-layer network (input, hidden, output) solving a problem that is not linearly separable; each hidden node i implements a decision boundary w_{i,1} x_1 + w_{i,2} x_2 + θ_i = 0, and the output node combines the resulting half-space decisions.]
• input → hidden → output
• computes the function a_O = F_O(W^T F_H(V^T x)), where V (W) contains the incoming weights of the hidden (output) nodes
• used for classification and function approximation
• typically, F(x) = σ(x) = 1 / (1 + e^(−s x)) (sigmoid function)
• if trained via Gradient Descent, F needs to have a derivative F'
• σ'(x) = s σ(x) (1 − σ(x))

Error Back-Propagation
Rumelhart, 1986
• generalization of the LMS or ∆-rule to Multi-Layer Perceptrons
• Error: E = (1/2) Σ_l (t_l − a_l)^2
• Gradient of E with respect to weight w_{i,j} from the j-th hidden to the i-th output node:
  ∂E/∂w_{i,j} = (t_i − a_i) (−∂a_i/∂w_{i,j}) = −∂F(w_i^T a_H)/∂w_{i,j} (t_i − a_i) = −F'(w_i^T a_H) a_{H,j} (t_i − a_i),
  so that w_{i,j} = w_{i,j} + µ ∆_i a_{H,j}, with ∆_i = F'(w_i^T a_H) (t_i − a_i)
• Gradient of E with respect to weight v_{i,j} from the j-th input to the i-th hidden node:
  ∂E/∂v_{i,j} = Σ_l (t_l − a_l) (−∂a_l/∂v_{i,j}) = −Σ_l ∂F(w_l^T a_H)/∂v_{i,j} (t_l − a_l)
  = −Σ_l F'(w_l^T a_H) ∂(w_l^T a_H)/∂v_{i,j} (t_l − a_l) = −Σ_l ∆_l ∂(Σ_k w_{l,k} a_{H,k})/∂v_{i,j}
  = −Σ_l ∆_l w_{l,i} ∂a_{H,i}/∂v_{i,j} = −Σ_l ∆_l w_{l,i} ∂F(v_i^T x)/∂v_{i,j}
  = −Σ_l ∆_l w_{l,i} F'(v_i^T x) ∂(v_i^T x)/∂v_{i,j} = −Σ_l ∆_l w_{l,i} F'(v_i^T x) x_j = −∆_i x_j,
  where ∆_i = F'(v_i^T x) Σ_l w_{l,i} ∆_l, so that v_{i,j} = v_{i,j} + µ ∆_i x_j (both updates are sketched in code at the end of these notes)
• NOT guaranteed to find the global minimum of E (E is NOT monotonically decreasing, due to the non-linearities)

Error Back-Propagation
Power
• with enough hidden units, can approximate an arbitrary function as precisely as desired
Shortcomings
• premature convergence on local minima
• very, very slow
• representational 'mysticism' (what was learned?)
• many parameters to choose (how many hidden units? µ? F_H? F_O?)
• overfitting ('memorized' training examples; performs poorly on unseen inputs)
'Better' Heuristics
• EBP with Momentum: ∆^(n) w_{i,j} = µ ∆_i^(n) a_{H,j}^(n) + α ∆^(n−1) w_{i,j}
• Resilient EBP (RPROP): ∆w_{i,j} = −∆_{i,j} sign(∂E/∂w_{i,j}), where ∆_{i,j} (w_{i,j}'s own learning rate) increases (decreases) if sign(∂E/∂w_{i,j}) remains the same (changes) from one time step to the next

References
• McCulloch, W. & Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics: 5, pp. 115-137, 1943.
• Rosenblatt, F. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review: 65, pp. 386-408, 1958.
• Minsky, M. & Papert, S. Perceptrons. MIT Press, Cambridge, 1969.
• Widrow, B. & Hoff, M. E. Adaptive Switching Circuits. IRE WESCON Convention Record: 4, pp. 96-104, 1960.
• Rumelhart, D. et al. Learning Representations by Back-Propagating Errors. Nature: 323, pp. 533-536, 1986.
• Riedmiller, M. & Braun, H. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Proceedings of the ICNN, San Francisco, pp. 123-134, 1993.
• Sejnowski, T. J. & Rosenberg, C. R. NETtalk: A Parallel Network that Learns to Read Aloud. Technical Report JHU/EECS-86/1, EE and CS Dept., Johns Hopkins University, 1986.
• Pomerleau, D. A. ALVINN: An Autonomous Land Vehicle in a Neural Network. Technical Report CMU-CS-89-107, CS Dept., Carnegie Mellon University, 1989.
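To close, here is the code sketch referenced in the back-propagation derivation above: a minimal NumPy implementation of a 2-2-1 sigmoid network trained on XOR, the classic task a single perceptron cannot solve. It is an illustration only; the network size, learning rate, sigmoid slope s = 1, random seed, and the XOR task are assumptions made here and do not come from the slides.

```python
import numpy as np

def sigmoid(x):
    # F(x) = 1 / (1 + e^(-s x)) with slope s = 1, so F'(x) = F(x) * (1 - F(x)).
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# XOR: not linearly separable, so the hidden layer is essential.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

n_in, n_hid, n_out = 2, 2, 1
V = rng.normal(scale=0.5, size=(n_in + 1, n_hid))   # input -> hidden weights (last row: bias)
W = rng.normal(scale=0.5, size=(n_hid + 1, n_out))  # hidden -> output weights (last row: bias)
mu = 0.5                                            # learning rate

for epoch in range(20000):
    for x, t in zip(X, T):
        # Forward pass: a_O = F_O(W^T F_H(V^T x)), with a constant 1 appended for the bias weights.
        x_b = np.append(x, 1.0)
        a_H = sigmoid(V.T @ x_b)
        a_Hb = np.append(a_H, 1.0)
        a_O = sigmoid(W.T @ a_Hb)

        # Output-layer deltas: Delta_i = F'(w_i^T a_H) * (t_i - a_i).
        delta_O = a_O * (1.0 - a_O) * (t - a_O)
        # Hidden-layer deltas: Delta_i = F'(v_i^T x) * sum_l w_{l,i} Delta_l (bias row excluded).
        delta_H = a_H * (1.0 - a_H) * (W[:-1, :] @ delta_O)

        # Updates: w_{i,j} += mu * Delta_i * a_{H,j} and v_{i,j} += mu * Delta_i * x_j.
        W += mu * np.outer(a_Hb, delta_O)
        V += mu * np.outer(x_b, delta_H)

# The trained outputs should approach the XOR targets 0, 1, 1, 0.
for x in X:
    a_O = sigmoid(W.T @ np.append(sigmoid(V.T @ np.append(x, 1.0)), 1.0))
    print(x, "->", round(a_O.item(), 3))
```

As the Shortcomings slide notes, convergence is not guaranteed: a different seed, learning rate, or number of hidden units can leave the network in a local minimum or make training very slow, which is exactly what the momentum and RPROP heuristics above are meant to mitigate.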

