Learning with Neural Networks
Reiner [email protected]
. . . and many others whose copyright I shamelessly infringed by including their illustrations in this presentation.

Outline
• Biology: Neurons and the Brain
• Computational Model of the Neuron
• Logical Neuron
• Perceptron
• ADAptive LINear Element (ADALINE)
• Multi-Layer Perceptron
• Error Back-Propagation

Biological Neuron
[Figure: a biological neuron — action potentials (APs) travel along the axon to synapses (excitatory or inhibitory; learning via LTP or LTD); released neurotransmitter produces EPSPs or IPSPs at the dendrites, which the soma integrates into new APs along its own axon.]
• Slow (≈ 10 ms), but MASSIVELY REDUPLICATED (10^11 neurons, 10^14 synapses)

Computational Neuron Model (Node)
[Figure: a node with inputs x_i, weights w_i, a summation Σ, and an activation function F.]
• a = F(Σ_i w_i x_i) = F(w^T x), with the convention x_0 = 1 and w_0 = θ (the threshold)
• F may be a step, linear, piece-wise linear, sigmoid, or Gaussian function

Logical Neuron
McCulloch & Pitts, 1943
• w_{i>0} ∈ {−Φ, 1} (fixed; NO learning), w_0 = θ ∈ {0} ∪ Z−, F = step function
• synchronous updates: a(t+1) = 1 if w^T x(t) ≥ 0, and 0 otherwise
[Figure: example circuits — a FLIP-FLOP with SET and RESET inputs, and a NAND-based circuit computing a_3(t) = a_1(t−2) AND a_2(t−1).]

Model of the Brain?
• N nodes, p(0) ∈ [0, 1] fraction of initially active nodes
• NO external inputs, i.e., x = a
• n = n_e (excitatory) + n_i (inhibitory) incoming connections per node from random (other) nodes
• n_e ≫ n_i, Φ ≫ 0, θ ≈ −1 ⇔ limit cycles . . . a(t), a(t+1), . . . , a(t+k) = a(t) . . .

Perceptron
Rosenblatt, 1958 vs. Minsky & Papert, 1969
• more complex, but subsumed by the following 'modern' perceptron
[Figure: a node with binary inputs x_1, . . . , x_d, real weights, and output a = 1 if w^T x > 0, 0 otherwise.]
• two sets of input patterns X+ = {x+_1, x+_2, . . . }, X− = {x−_1, x−_2, . . . }, X+ ∩ X− = ∅
• GOAL: w (initially random) so that a = 1 if x ∈ X+ and a = 0 if x ∈ X−
• Solution: w_i = w_i + µ x_i (t − a), where t (a) is the target (actual) output for x and µ ∈ ]0, 1] is the learning rate
• guaranteed to find a 'correct' w, provided X+ and X− are linearly separable
• linear separability is very rare for |X| = |X+ ∪ X−| ≫ d + 1: P(|X|, d) = (1 / 2^{|X|−1}) Σ_{i=0}^{d} C(|X|−1, i)
• P(|X| = 4, d = 2) = 14/16 (XOR and EQU are not linearly separable)

ADAptive LINear Element (ADALINE)
Widrow & Hoff, 1960
[Figure: a linear node with real inputs and real weights, a = w^T x; in matrix form, a = W^T x with x ∈ R^m, a ∈ R^n, W ∈ R^{m×n}.]
• paired input and target output patterns (x ∈ R^m, t ∈ R^n)
• GOAL: weight matrix W so that a = W^T x = t for each pair (x, t)
• Solution: Gradient Descent
• Error: E = (1/2) Σ_l (t_l − a_l)^2
• Gradient: ∂E/∂w_{i,j} = (t_i − a_i)(−∂a_i/∂w_{i,j})
• ∂a_i/∂w_{i,j} = ∂(w_i^T x)/∂w_{i,j} = ∂(Σ_l w_{i,l} x_l)/∂w_{i,j} = x_j
• Least Mean Square (LMS) or ∆-rule: w_{i,j} = w_{i,j} − µ ∂E/∂w_{i,j} = · · · = w_{i,j} + µ ∆_i x_j, where ∆_i = t_i − a_i
• finds the (global) minimum of E

Multi-Layer Perceptron
Rosenblatt, 1958 (the original Perceptron had multiple layers)
[Figure: an input–hidden–output network solving a non-linearly-separable problem; each hidden node defines a hyperplane w_{i,1} x_1 + w_{i,2} x_2 + θ_i = 0.]
• input → hidden → output
• computes the function a_O = F_O(W^T F_H(V^T x)), where V (W) contains the incoming weights of the hidden (output) nodes
• used for classification and function approximation
• typically, F(x) = σ(x) = 1 / (1 + e^{−sx}) (sigmoid function)
• if trained via Gradient Descent, F needs to have a derivative F′
• σ′(x) = s σ(x)(1 − σ(x))

Error Back-Propagation
Rumelhart, 1986
• generalization of the LMS or ∆-rule to Multi-Layer Perceptrons
• Error: E = (1/2) Σ_l (t_l − a_l)^2
• Gradient of E with respect to weight w_{i,j} from the jth hidden to the ith output node:
  ∂E/∂w_{i,j} = (t_i − a_i)(−∂a_i/∂w_{i,j}) = −(∂F(w_i^T a_H)/∂w_{i,j})(t_i − a_i) = −F′(w_i^T a_H) a_{H,j} (t_i − a_i),
  so that w_{i,j} = w_{i,j} + µ ∆_i a_{H,j} with ∆_i = F′(w_i^T a_H)(t_i − a_i)
• Gradient of E with respect to weight v_{i,j} from the jth input to the ith hidden node:
  ∂E/∂v_{i,j} = Σ_l (t_l − a_l)(−∂a_l/∂v_{i,j}) = −Σ_l (∂F(w_l^T a_H)/∂v_{i,j})(t_l − a_l)
  = −Σ_l F′(w_l^T a_H)(∂(w_l^T a_H)/∂v_{i,j})(t_l − a_l) = −Σ_l ∆_l ∂(Σ_k w_{l,k} a_{H,k})/∂v_{i,j}
  = −Σ_l ∆_l w_{l,i} ∂a_{H,i}/∂v_{i,j} = −Σ_l ∆_l w_{l,i} ∂F(v_i^T x)/∂v_{i,j} = −Σ_l ∆_l w_{l,i} F′(v_i^T x) ∂(v_i^T x)/∂v_{i,j}
  = −Σ_l ∆_l w_{l,i} F′(v_i^T x) x_j = −∆_i x_j, where ∆_i = F′(v_i^T x) Σ_l w_{l,i} ∆_l,
  so that v_{i,j} = v_{i,j} + µ ∆_i x_j
• NOT guaranteed to find the global minimum of E (E is NOT monotonically decreasing, due to the non-linearities)

Error Back-Propagation
Power
• with enough hidden units, a Multi-Layer Perceptron can approximate any function as precisely as desired
Shortcomings
• premature convergence on local minima
• very, very slow
• representational 'mysticism' (what was learned?)
• many parameters to choose (how many hidden units? µ? F_H? F_O?)
• overfitting ('memorized' training examples; performs poorly on unseen inputs)
'Better' Heuristics
• EBP with Momentum: ∆^(n) w_{i,j} = µ ∆_i^(n) a_{H,j}^(n) + α ∆^(n−1) w_{i,j}
• Resilient EBP (RPROP): ∆w_{i,j} = −∆_{i,j} sign(∂E/∂w_{i,j}), where ∆_{i,j} (w_{i,j}'s individual learning rate) increases (decreases) if sign(∂E/∂w_{i,j}) remains the same (changes) from one time step to the next

References
• McCulloch, W. & Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics: 5, pp. 115-137, 1943.
• Rosenblatt, F. The Perceptron: A Probabilistic Model for Information Storage & Organization in the Brain. Psychological Review: 65, pp. 386-408, 1958.
• Minsky, M. & Papert, S. Perceptrons. MIT Press, Cambridge, 1969.
• Widrow, B. & Hoff, M. E. Adaptive Switching Circuits. IRE WESCON Convention Record: 4, pp. 96-104, 1960.
• Rumelhart, D. et al. Learning Representations by Back-Propagating Errors. Nature: 323, pp. 533-536, 1986.
• Riedmiller, M. & Braun, H. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Proceedings of the ICNN, San Francisco, pp. 123-134, 1993.
• Sejnowski, T. J. & Rosenberg, C. R. NETtalk: A Parallel Network that Learns to Read Aloud. Technical Report JHU/EECS-86/1, EE and CS Dept., Johns Hopkins University, 1986.
• Pomerleau, D. A. ALVINN: An Autonomous Land Vehicle in a Neural Network. Technical Report CMU-CS-89-107, CS Dept., Carnegie Mellon University, 1989.
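The perceptron learning rule from the slides (w_i ← w_i + µ x_i (t − a), step activation, bias weight w_0 = θ via a constant input) can be exercised directly. A minimal sketch in Python with NumPy — the function name, hyper-parameter values, and the AND example are illustrative choices, not from the slides:

```python
import numpy as np

def train_perceptron(X, t, mu=0.1, epochs=100, seed=0):
    """Perceptron rule: w <- w + mu * x * (t - a), step activation.

    X: (N, d) binary input patterns; t: (N,) targets in {0, 1}.
    A constant 1 is appended to each pattern so that w[-1] plays
    the role of the threshold weight w0 = theta.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add bias input x0 = 1
    w = rng.normal(size=Xb.shape[1])               # initially random weights
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(Xb, t):
            a = 1 if w @ x > 0 else 0              # step activation
            if a != target:
                w += mu * x * (target - a)
                mistakes += 1
        if mistakes == 0:                          # all patterns correct
            break
    return w

# AND is linearly separable, so convergence is guaranteed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
preds = [1 if w @ np.append(x, 1.0) > 0 else 0 for x in X]
print(preds)  # [0, 0, 0, 1]
```

Trying the same code with t = [0, 1, 1, 0] (XOR) never converges, illustrating the linear-separability condition.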
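The ADALINE ∆-rule (w_{i,j} ← w_{i,j} + µ ∆_i x_j with ∆_i = t_i − a_i) is a few lines of NumPy. A sketch under the slides' assumptions (linear activation a = W^T x, squared error); the function name, learning rate, and the recoverable target map W_true are illustrative:

```python
import numpy as np

def train_adaline(X, T, mu=0.01, epochs=200, seed=0):
    """LMS / Delta-rule: w_ij <- w_ij + mu * (t_i - a_i) * x_j.

    X: (N, m) real inputs, T: (N, n) real targets; learns W (m x n)
    such that a = W^T x approximates t for every pair (x, t).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):
            a = W.T @ x                    # linear activation
            delta = t - a                  # Delta_i = t_i - a_i
            W += mu * np.outer(x, delta)   # gradient-descent step on E
    return W

# learn a known linear map t = W_true^T x; E is quadratic in W,
# so gradient descent reaches the global minimum
rng = np.random.default_rng(1)
W_true = np.array([[1.0, -2.0], [0.5, 3.0]])
X = rng.uniform(-1, 1, size=(50, 2))
T = X @ W_true
W = train_adaline(X, T)
print(np.round(W, 2))  # close to W_true
```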
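The back-propagation gradients derived on the slides can be checked numerically: the analytic ∂E/∂W and ∂E/∂V from the ∆ terms should agree with central finite differences of E. A self-contained sketch with F_H = F_O = sigmoid (s = 1); all function names and the layer sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(V, W, x):
    """a_O = F_O(W^T F_H(V^T x)), with F_H = F_O = sigmoid."""
    aH = sigmoid(V.T @ x)
    aO = sigmoid(W.T @ aH)
    return aH, aO

def error(V, W, x, t):
    """E = 1/2 * sum_l (t_l - a_l)^2"""
    _, aO = forward(V, W, x)
    return 0.5 * np.sum((t - aO) ** 2)

def backprop(V, W, x, t):
    """Return (dE/dV, dE/dW) using the slides' Delta terms."""
    aH, aO = forward(V, W, x)
    deltaO = aO * (1 - aO) * (t - aO)      # Delta_i = F'(w_i^T aH)(t_i - a_i)
    deltaH = aH * (1 - aH) * (W @ deltaO)  # Delta_i = F'(v_i^T x) sum_l w_{l,i} Delta_l
    # dE/dw_{i,j} = -Delta_i * aH_j ;  dE/dv_{i,j} = -Delta_i * x_j
    return -np.outer(x, deltaH), -np.outer(aH, deltaO)

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences of scalar f() with respect to matrix M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        old = M[idx]
        M[idx] = old + eps; Ep = f()
        M[idx] = old - eps; Em = f()
        M[idx] = old
        G[idx] = (Ep - Em) / (2 * eps)
    return G

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden nodes
W = rng.normal(size=(4, 2))   # 4 hidden -> 2 output nodes
x = rng.normal(size=3)
t = rng.uniform(size=2)
dV, dW = backprop(V, W, x, t)
assert np.allclose(dV, numeric_grad(lambda: error(V, W, x, t), V), atol=1e-7)
assert np.allclose(dW, numeric_grad(lambda: error(V, W, x, t), W), atol=1e-7)
```

A gradient descent step is then V -= mu * dV; W -= mu * dW, matching the update rules v_{i,j} += µ ∆_i x_j and w_{i,j} += µ ∆_i a_{H,j}.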
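The RPROP rule in the 'Better' Heuristics bullet adapts a separate step size ∆_{i,j} per weight from the sign history of ∂E/∂w_{i,j}. A simplified sketch (no weight back-tracking; the increase/decrease factors 1.2 and 0.5 and the step bounds are the commonly cited defaults, and the quadratic toy objective is an illustrative choice):

```python
import numpy as np

def rprop_step(w, grad, step, prev_sign,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One RPROP update: Delta w = -step * sign(dE/dw).

    step grows by eta_plus while sign(dE/dw) is unchanged and
    shrinks by eta_minus when it flips (i.e., a minimum was jumped over).
    """
    sign = np.sign(grad)
    same = sign * prev_sign
    step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
    w = w - step * sign
    return w, step, sign

# minimize E(w) = 1/2 ||w||^2, so dE/dw = w
w = np.array([5.0, -3.0])
step = np.full_like(w, 0.1)
prev = np.zeros_like(w)
for _ in range(100):
    w, step, prev = rprop_step(w, w, step, prev)
print(w)  # close to the minimum at 0
```

Because only the sign of the gradient is used, RPROP is insensitive to the magnitude of ∂E/∂w, which addresses the "very, very slow" shortcoming on plateaus where gradients are tiny.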