Connectionist Models: Lecture 3
Srini Narayanan
CS182/CogSci110/Ling109
Spring 2006

Let's just do an example
E = Error = ½ Σi (ti – yi)²
• A single sigmoid output unit y0 = f(w01·i1 + w02·i2 + w0b·b), with inputs i1, i2, bias b = 1, and initial weights w01 = 0.8, w02 = 0.6, w0b = 0.5.
• Target function (OR):
  i1 i2 | y0
  0  0  | 0
  0  1  | 1
  1  0  | 1
  1  1  | 1
• Forward pass on input (0, 0):
  net = 0.8·0 + 0.6·0 + 0.5·1 = 0.5
  y0 = 1/(1 + e^–0.5) = 0.6224
  E = ½ (0 – 0.6224)² = 0.1937
• Delta rule: ΔWij = η δi yj, with
  δ0 = (t0 – y0) y0 (1 – y0) = (0 – 0.6224)(0.6224)(1 – 0.6224) = –0.1463
• ΔW01 = η δ0 i1 = 0 and ΔW02 = η δ0 i2 = 0 (both inputs are 0), so only the bias weight changes. Suppose the learning rate η = 0.5:
  ΔW0b = η δ0 b = 0.5 · (–0.1463) · 1 = –0.0731
  W0b ← 0.5 – 0.0731 ≈ 0.4268

An informal account of BackProp
For each pattern in the training set:
• Compute the error at the output nodes
• Compute Δw for each weight in the 2nd layer
• Compute δ (the generalized error expression) for the hidden units
• Compute Δw for each weight in the 1st layer
After amassing Δw for all weights, change each weight a little bit, as determined by the learning rate:
  ΔWij = η δpj opi   (for pattern p)

Backpropagation Algorithm
• Initialize all weights to small random numbers.
• For each training example do:
  – Propagate the input forward through the network ("activations"):
    for each hidden unit h:  yh = σ(Σi wih xi)
    for each output unit k:  yk = σ(Σh whk yh)
  – Propagate the errors backward ("errors"):
    for each output unit k:  δk = yk (1 – yk)(tk – yk)
    for each hidden unit h:  δh = yh (1 – yh) Σk whk δk
  – Update each network weight wij:
    wij ← wij + Δwij, where Δwij = η δj xij

Momentum term
• The speed of learning is governed by the learning rate:
  – if the rate is low, convergence is slow
  – if the rate is too high, the error oscillates without reaching the minimum
• Momentum tends to smooth out small weight-error fluctuations:
  Δwij(n + 1) = η δj(n) yi(n) + α Δwij(n),  0 ≤ α < 1
• The momentum accelerates the descent in steady downhill directions.
• The momentum has a stabilizing effect in directions that oscillate in time.

Convergence
• May get stuck in local minima
• Weights may diverge
• …but works well in practice
• Representation power:
  – 2-layer networks: any continuous function
  – 3-layer networks: any function

Pattern Separation and NN architecture
(figure slide)

Local Minimum
• Use a random component: simulated annealing

Overfitting and generalization
• Too many hidden nodes tend to overfit

Stopping criteria
Sensible stopping criteria:
• total mean squared error change: back-prop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).
• generalization-based criterion: after each epoch the NN is tested for generalization. If the generalization performance is adequate, then stop.
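The worked example and the stopping rule can be put together in a few lines. A minimal sketch (variable names w1, w2, wb, eta are mine, standing in for the slides' w01, w02, w0b, η): it first reproduces the single delta-rule step from the worked example, then trains the same unit on all four OR patterns until the per-epoch change in mean squared error is tiny.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Single sigmoid unit for the OR example, with the slides' initial weights.
w1, w2, wb = 0.8, 0.6, 0.5
eta = 0.5                                # learning rate, as on the slide

# --- one delta-rule step on input (0, 0), target 0, as worked on the slide ---
y0 = sigmoid(w1 * 0 + w2 * 0 + wb * 1)   # 1/(1 + e^-0.5) ≈ 0.6224
E = 0.5 * (0 - y0) ** 2                  # ≈ 0.1937
delta0 = (0 - y0) * y0 * (1 - y0)        # (t - y) y (1 - y) ≈ -0.1463
wb += eta * delta0 * 1                   # 0.5 - 0.0731 ≈ 0.427
wb_step = wb                             # keep the post-step value for inspection

# --- full training with the mean-squared-error-change stopping criterion ---
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # OR
prev_mse = float("inf")
for epoch in range(50000):
    mse = 0.0
    for (i1, i2), t in data:
        y = sigmoid(w1 * i1 + w2 * i2 + wb)
        d = (t - y) * y * (1 - y)        # output-unit delta
        w1 += eta * d * i1               # delta rule: dW = eta * delta * input
        w2 += eta * d * i2
        wb += eta * d                    # bias input is always 1
        mse += (t - y) ** 2 / len(data)
    if abs(prev_mse - mse) < 1e-7:       # per-epoch error change is tiny: stop
        break
    prev_mse = mse
```

The tolerance 1e-7 is my own choice for this toy problem; the slides' suggested range [0.01, 0.1] is for the absolute rate of change per epoch on realistic tasks.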
If this stopping criterion is used, the part of the training set used for testing the network's generalization is not also used for updating the weights.

Overfitting in ANNs
(figure slide)

Summary
• Multi-layer feed-forward networks
• Replace the step function with a sigmoid (differentiable) function
• Learn weights by gradient descent on an error function
• Backpropagation algorithm for learning
• Avoid overfitting by early stopping

ALVINN drives 70 mph on highways
(figure slide)

Use MLP Neural Networks when …
• (vectored) real inputs, (vectored) real outputs
• you're not interested in understanding how it works
• long training times are acceptable
• short execution (prediction) times are required
• robust to noise in the dataset

Applications of FFNN
Classification, pattern recognition:
• FFNNs can be applied to tackle non-linearly separable learning problems:
  – recognizing printed or handwritten characters
  – face recognition
  – classifying loan applications into credit-worthy and non-credit-worthy groups
  – analysis of sonar and radar signals to determine the nature of the source
Regression and forecasting:
• FFNNs can be applied to learn non-linear functions (regression), in particular functions whose input is a sequence of measurements over time (time series).

Extensions of Backprop Nets
• Recurrent architectures
• Backprop through time

Elman Nets & Jordan Nets
• Updating the context as we receive input
• In Jordan nets we model "forgetting" as well
• The recurrent connections have fixed weights
• You can train these networks using good ol' backprop
(Figures: Elman and Jordan network diagrams — Output, Hidden, Context, and Input layers; the recurrent copy uses fixed weight 1, with decay α in the Jordan net.)

Recurrent Backprop
• We'll pretend to step through the network one iteration at a time
• Backprop as usual, but average equivalent weights (e.g. all 3 highlighted edges on the right are equivalent)
(Figure: a three-node network a–b–c with weights w1…w4, unrolled for 3 iterations.)

Models of Learning
• Hebbian ~ coincidence
• Supervised ~ correction (backprop)
• Recruitment ~ one trial

Recruiting connections
• Given that LTP involves synaptic strength changes and Hebb's rule involves coincident-activation-based strengthening of connections:
  – how can connections between two nodes be recruited using Hebb's rule?

The Idea of Recruitment Learning
• Suppose we want to link up node X to node Y
• The idea is to pick the two nodes in the middle to link them up
• Can we be sure that we can find a path to get from X to Y?
  P(no link) = (1 – F)^BK, with F = B/N
• The point is, with a fan-out of 1000, if we allow 2 intermediate layers, we can almost always find a path.
(Figure: X and Y in a random network of N nodes with fan-out B and link density F = B/N.)

Finding a Connection in Random Networks
For networks with N nodes and √N branching factor, there is a high probability of finding good links.

Recruiting a Connection in
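The link-probability argument above can be checked numerically. A minimal sketch, assuming (as on the slides) a random network of N nodes with fan-out B and link density F = B/N, and taking the single-intermediate-layer case of the P(no link) formula, i.e. (1 − F)^B — each of X's B intermediaries reaches Y independently with probability F:

```python
import math

def p_no_path(N, B):
    """Probability that none of X's B fan-out nodes links on to Y
    (one intermediate layer), i.e. (1 - F)^B with F = B/N,
    assuming independent random links."""
    F = B / N
    return (1.0 - F) ** B

N = 1_000_000

# B = sqrt(N): the expected number of X -> middle -> Y paths is B * F = 1,
# so P(no path) ~ e^-1 ~ 0.37 -- links exist often, but are not guaranteed.
p_sqrt = p_no_path(N, 1000)

# Tripling the fan-out (expected paths = 9) drives the failure probability
# toward zero -- the slide's "we can almost always find a path".
p_big = p_no_path(N, 3000)
```

This is why √N is the interesting branching factor: below it, expected paths fall under 1 and recruitment usually fails; at or above it, a connecting path is very likely to exist.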