Connectionist Models: Lecture 3
Srini Narayanan
CS182/CogSci110/Ling109
Spring 2006

Let's just do an example
E = Error = ½ Σi (ti – yi)²
• A single sigmoid output unit y0 = f(w01·i1 + w02·i2 + w0b·b), with inputs i1, i2, bias b = 1, and initial weights w01 = 0.8, w02 = 0.6, w0b = 0.5.
• Target function (OR):
  i1 i2 | y0
  0  0  | 0
  0  1  | 1
  1  0  | 1
  1  1  | 1
• Forward pass on input (0, 0):
  net = 0.8·0 + 0.6·0 + 0.5·1 = 0.5
  y0 = 1/(1 + e^–0.5) = 0.6224
  E = ½ (0 – 0.6224)² = 0.1937
• Delta rule: ΔWij = η δi yj, with
  δ0 = (t0 – y0) y0 (1 – y0) = (0 – 0.6224)(0.6224)(1 – 0.6224) = –0.1463
• ΔW01 = η δ0 i1 = 0 and ΔW02 = η δ0 i2 = 0 (both inputs are 0), so only the bias weight changes. Suppose the learning rate η = 0.5:
  ΔW0b = η δ0 b = 0.5 · (–0.1463) · 1 = –0.0731
  W0b ← 0.5 – 0.0731 ≈ 0.4268

An informal account of BackProp
For each pattern in the training set:
• Compute the error at the output nodes
• Compute Δw for each weight in the 2nd layer
• Compute δ (the generalized error expression) for the hidden units
• Compute Δw for each weight in the 1st layer
After amassing Δw for all weights, change each weight a little bit, as determined by the learning rate:
  ΔWij = η δpj opi   (for pattern p)

Backpropagation Algorithm
• Initialize all weights to small random numbers.
• For each training example do:
  – Propagate the input forward through the network ("activations"):
    for each hidden unit h:  yh = σ(Σi wih xi)
    for each output unit k:  yk = σ(Σh whk yh)
  – Propagate the errors backward ("errors"):
    for each output unit k:  δk = yk (1 – yk)(tk – yk)
    for each hidden unit h:  δh = yh (1 – yh) Σk whk δk
  – Update each network weight wij:
    wij ← wij + Δwij, where Δwij = η δj xij

Momentum term
• The speed of learning is governed by the learning rate:
  – if the rate is low, convergence is slow
  – if the rate is too high, the error oscillates without reaching the minimum
• Momentum tends to smooth out small weight-error fluctuations:
  Δwij(n + 1) = η δj(n) yi(n) + α Δwij(n),  0 ≤ α < 1
• The momentum accelerates the descent in steady downhill directions.
• The momentum has a stabilizing effect in directions that oscillate in time.

Convergence
• May get stuck in local minima
• Weights may diverge
• …but works well in practice
• Representation power:
  – 2-layer networks: any continuous function
  – 3-layer networks: any function

Pattern Separation and NN architecture
(figure slide)

Local Minimum
• Use a random component: simulated annealing

Overfitting and generalization
• Too many hidden nodes tend to overfit

Stopping criteria
Sensible stopping criteria:
• total mean squared error change: back-prop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).
• generalization-based criterion: after each epoch the NN is tested for generalization. If the generalization performance is adequate, then stop.
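The worked example and the stopping rule can be put together in a few lines. A minimal sketch (variable names w1, w2, wb, eta are mine, standing in for the slides' w01, w02, w0b, η): it first reproduces the single delta-rule step from the worked example, then trains the same unit on all four OR patterns until the per-epoch change in mean squared error is tiny.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Single sigmoid unit for the OR example, with the slides' initial weights.
w1, w2, wb = 0.8, 0.6, 0.5
eta = 0.5                                # learning rate, as on the slide

# --- one delta-rule step on input (0, 0), target 0, as worked on the slide ---
y0 = sigmoid(w1 * 0 + w2 * 0 + wb * 1)   # 1/(1 + e^-0.5) ≈ 0.6224
E = 0.5 * (0 - y0) ** 2                  # ≈ 0.1937
delta0 = (0 - y0) * y0 * (1 - y0)        # (t - y) y (1 - y) ≈ -0.1463
wb += eta * delta0 * 1                   # 0.5 - 0.0731 ≈ 0.427
wb_step = wb                             # keep the post-step value for inspection

# --- full training with the mean-squared-error-change stopping criterion ---
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # OR
prev_mse = float("inf")
for epoch in range(50000):
    mse = 0.0
    for (i1, i2), t in data:
        y = sigmoid(w1 * i1 + w2 * i2 + wb)
        d = (t - y) * y * (1 - y)        # output-unit delta
        w1 += eta * d * i1               # delta rule: dW = eta * delta * input
        w2 += eta * d * i2
        wb += eta * d                    # bias input is always 1
        mse += (t - y) ** 2 / len(data)
    if abs(prev_mse - mse) < 1e-7:       # per-epoch error change is tiny: stop
        break
    prev_mse = mse
```

The tolerance 1e-7 is my own choice for this toy problem; the slides' suggested range [0.01, 0.1] is for the absolute rate of change per epoch on realistic tasks.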
If this stopping criterion is used, the part of the training set used for testing the network's generalization is not also used for updating the weights.

Overfitting in ANNs
(figure slide)

Summary
• Multi-layer feed-forward networks
• Replace the step function with a sigmoid (differentiable) function
• Learn weights by gradient descent on an error function
• Backpropagation algorithm for learning
• Avoid overfitting by early stopping

ALVINN drives 70 mph on highways
(figure slide)

Use MLP Neural Networks when …
• (vectored) real inputs, (vectored) real outputs
• you're not interested in understanding how it works
• long training times are acceptable
• short execution (prediction) times are required
• robust to noise in the dataset

Applications of FFNN
Classification, pattern recognition:
• FFNNs can be applied to tackle non-linearly separable learning problems:
  – recognizing printed or handwritten characters
  – face recognition
  – classifying loan applications into credit-worthy and non-credit-worthy groups
  – analysis of sonar and radar signals to determine the nature of the source
Regression and forecasting:
• FFNNs can be applied to learn non-linear functions (regression), in particular functions whose input is a sequence of measurements over time (time series).

Extensions of Backprop Nets
• Recurrent architectures
• Backprop through time

Elman Nets & Jordan Nets
• Updating the context as we receive input
• In Jordan nets we model "forgetting" as well
• The recurrent connections have fixed weights
• You can train these networks using good ol' backprop
(Figures: Elman and Jordan network diagrams — Output, Hidden, Context, and Input layers; the recurrent copy uses fixed weight 1, with decay α in the Jordan net.)

Recurrent Backprop
• We'll pretend to step through the network one iteration at a time
• Backprop as usual, but average equivalent weights (e.g. all 3 highlighted edges on the right are equivalent)
(Figure: a three-node network a–b–c with weights w1…w4, unrolled for 3 iterations.)

Models of Learning
• Hebbian ~ coincidence
• Supervised ~ correction (backprop)
• Recruitment ~ one trial

Recruiting connections
• Given that LTP involves synaptic strength changes and Hebb's rule involves coincident-activation-based strengthening of connections:
  – how can connections between two nodes be recruited using Hebb's rule?

The Idea of Recruitment Learning
• Suppose we want to link up node X to node Y
• The idea is to pick the two nodes in the middle to link them up
• Can we be sure that we can find a path to get from X to Y?
  P(no link) = (1 – F)^BK, with F = B/N
• The point is, with a fan-out of 1000, if we allow 2 intermediate layers, we can almost always find a path.
(Figure: X and Y in a random network of N nodes with fan-out B and link density F = B/N.)

Finding a Connection in Random Networks
For networks with N nodes and √N branching factor, there is a high probability of finding good links.

Recruiting a Connection in
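The link-probability argument above can be checked numerically. A minimal sketch, assuming (as on the slides) a random network of N nodes with fan-out B and link density F = B/N, and taking the single-intermediate-layer case of the P(no link) formula, i.e. (1 − F)^B — each of X's B intermediaries reaches Y independently with probability F:

```python
import math

def p_no_path(N, B):
    """Probability that none of X's B fan-out nodes links on to Y
    (one intermediate layer), i.e. (1 - F)^B with F = B/N,
    assuming independent random links."""
    F = B / N
    return (1.0 - F) ** B

N = 1_000_000

# B = sqrt(N): the expected number of X -> middle -> Y paths is B * F = 1,
# so P(no path) ~ e^-1 ~ 0.37 -- links exist often, but are not guaranteed.
p_sqrt = p_no_path(N, 1000)

# Tripling the fan-out (expected paths = 9) drives the failure probability
# toward zero -- the slide's "we can almost always find a path".
p_big = p_no_path(N, 3000)
```

This is why √N is the interesting branching factor: below it, expected paths fall under 1 and recruitment usually fails; at or above it, a connecting path is very likely to exist.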