An informal account of BackProp

For each pattern in the training set:
• Compute the error at the output nodes
• Compute $\Delta w$ for each weight in the 2nd layer
• Compute $\delta$ (the generalized error expression) for the hidden units
• Compute $\Delta w$ for each weight in the 1st layer
After amassing $\Delta w$ for all weights, change each weight a little bit, as determined by the learning rate:
$$\Delta w_{ij} = \eta\,\delta_{pi}\,o_{pj}, \qquad p \text{ indexes the training pattern.}$$

Backprop Details

Here we go… (also refer to the web notes for the derivation).
Units are indexed $k \to j \to i$ from input to output, with weights $w_{jk}$ into the hidden layer and $w_{ij}$ into the output layer. The error is
$$E = \tfrac{1}{2}\sum_i (t_i - y_i)^2,$$
where $y_i$ is the output of unit $i$ and $t_i$ is its target.

The output layer

Gradient descent with learning rate $\eta$:
$$\Delta W_{ij} = -\eta\,\frac{\partial E}{\partial W_{ij}}$$
$$\frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial x_i}\,\frac{\partial x_i}{\partial W_{ij}} = -(t_i - y_i)\,f'(x_i)\,y_j$$
The derivative of the sigmoid is just $f'(x_i) = y_i(1 - y_i)$, so
$$\Delta W_{ij} = \eta\,(t_i - y_i)\,y_i(1 - y_i)\,y_j = \eta\,\delta_i\,y_j, \qquad \delta_i = (t_i - y_i)\,y_i(1 - y_i).$$

The hidden layer

$$\Delta W_{jk} = -\eta\,\frac{\partial E}{\partial W_{jk}}, \qquad \frac{\partial E}{\partial W_{jk}} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial x_j}\,\frac{\partial x_j}{\partial W_{jk}}$$
$$\frac{\partial E}{\partial y_j} = \sum_i \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial x_i}\,\frac{\partial x_i}{\partial y_j} = -\sum_i (t_i - y_i)\,f'(x_i)\,W_{ij}$$
$$\frac{\partial E}{\partial W_{jk}} = -\Big[\sum_i (t_i - y_i)\,f'(x_i)\,W_{ij}\Big]\,f'(x_j)\,y_k$$
$$\Delta W_{jk} = \eta\,\Big[\sum_i (t_i - y_i)\,y_i(1 - y_i)\,W_{ij}\Big]\,y_j(1 - y_j)\,y_k = \eta\,\delta_j\,y_k, \qquad \delta_j = \Big[\sum_i \delta_i\,W_{ij}\Big]\,y_j(1 - y_j).$$

Momentum term

• The speed of learning is governed by the learning rate. If the rate is low, convergence is slow; if the rate is too high, the error oscillates without reaching the minimum.
• Momentum tends to smooth out small fluctuations in the weight error:
$$\Delta w_{ij}(n) = \eta\,\delta_i(n)\,y_j(n) + \alpha\,\Delta w_{ij}(n-1), \qquad 0 \le \alpha < 1.$$
• The momentum accelerates the descent in steady downhill directions.
• The momentum has a stabilizing effect in directions that oscillate in time.

Convergence

• May get stuck in local minima.
• Weights may diverge.
• …but works well in practice.
Representation power:
• 2-layer networks: any continuous function
• 3-layer networks: any function

Local Minimum

Use a random component: simulated annealing.

Overfitting and generalization

Too many hidden nodes tend to overfit.

Overfitting in ANNs

Early Stopping (Important!!!)

Stop training when the error starts going up on the validation set.

Stopping criteria

Sensible stopping criteria:
• Total mean squared error change: backprop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).
• Generalization-based criterion: after each epoch the network is tested for generalization. If the generalization performance is adequate, stop. If this criterion is used, the part of the training set held out for testing generalization must not be used for updating the weights.
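The update rules above map almost directly onto code. The following is a minimal NumPy sketch, not taken from the slides: the noisy-XOR toy dataset, the layer sizes, the patience-style stopping rule, and all variable names are illustrative assumptions, biases are omitted for brevity, and weights are updated once per epoch (batch mode), as in the informal account above. It ties together the output-layer and hidden-layer delta rules, the momentum term, and early stopping on a validation set.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy dataset (hypothetical): noisy XOR of two binary inputs; biases omitted for brevity.
X = rng.integers(0, 2, size=(200, 2)).astype(float)
T = (X[:, 0] != X[:, 1]).astype(float).reshape(-1, 1)
X = X + 0.05 * rng.standard_normal(X.shape)
X_train, T_train = X[:150], T[:150]
X_val, T_val = X[150:], T[150:]

n_in, n_hid, n_out = 2, 4, 1
eta, alpha = 0.5, 0.9                                # learning rate and momentum
W_jk = rng.normal(scale=0.5, size=(n_in, n_hid))     # input k -> hidden j
W_ij = rng.normal(scale=0.5, size=(n_hid, n_out))    # hidden j -> output i
dW_jk_prev = np.zeros_like(W_jk)
dW_ij_prev = np.zeros_like(W_ij)

def forward(X, W_jk, W_ij):
    y_j = sigmoid(X @ W_jk)                          # hidden activations
    y_i = sigmoid(y_j @ W_ij)                        # output activations
    return y_j, y_i

best_val, best_weights = np.inf, None
patience, bad_epochs = 20, 0                         # assumed rule: stop after 20 bad epochs

for epoch in range(5000):
    # Forward pass on the training set (batch mode: one update per epoch).
    y_j, y_i = forward(X_train, W_jk, W_ij)

    # Output layer: delta_i = (t_i - y_i) * y_i * (1 - y_i)
    delta_i = (T_train - y_i) * y_i * (1.0 - y_i)
    # Hidden layer: delta_j = (sum_i delta_i * W_ij) * y_j * (1 - y_j)
    delta_j = (delta_i @ W_ij.T) * y_j * (1.0 - y_j)

    # Weight changes with momentum: dW(n) = eta * delta * y + alpha * dW(n-1)
    dW_ij = eta * (y_j.T @ delta_i) + alpha * dW_ij_prev
    dW_jk = eta * (X_train.T @ delta_j) + alpha * dW_jk_prev
    W_ij += dW_ij
    W_jk += dW_jk
    dW_ij_prev, dW_jk_prev = dW_ij, dW_jk

    # Early stopping: watch the squared error on the held-out validation set.
    _, y_val = forward(X_val, W_jk, W_ij)
    val_err = 0.5 * np.sum((T_val - y_val) ** 2)
    if val_err < best_val:
        best_val, best_weights, bad_epochs = val_err, (W_jk.copy(), W_ij.copy()), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # validation error keeps rising -> stop
            break

W_jk, W_ij = best_weights                            # keep the weights with the lowest validation error
print(f"stopped after epoch {epoch}, best validation error {best_val:.4f}")
```

In per-pattern (online) mode the same delta computations would run inside an inner loop over the training patterns, accumulating or applying $\Delta w$ one pattern at a time.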
Architectural Considerations

• What is the right size network for a given job? How many hidden units?
  • Too many: no generalization.
  • Too few: no solution.
• Possible answer: constructive algorithms, e.g. Cascade Correlation (Fahlman & Lebiere, 1990), etc.
• The number of layers and of neurons depends on the specific task. In practice this issue is solved by trial and error.
• Two types of adaptive algorithms can be used:
  • start from a large network and successively remove nodes and links until performance degrades;
  • begin with a small network and introduce new neurons until performance is satisfactory.

Network Topology

Problems and Networks

• Some problems have natural "good" solutions.
• Solving a problem may be possible by providing the right armory of general-purpose tools and recruiting them as needed.
• Networks are general-purpose tools.
• The choice of network type, training, architecture, etc. greatly influences the chances of successfully solving a problem.
• Tension: tailoring tools for a specific job vs. exploiting a general-purpose learning mechanism.

Summary

• Multiple-layer feed-forward networks.
• Replace the step function with a sigmoid (differentiable) function.
• Learn weights by gradient descent on an error function.
• Backpropagation algorithm for learning.
• Avoid overfitting by early stopping.

ALVINN drives 70 mph on highways

Use MLP Neural Networks when …

• (vector-valued) real inputs, (vector-valued) real outputs
• you're not interested in understanding how it works
• long training times are acceptable
• short execution (prediction) times are required
• robustness to noise in the dataset is required

Applications of FFNN

Classification, pattern recognition: FFNNs can be applied to tackle non-linearly separable learning problems, e.g.
• recognizing printed or handwritten characters,
• face recognition,
• classification of loan applications into credit-worthy and non-credit-worthy groups,
• analysis of sonar and radar signals to determine the nature of the signal source.
Regression and forecasting: FFNNs can be applied to learn non-linear functions (regression) and in particular functions whose input is a sequence of measurements over time (time series).

Extensions of Backprop Nets

• Recurrent architectures
• Backprop through time

Elman Nets & Jordan Nets

• Updating the context as we receive input.
• In Jordan nets we model "forgetting" as well.
• The recurrent connections have fixed weights.
• You can train these networks using good ol' backprop.
(Diagrams: a Jordan net, whose context layer receives the output with weight 1 and decays with weight α, and an Elman net, whose context layer receives a copy of the hidden layer with weight 1; in both, the context feeds the hidden layer together with the input.)

Recurrent Backprop

• We'll pretend to step through the network one iteration at a time.
• Backprop as usual, but average equivalent weights (e.g. all three highlighted edges on the right are equivalent).
(Diagram: a three-unit recurrent network a–b–c with weights w1–w4, unrolled for 3 iterations; the copies of a weight across the unrolled iterations are the equivalent edges.)
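As a rough sketch of the two recurrent extensions above (not from the slides: the layer sizes, weight scales, the decay value α = 0.5, and the function names are illustrative assumptions), the Elman-style context update copies the previous hidden activations, while the Jordan-style context decays by α and accumulates the previous output. The copy/decay connections are fixed, so ordinary backprop can still be applied to the remaining weights at each step, or the network can be unrolled for a few iterations and the gradients of the weight copies averaged, as the Recurrent Backprop slide describes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 3, 5, 2
alpha = 0.5                                             # Jordan-net "forgetting" rate (assumed)

W_in = rng.normal(scale=0.3, size=(n_in, n_hid))        # input -> hidden
W_ctx = rng.normal(scale=0.3, size=(n_hid, n_hid))      # context -> hidden (Elman)
W_ctx_j = rng.normal(scale=0.3, size=(n_out, n_hid))    # context -> hidden (Jordan)
W_out = rng.normal(scale=0.3, size=(n_hid, n_out))      # hidden -> output

def elman_step(x, context):
    """One Elman step: the context is last step's hidden activations, copied with fixed weight 1."""
    hidden = sigmoid(x @ W_in + context @ W_ctx)
    output = sigmoid(hidden @ W_out)
    return output, hidden                               # new context := hidden

def jordan_step(x, context):
    """One Jordan step: the context decays by alpha and accumulates last step's output."""
    hidden = sigmoid(x @ W_in + context @ W_ctx_j)
    output = sigmoid(hidden @ W_out)
    new_context = alpha * context + output              # fixed recurrent weights: alpha and 1
    return output, new_context

# Run a short input sequence through the Elman net.
sequence = rng.standard_normal((4, n_in))
context = np.zeros(n_hid)
for x in sequence:
    y, context = elman_step(x, context)
    print(y)
```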
Connectionist Models in Cognitive Science