CORNELL CS 472 - Lecture Slides
2-Layer Feedforward Networks
Boolean functions:
• Every boolean function can be represented by a network with a single hidden layer.
• But it might require exponentially many (in the number of inputs) hidden units.
Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989].
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988].
[Figure: fully connected feedforward network with inputs x1, x2, …, xN and outputs o1, o2, …, oO]

Multi-Layer Nets
• Fully connected, two layer, feedforward.

Backpropagation Training (Overview)
• Training data: (x1, y1), …, (xn, yn), with target labels yz ∈ {0, 1}.
• Optimization problem (single output neuron):
  – Variables: network weights wi→j
  – Objective function: minw ∑z=1..n (yz − oz)²
  – Constraints: none
• Algorithm: local search via gradient descent.
  – Randomly initialize weights.
  – Until performance is satisfactory*:
    • Present all training instances. For each one,
      – calculate the actual output (forward pass);
      – compute the weight changes that move the output o closer to the desired label y (backward pass).
    • Add up the weight changes and change the weights.

Smooth and Differentiable Threshold Function
• Replace the sign function by a differentiable activation function → the sigmoid function:
  σ(x) = 1 / (1 + e^(−x))

Slope of Sigmoid Function
• σ′(x) = σ(x)(1 − σ(x))  (see the sketch after these slides)

Backpropagation Training (Detail)
• Input: training data (x1, y1), …, (xn, yn), learning rate parameter α.
• Initialize weights.
• Until performance is satisfactory:
  – For each training instance l:
    • Compute the resulting output.
    • Compute βz = (yz − oz) for nodes in the output layer.
    • Compute βj = ∑k wj→k ok(1 − ok) βk for all other nodes.
    • Compute weight changes for all weights using ∆wi→j(l) = oi oj(1 − oj) βj.
  – Add up the weight changes for all training instances and update the weights accordingly: wi→j ← wi→j + α ∑l ∆wi→j(l)
  (a NumPy sketch of this procedure follows these slides)

Hidden Units
• Hidden units are nodes situated between the input nodes and the output nodes.
• Hidden units allow a network to learn non-linear functions.
• Hidden units allow the network to represent combinations of the input features.
• Given too many hidden units, a neural net will simply memorize the input patterns (overfitting).
• Given too few hidden units, the network may not be able to represent all of the necessary generalizations (underfitting).

How long should you train the net?
• The goal is to achieve a balance between correct responses for the training patterns and correct responses for new patterns (that is, a balance between memorization and generalization).
• If you train the net for too long, you run the risk of overfitting.
• In general, the network is trained until it reaches an acceptable level of accuracy (e.g., 95%). (A validation-based stopping sketch follows these slides.)

Design Decisions
• Choice of learning rate α.
• Stopping criterion: when should training stop?
• Network architecture:
  – How many hidden layers? How many hidden units per layer?
  – How should the units be connected? (Fully? Partially? Use domain knowledge?)
• How many restarts of the search (local optima) to find a good optimum of the objective?
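The formula on the "Slope of Sigmoid Function" slide is not preserved in this extract. The short Python sketch below (an addition, not from the slides) shows the sigmoid σ(x) = 1 / (1 + e^(−x)) and verifies numerically that its slope equals σ(x)(1 − σ(x)), which is exactly the oj(1 − oj) factor in the backpropagation weight-update rule.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope(x):
    # Closed-form derivative: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5, 5, 11)
# Compare the closed-form slope against a central finite-difference estimate.
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(sigmoid_slope(x), numeric, atol=1e-6))   # True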

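Below is a minimal NumPy sketch of the batch procedure on the "Backpropagation Training (Detail)" slide, assuming sigmoid units, squared error, and one hidden layer with a single output. Names such as train_backprop, W1, W2, n_hidden, and the added bias inputs are illustrative choices, not taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    # Append a constant-1 column so each unit gets a bias weight (an assumption;
    # the slides do not say how biases are handled).
    return np.hstack([a, np.ones((a.shape[0], 1))])

def train_backprop(X, y, n_hidden=4, alpha=0.5, n_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    y = y.reshape(-1, 1)
    # Randomly initialize weights, as on the overview slide.
    W1 = rng.normal(scale=0.5, size=(d + 1, n_hidden))    # input  -> hidden
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, 1))    # hidden -> output

    for _ in range(n_epochs):
        dW1 = np.zeros_like(W1)
        dW2 = np.zeros_like(W2)
        for x_l, y_l in zip(X, y):                  # present all training instances
            x_l = x_l.reshape(1, -1)
            # Forward pass: compute the resulting output.
            h = sigmoid(add_bias(x_l) @ W1)         # hidden activations o_j
            o = sigmoid(add_bias(h) @ W2)           # output activation o_z
            # Backward pass.
            beta_out = y_l - o                                   # beta_z = y_z - o_z
            beta_hid = (W2[:-1].T * (o * (1 - o))) * beta_out    # beta_j = sum_k w_{j->k} o_k(1-o_k) beta_k
            # Weight changes: delta_w_{i->j}(l) = o_i * o_j(1 - o_j) * beta_j.
            dW2 += add_bias(h).T @ (o * (1 - o) * beta_out)
            dW1 += add_bias(x_l).T @ (h * (1 - h) * beta_hid)
        # Add up the changes over all instances and update: w <- w + alpha * sum_l delta_w(l).
        W1 += alpha * dW1
        W2 += alpha * dW2
    return W1, W2

def predict(X, W1, W2):
    return sigmoid(add_bias(sigmoid(add_bias(X) @ W1)) @ W2)

if __name__ == "__main__":
    # XOR: the standard example of a boolean function that needs a hidden layer.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    W1, W2 = train_backprop(X, y)
    print(np.round(predict(X, W1, W2).ravel(), 2))

As the "Design Decisions" slide notes, gradient descent can land in a poor local optimum, so a run that fails to fit XOR may simply need a restart with a different random seed.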

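The slides say only to train until an "acceptable" level of accuracy is reached. One common concrete stopping criterion, not described in the slides, is to hold out a validation set and stop once validation error stops improving, which directly targets the memorization/generalization balance. The sketch below illustrates this with a single sigmoid unit on synthetic data; all names (patience, alpha, val split sizes) are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse(w, X, y):
    return float(np.mean((y - sigmoid(X @ w)) ** 2))

rng = np.random.default_rng(0)
# Synthetic, noisily separable data.
X = rng.normal(size=(200, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=200) > 0).astype(float)

# Train/validation split.
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

w = rng.normal(scale=0.1, size=3)
alpha, patience = 0.05, 20
best_err, best_w, epochs_since_best = np.inf, w.copy(), 0

for epoch in range(5000):
    o = sigmoid(X_tr @ w)
    # Batch gradient step on squared error (same update direction as on the slides).
    w += alpha * X_tr.T @ ((y_tr - o) * o * (1 - o)) / len(y_tr)
    val_err = mse(w, X_va, y_va)
    if val_err < best_err - 1e-6:
        best_err, best_w, epochs_since_best = val_err, w.copy(), 0
    else:
        epochs_since_best += 1
    if epochs_since_best >= patience:   # stop before the model starts to overfit
        break

w = best_w
print(f"stopped at epoch {epoch}, best validation MSE {best_err:.3f}")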