CORNELL CS 472 - Foundations of Artificial Intelligence - D704042


School name Cornell University

Course CS 472 - Foundations of Artificial Intelligence


Foundations of Artificial Intelligence
Neural Networks
CS472 – Fall 2007
Thorsten Joachims

Restaurant Data Set
[table]

Limited Expressiveness of Perceptrons
• Minsky and Papert (1969) showed certain simple functions cannot be represented (e.g. Boolean XOR). Killed the field!
• Mid-1980s: Non-linear Neural Networks (Rumelhart et al. 1986)

Neural Networks
• Rich history, starting in the early forties (McCulloch and Pitts 1943).
• Two views:
  – Modeling the brain
  – “Just” representation of complex functions (continuous; contrast decision trees)
• Much progress on both fronts.
• Drawn interest from: Neuroscience, Cognitive science, AI, Physics, Statistics, and CS/EE.

Neuron
[figure]

Why Neural Nets?
Motivation: Solving problems under constraints similar to those of the brain may lead to solutions to AI problems that would otherwise be overlooked.
• Individual neurons operate very slowly → massively parallel algorithms
• Neurons are failure-prone devices → distributed representations
• Neurons promote approximate matching → less brittle

Connectionist Models of Learning
Characterized by:
• A large number of very simple neuron-like processing elements.
• A large number of weighted connections between the elements.
• Highly parallel, distributed control.
• An emphasis on learning internal representations automatically.

Artificial Neurons
[figure]

Activation Functions
[figure]

Example: Perceptron
[figure]

Perceptron Network
[figure]

2-Layer Feedforward Networks
Boolean functions:
• Every Boolean function can be represented by a network with a single hidden layer
• But might require exponentially many (in the number of inputs) hidden units
Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988].
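
The XOR limitation and the single-hidden-layer claim above can be illustrated with a small sketch. It is not taken from the slides: the threshold convention (`step` fires when the weighted sum is positive), the weight grid, and the hand-picked hidden-unit weights in `xor_net` are all assumptions chosen for illustration. The grid search is only a sanity check, not a proof that no perceptron computes XOR.

```python
from itertools import product

def step(s):
    """Threshold activation (assumed convention): 1 if the sum is positive, else 0."""
    return 1 if s > 0 else 0

def perceptron(w1, w2, b, x1, x2):
    """Single threshold unit with two inputs and a bias term."""
    return step(w1 * x1 + w2 * x2 + b)

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]

def perceptron_matches_xor(w1, w2, b):
    """True if this single unit reproduces the full XOR truth table."""
    return [perceptron(w1, w2, b, x1, x2) for x1, x2 in INPUTS] == XOR

# Coarse grid of candidate weights: -4.0, -3.5, ..., 4.0 (illustrative only).
GRID = [i / 2 for i in range(-8, 9)]
single_unit_solutions = [
    w for w in product(GRID, repeat=3) if perceptron_matches_xor(*w)
]

def xor_net(x1, x2):
    """One hidden layer with hand-picked weights: XOR = OR and not AND."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit computing OR
    h_and = step(x1 + x2 - 1.5)       # hidden unit computing AND
    return step(h_or - h_and - 0.5)   # output unit combines the two

two_layer_outputs = [xor_net(x1, x2) for x1, x2 in INPUTS]

print("single-unit weight settings that compute XOR:", len(single_unit_solutions))
print("hidden-layer network outputs:", two_layer_outputs)
```

The hidden units here each compute a linearly separable function; only their combination is non-linear in the inputs, which is what a single perceptron lacks.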
Multi-Layer Nets
• Fully connected, two-layer, feedforward
[figure: network with inputs $x_1, x_2, \ldots, x_N$ and outputs $o_1, o_2, \ldots, o_O$]

Backpropagation Training (Overview)
Training data:
– $(x_1, y_1), \ldots, (x_n, y_n)$, with target labels $y_z \in \{0, 1\}$
Optimization Problem (single output neuron):
– Variables: network weights $w_{i \to j}$
– Objective: $\min_w E$, where $E = \sum_{z=1}^{n} (y_z - o(x_z))^2$
– Constraints: none
Algorithm: local search via gradient descent.
• Randomly initialize weights.
• Until performance is satisfactory,
  – Compute partial derivatives $\partial E / \partial w_{i \to j}$ of objective function $E$ for each weight $w_{i \to j}$
  – Update each weight by $w_{i \to j} \leftarrow w_{i \to j} - \alpha \, (\partial E / \partial w_{i \to j})$

Smooth and Differentiable Threshold Function
• Replace sign function by a differentiable activation function → sigmoid function: $\sigma(s) = 1 / (1 + e^{-s})$

Slope of Sigmoid Function
[figure]
• The slope is $\sigma'(s) = \sigma(s)\,(1 - \sigma(s))$, which appears as the factor $o_j (1 - o_j)$ in the weight changes.

Backpropagation Training (Detail)
• Input: training data $(x_1, y_1), \ldots, (x_n, y_n)$, learning rate parameter $\alpha$.
• Initialize weights.
• Until performance is satisfactory
  – For each training instance,
    • Compute the resulting output
    • Compute $\beta_z = (y_z - o_z)$ for nodes in the output layer
    • Compute $\beta_j = \sum_k w_{j \to k} \, o_k (1 - o_k) \, \beta_k$ for all other nodes.
    • Compute weight changes for all weights using $\Delta w_{i \to j}(l) = o_i \, o_j (1 - o_j) \, \beta_j$
  – Add up weight changes for all training instances, and update the weights accordingly: $w_{i \to j} \leftarrow w_{i \to j} + \alpha \sum_l \Delta w_{i \to j}(l)$
• The weight changes already point downhill: $\Delta w_{i \to j}(l)$ is proportional to $-\partial E / \partial w_{i \to j}$ for instance $l$, so they are added. The constant factor 2 from the squared error is absorbed into $\alpha$.

Hidden Units
• Hidden units are nodes that are situated between the input nodes and the output nodes.
• Hidden units allow a network to learn non-linear functions.
• Hidden units allow the network to represent combinations of the input features.
• Given too many hidden units, a neural net will simply memorize the input patterns (overfitting).
• Given too few hidden units, the network may not be able to represent all of the necessary generalizations (underfitting).

How long should you train the net?
• The goal is to achieve a balance between correct responses for the training patterns and correct responses for new patterns (that is, a balance between memorization and generalization).
• If you train the net for too long, then you run the risk of overfitting.
• Select the number of training iterations via cross-validation on a holdout set.

Design Decisions
• Choice of learning rate α
• Stopping criterion – when should training stop?
• Network architecture
  – How many hidden layers? How many hidden units per layer?
  – How should the units be connected? (Fully? Partially? Use domain knowledge?)
• How many restarts of the search are needed to find a good optimum of the objective (local optima)?
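
The backpropagation update rules and the restart question above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the network has one hidden layer and a single sigmoid output, bias terms are modeled as an extra input fixed at 1 (the slides do not mention biases), and the function names, the XOR training set, `n_hidden=3`, `alpha=0.5`, `epochs=2000`, and the five seeds are all hypothetical choices.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w_hid, w_out, x):
    """Return (inputs plus bias, hidden outputs plus bias, network output)."""
    xb = list(x) + [1.0]  # constant bias input (assumption, not in the slides)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    hb = h + [1.0]        # constant bias unit feeding the output node
    o = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, hb, o

def sum_squared_error(w_hid, w_out, data):
    """Objective E: sum over instances of (y - o(x))^2."""
    return sum((y - forward(w_hid, w_out, x)[2]) ** 2 for x, y in data)

def train(data, n_hidden=3, alpha=0.5, epochs=2000, seed=0):
    """Batch backpropagation; returns weights plus initial and final error."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = sum_squared_error(w_hid, w_out, data)
    for _ in range(epochs):
        d_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        d_out = [0.0] * (n_hidden + 1)
        for x, y in data:
            xb, hb, o = forward(w_hid, w_out, x)
            beta_out = y - o  # beta for the output node
            for j in range(n_hidden + 1):
                # weight change: o_i * o_j (1 - o_j) * beta_j, with j the output node
                d_out[j] += hb[j] * o * (1 - o) * beta_out
            for j in range(n_hidden):
                # beta for hidden node j: sum over its successors (one output node)
                beta_j = w_out[j] * o * (1 - o) * beta_out
                for i in range(n_in + 1):
                    d_hid[j][i] += xb[i] * hb[j] * (1 - hb[j]) * beta_j
        # add up the changes over all instances, then update the weights
        for j in range(n_hidden + 1):
            w_out[j] += alpha * d_out[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hid[j][i] += alpha * d_hid[j][i]
    return w_hid, w_out, initial_error, sum_squared_error(w_hid, w_out, data)

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Several random restarts; keep the run with the lowest final training error.
runs = [train(XOR_DATA, seed=s) for s in range(5)]
best = min(runs, key=lambda r: r[3])
print("initial error:", best[2], "final error:", best[3])
```

Picking the best of several restarts is one simple response to local optima. This sketch stops after a fixed number of epochs and measures only training error; choosing the number of iterations on a holdout set, as recommended above, would replace the fixed `epochs` value.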
