UI CS 4420 - Artificial Intelligence

Artificial Intelligence: Learning and Neural Networks
Readings: Chapters 19 and 20.5 of Russell & Norvig

Outline: Example: A Feed-forward Network; Computing with NNs; Learning = Training in Neural Networks; Process for Developing Neural Networks; The Perceptron Learning Method; Normalizing Unit Thresholds; Theoretic Background; Weight Update Rule; Learning a 5-place Minority Function; Multilayer Perceptrons; Expressiveness of MLPs; Back-propagation Learning; Back-propagation Derivation; Decision Trees for Classification; Decision Trees; Expressiveness; Hypothesis Spaces; Decision Tree Learning; Information; Example contd.; Summary.

Example: A Feed-forward Network

[Figure: a feed-forward network with input units I1, I2, hidden units H3, H4, and output unit O5, connected by weights w13, w14, w23, w24, w35, w45.]

$a_5 = g_5(W_{3,5}\,a_3 + W_{4,5}\,a_4) = g_5\big(W_{3,5}\,g_3(W_{1,3}a_1 + W_{2,3}a_2) + W_{4,5}\,g_4(W_{1,4}a_1 + W_{2,4}a_2)\big)$

where $a_i$ is the output and $g_i$ is the activation function of node $i$.
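To make the computation above concrete, here is a minimal Python sketch (not from the slides) that evaluates this 2-2-1 network layer by layer; the function names, the choice of sigmoid for every $g_i$, and the weight values are all illustrative assumptions.

import math

def sigmoid(x):
    # Logistic activation: g(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(a1, a2, w):
    # Evaluate the 2-2-1 network of the slide; w maps (source, target) node
    # pairs to weights, and every activation function g3, g4, g5 is assumed
    # to be the sigmoid.
    a3 = sigmoid(w[(1, 3)] * a1 + w[(2, 3)] * a2)   # hidden unit H3
    a4 = sigmoid(w[(1, 4)] * a1 + w[(2, 4)] * a2)   # hidden unit H4
    a5 = sigmoid(w[(3, 5)] * a3 + w[(4, 5)] * a4)   # output unit O5
    return a5

# Made-up weights, just to show the call:
weights = {(1, 3): 0.5, (2, 3): -0.4, (1, 4): 0.3,
           (2, 4): 0.8, (3, 5): 1.0, (4, 5): -1.2}
print(feed_forward(1.0, 0.0, weights))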
Computing with NNs

Different functions are implemented by different network topologies and unit weights. The lure of NNs is that a network need not be explicitly programmed to compute a certain function f: given enough nodes and links, a NN can learn the function by itself. It does so by looking at a training set of input/output pairs for f and modifying its topology and weights so that its own input/output behavior agrees with the training pairs. In other words, NNs learn by induction, too.

Learning = Training in Neural Networks

Neural networks are trained using data referred to as a training set. The process is one of computing outputs, comparing the outputs with the desired answers, adjusting the weights, and repeating. The information of a neural network is in its structure, activation functions, and weights; learning to use different structures and activation functions is very difficult. The weights express the relative strength of an input value or of a value coming from a connecting unit (i.e., in another layer), and it is by adjusting these weights that a neural network learns.

Process for Developing Neural Networks

1. Collect data: ensure that the application is amenable to a NN approach and pick the data randomly.
2. Separate the data into a training set and a test set.
3. Define a network structure: are perceptrons sufficient?
4. Select a learning algorithm: decided by the available tools.
5. Set parameter values: they will affect the length of the training period.
6. Training: determine and revise the weights.
7. Test: if the results are not acceptable, go back to step 1, 2, ..., or 5.
8. Delivery of the product.

The Perceptron Learning Method

Weight updating in perceptrons is very simple because each output node is independent of the other output nodes.

[Figure: a perceptron network with inputs I_j connected by weights W_{j,i} to several output units O_i, next to a single perceptron with inputs I_j connected by weights W_j to one output unit O.]

With no loss of generality, then, we can consider a perceptron with a single output node.

Normalizing Unit Thresholds

Notice that, if t is the threshold value of the output unit, then

$\mathrm{step}_t\big(\sum_{j=1}^{n} W_j I_j\big) = \mathrm{step}_0\big(\sum_{j=0}^{n} W_j I_j\big)$  where $W_0 = t$ and $I_0 = -1$.

Therefore, we can always assume that the unit's threshold is 0 if we include the actual threshold as the weight of an extra link with a fixed input value. This allows thresholds to be learned like any other weight. Then we can even allow output values in [0, 1] by replacing $\mathrm{step}_0$ with the sigmoid function.

The Perceptron Learning Method (contd.)

If O is the value returned by the output unit for a given example and T is the expected output, then the unit's error is

$Err = T - O.$

If the error Err is positive we need to increase O; otherwise, we need to decrease O.

The Perceptron Learning Method (contd.)

Since $O = g\big(\sum_{j=0}^{n} W_j I_j\big)$, we can change O by changing each $W_j$. Assuming g is monotonic, to increase O we should increase $W_j$ if $I_j$ is positive and decrease $W_j$ if $I_j$ is negative. Similarly, to decrease O we should decrease $W_j$ if $I_j$ is positive and increase $W_j$ if $I_j$ is negative. This is done by updating each $W_j$ as follows:

$W_j \leftarrow W_j + \alpha \times I_j \times (T - O)$

where $\alpha$ is a positive constant, the learning rate.

Theoretic Background

Learn by adjusting the weights to reduce the error on the training set. The squared error for an example with input x and true output y is

$E = \tfrac{1}{2} Err^2 \equiv \tfrac{1}{2}\big(y - h_W(x)\big)^2.$

Perform optimization search by gradient descent:

$\frac{\partial E}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} = Err \times \frac{\partial}{\partial W_j}\Big(y - g\big(\sum_{j=0}^{n} W_j x_j\big)\Big) = -Err \times g'(in) \times x_j$

Weight Update Rule

$W_j \leftarrow W_j - \alpha \times \frac{\partial E}{\partial W_j} = W_j + \alpha \times Err \times g'(in) \times x_j$

E.g., a positive error means the network output must increase, so the weights on positive inputs are increased and those on negative inputs are decreased. Simple weight update rule (treating $g'(in)$ as constant):

$W_j \leftarrow W_j + \alpha \times Err \times x_j$

Learning a 5-place Minority Function

First collect the data (see below), then choose a structure (a perceptron with five inputs and one output) and the activation function (here $\mathrm{step}_{-3}$). Finally, set the parameters (initially $W_i = 0$) and start to learn. Assuming $\alpha = 1$, we have $Sum = \sum_{i=1}^{5} W_i I_i$, $Out = \mathrm{step}_{-3}(Sum)$, $Err = T - Out$, and $W_j \leftarrow W_j + I_j \times Err$.

     I1 I2 I3 I4 I5 | T | W1 W2 W3 W4 W5 | Sum | Out | Err
 e1:  1  0  0  0  1 | 1 |  0  0  0  0  0 |     |     |
 e2:  1  1  0  1  0 | 0 |                |     |     |
 e3:  0  0  0  1  1 | 1 |                |     |     |
 e4:  1  1  1  1  0 | 0 |                |     |     |
 e5:  0  1  0  1  0 | 1 |                |     |     |
 e6:  0  1  1  1  1 | 0 |                |     |     |
 e7:  0  1  0  1  0 | 1 |                |     |     |
 e8:  1  0  1  0  0 | 1 |                |     |     |

(The remaining W, Sum, Out, and Err entries are left blank in the slides, to be filled in while tracing the algorithm.)

Learning a 5-place Minority Function (contd.)

The same as the last example, except that $\alpha = 0.5$ instead of $\alpha = 1$: $Sum = \sum_{i=1}^{5} W_i I_i$, $Out = \mathrm{step}_{-3}(Sum)$, $Err = T - Out$, and now $W_j \leftarrow W_j + \alpha \times I_j \times Err$. The trace uses the same eight examples and the same blank table as above.
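The following Python sketch (my own illustration, not part of the slides) runs this trace mechanically, printing one table row per example. It assumes the convention $\mathrm{step}_t(x) = 1$ if $x \ge t$ and $0$ otherwise, a single pass over e1..e8, and zero initial weights; the function names are made up.

def step(x, t=-3.0):
    # Threshold activation, assuming step_t(x) = 1 if x >= t else 0.
    return 1 if x >= t else 0

def trace_perceptron(examples, alpha=1.0):
    # Perceptron rule W_j <- W_j + alpha * I_j * (T - Out), starting from zero weights.
    w = [0.0] * 5
    for inputs, target in examples:
        s = sum(wj * ij for wj, ij in zip(w, inputs))
        out = step(s)
        err = target - out
        print(inputs, target, w, s, out, err)          # one row of the trace table
        w = [wj + alpha * ij * err for wj, ij in zip(w, inputs)]
    return w

# The eight examples (I1..I5, T) from the table above:
examples = [((1, 0, 0, 0, 1), 1), ((1, 1, 0, 1, 0), 0),
            ((0, 0, 0, 1, 1), 1), ((1, 1, 1, 1, 0), 0),
            ((0, 1, 0, 1, 0), 1), ((0, 1, 1, 1, 1), 0),
            ((0, 1, 0, 1, 0), 1), ((1, 0, 1, 0, 0), 1)]
trace_perceptron(examples, alpha=1.0)    # rerun with alpha=0.5 for the second slide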
Multilayer Perceptrons

Layers are usually fully connected; the number of hidden units is typically chosen by hand.

[Figure: a network with a layer of input units (activations $a_k$), a layer of hidden units (activations $a_j$) reached through weights $W_{k,j}$, and a layer of output units (activations $a_i$) reached through weights $W_{j,i}$.]

Expressiveness of MLPs

All continuous functions can be represented with 2 layers, all functions with 3 layers.

[Figure: two surface plots of $h_W(x_1, x_2)$ for $x_1, x_2 \in [-4, 4]$, showing the kinds of surfaces a small MLP can produce.]

Back-propagation Learning

Output layer: same as for a single-layer perceptron,

$W_{j,i} \leftarrow W_{j,i} + \alpha \times a_j \times \Delta_i$  where $\Delta_i = Err_i \times g'(in_i)$.

Hidden layer: back-propagate the error from the output layer:

$\Delta_j = g'(in_j) \sum_i W_{j,i} \Delta_i.$

Update rule for weights in the hidden layer:

$W_{k,j} \leftarrow W_{k,j} + \alpha \times a_k \times \Delta_j.$

(Most neuroscientists deny that back-propagation occurs in the brain.)

Back-propagation Derivation

The squared error on a single example is defined as

$E = \tfrac{1}{2} \sum_i (y_i - a_i)^2,$

where the sum is over the nodes in the output layer.

$\frac{\partial E}{\partial W_{j,i}} = -(y_i - a_i)\frac{\partial a_i}{\partial W_{j,i}} = -(y_i - a_i)\frac{\partial g(in_i)}{\partial W_{j,i}} = -(y_i - a_i)\,g'(in_i)\frac{\partial in_i}{\partial W_{j,i}} = -(y_i - a_i)\,g'(in_i)\frac{\partial}{\partial W_{j,i}}\sum_j W_{j,i} a_j = -(y_i - a_i)\,g'(in_i)\,a_j = -a_j \Delta_i$
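The two update rules above translate almost directly into code. Here is a minimal sketch (my own illustration, not from the slides) of one back-propagation step for a single-hidden-layer network, assuming sigmoid activations so that $g'(in) = g(in)(1 - g(in))$; all names and the example weights are made up.

import math

def g(x):
    # Sigmoid activation; its derivative is g'(x) = g(x) * (1 - g(x)).
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, W_kj, W_ji, alpha=0.1):
    # One back-propagation update for an input -> hidden -> output network.
    # x: input activations a_k; y: target outputs y_i;
    # W_kj[k][j]: input-to-hidden weights; W_ji[j][i]: hidden-to-output weights.
    n_hidden, n_out = len(W_ji), len(y)

    # Forward pass.
    in_j = [sum(W_kj[k][j] * x[k] for k in range(len(x))) for j in range(n_hidden)]
    a_j = [g(v) for v in in_j]
    in_i = [sum(W_ji[j][i] * a_j[j] for j in range(n_hidden)) for i in range(n_out)]
    a_i = [g(v) for v in in_i]

    # Output-layer deltas: Delta_i = Err_i * g'(in_i).
    d_i = [(y[i] - a_i[i]) * a_i[i] * (1 - a_i[i]) for i in range(n_out)]
    # Hidden-layer deltas: Delta_j = g'(in_j) * sum_i W_ji * Delta_i.
    d_j = [a_j[j] * (1 - a_j[j]) * sum(W_ji[j][i] * d_i[i] for i in range(n_out))
           for j in range(n_hidden)]

    # Weight updates: W_ji += alpha * a_j * Delta_i; W_kj += alpha * a_k * Delta_j.
    for j in range(n_hidden):
        for i in range(n_out):
            W_ji[j][i] += alpha * a_j[j] * d_i[i]
    for k in range(len(x)):
        for j in range(n_hidden):
            W_kj[k][j] += alpha * x[k] * d_j[j]
    return W_kj, W_ji

# Example call on the 2-2-1 network of the first slide (made-up weights):
W_kj = [[0.5, 0.3], [-0.4, 0.8]]   # inputs I1, I2 -> hidden H3, H4
W_ji = [[1.0], [-1.2]]             # hidden H3, H4 -> output O5
backprop_step([1.0, 0.0], [1.0], W_kj, W_ji)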

