CMU CS 15381 - Regression and Neural Networks

15-381: Artificial Intelligence
Regression and neural networks (NN)

Mimicking the brain
• In the early days of AI there was a lot of interest in developing models that could mimic human thinking.
• While no one knew exactly how the brain works (and, even though there has been a lot of progress since, still little is known), some of its basic computational units were known.
• A key component of these units is the neuron.

The Neuron
• A cell in the brain
• Highly connected to other neurons
• Thought to perform computations by integrating signals from other neurons
• The outputs of these computations may be transmitted to one or more neurons

What can we do with NN?
• Classification - we already mentioned many useful applications
• Regression - input: real-valued variables; output: one or more real values
• Examples:
  - Predict the price of Google's stock from Microsoft's stock price
  - Predict the distance to an obstacle from various sensors

Linear regression
• Given an input x we would like to compute an output y.
• In linear regression we assume that y and x are related by the equation y = wx + ε, where w is a parameter and ε represents measurement or other noise.
[Figure: data plotted as Y versus X.]

Multivariate regression: Least squares
• We can re-write multivariate regression as y = Xw.
• The solution is w = (X^T X)^{-1} X^T y.
• This is an instance of a larger set of computational solutions usually referred to as 'generalized least squares'.
• X^T X is a k-by-k matrix and X^T y is a vector with k entries, so we need to invert a k-by-k matrix.
• This takes O(k^3) time; depending on k it can be rather slow.

Where we are
• Linear regression - solved!
• But:
  - the solution may be slow
  - it does not address general regression problems of the form y = f(Xw)

Back to NN: Perceptron
• The basic processing unit of a neural net: y = f(∑_i w_i x_i)
[Figure: inputs x_1, ..., x_k plus a constant 1, connected by weights w_0, w_1, ..., w_k to a single output unit.]

Linear regression
• Let's start by setting f(∑_i w_i x_i) = ∑_i w_i x_i, so y = ∑_i w_i x_i.
• We are back to linear regression.
• Unlike our original linear regression solution, for perceptrons we will use a different strategy.
• Why? We will discuss this later; for now let's focus on the solution.

Gradient descent
• Consider the error z = (f(w) - y)^2 as a function of w.
[Figure: z plotted against w, with the slope ∂z/∂w and the increments Δz and Δw marked.]
• Going in the opposite direction to the slope will lead to a smaller z.
• But not too much, otherwise we would go beyond the optimal w.
• We thus update the weights by setting w ← w - λ ∂z/∂w, where λ is a small constant intended to prevent us from passing the optimal w.

Example when choosing the 'right' λ
• We get a monotonically decreasing error as we perform more updates.

Gradient descent for linear regression
• We compute the gradient with respect to each w_i:
  ∂/∂w_i (y - ∑_k w_k x_k)^2 = -2 x_i (y - ∑_k w_k x_k)
• If we have n measurements then
  ∂/∂w_i ∑_{j=1}^n (y_j - w^T x_j)^2 = -2 ∑_{j=1}^n x_{j,i} (y_j - w^T x_j)
  where x_{j,i} is the i'th value of the j'th input vector.
• Set δ_j = (y_j - w^T x_j).
• Then our update rule can be written as w_i ← w_i + 2λ ∑_{j=1}^n δ_j x_{j,i}.

Gradient descent algorithm for linear regression
1. Choose λ.
2. Start with a guess for w.
3. Compute δ_j for all j.
4. For all i set w_i ← w_i + 2λ ∑_{j=1}^n δ_j x_{j,i}.
5. If there is no improvement in ∑_{j=1}^n (y_j - w^T x_j)^2, stop. Otherwise go to step 3.

Example
• w = 2
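The closed-form least squares solution and the gradient descent algorithm above can be written out directly. The sketch below is not part of the original slides; it is a minimal Python/NumPy illustration in which the function names, the synthetic data, and the choice of λ (lam) are mine. It implements w = (X^T X)^{-1} X^T y and the update w_i ← w_i + 2λ ∑_j δ_j x_{j,i} from the algorithm just described.

import numpy as np

def least_squares(X, y):
    # Closed-form solution w = (X^T X)^{-1} X^T y.
    # Solving the normal equations avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

def linreg_gradient_descent(X, y, lam=0.001, n_iters=1000):
    # Gradient descent algorithm from the slides:
    #   delta_j = y_j - w^T x_j,   w_i <- w_i + 2*lam * sum_j delta_j * x_{j,i}
    n, k = X.shape
    w = np.zeros(k)                       # step 2: start with a guess for w
    prev_err = np.inf
    for _ in range(n_iters):
        delta = y - X @ w                 # step 3: compute delta_j for all j
        w = w + 2 * lam * (X.T @ delta)   # step 4: update every w_i at once
        err = np.sum((y - X @ w) ** 2)    # step 5: stop if no improvement
        if err >= prev_err:
            break
        prev_err = err
    return w

# Tiny usage example: y is roughly 2*x plus noise, so both methods should
# recover a weight close to 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)
print(least_squares(X, y), linreg_gradient_descent(X, y))

Note that np.linalg.solve is used rather than an explicit matrix inverse; both correspond to the same O(k^3) cost discussed above, and the gradient descent version avoids that cost entirely at the price of choosing λ and iterating.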
Gradient descent vs. matrix inversion
• Advantages of matrix inversion:
  - no iterations
  - no need to specify parameters
  - a closed-form solution in a predictable time
• Advantages of gradient descent:
  - applicable regardless of the number of parameters
  - general: it applies to other forms of regression

Perceptrons for classification
• So far we discussed regression.
• However, perceptrons can also be used for classification.
• For example, output 1 if w^T x > 0 and -1 otherwise.
• Problem?

Perceptrons for classification
• Alternatively, with outputs that are either 0 or 1, we predict 1 if w^T x > 1/2 and 0 otherwise.
• Problem?
[Figure: 0/1 data plotted against x, comparing the best least squares fit with the best classifier.]

The sigmoid function
• To classify using a perceptron we replace the linear function with the sigmoid function: g(h) = 1 / (1 + e^{-h})
• Using the sigmoid we would minimize ∑_{j=1}^n (y_j - g(w^T x_j))^2,
• where y_j is either 0 or 1 depending on the class.

Gradient descent with sigmoid
• Once we have defined our target function, we can minimize it using gradient descent.
• This involves some math, and relies on the following derivation*: g'(h) = g(h)(1 - g(h))
• So:
  ∂/∂w_i ∑_{j=1}^n (y_j - g(w^T x_j))^2
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) ∂/∂w_i g(w^T x_j)
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) g'(w^T x_j) ∂/∂w_i (w^T x_j)
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) g(w^T x_j) (1 - g(w^T x_j)) x_{j,i}
• Set δ_j = (y_j - g(w^T x_j)) and g_j = g(w^T x_j), so that
  ∂/∂w_i ∑_{j=1}^n (y_j - g(w^T x_j))^2 = -2 ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}
• So our update rule is: w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}
* I have included a derivation of this at the end of the lecture notes.

Revised algorithm for sigmoid regression
1. Choose λ.
2. Start with a guess for w.
3. Compute δ_j for all j.
4. For all i set w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}.
5. If there is no improvement in ∑_{j=1}^n (y_j - g(w^T x_j))^2, stop. Otherwise go to step 3.

Multilayer neural networks
• So far we discussed networks with one layer.
• But these networks can be extended to combine several layers, increasing the set of functions that can be represented using a NN.
[Figure: inputs x_1, x_2 and a constant 1 feed two hidden units v_1 = g(w^T x) and v_2 = g(w^T x) through weights w_{0,1}, w_{1,1}, w_{2,1} and w_{0,2}, w_{1,2}, w_{2,2}; this middle layer is often called the 'hidden layer'. The hidden units feed the output y = g(w^T v) through weights w_1 and w_2.]

Learning the parameters for multilayer networks
• Gradient descent works by connecting the output to the inputs.
• But how do we use it for a multilayer network?
• We need to account for both the output weights and the hidden layer weights.
• It is easy to compute the update rule for the output weights w_1 and w_2:
  w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) v_{j,i}, where δ_j = (y_j - g(w^T v_j)).
• But what is the error associated with each of the hidden layer states?

Backpropagation
• A method for propagating the output error back through the network so that the hidden layer weights can also be updated by gradient descent.
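Returning to the revised algorithm for sigmoid regression above, here is a matching Python/NumPy sketch (again not from the slides; the data, λ, and stopping threshold are illustrative) showing how the update w_i ← w_i + 2λ ∑_j δ_j g_j (1 - g_j) x_{j,i} looks in code.

import numpy as np

def g(h):
    # The sigmoid function g(h) = 1 / (1 + e^{-h}).
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_regression(X, y, lam=0.01, n_iters=5000):
    # Revised algorithm: delta_j = y_j - g(w^T x_j), g_j = g(w^T x_j),
    #   w_i <- w_i + 2*lam * sum_j delta_j * g_j * (1 - g_j) * x_{j,i}
    n, k = X.shape
    w = np.zeros(k)                                     # start with a guess for w
    prev_err = np.inf
    for _ in range(n_iters):
        gj = g(X @ w)                                   # g_j for all j
        delta = y - gj                                  # delta_j for all j
        w = w + 2 * lam * (X.T @ (delta * gj * (1 - gj)))
        err = np.sum((y - g(X @ w)) ** 2)               # squared error being minimized
        if prev_err - err < 1e-9:                       # stop when no improvement
            break
        prev_err = err
    return w

# Usage: labels are 0/1 depending on the sign of x; the column of ones plays
# the role of the constant input from the perceptron diagram.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
print(sigmoid_regression(X, y))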

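The preview cuts off just as backpropagation is introduced, so the two-layer sketch below goes beyond what is shown here: the output-layer update matches the slide formula (with v_{j,i} in place of x_{j,i}), while the hidden-layer update is the standard chain-rule (backpropagation) step and is an assumption about where the derivation is headed. The added output bias, the network size, and the XOR data are also illustrative choices of mine.

import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))   # sigmoid, as above

def train_two_layer(X, y, n_hidden=3, lam=0.1, n_iters=20000, seed=0):
    # Two-layer network: hidden units v = g(V x), output = g(w^T v),
    # trained by gradient descent on sum_j (y_j - output_j)^2.
    rng = np.random.default_rng(seed)
    n, k = X.shape
    V = rng.normal(scale=0.5, size=(n_hidden, k))   # hidden-layer weights (random start, assumed)
    w = rng.normal(scale=0.5, size=n_hidden + 1)    # output weights; w[0] is an added output bias
    for _ in range(n_iters):
        hidden = g(X @ V.T)                              # v_{j,i} = g(V_i^T x_j)
        hb = np.column_stack([np.ones(n), hidden])       # prepend 1 for the output bias
        out = g(hb @ w)                                  # g(w^T v_j)
        delta = y - out                                  # output error delta_j
        grad_out = delta * out * (1 - out)
        # Hidden-layer error (backpropagation step, assumed): the output error is passed
        # back through each hidden unit's outgoing weight and its sigmoid derivative.
        grad_hidden = (grad_out[:, None] * w[None, 1:]) * hidden * (1 - hidden)
        w = w + 2 * lam * (hb.T @ grad_out)              # output update, as in the slides
        V = V + 2 * lam * (grad_hidden.T @ X)            # hidden update
    return V, w

# Usage: XOR, a function no single perceptron can represent.
X = np.array([[1.0, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])   # first column is the constant 1
y = np.array([0.0, 1, 1, 0])
V, w = train_two_layer(X, y)
hb = np.column_stack([np.ones(len(X)), g(X @ V.T)])
print(np.round(g(hb @ w), 2))   # typically close to [0, 1, 1, 0], depending on the random start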
