CMU CS 15381 - Regression and Neural Networks

15-381: Artificial Intelligence
Regression and neural networks (NN)

Mimicking the brain
• In the early days of AI there was a lot of interest in developing models that could mimic human thinking.
• While no one knew exactly how the brain works (and, even though there has been a lot of progress since, still little is known), some of its basic computational units were known.
• A key component of these units is the neuron.

The Neuron
• A cell in the brain
• Highly connected to other neurons
• Thought to perform computations by integrating signals from other neurons
• The outputs of these computations may be transmitted to one or more neurons

What can we do with NN?
• Classification - we already mentioned many useful applications
• Regression - input: real-valued variables; output: one or more real values
• Examples:
  - Predict the price of Google's stock from Microsoft's stock price
  - Predict the distance to an obstacle from various sensors

Linear regression
• Given an input x we would like to compute an output y.
• In linear regression we assume that y and x are related by the equation y = wx + ε, where w is a parameter and ε represents measurement or other noise.
[Figure: data plotted as Y versus X.]

Multivariate regression: Least squares
• We can re-write multivariate regression as y = Xw.
• The solution is w = (X^T X)^{-1} X^T y.
• This is an instance of a larger set of computational solutions usually referred to as 'generalized least squares'.
• X^T X is a k-by-k matrix and X^T y is a vector with k entries, so we need to invert a k-by-k matrix.
• This takes O(k^3) time; depending on k it can be rather slow.

Where we are
• Linear regression - solved!
• But:
  - the solution may be slow
  - it does not address general regression problems of the form y = f(Xw)

Back to NN: Perceptron
• The basic processing unit of a neural net: y = f(∑_i w_i x_i)
[Figure: inputs x_1, ..., x_k plus a constant 1, connected by weights w_0, w_1, ..., w_k to a single output unit.]

Linear regression
• Let's start by setting f(∑_i w_i x_i) = ∑_i w_i x_i, so y = ∑_i w_i x_i.
• We are back to linear regression.
• Unlike our original linear regression solution, for perceptrons we will use a different strategy.
• Why? We will discuss this later; for now let's focus on the solution.

Gradient descent
• Consider the error z = (f(w) - y)^2 as a function of w.
[Figure: z plotted against w, with the slope ∂z/∂w and the increments Δz and Δw marked.]
• Going in the opposite direction to the slope will lead to a smaller z.
• But not too much, otherwise we would go beyond the optimal w.
• We thus update the weights by setting w ← w - λ ∂z/∂w, where λ is a small constant intended to prevent us from passing the optimal w.

Example when choosing the 'right' λ
• We get a monotonically decreasing error as we perform more updates.

Gradient descent for linear regression
• We compute the gradient with respect to each w_i:
  ∂/∂w_i (y - ∑_k w_k x_k)^2 = -2 x_i (y - ∑_k w_k x_k)
• If we have n measurements then
  ∂/∂w_i ∑_{j=1}^n (y_j - w^T x_j)^2 = -2 ∑_{j=1}^n x_{j,i} (y_j - w^T x_j)
  where x_{j,i} is the i'th value of the j'th input vector.
• Set δ_j = (y_j - w^T x_j).
• Then our update rule can be written as w_i ← w_i + 2λ ∑_{j=1}^n δ_j x_{j,i}.

Gradient descent algorithm for linear regression
1. Choose λ.
2. Start with a guess for w.
3. Compute δ_j for all j.
4. For all i set w_i ← w_i + 2λ ∑_{j=1}^n δ_j x_{j,i}.
5. If there is no improvement in ∑_{j=1}^n (y_j - w^T x_j)^2, stop. Otherwise go to step 3.

Example
• w = 2
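The closed-form least squares solution and the gradient descent algorithm above can be written out directly. The sketch below is not part of the original slides; it is a minimal Python/NumPy illustration in which the function names, the synthetic data, and the choice of λ (lam) are mine. It implements w = (X^T X)^{-1} X^T y and the update w_i ← w_i + 2λ ∑_j δ_j x_{j,i} from the algorithm just described.

import numpy as np

def least_squares(X, y):
    # Closed-form solution w = (X^T X)^{-1} X^T y.
    # Solving the normal equations avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

def linreg_gradient_descent(X, y, lam=0.001, n_iters=1000):
    # Gradient descent algorithm from the slides:
    #   delta_j = y_j - w^T x_j,   w_i <- w_i + 2*lam * sum_j delta_j * x_{j,i}
    n, k = X.shape
    w = np.zeros(k)                       # step 2: start with a guess for w
    prev_err = np.inf
    for _ in range(n_iters):
        delta = y - X @ w                 # step 3: compute delta_j for all j
        w = w + 2 * lam * (X.T @ delta)   # step 4: update every w_i at once
        err = np.sum((y - X @ w) ** 2)    # step 5: stop if no improvement
        if err >= prev_err:
            break
        prev_err = err
    return w

# Tiny usage example: y is roughly 2*x plus noise, so both methods should
# recover a weight close to 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)
print(least_squares(X, y), linreg_gradient_descent(X, y))

Note that np.linalg.solve is used rather than an explicit matrix inverse; both correspond to the same O(k^3) cost discussed above, and the gradient descent version avoids that cost entirely at the price of choosing λ and iterating.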
Gradient descent vs. matrix inversion
• Advantages of matrix inversion:
  - no iterations
  - no need to specify parameters
  - a closed-form solution in a predictable time
• Advantages of gradient descent:
  - applicable regardless of the number of parameters
  - general: it applies to other forms of regression

Perceptrons for classification
• So far we discussed regression.
• However, perceptrons can also be used for classification.
• For example, output 1 if w^T x > 0 and -1 otherwise.
• Problem?

Perceptrons for classification
• Alternatively, with outputs that are either 0 or 1, we predict 1 if w^T x > 1/2 and 0 otherwise.
• Problem?
[Figure: 0/1 data plotted against x, comparing the best least squares fit with the best classifier.]

The sigmoid function
• To classify using a perceptron we replace the linear function with the sigmoid function: g(h) = 1 / (1 + e^{-h})
• Using the sigmoid we would minimize ∑_{j=1}^n (y_j - g(w^T x_j))^2,
• where y_j is either 0 or 1 depending on the class.

Gradient descent with sigmoid
• Once we have defined our target function, we can minimize it using gradient descent.
• This involves some math, and relies on the following derivation*: g'(h) = g(h)(1 - g(h))
• So:
  ∂/∂w_i ∑_{j=1}^n (y_j - g(w^T x_j))^2
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) ∂/∂w_i g(w^T x_j)
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) g'(w^T x_j) ∂/∂w_i (w^T x_j)
    = -2 ∑_{j=1}^n (y_j - g(w^T x_j)) g(w^T x_j) (1 - g(w^T x_j)) x_{j,i}
• Set δ_j = (y_j - g(w^T x_j)) and g_j = g(w^T x_j), so that
  ∂/∂w_i ∑_{j=1}^n (y_j - g(w^T x_j))^2 = -2 ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}
• So our update rule is: w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}
* I have included a derivation of this at the end of the lecture notes.

Revised algorithm for sigmoid regression
1. Choose λ.
2. Start with a guess for w.
3. Compute δ_j for all j.
4. For all i set w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) x_{j,i}.
5. If there is no improvement in ∑_{j=1}^n (y_j - g(w^T x_j))^2, stop. Otherwise go to step 3.

Multilayer neural networks
• So far we discussed networks with one layer.
• But these networks can be extended to combine several layers, increasing the set of functions that can be represented using a NN.
[Figure: inputs x_1, x_2 and a constant 1 feed two hidden units v_1 = g(w^T x) and v_2 = g(w^T x) through weights w_{0,1}, w_{1,1}, w_{2,1} and w_{0,2}, w_{1,2}, w_{2,2}; this middle layer is often called the 'hidden layer'. The hidden units feed the output y = g(w^T v) through weights w_1 and w_2.]

Learning the parameters for multilayer networks
• Gradient descent works by connecting the output to the inputs.
• But how do we use it for a multilayer network?
• We need to account for both the output weights and the hidden layer weights.
• It is easy to compute the update rule for the output weights w_1 and w_2:
  w_i ← w_i + 2λ ∑_{j=1}^n δ_j g_j (1 - g_j) v_{j,i}, where δ_j = (y_j - g(w^T v_j)).
• But what is the error associated with each of the hidden layer states?

Backpropagation
• A method for propagating the output error back through the network so that the hidden layer weights can also be updated by gradient descent.
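Returning to the revised algorithm for sigmoid regression above, here is a matching Python/NumPy sketch (again not from the slides; the data, λ, and stopping threshold are illustrative) showing how the update w_i ← w_i + 2λ ∑_j δ_j g_j (1 - g_j) x_{j,i} looks in code.

import numpy as np

def g(h):
    # The sigmoid function g(h) = 1 / (1 + e^{-h}).
    return 1.0 / (1.0 + np.exp(-h))

def sigmoid_regression(X, y, lam=0.01, n_iters=5000):
    # Revised algorithm: delta_j = y_j - g(w^T x_j), g_j = g(w^T x_j),
    #   w_i <- w_i + 2*lam * sum_j delta_j * g_j * (1 - g_j) * x_{j,i}
    n, k = X.shape
    w = np.zeros(k)                                     # start with a guess for w
    prev_err = np.inf
    for _ in range(n_iters):
        gj = g(X @ w)                                   # g_j for all j
        delta = y - gj                                  # delta_j for all j
        w = w + 2 * lam * (X.T @ (delta * gj * (1 - gj)))
        err = np.sum((y - g(X @ w)) ** 2)               # squared error being minimized
        if prev_err - err < 1e-9:                       # stop when no improvement
            break
        prev_err = err
    return w

# Usage: labels are 0/1 depending on the sign of x; the column of ones plays
# the role of the constant input from the perceptron diagram.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
print(sigmoid_regression(X, y))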

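The preview cuts off just as backpropagation is introduced, so the two-layer sketch below goes beyond what is shown here: the output-layer update matches the slide formula (with v_{j,i} in place of x_{j,i}), while the hidden-layer update is the standard chain-rule (backpropagation) step and is an assumption about where the derivation is headed. The added output bias, the network size, and the XOR data are also illustrative choices of mine.

import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))   # sigmoid, as above

def train_two_layer(X, y, n_hidden=3, lam=0.1, n_iters=20000, seed=0):
    # Two-layer network: hidden units v = g(V x), output = g(w^T v),
    # trained by gradient descent on sum_j (y_j - output_j)^2.
    rng = np.random.default_rng(seed)
    n, k = X.shape
    V = rng.normal(scale=0.5, size=(n_hidden, k))   # hidden-layer weights (random start, assumed)
    w = rng.normal(scale=0.5, size=n_hidden + 1)    # output weights; w[0] is an added output bias
    for _ in range(n_iters):
        hidden = g(X @ V.T)                              # v_{j,i} = g(V_i^T x_j)
        hb = np.column_stack([np.ones(n), hidden])       # prepend 1 for the output bias
        out = g(hb @ w)                                  # g(w^T v_j)
        delta = y - out                                  # output error delta_j
        grad_out = delta * out * (1 - out)
        # Hidden-layer error (backpropagation step, assumed): the output error is passed
        # back through each hidden unit's outgoing weight and its sigmoid derivative.
        grad_hidden = (grad_out[:, None] * w[None, 1:]) * hidden * (1 - hidden)
        w = w + 2 * lam * (hb.T @ grad_out)              # output update, as in the slides
        V = V + 2 * lam * (grad_hidden.T @ X)            # hidden update
    return V, w

# Usage: XOR, a function no single perceptron can represent.
X = np.array([[1.0, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])   # first column is the constant 1
y = np.array([0.0, 1, 1, 0])
V, w = train_two_layer(X, y)
hb = np.column_stack([np.ones(len(X)), g(X @ V.T)])
print(np.round(g(hb @ w), 2))   # typically close to [0, 1, 1, 0], depending on the random start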
