Pitt CS 2750 - Linear regression

CS 2750 Machine Learning
Lecture 7: Linear regression
Milos Hauskrecht, [email protected], Sennott Square

Outline
• Regression
  – Linear model
  – Error function based on the least-squares fit
  – Parameter estimation
  – Gradient methods
  – On-line regression techniques
• Linear additive models
• Statistical model of linear regression

Supervised learning
• Data: a set of n examples D = {D_1, D_2, .., D_n}, where D_i = <x_i, y_i>
  – x_i = (x_{i,1}, x_{i,2}, .., x_{i,d}) is an input vector of size d
  – y_i is the desired output (given by a teacher)
• Objective: learn the mapping f : X -> Y s.t. y_i ≈ f(x_i) for all i = 1, .., n
• Regression: Y is continuous
  Example: earnings, product orders -> company stock price
• Classification: Y is discrete
  Example: handwritten digit (in binary form) -> digit label

Linear regression
• The function f : X -> Y is a linear combination of the input components:
  f(x) = w_0 + w_1 x_1 + w_2 x_2 + .. + w_d x_d = w_0 + Σ_{j=1..d} w_j x_j
  where w_0, w_1, .., w_d are the parameters (weights) and w_0 is the bias term.
[Figure: network view of the model. Each input component x_1, .., x_d, plus a constant 1 for the bias, is multiplied by its weight w_j and summed to give f(x, w).]

Linear regression (vector form)
• Shorter (vector) definition of the model: include the bias constant in the input vector,
  x = (1, x_1, x_2, .., x_d), so that
  f(x) = w_0 x_0 + w_1 x_1 + .. + w_d x_d = w^T x

Linear regression. Error.
• Data: D_i = <x_i, y_i>
• Function: x_i -> f(x_i)
• We would like to have y_i ≈ f(x_i) for all i = 1, .., n
• Error function: measures how much our predictions deviate from the desired answers. Mean-squared error:
  J_n = (1/n) Σ_{i=1..n} (y_i - f(x_i))^2
• Learning: we want to find the weights minimizing the error.

Linear regression. Example.
[Figure: scatter plot and fitted line for a 1-dimensional input, x = (1, x_1).]
[Figure: fitted plane for a 2-dimensional input, x = (1, x_1, x_2).]
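The mean-squared error above is easy to evaluate directly. A minimal sketch, using a synthetic 1-D dataset of my own invention (the names `f` and `mse` are mine, not from the slides), with the bias constant already folded into the input vector as the slides suggest:

```python
import numpy as np

# Hypothetical 1-D dataset: each input is x_i = (1, x_i1), so the bias
# weight w_0 multiplies the constant first component.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.5, 2.0, size=20)
X = np.column_stack([np.ones_like(x1), x1])        # shape (n, d+1)
y = 4.0 + 10.0 * x1 + rng.normal(0.0, 1.0, size=20)

def f(X, w):
    """Linear model f(x) = w^T x, applied row-wise."""
    return X @ w

def mse(w, X, y):
    """Mean-squared error J_n = (1/n) * sum_i (y_i - f(x_i))^2."""
    return np.mean((y - f(X, w)) ** 2)

print(mse(np.array([0.0, 0.0]), X, y))   # error of the all-zero weights
print(mse(np.array([4.0, 10.0]), X, y))  # much smaller near the true weights
```

Learning then amounts to searching for the `w` that makes `mse` as small as possible, which the following slides do in two ways: in closed form and by gradient descent.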
Linear regression. Optimization.
• We want the weights minimizing the error:
  J_n = (1/n) Σ_{i=1..n} (y_i - f(x_i))^2 = (1/n) Σ_{i=1..n} (y_i - w^T x_i)^2
• For the optimal set of parameters, the derivatives of the error with respect to each parameter must be 0.
• Vector of derivatives:
  grad(J_n(w)) = ∇_w J_n(w) = -(2/n) Σ_{i=1..n} (y_i - w^T x_i) x_i = 0
• For the j-th weight:
  ∂J_n(w)/∂w_j = -(2/n) Σ_{i=1..n} (y_i - w_0 x_{i,0} - w_1 x_{i,1} - .. - w_d x_{i,d}) x_{i,j} = 0

Linear regression. Optimization.
• grad(J_n(w)) = 0 defines a set of d+1 equations in w, one per weight:
  ∂J_n(w)/∂w_0 = -(2/n) Σ_{i=1..n} (y_i - w_0 x_{i,0} - w_1 x_{i,1} - .. - w_d x_{i,d}) x_{i,0} = 0
  ∂J_n(w)/∂w_1 = -(2/n) Σ_{i=1..n} (y_i - w_0 x_{i,0} - w_1 x_{i,1} - .. - w_d x_{i,d}) x_{i,1} = 0
  ..
  ∂J_n(w)/∂w_d = -(2/n) Σ_{i=1..n} (y_i - w_0 x_{i,0} - w_1 x_{i,1} - .. - w_d x_{i,d}) x_{i,d} = 0

Solving linear regression
• By rearranging the terms we get a system of linear equations with d+1 unknowns. For each j = 0, 1, .., d:
  w_0 Σ_i x_{i,0} x_{i,j} + w_1 Σ_i x_{i,1} x_{i,j} + .. + w_j Σ_i x_{i,j} x_{i,j} + .. + w_d Σ_i x_{i,d} x_{i,j} = Σ_i y_i x_{i,j}
  i.e., a system of the form A w = b.
• Solution to the SLE: matrix inversion, w = A^{-1} b

Gradient descent solution
• Goal: the weight optimization in the linear regression model
  Error(w) = J_n = (1/n) Σ_{i=1..n} (y_i - f(x_i, w))^2
• An alternative to the SLE solution: gradient descent.
• Idea:
  – Adjust the weights in the direction that improves the error.
  – The gradient tells us what the right direction is:
    w ← w - α ∇_w Error(w)
    where α > 0 is a learning rate (it scales the gradient changes).
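The closed-form solution w = A^{-1} b can be sketched with NumPy. Here A = X^T X collects the sums Σ_i x_{i,j} x_{i,k} and b = X^T y collects the sums Σ_i y_i x_{i,j}; the dataset is synthetic and the variable names are mine. One small departure from the slide: `np.linalg.solve` is used instead of an explicit matrix inverse, since solving A w = b directly is the numerically preferable way to realize the same formula.

```python
import numpy as np

# Synthetic data: n examples, d inputs, plus a bias column of ones.
rng = np.random.default_rng(1)
n, d = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + rng.normal(0.0, 0.01, size=n)

A = X.T @ X                # (d+1) x (d+1) matrix of sums x_{i,j} x_{i,k}
b = X.T @ y                # (d+1) vector of sums y_i x_{i,j}
w = np.linalg.solve(A, b)  # solve the SLE  A w = b
print(w)                   # close to w_true
```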
Gradient descent method
• Descend using the gradient information: change the value of w according to the gradient evaluated at the current point w*:
  w ← w - α ∇_w Error(w)|_{w*}
  This moves w in the direction of the descent.
• New value of each parameter, for all j:
  w_j ← w_j* - α ∂Error(w)/∂w_j |_{w*}
  where α > 0 is a learning rate (it scales the gradient changes).
• The method iteratively approaches the optimum of the error function through a sequence of points w^(0), w^(1), w^(2), w^(3), ..
[Figure: 1-D error curve with successive gradient-descent steps approaching the optimum w*.]

Online gradient algorithm
• The error function J_n = Error(w) = (1/n) Σ_{i=1..n} (y_i - f(x_i, w))^2 is defined for the whole dataset D.
• Error for a single sample D_i = <x_i, y_i>:
  J_online = Error_i(w) = (1/2) (y_i - f(x_i, w))^2
• Online gradient method: changes the weights after every sample,
  w_j ← w_j - α ∂Error_i(w)/∂w_j
  vector form: w ← w - α ∇_w Error_i(w)
  where α > 0 is a learning rate that may depend on the number of updates.

Online gradient method
• On-line error for the linear model f(x) = w^T x:
  J_online = Error_i(w) = (1/2) (y_i - f(x_i, w))^2
• The i-th update step for the j-th weight, with D_i = <x_i, y_i>:
  w_j^(i) ← w_j^(i-1) - α(i) ∂Error_i(w)/∂w_j |_{w^(i-1)} = w_j^(i-1) + α(i) (y_i - f(x_i, w^(i-1))) x_{i,j}
• Annealed learning rate: α(i) ≈ 1/i (gradually rescales the changes).
• Fixed learning rate: α(i) = C (use a small constant).

Online regression algorithm
Online-linear-regression (D, number of iterations):
  initialize weights w = (w_0, w_1, w_2, .., w_d)
  for i = 1 : number of iterations
    select a data point D_i = <x_i, y_i> from D
    set learning rate α(i)
    update weight vector: w ← w + α(i) (y_i - f(x_i, w)) x_i
  end for
  return weights w
• Advantages: very easy to implement; works on continuous data streams.
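The online regression pseudocode above can be sketched as follows. The function name, the cycling strategy for "select a data point", and the toy dataset are my own choices; the slides leave the selection rule open, and a stream or random sampling would fit the same loop.

```python
import numpy as np

def online_linear_regression(D, num_iterations, alpha):
    """Sketch of the on-line algorithm from the slides: after each sample,
    w <- w + alpha(i) * (y_i - w^T x_i) * x_i.
    D is a list of (x, y) pairs with the bias constant 1 already in x;
    alpha(i) is the learning-rate schedule."""
    w = np.zeros(len(D[0][0]))              # initialize weights
    for i in range(1, num_iterations + 1):
        x, y = D[(i - 1) % len(D)]          # select a data point (cycling here)
        w = w + alpha(i) * (y - w @ x) * x  # gradient step on the per-sample error
    return w

# Usage on a noiseless toy problem y = 2 + 3*x1, fixed learning rate C = 0.1:
rng = np.random.default_rng(2)
D = [(np.array([1.0, v]), 2.0 + 3.0 * v) for v in rng.uniform(-1.0, 1.0, 30)]
w = online_linear_regression(D, 3000, alpha=lambda i: 0.1)
print(w)  # should approach (2, 3)
```

With the annealed schedule, `alpha=lambda i: 1.0 / i` would plug into the same call.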
On-line learning. Example.
[Figure: four snapshots (1-4) of the on-line fit of a 1-dimensional linear model after successive updates.]

Practical concerns: input normalization
• Input normalization makes the data vary roughly on the same scale.
• It can make a huge difference in on-line learning.
• Assume the on-line update (delta) rule for two weights j, k. For inputs with a large magnitude the
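The preview cuts off here, so the slide's exact normalization recipe is not visible. A common way to make inputs "vary roughly on the same scale", offered as an assumption rather than the slide's own method, is to standardize each input dimension to zero mean and unit variance; the function name and data are mine.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to zero mean and unit variance.
    Returns the transformed data plus (mean, std), so the same transform
    can be applied to later samples in an on-line stream."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard against constant columns
    return (X - mu) / sigma, mu, sigma

# Two inputs on wildly different scales, as in the motivating concern:
X = np.array([[1000.0, 0.1],
              [2000.0, 0.2],
              [3000.0, 0.3]])
Xs, mu, sigma = standardize(X)
print(Xs.std(axis=0))  # both columns now have unit scale
```

Without such rescaling, a shared learning rate α must be small enough for the largest-magnitude input, which makes the updates for the other weights needlessly slow.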

