Pitt CS 2710 - Supervised learning

CS 2710 Foundations of AI
Lecture 22: Supervised learning. Linear and logistic regression.
Milos Hauskrecht, [email protected], Sennott Square

Supervised learning

• Data: a set of $n$ examples $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \langle \mathbf{x}_i, y_i \rangle$
  – $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ is an input vector of size $d$
  – $y_i$ is the desired output (given by a teacher)
• Objective: learn the mapping $f : X \to Y$ s.t. $f(\mathbf{x}_i) \approx y_i$ for all $i = 1, \ldots, n$
• Regression: $Y$ is continuous
  – Example: earnings and product orders → company stock price
• Classification: $Y$ is discrete
  – Example: handwritten digit → digit label

Linear regression.

• The function $f : X \to Y$ is a linear combination of the input components:
  $f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_d x_d = w_0 + \sum_{j=1}^{d} w_j x_j$
  where $w_0, w_1, \ldots, w_d$ are the parameters (weights) and $w_0$ is the bias term.
  [Figure: the inputs $x_1, \ldots, x_d$ and a constant bias input 1, weighted by $w_0, \ldots, w_d$, feed a summation unit that outputs $f(\mathbf{x}, \mathbf{w})$]
• Shorter (vector) definition of the model: include the bias constant as the first component of the input vector, $\mathbf{x} = (1, x_1, x_2, \ldots, x_d)$; then
  $f(\mathbf{x}) = w_0 x_0 + w_1 x_1 + \ldots + w_d x_d = \mathbf{w}^T \mathbf{x}$

Linear regression. Error.

• Data: $D_i = \langle \mathbf{x}_i, y_i \rangle$; function: $\mathbf{x}_i \to f(\mathbf{x}_i)$; we would like $f(\mathbf{x}_i) \approx y_i$ for all $i = 1, \ldots, n$
• Error function
  – measures how much our predictions deviate from the desired answers. Mean-squared error:
  $J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))^2$
• Learning: we want to find the weights minimizing the error!

Linear regression. Example.

• 1-dimensional input, $\mathbf{x} = (x_1)$: [Plot: 1-D data points with the fitted regression line]
• 2-dimensional input, $\mathbf{x} = (x_1, x_2)$: [Plot: 2-D data points with the fitted regression plane]
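To make the model and the error definition concrete, here is a minimal NumPy sketch (my own illustration, not code from the lecture; the toy data, values, and names are invented):

```python
import numpy as np

def f(x, w):
    """Linear model f(x, w) = w^T x, with the bias constant 1 already
    included as the first component of x."""
    return np.dot(w, x)

def mse(X, y, w):
    """Mean-squared error J_n = (1/n) * sum_i (y_i - f(x_i, w))^2."""
    predictions = X @ w                # f(x_i, w) for every row x_i of X
    return np.mean((y - predictions) ** 2)

# Toy 1-dimensional data: each row of X is x_i = (1, x_{i,1}).
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([-2.0, 1.0, 4.0, 7.0])   # generated by y = 1 + 3 * x_1, no noise
w = np.array([1.0, 3.0])              # weights (w_0, w_1)

print(mse(X, y, w))                   # prints 0.0: these weights fit the toy data exactly
```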

Linear regression. Optimization.

• We want the weights minimizing the error:
  $J_n = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mathbf{w}^T \mathbf{x}_i)^2$
• For the optimal set of parameters, the derivatives of the error with respect to each parameter must be 0.
• Vector of derivatives:
  $\operatorname{grad}(J_n(\mathbf{w})) = \nabla_{\mathbf{w}} J_n(\mathbf{w}) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \mathbf{w}^T \mathbf{x}_i)\,\mathbf{x}_i = \mathbf{0}$
• For the $j$-th weight:
  $\frac{\partial}{\partial w_j} J_n(\mathbf{w}) = -\frac{2}{n} \sum_{i=1}^{n} (y_i - w_0 x_{i,0} - w_1 x_{i,1} - \ldots - w_d x_{i,d})\,x_{i,j} = 0$

Solving linear regression

• By rearranging the terms we get a system of linear equations (SLE) with $d+1$ unknowns, of the form $\mathbf{A}\mathbf{w} = \mathbf{b}$; the equation obtained from the $j$-th derivative is
  $w_0 \sum_{i=1}^{n} x_{i,0} x_{i,j} + w_1 \sum_{i=1}^{n} x_{i,1} x_{i,j} + \ldots + w_d \sum_{i=1}^{n} x_{i,d} x_{i,j} = \sum_{i=1}^{n} y_i x_{i,j}$
  (for example, for $j = 0$, with $x_{i,0} = 1$: $w_0 \sum_{i=1}^{n} 1 + w_1 \sum_{i=1}^{n} x_{i,1} + \ldots + w_d \sum_{i=1}^{n} x_{i,d} = \sum_{i=1}^{n} y_i$)
• Solution to the SLE: matrix inversion, $\mathbf{w} = \mathbf{A}^{-1}\mathbf{b}$

Gradient descent solution

• Goal: the weight optimization in the linear regression model,
  $J_n = \operatorname{Error}(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i, \mathbf{w}))^2$
• An alternative to the SLE solution: gradient descent.
• Idea:
  – adjust the weights in the direction that improves the error
  – the gradient tells us the right direction:
  $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla_{\mathbf{w}} \operatorname{Error}(\mathbf{w})$
  where $\alpha > 0$ is a learning rate (it scales the gradient changes).

Gradient descent method

• Descend using the gradient information: change the value of $\mathbf{w}$ in the direction of the descent.
  [Figure: the error curve $\operatorname{Error}(w)$, the gradient $\nabla \operatorname{Error}(w)\big|_{w^*}$ at the current point $w^*$, and the direction of the descent]
• New value of the parameters, for all $j$:
  $w_j \leftarrow w_j^* - \alpha \frac{\partial \operatorname{Error}(\mathbf{w})}{\partial w_j}\Big|_{\mathbf{w}^*}$
• Iteratively converge to the optimum of the Error function.
  [Figure: a sequence of points $w^{(0)}, w^{(1)}, w^{(2)}, w^{(3)}$ descending on $\operatorname{Error}(w)$ toward its minimum]
  (Both solution methods are sketched in code below.)
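The exact SLE solution and the gradient descent solution can be compared side by side. A minimal NumPy sketch under assumed toy data, with an arbitrary learning rate and iteration count (none of these specifics come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x_i = (1, x_{i,1}); y is generated from weights (1, 3) plus noise.
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-2.0, 2.0, n)])
y = X @ np.array([1.0, 3.0]) + rng.normal(0.0, 0.5, n)

# --- Exact solution: the system of linear equations A w = b ---
# A_{jk} = sum_i x_{i,j} x_{i,k}  and  b_j = sum_i y_i x_{i,j}
A = X.T @ X
b = X.T @ y
w_exact = np.linalg.solve(A, b)    # solves A w = b; more stable than forming A^{-1}

# --- Gradient descent on J_n = (1/n) sum_i (y_i - w^T x_i)^2 ---
w = np.zeros(2)                    # initial weights
alpha = 0.1                        # learning rate (alpha > 0)
for _ in range(500):
    gradient = -(2.0 / n) * X.T @ (y - X @ w)   # gradient of J_n w.r.t. w
    w = w - alpha * gradient                    # w <- w - alpha * grad

print(w_exact)   # exact minimizer of J_n
print(w)         # gradient descent estimate; should (nearly) match w_exact
```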
Online gradient algorithm

• The error function above is defined for the whole dataset $D$:
  $J_n = \operatorname{Error}(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i, \mathbf{w}))^2$
• Error for a single sample $D_i = \langle \mathbf{x}_i, y_i \rangle$:
  $J_{\mathrm{online}} = \operatorname{Error}_i(\mathbf{w}) = \frac{1}{2} (y_i - f(\mathbf{x}_i, \mathbf{w}))^2$
• Online gradient method: changes the weights after every sample,
  $w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j}\operatorname{Error}_i(\mathbf{w})$, or in vector form $\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla_{\mathbf{w}} \operatorname{Error}_i(\mathbf{w})$
  where $\alpha > 0$ is a learning rate that depends on the number of updates.

Online gradient method

• Linear model: $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$; on-line error: $J_{\mathrm{online}} = \frac{1}{2} (y_i - f(\mathbf{x}_i, \mathbf{w}))^2$
• The $i$-th update step with $D_i = \langle \mathbf{x}_i, y_i \rangle$, for the $j$-th weight:
  $w_j^{(i)} \leftarrow w_j^{(i-1)} - \alpha(i) \frac{\partial}{\partial w_j}\operatorname{Error}_i(\mathbf{w})\Big|_{\mathbf{w}^{(i-1)}} = w_j^{(i-1)} + \alpha(i)\,\big(y_i - f(\mathbf{x}_i, \mathbf{w}^{(i-1)})\big)\,x_{i,j}$
• Annealed learning rate: $\alpha(i) \approx \frac{1}{i}$; it gradually rescales the changes in the weights.
• The on-line algorithm generates a sequence of online updates.

Online regression algorithm

  Online-linear-regression(D, number of iterations)
    initialize weights w = (w_0, w_1, w_2, ..., w_d)
    for i = 1 to number of iterations do
      select a data point D_i = <x_i, y_i> from D
      set alpha(i) = 1/i
      update the weight vector: w <- w + alpha(i) * (y_i - f(x_i, w)) * x_i
    end for
    return weights w

• Advantages: very easy to implement; works with continuous data streams; adapts to changes in the model over time. (A runnable version follows below.)

On-line learning. Example.

[Figure: four snapshots of a 1-D fit, showing the regression line after successive online updates 1-4]
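The Online-linear-regression pseudocode above translates almost line for line into Python. A sketch with the same annealed learning rate $\alpha(i) = 1/i$; the toy data stream and the function names are my own, not the lecture's:

```python
import numpy as np

def online_linear_regression(D, num_iterations, rng):
    """Online-linear-regression(D, number of iterations) from the slide above.
    D is a list of pairs (x_i, y_i); each x_i already starts with the bias 1."""
    w = np.zeros(len(D[0][0]))              # initialize weights w = (w_0, ..., w_d)
    for i in range(1, num_iterations + 1):
        x_i, y_i = D[rng.integers(len(D))]  # select a data point D_i from D
        alpha = 1.0 / i                     # annealed learning rate alpha(i) = 1/i
        # update: w <- w + alpha(i) * (y_i - f(x_i, w)) * x_i
        w = w + alpha * (y_i - np.dot(w, x_i)) * x_i
    return w

rng = np.random.default_rng(1)
xs = rng.uniform(-2.0, 2.0, 200)
D = [(np.array([1.0, x]), 1.0 + 3.0 * x + rng.normal(0.0, 0.5)) for x in xs]
print(online_linear_regression(D, 1000, rng))   # should end up near (1, 3)
```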


Logistic regression

Binary classification

• Two classes: $Y = \{0, 1\}$
• Our goal is to learn how to correctly classify two types of examples:
  – class 0, labeled as 0
  – class 1, labeled as 1
• We would like to learn $f : X \to \{0, 1\}$
• First step: we need to devise a model of the function $f$
• Inspiration: the neuron (nerve cell)

Neuron

[Figure: a neuron (nerve cell) and its activities]

Neuron-based binary classification model

• Function we want to learn: $f : X \to \{0, 1\}$
  [Figure: the inputs $x^{(1)}, x^{(2)}, \ldots, x^{(k)}$ and a constant bias input 1, weighted by $w_0, w_1, \ldots, w_k$, are summed to $z$ and passed through a threshold function to produce the output $y$]

Logistic regression model

• A function model with smooth switching:
  $f(\mathbf{x}) = g(w_0 + w_1 x^{(1)} + \ldots + w_k x^{(k)})$, where $g(z) = \frac{1}{1 + e^{-z}}$
• $\mathbf{w}$ are the parameters of the model and $g(z)$ is a logistic function; the output satisfies $f(\mathbf{x}) \in [0, 1]$.
  [Figure: the same unit as above, with the threshold function replaced by the logistic function]

Logistic function

• Also referred to as a sigmoid function.
• Replaces the ideal threshold function with a smooth switching function. (A small sketch follows below.)
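A small sketch of the logistic function and the resulting model (my illustration; the example weights are arbitrary, and the 0.5 decision threshold is the usual convention rather than something stated in this preview):

```python
import numpy as np

def g(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def f(x, w):
    """Logistic regression model f(x) = g(w^T x); the bias constant 1 is
    the first component of x, so f(x) always lies in [0, 1]."""
    return g(np.dot(w, x))

w = np.array([-1.0, 2.0])    # example weights (w_0, w_1), chosen arbitrarily
x = np.array([1.0, 0.8])     # input with the bias term: (1, x^(1))
p = f(x, w)                  # smooth, probability-like output in [0, 1]
print(p, int(p >= 0.5))      # g(0.6) is about 0.646 -> predict class 1
```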
