CS 2750 Machine Learning (University of Pittsburgh)
Lecture 8
Milos Hauskrecht, [email protected], Sennott Square
Linear regression (cont.)
Linear methods for classification

Coefficient shrinkage
• The least squares estimates often have low bias but high variance.
• Prediction accuracy can often be improved by setting some coefficients to zero.
  – This increases the bias but reduces the variance of the estimates.
• Solutions:
  – Subset selection
  – Ridge regression
  – Principal component regression
• Next: ridge regression

Ridge regression
• Error function for the standard least squares estimates:
  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2
• We seek:
  w^* = \arg\min_w \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2
• Ridge regression:
  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2
• where \|w\|^2 = \sum_{i=0}^{d} w_i^2 and \lambda \ge 0
• What does the new error function do?

Ridge regression (cont.)
• Standard regression:
  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2
• Ridge regression:
  J_n(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2
• The term \lambda \|w\|^2 penalizes non-zero weights with a cost proportional to \lambda (a shrinkage coefficient).
• If an input attribute x_j has a small effect on improving the error function, it is "shut down" by the penalty term.
• Inclusion of a shrinkage penalty is often referred to as regularization.

Supervised learning
• Data: a set of n examples D = \{d_1, d_2, \ldots, d_n\}, where d_i = \langle x_i, y_i \rangle; x_i is an input vector and y_i is the desired output (given by a teacher).
• Objective: learn the mapping f : X \to Y such that y_i \approx f(x_i) for all i = 1, \ldots, n.
• Two types of problems:
  – Regression: Y is continuous. Example: earnings, product orders → company stock price.
  – Classification: Y is discrete. Example: temperature, heart rate → disease.
• Today: binary classification problems.

Binary classification
• Two classes: Y = \{0, 1\}.
• Our goal is to learn to classify correctly two types of examples:
  – Class 0, labeled as 0
  – Class 1, labeled as 1
• We would like to learn f : X \to \{0, 1\}.
• Zero-one error (loss) function:
  \mathrm{Error}_1(x_i, y_i) = \begin{cases} 1 & f(x_i; w) \ne y_i \\ 0 & f(x_i; w) = y_i \end{cases}
• Error we would like to minimize: E_{(x,y)}[\mathrm{Error}_1(x, y)]
• First step: we need to devise a model of the function f.

Discriminant functions
• One convenient way to represent classifiers is through discriminant functions.
• Works for binary and multi-way classification.
• Idea:
  – For every class i = 0, 1, \ldots, k define a function g_i(x) mapping X \to \mathbb{R}.
  – When the decision on input x should be made, choose the class with the highest value of g_i(x).
• So what happens with the input space? Assume a binary case.
• The condition g_1(x) = g_0(x) defines the decision boundary.

[Figure: a 2D input space split into a region where g_1(x) \ge g_0(x) and a region where g_1(x) \le g_0(x); the curve g_1(x) = g_0(x) is the decision boundary.]

Quadratic decision boundary
[Figure: an example where the decision boundary g_1(x) = g_0(x) between the regions g_1(x) \ge g_0(x) and g_1(x) \le g_0(x) is a quadratic curve.]

Logistic regression model
• Defines a linear decision boundary.
• Discriminant functions:
  g_1(x) = g(w^T x) and g_0(x) = 1 - g(w^T x)
• where g(z) = 1/(1 + e^{-z}) is a logistic function.
• Model: f(x, w) = g(w^T x); the input vector x (with a constant component 1 for the bias) is combined with weights w_0, w_1, \ldots, w_d into z = w^T x, which is passed through the logistic function.

Logistic function
  g(z) = \frac{1}{1 + e^{-z}}
• Also referred to as a sigmoid function.
• Replaces the hard threshold function with smooth switching.
• Takes a real number and outputs a number in the interval [0, 1].

Logistic regression model (probabilistic interpretation)
• Discriminant functions:
  g_1(x) = g(w^T x) and g_0(x) = 1 - g(w^T x), where g(z) = 1/(1 + e^{-z})
• Values of the discriminant functions vary in [0, 1].
  – Probabilistic interpretation: f(x, w) = p(y = 1 \mid x, w) = g(w^T x)

Logistic regression
• Instead of learning the mapping to discrete values 0, 1, f : X \to \{0, 1\},
• we learn a probabilistic function f : X \to [0, 1],
  – where f describes the probability of class 1 given x:
  f(x, w) = p(y = 1 \mid x, w)
• Note that: p(y = 0 \mid x, w) = 1 - p(y = 1 \mid x, w)
• Transformation to discrete class values:
  If p(y = 1 \mid x) \ge 1/2 then choose 1, else choose 0.

Linear decision boundary
• The logistic regression model defines a linear decision boundary.
• Why?
• Answer: compare the two discriminant functions.
• Decision boundary: g_1(x) = g_0(x)
• For the boundary it must hold:
  \log \frac{g_1(x)}{g_0(x)} = \log \frac{1/(1 + e^{-w^T x})}{e^{-w^T x}/(1 + e^{-w^T x})} = \log e^{w^T x} = w^T x = 0
• So the decision boundary is the hyperplane w^T x = 0, which is linear in x.

Logistic regression model. Decision boundary
• LR defines a linear decision boundary.
[Figure: example with 2 classes (blue and red points) separated by a linear decision boundary.]

Likelihood of outputs
• Let D_i = \langle x_i, y_i \rangle and
  \mu_i = p(y_i = 1 \mid x_i, w) = g(z_i) = g(w^T x_i)
• Then:
  L(D, w) = \prod_{i=1}^{n} P(y_i \mid x_i, w) = \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}
• Find weights w that maximize the likelihood of outputs.
  – Apply the log-likelihood trick:
  l(D, w) = \log \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i} = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i)
• The optimal weights are the same for both the likelihood and the log-likelihood.

Logistic regression: parameter learning
• Log-likelihood:
  l(D, w) = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i)
• Derivatives of the negative log-likelihood:
  -\frac{\partial}{\partial w_j} l(D, w) = -\sum_{i=1}^{n} x_{i,j} (y_i - g(z_i))
  -\nabla_w l(D, w) = -\sum_{i=1}^{n} x_i (y_i - g(w^T x_i)) = -\sum_{i=1}^{n} x_i (y_i - f(x_i, w))
• Gradient descent:
  w^{(k)} \leftarrow w^{(k-1)} - \alpha(k) \nabla_w [-l(D, w)] \big|_{w^{(k-1)}}
• Nonlinear in weights!
• For logistic regression:
  w^{(k)} \leftarrow w^{(k-1)} + \alpha(k) \sum_{i=1}^{n} x_i [y_i - f(x_i, w^{(k-1)})]

Logistic regression. Online gradient descent
• Online component of the log-likelihood: J_{online}(D_k, w), where D_k = \langle x_k, y_k \rangle.
• Online learning update for weights w:
  w^{(k)} \leftarrow w^{(k-1)} - \alpha(k) \nabla_w J_{online}(D_k, w) \big|_{w^{(k-1)}}
• The k-th update for logistic regression:
  w^{(k)} \leftarrow w^{(k-1)} + \alpha(k) x_k [y_k - f(x_k, w^{(k-1)})]
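The ridge regression objective has a closed-form minimizer: setting the gradient of J_n(w) = (1/n) \sum_i (y_i - w^T x_i)^2 + \lambda \|w\|^2 to zero gives the normal equations (X^T X + n\lambda I) w = X^T y. A minimal NumPy sketch (the function name `ridge_fit` is illustrative, not from the slides; like the slide's penalty \sum_{i=0}^{d} w_i^2, it shrinks all components, including the bias):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/n) * sum_i (y_i - w^T x_i)^2 + lam * ||w||^2.

    Setting the gradient to zero yields the normal equations
    (X^T X + n*lam*I) w = X^T y; the factor n comes from the 1/n
    in the data term of the objective.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
```

With lam = 0 this reduces to ordinary least squares; increasing lam shrinks the weight vector toward zero, trading extra bias for lower variance as described above.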

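The batch gradient update for logistic regression, w \leftarrow w + \alpha \sum_i x_i (y_i - g(w^T x_i)), can be sketched as follows (a minimal illustration with a fixed step size; `logreg_fit` and its defaults are assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), the logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-z))

def logreg_fit(X, y, alpha=0.1, iters=1000):
    """Batch gradient ascent on the log-likelihood:
    w <- w + alpha * sum_i x_i * (y_i - g(w^T x_i)).
    X is assumed to include a constant column for the bias w_0."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # X.T @ (y - sigmoid(X @ w)) is the gradient of l(D, w)
        w += alpha * X.T @ (y - sigmoid(X @ w))
    return w
```

Because the log-likelihood is nonlinear in the weights, there is no closed-form solution, which is why the slides resort to this iterative update; a production implementation would also guard `sigmoid` against overflow for large negative z.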

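The online update w^{(k)} \leftarrow w^{(k-1)} + \alpha(k) x_k [y_k - f(x_k, w^{(k-1)})] touches one example per step, so it suits streaming data. A sketch under the assumption of a decaying schedule \alpha(k) = \alpha / k (the slides leave the schedule unspecified, and `logreg_online` is an illustrative name):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def logreg_online(stream, d, alpha=1.0):
    """Online gradient update: after seeing the k-th example (x_k, y_k),
    w^(k) = w^(k-1) + alpha(k) * x_k * (y_k - g(w^T x_k)),
    here with the assumed step-size schedule alpha(k) = alpha / k."""
    w = np.zeros(d)
    for k, (x, y) in enumerate(stream, start=1):
        w += (alpha / k) * x * (y - sigmoid(w @ x))
    return w
```

In contrast to the batch rule, each step uses the gradient of the single-example term J_online(D_k, w) of the log-likelihood rather than the full sum over i = 1, ..., n.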