UT Dallas CS 6375 - regression

CS6375 Machine Learning
Simple Linear Regression, Logistic Regression
Instructor: Yang Liu
Spring 2015
Slides modified from Tom Mitchell, Paul Resnick

Regression Models
- Answer 'what is the relationship between the variables?'
- 1 numerical dependent (response) variable: what is to be predicted
- 1 or more numerical or categorical independent (explanatory) variables
- Find a simple, convenient mathematical function to fit the data samples

Types of Regression Models
Regression models are categorized by the number of explanatory variables and the nature of the relationship:
- Simple (1 explanatory variable) vs. Multiple (2+ explanatory variables)
- Linear vs. Non-Linear, in either case

Linear Regression Model
The relationship between the variables is a linear function, e.g., the relationship between income and education:

    Y_i = β_0 + β_1 X_i + ε_i

where Y_i is the dependent (response) variable (e.g., income), X_i is the independent (explanatory) variable (e.g., education), and ε_i is the random error.

Scattergram
A plot of all (X_i, Y_i) pairs; it suggests how well the model will fit.
[Figure: scatter plot of Y against X]

Thinking Challenge
How would you draw a line through the points?
How do you determine which line 'fits best'?

Least Squares
'Best fit' means the difference between the actual Y values and the predicted Y values is minimized. Least squares (LS) minimizes the sum of the squared differences:

    Σ_{i=1..n} ε̂_i² = Σ_{i=1..n} (Y_i − Ŷ_i)²

Least Squares Graphically
[Figure: fitted line Ŷ_i = β̂_0 + β̂_1 X_i with residuals ε̂_1, ε̂_2, ε̂_3, ε̂_4 drawn vertically from each data point to the line]
LS minimizes Σ ε̂_i² = ε̂_1² + ε̂_2² + ε̂_3² + ε̂_4²

Derivation of Parameter Equations
Goal: minimize the squared error. Setting the partial derivative with respect to β̂_0 to zero:

    0 = ∂(Σ ε̂_i²)/∂β̂_0 = ∂(Σ (y_i − β̂_0 − β̂_1 x_i)²)/∂β̂_0
      = −2 Σ (y_i − β̂_0 − β̂_1 x_i)
      = −2 (Σ y_i − n β̂_0 − β̂_1 Σ x_i)

    ⇒ β̂_0 = ȳ − β̂_1 x̄

Setting the partial derivative with respect to β̂_1 to zero (and substituting β̂_0 = ȳ − β̂_1 x̄):

    0 = ∂(Σ ε̂_i²)/∂β̂_1 = −2 Σ x_i (y_i − β̂_0 − β̂_1 x_i)
      = −2 Σ (x_i − x̄)((y_i − ȳ) − β̂_1 (x_i − x̄))

    ⇒ β̂_1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

Coefficient Equations
- Sample slope:        β̂_1 = SS_xy / SS_xx = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²
- Sample Y-intercept:  β̂_0 = ȳ − β̂_1 x̄
- Prediction equation: ŷ_i = β̂_0 + β̂_1 x_i

Interpretation of Coefficients
- Slope (β̂_1): the estimated Y changes by β̂_1 for each 1-unit increase in X. If β̂_1 = 2, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X).
- Y-intercept (β̂_0): the average value of Y when X = 0. If β̂_0 = 4, then average sales (Y) is expected to be 4 when advertising (X) is 0.

Example: R&D and New Products
How does investment in R&D affect the number of new products developed?
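The coefficient equations have a direct closed-form implementation. A minimal sketch in Python (assuming NumPy is available; the function name `least_squares_fit` is illustrative, not from the slides):

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form simple linear regression:
    beta1 = SS_xy / SS_xx, beta0 = ybar - beta1 * xbar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    ss_xy = np.sum((x - xbar) * (y - ybar))
    ss_xx = np.sum((x - xbar) ** 2)
    beta1 = ss_xy / ss_xx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers those coefficients.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
beta0, beta1 = least_squares_fit(x, y)
print(beta0, beta1)  # 1.0 2.0
```

On noisy data the same two lines of algebra give the least-squares line rather than an exact fit; only the residuals change.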
We can postulate the following relation:

    # of new products = α + β · (investment in R&D) + u

[Figure: scatter plot of NEWPROD (0-50) against RD (0-800)]

Example: R&D and New Products (continued)
The estimate is β̂ = 0.049. This tells us that to increase the number of new products by one unit, we need to invest a little more than 20 monetary units in R&D. If a company invests 1000 in R&D, we would predict this company to develop around 49 new products.
[Figure: the same scatter plot with the fitted regression line]

Logistic Regression
It is actually a binary classifier.

Another Example: Failing or Passing an Exam
Let us define a variable 'Outcome':
    Outcome = 0 if the individual fails the exam
            = 1 if the individual passes the exam
We can reasonably assume that failing or passing an exam depends on the quantity of hours we use to study. Note that in this case the dependent variable takes only two possible values; we will call it a 'dichotomous' variable.

Regression Analysis with Dichotomous Dependent Variables
We will then be interested in inference about the probability of passing the exam. Were we to use linear regression, we would postulate:

    Prob(Outcome = 1) = α + β · (quantity of hours of study) + u

We will call this model a 'Linear Probability Model' (LPM).

Linear Probability Models (LPM)
Our dataset contains information about 14 students.
Our statistical software will happily perform a linear regression of Outcome on the quantity of study hours.

    Student id | Outcome | Study hours
    1          | 0       | 3
    2          | 1       | 34
    3          | 0       | 17
    4          | 0       | 6
    5          | 0       | 12
    6          | 1       | 15
    7          | 1       | 26
    8          | 1       | 29
    9          | 0       | 14
    10         | 1       | 58
    11         | 0       | 2
    12         | 1       | 31
    13         | 1       | 26
    14         | 0       | 11

Linear Probability Models (LPM): What Is Wrong with Them?
Let us do a scatter plot and insert the regression line:
[Figure: OUTCOME (-0.2 to 1.2) against HSTUDY (0-60) with the fitted regression line]
A straight line will predict values between negative and positive infinity, outside the [0, 1] interval!

Non-Linear Probability Models
Goal: model the probability of the event occurring with an explanatory variable X.
- The predicted probability needs to be in [0, 1].
- There is a threshold above which the probability hardly increases as a reaction to changes in the explanatory variable.
Many functions meet these requirements (non-linearity and being bounded within [0, 1]). We will focus on the logistic.

Logistic Regression (starting from Naïve Bayes)
Consider learning f: X -> Y, where X is a vector of real-valued features <x_1, ..., x_n> and Y is boolean. We could use a Gaussian Naïve Bayes classifier:
- Assume all x_i are conditionally independent given Y
- Model P(x_i | Y = y_k) as Gaussian N(μ_ik, σ_i)
- Model P(Y) as Bernoulli(π)
What does that imply about the form of P(Y | X)?
[Derivation slides shown as images, not included in this preview]

Training Logistic Regression: MCLE
Choose parameters W = <w_0, ..., w_n> to maximize the conditional likelihood of the training data D = {<X^1, Y^1>, ..., <X^L, Y^L>}:
    Data likelihood = Π_l P(X^l, Y^l | W)
    Data conditional likelihood = Π_l P(Y^l | X^l, W)
[Detail slide shown as an image, not included in this preview]

Gradient Descent
There is no closed-form solution that maximizes l(W), so we use gradient descent.
[Update-rule slide shown as an image, not included in this preview]

Logistic Regression vs. Naïve Bayes
- The functional form follows from the Naïve Bayes assumption.
- The training procedure picks parameters without the conditional independence assumption: pick W to maximize P(Y | X, W).

Generative vs.
Discriminative Classifiers
Generative (e.g., Naïve Bayes):
- Assume some functional form for P(X | Y) and P(Y); this is the 'generative' model
- Estimate the parameters of P(X | Y) and P(Y) directly from the training data
- Use Bayes' rule to calculate P(Y | X = x_i)
Discriminative:
- Assume
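The LPM and logistic discussion can be illustrated on the 14-student exam dataset. A minimal sketch (the names `sigmoid` and `train_logistic` are illustrative; plain batch gradient ascent on the conditional log-likelihood, with a small learning rate chosen for stability, stands in for the update rule the slides show graphically):

```python
import math

# The 14-student dataset from the LPM slide: (study hours, outcome).
data = [(3, 0), (34, 1), (17, 0), (6, 0), (12, 0), (15, 1), (26, 1),
        (29, 1), (14, 0), (58, 1), (2, 0), (31, 1), (26, 1), (11, 0)]

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(data, lr=0.0005, epochs=50000):
    """Batch gradient ascent on the conditional log-likelihood
    l(W) = sum_l [ y^l ln p^l + (1 - y^l) ln(1 - p^l) ],
    whose gradient in each weight is sum_l (y^l - p^l) x^l."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            p = sigmoid(w0 + w1 * x)
            g0 += y - p
            g1 += (y - p) * x
        w0 += lr * g0
        w1 += lr * g1
    return w0, w1

w0, w1 = train_logistic(data)
# Unlike the LPM line, these predictions always stay inside (0, 1).
for hours in (0, 15, 60):
    print(hours, sigmoid(w0 + w1 * hours))
```

The fitted slope is positive (more study hours, higher pass probability), and no input can push the prediction outside [0, 1], which is exactly the defect of the LPM that motivated the logistic form.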

