CS6375 Machine Learning
Simple Linear Regression, Logistic Regression
Instructor: Yang Liu
Spring 2015
Slides modified from Tom Mitchell and Paul Resnick

Slide 2: Regression Models
- Answer the question "what is the relationship between the variables?"
- One numerical dependent (response) variable: the quantity to be predicted.
- One or more numerical or categorical independent (explanatory) variables.
- Find a simple, convenient mathematical function to fit the data samples.

Slide 3: Types of Regression Models
- Regression models are categorized by the number of explanatory variables and the nature of the relationship:
  - Simple (1 explanatory variable) vs. Multiple (2+ explanatory variables)
  - Linear vs. Non-Linear

Slide 4: Linear Regression Model
- The relationship between the variables is a linear function, e.g., the relationship between income and education:
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  where $Y_i$ is the dependent (response) variable (e.g., income), $X_i$ is the independent (explanatory) variable (e.g., education), and $\varepsilon_i$ is the random error.

Slide 5: Scattergram
- Plot of all $(X_i, Y_i)$ pairs.
- Suggests how well the model will fit. [Scatter plot omitted]

Slide 6: Thinking Challenge
- How would you draw a line through the points?
- How do you determine which line "fits best"?

Slide 7: Least Squares
- "Best fit" means the difference between the actual Y values and the predicted Y values is minimized.
- LS minimizes the sum of the squared differences: $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

Slide 8: Least Squares Graphically
- Each observation, e.g. the second: $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$
- Fitted line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
- LS minimizes $\sum_{i=1}^{4} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$ (for the four points shown).

Slide 9: Derivation of Parameter Equations
- Goal: minimize the squared error.
- Set the partial derivative with respect to $\hat{\beta}_0$ to zero:
  $0 = \frac{\partial \sum \hat{\varepsilon}_i^2}{\partial \hat{\beta}_0} = \frac{\partial \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\partial \hat{\beta}_0} = \sum -2\,(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = -2\Big(\sum y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum x_i\Big)$
- Solving for $\hat{\beta}_0$: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Slide 10: Derivation of Parameter Equations (cont.)
- Set the partial derivative with respect to $\hat{\beta}_1$ to zero:
  $0 = \frac{\partial \sum \hat{\varepsilon}_i^2}{\partial \hat{\beta}_1} = \frac{\partial \sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\partial \hat{\beta}_1} = \sum -2\,x_i\,(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)$
- Substituting $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ gives $\sum (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum (x_i - \bar{x})^2$, so
  $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{SS_{xy}}{SS_{xx}}$

Slide 11: Coefficient Equations
- Sample slope: $\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
- Sample Y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
- Prediction equation: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

Slide 12: Interpretation of Coefficients
- Slope ($\hat{\beta}_1$): the estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X. If $\hat{\beta}_1 = 2$, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X).
- Y-intercept ($\hat{\beta}_0$): the average value of Y when X = 0. If $\hat{\beta}_0 = 4$, then average sales (Y) is expected to be 4 when advertising (X) is 0.

Slide 13: Example: R&D and New Products
- How does investment in R&D affect the number of new products developed?
- We can postulate the following relation: # of new products = α + β · (investment in R&D) + u
- [Scatter plot of NEWPROD vs. RD omitted]

Slide 14: Example: R&D and New Products (cont.)
- The estimate is β̂ = 0.049.
- This tells us that in order to increase the number of new products by one unit, we need to invest a little more than 20 monetary units in R&D (1/0.049 ≈ 20.4).
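The coefficient equations above translate directly into code. Below is a minimal sketch in plain Python (not from the slides); the advertising/sales data is made up so that it lies exactly on the line y = 4 + 2x, matching the slope-2, intercept-4 example used to interpret the coefficients:

```python
def fit_simple_ols(x, y):
    """Least-squares estimates via the slide formulas:
    beta1_hat = SS_xy / SS_xx,  beta0_hat = y_bar - beta1_hat * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = ss_xy / ss_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical advertising/sales data lying exactly on y = 4 + 2x,
# so the fit should recover intercept 4 and slope 2:
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [4.0, 6.0, 8.0, 10.0, 12.0]
beta0, beta1 = fit_simple_ols(x, y)
```

With noisy data the recovered coefficients would only approximate the true ones, but the formulas are identical.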
- If a company invests 1000 in R&D, we would predict this company to develop around 49 new products. [Scatter plot with fitted line omitted]

Slide 15: Logistic Regression
- It is actually a binary classifier.

Slide 16: Another Example: Failing or Passing an Exam
- Define a variable "Outcome": Outcome = 0 if the individual fails the exam, and Outcome = 1 if the individual passes.
- We can reasonably assume that failing or passing an exam depends on the number of hours spent studying.
- Note that in this case the dependent variable takes only two possible values; we call it a "dichotomous" variable.

Slide 17: Regression Analysis with Dichotomous Dependent Variables
- We are interested in inference about the probability of passing the exam.
- Were we to use linear regression, we would postulate: Prob(Outcome = 1) = α + β · (hours of study) + u
- We call this model a "Linear Probability Model" (LPM).

Slide 18: Linear Probability Models (LPM)
- Our dataset contains information about 14 students. Our statistical software will happily perform a linear regression of Outcome on the number of study hours.

  Student id | Outcome | Study hours
  ---------- | ------- | -----------
  1          | 0       | 3
  2          | 1       | 34
  3          | 0       | 17
  4          | 0       | 6
  5          | 0       | 12
  6          | 1       | 15
  7          | 1       | 26
  8          | 1       | 29
  9          | 0       | 14
  10         | 1       | 58
  11         | 0       | 2
  12         | 1       | 31
  13         | 1       | 26
  14         | 0       | 11

Slide 19: LPM: What Is Wrong with Them?
- A scatter plot with the regression line shows the problem: a straight line predicts values between negative and positive infinity, outside the [0, 1] interval! [Plot of OUTCOME vs. HSTUDY omitted]

Slide 20: Non-Linear Probability Models
- Goal: model the probability of the event occurring as a function of an explanatory variable X.
- The predicted probabilities need to lie in [0, 1].
- There is a threshold above which the probability hardly increases in response to changes in the explanatory variable.
- Many functions meet these requirements (non-linearity and being bounded within [0, 1]).
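One function that meets both requirements is the logistic (sigmoid) function. A quick numerical check, using arbitrary illustrative coefficients (not fitted to the student data):

```python
import math

def sigmoid(z):
    """Logistic function: maps a real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (not fitted) intercept and slope for P(pass) vs. study hours:
alpha, beta = -3.0, 0.2

probs = [sigmoid(alpha + beta * h) for h in [0, 5, 15, 30, 100]]
# Unlike the linear probability model, predictions stay inside (0, 1)
# across the whole range of hours, and flatten out at both ends
# (the threshold behavior noted above).
assert all(0.0 < p < 1.0 for p in probs)
assert probs == sorted(probs)  # monotonically increasing in hours
```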
- We will focus on the logistic function.

Slide 21: Logistic Regression (Starting from Naïve Bayes)
- Consider learning f: X -> Y, where X is a vector of real-valued features <x1, ..., xn> and Y is boolean.
- We could use a Gaussian naïve Bayes classifier:
  - Assume all $x_i$ are conditionally independent given Y.
  - Model $P(x_i \mid Y = y_k)$ as Gaussian $N(\mu_{ik}, \sigma_i)$.
  - Model $P(Y)$ as Bernoulli($\pi$).
- What does that imply about the form of $P(Y \mid X)$?

Slides 22-24: [derivation of the form of P(Y|X); equations not recoverable from this extraction]

Slide 25: Training Logistic Regression: MCLE
- Choose parameters W = <w0, ..., wn> to maximize the conditional likelihood of the training data.
- Training data $D = \{\langle X^1, Y^1 \rangle, \ldots, \langle X^L, Y^L \rangle\}$
- Data likelihood $= \prod_{l} P(\langle X^l, Y^l \rangle \mid W)$
- Data conditional likelihood $= \prod_{l} P(Y^l \mid X^l, W)$

Slide 26: [equations not recoverable from this extraction]

Slide 27: Gradient Descent
- There is no closed-form solution that maximizes $l(W)$, so use gradient descent (gradient ascent on the log-likelihood).

Slide 28: [equations not recoverable from this extraction]

Slide 29: Logistic Regression vs. Naïve Bayes
- The functional form follows from the naïve Bayes assumptions.
- However, the training procedure picks the parameters without the conditional-independence assumption.
- Pick W to maximize $P(Y \mid X, W)$.

Slide 30: Generative vs. Discriminative Classifiers
- Generative (e.g., naïve Bayes):
  - Assume some functional form for P(X|Y) and P(Y); this is the "generative" model.
  - Estimate the parameters of P(X|Y) and P(Y) directly from the training data.
  - Use Bayes rule to calculate $P(Y \mid X = x_i)$.
- Discriminative (e.g., logistic regression):
  - Assume some functional form for P(Y|X).
  - Estimate the parameters of P(Y|X) directly from the training data.
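The MCLE training procedure described above (maximize the conditional log-likelihood by gradient steps) can be sketched for the single-feature study-hours example. This is an illustrative implementation, not the slides' code: the learning rate, iteration count, and the rescaling of hours by 1/10 (to keep the batch gradient steps well behaved) are all arbitrary choices; the data is the 14-student table from the LPM slide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.05, iters=10000):
    """Batch gradient ascent on the conditional log-likelihood
    l(w) = sum_l ln P(y_l | x_l, w), for one feature plus a bias w0."""
    w0, w1 = 0.0, 0.0
    for _ in range(iters):
        g0, g1 = 0.0, 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w0 + w1 * x)  # residual y - P(Y=1|x,w)
            g0 += err                       # d l / d w0
            g1 += err * x                   # d l / d w1
        w0 += lr * g0
        w1 += lr * g1
    return w0, w1

# Study hours and pass/fail outcomes from the 14-student table;
# hours are rescaled by 1/10 before fitting.
hours = [3, 34, 17, 6, 12, 15, 26, 29, 14, 58, 2, 31, 26, 11]
passed = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]
w0, w1 = train_logistic([h / 10.0 for h in hours], passed)

p_pass_34h = sigmoid(w0 + w1 * 3.4)  # predicted P(pass | 34 hours)
p_pass_3h = sigmoid(w0 + w1 * 0.3)   # predicted P(pass | 3 hours)
```

Unlike the LPM fit on the same data, both predictions are guaranteed to lie strictly inside (0, 1).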