UF STA 6166 - LINEAR REGRESSION

Topic 16 - LINEAR REGRESSION

In regression analysis we are interested in moving beyond correlation to using one of the variables as a predictor of the other. The variable used for prediction is called the EXPLANATORY VARIABLE, and the variable we wish to predict (or estimate the mean of) is called the RESPONSE VARIABLE.

EXAMPLE: We wish to estimate a line (Y = β0 + β1X) that describes the relationship between Disease Index and Interval between Legume Crops. Our purpose is to predict the disease index for intervals that a farmer might use between pea crops.

Statistical software will always fit a line to any set of bivariate data. In order for the analysis to be statistically appropriate, you also need to decide such things as:
a) Is the analysis appropriate (are both variables quantitative)?
b) Is the relationship statistically significant (i.e., does the slope differ from 0)?
c) Is the relationship linear or curvilinear?
d) Do we need to account for other variables?
e) Is there a time component to the data collection that hasn't been included?
We answer these questions using regression analysis.

Defn: Regression Analysis is a statistical method for analyzing the relationship between a response variable (Y) and one or more explanatory variables (X1, X2, ..., Xp). It differs from other modeling techniques in that Y must be continuous and at least one of the X variables must be quantitative (continuous or discrete).

Simple Linear Regression

1) The Model

In statistical jargon we write

    Yi = β0 + β1Xi + εi
    (β0 + β1Xi is the deterministic part; εi is the random part)

where
    β0 = intercept
    β1 = slope (ΔY / ΔX)
    εi = error term for the ith observation, with variance σ²

The deterministic part β0 + β1X equals the average value of Y for a given value of X (i.e., the conditional mean of Y given X = x). We write this as

    μY|X=x = β0 + β1x

where
    β0 = mean of Y when X = 0 (since β0 + β1·0 = β0)
         (Note: it has no meaning if X cannot equal 0)
    β1 = change in the mean Index (Y) for a 1 year increase in Interval (X), since
         [β0 + β1(X+1)] − [β0 + β1X] = β1

    β1 < 0: negative slope; as X increases, Y decreases
    β1 > 0: positive slope; as X increases, Y increases
    β1 = 0: no slope; no relationship between X and Y
    (The original notes sketch a small Y-versus-X plot for each of these three cases.)

Aside: other types of relationships that can be considered linear:
a) Y = m[log10(X)] + b
b) log10(Y) = m[log10(X)] + b
c) Y = mX² + b

The model is called linear because it is LINEAR in the parameters β0 and β1.

Example: Y = 10^β0 × X^β1 can be made linear in the parameters, since taking log10 on both sides gives the equation log10(Y) = β1[log10(X)] + β0. On the other hand, Y = X^β1 + 10^β0 cannot be made into a linear equation, since the transformation does not yield a linear model: log10(Y) = log10[X^β1 + 10^β0].

2) Estimating the Model Parameters

Suppose we have a data set for which the relationship appears to be linear BUT we do not know the parameters of the model (β0, β1, σ²) that describe the linear relationship. Then we need to estimate them using the available sample data.

Method of Least Squares

We wish to capture the relationship between X and Y using a straight line, i.e., we wish to "fit" a line to the observed data. One approach is to place the line at the location where it is simultaneously as close as possible to every point on the graph.
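As a hedged illustration of this least squares idea, the SAS sketch below computes the sample means, the sums of squares Sxx and Sxy, and the least-squares slope and intercept directly from their closed-form expressions (given in the notation that follows). It assumes the pea rotation data used later in these notes are in a data set named peas with variables interval (X) and index (Y); the output data set names (mns, lsfit) and variable names (xbar, ybar, sxx, sxy, b0, b1) are illustrative choices, and PROC REG (shown later) produces the same estimates.

    /* Least squares estimates "by hand".                                    */
    /* Assumes a data set peas with variables interval (X) and index (Y).    */
    proc means data=peas noprint;
      var interval index;
      output out=mns mean=xbar ybar;        /* sample means of X and Y       */
    run;

    data lsfit;
      if _n_ = 1 then set mns;              /* load xbar, ybar (retained)    */
      set peas end=last;
      sxx + (interval - xbar)**2;           /* Sxx = sum of (x - xbar)^2     */
      sxy + (interval - xbar)*(index - ybar);  /* Sxy = sum (x-xbar)(y-ybar) */
      if last then do;
        b1 = sxy / sxx;                     /* slope estimate                */
        b0 = ybar - b1*xbar;                /* intercept estimate            */
        output;
      end;
    run;

    proc print data=lsfit;
      var b0 b1 sxx sxy;
    run;

The ratio sxy/sxx in this sketch is exactly the estimator of β1 written below in sums-of-squares notation, and ybar − b1*xbar is the estimator of β0.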
NOTATION:
    β̂0 = estimated value for the intercept parameter β0
    β̂1 = estimated value for the slope parameter β1
    Ŷ = β̂0 + β̂1X = "predicted value" of Y = estimated value for the mean of Y given X
    e = Y − Ŷ = "residual" (difference between the observed value Y and the estimated mean value Ŷ)

The least squares criterion is to simultaneously minimize the residual e for every point. It does this by minimizing the "Residual Sum of Squares"

    SSE = Σ (Yi − Ŷi)²   (sum over i = 1 to n)

where n is the number of data pairs (xi, yi) in the dataset. Using this method we get the estimators for the 2 parameters:

    β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
    β̂0 = ȳ − β̂1x̄

The equation

    Ŷ = μ̂Y|X=x = β̂0 + β̂1x

is called the fitted regression line, obtained by "regressing Y on X". Once β̂0 and β̂1 are calculated we can calculate estimated means for Y at each level of X, and we can calculate the residuals ei = Yi − Ŷi.

Notation for sums of squares:

    Sxx = Σ (xi − x̄)²
    Syy = Σ (yi − ȳ)²
    Sxy = Σ (xi − x̄)(yi − ȳ)

Using this notation we can write the estimator of β1 as

    β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = Sxy / Sxx

EXAMPLE: Pea self-intolerance continued (n = 10)

Using SAS we get the following output:

                       Parameter Estimates

    Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept     1         4.56656             0.29765         15.34      <.0001
    Interval      1        -0.22205             0.03759         -5.91       0.0004

So we can write the estimated regression function as

    Ŷ = β̂0 + β̂1X = 4.56656 − 0.22205X

and use it to estimate (predict) the mean value of Y at any given value of X.

Example of its use: suppose we intend to rotate crops on an interval of 7 years. What is our expected disease index for this interval?

    Ŷ = 4.56656 − 0.22205 × 7 = 3.01

For 13 years?

    Ŷ = 4.56656 − 0.22205 × 13 = 1.68

Some additional results are:

    Obs    Interval    Index      yhat       resid
      1        0        4.5     4.56656    -0.06656
      2        4        3.7     3.67836     0.02164
      3        6        4.0     3.23426     0.76574
      4        6        3.0     3.23426    -0.23426
      5        6        3.1     3.23426    -0.13426
      6        8        2.8     2.79015     0.00985
      7        9        1.9     2.56810    -0.66810
      8        9        3.0     2.56810     0.43190
      9        9        2.3     2.56810    -0.26810
     10       14        1.6     1.45785     0.14215

SAS code to get this output:

    proc reg data=peas;
      model index = interval;
      plot index*interval;
      plot r.*p.;
      output out=resids p=yhat r=resid;
    quit;

    proc print data=resids;
      var interval index yhat resid;
    quit;

Important Point #1: Predictions can be done for any value of X within the range of the observed X values.
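The two predictions computed above (intervals of 7 and 13 years) both fall within this range. As a hedged illustration of doing that prediction step in SAS rather than by hand, the minimal sketch below assumes the same peas data set and appends two hypothetical rows with interval = 7 and 13 and a missing index; rows with a missing response do not affect the fit, but PROC REG still computes their predicted values in the output data set. The data set names newpts, combined, and pred are illustrative choices.

    /* Sketch: predicted disease index at intervals of 7 and 13 years.       */
    data newpts;
      input interval index;
      datalines;
    7  .
    13 .
    ;
    run;

    data combined;
      set peas newpts;      /* original data followed by the prediction rows */
    run;

    proc reg data=combined;
      model index = interval;
      output out=pred p=yhat;   /* predicted values for every row            */
    quit;

    proc print data=pred;
      where index = .;          /* show only the appended prediction rows    */
      var interval yhat;
    quit;

The printed yhat values should reproduce the hand calculations above, about 3.01 at 7 years and 1.68 at 13 years.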