UF STA 6166 - Regression Analyses

School: University of Florida-Gainesville

Course: STA 6166 - Statistical Methods in Research I

Pages: 25

LINEAR REGRESSION

Defn: A scatterplot shows the relationship between two quantitative variables measured on the same population units. The values of the response variable are plotted on the Y-axis and the values of the explanatory variable are plotted on the X-axis. Each pair $(x_i, y_i)$ for one individual unit is represented by a single point on the plot.

Example: Peas are known to be a self-intolerant crop because repeated planting in the same field makes them susceptible to root rot diseases. In the Netherlands, one pea crop in a six-year (1:6) rotation is considered agronomically desirable. To examine how the crop interval (number of years without a pea or other legume crop) is related to the severity of root rot disease in pea crops, ten soil samples were obtained from different parts of the Netherlands. The researchers recorded the interval between pea crops ("Interval") and the disease severity ("Disease Index"; higher numbers mean more disease). The data are:

[Table: Interval and Disease Index for the ten soil samples]

A scatterplot of the observed data is:

[Figure: scatterplot of Disease Index (Y) against Interval (X)]

Note the choice of variables on each axis:
Y = response variable = disease index
X = explanatory variable = crop interval (years between plantings)

At a minimum, we would like to determine the following:
1) Is there a relationship between Y and X?
2) If so, what type is it (linear, log, polynomial, etc.)?
3) If we use a statistical method, have its assumptions been met?

We answer these questions using regression analysis.

Defn: Regression Analysis is a statistical method for analyzing the relationship between a response variable (Y) and one or more explanatory variables ($X_1, X_2, \ldots, X_p$). It differs from other modeling techniques in that Y must be continuous and at least one of the X variables must be quantitative (continuous or discrete).
Simple Linear Regression

1) The Model

We start by assuming that, if Y and X are related, the relationship is linear, i.e. it can be written in a form like $Y = mX + b$, where $b$ is the Y-intercept (the value of Y when X = 0) and $m$ is the slope of the relationship, $\Delta Y / \Delta X$. In statistical notation we write

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where $\beta_0 + \beta_1 X_i$ is the deterministic part, $\varepsilon_i$ is the random part, and
$\beta_0$ = intercept
$\beta_1$ = slope ($\Delta Y / \Delta X$)
$\varepsilon_i$ = error term for the ith observation, with variance $\sigma^2$

The deterministic part $\beta_0 + \beta_1 X$ equals the average value of Y for a given value of X (i.e. the conditional mean of Y given $X = x_i$). In the figure, the heavy black line is the relationship $\beta_0 + \beta_1 X$ and the light vertical line is $\varepsilon = Y - (\beta_0 + \beta_1 X)$.

$\beta_0$ = mean of Y when X = 0, since $\beta_0 + \beta_1 \cdot 0 = \beta_0$ (note: it has no inherent meaning if X cannot equal 0).
$\beta_1$ = change in the mean Disease Index (Y) for a 1-year increase in Interval (X), since $[\beta_0 + \beta_1 (X+1)] - [\beta_0 + \beta_1 X] = \beta_1$.

$\beta_1 < 0$: negative slope; as X increases, Y decreases.
$\beta_1 > 0$: positive slope; as X increases, Y increases.
$\beta_1 = 0$: no slope; no linear relationship between X and Y.

Aside: other likely relationships that can be treated as linear:
a) $Y = m \log_{10}(X) + b$
b) $\log_{10}(Y) = m \log_{10}(X) + b$
c) $Y = mX^2 + b$

The model is called linear because it is LINEAR in the parameters $\beta_0$ and $\beta_1$. For example, $Y = 10^{\beta_0} X^{\beta_1}$ can be made linear in the parameters by taking $\log_{10}$ of both sides, which gives $\log_{10}(Y) = \beta_1 \log_{10}(X) + \beta_0$. On the other hand, $Y = X^{\beta_1} + 10^{\beta_0}$ cannot be made linear, since the same transformation does not yield a linear model: $\log_{10}(Y) = \log_{10}[X^{\beta_1} + 10^{\beta_0}]$.

2) Estimating the Model Parameters $\beta_0$ and $\beta_1$

We have a data set for which the relationship appears to be linear, BUT we do not know the parameters ($\beta_0$, $\beta_1$, $\sigma^2$) of the model that describes the linear relationship. We need to estimate these using the available data.
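The claim that "linear" refers to the parameters can be checked numerically. The Python sketch below is illustrative only: BETA0 and BETA1 are hypothetical values (not estimates from the pea data), and the helper names mean_response, power_model, sum_model and slopes are made up for this example. If the points $(\log_{10} X, \log_{10} Y)$ lie on a straight line, the slopes between consecutive points are all equal.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1 = 0.5, -1.2


def mean_response(x, b0, b1):
    """Deterministic part of the simple linear model: E(Y | X = x) = b0 + b1*x."""
    return b0 + b1 * x


def power_model(x, b0, b1):
    """Y = 10**b0 * x**b1, which becomes linear in b0, b1 after taking log10."""
    return 10 ** b0 * x ** b1


def sum_model(x, b0, b1):
    """Y = x**b1 + 10**b0, which is NOT linear in b0, b1 after taking log10."""
    return x ** b1 + 10 ** b0


def slopes(xs, ys):
    """Slopes between consecutive points; they are all equal if the points are collinear."""
    return [(y2 - y1) / (x2 - x1) for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]


xs = [1.0, 2.0, 5.0, 10.0, 20.0]
log_x = [math.log10(x) for x in xs]

# Power model: log10(Y) against log10(X) is a straight line with slope BETA1.
log_y_power = [math.log10(power_model(x, BETA0, BETA1)) for x in xs]
print(slopes(log_x, log_y_power))

# Sum model: the same transformation does not produce a constant slope.
log_y_sum = [math.log10(sum_model(x, BETA0, BETA1)) for x in xs]
print(slopes(log_x, log_y_sum))

# beta1 is the change in the mean response for a one-unit increase in X.
print(mean_response(8.0, BETA0, BETA1) - mean_response(7.0, BETA0, BETA1))
```

The first list of slopes is constant (up to rounding) and equal to BETA1, while the second list changes from point to point, which is the numerical counterpart of the algebraic argument above.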
The most common method used today is the Method of Least Squares.

We wish to capture the relationship between X and Y using a straight line, i.e. we wish to "fit" a line to the observed data. One approach is to place the line where it is, overall, as close as possible to all the points on the graph.

NOTATION:
$\hat\beta_0$ = estimated value of the intercept parameter $\beta_0$
$\hat\beta_1$ = estimated value of the slope parameter $\beta_1$
$\hat Y = \hat\beta_0 + \hat\beta_1 X$ = "predicted value" of Y = estimated mean of Y given X
$e = Y - \hat Y$ = "residual" (difference between the observed value Y and the estimated mean value $\hat Y$)

The least squares criterion makes the residuals collectively as small as possible: it chooses $\hat\beta_0$ and $\hat\beta_1$ to minimize the "Residual Sum of Squares"

$SSE = SS(\text{resid}) = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$

where n is the number of data pairs $(x_i, y_i)$ in the dataset.

Using this method, the estimators of the two parameters are:

$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$

The equation $\hat Y = \hat\beta_0 + \hat\beta_1 X$ is called the fitted regression line, obtained by "regressing Y on X". Once $\hat\beta_0$ and $\hat\beta_1$ are calculated, we can calculate estimated means for Y at each level of X, as well as the residuals $e_i = Y_i - \hat Y_i$.

EXAMPLE: Pea self-intolerance, continued (n = 10). Some intermediate calculations are:

$\bar y = 2.99$, $s_Y = 0.90976$, $\sum (y_i - \bar y)^2 = 8.2767$
$\bar x = 7.10$, $s_X = 3.69534$, $\sum (x_i - \bar x)^2 = 136.5556$
$\sum (x_i - \bar x)(y_i - \bar y) = -30.3222$

So we get

$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} = \dfrac{-30.3222}{136.5556} = -0.22205$

which says that, on average, the disease index goes down 0.22205 units for each 1-year increase in crop interval.

$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 2.99 - (-0.22205)(7.10) = 4.56656$

which says that the mean disease index is estimated to be 4.56656 when the crop interval equals 0.

With the model parameters estimated, we can write the estimated regression function as

$\hat Y = 4.56656 - 0.22205 X$

and use it to estimate (predict) the mean value of Y at any given value of X.
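The least squares formulas and the pea-example arithmetic can be reproduced in a few lines of plain Python. This is a minimal sketch that assumes only the summary statistics reported in the example (the raw pea data are not reproduced here); the function names ls_from_summaries, ls_fit and sse are hypothetical, and ls_fit shows how the same sums would be formed from raw $(x_i, y_i)$ pairs.

```python
def ls_from_summaries(x_bar, y_bar, s_xx, s_xy):
    """Least squares estimates (b0, b1) from summary statistics.

    s_xx = sum of (x_i - x_bar)^2
    s_xy = sum of (x_i - x_bar) * (y_i - y_bar)
    """
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate: the fitted line passes through (x_bar, y_bar)
    return b0, b1


def ls_fit(xs, ys):
    """Least squares fit of y = b0 + b1*x computed from raw (x_i, y_i) pairs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ls_from_summaries(x_bar, y_bar, s_xx, s_xy)


def sse(xs, ys, b0, b1):
    """Residual sum of squares, sum of (y_i - yhat_i)^2, for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Summary statistics reported for the pea example (n = 10).
b0, b1 = ls_from_summaries(x_bar=7.10, y_bar=2.99, s_xx=136.5556, s_xy=-30.3222)
print(f"b1 = {b1:.5f}, b0 = {b0:.5f}")  # b1 = -0.22205, b0 = 4.56656

# Estimated mean disease index at a crop interval of 6 years.
print(f"Yhat(6) = {b0 + b1 * 6:.4f}")
```

On any data set with at least two distinct X values, moving the intercept or slope away from the values returned by ls_fit increases sse, which is exactly the quantity the least squares criterion minimizes.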
Example of SAS code for estimating the line:

    proc reg data=peas;
      model index=interval;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: Index

    Analysis of Variance

                           Sum of        Mean
    Source        DF      Squares      Square    F Value    Pr >
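PROC REG's printed output begins with an analysis of variance table. The sketch below is a rough plain-Python analogue of the degrees of freedom, sums of squares, mean squares and F statistic in that kind of table for a single explanatory variable; it is not SAS's implementation, the function name anova_simple_regression is hypothetical, and the five (x, y) pairs are made-up illustration data, not the pea data. The p-value ("Pr > F") is omitted because it requires the upper tail of the F distribution (for example scipy.stats.f.sf, if SciPy is available).

```python
def anova_simple_regression(xs, ys):
    """ANOVA quantities for the least squares fit y = b0 + b1*x (one explanatory variable)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]

    ss_model = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by the line
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # SSE, the residual sum of squares
    ss_total = sum((y - y_bar) ** 2 for y in ys)              # corrected total sum of squares
    df_model, df_error = 1, n - 2                             # one slope; two estimated parameters
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error                            # estimates sigma^2
    return {
        "b0": b0, "b1": b1,
        "DF_model": df_model, "DF_error": df_error, "DF_total": n - 1,
        "SS_model": ss_model, "SS_error": ss_error, "SS_total": ss_total,
        "MS_model": ms_model, "MS_error": ms_error,
        "F": ms_model / ms_error,
    }


# Hypothetical data, used only to exercise the function.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 2.9, 4.2, 4.8, 6.1]
table = anova_simple_regression(xs, ys)

print(f"{'Source':<16}{'DF':>4}{'Sum of Squares':>16}{'Mean Square':>14}{'F Value':>10}")
print(f"{'Model':<16}{table['DF_model']:>4}{table['SS_model']:>16.4f}{table['MS_model']:>14.4f}{table['F']:>10.2f}")
print(f"{'Error':<16}{table['DF_error']:>4}{table['SS_error']:>16.4f}{table['MS_error']:>14.4f}")
print(f"{'Corrected Total':<16}{table['DF_total']:>4}{table['SS_total']:>16.4f}")
```

The Model and Error sums of squares add up to the corrected total, and F compares the mean square explained by the slope with the residual mean square.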
