UF STA 6166 - Regression Analyses


LINEAR REGRESSION

Defn: A scatterplot shows the relationship between two quantitative variables measured on the same population units. The values of the response variable are plotted on the Y-axis and the values of the explanatory variable are plotted on the X-axis. Each pair (xi, yi) for one individual unit is represented by a single point on the plot.

Example: Peas are known to be a self-intolerant crop because repeated planting in the same field makes them susceptible to root rot diseases. In the Netherlands, one pea crop in a six-year (1:6) rotation is considered desirable agronomically. In a study to examine how the crop interval (number of years without a pea or other legume crop) is related to the severity of root rot disease in pea crops, ten soil samples from different parts of the Netherlands were obtained. The investigators recorded the interval between pea crops (“Interval”) and the disease severity (“Disease Index”; higher numbers mean more disease). The data are:

[Table: Interval and Disease Index for the ten samples; values not present in this text]

A scatterplot of the observed data is:

[Figure: scatterplot of Disease Index (Y) against Interval (X)]

Note the choice of variables on each axis:
Y = response variable = disease index
X = explanatory variable = crop interval (years between plantings)

Now, at a minimum we’d like to determine the following:
1) Is there a relationship between Y and X?
2) If so, what type is it (linear, log, polynomial, etc.)?
3) If we use a statistical method, have its assumptions been met?

We answer these questions using regression analysis.

Defn: Regression Analysis is a statistical method for analyzing the relationship between a response variable (Y) and one or more explanatory variables (X1, X2, …, Xp). It differs from other modeling techniques in that Y must be continuous and at least one of the X variables must be quantitative (continuous or discrete).

Simple Linear Regression

1) The Model

We start by assuming that, if Y and X are related, the response and explanatory variables are related linearly, i.e. the relationship can be written in a form like

Y = mX + b

where b is the Y-intercept (value of Y when X = 0) and m is the slope of the relationship = ∆Y/∆X.

In statistical jargon we write

Yi = β0 + β1Xi + εi

where β0 + β1Xi is the deterministic part, εi is the random part, and
β0 = intercept
β1 = slope (∆Y/∆X)
εi = error term for the ith observation, which has variance σ²

The deterministic part β0 + β1X equals the average value of Y for a given value of X (i.e. the conditional mean of Y given X = xi).

[Figure: the heavy black line is the relationship β0 + β1X; the light vertical line is the error ε = Y − (β0 + β1X)]

β0 = mean of Y when X = 0, since β0 + β1⋅0 = β0 (note: it has no inherent meaning if X cannot equal 0)
β1 = change in the mean Disease Index (Y) for a 1-year increase in Interval (X), since [β0 + β1(X+1)] − [β0 + β1X] = β1

β1 < 0: negative slope; as X increases, Y decreases
β1 > 0: positive slope; as X increases, Y increases
β1 = 0: no slope; no relationship between X and Y

[Figure: three Y-versus-X sketches illustrating negative, positive, and zero slope]

Aside: other types of likely relationships that can be considered to be linear:
a) Y = m [log10(X)] + b
b) log10(Y) = m [log10(X)] + b
c) Y = mX² + b

The model is called linear since it is LINEAR in the parameters β0 and β1.

Example: $Y = 10^{\beta_0} \times X^{\beta_1}$ can be made linear in the parameters, since taking log10 of both sides gives the equation $\log_{10}(Y) = \beta_1 \log_{10}(X) + \beta_0$. On the other hand, $Y = X^{\beta_1} + 10^{\beta_0}$ cannot be made into a linear equation, since the transformation does not yield a linear model: $\log_{10}(Y) = \log_{10}[X^{\beta_1} + 10^{\beta_0}]$.

2) Estimating the Model Parameters β0 and β1

We have a data set for which the relationship appears to be linear, BUT we do not know the parameters (β0, β1, σ²) of the model that describe the linear relationship. We need to estimate these using the available data.

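The aside on linearizing transformations can be checked numerically. The sketch below is only an illustration: the parameter values BETA0 = 0.5 and BETA1 = 2.0 and the X values are hypothetical choices, not taken from the pea data, and the error term is left out. It computes the slope of log10(Y) against log10(X) between consecutive points for the multiplicative model and for the additive model.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
BETA0 = 0.5
BETA1 = 2.0

# Hypothetical X values (not from the pea data set).
xs = [1.0, 2.0, 5.0, 10.0, 50.0]
log_x = [math.log10(x) for x in xs]


def successive_slopes(u, v):
    """Slopes of v against u between consecutive points."""
    return [(v[i + 1] - v[i]) / (u[i + 1] - u[i]) for i in range(len(u) - 1)]


# Multiplicative model Y = 10**BETA0 * X**BETA1 (error term omitted).
log_y_mult = [math.log10(10 ** BETA0 * x ** BETA1) for x in xs]
mult_slopes = successive_slopes(log_x, log_y_mult)

# Additive model Y = X**BETA1 + 10**BETA0 (error term omitted).
log_y_add = [math.log10(x ** BETA1 + 10 ** BETA0) for x in xs]
add_slopes = successive_slopes(log_x, log_y_add)

print("multiplicative model, log-log slopes:", [round(s, 4) for s in mult_slopes])
print("additive model, log-log slopes:      ", [round(s, 4) for s in add_slopes])
```

For the multiplicative model every slope equals BETA1, so the log-log relationship is a straight line. For the additive model the slopes change from one interval to the next, so the log transformation does not produce a line.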
The most common method used today is the Method of Least Squares.

Method of Least Squares

We wish to capture the relationship between X and Y using a straight line, i.e. we wish to “fit” a line to the observed data. One approach is to put the line at the location where it is simultaneously as close as possible to every point on the graph.

NOTATION:
$\hat{\beta}_0$ = estimated value for the intercept parameter β0
$\hat{\beta}_1$ = estimated value for the slope parameter β1
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ = “predicted value” of Y = estimated value for the mean of Y given X
$e = Y - \hat{Y}$ = “residual” (difference between the observed value Y and the estimated mean value $\hat{Y}$)

The least squares criterion is to simultaneously minimize the residual e for every point. It does this by minimizing the “Residual Sum of Squares”

$$SSE = SS(\text{resid}) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where n is the number of data pairs (xi, yi) in the dataset.

Using this method we get that the estimators for the 2 parameters are:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

The equation $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ is called the fitted regression line obtained by “regressing Y on X”.

Once $\hat{\beta}_0$ and $\hat{\beta}_1$ are calculated we can calculate estimated means for Y at each level of X, and we can calculate the residuals $e_i = Y_i - \hat{Y}_i$.

EXAMPLE: Pea self-intolerance continued (n = 10)

Some intermediate calculations are:
$\bar{y} = 2.99$, $s_Y = 0.90976$, $\sum (y_i - \bar{y})^2 = 8.2767$
$\bar{x} = 7.10$, $s_X = 3.69534$, $\sum (x_i - \bar{x})^2 = 136.5556$
$\sum (x_i - \bar{x})(y_i - \bar{y}) = -30.3222$

So, we get

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{-30.3222}{136.5556} = -0.22205$$

which says that on average the disease index goes down 0.22205 units for each 1-year increase in crop interval.

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.99 - (-0.22205) \times 7.10 = 4.5665$$

which says that the mean disease index is estimated to be 4.5665 when the crop interval equals 0.

With the model parameters estimated we can write the estimated regression function as

$$\hat{Y} = 4.56656 - 0.22205\,X$$

and use it to estimate (predict) the mean value of Y at any given value of X.

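The arithmetic in the pea example can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming only the four quantities quoted in the notes (the means and the two sums of squares and cross-products). The variable names, the `predict` helper, and the choice of X = 6 as an evaluation point are illustrative and not part of the original notes.

```python
# Summary statistics quoted in the notes for the pea example.
y_bar = 2.99       # mean Disease Index
x_bar = 7.10       # mean Interval (years)
s_xx = 136.5556    # sum of (x_i - x_bar)^2
s_xy = -30.3222    # sum of (x_i - x_bar) * (y_i - y_bar)

# Least squares estimates of the slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def predict(x):
    """Estimated mean Disease Index at crop interval x (hypothetical helper)."""
    return b0 + b1 * x


print(f"slope     = {b1:.5f}")
print(f"intercept = {b0:.5f}")
# X = 6 is an illustrative choice (the six-year rotation mentioned earlier).
print(f"estimated mean at X = 6: {predict(6):.3f}")
```

The printed slope and intercept should agree with the hand calculation to the precision shown in the notes. Small differences in the last digit of the intercept come from rounding the slope before multiplying by the mean of X.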
Example of SAS code for estimating the line:

    /* Fit the simple linear regression of index on interval */
    proc reg data=peas;
      model index=interval;
    quit;

The REG Procedure
Model: MODEL1
Dependent Variable: Index

Analysis of Variance
Source   DF   Sum of Squares   Mean Square   F Value   Pr >

