Using R for Linear Regression

In the following handout, words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional entries (all current as of version R-2.4.1). Sample text from an R session is highlighted with gray shading.

Suppose we prepare a calibration curve using four external standards and a reference, obtaining the data shown here:

> conc
[1]  0 10 20 30 40
> signal
[1]  4 22 44 60 82

The expected model for the data is

signal = βo + β1×conc

where βo is the theoretical y-intercept and β1 is the theoretical slope. The goal of a linear regression is to find the best estimates for βo and β1 by minimizing the residual error between the experimental and predicted signals. The final model is

signal = bo + b1×conc + e

where bo and b1 are the estimates for βo and β1 and e is the residual error.

Defining Models in R

To complete a linear regression using R it is first necessary to understand the syntax for defining models. Let's assume that the dependent variable being modeled is Y and that A, B and C are independent variables that might affect Y. The general format for a linear¹ model is

response ~ op1 term1 op2 term2 op3 term3 ...

where each term is an object or a sequence of objects and each op is an operator, such as a + or a -, that indicates how the term that follows is to be included in the model. The table below provides some useful examples. Note that the mathematical symbols used to define models do not have their normal meanings!

Y ~ A                  Y = βo + β1A
    Straight line with an implicit y-intercept.

Y ~ -1 + A             Y = β1A
    Straight line with no y-intercept; that is, a fit forced through (0,0).

Y ~ A + I(A^2)         Y = βo + β1A + β2A²
    Polynomial model; note that the identity function I( ) allows terms in the model to use normal mathematical symbols.

Y ~ A + B              Y = βo + β1A + β2B
    A first-order model in A and B without interaction terms.

Y ~ A:B                Y = βo + β1AB
    A model containing only the first-order interaction between A and B.

Y ~ A*B                Y = βo + β1A + β2B + β3AB
    A full first-order model with an interaction term; an equivalent code is Y ~ A + B + A:B.

Y ~ (A + B + C)^2      Y = βo + β1A + β2B + β3C + β4AB + β5AC + β6BC
    A model including all first-order effects and interactions up to the nth order, where n is given by ( )^n; an equivalent code in this case is Y ~ A*B*C - A:B:C.

¹ When discussing models, the term 'linear' does not mean a straight line. Instead, a linear model contains additive terms, each with a single multiplicative parameter; thus the equations

y = β0 + β1x
y = β0 + β1x1 + β2x2
y = β0 + β11x²
y = β0 + β1x1 + β2log(x2)

are all linear models. The equation y = αx^β, however, is not a linear model.

Completing a Regression Analysis

The basic syntax for a regression analysis in R is

lm(Y ~ model)

where Y is the object containing the dependent variable to be predicted and model is the formula for the chosen mathematical model. The command lm( ) provides the model's coefficients but no further statistical information; thus

> lm(signal ~ conc)

Call:
lm(formula = signal ~ conc)

Coefficients:
(Intercept)         conc
       3.60         1.94

To obtain more useful information, and to gain access to many more functions for working with the results, it is best to create an object that contains the fitted model:

> lm.r = lm(signal ~ conc)

This object can then be used as an argument for other commands.
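In addition to the functions described below, the fitted-model object can be passed directly to other standard R functions. A brief illustration (anova( ) and confint( ) are both part of base R's stats package; their output is omitted here):

> anova(lm.r)     # analysis-of-variance table; for a single predictor its F-test matches the one reported by summary( )
> confint(lm.r)   # 95% confidence intervals for the intercept and slope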
To obtain a more complete statistical summary of the model, for example, we use the summary( ) command.

> summary(lm.r)

Call:
lm(formula = signal ~ conc)

Residuals:
   1    2    3    4    5
 0.4 -1.0  1.6 -1.8  0.8

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.60000    1.23288    2.92   0.0615 .
conc         1.94000    0.05033   38.54 3.84e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.592 on 3 degrees of freedom
Multiple R-Squared: 0.998,     Adjusted R-squared: 0.9973
F-statistic: 1486 on 1 and 3 DF,  p-value: 3.842e-05

The section of output labeled 'Residuals' gives the differences between the experimental and predicted signals. Estimates for the model's coefficients are provided along with their standard errors ('Std. Error'), a t value, and the probability for the null hypothesis that the coefficient's true value is zero. In this case, for example, we see that there is little evidence that the intercept (βo) differs from zero and strong evidence that the slope (β1) differs from zero. At the bottom of the table we find the standard deviation about the regression (sr, or residual standard error), the R-squared values, and the result of an F-test of the null hypothesis that the ratio MSreg/MSres is 1.

Other useful commands are shown below:

> coef(lm.r)      # gives the model's coefficients
(Intercept)        conc
       3.60        1.94

> resid(lm.r)     # gives the residual errors in Y
   1    2    3    4    5
 0.4 -1.0  1.6 -1.8  0.8

> fitted(lm.r)    # gives the predicted values for Y
   1    2    3    4    5
 3.6 23.0 42.4 61.8 81.2

Evaluating the Results of a Linear Regression

Before accepting the result of a linear regression it is important to evaluate its suitability for explaining the data. One of the many ways to do this is to examine the residuals visually. If the model is appropriate, then the residual errors should be random and normally distributed. In addition, removing any one case should not significantly change the model. R provides four graphical approaches for evaluating a model using the plot( ) command.

> layout(matrix(1:4, 2, 2))
> plot(lm.r)

The plot in the upper left shows the residual errors plotted against the fitted values. The residuals should be randomly distributed around the horizontal line representing a residual error of zero; that is, there should be no distinct trend in the distribution of points. The plot in the lower left is a standard Q-Q plot, which should suggest that the residual errors are normally distributed.
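The same two diagnostic plots can also be drawn individually from the quantities returned by fitted( ) and resid( ); a minimal sketch using only base R graphics functions:

> plot(fitted(lm.r), resid(lm.r))   # residuals versus fitted values
> abline(h = 0)                     # horizontal reference line at zero residual error
> qqnorm(resid(lm.r))               # normal Q-Q plot of the residuals
> qqline(resid(lm.r))               # reference line for normally distributed residuals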

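Once the model has been judged acceptable, it can be used to estimate the signal expected for a new standard; a short sketch using the predict( ) function (the concentration of 25 is an arbitrary value chosen only for illustration):

> predict(lm.r, newdata = data.frame(conc = 25), interval = "confidence")

With the coefficients found above this should return a fitted value of about 52.1 (3.60 + 1.94 × 25), together with the lower and upper limits of its 95% confidence interval.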
