TU BIOS 6030 - chap11a.tab



Simple Linear Regression

Relationships among Variables

Regression Analysis
1. The dependent variable (y) is continuous.
2. The independent variables (x_i) may be continuous or categorical.
3. We assess the direction and strength of the relationship.

Examples:
1. Relate systolic blood pressure levels to a measure of obesity.
2. Relate the cost of a house to the square footage of the house.

Applications
1. Characterize the relationship.
2. Obtain a quantitative formula for y as a function of x1, x2, …, xk.
3. Control for other variables.
4. Select a subset of variables.
5. Determine whether the relationship is linear.

Notation:
y = dependent variable
x = independent variable
n = number of subjects
Data: (x1, y1), …, (xn, yn)

Example:
Age  Height
 9    48
10    51
11    53
12    55
13    55

Strategies:
1. Try a straight line first.
2. Ask whether it explains a significant amount of the variability.

Let x = age and y = height. We can propose a relationship of the following form:

    E(y|x) = α + βx

For a given age x, the average height E(y|x) is α + βx. The line y = α + βx is called the regression line; α is called the intercept and β is called the slope.

Generally, the relationship is not exact for every child. We introduce an error term, e, which represents the variability of height among children of the same age:

    y = α + βx + e,  where e is normally distributed with mean 0 and variance σ².

Therefore, for a child of age x, the corresponding height will be normally distributed with mean α + βx and variance σ². The latter is a measure of spread at a particular age. Note: if σ² were 0, every point would fall exactly on the regression line.

1. If β > 0, then as x increases, the expected value of y increases.
2. If β < 0, then as x increases, the expected value of y decreases.
3. If β = 0, there is no linear relationship between x and y.

Fitting the Regression Line

Eyeballing the data is not accurate. Least squares minimizes the sum of squared lengths of the vertical lines drawn from each point to the best-fitting line; the distance is taken in the y direction.
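The least-squares criterion can be checked numerically. Below is a minimal pure-Python sketch (the helper name sum_sq and the grid of perturbed lines are illustrative, not from the notes): using the age/height data and the fitted line ŷ = 32.6 + 1.8x worked out later in the example, every nearby line has a larger sum of squared vertical distances.

```python
# Numerical check of the least-squares criterion on the age/height example.
# sum_sq and the perturbation grid are illustrative choices, not from the notes.
ages    = [9, 10, 11, 12, 13]
heights = [48, 51, 53, 55, 55]

def sum_sq(a, b):
    """S(a, b) = sum of squared vertical distances to the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(ages, heights))

s_best = sum_sq(32.6, 1.8)   # the fitted line from the worked example
# Every perturbed line does strictly worse than the least-squares line:
for da in (-0.5, 0.5):
    for db in (-0.2, 0.2):
        assert sum_sq(32.6 + da, 1.8 + db) > s_best
print(round(s_best, 2))      # 2.8, the residual sum of squares
```

Because S is a strictly convex function of (a, b), the least-squares solution is its unique minimizer, which is what the perturbation check illustrates.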
If y_i is the observed value at a given x_i, then ŷ_i = a + bx_i is the corresponding point on the fitted regression line, where a and b are estimates of α and β. Define d_i = y_i − ŷ_i; this is the observed residual.

Let S = sum of the squared distances of the points from the line:

    S = Σ d_i² = Σ (y_i − a − bx_i)²

Some notation:
Raw sum of squares for x: Σ x_i²
Corrected sum of squares for x: L_xx = Σ (x_i − x̄)² = Σ x_i² − (Σ x_i)²/n
Raw sum of squares for y: Σ y_i²
Corrected sum of squares for y: L_yy = Σ (y_i − ȳ)² = Σ y_i² − (Σ y_i)²/n
Raw sum of cross products: Σ x_i y_i
Corrected sum of cross products: L_xy = Σ (x_i − x̄)(y_i − ȳ) = Σ x_i y_i − (Σ x_i)(Σ y_i)/n

Minimizing S gives the least-squares estimates:

    b = L_xy / L_xx,   a = ȳ − b x̄

Example:

Age (x_i)  Height (y_i)  x_i − x̄  y_i − ȳ
 9          48            −2.0      −4.4
10          51            −1.0      −1.4
11          53             0         0.6
12          55             1.0       2.6
13          55             2.0       2.6
Sum = 55    262            0         0

(x_i − x̄)(y_i − ȳ)  (x_i − x̄)²  ŷ_i    (y_i − ŷ_i)²
 8.8                   4.0          48.8    0.64
 1.4                   1.0          50.6    0.16
 0                     0            52.4    0.36
 2.6                   1.0          54.2    0.64
 5.2                   4.0          56      1.00
Sum = 18 (= L_xy)      10.0 (= L_xx)        2.80 (= Res SS)

    b = L_xy / L_xx = 18 / 10 = 1.8
    a = ȳ − b x̄ = 52.4 − (1.8)(11) = 32.6
    ŷ = 32.6 + 1.8x

Note: The slope indicates that for every one-year increase in age, height increases by 1.8 inches.

The predicted or average value of y for a given value of x_i is estimated from the above regression equation. What is the estimated average value of height for a child 9 years old?

    ŷ = 32.6 + 1.8(9) = 32.6 + 16.2 = 48.8

Note: This least-squares approach is appropriate when E(e|X = x) = 0, i.e., the average residual for each given value of x is 0. A normality assumption is also needed to test hypotheses.

Hypothesis Testing

We want to be able to distinguish a regression line that fits the data well from one that does not.

Note:

    ŷ = a + bx = ȳ − b x̄ + bx = ȳ + b(x − x̄)

This implies that when x = x̄, then ŷ = ȳ. This is true for every regression line; therefore, the point (x̄, ȳ) always falls on the regression line.

Def: Residual = y_i − ŷ_i. This is the part of y that is not explained by x.
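The closed-form fit above can be reproduced in a few lines of pure Python. This is a sketch of the hand computation (variable names are illustrative), using b = L_xy / L_xx and a = ȳ − b x̄ on the age/height data:

```python
# Reproduce the worked example's closed-form least-squares fit.
ages    = [9, 10, 11, 12, 13]
heights = [48, 51, 53, 55, 55]
n = len(ages)

xbar = sum(ages) / n      # 11.0
ybar = sum(heights) / n   # 52.4
L_xx = sum((x - xbar) ** 2 for x in ages)                           # 10.0
L_xy = sum((x - xbar) * (y - ybar) for x, y in zip(ages, heights))  # 18.0

b = L_xy / L_xx      # slope: 1.8
a = ybar - b * xbar  # intercept: 32.6

print(round(b, 2), round(a, 2))   # 1.8 32.6
print(round(a + b * 9, 2))        # predicted height at age 9: 48.8
```

The same numbers match the deviation columns in the table: L_xy = 18, L_xx = 10, so the line is ŷ = 32.6 + 1.8x.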
Def: Regression component = ŷ_i − ȳ. This is the part of y that is explained by x.

Note: If the point (x_i, y_i) falls on the regression line, its residual component is 0. A good-fitting regression line will have large regression components and small residual components.

H0: β = 0
H1: β ≠ 0

Total sum of squares (Total SS) = total variability of y:

    SS_TOTAL = Σ (y_i − ȳ)²

This can be divided into two components:
a. Regression sum of squares (Reg SS) = variation among the regression estimates: SS_REG = Σ (ŷ_i − ȳ)²
b. Residual sum of squares (Res SS): SS_RES = Σ (y_i − ŷ_i)²

    SS_TOTAL = SS_REG + SS_RES

The test we will use is based on the ratio of SS_REG to SS_RES. A large ratio indicates that the model stated above is a good fit.

Reg MS (MS_REG) is SS_REG divided by the number of predictor variables in the model (k): MS_REG = SS_REG / k. For simple linear regression, k = 1. k is called the degrees of freedom for regression.

Res MS (MS_RES) is SS_RES divided by (n − k − 1): MS_RES = SS_RES / (n − k − 1). Here n − k − 1 is the degrees of freedom for residual. Sometimes MS_RES is denoted s²_(y·x).

Under H0,

    F = MS_REG / MS_RES ~ F_(1, n − 2)

We reject H0 for large values of F. For level of significance α, H0 is rejected if F > F_(1, n − 2, 1 − α).

The F-test is the ratio of two variances: the numerator is the mean square for regression and the denominator is the mean square for residuals. A large ratio indicates a good fit of the regression line to the data, while a small ratio indicates a poor fit.

For the example:

Source of Variation  df  SS    MS      F
Regression            1  32.4  32.4    34.71
Residual              3   2.8  0.9333
Total                 4  35.2

    SS_TOTAL = Σ y_i² − (Σ y_i)²/n = 13,764 − 262²/5 = 35.2
    SS_REG = L_xy² / L_xx = 18² / 10 = 32.4
    SS_RES = 35.2 − 32.4 = 2.8

H0: β = 0
HA: β ≠ 0
Reject H0 if F > F_(1, 3, 0.95) = 10.13.

The exact p-value is given by Pr(F_(1, n − 2) > F).

These results indicate that the null hypothesis is rejected and the alternative hypothesis is accepted; thus, the slope of the regression line is significantly different from 0. There is a significant linear relationship between age and height.
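The ANOVA decomposition and F statistic for the example can also be verified in pure Python. This sketch (variable names are illustrative) computes the three sums of squares from the fitted line and checks that they match the table:

```python
# ANOVA decomposition and F statistic for the age/height example.
ages    = [9, 10, 11, 12, 13]
heights = [48, 51, 53, 55, 55]
n, k = len(ages), 1                      # k = 1 predictor (simple linear regression)

ybar = sum(heights) / n
fitted = [32.6 + 1.8 * x for x in ages]  # regression line from the example

ss_total = sum((y - ybar) ** 2 for y in heights)                 # 35.2
ss_reg   = sum((yh - ybar) ** 2 for yh in fitted)                # 32.4
ss_res   = sum((y - yh) ** 2 for y, yh in zip(heights, fitted))  # 2.8

ms_reg = ss_reg / k              # 32.4
ms_res = ss_res / (n - k - 1)    # 0.9333
F = ms_reg / ms_res
print(round(F, 2))   # 34.71; exceeds the critical value 10.13, so reject H0
```

Note that ss_total equals ss_reg + ss_res, which is the partition SS_TOTAL = SS_REG + SS_RES used in the table.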
An alternate test is a t-test based on the estimated slope; for simple linear regression it is equivalent to the F-test (t² = F).

