UNC-Chapel Hill STOR 151 - Homework 3

HW3, due 2/05/09: 2.96, 2.104, 2.112, 3.40.

Note: question 2.112 asks you to download some data on baseball home runs from the course CD and use the techniques of this course to determine which player was the best. Obviously there is no unique correct answer to this question, and you should feel free to express your personal opinion regarding which characteristics you feel are most important. However, what is important is that you back up your opinion by using appropriate statistical techniques. The question will be graded not by which player you select, but by how well you back up your answer with suitable statistics.

Interpretation of Standard Deviation
(See “Empirical Rule”, Chapter 2, pp. 60–63)

If the shape of the dotplot or histogram is approximately bell-shaped, we would expect
• 68% of the data to be within 1 SD of the mean
• 95% of the data to be within 2 SD of the mean
• 99.7% of the data to be within 3 SD of the mean

Example: Student heights data (rounded to the nearest inch)
Mean is 66.32, SD is 3.79.
66.32 ± 3.79 is 62.53 to 70.11; 53 out of 71 students (75%) have heights within this range.
Within 2 SD of the mean: 68/71, or 96%.
Within 3 SD of the mean: 71/71, or 100%.

Another Example of Correlation and Regression

Data consist of summer temperature averages (1948–1996) for Mount Airy, North Carolina, and Charleston, South Carolina.
A plot of the data shows that there is a positive association between the temperatures in the two cities.
But how can we quantify that in terms of (a) correlation and (b) regression?

Some bare facts (checked in Excel)
Mount Airy: mean is 74.3204, SD is 1.2040.
Charleston: mean is 80.5837, SD is 1.0517.
Correlation $r = 0.6530$.
Linear regression (use LINEST in Excel): if $y$ is Charleston and $x$ is Mount Airy,
$$\hat{y} = 38.1933 + 0.5704x.$$
The numbers 38.1933 and 0.5704 are called the intercept and slope of the regression.

Example: Suppose we want to calculate a temperature for Charleston that corresponds to 76°F in Mount Airy.
Use the equation $\hat{y} = 38.1933 + 0.5704x$ with $x = 76$. The answer is
$$\hat{y} = 38.1933 + 0.5704 \times 76 = 81.5437.$$
In other words, when the temperature is 76°F in Mount Airy, we expect it to be about 81.5°F in Charleston.

Relationship between Regression and Correlation
(Text, page 117)

If:
• $\bar{y}$ is the mean of the $y$ values,
• $s_y$ is the SD of the $y$ values,
• $\bar{x}$ is the mean of the $x$ values,
• $s_x$ is the SD of the $x$ values,
• $r$ is the correlation,
• $a$ and $b$ are the intercept and slope of the linear regression,
then
$$b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\bar{x}.$$

Example: For the Mount Airy–Charleston data,
$$b = 0.6530 \times \frac{1.0517}{1.2040} = 0.5704,$$
$$a = 80.5837 - 74.3204 \times 0.5704 = 38.1913.$$
(Excel gives $a = 38.1933$, which is the correct value; the small difference is due to rounding.)

Cautions
1. Extrapolation is Dangerous
2. Influential Observations
3. Correlation Does Not Imply Causation
   • Danger of lurking variables
4. Confounding

Extrapolation is Dangerous
• Baseball example (don’t extrapolate to a hitting rate of 0 or 1)
• Global temperatures

[Figure: Temperatures 1900–2008]
[Figure: Temperatures 1900–2008, extrapolated to 3000]

Influential Observations
Sometimes an outlier has the effect of distorting the whole fit of a linear regression.
Example based on the Florida election of 2000: if we omit Palm Beach, the fitted regression line is quite different.

[Figure: Florida Election 2000]

Lurking Variables
This is based on a study conducted at the University of California, Berkeley.
During the study period, 8442 men and 4321 women applied to graduate school at Berkeley.
About 44% of the men were admitted, and about 35% of the women.

Breakdown by major:

              ----------- Men -----------   ---------- Women ----------
Major         Applicants     % Admitted     Applicants     % Admitted
A                825             62            108             82
B                560             63             25             68
C                325             37            593             34
D                417             33            375             35
E                191             28            393             24
F                373              6            341              7

Proportions admitted (for this particular group of majors): 45% of men, 30% of women.

The message:
This illustrates the danger of lurking variables (in a categorical-data context, but the same issue arises with quantitative data).
In this example, “major” was a lurking variable. If we look only at the relationship between “gender” and “percent admitted” without taking the major into account, it looks as if there is a substantial bias against women.
But the breakdown by major shows that this is spurious. The real explanation is that women tend(ed) to apply to majors where the overall proportion of students admitted is (was) smaller.
This phenomenon is known as Simpson’s Paradox.

Confounding
Example of a British smoking study.
In 1972–1974, researchers surveyed a group of British women and determined how many of them smoked.
Over the next 20 years, 31% of the non-smokers died, but only 24% of the smokers.
Does this prove that smoking was beneficial to health?
What was the lurking variable here?

The variable “Age” is correlated with smoking, because in the 1970s few women over 40 smoked.
But Age is (for obvious reasons) also a good predictor of death.
Age is called a confounding variable. If the researchers had stratified by age groups, they would almost certainly have seen more deaths among the smokers than among the non-smokers.
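The Empirical Rule check in the student-heights example can be sketched in a few lines of Python. This computes the interval mean ± k SD from the summary values in the notes (mean 66.32, SD 3.79) and sets the reported counts (53, 68 and 71 of 71 students) against the 68–95–99.7 percentages. The helper names `empirical_interval` and `fraction_within` are hypothetical. `fraction_within` assumes the sample SD (n − 1 denominator, as `statistics.stdev` and Excel's STDEV use), because the notes do not say which SD was used.

```python
from statistics import mean, stdev


def empirical_interval(center: float, sd: float, k: int) -> tuple[float, float]:
    """Interval center ± k*sd (hypothetical helper name)."""
    return center - k * sd, center + k * sd


def fraction_within(data: list[float], k: int) -> float:
    """Fraction of observations within k SDs of the mean (hypothetical helper).

    Assumption: uses the sample SD (n - 1 denominator), as statistics.stdev does;
    the notes do not state which SD was used.
    """
    lo, hi = empirical_interval(mean(data), stdev(data), k)
    return sum(lo <= x <= hi for x in data) / len(data)


# Summary values and counts (out of 71 students) reported in the notes.
RULE = {1: 68, 2: 95, 3: 99.7}    # empirical-rule percentages
OBSERVED = {1: 53, 2: 68, 3: 71}  # students within k SD of the mean
for k in (1, 2, 3):
    lo, hi = empirical_interval(66.32, 3.79, k)
    pct = 100 * OBSERVED[k] / 71
    print(f"{k} SD: {lo:.2f} to {hi:.2f}; observed {pct:.0f}%, rule {RULE[k]}%")
```

With the raw heights in hand, `fraction_within(heights, k)` would give the observed proportions directly instead of relying on hand counts.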
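The relationship $b = r\,s_y/s_x$, $a = \bar{y} - b\bar{x}$ and the 76°F prediction for the Mount Airy–Charleston data can be checked with a short Python sketch. It plugs in the summary statistics quoted in the notes; `regression_from_summary` and `predict` are hypothetical helper names. Because this sketch carries $b$ at full precision instead of rounding it to 0.5704 first, the intercept comes out near 38.191, agreeing with the hand calculation up to rounding. The prediction uses Excel's coefficients 38.1933 and 0.5704.

```python
def regression_from_summary(x_mean, x_sd, y_mean, y_sd, r):
    """Return (a, b) via b = r * s_y / s_x and a = ybar - b * xbar.

    Hypothetical helper name; the formulas are the ones given in the notes.
    """
    b = r * y_sd / x_sd
    a = y_mean - b * x_mean
    return a, b


def predict(x, intercept, slope):
    """Fitted value y-hat = intercept + slope * x (hypothetical helper name)."""
    return intercept + slope * x


# x = Mount Airy, y = Charleston (summary statistics from the notes).
a, b = regression_from_summary(x_mean=74.3204, x_sd=1.2040,
                               y_mean=80.5837, y_sd=1.0517, r=0.6530)
print(f"b = {b:.4f}, a = {a:.4f}")

# Prediction at 76 F using the Excel (LINEST) coefficients from the notes.
print(f"y-hat at 76 F: {predict(76, 38.1933, 0.5704):.4f}")
```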
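When the raw pairs are available, the correlation, slope and intercept in the “bare facts” for Mount Airy and Charleston can be computed directly, which is what Excel's CORREL and LINEST report for a simple straight-line fit. The sketch below is a plain-Python version of that least-squares calculation; `correlation_and_fit` is a hypothetical name, and the 1948–1996 temperature series is not reproduced in these notes, so it is shown without the data.

```python
from math import sqrt


def correlation_and_fit(xs, ys):
    """Return (r, a, b): correlation, intercept and slope of the least-squares
    line y-hat = a + b*x (hypothetical helper name)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # spread of x
    syy = sum((y - ybar) ** 2 for y in ys)                      # spread of y
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # co-variation
    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx          # algebraically equal to r * s_y / s_x
    a = ybar - b * xbar    # the line passes through (xbar, ybar)
    return r, a, b
```

Assuming the same data as in the notes, running it on Mount Airy ($x$) and Charleston ($y$) should agree with $r = 0.6530$, $a = 38.1933$ and $b = 0.5704$ up to rounding.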
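Simpson’s Paradox in the Berkeley breakdown can be reproduced from the table of majors: women have the higher admission rate in four of the six majors, yet the pooled rate is lower because more women applied to the majors with low admission rates. This sketch reconstructs admitted counts as applicants × percent admitted, so they are approximate (the percentages are rounded); `overall_rate` is a hypothetical helper name.

```python
# (major, men applicants, % men admitted, women applicants, % women admitted)
BERKELEY = [
    ("A", 825, 62, 108, 82),
    ("B", 560, 63, 25, 68),
    ("C", 325, 37, 593, 34),
    ("D", 417, 33, 375, 35),
    ("E", 191, 28, 393, 24),
    ("F", 373, 6, 341, 7),
]


def overall_rate(rows, applicants_col, pct_col):
    """Pooled admission rate = total admitted / total applicants over majors.

    Admitted counts are approximated as applicants * pct / 100 (hypothetical
    helper; percentages in the table are rounded).
    """
    applicants = sum(row[applicants_col] for row in rows)
    admitted = sum(row[applicants_col] * row[pct_col] / 100 for row in rows)
    return admitted / applicants


men = overall_rate(BERKELEY, 1, 2)
women = overall_rate(BERKELEY, 3, 4)
women_ahead = sum(row[4] > row[2] for row in BERKELEY)  # majors favoring women
print(f"pooled over these majors: men {men:.0%}, women {women:.0%}")
print(f"majors where women's rate is higher: {women_ahead} of {len(BERKELEY)}")
```

The pooled figures match the 45% and 30% quoted for this group of majors, while the major-by-major comparison points the other way.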

