MATH 220 Assignment 2

Question 1

The World Bank collects data on many variables related to world development for countries throughout the world. Two of these are Internet use, in number of users per 100 people, and life expectancy, in years. The data file is provided separately.

(a) Make a scatterplot of life expectancy (the response variable) versus internet use. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

Yes, there is an overall positive pattern in the scatterplot: as internet use increases, life expectancy also increases. This suggests that there may be a link between internet use and factors that contribute to a longer lifespan, such as access to healthcare information, education, and social support. However, there are also some deviations from this pattern. A few data points show high life expectancy with low internet use, and vice versa.

(b) Compute the correlation coefficient R between life expectancy and internet use.

Correlation coefficient: R = 0.670133

Work: R was found by using the CORREL function in Excel and selecting the x and y ranges of data.

(c) A friend looks at the scatterplot and concludes that using the internet will increase the length of your life. Would you agree with her? Explain your answer.

I would agree with her conclusion. There is a clear positive correlation between internet use and life expectancy: as the amount of internet use increases, the life expectancy also increases.

(d) Make a scatterplot of life expectancy versus internet use, but this time use different symbols for European and non-European countries. Do you wish to modify your answer to question (c)? Explain.

Based on the data, if you live in a European country, life expectancy does not have much association with internet use. Whether you use the internet more or less does not seem to have a direct effect on life expectancy. In other regions, higher internet use is positively associated with higher life expectancy.

Question 2

Old Faithful Geyser in Yellowstone National Park is renowned, among other things, for the regularity of its eruptions. The eruption durations X (in minutes) and the subsequent intervals before the next eruption Y (in minutes) are provided in a separate file.

(a) Make a scatterplot of the interval variable versus the duration variable. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

Yes, there is an overall pattern: longer intervals are positively associated with longer durations. I do not see much deviation from this pattern.

(b) Find the correlation coefficient R between interval and duration. What would happen to the value of R if the scales were transformed to hours for the interval and duration variables?

R = 0.858427

The correlation would be unchanged if the scales were transformed, because R does not depend on the units of measurement.

Work: R was found by using the CORREL function in Excel and selecting the two columns of data.

(c) Find the equation of the regression line for predicting interval from duration. In simple language, what is the slope of the line telling us?

y = 10.741x + 33.828

The slope tells us that for each additional minute of eruption duration, the predicted interval increases by 10.741 minutes.

(d) Add the regression line to the scatterplot.

(e) Find the percent of variation in the interval variable that is explained by the model. Does the regression model provide a good fit?

R^2 = 0.7369, so 73.69% of the variation in the interval is explained by the model.

(f) Make a residual plot from the linear regression model you constructed above. Discuss the appropriateness of the model.

Based on this residual plot, the linear regression model appears to be appropriate.

(g) Use the equation of the regression line to predict the subsequent interval before the next eruption for an eruption that lasted 5 minutes. How confident are you that the prediction is quite accurate?

y-hat = 10.741(85) + 33.828
y-hat = 946.813

I am not very confident in predicting the exact interval, because my estimate would be an extrapolation. There is no existing data with durations lasting that long.

Question 3

One of the most dangerous contaminants deposited over European countries following the Chernobyl accident of April 1986 was radioactive cesium. To study cesium transfer from contaminated soil to plants, researchers collected soil samples and samples of mushroom mycelia from 17 wooded locations in Umbria, Central Italy, from August 1986 to November 1989. Measured concentrations (in Bq/kg; Bq, or becquerel, is a unit of radioactivity) of cesium in the soil are given in a separate data file.

(a) Construct a scatterplot using Y = concentration in mushrooms and X = concentration in soil. Describe the relationship between the two variables.

The two variables appear to have a positive relationship, and there is a clear outlier in the upper-right region of the scatterplot.

(b) Fit a linear model and report the correlation coefficient.

R = 0.6375

(c) Exclude sample number 17 and repeat parts (a) and (b).

R = -0.2164

(d) What is the effect of case 17 on the linear model and the correlation coefficient?

The range went from 0-1300 to 0-500, the relationship between the variables went from positive to negative (the coefficient for x is now a negative number), and the r^2 value dropped to 4.68%. Overall, without the outlier the data show almost no linear correlation, and X explains very little of the variation in Y.

Question 4

(Paper and pencil and/or Excel.)

(a) Find the conditional distributions of the field of study variable for each region.

Add up the counts for the fields of study within each region, then express each field as a percentage of that region's total.

(b) Construct the bar graphs of the three conditional distributions on the same page. (Excel does this very nicely.)

(c) Provide a brief description of the relationship between field of study and region.

If you live outside of the US, you are most likely to study engineering, and more likely to study natural science than social science. Inside of the US, you are most likely to study a social science and least likely to study engineering.
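As a cross-check on the Excel work in Question 2, the same quantities (the correlation, the least-squares line, R^2, the residuals, and the effect of changing units to hours) can be computed by hand in a few lines. This is only a minimal sketch, assuming Python instead of Excel. The `duration` and `interval` values below are made-up illustrative numbers, not the Old Faithful data file, so the printed results will not match the answers above. The helper names (`correlation`, `least_squares_line`) are hypothetical and chosen for this example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient r (the quantity Excel's CORREL returns)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares line for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical values for illustration only (NOT the assignment's data file).
duration = [1.8, 2.0, 2.5, 3.3, 3.9, 4.2, 4.5, 4.8]  # minutes
interval = [54, 57, 62, 68, 76, 79, 81, 88]          # minutes

r = correlation(duration, interval)
slope, intercept = least_squares_line(duration, interval)

# Residual = observed interval minus the interval predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(duration, interval)]

# Fraction of variation explained, computed directly from the residuals.
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - mean(interval)) ** 2 for yi in interval)
r_squared = 1 - sse / sst

# Re-express both variables in hours; r should not change.
duration_h = [d / 60 for d in duration]
interval_h = [i / 60 for i in interval]
r_hours = correlation(duration_h, interval_h)

print(f"r (minutes) = {r:.6f}")
print(f"r (hours)   = {r_hours:.6f}")
print(f"line: y = {slope:.3f}x + {intercept:.3f}")
print(f"R^2 = {r_squared:.4f}")
```

Plotting `residuals` against `duration` gives the residual plot asked for in part (f). A patternless scatter around zero supports the linear model, while a curve or a fan shape would argue against it.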

