
# MSU EPI 280 - SAS HOMEWORK 5


SAS HOMEWORK_5
Due 4/23/2020

From SASHELP, obtain the data set called Baseball and import it into your permanent library. For each part (a-g), show the program and the output that you generated using the SAS program, and provide an interpretation based on your output. Please include titles in your programs so you have a record of what each program is supposed to do.

(10 points) Run PROC CONTENTS in a way that your output has the variables listed in the order that they were created.

    libname W15_CI "C:\Users\isabelle\Desktop\W15_CI";
    run;

    Data Baseball;
        SET SASHELP.Baseball;
    Data W15_CI.Baseball;
        set sashelp.Baseball;
    run;

    proc contents data=sashelp.Baseball;
    run;

(10 points) Obtain all the default statistics for a variable that has the label “Home Runs in 1986”.

    Proc means data=sashelp.baseball maxdec=2 N mean STDDEV Median Mode Min Max;
        Var nHome;
        title "Home runs in 1986";
    run;

(10 points) Obtain a 95% CI for the mean number of home runs for the total data set. Provide interpretation.

    PROC MEANS DATA=sashelp.baseball MAXDEC=2 N MEAN STDDEV alpha=0.05 clm;
        VAR nHome;
        TITLE "95% CI for the mean homeruns in 1986";
    RUN;

We are 95% confident that the true mean number of home runs in 1986 lies between 10.15 and 12.06. Equivalently, about 95% of intervals constructed this way from repeated samples would contain the true mean.

(10 points) Test the hypothesis that the mean number of home runs differs from 10. State your conclusion.

    PROC TTEST DATA=sashelp.baseball;
        VAR nHome;
        TITLE "CI for the difference in the mean of 10 for home runs in 1986";
    RUN;

Using an alpha of 0.05 with 321 degrees of freedom, the obtained p-value of <0.0001 is statistically significant. We therefore reject the null hypothesis and conclude that the average number of home runs in 1986 differs from the hypothesized value of 10.

(10 points) Obtain a 95% CI for the mean number of home runs for each “League at the end of 1986” (National and American). Provide interpretation.

    PROC MEANS DATA=sashelp.baseball MAXDEC=2 N MEAN STDDEV alpha=0.05 clm;
        CLASS League;
        VAR nHome;
        TITLE "95% CI for the League in 1986";
    RUN;

We are 95% confident that the true mean number of home runs in 1986 lies between 11.09 and 13.88 for players in the American League at the end of 1986, and between 8.22 and 10.69 for players in the National League.

(10 points) Obtain a 95% CI for the difference in mean number of home runs between the Leagues at the end of 1986 (National and American). Provide interpretation.

    proc ttest data=sashelp.baseball;
        CLASS League;
        VAR nHome;
    run;

We are 95% confident that the true difference in mean home runs between the American and National Leagues at the end of 1986 lies between 1.14 and 4.92.

(10 points) Test that the two means are the same. Even though you might find that the variances are not equal by the F-test (p-value < 0.05), use Sullivan’s rule of thumb: if the ratio (the F-statistic) is between 0.5 and 2.0, we can assume equality of variances. Making that assumption, state your conclusion.

t value = 3.16. The p-value from the test is 0.0017, which is less than the alpha of 0.05, so we reject the null hypothesis that the means are the same and conclude that the mean number of home runs differs significantly between the National and American Leagues.

(10 points) Create a variable you might call “levelofskill” (or whatever makes sense to you) and categorize “number of home runs” as 0-9 (“lowskill”) and >=10 (“goodskill”). What is the 95% CI for the proportion of players that fall into the “goodskill” category? Provide interpretation.

    Data W15_CI.skill;
        set W15_CI.baseball;
        lowskill = nHome<9;
        goodskill = nHome>=10;
        Title1 "Creating new variable levelofskill";
        Title2 "Putting the new data set levelofskill into permanent folder W15_CI";
    RUN;

    DATA W15_CI.skill;
        SET W15_CI.baseball;
        IF (nHome<9 ) THEN SKILL ="lowskill ";
        ELSE IF (nHome >=10 ) THEN SKILL ="goodskill ";
    RUN;

    proc freq data = W15_CI.skill;
        tables skill/binomial alpha=0.05;
        title "95% ci for prop of players in highskill";
    Run;

We are 95% confident that the true proportion of players in the “goodskill” category lies between 0.4150 and 0.5282.

(10 points) What is the 95% CI for the difference in proportions (RD) of “goodskill” players in the two Leagues at the end of the 1986 season (American vs. National)? Provide interpretation.

    proc freq data = W15_CI.skill;
        tables League*skill/ nocol nopercent riskdiff;
        title "95% ci diff in prop of goodskill players";
    Run;

We are 95% confident that the true difference in proportions between the National and American Leagues lies between -0.2446 and -0.0221.

(10 points) What are the 95% CIs for the RR and OR for “goodskill” relative to “lowskill” for each league? Provide interpretation.

    proc freq data = W15_CI.skill;
        tables League*skill/ nocol nopercent relrisk;
        title "95% ci diff in prop of goodskill players";
    Run;

Relative risk (relrisk): we are 95% confident that the true relative risk comparing the National and American Leagues lies between 1.0404 and 1.333.

Odds ratio: we are 95% confident that the true odds ratio comparing the National and American Leagues lies between 1.0866 and 1.7143.

Full program listing:

    libname W15_CI "C:\Users\isabelle\Desktop\W15_CI";
    run;

    Data Baseball;
        SET SASHELP.Baseball;
    Data W15_CI.Baseball;
        set sashelp.Baseball;
    run;

    proc contents data=sashelp.baseball;
    run;

    data W15_CI.baseball;
        set baseball;
        Title "home runs in 1986";
    run;

    data W15_CI.Baseball_homerun;
        set baseball;
        title1 "Home runs in 1986";
        title2 "putting the new data set Home runs in 1986 into permanent folder W15_CI";
    run;

    Proc means data=sashelp.baseball maxdec=2 N mean STDDEV Median Mode Min Max;
        Var nHome;
        title "Home runs in 1986";
    run;

    PROC MEANS DATA=sashelp.baseball MAXDEC=2 N MEAN STDDEV alpha=0.05 clm;
        VAR nHome;
        TITLE "95% CI for the mean homeruns in 1986";
    RUN;

    PROC TTEST DATA=sashelp.baseball;
        VAR nHome;
        TITLE "CI for the difference in the mean of
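
Separately from the listing above, here is a minimal sketch of three adjustments that the prompts appear to call for. It is offered under stated assumptions, not as the graded solution. `VARNUM` asks PROC CONTENTS to list variables in creation order rather than alphabetically. `H0=10` makes PROC TTEST test against 10 instead of its default null value of 0. A single `IF`/`ELSE` with `<= 9` keeps players with exactly 9 home runs in the “lowskill” group. The titles are illustrative, and output from these steps may differ from the values reported above.

```sas
/* Library path taken from the homework; adjust it to a folder that exists. */
libname W15_CI "C:\Users\isabelle\Desktop\W15_CI";

/* Copy the SASHELP data set into the permanent library. */
data W15_CI.Baseball;
    set sashelp.baseball;
run;

/* PROC CONTENTS: VARNUM lists variables by position (creation order)
   instead of the default alphabetical order. */
proc contents data=W15_CI.Baseball varnum;
    title "Baseball: variables in creation order";
run;

/* One-sample t test: H0=10 sets the null value of the mean.
   Without it, PROC TTEST tests against a mean of 0. */
proc ttest data=W15_CI.Baseball h0=10 alpha=0.05;
    var nHome;
    title "One-sample t test of H0: mean home runs in 1986 = 10";
run;

/* Skill category: one character variable covering every non-missing count.
   LENGTH is declared first so "goodskill" is not truncated. */
data W15_CI.skill;
    set W15_CI.Baseball;
    length skill $ 9;
    if missing(nHome) then skill = " ";           /* leave missing counts uncategorized */
    else if nHome <= 9 then skill = "lowskill";   /* 0-9 home runs */
    else skill = "goodskill";                     /* 10 or more home runs */
run;

/* Binomial CI for the goodskill proportion; LEVEL= names the category
   explicitly rather than relying on the default (first) level. */
proc freq data=W15_CI.skill;
    tables skill / binomial(level="goodskill") alpha=0.05;
    title "95% CI for the proportion of goodskill players";
run;
```

Naming the level in `BINOMIAL(LEVEL=...)` is a design choice for readability: the proportion being estimated stays the same even if the category labels are later renamed or reordered.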
