PSY 394U Do It Yourself Statistics

Chapter 4: One-Sample Tests

In this chapter we will follow up with some more concrete examples based upon the concepts introduced in the last chapter. We will learn how to determine whether a descriptive statistic (such as the median, mean, standard deviation, etc.) is or is not consistent with a predicted value. We will also learn how to determine the range of values over which we can expect a statistic to vary, that is, how to determine the sampling distribution. In fact, these are really two sides of the same coin.

In order to use Monte Carlo and bootstrapping methods, we are going to need to know how to probe our sampling distributions for some information. The three most common types of information we extract from a sampling distribution are 1) the standard error (the standard deviation of the sampling distribution), 2) 95% confidence intervals, and 3) the probability of some particular value arising from the conditions that generated the sampling distribution. These three are tightly coupled and can be thought of as different ways of expressing the same basic information.

Figure 4.1. Histogram of movydat (x-axis: movies last week; y-axis: frequency). The number of people reporting having seen x movies last week vs. x, the number of movies.

Consider the data shown in Figure 4.1, which shows a histogram of how many people (out of a sample of 100) have seen x movies in the past week. These data are available on the class website (on the main Homework page); you are encouraged to download them and work through the examples in this chapter. The data are highly skewed because the vast majority of people (71%) have seen either 0 or 1 movie in the past week, and 4 people in the sample must be movie critics, having averaged over one movie a day.

Let's say we wanted to find out if people in general watched more than one movie per week. We could tackle this problem a couple of different ways. First, let's take a bad (but very easy)
approach: we'll simply do a one-sample t-test of the hypothesis that our measured mean is greater than 1. We type

    t.test(movydat, mu=1, alternative="greater")

and R responds

            One Sample t-test

    data:  movydat
    t = 2.0177, df = 99, p-value = 0.02317
    alternative hypothesis: true mean is greater than 1
    95 percent confidence interval:
     1.099157      Inf
    sample estimates:
    mean of x
         1.56

If you remember what a t-test is about, this should be pretty clear, even if you are new to R. If you are rusty on the t-test, however, what the above command is saying is "test to see if the mean of movydat is greater than a mean of 1.0". Don't worry about the details of the t-test; in a later chapter, once we have become comfortable with the concept of sampling distributions, we will revisit a few of the popular traditional statistical tests. What the output is telling us is that, if the true mean were 1 movie per week, and the data were distributed normally, and we were willing to accept the mean as a good measure of central tendency for these data, then there is about a 2.3% chance (the p-value of 0.023) that we would have seen a mean as large as or larger than the one we actually obtained.

A more do-it-yourself approach (but one still reliant on the above assumptions) follows. First, we compute the standard deviation of the data, and then use it to compute the expected standard error of the sampling distribution of the mean:

    my.n <- length(movydat)       # compute number of samples
    my.sd <- sd(movydat)          # the std. dev.
    my.se <- my.sd/sqrt(my.n)     # the std. err. by CLT

Now we can picture what the sampling distribution of the mean should look like; we just need to draw a Gaussian distribution whose mean is our measured mean (1.56) and whose standard deviation is the standard error we just computed (0.28). Since our standard error came out to be just under 0.3, we know that around 99% of the distribution will fall between about 0.7 and 2.5.

    x <- seq(0.7, 2.5, length=100)               # make an x axis
    my.sampling.dist <- dnorm(x, 1.56, my.se)    # compute the normal distribution
    plot(x, my.sampling.dist)                    # take a look at it
    abline(v=1)                                  # draw a line at 1 movie/week

The result is shown in Figure 4.2, and should look very much like what you get when you enter the above commands. Notice that this analysis gives us qualitatively the same result as the traditional t-test: it looks fairly unlikely, but not extremely unlikely, that our measured mean (1.56) and a mean of 1 belong to the same distribution. To be more quantitative about this, we could compute the area of our sampling distribution less than a mean of 1:

    pnorm(1, 1.56, .278)

And this gives us about a 2.2% chance of seeing a mean as small as or smaller than 1, given that the true mean is equal to 1.56 (our measured mean). Notice that we've asked the mirror-image question from the traditional t-test ("could a value of 1 come from a distribution centered on 1.56?" vs. "could a value of 1.56 come from a distribution centered on 1?"), but it amounts to the same thing, and we get the same answer.

[Footnote: The small discrepancy (about 2.2% here vs. 2.3% from the t-test) comes from the fact that the t-test uses Gosset's (i.e., Student's) t distribution rather than the standard normal distribution; the t distribution is technically correct when we are estimating the population variance from our sample variance. Why this is so is beyond our current scope.]

Figure 4.2. The sampling distribution of the mean for the number of movies per week (by the Central Limit Theorem). The dashed line shows that an average of 1 movie per week is unlikely, but not extremely unlikely.

Alternatively, if we wanted to report our mean value plus or minus the 95% confidence interval, we could use the qnorm function (on a set of actual data, the equivalent function is quantile, and we'll use this function a little later):

    qnorm(c(.025, .975), 1.56, .278)

In English, this function call says "Give me the 2.5 and 97.5 percentiles of a normal distribution whose mean is 1.56 and whose standard deviation is 0.278." Note that the lower bound (the 2.5% point) is just above 1.0 (the 2.2% point), as it should be.

Thus ends our examination of these data using traditional methods. It should be clear from examining the original data that the population distribution is almost certainly not normal. This has two ramifications. First, …
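As a rough sketch, the traditional analysis walked through in this chapter can be strung together in one short script. Because movydat itself has to be downloaded from the class website, the script below uses simulated stand-in counts (an assumption made here, not the chapter's data), so its numbers will differ from those quoted in the text. The names my.mean, my.t, p.norm, p.t, and ci.norm are introduced only for illustration.

```r
# Stand-in data: the real movydat is on the class website, so we simulate
# 100 skewed counts instead. These are NOT the chapter's data.
set.seed(394)
movydat <- rnbinom(100, size = 0.6, mu = 1.5)

# Traditional one-sample t-test of the hypothesis that the true mean is > 1
tt <- t.test(movydat, mu = 1, alternative = "greater")

# Do-it-yourself version: standard error of the mean by the CLT
my.n    <- length(movydat)
my.mean <- mean(movydat)
my.sd   <- sd(movydat)
my.se   <- my.sd / sqrt(my.n)

# The t statistic is the distance from 1 to our mean, in standard errors
my.t <- (my.mean - 1) / my.se

# Mirror-image question: area below 1 under a normal centered on our mean
p.norm <- pnorm(1, my.mean, my.se)

# Same tail area using Student's t with n - 1 degrees of freedom
p.t <- pt(my.t, df = my.n - 1, lower.tail = FALSE)

# Two-sided 95% confidence interval from the normal approximation
ci.norm <- qnorm(c(.025, .975), my.mean, my.se)

print(c(t.test = tt$p.value, by.hand.t = p.t, by.hand.normal = p.norm))
print(ci.norm)
```

The by-hand t-based tail area reproduces the p-value from t.test, while the normal-based value differs slightly, which is the small discrepancy the chapter attributes to using the normal rather than the t distribution. To run this on the class data, replace the two simulation lines with whatever command loads movydat.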