Chapter 4
One Sample Tests

In this chapter, we will follow up with some more concrete examples based upon the concepts introduced in the last chapter. We will learn how to determine whether a descriptive statistic (such as the median, mean, or standard deviation) is or is not consistent with a predicted value. We will also learn how to determine the range of values over which we can expect a statistic to vary, that is, how to determine the sampling distribution. In fact, these are really two sides of the same coin.

In order to use Monte Carlo and bootstrapping methods, we are going to need to know how to probe our sampling distributions for some information. The three most common types of information we extract from a sampling distribution are 1) the standard deviation (which is the “standard error of the mean” under traditional methods), 2) 95% confidence intervals, and 3) the probability of some particular value arising from the conditions that generated the sampling distribution (a “significance”). These three are tightly coupled, and can be thought of as different ways of expressing the same basic information.

Going to the movies

Consider the data shown in Figure 4.1, a histogram of how many people (out of a sample of 100) have seen x movies in the past week. (These data, moviewatch.txt, are available on the webpage; download them and work through the examples in this chapter.) The data are highly skewed, because the vast majority of people (70%) have seen either 0 or 1 movie in the past week (and 7 people in the sample must be movie critics, having seen 7 or more movies per week).

Let’s say we wanted to find out if people typically watched more than one movie per week. We could tackle this problem a couple of different ways. First, let’s take a bad, but very easy approach: let us test whether the mean number of movies watched was larger than 1.
The mean of our moviewatch data is 1.59, and traditional statistics using the Central Limit Theorem (CLT) can easily tell us how likely it is that this mean comes from a sampling distribution that is truly centered around 1.

PSY 394U – Do-It-Yourself Statistics

Figure 4.1 – The number of people reporting having seen x movies last week vs. x, the number of movies.

With the MATLAB Statistics Toolbox (or any off-the-shelf statistical software) we can simply do a one-sample t-test of the hypothesis that our measured mean is “significantly” greater than 1 without the need to understand the underlying principles. We type:

>> [h,p,ci,stats] = ttest(moviewatch, 1, .05, 'right')

and then look at the output:

h = 1
p = 0.0092
ci = 1.1812 Inf
stats =
    tstat: 2.3962
    df: 99
    sd: 2.4622

If you remember what a t-test is about, this should be fairly clear even if you are new to MATLAB. If you are rusty on the t-test, however, what the above command is saying is “test to see if the mean of ‘moviewatch’ is greater than a mean of 1.0”. What the output is telling us is that, if the true mean were 1 movie per week, and the data were distributed normally, and we were willing to accept the mean as a good measure of central tendency for these data, then there is about a 1% chance (the p-value of 0.0092) that we would have seen a mean as large as or larger than the one we actually obtained. Note that this is below two of the common cut-off values for statistical significance, 0.01 and 0.05.

A more do-it-yourself approach, but one still reliant on the above assumptions, is the following.
First, we compute the standard deviation of the data, and then use it to compute the expected standard error of the sampling distribution of the mean using the Central Limit Theorem:

>> mymean = mean(moviewatch) % the mean
>> myn = length(moviewatch) % compute number of samples
>> mysd = std(moviewatch) % the standard deviation
>> myse = mysd./sqrt(myn) % the standard error by CLT

Now we can picture what the sampling distribution of the mean should look like: we just need to draw a Gaussian distribution whose mean is our measured mean (1.590), and whose standard deviation is the standard error we just computed (0.246). We also know that around 95% of the distribution should fall within 2 standard errors of the mean, that is, between about 1.098 and 2.082. This gives us a way to check our drawing.

>> xvals = linspace(0,3); % make an x-axis
>> distofmeans = normpdf(xvals, mymean, myse); % normal dist.
>> figure; plot(xvals, distofmeans) % plot it
% and draw a dashed line at x = 1 for reference
>> line([1 1], [0 max(distofmeans)], 'LineStyle', '--')

The result is shown in Figure 4.2, and should look very much like what you get when you enter the above commands. Notice that this analysis gives us qualitatively the same result as the traditional t-test: it looks fairly unlikely that our measured mean, 1.59, and a mean of 1 belong to the same distribution. To be more quantitative about this, we could compute the area of our sampling distribution that lies below a mean of 1:

>> normcdf(1, mymean, myse)
ans = 0.0083

And this gives us about a 1% chance of seeing a mean as small as or smaller than 1 given that the true mean is equal to 1.59, our measured mean. Notice that we’ve asked the mirror image question from the traditional t-test: “could a value of 1 come from a distribution centered on 1.59?” vs.
“could a value of 1.59 come from a distribution centered on 1?”, but it amounts to the same thing and we get the same answer when we assume the same standard error about these numbers. [1]

[1] The small discrepancy comes from the fact that the t-test uses Gosset’s (i.e., Student’s) t distribution rather than the standard normal distribution; the t distribution is the technically correct choice when the population variance is estimated from a sample variance. The difference is negligible for large (n > 30) sample sizes.

Figure 4.2 – The sampling distribution of the mean for the number of movies per week by Central Limit Theorem. The dashed line shows that an average of 1 movie per week is highly unlikely.

Alternatively, we can report our mean value with its 95% confidence interval. To compute the confidence interval under the Central Limit Theorem we can use the inverse normal cumulative distribution function norminv(), or use the “ci” output of the ttest() function. (The right-tailed call above returns a one-sided interval, 1.1812 to Inf; a two-sided interval requires a two-tailed test.)

>> ci = norminv([.025, .975], mymean, myse)
ci = 1.1074 2.0726
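The numbers quoted in this section can be cross-checked without the data file. The sketch below is not part of the original notes: it assumes the Statistics Toolbox is available and starts from the summary statistics printed above (mean 1.59, standard deviation 2.4622, n = 100) rather than from moviewatch.txt. It compares the t-based and normal-based answers side by side, and the names mu0, p_t, p_z, ci_z, ci_t and lower_t are illustrative only.

```matlab
% Summary statistics as printed in the text (assumed, not recomputed from data)
mymean = 1.59;      % sample mean
mysd   = 2.4622;    % sample standard deviation (stats.sd in the ttest output)
myn    = 100;       % sample size
mu0    = 1;         % hypothesized mean (illustrative name)

myse  = mysd / sqrt(myn);          % standard error by CLT
tstat = (mymean - mu0) / myse;     % text reports tstat = 2.3962
df    = myn - 1;                   % degrees of freedom

% One-sided probabilities: t distribution vs. normal approximation
p_t = 1 - tcdf(tstat, df);         % text reports p = 0.0092
p_z = normcdf(mu0, mymean, myse);  % text reports 0.0083

% Intervals: two-sided normal, two-sided t, and the one-sided t lower bound
ci_z    = norminv([.025 .975], mymean, myse);     % text reports 1.1074 2.0726
ci_t    = mymean + tinv([.025 .975], df) * myse;  % slightly wider than ci_z
lower_t = mymean - tinv(.95, df) * myse;          % text reports ci = 1.1812 Inf

fprintf('t = %.4f, p (t) = %.4f, p (normal) = %.4f\n', tstat, p_t, p_z);
fprintf('normal 95%% CI: [%.4f %.4f]\n', ci_z);
fprintf('t 95%% CI:      [%.4f %.4f]\n', ci_t);
fprintf('one-sided t lower bound: %.4f\n', lower_t);
```

Working from the summary statistics keeps the sketch runnable before the data are downloaded. With the data loaded, the first three assignments could be replaced by mean(moviewatch), std(moviewatch) and length(moviewatch).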


UT PSY 394U - Chapter 4 One Sample Tests
