PSY 394U – Do-It-Yourself Statistics

Chapter 3
Sampling Distributions

The “ing” in “sampling distribution” is very important. Without the “ing,” we have a “sample distribution,” which is a different thing. If you go out and measure the IQs of 50 randomly selected students in the Lake Wobegon Independent School District, you have a sample distribution (no “ing”). Chances are that not all of the students will be above average, but you might be curious to see if the students are – well – above average on average. So you would compute the mean of your sample distribution to see whether it is above 100. Now, if you were to repeat this process (don’t worry about why just yet) – collecting a sample of 50, computing the mean, and jotting it down – 29 more times, then you would have a sampling distribution of 30 means.

A sample distribution, then, is a collection of measurements. A sampling distribution is a collection of summary statistics (often the mean), each of which was computed from a different sample. If it were up to me, they would have completely different names, like “data distribution” and “summary distribution,” so students would get less confused. But we’re stuck with what we have, so just pay close attention to the presence or absence of that “ing.”

Why would we want a sampling distribution? Let’s start over at Lake Wobegon. You measure the IQs of 50 randomly selected students and compute the average IQ, which comes out to be 101.5. At first you might think “Aha, students are really above average here.” But then you might start to wonder: 101.5 isn’t that different from 100. What if students are really average, and my number is just a fluke? What if students are really well above average (115, say), and my number is just a fluke in the other direction?

Perhaps the most obvious (if impractical) thing to do is have someone check your work. So, you have a friend go to Lake Wobegon, do 50 measurements, and compute a mean.
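The repeat-and-record procedure described above (draw a sample of 50, compute its mean, jot it down, and do this 30 times in all) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IQs are simulated as normally distributed with a true mean of 100 and a standard deviation of 15, and the function names `sample_mean` and `sampling_distribution` are made up for this sketch. In real life you would not know the true mean, which is the whole point of the exercise.

```python
import random
import statistics


def sample_mean(rng, n=50, true_mean=100.0, true_sd=15.0):
    """Simulate measuring n IQs (one sample distribution) and return its mean.

    Assumption: IQs are normal with the given mean and SD.
    """
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    return statistics.mean(sample)


def sampling_distribution(n_means=30, n=50, true_mean=100.0, true_sd=15.0, seed=0):
    """Repeat the whole experiment n_means times, keeping one mean per sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sample_mean(rng, n, true_mean, true_sd) for _ in range(n_means)]


means = sampling_distribution()
print("number of means:", len(means))
print("center of the means:", round(statistics.mean(means), 1))
print("spread of the means:", round(statistics.stdev(means), 1))
```

Each call to `sample_mean` produces one sample distribution and boils it down to a single number. The list `means` is the sampling distribution. Its spread should come out much smaller than the spread of the individual IQs, because each entry is already an average of 50 measurements.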
Intuitively, if she also gets 101.5, then this is probably a good estimate of the true mean. But what if she gets 105, or 90.5? You could get into an argument over whose mean is better, or you might decide to use the average of the two numbers. If you think about it, though, the best way to figure out the value of the “right” mean is just to collect a bunch of them. So you might enlist 28 (or 48, or 98) more people and have each of them do the experiment and compute a mean. Then you can plot the sampling distribution of the mean and see what is really going on.

Figure 3.1 – Three possible sampling distributions given a mean from a single sample of Lake Wobegon children (shown by small black arrow on the x-axis).

Figure 3.1 shows 3 possible sampling distributions of 100 means. On the left, our original estimate of the mean was a little on the high side, and it looks like Lake Wobegon children are average after all. In the middle, our original estimate was right on and, moreover, repeated sampling produced 100 nearly identical means, all above average. We would thus conclude that Lake Wobegon children are almost certainly above average, but not by an amount that is likely to be very important. On the right, we see that our original estimate was a bit low – about 93% of the means were above our original one. Only about 94% of the estimates are above 100, however. So, after all that work and what looks like pretty clear evidence that the average Lake Wobegon IQ is well above 100, we would not be able to conclude, statistically, that this was so.*

As a slightly more complicated example, consider a project in which a student wishes to see if cells divide more rapidly on one medium than on another. Forty cell cultures are randomly assigned to two groups, one for each medium. After a fixed amount of time, the areas of the cultures are measured, resulting in two sets of 20 numbers.
The question is “Did medium A result in more growth than medium B?” The data are shown in Figure 3.1. Clearly, the mean of group B is higher, and it is higher by an amount that the student would consider important. The problem, however, is that if we had simply divided the 40 cultures into two arbitrary groups, both using exactly the same medium, we still would have gotten some difference in the means. So how can we be sure that the difference we are seeing is a real difference?

Figure 3.1 – A boxplot of the distributions of the two groups (Group A and Group B). The center lines are the medians, the box ends show the upper and lower quartiles (25th and 75th percentiles), and the whiskers show the maximum and minimum. The inner ticks on the left and right show the locations of the individual data points.

* The rightmost sampling distribution is much too broad to be realistic for these circumstances; the width of the one in the center is more realistic.

In principle, this is an easy problem to solve – the student could simply run the experiment again. She might get a smaller difference, making her a little nervous, or a bigger difference, making her curious just how big the difference is. If the student had an inordinate amount of patience, she might decide to just buckle down and repeat the experiment many, many times. After having done so, the student will have a distribution of means for each of the two groups. In other words, the student will not only have estimates of the locations of the true means, but she will also have an estimate of how stable the means are given a sample size of 20. With this knowledge, she can make a much more informed decision concerning whether her original difference was real or not: if the two distributions of means overlap a lot, then the difference between any two means was likely due to chance. If the distributions do not overlap, however, then the difference is likely real.
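To make the “repeat the experiment many, many times” idea concrete, here is a small Python sketch of the patient student. Everything numeric in it is hypothetical: culture areas are simulated as normal with made-up true means (100 for medium A, 105 for medium B) and a made-up standard deviation of 10, in arbitrary units. The helper names are also invented for this illustration. The overlap check used here (the fraction of repeats in which B’s mean beats A’s) is just one simple way to gauge overlap, not necessarily the one this chapter goes on to use.

```python
import random
import statistics


def run_experiment(rng, mean_a, mean_b, sd, n=20):
    """One experiment: n cultures per medium; return the two group means."""
    group_a = [rng.gauss(mean_a, sd) for _ in range(n)]
    group_b = [rng.gauss(mean_b, sd) for _ in range(n)]
    return statistics.mean(group_a), statistics.mean(group_b)


def repeat_experiment(n_repeats=30, mean_a=100.0, mean_b=105.0, sd=10.0, seed=1):
    """Run the experiment n_repeats times; return the two lists of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = [run_experiment(rng, mean_a, mean_b, sd) for _ in range(n_repeats)]
    means_a = [a for a, _ in results]
    means_b = [b for _, b in results]
    return means_a, means_b


def fraction_b_higher(means_a, means_b):
    """Share of repeats in which group B's mean exceeded group A's."""
    return sum(b > a for a, b in zip(means_a, means_b)) / len(means_a)


means_a, means_b = repeat_experiment()
print("spread of A's means:", round(statistics.stdev(means_a), 2))
print("spread of B's means:", round(statistics.stdev(means_b), 2))
print("fraction of repeats with B higher:", fraction_b_higher(means_a, means_b))

# Same medium in both groups: any difference between the means is pure chance.
null_a, null_b = repeat_experiment(mean_b=100.0)
print("same medium, fraction with B higher:", fraction_b_higher(null_a, null_b))
```

When the two distributions of means barely overlap, B comes out ahead in nearly every repeat. When both groups really use the same medium, B should win only about half the time, which is what “due to chance” looks like in practice.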
This can be thought of as a simple signal-to-noise problem: the question is whether the signal – the difference between the means – is large relative to the noise (the precision with which the individual means are known). Obviously, however, this has come at a cost, because our poor student has had to run her experiment 30 times (say) rather than once.

[Plot: x-axis Group (A, B); y-axis ticks from 80 to 125]

Figure 3.2 – Plot of the means of the two groups. The error bars

