PSY 394U Do It Yourself Statistics

Chapter 3: Sampling Distributions

The "ing" in "sampling distribution" is very important. Without the "ing" we have a "sample distribution," which is a different thing. If you go out and measure the IQs of 50 randomly selected students in the Lake Wobegon Independent School District, you have a sample distribution (no "ing"). Chances are that not all of the students will be above average, but you might be curious to see if the students are well above average on average. So you would compute the mean of your sample distribution to see if it were above 100. Now, if you were to repeat this process (don't worry about why just yet), collecting a sample of 50, computing the mean, and jotting it down 29 more times, then you would have a sampling distribution of 30 means.

A sample distribution, then, is a collection of measurements. A sampling distribution is a collection of summary statistics (often the mean), each of which was computed from a different sample. If it were up to me, they would have completely different names, like "data distribution" and "summary distribution," so students would get less confused. But we're stuck with what we have, so just pay close attention to the presence or absence of that "ing."

Why would we want a sampling distribution? Let's start over at Lake Wobegon. You measure the IQs of 50 randomly selected students and compute the average IQ, which comes out to be 101.5. At first you might think, "Aha, students are really above average here!" But then you might start to wonder: 101.5 isn't that different from 100; what if students are really average and my number is just a fluke? What if students are really well above average (115, say) and my number is just a fluke in the other direction?

Perhaps the most obvious, if impractical, thing to do is have someone check your work. So you have a friend go to Lake Wobegon, do 50 measurements, and compute a mean. Intuitively, if she also gets 101.5, then this is probably a good estimate of the true mean. But what if she gets 105? Or 90.5? You
could get into an argument over whose mean is better, or you might decide to use the average of the two numbers, but if you think about it, the best way to figure out the value of the right mean is just to collect a bunch of them. So you might enlist 28 or 48 or 98 more people and have each of them do the experiment and compute a mean. Then you can plot the sampling distribution of the mean and see what is really going on.

Figure 3.1. Three possible sampling distributions given a mean from a single sample of Lake Wobegon children (shown by small black arrow on the x-axis).

Figure 3.1 shows 3 possible sampling distributions of 100 means. On the left, our original estimate of the mean was a little on the high side, and it looks like Lake Wobegon children are average after all. In the middle, our original estimate was right on and, moreover, repeated sampling produced 100 nearly identical means, all above average. We would thus conclude that Lake Wobegon children are almost certainly above average, but not by an amount that is likely to be very important. On the right, we see that our original estimate was a bit low: about 93% of the means were above our original one. Only about 94% of the estimates are above 100, however. So after all that work, and what looks like pretty clear evidence that the average Lake Wobegon IQ is well above 100, we would not be able to conclude statistically that this was so.

As a slightly more complicated example, consider a project in which a student wishes to see if cells divide more rapidly on one medium than another. Forty cell cultures are randomly assigned to the two groups. After a fixed amount of time, the areas of the cultures are measured, resulting in two sets of 20 numbers. The question is: did medium A result in more growth than medium B? The data are shown in Figure 3.1. Clearly, the mean of group B is higher, and it is higher by an amount that the student would consider important. The problem, however, is that if we had simply divided
the 40 cultures into two arbitrary groups, both using exactly the same medium, we still would have gotten some difference in the means, so how can we be sure that the difference we are seeing is a real difference?

Figure 3.1. A boxplot of the distributions of the two groups (Group A and Group B). The center lines are the medians, the box ends show the lower and upper quartiles (25th and 75th percentiles), and the whiskers show the maximum and minimum. The inner ticks on the left and right show the locations of the individual data points.

Footnote: The rightmost of the three possible sampling distributions is much too broad to be realistic for these circumstances; the width of the one in the center is more realistic.

In principle, this is an easy problem to solve: the student could simply run the experiment again. She might get a smaller difference, making her a little nervous, or a bigger difference, making her curious just how big the difference is. If the student had an inordinate amount of patience, she might decide to just buckle down and repeat the experiment many, many times. After having done so, the student will have a distribution of means for each of the two groups. In other words, the student will not only have estimates of the locations of the true means, but she will also have an estimate of how stable the means are given a sample size of 20.

With this knowledge, she can make a much more informed decision concerning whether her original difference was real or not: if the two distributions of means overlap a lot, then the difference between any two means was likely due to chance. If the distributions do not overlap, however, then the difference is likely real. This can be thought of as a simple signal-to-noise problem, and the question is whether the signal (the difference between the means) is large relative to the noise (the precision with which the individual means are known). Obviously, however, this has come at a cost, because our poor student has had to run her experiment 30 times, say, rather
than once.

Figure 3.2. Plot of the means of the two groups (x-axis: Group, A and B; y-axis: Area). The error bars are 95% confidence intervals; they enclose 95% of the sampling distribution of means. A comparison with Fig. 3.1 reveals that these distributions are quite a bit narrower than the distributions of the data.

Distributions of means, or any other summary statistic computed from data (variance, median, etc.), are known as sampling distributions, and they …
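
The repeat-the-experiment idea described in this chapter can be tried out on a computer instead of in Lake Wobegon. What follows is only a minimal Python sketch, not the course's own code. It makes several assumptions that do not come from the text: the populations are normal, the true mean IQ is 100 with a standard deviation of 15, the true mean culture areas are 100 (medium A) and 110 (medium B) with a standard deviation of 10, and 1000 repetitions are used rather than 30 so the picture is smoother. The function names are made up for illustration.

```python
import random
import statistics


def sampling_distribution_of_mean(population_mean, population_sd,
                                  sample_size, n_samples, rng):
    """Collect n_samples samples and return the list of their means.

    Each sample is a 'sample distribution' (raw measurements); the
    returned list of means is the 'sampling distribution'.
    Assumption: measurements come from a normal population.
    """
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(population_mean, population_sd)
                  for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


def central_interval(values, coverage=0.95):
    """Return (low, high) enclosing roughly the central share of values."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    low_index = int(tail * len(ordered))
    high_index = len(ordered) - 1 - low_index
    return ordered[low_index], ordered[high_index]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Lake Wobegon, ASSUMING students are really average (mean 100, SD 15).
iq_means = sampling_distribution_of_mean(100, 15, sample_size=50,
                                         n_samples=1000, rng=rng)
share_above = sum(m > 101.5 for m in iq_means) / len(iq_means)
print("spread (SD) of the means:", round(statistics.stdev(iq_means), 2))
print("share of means above 101.5:", round(share_above, 3))

# Two growth media, ASSUMED true mean areas 100 (A) and 110 (B), SD 10,
# with 20 cultures per group as in the text.
means_a = sampling_distribution_of_mean(100, 10, 20, 1000, rng)
means_b = sampling_distribution_of_mean(110, 10, 20, 1000, rng)
print("central 95% of group A means:", central_interval(means_a))
print("central 95% of group B means:", central_interval(means_b))
```

If the share of means above 101.5 is sizable, a single result of 101.5 could easily be a fluke under these assumptions. For the two media, comparing the two printed intervals is one way to see whether the signal (the difference between the means) is large relative to the noise (the spread of each set of means).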