PSY 394U Do It Yourself Statistics

Chapter 2: Some Preliminaries

Populations, samples, and parameters

In science, we are interested in collecting data that will tell us about important features of the world, such as whether two groups of measurements are different or not (e.g., the speed of light measured in one direction vs. another), or whether one variable influences another or not (e.g., the influence of axon diameter on conduction velocity). In order to truly know the exact state of some aspect of the world, we would have to know every possible value of interest; this collection of values is known as a population. Only rarely is it possible to measure an entire population, as when determining the average height of the seniors at Austin High School. Much more often, however, it is either theoretically possible but impractical (e.g., the average height of adults alive today and living in Austin, Texas) or impossible (e.g., the average height of adult humans). In our experiments, then, we collect a sample of data, and if the experiment is done well, the properties of the sample will embody all of the important properties of the population that we wish to examine.

The relationship between a population and random samples drawn from it is fairly easy to appreciate, and it depends most critically on the sample size. Consider the following digital image. This full-resolution image can be thought of as a population of pixels. By looking at the entire image (the whole population of pixels), it is easy to determine things about the image that go over and above the pixel values per se. For example, it is easy to see that Mickey is smiling, or that there are two buttons on the front of his overalls. Now consider the following four images, which consist of random samples of 40%, 10%, 5%, and 2% of the pixels; the remaining pixels have been set to gray. Notice that as the sample size decreases, the picture becomes increasingly ragged and, importantly, it becomes progressively more difficult to make judgments about the picture. For example, my judgment about
the number of buttons would probably change to one and zero for the 5% and 2% cases, respectively. Obviously, these answers would be wrong, and they would be wrong because the sample sizes were too small to allow me to make the relevant decision correctly.

[Figure: the Mickey image with 40%, 10%, 5%, and 2% of its pixels randomly sampled, left to right]

The relationship between samples of experimental data and populations of interest is directly analogous to the relationship between the original Mickey image and the sampled versions. Notice that there is information even in the rightmost sampled image: the pixels are black where the ears are, yellow where the shoes are, etc. In fact, some people might be able to guess that this was a picture of Mickey Mouse if they were told beforehand that it was a famous cartoon character. In other words, the rightmost sampled image does represent the original image to some extent; it just doesn't allow us to make decisions about the original image with the same accuracy as an image containing a greater number of samples, such as the leftmost image. In exactly the same way, small samples of experimental data do not allow us to make judgments about aspects of the population with the same accuracy as larger samples do.

In science, we are generally interested in aspects of the population distribution, such as the mean, the variance, the median, etc. These are called parameters of the distribution because, like the number of buttons Mickey has, they are not directly obtainable by measurement; they must be computed or inferred from a group of measurements. The word "parameter," in fact, means "beyond (para) measurement (meter)." In a scientific experiment, then, we collect a sample of data from a population and estimate some parameter of that population (the mean, for example) by computing the value of that parameter from our sample.

There is an additional step, however. Once we have estimated some parameter from our data, it would be extremely informative to know how confident we are in that estimate. Consider Mickey again. If you
determine the number of buttons by looking at the 40% image, your answer will be two. If I do the same with the 5% image, my answer will be one. Clearly, your answer is better than mine in some respect, and it would be extremely valuable to quantify this in some way; that is, to compute not only the value of our parameter (the number of buttons) but also how confident we are in our estimate, given our sample size and other factors. This computation, of how certain we are about our parameter estimates, is the key benefit that a statistical analysis yields. Simply put, we wish not only to compute an estimate of our parameter of interest; we also wish to compute the distribution of that parameter if we were to measure it over and over again, because this distribution is what tells us how confident we can be in our original estimate.

To gain an intuition about how this computation might work, consider Mickey one final time. If you were to repeatedly sample 40% of the pixels and judge the number of buttons, your answer would almost always be two. In other words, the distribution of your estimated parameter (the number of buttons) would be extremely narrow. If I were to repeatedly sample only 5% of the pixels, however, I would likely produce a range of answers. Sometimes, by chance, I would collect a lot of samples from the left button (as in the above image) and answer one. Sometimes I might get a lot of samples from the right button and give the same answer. Sometimes, however, I might get a fair number of samples from both buttons and answer two, and sometimes I might get very few samples from either button and answer zero. Thus, the distribution of my answers would be quite a bit wider than yours, and from that we could conclude that your original answer was better than mine in the sense that it was more stable over repeated experimentation; we could, and should, assign a much higher confidence to your answer than to mine. The distribution of a parameter
estimate over repeated experimentation is called a sampling distribution, and it is what allows us to compute a quantitative estimate of the confidence associated with our judgments about experimental data. From the above example, it is obvious that this confidence depends critically on sample size. The core goal of statistical analysis is to figure out what the sampling distribution looks like after having done only one experiment. After all, we don't want to spend our time repeating an experiment 30, 50, or …
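The Mickey thought experiment can be imitated in a few lines of Python. This is only a sketch under loudly stated assumptions: the real image is not available, so the code uses a hypothetical stand-in (a 60 x 60 grid with two 5 x 5 "buttons" at made-up positions) and a made-up decision rule in which a button counts as seen when at least two of its pixels survive the sampling. Neither the grid nor the rule comes from the original figure. Each simulated experiment keeps a random fraction of the pixels and records the judged number of buttons; tallying many repetitions approximates the sampling distribution of that judgment.

```python
import random
from collections import Counter

# Hypothetical stand-in for the Mickey image (not the real figure):
# a 60 x 60 grid of pixels, indexed 0..3599 row by row, with two
# 5 x 5 "buttons" at made-up positions.
WIDTH, HEIGHT = 60, 60
BUTTONS = [
    {y * WIDTH + x for x in range(20, 25) for y in range(30, 35)},  # left button
    {y * WIDTH + x for x in range(35, 40) for y in range(30, 35)},  # right button
]
# Hypothetical decision rule: a button is "seen" if at least this many
# of its pixels survive the sampling.
MIN_HITS = 2


def judge_buttons(fraction, rng):
    """One simulated experiment: keep a random `fraction` of the pixels
    (the rest would be set to gray) and return the judged number of buttons."""
    k = round(fraction * WIDTH * HEIGHT)
    kept = set(rng.sample(range(WIDTH * HEIGHT), k))
    return sum(len(button & kept) >= MIN_HITS for button in BUTTONS)


def answer_distribution(fraction, n_experiments, rng):
    """Repeat the experiment and count how often each answer occurs."""
    return Counter(judge_buttons(fraction, rng) for _ in range(n_experiments))


rng = random.Random(1)
results = {}
for fraction in (0.40, 0.05):
    results[fraction] = answer_distribution(fraction, 1_000, rng)
    print(f"{fraction:.0%} of pixels:", dict(sorted(results[fraction].items())))
```

Under these assumptions, the 40% condition should answer two in essentially every repetition, while the 5% condition should spread its answers over zero, one, and two, mirroring the narrow versus wide distributions described in the text.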
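The same idea applies to a conventional parameter such as the mean. The following Python sketch assumes a synthetic population of 10,000 heights (normally distributed with mean 170 cm and SD 10 cm; these numbers are invented for illustration, not data). For each sample size, it repeats a simulated experiment 2,000 times, records the sample mean each time, and reports the spread (standard deviation) of those means, i.e., the width of the approximate sampling distribution.

```python
import random
import statistics


def sampling_distribution_of_mean(population, n, n_experiments, rng):
    """Simulate n_experiments repetitions of one experiment: each draws a
    random sample of size n (without replacement) and records its mean.
    The returned list approximates the sampling distribution of the mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(n_experiments)]


# Synthetic population (an assumption for illustration): 10,000 adult
# heights in cm, drawn from a normal distribution with mean 170 and SD 10.
rng = random.Random(0)
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

spreads = {}  # sample size -> SD of the simulated sample means
for n in (5, 20, 100, 500):
    means = sampling_distribution_of_mean(population, n, 2_000, rng)
    spreads[n] = statistics.stdev(means)
    print(f"n = {n:3d}: average estimate = {statistics.fmean(means):6.2f} cm, "
          f"spread of estimates (SD) = {spreads[n]:5.2f} cm")
print(f"true population mean = {true_mean:.2f} cm")
```

The spread should shrink roughly in proportion to one over the square root of the sample size (about ten times narrower at n = 500 than at n = 5), which puts a number on the point that confidence depends critically on sample size.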