UT PSY 394U - Populations, samples, and parameters - D2816700


School name University of Texas at Austin

Course Psy 394u- Introduction to Cognitive Science

Pages 6



PSY 394U – Do-It-Yourself Statistics
A little bit about data analysis and statistics

Chapter 2: Some Preliminaries

Populations, samples, and parameters

In science, we are interested in collecting data that will tell us about important features of the world, such as whether two groups of measurements are different or not (e.g., the speed of light measured in one direction vs. another), or whether one variable influences another or not (e.g., the influence of axon diameter on conduction velocity). In order to truly know the exact state of some aspect of the world, we would have to know every possible value of interest, and this collection of values is known as a population. Rarely, it is possible to measure an entire population, such as “the average height of seniors at Austin High School.” Much more often, however, it is either theoretically possible but impractical (e.g., the average height of adults alive today living in Austin, Texas) or impossible (e.g., the average height of adult humans). In our experiments, then, we collect a sample of data and, if the experiment is done well, the properties of the sample will embody all of the important properties of the population that we wish to examine.

The relationship between a population and random samples drawn from it is fairly easy to appreciate, and it depends most critically on the sample size. Consider the following digital image. This full resolution image can be thought of as a “population” of pixels. By looking at the entire image (the whole population of pixels) it is easy to determine things about the image that are over and above the pixel values per se. For example, it is easy to see that Mickey is smiling or that there are two buttons on the front of his overalls.

Now consider the following four images, which consist of random samples of 40%, 10%, 5%, and 2% of the pixels (the remaining pixels have been set to gray).
[Figure: four sampled versions of the image, left to right: 40%, 10%, 5%, and 2% of the pixels]

Notice that, as the sample size decreases, the picture becomes increasingly ragged and, importantly, it becomes progressively more difficult to make judgments about the picture. For example, my judgment about the number of buttons would probably change to “one” and “zero” for the 5% and 2% cases, respectively. Obviously, these answers would be wrong, and they would be wrong because the sample sizes were too small to allow me to make the relevant decision correctly.

The relationship between samples of experimental data and populations of interest is directly analogous to the relationship between the original Mickey image and the sampled versions. Notice that there is information even in the rightmost sampled image; the pixels are black where the ears are, yellow where the shoes are, etc. In fact, some people might be able to guess that this was a picture of Mickey Mouse if they were told beforehand that it was a famous cartoon character. In other words, the rightmost sampled image does represent the original image to some extent; it just doesn’t allow us to make decisions about the original image with the same accuracy as does an image containing a greater number of samples (such as the leftmost image). In exactly the same way, small samples of experimental data do not allow us to make judgments about aspects of the population with the same accuracy as do larger samples.

In science, we are generally interested in aspects of the population distribution such as the mean, the variance, the median, etc. These are called parameters of the distribution because, like the number of buttons Mickey has, they are not directly obtainable by measurement; they must be computed or inferred from a group of measurements. The word parameter, in fact, means beyond (para) measurement (meter).
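The effect of sample size on parameter estimates can be tried out numerically. The following is a minimal sketch, not part of the original notes: it assumes a made-up population of 10,000 normally distributed values standing in for the pixels, and the function names `make_population` and `estimate_from_sample` are hypothetical. It draws one random sample at each of the four fractions used in the Mickey example and estimates the mean, variance, and median from it.

```python
import random
import statistics

def make_population(size=10_000, seed=0):
    """Build a hypothetical population of values (an assumption for illustration)."""
    rng = random.Random(seed)
    return [rng.gauss(100.0, 15.0) for _ in range(size)]

def estimate_from_sample(population, fraction, rng):
    """Draw a random sample without replacement and estimate three parameters."""
    n = max(2, round(len(population) * fraction))
    sample = rng.sample(population, n)
    return {
        "n": n,
        "mean": statistics.fmean(sample),
        "variance": statistics.variance(sample),
        "median": statistics.median(sample),
    }

population = make_population()
true_mean = statistics.fmean(population)  # known here only because we built the population

rng = random.Random(1)
estimates = {
    fraction: estimate_from_sample(population, fraction, rng)
    for fraction in (0.40, 0.10, 0.05, 0.02)
}

for fraction, est in estimates.items():
    error = est["mean"] - true_mean
    print(f"{fraction:>4.0%}  n={est['n']:>5}  mean={est['mean']:.2f}  error={error:+.2f}")
```

In a real experiment the population mean is unknown, so the error column could not be computed; it is available here only because the population was constructed in full. A single run also shows only one sample per fraction, so a small sample can land close to the true value by chance.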
In a scientific experiment, then, we collect a sample of data from a population, and then estimate some parameter of that population (the mean, for example) by computing the value of that parameter from our sample. There is an additional step, however. Once we have estimated some parameter from our data, it would be extremely informative to know how confident we are in that estimate. Consider Mickey again. If you determine the number of buttons by looking at the 40% image, your answer will be “two.” If I do the same on the 5% image, my answer will be “one.” Clearly, your answer is better than mine in some respect, and it would be extremely valuable to quantify this in some way, that is, to compute not only the value of our parameter (the number of buttons), but also how confident we are in our estimate given our sample size and other factors. This computation, the computation of how certain we are about our parameter estimates, is the key benefit that a statistical analysis yields. Simply put, we wish not only to compute the estimate of our parameter of interest, we also wish to compute the distribution of that parameter if we were to measure it over and over again, because this distribution is what tells us how confident we can be in our original estimate.

To gain an intuition about how this computation might work, consider Mickey one final time. If you were to repeatedly sample 40% of the pixels and judge the number of buttons, your answer would almost always be “two.” In other words, the distribution of your estimated parameter (the number of buttons) would be extremely narrow. If I were to repeatedly sample only 5% of the pixels, however, I would likely produce a range of answers. Sometimes, by chance, I would collect a lot of samples from the left button (as in the above image) and answer “one.” Sometimes I might get a lot of samples from the right button and give the same answer.
Sometimes, however, I might get a fair number of samples from both buttons and answer “two,” and sometimes I might get very few samples from either button and answer “zero.” Thus, the distribution of my answers would be quite a bit wider than yours, and from that we could conclude that your original answer was better than mine in the sense that it was more stable over repeated experimentation; we could (and should) assign a much higher confidence to your answer than to mine. The distribution of a parameter estimate over repeated experimentation is called a
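The repeated-sampling thought experiment above can be imitated in a few lines. This is a rough sketch under stated assumptions rather than the method of the notes: the population is a made-up set of 2,000 normally distributed values, the estimated parameter is the mean instead of a button count, and `repeated_estimates` is a hypothetical helper. It redraws a 40% sample and a 5% sample 500 times each and compares how widely the resulting estimates scatter.

```python
import random
import statistics

def repeated_estimates(population, fraction, repeats, rng):
    """Re-draw a sample `repeats` times and record the sample mean each time."""
    n = max(2, round(len(population) * fraction))
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(42)
# Hypothetical population, built in full so that it can be resampled at will.
population = [rng.gauss(100.0, 15.0) for _ in range(2_000)]

spread = {}
for fraction in (0.40, 0.05):
    means = repeated_estimates(population, fraction, repeats=500, rng=rng)
    # The standard deviation of the repeated estimates measures how wide
    # the distribution of the estimate is at this sample size.
    spread[fraction] = statistics.stdev(means)
    print(f"{fraction:>4.0%} samples: estimates range "
          f"{min(means):.2f} to {max(means):.2f}, spread {spread[fraction]:.3f}")
```

The 5% estimates should scatter noticeably more than the 40% estimates, which mirrors the wider range of button counts described in the text. Resampling the population like this is possible only in a simulation; in a real experiment only one sample is usually available.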
