UT PSY 394U - Study Notes

PSY 394U – Do-It-Yourself Statistics
Chapter 4: One Sample Tests

In this chapter, we will follow up with some more concrete examples based upon the concepts introduced in the last chapter. We will learn how to determine whether a descriptive statistic (such as the median, mean, standard deviation, etc.) is or is not consistent with a predicted value. We will also learn how to determine the range of values over which we can expect a statistic to vary, that is, how to determine the sampling distribution. In fact, these are really two sides of the same coin.

In order to use Monte Carlo and bootstrapping methods, we are going to need to know how to probe our sampling distributions for some information. The three most common types of information we extract from a sampling distribution are 1) the standard error (the standard deviation of the sampling distribution), 2) 95% confidence intervals, and 3) the probability of some particular value arising from the conditions that generated the sampling distribution. These three are tightly coupled and can be thought of as different ways of expressing the same basic information.

Figure 4.1 – The number of people reporting having seen x movies last week vs. x, the number of movies. (Histogram of movydat; x-axis: movies last week; y-axis: Frequency.)

Consider the data in Figure 4.1, a histogram of how many people (out of a sample of 100) have seen x movies in the past week. (These data are available on the class website on the main Homework page; you are encouraged to download them and work through the examples in this chapter.) The data are highly skewed, because the vast majority of people (71%) have seen either 0 or 1 movie in the past week (and 4 people in the sample must be movie critics, having averaged more than one movie a day). Let’s say we wanted to find out whether people in general watch more than one movie per week. We could tackle this problem a couple of different ways.
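As a preview of the Monte Carlo and bootstrapping approach mentioned above, here is a minimal sketch of how the three pieces of information (standard error, 95% confidence interval, and a tail probability) can be read off a sampling distribution stored as a vector of simulated statistics. It assumes a bootstrap (resampling with replacement) as one way to build that vector, and it uses HYPOTHETICAL Poisson counts in place of the class data; the names fake.data, samp.dist, and the other my.* variables are illustrative only.

```r
# Hypothetical data: 100 random Poisson counts (NOT the class movie data)
set.seed(1)
fake.data = rpois(100, lambda = 1.5)

# Build a sampling distribution of the mean by bootstrapping:
# resample the data with replacement and record each resample's mean
n.boot = 10000
samp.dist = replicate(n.boot, mean(sample(fake.data, replace = TRUE)))

# 1) Standard error = standard deviation of the sampling distribution
my.boot.se = sd(samp.dist)

# 2) 95% confidence interval = 2.5th and 97.5th percentiles
my.boot.ci = quantile(samp.dist, c(0.025, 0.975))

# 3) Probability of a mean at or below 1 under this sampling distribution
p.at.or.below.1 = mean(samp.dist <= 1)

my.boot.se
my.boot.ci
p.at.or.below.1
```

Here quantile() plays the role that qnorm() plays for a theoretical normal curve, and the tail probability is simply the fraction of simulated means at or below 1.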
First, let’s take a bad but very easy approach: we’ll simply do a one-sample t-test of the hypothesis that the true mean is greater than 1. We type:

> t.test(movydat, mu = 1, alternative = "greater")

and R responds:

        One Sample t-test

data:  movydat
t = 2.0177, df = 99, p-value = 0.02317
alternative hypothesis: true mean is greater than 1
95 percent confidence interval:
 1.099157      Inf
sample estimates:
mean of x
     1.56

If you remember what a t-test is about, this should be pretty clear even if you are new to R. If you are rusty on the t-test, however, the above command says “test whether the mean of ‘movydat’ is greater than 1.0”. Don’t worry about the details of the t-test; in a later chapter, once we have become comfortable with the concept of sampling distributions, we will revisit a few of the popular traditional statistical tests. What the output is telling us is that, if the true mean were 1 movie per week, the data were normally distributed, and we were willing to accept the mean as a good measure of central tendency for these data, then there is about a 2.3% chance (the p-value of 0.023) that we would have seen a mean as large as or larger than the one we actually obtained.

A more do-it-yourself approach, but one still reliant on the above assumptions, follows. First, we compute the standard deviation of the data, and then use it to compute the expected standard error of the sampling distribution of the mean:

> my.n = length(movydat)      # number of observations
> my.sd = sd(movydat)         # the std. dev.
> my.se = my.sd/sqrt(my.n)    # the std. err. by CLT

Now we can picture what the sampling distribution of the mean should look like: we just need to draw a Gaussian distribution whose mean is our measured mean (1.56) and whose standard deviation is the standard error we just computed (0.28).
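To make the link between the t-test output and the standard error explicit, the sketch below recomputes the one-sample t statistic and its one-sided p-value by hand and compares them with t.test(). It assumes a HYPOTHETICAL vector x of counts standing in for movydat, so its numbers will not match the chapter's output; only the last two lines use the chapter's printed values (mean 1.56, t = 2.0177) to back out the implied standard error.

```r
# Hypothetical stand-in for movydat: 100 random Poisson counts
set.seed(2)
x = rpois(100, lambda = 1.5)

my.n  = length(x)            # number of observations
my.sd = sd(x)                # sample standard deviation
my.se = my.sd / sqrt(my.n)   # standard error of the mean (CLT)

# One-sample t statistic for the null hypothesis "true mean = 1"
t.manual = (mean(x) - 1) / my.se

# One-sided p-value for the alternative "true mean is greater than 1":
# upper-tail area of Student's t with n - 1 degrees of freedom
p.manual = pt(t.manual, df = my.n - 1, lower.tail = FALSE)

# R's built-in version of the same test, for comparison
tt = t.test(x, mu = 1, alternative = "greater")
c(manual = t.manual, built.in = unname(tt$statistic))
c(manual = p.manual, built.in = tt$p.value)

# From the chapter's printed output alone: t = (mean - 1) / se,
# so the implied standard error is (1.56 - 1) / 2.0177
implied.se = (1.56 - 1) / 2.0177
round(implied.se, 3)
```

The implied standard error rounds to 0.278, the value used in the pnorm() and qnorm() calls in this chapter.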
Since our standard error came out to be just under 0.3, we know that well over 99% of the distribution will fall between about 0.7 and 2.5.

> x = seq(0.7, 2.5, length.out = 100)    # make an x-axis
> my.ing.dist = dnorm(x, 1.56, my.se)    # compute the normal distribution
> plot(x, my.ing.dist)                   # take a look at it
> abline(v = 1)                          # draw a line at 1 movie / week

The result is shown in Figure 4.2, and should look very much like what you get when you enter the above commands. Notice that this analysis gives us qualitatively the same result as the traditional t-test: it looks fairly unlikely, but not extremely unlikely, that our measured mean, 1.56, and a mean of 1 belong to the same distribution. To be more quantitative about this, we could compute the area of our sampling distribution below a mean of 1:

> pnorm(1, 1.56, .278)

This gives us about a 2.2% chance of seeing a mean as small as or smaller than 1, given that the true mean is equal to 1.56, our measured mean. Notice that we’ve asked the mirror-image question of the traditional t-test (“could a value of 1 come from a distribution centered on 1.56?” vs. “could a value of 1.56 come from a distribution centered on 1?”), but it amounts to the same thing and we get the same answer.*

* The small discrepancy comes from the fact that the t-test uses Gosset’s (i.e. Student’s) t distribution rather than the standard normal distribution, which is technically correct when we are estimating the population variance from our sample variance. Why this is so is beyond our current scope.

Figure 4.2 – The sampling distribution of the mean for the number of movies per week by the Central Limit Theorem. The dashed line shows that an average of 1 movie per week is unlikely but not extremely unlikely.

Alternatively, if we wanted to report our mean value plus or minus the 95% confidence interval, we could use the “qnorm()” function (on a set of actual data, the equivalent function is “quantile()”, and we’ll use this function a little later).

> qnorm(c(.025, .975), 1.56, .278)

In English, this function call says “Give me the 2.5 and 97.5 percentiles of a normal distribution whose mean is 1.56 and whose standard deviation is 0.278”. Note that the lower bound (the 2.5% point) is just above 1.0 (the 2.2% point), as it should be. Thus ends our examination of these data using traditional methods; it should be clear from examining the original data that
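The footnoted discrepancy between the 2.2% from pnorm() and the 2.3% from t.test() can be checked using only the numbers printed in this chapter (mean 1.56, standard error 0.278, df = 99), with no access to movydat. This is a minimal sketch; because the standard error is rounded to three decimals, the t-based p-value should land close to, but not exactly on, the 0.02317 that t.test() reported.

```r
# Values printed earlier in this chapter
m     = 1.56    # sample mean of movydat
se    = 0.278   # standard error (rounded)
my.df = 99      # degrees of freedom for n = 100

# Normal approximation: area of the sampling distribution below 1
p.normal = pnorm(1, mean = m, sd = se)

# The t-test's version: upper-tail area of Student's t beyond the t statistic
t.stat = (m - 1) / se
p.t = pt(t.stat, df = my.df, lower.tail = FALSE)

round(c(normal = p.normal, t = p.t), 4)

# 95% intervals: normal quantiles vs. Student's t quantiles
ci.normal = qnorm(c(0.025, 0.975), mean = m, sd = se)
ci.t = m + qt(c(0.025, 0.975), df = my.df) * se
rbind(normal = ci.normal, t = ci.t)
```

The t-based results are slightly more conservative (a larger tail area and a slightly wider interval) because Student's t has heavier tails than the normal; with 99 degrees of freedom the two differ only slightly.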