Statistics 11 Lecture 22 Hypothesis Testing (9.3-9.6)

1. Introduction

In the previous lecture you learned about confidence intervals. In Chapter 9 you learn about tests of significance. Recall that in STATISTICAL INFERENCE the parameters are usually not known, and we use outcomes (i.e. statistics, the results of samples) to draw conclusions about the underlying parameters. This is the basic idea of Chapter 9: we make an assumption about the value of an unknown parameter, and then test whether that assumption could plausibly have produced the outcomes (samples and their statistics) we observed. We then use a probability calculation to express the strength of our conclusion, stated as a chance (probability).

Statistical significance is about deciding whether differences observed between a sample outcome and the population parameter are "real" or whether they might well be due to chance alone. The sample outcomes can be groups of people who were assigned "treatment" or "control" in a randomized experiment. They can also be companies with different characteristics that you observed, rather than people who were treated differently.

2. An Example Using the IRS

A senator introduces a bill to simplify the tax code. His claim is that the bill is revenue-neutral: it won't change the amount of taxes the government collects, it just simplifies the tax law. His bill can be evaluated. The IRS could SAMPLE from the POPULATION of all tax returns, figure out the effect the proposed bill would have on those revenues, and then check whether the bill is really revenue-neutral. Suppose the IRS randomly samples 100 forms. The sample average (x-bar) comes out to -$219, which we interpret as: under the bill, the government would have collected $219 less per return, on average. Let us make the unrealistic assumption that the population standard deviation (sigma) is known and that it is $725.
(In practice, we might use s, the sample standard deviation, as an estimate of sigma.)

The senator argues that the standard deviation is so large, $725, that an average of -$219 is inconsequential. He thinks he knows something about statistics and reasons that -$219 plus or minus two standard deviations of $725 could be interpreted as a bill whose effect might run anywhere from -$1,669 to $1,231.

The IRS counters that it is incorrect to use a standard deviation in this manner. What the senator really needs is the standard error (also called the standard deviation of the sampling distribution), σ/√n, and to put that around the sample average of -219. The IRS's argument is what you want to learn. To understand the -219 and the 725, you need to convert the population standard deviation to a standard error of sample means using the formula σ/√n. Remember what the standard error is: it is the variation associated with the sample statistic (here, a distribution of averages or means). It is the variation of all possible sample outcomes. We are arguing here over one sample out of countless possible samples.

The IRS goes on to say: the senator may think and argue that the population parameter is $0, but the IRS says it is not, and in fact, based on the sample of 100 returns, they think it is negative. How do they figure that?

First, the IRS calculates the standard error of sample means:

σ/√n = 725/√100 = 72.5

So if you were to construct a confidence interval around the -219, say a 95% confidence interval, is zero in that interval?
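The standard error and the 95% confidence interval the IRS is hinting at can be checked in a few lines. This is a minimal sketch (the variable names are mine, not from the lecture), using the usual z* = 1.96 multiplier for a 95% interval:

```python
import math

# Values from the IRS example
sigma = 725.0   # population standard deviation (assumed known), in dollars
n = 100         # number of sampled tax returns
xbar = -219.0   # sample average effect per return, in dollars

# Standard error of the sample mean: sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(se)                                # 72.5

# 95% confidence interval around the sample average (z* = 1.96)
low, high = xbar - 1.96 * se, xbar + 1.96 * se
print(low, high)                         # about -361.1 to -76.9
```

Zero is not inside that interval, which is the IRS's point: the sample is not consistent with a revenue-neutral ($0) effect.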
Second, they set up a "test" and use Z as the "test statistic":

Z = (x̄ − µ)/(σ/√n) = (−219 − 0)/(725/√100) = −219/72.5 ≈ −3.02, about Z = −3.0

This test statistic says, in a way, that if the true parameter were zero dollars and the sample means vary with a standard error of $72.5, then the chance that you could have picked a sample of size 100 with a mean of -$219 or lower is about .00135, or about 1/10 of one percent. This is the area to the left of -3.0 under the standard normal curve in Table IV.

3. Definitions

A. The NULL HYPOTHESIS is that the observed results are due to chance alone. That is, any difference between the parameter (the expected value) and the observed (or actual) outcome is due to chance alone. In this case, the null hypothesis is a statement about a parameter: the population average is 0. It is generally written H0, and in this example it would be written: H0: µ = 0

B. The ALTERNATIVE HYPOTHESIS suggests that the observed results (sample outcomes) are due to more than just chance. It implies that the statement in the NULL is not correct (i.e. the proposed parameter value is wrong) and that any observed differences are real, not just luck. Usually, the ALTERNATIVE is what we're setting out to prove; the NULL is like a "straw man" that we set up to knock down. In this example, the alternative is written Ha: µ < 0. Notice that it is stated in terms of the same hypothesized parameter value as the null. (See page 301.)

C. The TEST STATISTIC measures how different the observed results are from what we would expect to get if the null hypothesis were true. When using the normal curve, the test statistic is z, where

z = (observed value − expected value)/(appropriate measure of spread)

In this example: Z = (x̄ − µ)/(σ/√n) = (−219 − 0)/(725/√100) = −219/72.5 ≈ −3.02

All a Z does is tell you how many standard errors the observed value is from the expected value, where the expected value is calculated using the NULL HYPOTHESIS.

D. The SIGNIFICANCE LEVEL (or P-VALUE).
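The z calculation above, written out as code (a sketch; mu0 names the parameter value claimed by the null, i.e. the senator's revenue-neutral claim):

```python
import math

sigma, n, xbar = 725.0, 100, -219.0
mu0 = 0.0                               # parameter value under the null hypothesis

# z = (observed mean - hypothesized mean) / standard error
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))                      # -3.02
```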
This is the chance of getting results as extreme as, or more extreme than, what we got, IF the null hypothesis were true. P-VALUE could also be called "probability value," and it is simply the "tail" area associated with the calculated Z. P-values are always "if-then" statements: "If the null hypothesis were true, then there would be a p% chance of getting results like these." The less probable an outcome is, the stronger the evidence for rejecting the null in favor of the alternative.

STATISTICAL SIGNIFICANCE and SIGNIFICANCE LEVELS. It is common practice to decide upon a fixed value before testing and to make that fixed value "decisive". The decisive value for a p-value is called a SIGNIFICANCE LEVEL and its symbol is ALPHA (α). So if you choose an alpha of .05
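The tail area for z = −3.0 can be computed without a table. A minimal sketch using only Python's standard library (the normal_cdf helper is my own, built from the error function, not a function from the lecture):

```python
import math

def normal_cdf(z):
    """Standard normal CDF (area to the left of z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = -3.0
p_value = normal_cdf(z)       # left-tail (one-sided) p-value, matching Table IV
print(round(p_value, 5))      # 0.00135

alpha = 0.05                  # significance level chosen before testing
print(p_value < alpha)        # True: significant at the 5% level
```

Since .00135 is far below an alpha of .05, the test rejects the null hypothesis of a revenue-neutral bill.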


UCLA STAT 11 - Lecture 22
