
Stat 220, Part VIII
Hypothesis Tests and Statistical Decisions
Lecture 23
More Tests (Ch. 26-28)
A Critical Look at Tests (Ch. 29)

• Overview
• Comparing Two Samples
• The t-test and Other Beasts
• Cheating with Tests (Ch. 29)
• Motivation for Cheating
• Moving the Goal Posts
• Multiple Testing: “A Fishing Expedition”
• Final Words for the Course

Overview

Last lecture we learned the concept of hypothesis tests. In a way, these tests provide society with a statistical court of law.
• The stage is set with two decision options.
• Evidence is gathered and presented.
• It is weighed against other considerations (i.e., the built-in advantage of the Null).
• If the evidence satisfies the requirement of “a reasonable person”, then a new decision is reached (the Null is rejected).
• If not, the status quo continues.

Note: The hypothesis-test template is generic. There are many forms and types of tests, not just the one we learned.
In this lecture, the last of the course, we will see a couple more test types that are very commonly used.
Then we will briefly review some common fallacies and controversies regarding tests.

Comparing Two Samples (Ch. 27): Example 1

400 draws are made at random with replacement from box A, and independently 100 draws are made at random with replacement from box B. We got the sample averages and SDs shown below.

            Box A    Box B
Average     110      90
SD          60       40

Are the actual averages of the boxes different?
We set up the Null: zero difference, vs. the Alternative: nonzero difference.

Comparing Two Samples: Example 1

We need to estimate the EV and SE for the difference between the two box averages. The SE of each sample average is the sample SD divided by the square root of the number of draws.

Box A: Average = 110, SE = $60/\sqrt{400} = 3$
Box B: Average = 90, SE = $40/\sqrt{100} = 4$

Our point estimate for the difference is just 110 − 90 = 20. The SE estimate is more complicated.
The standard error for the difference of two independent quantities is
$$\sqrt{\mathrm{SE}_a^2 + \mathrm{SE}_b^2},$$
where a denotes the first quantity and b the second quantity.

Example 1, Concluded

So the standard error for the difference is
$$\text{SE of difference} = \sqrt{3^2 + 4^2} = 5,$$
because the two samples are independent. The z-score is
$$z = \frac{\text{Difference}}{\text{SE of difference}} = \frac{20}{5} = 4,$$
which leads to a two-sided two-sample P-value of less than 0.01% (we can still use the normal approximation here!).
The difference is highly significant. The boxes seem to have different averages.

The Two-Sample Test and Randomized Experiments

The two-sample test is also how treatment vs. control randomized trials are evaluated.
Note: Generally, the method is designed for two independent samples.
But we can also use it to compare the treatment group and the control group in a randomized controlled experiment, even though the groups are dependent.

T-test (Ch. 26.6)

Calculation of the z statistic is based on the Null. However, in the tests we have learned so far, the Null only provides the average of the box. The test also requires an SE, and for this the bootstrap is used (the box SD is estimated from the sample).
From a strict perspective, using a z-test is wrong. An (approximately) normal r.v. whose SE is bootstrap-estimated is not normal anymore. Rather, it obeys the t-distribution. So we do the same calculation, but look it up in a different table: the z-test becomes a t-test (we're not doing t-tests in this course; no time).
Note: the t-distribution applies whenever we calculate the SD separately from the average. Percentages do not follow the t-distribution, because the 0-1 box SD estimate is a function of the average. Bottom line: for percentages we still use the z-test.

The t Curves

The t distribution is not a single curve, but a family of curves with different degrees of freedom (d.f.). For one-sample problems, the d.f. are just (# of observations) − 1.
We've already met the t curve with 1 d.f.: it is the terrible Cauchy curve that doesn't obey the CLT! Lesson: with 2 observations, there is very little you can say about your data.

[Figure: density curves for the normal and for the t distribution with 1 d.f. (Cauchy), 10 d.f., and 25 d.f.; horizontal axis from −4 to 4, vertical axis from 0.0 to 0.4.]
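The “same calculation, different table” idea can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names are made up for this sketch, the t tail area is obtained by a simple Simpson's-rule integration of the t density rather than from a printed table, and the cutoff of 2 used in the comparison is an arbitrary choice. The first part redoes Example 1 with the normal curve. The second part shows how the two-sided tail area beyond a fixed cutoff grows as the degrees of freedom shrink.

```python
import math

def two_sample_z(avg_a, sd_a, n_a, avg_b, sd_b, n_b):
    """Return (z, SE of difference) for two independent samples."""
    se_a = sd_a / math.sqrt(n_a)          # SE of the first sample average
    se_b = sd_b / math.sqrt(n_b)          # SE of the second sample average
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return (avg_a - avg_b) / se_diff, se_diff

def normal_two_sided_p(z):
    """Area under the normal curve outside (-|z|, |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

def t_density(x, df):
    """Density of the t curve with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """Area under the t curve outside (-|t|, |t|), via Simpson's rule.

    steps must be even; 2000 is an arbitrary choice for this sketch.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = 2 * total * h / 3           # area between -a and a
    return 1.0 - central

if __name__ == "__main__":
    # Example 1: SEs are 3 and 4, SE of difference is 5, z = 20 / 5 = 4.
    z, se_diff = two_sample_z(110, 60, 400, 90, 40, 100)
    print("SE of difference:", se_diff, " z:", z)
    print("two-sided P (normal):", normal_two_sided_p(z))

    # Same cutoff, different curves: heavier tails with fewer d.f.
    cutoff = 2.0
    print("normal:", normal_two_sided_p(cutoff))
    for df in (25, 10, 1):
        print(df, "d.f.:", t_two_sided_p(cutoff, df))
```

With 1 d.f. the result can be checked against the Cauchy curve, whose two-sided tail area beyond t is 1 − (2/π)·arctan(t). In practice one would look the value up in a t table or use a statistics library instead of integrating by hand.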
Why Do People Cheat with Statistical Tests?

The two main objectives of research should be to understand the world better, and to make it better.
However, research that is not recognized by other scientists is simply nonexistent. Unfortunately, the first actual objective of research nowadays is to produce results that will be published in a peer-reviewed journal.
Most journals only publish experiments with statistically significant results. Now,
1 Testing is an open problem in many respects;
2 Data come “polluted” with noise and outliers, and so a certain amount of legitimate manipulation is needed;
3 The rest, as they say, is history.

