UW STAT 220 - Correlation and Regression

IntroductionProbability: Continuous Random NumbersThe Statistical View of RegressionThe Regression FallacyPrediction Errors and ResidualsThe r.m.s. ErrorResidual PlotsNormal Curve Inside a Vertical StripAdvanced Issues and SummaryChapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryStat 220, Part IIICorrelation and RegressionLecture 14A bit more Probability and the Regression FallacyAlso, Chapter 11: R.M.S. Error for RegressionChapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryContextThe regression method from last lecture is used to predict thevalue of y from a given value of x. The calculation is(relatively) simple math.But how sure are we of these estimates?This is where probability enters the picture again. We will needa little bit more probability background.Chapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryDrawing Numbers at RandomWe saw in Lecture 3, that probability theory allows us to assignprobabilities (between 0 and 1) to events like coin tosses. It alsoallows us to randomly choose people or assign them to groups.In a similar manner, one can also draw random numbers out ofa continuous range, following a given probability distribution.Such a distribution can be described using a curve. The curvemust:•Be always above (or touching) zero;•Enclose a total area of 100% below it.Chapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryDrawing Numbers at RandomWe have already met such a curve before:What a happy coincidence... the normal curve is exactly whatwe need now.Chapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryDrawing Numbers from the NormalDistributionProbabilities for drawing out of the normal distribution can onlybe calculated for intervals. The normal table is exactly the placewhere these probabilities can be found. For example:•The probability of drawing a positive number is 0.5•The probability of drawing a number between −1 and +1is about 0.68And so on.Chapter 11IntroductionProbability:ContinuousRandomNumbersTheStatisticalView ofRegressionTheRegressionFallacyPredictionErrors andResidualsThe r.m.s.ErrorResidualPlotsNormal CurveInside aVertical StripAdvancedIssues andSummaryDeterministic and RandomWhen we use the statistical toolset called regression, we areactually decomposing y into two parts:y = deterministic part + random part(FPP call the random part “chance error”)In our case, the deterministic part is the effect of y’s associationwith x. We call it “deterministic” because the underlyingassumption is that x and y are connected via some causal chain(of course, not necessarily x → y).The random part is a convenient approximation. If we kneweverything that affects y, we wouldn’t need it. 
Deterministic and Random

When we use the statistical toolset called regression, we are actually decomposing y into two parts:

y = deterministic part + random part

(FPP call the random part "chance error".)

In our case, the deterministic part is the effect of y's association with x. We call it "deterministic" because the underlying assumption is that x and y are connected via some causal chain (of course, not necessarily x → y). The random part is a convenient approximation. If we knew everything that affects y, we wouldn't need it. But we don't.

In regression, we assume that the random part is related to the normal distribution:

y = x effect + error size × random normal number

We cannot predict the random normal number (this is exactly what we mean by "random": we cannot predict it!), but we know some of its properties, e.g. that on average it is zero. More importantly, the smaller the error size, the better or more precise our regression estimate is, because it means that the random part is smaller on average.

Today we will learn how to calculate the regression error size (in FPP terms: "the r.m.s. error"). But first, let us shed light upon the regression fallacy.

Example 1: 9th vs. 10th Grade WASL

Suppose that the statewide 9th grade reading WASL test scores are approximately normal with average 80 and SD 5, and that a year later the average and SD scores of this cohort's 10th grade WASL were the same. Suppose also that r = 0.4 between the two tests, and that the 9th vs. 10th grade scatterplot is football-shaped.

Using the regression method, we can calculate the average 10th-grade score of the group who got 90 in 9th grade:

z_x = (90 − 80) / 5 = 2;  z_y = z_x × 0.4 = 0.8

So they scored 84 on average on the second test (80 + 0.8 × 5 = 84). Similarly, the group who got 70 on the first test will now average 76. Does school make everyone more mediocre?

The regression fallacy

What is going on? Essentially, nothing but randomness or chance error. You see, the statistical view of test scores holds not just for y, but also for x:

9th grade score = deterministic part + random part

The "deterministic part" is the student's true reading ability (more exactly: true reading-WASL success ability). The "random part" is factors like how much the student liked the questions, whether she/he had a good night's sleep, was in some sort of teenage crisis, or rather was "on a roll". Stuff that's hard to keep track of.

Now think from basic common sense: for the group who scored 90 in 9th grade, do you expect the sum total of all these intractable factors to be positive or negative on the average?
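To see the regression fallacy in action, here is a minimal simulation sketch, not from the lecture, using the WASL numbers (average 80, SD 5, r = 0.4). The split of the score variance into a "true ability" part and a "chance error" part (variances 10 and 15) is an assumption chosen only so that the two years' scores correlate at about 0.4.

```python
# Minimal simulation sketch (not from the lecture) of the regression fallacy.
# Each score = deterministic part (true ability) + random part (chance error).
# Variances 10 + 15 = 25 give SD 5 and a year-to-year correlation of 10/25 = 0.4.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

true_ability = rng.normal(80, np.sqrt(10), n)              # deterministic part
score_9th = true_ability + rng.normal(0, np.sqrt(15), n)   # + chance error, year 1
score_10th = true_ability + rng.normal(0, np.sqrt(15), n)  # + fresh chance error, year 2

print(np.corrcoef(score_9th, score_10th)[0, 1])            # about 0.4

# Regression-method prediction for the group that scored 90 in 9th grade:
print(80 + 0.4 * ((90 - 80) / 5) * 5)                      # 84.0

# What that simulated group actually averages in 10th grade:
group_90 = (score_9th > 89.5) & (score_9th < 90.5)
print(score_10th[group_90].mean())                         # close to 84, not 90
```

The simulation illustrates the point of the slide: the students who scored around 90 were, on average, somewhat lucky on the first test, so their second-year scores drift back toward the overall average of 80 even though their abilities have not changed.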