IES 612/STA 4-573/STA 4-576 Spring 2006
Week 1 – IES612-lecture-week01.doc (updated: 08 January 2006)

* roster check . . .
* review syllabus . . .
* information card information . . .

Info Card IES 612 or STA 4-573 Spring 2006
1. Name
2. Department/degree
3. Major/concentration/advisor
4. Previous Stat classes?
5. Previous Math classes?
6. Previous Computing classes/experience?
7. What do you hope to learn from this class?
8. Something that will help me get to know you better.

SYLLABUS
Regression (5 weeks)
Experimental Design (5 weeks)
Sampling (2+ weeks)
Math modeling (2+ weeks)

REVIEW (prerequisite material)
* We are moving from DESCRIPTIVE STATISTICS and simple HYPOTHESIS TESTS towards MODELS for describing ASSOCIATION and PREDICTION
* CONCEPTS:
  POPULATION = collection of all units of interest
  SAMPLE = subset of population selected to represent the population
  PARAMETERS = characteristics of the population (μ, σ², ρ, β1)
  STATISTICS = characteristics of the sample ($\bar{X}$, s², r, b1)
  Sampling – selecting elements from a population into a sample
  Inference – making statements about a population based on information in a sample
* refer to IES612-lecture-week00.doc for more detailed review suggestions

Hypothesis Tests
H0 – null/no-effect hypothesis
Ha (or H1 or HA) – research or alternative hypothesis
Test statistic (TS)
Rejection Region / P-value
Conclusion
Errors? Type I (False Positive); Type II (False Negative)
α = Pr(Type I error) = Pr(Reject H0 GIVEN H0 true)
β = Pr(Type II error) = Pr(Accept H0 GIVEN Ha true)
Power (“sensitivity”)? Pr(reject H0 GIVEN Ha true) = 1 - β = Pr(detect a true difference)

Confidence Intervals
(point estimate) +/- (multiple) (std. error)
* there are other ways to form confidence intervals, but this general form applies in many cases

Association
Categorical data – multiway tables (see OL Ch. 10)
Numeric data – regression data (x1, y1), (x2, y2), …, (xn, yn) or in shorthand, (xi, yi), i = 1, …, n

Example: Manatee deaths due to motorboats in Florida

YEAR   Number Boats (1000s)   Manatees Killed
77     447                    13
78     460                    21
79     481                    24
80     498                    16
81     513                    24
82     512                    20
83     526                    15
84     559                    34
85     585                    33
86     614                    33
87     645                    39
88     675                    43
89     711                    50
90     719                    47

Graphical display? Scatterplot or scatterdiagram
[Figure: scatterplot, "Manatees killed by boats (1977-1990)"; x-axis: Number of Boats (1000s)]

Example: Progesterone level as a function of gestation day in sheep pregnant with singletons

Gestation Days   Progesterone
53               3.8
60               5
66               4.5
72               4.2
73               5.5
76               5.8
77               4.6
78               5.3
78               7.2
79               5.7
80               6
80               6.3
81               4.8
82               5.6
83               4.9
84               4.3
87               4.9
89               4.2
98               3.4
105              4.8
72               5.2
72               5.9
77               5.7
77               2.8
82               6.6
98               6.1
98               9.3
104              7.7
104              5.3
109              7.8

[Figure: scatterplot, "Progesterone as a function of gestation days"]

Basic Model
Yi = β0 + β1Xi + εi [“simple linear regression”]
Y = response variable (dependent variable)
X = predictor variable (independent variable, covariate)

Formal assumptions:
1. relation linear – on average error = 0 [ E(εi) = 0 ] -> E(Yi) = β0 + β1Xi
2. Constant variance – V(εi) = σ² -> V(Yi) = σ²
3. εi independent
4. εi ~ Normal

Issue of causality
Observational versus experimental studies.

Why not y = mx + b? The form above can be more easily generalized to more than one predictor variable.
β0 = y-intercept, value of “Y” at “X=0”
β1 = slope, how “Y” changes with unit change in “X”
Which parameter is generally of more interest? Why? β1: it contains information about the relationship between the two variables.

Estimating regression coefficients
Least squares – minimize
  $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$
Solution:
  $\hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}$
  $\hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X}$
Interpretation: Units?
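The least squares solution can be checked by hand on the manatee data. The following is a minimal sketch in Python (not the SAS used in these notes); the function name `least_squares` and the variable names are illustrative assumptions, not part of the lecture. It computes SXY and SXX directly from the formulas for b1 and b0.

```python
# Manatee data from the example (boats in 1000s, manatees killed), 1977-1990.
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
killed = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SXY: sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # SXX: sum of squared deviations of x from its mean
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = least_squares(boats, killed)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}")  # b0 = -41.43, b1 = 0.125
```

The units follow from the formulas: b1 is in manatee deaths per 1000 boats, and b0 is in manatee deaths.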
Interpretation: graphical (quadrants defined by the means)

Example (Manatee): b0 = -41.43 and b1 = 0.125
Interpretation:
Intercept: When no boats were registered, predict –41.4 manatee deaths ?!?!? Notice that x=0 is well outside the SCOPE of the model.
Slope: For each additional x=1 (1000) boats, predict an increase of about 0.125 (roughly 0.1) manatee deaths. Maybe a better interpretation: for each additional x=10 (10,000) boats, predict about one additional manatee death (10 × 0.125 = 1.25).

How do you deal with the intercept? Reparameterize the model by recentering the X variable.
  $y_i = \beta_0^{*} + \beta_1 (x_i - \bar{x}) + \varepsilon_i$ [intercept is the average response at the mean X level]
  $y_i = \beta_0^{**} + \beta_1 (x_i - 447) + \varepsilon_i$ [intercept is the average response at X=447]

Issues
Leverage = points with high/low values of the predictor variable X (“outliers” in the X direction)
Influential = omitting the point causes estimates of the regression coefficients to change dramatically
Outlier = point with a large residual (more to come!)

Estimate of σ²
Recall from your first stat class,
  $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ with “n-1” degrees of freedom
Pay penalty b/c mean unknown and estimated by $\bar{y}$.
How about in regression? Mean at any value of “x” is estimated by $\hat{y} = b_0 + b_1 x$
So in regression, we estimate the variance by
  $s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$ “mean squared residual” “mean squared error”
with “n-2” degrees of freedom, because two coefficients (b0 and b1) are estimated.
“s” = sample std. dev. around the regression line / std. error of estimate / residual std. dev.

How do we use the estimate of σ²?
1. If ε ~ N, then expect approx. 95% of residuals to be within +/- 2 s of 0 (more to come)
2. Used in inference for the regression coefficients

Using SAS to fit the simple regression model

    /* example sas program that does simple linear regression */
    options ls=75;

    /* read year, number of boats (1000s), and manatees killed */
    data example1;
      input year nboats manatees;
      cards;
    77 447 13
    78 460 21
    79 481 24
    80 498 16
    81 513 24
    82 512 20
    83 526 15
    84 559 34
    85 585 33
    86 614 33
    87 645 39
    88 675 43
    89 711 50
    90 719 47
    ;

    /* send the output to an RTF file */
    ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\linreg-output.rtf';

    proc reg;
      title 'Number of Manatees killed regressed on the number of boats registered in Florida';
      /* p = predicted values, r = residuals, cli/clm = prediction and mean confidence limits */
      model manatees = nboats / p r cli clm;
      /* data with fitted values overlaid, then residual plots */
      plot manatees*nboats="o" p.*nboats="+" / overlay;
      plot r.*nboats r.*p.;
    run;

    ODS RTF CLOSE;

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       1711.97866    1711.97866     93.61   <.0001
Error             12        219.44991      18.28749
Corrected Total   13       1931.42857

Root MSE          4.27639    R-Square   0.8864
Dependent Mean   29.42857    Adj R-Sq   0.8769
Coeff Var        14.53141

Parameter
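The entries of the Analysis of Variance output can be reproduced from the definition of s² given earlier. The sketch below is in Python rather than SAS, and all variable names are illustrative assumptions. It refits the line, forms the residuals, and computes the error sum of squares, the mean squared error with n-2 degrees of freedom, Root MSE, R-Square, and the F value for comparison with the SAS output.

```python
import math

# Manatee data from the example (boats in 1000s, manatees killed), 1977-1990.
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
killed = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

n = len(boats)
x_bar = sum(boats) / n
y_bar = sum(killed) / n

# Least squares estimates: b1 = SXY / SXX, b0 = y_bar - b1 * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(boats, killed))
s_xx = sum((x - x_bar) ** 2 for x in boats)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals: observed minus fitted
residuals = [y - (b0 + b1 * x) for x, y in zip(boats, killed)]

sse = sum(e ** 2 for e in residuals)         # Error sum of squares
sst = sum((y - y_bar) ** 2 for y in killed)  # Corrected total sum of squares
ssr = sst - sse                              # Model sum of squares
mse = sse / (n - 2)                          # s^2, with n-2 degrees of freedom
root_mse = math.sqrt(mse)                    # s, the residual std. dev.
r_square = ssr / sst
f_value = (ssr / 1) / mse                    # model has 1 degree of freedom

# How many residuals fall within +/- 2s of 0
within_2s = sum(abs(e) <= 2 * root_mse for e in residuals)

print(f"SSE = {sse:.5f}, MSE = {mse:.5f}, Root MSE = {root_mse:.5f}")
print(f"R-Square = {r_square:.4f}, F = {f_value:.2f}")
print(f"{within_2s} of {n} residuals lie within 2s of 0")
```

The printed SSE, MSE, Root MSE, R-Square, and F should agree with the Error row and summary lines of the SAS table, up to rounding.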


MIAMI IES 612 - Lecture Notes
