Chapter 11
$\hat{p}$ is for categorical data and $\bar{y}$ is for quantitative data.
Normal model for the sampling distribution of the means
  o Used when you know the population mean $\mu$ and want information about the sample mean $\bar{y}$
  o $\mu_{\bar{y}} = \mu$
  o $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
    Means have a smaller standard deviation than individuals.
    $\sigma$ is the standard deviation of the population.
    If the population is $N(\mu, \sigma)$, then the sampling distribution of the means is $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
The more degrees of freedom, the closer the t model is to a Normal distribution.
Confidence intervals with means
  o $\bar{y} \pm t^*_{df} \times SE(\bar{y})$
  o $SE(\bar{y}) = \frac{s}{\sqrt{n}}$
  o Use the t table.
    Degrees of freedom: $df = n - 1$
    $\sigma$ is usually unknown, so replace it with the sample standard deviation $s$.
    When you center the distribution on the sample mean, use the standard error $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.
    Find $t^*$ the same way as $z^*$, except use the standard error (with $s$) instead of $\sigma$.
t models are unimodal, symmetric, and bell-shaped.
  o Fewer degrees of freedom mean a lower peak and fatter tails, so the model looks less Normal.
t model assumptions
  o Independence assumption
  o Randomization condition
  o 10% condition: the sample is less than 10% of the population
t model conditions
  o Nearly Normal condition
    When $n < 15$, the data must be nearly Normal.
    When $15 \le n \le 40$, the data must be unimodal and reasonably symmetric.
    When $n > 40$, it is usually fine unless the data are extremely skewed.
    HE SAYS $n = 25$ IS LARGE ENOUGH.
  o Randomization condition
  o Unlike z tables, t tables go from right to left.
One-sample t test
  o $t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$
  o CENTER IT AT THE HYPOTHESIZED MEAN $\mu_0$, NOT THE SAMPLE MEAN.
When estimating a needed sample size, use $z^*$ instead of $t^*$; if the resulting n is not very large, plug it back into the t model.
  o Assume $z^*$: $ME = z^* \frac{s}{\sqrt{n}}$, then solve for $n = \left(\frac{z^* s}{ME}\right)^2$.
  o If $n < 60$, use that n to find $t^*_{n-1}$ and recompute $ME = t^*_{n-1} \frac{s}{\sqrt{n}}$.
  o Calculator: Stat → t statistics
Chapter 12
When looking up the t table, round the degrees of freedom down.
Two means
  o For one mean, $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$.
  o If the sample means come from independent samples, the variance of their sum or difference is the sum of their variances.
  o $SD(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  o Two-sample t test: find the ratio of the difference between the sample means to its standard error, and compare that ratio to a critical value from a Student's
    t model.
  o DIFFERENT SAMPLES: for the standard error, always use the t distribution.
    $H_0: \mu_1 - \mu_2 = 0$
    $t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)}$
    $SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
    Degrees of freedom: the smaller of $n_1 - 1$ and $n_2 - 1$ (a conservative choice)
  o Assumptions
    Independence assumption: Randomization condition, 10% condition
    Nearly Normal condition
      When $n < 15$, the data must be nearly Normal.
      When $15 \le n \le 40$, the data must be unimodal and reasonably symmetric.
      When $n > 40$, it is usually fine unless the data are extremely skewed.
      In general, $n_1 + n_2 \ge 40$ is enough.
    Independent groups assumption
  o Two-sample t interval: $(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$, where $SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
  o For t tests you can pool the data and combine the variances:
    $s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $SE_{pooled}(\bar{y}_1 - \bar{y}_2) = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
    $df = n_1 + n_2 - 2$
    Equal variance assumption: the variances of the two populations from which the samples were drawn are equal.
You cannot use the two-sample t test for paired data, because the groups are not independent.
Paired t test: a one-sample t test for the mean of the pairwise differences (dependent samples, SAME SAMPLE)
  o $H_0: \mu_d = 0$
  o $t = \frac{\bar{d} - 0}{SE(\bar{d})}$
  o $SE(\bar{d}) = \frac{s_d}{\sqrt{n}}$, where n is the number of pairs
  o $df = n - 1$
  o Confidence interval: $\bar{d} \pm t^*_{n-1} \times SE(\bar{d})$
  o Assumptions
    Paired data assumption
    Independence assumption: Randomization condition, 10% condition
    Normal population assumption: Nearly Normal condition (checked on the differences)
Chapter 6
A scatterplot plots one quantitative variable against another.
  o It asks whether there is an association between the variables.
  o Direction: positive or negative
  o Form: linear, straightens out, etc.
  o Strength: how tightly the points cluster around the overall form
  o Outliers: points standing away from the overall pattern of the scatterplot
  o The explanatory variable goes on the x-axis and the response variable goes on the y-axis.
  o The x variable is known as the independent variable and y as the dependent variable.
Association means a relationship; correlation refers to its strength.
Correlation measures the strength of the linear association between two quantitative variables. ASSOCIATION DOES NOT MEAN CAUSATION.
  o Correlation coefficient: $r = \frac{\sum z_x z_y}{n - 1}$
    r shows direction, since it can be positive or negative.
    Basically, look at how close r is to $\pm 1$.
    Correlation treats x and y
    symmetrically: the correlation of x with y is the same as the correlation of y with x.
    r has no units, is not affected by changes of scale or units, and is not robust to outliers.
  o Conditions
    Quantitative variables condition: only quantitative variables
    Straight enough condition: the scatterplot must look linear
    Outlier condition: outliers can distort r badly, so check for them (and report r with and without them) rather than ignoring them
  o A correlation matrix shows the correlation coefficients between many pairs of variables.
Lurking variable: a third variable that affects both of the variables you have observed.
Confounding: variables are confounded when their effects on a response variable cannot be distinguished from each other.
Ecological correlations are based on rates or averages and tend to overstate the strength of associations.
Linear regression model: the equation of a straight line through the data, $\hat{y} = b_0 + b_1 x$
  o Can also be written as $\hat{z}_y = r z_x$, which is the same as $\frac{\hat{y} - \bar{y}}{s_y} = r \frac{x - \bar{x}}{s_x}$
  o $b_1 = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$
  o Residuals: $e = y - \hat{y}$
    Standard deviation of the residuals (the spread around the regression line): $s_e = \sqrt{\frac{\sum e^2}{n - 2}}$
  o Line of best fit (least squares line): the line for which the sum of the squared residuals is smallest
  o $b_0$ is the predicted y when $x = 0$. If that is meaningless, say $b_0$ is the portion of y that cannot be explained by x.
  o As x increases by 1 unit, the predicted response increases by $b_1$ on average.
  o Interpolation is fine, but extrapolation should be avoided.
  o Regression to the mean: because $|r|$ is less than 1.0 (unless the points fall exactly on a line), each predicted $\hat{y}$ tends to be fewer standard deviations from its mean than its corresponding x is from its mean.
Assumptions
  o Quantitative data condition
  o Linearity condition
  o Outlier condition
  o Independence assumption
    Check by plotting the residuals against x; this plot should have no shape or direction.
  o Equal spread condition: the same spread around the line everywhere
The squared correlation, $R^2$ (the coefficient of determination), gives the fraction of the data's variation accounted for by the model, that is, explained by the linear relationship between the variables. $1 - r^2$ is the fraction of the original variation left in the residuals.
Chapter 14
Estimate a range of y with $\hat{y} = b_0 + b_1 x$.
  o Lower case corresponds to an individual value on the regression line.
  o Upper case corresponds to a sample mean value on the regression line.
Population and the sample
  o $\mu_y = \beta_0 + \beta_1 x$: the regression line assumes the means of the y values for each value of x fall exactly on the line.
  o Error: $\varepsilon = y - (\beta_0 + \beta_1 x)$ is Normally distributed, $N(0, \sigma)$.
Regression assumptions
  o Linearity assumption: Linearity condition (the plot looks straight)
  o Quantitative …
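The Chapter 11 one-sample formulas can be checked numerically. This is a minimal sketch using only the Python standard library; the function names (`se_mean`, `t_interval`, `one_sample_t`, `sample_size`) are hypothetical helpers, not course or calculator software. The standard library has no t distribution, so t* is passed in from a t table (as in the notes); only the z* used for the first-pass sample size is computed.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical helper names; a sketch of the Chapter 11 formulas, not a library API.


def se_mean(data):
    """Standard error of the sample mean: SE(ybar) = s / sqrt(n)."""
    return stdev(data) / math.sqrt(len(data))


def t_interval(data, t_star):
    """One-sample t interval: ybar +/- t* x SE(ybar).

    t_star must be looked up in a t table for df = n - 1; it is not computed here.
    """
    ybar = mean(data)
    me = t_star * se_mean(data)  # margin of error
    return ybar - me, ybar + me


def one_sample_t(data, mu0):
    """One-sample t statistic, centered at the hypothesized mean mu0 (df = n - 1)."""
    return (mean(data) - mu0) / se_mean(data)


def sample_size(s_guess, me, conf=0.95):
    """First-pass sample size using z* instead of t*: n = (z* s / ME)^2, rounded up."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    return math.ceil((z_star * s_guess / me) ** 2)
```

For example, with a guessed s of 10 and a desired ME of 2 at 95% confidence, `sample_size(10, 2)` returns 97. Following the notes, if the n you get is small, look up $t^*_{n-1}$ for that n and recompute the ME with it.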
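The Chapter 12 two-sample, pooled, and paired t statistics can be sketched the same way. This assumes the conservative df rule from the notes (the smaller of $n_1 - 1$ and $n_2 - 1$); statistical software usually computes a larger, fractional df instead. The helper names are hypothetical, and the caller still compares t against a critical value from a t table.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical helper names; a sketch of the Chapter 12 formulas, not a library API.


def se_two_sample(y1, y2):
    """Unpooled SE(ybar1 - ybar2) = sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))


def two_sample_t(y1, y2, delta0=0.0):
    """Two-sample t = ((ybar1 - ybar2) - delta0) / SE, with conservative df."""
    t = (mean(y1) - mean(y2) - delta0) / se_two_sample(y1, y2)
    df = min(len(y1), len(y2)) - 1  # smaller of n1 - 1 and n2 - 1
    return t, df


def pooled_t(y1, y2, delta0=0.0):
    """Pooled t test (equal variance assumption), df = n1 + n2 - 2."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (mean(y1) - mean(y2) - delta0) / se_pooled, n1 + n2 - 2


def paired_t(x, y, delta0=0.0):
    """Paired t test: a one-sample t test on the differences d = x - y, df = n - 1."""
    d = [xi - yi for xi, yi in zip(x, y)]
    se = stdev(d) / math.sqrt(len(d))  # SE(dbar) = s_d / sqrt(n)
    return (mean(d) - delta0) / se, len(d) - 1
```

Note that `paired_t` works on one list of differences, which is why paired data cannot go through `two_sample_t`: the two columns are not independent groups.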
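The Chapter 6 correlation and least squares formulas fit in a few lines. This sketch computes r as $\sum z_x z_y / (n - 1)$ with sample standard deviations, then $b_1 = r s_y / s_x$, $b_0 = \bar{y} - b_1 \bar{x}$, $s_e$, and $R^2 = r^2$. The function names are hypothetical, and the code does not check the linearity, outlier, or equal spread conditions; those still need a scatterplot and a residual plot.

```python
import math
from statistics import mean, stdev

# Hypothetical helper names; a sketch of the Chapter 6 formulas, not a library API.


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(x)
    xbar, ybar, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)


def least_squares(x, y):
    """Return (b0, b1, s_e, r_squared) for the least squares line yhat = b0 + b1 x."""
    r = correlation(x, y)
    b1 = r * stdev(y) / stdev(x)  # slope: b1 = r * s_y / s_x
    b0 = mean(y) - b1 * mean(x)   # the line passes through (xbar, ybar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e = y - yhat
    s_e = math.sqrt(sum(e * e for e in residuals) / (len(x) - 2))
    return b0, b1, s_e, r * r
```

With x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5], this gives r = 0.8, $\hat{y} = 0.6 + 0.8x$, and $R^2 = 0.64$, so the line accounts for 64% of the variation in y. Swapping the arguments of `correlation` gives the same r, matching the symmetry noted above.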