Midterm 3 Study Guide: Chapters 7 and 8

Statistical Inference

- Observe a puzzle or question of interest
- Identify the unit variation of interest
- Create a causal theory (why/how x moves y)
- Draw out testable hypotheses from the theory
- Establish a research design to test the hypotheses
  - Define the unit of observation
  - Define the temporal/spatial domain of the study
  - Operationalize variables
- Now we examine our data and test our hypothesis

What are we doing?

- We seek to infer from the sample (known data) to the population (unknown data).
- Population: all occurrences of your phenomenon of interest
- Sample: a subset of the population of interest
- Take what is known about the sample and infer that those traits also apply to the population.
- Statistical inference is always employed, except in the rare cases when the population as a whole is surveyed.

Parameters

- Traits that can be quantified, like averages, differences between groups, and relationships among variables
- Which of the following describes a parameter? A quantity of our sample or population that we want to know.
- Roman alphabet: sample parameter (formally, a sample statistic). The sample mean is X̄ (X-bar).
- Greek alphabet: population parameter (the real value). The population mean is μ.

Why do we take samples if we care about the population?

- Feasibility: we cannot always survey or document every occurrence of a phenomenon.
- Costs: surveying all cases is almost always prohibitively expensive.
- Practicality: there is no need to survey the population. If done well, sample parameters (traits) will accurately reflect population parameters (traits).

Simple Random Sample

- The ideal standard for a sample
- Which of the following indicates that every member of the population had an equal chance of being in the sample? Simple random sample.
- About 95% of the area under the standard normal curve falls within two standard deviations of the mean.
- What sampling technique, for a large sample of samples, produces the results demonstrated by the Central Limit Theorem? Simple random.
- Every case in your population of interest has an equal chance of being selected as part of the sample.
- On average, parameters from a simple random sample will reflect the population parameters.
- Simple random samples are useful because of the Central Limit Theorem.

Central Limit Theorem

- The sampling distribution of the sample mean follows a normal distribution in the limit (i.e., as the sample size gets very large).
- False: "For a variable that is normally distributed in a population, all cases fall within two standard deviations of the mean." Only about 95% do.
- The 68-95-99.7 rule: about 68% of cases fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
- The Central Limit Theorem does not say that a variable in a sample is normally distributed.
- Any sample we draw is only one of a number of samples we could draw. So we could draw many samples and take the means of all of those samples. If we were to then graph those sample means, that would show us the sampling distribution.
- The Central Limit Theorem says that the sampling distribution will be normally distributed.

As our sample size increases:

- The expected value of the sample mean equals the population mean.
- The standard error of the mean decreases.
- Our confidence that our inference is valid grows.
- Why? We think our sample mean is the true mean, and if it is not, it will not be all that far off.

Standard Deviation of the Sampling Distribution

- Central Limit Theorem: the sampling distribution of sample means will approach normality with mean μ and standard deviation σ/√n.
- The standard error of our sample tells us the uncertainty in our estimate of the mean without pulling hundreds of repeated samples.
- The standard error of our sample mean equals the standard deviation of our sample over the square root of our sample size: SE = s / √n.
- What do these tell us? The sample mean is an estimate of the population mean; the standard error is an estimate of how confident we are in that mean.
- As our sample size gets larger, our confidence will increase because our standard error will decrease. Big sample = small standard error.
- The formula for the standard error has the square root of the sample size in the denominator. As the denominator increases, the fraction becomes smaller: 8/2, 8/4, 8/6, 8/8, 8/10, 8/12, 8/14, etc. So the standard error decreases as the sample size increases.
- Remember: a smaller standard error means greater confidence in our estimate.

What is Hypothesis Testing?

- We cannot just look at the difference between the estimate and the hypothetical parameter.
- We have to deal with the uncertainty around our estimate.
- We need to eliminate the possibility that any difference was just the result of random chance.

Five Step Model for Hypothesis Testing

1. Make assumptions and determine if those assumptions were met. Typically we assume random sampling and rely on the Central Limit Theorem.
2. State the null hypothesis. Typically: there is no difference. Exception: if we have a directional research hypothesis.
3. Determine the sampling distribution and critical region.
4. Calculate your test statistic. The sampling distribution and the test statistic will depend on the particular hypothesis you're testing.
5. Interpret the results. Do we reject the null hypothesis? Is there a statistically significant relationship?

Types of Tests

- A relationship between x and y is said to be statistically significant at the 95% confidence level when p < .05.
- Type I Error: rejecting the null when the null is true. Saying there is a relationship when there is not.
- Type II Error: accepting (failing to reject) the null when the null is false. Saying there is not a relationship when there is.

P-Value

- Tells us the probability that we would see a relationship at least as strong as the one observed in the sample data if there were truly no relationship in the population.
- Ranges between 0 and 1.
- Lower p-values indicate more confidence in the relationship.
- With a larger sample size, we should get lower p-values.

Limitations of P-values (aka statistical significance)

- Subject to manipulation: a higher sample size reduces the p-value by reducing the standard error.
- Not comparable with other p-values: a smaller p-value (e.g., p = .04 versus p = .05) does not mean that one relationship is stronger than another.
- Silent as to the validity of our measures.
- Key point: a p-value is not conclusive evidence for causality; it is one piece of the larger puzzle. Statistical significance only detects whether movement in x and y is due to chance.

Note the difference between a test of statistical significance and a measure of association:

- Test of statistical significance: is the observed relationship between x and y due to chance?
- Measure of association: gauges the strength or magnitude of the observed relationship between x and y.
- A very large sample size will often result in a statistically significant effect even if the actual relationship has little to no substantive significance.

Review steps
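The Central Limit Theorem and standard error points above can be checked with a short simulation. The sketch below is only an illustration: the skewed population, the sample sizes, the number of draws, and the helper names `standard_error` and `sample_means` are all assumptions made for the example, not part of the course material. It draws many simple random samples, looks at how spread out their means are, and compares that spread with the standard error computed from a single sample.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sample std. dev. / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sample_means(population, n, draws, rng):
    """Means of `draws` simple random samples of size n (every case equally likely)."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]


rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Hypothetical population: skewed (exponential), so the variable itself is NOT normal.
population = [rng.expovariate(1.0) for _ in range(20_000)]

spread = {}  # sample size -> std. dev. of the simulated sample means
for n in (25, 100, 400):
    means = sample_means(population, n, draws=1000, rng=rng)
    spread[n] = statistics.stdev(means)
    one_sample = rng.sample(population, n)
    print(f"n={n:>3}  sd of sample means={spread[n]:.3f}  "
          f"SE from one sample={standard_error(one_sample):.3f}")
```

If the reasoning in the guide holds, the spread of the sample means should shrink as n grows, and the standard error from one sample should land close to that spread without drawing hundreds of repeated samples.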
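The five-step model can also be walked through in code. This is a minimal sketch under stated assumptions: it uses a two-sided z-test for a single mean (a normal approximation that is only reasonable for fairly large random samples), the function name `z_test_mean` is hypothetical, and the data are simulated rather than taken from any real study.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_test_mean(sample, mu0, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (large-sample approximation)."""
    # Step 1 (assumptions): simple random sample, n large enough to rely on the CLT.
    # Step 2 (null hypothesis): no difference between the population mean and mu0.
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Step 3 (sampling distribution and critical region): standard normal, |z| > critical.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4 (test statistic): how many standard errors the estimate is from mu0.
    z = (mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5 (interpret): reject the null only if z falls in the critical region.
    return {"mean": mean, "se": se, "z": z, "p": p,
            "critical": critical, "reject_null": abs(z) > critical}


# Hypothetical data: 200 simulated scores; the null says the population mean is 50.
rng = random.Random(1)
scores = [rng.gauss(52, 10) for _ in range(200)]
result = z_test_mean(scores, mu0=50)
print(result)
```

Rejecting the null when it is actually true would be a Type I error; failing to reject it when it is actually false would be a Type II error.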
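The point that a very large sample can make a substantively tiny relationship statistically significant can be shown with arithmetic alone. In this sketch the effect size (0.02), the standard deviation (1.0), and the function name `two_sided_p` are assumed values chosen for illustration; the p-value again uses the normal approximation.

```python
import math
from statistics import NormalDist


def two_sided_p(effect, sd, n):
    """p-value for H0: no difference, using z = effect / (sd / sqrt(n))."""
    z = effect / (sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny effect, same standard deviation; only the sample size changes.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  p={two_sided_p(effect=0.02, sd=1.0, n=n):.4f}")
```

The effect stays equally small in every row; only the standard error changes. That is why a p-value speaks to chance, while a measure of association is needed to judge strength.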