UW-Madison STAT 333 - Final Review


Statistics 333 Semester Review, Spring 2003
Bret Larget, May 7, 2003

Chapter 1 — Statistical Inference

• causal inference — To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information).

• inference to populations — Generalizations to populations are justified by statistics alone only under random sampling. Otherwise, generalizations may be speculative and can only be justified by subjective judgment.

• null and alternative hypotheses — A null hypothesis is usually a simple statement that there is no effect or no difference between groups, and is formally an equation about a parameter. We might find data to be consistent with a null hypothesis, but we do not prove a null hypothesis to be true. We may fail to reject a null hypothesis, but we do not accept it based on data. An alternative hypothesis is what we accept if we are able to reject a null hypothesis. Formally, it is usually an inequality about one or more parameters.

• test statistic — A test statistic is something that can be calculated from sample data. Statistical inferences are based on the null distribution of the test statistic: the distribution the statistic would have if we were able to take many samples of the same size and if the null hypothesis were true. In an experiment where individuals are randomly allocated, we can also consider the randomization distribution of the test statistic.

• p-value — A p-value is the probability that the test statistic would take on a value at least as extreme (in reference to the alternative hypothesis) as that from the actual data, assuming that the null hypothesis is true. Small p-values indicate evidence against the null hypothesis. P-values are probabilities about the values of test statistics, but are not probabilities of hypotheses directly.

• permutation and randomization tests — In a randomization test, we ask how unusual the results of an experiment are compared to all possible randomizations. A permutation test is identical in practice, but differs in interpretation because the groups are observed instead of randomly assigned.

• confounding — Two variables are confounded when we cannot distinguish between their effects.

Chapter 2 — t-Distribution Methods

• standard error — The standard error of a statistic is the estimated standard deviation of its sampling distribution.

• Z-ratio and t-ratio — The Z-ratio (z-score) of a statistic that is an estimator for a parameter is (estimate - parameter)/(standard deviation). In many situations the Z-ratio will have an (approximate) standard normal distribution. The t-ratio is (estimate - parameter)/(standard error). The difference is that the t-ratio has variability in both the numerator (estimate) and the denominator (standard error), so the t-ratio is a more variable statistic than the Z-ratio. In many situations the t-ratio will have a t distribution with a number of degrees of freedom that is a function of the sample size.

• paired samples versus two independent samples — In a paired setting, pairs of observations are sampled together. It is appropriate to take differences within each pair and to base inferences on the distribution of these differences. When there are two independent samples, inferences are based on the distribution of the difference of sample means, which has a different standard error than the mean of paired differences. In a paired design, the effects of confounding factors are often reduced more efficiently than by randomization.

• pooled standard deviation — If the model assumes that there is a common standard deviation, then it makes sense to pool information from all samples to estimate it. This is an assumption of ANOVA and regression as well. However, if in a two-sample setting there is a large discrepancy in variability in the two samples, it is best not to pool.

• confidence intervals — A 95% confidence interval is made from a procedure that will contain the parameter for 95% of all samples. The values in a 95% confidence interval are precisely those for which the corresponding two-sided hypothesis test would have a p-value larger than 0.05.

Chapter 3 — Assumptions

• robustness — A statistical procedure is robust if it is still valid when assumptions are not met.

• resistant — A statistical procedure is resistant when it does not change much when part of the data changes, perhaps drastically.

• outlier — An outlier is an observation that is far away from the rest of the group.

• transformations — Transformed variables (logs are a common choice) often come closer to fitting model assumptions.

Chapter 4 — Alternative Methods

• ranks — Transforming observations to ranks is an alternative to t tools. This transformation may be especially useful with censored data, where an observation may not be known exactly, but its value may be known to be larger than some value. (For example, if the variable being measured is survival time and the individual has not died by the end of the study, the survival time is censored.)

• permutation test — In a permutation test, the p-value is a proportion of regroupings.

• sign test — For paired data, we can consider randomizing the sign of each paired difference (or permuting the groups within each pair) to find a p-value.

• normal approximation — P-values from permutation tests can be found by a normal approximation.

Chapter 5 — One-way ANOVA

• pooled estimates of SD — If there are two or more samples and we assume constant variance in the populations, the best estimate of the common variance is the pooled variance, s_p^2, which is a weighted average of the sample variances, weighted by their degrees of freedom.

• degrees of freedom — For a random sample, the degrees of freedom are one less than the sample size, n - 1.

• one-way ANOVA — In a one-way ANalysis Of VAriance, there is a single categorical explanatory variable and a quantitative response variable. The null hypothesis is that all population means are equal. To test this hypothesis, we use an F-test. The ANOVA table is an accounting method for computing the F test statistic.

• F-distribution — An F distribution has both numerator and denominator degrees of freedom. The mean is near one. P-values in F-tests are areas to the right.

• extra-sum-of-squares F-test — An extra sum of squares F-test is for comparing a full model to a reduced (nested) model: it tests whether the additional parameters in the full model explain more variation than would be expected by chance.
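The randomization/permutation test idea from Chapters 1 and 4 can be sketched in a few lines of Python. This is only an illustration, not course software: the `treatment` and `control` data are invented, and we sample random regroupings rather than enumerating all of them, so the p-value is approximate.

```python
import random

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means.

    The p-value is the proportion of regroupings whose difference in means
    is at least as extreme as the observed difference.
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        new_x, new_y = pooled[:len(x)], pooled[len(x):]
        diff = sum(new_x) / len(new_x) - sum(new_y) / len(new_y)
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm

# Hypothetical data: two small groups with clearly different means.
treatment = [8.2, 9.1, 7.9, 8.8, 9.4]
control = [5.1, 6.0, 5.7, 4.9, 5.5]
p = permutation_pvalue(treatment, control)
```

For a randomized experiment the same computation is a randomization test; only the interpretation changes, as noted above.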
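The pooled standard deviation and the equal-variance two-sample t-ratio from Chapter 2 can be sketched as follows (function names and data are illustrative, not from the course):

```python
import math

def sample_variance(s):
    """Usual unbiased sample variance with n - 1 in the denominator."""
    m = sum(s) / len(s)
    return sum((v - m) ** 2 for v in s) / (len(s) - 1)

def pooled_sd(samples):
    """Pooled SD: sample variances weighted by their degrees of freedom (n_i - 1)."""
    num = sum((len(s) - 1) * sample_variance(s) for s in samples)
    den = sum(len(s) - 1 for s in samples)
    return math.sqrt(num / den)

def two_sample_t(x, y):
    """Equal-variance t-ratio: (estimate - 0) / (standard error of the estimate)."""
    sp = pooled_sd([x, y])
    se = sp * math.sqrt(1 / len(x) + 1 / len(y))
    return (sum(x) / len(x) - sum(y) / len(y)) / se
```

Pooling assumes a common population standard deviation; as the notes say, with a large discrepancy in sample variability it is best not to pool.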
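One way to sketch the normal approximation for the sign test (Chapter 4): under the null hypothesis, the count of positive paired differences is Binomial(n, 1/2), which we standardize to a z-score. This substitutes the binomial null distribution for literally enumerating sign randomizations, omits a continuity correction, and assumes no zero differences.

```python
import math

def sign_test_z(diffs):
    """Z-ratio for the sign test: under H0, the number of positive paired
    differences is Binomial(n, 1/2) with mean n/2 and SD sqrt(n/4).
    Zero differences are assumed absent."""
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)
    return (k - n / 2) / math.sqrt(n / 4)

def two_sided_normal_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 16 paired differences that are all positive give z = 4, strong evidence against the null hypothesis of no difference.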
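The ANOVA-table accounting for the one-way F statistic in Chapter 5 can be sketched as below: split the total sum of squares into between-group and within-group pieces, divide each by its degrees of freedom, and take the ratio. The group data in the test are invented for illustration.

```python
def anova_f(groups):
    """One-way ANOVA F statistic via the ANOVA-table accounting:
    F = (between-group SS / (k - 1)) / (within-group SS / (n - k))."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared deviations
    # of group means from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

Note that the within-group mean square here is exactly the pooled variance s_p^2 from the bullet above; the p-value is the area to the right of F under an F distribution with (k - 1, n - k) degrees of freedom.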

