Final Study Guide
68 Cards in this Set
Front | Back |
---|---|
Statistical Inference involves using
|
statistics computed from data collected in a sample to make statements (inferences) about unknown population parameters.
|
When constructing a confidence interval, we must first estimate μ by...
|
selecting a simple random sample from the population and computing the sample mean for the data in the sample.
|
The sample mean X̄ is known as
|
the point estimate of μ
|
Sample
|
a subset of the population that we examine in order to gather information
- Example: If VCU were an all-male school, the male students in this class would be a sample of the population.
|
What is added and subtracted from the point estimate to create the intervals?
|
a margin of error
|
The interval is referred to as
|
a confidence interval
|
The term confidence refers to
|
the amount of confidence that we have that our interval will contain μ.
|
With respect to the confidence interval: Since μ is unknown
|
we will never know for sure whether the interval contains it or not
|
What are the main levels of confidence used?
|
90%, 95%, 98%, and 99%
|
The only way to have 100% confidence is to
|
know μ
|
What are the assumptions used with the confidence interval?
|
We must have a simple random sample from the population. In addition, either the population must be normal or the sample size must be large enough for the central limit theorem to apply.
|
When σ is unknown, which distribution is used?
|
t-distribution
|
What is the confidence interval for μ when the standard deviation σ is unknown?
|
A 100*C% confidence interval for μ: X̄ ± t*_df · (S/√n), where C is the confidence level
|
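Equation for the confidence interval based on the t-distribution
|
X̄ ± t*_df · (S/√n), where X̄ is the sample mean, S is the sample standard deviation, n is the sample size, and t*_df is the t critical value with df degrees of freedom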
|
Degrees of Freedom
|
one less than the sample size, so df = n-1
|
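The interval X̄ ± t*_df · (S/√n) can be sketched in a few lines of Python. This is only an illustration: the function name `t_confidence_interval` and the bowling scores are made up, and the critical value t* is passed in as a number read from a t table rather than computed.

```python
import math
import statistics

def t_confidence_interval(sample, t_star):
    """Return (lower, upper) for the interval x_bar ± t* · S/√n.

    t_star is the critical value for df = n - 1 at the chosen confidence
    level. It is an input here (looked up in a t table), not computed.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)      # point estimate of μ
    s = statistics.stdev(sample)         # sample standard deviation S
    margin = t_star * s / math.sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical bowling scores from a simple random sample of n = 10 students.
scores = [112, 98, 135, 120, 101, 143, 90, 127, 118, 106]

# df = 10 - 1 = 9; the 95% critical value from a t table is 2.262.
low, high = t_confidence_interval(scores, t_star=2.262)
print(f"95% confidence interval for μ: ({low:.1f}, {high:.1f})")
```

The point estimate sits at the center of the interval, and the margin of error is the amount added and subtracted on either side.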
We use statistics computed from data collected in a sample to
|
make statements (inferences) about some parameter of a population
|
The population of interest includes all VCU students who have bowled at least one game.
|
We want to make an inference about the mean bowling score of all VCU students in the last game that they bowled.
What is the population of interest?
|
The parameter of interest is μ = the mean bowling score of all VCU students in the last game that they bowled.
|
We want to make an inference about the mean bowling score of all VCU students in the last game that they bowled.
What is the parameter of interest?
|
General Significance Testing Procedure: Step 1
|
State the null and alternative hypotheses, and the significance level α that is going to be used.
|
General Significance Testing Procedure: Step 2
|
Carry out the experiment, collect the data, verify the assumptions, and if appropriate compute the value of the test statistic.
|
General Significance Testing Procedure: Step 3
|
Calculate the p-value (or rejection region).
|
General Significance Testing Procedure: Step 4
|
Make a decision on the significance of the test (reject or fail to reject H0).
|
General Significance Testing Procedure: Step 5
|
Make a conclusion statement in the words of the original problem. This is the statistical inference.
|
The population consists of all sales of women's swimwear in 2009, the parameter of interest is μ = the mean cost of all women's swimwear purchased in 2009.
|
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the population of interest?
|
H0: μ = $60
|
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the null hypothesis?
|
Ha: μ > $60
|
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the alternative hypothesis?
|
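The five-step procedure can be walked through for the swimwear hypotheses using the rejection-region option from Step 3. The sketch below is illustrative only: the prices are invented, α = 0.05 is an assumed significance level, the helper name `one_sample_t_statistic` is hypothetical, and the critical value is read from a t table rather than computed.

```python
import math
import statistics

def one_sample_t_statistic(sample, mu_0):
    """Test statistic t = (x_bar - mu_0) / (S / sqrt(n)) for H0: μ = mu_0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Step 1: H0: μ = 60 versus Ha: μ > 60, with α = 0.05 (assumed).
mu_0 = 60

# Step 2: hypothetical prices in dollars from a simple random sample, n = 10.
prices = [72, 58, 65, 80, 61, 69, 55, 77, 63, 70]
t_stat = one_sample_t_statistic(prices, mu_0)

# Step 3: rejection region for an upper-tailed test with df = 9, α = 0.05.
# The critical value 1.833 comes from a t table.
t_critical = 1.833

# Step 4: reject H0 when the test statistic falls in the rejection region.
reject_h0 = t_stat > t_critical

# Step 5: state the conclusion in the words of the original problem.
if reject_h0:
    print(f"t = {t_stat:.2f} > {t_critical}: evidence that the mean cost exceeded $60.")
else:
    print(f"t = {t_stat:.2f} <= {t_critical}: not enough evidence that the mean cost exceeded $60.")
```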
Population
|
the entire group of individuals (subjects) about which the researcher wants information.
- Examples: All U.S. citizens, all male students at this university, all sections of all courses taught this semester at this university.
|
Parameter
|
some characteristic of the population that the researcher wants to measure
- Examples: Proportion of U.S. citizens who voted in the last Presidential election, average (mean) height of all male students at this university, proportion of all sections of all courses taught by adjunct (par…
|
Statistic
|
A descriptive measure, usually computed from a sample, which can be expressed or evaluated numerically.
- Example: If VCU were an all-male school, the average height of the male students in this class would be a statistic.
|
Inference
|
A statement about a population based on the data collected in the sample. One type of inference is using a sample statistic to estimate a population parameter.
|
Distribution
|
A listing of all the possible values that a characteristic can take and the number (or percentage) of times that each value occurs. A major component of statistics involves describing the distribution of a set of data.
|
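As a small illustration of this definition, a distribution can be tabulated by counting how often each value occurs. The eye-color data and the function name `frequency_distribution` below are hypothetical.

```python
from collections import Counter

def frequency_distribution(values):
    """Return {value: (count, percentage)} for each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {v: (c, 100 * c / total) for v, c in sorted(counts.items())}

# Hypothetical eye colors recorded for eight students.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

distribution = frequency_distribution(eye_colors)
for value, (count, pct) in distribution.items():
    print(f"{value:<6} {count:>2} {pct:5.1f}%")
```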
Descriptive Statistics
|
Branch of statistics concerned with numerical and graphical techniques for describing one or more characteristics of a population.
|
Descriptive Statistics
|
Branch of statistics concerned with numerical and graphical techniques for describing one or more characteristics of a population and for comparing characteristics among a population.
|
Inferential Statistics
|
Branch of statistics in which we use data and statistics computed from a sample to make inferences about a population.
|
Replication (repetition)
|
Repeat the measurement several times
|
Constant
|
Measurements of some characteristic do not change in repeated trials.
|
Variable
|
Measurements of some characteristic vary from trial to trial
|
Qualitative (or categorical) variable
|
Measurements vary in kind/type/name but not in degree, meaning that they cannot be arranged in order of magnitude (Gender, Eye Color, Social Security Number)
|
Quantitative Variable
|
Measurements vary in magnitude from trial to trial, meaning some order or ranking can be applied. (Number of students, Weight, Grades)
|
Quantitative Variables
|
Variables in which measurements vary in magnitude from trial to trial, meaning some order and ranking can be applied. Possible measurements are divided into class intervals. Each measurement should fall in exactly one interval.
|
Discrete Quantitative Variable
|
Variable whose measurements can only assume a countable number of possible values (Number of students in a specific class, Number of cars in a parking deck)
|
Continuous Quantitative Variable
|
Variable whose measurements can assume any one of a countless number of values in a line interval. It is usually either a measurable quantity or something that is calculated, such as rates, averages, proportions, and percentages.
|
Representative
|
Describes a sample whose important characteristics are nearly the same as those of the population
|
Bias
|
Exists when some subjects or outcomes are systematically favored over others.
|
Selection Bias
|
When one or more types of subjects are systematically excluded from the sample.
|
Nonresponse Bias
|
When individuals chosen for the sample can't be contacted or fail (or refuse) to respond.
|
Response Bias
|
When the respondents give inaccurate information (especially on questions that involve legal or social behavior issues) or when the interviewer influences the subject to respond in a certain way through the wording of the question.
|
Haphazard Sample
|
involves selecting a sample by some convenient mechanism that does not involve randomization
|
Volunteer Response Sample
|
Exists when people volunteer to be part of a study
|
Probability sampling designs
|
Each member of the population has a known, positive probability (chance) of being selected for the sample. In simple random sampling these probabilities are equal.
|
Simple Random Sampling
|
make a list of all possible individuals in the population & randomly choose n of the subjects in such a way that every set of n subjects has an equal chance to be in the sample (n is the sample size). Interviewer has no discretion.
|
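A simple random sample can be drawn in code with Python's standard `random.sample`, which selects n items without replacement so that every set of n subjects is equally likely. The roster and the wrapper name `simple_random_sample` are hypothetical; the seed is only there to make the illustration repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n subjects without replacement; the caller has no say in who is picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical list of all 200 individuals in the population.
roster = [f"student_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(roster, n=10, seed=1)
print(sample)
```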
Table of Random Digits
|
a randomly generated set of digits used to randomly select subjects for the sample.
|
Stratified random sampling
|
sampling in which the population is naturally divided into 2 or more groups of similar subjects, called strata, and a representative number of subjects are selected from each stratum.
|
Strata
|
groups of similar subjects
|
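One way to sketch stratified random sampling is to take a simple random sample inside each stratum. The code below assumes that "a representative number" means the same fraction of every stratum (proportional allocation), which is one common choice and not the only one. The strata and the name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction of subjects from each stratum.

    strata maps a stratum name to the list of subjects in that stratum.
    Proportional allocation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    chosen = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        chosen[name] = rng.sample(members, k)
    return chosen

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"fr_{i}" for i in range(120)],
    "sophomore": [f"so_{i}" for i in range(80)],
    "junior": [f"jr_{i}" for i in range(60)],
    "senior": [f"sr_{i}" for i in range(40)],
}

picked = stratified_sample(strata, fraction=0.1, seed=1)
for name, members in picked.items():
    print(name, len(members))
```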
Multistage Random Sampling
|
Sampling in which the population is divided into clusters (groups) of individuals and simple random sampling is used to randomly select several of these clusters
|
Experimental Units
|
The subjects (individuals, units) on which the measurements are made
|
Control Group
|
Group of experimental units that do not receive the treatment
|
Blinding
|
Occurs when the experimental units do not know to which group they have been assigned
|
Double-blinding
|
Occurs when neither the experimental units nor the people who conduct the experiment and have contact with the experimental units know to which group the experimental units have been assigned
|
Confounding
|
existence of some factor other than the treatment that makes the treatment and control groups different
|
Observational Study
|
a procedure in which we cannot (or do not) control which experimental units are assigned to the two groups and hence only observe anecdotal evidence.
|
4 things when describing a distribution
|
Center, Spread, Shape, Unusual Features.
|
Qualitative Variables
|
Measurements vary in name or kind only, and cannot be ranked in any order of magnitude. Displayed with pie charts or bar graphs.
|
Stem and Leaf Plot
|
Determine center of distribution, determine range or spread of data, determine shape of distribution.
|
Advantages of stem and leaf plot
|
Displays the distribution of the data; can be used to determine center, spread, shape, and unusual features of the distribution. Retains the actual data, is easy to construct, and makes sorting the data easier.
|
Disadvantages of stem and leaf plot
|
Not very effective for large data sets. Choice of stems depends on data type and data range.
|
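A stem and leaf plot is simple enough to build by hand, and the same idea can be sketched in code. The version below assumes non-negative whole-number data, with the tens as stems and the ones digit as leaves. The exam scores and the name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} using tens as stems and ones digits as leaves."""
    plot = defaultdict(list)
    for value in sorted(data):          # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical exam scores.
scores = [72, 85, 91, 68, 77, 85, 93, 79, 64, 88]

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because every leaf is an actual digit from the data, the original values can be read back off the plot, which is the "retains actual data" advantage.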
Histograms
|
Unlike a stem and leaf plot, a histogram does not retain the original data.
|
How to: Histogram
|
1. Determine number of class intervals to use.
2. Determine the range of the data by subtracting the smallest observation from the largest observation.
3. Divide range by number of class intervals and round to a convenient number. This will be the equal class width.
|
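The three steps above can be sketched as follows for whole-number data. The rounding rule is an assumption for this sketch: the class width is taken as the next whole number above range / (number of classes), so that the largest observation still falls inside the last interval. The data and the name `histogram_classes` are hypothetical.

```python
def histogram_classes(data, num_classes):
    """Return (width, edges, counts) for equal-width class intervals.

    Assumes whole-number data. Each interval includes its lower edge and
    excludes its upper edge, so every observation falls in exactly one class.
    """
    low = min(data)
    data_range = max(data) - low                # step 2: largest minus smallest
    width = data_range // num_classes + 1       # step 3: round up to a whole number
    edges = [low + i * width for i in range(num_classes + 1)]
    counts = [0] * num_classes
    for x in data:
        counts[(x - low) // width] += 1         # index of the class containing x
    return width, edges, counts

# Hypothetical exam scores; step 1: choose 5 class intervals.
scores = [72, 85, 91, 68, 77, 85, 93, 79, 64, 88]

width, edges, counts = histogram_classes(scores, num_classes=5)
for i, count in enumerate(counts):
    print(f"[{edges[i]}, {edges[i + 1]}): {'*' * count}")
```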