Front Back
Statistical Inference involves using
statistics computed from data collected in a sample to make statements (inferences) about unknown population parameters.
When constructing a confidence interval for μ, we first estimate μ by...
selecting a simple random sample from the population and computing the sample mean of the data in the sample.
The sample mean $\bar{X}$ is known as
the point estimate of μ
Sample
a subset of the population that we examine in order to gather information. Example: If VCU were an all-male school, the male students in this class would be a sample of the population.
What is added and subtracted from the point estimate to create the intervals?
a margin of error
The interval is referred to as
a confidence interval
The term confidence refers to
the amount of confidence that we have that our interval will contain μ.
With respect to the confidence interval: Since μ is unknown
we will never know for sure whether the interval contains it or not
What are the main levels of confidence used?
90%, 95%, 98%, and 99%
The only way to have 100% confidence is to
know μ
What are the assumptions used with the confidence interval?
We must have a simple random sample from the population, and either the population must be normal or the sample size must be large enough for the central limit theorem to apply.
When σ is unknown, which distribution is used?
t-distribution
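What is the 100·C% confidence interval for μ when the standard deviation σ is unknown?
$\bar{X} \pm t^{*}_{df}\,(S/\sqrt{n})$, where C is the confidence level written as a decimal (for example, C = 0.95 for 95%).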
Equation for the t confidence interval
$\bar{X} \pm t^{*}_{df}\,(S/\sqrt{n})$
Degrees of Freedom
one less than the sample size, so df = n-1
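The interval formula and the degrees of freedom above can be sketched in Python. This is only an illustration and not part of the original cards: the function name `t_confidence_interval` and the sample values are made up, and the critical value 2.262 is the usual t-table entry for df = 9 at 95% confidence. The critical value is passed in as an argument because it would normally be looked up in a t table.

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """Return (lower, upper) for x_bar ± t* · s / sqrt(n).

    t_star is the critical value for df = n - 1, looked up in a t table.
    """
    n = len(data)
    x_bar = statistics.mean(data)        # point estimate of mu
    s = statistics.stdev(data)           # sample standard deviation (divides by n - 1)
    margin = t_star * s / math.sqrt(n)   # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 measurements, so df = 9.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
lower, upper = t_confidence_interval(sample, t_star=2.262)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

A higher confidence level uses a larger t*, which widens the interval.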
We use statistics computed from data collected in a sample to
make statements (inferences) about some parameter of a population
The population of interest includes all VCU students who have bowled at least one game.
Of interest is to make an inference about the mean bowling score of all VCU students in the last game that they bowled. What is the population of interest?
The parameter of interest is μ = the mean bowling score of all VCU students in the last game that they bowled.
Of interest is to make an inference about the mean bowling score of all VCU students in the last game that they bowled. What is the parameter of interest?
General Significance Testing Procedure: Step 1
State the null and alternative hypotheses, and the significance level α that is going to be used.
General Significance Testing Procedure: Step 2
Carry out the experiment, collect the data, verify the assumptions, and if appropriate compute the value of the test statistic.
General Significance Testing Procedure: Step 3
Calculate the p-value (or rejection region).
General Significance Testing Procedure: Step 4
Make a decision on the significance of the test (reject or fail to reject H0).
General Significance Testing Procedure: Step 5
Make a conclusion statement in the words of the original problem. This is the statistical inference.
The population consists of all sales of women's swimwear in 2009, the parameter of interest is μ = the mean cost of all women's swimwear purchased in 2009.
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was \$60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than \$60. What is the Population of interest?
H0: μ = \$60
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was \$60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than \$60. What is the null hypothesis?
Ha: μ > \$60
It is conjectured that the mean cost of all women's swimwear purchased in 2009 was \$60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than \$60. What is the Alternative hypothesis?
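The five-step procedure can be sketched for these hypotheses (H0: μ = 60 versus Ha: μ > 60) with a one-sample t statistic and a rejection region. This is a hedged illustration only: the cards do not name a test statistic, so the t statistic is an assumption consistent with the σ-unknown cards; the prices are invented and are not real 2009 sales data; the significance level α = 0.05 is an assumed choice; and 1.833 is the usual upper-tail t-table value for df = 9.

```python
import math
import statistics

def one_sample_t_statistic(data, mu_0):
    """Compute t = (x_bar - mu_0) / (s / sqrt(n)), which has df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Step 1: H0: mu = 60, Ha: mu > 60, alpha = 0.05 (assumed).
mu_0 = 60

# Step 2: made-up prices, for illustration only.
prices = [62, 58, 65, 70, 61, 66, 59, 64, 68, 67]
t_stat = one_sample_t_statistic(prices, mu_0)

# Step 3: rejection region for an upper-tailed test with df = 9.
t_critical = 1.833

# Step 4: decision.
decision = "reject H0" if t_stat > t_critical else "fail to reject H0"

# Step 5: the conclusion would then be stated in the words of the problem.
print(f"t = {t_stat:.3f}, decision: {decision}")
```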
Population
the entire group of individuals (subjects) about which the researcher wants information. - Examples: All U.S. citizens, all male students at this university, all sections of all courses taught this semester at this university.
Parameter
some characteristic of the population that the researcher wants to measure - Examples: Proportion of U.S. citizens who voted in the last Presidential election, average (mean) height of all male students at this university, proportion of all sections of all courses taught by adjunct (par…
Statistic
A descriptive measure, usually computed from a sample, which can be expressed or evaluated numerically. - Example: If VCU were an all-male school, the average height of the male students in this class would be a statistic.
Inference
A statement about a population based on the data collected in the sample. One type of inference is using a sample statistic to estimate a population parameter.
Distribution
A listing of all the possible values that a characteristic can take and the number (or percentage) of times that each value occurs. A major component of statistics involves describing the distribution of a set of data.
Descriptive Statistics
Branch of statistics concerned with numerical and graphical techniques for describing one or more characteristics of a population.
Descriptive Statistics
Branch of statistics concerned with numerical and graphical techniques for describing one or more characteristics of a population and for comparing characteristics among a population.
Inferential Statistics
Branch of statistics in which we use data and statistics computed from a sample to make inferences about a population.
Replication (repetition)
Repeat the measurement several times
Constant
Measurements of some characteristic do not change in repeated trials.
Variable
Measurements of some characteristic vary from trial to trial
Qualitative (or categorical) variable
Measurements vary in kind/type/name but not in degree, meaning that they cannot be arranged in order of magnitude (Gender, Eye Color, Social Security Number)
Quantitative Variable
Measurements vary in magnitude from trial to trial, meaning some order or ranking can be applied. (Number of students, Weight, Grades)
Quantitative Variables
Variables in which measurements vary in magnitude from trial to trial, meaning some order and ranking can be applied. Possible measurements are divided into class intervals. Each measurement should fall in one and exactly one interval.
Discrete Quantitative Variable
Variable whose measurements can only assume a countable number of possible values (Number of students in a specific class, Number of cars in a parking deck)
Continuous Quantitative Variable
Variable whose measurements can assume any one of a countless number of values in a line interval. It is usually either a measurable quantity or something that is calculated, such as rates, averages, proportions, and percentages.
Representative
Describes a sample whose important characteristics are nearly the same as those of the population.
Bias
Exists when some subjects or outcomes are systematically favored over others.
Selection Bias
When one or more types of subjects are systematically excluded from the sample.
Nonresponse Bias
When individuals chosen for the sample can't be contacted or fail (or refuse) to respond.
Response Bias
When the respondents give inaccurate information (especially on questions that involve legal or social behavior issues) or when the interviewer influences the subject to respond in a certain way through the wording of the question.
Haphazard Sample
involves selecting a sample by some convenient mechanism that does not involve randomization
Volunteer Response Sample
Exists when people volunteer to be part of a study
Probability sampling designs
Each member of the population has a known, positive probability (chance) of being selected for the sample. In simple random sampling these probabilities are equal.
Simple Random Sampling
Make a list of all possible individuals in the population and randomly choose n of the subjects in such a way that every set of n subjects has an equal chance to be the sample (n is the sample size). The interviewer has no discretion.
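As a rough sketch of this definition, Python's standard `random.sample` draws n distinct items so that every set of n is equally likely. The helper name `simple_random_sample` and the list of 500 ID numbers are hypothetical, introduced only for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n distinct subjects so that every set of n is equally likely."""
    rng = random.Random(seed)   # the seed is only there for reproducibility
    return rng.sample(population, n)

# Hypothetical list of all individuals: 500 student ID numbers.
frame = list(range(1, 501))
chosen = simple_random_sample(frame, n=25, seed=1)
print(sorted(chosen))
```

The random number generator plays the role of a table of random digits, so the person drawing the sample has no say in who is chosen.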
Table of Random Digits
a randomly generated set of digits used to randomly select subjects for the sample.
Stratified random sampling
sampling in which the population is naturally divided into 2 or more groups of similar subjects, called strata, and a representative number of subjects is selected from each stratum.
Strata
groups of similar subjects
Multistage Random Sampling
Sampling in which the population is divided into clusters (groups) of individuals and simple random sampling is used to randomly select several of these clusters
Experimental Units
The subjects (individuals, units) on which the measurements are made
Control Group
Group of experimental units that do not receive the treatment
Blinding
Occurs when the experimental units do not know to which group they have been assigned
Double-blinding
Occurs when neither the experimental units nor the people who conduct the experiment and have contact with the experimental units know to which group the experimental units have been assigned
Confounding
existence of some factor other than the treatment that makes the treatment and control groups different
Observational Study
a procedure in which we cannot (or do not) control which experimental units are assigned to the two groups and hence only observe anecdotal evidence.
4 things when describing a distribution
Center, spread, shape, and unusual features.
Qualitative Variables
Measurements vary in name or kind only, and cannot be ranked in any order of magnitude. Displayed with pie charts or bar graphs.
Stem and Leaf Plot
Determine center of distribution, determine range or spread of data, determine shape of distribution.
Advantages of stem and leaf plot
Displays the distribution of the data; can be used to determine center, spread, shape, and unusual features of the distribution. Retains the actual data, is easy to construct, and makes sorting the data easier.
Disadvantages of stem and leaf plot
Not very effective for large data sets. Choice of stems depends on data type and data range.
Histograms
Unlike a stem and leaf plot, a histogram does not retain the original data.
How to: Histogram
1. Determine number of class intervals to use. 2. Determine the range of the data by subtracting the smallest observation from the largest observation. 3. Divide range by number of class intervals and round to a convenient number. This will be the equal class width.
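The three steps above can be sketched in Python. This is an illustration under stated assumptions: the function name `histogram_classes` and the scores are made up, "round to a convenient number" is interpreted here as rounding up to a whole number, and the classes start at the smallest observation.

```python
import math

def histogram_classes(data, k):
    """Return (width, edges, counts) for k equal-width class intervals."""
    smallest = min(data)
    data_range = max(data) - smallest            # step 2: largest minus smallest
    width = max(1, math.ceil(data_range / k))    # step 3: divide, then round up
    counts = [0] * k
    for x in data:
        # Each measurement falls in exactly one interval; the largest value
        # is placed in the last class if it lands on the final boundary.
        i = min(int((x - smallest) // width), k - 1)
        counts[i] += 1
    edges = [smallest + i * width for i in range(k + 1)]
    return width, edges, counts

# Hypothetical exam scores, with k = 5 class intervals chosen in step 1.
scores = [52, 67, 71, 75, 78, 80, 83, 85, 88, 91, 94, 99]
width, edges, counts = histogram_classes(scores, k=5)
print(width, edges, counts)
```

The counts are the bar heights of the histogram. The individual scores cannot be recovered from them.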