Final Study Guide
68 Cards in this Set
Front  Back 

Statistical Inference involves using

statistics computed from data collected in a sample to make statements (inferences) about unknown population parameters.

When trying to find a confidence interval, we must first estimate μ by...

selecting a simple random sample from the population and computing the sample mean of the data in the sample.

The sample mean X̄ is known as

the point estimate of μ

Sample

a subset of the population that we examine in order to gather information
Example: If VCU were an all-male school, the male students in this class would be a sample of the population.

What is added and subtracted from the point estimate to create the intervals?

a margin of error

The interval is referred to as

a confidence interval

The term confidence refers to

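how confident we are that our interval contains μ. More precisely, a 95% confidence level means that if we took many samples and built an interval from each one, about 95% of those intervals would contain μ.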
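The repeated-sampling meaning of "confidence" can be checked by simulation. This is a minimal sketch under stated assumptions: σ is treated as known, so it uses the normal-based interval X̄ ± z*·σ/√n rather than the t-interval, and the population is assumed normal with made-up values μ = 100 and σ = 15. The function names `z_interval` and `coverage` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, conf=0.95):
    """Return (lower, upper) = x̄ ± z* · σ/√n, assuming σ is known."""
    n = len(sample)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95
    margin = z_star * sigma / n ** 0.5                  # margin of error
    x_bar = mean(sample)                                # point estimate of μ
    return x_bar - margin, x_bar + margin


def coverage(mu=100.0, sigma=15.0, n=25, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated intervals that actually contain the true μ."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # hypothetical normal population
        lower, upper = z_interval(sample, sigma, conf)
        if lower <= mu <= upper:
            hits += 1
    return hits / reps


print(coverage())  # near 0.95; any single interval either contains μ or it does not
```

In practice μ is unknown, which is exactly why a single interval can never be verified; the simulation only works because μ is chosen by us.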

With respect to the confidence interval: Since μ is unknown

we will never know for sure whether the interval contains it or not

What are the main levels of confidence used?

90%, 95%, 98%, and 99%

The only way to have 100% confidence is to

know μ

What are the assumptions used with the confidence interval?

We must have a simple random sample from the population, and either the population must be normal or the sample size must be large enough for the central limit theorem to apply.

When σ is unknown, which distribution is used?

the t-distribution

What is the confidence interval for μ when the standard deviation σ is unknown?

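A 100·C% confidence interval (C is the confidence level, e.g. C = 0.95 for a 95% interval) built from the t-distribution.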

Equation for the t-distribution confidence interval

$\bar{X} \pm t^{*}_{df}\,\dfrac{S}{\sqrt{n}}$

Degrees of Freedom

one less than the sample size, so df = n − 1
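The t-interval formula and the degrees of freedom can be combined in a short sketch. It assumes the critical value t* is looked up in a t-table rather than computed (the Python standard library has no t-distribution); `t_interval` and the data values are hypothetical.

```python
from statistics import mean, stdev


def t_interval(data, t_star):
    """Return (df, lower, upper) for X̄ ± t*_df · S/√n.

    t_star is NOT computed here: look it up in a t-table for
    df = n - 1 and the chosen confidence level.
    """
    n = len(data)
    df = n - 1                      # degrees of freedom: one less than the sample size
    x_bar = mean(data)              # point estimate of μ
    s = stdev(data)                 # sample standard deviation S (divides by n - 1)
    margin = t_star * s / n ** 0.5  # margin of error
    return df, x_bar - margin, x_bar + margin


# Hypothetical measurements, n = 10; t* = 2.262 is the table value for df = 9 at 95%.
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
print(t_interval(data, t_star=2.262))  # df = 9, interval ≈ (12.369, 14.631)
```

Passing `t_star` in keeps the sketch dependency-free; a statistics library could supply it instead.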

We use statistics computed from data collected in a sample to

make statements (inferences) about some parameter of a population

Of interest is to make inference about the mean bowling score of all VCU students in the last game that they bowled.
What is the population of interest?

The population of interest includes all VCU students who have bowled at least one game.

Of interest is to make inference about the mean bowling score of all VCU students in the last game that they bowled.
What is the parameter of interest?

The parameter of interest is μ = the mean bowling score of all VCU students in the last game that they bowled.

General Significance Testing Procedure: Step 1

State the null and alternative hypotheses, and the significance level α that is going to be used.

General Significance Testing Procedure: Step 2

Carry out the experiment, collect the data, verify the assumptions, and if appropriate compute the value of the test statistic.

General Significance Testing Procedure: Step 3

Calculate the p-value (or rejection region).

General Significance Testing Procedure: Step 4

Make a decision on the significance of the test (reject or fail to reject H₀).

General Significance Testing Procedure: Step 5

Make a conclusion statement in the words of the original problem. This is the statistical inference.
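Steps 2 through 4 can be sketched for an upper-tail test of H₀: μ = μ₀ versus Hₐ: μ > μ₀ using the rejection-region route from Step 3. This assumes the standard one-sample t statistic t = (X̄ − μ₀)/(S/√n) and a critical value taken from a t-table; `upper_tail_t_test` and the price list are hypothetical, not real data. Step 5, the conclusion in the words of the problem, is left to the reader.

```python
from statistics import mean, stdev


def upper_tail_t_test(data, mu0, t_crit):
    """H0: μ = mu0 vs Ha: μ > mu0, decided with a rejection region.

    t_crit is the upper-tail critical value for df = n - 1 and the chosen α,
    looked up in a t-table (not computed here).
    """
    n = len(data)
    # Step 2: one-sample t statistic, t = (X̄ - mu0) / (S / √n)
    t_stat = (mean(data) - mu0) / (stdev(data) / n ** 0.5)
    # Steps 3-4: reject H0 if t falls in the rejection region t > t_crit
    decision = "reject H0" if t_stat > t_crit else "fail to reject H0"
    return t_stat, decision


# Hypothetical prices (NOT real 2009 data); t_crit = 1.833 for df = 9, α = 0.05.
prices = [58, 72, 65, 61, 70, 66, 59, 68, 63, 71]
t_stat, decision = upper_tail_t_test(prices, mu0=60, t_crit=1.833)
print(round(t_stat, 2), decision)  # 3.36 reject H0
```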

It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the population of interest?

The population consists of all sales of women's swimwear in 2009; the parameter of interest is μ = the mean cost of all women's swimwear purchased in 2009.

It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the null hypothesis?

H₀: μ = $60

It is conjectured that the mean cost of all women's swimwear purchased in 2009 was $60, and of interest is to test this conjecture versus the alternative that the mean cost of all women's swimwear purchased in 2009 was actually greater than $60.
What is the alternative hypothesis?

Hₐ: μ > $60

Population

the entire group of individuals (subjects) about which the researcher wants information.
 Examples: All U.S. citizens, all male students at this university, all sections of all courses taught this semester at this university.

Parameter

some characteristic of the population that the researcher wants to measure
 Examples: Proportion of U.S. citizens who voted in the last Presidential election, average (mean) height of all male students at this university, proportion of all sections of all courses taught by adjunct (par…

Statistic

A descriptive measure, usually computed from a sample, which can be expressed or evaluated numerically.
 Example: If VCU were an all-male school, the average height of the male students in this class would be a statistic.

Inference

A statement about a population based on the data collected in the sample. One type of inference is using a sample statistic to estimate a population parameter.

Distribution

A listing of all the possible values that a characteristic can take and the number (or percentage) of times that each value occurs. A major component of statistics involves describing the distribution of a set of data.
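A distribution in this sense is a frequency table, which can be built in a few lines. This is a minimal sketch; `frequency_table` and the eye-color data are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """List each distinct value with its count and its percentage of the total."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total)
            for value, count in sorted(counts.items())]


eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
for value, count, pct in frequency_table(eye_colors):
    print(f"{value:<6} {count:>2} {pct:5.1f}%")
```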

Descriptive Statistics

Branch of statistics concerned with numerical and graphical techniques for describing one or more characteristics of a population and for comparing characteristics among a population.

Inferential Statistics

Branch of statistics in which we use data and statistics computed from a sample to make inferences about a population.

Replication (repetition)

Repeat the measurement several times

Constant

Measurements of some characteristic do not change in repeated trials.

Variable

Measurements of some characteristic vary from trial to trial

Qualitative (or categorical) variable

Measurements vary in kind/type/name but not in degree, meaning that they cannot be arranged in order of magnitude (Gender, Eye Color, Social Security Number)

Quantitative Variable

Measurements vary in magnitude from trial to trial, meaning some order or ranking can be applied. (Number of students, Weight, Grades)

Quantitative Variables

Variables in which measurements vary in magnitude from trial to trial, meaning some order and ranking can be applied. Possible measurements are divided into class intervals. Each measurement should fall in one and exactly one interval.

Discrete Quantitative Variable

Variable whose measurements can only assume a countable number of possible values (Number of students in a specific class, Number of cars in a parking deck)

Continuous Quantitative Variable

Variable whose measurements can assume any one of an uncountable number of values in a line interval. It is usually either a measurable quantity or something that is calculated, such as rates, averages, proportions, and percentages.

Representative

Describes a sample whose important characteristics are nearly the same as those of the population.

Bias

Exists when some subjects or outcomes are systematically favored over others.

Selection Bias

When one or more types of subjects are systematically excluded from the sample.

Nonresponse Bias

When individuals chosen for the sample can't be contacted or fail (or refuse) to respond.

Response Bias

When respondents give inaccurate information (especially on questions that involve legal or social behavior issues), or when the interviewer or the wording of the question influences the subject to respond in a certain way.

Haphazard Sample

involves selecting a sample by some convenient mechanism that does not involve randomization

Volunteer Response Sample

Exists when people volunteer to be part of a study

Probability sampling designs

Each member of the population has a known, positive probability (chance) of being selected for the sample; in simple random sampling that probability is the same for every member.

Simple Random Sampling

make a list of all possible individuals in the population & randomly choose n of the subjects in such a way that every set of n subjects has an equal chance to be in the sample (n is the sample size). Interviewer has no discretion.
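Simple random sampling maps directly onto Python's `random.sample`, which draws without replacement so that every set of n subjects is equally likely. A minimal sketch; `simple_random_sample` and the roster are hypothetical, and the seed only makes the draw reproducible.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects; every set of n subjects is equally likely."""
    rng = random.Random(seed)               # seed only makes the draw reproducible
    return rng.sample(list(population), n)  # sampling without replacement


roster = [f"student_{i}" for i in range(1, 31)]  # hypothetical list of the whole population
print(simple_random_sample(roster, 5, seed=42))
```

The software generator plays the same role as a table of random digits: the interviewer has no discretion over who is chosen.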

Table of Random Digits

a randomly generated set of digits used to randomly select subjects for the sample.

Stratified random sampling

Sampling in which the population is naturally divided into 2 or more groups of similar subjects, called strata, and a representative number of subjects is selected from each stratum.
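Stratified sampling is a separate simple random sample inside each stratum. This sketch assumes the caller chooses how many subjects to take per stratum (the example uses sizes proportional to each stratum); `stratified_sample` and the strata are hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of sizes[name] subjects from each stratum.

    strata: dict mapping stratum name -> list of the subjects in that stratum.
    sizes:  dict mapping stratum name -> number of subjects to draw from it.
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical strata: 20 commuters and 10 residents; draw proportionally (4 and 2).
strata = {
    "commuter": [f"c{i}" for i in range(20)],
    "resident": [f"r{i}" for i in range(10)],
}
print(stratified_sample(strata, {"commuter": 4, "resident": 2}, seed=7))
```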

Strata

groups of similar subjects

Multistage Random Sampling

Sampling in which the population is divided into clusters (groups) of individuals and simple random sampling is used to randomly select several of these clusters
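The first stage of multistage sampling picks whole clusters by simple random sampling. This sketch also includes an optional second stage, a simple random sample inside each chosen cluster, which is an assumption beyond the card's definition; `multistage_sample` and the course sections are hypothetical.

```python
import random


def multistage_sample(clusters, n_clusters, n_within=None, seed=None):
    """Stage 1: simple random sample of whole clusters.
    Stage 2 (optional): simple random sample of n_within subjects inside each chosen cluster.

    clusters: dict mapping cluster name -> list of individuals in that cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # cluster names picked at random
    if n_within is None:
        return {name: list(clusters[name]) for name in chosen}  # keep every member
    return {name: rng.sample(clusters[name], n_within) for name in chosen}


# Hypothetical clusters: 6 course sections with 25 students each.
sections = {f"section_{k}": [f"s{k}_{i}" for i in range(25)] for k in range(1, 7)}
print(multistage_sample(sections, n_clusters=2, n_within=5, seed=3))
```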

Experimental Units

The subjects (individuals, units) on which the measurements are made

Control Group

Group of experimental units that do not receive the treatment

Blinding

Occurs when the experimental units do not know to which group they have been assigned

Double-blinding

Occurs when neither the experimental units nor the people who conduct the experiment and have contact with the experimental units know to which group each experimental unit has been assigned

Confounding

existence of some factor other than the treatment that makes the treatment and control groups different

Observational Study

A study in which we cannot (or do not) control which experimental units are assigned to the two groups, and hence can only observe anecdotal evidence.

4 things when describing a distribution

Center, Spread, Shape, Unusual Features.
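Center and spread can be summarized numerically; shape and unusual features are usually judged from a plot. A minimal sketch using common summaries from Python's `statistics` module; `center_and_spread` and the data are hypothetical.

```python
from statistics import mean, median, stdev


def center_and_spread(data):
    """Numerical summaries for two of the four features: center and spread."""
    return {
        "mean": mean(data),              # center
        "median": median(data),          # center, resistant to extreme values
        "range": max(data) - min(data),  # spread
        "std_dev": stdev(data),          # spread (sample standard deviation)
    }


print(center_and_spread([2, 4, 4, 4, 5, 5, 7, 9]))
```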

Qualitative Variables

Measurements vary in name or kind only and cannot be ranked in any order of magnitude. Displayed with pie charts or bar graphs.

Stem and Leaf Plot

Determine center of distribution, determine range or spread of data, determine shape of distribution.
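A stem-and-leaf plot can be sketched for non-negative whole numbers by using the tens digit(s) as the stem and the ones digit as the leaf. That choice of stems is an assumption that suits two-digit data; other data ranges may call for a different split. `stem_and_leaf` and the sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Stem-and-leaf plot for non-negative integers: stem = tens digit(s), leaf = ones digit."""
    groups = defaultdict(list)
    for x in sorted(data):                 # sorting first puts the leaves in order
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems so gaps stay visible
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


print(stem_and_leaf([42, 57, 45, 61, 48, 63, 52]))
```

Because every leaf is an actual digit of an observation, the original data can be read back from the plot.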

Advantages of stem and leaf plot

Displays the distribution of the data and can be used to determine its center, spread, shape, and unusual features. Retains the actual data values, is easy to construct, and makes sorting the data easier.

Disadvantages of stem and leaf plot

Not very effective for large data sets. Choice of stems depends on data type and data range.

Histograms

Unlike a stem-and-leaf plot, a histogram does not retain the original data values.

How to: Histogram

1. Determine number of class intervals to use.
2. Determine the range of the data by subtracting the smallest observation from the largest observation.
3. Divide range by number of class intervals and round to a convenient number. This will be the equal class width.
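The three steps can be followed in code, then extended to count how many observations fall in each class. This sketch assumes that rounding the width up to the next whole number counts as a "convenient number", and it places the largest observation in the last class so that each measurement falls in exactly one interval. `histogram_classes` and the data are hypothetical.

```python
import math


def histogram_classes(data, num_classes):
    """Follow steps 1-3, then count observations per class interval."""
    low, high = min(data), max(data)
    data_range = high - low                                # Step 2: largest minus smallest
    width = max(1, math.ceil(data_range / num_classes))    # Step 3: round up to a whole number
    counts = [0] * num_classes
    for x in data:
        # Classes are [low + k*width, low + (k+1)*width); the largest value is
        # clipped into the last class so every observation lands in exactly one.
        k = min(int((x - low) // width), num_classes - 1)
        counts[k] += 1
    edges = [low + k * width for k in range(num_classes + 1)]
    return width, edges, counts


data = [3, 7, 12, 15, 18, 22, 25, 29, 33, 40]
print(histogram_classes(data, num_classes=4))  # Step 1: we chose 4 classes
```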
