
• Simple Random Sampling (SRS): every sample of size n from the population has an equal chance of being selected.
• Stratified Sampling: divide the population into distinct, homogeneous subgroups called strata, then select a simple random sample of proportionate size from each stratum.
• Cluster Sampling: split the population into several representative groups called clusters, make a random selection of clusters, and include every individual from each of the randomly selected clusters in the sample.
• Multistage Sampling: combines several sampling methods. Most surveys use some combination of stratified and cluster sampling as well as SRS.
• Systematic Sampling: selects individuals according to a random starting point and a fixed periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size.

Probabilities range from 0 (no chance) to 1 (the event must happen): for any event A, 0 ≤ P(A) ≤ 1.
The probability of the complete sample space S must equal 1: P(S) = 1.
The probability that an event A does not occur (not A) equals 1 minus the probability that it does occur: P(not A) = P(Aᶜ) = 1 – P(A).
The intersection of A and B, denoted A ∩ B, consists of the outcomes that are in both A and B.
The union of A and B, denoted A ∪ B, consists of the outcomes that are in A, in B, or in both A and B.
Conditional probabilities reflect how the probability of an event can be different if we know that some other event has occurred or is true.
When sampling with replacement, you put back what you just drew, so the outcomes of the draws are independent.
When sampling without replacement, you do not put back what you just drew, so earlier draws change what can come next and the outcomes are not independent.
A density curve is a curve that is always on or above the horizontal axis (i.e., it never takes negative values) and has an area of exactly 1 underneath it.
A continuous probability model assigns probabilities as areas under a density curve.
The area under the curve and above any range of values on the horizontal axis is the probability of an outcome in that range.

• A parameter is a number that describes the population. In practice, the value of a parameter is not known when we cannot examine the entire population.
• A statistic is a number that can be computed from the sample data without making use of any unknown parameters.

Response variable (a.k.a. dependent variable): the outcome variable on which comparisons are made.
Explanatory variable (a.k.a. independent variable):
• When the explanatory variable is categorical, it defines the groups to be compared with respect to values of the response variable.
• When the explanatory variable is quantitative, it defines the change in numerical values to be compared with respect to values of the response variable.

The median is located at position (n + 1)/2 in the sorted list (when n is even, this position falls halfway between two values, and the median is their average).
• Perfectly symmetric: the mean equals the median.
• Skewed to the right: the mean is larger than the median.
• Skewed to the left: the mean is smaller than the median.

The first quartile (a.k.a. lower quartile), Q1, is the median of the values below the median in the sorted data set. The third quartile (a.k.a. upper quartile), Q3, is the median of the values above the median in the sorted data set.

A frequency table is a listing of the possible values for a variable, together with one or more of the following summaries for each possible value:
• frequency count
• relative frequency = count / total
• percentage = 100% × count / total
• cumulative percentage

• Interquartile Range (IQR): the difference between Q3 and Q1, IQR = Q3 – Q1. It is the range of the middle 50% of the data values; because it does not depend on the most extreme values, the IQR is resistant to outliers.
• Standard Deviation: a "typical" distance of the observations from the mean.
Because the mean is not resistant to outliers, the standard deviation is not resistant to outliers either.

• Retrospective studies: collect data on something that has already occurred.
• Prospective studies: identify subjects in advance and collect data as events unfold.
• Cross-sectional studies: take a snapshot of a population at a certain time.

• Stem-and-leaf plots and dotplots: graphs of the raw data. They are useful for describing the pattern of variability in the data, especially for small data sets.
• Histograms: a summary graph for a single variable. Histograms are useful for understanding the pattern of variability in the data, especially for large data sets.
• Line graphs (time plots): use them when there is a meaningful sequence, such as time. The line connecting the points helps emphasize any change over time.
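To make the sampling designs in these notes concrete, here is a minimal Python sketch using only the standard `random` module. The function names are hypothetical, and two details are assumptions of this sketch rather than part of the definitions: stratum allocations are rounded to whole numbers, and the sampling interval uses integer division when the population size is not a multiple of the sample size (it also assumes n is no larger than the population).

```python
import random

def simple_random_sample(population, n, rng):
    # SRS: every subset of size n is equally likely to be chosen.
    return rng.sample(population, n)

def stratified_sample(strata, n, rng):
    # strata maps a stratum label to its list of members.
    # Each stratum gets a share of n proportional to its size (rounded),
    # and an SRS of that size is drawn within the stratum.
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, num_clusters, rng):
    # Randomly select whole clusters, then keep every individual in each selected cluster.
    chosen = rng.sample(clusters, num_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(population, n, rng):
    # Sampling interval k = population size // sample size;
    # pick a random start among the first k positions, then take every k-th individual.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

if __name__ == "__main__":
    rng = random.Random(42)                    # seeded so runs are reproducible
    population = list(range(1, 101))           # hypothetical population of 100 IDs
    strata = {"A": population[:60], "B": population[60:]}
    clusters = [population[i:i + 10] for i in range(0, 100, 10)]
    print("SRS:       ", sorted(simple_random_sample(population, 10, rng)))
    print("Stratified:", sorted(stratified_sample(strata, 10, rng)))
    print("Cluster:   ", sorted(cluster_sample(clusters, 2, rng)))
    print("Systematic:", systematic_sample(population, 10, rng))
```

With strata of 60 and 40 and n = 10, the stratified sample takes 6 and 4 individuals; with uneven sizes the rounded shares may not add up exactly to n.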
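The probability rules and the with/without replacement distinction from these notes can be checked by brute force. This sketch assumes a hypothetical urn of 3 red and 2 blue balls, lists every equally likely ordered pair of draws, and computes exact probabilities with `fractions.Fraction`, using the standard definition P(B | A) = P(A ∩ B) / P(A) for the conditional probability.

```python
from fractions import Fraction
from itertools import permutations, product

def two_draws(items, replace):
    # List every equally likely ordered pair of draws from `items`.
    # Positions are used instead of values so identical items stay distinct.
    positions = range(len(items))
    if replace:
        pairs = product(positions, repeat=2)  # item is put back: it may be drawn again
    else:
        pairs = permutations(positions, 2)    # item is kept out: no position repeats
    return [(items[i], items[j]) for i, j in pairs]

def prob(outcomes, event):
    # With equally likely outcomes, P(event) = (# outcomes in event) / (# outcomes).
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(outcomes, event, given):
    # P(event | given) = P(event ∩ given) / P(given)
    both = prob(outcomes, lambda o: event(o) and given(o))
    return both / prob(outcomes, given)

def first_red(outcome):
    return outcome[0] == "R"

def second_red(outcome):
    return outcome[1] == "R"

if __name__ == "__main__":
    urn = ["R", "R", "R", "B", "B"]  # hypothetical urn: 3 red, 2 blue
    for replace in (True, False):
        S = two_draws(urn, replace)
        print("with replacement" if replace else "without replacement")
        print("  P(S) =", prob(S, lambda o: True))
        print("  P(2nd red) =", prob(S, second_red))
        print("  P(2nd red | 1st red) =", cond_prob(S, second_red, first_red))
        print("  P(not both red) =", 1 - prob(S, lambda o: first_red(o) and second_red(o)))
```

With replacement, P(2nd red | 1st red) = P(2nd red) = 3/5, so the draws are independent. Without replacement, P(2nd red) is still 3/5 but P(2nd red | 1st red) drops to 1/2, so knowing the first draw changes the second.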
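As a small numerical illustration of probabilities as areas under a density curve, the sketch below uses a made-up density f(x) = 2x on [0, 1] (zero elsewhere) and approximates areas with the midpoint rule. Any valid density would do; this one is chosen only because the exact answers are easy to check: total area 1 and P(0 ≤ X ≤ 0.5) = 0.25.

```python
def density(x):
    # Hypothetical density curve: f(x) = 2x on [0, 1], 0 elsewhere.
    # It is never negative and the total area under it is 1.
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def area(f, a, b, steps=10_000):
    # Midpoint-rule approximation of the area under f between a and b.
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

if __name__ == "__main__":
    print(round(area(density, 0, 1), 6))      # total area: 1.0
    print(round(area(density, 0, 0.5), 6))    # P(0 <= X <= 0.5): 0.25
    print(round(area(density, 0.5, 1), 6))    # P(0.5 <= X <= 1): 0.75
```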
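The following sketch computes the median from the (n + 1)/2 position, the quartiles with the "median of each half" rule described in these notes (the median itself is left out of both halves when n is odd), and the IQR. It then shows, on hypothetical data, how one outlier moves the mean and standard deviation but not the median or IQR. Statistical software often uses other quartile conventions, so its Q1 and Q3 can differ slightly from this rule.

```python
import statistics

def median(sorted_values):
    # The median sits at position (n + 1)/2 (1-based) in the sorted list.
    # For odd n that is a single value; for even n it falls between two values, which are averaged.
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    # Q1 = median of the values below the median, Q3 = median of the values above it.
    # When n is odd, the median itself belongs to neither half.
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

def summarize(values):
    q1, med, q3 = quartiles(values)
    return {
        "mean": statistics.mean(values),
        "median": med,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "sd": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }

if __name__ == "__main__":
    clean = [2, 4, 4, 5, 6, 7, 9]          # hypothetical data
    with_outlier = [2, 4, 4, 5, 6, 7, 90]  # same data with one extreme value
    for name, data in [("clean", clean), ("with outlier", with_outlier)]:
        stats = summarize(data)
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

In both data sets Q1 = 4, median = 5, Q3 = 7 and IQR = 3, while the outlier pulls the mean above the median (a right skew) and makes the standard deviation more than ten times larger.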
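A frequency table like the one described in these notes can be built with `collections.Counter`. This sketch assumes the values have a natural order (here, hypothetical letter grades sorted alphabetically), because cumulative percentages only make sense for ordered values; the row layout is an illustrative choice.

```python
from collections import Counter

def frequency_table(values):
    # One row per distinct value: (value, count, relative frequency, percentage, cumulative percentage).
    # Values are sorted, which assumes they have a meaningful order; cumulative percentages depend on it.
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        count = counts[value]
        running += count
        rows.append((value, count, count / total, 100 * count / total, 100 * running / total))
    return rows

if __name__ == "__main__":
    grades = ["A", "B", "B", "C", "C", "C", "D", "F"]  # hypothetical data
    print(f"{'value':<6}{'count':>6}{'rel.freq':>10}{'percent':>10}{'cum.%':>8}")
    for value, count, rel, pct, cum in frequency_table(grades):
        print(f"{value:<6}{count:>6}{rel:>10.3f}{pct:>9.1f}%{cum:>7.1f}%")
```

Accumulating the count before dividing keeps the final cumulative percentage at exactly 100 instead of drifting through repeated floating-point additions.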
