Parameter: a number that describes the population. In practice, the value of a parameter is not known when we cannot examine the entire population.
Statistic: a number that can be computed from the sample data without making use of any unknown parameters.

Response variable (a.k.a. dependent variable): the outcome variable on which comparisons are made.
Explanatory variable (a.k.a. independent variable): when the explanatory variable is categorical, it defines the groups to be compared with respect to values of the response variable. When the explanatory variable is quantitative, it defines the different numerical values to be compared with respect to values of the response variable.

Simple random sampling (SRS): every sample of size n from the population has an equal chance of being selected.
Stratified sampling: divide the population into distinct, homogeneous subgroups called strata, then select an SRS of proportionate size from each stratum.
Cluster sampling: split the population into several representative groups called clusters, make a random selection of clusters, and include every individual from each randomly selected cluster in the sample.
Multistage sampling: combines several sampling methods. Most surveys use some combination of stratified and cluster sampling as well as SRS.
Systematic sampling: select according to a random starting point and a fixed periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size.

Retrospective studies: collect data on something that has already occurred.
Prospective studies: identify subjects in advance and collect data as events unfold.
Cross-sectional studies: take a snapshot of a population at a certain time.

Stem-and-leaf plots and dotplots: these are graphs for the raw data. They are useful to describe the pattern of variability in the data, especially for small data sets.
Histograms: this is a summary graph for a single variable. Histograms are useful to understand the pattern of variability in the data, especially for large data sets.
Line graphs (time plots): use them when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.

Interquartile range (IQR): the difference between Q3 and Q1, IQR = Q3 - Q1. It is the range of the middle 50% of the data values, and hence the IQR is resistant to outliers.
Standard deviation: a typical distance of the observations from the mean. The mean is not resistant to outliers, hence the standard deviation is not resistant to outliers.
Median: the location of the median is at position (n + 1)/2 in the sorted list.
Perfectly symmetric: the mean equals the median.
Skewed to the right: the mean is larger than the median.
Skewed to the left: the mean is smaller than the median.
First quartile (a.k.a. lower quartile), Q1: the median of the values below the median in the sorted data set.
Third quartile (a.k.a. upper quartile), Q3: the median of the values above the median in the sorted data set.

Frequency table: a listing of possible values for a variable, together with one or more of the following summaries for each possible value: frequency (count), relative frequencies, percentages (relative frequency × 100), cumulative percentages.

Probabilities range from 0 (no chance) to 1 (the event has to happen). For any event A, 0 ≤ P(A) ≤ 1.
The probability of the complete sample space S must equal 1: P(S) = 1.
The probability that an event A does not occur (not A) equals 1 minus the probability that it does occur: P(not A) = P(A^c) = 1 - P(A).
The intersection of A and B consists of outcomes that are in both A and B. Denoted by A ∩ B.
The union of A and B consists of outcomes that are in A or in B or in both A and B. Denoted by A ∪ B.
Conditional probabilities reflect how the probability of an event can be different if we know that some other event has occurred or is true.
When sampling with replacement, you put back what you just drew. Hence the outcomes of the draws are independent.
When drawing without replacement, you do not put back what you just drew, so the outcomes of the draws are dependent.

Density curve: a curve that is always on or above the horizontal axis (i.e., does not take negative values) and has area exactly 1 underneath it.
A continuous probability model assigns probabilities as areas under a density curve. The area under the curve and above any range of values on the horizontal axis is the probability of an outcome in that range.
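
A minimal sketch of the idea that probabilities are areas under a density curve, written in Python. The notes do not name a particular curve, so the standard normal density is used here purely as an assumed example. The helper names normal_density and area_under_curve are hypothetical, and the trapezoidal rule is just one simple way to approximate an area.

```python
import math


def normal_density(x, mu=0.0, sigma=1.0):
    """Normal density; an assumed example of a density curve (never negative)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_curve(density, a, b, steps=10_000):
    """Approximate the area under `density` between a and b (trapezoidal rule)."""
    width = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    for i in range(1, steps):
        total += density(a + i * width)
    return total * width


# Total area under the curve: should be very close to 1.
# The range [-10, 10] is an assumption that covers essentially all of the area.
total_area = area_under_curve(normal_density, -10.0, 10.0)

# Probability of an outcome between -1 and 1 = area above that range.
prob_within_one = area_under_curve(normal_density, -1.0, 1.0)

print(f"total area      ~ {total_area:.4f}")
print(f"P(-1 <= X <= 1) ~ {prob_within_one:.4f}")
```

The same area_under_curve helper could be pointed at any other non-negative function whose total area is 1; the normal curve is only a convenient illustration.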