STT 231 Exam Study Guide

This guide will take you through the exam question by question. There are 12 questions in all, each covering a different topic. The topics are covered in chronological order, from the beginning of the class to the end. I have included all relevant examples from class, with the answers at the end of the document.

Good luck!

First, let’s go through some basics that will be needed to understand the rest of the material:

Data: any observations that have been collected.
Statistics: the collection, organization, analysis, summarization, and interpretation of data.
Sample: a subset of a population
- A random sample has no bias; every member of the population has an equal chance of being picked
- A nonrandom sample is a convenience sample, and the conclusions drawn from it may be meaningless
Sample size: n, the number of observations or data values in the sample
Population size: N, the number of data values in the population
- n ≤ N
- n = N in a census

QUESTION 1 – A multiple choice question concerning the differences between categorical and numerical data.

There are two types of data:
- Qualitative/categorical data: non-numeric data
  o Examples: hair color, eye color, skin color, fabric, etc.
  o You can’t perform mathematical operations on this data and get a meaningful answer
    - For example, zip codes are made of numbers, but performing arithmetic on them does not yield meaningful results
- Quantitative data: numerical data
  o Examples: weight, height, temperature, batting average, age, etc.
  o Mathematical operations can be used on this data and will yield meaningful results
  o Two kinds:
    - Discrete data: countable or finite data, only whole numbers
      Examples: apples, people, students, etc.
    - Continuous data: infinite number of possible outcomes
      Examples: height, weight, distance, time, etc.

The difference between discrete and continuous data will likely not be needed for this problem, but will be needed later on.

QUESTION 2 – Find the mean, median, Q1, Q3, and interquartile range (IQR)

The mean is an arithmetic average and is found by adding all of the data values and dividing by the total number of values.

The sample mean can be found using this equation:
$\bar{x} = \frac{\sum x}{n}$

The population mean can be found using this equation:
$\mu = \frac{\sum x}{N}$

The median is the middle value of a data set. To find the median, the data must be in order. There are two cases:
- If there is an odd number of values, the median is the middle number.
- If there is an even number of values, the median is the mean of the two values in the middle.

Remember: the median is not affected by outliers, but the mean is.

The mode is the most commonly occurring data value. A data set can be bi-modal, multi-modal, or have no mode.

In each type of distribution, the relative placement of the mean, median, and mode differs, as follows:
- In a symmetric or normal distribution, the mean, median, and mode are all in the center.
- In a left-skewed distribution, the mean is the smallest, the mode is the largest, and the median is in between the two.
- In a right-skewed distribution, the mode is the smallest value and the mean is the highest value, with the median in between.

Quartiles:
The first quartile (Q1) is the number that cuts off the bottom 25% of a sorted data set.
The second quartile (Q2) is the number that cuts off the bottom 50% of a sorted data set and is equivalent to the median.
The third quartile (Q3) is the number that cuts off the bottom 75% of a sorted data set.

Here is an example of finding Q1, the median, and Q3 of a data set with an odd number of values:
1, 2, 5, 11, 15, 23, 20

Here is an example of finding Q1, the median, and Q3 of a data set with an even number of values:
1, 2, 5, 11, 15, 23, 30, 37
The median is the average of 11 and 15, which is 13. Q1 is the average of 2 and 5, which is 3.5, and Q3 is the average of 23 and 30, which is 26.5.

Another example with an even number of values works a little differently:
0, 1, 2, 5, 11, 15, 23, 30, 37, 40
The median is again 13, but this time Q1 is simply 2, because there is an odd number of values in the lower half of the data set. The same goes for Q3, which is 30.

The interquartile range, or IQR, is simply the difference between Q3 and Q1. For the eight-value data set above, IQR = 26.5 - 3.5 = 23.

QUESTION 3 – The 68-95-99.7 rule

The empirical rule, or 68-95-99.7 rule, applies to a normally distributed set of data. This rule states:
- About 68% of the data lies within 1 standard deviation of the mean
- About 95% of the data lies within 2 standard deviations of the mean
- About 99.7% of the data lies within 3 standard deviations of the mean

If a data value lies within 2 standard deviations of the mean, it is considered a “usual” value. Outside of this range, the value is “unusual.” A value more than 3 standard deviations from the mean is considered “very rare.”

To work these kinds of problems, you need to know how to find a z-score. The z-score is the number of standard deviations a value lies away from the mean. It is found using the following equations for the sample z-score and the population z-score, respectively:
$z = \frac{x - \bar{x}}{s}$
$z = \frac{x - \mu}{\sigma}$

Raw values and standard deviations cannot be compared directly across data sets, because each data set has its own units and spread. The z-score puts values on a common scale, which gives you a way to make that comparison.

Ex 1. Weights are normally distributed, with a mean of 34 lbs and a standard deviation of 8 lbs. Find out what percentage of the sample will fall between 10 lbs and 58 lbs.
Press Ctrl+F “Example 1” to see the solution

QUESTION 4 – Finding the correlation coefficient and a least-squares regression line

The correlation coefficient (r) measures the strength of the linear relationship between two quantitative variables. It falls within the range of -1 to 1. An r value of 1 means there is a perfect, positive linear relationship between the two variables. An r value of -1 means there is a perfect, negative linear relationship. An r value of 0 means there is no linear relationship between the variables. It has no unit and can be found using the following formula, which will be given to you on the exam:
$r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$
where each set of parentheses represents the z-score of that variable, and $s_x$ and $s_y$ are the standard deviations of x and y.

An easy way to calculate this is to find the z-scores of x and y for each observation, take the product of $z_x$ and $z_y$ for each observation, and then average these products by adding them up and dividing by n - 1.

A correlation coefficient only measures linear relationships, and so will not work for data that is related in a different way, such as grouped or curved data.

You can use the correlation
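
The z-score method for the correlation coefficient described in Question 4 can be sketched in a few lines of Python. This is only an illustration, not something from class: the function name `correlation` and the small data set (hours studied and exam scores) are made up for the example, and it assumes sample standard deviations with division by n - 1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r computed from z-scores (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

    # Product of the two z-scores for each observation
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]

    # Add the products and divide by n - 1
    return sum(products) / (len(xs) - 1)


# Hypothetical data, invented for this example only
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
print(round(r, 3))
```

A result close to 1 suggests a strong positive linear relationship, and a result close to -1 a strong negative one. If every x (or every y) is the same, the standard deviation is 0 and this sketch will fail with a division error.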


MSU STT 231 - Exam Study Guide
