BStat Exam 1 Study Guide
Mitchell Reichenberg

Chapter 2

Statistics consist of two parts
o Descriptive Statistics: lots of numbers
    Draw a picture
    Calculate a few numbers which summarize the data
o Inferential Statistics
    Take a smaller sample of the population to collect data

What are Data
o Data (values of observations) are information collected regarding some subject
o Data are useless without their context
o Data are often organized into a data table
    The rows of a data table correspond to individual cases about different characteristics
o Respondents: individuals who answer a survey
o Subjects / Participants: people in an experiment
o Experimental Units: animals, plants, websites, or other inanimate objects

Variables: the aspect or characteristic that differs from subject to subject, individual to individual
o Types
    Categorical: a variable that names categories and answers questions about how cases fall into those categories
    Quantitative: a variable that has measured numerical values with units; the variable tells us about the quantity of what is measured
    Ordinal: categories that have a natural ordering
    Nominal: categories that have no natural ordering
    Discrete: there is a natural gap between the values
    Continuous: the values can be arbitrarily close together
    Identifier: unique identifier assigned to each individual or item in a group

Data: value of the variables
o Types
    Interval: no meaningful zero point
    Ratio: meaningful zero point; can multiply and divide
    Time Series: ordered data values over time
    Cross Sectional: data values observed at a single point in time

Chapter 3 Surveys and Sampling

1 Examine a part of the whole
    a Sample: smaller groups of individuals
    b Samples that over- or underemphasize some characteristics of the population are said to be biased
    c Sources of Bias
        i Selection Bias: problem in sampling scheme; systematic tendency to exclude one kind of individual from the survey
            Difference between population of interest and effective
        ii Non-Response Bias: subjects don't answer or skip questions
        iii Response Bias: subjects lie; interviewer effect
    d Self-Selected Sample
        i More passionate, more likely to respond
        ii Minority opinion, more passion
2 Randomize
    a Randomization can protect you against factors that you know are in the data
3 The sample size is what matters
    a It is the size of the sample, not the size of the population, that makes the difference
        i Exception: if the population is small enough and the sample size is more than 10% of the population
    b The fraction of the population that you have sampled does not matter

Population vs Sample
o Population: entire group of individuals in which we are interested but can't usually assess directly
o Sample: the part of the population we actually examine
o Parameter: a number describing a characteristic of the population
o Statistic: a number describing a characteristic of a sample

Sampling Techniques
o Non-Statistical Sampling
    Convenience: collected in the most convenient manner for the researcher
        Bias: opinions limited to individuals present
    Voluntary: individuals choose to be involved
        Bias: sample design systematically favors a particular outcome
o Statistical Sampling: individuals in the sample are chosen based on known or calculable probabilities
    Simple Random: every possible sample of a given size has an equal chance of being selected
        Sampling frame: list of population
    Stratified Random: divide population into subgroups (strata) according to some common characteristic
        Select a simple random sample from each subgroup
        Combine samples from subgroups into one
    Cluster Sampling: divide population into several clusters, each representative of the population
        Select a random sample of clusters
        All items in the selected clusters can be used, or items can be chosen from a cluster using another sampling technique
    Systematic Random
        Decide on sample size n
        Divide ordered frame of N individuals into groups of k individuals: k = N/n
        Randomly select one individual from the first group
        Select every k-th individual thereafter

Exit Polls
    Stratify on states
    Choose a simple random sample of polling places in each state
        o Number of polling places is proportional to the number of voters in each state
    Choose a simple random sample of voters as they leave the polling place

Surveys
o Sample Survey: designed to ask questions of a small group of people in order to learn something about the entire population
    Main Objective: to collect accurate and reliable data so that we can make appropriate decisions
o Survey Design
    Define the issue
    Define the population of interest
    Develop survey questions
    Pre-test the survey
    Determine the sample size and sampling method
    Select sample and administer the survey
o Types of questions
    Closed End: select from a short list of defined choices
    Open End: respondents are free to respond with anything
    Demographic: questions about the respondent's personal characteristics

Chapter 4 Displaying and Describing Categorical Data

o Summarizing categorical data
    Two-way tables
    Relationships between categorical variables
    Marginal distribution
    Conditional distribution
    Simpson's paradox

Summary of Categories
o Count: each category has a number of occurrences (frequency tables)
    Percentages are useful (relative frequency tables)
o Cross-classification table is a good summary (contingency tables or two-way tables)

Visualize Categorical Data
o Give a clear picture of what the data contain
o Emphasize differences or similarities
o Bar graphs and pie charts are usually the best
    Many varieties
    Height of bar or size of pie slice shows the frequency or percentage for each category
    For two or more variables use multiple columns

Contingency Tables
o To show how opinions on regional foods varied by country, we can display the data in a contingency table where we have added the countries as a new variable
o Marginal Distribution of a variable is the total count that occurs when the value of that variable is held constant
o Conditional Distribution
    Explanatory variable: predictor, cause, available variable
    Response variable: predicted, effect, interesting variable

Simpson's Paradox: when percentages are inappropriately combined

Chapter 5 Displaying and Describing Quantitative Data

o Summarizing Numerical Data
    Stem-and-Leaf Plots
    Shape and skewness
    Histograms
    Center: mean vs median
    Boxplots
    5-number summary
    Measuring the spread
o Frequency Distribution
    Continuous Data: may take on any value in some interval
    Summarized in a grouped-data frequency table / histogram
    Building a Histogram
        1 Determine number of categories
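
As a rough illustration of the grouped-data frequency table behind a histogram, the Python sketch below counts how many values fall into each class. The function name grouped_frequency_table, the equal-width classes, the sample values, and the choice of 5 categories are all assumptions made for the example; the guide itself does not say how the number of categories should be chosen.

```python
def grouped_frequency_table(values, num_classes):
    """Count values in num_classes equal-width classes (assumed layout).

    Returns a list of (lower_bound, upper_bound, count) tuples.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form classes")
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The largest value is placed in the last class instead of a new one.
        idx = min(int((v - lo) / width), num_classes - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
data = [1, 2, 2, 3, 5, 7, 8, 9, 10, 11]
table = grouped_frequency_table(data, num_classes=5)
for lower, upper, count in table:
    # Relative frequency = count / total, as in a relative frequency table.
    print(f"{lower:5.1f} to {upper:5.1f}: {count} ({count / len(data):.0%})")
```

Each bar of the histogram would have a height equal to the count (or percentage) of its class.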
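
The systematic random sampling steps from Chapter 3 (decide on n, form groups of k = N/n individuals, pick a random individual from the first group, then take every k-th individual thereafter) can be sketched in Python as follows. The function name systematic_sample and the frame of 100 numbered individuals are hypothetical, and rounding k down when N is not a multiple of n is an assumption of this sketch, not something the guide specifies.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Draw a systematic random sample of size n from an ordered frame."""
    rng = rng or random.Random()
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size n must be between 1 and N")
    # Group size k = N / n, rounded down (assumption); any leftover
    # individuals at the end of the frame are never selected.
    k = N // n
    # Randomly select one individual from the first group of k.
    start = rng.randrange(k)
    # Select every k-th individual thereafter.
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frame: individuals numbered 0 to 99.
frame = list(range(100))
sample = systematic_sample(frame, n=10, rng=random.Random(0))
print(sample)
```

Because only the starting point is random, the frame should not be ordered in a pattern that repeats every k individuals, or the sample can over- or underemphasize some characteristic.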