Topics 1 and 2: Exploring Data and Relationships between 2 Quantitative Variables

Statistics is a set of methods for drawing inferences about parameters of populations based on statistics computed from samples.

Population: the entire group of interest
Sample: a part of the population selected to draw conclusions about the entire population (sample size n; gives a statistic)
Individual (subject): a person or any specific object in a population
Parameter: a population value; a fixed number (population parameter, e.g. mean μ)
Statistic: a number produced from a sample (sample statistic, e.g. mean x̄)
Bias: preferring one side over another; when sample mean ≠ population mean

Categorical: cannot have an average
  nominal: qualitative, unordered (car color, UIN, SSN, zip code)
  ordinal: rankings (star ratings)
Numerical: numerical values for which arithmetic makes sense
  discrete: fixed values (number of siblings, SAT score)
  continuous: can take on any numerical value, intermediate values possible (age)

Right skewed: "duck beak" points right; mean is on the right (mean > median)

Five-number summary: minimum, Q1, Q2 (median), Q3, maximum
  Calculator: STAT > 1 (enter data), then STAT > CALC > 1: 1-Var Stats L1

Response ("answer"): measures the outcome of a study; dependent; y (ex: alcohol in blood)
Explanatory ("question"): explains or influences the response variable; independent; x (ex: number of beers drunk)

Measures of Center (type: definition; when to use)
  mean: average of all values; use when symmetrical (mean ≈ median)
  median: middle value, resistant to outliers; use when skewed or with outliers
  mode: most frequently occurring; use for categorical variables
Measures of Spread (type: definition; when to use)
  std dev: use when symmetrical
  IQR: Q3 - Q1, the middle 50%; use when skewed, with outliers, or for accuracy
    outliers: outside 1.5 × IQR below Q1 or above Q3

Graphical Tools
  histogram: 1 numerical variable
  scatterplot: 2 numerical variables
  pie chart, bar chart: 1 categorical variable
  stacked bar chart, contingency table: 2 categorical variables
  separate boxplots for each: 1 categorical and 1 numerical (explanatory = categorical, response = numerical)

Topic 4: Probability Distributions

The probability distribution of a random variable X tells us what values X can
take and how to assign probabilities.
  1. Discrete R.V.: a discrete probability distribution gives the probability of every single outcome.
  2. Continuous R.V.: a continuous probability distribution gives the probability of the R.V. taking values in an interval (probability = area under a density curve).

Normal Distribution: X ~ N(μ, σ²). Example: N(0, 16) has μ = 0 and σ² = variance = 16, so σ = 4.
Standardizing and z-score: z = (x - μ) / σ
Normal Q-Q (quantile) plot: no curvature allowed

Appropriate binomial distributions:
  Each trial is independent
  Number of trials is fixed
  Only two possible outcomes

Sampling Distribution: list of sample means from many samples of the same size n. Cannot just be one sample.
  x̄: one sample mean
  X̄: average of all the other x̄'s

Central Limit Theorem
  1. Unbiased: the mean of the sample means equals the population mean (μ_x̄ = μ)
  2. Standard error: σ_x̄ = σ / √n
  3. If n is large enough (n ≥ 30), the sampling distribution will follow the normal distribution

CALCULATOR
  5-Number Summary: STAT > EDIT, then STAT > CALC > 1-Var Stats L1
  Normal Distribution (n ≥ 30): PRGM > NORMAL; enter mean and std dev; choose shading (left, right, between) or area, given values
  Find Middle of Samples: STAT > TESTS > Z Interval > Stats; enter x̄, n, C-level; calculates range
  Binomial (n, p, x): PRGM > BERNOULI; choose n, p, x
  Approximate Distribution: p̂ ~ N(p, σ²) with σ² = p(1 - p) / n
  Proportions (n and p; categorical, sample size): PRGM > PROP; enter p, n
    Mean and StdDev given: mean = p, std dev = √(p(1 - p) / n)
    Conditions: np ≥ 10 and n(1 - p) ≥ 10
  2nd > VARS > 2: P-value; normal 10, normal 10
  2-SampleTTest: when independent
  T-Test: when matched pairs

PROGRAMS
  SSIZE: sample size n
  PROP
    Normal: gives probability; use 1 - p if the problem says NOT
    CH8: p̂ = x / n
      1. Confidence interval
      2. Hypothesis test (gives z and p-value)
  DIST
    InvT: df, T-value
    BINOM: n, p (p can be the alpha value, i.e. the significance value); sum of many

Probability Out/In Lines
  All out: p < 0.01
  All in: p > 0.10
  Note: when p-values are given in a data table, if it is a 2-sided test you divide the p-value by 2.

Experiments: treatments given
  Blocked: grouped by traits
  Matched pairs: same person is used in each of the groups (or twins)
  Completely randomized: small group of the sample
Observations
  Prospective
  Retrospective: looking at the past
  Cross-sectional: specific
time and place
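
A few of the formulas in these notes (z-score, standard error, the 1.5 × IQR outlier rule, and the np ≥ 10 check for proportions) can be sketched in Python as a cross-check against the calculator. This is only a rough sketch: the function names are made up for illustration, it uses Python's standard library rather than the calculator programs named above, and the quartile method it assumes may not match what 1-Var Stats reports.

```python
import math
import statistics


def z_score(x, mu, sigma):
    """Standardize x against a N(mu, sigma^2) population: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def iqr_fences(data):
    """Return (lower, upper) cutoffs; values outside them are flagged as outliers."""
    # Assumption: the "inclusive" quartile method. Other conventions
    # give slightly different Q1 and Q3.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def normal_approx_ok(n, p):
    """Check the conditions np >= 10 and n(1 - p) >= 10 for a proportion."""
    return n * p >= 10 and n * (1 - p) >= 10


# Example with the N(0, 16) distribution from the notes (sigma = 4).
print(z_score(4, mu=0, sigma=4))
print(standard_error(sigma=4, n=64))
print(iqr_fences([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(normal_approx_ok(n=100, p=0.05))
# Area to the left of x = 4 under N(0, 16), similar to a "shade left" lookup.
print(statistics.NormalDist(mu=0, sigma=4).cdf(4))
```

Note that sigma here is the standard deviation (4), not the variance (16), which is the easy mistake to make when reading N(0, 16).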