Topics 1 and 2: Exploring Data and Relationships between 2 Quantitative Variables
"Statistics is a set of methods for drawing inferences about parameters of populations based on statistics computed from samples."
Population: the entire group of interest
Sample: a part of the population selected to draw conclusions about the entire population (the sample size, n, is not a statistic)
Individual (subject): a person or any specific object in a population
Parameter: a fixed number describing the population (population mean = μ)
Statistic: a number computed from a sample (sample mean = x̄)
Bias: systematically favoring one side, so the sample statistic tends to miss the population parameter in the same direction (e.g., sample means consistently above or below μ)
Categorical: values are labels, so an average has no meaning
- nominal: qualitative, unordered (car color, UIN, SSN, zip code)
- ordinal: ordered rankings (star ratings)
Numerical: numerical values for which arithmetic makes sense
- discrete: separate, countable values (number of siblings, SAT score)
- continuous: can take on any value in a range, so intermediate values are possible (age)
Right-skewed: the long tail ("duck beak") points right and pulls the mean toward it, so mean > median
Five-number summary: minimum, Q1, Q2 (median), Q3, maximum (STAT, 1, enter data, STAT > CALC, 1: 1-Var Stats L1)
Response variable: the "answer"; measures the outcome of a study (dependent, y). Example: % alcohol in blood
Explanatory variable: the "question"; explains or influences the response variable (independent, x). Example: number of beers drunk

Measures of Center (type: definition; when to use)
- mean: average of all values; symmetric data (mean = median)
- median: middle value, resistant to outliers; skewed data or outliers
- mode: most frequently occurring value; categorical variables
Measures of Spread (variability)
- standard deviation: spread of values around the mean; symmetric data
- IQR: spread of the middle 50%, IQR = Q3 − Q1; outliers lie more than 1.5 × IQR below Q1 or above Q3; skewed data or outliers
Graphical Tools (tool: data)
- histogram: 1 numerical variable
- scatterplot: 2 numerical variables
- pie chart / bar chart: 1 categorical variable
- stacked bar chart / contingency table: 2 categorical variables
- separate boxplots for each category: 1 categorical explanatory variable and 1 numerical response variable

Topic 4: Probability Distributions
The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to them.
1. Discrete R.V.: discrete probability distribution, gives the probability of every single outcome
2. Continuous R.V.: continuous probability distribution, gives the probability that the R.V. takes values in an interval (probability = area under a density curve)
Normal distribution: $X \sim N(\mu, \sigma^2)$. Example: $N(0, 16)$ means $\mu = 0$, variance $\sigma^2 = 16$, $\sigma = 4$
Standardizing (z-score): $z = \frac{x - \mu}{\sigma}$
A binomial distribution is appropriate when:
- Each trial is independent
- The number of trials, n, is fixed
- Each trial has only two possible outcomes
- The probability of success, p, is the same on every trial
Sampling distribution: the distribution of the sample means from many samples of the same size, n; it cannot come from just one sample.
- x̄ = the mean of one sample
- mean of the sampling distribution, $\mu_{\bar{x}}$ = the average of all the x̄'s
Central Limit Theorem (sampling distribution of x̄):
1. Unbiased: the mean of the sampling distribution equals the population mean, $\mu_{\bar{x}} = \mu$
2. Standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. If n is large enough (n ≥ 30), the sampling distribution of x̄ is approximately normal
Normal Q-Q (quantile) plot: points should follow a straight line; any curvature means the data are not approximately normal

CALCULATOR
5-Number Summary: STAT, EDIT (enter data in L1), STAT, CALC, 1-Var Stats L1
Normal Distribution, n ≥ 30: PRGM, NORMAL, mean, std dev, shading left/right/between or area given, values
Find Middle % of Samples: STAT, TESTS, ZInterval, Stats, σ, x̄, n, C-Level %; calculates the range (lower, upper)
Binomial (n, p, x): PRGM, BERNOULI, choose, n, p, x
Approximate distribution of p̂: $\hat{p} \approx N\left(p, \left(\sqrt{\frac{p(1-p)}{n}}\right)^2\right)$
Proportions (n and p, categorical data): PRGM, PROP, p, n; mean and std dev are given
- mean = p, std dev = $\sqrt{\frac{p(1-p)}{n}}$
- use the normal approximation only when $np \ge 10$ and $n(1-p) \ge 10$
P-value: 2nd, VARS (DISTR), 2: normalcdf (one tail; double it for a two-sided test)
- positive z: normalcdf(z, 10)
- negative z: normalcdf(-10, z)
2-SampTTest: two independent samples
T-Test: matched pairs (run on the differences)
PROGRAMS
- SSIZE: sample size, n
- PROP: Normal gives a probability (use 1 − p if the problem says NOT)
- CH8 ($\hat{p} = x/n$): 1. Confidence interval; 2. Hypothesis test, gives z and p-value
- DIST > InvT > (%, df): t-value
- BINOM > n > p (can be the alpha/significance value) > sum of many: probability
Out/In # Lines
- All out: p < 0.01
- All in: p > 0.10
Note: p-values given in a data table (software output) are typically two-sided; for a one-sided test, divide the p-value by 2.
Experiments (treatments given)
- Blocked: subjects grouped by shared traits, then randomly assigned to treatments within each block
- Matched pairs: the same person receives each treatment, or closely matched subjects such as twins are paired
- Completely randomized: all subjects are randomly assigned to treatment groups
Observational studies
- Prospective: follows subjects forward in time
- Retrospective: looks at the past
- Cross-sectional: one specific time and place
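As a rough check on the five-number summary, the 1.5 × IQR outlier rule, and the right-skew rule (mean > median), here is a minimal Python sketch. It assumes the "median of each half" quartile convention (the overall median is left out of both halves when n is odd); other software may use a different quartile rule and report slightly different Q1 and Q3. The function names and the sample data are made up for illustration.

```python
from statistics import mean, median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: when n is odd, the overall median is excluded from both
    halves. Needs at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values below the median position
    upper = xs[half + (n % 2):]     # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def iqr_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

if __name__ == "__main__":
    sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical right-skewed data
    print(five_number_summary(sample))
    print(iqr_outliers(sample))
    print(mean(sample), median(sample))     # mean is pulled toward the long right tail
```

The one large value, 30, falls above the upper fence and is flagged as an outlier; it also drags the mean (about 8.33) above the median (6), as expected for right-skewed data.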
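The z-score, standard error, and Central Limit Theorem items can be checked numerically with Python's standard-library `statistics.NormalDist`. Note that `NormalDist` takes the standard deviation, not the variance, so N(0, 16) in the notes' N(μ, σ²) notation becomes a normal with sigma 4. The population values below (μ = 100, σ = 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / sqrt(n)

# N(0, 16) in N(mu, sigma^2) notation: variance 16, so sigma = 4.
x_dist = NormalDist(mu=0, sigma=sqrt(16))
print(x_dist.stdev, x_dist.variance)

# Hypothetical population: mean 100, sd 15; samples of size n = 36.
mu, sigma, n = 100, 15, 36
se = standard_error(sigma, n)              # 15 / 6 = 2.5
xbar_dist = NormalDist(mu=mu, sigma=se)    # approximately normal by the CLT (n >= 30)

p_above_105 = 1 - xbar_dist.cdf(105)       # P(x-bar > 105), right-tail area
middle_95 = (xbar_dist.inv_cdf(0.025), xbar_dist.inv_cdf(0.975))

print(z_score(105, mu, se))
print(round(p_above_105, 4))
print(tuple(round(v, 2) for v in middle_95))
```

Here z = 2.0, so P(x̄ > 105) is about 0.0228, and the middle 95% of sample means runs from about 95.1 to 104.9, the same range ZInterval reports when given a mean of 100, σ = 15, n = 36 and C-Level 95%.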
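For the binomial conditions and the BINOM "sum of many" step, a short sketch of the binomial probability formula may help. It uses `math.comb` (Python 3.8+); the helper names are hypothetical and are not the course's calculator programs.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): n fixed independent trials,
    two outcomes each, same success probability p on every trial."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): the 'sum of many' single-outcome probabilities."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

if __name__ == "__main__":
    n, p = 10, 0.5                   # hypothetical: 10 fair coin flips
    print(binom_pmf(3, n, p))        # exactly 3 heads
    print(binom_cdf(3, n, p))        # at most 3 heads
    print(1 - binom_cdf(3, n, p))    # at least 4 heads, via the complement
```

The last line shows the complement rule: "at least 4" is everything except "at most 3".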
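The proportion formulas, the np ≥ 10 and n(1 − p) ≥ 10 check, and the normalcdf p-value steps combine into a one-proportion z-test. The sketch below assumes a test of H0: p = p0 that uses the null value p0 in the standard deviation; the function and parameter names (including `alternative`) are hypothetical, and the data (116 successes out of 200, p0 = 0.5) are made up.

```python
from math import sqrt
from statistics import NormalDist

def prop_conditions_ok(n, p):
    """Normal-approximation check: n*p >= 10 and n*(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def one_prop_z_test(x, n, p0, alternative="greater"):
    """One-proportion z-test sketch (hypothetical helper).

    p_hat = x / n; under H0 the sd of p_hat is sqrt(p0 * (1 - p0) / n).
    alternative: "greater", "less", or "two-sided".
    """
    if not prop_conditions_ok(n, p0):
        raise ValueError("normal approximation not appropriate")
    p_hat = x / n
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    std_normal = NormalDist()                       # N(0, 1)
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)             # like normalcdf(z, 10)
    elif alternative == "less":
        p_value = std_normal.cdf(z)                 # like normalcdf(-10, z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # double the one-tail area
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return p_hat, z, p_value

if __name__ == "__main__":
    print(one_prop_z_test(116, 200, 0.5))
    print(one_prop_z_test(116, 200, 0.5, "two-sided"))
```

For these numbers p̂ = 0.58 and z ≈ 2.26, giving a one-sided p-value of about 0.012 and a two-sided p-value of about 0.024.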