Review: probability
• RVs, events, sample space Ω
• Measures, distributions
• Disjoint union property (law of total probability; the book calls this the "sum rule")
• Sample vs. population
• Law of large numbers
• Marginals, conditionals

Monty Hall

Terminology
• Experiment = a process whose uncertain outcome we observe (e.g., a sequence of die rolls)
• Prior = our distribution over models or parameters before seeing the outcome
• Posterior = our distribution over models or parameters after conditioning on the outcome

Example: model selection
• You're gambling to decide who has to clean the lab
• You are accused of using weighted dice!
• Two models:
  • fair dice: all 36 rolls equally likely
  • weighted: rolls summing to 7 more likely
• prior: P(fair) and P(weighted) before seeing any rolls
• observation: the rolls we actually see
• posterior: P(model | rolls), by Bayes' rule (a worked sketch appears after the last slide)

Philosophy
• Frequentist vs. Bayesian
• Frequentist view: a probability is a property of the world (the coin has P(H) = 0.62)
• Bayesian view: a probability is a representation of our internal beliefs about the world (we think P(H) = 0.62)

Difference
• A Bayesian is willing to assign P(E) to any event E, even one which has already happened (although it will be 1 or 0 if E or ¬E has been observed)
• A frequentist will assign probabilities only to outcomes of future experiments
• Consider the question: what is the probability that coin #273 is fair?

Which is right?
• Both!
• Bayesians can ask more questions
• But for a question that makes sense to both, the answers will agree
• Can often rephrase a Bayesian question in frequentist terms
  • the answers may differ
  • either may see the other's answer as a reasonable approximation

Independence
• X and Y are independent if, for all possible values of y, P(X) = P(X | Y = y)
• equivalently, for all possible values of x, P(Y) = P(Y | X = x)
• equivalently, P(X, Y) = P(X) P(Y)
• Knowing X or Y gives us no information about the other

Independence: probability = product of marginals

                AAPL price
  Weather    up     same   down  | marginal
  sun        0.09   0.15   0.06  | 0.3
  rain       0.21   0.35   0.14  | 0.7
  marginal   0.3    0.5    0.2   |

• Each joint entry is the product of its row and column marginals (e.g., 0.09 = 0.3 × 0.3), so Weather and AAPL price are independent here.
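The factorization in the table can be checked mechanically. Below is a minimal Python sketch (not from the slides): the table values are copied from the slide, the dictionary encoding is just one convenient choice. It recomputes the marginals and verifies that every joint entry equals the product of its row and column marginals.

```python
# Joint distribution from the slide: P(weather, price movement).
joint = {
    ("sun", "up"): 0.09, ("sun", "same"): 0.15, ("sun", "down"): 0.06,
    ("rain", "up"): 0.21, ("rain", "same"): 0.35, ("rain", "down"): 0.14,
}

# Marginals: sum the joint probabilities over the other variable.
p_weather, p_price = {}, {}
for (w, s), p in joint.items():
    p_weather[w] = p_weather.get(w, 0.0) + p
    p_price[s] = p_price.get(s, 0.0) + p

# Independence: every joint entry is the product of its marginals.
for (w, s), p in joint.items():
    assert abs(p - p_weather[w] * p_price[s]) < 1e-12

print(p_weather)  # {'sun': 0.3, 'rain': 0.7}
print(p_price)    # {'up': 0.3, 'same': 0.5, 'down': 0.2}
```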
Admin
• Slides and annotated slides: http://www.cs.cmu.edu/~ggordon/10601/schedule.html
• Mailing list: 10601-09f-announce@cs
• Recitation

Readings
• So far: pp. 1–4, sec. 1–1.2, sec. 2–2.3
• We'll put them next to the relevant lectures on the schedule page
• They provide extra detail beyond what's in lecture; you are responsible for knowing it
• No specific due date

Expectations
• How much should we expect to earn from our AAPL stock? (sketch at the end of these notes)

  Payoff:
  Weather    up    same   down
  sun        +1     0     -1
  rain       +1     0     -1

  Probability:
  Weather    up    same   down
  sun        0.09  0.15   0.06
  rain       0.21  0.35   0.14

• E(payoff) = (0.09 + 0.21)(+1) + (0.15 + 0.35)(0) + (0.06 + 0.14)(−1) = 0.3 − 0.2 = 0.1

Linearity of expectation
• Expectation is a linear function of the numbers in the payoff table (tables as above)
• E.g., changing the −1s to 0s raises E(payoff) by 0.2; changing them to −2s lowers it by 0.2

Conditional expectation
• What if we know it's sunny?
• E(payoff | sun) = 0.3(+1) + 0.5(0) + 0.2(−1) = 0.1, using P(price | sun) from the table
• Same as the unconditional expectation, since price is independent of weather

Independence and expectation
• If X and Y are independent, then: E(XY) = E(X) E(Y)
• Proof: E(XY) = Σ_x Σ_y x y P(x, y) = Σ_x Σ_y x y P(x) P(y) = [Σ_x x P(x)] [Σ_y y P(y)] = E(X) E(Y)

Sample means
• Sample mean = x̄ = (1/N) Σ_i x_i
• Expectation of sample mean: E(x̄) = (1/N) Σ_i E(x_i) = µ, the population mean (by linearity)

Estimators
• Common task: given a sample, infer something about the population
• An estimator is a function of a sample that we use to tell us something about the population
• E.g., the sample mean is a good estimator of the population mean
• E.g., linear regression

Law of large numbers (more general form)
• If we take a sample of size N from a distribution P with mean µ and compute the sample mean x̄
• Then x̄ → µ as N → ∞ (simulation sketch at the end of these notes)

Bias
• Given an estimator T of a population quantity θ
• The bias of T is E(T) − θ
• The sample mean is an unbiased estimator of the population mean
• (1 + Σ_i x_i) / (N + 1) is a biased estimator of the mean: its expectation is (1 + Nµ)/(N + 1), so its bias is (1 − µ)/(N + 1)

Variance
• Two estimators of the population mean: the sample mean, and the mean of every 2nd sample point
• Both are unbiased, but one is more variable
• Measure of variability: variance

Variance
• If X is zero-mean: variance = E(X²)
• Ex: the constant 0 vs. a ±1 coin flip; both have mean 0, but variances 0 and 1
• In general: Var(X) = E((X − E(X))²)

Exercise: simplify the expression for variance
• E((X − E(X))²) = E(X² − 2X E(X) + E(X)²) = E(X²) − 2 E(X)² + E(X)² = E(X²) − E(X)²

Exercise
• What is the variance of 3X?  Var(3X) = E((3X − 3 E(X))²) = 9 E((X − E(X))²) = 9 Var(X)

Sample variance
• Sample variance = (1/N) Σ_i (x_i − x̄)²
• Expectation: ((N − 1)/N) σ², where σ² is the population variance, so this estimator is biased low
• Sample size correction: multiply by N/(N − 1), i.e., use s² = (1/(N − 1)) Σ_i (x_i − x̄)² (simulation sketch at the end of these notes)

Bias-variance decomposition
• Estimator T of population quantity θ
• Mean squared error = E((T − θ)²) = (E(T) − θ)² + Var(T) = bias² + variance (numerical check at the end of these notes)

Bias-variance tradeoff
• It's nice to have estimators with small MSE
• Typically there is a smallest possible MSE for a given amount of data
  • limited data provides limited information
• An estimator which achieves this minimum is efficient (close to it for large N: asymptotically efficient)
• Often we can adjust an estimator so that its MSE is due mostly to bias or mostly to variance: the famed bias-variance tradeoff
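To make the model-selection slide concrete, here is a hedged Python sketch of the Bayes-rule update. The slide leaves the prior, the weighting, and the observation blank, so the 50/50 prior, the "sums of 7 twice as likely" weighting, and the three observed 7s below are all illustrative assumptions.

```python
# Two models of a pair of dice: fair, or weighted toward sums of 7.
prior = {"fair": 0.5, "weighted": 0.5}

def likelihood(model, roll_sum):
    """P(observed sum | model) for one roll of two dice."""
    # Number of (die1, die2) pairs producing each sum 2..12 (36 total).
    ways = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
    if model == "fair":
        return ways[roll_sum] / 36
    # Assumed weighted model: sums of 7 twice as likely, other sums scaled
    # down so the probabilities still total 1.
    p7 = 2 * 6 / 36
    scale = (1 - p7) / (1 - 6 / 36)
    return p7 if roll_sum == 7 else scale * ways[roll_sum] / 36

# Observation: three 7s in a row. Posterior by Bayes' rule, renormalizing
# after each observation.
post = dict(prior)
for r in [7, 7, 7]:
    post = {m: post[m] * likelihood(m, r) for m in post}
    z = sum(post.values())
    post = {m: p / z for m, p in post.items()}

print(post)  # ~{'fair': 0.11, 'weighted': 0.89}: weighted is now 8x more likely
```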
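The expectation and conditional-expectation slides can be reproduced directly from the AAPL tables. A minimal sketch with the same table values as above; the dictionary encoding of the payoff is an assumption consistent with the slide.

```python
# Joint distribution and payoff from the Expectations slides.
joint = {
    ("sun", "up"): 0.09, ("sun", "same"): 0.15, ("sun", "down"): 0.06,
    ("rain", "up"): 0.21, ("rain", "same"): 0.35, ("rain", "down"): 0.14,
}
payoff = {"up": 1.0, "same": 0.0, "down": -1.0}

# Unconditional expectation: sum of payoff * probability over all cells.
e = sum(p * payoff[s] for (w, s), p in joint.items())
print(e)  # ~0.1

# Conditional expectation given sun: renormalize the sun row, then average.
p_sun = sum(p for (w, s), p in joint.items() if w == "sun")
e_given_sun = sum(p / p_sun * payoff[s]
                  for (w, s), p in joint.items() if w == "sun")
print(e_given_sun)  # ~0.1, the same, since price is independent of weather
```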
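The law-of-large-numbers slide can be illustrated by simulation. A small sketch using a biased coin with mean 0.62 (that value is borrowed from the Philosophy slide; any distribution with a finite mean would do).

```python
# Law of large numbers, empirically: the sample mean of N draws
# approaches the population mean as N grows.
import random

random.seed(0)
mu = 0.62  # population mean: a coin with P(heads) = 0.62
for n in (10, 1_000, 100_000):
    sample = [1 if random.random() < mu else 0 for _ in range(n)]
    print(n, sum(sample) / n)
# The printed sample means drift toward 0.62 as N increases.
```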
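The sample-size correction on the sample-variance slide is easy to see empirically: dividing by N underestimates σ² by a factor of (N − 1)/N, while dividing by N − 1 does not. A sketch under an assumed standard-normal population (the slide does not fix a distribution).

```python
# Compare the N-divisor and (N-1)-divisor variance estimators by averaging
# each over many independent samples of size N.
import random

random.seed(0)
N, trials = 5, 200_000
biased, corrected = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]  # sigma^2 = 1
    xbar = sum(xs) / N
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / N
    corrected += ss / (N - 1)

print(biased / trials)     # ~0.8 = (N-1)/N * sigma^2: biased low
print(corrected / trials)  # ~1.0 = sigma^2: unbiased
```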
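Finally, the decomposition MSE = bias² + variance can be checked numerically for the biased estimator (1 + Σ_i x_i)/(N + 1) from the Bias slide. The sample size, population, and trial count below are illustrative choices, not from the slides.

```python
# Estimate bias, variance, and MSE of T = (1 + sum(x_i)) / (N + 1) as an
# estimator of the population mean mu, and check MSE = bias^2 + variance.
import random

random.seed(0)
N, mu, trials = 10, 0.0, 200_000
ts = []
for _ in range(trials):
    xs = [random.gauss(mu, 1.0) for _ in range(N)]
    ts.append((1 + sum(xs)) / (N + 1))

mean_t = sum(ts) / trials
bias = mean_t - mu  # should be near (1 - mu)/(N + 1) ~ 0.09
var = sum((t - mean_t) ** 2 for t in ts) / trials
mse = sum((t - mu) ** 2 for t in ts) / trials
print(mse, bias ** 2 + var)  # the two numbers should agree
```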