Review: probability
• RVs, events, sample space Ω
• Measures, distributions
• Disjoint union property (law of total probability; the book calls this the "sum rule")
• Sample v. population
• Law of large numbers
• Marginals, conditionals

Monty Hall

Terminology
• Experiment =
• Prior =
• Posterior =

Example: model selection
• You're gambling to decide who has to clean the lab
• You are accused of using weighted dice!
• Two models:
  • fair dice: all 36 rolls equally likely
  • weighted: rolls summing to 7 more likely
• prior:
• observation:
• posterior:

Philosophy
• Frequentist v. Bayesian
• Frequentist view: a probability is a property of the world (the coin has P(H) = 0.62)
• Bayesian view: a probability is a representation of our internal beliefs about the world (we think P(H) = 0.62)

Difference
• A Bayesian is willing to assign P(E) to any event E, even one which has already happened (although it will be 1 or 0 if E or ¬E has been observed)
• A frequentist will assign probabilities only to outcomes of future experiments
• Consider the question: what is the probability that coin #273 is fair?

Which is right?
• Both!
• Bayesians can ask more questions
• But for a question that makes sense to both, the answers will agree
• Can often rephrase a Bayesian question in frequentist terms
  • the answer may differ
  • either may see the other's answer as a reasonable approximation

Independence
• X and Y are independent if, for all possible values of y, P(X) = P(X | Y = y)
• equivalently, for all possible values of x, P(Y) = P(Y | X = x)
• equivalently, P(X, Y) = P(X) P(Y)
• Knowing X or Y gives us no information about the other

Independence: probability = product of marginals
• Joint distribution of Weather and AAPL price:

                AAPL price
    Weather     up     same   down   | marginal
    sun         0.09   0.15   0.06   | 0.3
    rain        0.21   0.35   0.14   | 0.7
    marginal    0.3    0.5    0.2    |

Readings
• So far: p1–4, sec 1–1.2, sec 2–2.3
• We'll put them next to the relevant lectures on the schedule page
• They provide extra detail beyond what's in lecture; you are responsible for knowing it
• No specific due date

Expectations
• How much should we expect to earn from our AAPL stock?
• Payoff as a function of AAPL price (the same in sun and rain):

    Weather   up   same   down
    sun       +1   0      -1
    rain      +1   0      -1

• Probabilities: the Weather × AAPL price joint table above

Linearity of expectation
• Expectation is a linear function of the numbers in the payoff table
• E.g., change the -1s to 0s, or to -2s

Conditional expectation
• What if we know it's sunny?

Independence and expectation
• If X and Y are independent, then:
• Proof:

Sample means
• Sample mean =
• Expectation of sample mean:

Estimators
• Common task: given a sample, infer something about the population
• An estimator is a function of a sample that we use to tell us something about the population
• E.g., the sample mean is a good estimator of the population mean
• E.g., linear regression

Law of large numbers (more general form)
• If we take a sample of size N from a distribution P with mean μ and compute the sample mean x̄
• Then x̄ → μ as N → ∞

Bias
• Given an estimator T of a population quantity θ
• The bias of T is
• The sample mean is an estimator of the population mean
• (1 + ∑ᵢ xᵢ) / (N + 1) is

Variance
• Two estimators of the population mean: the sample mean, and the mean of every 2nd sample
• Both are unbiased, but one is much more variable
• Measure of variability: variance

Variance
• If zero-mean: variance = E(X²)
• Ex: the constant 0 v. a ±1 coin flip
• In general: variance = E((X − E(X))²)

Exercise: simplify the expression for variance
• E((X − E(X))²)

Exercise
• What is the variance of 3X?

Sample variance
• Sample variance =
• Expectation:
• Sample size correction:

Bias-variance decomposition
• Estimator T of a population quantity θ
• Mean squared error = E((T − θ)²) =

CLT
• Central limit theorem: for a sample of size N from a population with mean μ and variance σ², the sample average has
  • mean
  • variance

CLT proof
• Assume μ = 0 for simplicity

Covariance
• Suppose we want an approximate numeric measure of (in)dependence
• Consider the random variable XY
  • if X, Y are typically both +ve or both -ve
  • if X, Y are independent

Covariance
• cov(X, Y) =
• Is this a good measure of dependence?
• Suppose we scale X by 10:

Correlation
• Like covariance, but control for the variance of the individual r.v.s
• cor(X, Y) =
• cor(10X, Y) =

Correlation v. independence
• [scatter plot of X v. Y, with equal probability on each point]
• Are X and Y independent?
• Are X and Y uncorrelated?

Correlation v. independence
• [a second scatter plot of X v. Y, again with equal probability on each point]
• Are X and Y independent?
• Are X and Y uncorrelated?

Bayes Rule
• (Rev. Thomas Bayes, 1702–1761)
• For any X, Y, C: P(X | Y, C) P(Y | C) = P(Y | X, C) P(X | C)
• Simple version (without the context C): P(X | Y) P(Y) = P(Y | X) P(X)
• Can be taken as the definition of conditioning

Bayes rule: usual form
• Take the symmetric form P(X | Y) P(Y) = P(Y | X) P(X)
• Divide by P(Y)

Revisit: weighted dice
• Fair dice: all 36 rolls equally likely
• Weighted: rolls summing to 7 more likely
• Data: 1-6, 2-5

Exercise
• You are tested for a rare disease, emacsitis; prevalence 3 in 100,000
• You receive a test that is 99% sensitive and 99% specific
  • sensitivity = P(yes | emacsitis)
  • specificity = P(no | ¬emacsitis)
• The test comes out positive
• Do you have emacsitis?
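The exercise above can be checked numerically with Bayes' rule and the law of total probability from earlier in the deck. This is a sketch, not part of the slides; the only inputs are the numbers stated in the exercise.

```python
# Bayes' rule for the emacsitis exercise:
# P(disease | positive) = P(positive | disease) P(disease) / P(positive)

prior = 3 / 100_000      # prevalence of emacsitis
sensitivity = 0.99       # P(yes | emacsitis)
specificity = 0.99       # P(no | not emacsitis)

# Law of total probability ("sum rule") gives the evidence P(positive):
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)

posterior = sensitivity * prior / p_positive
print(f"P(emacsitis | positive test) = {posterior:.4f}")  # well under 1%
```

The counterintuitive answer (roughly 0.003) is the point of the exercise: at a prevalence of 3 in 100,000, false positives from the healthy population vastly outnumber true positives, so even an accurate test leaves the posterior small.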
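The Weather × AAPL price table is reused across several slides above. The following sketch, using only the numbers from the slides, verifies the product-of-marginals claim and computes the expected payoff, both unconditionally and conditioned on sun.

```python
# Joint distribution P(weather, price) from the slides.
joint = {
    ("sun",  "up"): 0.09, ("sun",  "same"): 0.15, ("sun",  "down"): 0.06,
    ("rain", "up"): 0.21, ("rain", "same"): 0.35, ("rain", "down"): 0.14,
}
payoff = {"up": 1, "same": 0, "down": -1}  # earnings per price move

# Marginals, by summing out the other variable.
p_weather, p_price = {}, {}
for (w, m), p in joint.items():
    p_weather[w] = p_weather.get(w, 0.0) + p
    p_price[m] = p_price.get(m, 0.0) + p

# Independence: every joint entry equals the product of its marginals.
independent = all(abs(joint[w, m] - p_weather[w] * p_price[m]) < 1e-12
                  for (w, m) in joint)

# Expected payoff, and expected payoff given that it is sunny.
expected = sum(joint[w, m] * payoff[m] for (w, m) in joint)
expected_given_sun = (sum(joint["sun", m] * payoff[m] for m in payoff)
                      / p_weather["sun"])
print(independent, expected, expected_given_sun)
```

Because Weather and AAPL price are independent in this table, conditioning on sun does not change the expected payoff, which is the answer hinted at on the conditional-expectation slide.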