Review: probability
• Monty Hall, weighted dice
• Frequentist v. Bayesian
• Independence
• Expectations, conditional expectations
• Exp. & independence; linearity of exp.
• Estimator (RV computed from sample)
• law of large #s, bias, variance, tradeoff

Covariance
• Suppose we want an approximate numeric measure of (in)dependence
• Let E(X) = E(Y) = 0 for simplicity
• Consider the random variable XY
• if X, Y are typically both +ve or both -ve: XY is typically +ve, so E(XY) > 0
• if X, Y are independent: E(XY) = E(X) E(Y) = 0

Covariance
• cov(X, Y) = E[(X - E(X)) (Y - E(Y))], which is E(XY) when E(X) = E(Y) = 0
• Is this a good measure of dependence?
• Suppose we scale X by 10: cov(10X, Y) = 10 cov(X, Y)

Correlation
• Like covariance, but controls for variance of individual r.v.s
• cor(X, Y) = cov(X, Y) / sqrt(var(X) var(Y))
• cor(10X, Y) = cor(X, Y)

Correlation & independence
• Equal probability on each point
• Are X and Y independent?
• Are X and Y uncorrelated?
[Figure: points in the X-Y plane]

Correlation & independence
• Do you think that all independent pairs of RVs are uncorrelated?
• Do you think that all uncorrelated pairs of RVs are independent?

Proofs and (counter)examples
• For a question "does A imply B?"
• e.g., does "X, Y uncorrelated" imply "X, Y independent"?
• if true, usually need to provide a proof
• if false, usually only need to provide a counterexample

Counterexamples
• Counterexample = example satisfying A but not B
• Here: A = "X, Y uncorrelated", B = "X, Y independent"
• E.g., RVs X and Y that are uncorrelated, but are not independent

Correlation & independence
• Equal probability on each point
• Are X and Y independent?
• Are X and Y uncorrelated?
[Figure: points in the X-Y plane]

Law of iterated expectations
• For any two RVs, X and Y, we have: E_Y(E_X(X | Y)) = E_X(X)
• Convention: note in the subscript of each E(.) the RVs that are not yet conditioned on (in this E(.)) or marginalized away (inside this E(.))

Law of iterated expectations
• E_X(X | Y) = sum_x x P(X = x | Y), a function of Y
• E_Y(E_X(X | Y)) = sum_y P(Y = y) sum_x x P(X = x | Y = y) = sum_x x P(X = x) = E_X(X)

Bayes Rule
• For any X, Y, C
• P(X | Y, C) P(Y | C) = P(Y | X, C) P(X | C)
• Simple version (without context)
• P(X | Y) P(Y) = P(Y | X) P(X)
• Can be taken as definition of conditioning
[Portrait: Rev. Thomas Bayes, 1702–1761]

Exercise
• You are tested for a rare disease, emacsitis (prevalence 3 in 100,000)
• You receive a test that is 99% sensitive and 99% specific
• sensitivity = P(yes | emacsitis)
• specificity = P(no | ~emacsitis)
• The test comes out positive
• Do you have emacsitis?

Revisit: weighted dice
• Fair dice: all 36 rolls equally likely
• Weighted: rolls summing to 7 more likely
• Data: 1-6, 2-5

Learning from data
• Given a model class
• And some data, sampled from a model in this class
• Decide which model best explains the sample

Bayesian model learning
• P(model | data) = P(data | model) P(model) / Z
• Z = P(data) = sum over models of P(data | model) P(model)
• So, for each model, compute: P(data | model) P(model)
• Then: normalize by dividing by Z

Prior: uniform
[Plot: horizontal axis from 0 to 1, ends labeled "all H" and "all T"; vertical axis from 0 to 0.25]

Posterior: after 5H, 8T
[Plot: same axes as the prior]

Posterior: 11H, 20T
[Plot: same axes as the prior]

Graphical models

Why do we need graphical models?
• So far, only way we've seen to write down a distribution is as a big table
• Gets unwieldy fast!
• E.g., 10 RVs, each w/ 10 settings
• Table size = 10^10
• Graphical model: way to write distribution compactly using diagrams & numbers

Example ML problem
• US gov't inspects food packing plants
• 27 tests of contamination of surfaces
• 12-point ISO 9000 compliance checklist
• are there food-borne illness incidents in 30 days after inspection? (15 types)
• Q:
• A:

Big graphical models
• Later in course, we'll use graphical models to express various ML algorithms
• e.g., the one from the last slide
• These graphical models will be big!
• Please bear with some smaller examples for now so we can fit them on the slides and do the math in our heads…

Bayes nets
• Best-known type of graphical model
• Two parts: DAG (directed acyclic graph) and CPTs (conditional probability tables)

Rusty robot: the DAG

Rusty robot: the CPTs
• For each RV (say X), there is one CPT specifying P(X | pa(X))

Interpreting it

Benefits
• 11 v. 31 numbers
• Fewer parameters to learn
• Efficient inference = computation of marginals, conditionals, posteriors

Inference example
• P(M, Ra, O, W, Ru) = P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W)
• Find marginal of M,
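
The factorization in the inference example can be checked by brute force on a small case. The sketch below is only illustrative: it assumes all five variables are binary (which matches the 11 v. 31 count), and every CPT value in it is made up for the example. Only the parent structure is read off the factorization P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W). The names parents, cpt, joint and marginal are hypothetical helpers, not part of the slides.

```python
from itertools import product

# Parents of each variable, read off the factorization
# P(M) P(Ra) P(O) P(W|Ra,O) P(Ru|M,W).
parents = {"M": (), "Ra": (), "O": (), "W": ("Ra", "O"), "Ru": ("M", "W")}
order = list(parents)

# HYPOTHETICAL CPT entries (not from the slides):
# each entry is P(var = True | parent values), keyed by the parent values.
cpt = {
    "M": {(): 0.9},
    "Ra": {(): 0.7},
    "O": {(): 0.2},
    "W": {(True, True): 0.9, (True, False): 0.1,
          (False, True): 0.1, (False, False): 0.1},
    "Ru": {(True, True): 0.8, (True, False): 0.1,
           (False, True): 0.0, (False, False): 0.0},
}


def joint(assign):
    """Probability of one full assignment, as a product of CPT entries."""
    p = 1.0
    for var in order:
        p_true = cpt[var][tuple(assign[pa] for pa in parents[var])]
        p *= p_true if assign[var] else 1.0 - p_true
    return p


def marginal(var):
    """P(var = True), by summing the joint over all other variables."""
    total = 0.0
    for values in product([True, False], repeat=len(order)):
        assign = dict(zip(order, values))
        if assign[var]:
            total += joint(assign)
    return total


# One free number per parent setting in the Bayes net,
# versus one per joint outcome (minus one) in a full table.
n_bayes_net = sum(len(table) for table in cpt.values())
n_full_table = 2 ** len(order) - 1

print(n_bayes_net, n_full_table)  # 11 31
print(marginal("M"))              # close to the assumed P(M = True) = 0.9
```

Because M has no parents, summing out the other four variables just returns the entry in M's own CPT, up to floating-point rounding. The brute-force loop visits all 2^5 assignments, so it only works for tiny networks; avoiding that enumeration is the point of efficient inference.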