15-780: Grad AI
Lecture 17: Probability
Geoff Gordon (this lecture), Tuomas Sandholm
TAs: Erik Zawadzki, Abe Othman

Review: probability
RVs, events, sample space Ω
Measures, distributions
‣ disjoint union property (law of total probability or "sum rule")
Sample v. population
Law of large numbers
Marginals, conditionals

Suggested reading
Bishop, Pattern Recognition and Machine Learning: pp. 1–4, sec. 1–1.2, sec. 2–2.3

Terminology
Experiment =
Prior =
Posterior =

Example: model selection
You're gambling to decide who has to clean the lab
You are accused of using weighted dice!
Two models:
‣ fair dice: all 36 rolls equally likely
‣ weighted: rolls summing to 7 more likely
prior:
observation:
posterior:

Independence
X and Y are independent if, for all possible values of y, P(X) = P(X | Y=y)
‣ equivalently, for all possible values of x, P(Y) = P(Y | X=x)
‣ equivalently, P(X, Y) = P(X) P(Y)
Knowing X or Y gives us no information about the other

Joint distribution of Weather and AAPL price:

                up    same   down  | marginal
  sun          0.09   0.15   0.06  |   0.3
  rain         0.21   0.35   0.14  |   0.7
  marginal     0.3    0.5    0.2   |

Independence: probability = product of marginals

Expectations
How much should we expect to earn from our AAPL stock?
Payoff per share (joint probabilities as in the table above):

                up    same   down
  sun           +1     0      −1
  rain          +1     0      −1

Linearity of expectation
Expectation is a linear function of the numbers in the payoff table
E.g., suppose we own k shares: the payoffs become +k, 0, −k, and the expectation scales by k

Conditional expectation
What if we know it's sunny? (Restrict to the sun row of the joint table and renormalize)

Independence and expectation
If X and Y are independent, E(XY) = E(X) E(Y)
Proof:
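The Weather/AAPL tables above are small enough to check by hand, but a few lines of Python (my own sketch, not part of the lecture) confirm the product-of-marginals test and the expectation computations:

```python
# Numeric check of the Weather/AAPL example: the joint probabilities equal
# the product of the marginals (independence), and the expected payoff can
# be read off the two tables above.
weather = ["sun", "rain"]
price = ["up", "same", "down"]
joint = {("sun", "up"): 0.09, ("sun", "same"): 0.15, ("sun", "down"): 0.06,
         ("rain", "up"): 0.21, ("rain", "same"): 0.35, ("rain", "down"): 0.14}
payoff = {"up": 1, "same": 0, "down": -1}  # earnings per share

p_weather = {w: sum(joint[w, p] for p in price) for w in weather}
p_price = {p: sum(joint[w, p] for w in weather) for w in price}

# Independence: every cell matches the product of its marginals.
independent = all(abs(joint[w, p] - p_weather[w] * p_price[p]) < 1e-12
                  for w in weather for p in price)
print(independent)  # True

# Expectation: sum of payoff * probability over all cells.
expect = sum(payoff[p] * joint[w, p] for w in weather for p in price)
print(expect)  # ≈ 0.10, so owning k shares earns ≈ 0.10 k by linearity

# Conditional expectation given sun: renormalize the sun row.
e_sun = sum(payoff[p] * joint["sun", p] for p in price) / p_weather["sun"]
print(e_sun)  # ≈ 0.10 again, as expected since the two r.v.s are independent
```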
Sample means
Sample mean: X̄ = (1/N) Σ_i X_i
Expectation of sample mean:

Estimators
Common task: given a sample, infer something about the population
An estimator is a function of a sample that we use to tell us something about the population
E.g., the sample mean is a good estimator of the population mean
E.g., linear regression

Law of large numbers (more general form)
For r.v. X: if we take a sample of size N from a distribution P(x) with mean μ and compute the sample mean X̄, then X̄ → μ as N → ∞

Bias
Given an estimator T of a population quantity θ
The bias of T is E(T) − θ
The sample mean is an unbiased estimator of the population mean
(1 + Σ_i x_i) / (N + 1) is biased, but asymptotically unbiased

Variance
Two estimators of the population mean: the sample mean, and the mean of every 2nd sample
Both are unbiased, but one is more variable
Measure of variability: variance

Variance
If zero-mean: variance = E(X²)
‣ Ex: the constant 0 v. a coin-flip ±1
In general: E([X − E(X)]²)
‣ equivalently, E(X²) − E(X)² (but note the numerical problem: a difference of two large, nearly equal quantities)

Exercise
What is the variance of 3X?

Sample variance
Sample variance: (1/N) Σ_i (x_i − x̄)²
Expectation:
Sample size correction: multiply by N/(N − 1), giving (1/(N − 1)) Σ_i (x_i − x̄)²
(a simulation of these bias claims appears after the Bayes Rule slide below)

Bias-variance decomposition
Estimator T of population quantity θ
Mean squared error = E((T − θ)²) =

Bias-variance tradeoff
It's nice to have estimators w/ small MSE
There is a smallest possible MSE for a given amount of data
‣ limited data provides limited information
An estimator which achieves the minimum is efficient (close for large N: asymptotically efficient)
Often we can adjust an estimator so its MSE is due to bias or to variance: the famed tradeoff

Covariance
Suppose we want an approximate numeric measure of (in)dependence
Let E(X) = E(Y) = 0 for simplicity
Consider the random variable XY
‣ if X, Y are typically both +ve or both −ve:
‣ if X, Y are independent:

Covariance
cov(X, Y) = E([X − E(X)][Y − E(Y)])
Is this a good measure of dependence?
‣ Suppose we scale X by 10
‣ cov(10X, Y) = E([10X − E(10X)][Y − E(Y)])
‣ cov(10X, Y) = 10 cov(X, Y)

Correlation
Like covariance, but controls for the variance of the individual r.v.s
cor(X, Y) = cov(X, Y) / √(var(X) var(Y))
cor(10X, Y) =

Correlation & independence
Equal probability on each point
Are X and Y independent?
Are X and Y uncorrelated?
[figure: scatter plot of equiprobable (X, Y) points]

Correlation & independence
Do you think that all independent pairs of RVs are uncorrelated?
Do you think that all uncorrelated pairs of RVs are independent?

Correlation & independence
Equal probability on each point
Are X and Y independent?
Are X and Y uncorrelated?
[figure: a second scatter plot of equiprobable (X, Y) points]

Law of iterated expectations
For any two RVs, X and Y, we have:
‣ E_Y(E_X[X | Y]) = E(X)
Convention: note in the subscript the RVs that are not yet conditioned on (in this E(·)) or marginalized away (inside this E(·))

Law of iterated expectations
E_X[X | Y] =
E_Y(E_X[X | Y]) =

Bayes Rule
For any X, Y, C:
‣ P(X | Y, C) P(Y | C) = P(Y | X, C) P(X | C)
Simple version (without context):
‣ P(X | Y) P(Y) = P(Y | X) P(X)
‣ more commonly, P(X | Y) = P(Y | X) P(X) / P(Y)
Can be taken as the definition of conditioning
Rev. Thomas Bayes, 1702–1761
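As promised in the estimator slides above, here is a small Monte Carlo simulation (my own sketch, not from the lecture) of the bias claims: the sample mean is unbiased, while the uncorrected sample variance underestimates σ² by the factor (N − 1)/N:

```python
import random

# Draw many samples of size N and compare the average value of each
# estimator to the population quantity it estimates.
random.seed(0)
N, trials = 5, 200_000           # population: standard normal, mu = 0, sigma^2 = 1

mean_est = var_est = var_corr = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(N)]
    xbar = sum(xs) / N
    ss = sum((x - xbar) ** 2 for x in xs)
    mean_est += xbar
    var_est += ss / N            # uncorrected sample variance
    var_corr += ss / (N - 1)     # with the N/(N-1) correction

print(mean_est / trials)  # ≈ 0.0: the sample mean is unbiased
print(var_est / trials)   # ≈ 0.8 = (N-1)/N * sigma^2: biased low
print(var_corr / trials)  # ≈ 1.0: the corrected estimator is unbiased
```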
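Bayes Rule is also just a normalization. The helper below is a hypothetical sketch (the function name and the weighted-die numbers are mine, not the lecture's): it computes P(X | Y) = P(Y | X) P(X) / P(Y) over a finite set of hypotheses, which is exactly the "compute, then normalize" recipe used for model learning later in this lecture:

```python
# posterior(prior, likelihood) returns P(X | Y=y) for each hypothesis X,
# given a prior P(X) and the likelihood P(Y=y | X) of the observed y.
def posterior(prior, likelihood):
    joint = {h: prior[h] * likelihood[h] for h in prior}  # P(Y|X) P(X)
    z = sum(joint.values())                               # Z = P(Y)
    return {h: p / z for h, p in joint.items()}

# Illustration with assumed numbers: a 50/50 prior over fair vs. weighted
# dice, where we observe one roll summing to 7, and the weighted die is
# (hypothetically) assumed to roll a 7 with probability 1/3 vs. the fair 6/36.
print(posterior({"fair": 0.5, "weighted": 0.5},
                {"fair": 6 / 36, "weighted": 1 / 3}))
# {'fair': 0.333..., 'weighted': 0.666...}
```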
Exercise
You are tested for a rare disease, emacsitis: prevalence 3 in 100,000
You receive a test that is 99% sensitive and 99% specific
‣ sensitivity = P(yes | emacsitis) = 0.99
‣ specificity = P(no | ¬emacsitis) = 0.99
The test comes out positive
Do you have emacsitis?

Revisit: weighted dice
Fair dice: all 36 rolls equally likely
Weighted: rolls summing to 7 more likely
Data: 1-6 2-5

Learning from data
Given a model class
And some data, sampled from a model in this class
Decide which model best explains the sample

Bayesian model learning
P(model | data) = P(data | model) P(model) / Z, where Z = P(data)
So, for each model:
‣ compute P(data | model) P(model)
‣ normalize
E.g., which parameters for a face recognizer are best?
E.g., what is P(H) for a biased coin?

Prior: uniform
[plot: flat density over P(H), from all T to all H]

Posterior: after 5H, 8T
[plot: posterior over P(H), peaked below 0.5]

Posterior: after 11H, 20T
[plot: posterior over P(H), more sharply peaked]
(a grid sketch reproducing these plots appears at the end of this section)

Probability & AI
Why probability?
The point of working with probability is to make decisions
E.g., find an open-loop plan or closed-loop policy with the highest success probability or lowest expected cost
Later: MDP, POMDP, …
Now: a simple motivating example
‣ demonstrates that the underlying problems are still familiar (related to SAT, PBI, MILP, #SAT)

Probabilistic STRIPS planning
Same as ordinary STRIPS, except each effect happens w/ a (known, independent) probability
Eat
‣ pre: have(Cake)
‣ post: ¬have(Cake), 0.9 eaten(Cake)
Bake
‣ pre: ¬have(Cake)
‣ post: 0.8 have(Cake)
Actions have no effect if their preconditions don't hold
Seek an (open-loop) plan with the highest success probability

Translating to a SAT-like problem
Recall deterministic STRIPS → SAT:
1. act_A(t+1) ⇒ pre_A1(t) ∧ pre_A2(t) ∧ …
2. act_A(t+1) ⇒ post_A1(t+2) ∧ post_A2(t+2) ∧ …
3. post(t+2) ⇒ act_A(t+1) ∨ act_B(t+1) ∨ …
4. goal_1(T) ∧ goal_2(T) ∧ …
5. init_1(1) ∧ init_2(1) ∧ …
6. lots o' mutexes
We need to modify 1–3 above, and …
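The coin plots referenced above are easy to reproduce numerically. The sketch below is my own; it assumes independent flips, so P(data | p) ∝ p^H (1 − p)^T, and discretizes P(H) on a grid, following the "compute, then normalize" recipe:

```python
# Grid-based posterior for the biased-coin example: start from a uniform
# prior over candidate values of P(H), multiply by the likelihood of the
# observed flips, and normalize.
def coin_posterior(heads, tails, grid_size=101):
    grid = [i / (grid_size - 1) for i in range(grid_size)]  # candidate P(H)
    prior = [1.0 / grid_size] * grid_size                   # uniform prior
    unnorm = [pr * (p ** heads) * ((1 - p) ** tails)
              for p, pr in zip(grid, prior)]
    z = sum(unnorm)                                         # normalizer
    return grid, [u / z for u in unnorm]

grid, post = coin_posterior(5, 8)   # the "after 5H, 8T" plot above
mode = grid[max(range(len(post)), key=post.__getitem__)]
print(mode)  # grid point of maximum posterior, near 5/13 ≈ 0.38
```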
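To make the probabilistic STRIPS semantics concrete, here is a Monte Carlo plan evaluator; it is a sketch under stated assumptions, since the slides give the Eat and Bake actions but not the initial state or goal. I assume we start with have(Cake) and the goal is have(Cake) ∧ eaten(Cake):

```python
import random

# Score a fixed open-loop plan under probabilistic STRIPS semantics: an
# action with satisfied preconditions applies each effect independently
# with its stated probability; otherwise it is a no-op.
ACTIONS = {
    "Eat":  {"pre": {"have": True},
             "post": [("have", False, 1.0), ("eaten", True, 0.9)]},
    "Bake": {"pre": {"have": False},
             "post": [("have", True, 0.8)]},
}

def success_prob(plan, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        state = {"have": True, "eaten": False}           # assumed initial state
        for name in plan:
            act = ACTIONS[name]
            if all(state[v] == val for v, val in act["pre"].items()):
                for var, val, p in act["post"]:
                    if rng.random() < p:
                        state[var] = val
        wins += state["have"] and state["eaten"]         # assumed goal
    return wins / trials

print(success_prob(["Eat", "Bake"]))  # ≈ 0.9 * 0.8 = 0.72
```

Simulation only scores one fixed plan; an actual planner would search over plans, e.g. via the SAT-style translation sketched above.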

