Representing Uncertainty + Probabilistic Learning
(R&N Chapter 13, a bit of 20.2)

Uncertainty
• Most real-world problems deal with uncertain information:
– Diagnosis: likely disease given observed symptoms
– Equipment repair: likely component failure given sensor readings
– Help desk: likely operation based on past operations
• Such problems cannot be represented by deterministic rules like:
Headache => Fever
• The correct framework for representing uncertainty is probability.
• Outline:
– Review of basic probability tools (much of it well-known, but still important to review)
– Bayes rule and its use in uncertain reasoning and probabilistic learning

Probability
• P(A) = probability of event A = fraction of all possible worlds in which A is true.
(Venn diagram: the worlds in which A is true form a region inside the set of all possible worlds; P(A) is the relative size of that region.)
• Basic axioms:
0 ≤ P(A) ≤ 1
P(True) = 1
P(False) = 0
P(A or B) = P(A) + P(B) − P(A and B)
• Other ideas for handling uncertainty:
– Fuzzy logic
– Non-monotonic logic
– Multi-valued logic
– Evidence theory (Dempster-Shafer)
• Probability is the only system that is “consistent”.
• Immediately derived properties:
P(¬A) = 1 − P(A)   (¬A denotes not-A: all the worlds in which A does not occur)
P(A) = P(A, B) + P(A, ¬B)   (P(A, B) is shorthand for “P(A and B)”)
• A random variable is a variable X that can take values x1, ..., xn with a probability P(X = xi) attached to each i = 1, ..., n. The worlds in which X = x1, ..., X = xn partition the set of all possible worlds, so:
Σ_{i=1}^{n} P(X = xi) = 1

Conditional Probability
• P(A|B) = fraction of those worlds in which B is true for which A is also true.
• Example: H = Headache with P(H) = 1/2, F = Flu with P(F) = 1/8. Then:
P(H|F) = (area of the “H and F” region) / (area of the F region) = P(H, F) / P(F) = 1/2
• Definition:
P(A|B) = P(A, B) / P(B)
• Chain rule:
P(A, B) = P(A|B) P(B)
• Other useful relations:
P(A|B) + P(¬A|B) = 1
Σ_i P(X = xi | B) = 1
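To make the possible-worlds picture concrete, here is a minimal Python sketch (not from the slides). The four world weights are a hypothetical assignment chosen to be consistent with the slide's numbers P(H) = 1/2, P(F) = 1/8, P(H|F) = 1/2; the helpers `prob` and `cond_prob` are names introduced here for illustration.

```python
from fractions import Fraction

# Each possible world is a pair (headache, flu) with a probability attached.
# These weights are one hypothetical assignment consistent with the slides:
# P(H) = 1/2, P(F) = 1/8, and P(H,F) = 1/16, so that P(H|F) = 1/2.
worlds = {
    (1, 1): Fraction(1, 16),   # headache and flu
    (1, 0): Fraction(7, 16),   # headache, no flu
    (0, 1): Fraction(1, 16),   # flu, no headache
    (0, 0): Fraction(7, 16),   # neither
}

def prob(event):
    """P(event): total weight of the worlds in which the event is true."""
    return sum(p for w, p in worlds.items() if event(w))

def cond_prob(a, b):
    """P(a|b) = P(a,b) / P(b): the fraction of b-worlds where a also holds."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

def headache(w): return w[0] == 1
def flu(w):      return w[1] == 1

assert prob(headache) == Fraction(1, 2)
assert prob(flu) == Fraction(1, 8)
assert cond_prob(headache, flu) == Fraction(1, 2)   # P(H|F)
# Useful relation from the slide: P(A|B) + P(not-A|B) = 1
assert cond_prob(headache, flu) + cond_prob(lambda w: not headache(w), flu) == 1
```

Exact fractions are used instead of floats so the asserts can check the slide's identities exactly.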
Probabilistic Inference
• Suppose H is true, and suppose that you know P(H|F) = 1/2 = 0.5. What is the probability that F is true? Is it 0.5?
• Recall: P(H) = 1/2, P(F) = 1/8, P(H|F) = 0.5.
• Correct reasoning: we know P(H), P(F), P(H|F) and the two chain rules:
P(H, F) = P(H|F) P(F)
P(F|H) = P(H, F) / P(H)
• Substituting the values:
P(H, F) = 0.5 × 1/8 = 1/16
P(F|H) = (1/16) / (1/2) = 1/8
• The key difference is that we took into account the fact that catching the flu is unlikely (P(F) is small).

Bayes Rule
• Introduced circa 1763:
P(B|A) = P(A|B) P(B) / P(A)
• We want: the posterior probability that B occurs given that A occurs.
• We know: the likelihood that A occurs given that B occurs, and the prior probability that B occurs in the absence of any other information.
• What if we do not know P(A)? Use the relation:
P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)
• This gives the more general Bayes rule:
P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|¬B) P(¬B)]
• The same rule holds for a non-binary random variable, except we need to sum over all the possible events X = xk:
P(X = xi | A) = P(A | X = xi) P(X = xi) / Σ_k P(A | X = xk) P(X = xk)
(The denominator is actually just P(A).)

Joint Distribution
• Given a set of variables A, B, C, ...:
– Generate a table with all the possible combinations of assignments to the variables in the rows.
– For each row, list the corresponding joint probability.
– For M binary variables, the table has 2^M rows.

A B C | Prob
0 0 0 | 0.30
0 0 1 | 0.05
0 1 0 | 0.10
0 1 1 | 0.05
1 0 0 | 0.05
1 0 1 | 0.10
1 1 0 | 0.25
1 1 1 | 0.10

Using the Joint Distribution: Computing Other Probabilities
• Compute the probability of an event E by summing over the rows in which E holds:
P(E) = Σ_{all rows containing E} P(row)
• Example (from the table above): P(A, B) = 0.25 + 0.10 = 0.35.

Using the Joint Distribution: Doing Inference
• Given that event E1 occurs, the probability that E2 occurs is:
P(E2|E1) = P(E1, E2) / P(E1)
• Example (from the table above):
P(A, B | C) = P(A, B, C) / P(C) = 0.10 / (0.05 + 0.05 + 0.10 + 0.10) = 0.10 / 0.30 ≈ 0.33

Inference
• General view: I have some evidence (headache); how likely is a particular conclusion (fever)?
• Important in many industries: medical, pharmaceutical, help desk, fault diagnosis, ...
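As a sanity check, here is a small Python sketch (my own, not part of the slides) that stores the joint table above as a dictionary and reproduces both computations; `prob` and `cond_prob` are helper names introduced here.

```python
from fractions import Fraction

# The joint distribution table above, keyed by (A, B, C).
joint = {
    (0, 0, 0): Fraction(30, 100), (0, 0, 1): Fraction(5, 100),
    (0, 1, 0): Fraction(10, 100), (0, 1, 1): Fraction(5, 100),
    (1, 0, 0): Fraction(5, 100),  (1, 0, 1): Fraction(10, 100),
    (1, 1, 0): Fraction(25, 100), (1, 1, 1): Fraction(10, 100),
}
assert sum(joint.values()) == 1   # a valid distribution

def prob(event):
    """P(E): sum of the probabilities of all rows in which E holds."""
    return sum(p for row, p in joint.items() if event(*row))

def cond_prob(e2, e1):
    """P(E2|E1) = P(E1, E2) / P(E1)."""
    both = prob(lambda a, b, c: e2(a, b, c) and e1(a, b, c))
    return both / prob(e1)

# P(A, B) = 0.25 + 0.10 = 0.35, as on the slide.
assert prob(lambda a, b, c: a and b) == Fraction(35, 100)

# P(A, B | C) = 0.10 / 0.30 = 1/3, as on the slide.
assert cond_prob(lambda a, b, c: a and b, lambda a, b, c: c) == Fraction(1, 3)
```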
Learning the Joint Distribution
• Three possible ways of generating the joint distribution:
1. Human experts (very difficult!)
2. Using known conditional probabilities: e.g., if we know P(C|A,B), P(B|A), and P(A), then we know P(A, B, C) = P(C|A,B) P(B|A) P(A). This is the basis for Bayes nets, to be covered later.
3. Learning from data.
• Suppose that we have recorded a lot of training data:
(0,1,1), (1,0,1), (1,1,0), (0,0,0), (1,1,0), ...
• Then the entry for P(A, B, ¬C) in the table is:
(# of data entries with A=1, B=1, C=0) / (total number of data entries)
• More generally, the entry for P(E) in the table is:
(# of data entries with E) / (total number of data entries)

Real-Life Joint Distribution
• UCI Census Database
• Example query: P(Male | Poor) = 0.4654 / 0.7604 = 0.612

So Far...
• Basic probability concepts
• Bayes rule
• What joint distributions are
• Inference using joint distributions
• Learning joint distributions from data
• Problem: if we have M variables, we need 2^M entries in the joint distribution table.
• An independence assumption leads to an efficient way to learn and to do inference.

Independence
• A and B are independent iff:
P(A|B) = P(A)
• In words: knowing B does not affect how likely we think that A is true.

Key Properties
• Symmetry:
P(A|B) = P(A) ⇔ P(B|A) = P(B)
• Joint distribution:
P(A, B) = P(A) P(B)
• Independence of complements:
P(A|¬B) = P(A) and P(¬A|B) = P(¬A)

Naïve Bayes
• Suppose that A, B, C are independent.
• Then any value of the joint distribution can be computed easily:
P(A, B, C) = P(A) P(B) P(C)
• In fact, we need only M numbers instead of 2^M (see the sketch below).
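A minimal end-to-end sketch of the last two ideas, under stated assumptions: the first five records are the slide's example data, the remaining five are invented here to round out the sample, and `naive_joint` is a name introduced for illustration. Counting fills all 2^M table entries; the naïve Bayes estimate uses only the M per-variable marginals, and the two agree only to the extent that the variables really are independent.

```python
from collections import Counter
from itertools import product

# Training data: (A, B, C) records. The first five are the slide's example;
# the other five are hypothetical, added to make a 10-record sample.
data = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 0),
        (0, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
n = len(data)

# 1. Learn the full joint by counting: 2^M entries.
#    P(E) = (# of data entries with E) / (total number of data entries)
counts = Counter(data)
joint = {row: counts[row] / n for row in product((0, 1), repeat=3)}

# 2. Naive Bayes: assume A, B, C independent, so P(A,B,C) = P(A) P(B) P(C).
#    Only M numbers are needed: the marginal P(X_i = 1) for each variable.
marginals = [sum(r[i] for r in data) / n for i in range(3)]

def naive_joint(row):
    """Joint probability of a row under the independence assumption."""
    p = 1.0
    for value, m in zip(row, marginals):
        p *= m if value == 1 else 1 - m
    return p

for row in sorted(joint):
    print(row, f"counted={joint[row]:.2f}", f"naive={naive_joint(row):.2f}")
```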