CMU CS 15381 - Representing Uncertainty + Probabilistic Learning

Reference: R&N Chapter 13, and a bit of 20.2.

Uncertainty
• Most real-world problems deal with uncertain information:
– Diagnosis: likely disease given observed symptoms
– Equipment repair: likely component failure given sensor readings
– Help desk: likely operation based on past operations
• Such relationships cannot be represented by deterministic rules like Headache => Fever.
• The correct framework for representing uncertainty is probability.
• Outline:
– Review of basic probability tools (much of it well known, but still important to review)
– Bayes rule and its use in uncertain reasoning and probabilistic learning

Probability
• P(A) = probability of event A = the fraction of all possible worlds in which A is true:
  P(A) = (worlds in which A is true) / (all possible worlds)
• Basic properties: 0 ≤ P(A) ≤ 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) − P(A and B).
• Other ideas exist: fuzzy logic, non-monotonic logic, multi-valued logic, evidence theory (Dempster-Shafer).
• Probability is the only system that is "consistent".
• Immediately derived properties:
– P(¬A) = 1 − P(A), where ¬A denotes not-A, i.e., all the worlds in which A does not occur
– P(A) = P(A, B) + P(A, ¬B), where P(A, B) is shorthand for "A and B"
• A random variable is a variable X that can take values x1, …, xn, with a probability P(X = xi) attached to each value, i = 1, …, n, and Σ_{i=1..n} P(X = xi) = 1.

Conditional Probability
• P(A|B) = the fraction of those worlds in which B is true for which A is also true.
• Example: H = Headache with P(H) = 1/2, F = Flu with P(F) = 1/8, and
  P(H|F) = (area of the "H and F" region) / (area of the F region) = P(H, F) / P(F) = 1/2.
• Definition: P(A|B) = P(A, B) / P(B)
• Chain rule: P(A, B) = P(A|B) P(B)
• Other useful relations: P(A|B) + P(¬A|B) = 1 and Σ_i P(X = xi | B) = 1.

Probabilistic Inference
• Suppose H is true, and you know P(H|F) = 1/2 = 0.5. What is the probability that F is true? Is it 0.5?
• We also know P(H) = 1/2 and P(F) = 1/8.
• Correct reasoning: we know P(H), P(F), P(H|F) and the two forms of the chain rule:
  P(H, F) = P(H|F) P(F)   and   P(F|H) = P(H, F) / P(H)
• Substituting the values: P(H, F) = 0.5 × 1/8 = 1/16, so P(F|H) = (1/16) / (1/2) = 1/8.
• The key difference from the naive answer of 0.5 is that we took into account the fact that catching the flu is unlikely (P(F) is small).

Bayes Rule
• P(B|A) = P(A|B) P(B) / P(A)   (introduced circa 1763)
• We want: the posterior probability that B occurs given that A occurs.
• We know: the likelihood that A occurs given that B occurs, and the prior probability that B occurs in the absence of any other information.
• What if we do not know P(A)? Use the relation P(A) = P(A|B) P(B) + P(A|¬B) P(¬B).
• More general Bayes rule: P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|¬B) P(¬B)]
• The same rule holds for a non-binary random variable, except that we sum over all the possible events X = xk:
  P(X = xi | A) = P(A | X = xi) P(X = xi) / Σ_k P(A | X = xk) P(X = xk)
  The denominator is actually just P(A).
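To make the flu example concrete, here is a minimal Python sketch (not from the slides) that reproduces the chain-rule computation and then the general Bayes rule; the slides do not give P(H|¬F), so below it is derived from the stated numbers rather than assumed independently.

```python
# Minimal sketch (illustrative only): the headache/flu inference above,
# done once with the chain rule and once with the general Bayes rule.

p_H = 0.5            # P(H): probability of a headache
p_F = 1 / 8          # P(F): prior probability of the flu
p_H_given_F = 0.5    # P(H|F): likelihood of a headache given the flu

# Chain rule: P(H, F) = P(H|F) P(F)
p_HF = p_H_given_F * p_F             # 1/16

# Inference: P(F|H) = P(H, F) / P(H)
p_F_given_H = p_HF / p_H             # 1/8, not 0.5

# General Bayes rule: rebuild P(H) from the likelihoods when it is not given.
# P(H|notF) is not on the slides; here it is derived from the given numbers
# so that the marginalization reproduces P(H) = 1/2.
p_H_given_notF = (p_H - p_H_given_F * p_F) / (1 - p_F)
p_H_marginal = p_H_given_F * p_F + p_H_given_notF * (1 - p_F)
assert abs(p_H_marginal - p_H) < 1e-12

p_F_given_H_bayes = p_H_given_F * p_F / p_H_marginal
print(p_F_given_H, p_F_given_H_bayes)   # both 0.125
```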
Joint Distribution
• Given a set of variables A, B, C, …, the joint distribution table lists every possible combination of assignments to the variables in its rows, and for each row the corresponding joint probability.
• For M binary variables, the table has 2^M rows.
• Example:

  A  B  C   Prob
  0  0  0   0.30
  0  0  1   0.05
  0  1  0   0.10
  0  1  1   0.05
  1  0  0   0.05
  1  0  1   0.10
  1  1  0   0.25
  1  1  1   0.10

Using the Joint Distribution: Computing Other Probabilities
• The probability of an event E is the sum over all rows consistent with E:
  P(E) = Σ_{rows containing E} P(row)
• Example: P(A, B) = 0.25 + 0.10 = 0.35

Using the Joint Distribution: Doing Inference
• Given that event E1 occurs, the probability that E2 occurs is P(E2 | E1) = P(E1, E2) / P(E1).
• Example: P(A, B | C) = P(A, B, C) / P(C) = 0.10 / (0.05 + 0.05 + 0.10 + 0.10) = 0.10 / 0.30 = 1/3
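As an illustration (not part of the original slides), a minimal Python sketch that stores the example table as a dictionary and answers the two queries above by summing rows:

```python
# Minimal sketch: the example joint distribution table stored as a dictionary
# mapping assignments (A, B, C) to probabilities.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    """P(E): sum the probabilities of all rows consistent with the event.

    `event` maps variable positions (0=A, 1=B, 2=C) to required values;
    variables not mentioned are summed out.
    """
    return sum(p for row, p in joint.items()
               if all(row[i] == v for i, v in event.items()))

p_AB = prob({0: 1, 1: 1})                                  # P(A, B) = 0.35
p_AB_given_C = prob({0: 1, 1: 1, 2: 1}) / prob({2: 1})     # 0.10 / 0.30 = 1/3
print(p_AB, p_AB_given_C)
```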
Inference
• General view: I have some evidence (headache); how likely is a particular conclusion (fever)?
• Important in many industries: medical, pharmaceutical, help desk, fault diagnosis, …

Learning the Joint Distribution
• Three possible ways of generating the joint distribution:
1. Human experts (very difficult!)
2. Using known conditional probabilities: e.g., if we know P(C|A,B), P(B|A), and P(A), then P(A, B, C) = P(C|A,B) P(B|A) P(A). This is the basis for Bayes Nets, to be covered later.
3. Learning from data (a small counting sketch follows at the end of this section).
• Suppose that we have recorded a lot of training data: (0,1,1), (1,0,1), (1,1,0), (0,0,0), (1,1,0), …
• The table starts with unknown ("?") entries; the entry for P(A, B, ¬C) is filled in as
  (# of data entries with A=1, B=1, C=0) / (total number of data entries)
• More generally, the entry for P(E) in the table is
  (# of data entries with E) / (total number of data entries)

Real-Life Joint Distribution
• UCI Census Database
• Example query: P(Male | Poor) = 0.4654 / 0.7604 = 0.612

So Far…
• Basic probability concepts
• Bayes rule
• What joint distributions are
• Inference using joint distributions
• Learning joint distributions from data
• Problem: with M variables we need 2^M entries in the joint distribution table → an independence assumption leads to an efficient way to learn and to do inference.

Independence
• A and B are independent iff P(A|B) = P(A).
• In words: knowing B does not affect how likely we think that A is true.
• Key properties:
– Symmetry: P(A|B) = P(A) ⇔ P(B|A) = P(B)
– Joint distribution: P(A, B) = P(A) P(B)
– Complements: P(¬A|B) = P(¬A) and P(A|¬B) = P(A)

Naïve Bayes
• Suppose that A, B, C are independent.
• Then any value of the joint distribution can be computed easily: P(A, B, C) = P(A) P(B) P(C).
• In fact, we need only M numbers instead of the 2^M entries of the full table.
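A minimal Python sketch of the counting rule from the "Learning the Joint Distribution" slides above, using the slides' tiny five-sample dataset; this is illustrative only, not code from the course.

```python
from collections import Counter
from itertools import product

# Fill in the joint distribution table by counting:
# P(row) is estimated as (# data entries matching row) / (total # of entries).
data = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 0)]   # (A, B, C) samples

counts = Counter(data)
n = len(data)
joint_estimate = {row: counts[row] / n for row in product((0, 1), repeat=3)}

# E.g., the entry for P(A, B, notC) is the fraction of samples with A=1, B=1, C=0.
print(joint_estimate[(1, 1, 0)])   # 2/5 = 0.4 on this tiny example
```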

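Finally, a minimal sketch of the point made on the Naïve Bayes slide: under the independence assumption, M numbers (here three illustrative marginals, one per variable, chosen arbitrarily) determine all 2^M entries of the joint table.

```python
from itertools import product

# With M independent binary variables, M numbers -- P(A=1), P(B=1), P(C=1) --
# determine the whole joint table, since P(A, B, C) = P(A) P(B) P(C).
marginals = {"A": 0.6, "B": 0.3, "C": 0.5}   # assumed values for illustration

def joint_prob(assignment):
    """P(A=a, B=b, C=c) under the independence assumption."""
    p = 1.0
    for name, value in zip(marginals, assignment):
        p *= marginals[name] if value == 1 else 1 - marginals[name]
    return p

table = {row: joint_prob(row) for row in product((0, 1), repeat=3)}
assert abs(sum(table.values()) - 1.0) < 1e-12   # still a valid distribution
print(table[(1, 1, 0)])   # P(A, B, notC) = 0.6 * 0.3 * 0.5 = 0.09
```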
