Machine Learning CS6375, Spring 2015
Bayesian Learning I a

Uncertainty
- Most real-world problems deal with uncertain information.
  - Diagnosis: likely disease given observed symptoms.
  - Equipment repair: likely component failure given sensor reading.
- Such knowledge cannot be represented by deterministic rules such as Headache $\Rightarrow$ Fever.
- Correct framework for representing uncertainty: probability.

Probability
- $P(A)$ = probability of event $A$ = fraction of all possible worlds in which $A$ is true.
- Immediately derived properties, for example $P(A) = P(A, B) + P(A, \neg B)$.
- More generally: IF we know that exactly one of $B_1, B_2, \ldots, B_n$ is true, i.e. $P(B_1 \text{ or } B_2 \text{ or } \ldots \text{ or } B_n) = 1$ and $P(B_i \text{ and } B_j) = 0$ for all $i \neq j$, THEN we know
  $P(A) = P(A, B_1) + P(A, B_2) + \ldots + P(A, B_n)$
- A random variable is a variable $X$ that can take values $x_1, \ldots, x_n$ with a probability $P(X = x_i)$ attached to each $i = 1, \ldots, n$.

Example
- My mood can take one of two values: Happy, Sad.
- The weather can take one of three values: Rainy, Sunny, Cloudy.
- Given:
  $P(\text{Mood} = \text{Happy}, \text{Weather} = \text{Rainy}) = 0.2$
  $P(\text{Mood} = \text{Happy}, \text{Weather} = \text{Sunny}) = 0.1$
  $P(\text{Mood} = \text{Happy}, \text{Weather} = \text{Cloudy}) = 0.4$
- Can I compute $P(\text{Mood} = \text{Happy})$?
- Can I compute $P(\text{Mood} = \text{Sad})$?
- Can I compute $P(\text{Weather} = \text{Rainy})$?

Conditional Probability
- $P(A \mid B)$ = fraction of those worlds in which $B$ is true for which $A$ is also true.

Conditional Probability Example
- $H$ = Headache, $P(H) = 1/2$
- $F$ = Flu, $P(F) = 1/8$
- $P(H \mid F)$ = (area of the "H and F" region) / (area of the "F" region)
- $P(H \mid F) = P(H, F) / P(F)$
- $P(H \mid F) = 1/2$

Conditional Probability
- Definition: $P(A \mid B) = \dfrac{P(A, B)}{P(B)}$
- Chain rule: $P(A, B) = P(A \mid B)\,P(B)$
- Can you prove that $P(A, B) \le P(A)$ for any events $A$ and $B$?
- What can you say about $P(A \mid B)$ compared to $P(A)$?
- Other useful relations: [equations not preserved in this copy]

Probabilistic Inference
- What is the probability that $F$ is true given that $H$ is true?
- Given: $P(H) = 1/2$, $P(F) = 1/8$, $P(H \mid F) = 0.5$
- Correct reasoning: we know $P(H)$, $P(F)$, $P(H \mid F)$ and the two chain rules
  $P(H, F) = P(H \mid F)\,P(F)$ and $P(H, F) = P(F \mid H)\,P(H)$
- Substituting the values:
  $P(F \mid H) = \dfrac{P(H \mid F)\,P(F)}{P(H)} = \dfrac{0.5 \times 1/8}{1/2} = \dfrac{1}{8}$

Bayes Rule
- $P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A)}$
- What if we do not know $P(A)$? Use the relation
  $P(A) = P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B)$
- More general Bayes rule:
  $P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B)}$
- The same rule holds for a non-binary random variable, except that we need to sum over all the possible events.
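The flu/headache inference above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `bayes` and `posterior` are made up for illustration, and $P(H \mid \neg F)$ is not given on the slides but is derived here from $P(H)$ through the total-probability relation.

```python
from fractions import Fraction


def bayes(p_e_given_h, p_h, p_e):
    """Bayes rule when P(evidence) is known: P(h | e) = P(e | h) P(h) / P(e)."""
    return p_e_given_h * p_h / p_e


def posterior(priors, likelihoods):
    """Bayes rule when P(evidence) is unknown.

    priors:      dict hypothesis -> P(hypothesis); the hypotheses are assumed
                 to be mutually exclusive and exhaustive.
    likelihoods: dict hypothesis -> P(evidence | hypothesis).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_evidence = sum(joint.values())  # total probability of the evidence
    return {h: joint[h] / p_evidence for h in joint}


# Values from the slides.
p_h = Fraction(1, 2)          # P(Headache)
p_f = Fraction(1, 8)          # P(Flu)
p_h_given_f = Fraction(1, 2)  # P(Headache | Flu)

# Direct form: P(F | H) = P(H | F) P(F) / P(H)
p_f_given_h = bayes(p_h_given_f, p_f, p_h)
print(p_f_given_h)  # 1/8

# Assumption: P(H | not F) is derived, not stated on the slides.
p_h_given_not_f = (p_h - p_h_given_f * p_f) / (1 - p_f)

# Normalized form: sum over all hypotheses instead of using P(H) directly.
post = posterior(
    {"flu": p_f, "no_flu": 1 - p_f},
    {"flu": p_h_given_f, "no_flu": p_h_given_not_f},
)
print(post["flu"], post["no_flu"])  # 1/8 7/8
```

Exact fractions are used so that both forms of the rule can be compared for equality rather than up to floating-point error.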

Generalizing Bayes Rule
- If we know that exactly one of $A_1, A_2, \ldots, A_n$ is true, then
  $P(B) = P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2) + \ldots + P(B \mid A_n)\,P(A_n)$
- and in general
  $P(B \mid X) = P(B \mid A_1, X)\,P(A_1 \mid X) + \ldots + P(B \mid A_n, X)\,P(A_n \mid X)$
- So
  $P(A_k \mid B, X) = \dfrac{P(A_k \mid X)\,P(B \mid A_k, X)}{\sum_i P(A_i \mid X)\,P(B \mid A_i, X)}$

Medical Diagnosis
- A doctor knows that meningitis causes a stiff neck 50% of the time.
- The doctor knows that if a person is randomly selected from the US population, there is a 1/50,000 chance that the person will have meningitis.
- The doctor knows that if a person is randomly selected from the US population, there is a 5% chance that the person will have a stiff neck.
- You walk into the doctor's office complaining of a stiff neck. What is the probability that the underlying cause is meningitis?

Joint Distribution
- Joint distribution table: given a set of variables $A, B, C, \ldots$
  - Generate a table with all the possible combinations of assignments to the variables in the rows.
  - For each row, list the corresponding joint probability.
  - For $M$ binary variables, the table has size $2^M$.

Using the Joint Distribution
- Compute the probability of event $E$:
  $P(E) = \sum_{\text{rows matching } E} P(\text{row})$

Inference Using the Joint Distribution
- Given that event $E_1$ occurs, what is the probability that $E_2$ occurs?
  $P(E_2 \mid E_1) = \dfrac{P(E_1, E_2)}{P(E_1)} = \dfrac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_1} P(\text{row})}$

Inference
- General view: I have some evidence (Headache); how likely is a particular conclusion (Fever)?

Generating the Joint Distribution
Three possible ways of generating the joint distribution:
1. Human experts
2. Using known conditional probabilities (e.g., if we know $P(C \mid A, B)$, $P(B \mid A)$, and $P(A)$, we know $P(A, B, C) = P(C \mid A, B)\,P(B \mid A)\,P(A)$)
3. Learning from data

Learning the Joint Distribution
- Suppose that we have recorded a lot of training data.
- The entry for $P(A, B, C)$ in the table is
  $\dfrac{\text{number of records in which } A, B, C \text{ all hold}}{\text{total number of records}}$
- More generally, the entry for $P(E)$ in the table is
  $\dfrac{\text{number of records matching } E}{\text{total number of records}}$

Real Life Joint Distribution
- UCI Census Database
- $P(\text{Male} \mid \text{Poor}) = 0.4654 / 0.7604 = 0.612$

So Far
- Basic probability concepts
- Bayes rule
- What are joint distributions
- Inference using joint distributions
- Learning joint distributions from data
- Problem: if we have $M$ variables, we need $2^M$ entries in the joint distribution table.
- An independence assumption leads to an efficient way to learn and to do inference.
- Problem: estimate probabilities.

Independence
- $A$ and $B$ are independent iff $P(A \mid B) = P(A)$.
- In words: knowing $B$ does not affect how likely we think it is that $A$ is true.

Key Properties
- Symmetry: $P(B \mid A) = P(B)$
- Joint distribution: $P(A, B) = P(A)\,P(B)$
- Independence of complements: $P(\neg A \mid B) = P(\neg A)$ and $P(A \mid \neg B) = P(A)$

Independence
- Suppose that $A$, $B$, $C$ are independent.
- Then any value of the joint distribution can be computed easily as a product of individual probabilities, for example $P(A, B, C) = P(A)\,P(B)\,P(C)$.
- In fact, we need only $M$ numbers instead of $2^M$ for binary variables.

Independence: General Case
- If $X_1, \ldots, X_M$ are independent variables:
  $P(X_1 = x_1, \ldots, X_M = x_M) = \prod_{i=1}^{M} P(X_i = x_i)$
- Under the independence assumption we can compute any value of the joint distribution.
- We can answer any inference query.
- How do we learn the distributions? Similar to the earlier slides on joint distributions.

Learning with the Independence Assumption
- Learning the distributions from data is simple and efficient.
- In practice, the independence assumption may not be met, but it is often a very useful approximation.

So Far
- Basic probability concepts
- Bayes rule
- What are joint distributions
- Inference using joint distributions
- Learning joint distributions from data
- Independence assumption
- Problem: we now have the joint distribution. How can we use it to make decisions? Bayes Classifier.

Note about Probability Estimation
- So far we have been using relative frequencies to approximate the probability of an event: $f_u = C_u / N$.
- We will discuss more probability estimation later.

Three Prisoner Problem
- Three prisoners A, B, and C are locked in their cells.
- One of them will be executed the next day and the others will be released.
- Only the governor knows which one will be executed.
- Prisoner A asked the governor to tell him which one of B and C will be released, and got the answer "B".
- Now what does A think is the chance that he will be executed?

Monty Hall Problem
Suppose you're on a game show, and you're given the choice of three doors:
Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has …
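
The Three Prisoner Problem stated above can be worked through with the generalized Bayes rule. The sketch below is a hedged illustration, not taken from the slides, and it rests on assumptions the slide does not state: each prisoner is equally likely to be the one executed, the governor answers truthfully and never names A, and when both B and C are to be released he names either one with probability 1/2. Under different assumptions about the governor the answer changes.

```python
from fractions import Fraction

# Assumption: uniform prior over who will be executed.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# P(governor says "B will be released" | who is executed), under the
# assumptions listed in the lead-in:
#   A executed -> B and C are both released, governor picks B half the time
#   B executed -> governor cannot truthfully say B is released
#   C executed -> B is the only prisoner other than A who is released
says_b = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

# Generalized Bayes rule: normalize prior * likelihood over all hypotheses.
joint = {x: prior[x] * says_b[x] for x in prior}
p_says_b = sum(joint.values())
posterior = {x: joint[x] / p_says_b for x in joint}

print(posterior["A"], posterior["C"])  # 1/3 2/3
```

With these assumptions the answer "B" leaves A's chance of execution at 1/3, while C's rises to 2/3.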
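
The earlier slides on learning and using the joint distribution describe two steps: estimate each table entry as a relative frequency, then answer queries by summing matching rows. A minimal Python sketch of both steps follows. The records are toy data invented purely for illustration, and the helper names `learn_joint`, `prob`, and `cond_prob` are hypothetical, not from the slides.

```python
from collections import Counter
from fractions import Fraction

# Toy training records (invented for illustration): each record is a tuple
# of 0/1 values for three binary variables (A, B, C).
records = [
    (1, 1, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 0, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0),
]


def learn_joint(records):
    """Estimate each joint-table entry as (# matching records) / (# records)."""
    counts = Counter(records)
    n = len(records)
    return {row: Fraction(c, n) for row, c in counts.items()}


def prob(joint, event):
    """P(E): sum the probabilities of all rows for which event(row) is true."""
    return sum((p for row, p in joint.items() if event(row)), Fraction(0))


def cond_prob(joint, e2, e1):
    """P(E2 | E1) = P(E1 and E2) / P(E1), both computed from the joint table."""
    return prob(joint, lambda r: e1(r) and e2(r)) / prob(joint, e1)


joint = learn_joint(records)

p_a = prob(joint, lambda r: r[0] == 1)
p_a_given_b = cond_prob(joint, lambda r: r[0] == 1, lambda r: r[1] == 1)
print(p_a, p_a_given_b)  # 1/2 3/5
```

Rows that never occur in the data are simply absent from the dictionary, which corresponds to an estimated probability of zero; the slides defer better probability estimation to a later lecture.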