1 Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2011 Today: • Probability • Bayes Rule • Estimating parameters • maximum likelihood • max a posteriori Readings: Probability review • Bishop Ch. 1 thru 1.2.3 • Bishop, Ch. 2 thru 2.2 • Andrew Moore’s online tutorial many of these slides are derived from William Cohen, Andrew Moore, Aarti Singh, Eric Xing, Carlos Guestrin. - Thanks! Probability Overview • Events – discrete random variables, continuous random variables, compound events • Axioms of probability – What defines a reasonable theory of uncertainty • Independent events • Conditional probabilities • Bayes rule and beliefs • Joint probability distribution • Expectations • Independence, Conditional independence2 Random Variables • Informally, A is a random variable if – A denotes something about which we are uncertain – perhaps the outcome of a randomized experiment • Examples A = True if a randomly drawn person from our class is female A = The hometown of a randomly drawn person from our class A = True if two randomly drawn persons from our class have same birthday • Define P(A) as “the fraction of possible worlds in which A is true” or “the fraction of times A holds, in repeated runs of the random experiment” – the set of possible worlds is called the sample space, S – A random variable A is a function defined over S A: S à {0,1} A little formalism More formally, we have • a sample space S (e.g., set of students in our class) – aka the set of possible worlds • a random variable is a function defined over the sample space – Gender: S à { m, f } – Height: S à Reals • an event is a subset of S – e.g., the subset of S for which Gender=f – e.g., the subset of S for which (Gender=m) AND (eyeColor=blue) • we’re often interested in probabilities of specific events • and of specific events conditioned on other specific events3 Visualizing A Sample space of all possible worlds Its area is 1 Worlds in which A is False Worlds in which A is true P(A) = Area of reddish oval The Axioms of Probability • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) [di Finetti 1931]: when gambling based on “uncertainty formalism A” you can be exploited by an opponent iff your uncertainty formalism A violates these axioms4 Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A canʼt get any smaller than 0 And a zero area would mean no world could ever have A true Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A canʼt get any bigger than 1 And an area of 1 would mean all worlds will have A true5 Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) Theorems from the Axioms • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) è P(not A) = P(~A) = 1-P(A)6 Theorems from the Axioms • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) è P(not A) = P(~A) = 1-P(A) P(A or ~A) = 1 P(A and ~A) = 0 P(A or ~A) = P(A) + P(~A) - P(A and ~A) 1 = P(A) + P(~A) + 0 Elementary Probability in Pictures • P(~A) + P(A) = 1 A ~A7 Another useful theorem • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B) è P(A) = P(A ^ B) + P(A ^ ~B) A = [A and (B or ~B)] = [(A and B) or (A and ~B)] P(A) = P(A and B) + P(A and ~B) – P((A and B) and (A and ~B)) P(A) = P(A and B) + P(A and ~B) – P(A and B and A and ~B) Elementary Probability in Pictures • P(A) = P(A ^ B) + P(A ^ ~B) B A ^ ~B A ^ B8 Multivalued Discrete Random Variables • Suppose A can take on more than 2 values • A is a random variable with arity k if it can take on exactly one value out of {v1,v2, ... vk} • Thus… jivAvAPji≠==∧= if 0)(! P(A = v1" A = v2"..." A = vk) = 1Elementary Probability in Pictures 1)(1==∑=kjjvAPA=1 A=2 A=3 A=4 A=59 Definition of Conditional Probability P(A ^ B) P(A|B) = ----------- P(B) Corollary: The Chain Rule P(A ^ B) = P(A|B) P(B) Conditional Probability in Pictures A=1 A=2 A=3 A=4 A=5 picture: P(B|A=2)10 Independent Events • Definition: two events A and B are independent if Pr(A and B)=Pr(A)*Pr(B) • Intuition: knowing A tells us nothing about the value of B (and vice versa) Visualizing Probabilities Sample space of all possible worlds Its area is 1 B A A ^ B11 Definition of Conditional Probability P(A ^ B) P(A|B) = ----------- P(B) B A Definition of Conditional Probability P(A ^ B) P(A|B) = ----------- P(B) Corollary: The Chain Rule P(A ^ B) = P(A|B) P(B) P(C ^ A ^ B) = P(C|A ^ B) P(A|B) P(B)12 Independent Events • Definition: two events A and B are independent if P(A ^ B)=P(A)*P(B) • Intuition: knowing A tells us nothing about the value of B (and vice versa) Bayes Rule • let’s write 2 expressions for P(A ^ B) B A A ^ B13 P(B|A) * P(A) P(B) P(A|B) = Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418 …by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter…. necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning… Bayes’ rule we call P(A) the “prior” and P(A|B) the “posterior” Other Forms of Bayes Rule )(~)|~()()|()()|()|(APABPAPABPAPABPBAP+=)()()|()|(XBPXAPXABPXBAP∧∧∧=∧14 Applying Bayes Rule € P(A |B) =P(B | A)P(A)P(B | A)P(A) + P(B |~ A)P(~ A)A = you have the flu, B = you just coughed Assume: P(A) = 0.05 P(B|A) = 0.80 P(B| ~A) = 0.2 what is P(flu | cough) = P(A|B)? what does all this have to do with function approximation?15 The Joint Distribution
View Full Document