
Bayesian Learning

- Bayes theorem
- MAP and ML hypotheses
- MAP learners
- Bayes optimal classifier
- Naive Bayes learner
- Bayesian belief networks

Two Roles for Bayesian Methods

Provides practical learning algorithms:
- Naive Bayes learning
- Bayesian belief network learning
- Combines prior knowledge (prior probabilities) with observed data
- Requires prior probabilities

Provides a useful conceptual framework:
- A "gold standard" for evaluating other learning algorithms
- Additional insight into Occam's razor

Bayes Theorem

  P(h|D) = P(D|h) P(h) / P(D)

where
- P(h)   = prior probability of hypothesis h
- P(D)   = prior probability of training data D
- P(h|D) = probability of h given D
- P(D|h) = probability of D given h

Choosing Hypotheses

Generally we want the most probable hypothesis given the training data, the maximum a posteriori (MAP) hypothesis h_MAP:

  h_MAP = argmax_{h ∈ H} P(h|D)
        = argmax_{h ∈ H} P(D|h) P(h) / P(D)
        = argmax_{h ∈ H} P(D|h) P(h)

(P(D) can be dropped because it does not depend on h.) If we further assume P(h_i) = P(h_j) for all i, j, we can simplify once more and choose the maximum likelihood (ML) hypothesis:

  h_ML = argmax_{h_i ∈ H} P(D|h_i)

Basic Formulas for Probabilities

- Product rule: probability P(A ∧ B) of a conjunction of two events A and B:
    P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
- Sum rule: probability of a disjunction of two events A and B:
    P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
- Theorem of total probability: if events A_1, ..., A_n are mutually exclusive with Σ_{i=1..n} P(A_i) = 1, then
    P(B) = Σ_{i=1..n} P(B|A_i) P(A_i)

Brute-Force MAP Hypothesis Learner

1. For each hypothesis h in H, calculate the posterior probability
     P(h|D) = P(D|h) P(h) / P(D)
2. Output the hypothesis h_MAP with the highest posterior probability:
     h_MAP = argmax_{h ∈ H} P(h|D)

Evolution of Posterior Probabilities

[Figure: three panels over the hypothesis space, showing (a) the prior P(h), (b) the posterior P(h|D1), and (c) the posterior P(h|D1, D2); the posterior concentrates on fewer hypotheses as more data is observed.]

Characterizing Learning Algorithms by Equivalent MAP Learners

[Figure: an inductive system (training examples D and hypothesis space H fed to the Candidate Elimination Algorithm, producing output hypotheses) is equivalent to a Bayesian inference system (D and H fed to the brute-force MAP learner with P(h) uniform and P(D|h) = 1 if h is consistent with D, 0 otherwise).] The prior assumptions are made explicit.
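The brute-force MAP learner above is short enough to sketch directly. The sketch below is illustrative only; the coin-bias hypothesis space, uniform prior, and data are invented toy values, not from the slides:

```python
# Brute-force MAP learner: score every hypothesis by its posterior
# P(h|D) ∝ P(D|h) P(h), then return the argmax.  P(D) is the same for
# every h, so it can be dropped from the argmax.

def map_hypothesis(hypotheses, prior, likelihood, data):
    """Return the h maximizing P(D|h) * P(h).

    prior:      dict mapping h -> P(h)
    likelihood: function (data, h) -> P(D|h)
    """
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior[h])

# Hypothetical example: two coin-bias hypotheses, uniform prior,
# data = (8 heads, 2 tails).  With a uniform prior, MAP reduces to ML.
hypotheses = [0.5, 0.9]                            # P(heads) under each h
prior = {0.5: 0.5, 0.9: 0.5}                       # uniform P(h)
likelihood = lambda d, h: h**d[0] * (1 - h)**d[1]  # iid Bernoulli P(D|h)
print(map_hypothesis(hypotheses, prior, likelihood, (8, 2)))  # -> 0.9
```

With the uniform prior, the biased coin (0.9) wins because its likelihood on 8 heads out of 10 is far larger than the fair coin's.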
Learning a Real-Valued Function

Consider any real-valued target function f and training examples <x_i, d_i>, where d_i is a noisy training value:

  d_i = f(x_i) + e_i

and e_i is a random noise variable drawn independently for each x_i from a Gaussian distribution with mean 0. Then the maximum likelihood hypothesis h_ML is the one that minimizes the sum of squared errors:

  h_ML = argmin_{h ∈ H} Σ_{i=1..m} (d_i - h(x_i))^2

Derivation:

  h_ML = argmax_{h ∈ H} p(D|h)
       = argmax_{h ∈ H} ∏_{i=1..m} p(d_i|h)
       = argmax_{h ∈ H} ∏_{i=1..m} (1/√(2πσ^2)) e^{-(1/(2σ^2)) (d_i - h(x_i))^2}

Maximize the natural log of this instead:

  h_ML = argmax_{h ∈ H} Σ_{i=1..m} [ ln(1/√(2πσ^2)) - (1/(2σ^2)) (d_i - h(x_i))^2 ]
       = argmax_{h ∈ H} Σ_{i=1..m} [ -(1/(2σ^2)) (d_i - h(x_i))^2 ]
       = argmax_{h ∈ H} Σ_{i=1..m} [ -(d_i - h(x_i))^2 ]
       = argmin_{h ∈ H} Σ_{i=1..m} (d_i - h(x_i))^2

Most Probable Classification of New Instances

So far we've sought the most probable hypothesis given the data D, i.e. h_MAP. Given a new instance x, what is its most probable classification? h_MAP(x) is not necessarily the most probable classification! Consider three possible hypotheses:

  P(h1|D) = .4,  P(h2|D) = .3,  P(h3|D) = .3

Given a new instance x:

  h1(x) = +,  h2(x) = -,  h3(x) = -

What is the most probable classification of x?

Bayes Optimal Classifier

Bayes optimal classification:

  argmax_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D)

Example:

  P(h1|D) = .4,  P(-|h1) = 0,  P(+|h1) = 1
  P(h2|D) = .3,  P(-|h2) = 1,  P(+|h2) = 0
  P(h3|D) = .3,  P(-|h3) = 1,  P(+|h3) = 0

therefore

  Σ_{h_i ∈ H} P(+|h_i) P(h_i|D) = .4
  Σ_{h_i ∈ H} P(-|h_i) P(h_i|D) = .6

and

  argmax_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j|h_i) P(h_i|D) = -

Naive Bayes Classifier

Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.

When to use:
- A moderate or large training set is available
- The attributes that describe instances are conditionally independent given the classification

Successful applications:
- Diagnosis
- Classifying text documents

Naive Bayes Classifier (cont.)

Assume a target function f: X -> V, where each instance x is described by attributes <a1, a2, ..., an>. The most probable value of f(x) is:

  v_MAP = argmax_{v_j ∈ V} P(v_j | a1, a2, ..., an)
        = argmax_{v_j ∈ V} P(a1, a2, ..., an | v_j) P(v_j) / P(a1, a2, ..., an)
        = argmax_{v_j ∈ V} P(a1, a2, ..., an | v_j) P(v_j)

Naive Bayes assumption:

  P(a1, a2, ..., an | v_j) = ∏_i P(a_i | v_j)

which gives the Naive Bayes classifier:

  v_NB = argmax_{v_j ∈ V} P(v_j) ∏_i P(a_i | v_j)
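The three-hypothesis example above can be computed directly. This sketch uses exactly the posteriors and predictions from the slides; the function name and the assumption that each hypothesis votes deterministically (P(v|h) is 0 or 1) are mine:

```python
# Bayes optimal classification: weight each hypothesis's vote by its
# posterior P(h|D), then pick the value with the largest total weight.

def bayes_optimal(values, posteriors, predictions):
    """argmax over v of sum_h P(v|h) P(h|D), with deterministic h(x)."""
    def weight(v):
        return sum(p for h, p in posteriors.items() if predictions[h] == v)
    return max(values, key=weight)

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h|D) from the slides
predictions = {"h1": "+", "h2": "-", "h3": "-"}  # h(x) for the new instance
print(bayes_optimal(["+", "-"], posteriors, predictions))  # prints -
```

Note that the answer is "-" (total weight .6) even though the MAP hypothesis h1 predicts "+": MAP classification and Bayes optimal classification can disagree.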
Naive Bayes Algorithm

Naive_Bayes_Learn(examples):
  For each target value v_j:
    P̂(v_j) <- estimate P(v_j)
    For each attribute value a_i of each attribute a:
      P̂(a_i|v_j) <- estimate P(a_i|v_j)

Classify_New_Instance(x):
  v_NB = argmax_{v_j ∈ V} P̂(v_j) ∏_{a_i ∈ x} P̂(a_i|v_j)

Bayesian Belief Networks

Interesting because:
- The Naive Bayes assumption of conditional independence is too restrictive
- But learning is intractable without some such independence assumptions
- Bayesian belief networks describe conditional independence among subsets of variables
- This allows combining prior knowledge about (in)dependencies among variables with observed training data

(Also called Bayes nets.)

Conditional Independence

Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if

  (∀ x_i, y_j, z_k) P(X = x_i | Y = y_j, Z = z_k) = P(X = x_i | Z = z_k)

More compactly, we write P(X|Y, Z) = P(X|Z).

Example: Thunder is conditionally independent of Rain given Lightning:

  P(Thunder | Rain, Lightning) = P(Thunder | Lightning)

Naive Bayes uses conditional independence to justify:

  P(X, Y | Z) = P(X | Y, Z) P(Y | Z)
              = P(X | Z) P(Y | Z)

Bayesian Belief Network

[Figure: a directed acyclic graph with nodes Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire; Storm and BusTourGroup are the parents of Campfire, whose conditional probability table is:]

         S,B    S,~B   ~S,B   ~S,~B
   C     0.4    0.1    0.8    0.2
  ~C     0.6    0.9    0.2    0.8

The network represents a set of conditional independence assertions: each node is asserted to be conditionally independent of its nondescendants, given its immediate predecessors.

Bayesian Belief Network (cont.)

The same network also represents the joint probability distribution over all its variables, e.g. P(Storm, BusTourGroup, ..., ForestFire). In general,

  P(y1, ..., yn) = ∏_{i=1..n} P(y_i | Parents(Y_i))

where Parents(Y_i) denotes the immediate predecessors of Y_i in the graph.
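The two-phase Naive Bayes algorithm above (estimate P(v) and P(a_i|v) from examples, then classify by argmax) can be sketched with simple frequency estimates. The tiny weather-style dataset is hypothetical, and this sketch omits refinements such as m-estimate smoothing:

```python
# Naive Bayes learner and classifier using raw frequency estimates.
from collections import Counter, defaultdict

def naive_bayes_learn(examples):
    """examples: list of (attribute_tuple, target_value) pairs."""
    class_counts = Counter(v for _, v in examples)
    cond_counts = defaultdict(Counter)  # (attr_position, v) -> value counts
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, v)][a] += 1
    n = len(examples)
    p_v = {v: c / n for v, c in class_counts.items()}          # P^(v)
    def p_a_given_v(i, a, v):                                  # P^(a_i|v)
        return cond_counts[(i, v)][a] / class_counts[v]
    return p_v, p_a_given_v

def naive_bayes_classify(p_v, p_a_given_v, attrs):
    """v_NB = argmax_v P^(v) * prod_i P^(a_i|v)."""
    def score(v):
        s = p_v[v]
        for i, a in enumerate(attrs):
            s *= p_a_given_v(i, a, v)
        return s
    return max(p_v, key=score)

# Hypothetical toy data: (Outlook, Wind) -> PlayTennis
data = [(("sunny", "weak"), "no"), (("sunny", "strong"), "no"),
        (("rain", "weak"), "yes"), (("overcast", "weak"), "yes"),
        (("rain", "weak"), "yes")]
p_v, p_a = naive_bayes_learn(data)
print(naive_bayes_classify(p_v, p_a, ("rain", "weak")))  # -> yes
```

With raw frequencies, an attribute value never seen with some class zeroes out that class's score; in practice the m-estimate mentioned alongside this algorithm in most treatments avoids that.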



CU-Boulder CSCI 4202 - Bayesian Learning
