CS 559: Machine Learning Fundamentals and Applications
2nd Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Logistics: Project
• Class webpage: http://www.cs.stevens.edu/~mordohai/classes/cs559_s09.html
• Project types:
  – Application project: pick an application of interest and apply a learning algorithm to it
  – Algorithmic project: develop an algorithm (or a variant of one) and solve some problem with it
Pattern Classification, Chapter 2

Logistics: Project
• Topics: pick one before March 17
  – Related to your research
  – Based on class material
  – Brainstorm with me
• The topic has to be approved by March 17
• Present it for 5 minutes on March 26
• Do the actual work
• Present the work on April 30
• Submit a brief report by May 4 (midnight)

Chapter 2 (Part 1): Bayesian Decision Theory (Sections 2.1-2.2)

Bayesian Decision Theory – Continuous Features

Bayes' Rule
  P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x)
  posterior = (likelihood × prior) / evidence
where the evidence normalizes the posteriors:
  p(x) = Σ_j p(x | ω_j) P(ω_j)

A leading proponent, I. J. Good, argued persuasively that "the subjectivist (i.e. Bayesian) states his judgments, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science."
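As a quick numeric sanity check of Bayes' rule above, the snippet below computes posteriors from a prior and a likelihood; the specific numbers are made up for illustration, not taken from the notes.

```python
# Bayes' rule for two classes: P(w_j | x) = p(x | w_j) P(w_j) / p(x)
priors = [0.6, 0.4]        # P(w_1), P(w_2)            (illustrative values)
likelihoods = [0.2, 0.5]   # p(x | w_1), p(x | w_2) at some fixed x (illustrative)

# evidence p(x) = sum_j p(x | w_j) P(w_j)
evidence = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
# posteriors are properly normalized: they sum to 1
```

The division by the evidence is exactly what makes the posteriors a valid probability distribution over the classes.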
The Origin of Bayes' Rule
• A simple consequence of using probability to represent degrees of belief
• For any two random variables:
  p(A & B) = p(B) p(A | B)
  p(A & B) = p(A) p(B | A)
• Equating the two right-hand sides:
  p(B) p(A | B) = p(A) p(B | A)
  p(A | B) = p(B | A) p(A) / p(B)

Introduction
• The sea bass / salmon example
  – State of nature, prior
• The state of nature is a random variable
• The catch of salmon and sea bass is equiprobable:
  – P(ω_1) = P(ω_2) (uniform priors)
  – P(ω_1) + P(ω_2) = 1 (exclusivity and exhaustivity)

• Decision rule with only the prior information:
  – Decide ω_1 if P(ω_1) > P(ω_2); otherwise decide ω_2
• Use of the class-conditional information:
  – p(x | ω_1) and p(x | ω_2) describe the difference in lightness between the populations of sea bass and salmon

• Posterior, likelihood, evidence
  – P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x)
  – where, in the case of two categories, p(x) = Σ_{j=1}^{2} p(x | ω_j) P(ω_j)
  – Posterior = (Likelihood × Prior) / Evidence

• Decision given the posterior probabilities
  x is an observation for which:
    if P(ω_1 | x) > P(ω_2 | x), decide the true state of nature is ω_1
    if P(ω_1 | x) < P(ω_2 | x), decide the true state of nature is ω_2
  Therefore, whenever we observe a particular x, the probability of error is:
    P(error | x) = P(ω_1 | x) if we decide ω_2
    P(error | x) = P(ω_2 | x) if we decide ω_1

• Minimizing the probability of error
• Decide ω_1 if P(ω_1 | x) > P(ω_2 | x); otherwise decide ω_2
  Therefore: P(error | x) = min[P(ω_1 | x), P(ω_2 | x)]  (Bayes decision)

Example: Disease Testing
• Suppose you have tested positive for a disease; what is the probability that you actually have the disease?
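The minimum-error decision rule above can be sketched in code. The Gaussian class-conditional densities for the lightness feature are invented for illustration (the notes do not specify the densities); only the rule itself, decide the class with the larger posterior, comes from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Evaluate a 1-D Gaussian density, a stand-in for p(x | w_j)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posteriors(x, priors, likelihoods):
    """Bayes' rule: P(w_j | x) = p(x | w_j) P(w_j) / p(x)."""
    joint = [lik(x) * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)                 # p(x) = sum_j p(x | w_j) P(w_j)
    return [j / evidence for j in joint]

# Equiprobable priors, as in the sea bass / salmon example
priors = [0.5, 0.5]
likelihoods = [lambda x: gaussian_pdf(x, mean=3.0, std=1.0),   # p(x | w_1): sea bass (assumed)
               lambda x: gaussian_pdf(x, mean=6.0, std=1.0)]   # p(x | w_2): salmon (assumed)

post = posteriors(4.0, priors, likelihoods)
decision = post.index(max(post))   # Bayes decision: class with the larger posterior
p_error = min(post)                # P(error | x) = min[P(w_1|x), P(w_2|x)]
```

For x = 4.0 the observation is closer to the sea bass mean, so the rule picks ω_1 and the conditional error probability is the smaller posterior.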
• It depends on the accuracy and sensitivity of the test, and on the background (prior) probability of the disease.

Example: Disease Testing (cont.)
• Let P(Test=+ | Disease=true) = 0.95
• Then the false negative rate is P(Test=- | Disease=true) = 5%
• Let P(Test=+ | Disease=false) = 0.05, so the false positive rate is also 5%
• Suppose the disease is rare: P(Disease=true) = 0.01

P(Disease=true | Test=+)
  = P(Test=+ | Disease=true) P(Disease=true)
    / [P(Test=+ | Disease=true) P(Disease=true) + P(Test=+ | Disease=false) P(Disease=false)]
  = (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99)
  ≈ 0.16

Example: Disease Testing (cont.)
• The probability of having the disease given that you tested positive is just 16%.
  – Seems too low, but ...
• Of 100 people, we expect only 1 to have the disease, and that person will probably test positive.
• But we also expect about 5% of the others (about 5 people in total) to test positive by accident.
• So of the 6 people who test positive, we expect only 1 of them to actually have the disease; and indeed 1/6 is approximately 0.16.
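The disease-testing arithmetic above is easy to reproduce directly; all the numbers below come from the slides.

```python
sensitivity = 0.95      # P(Test=+ | Disease=true)
false_positive = 0.05   # P(Test=+ | Disease=false)
prior_disease = 0.01    # P(Disease=true): the disease is rare

# Evidence: total probability of testing positive
evidence = sensitivity * prior_disease + false_positive * (1 - prior_disease)

# Bayes' rule: P(Disease=true | Test=+)
posterior = sensitivity * prior_disease / evidence
print(round(posterior, 2))  # → 0.16
```

Despite the test being 95% accurate, the rarity of the disease (the prior) pulls the posterior down to about 16%, matching the 1-in-6 counting argument above.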
Bayesian Decision Theory – Continuous Features
• Generalization of the preceding ideas:
  – Use of more than one feature
  – Use of more than two states of nature
  – Allowing actions, and not only deciding on the state of nature
  – Introducing a loss function, which is more general than the probability of error

• Allowing actions other than classification primarily allows the possibility of rejection
• Rejection means refusing to make a decision in close or bad cases!
• The loss function states how costly each action taken is

Let {ω_1, ω_2, ..., ω_c} be the set of c states of nature (or "categories")
Let {α_1, α_2, ..., α_a} be the set of possible actions
Let λ(α_i | ω_j) be the loss incurred for taking action α_i when the state of nature is ω_j

Overall risk
  R = sum of all R(α_i | x) for i = 1, ..., a
  Minimizing R requires minimizing R(α_i | x) for i = 1, ..., a

Conditional risk
  R(α_i | x) = Σ_{j=1}^{c} λ(α_i | ω_j) P(ω_j | x)
• Select the action α_i for which R(α_i | x) is minimum
• R is then minimum, and R in this case is called the Bayes risk = the best performance that can be achieved!

• Two-category classification
  α_1: deciding ω_1
  α_2: deciding ω_2
  λ_ij = λ(α_i | ω_j): loss incurred for deciding ω_i when the true state of nature is ω_j
  Conditional risk:
    R(α_1 | x) = λ_11 P(ω_1 | x) + λ_12 P(ω_2 | x)
    R(α_2 | x) = λ_21 P(ω_1 | x) + λ_22 P(ω_2 | x)

Our rule is the following:
  if R(α_1 | x) < R(α_2 | x), take action α_1: decide ω_1
This results in the equivalent rule: decide ω_1 if
  (λ_21 - λ_11) p(x | ω_1) P(ω_1) > (λ_12 - λ_22) p(x | ω_2) P(ω_2)
and decide ω_2 otherwise.
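A minimal sketch of the two-category minimum-risk rule above: the loss matrix and posterior values are made-up illustrative numbers, but the conditional-risk formula is the one from the slides.

```python
def conditional_risk(i, lam, post):
    """R(a_i | x) = sum_j lambda(a_i | w_j) P(w_j | x)."""
    return sum(lam[i][j] * post[j] for j in range(len(post)))

# lam[i][j]: loss for deciding w_{i+1} when the truth is w_{j+1} (assumed values)
lam = [[0.0, 2.0],   # deciding w_1: no loss if correct, loss 2 if truth is w_2
       [1.0, 0.0]]   # deciding w_2: loss 1 if truth is w_1, no loss if correct

post = [0.7, 0.3]    # P(w_1 | x), P(w_2 | x) for some observation x (assumed)

r1 = conditional_risk(0, lam, post)   # 0*0.7 + 2*0.3 = 0.6
r2 = conditional_risk(1, lam, post)   # 1*0.7 + 0*0.3 = 0.7
decision = 1 if r1 < r2 else 2        # decide w_1 here, since R(a_1|x) < R(a_2|x)
```

With asymmetric losses the rule need not pick the class with the larger posterior; here it does, but increasing λ_12 further would eventually flip the decision toward ω_2 even with P(ω_1 | x) = 0.7.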