STEVENS CS 559 - CS 559 2nd Set of Notes


CS 559: Machine Learning Fundamentals and Applications
2nd Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: [email protected]
Office: Lieb 215

Logistics: Project

• Class webpage: http://www.cs.stevens.edu/~mordohai/classes/cs559_s09.html
• Project types:
– Application project: pick an application of interest and apply a learning algorithm to it
– Algorithmic project: develop an algorithm (or a variant) and solve some problem with it

• Topics: pick one before March 17
– Related to your research
– Based on class material
– Brainstorm with me
• The topic has to be approved by March 17
• Present it for 5 minutes on March 26
• Do the actual work
• Present the work on April 30
• Submit a brief report by May 4 (midnight)

Chapter 2 (Part 1): Bayesian Decision Theory (Sections 2.1-2.2)

Bayesian Decision Theory – Continuous Features

Bayes' Rule

P(ωj | x) = p(x | ωj) P(ωj) / p(x)

posterior = (likelihood × prior) / evidence

where the evidence p(x) = Σ_j p(x | ωj) P(ωj) normalizes the posteriors so they sum to one.

A leading proponent, I. J. Good, argued persuasively that "the subjectivist (i.e., Bayesian) states his judgments, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science."

The Origin of Bayes' Rule

• A simple consequence of using probability to represent degrees of belief
• For any two random variables:
p(A & B) = p(A) p(B | A)
p(A & B) = p(B) p(A | B)
Equating the two gives p(B) p(A | B) = p(A) p(B | A), hence
p(A | B) = p(B | A) p(A) / p(B)

Introduction

• The sea bass/salmon example
– State of nature, prior
• The state of nature is a random variable
• The catch of salmon and sea bass is equiprobable:
– P(ω1) = P(ω2) (uniform priors)
– P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

• Decision rule with only the prior information:
– Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• Use of the class-conditional information:
– p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon

• Posterior, likelihood, evidence
– P(ωj | x) = p(x | ωj) P(ωj) / p(x)
– where, in the case of two categories, p(x) = Σ_{j=1}^{2} p(x | ωj) P(ωj)
– Posterior = (Likelihood × Prior) / Evidence

• Decision given the posterior probabilities
x is an observation for which:
if P(ω1 | x) > P(ω2 | x), decide the true state of nature is ω1
if P(ω1 | x) < P(ω2 | x), decide the true state of nature is ω2
Therefore, whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1

• Minimizing the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
Therefore: P(error | x) = min[P(ω1 | x), P(ω2 | x)] (Bayes decision)

Example: Disease Testing

• Suppose you have tested positive for a disease; what is the probability that you actually have the disease?
• It depends on the accuracy and sensitivity of the test, and on the background (prior) probability of the disease.

• Let P(Test=+ | Disease=true) = 0.95; the false negative rate is then P(Test=- | Disease=true) = 5%
• Let P(Test=+ | Disease=false) = 0.05 (so the false positive rate is also 5%)
• Suppose the disease is rare: P(Disease=true) = 0.01

P(Disease=true | Test=+)
= p(Test=+ | Disease=true) P(Disease=true) / [p(Test=+ | Disease=true) P(Disease=true) + p(Test=+ | Disease=false) P(Disease=false)]
= (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99)
≈ 0.161

• The probability of having the disease given that you tested positive is just 16%.
– Seems too low, but ...
• Of 100 people, we expect only 1 to have the disease, and that person will probably test positive.
• But we also expect about 5% of the others (about 5 people in total) to test positive by accident.
• So of the 6 people who test positive, we expect only 1 to actually have the disease; and indeed 1/6 is approximately 0.16.
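As a quick check, the arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are ours, not from the notes:

    # Reproducing the disease-testing posterior from the example above.
    sensitivity = 0.95      # P(Test=+ | Disease=true)
    false_positive = 0.05   # P(Test=+ | Disease=false)
    prior = 0.01            # P(Disease=true)

    evidence = sensitivity * prior + false_positive * (1 - prior)  # P(Test=+)
    posterior = sensitivity * prior / evidence                     # P(Disease=true | Test=+)
    print(round(posterior, 3))  # 0.161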
Bayesian Decision Theory – Continuous Features

• Generalization of the preceding ideas:
– Use of more than one feature
– Use of more than two states of nature
– Allowing actions, and not only deciding on the state of nature
– Introducing a loss function which is more general than the probability of error

• Allowing actions other than classification primarily allows the possibility of rejection
• Refusing to make a decision in close or bad cases!
• The loss function states how costly each action taken is

Let {ω1, ω2, …, ωc} be the set of c states of nature (or "categories")
Let {α1, α2, …, αa} be the set of possible actions
Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj

Overall risk
R is the expected loss incurred over all x by a decision rule. Minimizing R amounts to minimizing the conditional risk R(αi | x) at every x, for i = 1, …, a.

Conditional risk
R(αi | x) = Σ_{j=1}^{c} λ(αi | ωj) P(ωj | x)
Select the action αi for which R(αi | x) is minimum; R is then minimum, and R in this case is called the Bayes risk: the best performance that can be achieved!

• Two-category classification
α1: deciding ω1
α2: deciding ω2
λij = λ(αi | ωj): the loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

Our rule is the following:
if R(α1 | x) < R(α2 | x), take action α1: decide ω1

This results in the equivalent rule: decide ω1 if
(λ21 - λ11) p(x | ω1) P(ω1) > (λ12 - λ22) p(x | ω2) P(ω2)
and decide ω2 otherwise.
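To make the two-category rule concrete, here is a minimal Python sketch of it. The loss values, likelihoods, and priors below are illustrative assumptions, not numbers from the notes:

    # Two-category minimum-risk decision, following the rule derived above.
    # lam[i][j] holds the loss lambda(alpha_{i+1} | omega_{j+1}).

    def decide(p_x_w1, p_x_w2, prior1, prior2, lam):
        """Return 1 or 2: decide omega_1 iff
        (lam21 - lam11) p(x|w1) P(w1) > (lam12 - lam22) p(x|w2) P(w2)."""
        lhs = (lam[1][0] - lam[0][0]) * p_x_w1 * prior1
        rhs = (lam[0][1] - lam[1][1]) * p_x_w2 * prior2
        return 1 if lhs > rhs else 2

    # An asymmetric loss: deciding omega_2 when omega_1 is true costs 5x more.
    lam = [[0.0, 1.0], [5.0, 0.0]]
    print(decide(0.3, 0.4, 0.5, 0.5, lam))  # -> 1

Note that with the zero-one loss ([[0, 1], [1, 0]]) the same observation would be assigned to ω2; the heavy λ21 shifts the decision toward ω1, which is exactly the effect the loss function is meant to have.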

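More generally, the full conditional-risk machinery for c classes also fits in a few lines. This sketch assumes the likelihoods p(x | ωj) have already been evaluated at a particular x; all names and numbers are illustrative:

    # General c-class minimum-risk classification.

    def posteriors(likelihoods, priors):
        """P(omega_j | x) via Bayes' rule; the evidence p(x) is the normalizer."""
        evidence = sum(l * p for l, p in zip(likelihoods, priors))
        return [l * p / evidence for l, p in zip(likelihoods, priors)]

    def bayes_action(likelihoods, priors, loss):
        """Index of the action minimizing R(alpha_i | x) = sum_j loss[i][j] P(omega_j | x)."""
        post = posteriors(likelihoods, priors)
        risks = [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]
        return min(range(len(risks)), key=risks.__getitem__)

    # With the zero-one loss, minimum risk reduces to the minimum-error rule:
    # the chosen action is simply the class with the largest posterior.
    zero_one = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(bayes_action([0.2, 0.5, 0.1], [1/3, 1/3, 1/3], zero_one))  # -> 1 (omega_2)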
