Lecture note for Stat 231: Pattern Recognition and Machine Learning

Lecture 2: Bayesian Decision Theory

1. Diagram and formulation
2. Bayes rule for inference
3. Bayesian decision
4. Discriminant functions and space partition
5. Advanced issues

Diagram of pattern classification

Procedure of pattern recognition and decision making:

  subjects → Observables X → Features x → Inner belief w → Action α

X --- all the observables, obtained using existing sensors and instruments.
x --- a set of features selected from components of X, or linear/non-linear functions of X.
w --- our inner belief/perception about the subject class.
α --- the action that we take for x.

We denote the three spaces by

  x ∈ Ω_d,  x = (x_1, x_2, ..., x_d) is a d-dimensional feature vector,
  w ∈ Ω_C,  w is the index of the class, Ω_C = {w_1, w_2, ..., w_k},
  α ∈ Ω_α.

Examples

Ex 1: Fish classification
  X = I is the image of the fish,
  x = (brightness, length, fin #, ...),
  w is our belief about what the fish type is, Ω_C = {"sea bass", "salmon", "trout", ...},
  α is a decision for the fish type; in this case Ω_C = Ω_α,
  Ω_α = {"sea bass", "salmon", "trout", ...}.

Ex 2: Medical diagnosis
  X = all the available medical tests and imaging scans that a doctor can order for a patient,
  x = (blood pressure, glucose level, cough, x-ray, ...),
  w is an illness type, Ω_C = {"flu", "cold", "TB", "pneumonia", "lung cancer", ...},
  α is a decision for treatment, Ω_α = {"Tylenol", "hospitalize", ...}.

Tasks

  subjects →(control of sensors)→ Observables X →(selecting informative features)→ Features x →(statistical inference)→ Inner belief w →(risk/cost minimization)→ Decision α

In Bayesian decision theory, we are concerned with the last three steps, assuming that the observables are given and the features are selected.

Bayesian Decision Theory

  Features x →(statistical inference)→ Inner belief p(w|x) →(risk/cost minimization)→ Decision α(x)

It requires two probability tables,
a) the prior p(w) and b) the likelihood p(x|w), plus a risk/cost function (a two-way table) λ(α | w).

The belief on the class w is computed by the Bayes rule

  p(w|x) = p(x|w) p(w) / p(x),

and the risk is computed by

  R(α_i | x) = Σ_{j=1..k} λ(α_i | w_j) p(w_j | x).

Decision Rule

A decision rule is a mapping function from the feature space to the set of actions,

  α(x): Ω_d → Ω_α.

A decision is made to minimize the average cost/risk,

  R = ∫ R(α(x) | x) p(x) dx.

It is minimized when our decision minimizes the cost/risk for each instance x:

  α(x) = argmin_{α ∈ Ω_α} R(α | x) = argmin_{α ∈ Ω_α} Σ_{j=1..k} λ(α | w_j) p(w_j | x).

We will show that randomized decisions won't be optimal.

Bayesian error

In a special case, like fish classification, where the action is a classification, we assume a 0/1 loss:

  λ(α_i | w_j) = 0 if i = j,   λ(α_i | w_j) = 1 if i ≠ j.

The risk for classifying x to class α_i is then

  R(α_i | x) = Σ_{j ≠ i} p(w_j | x) = 1 − p(w_i | x).

The optimal decision is to choose the class that has the maximum posterior probability,

  α(x) = argmin_{α ∈ Ω_α} (1 − p(α | x)) = argmax_{α ∈ Ω_α} p(α | x).

The total risk for a decision rule, in this case, is called the Bayesian error:

  R = p(error) = ∫ p(error | x) p(x) dx = ∫ (1 − p(α(x) | x)) p(x) dx.

Discriminant functions

To summarize, we take an action to maximize a discriminant function g_i(x), where any of the following equivalent choices may be used:

  g_i(x) = p(w_i | x),
  g_i(x) = p(x | w_i) p(w_i),
  g_i(x) = log p(x | w_i) + log p(w_i),
  g_i(x) = −R(α_i | x),

and the decision is

  α(x) = argmax { g_1(x), g_2(x), ..., g_k(x) }.

Partition of feature space

The decision α(x): Ω_d → Ω_α is a partition/coloring of the feature space into k subspaces,

  Ω = ∪_{i=1..k} Ω_i,   Ω_i ∩ Ω_j = ∅ for i ≠ j.

An example of fish classification

Decision/classification boundaries

Closed-form solutions

In two-class classification (k = 2), with p(x|w) being Normal densities, the decision boundaries can be computed in closed form.

Advanced issues

1. Subjectivism of the prior in Bayesian decision.
2. Learning the probabilities p(w) and p(x|w) --- machine learning.
3. Choosing optimal features --- what features are most discriminative?
4. How many features are enough?
5. Considering the context or sequential information in ...
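The central computations of the lecture --- the Bayes rule p(w|x) = p(x|w)p(w)/p(x), the risk R(α_i|x) = Σ_j λ(α_i|w_j) p(w_j|x), and the minimum-risk decision --- can be sketched numerically. The sketch below is illustrative only: the prior, likelihood, and loss tables are made-up numbers for a hypothetical two-class fish problem, not values from the lecture.

```python
# Hypothetical two-class fish example: w1 = "sea bass", w2 = "salmon".
# All numbers below are made-up for illustration.
prior = [0.6, 0.4]                          # p(w_j)

# Likelihood p(x | w_j) for one discretized feature x (e.g. a lightness bin).
# likelihood[j][x] = p(x | w_j); each row sums to 1.
likelihood = [[0.1, 0.3, 0.6],              # p(x | w1)
              [0.7, 0.2, 0.1]]              # p(x | w2)

# Loss table lambda(alpha_i | w_j): rows = actions, columns = true classes.
# This is the 0/1 loss, so minimum risk = maximum posterior.
loss = [[0.0, 1.0],
        [1.0, 0.0]]

def posterior(x):
    """Bayes rule: p(w_j | x) = p(x | w_j) p(w_j) / p(x)."""
    joint = [likelihood[j][x] * prior[j] for j in range(len(prior))]
    px = sum(joint)                         # p(x), the normalizing constant
    return [jp / px for jp in joint]

def risk(i, x):
    """R(alpha_i | x) = sum_j lambda(alpha_i | w_j) p(w_j | x)."""
    post = posterior(x)
    return sum(loss[i][j] * post[j] for j in range(len(prior)))

def decide(x):
    """Minimum-risk decision: alpha(x) = argmin_i R(alpha_i | x)."""
    return min(range(len(loss)), key=lambda i: risk(i, x))

for x in range(3):
    p = posterior(x)
    print(f"x={x}: p(w|x)=({p[0]:.3f}, {p[1]:.3f}) -> decide w{decide(x) + 1}")
```

Because the loss table here is 0/1, decide(x) coincides with the maximum-posterior class, as on the Bayesian-error slide; swapping in an asymmetric loss table shifts the decision away from the costlier mistakes.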
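The closed-form case mentioned above (k = 2, Normal class-conditional densities) can also be sketched. With equal variances in one dimension, setting the discriminants g_1(x) = g_2(x) gives a single threshold x* = (μ_1 + μ_2)/2 + σ² ln(p_2/p_1)/(μ_1 − μ_2). The means, variance, and priors below are assumed values for illustration:

```python
import math

# Assumed 1-D two-class problem with equal-variance Gaussian likelihoods.
mu1, mu2, sigma = 2.0, 6.0, 1.5          # made-up class means and shared std
p1, p2 = 0.5, 0.5                        # made-up priors

def g(x, mu, prior):
    """Discriminant g_i(x) = log p(x | w_i) + log p(w_i), Gaussian likelihood."""
    return (-((x - mu) ** 2) / (2 * sigma ** 2)
            - math.log(sigma * math.sqrt(2 * math.pi))
            + math.log(prior))

def boundary():
    """Closed-form threshold x* where g_1(x*) = g_2(x*)."""
    return (mu1 + mu2) / 2 + sigma ** 2 * math.log(p2 / p1) / (mu1 - mu2)

x_star = boundary()
print(f"decision boundary at x* = {x_star:.3f}")

# Sanity checks: the discriminants agree at x*, and the decision flips around it.
assert abs(g(x_star, mu1, p1) - g(x_star, mu2, p2)) < 1e-9
assert g(x_star - 1, mu1, p1) > g(x_star - 1, mu2, p2)   # left of x*  -> w1
assert g(x_star + 1, mu1, p1) < g(x_star + 1, mu2, p2)   # right of x* -> w2
```

With equal priors the boundary sits midway between the means; raising p(w_1) moves x* toward μ_2, enlarging the region assigned to w_1. With unequal variances the boundary condition becomes quadratic in x, which is why the general Gaussian case can yield two boundary points.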