
Pattern Recognition
Approximating class densities, Bayesian classifier
(slides: http://www.cubs.buffalo.edu)

Bayesian classification
• Suppose we have 2 classes and we know the probability density functions of their feature vectors. How should a new pattern be classified?
• Bayes classification rule: classify x to the class $w_i$ which has the largest posterior probability $P(w_i|x)$:

    $P(w_1|x) > P(w_2|x)$ ?  then $w_1$, else $w_2$

• Using the Bayes formula, we can rewrite the classification rule in terms of likelihoods and priors:

    $p(x|w_1)P(w_1) > p(x|w_2)P(w_2)$ ?  then $w_1$, else $w_2$

  (posterior ∝ likelihood × prior)

Estimating the probability density function
• Parametric pdf estimation: model the unknown probability density function $p(x|w_i)$ of class $w_i$ by some parametric function $p(x;\theta_i)$ and determine its parameters from the training samples. Example, a Gaussian:

    $p(x;\mu) = \frac{1}{(2\pi)^{l/2}}\, e^{-\frac{1}{2}(x-\mu)^2}$

• Non-parametric pdf estimation:
  1. Histogram
  2. K nearest neighbors
  3. Kernel methods (Parzen kernels or windows)
  4. Other methods (estimating the cumulative distribution function first, SVM density estimation, etc.)

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right)$,  where N is the number of training samples

Estimating the kernel width
• Fixed kernels: one width h shared by all kernels:

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right)$

• Adaptive kernels: an individual width $h_i$ for each kernel:

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h_i}\,\varphi\!\left(\frac{x - x_i}{h_i}\right)$

Recall that we used the maximum likelihood method for parametric pdf estimation:

    $\max_\theta \hat p(X;\theta) = \max_\theta \hat p(x_1, x_2, \ldots, x_N \mid \theta) = \max_\theta \prod_{k=1}^{N} \hat p(x_k; \theta)$

Can we use the same method for estimating the kernel width of $\hat p(x)$? No, the maximum is not achievable:

    $\max_h \prod_{k=1}^{N} \hat p(x_k; h) = \max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right) \to \infty \quad \text{as } h \to 0$

(the i = k term places a kernel exactly at $x_k$, and its height $\frac{1}{h}\varphi(0)$ grows without bound as $h \to 0$).

Solution: separate the model data (kernel centers) from the testing data - the cross-validation (leave-one-out) technique:

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i \ne k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right)$

Tried maximum likelihood cross-validation and $\hat p(x)$ still diverges?

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i \ne k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right) \to \infty$

This might happen if the data is somewhat discrete, so that $x_i = x_k$ for some $i \ne k$. Solution - truly separate the model data from the testing data by excluding all duplicates of $x_k$ from the sum:

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i:\, x_i \ne x_k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right)$

Examples of pdf estimation
[Figures: Parzen-window (kernel) estimates of a univariate normal density, of a bimodal density, and of a bivariate normal density, using different window widths and numbers of samples.]

Error in pdf estimation
Discrepancy between the true density $p(x)$ and its estimate $\hat p(x)$:

    $MSE_x(\hat p) = E\{(\hat p(x) - p(x))^2\}$   - Mean Square Error
    $MISE(\hat p) = \int E\{(\hat p(x) - p(x))^2\}\,dx$   - Mean Integrated Square Error

    $MSE_x(\hat p) = E\{(\hat p - p)^2\} = E\{\hat p^2 - 2\hat p p + p^2\} = E\{\hat p^2\} - 2pE\{\hat p\} + p^2$
    $= E\{\hat p^2\} - E\{\hat p\}^2 + E\{\hat p\}^2 - 2pE\{\hat p\} + p^2 = \left[E\{\hat p\} - p\right]^2 + E\{(\hat p - E\{\hat p\})^2\}$

Bias and variance of the estimation error

    $MSE_x(\hat p) = \underbrace{\left[E\{\hat p\} - p\right]^2}_{\text{Bias}^2} + \underbrace{E\{(\hat p - E\{\hat p\})^2\}}_{\text{Variance}}$

• $E\hat p$ is the average approximation: $E\hat p = \int \frac{1}{h} K\!\left(\frac{x-y}{h}\right) p(y)\,dy$
• Bias is the difference between the true density and the average approximation.
• Variance measures the spread of the individual approximations around the average approximation.
• A smaller kernel width reduces the bias but increases the variance.

Silverman (Parzen): if some assumptions on the true density are made (e.g. $\int (p''(x))^2\,dx < \infty$), then it is possible to analytically find the kernel width which gives the smallest $MISE(\hat p)$:

    $h_{opt} = \left\{\int t^2 \varphi(t)\,dt\right\}^{-2/5} \left\{\int \varphi^2(t)\,dt\right\}^{1/5} \left\{\int (p''(x))^2\,dx\right\}^{-1/5} n^{-1/5}$

The optimal kernel width gets smaller as the number of training samples n increases. For the optimal kernel width, $MISE(\hat p)$ also decreases:

    $MISE \sim C(\varphi) \left\{\int (p''(x))^2\,dx\right\}^{1/5} n^{-4/5}$

Note that $p(x)$ is unknown.
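The fixed-width Parzen estimate and the leave-one-out likelihood selection of h can be sketched as follows. This is a minimal sketch, not from the slides: it assumes a Gaussian kernel for φ, and the function names and the synthetic data are illustrative. It normalizes the leave-one-out sum by N-1 rather than the slides' 1/N; the constant factor does not change which h maximizes the likelihood.

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Fixed-width Parzen (kernel) density estimate with a Gaussian kernel phi."""
    u = (x - samples[:, None]) / h                    # (N, len(x)) scaled distances
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return phi.mean(axis=0) / h                       # (1/N) sum_i (1/h) phi((x - x_i)/h)

def loo_log_likelihood(samples, h):
    """Leave-one-out cross-validated log-likelihood of the kernel width h."""
    u = (samples[:, None] - samples[None, :]) / h     # pairwise (x_k - x_i)/h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(phi, 0.0)                        # exclude i == k: the LOO step
    dens = phi.sum(axis=1) / ((len(samples) - 1) * h)
    return np.log(dens).sum()

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=200)                 # synthetic training samples
widths = np.linspace(0.05, 2.0, 40)
best_h = max(widths, key=lambda h: loo_log_likelihood(data, h))
```

Without the `fill_diagonal` line, the log-likelihood grows without bound as h shrinks, exactly as the divergence argument above predicts.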
The above formulas are useful for theory, but not for practical applications, since $p''(x)$ of the unknown density cannot be evaluated.

For multivariate pdf approximation, $MISE \sim n^{-4/(d+4)}$: the performance decreases exponentially as the number of dimensions d increases.

Bayesian classification
• Bayes classification rule: classify x to the class $w_i$ which has the largest posterior probability $P(w_i|x)$:

    $P(w_1|x) > P(w_2|x)$ ?  then $w_1$, else $w_2$

• The Bayes classification rule minimizes the total probability of misclassification.

Cost of errors
• Errors happen when samples of class 1 are incorrectly classified as belonging to class 2, and samples of class 2 are classified as belonging to class 1.
• The costs of making these two kinds of errors can be different:
    $\lambda_1$ - the cost of misclassifying samples of class 1
    $\lambda_2$ - the cost of misclassifying samples of class 2

Total cost (or risk) of classification
The classification algorithm splits the feature space into two decision regions:
    $R_1$ - samples in this region are classified as class 1
    $R_2$ - samples in this region are classified as class 2

    $\int_{R_2} p(x|w_1)\,dx$ - the proportion of class-1 samples classified as class 2
    $\int_{R_1} p(x|w_2)\,dx$ - the proportion of class-2 samples classified as class 1
    $P(w_1)\int_{R_2} p(x|w_1)\,dx$ - the proportion of all input samples that are in class 1 but classified as class 2
    $P(w_2)\int_{R_1} p(x|w_2)\,dx$ - the proportion of all input samples that are in class 2 but classified as class 1

Total cost:

    $Cost = \lambda_1 P(w_1) \int_{R_2} p(x|w_1)\,dx + \lambda_2 P(w_2) \int_{R_1} p(x|w_2)\,dx$

Minimizing the total cost of classification
Since $R_1$ and $R_2$ cover the whole feature space,

    $\int_{R_1} p(x|w_1)\,dx + \int_{R_2} p(x|w_1)\,dx = 1$

Thus

    $Cost = \lambda_1 P(w_1)\left\{1 - \int_{R_1} p(x|w_1)\,dx\right\} + \lambda_2 P(w_2) \int_{R_1} p(x|w_2)\,dx$
    $= \lambda_1 P(w_1) + \int_{R_1} \left(\lambda_2 P(w_2)\, p(x|w_2) - \lambda_1 P(w_1)\, p(x|w_1)\right) dx$

The cost is minimized if $R_1$ includes only the points where

    $\lambda_2 P(w_2)\, p(x|w_2) - \lambda_1 P(w_1)\, p(x|w_1) < 0$

Bayesian classification
The Bayesian classifier is an optimal classifier minimizing the total classification cost.
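The minimization argument can be checked numerically. The following is a sketch, not from the slides: the Gaussian class-conditional densities, priors, and costs are hypothetical values chosen for illustration, and the feature axis is discretized so the integrals become sums.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    # Gaussian density, standing in for the (assumed known) class-conditional pdfs
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# hypothetical two-class problem
P1, P2 = 0.6, 0.4            # priors P(w1), P(w2)
lam1, lam2 = 1.0, 5.0        # costs of misclassifying class 1 / class 2

xs = np.linspace(-8.0, 10.0, 20001)   # discretized feature axis
dx = xs[1] - xs[0]
p1 = norm_pdf(xs, 0.0, 1.0)           # p(x|w1)
p2 = norm_pdf(xs, 2.0, 1.0)           # p(x|w2)

def total_cost(in_R1):
    # Cost = lam1 P(w1) int_{R2} p(x|w1) dx + lam2 P(w2) int_{R1} p(x|w2) dx
    return (lam1 * P1 * p1[~in_R1].sum() + lam2 * P2 * p2[in_R1].sum()) * dx

# minimum-cost region: points where lam2 P(w2) p(x|w2) - lam1 P(w1) p(x|w1) < 0
bayes_R1 = lam2 * P2 * p2 - lam1 * P1 * p1 < 0
bayes_cost = total_cost(bayes_R1)

# compare against every simple threshold rule "classify as class 1 iff x < t"
threshold_costs = [total_cost(xs < t) for t in np.linspace(-2.0, 4.0, 121)]
```

Because `bayes_R1` keeps exactly the grid points where the integrand is negative, its cost is (up to discretization) no larger than that of any other decision region, including every threshold rule above.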
Such a classifier is possible only if we have full knowledge of the class distributions. If $\lambda_1 P(w_1)\, p(x|w_1) > \lambda_2 P(w_2)\, p(x|w_2)$, then classify x as class 1.
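The decision rule itself is a one-liner. In this sketch the Gaussian class-conditional densities and all parameter values are illustrative assumptions; in practice the densities would come from the parametric or kernel estimates above.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def bayes_classify(x, p1, p2, P1, P2, lam1=1.0, lam2=1.0):
    # class 1 where lam1 P(w1) p(x|w1) > lam2 P(w2) p(x|w2), else class 2
    return np.where(lam1 * P1 * p1(x) > lam2 * P2 * p2(x), 1, 2)

# assumed class-conditional densities: N(0,1) for class 1, N(2,1) for class 2
p1 = lambda x: norm_pdf(x, 0.0, 1.0)
p2 = lambda x: norm_pdf(x, 2.0, 1.0)

# with equal priors and costs the decision boundary falls midway, at x = 1
labels = bayes_classify(np.array([-1.0, 0.5, 3.0]), p1, p2, 0.5, 0.5)
```

Raising lam2 (making class-2 errors more expensive) enlarges the region decided as class 2: for these equal-variance Gaussians the boundary moves to $1 + \frac{1}{2}\ln\frac{\lambda_1 P(w_1)}{\lambda_2 P(w_2)}$.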


UB CSE 666 - Approximating class densities, Bayesian classifier
