MSU CSE 802 - Pattern Classification

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation
● Introduction
● Maximum-Likelihood Estimation
● Bayesian Estimation
● Curse of Dimensionality
● Component Analysis & Discriminants
● EM Algorithm

Introduction
● Bayesian framework: we could design an optimal classifier if we knew
● P(ωi): the priors
● P(x | ωi): the class-conditional densities
Unfortunately, we rarely have this complete information!
● Instead, design a classifier from a set of labeled training samples (supervised learning)
● Assume the priors are known
● We need a sufficient number of training samples to estimate the class-conditional densities, especially when the dimensionality of the feature space is large

● Assumption about the problem: a parametric model of P(x | ωi) is available
● Normality of P(x | ωi): P(x | ωi) ~ N(µi, Σi), characterized by the two parameters µi and Σi
● Estimation techniques: Maximum-Likelihood (ML) estimation and Bayesian estimation
● The results of the two procedures are nearly identical, but the approaches are different

● In ML estimation the parameters are fixed but unknown! The Bayesian estimation procedure, by its nature, utilizes whatever prior information is available about the unknown parameter
● MLE: the best parameters are those that maximize the probability of obtaining the samples actually observed
● Bayesian methods view the parameters as random variables having some known prior distribution (how do we know the priors?)
● In either approach, we use P(ωi | x) for our classification rule!

Maximum-Likelihood Estimation
● Has good convergence properties as the sample size increases; the estimated parameter value approaches the true value as n increases
● Often simpler than alternative techniques

General principle
● Assume we have c classes and P(x | ωj) ~ N(µj, Σj)
● Write P(x | ωj) ≡ P(x | ωj, θj), where the parameter vector collects the mean and covariance entries:

$$\theta_j = (\mu_j, \Sigma_j) = \left(\mu_{j1}, \mu_{j2}, \ldots, \sigma_{j11}^2, \sigma_{j22}^2, \ldots, \mathrm{cov}(x_{jm}, x_{jn}), \ldots\right)$$

● Use the samples from class ωj to estimate the class ωj parameters

● Use the information in the training samples to estimate θ = (θ1, θ2, …, θc); θi (i = 1, 2, …, c) is associated with the ith category
● Suppose the sample set D contains n i.i.d. samples x1, x2, …, xn
● The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ): "It is the value of θ that best agrees with the actually observed training samples"

$$P(D \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta) = F(\theta)$$

P(D | θ), viewed as a function of θ, is called the likelihood of θ with respect to the set of samples.
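As a minimal numerical sketch (Python/NumPy; not part of the original slides), the likelihood is evaluated in log form in practice, since the product of n densities underflows for even moderate n; this is exactly the log-likelihood introduced on the next slide:

```python
import numpy as np

def log_likelihood(samples, mu, sigma2):
    """ln P(D | theta) for i.i.d. samples under a univariate Gaussian
    with parameters theta = (mu, sigma2)."""
    n = samples.size
    # The product P(D | theta) = prod_k P(x_k | theta) underflows quickly,
    # so the sum of log-densities is computed instead.
    return (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            - np.sum((samples - mu) ** 2) / (2.0 * sigma2))

rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.5, size=500)  # toy sample set, true mu = 2.0

# The likelihood of theta is largest near the parameters that generated D:
print(log_likelihood(D, mu=2.0, sigma2=1.5 ** 2))  # near the maximum
print(log_likelihood(D, mu=0.0, sigma2=1.5 ** 2))  # markedly smaller
```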
Optimal estimation
● Let θ = (θ1, θ2, …, θp)ᵗ and let ∇θ denote the gradient operator

$$\nabla_\theta = \left[\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \ldots, \frac{\partial}{\partial\theta_p}\right]^t$$

● Define l(θ) as the log-likelihood function: l(θ) = ln P(D | θ)
● New problem statement: determine the θ that maximizes the log-likelihood

$$\hat{\theta} = \arg\max_{\theta} \, l(\theta)$$

The set of necessary conditions for an optimum is ∇θ l = 0, i.e.

$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta) = 0$$

Example of a specific case: unknown µ
● P(x | µ) ~ N(µ, Σ) (the samples are drawn from a multivariate normal population)
● θ = µ, so the ML estimate for µ must satisfy the gradient condition. From

$$\ln P(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d \,|\Sigma|\right] - \frac{1}{2}(x_k - \mu)^t \Sigma^{-1} (x_k - \mu)$$

we have

$$\nabla_\mu \ln P(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$$

so the necessary condition becomes

$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$$

● Multiplying by Σ and rearranging, we obtain

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

which is the arithmetic average (the sample mean) of the training samples!

Conclusion: given that P(xk | ωj), j = 1, 2, …, c, is Gaussian in a d-dimensional feature space, estimate the vector θ = (θ1, θ2, …, θc)ᵗ and perform classification using the Bayes decision rule of Chapter 2!

ML Estimation: univariate Gaussian case, unknown µ and σ
● θ = (θ1, θ2) = (µ, σ²), and for a single sample

$$l = \ln P(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

Setting ∇θ l = 0 gives the two conditions

$$\frac{1}{\theta_2}(x_k - \theta_1) = 0, \qquad -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2} = 0$$

Summing over all n samples:

$$\text{(1)}\;\; \sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}\,(x_k - \hat{\theta}_1) = 0 \qquad\qquad \text{(2)}\;\; -\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} = 0$$

Combining (1) and (2), one obtains

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k \,; \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2$$

Bias
● The ML estimate for σ² is biased:

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

● An unbiased estimator for Σ is the sample covariance matrix

$$C = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^t$$

Bayesian Estimation (the Bayesian learning approach to pattern classification problems)
● In MLE, θ was assumed to have a fixed (but unknown) value
● In Bayesian estimation, θ is a random variable
● The computation of the posterior probabilities P(ωi | x) lies at the heart of Bayesian classification
● Goal: compute P(ωi | x, D), given the training sample set D
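Before moving on to Bayesian estimation, a short numerical check of the closed-form ML estimates and the bias correction above (a Python/NumPy sketch on one-dimensional toy data; not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, scale=1.5, size=50)  # n i.i.d. training samples

n = D.size
mu_hat = D.sum() / n                           # mu_hat = (1/n) sum_k x_k
sigma2_ml = np.sum((D - mu_hat) ** 2) / n      # ML estimate: biased
sigma2_unbiased = np.sum((D - mu_hat) ** 2) / (n - 1)  # 1/(n-1) correction

print(f"mu_hat = {mu_hat:.3f}")
print(f"ML sigma^2 (low by the factor (n-1)/n): {sigma2_ml:.3f}")
print(f"unbiased sigma^2:                       {sigma2_unbiased:.3f}")
```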


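Finally, to tie the chapter's conclusion together, here is a hypothetical end-to-end sketch (Python/NumPy; the two-class toy data, names, and priors are illustrative assumptions, not from the slides) in which per-class ML estimates are plugged into the Bayes decision rule of Chapter 2:

```python
import numpy as np

# Toy two-class problem: fit each class-conditional density by ML,
# then classify with the Bayes decision rule (maximum posterior).
rng = np.random.default_rng(2)
D1 = rng.normal(0.0, 1.0, size=100)  # samples labeled omega_1
D2 = rng.normal(3.0, 1.0, size=100)  # samples labeled omega_2
priors = {1: 0.5, 2: 0.5}            # priors assumed known, as in the slides

def fit_ml(samples):
    """Closed-form ML estimates (mu_hat, sigma2_hat) for a 1-D Gaussian."""
    mu = samples.mean()
    sigma2 = np.mean((samples - mu) ** 2)  # ML (biased) variance
    return mu, sigma2

params = {1: fit_ml(D1), 2: fit_ml(D2)}

def log_posterior_unnorm(x, label):
    # ln P(omega_i) + ln P(x | omega_i); the evidence P(x) is common to
    # all classes and cancels in the comparison, so it is omitted.
    mu, sigma2 = params[label]
    return (np.log(priors[label])
            - 0.5 * np.log(2.0 * np.pi * sigma2)
            - (x - mu) ** 2 / (2.0 * sigma2))

def classify(x):
    return max(priors, key=lambda c: log_posterior_unnorm(x, c))

print(classify(0.2), classify(2.9))  # -> 1, 2
```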