UCSD CSE 190 - Parameter Estimation

CSE 190a, Fall 2006 — Lecture 6: Parameter Estimation (Biometrics)

Announcements
• Readings on E-reserves
• Project proposal due today

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (part 1)
• Introduction
• Maximum-Likelihood Estimation
• Example of a Specific Case
• The Gaussian Case: unknown $\mu$ and $\sigma$
• Bias

Introduction
• Data availability in a Bayesian framework: we could design an optimal classifier if we knew
  - $P(\omega_i)$ (the priors)
  - $P(x \mid \omega_i)$ (the class-conditional densities)
  Unfortunately, we rarely have this complete information!
• Instead, we design a classifier from a training sample.
  - Estimating the priors is no problem.
  - Samples are often too small to estimate the class-conditional densities (large dimension of the feature space!).

A priori information about the problem
• Assume normality of $P(x \mid \omega_i)$:
  $P(x \mid \omega_i) \sim N(\mu_i, \Sigma_i)$, characterized by the two parameters $\mu_i$ and $\Sigma_i$.
• Estimation techniques: Maximum-Likelihood (ML) and Bayesian estimation.
• The results are nearly identical, but the approaches are different:
  - In ML estimation, the parameters are fixed but unknown; the best parameters are those that maximize the probability of obtaining the samples observed.
  - Bayesian methods view the parameters as random variables having some known prior distribution.
  - In either approach, we use $P(\omega_i \mid x)$ for our classification rule!

Maximum-Likelihood Estimation
• Has good convergence properties as the sample size increases.
• Simpler than the alternative techniques.
• General principle: assume we have $c$ classes and
  $P(x \mid \omega_j) \sim N(\mu_j, \Sigma_j)$,
  $P(x \mid \omega_j) \equiv P(x \mid \omega_j, \theta_j)$, where
  $\theta_j = (\mu_j, \Sigma_j) = (\mu_{j1}, \mu_{j2}, \ldots, \sigma_{j11}^2, \sigma_{j22}^2, \ldots, \operatorname{cov}(x_j^m, x_j^n), \ldots)$.
• Use the information provided by the training samples to estimate $\theta = (\theta_1, \theta_2, \ldots, \theta_c)$; each $\theta_i$ ($i = 1, 2, \ldots, c$) is associated with category $i$.
• Suppose that $D$ contains $n$ samples $x_1, x_2, \ldots, x_n$:
  $P(D \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta) = F(\theta)$,
  where $P(D \mid \theta)$ is called the likelihood of $\theta$ with respect to the set of samples.
• The ML estimate of $\theta$ is, by definition, the value $\hat\theta$ that maximizes $P(D \mid \theta)$: "it is the value of $\theta$ that best agrees with the actually observed training sample."
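As a concrete illustration of this principle (a sketch, not from the slides), the snippet below evaluates the log-likelihood $l(\theta) = \ln P(D \mid \theta)$ of a univariate Gaussian with known variance over a grid of candidate means and picks the maximizer; the function name `log_likelihood` and the sample sizes are illustrative choices.

```python
import numpy as np

def log_likelihood(data, mu, sigma2):
    """Log-likelihood l(theta) = sum_k ln P(x_k | theta) for a
    univariate Gaussian N(mu, sigma2) with known variance."""
    n = data.size
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - np.sum((data - mu) ** 2) / (2 * sigma2))

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=50)   # true mu = 3, sigma^2 = 1

# The ML principle: the estimate is the parameter value that maximizes
# the (log-)likelihood of the observed sample.
candidates = np.linspace(0.0, 6.0, 601)
lls = [log_likelihood(data, mu, 1.0) for mu in candidates]
mu_hat = candidates[int(np.argmax(lls))]
print(mu_hat, data.mean())   # the grid maximizer lands on the sample mean
```

The grid search is only for illustration; the slides derive the maximizer in closed form below by setting the gradient of the log-likelihood to zero.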
Optimal estimation
• Let $\theta = (\theta_1, \theta_2, \ldots, \theta_p)^t$ and let $\nabla_\theta$ be the gradient operator:
  $\nabla_\theta = \left[\dfrac{\partial}{\partial\theta_1}, \dfrac{\partial}{\partial\theta_2}, \ldots, \dfrac{\partial}{\partial\theta_p}\right]^t$.
• We define $l(\theta)$ as the log-likelihood function: $l(\theta) = \ln P(D \mid \theta)$.
• New problem statement: determine the $\theta$ that maximizes the log-likelihood,
  $\hat\theta = \arg\max_\theta l(\theta)$.
• The set of necessary conditions for an optimum is $\nabla_\theta l = 0$, where
  $\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta)$.

Example of a specific case: unknown $\mu$, known $\Sigma$
• $P(x_k \mid \mu) \sim N(\mu, \Sigma)$ (samples are drawn from a multivariate normal population); $\theta = \mu$, therefore
  $\ln P(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d \lvert\Sigma\rvert\right] - \frac{1}{2}(x_k - \mu)^t \Sigma^{-1} (x_k - \mu)$
  and
  $\nabla_\mu \ln P(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$.
• The ML estimate for $\mu$ must satisfy
  $\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat\mu) = 0$.
• Multiplying by $\Sigma$ and rearranging, we obtain
  $\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$,
  which is just the arithmetic average of the training samples!
• Conclusion: if $P(x_k \mid \omega_j)$ ($j = 1, 2, \ldots, c$) is assumed to be Gaussian in a $d$-dimensional feature space, then we can estimate the vector $\theta = (\theta_1, \theta_2, \ldots, \theta_c)^t$ and perform an optimal classification!

ML estimation — Gaussian case: unknown $\mu$ and $\sigma$
• $\theta = (\theta_1, \theta_2) = (\mu, \sigma^2)$, and for a single sample
  $l = \ln P(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$,
  $\nabla_\theta l = \begin{pmatrix} \partial l/\partial\theta_1 \\ \partial l/\partial\theta_2 \end{pmatrix} = 0
  \;\Longrightarrow\;
  \begin{cases} \frac{1}{\theta_2}(x_k - \theta_1) = 0 \\[4pt] -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2} = 0 \end{cases}$
• Summing over the $n$ samples:
  (1) $\sum_{k=1}^{n} \frac{1}{\hat\theta_2}(x_k - \hat\theta_1) = 0$
  (2) $-\sum_{k=1}^{n} \frac{1}{\hat\theta_2} + \sum_{k=1}^{n} \frac{(x_k - \hat\theta_1)^2}{\hat\theta_2^2} = 0$
• Combining (1) and (2), one obtains
  $\hat\mu = \hat\theta_1 = \frac{1}{n}\sum_{k=1}^{n} x_k; \qquad \hat\sigma^2 = \hat\theta_2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat\mu)^2$.

Bias
• The ML estimate for $\sigma^2$ is biased:
  $E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$.
• An elementary unbiased estimator for $\Sigma$ is the sample covariance matrix
  $C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^t$.
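To make the closed-form estimates and the bias concrete, here is a small NumPy sketch (illustrative, not from the slides; the seed, sample size, and true variance are arbitrary choices). It computes the ML estimates and compares the biased $1/n$ variance with the unbiased $1/(n-1)$ version over repeated trials.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_var = 5, 4.0          # a small n makes the (n-1)/n bias visible

ml_vars, unbiased_vars = [], []
for _ in range(100_000):
    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=n)
    mu_hat = x.mean()                                   # ML estimate of mu
    ml_vars.append(np.sum((x - mu_hat) ** 2) / n)       # ML: divide by n
    unbiased_vars.append(np.sum((x - mu_hat) ** 2) / (n - 1))  # divide by n-1

# E[ML variance] ~ (n-1)/n * sigma^2 = 0.8 * 4 = 3.2, while the
# unbiased estimator averages to ~4.0, matching the bias formula above.
print(np.mean(ml_vars), np.mean(unbiased_vars))
```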
Appendix: ML problem statement
• Let $D = \{x_1, x_2, \ldots, x_n\}$ with $\lvert D \rvert = n$:
  $P(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta)$.
• Our goal is to determine $\hat\theta$, the value of $\theta$ that makes this sample the most representative!

Bayesian Parameter Estimation (part 2)
• Bayesian Estimation (BE)
• Bayesian Parameter Estimation: Gaussian Case
• Bayesian Parameter Estimation: General Estimation
• Problems of Dimensionality
• Computational Complexity
• Component Analysis and Discriminants
• Hidden Markov Models

Bayesian Estimation
• In MLE, $\theta$ was assumed fixed; in BE, $\theta$ is a random variable.
• The computation of the posterior probabilities $P(\omega_i \mid x)$ used for classification lies at the heart of Bayesian classification.
• Given the sample $D$, Bayes' formula can be written
  $P(\omega_i \mid x, D) = \dfrac{p(x \mid \omega_i, D)\, P(\omega_i \mid D)}{\sum_{j=1}^{c} p(x \mid \omega_j, D)\, P(\omega_j \mid D)}$.
• We assume that
  - the samples $D_i$ provide information about class $i$ only, where $D = \{D_1, \ldots, D_c\}$;
  - $P(\omega_i) = P(\omega_i \mid D_i)$ (i.e., the samples $D_i$ determine the prior on $\omega_i$).
• Goal: compute $p(\omega_i \mid x, D_i)$:
  $P(\omega_i \mid x, D) = \dfrac{p(x \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} p(x \mid \omega_j, D_j)\, P(\omega_j)}$.
• So now what do we do? The only term we do not know on the right-hand side is $p(x \mid \omega_i, D_i)$, the class-conditional density, but it involves a parameter $\theta$ that is a random variable.
• If we knew $\theta$ we would be done! But we don't know it. We do know that
  - $\theta$ has a known prior $p(\theta)$, and
  - we have observed the samples $D_i$.
• So (dropping the class index for brevity) we can rewrite the class-conditional density as
  $p(x \mid D) = \int p(x, \theta \mid D)\, d\theta = \int p(x \mid \theta, D)\, p(\theta \mid D)\, d\theta = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta$.

Bayesian Parameter Estimation: Gaussian Case
• Step I: estimate $\theta$ using the a-posteriori density $P(\theta \mid D)$.
• The univariate case, $P(\mu \mid D)$: $\mu$ is the only unknown parameter ($\mu_0$ and $\sigma_0$ are known!):
  $p(x \mid \mu) \sim N(\mu, \sigma^2), \qquad p(\mu) \sim N(\mu_0, \sigma_0^2)$.
• So now we must calculate
  $p(\mu \mid D) = \dfrac{p(D \mid \mu)\, p(\mu)}{\int p(D \mid \mu)\, p(\mu)\, d\mu}$.
• The reproducing density is found to be $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, where
  $\mu_n = \left(\dfrac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\right)\hat\mu_n + \dfrac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0
  \quad\text{and}\quad
  \sigma_n^2 = \dfrac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}$,
  with $\hat\mu_n = \frac{1}{n}\sum_{k=1}^{n} x_k$ the sample mean.
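As a worked illustration of these update equations (a sketch, not from the slides; `posterior_of_mean` and all numeric settings are hypothetical), the following snippet computes $\mu_n$ and $\sigma_n^2$ for growing sample sizes and shows the posterior moving from the prior mean toward the sample mean while tightening.

```python
import numpy as np

def posterior_of_mean(x, sigma2, mu0, sigma0_2):
    """Posterior p(mu | D) ~ N(mu_n, sigma_n^2) for a univariate Gaussian
    likelihood with known variance sigma2 and prior N(mu0, sigma0_2)."""
    n = x.size
    mu_hat = x.mean()                         # sample mean, mu_hat_n
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 / denom) * mu_hat + (sigma2 / denom) * mu0
    sigma_n2 = sigma0_2 * sigma2 / denom
    return mu_n, sigma_n2

rng = np.random.default_rng(2)
data = rng.normal(loc=2.0, scale=1.0, size=100)   # true mu = 2, sigma^2 = 1

# With few samples the posterior mean stays near the prior mu0 = 0;
# with many samples it approaches the sample mean and sigma_n^2 -> 0.
for n in (1, 10, 100):
    print(n, posterior_of_mean(data[:n], sigma2=1.0, mu0=0.0, sigma0_2=1.0))
```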

