Homework Set Five
ECE 271A
Department of Electrical and Computer Engineering
University of California, San Diego
Nuno Vasconcelos, Fall 2007
Due December 6, 2007

1. BDR and nearest neighbors

Consider a classification problem with c classes and uniform class probabilities, i.e. P_Y(i) = 1/c, ∀i. Assume that the goal is to classify an iid sequence of observations X = {x_1, ..., x_n} as a whole (i.e. the samples are not classified one at a time).

a) Compute the BDR for this problem and show that it converges (in probability) to a nearest neighbor rule based on the class-conditional distributions and the distribution of the observations. Show that the distance function is the Kullback-Leibler divergence

    D[p(x) || q(x)] = \int p(x) \log \frac{p(x)}{q(x)} \, dx.

This proves that the BDR for the classification of sequences is really just a nearest neighbor rule.

b) Assuming that all densities are Gaussian with equal covariance Σ, that the class-conditional densities have mean µ_i, and that the observation density has mean µ, write down an expression for the decision rule as a function of the Gaussian parameters. Provide an interpretation of this new decision rule, stating what items are being compared and what the distance function is.

2. Kernel methods

Problem 4.3.3 in DHS.

3. Multinomial EM

In this problem we consider an example where there is a closed-form solution to ML estimation from incomplete data. The goal is to compare with the EM solution and get some insight on how the steps of the latter can be substantially easier to derive than the former.

Consider our bridge example and let U be the type of vehicle that crosses the bridge. U can take 4 values (compact, sedan, station wagon, and pick-up truck), which we denote by U ∈ {1, 2, 3, 4}. On a given day, an operator collects an iid sample of size n from U, and the number of vehicles of each type is counted and stored in a vector D = (x_1, x_2, x_3, x_4). The resulting random variable X (the histogram of vehicle classes) has a multinomial distribution

    P_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4; \Psi) = \frac{n!}{x_1!\,x_2!\,x_3!\,x_4!} \left(\frac{1}{2}+\frac{1}{4}\Psi\right)^{x_1} \left(\frac{1}{4}-\frac{1}{4}\Psi\right)^{x_2} \left(\frac{1}{4}-\frac{1}{4}\Psi\right)^{x_3} \left(\frac{1}{4}\Psi\right)^{x_4}.

However, it is later realized that the operator included motorcycles in the compact class. It is established that bikes have probability (1/4)Ψ, which leads to a new model

    P_{X_{11},X_{12},X_2,X_3,X_4}(x_{11}, x_{12}, x_2, x_3, x_4; \Psi) = \frac{n!}{x_{11}!\,x_{12}!\,x_2!\,x_3!\,x_4!} \left(\frac{1}{2}\right)^{x_{11}} \left(\frac{1}{4}\Psi\right)^{x_{12}} \left(\frac{1}{4}-\frac{1}{4}\Psi\right)^{x_2} \left(\frac{1}{4}-\frac{1}{4}\Psi\right)^{x_3} \left(\frac{1}{4}\Psi\right)^{x_4}.

Determining the parameter Ψ from the available data is therefore a problem of ML estimation with missing data, since we only have measurements for

    x_1 = x_{11} + x_{12},

but not for x_{11} and x_{12} independently.

a) Determine the value of Ψ that maximizes the likelihood of D, i.e.

    \Psi_i^* = \arg\max_\Psi P_{X_1,X_2,X_3,X_4}(D; \Psi),

by using standard ML estimation procedures.

b) Assume that we have the complete data, i.e. D_c = (x_{11}, x_{12}, x_2, x_3, x_4). Determine the value of Ψ that maximizes its likelihood, i.e.

    \Psi_c^* = \arg\max_\Psi P_{X_{11},X_{12},X_2,X_3,X_4}(D_c; \Psi),

by using standard ML estimation procedures. Compare the difficulty of obtaining this solution vs. that of obtaining the solution in a). Does this look like a problem where EM might be helpful?

c) Derive the E and M-steps of the EM algorithm for this problem.

d) Using the equations for the EM steps, determine the fixed point of the algorithm (i.e. the solution) by making

    \Psi^{k+1} = \Psi^k,

where k is the iteration number. Compare to the solution obtained in a).
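As a sanity check on the answer to a), the incomplete-data log-likelihood can also be maximized numerically. Below is a minimal Python sketch of that check; the counts in x are hypothetical placeholders (the problem does not fix a particular sample D), and the multinomial coefficient is dropped because it does not depend on Ψ.

    import numpy as np

    # hypothetical counts (x1, x2, x3, x4); substitute the observed data
    x = np.array([125.0, 18.0, 20.0, 34.0])

    def incomplete_log_likelihood(psi, x):
        # log P_{X1,...,X4}(x; Psi), dropping the constant multinomial coefficient
        p = np.array([0.5 + 0.25 * psi,
                      0.25 * (1.0 - psi),
                      0.25 * (1.0 - psi),
                      0.25 * psi])
        return float(np.sum(x * np.log(p)))

    # brute-force search over the admissible interval (0, 1)
    grid = np.linspace(1e-6, 1.0 - 1e-6, 20001)
    psi_hat = grid[np.argmax([incomplete_log_likelihood(p, x) for p in grid])]
    print("numerical ML estimate of Psi:", psi_hat)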
4. Mixtures of Gaussians

The goal of this problem is to give you some "hands-on" experience with the very important case of EM as a tool for the estimation of the parameters of a mixture. Consider a mixture of two Gaussians

    P_X(x) = \sum_{c=1}^{2} \pi_c \, G(x, \mu_c, \Sigma_c),

where the covariance matrices are diagonal, i.e. Σ_c = diag(σ²_{c,1}, σ²_{c,2}), and a training sample of five points

    D = {(-2.5, -1), (-2, 0.5), (-1, 0), (2.5, -1), (2, 1)}.

a) Assume that the following hold:

    \mu_1 = -\mu_2,
    \Sigma_1 = \Sigma_2 = \sigma^2 I,
    \pi_1 = \pi_2 = 1/2.

Plot the log-likelihood surface log P_X(D) as a function of the mean parameters (the entries of µ_1) for σ² ∈ {0.1, 1, 2}. Let the coordinate axes cover the range [-5, 5]. (A minimal plotting sketch is given after problem 6.) What can you say about the local maxima of the likelihood surface, and how does the surface change with σ²? How does the convergence to the optimum depend on the location of the initial parameter guess?

b) Starting from the initial parameter estimates

    \pi_1^{(0)} = \pi_2^{(0)} = 1/2,
    \mu_1^{(0)} = -\mu_2^{(0)} = (-0.1, 0),
    \Sigma_1^{(0)} = \Sigma_2^{(0)} = I,

compute all the quantities involved in the first 3 iterations of the EM algorithm (see the EM sketch after problem 6). For each iteration produce:

- plot 1: the posterior surface P_{Z|X}(1|x) for the first class as a function of x,
- plot 2: the mean of each Gaussian, the contour where the Mahalanobis distance associated with it becomes 1, the points in D, and the means of the solutions obtained in the previous steps.

Let EM run until convergence, storing the mean estimates at each iteration. Produce the two plots above for the final solution. In plot 2, plot the values of the means as they progress from the initial to the final estimate.

5. EM and MAP estimates

In this problem we use EM for the maximization of the posterior probability

    \Psi^* = \arg\max_\Psi P_{\Psi|X}(\Psi|x).

Consider the binomial distribution of problem 3 and a Gamma prior

    P_\Psi(\Psi) = \frac{\Gamma(\nu_1+\nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)} \Psi^{\nu_1-1} (1-\Psi)^{\nu_2-1}.

Derive the equations of the EM algorithm for MAP estimation of the parameter Ψ.

6. (Computer)

This week we use the cheetah image to evaluate the performance of a classifier based on mixture models estimated with EM. Once again we use the decomposition into 8×8 image blocks, compute the DCT of each block, and zig-zag scan. For this (using the data in TrainingSamplesDCT new 8.mat) we fit a mixture of Gaussians of diagonal covariance to each class, i.e.

    P_{X|Y}(x|i) = \sum_{c=1}^{C} \pi_c \, G(x, \mu_c, \Sigma_c),

where all Σ_c are diagonal matrices. We then apply the BDR based on these density estimates to the cheetah image and measure the probability of error as a function of the number of dimensions of the space (as before, use {1, 2, 4, 8, 16, 24, 32, ..., 64} dimensions).

a) For each class, learn 5 mixtures of C = 8 components, using a random initialization (recall that the mixture weights must add up to one). Plot the probability of error vs. dimension for each of the 25 classifiers obtained with all possible mixture pairs. Comment on the dependence of the probability of error on the initialization.

b) For each class, learn mixtures with C ∈ {1, 2, 4, 8, 16, 32}. Plot the probability of error vs. dimension for each number of mixture components. What is the effect of the number of mixture components on the probability of error?
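For 4.a), the constrained log-likelihood can be evaluated on a grid of candidate means and plotted. The following is a minimal Python (numpy/matplotlib) sketch under the assumptions stated in a); the grid resolution and the choice of a contour plot are conventions of the sketch, not requirements of the problem.

    import numpy as np
    import matplotlib.pyplot as plt

    # the five training points of problem 4
    D = np.array([(-2.5, -1.0), (-2.0, 0.5), (-1.0, 0.0), (2.5, -1.0), (2.0, 1.0)])

    def log_likelihood(mu, var):
        # log P_X(D) under mu1 = mu, mu2 = -mu, Sigma1 = Sigma2 = var * I, pi1 = pi2 = 1/2
        norm = 1.0 / (2.0 * np.pi * var)
        g1 = norm * np.exp(-np.sum((D - mu) ** 2, axis=1) / (2.0 * var))
        g2 = norm * np.exp(-np.sum((D + mu) ** 2, axis=1) / (2.0 * var))
        return np.sum(np.log(0.5 * g1 + 0.5 * g2))

    m1, m2 = np.meshgrid(np.linspace(-5, 5, 201), np.linspace(-5, 5, 201))
    for var in (0.1, 1.0, 2.0):
        L = np.array([[log_likelihood(np.array([a, b]), var) for a, b in zip(ra, rb)]
                      for ra, rb in zip(m1, m2)])
        plt.figure()
        plt.contourf(m1, m2, L, 50)
        plt.colorbar()
        plt.xlabel("first entry of mu_1")
        plt.ylabel("second entry of mu_1")
        plt.title("log-likelihood surface, sigma^2 = %g" % var)
    plt.show()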

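Problems 4.b) and 6 both require fitting Gaussian mixtures with diagonal covariances by EM. Below is a minimal Python/numpy sketch of such a fit. It uses a random initialization, as in 6.a); for 4.b) the fixed initialization stated in the problem should be substituted. The function name, the variance floor, and the stopping rule (a fixed number of iterations) are choices made for the sketch, not prescribed by the assignment.

    import numpy as np

    def gmm_em_diag(X, C, n_iter=100, seed=0, var_floor=1e-6):
        # fit a C-component, diagonal-covariance Gaussian mixture to X (n x d) with EM
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi = np.full(C, 1.0 / C)                         # mixture weights add up to one
        mu = X[rng.choice(n, C, replace=False)].copy()   # random means drawn from the data
        var = np.tile(X.var(axis=0) + var_floor, (C, 1))
        for _ in range(n_iter):
            # E-step: responsibilities, computed in the log domain for stability
            log_g = -0.5 * (np.log(2 * np.pi * var).sum(axis=1)
                            + (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))
            log_w = np.log(pi) + log_g
            log_w -= log_w.max(axis=1, keepdims=True)
            r = np.exp(log_w)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, means, and diagonal variances
            nc = r.sum(axis=0) + 1e-12
            pi = nc / n
            mu = (r.T @ X) / nc[:, None]
            var = (r.T @ X ** 2) / nc[:, None] - mu ** 2 + var_floor
        return pi, mu, var

    # example usage on the five points of problem 4, with C = 2 components
    D = np.array([(-2.5, -1.0), (-2.0, 0.5), (-1.0, 0.0), (2.5, -1.0), (2.0, 1.0)])
    pi, mu, var = gmm_em_diag(D, C=2)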

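Finally, the error curves in problem 6 come from applying the BDR with the fitted class-conditional mixtures while varying the number of DCT dimensions retained. A minimal sketch of that decision step is given below; it assumes each class is summarized by a parameter tuple (pi, mu, var) in the layout returned by the EM sketch above, and the function names are illustrative only.

    import numpy as np

    def mixture_log_density(X, pi, mu, var, dim):
        # log of a diagonal-covariance Gaussian mixture density, using only the
        # first `dim` feature dimensions of X
        Xd, mud, vard = X[:, :dim], mu[:, :dim], var[:, :dim]
        log_g = -0.5 * (np.log(2 * np.pi * vard).sum(axis=1)
                        + (((Xd[:, None, :] - mud) ** 2) / vard).sum(axis=2))
        log_w = np.log(pi) + log_g
        m = log_w.max(axis=1, keepdims=True)
        return (m + np.log(np.exp(log_w - m).sum(axis=1, keepdims=True))).ravel()

    def bdr_decide(X, fg, bg, p_fg, p_bg, dim):
        # BDR between two classes whose densities are the mixtures fg = (pi, mu, var)
        # and bg = (pi, mu, var); returns 1 where the foreground class is chosen
        score_fg = mixture_log_density(X, *fg, dim) + np.log(p_fg)
        score_bg = mixture_log_density(X, *bg, dim) + np.log(p_bg)
        return (score_fg > score_bg).astype(int)

The probability of error would then be measured by comparing these block-by-block decisions with the ground-truth segmentation of the cheetah image, as in the previous assignments.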