UCSD ECE 271A - Mixture Density Estimation


Mixture density estimation
Nuno Vasconcelos, ECE Department, UCSD

Recall
Last class: we will have "Cheetah Day".
What:
• 4 teams, average of 5 people
• each team will write a report on the 4 cheetah problems
• each team will give a presentation on one of the problems
I am waiting to hear about the teams.

Plan for today
We have talked a lot about the BDR and methods based on density estimation. Practical densities are not well approximated by simple probability models. Last lecture: an alternative is to go non-parametric
• kernel-based density estimates
• "place a pdf (kernel) on top of each datapoint"
Today: mixture models
• similar, but with a restricted number of kernels
• likelihood evaluation is significantly simpler
• parameter estimation is much more complex

Kernel density estimates
Estimate the density with

    P_X(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^d} \phi\left( \frac{x - X_i}{h} \right)

where \phi(x) is a kernel; the most popular is the Gaussian. The Gaussian kernel density estimate is a sum of n Gaussians centered at the sample points X_i:

    P_X(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{(2\pi)^{d/2} h^d} \exp\left( -\frac{\|x - X_i\|^2}{2h^2} \right)

i.e. "approximate the pdf of X with a sum of Gaussian bumps" (a numpy sketch of this estimator follows the leave-one-out section below).

Kernel bandwidth
Back to the generic model: what is the role of h (the bandwidth parameter)? Defining

    \delta(x) = \frac{1}{h^d} \phi\left( \frac{x}{h} \right)

we can write

    P_X(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - X_i)

i.e. a sum of translated replicas of \delta(x).

h has two roles:
1. rescale the x-axis
2. rescale the amplitude of \delta(x)
This implies that for large h:
1. \delta(x) has low amplitude
2. the iso-contours of \delta(x) are quite distant from zero (x must be large before \phi(x/h) changes significantly from \phi(0))
and for small h:
1. \delta(x) has large amplitude
2. the iso-contours of \delta(x) are quite close to zero (x is still small when \phi(x/h) changes significantly from \phi(0))
What is the impact of this on the quality of the density estimates?

The bandwidth controls the smoothness of the estimate:
• as h goes to zero we have a sum of delta functions (a very "spiky" approximation)
• as h goes to infinity we have a sum of constant functions (approximation by a constant)
• in between we get approximations that are gradually smoother

Bias and variance
The bias of the estimate vanishes as h → 0, while its variance is O(1/(n h^d)) and shrinks as h grows. This means that:
• to obtain small bias we need h ≈ 0
• to obtain small variance we need h → ∞

Example
Fit to N(0, I) using h_n = h_1 / n^{1/2}:
• small h_1: spiky estimate; we need a lot of points to converge (variance)
• large h_1: we approximate N(0, I) with a sum of Gaussians of larger covariance and will never reach zero error (bias)
(A small numerical illustration of this tradeoff appears at the end of this preview.)

Optimal bandwidth
We would like
• h ≈ 0, to guarantee zero bias
• zero variance as n goes to infinity
Solution: make h a function of n that goes to zero. Since the variance is O(1/(n h^d)), this is fine as long as n h^d goes to infinity. Hence we need

    h_n \to 0 \quad \text{and} \quad n h_n^d \to \infty \quad \text{as } n \to \infty

Optimal sequences exist, e.g. h_n = h_1 / \sqrt{n} as in the example above.

In practice this has limitations:
• it does not say anything about the finite-data case (the one we care about)
• we still have to find the best h_1
Usually we end up using trial and error or techniques like cross-validation.

Cross-validation
Basic idea:
• leave some data out of your training set (the cross-validation set)
• train with different parameters
• evaluate the performance on the cross-validation set
• pick the best parameter configuration
[Figure: the data is split into a training set, a cross-validation set, and a test set.]

Leave-one-out cross-validation
There are many variations. Leave-one-out CV:
• compute n estimators of P_X(x), each time leaving one X_i out
• for each estimator, evaluate P_X(X_i) on the point that was left out
• pick the P_X(x) that maximizes this held-out likelihood
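As a concrete illustration of the last two sections, here is a minimal numpy sketch of the Gaussian kernel estimate together with leave-one-out bandwidth selection. It is not from the lecture: the function names, the toy data, and the candidate bandwidth grid are my own choices, and it is written for 1-D data to keep the broadcasting simple.

```python
import numpy as np

def kde(x, data, h):
    """Gaussian kernel estimate: P_X(x) = (1/n) sum_i (1/h) phi((x - X_i)/h)."""
    z = (x[:, None] - data[None, :]) / h                 # (m, n) scaled offsets
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return phi.mean(axis=1) / h                          # average the n bumps

def loo_log_likelihood(data, h):
    """Leave-one-out score: sum_i log of the estimate trained without X_i,
    evaluated at the held-out point X_i."""
    total = 0.0
    for i in range(len(data)):
        held_out = data[i:i + 1]                         # keep array shape (1,)
        rest = np.delete(data, i)
        total += np.log(kde(held_out, rest, h)[0])
    return total

rng = np.random.default_rng(0)
data = rng.normal(size=100)                              # toy 1-D training set
candidates = [0.05, 0.1, 0.2, 0.5, 1.0]                  # trial bandwidths
best_h = max(candidates, key=lambda h: loo_log_likelihood(data, h))
print("LOO-selected bandwidth:", best_h)
```

The O(n^2) loop is fine at this scale; for large n one would batch the pairwise distances instead of deleting one point at a time.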
Non-parametric classifiers
Given kernel density estimates for all classes, we can compute the BDR. Since the estimators are non-parametric, the resulting classifier will also be non-parametric. The term is general and applies to any learning algorithm; a very simple example is the nearest neighbor classifier.

Nearest neighbor classifier
This is the simplest classifier one could think of: it literally consists of assigning to the vector to classify the label of the closest vector in the training set. To classify the red point [in the figure]:
• measure the distance to all other points
• if the closest point is a square, assign to the "square" class
• otherwise, assign to the "circle" class
It works a lot better than one might predict.

To define it mathematically we need
• a training set D = {(x_1, y_1), …, (x_n, y_n)}, where x_i is a vector of observations and y_i is the label
• a vector x to classify
The "decision rule" is

    i^* = \arg\min_{i \in \{1, \ldots, n\}} d(x, x_i), \qquad \text{set } y = y_{i^*}

k-nearest neighbors
Instead of the single nearest neighbor, assign x by majority vote of its k nearest neighbors. In the example [figure], the NN rule says "A", but the 3-NN rule says "B"; for x away from the border it does not make much difference. The best performance is usually obtained for k > 1, but there is no universal number: for k too large, performance degrades (the k-th closest points are no longer neighbors). k should be odd, to prevent ties. (A minimal k-NN sketch appears at the end of this preview.)

Mixture density estimates
Back to BDR-based classifiers: consider the bridge traffic analysis problem. Summary:
• we want to classify vehicles into commercial/private
• we measure the vehicle weight
• estimate the pdf
• use the BDR
Clearly the weight density is not Gaussian. A possible solution is to use a kernel-based model.

Kernel-based estimate
Simple learning procedure:
• measure the car weights x_i
• place a Gaussian on top of each measurement
This can be overkill:
• we spend all degrees of freedom (one per training point) just to get the Gaussian means
• we cannot use the data to determine the variances
A bandwidth that is too large leads to bias; a bandwidth that is too small leads to variance. Handpicking the bandwidth can thus introduce too much bias or too much variance.

Mixture density estimate
It looks like we could do better by just picking the right # of Gaussians. This is indeed a good model:
• the density is multimodal because there is a hidden variable Z
• Z determines the type of car, z ∈ {compact, sedan, station wagon, pick-up, van}
• for a given car type, the weight is approximately Gaussian (or has some other …
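The preview breaks off mid-sentence above, but the mixture model just introduced is easy to sketch: a hidden variable Z picks a component with probability P_Z(z), and the observation is Gaussian given Z. Below is a minimal, hypothetical numpy illustration; the component weights, means, and standard deviations are invented values for the car-weight example, not numbers from the lecture.

```python
import numpy as np

def mixture_pdf(x, weights, means, stds):
    """Mixture density: P_X(x) = sum_z P_Z(z) * N(x; mu_z, sigma_z^2)."""
    x = np.asarray(x, dtype=float)[..., None]    # broadcast over components
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return comps @ weights                       # weight and sum the components

# hypothetical weight distribution over three car "types" (weights in tons)
weights = np.array([0.5, 0.3, 0.2])              # P_Z(z), sums to 1
means   = np.array([1.2, 2.0, 3.5])              # mu_z for each type
stds    = np.array([0.2, 0.3, 0.5])              # sigma_z for each type
print(mixture_pdf([1.0, 2.2, 3.0], weights, means, stds))
```

Estimating these parameters from data, rather than fixing them by hand, is the harder step the plan for today flagged ("parameter estimation is much more complex").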

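For the nearest-neighbor sections above, here is a minimal k-NN sketch (k = 1 recovers the plain NN rule). It is not the lecture's code; the function name, the toy training set, and the choice of Euclidean distance are my own.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Assign x the majority label among its k nearest training vectors."""
    dists = np.linalg.norm(X_train - x, axis=1)  # d(x, x_i) for all i
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = y_train[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]             # majority vote

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])                 # 0 = "circle", 1 = "square"
print(knn_classify(np.array([0.1, 0.0]), X_train, y_train, k=3))  # -> 0
```

With odd k = 3 and two classes the vote cannot tie, matching the advice above that k should be odd.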

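Finally, a rough numerical version of the earlier N(0, I) bandwidth example, here in 1-D and under assumptions of my own (the sample sizes, bandwidths, grid, and repetition count are arbitrary): fit the Gaussian kernel estimate with a small and a large h at two sample sizes and compare the mean squared error against the true density. The expected qualitative pattern is the one described above: the small-h error falls as n grows (variance), while the large-h error plateaus (bias).

```python
import numpy as np

def kde(x, data, h):
    """Same 1-D Gaussian kernel estimate as in the earlier sketch."""
    z = (x[:, None] - data[None, :]) / h
    return (np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)).mean(axis=1) / h

rng = np.random.default_rng(1)
grid = np.linspace(-3, 3, 200)
truth = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)   # true N(0, 1) pdf

for n in (50, 5000):
    for h in (0.05, 1.0):
        # average the squared error over 20 independent fits
        err = np.mean([np.mean((kde(grid, rng.normal(size=n), h) - truth) ** 2)
                       for _ in range(20)])
        print(f"n={n:5d}  h={h:.2f}  mean squared error={err:.2e}")
```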