UB CSE 574 - Gaussian Distribution

Gaussian Distribution
Sargur N. Srihari

The Gaussian Distribution
• For a single real-valued variable x
  – Parameters: mean µ, variance σ^2
  – Standard deviation σ, precision β = 1/σ^2
  – E[x] = µ, Var[x] = σ^2

    N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}

• For a D-dimensional vector x, the multivariate Gaussian is

    N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\}

  – µ is the mean vector, Σ is a D x D covariance matrix, and |Σ| is the determinant of Σ
  – Σ^{-1} is also referred to as the precision matrix
(Portrait: Carl Friedrich Gauss, 1777-1855)

Covariance Matrix
• Gives a measure of the dispersion of the data
• It is a D x D matrix
  – The element in position i, j is the covariance between the ith and jth variables
• The covariance between two variables x_i and x_j is defined as E[(x_i - µ_i)(x_j - µ_j)]
• It can be positive or negative
  – If the variables are independent then the covariance is zero
  – In that case all matrix elements are zero except the diagonal elements, which represent the variances

Importance of Gaussian
• The Gaussian arises in many different contexts, e.g.,
  – For a single variable, the Gaussian maximizes entropy (for a given mean and variance)
  – The sum of a set of random variables becomes increasingly Gaussian
(Figure: histograms of a single variable uniform over [0,1], of the mean of two such variables, and of the mean of ten. The two values could be 0.8 and 0.2, whose average is 0.5; there are more ways of getting 0.5 than, say, 0.1.)

Geometry of Gaussian
• The functional dependence of the Gaussian on x is through

    \Delta^2 = (x-\mu)^T \Sigma^{-1} (x-\mu)

  – Called the Mahalanobis distance
  – Reduces to the Euclidean distance when Σ is the identity matrix
• The matrix Σ is symmetric
  – It has the eigenvector equation Σ u_i = λ_i u_i, where the u_i are eigenvectors and the λ_i are eigenvalues
(Figure: a two-dimensional Gaussian over x = (x_1, x_2); the red ellipse is a contour of constant density, whose major axes lie along the eigenvectors u_i.)

Contours of Constant Density
• Determined by the covariance matrix
  – Covariances represent how features vary together
• (a) General form; (b) diagonal covariance matrix (contours aligned with the coordinate axes); (c) covariance proportional to the identity matrix (concentric circles)

Joint, Marginal and Conditional with Gaussian
• If two sets of variables x_a, x_b are jointly Gaussian, then the two conditional densities and the two marginals are also Gaussian
• Given the joint Gaussian N(x | µ, Σ) with Λ = Σ^{-1} and x = [x_a, x_b]^T, where x_a comprises the first M components of x and x_b the remaining D - M components
• Conditional:

    p(x_a \mid x_b) = N(x_a \mid \mu_{a|b}, \Lambda_{aa}^{-1}), \quad \text{where } \mu_{a|b} = \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b)

• Marginal:

    p(x_a) = N(x_a \mid \mu_a, \Sigma_{aa}), \quad \text{where } \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}

Maximum Likelihood for the Gaussian
• Given a data set X = (x_1, ..., x_N)^T in which the observations {x_n} are drawn independently
• The log-likelihood function is

    \ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu)

• Its derivative with respect to µ is

    \frac{\partial}{\partial\mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n-\mu)

  whose solution is

    \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n

• Maximization with respect to Σ is more involved; it yields

    \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T
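As a concrete illustration of these estimators (not part of the original slides), here is a minimal NumPy sketch that evaluates the log-likelihood and computes µ_ML and Σ_ML from a data matrix. The function names gaussian_mle and gaussian_log_likelihood and the synthetic data are assumptions made for the example, not anything defined in the lecture.

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood estimates for a multivariate Gaussian.

    X is an (N, D) array of N independent D-dimensional observations.
    Returns (mu_ml, sigma_ml): the sample mean and the 1/N covariance.
    """
    N, _ = X.shape
    mu_ml = X.mean(axis=0)                 # mu_ML = (1/N) sum_n x_n
    centered = X - mu_ml                   # rows are (x_n - mu_ML)
    sigma_ml = centered.T @ centered / N   # Sigma_ML, the biased 1/N estimate
    return mu_ml, sigma_ml

def gaussian_log_likelihood(X, mu, sigma):
    """Log-likelihood ln p(X | mu, Sigma) of the data set under a Gaussian."""
    N, D = X.shape
    centered = X - mu
    # Mahalanobis terms (x_n - mu)^T Sigma^{-1} (x_n - mu), one per data point
    mahal = np.einsum('ni,ij,nj->n', centered, np.linalg.inv(sigma), centered)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (N * D * np.log(2 * np.pi) + N * logdet + mahal.sum())

# Example with synthetic data drawn from a known Gaussian
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=500)

mu_ml, sigma_ml = gaussian_mle(X)
sigma_unbiased = sigma_ml * len(X) / (len(X) - 1)   # 1/(N-1) correction
print(mu_ml, sigma_ml, gaussian_log_likelihood(X, mu_ml, sigma_ml))
```

The 1/(N-1) rescaling in sigma_unbiased anticipates the bias discussion below: the 1/N estimate is systematically smaller than the true Σ by a factor of (N-1)/N.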
Bias of M.L. Estimate of Covariance Matrix
• For N(µ, Σ), the m.l.e. of Σ from samples x_1, ..., x_N is

    \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T

  – the arithmetic average of the N matrices (x_n - µ_ML)(x_n - µ_ML)^T
• Since

    E[\Sigma_{ML}] = \frac{N-1}{N}\Sigma

  – the m.l.e. is smaller than the true value of Σ
  – thus the m.l.e. is biased: irrespective of the number of samples it does not give the exact value
  – for large N this is inconsequential
• The estimate \tilde{\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T removes the bias
• Rule of thumb: use 1/N for a known mean and 1/(N-1) for an estimated mean
• The bias does not exist in the Bayesian solution

Sequential Estimation
• In on-line applications and with large data sets, batch processing of all data points is infeasible
  – e.g., real-time learning scenarios where a steady stream of data is arriving and predictions must be made before all the data is seen
• Sequential methods allow data points to be processed one at a time and then discarded
  – Sequential learning arises naturally with the Bayesian viewpoint
• The M.L.E. for the parameters of the Gaussian provides a convenient setting for a more general discussion of sequential estimation for maximum likelihood

Sequential Estimation of Gaussian Mean
• By dissecting the contribution of the final data point,

    \mu_{ML}^{(N)} = \frac{1}{N}\sum_{n=1}^{N} x_n = \mu_{ML}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{ML}^{(N-1)}\right)

• This is the same as the earlier batch result
• Nice interpretation:
  – After observing N - 1 data points we have estimated µ by µ_ML^{(N-1)}
  – We now observe data point x_N and obtain a revised estimate by moving the old estimate a small amount
  – As N increases, the contribution from successive points gets smaller

General Sequential Estimation
• A sequential algorithm cannot always be obtained by factoring out the final data point in this way
• Robbins and Monro (1951) gave a general solution
• Consider a pair of random variables θ and z with joint distribution p(z, θ)
• The conditional expectation of z given θ,

    f(\theta) = E[z \mid \theta] = \int z\, p(z \mid \theta)\, dz,

  is called a regression function
  – It is the same as the function that minimizes the expected squared loss seen earlier
• It can be shown that the maximum likelihood solution is equivalent to finding the root of the regression function
  – The goal is to find the θ* at which f(θ*) = 0

Robbins-Monro Algorithm
• Defines a sequence of successive estimates of the root θ* by

    \theta^{(N)} = \theta^{(N-1)} + a_{N-1}\, z(\theta^{(N-1)})

  where z(θ^{(N)}) is the observed value of z when θ takes the value θ^{(N)}
• The coefficients {a_N} satisfy reasonable conditions:

    \lim_{N\to\infty} a_N = 0, \qquad \sum_{N=1}^{\infty} a_N = \infty, \qquad \sum_{N=1}^{\infty} a_N^2 < \infty

• The maximum likelihood solution takes this form, with z involving a derivative of p(x | θ) with respect to θ
• A special case of Robbins-Monro is the solution for the Gaussian mean

Bayesian Inference for the Gaussian
• The MLE framework gives point estimates for
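To illustrate the sequential update of the mean discussed above (again an illustration with assumed names such as sequential_mean, rather than code from the slides), the following sketch processes one data point at a time, discards it, and still reproduces the batch estimate:

```python
import numpy as np

def sequential_mean(stream):
    """Sequential ML estimate of a Gaussian mean.

    Applies mu^(N) = mu^(N-1) + (1/N) * (x_N - mu^(N-1)) to one point
    at a time, which reproduces the batch estimate (1/N) * sum_n x_n.
    """
    mu = None
    for n, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        if mu is None:
            mu = x.copy()            # after one point, mu^(1) = x_1
        else:
            mu += (x - mu) / n       # move the old estimate by a shrinking step
    return mu

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=2.0, size=(1000, 2))

mu_seq = sequential_mean(X)          # one pass, constant memory
mu_batch = X.mean(axis=0)            # batch result, equal up to rounding
print(mu_seq, mu_batch)
```

The step size 1/n used here also satisfies the Robbins-Monro conditions above: it tends to zero, its sum diverges, and the sum of its squares is finite.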