UB CSE 574 - Probability Distributions

Probability Distributions
Sargur N. Srihari (Machine Learning, Srihari)

Distributions: Landscape
• Discrete, binary: Bernoulli, Binomial, Beta
• Discrete, multi-valued: Multinomial, Dirichlet
• Continuous: Gaussian, Gamma, Wishart, Student's-t, Exponential, Uniform
• Angular: Von Mises

Distributions: Relationships
• Bernoulli: a single binary variable
• Binomial: N samples of a Bernoulli; reduces to the Bernoulli for N=1 and approaches a Gaussian for large N
• Beta: continuous variable on [0,1]; conjugate prior of the Bernoulli/binomial parameter
• Multinomial: one of K values, represented as a K-dimensional binary vector; reduces to the Bernoulli for K=2
• Dirichlet: K random variables on [0,1]; conjugate prior of the multinomial parameters
• Gamma: conjugate prior of the univariate Gaussian precision
• Wishart: conjugate prior of the multivariate Gaussian precision matrix
• Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
• Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
• Student's-t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
• Exponential: special case of the Gamma

Binary Variables
Bernoulli, Binomial and Beta

Bernoulli Distribution
• Expresses the distribution of a single binary-valued random variable x ∈ {0,1}
• The probability of x=1 is denoted by the parameter µ, i.e., p(x=1|µ) = µ
• Therefore p(x=0|µ) = 1-µ
• The probability distribution has the form Bern(x|µ) = µ^x (1-µ)^(1-x)
• The mean is E[x] = µ and the variance is Var[x] = µ(1-µ)
• The likelihood of N observations independently drawn from p(x|µ) is
  p(D|µ) = ∏_n µ^(x_n) (1-µ)^(1-x_n)
• The log-likelihood is
  ln p(D|µ) = Σ_n [x_n ln µ + (1-x_n) ln(1-µ)]
• The maximum likelihood estimator, obtained by setting the derivative of ln p(D|µ) with respect to µ to zero: if the number of observations with x=1 is m, then µ_ML = m/N
(Jacob Bernoulli, 1654-1705)
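The Bernoulli MLE above (µ_ML = m/N) can be sketched numerically; this is a minimal illustration assuming NumPy, and the function name `bernoulli_mle` is ours, not from the slides.

```python
import numpy as np

def bernoulli_mle(samples):
    """MLE of the Bernoulli parameter: the fraction of observations with x=1."""
    samples = np.asarray(samples)
    return samples.sum() / samples.size   # mu_ML = m / N

# Example: m=3 heads in N=5 trials
data = [1, 0, 1, 1, 0]
mu = bernoulli_mle(data)                  # 3/5 = 0.6

# Mean and variance of the fitted Bernoulli: E[x] = mu, Var[x] = mu(1-mu)
mean, var = mu, mu * (1 - mu)
```

With only a handful of trials this estimate over-fits badly (e.g. three heads in three tosses gives µ_ML = 1), which is the motivation for the Bayesian treatment that follows.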
Binomial Distribution
• Related to the Bernoulli distribution
• Expresses the distribution of m, the number of observations for which x=1, in N trials
• Proportional to Bern(x|µ), summing over all the ways of obtaining m heads:
  Bin(m|N,µ) = (N choose m) µ^m (1-µ)^(N-m)
• Mean: E[m] = Nµ; variance: Var[m] = Nµ(1-µ)
(Histogram of the binomial distribution for N=10 and µ=0.25)

Beta Distribution
• Beta(µ|a,b) = [Γ(a+b) / (Γ(a) Γ(b))] µ^(a-1) (1-µ)^(b-1)
• where the Gamma function is defined as Γ(x) = ∫_0^∞ u^(x-1) e^(-u) du
• a and b are hyperparameters that control the distribution of the parameter µ
• Mean: E[µ] = a/(a+b); variance: Var[µ] = ab / [(a+b)^2 (a+b+1)]
(Plots of the beta distribution as a function of µ for hyperparameter settings a=0.1, b=0.1; a=1, b=1; a=2, b=3; a=8, b=4)

Bayesian Inference with Beta
• The MLE of µ in the Bernoulli is the fraction of observations with x=1
  – Severely over-fitted for small data sets
• The likelihood function takes products of factors of the form µ^x (1-µ)^(1-x)
• If the prior distribution of µ is chosen to be proportional to powers of µ and 1-µ, the posterior will have the same functional form as the prior
  – This property is called conjugacy
• The beta distribution has a form suitable as a prior for µ

Bayesian Inference with Beta (continued)
• The posterior, obtained by multiplying the beta prior by the binomial likelihood, is
  p(µ|m,l,a,b) ∝ µ^(m+a-1) (1-µ)^(l+b-1)
  – where m is the number of heads and l = N-m is the number of tails
• It is another beta distribution
  – Effectively, the value of a increases by m and the value of b by l
  – As the number of observations increases, the distribution becomes more peaked
(Illustration of one step of the process: prior a=2, b=2; likelihood µ^1(1-µ)^0 for N=m=1 with x=1; posterior a=3, b=2)
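The conjugate update above (a → a+m, b → b+l) is simple enough to sketch directly; this is an illustrative snippet assuming NumPy, reproducing the one-step example from the slide (beta(2,2) prior, one observation x=1).

```python
import numpy as np

def beta_posterior(a, b, samples):
    """Conjugate update: beta(a,b) prior + Bernoulli data -> beta(a+m, b+l)."""
    samples = np.asarray(samples)
    m = int(samples.sum())        # number of heads (x=1)
    l = samples.size - m          # number of tails (x=0)
    return a + m, b + l

def beta_mean_var(a, b):
    """Mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)) of a beta distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# The slide's example: beta(2,2) prior, one observation x=1 -> beta(3,2)
a_post, b_post = beta_posterior(2, 2, [1])
post_mean, post_var = beta_mean_var(a_post, b_post)
```

Note how the posterior variance shrinks as a+b grows, matching the observation that the distribution becomes more peaked with more data.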
Predicting the Next Trial Outcome
• We need the predictive distribution of x given the observed data D
  – From the sum and product rules:
  p(x=1|D) = ∫_0^1 p(x=1, µ|D) dµ = ∫_0^1 p(x=1|µ) p(µ|D) dµ = ∫_0^1 µ p(µ|D) dµ = E[µ|D]
• The expected value of the posterior distribution can be shown to be
  p(x=1|D) = (m+a) / (m+a+l+b)
  – which is the fraction of observations (both fictitious and real) that correspond to x=1
• The maximum likelihood and Bayesian results agree in the limit of infinitely many observations
  – On average, the uncertainty (variance) decreases as data are observed

Summary: Binary Variables
• The distribution of a single binary variable is represented by the Bernoulli
• The binomial is related to the Bernoulli
  – It expresses the distribution of the number of occurrences of either 1 or 0 in N trials
• The beta distribution is a conjugate prior for the Bernoulli
  – Both have the same functional form

Multinomial Variables
Generalized Bernoulli and Dirichlet

Generalization of Bernoulli
• A discrete variable that takes one of K values (instead of 2)
• Represented with a 1-of-K scheme
  – Represent x as a K-dimensional vector
  – If x takes the third of K=6 values, we represent it as x = (0,0,1,0,0,0)^T
  – Such vectors satisfy Σ_k x_k = 1
• If the probability of x_k=1 is denoted µ_k, then the distribution of x is given by the generalized Bernoulli
  p(x|µ) = ∏_(k=1)^K µ_k^(x_k)

Likelihood Function
• Given a data set D of N independent observations x_1, ..., x_N, the likelihood function has the form
  p(D|µ) = ∏_n ∏_k µ_k^(x_nk) = ∏_k µ_k^(m_k)
• where m_k = Σ_n x_nk is the number of observations with x_k=1
• The maximum likelihood solution (obtained by setting the derivative of the log-likelihood to zero) is
  µ_k^ML = m_k / N
  which is the fraction of the N observations for which x_k=1

Generalized Binomial Distribution
• The multinomial distribution:
  Mult(m_1, ..., m_K | µ, N) = (N choose m_1 m_2 ... m_K) ∏_k µ_k^(m_k)
• where the normalization coefficient is the number of ways of partitioning N objects into K groups of sizes m_1, ..., m_K, given by
  (N choose m_1 m_2 ... m_K) = N! / (m_1! m_2! ... m_K!)
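The multinomial MLE above (µ_k^ML = m_k/N) amounts to summing the one-hot vectors and dividing by N; a minimal sketch assuming NumPy, with an illustrative function name:

```python
import numpy as np

def multinomial_mle(X):
    """MLE of mu_k from N one-hot (1-of-K) observations: mu_k = m_k / N."""
    X = np.asarray(X)
    m = X.sum(axis=0)             # m_k: number of observations with x_k = 1
    return m / X.shape[0]         # divide counts by N

# Three 1-of-3 observations: two of the first value, one of the third
X = [[1, 0, 0],
     [1, 0, 0],
     [0, 0, 1]]
mu_ml = multinomial_mle(X)        # fractions (2/3, 0, 1/3)
```

For K=2 this collapses to the Bernoulli estimate m/N, consistent with the relationships on the earlier slide.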
Dirichlet Distribution
• A family of prior distributions for the parameters µ_k of the multinomial distribution
• By inspection of the multinomial, the form of the conjugate prior is
  p(µ|α) ∝ ∏_k µ_k^(α_k - 1)
• The normalized form of the Dirichlet distribution is
  Dir(µ|α) = [Γ(α_0) / (Γ(α_1) ... Γ(α_K))] ∏_k µ_k^(α_k - 1), where α_0 = Σ_k α_k
(Lejeune Dirichlet, 1805-1859)

Dirichlet over 3 Variables
• Due to the summation constraint Σ_k µ_k = 1
  – The distribution over the space of {µ_k} is confined to a simplex of dimensionality K-1
  – For K=3, a triangle
(Plots of the Dirichlet distribution over the simplex for parameter settings α_k=0.1, α_k=1, and α_k=10)

Dirichlet Posterior Distribution
• Multiplying the prior by the likelihood:
  p(µ|D,α) ∝ p(D|µ) p(µ|α) ∝ ∏_k µ_k^(α_k + m_k - 1)
• which has the form of a Dirichlet distribution:
  p(µ|D,α) = Dir(µ|α + m)

Summary: Multinomial Variables
• The multinomial is a generalization of the Bernoulli
  – The variable takes one of K values instead of 2
• The conjugate prior of the multinomial is the Dirichlet distribution
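The Dirichlet posterior update mirrors the beta case: add the observed counts m_k to the prior parameters α_k. A minimal sketch assuming NumPy; the helper names are ours:

```python
import numpy as np

def dirichlet_posterior(alpha, X):
    """Conjugate update: Dir(alpha) prior + one-hot multinomial data -> Dir(alpha + m)."""
    X = np.asarray(X)
    m = X.sum(axis=0)             # counts m_k from the one-hot observations
    return np.asarray(alpha) + m

# Symmetric Dir(1,1,1) prior (uniform over the simplex) plus counts m = (2, 0, 1)
X = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
alpha_post = dirichlet_posterior([1, 1, 1], X)   # Dir(3, 1, 2)

# Posterior mean of mu_k is alpha_k / alpha_0 -- counts plus prior pseudo-counts
mu_mean = alpha_post / alpha_post.sum()
```

As with the beta, the prior parameters act as fictitious pseudo-counts, so no µ_k is estimated as exactly zero from small samples the way the MLE would.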

