Machine Learning    Srihari

Probability Distributions
Sargur N. Srihari

Distributions: Landscape
• Discrete, binary: Bernoulli, Binomial, Beta
• Discrete, multi-valued: Multinomial, Dirichlet
• Continuous: Gaussian, Gamma, Wishart, Student's-t, Exponential, Uniform
• Angular: Von Mises

Distributions: Relationships
• Bernoulli: a single binary variable
• Binomial: N samples of a Bernoulli (the Bernoulli is the N=1 case; for large N the Binomial approaches a Gaussian)
• Beta: a continuous variable in [0,1]; conjugate prior of the Bernoulli/Binomial parameter
• Multinomial: one of K values, represented as a K-dimensional binary vector (K=2 recovers the Bernoulli/Binomial)
• Dirichlet: K random variables in [0,1]; conjugate prior of the multinomial parameters
• Gamma: conjugate prior of the univariate Gaussian precision
• Wishart: conjugate prior of the multivariate Gaussian precision matrix
• Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
• Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
• Student's-t: a generalization of the Gaussian that is robust to outliers; an infinite mixture of Gaussians
• Exponential: a special case of the Gamma
• Uniform

Binary Variables
Bernoulli, Binomial and Beta

Bernoulli Distribution    (Jacob Bernoulli, 1654-1705)
• Expresses the distribution of a single binary-valued random variable x ∈ {0,1}
• The probability of x=1 is denoted by the parameter µ, i.e., p(x=1|µ) = µ
• Therefore p(x=0|µ) = 1−µ
• The probability distribution has the form
    Bern(x|µ) = µ^x (1−µ)^(1−x)
• Mean: E[x] = µ
• Variance: var[x] = µ(1−µ)
• Likelihood of N observations x_1,..,x_N drawn independently from p(x|µ):
    p(D|µ) = ∏_{n=1}^{N} µ^{x_n} (1−µ)^{1−x_n}
• Log-likelihood:
    ln p(D|µ) = Σ_{n=1}^{N} { x_n ln µ + (1−x_n) ln(1−µ) }
• The maximum likelihood estimator, obtained by setting the derivative of ln p(D|µ) with respect to µ to zero, is
    µ_ML = (1/N) Σ_{n=1}^{N} x_n
• If the number of observations with x=1 is m, then µ_ML = m/N
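As a minimal sketch of the estimator above (the function name and data are hypothetical, not from the slides), the maximum likelihood estimate for a Bernoulli parameter is just the fraction of observations equal to 1:

```python
def bernoulli_mle(xs):
    """ML estimate mu_ML = m/N for binary observations xs."""
    m = sum(xs)          # m = number of observations with x = 1
    return m / len(xs)   # mu_ML = m / N

# Hypothetical coin-flip data: m = 5 heads out of N = 8 trials
data = [1, 0, 1, 1, 0, 1, 0, 1]
mu_ml = bernoulli_mle(data)   # 5/8 = 0.625
```

This directly illustrates the over-fitting problem the Beta prior addresses later: with a data set of three heads, `bernoulli_mle([1, 1, 1])` returns 1.0, predicting that tails can never occur.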
Binomial Distribution
• Related to the Bernoulli distribution
• Expresses the distribution of m, the number of observations for which x=1 out of N trials
• Proportional to Bern(x|µ); the binomial coefficient adds up all the ways of obtaining m heads:
    Bin(m|N,µ) = (N choose m) µ^m (1−µ)^(N−m)
• Mean and variance:
    E[m] = Nµ,  var[m] = Nµ(1−µ)
(Figure: histogram of the Binomial distribution for N=10 and µ=0.25)

Beta Distribution
• The Beta distribution is
    Beta(µ|a,b) = [Γ(a+b) / (Γ(a)Γ(b))] µ^(a−1) (1−µ)^(b−1)
  where the Gamma function is defined as
    Γ(x) = ∫_0^∞ u^(x−1) e^(−u) du
• a and b are hyperparameters that control the distribution of the parameter µ
• Mean and variance:
    E[µ] = a/(a+b),  var[µ] = ab / [(a+b)^2 (a+b+1)]
(Figure: the Beta distribution as a function of µ for hyperparameter settings a=0.1, b=0.1; a=1, b=1; a=2, b=3; a=8, b=4)

Bayesian Inference with Beta
• The MLE of µ in the Bernoulli is the fraction of observations with x=1
  – Severely over-fitted for small data sets
• The likelihood function is a product of factors of the form µ^x (1−µ)^(1−x)
• If the prior distribution of µ is chosen proportional to powers of µ and 1−µ, the posterior will have the same functional form as the prior
  – This property is called conjugacy
• The Beta distribution has the right functional form to serve as a prior p(µ)
• The posterior, obtained by multiplying the Beta prior by the binomial likelihood, is
    p(µ|m,l,a,b) = [Γ(m+a+l+b) / (Γ(m+a)Γ(l+b))] µ^(m+a−1) (1−µ)^(l+b−1)
  – where m is the number of heads and l = N−m is the number of tails
• It is another Beta distribution
  – Observing the data effectively increases the value of a by m and of b by l
  – As the number of observations increases, the distribution becomes more sharply peaked
(Figure: one step of sequential inference — a Beta prior with a=2, b=2; a single observation N=m=1 with x=1, i.e., likelihood µ^1(1−µ)^0; the resulting Beta posterior with a=3, b=2)
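The conjugate update above needs no integration in practice: the posterior hyperparameters are obtained by simple addition. A short sketch (function names are mine, not from the slides):

```python
def beta_posterior(a, b, m, l):
    """Posterior hyperparameters of Beta(a, b) after observing
    m heads and l tails: the posterior is Beta(a + m, b + l)."""
    return a + m, b + l

def beta_mean(a, b):
    """Mean of Beta(a, b): E[mu] = a / (a + b)."""
    return a / (a + b)

# The slide's illustration: prior a=2, b=2; one observation x=1 (m=1, l=0)
a_post, b_post = beta_posterior(2, 2, 1, 0)   # -> (3, 2)
```

Note how the hyperparameters a and b act as fictitious prior counts of heads and tails, which is why the posterior mean shrinks the raw fraction m/N toward the prior mean.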
Predicting the Next Trial Outcome
• We need the predictive distribution of x given the observed data D
  – From the sum and product rules:
    p(x=1|D) = ∫_0^1 p(x=1,µ|D) dµ = ∫_0^1 p(x=1|µ) p(µ|D) dµ = ∫_0^1 µ p(µ|D) dµ = E[µ|D]
• The expected value of the posterior distribution can be shown to be
    p(x=1|D) = (m+a) / (m+a+l+b)
  – which is the fraction of observations (both fictitious and real) corresponding to x=1
• The maximum likelihood and Bayesian results agree in the limit of infinitely many observations
  – On average, uncertainty (variance) decreases as more data are observed

Summary
• The distribution of a single binary variable is the Bernoulli
• The Binomial is related to the Bernoulli
  – It expresses the distribution of the number of occurrences of 1 (or 0) in N trials
• The Beta distribution is a conjugate prior for the Bernoulli
  – Both have the same functional form

Multinomial Variables
Generalized Bernoulli and Dirichlet

Generalization of the Bernoulli
• A discrete variable that takes one of K values (instead of 2)
• Represent it with the 1-of-K scheme
  – Represent x as a K-dimensional vector
  – If the variable takes the third of K=6 values, we represent it as x = (0,0,1,0,0,0)^T
  – Such vectors satisfy Σ_k x_k = 1
• If the probability of x_k=1 is denoted µ_k, the distribution of x (a generalized Bernoulli) is
    p(x|µ) = ∏_{k=1}^{K} µ_k^{x_k},  where Σ_k µ_k = 1 and µ_k ≥ 0

Likelihood Function
• Given a data set D of N independent observations x_1,..,x_N
• The likelihood function has the form
    p(D|µ) = ∏_{n=1}^{N} ∏_{k=1}^{K} µ_k^{x_nk} = ∏_{k=1}^{K} µ_k^{m_k}
  where m_k = Σ_n x_nk is the number of observations with x_k=1
• The maximum likelihood solution (from the log-likelihood, maximized subject to the constraint Σ_k µ_k = 1) is
    µ_k^ML = m_k / N
  which is the fraction of the N observations for which x_k=1

Generalized Binomial Distribution
• The multinomial distribution is
    Mult(m_1,..,m_K | µ, N) = (N choose m_1 m_2 .. m_K) ∏_{k=1}^{K} µ_k^{m_k}
• The normalization coefficient is the number of ways of partitioning N objects into K groups of sizes m_1,..,m_K, given by
    (N choose m_1 m_2 .. m_K) = N! / (m_1! m_2! .. m_K!)
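The multinomial ML solution above can be sketched in a few lines (names and data are hypothetical): compute the counts m_k from 1-of-K coded observations, then divide by N.

```python
def multinomial_mle(one_hot_data):
    """ML estimates mu_k = m_k / N for 1-of-K coded data.

    one_hot_data: list of K-dimensional 0/1 lists, each summing to 1.
    """
    N = len(one_hot_data)
    K = len(one_hot_data[0])
    m = [sum(x[k] for x in one_hot_data) for k in range(K)]  # counts m_k
    return [mk / N for mk in m]

# Hypothetical K=3 data: the observed states are 1, 3, 1, 2
data = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
mu_ml = multinomial_mle(data)   # counts m = (2, 1, 1), so (0.5, 0.25, 0.25)
```

With K=2 this reduces exactly to the Bernoulli estimate m/N from the binary case.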
Dirichlet Distribution    (Lejeune Dirichlet, 1805-1859)
• A family of prior distributions for the parameters µ_k of the multinomial distribution
• By inspection of the multinomial, the conjugate prior has the form
    p(µ|α) ∝ ∏_{k=1}^{K} µ_k^{α_k−1}
• The normalized form is the Dirichlet distribution
    Dir(µ|α) = [Γ(α_0) / (Γ(α_1)..Γ(α_K))] ∏_{k=1}^{K} µ_k^{α_k−1},  where α_0 = Σ_{k=1}^{K} α_k

Dirichlet over 3 Variables
• Due to the summation constraint Σ_k µ_k = 1
  – The distribution over the space of {µ_k} is confined to a simplex of dimensionality K−1
  – For K=3 this simplex is a triangle
(Figure: plots of the Dirichlet distribution over the simplex for parameter settings α_k = 0.1, α_k = 1, and α_k = 10)

Dirichlet Posterior Distribution
• Multiplying the prior by the likelihood:
    p(µ|D,α) ∝ p(D|µ) p(µ|α) ∝ ∏_{k=1}^{K} µ_k^{α_k+m_k−1}
• This has the form of a Dirichlet distribution:
    p(µ|D,α) = Dir(µ|α+m) = [Γ(α_0+N) / (Γ(α_1+m_1)..Γ(α_K+m_K))] ∏_{k=1}^{K} µ_k^{α_k+m_k−1}

Summary
• The multinomial is a generalization of the Bernoulli
  – The variable takes one of K values instead of 2
• The conjugate prior of the multinomial is the Dirichlet distribution
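As with the Beta, the Dirichlet posterior update is pure bookkeeping: add the observed counts m_k to the prior parameters α_k. A sketch with hypothetical names and data:

```python
def dirichlet_posterior(alpha, counts):
    """Posterior parameters of Dir(alpha) after multinomial counts m:
    the posterior is Dir(alpha + m), elementwise."""
    return [a + m for a, m in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): E[mu_k] = alpha_k / alpha_0."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]

# Symmetric prior alpha_k = 1 over K=3 states; hypothetical counts m = (2, 1, 1)
post = dirichlet_posterior([1, 1, 1], [2, 1, 1])   # -> [3, 2, 2]
```

The K=2 case of this update is exactly the Beta-Binomial update from the binary-variable slides, with alpha = (a, b) and counts = (m, l).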