Purdue CS 59000 - Lecture notes - D637259

School: Purdue University

Course: CS 59000 - Topics in Computer Sciences

Pages: 31

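CS 59000 Statistical Machine Learning
Lecture 6
Alan Qi
Acknowledgement: Sargur Srihari's slides

Outline
• Review of t-distributions, mixture of Gaussians, exponential family
• Nonparametric methods
• Linear regression

Student's t-Distribution
The D-variate case:
$$\mathrm{St}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Lambda},\nu) = \frac{\Gamma(D/2+\nu/2)}{\Gamma(\nu/2)}\,\frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}}\left[1+\frac{\Delta^2}{\nu}\right]^{-D/2-\nu/2}$$
where $\Delta^2 = (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Lambda}(\mathbf{x}-\boldsymbol{\mu})$.
Properties:
• $\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}$ if $\nu > 1$
• $\mathrm{cov}[\mathbf{x}] = \frac{\nu}{\nu-2}\boldsymbol{\Lambda}^{-1}$ if $\nu > 2$
• $\mathrm{mode}[\mathbf{x}] = \boldsymbol{\mu}$

Student's t-Distribution
Robustness to outliers: Gaussian vs t-distribution.

Mixtures of Gaussians (1)
Old Faithful data set.
[Figure panels: single Gaussian; mixture of two Gaussians]

Mixtures of Gaussians (2)
Combine simple models into a complex model:
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)$$
where $\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)$ is a component and $\pi_k$ is its mixing coefficient.
[Figure: K = 3]

Mixtures of Gaussians (3)

The Exponential Family (1)
$$p(\mathbf{x}\mid\boldsymbol{\eta}) = h(\mathbf{x})\,g(\boldsymbol{\eta})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(\mathbf{x})\}$$
where $\boldsymbol{\eta}$ is the natural parameter and
$$g(\boldsymbol{\eta})\int h(\mathbf{x})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(\mathbf{x})\}\,d\mathbf{x} = 1,$$
so $g(\boldsymbol{\eta})$ can be interpreted as a normalization coefficient.

The Exponential Family (2)
The Bernoulli distribution:
$$p(x\mid\mu) = \mu^x(1-\mu)^{1-x} = \exp\{x\ln\mu + (1-x)\ln(1-\mu)\} = (1-\mu)\exp\left\{\ln\left(\frac{\mu}{1-\mu}\right)x\right\}$$
Comparing with the general form we see that
$$\eta = \ln\left(\frac{\mu}{1-\mu}\right)$$
and so
$$\mu = \sigma(\eta) = \frac{1}{1+\exp(-\eta)},$$
the logistic sigmoid.

The Exponential Family (3)
The Gaussian distribution:
$$p(x\mid\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}\mu^2\right\} = h(x)\,g(\boldsymbol{\eta})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(x)\}$$
where
$$\boldsymbol{\eta} = \begin{pmatrix}\mu/\sigma^2\\ -1/(2\sigma^2)\end{pmatrix},\quad \mathbf{u}(x) = \begin{pmatrix}x\\ x^2\end{pmatrix},\quad h(x) = (2\pi)^{-1/2},\quad g(\boldsymbol{\eta}) = (-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right).$$

Property of Normalization Coefficient
Taking the gradient of both sides of
$$g(\boldsymbol{\eta})\int h(\mathbf{x})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(\mathbf{x})\}\,d\mathbf{x} = 1$$
we get
$$\nabla g(\boldsymbol{\eta})\int h(\mathbf{x})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(\mathbf{x})\}\,d\mathbf{x} + g(\boldsymbol{\eta})\int h(\mathbf{x})\exp\{\boldsymbol{\eta}^{\mathrm{T}}\mathbf{u}(\mathbf{x})\}\,\mathbf{u}(\mathbf{x})\,d\mathbf{x} = 0.$$
Thus
$$-\nabla\ln g(\boldsymbol{\eta}) = \mathbb{E}[\mathbf{u}(\mathbf{x})].$$

Conjugate Priors
For any member of the exponential family, there exists a prior
$$p(\boldsymbol{\eta}\mid\boldsymbol{\chi},\nu) = f(\boldsymbol{\chi},\nu)\,g(\boldsymbol{\eta})^{\nu}\exp\{\nu\,\boldsymbol{\eta}^{\mathrm{T}}\boldsymbol{\chi}\}.$$
Combining with the likelihood function
$$p(\mathbf{X}\mid\boldsymbol{\eta}) = \left(\prod_{n=1}^{N}h(\mathbf{x}_n)\right)g(\boldsymbol{\eta})^{N}\exp\left\{\boldsymbol{\eta}^{\mathrm{T}}\sum_{n=1}^{N}\mathbf{u}(\mathbf{x}_n)\right\},$$
we get
$$p(\boldsymbol{\eta}\mid\mathbf{X},\boldsymbol{\chi},\nu) \propto g(\boldsymbol{\eta})^{\nu+N}\exp\left\{\boldsymbol{\eta}^{\mathrm{T}}\left(\sum_{n=1}^{N}\mathbf{u}(\mathbf{x}_n) + \nu\boldsymbol{\chi}\right)\right\}.$$
The prior corresponds to $\nu$ pseudo-observations with value $\boldsymbol{\chi}$.

Noninformative Priors (1)
With little or no information available a priori, we might choose a noninformative prior.
• $\lambda$ discrete, K-nomial: $p(\lambda) = 1/K$
• $\lambda \in [a,b]$ real and bounded: $p(\lambda) = 1/(b-a)$
• $\lambda$ real and unbounded: improper!
A constant prior may no longer be constant after a change of variable; consider $p(\lambda)$ constant and $\lambda = \eta^2$:
$$p_\eta(\eta) = p_\lambda(\lambda)\left|\frac{d\lambda}{d\eta}\right| = p_\lambda(\eta^2)\,2\eta \propto \eta.$$

Noninformative Priors (2)
Translation invariant priors. Consider
$$p(x\mid\mu) = f(x-\mu).$$
For a corresponding prior over $\mu$, we have
$$\int_A^B p(\mu)\,d\mu = \int_{A-c}^{B-c} p(\mu)\,d\mu = \int_A^B p(\mu-c)\,d\mu$$
for any A and B. Thus $p(\mu) = p(\mu-c)$ and $p(\mu)$ must be constant.

Noninformative Priors (3)
Example: The mean of a Gaussian, $\mu$; the conjugate prior is also a Gaussian,
$$p(\mu\mid\mu_0,\sigma_0^2) = \mathcal{N}(\mu\mid\mu_0,\sigma_0^2).$$
As $\sigma_0^2 \to \infty$, this will become constant over $\mu$.

Noninformative Priors (4)
Scale invariant priors. Consider
$$p(x\mid\sigma) = \frac{1}{\sigma}f\left(\frac{x}{\sigma}\right)$$
and make the change of variable $\hat{x} = cx$. For a corresponding prior over $\sigma$, we have
$$\int_A^B p(\sigma)\,d\sigma = \int_{A/c}^{B/c} p(\sigma)\,d\sigma = \int_A^B p\left(\frac{\sigma}{c}\right)\frac{1}{c}\,d\sigma$$
for any A and B. Thus $p(\sigma) \propto 1/\sigma$ and so this prior is improper too.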
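The scale-invariance property of $p(\sigma) \propto 1/\sigma$ can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `prior_mass`, the interval endpoints, and the scale factors are illustrative assumptions, and a simple midpoint rule stands in for the integral.

```python
import math


def prior_mass(a, b, steps=100_000):
    """Midpoint-rule integral of the unnormalised prior 1/sigma over [a, b]."""
    width = (b - a) / steps
    return sum(width / (a + (i + 0.5) * width) for i in range(steps))


# Illustrative interval [A, B] and scale factors c (assumed values).
A, B = 2.0, 5.0
masses = {c: prior_mass(A / c, B / c) for c in (0.5, 1.0, 3.0, 10.0)}

for c, mass in masses.items():
    # Every rescaled interval [A/c, B/c] carries the same prior mass.
    print(f"c = {c:5.1f}  mass = {mass:.6f}")
```

In this sketch the mass assigned to [A, B] equals ln(B/A), so it depends only on the ratio B/A and not on the scale c.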
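Note that this corresponds to $p(\ln\sigma)$ being constant.

Noninformative Priors (5)
Example: For the variance of a Gaussian, $\sigma^2$, we have
$$\mathcal{N}(x\mid\mu,\sigma^2) \propto \sigma^{-1}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}.$$
If $\lambda = 1/\sigma^2$ and $p(\sigma) \propto 1/\sigma$, then $p(\lambda) \propto 1/\lambda$.
We know that the conjugate distribution for $\lambda$ is the Gamma distribution,
$$\mathrm{Gam}(\lambda\mid a_0,b_0) \propto \lambda^{a_0-1}\exp(-b_0\lambda).$$
A noninformative prior is obtained when $a_0 = 0$ and $b_0 = 0$.

Nonparametric Methods (1)
Parametric distribution models are restricted to specific forms, which may not always be suitable; for example, consider modelling a multimodal distribution with a single, unimodal model.
Nonparametric approaches make few assumptions about the overall shape of the distribution being modelled.

Nonparametric Methods (2)
Histogram methods partition the data space into distinct bins with widths $\Delta_i$ and count the number of observations, $n_i$, in each bin, giving the density estimate
$$p_i = \frac{n_i}{N\Delta_i}.$$
• Often, the same width is used for all bins, $\Delta_i = \Delta$.
• $\Delta$ acts as a smoothing parameter.
• In a D-dimensional space, using M bins in each dimension will require $M^D$ bins!

Nonparametric Methods (3)
Assume observations drawn from a density $p(\mathbf{x})$ and consider a small region R containing $\mathbf{x}$ such that
$$P = \int_R p(\mathbf{x})\,d\mathbf{x}.$$
The probability that K out of N observations lie inside R is $\mathrm{Bin}(K\mid N,P)$, and if N is large
$$K \simeq NP.$$
If the volume of R, V, is sufficiently small, $p(\mathbf{x})$ is approximately constant over R and
$$P \simeq p(\mathbf{x})V.$$
Thus
$$p(\mathbf{x}) \simeq \frac{K}{NV}.$$

Nonparametric Methods (4)
Kernel Density Estimation: fix V, estimate K from the data. Let R be a hypercube centred on $\mathbf{x}$ and define the kernel function (Parzen window)
$$k\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right) = \begin{cases}1, & |x_i - x_{ni}|/h \le 1/2,\quad i = 1,\dots,D,\\ 0, & \text{otherwise.}\end{cases}$$
It follows that
$$K = \sum_{n=1}^{N}k\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right)$$
and hence
$$p(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{h^D}k\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right).$$

Nonparametric Methods (5)
To avoid discontinuities in $p(\mathbf{x})$, use a smooth kernel, e.g. a Gaussian:
$$p(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{(2\pi h^2)^{D/2}}\exp\left\{-\frac{\|\mathbf{x}-\mathbf{x}_n\|^2}{2h^2}\right\}.$$
Any kernel such that
$$k(\mathbf{u}) \ge 0,\qquad \int k(\mathbf{u})\,d\mathbf{u} = 1$$
will work. h acts as a smoother.

Nonparametric Methods (6)
Nearest Neighbour Density Estimation: fix K, estimate V from the data. Consider a hypersphere centred on $\mathbf{x}$ and let it grow to a volume, $V^\star$, that includes K of the given N data points.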
Then
$$p(\mathbf{x}) \simeq \frac{K}{NV^\star}.$$
K acts as a smoother.

K-Nearest-Neighbours for Classification (1)
Given a data set with $N_k$ data points from class $C_k$ and $\sum_k N_k = N$, we have
$$p(\mathbf{x}) = \frac{K}{NV}$$
and correspondingly
$$p(\mathbf{x}\mid C_k) = \frac{K_k}{N_k V},$$
where $K_k$ is the number of the K nearest neighbours that belong to class $C_k$. Since $p(C_k) = N_k/N$, Bayes' theorem gives
$$p(C_k\mid\mathbf{x}) = \frac{p(\mathbf{x}\mid C_k)\,p(C_k)}{p(\mathbf{x})} = \frac{K_k}{K}.$$

K-Nearest-Neighbours for Classification (2)
[Figure panels: K = 1; K = 3]

K-Nearest-Neighbours for Classification (3)
• K acts as a smoother.
• For $N \to \infty$, the error rate of the 1-nearest-neighbour classifier is never more than twice the optimal error (obtained from the true conditional class distributions).

Nonparametric vs Parametric
Nonparametric models (other than histograms) require storing and computing with the entire data set.
Parametric models, once fitted, are much more efficient in terms of storage and computation.

Linear Regression

Basis Functions

Examples of Basis Functions (1)

Examples of Basis Functions
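The two nonparametric density estimates from the Nonparametric Methods slides both follow from p(x) ≈ K/(NV): the kernel estimator fixes V and counts K, while the nearest-neighbour estimator fixes K and grows V. The following is a minimal one-dimensional sketch under stated assumptions: the function names `gaussian_kde` and `knn_density`, the synthetic standard-normal sample, and the values of h and K are all illustrative and not taken from the lecture.

```python
import math
import random


def gaussian_kde(x, data, h):
    """Kernel density estimate at x with a Gaussian kernel of width h (1-D)."""
    norm = 1.0 / (len(data) * math.sqrt(2.0 * math.pi) * h)
    return norm * sum(math.exp(-0.5 * ((x - xn) / h) ** 2) for xn in data)


def knn_density(x, data, k):
    """K-nearest-neighbour estimate at x: K / (N * V) in 1-D.

    V is the length of the smallest interval centred on x that contains
    k data points. The estimate is undefined if that length is zero,
    i.e. if x coincides with at least k data points.
    """
    radius = sorted(abs(x - xn) for xn in data)[k - 1]
    volume = 2.0 * radius
    return k / (len(data) * volume)


# Synthetic sample from a standard normal (assumed data for illustration).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

true_density_at_0 = 1.0 / math.sqrt(2.0 * math.pi)
print("true density at 0:", true_density_at_0)
print("kernel estimate  :", gaussian_kde(0.0, data, h=0.3))
print("K-NN estimate    :", knn_density(0.0, data, k=200))
```

Larger h or larger K gives a smoother but more biased estimate, which is the sense in which both act as smoothing parameters.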
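The K-nearest-neighbour classification rule, in which the posterior for class C_k is the fraction K_k/K of the K nearest neighbours carrying that label, can be sketched as follows. The function names `knn_posterior` and `knn_classify`, the toy points, and the labels are hypothetical and chosen only for illustration; Euclidean distance is assumed.

```python
from collections import Counter


def knn_posterior(x, points, labels, k):
    """Return {class: K_k / K} computed from the k training points nearest to x."""
    def squared_distance(n):
        return sum((a - b) ** 2 for a, b in zip(x, points[n]))

    nearest = sorted(range(len(points)), key=squared_distance)[:k]
    counts = Counter(labels[n] for n in nearest)
    return {label: count / k for label, count in counts.items()}


def knn_classify(x, points, labels, k):
    """Assign x to the class with the largest posterior K_k / K."""
    posterior = knn_posterior(x, points, labels, k)
    return max(posterior, key=posterior.get)


# Toy two-class data set (assumed values).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_posterior((0.5, 0.5), points, labels, k=5))
print(knn_classify((3.2, 3.1), points, labels, k=1))
```

Note that every query scans the full training set, which illustrates the storage and computation cost of nonparametric models mentioned in the comparison with parametric models.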
