CS 59000 Statistical Machine Learning
Lecture 8
Alan Qi

Outline
• Review of the exponential family
• Non-informative priors
• Nonparametric methods
• Linear regression with basis functions

The Exponential Family
$$p(x|\eta) = h(x)\, g(\eta) \exp\{\eta^\top u(x)\},$$
where $\eta$ is the natural parameter and
$$g(\eta) \int h(x) \exp\{\eta^\top u(x)\}\, dx = 1,$$
so $g(\eta)$ can be interpreted as a normalization coefficient.

ML Estimation for the Exponential Family
Given a data set $X = \{x_1, \dots, x_N\}$, the likelihood function is given by
$$p(X|\eta) = \Big(\prod_{n=1}^N h(x_n)\Big)\, g(\eta)^N \exp\Big\{\eta^\top \sum_{n=1}^N u(x_n)\Big\}.$$
Setting the gradient of the log likelihood to zero, we have
$$-\nabla \ln g(\eta_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^N u(x_n).$$
The quantity $\sum_n u(x_n)$ is the sufficient statistic.

Conjugate Priors
For any member of the exponential family, there exists a prior
$$p(\eta|\chi, \nu) = f(\chi, \nu)\, g(\eta)^\nu \exp\{\nu\, \eta^\top \chi\}.$$
Combining with the likelihood function, we get
$$p(\eta|X, \chi, \nu) \propto g(\eta)^{N+\nu} \exp\Big\{\eta^\top \Big(\sum_{n=1}^N u(x_n) + \nu \chi\Big)\Big\}.$$
The prior corresponds to $\nu$ pseudo-observations with value $\chi$.

Noninformative Priors (1)
With little or no information available a priori, we might choose a non-informative prior.
• $\lambda$ discrete, K-nomial: $p(\lambda) = 1/K$.
• $\lambda \in [a, b]$ real and bounded: $p(\lambda) = 1/(b - a)$.
• $\lambda$ real and unbounded: improper!
A constant prior may no longer be constant after a change of variable; consider $p(\lambda)$ constant and $\lambda = \eta^2$: then $p_\eta(\eta) = p_\lambda(\eta^2)\,|d\lambda/d\eta| \propto \eta$, which is not constant.

Noninformative Priors (2)
Translation-invariant priors. Consider a density of the form
$$p(x|\mu) = f(x - \mu).$$
For a corresponding prior over $\mu$, we require
$$\int_A^B p(\mu)\, d\mu = \int_{A-c}^{B-c} p(\mu)\, d\mu = \int_A^B p(\mu - c)\, d\mu$$
for any $A$ and $B$. Thus $p(\mu) = p(\mu - c)$, and $p(\mu)$ must be constant.

Noninformative Priors (3)
Example: the mean of a Gaussian, $\mu$; the conjugate prior is also a Gaussian,
$$p(\mu|\mu_0, \sigma_0^2) = \mathcal{N}(\mu|\mu_0, \sigma_0^2).$$
As $\sigma_0^2 \to \infty$, this becomes constant over $\mu$.

Noninformative Priors (4)
Consider $p(x|\sigma) = \frac{1}{\sigma} f(x/\sigma)$. It is scale invariant, since changing variables with a scale $c$ ($\hat{x} = cx$, $\hat{\sigma} = c\sigma$) leaves the form of the density unchanged. For a prior over $\sigma$, we require
$$\int_A^B p(\sigma)\, d\sigma = \int_{A/c}^{B/c} p(\sigma)\, d\sigma = \int_A^B p(\sigma/c)\, \frac{1}{c}\, d\sigma$$
for any $A$ and $B$. (Why does the second equality hold? Substitute $\sigma \to \sigma/c$ in the middle integral.) Thus $p(\sigma) \propto 1/\sigma$, and so this prior is improper too. Note that this corresponds to $p(\ln \sigma)$ being constant.

Noninformative Priors (5)
Example: for the variance of a Gaussian, $\sigma^2$, we have
$$\mathcal{N}(x|\mu, \sigma^2) \propto \frac{1}{\sigma} \exp\Big\{-\frac{(\tilde{x}/\sigma)^2}{2}\Big\}, \qquad \tilde{x} = x - \mu,$$
which is a scale-invariant density. Consider the prior: if $\lambda = 1/\sigma^2$ and $p(\sigma) \propto 1/\sigma$, then $p(\lambda) \propto 1/\lambda$.
We know that the conjugate distribution for $\lambda$ is the Gamma distribution, $\mathrm{Gam}(\lambda|a_0, b_0)$. A noninformative prior is obtained when $a_0 = 0$ and $b_0 = 0$ (see the numerical check at the very end of these notes).

Nonparametric Methods (1)
Parametric distribution models are restricted to specific forms, which may not always be suitable; for example, consider modelling a multimodal distribution with a single, unimodal model. Nonparametric approaches make few assumptions about the overall shape of the distribution being modelled.

Nonparametric Methods (2)
Histogram methods partition the data space into distinct bins with widths $\Delta_i$ and count the number of observations, $n_i$, in each bin:
$$p_i = \frac{n_i}{N \Delta_i}.$$
• Assume a uniform distribution inside each bin.
• Often, the same width is used for all bins, $\Delta_i = \Delta$.
• $\Delta$ acts as a smoothing parameter.
• In a D-dimensional space, using M bins in each dimension will require $M^D$ bins!

Nonparametric Methods (3)
Assume observations are drawn from a density $p(x)$ and consider a small region $R$ containing $x$ such that
$$P = \int_R p(x)\, dx.$$
The probability that K out of N observations lie inside $R$ is $\mathrm{Bin}(K|N, P)$.

Nonparametric Methods (4)
If N is large, the binomial distribution is sharply peaked, so $K \simeq NP$. If the volume of $R$, $V$, is sufficiently small, $p(x)$ is approximately constant over $R$, so $P \simeq p(x)\, V$. Thus
$$p(x) \simeq \frac{K}{N V}.$$
What is the relation to the histogram method? The histogram applies exactly this estimate with $R$ a fixed bin of volume $V = \Delta^D$.

Nonparametric Methods (5)
Kernel Density Estimation: fix $V$, estimate $K$ from the data. Let $R$ be a hypercube centred on $x$ and define the kernel function (Parzen window)
$$k(u) = \begin{cases} 1, & |u_i| \le 1/2, \; i = 1, \dots, D, \\ 0, & \text{otherwise.} \end{cases}$$
It follows that $K = \sum_{n=1}^N k\big((x - x_n)/h\big)$ and hence
$$p(x) = \frac{K}{N h^D} = \frac{1}{N} \sum_{n=1}^N \frac{1}{h^D}\, k\Big(\frac{x - x_n}{h}\Big).$$
What is the relation to the histogram method, and what is its drawback? It behaves like a histogram with a bin centred on every query point; the hard cube edges make the estimate discontinuous.

To avoid discontinuities in $p(x)$, use a smooth kernel, e.g. a Gaussian:
$$p(x) = \frac{1}{N} \sum_{n=1}^N \frac{1}{(2\pi h^2)^{D/2}} \exp\Big\{-\frac{\|x - x_n\|^2}{2 h^2}\Big\}.$$
Any kernel $k(u)$ such that
$$k(u) \ge 0, \qquad \int k(u)\, du = 1$$
will work. $h$ acts as a smoother.
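As a concrete illustration of the Gaussian kernel estimator above, here is a minimal NumPy sketch. The function name gaussian_kde, the bandwidth value, and the toy bimodal data set are illustrative choices, not from the slides:

```python
import numpy as np

def gaussian_kde(x_query, data, h):
    """Evaluate p(x) = (1/N) sum_n (2*pi*h^2)^(-D/2) exp(-||x - x_n||^2 / (2 h^2))."""
    x_query = np.atleast_2d(x_query)   # (M, D) evaluation points
    data = np.atleast_2d(data)         # (N, D) observed samples
    N, D = data.shape
    # Pairwise squared distances ||x - x_n||^2, shape (M, N).
    sq = ((x_query[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * h ** 2) ** (D / 2.0)
    return np.exp(-sq / (2.0 * h ** 2)).sum(axis=1) / (N * norm)

# Bimodal sample: a single parametric Gaussian would miss the two modes,
# while the kernel estimate recovers them for a sensible bandwidth h.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 100),
                       rng.normal(2.0, 0.5, 100)])[:, None]
grid = np.linspace(-4.0, 4.0, 9)[:, None]
print(np.round(gaussian_kde(grid, data, h=0.3), 3))
```

As the slides note for $\Delta$ and $h$, a small bandwidth gives a spiky estimate and a large one over-smooths; the value 0.3 here is just a reasonable middle ground for this toy data.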
Nonparametric Methods (6)
Nearest-Neighbour Density Estimation: fix $K$, estimate $V$ from the data. Consider a hypersphere centred on $x$ and let it grow to a volume, $V^\star$, that includes K of the given N data points. Then
$$p(x) \simeq \frac{K}{N V^\star}.$$
K acts as a smoother.

K-Nearest-Neighbours for Classification (1)
Given a data set with $N_k$ data points from class $C_k$ and $\sum_k N_k = N$, draw a sphere around $x$ containing K points, $K_k$ of which belong to class $C_k$. We have
$$p(x|C_k) = \frac{K_k}{N_k V}$$
and correspondingly
$$p(x) = \frac{K}{N V}.$$
Since $p(C_k) = N_k / N$, Bayes' theorem gives
$$p(C_k|x) = \frac{p(x|C_k)\, p(C_k)}{p(x)} = \frac{K_k}{K}.$$
Then how do we classify the data points? Assign $x$ to the class with the largest posterior $K_k/K$, i.e. by majority vote among the K nearest neighbours (a runnable sketch appears near the end of these notes).

K-Nearest-Neighbours for Classification (2)
(Figures: classification results for K = 1 and K = 3.)

K-Nearest-Neighbours for Classification (3)
• K acts as a smoother.
• For $N \to \infty$, the error rate of the 1-nearest-neighbour classifier is never more than twice the optimal error (obtained from the true conditional class distributions).

Nonparametric vs Parametric
Nonparametric models (histograms aside) require storing and computing with the entire data set. Parametric models, once fitted, are much more efficient in terms of storage and computation.

Linear Regression
The linear regression model is linear in its parameters $w$ but can be made nonlinear in the input through basis functions.

Basis Functions
$$y(x, w) = \sum_{j=0}^{M-1} w_j\, \phi_j(x) = w^\top \phi(x),$$
where $\phi_0(x) = 1$ so that $w_0$ acts as a bias.

Examples of Basis Functions
Polynomial $\phi_j(x) = x^j$; Gaussian $\phi_j(x) = \exp\{-(x - \mu_j)^2 / (2 s^2)\}$; sigmoidal $\phi_j(x) = \sigma\big((x - \mu_j)/s\big)$. (The preview cuts off here.)
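Since the preview ends before the regression slides' own examples, the following is a generic least-squares sketch using the Gaussian basis functions listed above, not the lecture's example. The function name design_matrix, the nine basis centres, the width s = 0.2, and the sin(2πx) toy target are all illustrative assumptions:

```python
import numpy as np

def design_matrix(x, centres, s):
    """Phi[n, j] = phi_j(x_n): Gaussian basis functions plus a constant phi_0 = 1."""
    phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * s ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

# Noisy samples of a nonlinear target: t = sin(2*pi*x) + noise.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 50)
t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, 50)

centres = np.linspace(0.0, 1.0, 9)   # mu_j spread over the input range
Phi = design_matrix(x, centres, s=0.2)
# Least-squares weights w = (Phi^T Phi)^{-1} Phi^T t, computed stably via lstsq.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

x_new = np.array([0.25, 0.5, 0.75])
print(design_matrix(x_new, centres, s=0.2) @ w)   # approx sin(2*pi*x_new): ~[1, 0, -1]
```

The model stays linear in $w$ even though the fitted curve is nonlinear in $x$, which is why ordinary least squares suffices.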

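As promised in the k-NN classification section, here is a minimal sketch of the $K_k/K$ majority-vote rule. The function name knn_classify and the two-cluster toy data are illustrative, not from the slides:

```python
import numpy as np

def knn_classify(x_query, data, labels, K):
    """Classify x by majority vote among its K nearest neighbours,
    i.e. the arg-max of the posterior estimate p(C_k|x) = K_k / K."""
    # Euclidean distances from the query to every training point.
    dists = np.linalg.norm(data - x_query, axis=1)
    nearest = np.argsort(dists)[:K]        # indices of the K closest points
    votes = np.bincount(labels[nearest])   # K_k for each class
    return np.argmax(votes)                # class with the largest K_k / K

# Two 2-D classes; the query point sits nearer the class-1 cluster.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                  rng.normal(4.0, 1.0, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([3.5, 3.5]), data, labels, K=3))   # -> 1
```

Note how this matches the slides' caveat on nonparametric methods: the entire training set must be stored and scanned for every query.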

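Finally, a numerical check of the claim in Noninformative Priors (5) that $a_0 = b_0 = 0$ is noninformative. For a Gaussian with known mean, the standard conjugate update for the Gamma prior on the precision $\lambda$ is $a_N = a_0 + N/2$, $b_N = b_0 + \frac{1}{2}\sum_n (x_n - \mu)^2$; this update is assumed here from the general conjugacy result, it is not shown in the preview. In the noninformative limit the posterior mean of $\lambda$ coincides with the maximum-likelihood precision:

```python
import numpy as np

# Data from a Gaussian with known mean mu = 0 and true precision 1/sigma^2 = 4.
rng = np.random.default_rng(3)
mu, sigma = 0.0, 0.5
x = rng.normal(mu, sigma, 1000)

# Conjugate Gamma posterior for the precision: Gam(lambda | a_N, b_N).
a0, b0 = 0.0, 0.0            # the noninformative limit from the slides
a_N = a0 + x.size / 2.0
b_N = b0 + 0.5 * np.sum((x - mu) ** 2)

post_mean = a_N / b_N        # E[lambda] under Gam(a_N, b_N)
ml_precision = 1.0 / np.mean((x - mu) ** 2)
print(post_mean, ml_precision)   # identical when a0 = b0 = 0; both near 4
```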