CS 59000 Statistical Machine Learning
Lecture 9
Alan Qi

Outline
- Review of Parzen windows
- K-nearest-neighbour classification
- Linear regression with basis functions
- Ridge regression and the lasso
- Bayesian model selection
- Bayes factors
- Empirical Bayes

Nonparametric Methods (4)
Assume the observations are drawn from a density p(x), and consider a small region R containing x such that

    P = ∫_R p(x) dx.

The probability that K of the N observations lie inside R is Bin(K | N, P), and if N is large,

    K ≈ N P.

If the volume V of R is sufficiently small, p(x) is approximately constant over R, so

    P ≈ p(x) V.

Thus

    p(x) ≈ K / (N V).

What is the relation to the histogram method?

Nonparametric Methods (5)
Kernel density estimation: fix V and estimate K from the data. Let R be a hypercube of side h centred on x, and define the kernel function (Parzen window)

    k(u) = 1 if |u_i| ≤ 1/2 for i = 1, …, D, and 0 otherwise.

It follows that

    K = Σ_{n=1}^N k((x − x_n) / h),

and hence

    p(x) = (1/N) Σ_{n=1}^N (1/h^D) k((x − x_n) / h).

What is the relation to the histogram method, and what is its drawback?

Nonparametric Methods (5, cont.)
To avoid discontinuities in p(x), use a smooth kernel, e.g. a Gaussian:

    p(x) = (1/N) Σ_{n=1}^N (2πh²)^{−D/2} exp( −‖x − x_n‖² / (2h²) ).

Any kernel k(u) such that

    k(u) ≥ 0 and ∫ k(u) du = 1

will work; h acts as a smoother.

Nonparametric Methods (6)
Nearest-neighbour density estimation: fix K and estimate V from the data. Consider a hypersphere centred on x, and let it grow to a volume V* that includes K of the given N data points. Then

    p(x) ≈ K / (N V*).

K acts as a smoother.

K-Nearest-Neighbours for Classification (1)
Given a data set with N_k data points in class C_k, so that Σ_k N_k = N, we have

    p(x | C_k) = K_k / (N_k V),

and correspondingly

    p(x) = K / (N V).

Since p(C_k) = N_k / N, Bayes' theorem gives

    p(C_k | x) = p(x | C_k) p(C_k) / p(x) = K_k / K.

How, then, do we classify a new data point?

K-Nearest-Neighbours for Classification (2)
[Figure: decision regions on the same data set for K = 1 and K = 3.]

K-Nearest-Neighbours for Classification (3)
- K acts as a smoother.
- As N → ∞, the error rate of the 1-nearest-neighbour classifier is never more than twice the optimal error rate (the Bayes error obtained from the true class-conditional distributions).

Nonparametric vs Parametric
Nonparametric models (other than histograms) require storing, and computing with, the entire data set.
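The two estimators above can be sketched in a few lines of NumPy. This is a minimal illustration, not course code: the density estimate uses the Gaussian-kernel formula above, and the K-NN classifier takes a majority vote among the K nearest points, which is exactly the argmax of K_k / K.

```python
import numpy as np

def parzen_gaussian(x_query, X, h):
    """Gaussian-kernel density estimate:
    p(x) = (1/N) sum_n (2 pi h^2)^{-D/2} exp(-||x - x_n||^2 / (2 h^2))."""
    N, D = X.shape
    sq_dists = np.sum((x_query - X) ** 2, axis=1)   # ||x - x_n||^2 for each n
    norm = (2.0 * np.pi * h**2) ** (D / 2.0)
    return np.mean(np.exp(-sq_dists / (2.0 * h**2)) / norm)

def knn_classify(x_query, X, y, K):
    """K-nearest-neighbour classification: majority vote among the K closest
    training points, i.e. argmax_k of the posterior estimate K_k / K."""
    sq_dists = np.sum((X - x_query) ** 2, axis=1)
    nearest = np.argsort(sq_dists)[:K]              # indices of the K neighbours
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that both functions keep the whole training set `X` around at prediction time, which is the storage cost flagged in the nonparametric-vs-parametric comparison above.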
Parametric models, once fitted, are much more efficient in terms of storage and computation.

Linear Regression
- Basis Functions
- Examples of Basis Functions (1), (2)
- Maximum Likelihood Estimation (1), (2)
- Sequential Estimation
- Regularized Least Squares
- More Regularizers
- Visualization of Regularized Regression
- Bayesian Linear Regression
- Posterior Distributions of Parameters
- Predictive Posterior Distribution
- Examples of the Predictive Distribution

Question
Suppose we use Gaussian basis functions. What happens to the predictive distribution if we evaluate it at places far from all the training data points?

Equivalent Kernel
Given the posterior mean

    m_N = β S_N Φᵀ t,

the predictive mean is

    y(x, m_N) = m_Nᵀ φ(x) = β φ(x)ᵀ S_N Φᵀ t = Σ_{n=1}^N β φ(x)ᵀ S_N φ(x_n) t_n = Σ_{n=1}^N k(x, x_n) t_n,

where

    k(x, x') = β φ(x)ᵀ S_N φ(x')

is the equivalent kernel.

[Figure: equivalent kernels induced by Gaussian, polynomial, and sigmoidal basis functions.]

Covariance Between Two Predictions

    cov[ y(x), y(x') ] = φ(x)ᵀ S_N φ(x') = β^{−1} k(x, x').

The predictive means at nearby points are highly correlated, whereas for more distant pairs of points the correlation is smaller.
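A compact NumPy sketch of Bayesian linear regression ties the pieces above together: the posterior N(w | m_N, S_N), the predictive mean and variance, and the equivalent-kernel identity. The data set, the values of α and β, and the basis centres below are illustrative assumptions, not from the slides.

```python
import numpy as np

def gaussian_basis(x, centres, s):
    """Design matrix: a bias column plus Gaussian basis functions exp(-(x-c)^2 / 2s^2)."""
    phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * s**2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

def posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) under the prior N(w | 0, alpha^{-1} I):
    S_N^{-1} = alpha I + beta Phi^T Phi,   m_N = beta S_N Phi^T t."""
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

def predictive(Phi_star, m_N, S_N, beta):
    """Predictive mean phi(x)^T m_N and variance 1/beta + phi(x)^T S_N phi(x),
    evaluated for each query row of Phi_star."""
    mean = Phi_star @ m_N
    var = 1.0 / beta + np.einsum('ij,jk,ik->i', Phi_star, S_N, Phi_star)
    return mean, var

def equivalent_kernel(Phi_a, Phi_b, S_N, beta):
    """k(x, x') = beta * phi(x)^T S_N phi(x'); the predictive mean is then
    the kernel-weighted sum of targets, sum_n k(x, x_n) t_n."""
    return beta * Phi_a @ S_N @ Phi_b.T
```

This also answers the "Question" slide: far from the training data every Gaussian basis function decays to zero, so the predictive variance collapses toward the constant 1/β plus the bias-weight term, understating uncertainty rather than growing with distance, which is a known weakness of localized basis functions.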