CMU CS 10701 - Lecture: Non-parametric Methods


Notes and Announcements

- Midterm exam: Oct 20 (Wednesday), in class.
- Late homeworks: turn in hardcopies to Michelle. DO NOT ask Michelle for extensions. Note down the date and time of submission. If submitting a softcopy, email it to the 10-701 instructors list. Software must be submitted via Blackboard.
- HW2 out today (watch email).

Projects

- Hands-on experience with machine learning algorithms: understand when they work and fail, develop new ones.
- Project ideas are online; discuss them with the TAs. Every project must have a TA mentor.
- Grading: proposal (10%), due Oct 11; mid-term report (25%), due Nov 8; poster presentation (20%), Dec 2, 3-6 pm, NSH Atrium; final project report (45%), due Dec 6.

Project Proposal

- Due Oct 11; 1 page maximum.
- Describe the data set, the project idea (approximately two paragraphs), the software you will need to write, and 1-3 relevant papers. Read at least one of the papers before submitting your proposal.
- Teammates: maximum team size is 2; describe the division of work.
- Project milestone for the mid-term report: include experimental results.

Recitation Tomorrow

- Linear and non-linear regression; nonparametric methods. Strongly recommended.
- Place: NSH 1507. Note the time: 5-6 pm. (TK)

Non-parametric Methods
Kernel density estimation, kNN classification, kernel regression
Aarti Singh
Machine Learning 10-701/15-781, Sept 29, 2010

Parametric Methods

- Assume some functional form (Gaussian, Bernoulli, multinomial, logistic, linear) for P(Xi | Y) and P(Y), as in Naive Bayes, or for P(Y | X), as in logistic regression.
- Estimate the parameters (μ, σ², θ, w, b) using MLE or MAP, and plug them in.
- Pro: need few data points to learn the parameters.
- Con: strong distributional assumptions, often not satisfied in practice.

Example

Hand-written digit images (7, 1, 2, 9, 4, 8, 5, 3) projected as points on a two-dimensional nonlinear feature space.

Non-parametric Methods

- Typically don't make any distributional assumptions.
- As we get more data, we should be able to learn more complex models: let the number of parameters scale with the number of training data points.
- Today: nonparametric methods for density estimation, classification, and regression.

Histogram Density Estimate

Partition the feature space into distinct bins with widths Δi and count the number of observations ni in each bin; the estimated density is constant on each bin, ni / (n Δi). Often the same width Δ is used for all bins; Δ acts as a smoothing parameter. [Image source: Bishop's book]
Effect of Histogram Bin Width

For data on [0, 1], # bins = 1/Δ. The bias analysis assumes the density is roughly constant within each bin, which holds if Δ is small.

Bias-Variance Tradeoff

Choice of # bins (# bins = 1/Δ):
- Bias: how close the mean of the estimate is to the truth. If p(x) is approximately constant per bin, the bias is small.
- Variance: how much the estimate varies around its mean. More data per bin gives a stable estimate.
- Small Δ (many bins): small bias, large variance.
- Large Δ (few bins): large bias, small variance.

Choice of # Bins

MSE = Bias² + Variance. With n fixed, as Δ decreases, ni decreases. [Image sources: Bishop's book; Larry's book]

Histogram as MLE

Consider the class of density estimates that are constant on each bin, with parameters pj = density in bin j (note Σj pj Δ = 1, since the estimate must integrate to 1). Maximize the likelihood of the data under this probability model; show that the histogram density estimate is the MLE under this model (HW / recitation).

Kernel Density Estimate

The histogram is a blocky estimate; the kernel density estimate (aka the Parzen or moving-window method) smooths it out. [Figure: histogram vs. kernel density estimate of the same data.]

Kernel Density Estimate, More Generally

With training points x1, ..., xn, kernel K, and bandwidth h:

    p̂(x) = (1/n) Σi (1/h) K((x − xi) / h)

Kernel Density Estimation

Place small "bumps" at each data point, determined by the kernel function; the estimator is a normalized sum of these bumps. [Image source: Wikipedia. Gaussian bumps (red) around six data points and their sum (blue).] Note that where the points are denser, the density estimate has higher values.

Kernels

Any kernel function K that satisfies K(u) ≥ 0 and ∫ K(u) du = 1 can be used.

- Finite support: only local points are needed to compute the estimate.
- Infinite support (e.g., Gaussian): all points are needed, but such kernels are quite popular since the estimate is smoother. (More in 10-702.)

Choice of Kernel Bandwidth

Too small: a wiggly, high-variance estimate. Just right. Too large: oversmoothed. [Image source: Larry's book, "All of Nonparametric Statistics": the "Bart Simpson" density estimated with three bandwidths.]
Histograms vs. Kernel Density Estimation

The bin width Δ and the bandwidth h play the same role: each acts as a smoother.

Bias-Variance Tradeoff: Simulations

[Simulation figures.]

k-NN (Nearest Neighbor) Density Estimation

- Histogram / kernel density estimate: fix the width Δ, and estimate the number of points within Δ of x (ni or nx) from the data.
- k-NN density estimate: fix nx = k, and estimate Δ from the data: the volume of the ball around x that contains k training points.

k-NN is not very popular for density estimation itself (expensive to compute, bad estimates), but a related version for classification is quite popular. k acts as a smoother.

From Density Estimation to Classification

k-NN Classifier

Example: classify a test document as Sports, Science, or Arts. With k = 4, look at the ball Dk(x) containing the k nearest training documents. What should we predict: the average or the majority label? Why?

Optimal classifier vs. k-NN classifier: estimate the class-conditional density from the fraction of training points of class y that lie within the Dk ball, out of the total training points of class y; plugging these estimates into Bayes' rule leads to the majority vote over the k nearest neighbors.

1-NN, 2-NN, 3-NN, and 5-NN classifiers on the Sports / Science / Arts example. (An even k is not used in practice, since it can produce ties.)

What Is the Best k?

Bias-variance tradeoff: a larger k makes the predicted label more stable, while a smaller k makes it more accurate locally. Similar to density estimation. Choice of k in the next class.

1-NN Classifier Decision Boundary

For k = 1, the decision boundary is given by the Voronoi diagram of the training points.

k-NN Classifier Decision Boundary

k acts as a smoother (bias-variance tradeoff). Guarantee: as n → ∞, the error rate of the 1-nearest-neighbor classifier is never more than twice the optimal (Bayes) error.

Case Study: kNN for Web Classification

Dataset: 20 Newsgroups (20 classes). Download: http://people.csail.mit.edu/jrennie/20Newsgroups. 61,118 words; 18,774 documents. [Class labels and descriptions.]

Experimental Setup

Training and test sets: 50%/50% random split; 10 runs, with average results reported. Evaluation criterion: classification accuracy.

Results: Binary Classes

Accuracy as a function of k for alt.atheism vs. comp.graphics, rec.autos vs. rec.sport.baseball, and comp.windows.x vs.
rec.motorcycles.

From Classification to Regression

Temperature Sensing

What is the temperature in the room at location x? A single global average ignores location; a local average uses only the sensors near x.

Kernel Regression

Also known as local regression, or the Nadaraya-Watson kernel estimator:

    f̂(x) = Σi wi(x) yi,   where  wi(x) = K((x − xi) / h) / Σj K((x − xj) / h)

Weight each training point based on its distance to the test point. A boxcar kernel yields the local average.

Kernels

[Examples of kernels, as in the density-estimation slides.]

Choice of Kernel Bandwidth h

h = 1: too small. h = 10: too small. h = 50: just right. The choice of kernel is not that important; the bandwidth is. [Image source: Larry's book, "All of Nonparametric Statistics"]
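The Nadaraya-Watson estimator just described can be sketched directly from its formula. This is an illustrative sketch, not course code: the function name `nadaraya_watson` and the toy sensor readings are invented, and both the Gaussian and boxcar kernels mentioned on the slides are supported.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_test, h, kernel="gaussian"):
    """Nadaraya-Watson kernel regression:
    f_hat(x) = sum_i K((x - x_i)/h) * y_i / sum_i K((x - x_i)/h).
    A boxcar kernel reduces this to a local average of the y_i
    with |x - x_i| <= h.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    u = (x_test - x_train) / h
    if kernel == "boxcar":
        w = (np.abs(u) <= 1).astype(float)   # weight 1 inside the window, 0 outside
    else:
        w = np.exp(-0.5 * u**2)              # Gaussian weights decay with distance
    return np.sum(w * y_train) / np.sum(w)   # weighted (local) average

# Invented "temperature sensor" readings along a line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([20.0, 21.0, 22.0, 30.0, 31.0])
print(nadaraya_watson(x, t, x_test=0.5, h=1.0, kernel="boxcar"))  # (20 + 21) / 2 = 20.5
```

With the boxcar kernel only the two sensors within distance h = 1 of x = 0.5 contribute, giving exactly the local average the slides describe.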

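Similarly, the histogram density estimate from earlier in the lecture takes only a few lines of NumPy. A minimal sketch, assuming equal-width bins on a known interval; the helper name `histogram_density` and the uniform toy data are my own choices.

```python
import numpy as np

def histogram_density(x_train, num_bins=10, lo=0.0, hi=1.0):
    """Histogram density estimate on [lo, hi] with equal-width bins.

    Returns a function p_hat with p_hat(x) = n_j / (n * Delta) for x in bin j,
    where Delta is the bin width (the smoothing parameter).
    """
    n = len(x_train)
    delta = (hi - lo) / num_bins
    counts, _ = np.histogram(x_train, bins=num_bins, range=(lo, hi))

    def p_hat(x):
        # Map x to its bin index, clipping to stay inside [lo, hi]
        j = np.clip(((np.asarray(x) - lo) / delta).astype(int), 0, num_bins - 1)
        return counts[j] / (n * delta)   # constant density within each bin

    return p_hat

# Uniform data on [0, 1]: the estimated density should be close to 1 everywhere
rng = np.random.default_rng(0)
data = rng.uniform(0, 1, size=10_000)
p = histogram_density(data, num_bins=20)
print(p(0.5))   # close to 1.0
```

Since each bin's height is n_j / (n Δ), the estimate integrates to 1 by construction, matching the MLE view of the histogram on the slides.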

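The kernel (Parzen-window) density estimate can likewise be sketched from its formula, here with a Gaussian kernel. The name `gaussian_kde` is an illustrative choice, and the six data points echo the Wikipedia figure the slides cite.

```python
import numpy as np

def gaussian_kde(x_train, h):
    """Parzen-window density estimate with a Gaussian kernel:
    p_hat(x) = (1/n) * sum_i (1/h) * K((x - x_i)/h), K = standard normal pdf.
    h is the bandwidth (smoothing parameter).
    """
    x_train = np.asarray(x_train, dtype=float)
    n = len(x_train)

    def p_hat(x):
        u = (np.asarray(x, dtype=float)[..., None] - x_train) / h
        bumps = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # one "bump" per data point
        return bumps.sum(axis=-1) / (n * h)               # normalized sum of bumps

    return p_hat

# Six data points, as in the Wikipedia figure mentioned on the slides
pts = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
p = gaussian_kde(pts, h=1.0)
print(p(-1.0), p(3.5))   # higher inside the left cluster than in the gap
```

Shrinking h makes the estimate wiggly (low bias, high variance); growing h oversmooths it, exactly the bandwidth tradeoff shown in the "Bart Simpson" figure.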
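Finally, a minimal sketch of the majority-vote k-NN classifier. The 2-D toy coordinates and Sports/Science/Arts labels are invented stand-ins for the document-classification example; the slides' actual case study used the 20 Newsgroups text data.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """k-NN classifier: majority vote among the k nearest training points
    under Euclidean distance. Ties follow Counter's insertion ordering."""
    d = np.linalg.norm(np.asarray(X_train) - np.asarray(x_test), axis=1)
    nearest = np.argsort(d)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]           # majority label

# Toy 2-D points in three well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0],      # "Sports"
              [5, 5], [5, 6], [6, 5],      # "Science"
              [0, 6], [1, 6], [0, 5]])     # "Arts"
y = ["Sports"] * 3 + ["Science"] * 3 + ["Arts"] * 3
print(knn_predict(X, y, [0.5, 0.5], k=3))  # "Sports"
```

Using an odd k, as the slides recommend, avoids ties in the binary case; larger k smooths the decision boundary at the cost of local accuracy.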