CMU CS 10701 - Lecture: Non-parametric Methods


Notes and Announcements

- Midterm exam: Oct 20 (Wednesday), in class.
- Late homeworks: turn in hardcopies to Michelle. DO NOT ask Michelle for extensions. Note down the date and time of submission. If submitting a softcopy, email it to the 10-701 instructors list. Software must be submitted via Blackboard.
- HW2 out today (watch email).

Projects

- Hands-on experience with machine learning algorithms: understand when they work and fail, develop new ones.
- Project ideas are online; discuss them with the TAs. Every project must have a TA mentor.
- Grading: proposal (10%), due Oct 11; mid-term report (25%), due Nov 8; poster presentation (20%), Dec 2, 3-6 pm, NSH Atrium; final project report (45%), due Dec 6.

Project Proposal

- Due Oct 11; 1 page maximum.
- Describe the data set, the project idea (approximately two paragraphs), the software you will need to write, and 1-3 relevant papers. Read at least one of the papers before submitting your proposal.
- Teammates: maximum team size is 2; describe the division of work.
- Project milestone for the mid-term report: include experimental results.

Recitation Tomorrow

- Linear and non-linear regression; nonparametric methods. Strongly recommended.
- Place: NSH 1507. Note the time: 5-6 pm. (TK)

Non-parametric Methods
Kernel density estimation, kNN classification, kernel regression
Aarti Singh
Machine Learning 10-701/15-781, Sept 29, 2010

Parametric Methods

- Assume some functional form (Gaussian, Bernoulli, multinomial, logistic, linear) for P(Xi | Y) and P(Y), as in Naive Bayes, or for P(Y | X), as in logistic regression.
- Estimate the parameters (μ, σ², θ, w, b) using MLE or MAP, and plug them in.
- Pro: need few data points to learn the parameters.
- Con: strong distributional assumptions, often not satisfied in practice.

Example

Hand-written digit images (7, 1, 2, 9, 4, 8, 5, 3) projected as points on a two-dimensional nonlinear feature space.

Non-parametric Methods

- Typically don't make any distributional assumptions.
- As we get more data, we should be able to learn more complex models: let the number of parameters scale with the number of training data points.
- Today: nonparametric methods for density estimation, classification, and regression.

Histogram Density Estimate

Partition the feature space into distinct bins with widths Δi and count the number of observations ni in each bin; the estimated density is constant on each bin, ni / (n Δi). Often the same width Δ is used for all bins; Δ acts as a smoothing parameter. [Image source: Bishop's book]
Effect of Histogram Bin Width

For data on [0, 1], # bins = 1/Δ. The bias analysis assumes the density is roughly constant within each bin, which holds if Δ is small.

Bias-Variance Tradeoff

Choice of # bins (# bins = 1/Δ):
- Bias: how close the mean of the estimate is to the truth. If p(x) is approximately constant per bin, the bias is small.
- Variance: how much the estimate varies around its mean. More data per bin gives a stable estimate.
- Small Δ (many bins): small bias, large variance.
- Large Δ (few bins): large bias, small variance.

Choice of # Bins

MSE = Bias² + Variance. With n fixed, as Δ decreases, ni decreases. [Image sources: Bishop's book; Larry's book]

Histogram as MLE

Consider the class of density estimates that are constant on each bin, with parameters pj = density in bin j (note Σj pj Δ = 1, since the estimate must integrate to 1). Maximize the likelihood of the data under this probability model; show that the histogram density estimate is the MLE under this model (HW / recitation).

Kernel Density Estimate

The histogram is a blocky estimate; the kernel density estimate (aka the Parzen or moving-window method) smooths it out. [Figure: histogram vs. kernel density estimate of the same data.]

Kernel Density Estimate, More Generally

With training points x1, ..., xn, kernel K, and bandwidth h:

    p̂(x) = (1/n) Σi (1/h) K((x − xi) / h)

Kernel Density Estimation

Place small "bumps" at each data point, determined by the kernel function; the estimator is a normalized sum of these bumps. [Image source: Wikipedia. Gaussian bumps (red) around six data points and their sum (blue).] Note that where the points are denser, the density estimate has higher values.

Kernels

Any kernel function K that satisfies K(u) ≥ 0 and ∫ K(u) du = 1 can be used.

- Finite support: only local points are needed to compute the estimate.
- Infinite support (e.g., Gaussian): all points are needed, but such kernels are quite popular since the estimate is smoother. (More in 10-702.)

Choice of Kernel Bandwidth

Too small: a wiggly, high-variance estimate. Just right. Too large: oversmoothed. [Image source: Larry's book, "All of Nonparametric Statistics": the "Bart Simpson" density estimated with three bandwidths.]
Histograms vs. Kernel Density Estimation

The bin width Δ and the bandwidth h play the same role: each acts as a smoother.

Bias-Variance Tradeoff: Simulations

[Simulation figures.]

k-NN (Nearest Neighbor) Density Estimation

- Histogram / kernel density estimate: fix the width Δ, and estimate the number of points within Δ of x (ni or nx) from the data.
- k-NN density estimate: fix nx = k, and estimate Δ from the data: the volume of the ball around x that contains k training points.

k-NN is not very popular for density estimation itself (expensive to compute, bad estimates), but a related version for classification is quite popular. k acts as a smoother.

From Density Estimation to Classification

k-NN Classifier

Example: classify a test document as Sports, Science, or Arts. With k = 4, look at the ball Dk(x) containing the k nearest training documents. What should we predict: the average or the majority label? Why?

Optimal classifier vs. k-NN classifier: estimate the class-conditional density from the fraction of training points of class y that lie within the Dk ball, out of the total training points of class y; plugging these estimates into Bayes' rule leads to the majority vote over the k nearest neighbors.

1-NN, 2-NN, 3-NN, and 5-NN classifiers on the Sports / Science / Arts example. (An even k is not used in practice, since it can produce ties.)

What Is the Best k?

Bias-variance tradeoff: a larger k makes the predicted label more stable, while a smaller k makes it more accurate locally. Similar to density estimation. Choice of k in the next class.

1-NN Classifier Decision Boundary

For k = 1, the decision boundary is given by the Voronoi diagram of the training points.

k-NN Classifier Decision Boundary

k acts as a smoother (bias-variance tradeoff). Guarantee: as n → ∞, the error rate of the 1-nearest-neighbor classifier is never more than twice the optimal (Bayes) error.

Case Study: kNN for Web Classification

Dataset: 20 Newsgroups (20 classes). Download: http://people.csail.mit.edu/jrennie/20Newsgroups. 61,118 words; 18,774 documents. [Class labels and descriptions.]

Experimental Setup

Training and test sets: 50%/50% random split; 10 runs, with average results reported. Evaluation criterion: classification accuracy.

Results: Binary Classes

Accuracy as a function of k for alt.atheism vs. comp.graphics, rec.autos vs. rec.sport.baseball, and comp.windows.x vs.
rec.motorcycles.

From Classification to Regression

Temperature Sensing

What is the temperature in the room at location x? A single global average ignores location; a local average uses only the sensors near x.

Kernel Regression

Also known as local regression, or the Nadaraya-Watson kernel estimator:

    f̂(x) = Σi wi(x) yi,   where  wi(x) = K((x − xi) / h) / Σj K((x − xj) / h)

Weight each training point based on its distance to the test point. A boxcar kernel yields the local average.

Kernels

[Examples of kernels, as in the density-estimation slides.]

Choice of Kernel Bandwidth h

h = 1: too small. h = 10: too small. h = 50: just right. The choice of kernel is not that important; the bandwidth is. [Image source: Larry's book, "All of Nonparametric Statistics"]
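The Nadaraya-Watson estimator just described can be sketched directly from its formula. This is an illustrative sketch, not course code: the function name `nadaraya_watson` and the toy sensor readings are invented, and both the Gaussian and boxcar kernels mentioned on the slides are supported.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_test, h, kernel="gaussian"):
    """Nadaraya-Watson kernel regression:
    f_hat(x) = sum_i K((x - x_i)/h) * y_i / sum_i K((x - x_i)/h).
    A boxcar kernel reduces this to a local average of the y_i
    with |x - x_i| <= h.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    u = (x_test - x_train) / h
    if kernel == "boxcar":
        w = (np.abs(u) <= 1).astype(float)   # weight 1 inside the window, 0 outside
    else:
        w = np.exp(-0.5 * u**2)              # Gaussian weights decay with distance
    return np.sum(w * y_train) / np.sum(w)   # weighted (local) average

# Invented "temperature sensor" readings along a line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([20.0, 21.0, 22.0, 30.0, 31.0])
print(nadaraya_watson(x, t, x_test=0.5, h=1.0, kernel="boxcar"))  # (20 + 21) / 2 = 20.5
```

With the boxcar kernel only the two sensors within distance h = 1 of x = 0.5 contribute, giving exactly the local average the slides describe.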

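Similarly, the histogram density estimate from earlier in the lecture takes only a few lines of NumPy. A minimal sketch, assuming equal-width bins on a known interval; the helper name `histogram_density` and the uniform toy data are my own choices.

```python
import numpy as np

def histogram_density(x_train, num_bins=10, lo=0.0, hi=1.0):
    """Histogram density estimate on [lo, hi] with equal-width bins.

    Returns a function p_hat with p_hat(x) = n_j / (n * Delta) for x in bin j,
    where Delta is the bin width (the smoothing parameter).
    """
    n = len(x_train)
    delta = (hi - lo) / num_bins
    counts, _ = np.histogram(x_train, bins=num_bins, range=(lo, hi))

    def p_hat(x):
        # Map x to its bin index, clipping to stay inside [lo, hi]
        j = np.clip(((np.asarray(x) - lo) / delta).astype(int), 0, num_bins - 1)
        return counts[j] / (n * delta)   # constant density within each bin

    return p_hat

# Uniform data on [0, 1]: the estimated density should be close to 1 everywhere
rng = np.random.default_rng(0)
data = rng.uniform(0, 1, size=10_000)
p = histogram_density(data, num_bins=20)
print(p(0.5))   # close to 1.0
```

Since each bin's height is n_j / (n Δ), the estimate integrates to 1 by construction, matching the MLE view of the histogram on the slides.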

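The kernel (Parzen-window) density estimate can likewise be sketched from its formula, here with a Gaussian kernel. The name `gaussian_kde` is an illustrative choice, and the six data points echo the Wikipedia figure the slides cite.

```python
import numpy as np

def gaussian_kde(x_train, h):
    """Parzen-window density estimate with a Gaussian kernel:
    p_hat(x) = (1/n) * sum_i (1/h) * K((x - x_i)/h), K = standard normal pdf.
    h is the bandwidth (smoothing parameter).
    """
    x_train = np.asarray(x_train, dtype=float)
    n = len(x_train)

    def p_hat(x):
        u = (np.asarray(x, dtype=float)[..., None] - x_train) / h
        bumps = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # one "bump" per data point
        return bumps.sum(axis=-1) / (n * h)               # normalized sum of bumps

    return p_hat

# Six data points, as in the Wikipedia figure mentioned on the slides
pts = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
p = gaussian_kde(pts, h=1.0)
print(p(-1.0), p(3.5))   # higher inside the left cluster than in the gap
```

Shrinking h makes the estimate wiggly (low bias, high variance); growing h oversmooths it, exactly the bandwidth tradeoff shown in the "Bart Simpson" figure.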
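Finally, a minimal sketch of the majority-vote k-NN classifier. The 2-D toy coordinates and Sports/Science/Arts labels are invented stand-ins for the document-classification example; the slides' actual case study used the 20 Newsgroups text data.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """k-NN classifier: majority vote among the k nearest training points
    under Euclidean distance. Ties follow Counter's insertion ordering."""
    d = np.linalg.norm(np.asarray(X_train) - np.asarray(x_test), axis=1)
    nearest = np.argsort(d)[:k]                 # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]           # majority label

# Toy 2-D points in three well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0],      # "Sports"
              [5, 5], [5, 6], [6, 5],      # "Science"
              [0, 6], [1, 6], [0, 5]])     # "Arts"
y = ["Sports"] * 3 + ["Science"] * 3 + ["Arts"] * 3
print(knn_predict(X, y, [0.5, 0.5], k=3))  # "Sports"
```

Using an odd k, as the slides recommend, avoids ties in the binary case; larger k smooths the decision boundary at the cost of local accuracy.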