Machine Learning - 10601
Model Selection and Naïve Bayes
Geoff Gordon, Miroslav Dudík (partly based on slides of Tom Mitchell)
http://www.cs.cmu.edu/~ggordon/10601/
September 23, 2009

Announcements
"You're getting Ph.D.'s for a dollar an hour," Reed Hastings, chief of Netflix, said of the people competing for the prize.
September 21, 2009: Netflix awards the $1 Million prize to a team of statisticians, machine-learning experts, and computer engineers.

How to win $1 Million
Goal: (user, movie) -> rating
Data: 100M (user, movie, date, rating) tuples
Performance measure: root mean squared error on a withheld test set
A part of the winning model is the "baseline model", which captures the bulk of the information. [Koren 2009]
The Netflix data were split into a training set, a quiz set, and a test set.

FAQ: why the quiz/test split?
"We wanted a way of informing you ... about your progress ... while making it difficult for you to simply train and optimize against 'the answer oracle'."
Two goals for withholding data:
• model selection
• model assessment
In general: split the data into a training set, a validation set, and a test set.
What if data is scarce?

Cross-validation
• split the data randomly into K equal parts
• for each model setting: evaluate average performance across the K train-test splits
• train the best model on the full data set
(Figure: with K = 3, each round holds out one of Part 1, Part 2, Part 3 to evaluate error and trains on the other two.)
Which model is best depends on the size of the data set, e.g. how many terms to keep in
y ≈ w0 + w1 x + w2 x^2 + w3 x^3 + w4 x^4 + ... + w10 x^10
K-fold cross-validation trains on (K-1)/K of the training data.

Controlling model complexity
• limit the number of features
• add a "complexity penalty"

Regularized estimation
min_w  error_train(w) + regularization(w)
min_w  -log p(data | w) - log p(w)

Examples of regularization
L2: min_w  error_train(w) + λ Σ_j w_j^2
L1: min_w  error_train(w) + λ Σ_j |w_j|
(Figure: how the training error, the regularization term, and their sum "training error + regularization" trade off.)

L1 vs L2
L1:
• sparse solutions
• more suitable when the number of features is much larger than the size of the training set
L2:
• computationally better-behaved
How do you choose λ? (See the cross-validation sketch after these notes.)

Announcements
HW #3 is out, due October 7.

Classification
Goal: learn a map h: x -> y
Data: (x1, y1), (x2, y2), ..., (xN, yN)
Performance measure: e.g. the fraction of misclassified examples on a withheld test set

All you need to know is p(X, Y) ...
If you knew p(X, Y), how would you classify an example x? Why?

How many parameters need to be estimated?
Y binary; X described by M binary features X1, X2, ..., XM
Data: p(X, Y) is described by 2^(M+1) - 1 numbers.

Naïve Bayes assumption
• the features of X are conditionally independent given the class Y:
  p(X1, ..., XM | Y) = p(X1 | Y) p(X2 | Y) ... p(XM | Y)

Example: Live in Sq Hill?
• S = 1 iff live in Sq Hill
• G = 1 iff shop in the Sq Hill Giant Eagle
• D = 1 iff drive to CMU
• A = 1 iff owns a Mac

Naïve Bayes assumption
• usually incorrect ...
• but Naïve Bayes often performs well, even when the assumption is violated [see Domingos-Pazzani 1996]

Learning to classify text documents
• which emails are spam?
• which emails promise an attachment?
• which web pages are student home pages?
What are the features of X? Feature Xj is the jth word.
Assumption #1: Naïve Bayes
Assumption #2: "bag of words"
The "bag of words" approach represents each document only by which words it contains, ignoring word order; see the sketch after these notes.
(The worked bag-of-words examples on the remaining slides are figures not preserved in this text preview.)

What you should know about Naïve Bayes
Naïve Bayes
• the assumption
• why we use it
Text classification
• the bag of words model
Gaussian Naïve Bayes
• each feature is modeled as a Gaussian given the class
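The two sketches below are not from the lecture; they are minimal Python illustrations of the ideas above, and every function name, variable name, and data set in them is chosen here for illustration only.

First, K-fold cross-validation for model selection. The sketch assumes the model being tuned is L2-regularized (ridge) linear regression, so that picking λ by cross-validation matches the "How do you choose λ?" question above; the closed-form ridge solver and the synthetic data are assumptions, not details from the slides.

```python
# Minimal sketch: choose the L2 regularization strength lambda by K-fold CV,
# then retrain the chosen model on the full data set. Illustrative only.
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form solution of min_w ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def rmse(y_true, y_pred):
    """Root mean squared error, the Netflix-prize performance measure."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def cross_validate(X, y, lam, K=5, seed=0):
    """Average held-out RMSE over K folds for one setting of lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = ridge_fit(X[train], y[train], lam)     # train on (K-1)/K of the data
        errors.append(rmse(y[test], X[test] @ w))  # evaluate on the held-out fold
    return np.mean(errors)

# Synthetic data (an assumption made for the sketch, not the Netflix data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)

lambdas = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cross_validate(X, y, lam))
w_final = ridge_fit(X, y, best_lam)   # retrain the best model on the full data set
print("chosen lambda:", best_lam)
```

Evaluating each candidate λ by its average held-out error, keeping the best, and refitting on all the data is exactly the "train the best model on the full data set" step in the cross-validation recipe above.

Second, Naïve Bayes with a bag-of-words representation for text. The point of the assumption is the parameter count: the full joint over a binary class and M binary features needs 2^(M+1) - 1 numbers, while under the Naïve Bayes factorization you only need p(Y = 1) plus p(Xj = 1 | Y = y) for each of the M features and each of the two classes, i.e. 2M + 1 numbers. The sketch uses binary word-presence features, add-one (Laplace) smoothing, and a tiny spam/ham toy corpus, all of which are assumptions made here rather than details shown in the preview.

```python
# Minimal sketch: bag-of-words Naive Bayes text classifier. Documents are reduced
# to which vocabulary words they contain (Assumption #2), and p(X | Y) factorizes
# over words (Assumption #1). Toy data and smoothing are illustrative assumptions.
import math
from collections import defaultdict

def train_nb(docs, labels):
    """Estimate p(Y) and p(word present | Y) with add-one smoothing."""
    vocab = sorted({w for d in docs for w in d.split()})
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: defaultdict(int) for c in classes}
    class_totals = {c: 0 for c in classes}
    for d, y in zip(docs, labels):
        class_totals[y] += 1
        for w in set(d.split()):          # "bag of words": presence only, order ignored
            word_counts[y][w] += 1
    cond = {c: {w: (word_counts[c][w] + 1) / (class_totals[c] + 2)  # p(Xw = 1 | Y = c)
                for w in vocab}
            for c in classes}
    return vocab, classes, prior, cond

def predict(doc, vocab, classes, prior, cond):
    """Pick the class maximizing log p(Y = c) + sum_j log p(Xj | Y = c)."""
    present = set(doc.split())
    def log_posterior(c):
        lp = math.log(prior[c])
        for w in vocab:
            p = cond[c][w]
            lp += math.log(p if w in present else 1 - p)
        return lp
    return max(classes, key=log_posterior)

# Toy spam/ham corpus (made up for this sketch).
docs = ["win money now", "cheap money offer", "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict("cheap money", *model))   # -> "spam"
```

The prediction step picks the class maximizing log p(Y = c) + Σ_j log p(Xj | Y = c), which is the "if you knew p(X, Y), how would you classify x?" rule restricted to the factored Naïve Bayes form of the joint distribution.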

