Machine Learning - 10-601
Model Selection and Naïve Bayes
Geoff Gordon, Miroslav Dudík
(partly based on slides of Tom Mitchell)
http://www.cs.cmu.edu/~ggordon/10601/
September 23, 2009

Announcements
September 21, 2009: Netflix awards a $1 million prize to a team of
statisticians, machine-learning experts, and computer engineers.
"You're getting Ph.D.'s for a dollar an hour," Reed Hastings, chief of
Netflix, said of the people competing for the prize.

How to win $1 million
Goal: (user, movie) -> rating
Data: 100M (user, movie, date, rating) tuples
Performance measure: root mean squared error on a withheld test set

How to win $1 million
Part of the winning model is the "baseline model", which captures the
bulk of the information. [Koren 2009]

How to win $1 million
The data are split three ways: training set | quiz set | test set.
FAQ: why the quiz/test split?
"We wanted a way of informing you ... about your progress ... while
making it difficult for you to simply train and optimize against 'the
answer oracle'."

FAQ: why the quiz/test split?
Two goals for withholding data:
• model selection
• model assessment

What if data is scarce?
training set | validation set | test set

Cross-validation
• split the data randomly into K equal parts
• for each model setting: evaluate average performance across the
  K train-test splits
• train the best model on the full data set
For K = 3:
  split 1: [ Part 1 | Part 2 | evaluate error ]
  split 2: [ Part 1 | evaluate error | Part 3 ]
  split 3: [ evaluate error | Part 2 | Part 3 ]

The best model...
y ≈ w0 + w1x + w2x^2 + w3x^3 + w4x^4 + ... + w10x^10
...depends on the size of the data set.
K-fold cross-validation trains on (K-1)/K of the training data.

Controlling model complexity
• limit the number of features
• add a "complexity penalty"

Regularized estimation
min_w  error_train(w) + regularization(w)
min_w  -log p(data|w) - log p(w)

Examples of regularization (training error + regularization)
L2:  min_w  error_train(w) + λ ||w||_2^2
L1:  min_w  error_train(w) + λ ||w||_1

L1 vs L2
L1:
• sparse solutions
• more suitable when the number of features is much larger than
  the size of the training set
L2:
• computationally better-behaved
How do you choose λ?

Announcements
HW #3 out, due October 7

Classification
Goal: learn a map h: X -> Y
Data: (x1, y1), (x2, y2), ..., (xN, yN)
Performance measure: probability of misclassification

All you need to know is p(X,Y)...
If you knew p(X,Y), how would you classify an example x?
Why?

How many parameters need to be estimated?
Y binary; X described by M binary features X1, X2, ..., XM.
Data: p(X,Y) described by 2·2^M - 1 numbers.

Naïve Bayes assumption
• features of X are conditionally independent given the class Y

Example: Live in Sq Hill?
• S = 1 iff live in Sq Hill
• G = 1 iff shop in Sq Hill Giant Eagle
• D = 1 iff drive to CMU
• A = 1 iff owns a Mac

Naïve Bayes assumption
• usually incorrect...
• yet Naïve Bayes often performs well, even when the assumption is
  violated [see Domingos-Pazzani 1996]

Learning to classify text documents
• which emails are spam?
• which emails promise an attachment?
• which web pages are student home pages?
What are the features of X?
Feature Xj is the jth word.
Assumption #1: Naïve Bayes
Assumption #2: "bag of words" (word order does not matter)

What you should know about Naïve Bayes
Naïve Bayes:
• the assumption
• why we use it
Text classification:
• the bag-of-words model
Gaussian Naïve Bayes:
• each feature is a Gaussian given the class
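The K-fold cross-validation loop described on the slides can be sketched as below. The helper name `train_and_eval` is a hypothetical placeholder for whatever routine trains a model on one subset and returns its error on another; it is not part of the slides.

```python
import random

def k_fold_cv(data, k, train_and_eval):
    """Split `data` randomly into k roughly equal parts; for each part,
    train on the other k-1 parts and evaluate on the held-out part.
    Returns the average held-out error across the k splits."""
    data = list(data)
    random.shuffle(data)                    # random split, as on the slide
    folds = [data[i::k] for i in range(k)]  # k roughly equal parts
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_eval(train, held_out))
    return sum(errors) / k
```

To choose among model settings (e.g. polynomial degree or λ), one would run `k_fold_cv` once per setting, keep the setting with the lowest average error, and then retrain that model on the full data set, as the slide prescribes.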
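The L2-penalized objective from the regularization slides can be illustrated in the simplest possible setting: least squares with a single feature, where minimizing sum_i (y_i - w·x_i)^2 + λw^2 has a one-line closed form. This is an illustrative sketch, not code from the course.

```python
def ridge_1d(xs, ys, lam):
    """One-feature least squares with an L2 penalty:
    minimize sum_i (y_i - w * x_i)^2 + lam * w^2.
    Setting the derivative w.r.t. w to zero gives
    w = (sum x_i y_i) / (sum x_i^2 + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)
```

Note the behavior the slides describe: with λ = 0 we recover ordinary least squares, and as λ grows the weight shrinks toward zero. The value of λ itself is exactly the kind of setting one picks by cross-validation.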
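The parameter-counting argument for binary Y and M binary features can be made concrete with two small helper functions (a sketch written for this note, not from the slides): the full joint p(X,Y) has 2·2^M outcomes and one sum-to-one constraint, while under the Naïve Bayes assumption each feature only needs one number per class.

```python
def n_params_full(m):
    """Full joint p(X1..Xm, Y) for binary Y and m binary features:
    2 * 2^m outcomes, minus 1 because the probabilities sum to 1."""
    return 2 * 2**m - 1

def n_params_naive_bayes(m):
    """Naive Bayes: p(Y=1) needs 1 number; each p(Xj=1 | Y=y) needs
    1 number per class value, i.e. 2 numbers per feature."""
    return 1 + 2 * m
```

For M = 30 features the full joint needs over two billion parameters, while Naïve Bayes needs 61, which is why the factored model is estimable from realistic amounts of data.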
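The bag-of-words Naïve Bayes text classifier from the later slides can be sketched end to end. Add-one (Laplace) smoothing is included so unseen words do not produce zero probabilities; the smoothing choice and function names are this sketch's, not the slides'.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate p(Y) and per-class word counts from (document, label)
    pairs, where each document is a list of words (bag of words:
    position is ignored, only counts matter)."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab = {w for doc in docs for w in doc}
    return prior, counts, vocab

def predict_nb(model, doc):
    """Return argmax_y  log p(y) + sum_j log p(x_j | y),
    the Naive Bayes decision rule, with add-one smoothing."""
    prior, counts, vocab = model
    def score(c):
        total = sum(counts[c].values())
        s = math.log(prior[c])
        for w in doc:
            s += math.log((counts[c][w] + 1) / (total + len(vocab)))
        return s
    return max(prior, key=score)
```

Working in log probabilities avoids numeric underflow when documents are long, and the per-class sums make the conditional-independence assumption explicit: each word contributes one term, independently of the others, given the class.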