http://www.cs.cmu.edu/~guestrin/Class/10701/

What's learning? Point Estimation
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
September 10th, 2007
© 2005-2007 Carlos Guestrin

What is Machine Learning?

Machine Learning
  Study of algorithms that improve their performance at some task with experience

Object detection (Prof. H. Schneiderman)
  Example training images for each orientation

Text classification
  Company home page vs. Personal home page vs. University home page vs. ...

Reading a noun (vs. verb) [Rustandi et al., 2005]

Modeling sensor data
  [Figure: floor plan with numbered sensor locations; rooms labeled OFFICE, QUIET, PHONE, CONFERENCE, STORAGE, LAB, ELEC, COPY, SERVER, KITCHEN]
  Measure temperatures at some locations
  Predict temperatures throughout the environment
  [Guestrin et al. '04]

Learning to act
  Reinforcement learning
  An agent
    Makes sensor observations
    Must select action
    Receives rewards: positive for "good" states, negative for "bad" states
  [Ng et al. '05]

Growth of Machine Learning
  Machine learning is the preferred approach to
    Speech recognition, natural language processing
    Computer vision
    Medical outcomes analysis
    Robot control
    Computational biology
    Sensor networks
  This trend is accelerating
    Improved machine learning algorithms
    Improved data capture, networking, faster computers
    Software too complex to write by hand
    New sensors / IO devices
    Demand for self-customization to user, environment

Syllabus
  Covers a wide range of Machine Learning techniques, from basic to state-of-the-art
  You will learn about the methods you heard about:
    Naïve Bayes, logistic regression, nearest-neighbor, decision trees, boosting, neural nets, overfitting, regularization, dimensionality reduction, PCA, error bounds, VC dimension, SVMs, kernels, margin bounds, K-means, EM, mixture models, semi-supervised learning, HMMs, graphical models, active learning, reinforcement learning
  Covers algorithms, theory and applications
  It's going to be fun and hard work

Prerequisites
  Probabilities
    Distributions, densities, marginalization
  Basic statistics
    Moments, typical distributions, regression
  Algorithms
    Dynamic programming, basic data structures, complexity
  Programming
    Mostly your choice of language, but Matlab will be very useful
  We provide some background, but the class will be fast paced
  Ability to deal with "abstract mathematical concepts"

Recitations
  Very useful!
    Review material
    Present background
    Answer questions
  Thursdays, 5:00-6:20 in Wean Hall 5409
  Special recitation 1: tomorrow, Wean 5409, 5:00-6:20
    Review of probabilities
  Special recitation 2 on Matlab: Tuesday, Sept. 18th, 4:30-5:50pm, NSH 3002

Staff
  Four Great TAs: great resource for learning, interact with them!
    Joseph Gonzalez, Wean 5117, x8-3046, jegonzal@cs, Office hours: Tuesdays 7-9pm
    Steve Hanneke, Doherty 4301H, x8-7375, shanneke@cs, Office hours: Fridays 1-3pm
    Jingrui He, Wean 8102, x8-1299, jingruih@cs, Office hours: Wednesdays 11-1pm
    Sue Ann Hong, Wean 4112, x8-3047, sahong@cs, Office hours: Tuesdays 3-5pm
  Administrative Assistant
    Monica Hopes, x8-5527, meh@cs

First Point of Contact for HWs
  To facilitate interaction, a TA will be assigned to each homework question. This will be your "first point of contact" for this question
    But, you can always ask any of us
  For e-mailing instructors, always use: 10701-instructors@cs.cmu.edu
  For announcements, subscribe to: 10701-announce@cs
    https://mailman.srv.cs.cmu.edu/mailman/listinfo/10701-announce

Text Books
  Required Textbook:
    Pattern Recognition and Machine Learning; Chris Bishop
  Optional Books:
    Machine Learning; Tom Mitchell
    The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Trevor Hastie, Robert Tibshirani, Jerome Friedman
    Information Theory, Inference, and Learning Algorithms; David MacKay

Grading
  5 homeworks (35%)
    First one goes out 9/12
    Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early
  Final project (25%)
    Details out around Oct. 1st
    Projects done individually, or groups of two students
  Midterm (15%)
    Thu., Oct. 25, 5-6:30pm, location: MM A14
  Final (25%)
    TBD by registrar

Homeworks
  Homeworks are hard, start early
  Due in the beginning of class
  3 late days for the semester
  After late days are used up:
    Half credit within 48 hours
    Zero credit after 48 hours
  All homeworks must be handed in, even for zero credit
  Late homeworks handed in to Monica Hopes, WEH 4619
  Collaboration
    You may discuss the questions
    Each student writes their own answers
    Write on your homework anyone with whom you collaborate
    Each student must write their own code for the programming part
    Please don't search for answers on the web, Google, previous years' homeworks, etc.
      Please ask us if you are not sure if you can use a particular reference

Sitting in & Auditing the Class
  Due to new departmental rules, every student who wants to sit in the class (not take it for credit) must register officially for auditing
  To satisfy the auditing requirement, you must either:
    Do two homeworks, and get at least 75% of the points in each; or
    Take the final, and get at least 50% of the points; or
    Do a class project and do one homework, and get at least 75% of the points in the homework
      Only need to submit project proposal and present poster, and get at least 80% of the points in the poster
  Please send us an email saying that you will be auditing the class and what you plan to do
  If you are not a student and want to sit in the class, please get authorization from the instructor

Enjoy!
  ML is becoming ubiquitous in science, engineering and beyond
  This class should give you the basic foundation for applying ML and developing new methods
  The fun begins...

Your first consulting job
  A billionaire from the suburbs of Seattle asks you a question:
    He says: I have a thumbtack; if I flip it, what's the probability it will fall with the nail up?
    You say: Please flip it a few times:
    You say: The probability is:
    He says: Why?
    You say: Because...

Thumbtack: Binomial Distribution
  P(Heads) = θ, P(Tails) = 1 − θ
  Flips are i.i.d.
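
The thumbtack model can be made concrete in a few lines. What follows is a minimal Python sketch, not taken from the slides: it assumes the unknown P(Heads) is called theta, that flips are recorded as 'H' and 'T', and that the five flips shown are hypothetical data. Because flips are i.i.d., the probability of a whole sequence is the product of the per-flip probabilities, so its logarithm is a sum. The grid search at the end is only an illustration of picking the theta that makes the observed flips most probable (the standard maximum-likelihood idea).

```python
import math

def log_likelihood(theta, flips):
    """Log-probability of an i.i.d. sequence of flips given P(Heads) = theta.

    Each flip is 'H' or 'T'. Independence turns the product of per-flip
    probabilities into a sum of logs: heads*log(theta) + tails*log(1 - theta).
    """
    heads = flips.count("H")
    tails = flips.count("T")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def grid_argmax(flips, steps=999):
    """Return the theta on a uniform grid inside (0, 1) with the highest log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, flips))

# Hypothetical data (an assumption for illustration): 3 heads, 2 tails.
flips = list("HHTHT")
theta_hat = grid_argmax(flips)
print(theta_hat)
```

On this made-up data the grid search lands on the fraction of heads, 3 out of 5, which is the kind of answer the consulting dialogue is leading toward.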