ICS 235: Machine Learning Methods
Department of Information and Computer Sciences
University of Hawaiʻi at Mānoa
Kyungim Baek

Reminder
- Homework assignment 1 is due 11:55 PM, Wednesday, September 18.
- Upload your notebook file (.ipynb) to your Drop Box folder in Laulima. Do not upload the data file. Do not make any subfolders in your Drop Box.
- K-NN classifier implementation:
  - Part 2, question 3: test your own implementation from question 1 of Part 2.
  - Part 3: use the implementation provided in scikit-learn.

Lecture 5: review questions from last class
- Model evaluation and selection
- Train-test split
- Underfitting and overfitting
- Bias and variance

Confusion matrix (review)

                    Prediction
                    Pos     Neg
  True state  Pos   TP      FN
              Neg   FP      TN

Questions (true or false?)
1. In medical diagnosis, false positives are more damaging than false negatives (assume "positive" means the person has a disease and "negative" means they don't).
2. In spam email classification, false positives are more damaging than false negatives (assume "positive" means the email is spam and "negative" means it's not).
3. If method A gets a higher accuracy than method B, that means its recall is also higher.

Previously: K-NN classification
- K-Nearest Neighbors (K-NN) predicts new data points based on the K most similar records in a dataset.
- What class does a new point belong to? Look at the K closest data points. For example, let K = 3:
  - Calculate the distances from the new point to all data points.
  - Find the K nearest neighbors.
  - Predict the majority class. (A minimal from-scratch sketch of this procedure appears after the Python slides below.)
- What factors can affect the performance of K-NN classification?
[Figure: delivery example with "Bad Weather" on the x-axis and "Miles from Restaurant" on the y-axis; points labeled "Late" and "On time". Example from Amazon Machine Learning University, https://github.com/aws-samples/aws-machine-learning-university-accelerated-tab/tree/master/slides]

Influence of the number of neighbors
- How to choose K?
[Figure: decision boundaries for K = 1, K = 5, K = 10, and K = 30. Figure credit: A. C. Müller]

Model evaluation: splitting data
- The test set is not available to the model for learning. It is only used to check that the model generalizes well to new, unseen data.
- Original dataset -> Training set (model building) + Test set (model evaluation).

Train-test split in Python
- Load data.
- Split data.
(See the sketches below for these steps.)

K-NN in Python

Classification metrics in Python

Varying the number of neighbors

[Figure slide; slide credit: P. Sadowski]
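The "Train-test split in Python", "K-NN in Python", and "Classification metrics in Python" slides show these steps as code that is not reproduced in this text. What follows is only a minimal sketch of such code with pandas and scikit-learn; the file name data.csv, the label column name label, and the parameter values are hypothetical placeholders, not the exact code or dataset from the slides.

    # Load data, split it, fit a K-NN classifier, and compute classification metrics.
    # "data.csv" and the "label" column are placeholder names.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    # Load data
    data = pd.read_csv("data.csv")
    X = data.drop(columns=["label"])   # feature columns
    y = data["label"]                  # class labels

    # Split data: hold out 25% of the records as a test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)

    # Fit a K-NN classifier on the training set only
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)

    # Evaluate on the held-out test set
    y_pred = knn.predict(X_test)
    print("Test accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))        # rows: true class, columns: predicted class
    print(classification_report(y_test, y_pred))   # precision, recall, F1 per class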
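The "Influence of the number of neighbors" and "Varying the number of neighbors" slides ask how to choose K. A small sketch of that experiment, reusing the hypothetical X_train, X_test, y_train, y_test from the sketch above: fit a classifier for several values of K and compare accuracy on the training and test sets.

    # Compare training and test accuracy for several values of K.
    # Reuses the hypothetical split from the previous sketch.
    from sklearn.neighbors import KNeighborsClassifier

    for k in [1, 5, 10, 30]:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        train_acc = knn.score(X_train, y_train)   # accuracy on data the model has seen
        test_acc = knn.score(X_test, y_test)      # accuracy on held-out data
        print(f"K={k:>2}  train accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")
    # Very small K tends to overfit (jagged decision boundaries);
    # very large K tends to underfit (overly smooth boundaries).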
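For the homework's "your own implementation" part, the K-NN review above spells out the procedure: compute distances, take the K nearest neighbors, and predict the majority class. Below is only a minimal sketch of that idea using Euclidean distance and a majority vote; the function name knn_predict, the NumPy-array interface, and the tiny example points are illustrative choices, not the assignment's required API or data.

    # Sketch of the K-NN procedure: distances -> K nearest neighbors -> majority vote.
    # Function name, interface, and example data are illustrative only.
    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        # Euclidean distances from the new point to every training point
        dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        # Indices of the K closest training points
        nearest = np.argsort(dists)[:k]
        # Majority class among the K neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]

    # Tiny usage example with made-up points ("Late" vs. "On time")
    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
    y = np.array(["On time", "On time", "Late", "Late"])
    print(knn_predict(X, y, np.array([1.2, 1.9]), k=3))   # -> "On time"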
Model selection: underfitting
- Underfitting: the model is not good enough to describe the relationship between the input data (x1, x2) and the output y.
- The model is too simple to capture important patterns in the training data.
- The model will perform poorly on the training data and on the validation and/or test data.
[Figure: x1 vs. x2 scatter plots with Class 1 and Class 2 points. Slide credit: Amazon Machine Learning University, https://github.com/aws-samples/aws-machine-learning-university-accelerated-tab/tree/master/slides]

Model selection: overfitting
- Overfitting: the model memorizes or imitates the training data and fails to generalize to new, unseen (test) data.
- The model is too complex; it picks up the noise instead of the underlying relationship.
- The model will perform well on the training data but poorly on the validation and/or test data.
[Figure: x1 vs. x2 scatter plots with Class 1 and Class 2 points. Slide credit: Amazon Machine Learning University]

Model selection: good fit
- Appropriate fitting: the model captures the general relationship between the input data (x1, x2) and the output y.
- The model is not too simple and not too complex; it picks up the underlying relationship rather than the noise in the training data.
- The model will perform well enough on the training data and on the validation and/or test data.
[Figure: x1 vs. x2 scatter plots with Class 1 and Class 2 points. Slide credit: Amazon Machine Learning University]

Underfitting and overfitting
- Example of a regression problem: fitting a polynomial model of degree 1, 2, 3, 4, and 12 to the same data. (A sketch of this kind of experiment appears at the end of these notes.)
[Figure from Oh, 2017]

Underfitting and overfitting
[Figures: accuracy vs. model complexity, with the "sweet spot" marked between the underfitting region and the overfitting region. Slide credit: A. C. Müller]

Bias and variance
- In ML, the model will vary depending on the random training data.
- E.g., polynomial models trained on three independent training sets (train set 1, train set 2, train set 3): a 2nd-degree polynomial vs. a 12th-degree polynomial. (A sketch of this comparison also appears at the end of these notes.)
[Figure from Oh, 2017]

Bias-variance tradeoff
- If your model is very simple, then you won't really learn much from the training set, and your model won't be very good in general. Underfitting: high bias, low variance.
- If your model is very complex, then you will learn unreliable patterns that get every single training example correct, but there will be a huge gap between training error and validation or test error. Overfitting: low bias, high variance.
- You want to find the right balance between fitting the training data perfectly and keeping the model simple enough to ensure that it will generalize well. Unfortunately, it is typically impossible to do both simultaneously.
- Minimize the variance while sacrificing bias minimally.

Bias-variance tradeoff
- Ideal: adding a little bias might reduce the variance a lot.
[Adapted slide; credit: P. Sadowski]

Questions (true or false?)
1. If our training error is extremely low, that means we're underfitting.
2. If we had an infinite amount of training data, overfitting would not be a problem.
3. The fundamental tradeoff of supervised learning states that as training error goes down, the gap between training error and test error tends to go up.

Next class: model evaluation and selection
- Train/validation/test split
- Cross-validation
- Hyperparameter optimization
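The regression example above ("Underfitting and overfitting") fits polynomials of degree 1, 2, 3, 4, and 12 to the same data. Below is a minimal sketch of that kind of experiment; the quadratic data-generating function, the noise level, and the train/test sizes are invented for illustration and are not the data behind the Oh (2017) figure.

    # Fit polynomials of increasing degree to one noisy dataset and
    # compare training vs. test error. The quadratic ground truth and
    # noise level are invented for illustration.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=30).reshape(-1, 1)
    y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=1.0, size=30)   # noisy quadratic

    x_train, y_train = x[:20], y[:20]
    x_test, y_test = x[20:], y[20:]

    for degree in [1, 2, 3, 4, 12]:
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(x_train))
        test_mse = mean_squared_error(y_test, model.predict(x_test))
        print(f"degree {degree:>2}: train MSE={train_mse:.2f}  test MSE={test_mse:.2f}")
    # Degree 1 tends to underfit (high error on both sets); a very high degree
    # tends to overfit (low training error, noticeably higher test error).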
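The "Bias and variance" slide compares a 2nd-degree and a 12th-degree polynomial trained on three independent training sets. A sketch of that comparison, under the same invented data-generating assumptions as the previous sketch: refit each model on fresh samples and measure how much its predictions move between fits.

    # How much do fitted models vary across independent training sets?
    # A low-degree model (higher bias, lower variance) changes little between samples;
    # a high-degree model (lower bias, higher variance) changes a lot.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)   # points at which predictions are compared

    def sample_training_set(n=20):
        # Same invented noisy quadratic as in the previous sketch
        x = rng.uniform(-3, 3, size=n).reshape(-1, 1)
        y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=1.0, size=n)
        return x, y

    for degree in [2, 12]:
        predictions = []
        for _ in range(3):                           # three independent training sets
            x_train, y_train = sample_training_set()
            model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
            model.fit(x_train, y_train)
            predictions.append(model.predict(x_grid))
        spread = np.std(np.stack(predictions), axis=0).mean()   # average disagreement between fits
        print(f"degree {degree:>2}: mean prediction spread across training sets = {spread:.2f}")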

