Machine Learning
CS6375 --- Spring 2015
Cross-Validation
Instructor: Yang Liu

Avoiding Overfitting
• We have a choice of different techniques: decision trees, nearest neighbors, Bayes classifier, neural networks, …
• For each we have different levels of complexity:
– Depth of trees
– Number of neighbors in K-NN
– Number of layers and hidden units
– …
• How to choose the right one?
• Overfitting: A complex enough model (e.g., large enough trees) will always be able to fit the training data well

Example
• Construct a predictor of y from x given this training data

Which model is best for predicting y from x?
We want the model that generates the best predictions on future data, not necessarily the one with the lowest error on the training data.

Using a Test Set
1. Use a portion (e.g., 30%) of the data as test data
2. Fit a model to the remaining training data
3. Evaluate the error on the test data

Using a Test Set:
+ Simple
- Wastes a large % of the data
- May get lucky with one particular subset of the data

"Leave One Out" Cross-Validation
• For k = 1 to R (R is the number of data points)
– Train on all the data leaving out (x_k, y_k)
– Evaluate the error on (x_k, y_k)
• Report the average error after trying all the data points

"Leave One Out" Cross-Validation:
+ Does not waste data
+ Averages over a large number of trials
- Expensive

K-Fold Cross-Validation
• Randomly divide the data set into K subsets
• For each subset S:
– Train on the data not in S
– Test on the data in S
• Return the average error over the K subsets
Example: K = 3, each color corresponds to a subset

Classification Problems
• The exact same approaches apply for cross-validation, except that the error is the number of data points that are misclassified.

Example: CV for KNN
• For each candidate number of neighbors k in kNN, evaluate the error using K-fold cross-validation (the number of folds K is separate from the number of neighbors k)
• Choose the k with the minimum cross-validation error

Cross-Validation
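
The kNN model-selection procedure from the slides can be sketched in a few lines of plain Python. This is a minimal illustration and not the course's reference code: the function names (k_fold_indices, knn_predict, cv_error, choose_k), the 1-D inputs, the toy data, and the fixed random seed are all assumptions made for the example. The error is reported as the misclassification rate per fold (misclassified count divided by fold size) so that folds of unequal size are comparable. Setting num_folds equal to the number of data points gives "Leave One Out" cross-validation.

```python
import random

def k_fold_indices(n, num_folds, seed=0):
    """Randomly divide the indices 0..n-1 into num_folds disjoint subsets."""
    if not 2 <= num_folds <= n:
        raise ValueError("num_folds must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Every num_folds-th shuffled index goes to the same subset.
    return [idx[i::num_folds] for i in range(num_folds)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    # Ties in the vote are broken in favor of the smallest label.
    return max(sorted(set(labels)), key=labels.count)

def cv_error(data, k, num_folds, seed=0):
    """Average misclassification rate of k-NN over the folds."""
    fold_errors = []
    for fold in k_fold_indices(len(data), num_folds, seed):
        held_out = set(fold)
        # Train on the data not in the subset, test on the data in it.
        train = [data[i] for i in range(len(data)) if i not in held_out]
        wrong = sum(knn_predict(train, data[i][0], k) != data[i][1]
                    for i in fold)
        fold_errors.append(wrong / len(fold))
    return sum(fold_errors) / len(fold_errors)

def choose_k(data, candidate_ks, num_folds, seed=0):
    """Return the candidate k with the minimum cross-validation error."""
    errors = {k: cv_error(data, k, num_folds, seed) for k in candidate_ks}
    best = min(candidate_ks, key=lambda k: errors[k])
    return best, errors

# Toy data (hypothetical): two well-separated classes on a line.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(20, 30)]

best_k, errors = choose_k(data, candidate_ks=[1, 3, 5], num_folds=3)
print("CV error per k:", errors)
print("chosen k:", best_k)

# Leave-one-out is the special case with one fold per data point.
print("LOOCV error for k=1:", cv_error(data, k=1, num_folds=len(data)))
```

The same seed is used for every candidate k, so all candidates are scored on identical folds and the differences in error come from k alone rather than from how the data happened to be split.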