ROC Analysis for Evaluation of Machine Learning Algorithms
Larry Holder
Computer Science and Engineering
University of Texas at Arlington

References
- Provost et al., "The Case Against Accuracy Estimation for Comparing Induction Algorithms," International Conference on Machine Learning, 1998.
- Rob Holte's talk on ROC analysis at www.cs.ualberta.ca/~holte/Learning/ROCtalk/

Motivation
- Most comparisons of machine learning algorithms use classification accuracy
- Problems with this approach
  - Different costs may be associated with false positive and false negative errors
  - Training data may not reflect the true class distribution

Motivation (continued)
- Perhaps maximizing accuracy is still okay
  - Alter the class distribution to reflect FP/FN costs
- Problems
  - Only works in the 2-class case
  - Assigning true costs is difficult
  - The true class distribution is unknown
- So, we must show classifier L1 is better than L2 under more general conditions

ROC Analysis
- Receiver Operating Characteristic (ROC)
- Originated in signal detection theory
- Common in medical diagnosis
- Becoming common in ML evaluations
- ROC curves assess predictive behavior independent of error costs and class distributions

Confusion Matrix

                     Classified As
  True Class     Positive    Negative
  Positive         #TP         #FN
  Negative         #FP         #TN

- True positive rate TP = #TP / #P, where #P = #TP + #FN
- False positive rate FP = #FP / #N, where #N = #FP + #TN
- These rates are independent of the class distribution

ROC Curves
- ROC space
  - False positive (FP) rate on the X axis
  - True positive (TP) rate on the Y axis
- Each classifier is represented by a point in ROC space corresponding to its (FP, TP) pair
- For continuous-output models, a family of classifiers is defined by varying a threshold on the output

Example ROC Curve
[Figure: ROC curves for learners L1, L2, and L3, plus the random-guessing diagonal; false positive rate on the X axis and true positive rate on the Y axis, both from 0 to 1.0]

Domination in ROC Space
- Learner L1 dominates L2 if L2's ROC curve lies beneath L1's curve
- If L1 dominates L2, then L1 is better than L2 for all possible costs and class distributions
- If neither dominates (as with L2 and L3), then there are conditions under which L2 maximizes accuracy but does not minimize cost

Expected ROC Curve
- Perform k-fold cross-validation with each learner
- Treat the ROC curve from each fold i as a function R_i such that TP = R_i(FP)
- Define R(FP) = mean_i(R_i(FP))
- Generate the expected ROC curve by evenly sampling R along the FP axis
- Compute confidence intervals for the resulting TP values according to a binomial distribution

Accuracy vs. ROC Curves
- Hypothesis: standard learning algorithms produce dominating ROC models
- Answer: No
  - Results on 10 datasets from the UCI repository show only one instance of a dominating model
- Thus, learners that maximize accuracy typically do not dominate in ROC space
  - They are worse than other learners for some costs and class distributions
- Non-dominating ROC curves can still reveal regions of superiority for different learners

Summary
- Results comparing learning algorithms by accuracy alone are questionable
  - Especially in scenarios with non-uniform costs and class distributions
- ROC curves give a better picture of where different learners minimize cost
- Recommendation: use proper ROC analysis when comparing learning algorithms
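As a concrete illustration of the confusion-matrix rates above, here is a minimal Python sketch. The function name `rates` and its argument order are hypothetical, not from the slides:

```python
def rates(tp, fn, fp, tn):
    """Compute (TP rate, FP rate) from confusion-matrix counts.

    TP rate = #TP / #P with #P = #TP + #FN, and
    FP rate = #FP / #N with #N = #FP + #TN,
    so both rates are independent of the class ratio #P : #N.
    """
    p = tp + fn  # total true positives in the data
    n = fp + tn  # total true negatives in the data
    return tp / p, fp / n
```

For example, `rates(40, 10, 5, 45)` yields a TP rate of 0.8 and an FP rate of 0.1; doubling every negative-class count would leave both rates unchanged, which is exactly the distribution-independence the slides emphasize.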
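The point that a continuous-output model yields a whole ROC curve by varying a threshold can be sketched as follows. This is one common construction (sort by descending score and lower the threshold one example at a time); the function name `roc_points` and the toy inputs are illustrative assumptions:

```python
def roc_points(scores, labels):
    """Trace a continuous-output model's ROC curve.

    Each threshold choice turns the scorer into a discrete classifier
    with its own (FP rate, TP rate) point; sweeping the threshold from
    high to low traces the full curve, starting at (0, 0).
    """
    p = sum(labels)            # number of positive examples
    n = len(labels) - p        # number of negative examples
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Lowering the threshold admits one more example as "positive".
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n, tp / p))
    return points
```

Running it on four scored examples, e.g. `roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])`, produces the staircase from (0, 0) up to (1, 1) that a plotted ROC curve interpolates.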
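The expected-ROC-curve procedure (treat each fold's curve as R_i with TP = R_i(FP), average to R(FP) = mean_i(R_i(FP)), and sample R evenly along the FP axis) can be sketched as below. The slides do not specify how R_i is evaluated between its measured points; this sketch assumes a step function (the largest TP at any FP' <= FP), and the name `averaged_roc` is hypothetical:

```python
def averaged_roc(fold_curves, num_samples=11):
    """Vertically average per-fold ROC curves.

    fold_curves: one list of (FP rate, TP rate) points per fold,
    each including the point (0.0, 0.0).
    Returns (FP, mean TP) pairs sampled evenly along the FP axis.
    """
    def interp(curve, fp):
        # Step-function R_i: highest TP reached at or before this FP.
        return max(tp for f, tp in curve if f <= fp)

    fps = [k / (num_samples - 1) for k in range(num_samples)]
    return [
        (fp, sum(interp(c, fp) for c in fold_curves) / len(fold_curves))
        for fp in fps
    ]
```

The per-fold spread of the sampled TP values at each FP is what the slides' binomial confidence intervals would be computed from; that step is omitted here.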