WSU CSE 6363 - Machine Learning Algorithms


ROC Analysis for Evaluation of Machine Learning Algorithms
Larry Holder, Computer Science and Engineering, University of Texas at Arlington

References
- Provost et al., "The Case Against Accuracy Estimation for Comparing Induction Algorithms," International Conference on Machine Learning, 1998.
- Rob Holte's talk on ROC analysis: www.cs.ualberta.ca/~holte/Learning/ROCtalk/

Motivation
- Most comparisons of machine learning algorithms use classification accuracy.
- Problems with this approach:
  - There may be different costs associated with false positive and false negative errors.
  - The training data may not reflect the true class distribution.

Motivation (continued)
- Perhaps maximizing accuracy is still okay: alter the class distribution to reduce the FP/FN costs.
- Problems:
  - This only works in the two-class case.
  - Assigning true costs is difficult.
  - The true class distribution is unknown.
- So we must show that classifier L1 is better than L2 under more general conditions.

ROC Analysis
- Receiver Operating Characteristic (ROC) analysis originated in signal detection theory.
- It is common in medical diagnosis and is becoming common in ML evaluations.
- ROC curves assess predictive behavior independent of error costs or class distributions.

Confusion Matrix

                   Classified As
  True Class     Positive   Negative
  Positive        #TP        #FN
  Negative        #FP        #TN

- True positive rate: TP = #TP / #P
- False positive rate: FP = #FP / #N
- These rates are independent of the class distribution. (A short code sketch appears after the Summary.)

ROC Curves
- ROC space: false positive (FP) rate on the x axis, true positive (TP) rate on the y axis.
- Each classifier is represented by a point in ROC space corresponding to its (FP, TP) pair.
- For continuous-output models, a family of classifiers is defined by varying a threshold on the output (see the first sketch after the Summary).

Example ROC Curve
[Figure: example ROC curves for learners L1, L2, and L3 plus the random-guessing diagonal; false positive rate (0 to 1.0) on the x axis, true positive rate (0 to 1.0) on the y axis.]

Domination in ROC Space
- Learner L1 dominates L2 if L2's ROC curve lies beneath L1's curve.
- If L1 dominates L2, then L1 is better than L2 for all possible costs and class distributions.
- If neither dominates (e.g., L2 and L3), then there are conditions under which L2 maximizes accuracy but does not minimize cost. (A domination-check sketch appears after the Summary.)

Expected ROC Curve
- Perform k-fold cross-validation on each learner.
- Treat the ROC curve from fold i as a function R_i such that TP = R_i(FP).
- Define the averaged curve R(FP) = mean_i R_i(FP).
- Generate the expected ROC curve by evenly sampling R along the FP axis.
- Compute confidence intervals according to the binomial distribution over the resulting TP values. (An averaging sketch appears after the Summary.)

Accuracy vs. ROC Curves
- Hypothesis: standard learning algorithms produce dominating ROC models.
- Answer: no. Results on 10 datasets from the UCI repository show only one instance of a dominating model.
- Thus, learners that maximize accuracy typically do not dominate in ROC space, and so are worse than other learners for some costs and class distributions.
- Non-dominating ROC curves can still provide regions of superiority for different learners.

Summary
- Results comparing the accuracy of learning algorithms are questionable, especially in scenarios with non-uniform costs and class distributions.
- ROC curves provide a better look at where different learners minimize cost.
- Recommends proper ROC analysis for the comparison of learning algorithms.
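
The rate definitions above, and the idea of obtaining one ROC point per decision threshold from a continuous-output model, can be made concrete with a small sketch. The following Python is not from the slides; the names rates, roc_points, labels, and scores are illustrative only.

# Minimal sketch: TP and FP rates from confusion-matrix counts, and ROC points
# obtained by sweeping a decision threshold over continuous classifier scores.

def rates(tp, fn, fp, tn):
    """True positive rate = #TP / #P, false positive rate = #FP / #N."""
    return tp / (tp + fn), fp / (fp + tn)

def roc_points(labels, scores):
    """Return (FP rate, TP rate) pairs, one per candidate threshold.

    labels: true classes, 1 = positive, 0 = negative
    scores: continuous outputs; higher means "more positive"
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # the "classify everything as negative" corner
    # Each distinct score acts as a threshold: predict positive if score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points

# Example usage
print(rates(tp=40, fn=10, fp=5, tn=45))   # (0.8, 0.1)
print(roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2]))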
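
Domination, as described above, means that one curve is at or above the other at every false positive rate. The sketch below checks this by sampling along the FP axis; interp_tp and dominates are hypothetical helper names, and the piecewise-linear interpolation between ROC points is an assumption, not something the slides specify.

def interp_tp(curve, fp):
    """Linearly interpolate TP = R(fp) on a curve of (fp, tp) points sorted by fp."""
    if fp <= curve[0][0]:
        return curve[0][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fp <= x1:
            return y1 if x1 == x0 else y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    return curve[-1][1]

def dominates(curve1, curve2, samples=101, tol=1e-9):
    """True if curve1's TP rate is >= curve2's at every sampled FP rate."""
    for j in range(samples):
        fp = j / (samples - 1)
        if interp_tp(curve1, fp) + tol < interp_tp(curve2, fp):
            return False
    return True

# Example: a curve lying above the random-guessing diagonal dominates it.
l1 = [(0.0, 0.0), (0.1, 0.7), (0.4, 0.9), (1.0, 1.0)]
random_guess = [(0.0, 0.0), (1.0, 1.0)]
print(dominates(l1, random_guess))  # True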
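
The expected-ROC-curve procedure above (average the per-fold curves R_i along the FP axis) can be sketched as follows. This reuses interp_tp from the previous sketch; the name average_roc is hypothetical, and the interval computed here is a simple normal approximation over the k fold values, used for illustration in place of the binomial-based interval the slides mention.

import statistics

def average_roc(fold_curves, samples=11, z=1.96):
    """Vertical averaging: sample R(FP) = mean_i R_i(FP) at evenly spaced FP values.

    fold_curves: one list of (fp, tp) points per cross-validation fold.
    Returns (fp, mean_tp, lower, upper) tuples; the interval is a rough
    normal approximation over the fold values, for illustration only.
    """
    out = []
    for j in range(samples):
        fp = j / (samples - 1)
        tps = [interp_tp(curve, fp) for curve in fold_curves]
        mean = statistics.mean(tps)
        sd = statistics.stdev(tps) if len(tps) > 1 else 0.0
        half = z * sd / len(tps) ** 0.5
        out.append((fp, mean, max(0.0, mean - half), min(1.0, mean + half)))
    return out

# Example usage with two toy fold curves
folds = [
    [(0.0, 0.0), (0.2, 0.6), (1.0, 1.0)],
    [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)],
]
for fp, mean, lo, hi in average_roc(folds, samples=6):
    print(f"FP={fp:.1f}  TP={mean:.2f}  [{lo:.2f}, {hi:.2f}]")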

