Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance of different models?

Metrics for Performance Evaluation

• Focus on the predictive capability of a model
  – Rather than on how fast it classifies or builds models, scalability, etc.
• Confusion Matrix:

                          PREDICTED CLASS
                          Class=Yes   Class=No
  ACTUAL     Class=Yes    a (TP)      b (FN)
  CLASS      Class=No     c (FP)      d (TN)

  a: TP (true positive)
  b: FN (false negative)
  c: FP (false positive)
  d: TN (true negative)

Metrics for Performance Evaluation…

• Most widely-used metric:

  Accuracy = (a + d)/(a + b + c + d) = (TP + TN)/(TP + TN + FP + FN)

Limitation of Accuracy

• Consider a 2-class problem
  – Number of Class 0 examples = 9990
  – Number of Class 1 examples = 10
• If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
  – Accuracy is misleading because the model does not detect any class 1 example

Cost Matrix

                          PREDICTED CLASS
  C(i|j)                  Class=Yes    Class=No
  ACTUAL     Class=Yes    C(Yes|Yes)   C(No|Yes)
  CLASS      Class=No     C(Yes|No)    C(No|No)

  C(i|j): cost of misclassifying a class j example as class i

Computing Cost of Classification

  Cost matrix:
  C(i|j)    +      -
     +     -1    100
     -      1      0

  Model M1:                 Model M2:
          +      -                  +      -
    +   150     40            +   250     45
    -    60    250            -     5    200

  Accuracy = 80%             Accuracy = 90%
  Cost = 3910                Cost = 4255

• Note that M2 is more accurate, yet M1 has the lower cost

Cost vs Accuracy

  Count:
                Class=Yes   Class=No
  Class=Yes         a           b
  Class=No          c           d

  Cost:
                Class=Yes   Class=No
  Class=Yes         p           q
  Class=No          q           p

  N = a + b + c + d
  Accuracy = (a + d)/N
  Cost = p(a + d) + q(b + c)
       = p(a + d) + q(N – a – d)
       = qN – (q – p)(a + d)
       = N[q – (q – p) × Accuracy]

• Cost is a linear function of accuracy (minimizing cost is equivalent to maximizing accuracy) if
  1. C(Yes|No) = C(No|Yes) = q
  2. C(Yes|Yes) = C(No|No) = p
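To make the cost computation concrete, here is a minimal Python sketch (not part of the original slides; the dictionary layout and function name are illustrative) that scores a confusion matrix against the cost matrix above. It reproduces the M1 and M2 numbers.

```python
# Cost matrix entries C(i|j) keyed as (actual, predicted),
# as in the slide: C(+|+) = -1, C(-|+) = 100, C(+|-) = 1, C(-|-) = 0.
COST = {('+', '+'): -1, ('+', '-'): 100,
        ('-', '+'): 1,  ('-', '-'): 0}

def cost_and_accuracy(confusion):
    """confusion maps (actual, predicted) -> count."""
    total = sum(confusion.values())
    correct = confusion[('+', '+')] + confusion[('-', '-')]
    cost = sum(COST[cell] * n for cell, n in confusion.items())
    return cost, correct / total

# Confusion matrices for models M1 and M2 from the slide.
m1 = {('+', '+'): 150, ('+', '-'): 40, ('-', '+'): 60, ('-', '-'): 250}
m2 = {('+', '+'): 250, ('+', '-'): 45, ('-', '+'): 5,  ('-', '-'): 200}

print(cost_and_accuracy(m1))  # (3910, 0.8)
print(cost_and_accuracy(m2))  # (4255, 0.9)
```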
Cost-Sensitive Measures

  Precision (p) = a/(a + c) = TP/(TP + FP)
  Recall (r)    = a/(a + b) = TP/(TP + FN)
  F-measure (F) = 2rp/(r + p) = 2a/(2a + b + c) = 2TP/(2TP + FP + FN)

• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
• F-measure is biased towards all except C(No|No)

  Weighted Accuracy = (w1·a + w4·d)/(w1·a + w2·b + w3·c + w4·d)

Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance of different models?

Methods for Performance Evaluation

• How to obtain a reliable estimate of performance?
• Performance of a model may depend on factors other than the learning algorithm:
  – Class distribution
  – Cost of misclassification
  – Size of training and test sets

Learning Curve

• A learning curve shows how accuracy changes with varying sample size
• Requires a sampling schedule for creating the learning curve
• Effect of small sample size:
  – Bias in the estimate
  – Variance of the estimate

Methods of Estimation

• Holdout
  – Reserve 2/3 for training and 1/3 for testing
• Random subsampling
  – Repeated holdout
• Cross validation
  – Partition data into k disjoint subsets
  – k-fold: train on k–1 partitions, test on the remaining one
  – Leave-one-out: k = n
• Bootstrap
  – Sampling with replacement

Model Evaluation

• Metrics for Performance Evaluation
  – How to evaluate the performance of a model?
• Methods for Performance Evaluation
  – How to obtain reliable estimates?
• Methods for Model Comparison
  – How to compare the relative performance of different models?

ROC (Receiver Operating Characteristic)

• Developed in the 1950s for signal detection theory, to analyze noisy signals
  – Characterizes the trade-off between positive hits and false alarms
• ROC curve plots TPR (on the y-axis) against FPR (on the x-axis)

  TPR = TP/(TP + FN)
  FPR = FP/(FP + TN)

ROC (Receiver Operating Characteristic)

• Performance of each classifier is represented as a point on the ROC curve
  – Changing the algorithm's threshold, the sample distribution, or the cost matrix changes the location of the point

ROC Curve

  [Figure: 1-dimensional data set containing 2 classes (positive and negative), with their score distributions along x and a decision threshold t]

• Any point located at x > t is classified as positive
• At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88

ROC Curve

• (TPR, FPR):
  – (0,0): declare everything to be the negative class
  – (1,1): declare everything to be the positive class
  – (1,0): ideal
• Diagonal line:
  – Random guessing
  – Below the diagonal line: prediction is opposite of the true class

Using ROC for Model Comparison

  [Figure: ROC curves of models M1 and M2]

• Neither model consistently outperforms the other
  – M1 is better for small FPR
  – M2 is better for large FPR
• Area under the ROC curve (AUC):
  – Ideal: area = 1
  – Random guess: area = 0.5

How to Construct an ROC Curve

• Use a classifier that produces a posterior probability P(+|A) for each test instance A
• Sort the instances in decreasing order of P(+|A)
• Apply a threshold at each unique value of P(+|A)
• Count the number of TP, FP, TN, FN at each threshold
  – TP rate, TPR = TP/(TP + FN)
  – FP rate, FPR = FP/(FP + TN)

  Instance   P(+|A)   True Class
      1       0.95        +
      2       0.93        +
      3       0.87        -
      4       0.85        -
      5       0.85        -
      6       0.85        +
      7       0.76        -
      8       0.53        +
      9       0.43        -
     10       0.25        +

How to Construct an ROC Curve

  Class         +     -     +     -     -     -     +     -     +     +
  Threshold ≥   0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
  TP            5     4     4     3     3     3     3     2     2     1     0
  FP            5     5     4     4     3     2     1     1     0     0     0
  TN            0     0     1     1     2     3     4     4     5     5     5
  FN            0     1     1     2     2     2     2     3     3     4     5
  TPR           1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
  FPR           1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

  ROC Curve: [plot of the (FPR, TPR) points from the table above]
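The construction procedure above can be sketched in Python as follows (an illustrative implementation, not from the slides; it collapses tied scores into a single threshold, so the table's three 0.85 columns appear once):

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by thresholding at each unique score.

    scores: posterior probabilities P(+|A); labels: '+' or '-'.
    """
    pos = labels.count('+')
    neg = labels.count('-')
    pairs = sorted(zip(scores, labels))        # sort instances by score
    # Threshold at each unique score, plus 1.00 as a sentinel above the
    # largest score (the table's last column, where everything is negative).
    thresholds = sorted(set(scores)) + [1.00]
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in pairs if s >= t and y == '+')
        fp = sum(1 for s, y in pairs if s >= t and y == '-')
        points.append((fp / neg, tp / pos))    # (FPR, TPR)
    return points

# The 10 instances from the slide.
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```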
Ensemble Methods

• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
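A minimal sketch of the aggregation step (Python; the base classifiers below are hypothetical threshold rules invented for illustration, not from the slides):

```python
from collections import Counter

def majority_vote(classifiers, record):
    """Aggregate the predictions of a set of base classifiers
    by taking the most common class label."""
    votes = [clf(record) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical base classifiers: simple threshold rules on two features.
classifiers = [
    lambda x: '+' if x[0] > 0.5 else '-',
    lambda x: '+' if x[1] > 0.3 else '-',
    lambda x: '+' if x[0] + x[1] > 0.8 else '-',
]
print(majority_vote(classifiers, (0.4, 0.5)))  # '+' (2 of 3 vote '+')
```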