Optimizing F-Measure with Support Vector MachinesOverviewRoadmapThe Classification ProblemSlide 5Misclassification Count SVMApprox Misclassification Count SVMStandard “Soft Margin” SVMWeighted Standard SVMMeasures of successF-measureConstructing an F-measure SVMSlide 13The F-measure Maximizing SVMWeighted misclassification count SVMImplications of resultSlide 17Conclusions / SummaryOptimizing F-Measurewith Support Vector MachinesDavid R. MusicantVipin KumarAysel OzgurFLAIRS 2003Tuesday, May 13, 2003Carleton CollegeSlide 2OverviewClassification algorithms often evaluated by test set accuracyTest set accuracy can be a poor measure when one of the classes is rareSupport Vector Machines (SVMs) are designed to optimize test set accuracySVMs have been used in an ad-hoc manner on datasets with rare classesOur new results: current ad-hoc heuristic techniques can be theoretically justified.Slide 3RoadmapThe Traditional SVM and variantsPrecision, Recall, and F-measure metricsThe F-measure Maximizing SVMEquivalence of traditional SVM and F-measure SVM (for the right parameters)Implications and ConclusionsSlide 4The Classification ProblemSeparating Surface:A+A-•= “margin”Slide 5The Classification ProblemGiven m points in the n dimensional space RnEach point represented as xiMembership of each point Ai in the classes A+ or A- is specified by yi = § 1Separate by two bounding planes such that:More succinctly:for i=1,2,…,m.Slide 6Misclassification Count SVM(¢)* is the step function (1 if > 0, 0 otherwise)“Push the planes apart, and minimize number of misclassified points.” –C balances two competing objectives–Minimizing w 0 w pushes planes apart–Problem NP-complete, objective non-differentiableSlide 7 > 0 is an arbitrary fixed constant that determines closeness of approximation.This is still difficult to solve.Approx Misclassification Count SVM•where we use some differentiable approximation, such asSlide 8Standard “Soft Margin” SVM“Push the planes apart, and minimize distance of misclassified points.” We minimize total distances from misclassified points to bounding planes, not actual number of them.Much more tractable, does quite well in optimizing accuracyDoes poorly when one class is rareSlide 9Weighted Standard SVM“Push the planes apart, and minimize weighted distance of misclassified points.” Allows one to choose different C values for the two classes.Often used to weight rare class more heavily.How do we measure success when one class is rare? Assuming that A+ is the rare class…Slide 10Measures of successPrecision and Recall are better descriptors when one class is rare.Slide 11F-measureF-measure: commonly used “average” of precision and recallCan C+ and C- in the weighted SVM be balanced to optimize F-measure?Can we start over and invent an SVM to optimize F-measure?Slide 12Constructing an F-measure SVMHow do we appropriately represent F-measure in an SVM?Substitute P and R into F:Thus to maximize F-measure, we minimizeSlide 13Want to minimizeFP = # misclassified A-FN = # misclassified A+New F-measure maximizing SVM:Constructing an F-measure SVMSlide 14The F-measure Maximizing SVMApproximate with sigmoid:Can we connect with standard SVM?Slide 15Weighted misclassification count SVMHow do these two formulations relate?We show:–Pick a parameter C.–Find classifier to optimize F-measure SVM.–There exist parameters C+ and C- such that misclassification counting SVM has same solution.–Proof and formulas to obtain C+ and C- in paper.F-measure maximizing SVMSlide 16Implications of resultSince there exist C+, C- to yield same solution as F-measure maximizing SVM, finding best C+ and C- for the weighted standard SVM is “the right thing to do.”(modulo approximations)In practice, common trick is to choose C+, C- such that:This heuristic seems reasonable but is not optimal. (Good first guess?)Slide 17Implications of resultSuppose that SVM fails to provide good F-measure for a given problem, for a wide range of C+ and C- values.Q: Is there another SVM formulation that would yield better F-measure?A: Our evidence suggests not. Q: Is there another SVM formulation that would find best possible F-measure more directly?A: Yes, the F-measure maximizing SVM.Slide 18Conclusions / SummaryWe provide theoretical evidence that standard heuristic practices in using SVMs for optimizing F-measure are reasonable.We provide a framework for continued research in F-measure maximizing SVMs.All our results apply directly to SVMs with kernels (see paper).Future work: attacking F-measure maximizing SVM directly to find faster
or
We will never post anything without your permission.
Don't have an account? Sign up