Error-Sensitive Grading for Model Combination

Surendra K. Singhi and Huan Liu
Department of Computer Science and Engineering,
Arizona State University, Tempe, AZ 85287-8809, USA
[email protected], [email protected]

Contents: Introduction · Error-Sensitive Grading (Cost-Sensitive Learning; Type A vs. Type B Errors; Error-Sensitive Grading Algorithm; Tie Breaking for Grading; Time Complexity of Error-Sensitive Grading) · Experiments and Discussion (Against Different Model Combination Methods; Performance with Different Base Classifiers) · Conclusion and Further Work

Abstract. Ensemble learning is a powerful learning approach that combines multiple classifiers to improve prediction accuracy. An important decision when using an ensemble of classifiers is how to combine the predictions of its base classifiers. In this paper, we introduce a novel grading-based algorithm for model combination, which uses cost-sensitive learning in building a meta-learner. This method distinguishes between the grading error of classifying an incorrect prediction as correct and the reverse error, and assigns appropriate costs to the two types of error in order to improve performance. We study issues in error-sensitive grading, and then with extensive experiments show the empirical effectiveness of this new method in comparison with representative meta-classification techniques.

1 Introduction

The accessibility and abundance of data in today's information age, together with the advent of multimedia and the Internet, have made machine learning an indispensable tool for knowledge discovery. Ensemble learning is a powerful and widely used technique that combines the decisions of a set of classifiers to make the final prediction. This not only helps reduce the variance of learning, but also facilitates learning concepts (or hypotheses) from training data that are difficult for a single classifier.
In large datasets, where there may be multiple functions defining the relationship between the predictor and response variables, ensemble methods allow different classifiers to represent each function individually instead of using one single, overly complex function to approximate all of them.

Building a good-quality ensemble is a two-step process. During the first step (the model generation phase), the constituent (or base-level) classifiers should be selected so that they make independent or uncorrelated errors; in other words, the ensemble should be as diverse as possible. One way of introducing diversity is to vary the bias of learning, i.e., to employ different learning algorithms (resulting in a heterogeneous ensemble); another technique is to keep the learning algorithm the same but manipulate the training data so that the classifiers learn different functions in the hypothesis space (resulting in a homogeneous ensemble). After an ensemble of classifiers is obtained, the next important step is to construct a meta-classifier, which combines the predictions of the base classifiers (the model combination phase). This is the main focus of this paper.

J. Gama et al. (Eds.): ECML 2005, LNAI 3720, pp. 724–732, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Depending on the methods they use, model combination techniques can be partitioned into three categories: voting, stacking, and grading. The nomenclature for these categories is based on the most basic method representing the underlying principle of the methods in each category.

Voting. The techniques in this category are very simple and widely used with homogeneous ensembles. Majority voting is a naive voting technique, in which a simple summation of the output probabilities (or 0/1 values) of the base classifiers is taken, and a normalized probability distribution is returned.
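The summation-and-normalization just described can be sketched in a few lines; the function name and the toy ensemble below are illustrative, not from the paper:

```python
import numpy as np

def majority_vote(probabilities):
    """Combine base-classifier outputs by summing their class-probability
    vectors (or 0/1 votes) and renormalizing into a distribution.

    probabilities: shape (n_classifiers, n_classes); each row is one
    base classifier's probability estimate for a single test instance.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    summed = probabilities.sum(axis=0)      # simple summation across classifiers
    return summed / summed.sum()            # normalize to a probability distribution

# Three base classifiers voting over two classes; two of the three
# favor class 0, so class 0 receives the larger combined probability.
ensemble = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
print(majority_vote(ensemble))
```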
Weighted Voting is a variation in which a reliability weight or confidence value, inversely proportional to the validation-set error rate, is assigned to each classifier. The meta-classifier then takes a weighted sum to arrive at the final class probabilities. In one possible variation, instead of assigning a single reliability weight to each base classifier, a separate reliability weight can be assigned for each class.

Stacking. The stacking techniques are based on the idea of stacked generalization [1]. The distinguishing feature of the stacking techniques is that the meta-classifier tries to learn the pattern or relationship between the predictions of the base classifiers and the actual class. Stacking with Multi-response Linear Regression (MLR) [2] is a stacking technique in which the MLR algorithm is used as the meta-classifier. Based on the probability estimates given by the base classifiers, meta-training datasets are constructed for each class. From these meta-training datasets, linear regression models are built; the number of linear regression models is the same as the number of classes. Dzeroski [3] shows that using a Model Tree instead of Multi-response Linear Regression may yield better results. StackingC [4] is a variation in which, while building the meta-training datasets, only the class probabilities corresponding to the particular class for which the regression model is being built are used, instead of the class probabilities given by the base classifiers for all the different classes. This results in faster model-building time for the meta-classifier and also has the added benefit of giving more diverse models for each classifier.

Table 1. Grading meta-training dataset, for a dataset with m features and n instances

    Attributes              Graded
    A1    ...   Am          Class
    x1,1  ...   x1,m        1
    x2,1  ...   x2,m        1
    ...   ...   ...         ...
    xn,1  ...   xn,m        0
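A minimal sketch of the weighted-voting scheme described above, with reliability weights taken as the inverse of each classifier's validation-set error rate; the function name and example error rates are illustrative:

```python
import numpy as np

def weighted_vote(probabilities, val_error_rates, eps=1e-9):
    """Weighted voting: each classifier receives a reliability weight
    inversely proportional to its validation-set error rate, and the
    meta-classifier returns the normalized weighted sum of the
    class-probability vectors.

    probabilities:   shape (n_classifiers, n_classes)
    val_error_rates: shape (n_classifiers,)
    """
    probabilities = np.asarray(probabilities, dtype=float)
    weights = 1.0 / (np.asarray(val_error_rates, dtype=float) + eps)
    weights /= weights.sum()                # normalize the reliability weights
    combined = weights @ probabilities      # weighted sum over classifiers
    return combined / combined.sum()        # final class-probability distribution

# Classifier 0 (5% validation error) outweighs classifier 1 (30% error),
# so the combined distribution leans toward classifier 0's prediction.
print(weighted_vote([[0.6, 0.4], [0.2, 0.8]], [0.05, 0.30]))
```

The per-class variation mentioned in the text would replace the single weight vector with an (n_classifiers, n_classes) weight matrix applied elementwise before summing.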
Grading. The defining feature of methods in this category (also known as the referee method [5,6]) is that, instead of directly finding the relationship between the predictions of the base classifiers and the actual class (as in stacking), the meta-classifier grades the base classifiers and selects either a single base classifier or a subset of base classifiers that are likely to be correct for the given test instance. The intuition behind grading is that in large datasets, where there may be multiple functions defining the relationship between predictor and response variables, it is important to choose the correct function for any given test instance. In stacking, the meta-classifier uses the predictions of the base classifiers to decide the way they (the predictions) should be combined to make the final decision; but
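The grading meta-training set of Table 1, and the cost-sensitive twist summarized in the abstract, can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the helper names are hypothetical, plain (rather than out-of-fold) predictions are used to grade the base classifier, cost-sensitivity is approximated via per-instance sample weights, and the two cost values are arbitrary placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_grading_dataset(X, y, base_classifier):
    """Build the grading meta-training set of Table 1: the original
    attributes A1..Am paired with a 'graded' class that is 1 when the
    base classifier's prediction is correct and 0 when it is wrong.
    (A full implementation would grade out-of-fold predictions.)"""
    graded = (base_classifier.predict(X) == y).astype(int)
    return X, graded

def train_grader(X, graded, cost_type_a=1.0, cost_type_b=5.0):
    """Cost-sensitive meta-learner (grader): the two kinds of grading
    error (calling an incorrect prediction correct vs. the reverse)
    are given different costs, approximated here by weighting the
    graded-0 instances more heavily.  The cost values are illustrative."""
    weights = np.where(graded == 0, cost_type_b, cost_type_a)
    grader = DecisionTreeClassifier(random_state=0)
    grader.fit(X, graded, sample_weight=weights)
    return grader
```

At prediction time, one grader per base classifier would be consulted, and only the base classifiers graded as likely correct for the test instance would contribute to the final decision.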