Error-Sensitive Grading for Model Combination

Surendra K. Singhi and Huan Liu
Department of Computer Science and Engineering,
Arizona State University, Tempe, AZ 85287-8809, USA
[email protected], [email protected]

Contents: Introduction · Error-Sensitive Grading (Cost-Sensitive Learning; Type A vs. Type B Errors; Error-Sensitive Grading Algorithm; Tie Breaking for Grading; Time Complexity of Error-Sensitive Grading) · Experiments and Discussion (Against Different Model Combination Methods; Performance with Different Base Classifiers) · Conclusion and Further Work

Abstract. Ensemble learning is a powerful learning approach that combines multiple classifiers to improve prediction accuracy. An important decision when using an ensemble of classifiers is how to combine the predictions of its base classifiers. In this paper, we introduce a novel grading-based algorithm for model combination, which uses cost-sensitive learning in building a meta-learner. This method distinguishes between the grading error of classifying an incorrect prediction as correct and the reverse error, and assigns appropriate costs to the two types of error in order to improve performance. We study issues in error-sensitive grading, and then with extensive experiments show the empirical effectiveness of this new method in comparison with representative meta-classification techniques.

1 Introduction

The accessibility and abundance of data in today's information age, together with the advent of multimedia and the Internet, have made machine learning an indispensable tool for knowledge discovery. Ensemble learning is a powerful and widely used technique that combines the decisions of a set of classifiers to make the final prediction. This not only helps reduce the variance of learning, but also facilitates learning concepts (or hypotheses) from training data that are difficult for a single classifier.
In large datasets, where there may be multiple functions defining the relationship between the predictor and response variables, ensemble methods allow different classifiers to represent each function individually instead of using one single, overly complex function to approximate all of them.

Building a good-quality ensemble is a two-step process. During the first step (the model generation phase), the constituent (or base-level) classifiers should be selected so that they make independent or uncorrelated errors; in other words, the ensemble should be as diverse as possible. One way of introducing diversity is to vary the bias of learning, i.e., to employ different learning algorithms (resulting in a heterogeneous ensemble); another technique is to keep the learning algorithm the same but manipulate the training data so that the classifiers learn different functions in the hypothesis space (resulting in a homogeneous ensemble). After an ensemble of classifiers is obtained, the next important step is to construct a meta-classifier, which combines the predictions of the base classifiers (the model combination phase). This is the main focus of this paper.

J. Gama et al. (Eds.): ECML 2005, LNAI 3720, pp. 724–732, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Depending on the methods they use, model combination techniques can be partitioned into three categories: voting, stacking, and grading. The nomenclature for these categories is based on the most basic method representing the underlying principle of the methods in each category.

Voting. The techniques in this category are very simple and widely used with homogeneous ensembles. Majority voting is a naive voting technique, in which a simple summation of the output probabilities (or 0/1 values) of the base classifiers is taken, and a normalized probability distribution is returned.
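The summation-and-normalization just described can be sketched in a few lines; the function name and the toy ensemble below are illustrative, not from the paper:

```python
import numpy as np

def majority_vote(probabilities):
    """Combine base-classifier outputs by summing their class-probability
    vectors (or 0/1 votes) and renormalizing into a distribution.

    probabilities: shape (n_classifiers, n_classes); each row is one
    base classifier's probability estimate for a single test instance.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    summed = probabilities.sum(axis=0)      # simple summation across classifiers
    return summed / summed.sum()            # normalize to a probability distribution

# Three base classifiers voting over two classes; two of the three
# favor class 0, so class 0 receives the larger combined probability.
ensemble = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
print(majority_vote(ensemble))
```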
Weighted Voting is a variation in which a reliability weight or confidence value, inversely proportional to the validation-set error rate, is assigned to each classifier. The meta-classifier then takes a weighted sum to arrive at the final class probabilities. In one possible variation, instead of assigning a single reliability weight to each base classifier, a separate reliability weight can be assigned for each class.

Stacking. The stacking techniques are based on the idea of stacked generalization [1]. The distinguishing feature of the stacking techniques is that the meta-classifier tries to learn the pattern or relationship between the predictions of the base classifiers and the actual class. Stacking with Multi-response Linear Regression (MLR) [2] is a stacking technique in which the MLR algorithm is used as the meta-classifier. Based on the probability estimates given by the base classifiers, meta-training datasets are constructed for each class. From these meta-training datasets, linear regression models are built; the number of linear regression models is the same as the number of classes. Dzeroski [3] shows that using a Model Tree instead of Multi-response Linear Regression may yield better results. StackingC [4] is a variation in which, while building the meta-training datasets, only the class probabilities corresponding to the particular class for which the regression model is being built are used, instead of the class probabilities given by the base classifiers for all the different classes. This results in faster model-building time for the meta-classifier and also has the added benefit of giving more diverse models for each classifier.

Table 1. Grading meta-training dataset, for a dataset with m features and n instances

    Attributes              Graded
    A1    ...   Am          Class
    x1,1  ...   x1,m        1
    x2,1  ...   x2,m        1
    ...   ...   ...         ...
    xn,1  ...   xn,m        0
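A minimal sketch of the weighted-voting scheme described above, with reliability weights taken as the inverse of each classifier's validation-set error rate; the function name and example error rates are illustrative:

```python
import numpy as np

def weighted_vote(probabilities, val_error_rates, eps=1e-9):
    """Weighted voting: each classifier receives a reliability weight
    inversely proportional to its validation-set error rate, and the
    meta-classifier returns the normalized weighted sum of the
    class-probability vectors.

    probabilities:   shape (n_classifiers, n_classes)
    val_error_rates: shape (n_classifiers,)
    """
    probabilities = np.asarray(probabilities, dtype=float)
    weights = 1.0 / (np.asarray(val_error_rates, dtype=float) + eps)
    weights /= weights.sum()                # normalize the reliability weights
    combined = weights @ probabilities      # weighted sum over classifiers
    return combined / combined.sum()        # final class-probability distribution

# Classifier 0 (5% validation error) outweighs classifier 1 (30% error),
# so the combined distribution leans toward classifier 0's prediction.
print(weighted_vote([[0.6, 0.4], [0.2, 0.8]], [0.05, 0.30]))
```

The per-class variation mentioned in the text would replace the single weight vector with an (n_classifiers, n_classes) weight matrix applied elementwise before summing.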
Grading. The defining feature of methods in this category (also known as the referee method [5,6]) is that, instead of directly finding the relationship between the predictions of the base classifiers and the actual class (as in stacking), the meta-classifier grades the base classifiers and selects either a single base classifier or a subset of base classifiers that are likely to be correct for the given test instance. The intuition behind grading is that in large datasets, where there may be multiple functions defining the relationship between predictor and response variables, it is important to choose the correct function for any given test instance. In stacking, the meta-classifier uses the predictions of the base classifiers to decide the way they (the predictions) should be combined to make the final decision; but
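The grading meta-training set of Table 1, and the cost-sensitive twist summarized in the abstract, can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the helper names are hypothetical, plain (rather than out-of-fold) predictions are used to grade the base classifier, cost-sensitivity is approximated via per-instance sample weights, and the two cost values are arbitrary placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_grading_dataset(X, y, base_classifier):
    """Build the grading meta-training set of Table 1: the original
    attributes A1..Am paired with a 'graded' class that is 1 when the
    base classifier's prediction is correct and 0 when it is wrong.
    (A full implementation would grade out-of-fold predictions.)"""
    graded = (base_classifier.predict(X) == y).astype(int)
    return X, graded

def train_grader(X, graded, cost_type_a=1.0, cost_type_b=5.0):
    """Cost-sensitive meta-learner (grader): the two kinds of grading
    error (calling an incorrect prediction correct vs. the reverse)
    are given different costs, approximated here by weighting the
    graded-0 instances more heavily.  The cost values are illustrative."""
    weights = np.where(graded == 0, cost_type_b, cost_type_a)
    grader = DecisionTreeClassifier(random_state=0)
    grader.fit(X, graded, sample_weight=weights)
    return grader
```

At prediction time, one grader per base classifier would be consulted, and only the base classifiers graded as likely correct for the test instance would contribute to the final decision.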