A Balanced Ensemble Approach to Weighting Classifiers for Text Classification

Gabriel Pui Cheong Fung¹, Jeffrey Xu Yu¹, Haixun Wang², David W. Cheung³, Huan Liu⁴

¹The Chinese University of Hong Kong, Hong Kong, China, {pcfung, yu}@se.cuhk.edu.hk
²IBM T. J. Watson Research Center, New York, USA, [email protected]
³The University of Hong Kong, Hong Kong, China, [email protected]
⁴Arizona State University, Arizona, USA, [email protected]

Abstract

This paper studies the problem of constructing an effective heterogeneous ensemble classifier for text classification. One major challenge of this problem is to formulate a good combination function, which combines the decisions of the individual classifiers in the ensemble. We show that classification performance is affected by three weight components, all of which should be included when deriving an effective combination function: (1) Global effectiveness, which measures the effectiveness of a member classifier in classifying a set of unseen documents; (2) Local effectiveness, which measures the effectiveness of a member classifier in classifying the particular domain of an unseen document; and (3) Decision confidence, which describes how confident a classifier is when classifying a specific unseen document. We propose a new balanced combination function, called Dynamic Classifier Weighting (DCW), that incorporates these three components. The empirical study demonstrates that the new combination function is highly effective for text classification.

1 Introduction

Let U be a set of unseen documents and C be a set of predefined categories. Automated text classification is the process of labeling U with C, such that every d ∈ U is assigned to some of the categories in C. Note that d may be assigned to none of the categories in C. If the number of categories in C is more than two (|C| > 2), it is a multi-label text classification problem. Since every multi-label text classification problem can be transformed into a binary-label text classification problem, we focus on the binary problem in this paper (|C| = 2). Let c ∈ C. Binary-label text classification is to construct a binary classifier, denoted by Φ(·), for c such that:

\Phi(d) =
\begin{cases}
1 & \text{if } f(d) > 0, \\
-1 & \text{otherwise,}
\end{cases}
\qquad (1)

where Φ(d) = 1 indicates that d belongs to c and Φ(d) = −1 indicates that it does not; f(·) ∈ ℝ is a decision function. Every classifier, Φi, has its own decision function, fi(·). If there are m different classifiers, there are m different decision functions. The goal of constructing a binary classifier, Φ(·), is to approximate the unknown true target function Φ̆(·), so that Φ(·) and Φ̆(·) coincide as much as possible [17].

To improve effectiveness, ensemble classifiers (a.k.a. classifier committees) were proposed [1, 3, 5, 6, 7, 8, 9, 15, 16, 17, 18, 19]. An ensemble classifier is constructed by grouping a number of member classifiers. If the decisions of the member classifiers are combined properly, the ensemble is robust and effective. There are two kinds of ensemble classifiers: homogeneous and heterogeneous.

A homogeneous ensemble classifier contains m binary classifiers that are all constructed by the same learning algorithm. Bagging and boosting [19] are two common techniques [1, 15, 16, 18].

A heterogeneous ensemble classifier contains m binary classifiers that are constructed by different learning algorithms (e.g., one SVM classifier and one kNN classifier grouped together) [19].
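As a concrete illustration of Eq. (1) and of heterogeneous members, the sketch below wraps two differently trained classifiers behind the same ±1 sign convention. This is a minimal sketch, not code from the paper: scikit-learn, the TF-IDF features, the toy documents, and the use of a shifted class probability as kNN's decision function are all assumptions made for the example.

```python
# Minimal sketch (not from the paper): two heterogeneous member classifiers
# exposing real-valued decision functions f_i(d), turned into +/-1 labels
# via the sign convention of Eq. (1).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Toy training data: 1 = document belongs to category c, -1 = it does not.
docs = [
    "wheat and grain exports rose sharply this quarter",
    "retail chains report weak holiday revenue",
    "corn and potato harvests beat all forecasts",
    "department stores cut prices as sales slip",
]
y = np.array([1, -1, 1, -1])

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

svm = LinearSVC().fit(X, y)                          # member 1: SVM
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # member 2: kNN

def f_svm(Xd):
    # The SVM's signed margin is a natural real-valued decision function.
    return svm.decision_function(Xd)

def f_knn(Xd):
    # kNN has no margin; shift P(class = 1) so that 0 is the boundary.
    return knn.predict_proba(Xd)[:, 1] - 0.5

def phi(f, Xd):
    # Eq. (1): Phi(d) = 1 if f(d) > 0, else -1.
    return np.where(f(Xd) > 0, 1, -1)

d = vec.transform(["grain harvest figures released"])
print(phi(f_svm, d), phi(f_knn, d))  # each member's +/-1 decision for d
```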
The individual decisions of the classifiers in the ensemble are combined (e.g., through stacking [19]):

\Theta(d) =
\begin{cases}
1 & \text{if } g\big(\Phi_1(d), \Phi_2(d), \ldots, \Phi_m(d)\big) > 0, \\
-1 & \text{otherwise,}
\end{cases}
\qquad (2)

where Θ(·) is an ensemble classifier and g(·) is a combination function that combines the outputs of all Φi(·). The effectiveness of the ensemble classifier, Θ(·), depends on the effectiveness of g(·). In this paper, we concentrate on analyzing heterogeneous ensemble classifiers. Our problem is thus to examine how to formulate a good g(·).

Four widely used g(·) are: (1) Majority voting (MV) [8, 9]; (2) Weighted linear combination (WLC) [7]; (3) Dynamic classifier selection (DCS) [3, 5, 6, 8]; and (4) Adaptive classifier combination (ACC) [8, 9]. Except for MV, these functions assign different weights to the classifiers in the ensemble; the bigger the weight, the more effective that classifier is taken to be. In MV, all classifiers in the ensemble are equally weighted, so MV can end up with a wrong decision if the minority votes are significant. WLC assigns static weights to the classifiers based on their performance on validation data. However, a classifier that performs well in general can still perform poorly in some specific domains. For instance, the micro-F1 scores of SVM and Naive Bayes (NB) on the Reuters-21578 benchmark are 0.860 and 0.788, respectively; in this sense, SVM outperforms NB. Yet, for the categories Potato and Retail in Reuters-21578, the F1 scores of NB are both 0.667, whereas those of SVM are both 0.0. DCS and ACC weight the classifiers by partitioning the validation data (domain-specific); however, they do not combine the classifiers' decisions but instead select a single classifier from the ensemble and rely on it solely. We will show in the experiments that this leads to inferior results.

In this paper, we propose a new combination function called Dynamic Classifier Weighting (DCW). We consider three components when combining classifiers: (1) Global effectiveness, which is the effectiveness of a classifier in an ensemble when it classifies a set of unseen documents; (2) Local effectiveness, which is the effectiveness of a classifier in an ensemble when it classifies the particular domain of the unseen document; and (3) Decision confidence, which is the confidence of a classifier in the ensemble when it makes a decision for a specific unseen document.

2 Motivations

Let Φ1(·), Φ2(·), ..., Φm(·) be m different binary classifiers and f1(·), f2(·), ..., fm(·) be their corresponding decision functions. Conceptually, Φi(·) divides the entire domain into two parts according to fi(·). Figure 1 illustrates this idea: the dashed lines are the decision boundaries, and if an unseen document, d, falls into the upper (lower) triangle, it is labeled as positive (negative). Usually, the further away d is from the decision boundary, the more confident the decision Φi(d) is.

[Figure 1. Illustration of local effectiveness and decision confidence.]
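Since the preview stops before DCW's weight formulas are specified, the following is only an illustrative sketch of Eq. (2)-style combination functions: MV and WLC follow the descriptions above, while the "dcw_like" rule uses a simple product of assumed global-effectiveness, local-effectiveness, and decision-confidence scores as the per-member weight. That product, and all the numeric scores, are stand-ins for illustration, not the paper's actual DCW definition.

```python
# Illustrative sketch of Eq. (2)-style combination functions g(.).
# MV and WLC follow the text; dcw_like is a labeled placeholder, NOT the
# paper's DCW formula (the preview ends before DCW is fully specified).
import numpy as np

def majority_vote(decisions):
    # MV: every member classifier is weighted equally.
    return 1 if np.sum(decisions) > 0 else -1

def weighted_linear(decisions, weights):
    # WLC: static weights, e.g., each member's F1 on validation data.
    return 1 if np.dot(weights, decisions) > 0 else -1

def dcw_like(decisions, global_eff, local_eff, confidence):
    # Placeholder: one weight per member, here simply the product of the
    # three components named in the paper. The product form is an assumption.
    w = np.asarray(global_eff) * np.asarray(local_eff) * np.asarray(confidence)
    return 1 if np.dot(w, decisions) > 0 else -1

# Three members vote -1, +1, +1: MV says +1, but weighting can overturn it
# when the dissenting member is by far the strongest.
decisions = np.array([-1, 1, 1])
print(majority_vote(decisions))                     # 1
print(weighted_linear(decisions, [0.9, 0.2, 0.2]))  # -1
print(dcw_like(decisions,
               global_eff=[0.9, 0.7, 0.7],
               local_eff=[0.9, 0.3, 0.4],
               confidence=[0.95, 0.4, 0.5]))        # -1
```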