Compact Dual Ensembles for Active Learning

Amit Mandvikar¹, Huan Liu¹, and Hiroshi Motoda²
¹ Arizona State University, Arizona, USA.
² Osaka University, Osaka, Japan.
{huanliu,amitm}@asu.edu and [email protected]

Abstract. Generic ensemble methods can achieve excellent learning performance, but they are not good candidates for active learning because of their different design purposes. We investigate how to use the diversity of the member classifiers of an ensemble for efficient active learning. We empirically show, using benchmark data sets, that (1) to achieve a good (stable) ensemble, the number of classifiers needed in the ensemble varies for different data sets; and (2) feature selection can be applied for classifier selection from ensembles to construct compact ensembles with high performance. Benchmark data sets and a real-world application are used to demonstrate the effectiveness of the proposed approach.

1 Introduction

Active learning is a framework in which the learner has the freedom to select which data points are added to its training set [11]. An active learner may begin with a small number of labeled instances, carefully select a few additional instances for which it requests labels, learn from the result of those requests, and then, using its newly gained knowledge, carefully choose which instances to request next. More often than not, data in the form of text (including email), images, and multimedia are unlabeled, yet many supervised learning tasks need to be performed on them in real-world applications [2, 10]. Active learning can significantly decrease the number of labeled instances required and thus greatly reduce expert involvement. Ensemble methods are learning algorithms that construct a set of classifiers and then classify new instances by taking a weighted or unweighted vote of their predictions. An ensemble often has a smaller expected loss or error rate than any of its n individual (member) classifiers. A good ensemble is one whose members are both accurate and diverse [4]. This work explores the relationship between the two learning frameworks, attempts to take advantage of the learning performance of ensemble methods for active learning in a real-world application, and studies how to construct ensembles for effective active learning.

2 Our Approach

2.1 Ensembles and Active Learning

Active learning can be very useful where there are limited resources for labeling data and obtaining these labels is time-consuming or difficult [11]. There exist widely used active learning methods. Some examples are: uncertainty sampling [7], which selects the instance on which the current learner has the lowest certainty; pool-based sampling [9], which selects the best instances from the entire pool of unlabeled instances; and query-by-committee [6, 12], which selects instances on which the committee's classification variance is high.

Constructing good ensembles of classifiers has been one of the most active areas of research in supervised learning [4]. The main discovery is that ensembles are often much more accurate than the member classifiers that make them up. A necessary and sufficient condition for an ensemble to be more accurate than any of its members is that the member classifiers are accurate and diverse. Two classifiers are diverse if they make different (or uncorrelated) errors on new data points. Many methods for constructing ensembles have been developed, such as Bagging [3] and Boosting [5].
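As a concrete illustration of how a committee's disagreement can be used to pick instances for labeling, the sketch below implements a simple query-by-committee step on top of a bagged ensemble. It uses scikit-learn rather than the Weka setup described later, and the vote-entropy criterion, function name, and parameter values are illustrative assumptions, not the procedure proposed in this paper.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def query_by_committee(X_labeled, y_labeled, X_pool, n_members=25, n_queries=10):
    """Return indices of the pool instances the committee disagrees on most.

    Hypothetical sketch: disagreement is measured by vote entropy over the
    member classifiers of a bagged ensemble.
    """
    committee = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # unpruned trees, a rough analogue of J4.8 without pruning
        n_estimators=n_members,
        random_state=0,
    ).fit(X_labeled, y_labeled)
    # (use base_estimator= instead of estimator= on scikit-learn < 1.2)

    # Each member classifier votes on every unlabeled pool instance.
    votes = np.array([m.predict(X_pool) for m in committee.estimators_])  # (n_members, n_pool)

    # Vote entropy per instance: higher entropy means stronger disagreement.
    classes = np.unique(y_labeled)
    counts = np.stack([(votes == c).sum(axis=0) for c in classes], axis=1)
    probs = counts / counts.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

    # Request labels for the most contentious instances.
    return np.argsort(entropy)[::-1][:n_queries]
```

The selected instances would then be labeled by an expert, added to the training set, and the committee retrained, which is the basic loop that active learning repeats.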
We consider Bagging in this work as it is the most straightforward way of manipulating the training data to form ensembles [4]. Disagreement or diversity of classifiers is used for different purposes in the two learning frameworks: in generic ensemble learning, the diversity of classifiers is used to ensure high accuracy through voting; in active learning, the disagreement of classifiers is used to identify critical instances for labeling. For active learning to work effectively, we need a small number of highly accurate classifiers so that they seldom disagree with each other. Since ensemble methods have shown their robustness in producing highly accurate classifiers, we have investigated the use of class-specific ensembles (dual ensembles) and shown their effectiveness in our previous work [8]. Next, we empirically investigate whether it is necessary to find compact dual ensembles, and then we present a method to find them while maintaining good performance.

2.2 Observations from Experiments on Benchmark Data Sets

An ensemble's goodness can be measured by accuracy and diversity. Let $\hat{Y}(x) = \{\hat{y}_1(x), \ldots, \hat{y}_n(x)\}$ be the set of predictions made by member classifiers $C_1, \ldots, C_n$ of ensemble $E$ on instance $\langle x, y \rangle$, where $x$ is the input and $y$ is the true class. The prediction of a uniform voting ensemble for input $x$ under loss function $l$ is $\hat{y}(x) = \arg\min_{y \in Y} E_{c \in C}[l(\hat{y}_c(x), y)]$. The loss of an ensemble on instance $\langle x, y \rangle$ under loss function $l$ is $L(\langle x, y \rangle) = l(\hat{y}(x), y)$. The diversity of an ensemble on input $x$ under loss function $l$ is $D = E_{c \in C}[l(\hat{y}_c(x), \hat{y}(x))]$. The error rate for a data set with $N$ instances is $e = \frac{1}{N}\sum_{i=1}^{N} L_i$, where $L_i$ is the loss for instance $x_i$. The accuracy of ensemble $E$ is $1 - e$. Diversity is the expected loss incurred by the predictions of the member classifiers relative to the ensemble prediction. Commonly used loss functions include square loss, absolute loss, and zero-one loss. We use zero-one loss in this work.

The purpose of these experiments is to observe how diversity and error rate change as the ensemble size increases. We use benchmark data sets from the UCI repository [1]. We use the Weka [13] implementation of Bagging [3] as the ensemble generation method and J4.8 (without pruning) as the base learning algorithm. For each data set, we run Bagging with ensemble sizes increasing from 5 to 151 and record each ensemble's error rate $e$ and diversity $D$. We run 10-fold cross-validation and calculate the average values of $e$ and $D$. We observed that as the ensemble size increases, diversity values increase and approach the maximum, while error rates decrease and become stable. The results show that smaller ensembles (with 30-70 classifiers) can achieve accuracy and diversity values similar to those of larger ensembles. We will now show a procedure for selecting compact dual ensembles from larger ensembles.

2.3 Selecting Compact Dual Ensembles via Feature Selection

The experiments with the benchmark data sets show that there exist smaller ensembles
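To make the quantities defined in Section 2.2 concrete, the sketch below computes the zero-one-loss error rate $e$ and diversity $D$ of a uniform-voting bagged ensemble over a range of ensemble sizes with 10-fold cross-validation. It uses scikit-learn's BaggingClassifier with unpruned decision trees as a stand-in for the Weka Bagging/J4.8 setup; the example data set and the reduced size grid are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer          # stand-in for a UCI benchmark data set
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def error_and_diversity(members, X, y):
    """Zero-one-loss error rate e and diversity D of a uniform-voting ensemble."""
    votes = np.array([m.predict(X) for m in members])             # (n_members, N)
    classes = np.unique(y)
    counts = np.stack([(votes == c).sum(axis=0) for c in classes], axis=1)
    y_hat = classes[counts.argmax(axis=1)]                        # majority vote minimizes expected 0-1 loss
    e = np.mean(y_hat != y)                                       # e = (1/N) * sum_i L_i
    D = np.mean(votes != y_hat[None, :])                          # expected 0-1 loss of members w.r.t. the ensemble prediction
    return e, D

X, y = load_breast_cancer(return_X_y=True)
for size in [5, 15, 31, 71, 151]:                                 # a small subset of the 5..151 size grid
    errors, diversities = [], []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                n_estimators=size, random_state=0)
        bag.fit(X[train_idx], y[train_idx])
        e, D = error_and_diversity(bag.estimators_, X[test_idx], y[test_idx])
        errors.append(e)
        diversities.append(D)
    print(f"size={size:3d}  mean error={np.mean(errors):.3f}  mean diversity={np.mean(diversities):.3f}")
```

The averaged values printed per size correspond to the cross-validated $e$ and $D$ recorded in the experiments above, where error rates stabilize and diversity approaches its maximum as the ensemble grows.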

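The preview ends before the selection procedure of Section 2.3 is described, so the following is only a hypothetical sketch of the idea stated in the abstract: treat each member classifier's predictions on a validation set as one feature column and apply a standard feature-selection scorer to keep a compact subset of classifiers. The mutual-information scorer, the function name, and the choice of k are assumptions, not the authors' method.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_compact_ensemble(members, X_val, y_val, k=10):
    """Keep k member classifiers by scoring their predictions as feature columns.

    Hypothetical illustration: each column of pred_matrix holds one member's
    predictions on a validation set; a feature-selection score (here mutual
    information with the true labels) ranks the members, and the top k form
    the compact ensemble.
    """
    pred_matrix = np.column_stack([m.predict(X_val) for m in members])   # (N_val, n_members)
    scores = mutual_info_classif(pred_matrix, y_val,
                                 discrete_features=True, random_state=0)
    keep = np.argsort(scores)[::-1][:k]
    return [members[i] for i in keep]
```

Applied per class, such a routine would yield compact dual (class-specific) ensembles, provided the selected subset preserves the accuracy and diversity of the full ensemble.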
