WSU CSE 6363 - Ensembles of Classifiers

Slide 1: Ensembles of Classifiers
Larry Holder
CSE 6363 – Machine Learning
Computer Science and Engineering
University of Texas at Arlington

Slide 2: References
- Dietterich, "Machine Learning Research: Four Current Directions," AI Magazine, pp. 97-105, Winter 1997.

Slide 3: Learning Task
- Given a set S of training examples {(x1, y1), ..., (xm, ym)}
- Sampled from an unknown function y = f(x)
- Each xi is a feature vector <xi,1, ..., xi,n> of n discrete or real-valued features
- Class y ∈ {1, ..., K}
- Examples may contain noise
- Find a hypothesis h approximating f

Slide 4: Ensemble of Classifiers
- Goal: improve the accuracy of a supervised learning task
- Approach: use an ensemble of classifiers, rather than just one
- Challenges:
  - How to construct the ensemble
  - How to use the individual hypotheses of the ensemble to produce a classification

Slide 5: Ensembles of Classifiers
- Given an ensemble of L classifiers h1, ..., hL
- Decisions are based on a combination of the individual hl, e.g., weighted or unweighted voting
- How can we construct an ensemble whose accuracy is better than any individual classifier?

Slide 6: Ensembles of Classifiers
- Ensemble requirements:
  - The individual classifiers disagree
  - Each classifier's error is < 0.5
  - The classifiers' errors are uncorrelated
- THEN the ensemble will outperform any individual hl

Slide 7: Ensembles of Classifiers
[Figure 1: P(l of 21 hypotheses errant), where each hypothesis has error 0.3 and errors are independent. P(11 or more errant) = 0.026, so the majority vote errs far less often than any single hypothesis. A worked check appears after Slide 15.]

Slide 8: Constructing Ensembles
- Sub-sampling the training examples: one learning algorithm is run on different sub-samples of the training data to produce different classifiers
- Works well for unstable learners, i.e., learners whose output classifier undergoes major changes given only small changes in the training data
  - Unstable learners: decision trees, neural networks, rule learners
  - Stable learners: linear regression, nearest-neighbor, linear-threshold (perceptron)

Slide 9: Sub-sampling the Training Set
- Methods:
  - Cross-validated committees: use k-fold cross-validation to generate k different training sets, then learn k classifiers (see the sketch after Slide 15)
  - Bagging
  - Boosting

Slide 10: Bagging
- Given m training examples
- Construct L random samples of size m, drawn with replacement
- Each sample is called a bootstrap replicate
- On average, each replicate contains 63.2% of the distinct training examples, since the chance a given example is never drawn is (1 - 1/m)^m ≈ e^(-1) ≈ 0.368
- Learn a classifier hl for each of the L samples (see the bagging sketch after Slide 15)

Slide 11: Boosting
- Each of the m training examples is weighted according to its classification difficulty, giving a distribution pl(x); initially uniform: 1/m
- The training sample of size m for iteration l is drawn with replacement according to pl(x)
- The learner is biased toward higher-weight training examples (if the learner can use pl(x) directly)
- The error εl of classifier hl is used to bias pl+1(x)
- Learn L classifiers; each is used to modify the weights for the next learned classifier
- The final classifier is a weighted vote of the individual classifiers (see the AdaBoost sketch after Slide 15)

Slide 12: AdaBoost
[Figure 2: the AdaBoost algorithm.]

Slide 13: C4.5 with/without Boosting
[Figure: C4.5 error with boosting vs. without; each point represents 1 of 27 test domains.]

Slide 14: C4.5 with/without Bagging
[Figure: C4.5 error with bagging vs. without.]

Slide 15: Boosting vs. Bagging
[Figure: boosting error vs. bagging error.]
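The 0.026 in Figure 1 is a binomial tail: with 21 independent hypotheses, each wrong with probability 0.3, the unweighted majority vote errs only when 11 or more hypotheses err. A minimal check (the variable names are mine, not from the slides):

```python
# Checking the Figure 1 calculation: 21 independent hypotheses, each with
# error 0.3. The majority vote errs only if 11 or more hypotheses err.
from math import comb

L, p = 21, 0.3
p_majority_err = sum(comb(L, l) * p**l * (1 - p)**(L - l) for l in range(11, L + 1))
print(f"P(11 or more of {L} errant) = {p_majority_err:.3f}")  # prints 0.026
```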
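The cross-validated committees of Slide 9 take only a few lines given a k-fold splitter. A minimal sketch, assuming numpy arrays and scikit-learn, with a decision tree as an illustrative unstable base learner:

```python
# Cross-validated committees (Slide 9): k-fold cross-validation yields k
# overlapping training sets, each omitting one fold; learn one classifier
# per training set.
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def cv_committee(X, y, k=10):
    committee = []
    for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        committee.append(DecisionTreeClassifier().fit(X[train_idx], y[train_idx]))
    return committee
```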
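Bagging (Slide 10) is equally short: draw L bootstrap replicates, fit one classifier per replicate, and combine by unweighted majority vote. This sketch assumes numpy arrays and integer class labels; the decision tree base learner is again an illustrative choice:

```python
# Bagging (Slide 10): L bootstrap replicates of size m, one classifier per
# replicate, combined by unweighted majority vote over integer class labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, L=25, seed=0):
    rng = np.random.default_rng(seed)
    m = len(X)
    ensemble = []
    for _ in range(L):
        idx = rng.integers(0, m, size=m)   # m draws with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def majority_vote(ensemble, X):
    preds = np.stack([h.predict(X) for h in ensemble])             # shape (L, m)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
```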
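Slide 12 gives AdaBoost only as a figure, so the sketch below follows the standard AdaBoost.M1 formulation of Freund and Schapire, which matches the weighting scheme described on Slide 11. Passing pl(x) through sample_weight, rather than resampling, assumes the learner can use the distribution directly:

```python
# AdaBoost.M1 sketch (Slides 11-12). Depth-1 trees serve as weak learners;
# sample_weight hands the distribution p_l(x) to the learner directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, L=50):
    m = len(X)
    p = np.full(m, 1.0 / m)               # initially uniform: 1/m
    hypotheses, betas = [], []
    for _ in range(L):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        wrong = h.predict(X) != y
        eps = p[wrong].sum()              # weighted error of h_l
        if eps <= 0.0 or eps >= 0.5:      # stop if the error leaves (0, 0.5)
            break
        beta = eps / (1.0 - eps)
        p[~wrong] *= beta                 # shrink weights of correct examples
        p /= p.sum()                      # renormalize to obtain p_{l+1}(x)
        hypotheses.append(h)
        betas.append(beta)
    return hypotheses, betas

def boosted_predict(hypotheses, betas, X, classes):
    # Final classifier: weighted vote, each h_l voting with weight log(1/beta_l).
    votes = np.zeros((len(X), len(classes)))
    for h, beta in zip(hypotheses, betas):
        pred = h.predict(X)
        for k, c in enumerate(classes):
            votes[pred == c, k] += np.log(1.0 / beta)
    return np.asarray(classes)[votes.argmax(axis=1)]
```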
Slide 16: Constructing Ensembles
- Manipulating the input features: classifiers are constructed using different subsets of the features
- Works only when there is some redundancy in the features

Slide 17: Constructing Ensembles
- Manipulating the output targets, for use when the number of classes K is large:
  - Generate L binary partitions of the K classes
  - Learn L classifiers for these two-class problems
  - Classify according to the class whose partitions received the most votes (see the sketch after Slide 26)
- Similar to error-correcting codes
- Generally improves performance

Slide 18: Constructing Ensembles
- Injecting randomness:
  - Multiple neural networks with different random initial weights
  - A randomly selected split attribute from among the top 20 in C4.5
  - A randomly selected condition from among the top 20% in FOIL (a Prolog rule learner)
  - Adding Gaussian noise to the input features
  - Making random modifications to the current hypothesis h, then using these classifiers weighted by their posterior probability (accuracy on the training set)

Slide 19: Constructing Ensembles using Neural Networks
- Train multiple neural networks, minimizing both error and correlation with the other networks' predictions
- Use a genetic algorithm to generate multiple, diverse networks
- Have the networks also predict various sub-tasks (e.g., one of the input features)

Slide 20: Constructing Ensembles
- Use several different types of learning algorithms, e.g., decision tree, neural network, nearest neighbor
- Some learners' error rates may be bad (i.e., > 0.5)
- Some learners' predictions may be correlated
- Need to check both using, e.g., cross-validation

Slide 21: Combining Classifiers
- Unweighted vote:
  - Majority vote
  - If the hl produce class probability distributions P(f(x) = k | hl), average them (see the sketch after Slide 26):
      P(f(x) = k) = (1/L) * Σ_{l=1}^{L} P(f(x) = k | hl)
- Weighted vote: classifier weights proportional to accuracy on the training data
- Learning the combination:
  - Gating function (learn the classifier weights)
  - Stacking (learn how to vote)

Slide 22: Why Ensembles Work
- Uncorrelated errors made by individual classifiers can be overcome by voting
- How difficult is it to find a set of uncorrelated classifiers?
- Why can't we find a single classifier that does as well?

Slide 23: Finding Good Ensembles
- Typical hypothesis spaces H are large
- Need a large number (ideally lg(|H|)) of training examples to narrow the search through H
- Typically, the sample S is of size m << lg(|H|)
- The subset of hypotheses in H consistent with S forms a good ensemble

Slide 24: Finding Good Ensembles
- Typical learning algorithms L employ greedy search
- Not guaranteed to find the optimal hypothesis (minimal size and/or minimal error)
- Generating hypotheses using different perturbations of L produces good ensembles

Slide 25: Finding Good Ensembles
- Typically, the hypothesis space H does not contain the target function f
- Weighted combinations of several approximations may represent classifiers outside of H
[Figure: decision surfaces defined by individual learned decision trees vs. the decision surface defined by a vote over the learned decision trees.]

Slide 26: Summary
- Advantages:
  - An ensemble of classifiers typically outperforms any one classifier
- Disadvantages:
  - Difficult to measure the correlation between classifiers from different types of learners
  - Learning time and memory constraints
  - The learned concept is difficult to interpret
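As promised under Slide 17, a minimal sketch of manipulating output targets. It assumes numpy arrays and integer class labels 0..K-1; using L independent random partitions, rather than a designed error-correcting code, is a simplification of mine:

```python
# Manipulating output targets (Slide 17): each of L random binary partitions
# of the K classes defines a two-class problem; at test time every class on
# the winning side of a partition receives one vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_partitions(X, y, K, L=15, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(L):
        side = rng.integers(0, 2, size=K)              # class -> side (0 or 1)
        models.append((DecisionTreeClassifier().fit(X, side[y]), side))
    return models

def classify(models, X, K):
    votes = np.zeros((len(X), K))
    for h, side in models:
        pred = h.predict(X)                            # predicted side per example
        votes += (side[None, :] == pred[:, None])      # vote for classes on that side
    return votes.argmax(axis=1)
```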
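And the unweighted combination rule from Slide 21, P(f(x) = k) = (1/L) Σ_l P(f(x) = k | hl), assuming each hl exposes a scikit-learn-style predict_proba with a shared class ordering:

```python
# Unweighted combination of class probability distributions (Slide 21):
# average P(f(x)=k | h_l) over the L classifiers and take the argmax.
import numpy as np

def combined_distribution(ensemble, X):
    probs = np.stack([h.predict_proba(X) for h in ensemble])   # (L, m, K)
    return probs.mean(axis=0)                                  # (m, K)

def classify_by_average(ensemble, X, classes):
    return np.asarray(classes)[combined_distribution(ensemble, X).argmax(axis=1)]
```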

