Weka
Just do it
Free and Open Source ML Suite
Ian Witten & Eibe Frank
University of Waikato
New Zealand

Overview
• Classifiers, regressors, and clusterers
• Multiple evaluation schemes
• Bagging and boosting
• Feature selection:
  – the right features and data are key to successful learning
• Experimenter
• Visualizer
• Text not up to date.
• They welcome additions.

Learning Tasks
• Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples.
• Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples.
• Clustering: given a set of examples, partition them into “interesting” groups. What scientists want.

Data Format: IRIS
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
Etc.
General form: @ATTRIBUTE attribute-name, followed by REAL or a list of values

J48 = Decision Tree
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)
• First number in parentheses: # of instances under the node
• Number after the slash: # of those instances classified wrong

Cross-validation
• Correctly Classified Instances 143 95.33 %
• Incorrectly Classified Instances 7 4.67 %
• Default is 10-fold cross-validation, i.e.
  – Split the data into 10 equal-sized pieces
  – Train on 9 pieces and test on the remainder
  – Do for all possibilities and average

J48 Confusion Matrix
Old data set from statistics: 50 of each class
  a  b  c   <-- classified as
 49  1  0 | a = Iris-setosa
  0 47  3 | b = Iris-versicolor
  0  3 47 | c = Iris-virginica

Precision, Recall, and Accuracy
• Precision: probability of being correct given your decision.
  – Precision of Iris-setosa is 49/49 = 100%
  – Positive predictive value in medical literature
• Recall: probability of correctly identifying a class.
  – Recall for Iris-setosa is 49/50 = 98%
  – Sensitivity in medical literature
• Accuracy: # right/total = 143/150 ≈ 95%

Other Evaluation Schemes
• Leave-one-out cross-validation
  – n-fold cross-validation where n = number of training instances
• Specific train and test set
  – Allows for exact replication
  – OK if train/test sets are large, e.g. in the 10,000 range.

Bootstrap sampling
• Randomly select n with replacement from n
• Expect about 2/3 to be chosen for training
  – Prob of not chosen = (1-1/n)^n ≈ 1/e.
• Testing on remainder
• Repeat about 30 times and average.
• Avoids partition
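
The bootstrap arithmetic above can be checked with a short sketch. It is written in plain Python rather than with Weka's own tooling, and the names `prob_never_chosen` and `bootstrap_split` are hypothetical helpers introduced only for illustration. Since 1/e ≈ 0.368, roughly 63% of the instances (the "about 2/3" on the slide) should end up in each training sample.

```python
import math
import random


def prob_never_chosen(n):
    """Probability that one given instance is missed in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_split(n, rng):
    """Draw n indices with replacement for training; test on the indices never drawn."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


if __name__ == "__main__":
    n = 150  # size of the iris data set
    print(f"(1 - 1/n)^n = {prob_never_chosen(n):.4f}, 1/e = {math.exp(-1):.4f}")

    rng = random.Random(0)  # fixed seed: an arbitrary choice, only for repeatability
    fractions = []
    for _ in range(30):  # "repeat about 30 times and average"
        train, test = bootstrap_split(n, rng)
        fractions.append(len(set(train)) / n)
    mean_fraction = sum(fractions) / len(fractions)
    print(f"mean fraction of distinct training instances: {mean_fraction:.3f}")
```

In a real evaluation, each of the 30 rounds would train a classifier on `train` and score it on `test`; the sketch only tracks how the instances are divided.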
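
The precision, recall, and accuracy figures quoted for the J48 confusion matrix can be recomputed directly from the matrix. The following is a minimal sketch in plain Python, not Weka's own evaluation code; the function names are hypothetical, and it assumes rows are the actual class and columns the assigned class, as in the "classified as" layout.

```python
CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Rows are the actual class, columns are the class J48 assigned.
CONFUSION = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 3, 47],
]


def precision(matrix, k):
    """Correct predictions of class k divided by all predictions of class k (column sum)."""
    predicted_k = sum(row[k] for row in matrix)
    return matrix[k][k] / predicted_k


def recall(matrix, k):
    """Correct predictions of class k divided by all actual members of class k (row sum)."""
    return matrix[k][k] / sum(matrix[k])


def accuracy(matrix):
    """Diagonal total divided by the total number of instances."""
    right = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return right / total


if __name__ == "__main__":
    for k, name in enumerate(CLASSES):
        p = precision(CONFUSION, k)
        r = recall(CONFUSION, k)
        print(f"{name}: precision={p:.3f} recall={r:.3f}")
    print(f"accuracy={accuracy(CONFUSION):.3f}")
```

Precision reads down a column and recall reads across a row, which is why Iris-setosa can have perfect precision (49/49) while its recall is 49/50.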