BU CS 565 - Classification

Contents

• Lecture outline
• What is classification?
• Why classification?
• Typical applications
• General approach to classification
• Evaluation of classification models
• Supervised vs. unsupervised learning
• Decision trees
• Training dataset
• Output: a decision tree for "buys_computer"
• Constructing decision trees
• Constructing decision trees: Hunt's algorithm
• Decision-tree construction (example)
• Design issues
• Splitting methods
• Selecting the best split
• Selecting the best split: impurity measures
• Range of impurity measures
• Impurity measures
• Computing gain: example
• Is minimizing impurity / maximizing Δ enough?
• Gain ratio
• Constructing decision trees (pseudocode)
• Stopping criteria for tree induction
• Advantages of decision trees
• Example: the C4.5 algorithm
• Practical problems with classification
• Underfitting and overfitting
• Overfitting and underfitting
• Overfitting due to noise
• Overfitting due to insufficient samples
• Overfitting: course of action
• Methods for estimating the error
• Addressing overfitting: Occam's razor
• Addressing overfitting: post-pruning
• Addressing overfitting: pre-pruning
• Decision boundary for decision trees
• Oblique decision trees

Lecture outline

• Classification
• Decision-tree classification

What is classification?

• Classification is the task of learning a target function f that maps an attribute set x to one of a set of predefined class labels y.

Why classification?

• The target function f is known as a classification model.
• Descriptive modeling: the model serves as an explanatory tool for distinguishing between objects of different classes (e.g., a description of which borrowers can pay back their loans).
• Predictive modeling: the model is used to predict the class of a previously unseen record.

Typical applications

• Credit approval
• Target marketing
• Medical diagnosis
• Treatment effectiveness analysis

General approach to classification

• The training set consists of records with known class labels.
• The training set is used to build a classification model.
• The classification model is then applied to the test set, which consists of records with unknown labels.

Evaluation of classification models

• Count the test records that are correctly (and incorrectly) predicted by the classification model, and summarize the counts in a confusion matrix:

                            Predicted class
                            Class = 1    Class = 0
    Actual    Class = 1     f_{11}       f_{10}
    class     Class = 0     f_{01}       f_{00}

• Accuracy = (# correct predictions) / (total # of predictions) = (f_{11} + f_{00}) / (f_{11} + f_{10} + f_{01} + f_{00})
• Error rate = (# wrong predictions) / (total # of predictions) = (f_{10} + f_{01}) / (f_{11} + f_{10} + f_{01} + f_{00})
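To make the two formulas concrete, here is a small Python sketch; it is an illustration rather than material from the slides, and the function name `evaluate` and the 0/1 label encoding are assumptions.

```python
# Tally the four confusion-matrix counts for a binary classifier and
# derive accuracy and error rate from them (labels encoded as 1 and 0).
def evaluate(actual, predicted):
    f11 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    f10 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    f01 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    f00 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    total = f11 + f10 + f01 + f00
    accuracy = (f11 + f00) / total      # correct predictions / all predictions
    error_rate = (f10 + f01) / total    # wrong predictions / all predictions
    return accuracy, error_rate

# 4 of 5 test records predicted correctly -> accuracy 0.8, error rate 0.2
print(evaluate([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
```

Note that accuracy = 1 - error rate, so the two quantities carry the same information.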
Supervised vs. unsupervised learning

• Supervised learning (classification)
  – Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.
  – New data are classified based on the training set.
• Unsupervised learning (clustering)
  – The class labels of the training data are unknown.
  – Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Decision trees

• A decision tree is a flow-chart-like tree structure:
  – an internal node denotes a test on an attribute;
  – a branch represents an outcome of the test;
  – leaf nodes represent class labels or class distributions.
• Decision-tree generation consists of two phases:
  – Tree construction: at the start, all the training examples are at the root; the examples are then partitioned recursively based on selected attributes.
  – Tree pruning: identify and remove branches that reflect noise or outliers.
• Using a decision tree: to classify an unknown sample, test the attribute values of the sample against the decision tree.

Training dataset

(The training-data table appears on the slide; it is not part of this preview.)

Output: a decision tree for "buys_computer"

    age?
    ├── <=30   → student?
    │              ├── no  → no
    │              └── yes → yes
    ├── 30..40 → yes
    └── >40    → credit rating?
                   ├── excellent → no
                   └── fair      → yes

Constructing decision trees

• Exponentially many decision trees can be constructed from a given set of attributes.
• Finding the most accurate tree is NP-hard.
• In practice, greedy algorithms are used: grow a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data.

Constructing decision trees: Hunt's algorithm

• X_t: the set of training records for node t; y = {y_1, ..., y_c}: the set of class labels.
• Step 1: if all records in X_t belong to the same class y_t, then t is a leaf node labeled y_t.
• Step 2: if X_t contains records that belong to more than one class,
  – select an attribute test condition to partition the records into smaller subsets;
  – create a child node for each outcome of the test condition;
  – apply the algorithm recursively to each child.

Decision-tree construction (example)

(A worked example appears on the slides; it is not part of this preview.)

Design issues

• How should the training records be split?
• How should the splitting procedure stop?

Splitting methods

• Binary attributes
• Nominal attributes
• Ordinal attributes
• Continuous attributes

Selecting the best split

• p(i|t): the fraction of records belonging to class i.
• The best split is selected based on the degree of impurity of the child nodes:
  – the class distribution (0, 1) has high purity;
  – the class distribution (0.5, 0.5) has the smallest purity (highest impurity).
• Intuition: high purity → small value of the impurity measure → better split.

Selecting the best split: impurity measures

• p(i|t): the fraction of records associated with node t that belong to class i.

    Entropy(t) = -\sum_{i=1}^{c} p(i|t) \log_2 p(i|t)

    Gini(t) = 1 - \sum_{i=1}^{c} [p(i|t)]^2

    Classification error(t) = 1 - \max_i p(i|t)

Range of impurity measures

(The comparison plot appears on the slide; it is not part of this preview.)

Impurity measures

• In general, the different impurity measures are consistent with one another.
• The gain Δ of a test condition compares the impurity of the parent node with the weighted impurity of the child nodes:

    \Delta = I(parent) - \sum_{j=1}^{k} \frac{N(v_j)}{N} I(v_j)

  where N is the number of records at the parent node, v_1, ..., v_k are the child nodes produced by the split, and N(v_j) is the number of records at child v_j.
• Maximizing the gain is equivalent to minimizing the weighted average impurity of the child nodes.
• If I(·) = Entropy(·), then Δ_info is called the information gain.

Computing gain: example

(A worked example appears on the slides; it is not part of this preview.)

Is minimizing impurity / maximizing Δ enough?

• No: impurity measures favor attributes with a large number of values.
• A test condition with a large number of outcomes may not be desirable, because the number of records in each partition can be too small to support reliable predictions.

Gain ratio

• Gain ratio = Δ_info / SplitInfo, where

    SplitInfo = -\sum_{i=1}^{k} p(v_i) \log p(v_i)

  and k is the total number of splits.
• If each split receives the same number of records, then p(v_i) = 1/k and SplitInfo = \log k.
• A large number of splits ⇒ large SplitInfo ⇒ small gain ratio.

Constructing decision trees (pseudocode)

GenDecTree(Sample S, Features F)
1. If stopping_condition(S, F) = true then
   a. …

(The preview ends here, partway through the pseudocode.)
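Since the preview cuts GenDecTree off after its first line, what follows is a hedged reconstruction rather than the slides' pseudocode: a minimal, self-contained Python sketch of the greedy procedure described above (Hunt's algorithm driven by impurity gain, with multiway splits on nominal attributes). The names `gen_dec_tree`, `find_best_split`, and `partition`, and the dict-based tree representation, are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(t) = -sum_i p(i|t) * log2 p(i|t)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini(t) = 1 - sum_i p(i|t)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    # Classification error(t) = 1 - max_i p(i|t)
    return 1.0 - max(Counter(labels).values()) / len(labels)

def gain(parent_labels, child_label_lists, impurity=entropy):
    # Delta = I(parent) - sum_j (N(v_j)/N) * I(v_j); with entropy as I(.)
    # this is the information gain.
    n = len(parent_labels)
    weighted = sum(len(ch) / n * impurity(ch) for ch in child_label_lists)
    return impurity(parent_labels) - weighted

def partition(records, labels, feature):
    # Multiway split on a nominal attribute: one child per attribute value.
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[feature], []).append((rec, lab))
    return groups

def find_best_split(records, labels, features):
    # Greedy, locally optimal choice: the attribute with the highest gain.
    def split_gain(f):
        groups = partition(records, labels, f)
        return gain(labels, [[lab for _, lab in g] for g in groups.values()])
    return max(features, key=split_gain)

def gen_dec_tree(records, labels, features):
    # Stopping condition (Hunt's Step 1, plus a fallback): pure node, or no
    # attributes left to test -> return a leaf labeled with the majority class.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Hunt's Step 2: select a test condition, create one child per outcome,
    # and apply the algorithm recursively to each child.
    best = find_best_split(records, labels, features)
    remaining = [f for f in features if f != best]
    tree = {"test": best, "children": {}}
    for value, group in partition(records, labels, best).items():
        recs, labs = zip(*group)
        tree["children"][value] = gen_dec_tree(list(recs), list(labs), remaining)
    return tree

# Toy usage: one nominal attribute, two records.
# gen_dec_tree([{"student": "no"}, {"student": "yes"}], ["no", "yes"], ["student"])
# -> {"test": "student", "children": {"no": "no", "yes": "yes"}}
```

Swapping `entropy` for `gini` or `classification_error` changes only the impurity measure; the gain-ratio correction from the previous slide would amount to dividing each attribute's gain by its SplitInfo term before taking the maximum.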

