Unformatted text preview:

ICS 278: Data Mining Lectures 9 and 10: Classification AlgorithmsNotationClassificationClassification v. RegressionDecision Region TerminlogyProbabilistic view of ClassificationExample of Probabilistic ClassificationSlide 8Slide 9Decision Regions and Bayes Error RateSlide 11Procedure for optimal Bayes classifier3 categories of classifiers in generalWhich type of classifier is appropriate?Example: classifying spam emailExamples of classifiersNaïve Bayes ClassifiersLink between Logistic Regression and Naïve BayesImbalanced Class DistributionsRanking and Lift CurvesSlide 21ROC plotsROC ExampleROC PlotExample: Link Prediction in Coauthor Graphs O Madadhain, Hutchins, Smyth, SIGKDD, 2005Evaluation MetricsLink Prediction EvaluationSlide 28Slide 29Lift Curves for Different ModelsInterpretation of Ranking at Top of Ranked ListSlide 32CalibrationCalibration in Probabilistic PredictionLinear DiscriminantsNearest Neighbor ClassifiersLocal Decision BoundariesFinding the Decision BoundariesSlide 39Slide 40Overall Boundary = Piecewise LinearExample: Choosing k in kNN (example from G. Ridgeway, 2003)Decision Tree ClassifiersDecision Tree ExampleSlide 45Slide 46Slide 47Slide 48Decision Tree PseudocodeBinary split selection criteriaComputational Complexity for a Binary TreeSplitting on a nominal attributeHow to Choose the Right-Sized Tree?Choosing a Good Tree for PredictionExample: Spam Email ClassificationSlide 56Slide 57Treating Missing Data in TreesOther Issues with Classification TreesWhy Trees are widely used in PracticeLimitations of TreesDecision Trees are not stableExample of Tree Instability 2 trees fit to 2 splits of data, from G. Ridgeway, 2003Model AveragingSlide 65Bagging for Combining ClassifiersSlide 67Slide 68Support Vector MachinesExperiments by Komarek and Moore, 2005Accuracies and Training Time Komarek and Moore, 2005Slide 72Summary on ClassifiersSlide 74Slide 75Slide 76Slide 77Slide 78Slide 79Slide 80Slide 81Software (same as for Regression)ReadingBackup Slides (not used)Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineICS 278: Data MiningLectures 9 and 10: Classification AlgorithmsPadhraic SmythDepartment of Information and Computer ScienceUniversity of California, IrvineData Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineNotation•Variables X, C….. with values x, y (lower case)•Vectors indicated by X •Components of X indicated by Xj with values xj •“Matrix” data set D with n rows and p columns–jth column contains values for variable Xj –ith row contains a vector of measurements on object i, indicated by x(i)–The jth measurement value for the ith object is xj(i)•Unknown parameter for a model = –Can also use other Greek letters, like ew–Vector of parameters = Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineClassification•Predictive modeling: predict Y given X–Y is real-valued => regression–Y is categorical => classification•Often use C rather than Y to indicate the “class variable”•Classification–Many applications: speech recognition, document classification, OCR, loan approval, face recognition, etcData Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineClassification v. Regression•Similar in many ways…–both learn a mapping from X to C or Y–Both sensitive to dimensionality of X –Generalization to new data is important in both•Test error versus model complexity–Many models can be used for either classification or regression, e.g.,•trees, neural networks•Most important differences–Categorical Y versus real-valued Y–Different score functions•E.g., classification error versus squared errorData Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineDecision Region Terminlogy2 3 4 5 6 7 8 9 10-10123456Feature 1Feature 2TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACEDecisionRegion 2 DecisionRegion 1 DecisionBoundaryData Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineProbabilistic view of Classification•Notation: let there be K classes c1,…..cK•Class marginals: p(ck) = probability of class k•Class-conditional probabilities p( x | ck ) = probability of x given ck , k = 1,…K•Posterior class probabilities (by Bayes rule) p( ck | x ) = p( x | ck ) p(ck) / p(x) , k = 1,…K where p(x) =  p( x | cj ) p(cj) In theory this is all we need….in practice this may not be best approach.Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineExample of Probabilistic Classificationp( x | c1 )p( x | c2 )Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineExample of Probabilistic Classificationp( x | c1 )p( x | c2 )p( c1 | x )10.50Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineExample of Probabilistic Classificationp( x | c1 )p( x | c2 )p( c1 | x )10.50Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineDecision Regions and Bayes Error Ratep( x | c1 )p( x | c2 )Class c1Class c2Class c1Class c2Class c2Optimal decision regions = regions where 1 class is more likelyOptimal decision regions  optimal decision boundariesData Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineDecision Regions and Bayes Error Ratep( x | c1 )p( x | c2 )Class c1Class c2Class c1Class c2Class c2Optimal decision regions = regions where 1 class is more likelyOptimal decision regions  optimal decision boundariesBayes error rate = fraction of examples misclassified by optimal classifier = shaded area above (see equation 10.3 in text)Data Mining Lectures Lectures 9/10: Classification Padhraic Smyth, UC IrvineProcedure for optimal Bayes classifier•For each class learn a model p( x | ck )–E.g., each class is multivariate Gaussian with its own mean and covariance •Use Bayes rule to obtain p( ck | x ) => this yields the optimal decision regions/boundaries => use these decision regions/boundaries for classification•Correct in theory…. but practical problems include:–How do we model p( x | ck ) ?–Even if we know the model for p( x | ck ), modeling a distribution


View Full Document

UCI ICS 278 - Classification Algorithms

Download Classification Algorithms
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Classification Algorithms and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Classification Algorithms 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?