ICS 278: Data Mining
Lectures 7 and 8: Classification Algorithms
Padhraic Smyth
Department of Information and Computer Science
University of California, Irvine

Outline
•Notation
•Classification
•Classification v. Regression
•Decision Region Terminology
•Probabilistic view of Classification
•Example of Probabilistic Classification
•Decision Regions and Bayes Error Rate
•Procedure for optimal Bayes classifier
•Three types of classifiers
•Which type of classifier is appropriate?
•Example: classifying spam email
•Classes of classifiers
•Naïve Bayes Classifiers
•Announcements
•Link between Logistic Regression and Naïve Bayes
•Linear Discriminant Classifiers
•Nearest Neighbor Classifiers
•Local Decision Boundaries
•Finding the Decision Boundaries
•Overall Boundary = Piecewise Linear
•Decision Tree Classifiers
•Decision Tree Example
•Decision Trees are not stable
•Decision Tree Pseudocode
•Binary split selection criteria
•Computational Complexity for a Binary Tree
•Splitting on a nominal attribute
•How to Choose the Right-Sized Tree?
•Choosing a Good Tree for Prediction
•Example: Spam Email Classification
•Treating Missing Data in Trees
•Other Issues with Classification Trees
•Why Trees are widely used in Practice
•Limitations of Trees
•Evaluating Classification Results (in general)
•Bagging for Combining Classifiers
•Support Vector Machines (will be discussed again later)
•Summary on Classifiers
•Software (same as for Regression)
•Reading

Notation
•Variables X, Y, … with values x, y (lower case)
•Vectors indicated by boldface X
•Components of X indicated by Xj with values xj
•"Matrix" data set D with n rows and p columns
 –jth column contains the values of variable Xj
 –ith row contains a vector of measurements on object i, indicated by x(i)
 –The jth measurement value for the ith object is xj(i)
•Unknown parameter for a model = θ
 –Other Greek letters can also be used
•Vector of parameters = θ (boldface)
(A short code sketch of this data-matrix convention follows the Classification slide below.)

Classification
•Predictive modeling: predict Y given X
 –Y is real-valued => regression
 –Y is categorical => classification
•Classification has many applications: speech recognition, document classification, OCR, loan approval, face recognition, etc.
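The following sketch is not from the original slides; it is a minimal illustration, assuming NumPy, of the data-matrix notation above: D is an n × p array, row i is the measurement vector x(i), column j holds the values of variable Xj, and entry (i, j) is xj(i). The numbers and variable names are made up for illustration.

    import numpy as np

    # Hypothetical data matrix D: n = 4 objects (rows), p = 3 variables (columns)
    D = np.array([
        [5.1, 3.5, 1.4],
        [4.9, 3.0, 1.4],
        [6.3, 3.3, 6.0],
        [5.8, 2.7, 5.1],
    ])

    n, p = D.shape       # n rows (objects), p columns (variables)
    x_i = D[1, :]        # x(2): measurement vector for the 2nd object (0-indexed row 1)
    X_j = D[:, 0]        # all n values of variable X1 (0-indexed column 0)
    x_ij = D[1, 0]       # x1(2): value of variable X1 for the 2nd object
    print(n, p, x_i, X_j, x_ij)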
Classification v. Regression
•Similar in many ways…
 –Both learn a mapping from X to Y
 –Both are sensitive to the dimensionality of X
 –Generalization to new data is important in both
  •Test error versus model complexity
 –Many models can be used for either classification or regression, e.g.,
  •Trees, neural networks
•Most important differences
 –Categorical Y versus real-valued Y
 –Different score functions
  •E.g., classification error versus squared error

Decision Region Terminology
[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), showing Decision Region 1, Decision Region 2, and the Decision Boundary between them]

Probabilistic view of Classification
•Notation: let there be K classes c1, …, cK
•Class marginals: p(ck) = probability of class k
•Class-conditional probabilities: p( x | ck ) = probability of x given ck, k = 1, …, K
•Posterior class probabilities (by Bayes rule):
  p( ck | x ) = p( x | ck ) p(ck) / p(x), k = 1, …, K
  where p(x) = Σj p( x | cj ) p(cj)
•In theory this is all we need… in practice this may not be the best approach.

Example of Probabilistic Classification
[Figures: class-conditional densities p( x | c1 ) and p( x | c2 ), and the resulting posterior probability p( c1 | x ), plotted on a scale from 0 to 1 with 0.5 marked]

Decision Regions and Bayes Error Rate
[Figure: overlapping class-conditional densities p( x | c1 ) and p( x | c2 ), with the x-axis partitioned into alternating regions labeled Class c1 and Class c2; the overlap area is shaded]
•Optimal decision regions = regions where one class is more likely
•Optimal decision regions => optimal decision boundaries
•Bayes error rate = fraction of examples misclassified by the optimal classifier = shaded (overlap) area in the figure (see equation 10.3 in the text)

Procedure for optimal Bayes classifier
•For each class learn a model p( x | ck )
 –E.g., each class is multivariate Gaussian with its own mean and covariance
•Use Bayes rule to obtain p( ck | x )
 => this yields the optimal decision regions/boundaries
 => use these decision regions/boundaries for classification
•Correct in theory… but practical problems include:
 –How do we model p( x | ck )?
 –Even if we know the model for p( x | ck ), modeling a distribution or density is very difficult in high dimensions (e.g., p = 100)
•Alternative approach: model the decision boundaries directly
(A code sketch of this procedure appears after the next slide.)

Three types of classifiers
•Generative (or class-conditional) classifiers:
 –Learn models for p( x | ck ), use Bayes rule to find decision boundaries
 –Examples: naïve Bayes models, Gaussian classifiers
•Regression (or posterior class probabilities):
 –Learn a model for p( ck | x ) directly
 –Examples: logistic regression (see lectures 5/6), neural networks
•Discriminative classifiers
 –No probabilities
 –Learn the decision boundaries directly
 –Examples:
  •Linear
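The sketch below is not from the lecture; it is a minimal illustration, assuming NumPy and SciPy are available, of the optimal-Bayes-classifier procedure above: fit one multivariate Gaussian p( x | ck ) per class, estimate the class priors p(ck) from class frequencies, and apply Bayes rule to obtain posterior probabilities p( ck | x ). All function names and the toy data are illustrative, not part of the slides.

    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_gaussian_classes(X, y):
        """Fit one multivariate Gaussian p(x | ck) per class, plus priors p(ck)."""
        priors, models = {}, {}
        for c in np.unique(y):
            Xc = X[y == c]
            priors[c] = len(Xc) / len(X)               # p(ck) = class frequency
            models[c] = multivariate_normal(
                mean=Xc.mean(axis=0),                  # class mean vector
                cov=np.cov(Xc, rowvar=False))          # class covariance matrix
        return priors, models

    def posteriors(x, priors, models):
        """Bayes rule: p(ck | x) = p(x | ck) p(ck) / sum_j p(x | cj) p(cj)."""
        joint = {c: models[c].pdf(x) * priors[c] for c in models}
        p_x = sum(joint.values())                      # p(x), the normalizing constant
        return {c: jc / p_x for c, jc in joint.items()}

    # Toy two-class problem in two dimensions
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),    # class 0
                   rng.normal([3.0, 3.0], 1.0, size=(50, 2))])   # class 1
    y = np.array([0] * 50 + [1] * 50)
    priors, models = fit_gaussian_classes(X, y)
    print(posteriors(np.array([1.5, 1.5]), priors, models))      # p(ck | x) for a new point

Predicting the class with the largest posterior yields the optimal decision regions described above; the practical difficulty noted on the slide is that estimating a full mean and covariance per class becomes unreliable when the dimensionality p is large relative to the sample size.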

