INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for CHAPTER 10: Linear Discrimination

Likelihood- vs. Discriminant-based Classification
• Likelihood-based: Assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x)
• Discriminant-based: Assume a model for gi(x|Φi); no density estimation
• Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries

Linear Discriminant
• Linear discriminant: gi(x|wi, wi0) = wi^T x + wi0 = Σj wij xj + wi0
• Advantages:
  - Simple: O(d) space/computation
  - Knowledge extraction: a weighted sum of attributes; positive/negative weights and their magnitudes can be interpreted (e.g., credit scoring)
  - Optimal when the p(x|Ci) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable

Generalized Linear Model
• Quadratic discriminant: gi(x|Wi, wi, wi0) = x^T Wi x + wi^T x + wi0
• Higher-order (product) terms, e.g., z1 = x1, z2 = x2, z3 = x1^2, z4 = x2^2, z5 = x1 x2
• Map from x to z using nonlinear basis functions and use a linear discriminant in z-space

Two Classes
• g(x) = g1(x) − g2(x) = (w1 − w2)^T x + (w10 − w20) = w^T x + w0
• Choose C1 if g(x) > 0, and C2 otherwise

Geometry
• [Figure: geometry of the two-class linear discriminant; w is normal to the hyperplane g(x) = 0, and |g(x)|/||w|| is the distance of x to the hyperplane.]

Multiple Classes
• gi(x|wi, wi0) = wi^T x + wi0
• Choose Ci if gi(x) = maxj gj(x); classes are linearly separable

Pairwise Separation
• gij(x|wij, wij0) = wij^T x + wij0
• Choose Ci if gij(x) > 0 for all j ≠ i

From Discriminants to Posteriors
• When p(x|Ci) ~ N(μi, Σ) with a shared covariance matrix, the posterior is the sigmoid of a linear function: P(C1|x) = sigmoid(w^T x + w0)
• Choose C1 if P(C1|x) > 0.5, i.e., if the log odds log[P(C1|x)/(1 − P(C1|x))] = w^T x + w0 > 0

Sigmoid (Logistic) Function
• sigmoid(a) = 1/(1 + exp(−a))
• [Figure: the sigmoid function maps the real line to (0, 1) and crosses 0.5 at a = 0.]

Gradient-Descent
• E(w|X) is the error with parameters w on sample X; we seek w* = arg minw E(w|X)
• Gradient: ∇w E = [∂E/∂w1, ..., ∂E/∂wd]^T
• Gradient-descent starts from a random w and updates w iteratively in the negative direction of the gradient: Δwi = −η ∂E/∂wi, wi ← wi + Δwi
• [Figure: one gradient-descent step from w^t to w^(t+1) with step size η, decreasing the error from E(w^t) to E(w^(t+1)).]

Logistic Discrimination
• Two classes: assume the log likelihood ratio is linear: log[p(x|C1)/p(x|C2)] = w^T x + w0
• Then y = P(C1|x) = sigmoid(w^T x + w0)

Training: Two Classes
• Given a sample X = {x^t, r^t} with r^t ∈ {0, 1}, let y^t = P(C1|x^t) = sigmoid(w^T x^t + w0)
• Maximizing the likelihood is equivalent to minimizing the cross-entropy error E(w, w0|X) = −Σt [r^t log y^t + (1 − r^t) log(1 − y^t)]

Training: Gradient-Descent
• Δwj = η Σt (r^t − y^t) xj^t, Δw0 = η Σt (r^t − y^t); a code sketch of this loop follows below
• [Figures: the error decreasing during training; the fitted posterior after 10, 100, and 1000 iterations.]

K>2 Classes
• Softmax: yi = P(Ci|x) = exp(wi^T x + wi0) / Σk exp(wk^T x + wk0)
• Cross-entropy error: E({wi, wi0}|X) = −Σt Σi ri^t log yi^t, again minimized by gradient descent

Example
• [Figure: logistic discrimination example on a two-dimensional dataset.]

Generalizing the Linear Model
• Quadratic: log[p(x|Ci)/p(x|CK)] = x^T Wi x + wi^T x + wi0
• Sum of basis functions: log[p(x|Ci)/p(x|CK)] = wi^T φ(x) + wi0, where the φ(x) are basis functions
  - Kernels in SVM
  - Hidden units in neural networks
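As a concrete illustration of the two-class training procedure on the slides above, here is a minimal NumPy sketch (not from the book; the function names and hyperparameter values are illustrative) that minimizes the cross-entropy error by the batch gradient-descent updates Δwj = η Σt (r^t − y^t) xj^t and Δw0 = η Σt (r^t − y^t):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.1, epochs=1000):
    """Two-class logistic discrimination by gradient descent.
    X: (N, d) sample matrix; r: (N,) labels in {0, 1}."""
    N, d = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.01, 0.01, size=d)  # start from a small random w
    w0 = 0.0
    for _ in range(epochs):
        y = sigmoid(X @ w + w0)           # y^t = P(C1 | x^t)
        # Move in the negative gradient direction of the cross-entropy error:
        w += eta * (X.T @ (r - y))        # Δw_j = η Σ_t (r^t − y^t) x_j^t
        w0 += eta * np.sum(r - y)         # Δw_0 = η Σ_t (r^t − y^t)
    return w, w0

# Usage: predict C1 when sigmoid(x @ w + w0) > 0.5, C2 otherwise.
```

With a full-batch update like this, η may need to be scaled down for large samples; per-instance (stochastic) updates are a common variant of the same rule.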
Discrimination by Regression
• Classes are NOT mutually exclusive and exhaustive
• Model r^t = y^t + ε with ε ~ N(0, σ^2) and y^t = sigmoid(w^T x^t + w0); maximizing the likelihood then corresponds to minimizing the sum of squared errors

Optimal Separating Hyperplane
(Cortes and Vapnik, 1995; Vapnik, 1995)
• Given X = {x^t, r^t} with r^t = +1 if x^t ∈ C1 and r^t = −1 if x^t ∈ C2, find w and w0 such that r^t (w^T x^t + w0) ≥ +1 for all t

Margin
• Distance from the discriminant to the closest instances on either side
• Distance of x to the hyperplane is |w^T x + w0| / ||w||
• We require r^t (w^T x^t + w0) / ||w|| ≥ ρ for all t
• For a unique solution, fix ρ||w|| = 1; then, to maximize the margin, minimize ||w||
• [Figure: the separating hyperplane, its margin, and the instances on the margin boundaries.]
• The constrained problem is solved by introducing Lagrange multipliers α^t and maximizing the dual
• Most α^t are 0 and only a small number have α^t > 0; they are the support vectors

Soft Margin Hyperplane
• Not linearly separable: relax the constraints to r^t (w^T x^t + w0) ≥ 1 − ξ^t with slack variables ξ^t ≥ 0
• Soft error: Σt ξ^t
• New primal: Lp = (1/2)||w||^2 + C Σt ξ^t − Σt α^t [r^t (w^T x^t + w0) − 1 + ξ^t] − Σt μ^t ξ^t

Kernel Machines
• Preprocess input x by basis functions: z = φ(x), g(z) = w^T z, so g(x) = w^T φ(x)
• The SVM solution: w = Σt α^t r^t z^t = Σt α^t r^t φ(x^t), so g(x) = Σt α^t r^t φ(x^t)^T φ(x) = Σt α^t r^t K(x^t, x)

Kernel Functions
• Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q
• Radial-basis functions: K(x^t, x) = exp(−||x^t − x||^2 / (2s^2))
• Sigmoidal functions: K(x^t, x) = tanh(2 x^T x^t + 1)
(Cherkassky and Mulier, 1998)

SVM for Regression
• Use a linear model (possibly kernelized): f(x) = w^T x + w0
• Use the ε-sensitive error function, which ignores deviations smaller than ε: e_ε(r^t, f(x^t)) = 0 if |r^t − f(x^t)| < ε, and |r^t − f(x^t)| − ε otherwise
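To make the kernel machine concrete, here is a minimal NumPy sketch of the kernels listed above and of the discriminant g(x) = Σt α^t r^t K(x^t, x) + w0. This is not code from the book: the names are illustrative, and the multipliers α^t, the support vectors, and w0 are assumed to have been obtained already by solving the dual.

```python
import numpy as np

def polynomial_kernel(xt, x, q=2):
    # K(x^t, x) = (x^T x^t + 1)^q  -- polynomial of degree q
    return (np.dot(x, xt) + 1.0) ** q

def rbf_kernel(xt, x, s=1.0):
    # K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))  -- radial-basis function
    return np.exp(-np.sum((xt - x) ** 2) / (2.0 * s ** 2))

def svm_discriminant(x, support_X, support_r, alphas, w0, kernel=rbf_kernel):
    """g(x) = sum_t alpha^t r^t K(x^t, x) + w0, summed over support vectors.
    Only instances with alpha^t > 0 contribute, which keeps prediction cheap."""
    g = sum(a * r * kernel(xt, x)
            for a, r, xt in zip(alphas, support_r, support_X))
    return g + w0  # choose C1 if g(x) > 0, C2 otherwise
```

Because only the support vectors enter the sum, the classifier's cost at prediction time depends on the number of support vectors, not on the full training-set size.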