Linear Models for Classification: Introduction
Sargur N. Srihari
University at Buffalo, State University of New York, USA

Topics
• Regression vs classification
• Linear classification models
• Converting probabilistic regression output to classification output
• Three classes of classification models

Regression vs Classification
• Regression:
  – assign input vector x to one or more continuous target variables t
• Classification:
  – assign input vector x to one of K discrete classes Ck, k = 1, ..., K
• Ordinal regression:
  – the discrete classes have an ordering
  – e.g., relevance score regression

Linear Classification Models
• Common scenario: the classes are disjoint
  – each input is assigned to exactly one class
• The input space is divided into decision regions
• Decision surfaces are linear functions of the input x
  – a (D−1)-dimensional hyperplane within the D-dimensional input space
  – a straight line is 1-D in 2-D; a plane is 2-D in 3-D
• Data sets whose classes can be separated exactly by a linear decision surface are called linearly separable
[Figure: scatter plot of two classes separated by a linear decision boundary]

Converting Regression to Class Output
• In regression the target variable t is a real number (or a vector of real numbers t)
• In classification the values of t represent class labels

Probabilistic Model of Classification
• Two classes: a binary representation is convenient
  – discrete t ∈ {0, 1}: t = 1 represents class C1, t = 0 represents class C2
  – the value of t can be interpreted as the probability that the class is C1, with the probabilities taking only the extreme values 0 and 1
• For K > 2 classes, use a 1-of-K coding scheme
  – t is a vector of length K
  – e.g., if K = 5, a pattern of class 2 has t = (0, 1, 0, 0, 0)^T
  – the value of tk is interpreted as the probability of class Ck
  – if the tk take real values, we can represent different class probabilities
• Non-probabilistic models, e.g., nearest neighbor, use other choices of target variable representation

Two Approaches to Classification
1.
Discriminant function
  – directly assign x to a specific class
  – e.g., Fisher's linear discriminant, perceptron
2. Probabilistic models
  – model p(Ck|x) in an inference stage (directly, or via p(x|Ck))
  – use it to make optimal decisions
• Separating inference from decision is better:
  – minimize risk (the loss function can change, e.g., in a financial application)
  – reject option (minimize expected loss)
  – compensate for unbalanced data: train on modified balanced data and scale by the class fractions
  – combine models

Two Probabilistic Models for Classification
• Model p(Ck|x) in an inference stage and use it to make optimal decisions
• Two approaches to computing p(Ck|x):
  – Generative
    • model the class-conditional densities p(x|Ck) together with the prior probabilities p(Ck)
    • then use Bayes' rule to compute the posterior p(Ck|x) = p(x|Ck)p(Ck)/p(x)
  – Discriminative
    • directly model the conditional probabilities p(Ck|x)

Converting a Linear Regression Model to Linear Classification
• The linear regression model y(x, w) is a linear function of the parameters w
• In the simplest case it is also a linear function of the input variables x
  – thus it has the form y(x) = w^T x + w0
• For classification we wish to obtain a discrete output, or posterior probabilities in the range (0, 1)
• Use a generalized linear model y(x) = f(w^T x + w0)

Generalized Linear Model
y(x) = f(w^T x + w0)
• f(·) is known as the activation function
• The decision surfaces correspond to y(x) = constant, i.e., w^T x + w0 = constant
  – so the decision boundaries are linear in the feature space x even if f(·) is nonlinear, e.g., a step function
  – hence the name generalized linear model
• However, the model is no longer linear in the parameters w due to the presence of f(·)
  – this leads to more complex models for classification than for regression

Overview of Linear Classifiers
1. Discriminant functions
  – two-class and multi-class
  – least squares for classification
  – Fisher's linear discriminant
  – perceptron algorithm
2.
Probabilistic generative models
  – continuous inputs and maximum likelihood
  – discrete inputs, exponential family
3. Probabilistic discriminative models
  – logistic regression for single class and multi-class
  – Laplace approximation
  – Bayesian logistic regression
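The 1-of-K target coding described above can be sketched in a few lines of NumPy. The function name and array layout here are illustrative assumptions, not from the lecture:

```python
import numpy as np

def one_of_k(labels, K):
    """Encode integer class labels (0-based) as 1-of-K target vectors.

    Each row t has t[k] = 1 for the true class and 0 elsewhere,
    matching the coding t = (0, 1, 0, 0, 0)^T for class 2 of K = 5
    (classes are indexed from 1 in the slides, from 0 here).
    """
    labels = np.asarray(labels)
    T = np.zeros((labels.size, K))
    T[np.arange(labels.size), labels] = 1.0  # set the "hot" entry per row
    return T

# Class 2 of K = 5 (0-based index 1) reproduces the coding from the text.
print(one_of_k([1], K=5))  # [[0. 1. 0. 0. 0.]]
```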
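A two-class generalized linear model y(x) = f(w^T x + w0) with a step-function activation can be sketched as follows. The weights and data points are arbitrary assumptions, chosen only to show that the decision boundary w^T x + w0 = 0 stays linear in x even though f(·) is nonlinear:

```python
import numpy as np

def glm_classify(X, w, w0):
    """Two-class generalized linear model y(x) = f(w^T x + w0).

    f is the step function: assign class 1 where w^T x + w0 >= 0,
    class 0 otherwise. The boundary w^T x + w0 = 0 is a hyperplane,
    so the decision surface is linear despite the nonlinear f.
    """
    a = X @ w + w0               # linear activation w^T x + w0
    return (a >= 0).astype(int)  # step-function activation f(a)

# Assumed weights: the boundary is the line x1 + x2 - 1 = 0 in 2-D.
w, w0 = np.array([1.0, 1.0]), -1.0
X = np.array([[0.0, 0.0], [2.0, 2.0], [0.5, 0.5]])
print(glm_classify(X, w, w0))  # [0 1 1]
```

The point (0.5, 0.5) lies exactly on the boundary and is assigned to class 1 by the `>=` convention.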
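The generative route above — model p(x|Ck) and p(Ck), then apply Bayes' rule for the posterior — can be sketched for two one-dimensional Gaussian classes. The means, shared variance, and priors are made-up assumptions for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_c1(x, mu1, mu2, sigma, prior1):
    """Posterior via Bayes' rule:
    p(C1|x) = p(x|C1)p(C1) / (p(x|C1)p(C1) + p(x|C2)p(C2))."""
    p1 = gaussian_pdf(x, mu1, sigma) * prior1          # p(x|C1) p(C1)
    p2 = gaussian_pdf(x, mu2, sigma) * (1.0 - prior1)  # p(x|C2) p(C2)
    return p1 / (p1 + p2)

# Assumed class-conditional densities: N(-1, 1) for C1, N(+1, 1) for C2,
# with equal priors. Midway between the means the posterior is 0.5.
print(round(posterior_c1(0.0, -1.0, 1.0, 1.0, 0.5), 3))  # 0.5
```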