Linear Models for Classification: Introduction
Sargur N. Srihari
University at Buffalo, State University of New York, USA

Topics
• Regression vs classification
• Linear classification models
• Converting probabilistic regression output to classification output
• Three classes of classification models

Regression vs Classification
• Regression:
  – assign input vector x to one or more continuous target variables t
• Classification:
  – assign input vector x to one of K discrete classes Ck, k = 1, ..., K
• Ordinal regression:
  – the discrete classes have an ordering
  – e.g., relevance score regression

Linear Classification Models
• Common scenario: the classes are disjoint
  – each input is assigned to exactly one class
• The input space is divided into decision regions
• Decision surfaces are linear functions of the input x
  – a (D−1)-dimensional hyperplane within the D-dimensional input space
  – a straight line is 1-D in 2-D; a plane is 2-D in 3-D
• Data sets whose classes can be separated exactly by a linear decision surface are called linearly separable
[Figure: scatter plot of two classes separated by a linear decision boundary]

Converting Regression to Class Output
• In regression the target variable t is a real number (or a vector of real numbers t)
• In classification the values of t represent class labels

Probabilistic Model of Classification
• Two classes: a binary representation is convenient
  – discrete t ∈ {0, 1}: t = 1 represents class C1, t = 0 represents class C2
  – the value of t can be interpreted as the probability that the class is C1, with the probabilities taking only the extreme values 0 and 1
• For K > 2 classes, use a 1-of-K coding scheme
  – t is a vector of length K
  – e.g., if K = 5, a pattern of class 2 has t = (0, 1, 0, 0, 0)^T
  – the value of tk is interpreted as the probability of class Ck
  – if the tk take real values, we can represent different class probabilities
• Non-probabilistic models, e.g., nearest neighbor, use other choices of target variable representation

Two Approaches to Classification
1.
Discriminant function
  – directly assign x to a specific class
  – e.g., Fisher's linear discriminant, perceptron
2. Probabilistic models
  – model p(Ck|x) in an inference stage (directly, or via p(x|Ck))
  – use it to make optimal decisions
• Separating inference from decision is better:
  – minimize risk (the loss function can change, e.g., in a financial application)
  – reject option (minimize expected loss)
  – compensate for unbalanced data: train on modified balanced data and scale by the class fractions
  – combine models

Two Probabilistic Models for Classification
• Model p(Ck|x) in an inference stage and use it to make optimal decisions
• Two approaches to computing p(Ck|x):
  – Generative
    • model the class-conditional densities p(x|Ck) together with the prior probabilities p(Ck)
    • then use Bayes' rule to compute the posterior p(Ck|x) = p(x|Ck)p(Ck)/p(x)
  – Discriminative
    • directly model the conditional probabilities p(Ck|x)

Converting a Linear Regression Model to Linear Classification
• The linear regression model y(x, w) is a linear function of the parameters w
• In the simplest case it is also a linear function of the input variables x
  – thus it has the form y(x) = w^T x + w0
• For classification we wish to obtain a discrete output, or posterior probabilities in the range (0, 1)
• Use a generalized linear model y(x) = f(w^T x + w0)

Generalized Linear Model
y(x) = f(w^T x + w0)
• f(·) is known as the activation function
• The decision surfaces correspond to y(x) = constant, i.e., w^T x + w0 = constant
  – so the decision boundaries are linear in the feature space x even if f(·) is nonlinear, e.g., a step function
  – hence the name generalized linear model
• However, the model is no longer linear in the parameters w due to the presence of f(·)
  – this leads to more complex models for classification than for regression

Overview of Linear Classifiers
1. Discriminant functions
  – two-class and multi-class
  – least squares for classification
  – Fisher's linear discriminant
  – perceptron algorithm
2.
Probabilistic generative models
  – continuous inputs and maximum likelihood
  – discrete inputs, exponential family
3. Probabilistic discriminative models
  – logistic regression for single class and multi-class
  – Laplace approximation
  – Bayesian logistic regression
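The 1-of-K target coding described above can be sketched in a few lines of NumPy. The function name and array layout here are illustrative assumptions, not from the lecture:

```python
import numpy as np

def one_of_k(labels, K):
    """Encode integer class labels (0-based) as 1-of-K target vectors.

    Each row t has t[k] = 1 for the true class and 0 elsewhere,
    matching the coding t = (0, 1, 0, 0, 0)^T for class 2 of K = 5
    (classes are indexed from 1 in the slides, from 0 here).
    """
    labels = np.asarray(labels)
    T = np.zeros((labels.size, K))
    T[np.arange(labels.size), labels] = 1.0  # set the "hot" entry per row
    return T

# Class 2 of K = 5 (0-based index 1) reproduces the coding from the text.
print(one_of_k([1], K=5))  # [[0. 1. 0. 0. 0.]]
```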
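A two-class generalized linear model y(x) = f(w^T x + w0) with a step-function activation can be sketched as follows. The weights and data points are arbitrary assumptions, chosen only to show that the decision boundary w^T x + w0 = 0 stays linear in x even though f(·) is nonlinear:

```python
import numpy as np

def glm_classify(X, w, w0):
    """Two-class generalized linear model y(x) = f(w^T x + w0).

    f is the step function: assign class 1 where w^T x + w0 >= 0,
    class 0 otherwise. The boundary w^T x + w0 = 0 is a hyperplane,
    so the decision surface is linear despite the nonlinear f.
    """
    a = X @ w + w0               # linear activation w^T x + w0
    return (a >= 0).astype(int)  # step-function activation f(a)

# Assumed weights: the boundary is the line x1 + x2 - 1 = 0 in 2-D.
w, w0 = np.array([1.0, 1.0]), -1.0
X = np.array([[0.0, 0.0], [2.0, 2.0], [0.5, 0.5]])
print(glm_classify(X, w, w0))  # [0 1 1]
```

The point (0.5, 0.5) lies exactly on the boundary and is assigned to class 1 by the `>=` convention.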
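The generative route above — model p(x|Ck) and p(Ck), then apply Bayes' rule for the posterior — can be sketched for two one-dimensional Gaussian classes. The means, shared variance, and priors are made-up assumptions for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_c1(x, mu1, mu2, sigma, prior1):
    """Posterior via Bayes' rule:
    p(C1|x) = p(x|C1)p(C1) / (p(x|C1)p(C1) + p(x|C2)p(C2))."""
    p1 = gaussian_pdf(x, mu1, sigma) * prior1          # p(x|C1) p(C1)
    p2 = gaussian_pdf(x, mu2, sigma) * (1.0 - prior1)  # p(x|C2) p(C2)
    return p1 / (p1 + p2)

# Assumed class-conditional densities: N(-1, 1) for C1, N(+1, 1) for C2,
# with equal priors. Midway between the means the posterior is 0.5.
print(round(posterior_c1(0.0, -1.0, 1.0, 1.0, 0.5), 3))  # 0.5
```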