Rutgers University CS 536 - Introduction to Machine Learning

INTRODUCTIONTOMachineLearningETHEM ALPAYDIN© The MIT Press, [email protected]://www.cmpe.boun.edu.tr/~ethem/i2mlLecture Slides forCHAPTER10:LinearDiscriminationLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)3Likelihood- vs. Discriminant-based Classification Likelihood-based: Assume a model for p(x|Ci), use Bayes’ rule to calculate P(Ci|x) gi(x) = log P(Ci|x) Discriminant-based: Assume a model for gi(x|Φi); no density estimation Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundariesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)4Linear Discriminant Linear discriminant: Advantages: Simple: O(d) space/computation  Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring) Optimal when p(x|Ci) are Gaussian with shared cov matrix; useful when classes are (almost) linearly separableLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)5Generalized Linear Model Quadratic discriminant: Higher-order (product) terms:Map from x to z using nonlinear basis functions and use a linear discriminant in z-spaceLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)6Two ClassesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)7GeometryLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)8Multiple ClassesClasses arelinearly separableLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)9Pairwise SeparationLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)10From Discriminants to Posteriors When p (x | Ci ) ~ N ( µi, ∑)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)11Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)12Sigmoid (Logistic) FunctionLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)13Gradient-Descent E(w|X) is error with parameters w on sample X Gradient Gradient-descent: Starts from random w and updates w iteratively in the negative direction of gradientLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)14Gradient-Descentwtwt+1ηE (wt)E (wt+1)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)15Logistic Discrimination Two classes: Assume log likelihood ratio is linearLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)16Training: Two ClassesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)17Training: Gradient-DescentLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)18Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)1910100 1000Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)20K>2 ClassessoftmaxLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)21Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)22ExampleLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)23Generalizing the Linear Model Quadratic: Sum of basis functions:where φ(x) are basis functions  Kernels in SVM Hidden units in neural 
Generalizing the Linear Model

- Quadratic: $\log\frac{p(x|C_1)}{p(x|C_2)} = x^T W x + w^T x + w_0$
- Sum of basis functions: $\log\frac{p(x|C_1)}{p(x|C_2)} = w^T \phi(x) + w_0$, where the $\phi(x)$ are basis functions, e.g., kernels in SVMs or hidden units in neural networks.

Discrimination by Regression

- Classes are NOT mutually exclusive and exhaustive: model $r^t = y^t + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$ and $y^t = \mathrm{sigmoid}(w^T x^t + w_0)$, and minimize the squared error $E = \frac{1}{2}\sum_t (r^t - y^t)^2$.

Optimal Separating Hyperplane
(Cortes and Vapnik, 1995; Vapnik, 1995)

- With labels $r^t \in \{-1, +1\}$, find $w, w_0$ such that $r^t(w^T x^t + w_0) \ge +1$ for all $t$.

Margin

- The margin is the distance from the discriminant to the closest instances on either side.
- The distance of $x$ to the hyperplane is $\frac{|w^T x + w_0|}{\|w\|}$.
- We require $\frac{r^t(w^T x^t + w_0)}{\|w\|} \ge \rho$ for all $t$.
- For a unique solution, fix $\rho\|w\| = 1$; then maximizing the margin means minimizing $\|w\|$: $\min \frac{1}{2}\|w\|^2$ subject to $r^t(w^T x^t + w_0) \ge +1,\ \forall t$.
- Solving the Lagrangian dual $L_d = \sum_t \alpha^t - \frac{1}{2}\sum_t \sum_s \alpha^t \alpha^s r^t r^s (x^t)^T x^s$ subject to $\sum_t \alpha^t r^t = 0$, $\alpha^t \ge 0$ gives $w = \sum_t \alpha^t r^t x^t$.
- Most $\alpha^t$ are 0 and only a small number have $\alpha^t > 0$; the corresponding instances are the support vectors.

Soft Margin Hyperplane

- When the classes are not linearly separable, introduce slack variables $\xi^t \ge 0$: $r^t(w^T x^t + w_0) \ge 1 - \xi^t$.
- Soft error: $\sum_t \xi^t$.
- The new primal is $\min \frac{1}{2}\|w\|^2 + C\sum_t \xi^t$ subject to the constraints above, where $C$ trades off margin width against violations.

Kernel Machines

- Preprocess the input $x$ by basis functions: $z = \phi(x)$, $g(z) = w^T z$.
- The SVM solution is $w = \sum_t \alpha^t r^t \phi(x^t)$, so $g(x) = \sum_t \alpha^t r^t \phi(x^t)^T \phi(x) = \sum_t \alpha^t r^t K(x^t, x)$.

Kernel Functions

- Polynomials of degree q: $K(x^t, x) = (x^T x^t + 1)^q$
- Radial-basis functions: $K(x^t, x) = \exp\left(-\frac{\|x^t - x\|^2}{2s^2}\right)$
- Sigmoidal functions: $K(x^t, x) = \tanh(2 x^T x^t + 1)$
(Cherkassky and Mulier, 1998)

SVM for Regression

- Use a linear model (possibly kernelized): $f(x) = w^T x + w_0$.
- Use the $\epsilon$-sensitive error, which ignores residuals smaller than $\epsilon$: $e_\epsilon(r^t, f(x^t)) = \max(0,\ |r^t - f(x^t)| - \epsilon)$.
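As a minimal illustrative sketch (not from the slides; function names and hyperparameter values are assumptions), the three kernel functions above and the kernelized discriminant $g(x) = \sum_t \alpha^t r^t K(x^t, x) + w_0$ translate directly to Python with NumPy. The alphas and w0 below are placeholders for the output of the dual quadratic program, which is not solved here:

import numpy as np

def poly_kernel(xt, x, q=2):
    # Polynomial of degree q: K(x^t, x) = (x^T x^t + 1)^q
    return (x @ xt + 1.0) ** q

def rbf_kernel(xt, x, s=1.0):
    # Radial-basis function: K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))
    return np.exp(-np.sum((xt - x) ** 2) / (2.0 * s ** 2))

def sigmoid_kernel(xt, x):
    # Sigmoidal: K(x^t, x) = tanh(2 x^T x^t + 1)
    return np.tanh(2.0 * (x @ xt) + 1.0)

def kernel_discriminant(x, support_X, support_r, alphas, w0, kernel=rbf_kernel):
    # g(x) = sum_t alpha^t r^t K(x^t, x) + w0, summed over the support
    # vectors (the instances with alpha^t > 0); choose C1 if g(x) > 0.
    # alphas and w0 are assumed to come from a dual/QP solver (not shown).
    return sum(a * r * kernel(xt, x)
               for a, r, xt in zip(alphas, support_r, support_X)) + w0

In practice the dual would be solved with an off-the-shelf QP solver or a library SVM implementation rather than by hand; only the support vectors need to be kept at prediction time.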

