INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for CHAPTER 10: Linear Discrimination

Likelihood- vs. Discriminant-based Classification
• Likelihood-based: Assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x)
• Discriminant-based: Assume a model for gi(x|Φi); no density estimation
• Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries

Linear Discriminant
• Linear discriminant: gi(x|wi, wi0) = wi^T x + wi0 = Σj wij xj + wi0
• Advantages:
  - Simple: O(d) space/computation
  - Knowledge extraction: a weighted sum of attributes; positive/negative weights and their magnitudes can be interpreted (e.g., credit scoring)
  - Optimal when the p(x|Ci) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable

Generalized Linear Model
• Quadratic discriminant: gi(x|Wi, wi, wi0) = x^T Wi x + wi^T x + wi0
• Higher-order (product) terms, e.g., z1 = x1, z2 = x2, z3 = x1^2, z4 = x2^2, z5 = x1 x2
• Map from x to z using nonlinear basis functions and use a linear discriminant in z-space

Two Classes
• g(x) = g1(x) − g2(x) = (w1 − w2)^T x + (w10 − w20) = w^T x + w0
• Choose C1 if g(x) > 0, and C2 otherwise

Geometry
• [Figure: geometry of the two-class linear discriminant; w is normal to the hyperplane g(x) = 0, and |g(x)|/||w|| is the distance of x to the hyperplane.]

Multiple Classes
• gi(x|wi, wi0) = wi^T x + wi0
• Choose Ci if gi(x) = maxj gj(x); classes are linearly separable

Pairwise Separation
• gij(x|wij, wij0) = wij^T x + wij0
• Choose Ci if gij(x) > 0 for all j ≠ i

From Discriminants to Posteriors
• When p(x|Ci) ~ N(μi, Σ) with a shared covariance matrix, the posterior is the sigmoid of a linear function: P(C1|x) = sigmoid(w^T x + w0)
• Choose C1 if P(C1|x) > 0.5, i.e., if the log odds log[P(C1|x)/(1 − P(C1|x))] = w^T x + w0 > 0

Sigmoid (Logistic) Function
• sigmoid(a) = 1/(1 + exp(−a))
• [Figure: the sigmoid function maps the real line to (0, 1) and crosses 0.5 at a = 0.]

Gradient-Descent
• E(w|X) is the error with parameters w on sample X; we seek w* = arg minw E(w|X)
• Gradient: ∇w E = [∂E/∂w1, ..., ∂E/∂wd]^T
• Gradient-descent starts from a random w and updates w iteratively in the negative direction of the gradient: Δwi = −η ∂E/∂wi, wi ← wi + Δwi
• [Figure: one gradient-descent step from w^t to w^(t+1) with step size η, decreasing the error from E(w^t) to E(w^(t+1)).]

Logistic Discrimination
• Two classes: assume the log likelihood ratio is linear: log[p(x|C1)/p(x|C2)] = w^T x + w0
• Then y = P(C1|x) = sigmoid(w^T x + w0)

Training: Two Classes
• Given a sample X = {x^t, r^t} with r^t ∈ {0, 1}, let y^t = P(C1|x^t) = sigmoid(w^T x^t + w0)
• Maximizing the likelihood is equivalent to minimizing the cross-entropy error E(w, w0|X) = −Σt [r^t log y^t + (1 − r^t) log(1 − y^t)]

Training: Gradient-Descent
• Δwj = η Σt (r^t − y^t) xj^t, Δw0 = η Σt (r^t − y^t); a code sketch of this loop follows below
• [Figures: the error decreasing during training; the fitted posterior after 10, 100, and 1000 iterations.]

K>2 Classes
• Softmax: yi = P(Ci|x) = exp(wi^T x + wi0) / Σk exp(wk^T x + wk0)
• Cross-entropy error: E({wi, wi0}|X) = −Σt Σi ri^t log yi^t, again minimized by gradient descent

Example
• [Figure: logistic discrimination example on a two-dimensional dataset.]

Generalizing the Linear Model
• Quadratic: log[p(x|Ci)/p(x|CK)] = x^T Wi x + wi^T x + wi0
• Sum of basis functions: log[p(x|Ci)/p(x|CK)] = wi^T φ(x) + wi0, where the φ(x) are basis functions
  - Kernels in SVM
  - Hidden units in neural networks
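As a concrete illustration of the two-class training procedure on the slides above, here is a minimal NumPy sketch (not from the book; the function names and hyperparameter values are illustrative) that minimizes the cross-entropy error by the batch gradient-descent updates Δwj = η Σt (r^t − y^t) xj^t and Δw0 = η Σt (r^t − y^t):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.1, epochs=1000):
    """Two-class logistic discrimination by gradient descent.
    X: (N, d) sample matrix; r: (N,) labels in {0, 1}."""
    N, d = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.01, 0.01, size=d)  # start from a small random w
    w0 = 0.0
    for _ in range(epochs):
        y = sigmoid(X @ w + w0)           # y^t = P(C1 | x^t)
        # Move in the negative gradient direction of the cross-entropy error:
        w += eta * (X.T @ (r - y))        # Δw_j = η Σ_t (r^t − y^t) x_j^t
        w0 += eta * np.sum(r - y)         # Δw_0 = η Σ_t (r^t − y^t)
    return w, w0

# Usage: predict C1 when sigmoid(x @ w + w0) > 0.5, C2 otherwise.
```

With a full-batch update like this, η may need to be scaled down for large samples; per-instance (stochastic) updates are a common variant of the same rule.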
Discrimination by Regression
• Classes are NOT mutually exclusive and exhaustive
• Model r^t = y^t + ε with ε ~ N(0, σ^2) and y^t = sigmoid(w^T x^t + w0); maximizing the likelihood then corresponds to minimizing the sum of squared errors

Optimal Separating Hyperplane
(Cortes and Vapnik, 1995; Vapnik, 1995)
• Given X = {x^t, r^t} with r^t = +1 if x^t ∈ C1 and r^t = −1 if x^t ∈ C2, find w and w0 such that r^t (w^T x^t + w0) ≥ +1 for all t

Margin
• Distance from the discriminant to the closest instances on either side
• Distance of x to the hyperplane is |w^T x + w0| / ||w||
• We require r^t (w^T x^t + w0) / ||w|| ≥ ρ for all t
• For a unique solution, fix ρ||w|| = 1; then, to maximize the margin, minimize ||w||
• [Figure: the separating hyperplane, its margin, and the instances on the margin boundaries.]
• The constrained problem is solved by introducing Lagrange multipliers α^t and maximizing the dual
• Most α^t are 0 and only a small number have α^t > 0; they are the support vectors

Soft Margin Hyperplane
• Not linearly separable: relax the constraints to r^t (w^T x^t + w0) ≥ 1 − ξ^t with slack variables ξ^t ≥ 0
• Soft error: Σt ξ^t
• New primal: Lp = (1/2)||w||^2 + C Σt ξ^t − Σt α^t [r^t (w^T x^t + w0) − 1 + ξ^t] − Σt μ^t ξ^t

Kernel Machines
• Preprocess input x by basis functions: z = φ(x), g(z) = w^T z, so g(x) = w^T φ(x)
• The SVM solution: w = Σt α^t r^t z^t = Σt α^t r^t φ(x^t), so g(x) = Σt α^t r^t φ(x^t)^T φ(x) = Σt α^t r^t K(x^t, x)

Kernel Functions
• Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q
• Radial-basis functions: K(x^t, x) = exp(−||x^t − x||^2 / (2s^2))
• Sigmoidal functions: K(x^t, x) = tanh(2 x^T x^t + 1)
(Cherkassky and Mulier, 1998)

SVM for Regression
• Use a linear model (possibly kernelized): f(x) = w^T x + w0
• Use the ε-sensitive error function, which ignores deviations smaller than ε: e_ε(r^t, f(x^t)) = 0 if |r^t − f(x^t)| < ε, and |r^t − f(x^t)| − ε otherwise
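To make the kernel machine concrete, here is a minimal NumPy sketch of the kernels listed above and of the discriminant g(x) = Σt α^t r^t K(x^t, x) + w0. This is not code from the book: the names are illustrative, and the multipliers α^t, the support vectors, and w0 are assumed to have been obtained already by solving the dual.

```python
import numpy as np

def polynomial_kernel(xt, x, q=2):
    # K(x^t, x) = (x^T x^t + 1)^q  -- polynomial of degree q
    return (np.dot(x, xt) + 1.0) ** q

def rbf_kernel(xt, x, s=1.0):
    # K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))  -- radial-basis function
    return np.exp(-np.sum((xt - x) ** 2) / (2.0 * s ** 2))

def svm_discriminant(x, support_X, support_r, alphas, w0, kernel=rbf_kernel):
    """g(x) = sum_t alpha^t r^t K(x^t, x) + w0, summed over support vectors.
    Only instances with alpha^t > 0 contribute, which keeps prediction cheap."""
    g = sum(a * r * kernel(xt, x)
            for a, r, xt in zip(alphas, support_r, support_X))
    return g + w0  # choose C1 if g(x) > 0, C2 otherwise
```

Because only the support vectors enter the sum, the classifier's cost at prediction time depends on the number of support vectors, not on the full training-set size.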