UB CSE 555 - Linear Discriminant Functions

CSE555: Srihari
Linear Discriminant Functions
Minimum Squared Error Procedures: Pseudoinverse and Widrow-Hoff
Ho-Kashyap Procedures

Non-separable Behavior
• The perceptron is an error-correcting procedure
– It modifies the weight vector only when an error is encountered
– Its success is due to a relentless search for an error-free solution
• Even if a separating vector is found, it does not follow that it will perform well on independent test data
– Any set of fewer than 2d samples is likely to be linearly separable

Non-separable Case
• When design sets are large, the points are most likely not linearly separable
• How do error-correction procedures behave when the points are not linearly separable?
– The corrections never cease, so an error-correction procedure produces an infinite sequence of weight vectors
• The weight vectors are, however, bounded
– Empirical rule: the weight vector fluctuates near a terminal value
– Averaging the weight vectors can reduce the risk of obtaining a bad solution

Minimum Squared Error (MSE) Procedures
• The criterion functions considered so far involve only the misclassified samples
• Now consider a criterion that involves all of the samples, not just the misclassified ones
• Previously we were interested in making all of the inner products a^t y_i positive
• Now try to make a^t y_i = b_i, where the b_i are some arbitrarily specified positive constants
• Thus we replace the problem of solving a set of linear inequalities with the more stringent but better understood problem of finding a solution to a set of linear equations

Minimum Squared Error (MSE) Formulation
• Collect the n equations a^t y_i = b_i into the matrix equation Ya = b, where Y is the n-by-(d+1) data matrix whose i-th row is y_i^t (the dimensionality is d+1 because the augmenting 0-subscript component is introduced), a is the weight vector to be determined, and b is the margin vector
• Since Y is rectangular, minimize the sum-of-squared-error criterion Js(a) = ||Ya − b||^2
• Js can be minimized either by a gradient descent procedure or by a closed-form solution based on the gradient

MSE Solution Based on Pseudoinverse
• Setting the gradient ∇Js = 2Y^t(Ya − b) to zero gives the normal equations Y^tYa = Y^tb
• Closed-form solution: a = (Y^tY)^-1 Y^t b = Y†b, where Y† = (Y^tY)^-1 Y^t is the pseudoinverse of Y

Example of Linear Classifier by Pseudoinverse
• ω1: (1,2)^t and (2,0)^t;  ω2: (3,1)^t and (2,3)^t
• [Figure: the four samples in the (x1, x2) plane with the decision boundary a^t (1, x1, x2)^t = 0]
• Sample matrix Y (dimension d+1 = 1+2 = 3, n = 4), with the ω2 rows negated:
    Y = [  1   1   2
           1   2   0
          -1  -3  -1
          -1  -2  -3 ]
• Pseudoinverse:
    Y† = (Y^tY)^-1 Y^t = [  5/4  13/12   3/4   7/12
                           -1/2  -1/6   -1/2  -1/6
                            0    -1/3    0    -1/3 ]
• Assuming b = (1, 1, 1, 1)^t, our solution is a = Y†b = (11/3, -4/3, -2/3)^t (a numerical check appears in the sketch after these MSE slides)

Properties of the MSE Procedure
1. Related to Fisher's linear discriminant
2. Asymptotic approximation to the Bayes discriminant function
3. Can be formulated as a gradient descent procedure

1. MSE Relationship to Fisher's Linear Discriminant
• Show that with the proper choice of the vector b, the MSE discriminant function a^t y is directly related to Fisher's linear discriminant
• Assume the first n1 samples are labelled ω1 and the second n2 samples are labelled ω2, and partition Y and a correspondingly:
    Y = |  1_1   X_1 |        a = | w0 |
        | -1_2  -X_2 |            | w  |
  where 1_i is a column vector of n_i ones and X_i is a matrix whose rows are the samples labelled ω_i
• Choose the margin vector
    b = | (n/n1) 1_1 |
        | (n/n2) 1_2 |
• This special choice of b links the MSE solution to Fisher's linear discriminant

MSE and Fisher's Linear Discriminant
• Define the sample means m_i and the pooled sample scatter matrix S_W:
    m_i = (1/n_i) Σ_{x ∈ D_i} x,   i = 1, 2
    S_W = Σ_{i=1,2} Σ_{x ∈ D_i} (x − m_i)(x − m_i)^t
• Plugging these into the MSE formulation yields
    w = α n S_W^-1 (m_1 − m_2)
  where α is a scalar
• This is identical to the solution of Fisher's linear discriminant except for a scale factor
• Decision rule: decide ω1 if w^t(x − m) > 0; otherwise decide ω2 (m is the mean of all of the samples)

MSE Relationship to Bayes
• If b = 1_n, the MSE solution approaches a minimum squared-error approximation to the Bayes discriminant function
    g0(x) = P(ω1|x) − P(ω2|x)
  in the limit as the number of samples approaches infinity
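The closed-form pseudoinverse solution above is easy to verify numerically. The following short NumPy sketch (the variable names and library calls are mine, not from the slides) rebuilds the sample matrix Y for the four-point example, with the ω2 rows negated, and recovers a = (11/3, -4/3, -2/3)^t:

import numpy as np

# Augmented samples y_i = (1, x1, x2)^t as rows; the omega_2 rows are negated
# so that a^t y_i > 0 is required for every sample.
Y = np.array([[ 1.,  1.,  2.],    # omega_1: (1,2)^t
              [ 1.,  2.,  0.],    # omega_1: (2,0)^t
              [-1., -3., -1.],    # omega_2: (3,1)^t, negated
              [-1., -2., -3.]])   # omega_2: (2,3)^t, negated
b = np.ones(4)                    # arbitrary positive margins, b = (1,1,1,1)^t

# Pseudoinverse Y† = (Y^t Y)^-1 Y^t and MSE solution a = Y† b
Y_pinv = np.linalg.inv(Y.T @ Y) @ Y.T
a = Y_pinv @ b
print(a)                          # approx. [ 3.667 -1.333 -0.667 ] = (11/3, -4/3, -2/3)^t

np.linalg.pinv(Y) computes the same matrix here and is the safer call when Y^tY is close to singular, which is exactly the situation the gradient-descent formulation below is meant to handle.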
2. MSE Approximation to Bayes
• If b = 1_n, the MSE solution approaches a minimum mean-squared-error approximation to the Bayes discriminant function
• [Figure: class-conditional densities, posteriors, the Bayes discriminant function, and the MSE solution; the MSE solution is the best approximation in the region of the data points]
• However, MSE does not necessarily minimize the probability of error

3. MSE Solution Using Gradient Descent
• The criterion function Js(a) = ||Ya − b||^2 can also be minimized by gradient descent
• Advantages over the pseudoinverse:
– It avoids the problem that arises when Y^tY is singular
– It avoids the need to work with large matrices
– The computation is essentially a feedback scheme that copes with roundoff or truncation errors

Widrow-Hoff or Least-Mean-Squared (LMS) Rule
• Since ∇Js = 2Y^t(Ya − b), the obvious update rule is
    a(1) arbitrary
    a(k+1) = a(k) + η(k) Y^t(b − Ya(k))
• The storage requirement can be reduced by considering the samples sequentially, giving the Widrow-Hoff (LMS) rule:
    a(1) arbitrary
    a(k+1) = a(k) + η(k) (b_k − a^t(k) y^k) y^k
  (a sketch of this rule appears at the end of these notes)

Widrow-Hoff or LMS Property
• Relaxation procedures need not converge to a separating hyperplane even if one exists
• Widrow-Hoff does converge
• However, the solution need not be a separating vector even if one exists

Ho-Kashyap Procedures
• The procedures considered so far:
– Perceptron and relaxation procedures find separating vectors if the samples are linearly separable, but they do not converge on non-separable problems
– MSE procedures yield a weight vector whether the samples are linearly separable or not, but there is no guarantee that this vector is a separating vector
• The criterion function is Js = ||Ya − b||^2, now minimized with respect to both a and b, subject to the constraint b > 0
– How can we obtain both a separating vector a and a margin vector b?

Ho-Kashyap Rule
• Error vector: e = Ya − b
• Positive part of the error vector: e+ = (1/2)(e + |e|), taken componentwise

Ho-Kashyap Algorithm
• b(1) > 0 but otherwise arbitrary;  a(k) = Y† b(k)
• b(k+1) = b(k) + 2 η(k) e+(k), so b can only increase and therefore stays positive
• Stop when e(k) = 0 (a separating vector has been found) or when e(k) ≤ 0 with some negative component (the samples are not linearly separable); a sketch appears at the end of these notes

Ho-Kashyap Conclusion
• In the non-separable case there will be an indication of non-separability
• However, there is no bound on the number of steps needed to disclose non-separability

Linear Programming Algorithms
• Perceptron, relaxation, and Ho-Kashyap are gradient descent procedures for solving simultaneous linear inequalities
• The task can also be approached with linear programming (LP)
• Minimize the objective function z = α^t u subject to the constraints Au ≥ β
• Simplex algorithm: [Figure: surfaces of constant z are shown in gray and the constraints in red; the simplex algorithm finds the extremum of z at a vertex of the feasible region]
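To make the two iterative rules above concrete, here are two minimal NumPy sketches. The function names, initializations, step-size schedule, tolerances, and iteration caps are illustrative assumptions, not taken from the slides.

Sequential Widrow-Hoff / LMS rule, using a decreasing step size η(k) = η(1)/k:

import numpy as np

def lms(Y, b, eta1=0.1, n_passes=50):
    # a(k+1) = a(k) + eta(k) * (b_k - a(k)^t y_k) * y_k, samples taken in sequence
    n, d_plus_1 = Y.shape
    a = np.zeros(d_plus_1)              # a(1) arbitrary; zeros used here
    k = 0
    for _ in range(n_passes):
        for i in range(n):
            k += 1
            eta = eta1 / k              # decreasing learning rate
            a += eta * (b[i] - a @ Y[i]) * Y[i]
    return a

Ho-Kashyap iteration, in the standard form sketched above (b grown through the positive part of the error vector, a recomputed from the pseudoinverse):

def ho_kashyap(Y, eta=0.5, tol=1e-8, max_iter=10000):
    n = Y.shape[0]
    b = np.ones(n)                      # b(1) > 0 but otherwise arbitrary
    Y_pinv = np.linalg.pinv(Y)
    a = Y_pinv @ b                      # a(k) = Y† b(k)
    for _ in range(max_iter):
        e = Y @ a - b                   # error vector e = Ya - b
        e_plus = 0.5 * (e + np.abs(e))  # positive part of the error vector
        if np.all(np.abs(e) < tol):     # e ≈ 0: Ya ≈ b > 0, so a separates
            return a, b, True
        if np.all(e_plus < tol):        # e <= 0 with some e_i < 0: nonseparable
            return a, b, False
        b = b + 2.0 * eta * e_plus      # b can only increase, so it stays positive
        a = Y_pinv @ b
    return a, b, False                  # no decision within max_iter

On the separable four-sample example above, ho_kashyap(Y) returns a separating vector (the True flag); on a nonseparable set it eventually reports e ≤ 0, matching the conclusion slide, although, as that slide notes, there is no bound on how many steps this may take.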

