
Note on Fisher Linear Discriminants

Pietro Perona
California Institute of Technology
[email protected]
8, 2000

1 Introduction

We have N data points X_i ∈ R^D. The points belong to G groups. Consider the problem of finding a linear transformation a to one-dimensional space such that the points X_i a = Z_i ∈ R are easy to classify. For simplicity we will assume that the N data points X_i have zero mean.

Some notation: given a matrix A, indicate with A_i and A^j the i-th row and the j-th column of A, and with A_i^j the (i,j)-th element of A. Moreover:

    [i] = g                 g is the group of point i                      (1)
    X = [X_1; ... ; X_N] ∈ R^{N×D}      the data                           (2)
    n = (n_1, ..., n_G)     number of data points in each group            (3)
    N = diag(n) = G^T G = diag(n_1, n_2, ..., n_G)                         (4)
    G_i^j = δ([i], j),  i.e.  G = [1 0 ... 0; ... ; 0 ... 0 1] ∈ R^{N×G}   (5)
    M_g = (1/n_g) Σ_{[i]=g} X_i         mean of group g                    (6)
    M = N^{-1} G^T X = [M_1; ... ; M_G] ∈ R^{G×D}   the matrix collecting the means of the groups   (7)
    A = U_A L_A V_A^T       the singular value decomposition of A          (8)
    D = I − G N^{-1} G^T                                                   (9)
    X_0 = X − GM = DX       the data, each point referred to its group's mean   (10)

2 Optimization problem

In order for the points Z_i to be easy to classify one would like to simultaneously maximize the between-clusters distance and minimize the within-clusters distance. These quantities may be defined as follows.

Between-clusters distance – Consider the means M_g of each group g. One would like to maximize their spread around the overall mean (the origin, since X is zero-mean):

    B = (GM)^T (GM) = X^T G N^{-1} G^T X                                   (11)

Notice that each mean M_g is counted n_g times in order to reflect the frequency of group g.

Within-clusters distance – Consider the spread of the points around each group's center.
One would like to minimize:

    W = X_0^T X_0 = X^T D^T D X                                            (12)

In order to optimize both quantities simultaneously Fisher proposed to maximize their ratio with respect to the transformation a:

    J(a) = (a^T B a) / (a^T W a)                                           (13)

Taking the derivative with respect to a and equating it to zero:

    DJ(a) = [2 B a (a^T W a) − 2 W a (a^T B a)] / (a^T W a)^2 = 0          (14)

    define λ := (a^T B a) / (a^T W a)                                      (15)

    ⇒  B a = λ W a,   a^T W a ≠ 0                                          (16)

Therefore, in order to find the value of a, we need to solve the generalized eigenvector problem B a = λ W a subject to a^T W a ≠ 0.

2.1 Eigenvector problem

Call W† the generalized inverse of W, i.e. the inverse restricted to the subspace where W is nonsingular. Then:

    B a = λ W a,   a^T W a ≠ 0                                             (18)
    W† = V_{X_0} (L_{X_0}^2)† V_{X_0}^T                                    (19)
    ⇒  W† B a = λ a                                                        (20)
    define (U_{WB}, L_{WB}, V_{WB}) := SVD(W† B)                           (21)
    ⇒  a = V_{WB}^1                                                        (22)

2.2 Alternative approach

An equivalent approach consists in calculating a coordinate transformation a = S b such that a^T W a = ‖b‖^2. In this case one may calculate the b that maximizes the numerator, subject to ‖b‖ = 1. One must, however, pay attention to the fact that the solution b must not be in the null space of S.

From the definitions of W and B:

    W = X_0^T X_0 = V_{X_0} L_{X_0}^2 V_{X_0}^T                            (23)
    S = V_{X_0} L_{X_0}^{-1}                                               (24)
    a = S b                                                                (25)
    b = S^{-1} a = L_{X_0} V_{X_0}^T a                                     (26)
    a^T W a = a^T V_{X_0} L_{X_0}^2 V_{X_0}^T a = b^T b                    (27)
    a^T B a = b^T S^T B S b                                                (28)
    (U, L, V) := SVD(S^T B S)                                              (29)
    ⇒  b = V^1                                                             (30)
    ⇒  a = S V^1 = V_{X_0} L_{X_0}^{-1} V^1                                (31)

3 Code and References

Check out the Matlab function fisherLD.m written by Markus Weber. A prize to whoever figures out how the code works.

You will find Fisher linear discriminants discussed in B. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996 (SFL library, call n. QA 76.87 R56).
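The two procedures of Sections 2.1 and 2.2 can be cross-checked numerically. Below is a minimal sketch in Python/NumPy (not the fisherLD.m mentioned above); the toy data, the random seed, and all variable names are illustrative assumptions, not part of the note. It builds G, M, X_0, B and W from the definitions of Section 1, solves the eigenproblem W† B a = λ a of Sec. 2.1 with numpy.linalg.eig, and the whitened problem of Sec. 2.2 with the SVD; the two directions agree up to scale and sign.

```python
import numpy as np

# Hypothetical toy data: N points in R^D belonging to G groups.
rng = np.random.default_rng(0)
N_pts, D_dim, n_groups = 60, 3, 2
labels = rng.integers(0, n_groups, size=N_pts)       # [i] = group of point i
X = rng.standard_normal((N_pts, D_dim)) + 2.0 * labels[:, None]
X = X - X.mean(axis=0)                               # zero-mean data, as assumed

G = np.eye(n_groups)[labels]                         # indicator matrix, eq. (5)
M = np.linalg.solve(G.T @ G, G.T @ X)                # group means M = N^{-1} G^T X, eq. (7)
X0 = X - G @ M                                       # each point minus its group mean, eq. (10)

B = (G @ M).T @ (G @ M)                              # between-clusters scatter, eq. (11)
W = X0.T @ X0                                        # within-clusters scatter, eq. (12)

# Sec. 2.1: solve W^dagger B a = lambda a; take the top eigenvector.
evals, evecs = np.linalg.eig(np.linalg.pinv(W) @ B)
a1 = np.real(evecs[:, np.argmax(np.real(evals))])

# Sec. 2.2: whiten with S = V_{X0} L_{X0}^{-1}, then SVD of S^T B S.
U, L, Vt = np.linalg.svd(X0, full_matrices=False)
S = Vt.T @ np.diag(1.0 / L)                          # eq. (24)
b = np.linalg.svd(S.T @ B @ S)[2][0]                 # first right singular vector, eq. (30)
a2 = S @ b                                           # a = S b, eq. (31)

# The two directions should coincide up to scale and sign.
cos = abs(a1 @ a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))
print(round(cos, 6))
```

Note that S^T B S is symmetric, so its SVD coincides with its eigendecomposition; b = V^1 maximizes b^T S^T B S b over ‖b‖ = 1, and mapping back through S yields a generalized eigenvector of the pair (B, W).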



CALTECH EE 148A - Lecture Notes
