
Introduction to Neural Networks
U. Minn. Psy 5038
Linear discriminant

Initialize

Read in Statistical Add-in packages:

Off[General::"spell1"];
<< "Statistics`DescriptiveStatistics`"
<< "Statistics`DataManipulation`"
<< "Statistics`NormalDistribution`"
<< "MultivariateStatistics`"
<< "ComputationalGeometry`"

Review: Discriminant functions

Let's review earlier material on discriminant functions. Perceptron learning is an example of nonparametric statistical learning, because it doesn't require knowledge of the underlying probability distributions generating the data (such distributions are characterized by a relatively small number of "parameters", such as the mean and variance of a Gaussian distribution). Of course, how well it does will depend on the generative structure of the data. Much of the material below is covered in Duda and Hart (1978).

Linear discriminant functions: two-category case

A discriminant function, g(x), divides input space into two category regions depending on whether g(x) > 0 or g(x) < 0. (We've switched notation: x = f.) The linear case corresponds to the simple perceptron unit we studied earlier:

(1)    g(x) = w.x + w0

where w is the weight vector and w0 is the threshold (sometimes called the bias, although this "bias" has nothing to do with statistical "bias"). Discriminant functions can be generalized, for example to quadratic decision surfaces:

(2)    g(x) = w0 + Σ_i wi xi + Σ_i Σ_j wij xi xj

We've seen how g(x) = 0 defines a decision surface, which in the linear case is a hyperplane. Suppose x1 and x2 are points sitting on the hyperplane; then their difference is a vector lying in the hyperplane:

(3)    w.x1 + w0 = w.x2 + w0
       w.(x1 - x2) = 0

so the weight vector w is normal to any vector lying in the hyperplane. Thus w determines how the plane is oriented. The normal vector w points into the region for which g(x) > 0, and -w points into the region for which g(x) < 0.

Let x be a point on the hyperplane. If we project x onto the normalized weight vector, x.w/|w|, we obtain the normal distance of the hyperplane from the origin:

(4)    w.x/|w| = -w0/|w|

Thus, the threshold determines the position of the hyperplane.

One can also show that the normal distance of x to the hyperplane is given by:

(5)    g(x)/|w|

So we've seen that: 1) the discriminant function divides the input space by a hyperplane decision surface; 2) the orientation of the surface is determined by the weight vector w; 3) the location is determined by the threshold w0; 4) the discriminant function gives a measure of how far an input vector is from the hyperplane.

Multiple classes

Suppose there are c classes. There are a number of ways to define multiple-class discriminant rules. One way that avoids undefined regions is:

(6)    gi(x) = wi.x + wi0,    i = 1, ..., c

(7)    Assign x to the ith class if gi(x) > gj(x) for all j ≠ i.

It can be shown that this classifier partitions the input space into simply connected convex regions. This means that if you connect any two feature vectors belonging to the same class by a line, all points on the line are in the same class. Thus this linear classifier won't be able to handle problems for which there are disconnected clusters of features that all belong to the same class. Also, from a probabilistic perspective, if the underlying generative probability model for a given class has multiple modes, this linear classifier won't do a good job either.
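Before moving on, here is a small Wolfram Language sketch (not part of the original notes) that numerically checks the geometry above. The particular weight vector w, threshold w0, class weights ws, w0s, and test points are made-up values for illustration only.

(* Sketch, not from the original notebook: check Eqs. (1)-(7) with arbitrary illustrative values. *)
w = {3, 4}; w0 = -5;                   (* weight vector and threshold *)
g[x_] := w.x + w0;                     (* linear discriminant, Eq. (1) *)

-w0/Norm[w]                            (* distance of the plane from the origin, Eq. (4): gives 1 *)

xp = {2, 3};
g[xp]/Norm[w]                          (* signed distance of xp from the plane, Eq. (5): 13/5, on the +w side *)

(* multiple classes, Eqs. (6)-(7): assign x to the class with the largest gi(x) *)
ws = {{1, 0}, {0, 1}, {-1, -1}}; w0s = {0, 0, 1};
classify[x_] := First[Ordering[ws.x + w0s, -1]];
classify[{2, 0.5}]                     (* class 1 *)

Scaling w0 while holding w fixed only slides the plane along its normal, consistent with point 3) above.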
Task-dependent Dimensionality reduction

Fisher's linear "discriminant"

The idea is that the original input space may be impractically huge, but if we can find a subspace (hyperplane) that preserves the distinctions between categories as well as possible, we can make our decisions in the smaller space. We will derive the Fisher linear "discriminant". This is closely related to the psychological idea of finding "distinctive" features. E.g., consider bird identification. If I want to discriminate cardinals from other birds in my backyard, I can make use of the fact that (male) cardinals may be the only birds that are red. So even tho' the image of a bird can have lots of dimensions, if I project the image onto the "red" axis, I can do fairly well with just one number. How about male vs. female human faces?

Generative model: two nearby Gaussian classes

Define two bivariate base distributions:

ar = {{1, 0.99}, {0.99, 1}};
ndista = MultinormalDistribution[{0, -1}, ar];

br = {{1, 0.9}, {0.9, 2}};
ndistb = MultinormalDistribution[{0, 1}, br];

Find the expression for the probability distribution function of ndista:

pdf = PDF[ndista, {x1, x2}]

1.12822 E^(1/2 (-x1 (50.2513 x1 - 49.7487 (x2 + 1)) - (x2 + 1) (50.2513 (x2 + 1) - 49.7487 x1)))

Use Mean[ ] and Covariance[ ] to verify the population mean and the covariance matrix of ndista and ndistb:

Mean[ndistb]
{0, 1}

Covariance[ndista]
{{1, 0.99}, {0.99, 1}}

Try different covariance matrices. Should they be symmetric? Are there constraints on the determinant of ar, br?

Make a contour plot of the PDF of ndista:

pdfa = PDF[ndista, {x1, x2}];
ContourPlot[pdfa, {x1, -3, 3}, {x2, -3, 3}, PlotPoints -> 64, PlotRange -> All]

[Contour plot of pdfa over -3 < x1 < 3, -3 < x2 < 3]

nsamples = 500;
a = Table[Random[ndista], {nsamples}];
ga = ListPlot[a, PlotRange -> {{-8, 8}, {-8, 8}}, AspectRatio -> 1,
   PlotStyle -> Hue[...], DisplayFunction -> Identity];
b = Table[Random[ndistb], {nsamples}];
gb = ListPlot[b, PlotRange -> {{-8, 8}, {-8, 8}}, AspectRatio -> 1,
   PlotStyle -> Hue[...], DisplayFunction -> Identity];
Show[ga, gb, DisplayFunction -> $DisplayFunction]

[Scatter plot of the two samples a and b, over -8 to 8 on both axes]

Use Mean[ ] to find the sample mean of b. What is the sample covariance of b?

Mean[b]
{..., 0.925014}

Covariance[b]
{{..., 0.933731}, {0.933731, 2.04603}}

Try out different projections of the data by varying the slope (m) of the discriminant line

m = -2/3;
wnvec = {1, m}/Sqrt[1 + m^2];

{{x, y}}.{n1, n2}
{n1 x + n2 y}

Map[#1 * {n1, n2} &, {{x, y}}.{n1, n2}]
{{n1 (n1 x + n2 y), n2 (n1 x + n2 y)}}

aproj = (#1 wnvec &) /@ (a.wnvec);
gaproj = ListPlot[aproj, AspectRatio -> 1, PlotStyle -> Hue[...],
   DisplayFunction -> Identity];
bproj = (#1 wnvec &) /@ (b.wnvec);
gbproj = ListPlot[bproj, AspectRatio -> 1, PlotStyle -> Hue[...],
   DisplayFunction -> Identity];
Show[ga, gb, gaproj, gbproj, DisplayFunction -> $DisplayFunction]
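As a pointer to where this is heading, here is a sketch (not part of the original notebook): rather than guessing the slope m by hand, the Fisher direction for the two samples a and b defined above can be computed from the standard formula w ∝ Inverse[Sw].(mean_a - mean_b). Here the sum of the two sample covariances stands in for the within-class scatter Sw (proportional to it for equal-size samples), and the names Sw, wFisher, aprojF, bprojF are new to this sketch.

(* Sketch, not in the original notes: Fisher direction for samples a and b. *)
Sw = Covariance[a] + Covariance[b];      (* proportional to the pooled within-class scatter *)
wFisher = Inverse[Sw].(Mean[a] - Mean[b]);
wFisher = wFisher/Norm[wFisher]

wFisher[[2]]/wFisher[[1]]                (* slope of the projection line, comparable to the hand-picked m *)

(* project both samples onto the Fisher direction, in the same style as aproj and bproj above *)
aprojF = (#1 wFisher &) /@ (a.wFisher);
bprojF = (#1 wFisher &) /@ (b.wFisher);

Plotting aprojF and bprojF with ListPlot, as above, shows how well this direction separates the two classes compared with a hand-tuned slope.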

