Introduction to Neural Networks
U. Minn. Psy 5038
Kalman filter

Initialize

Read in Statistical Add-in packages:

Off[General::spell1];
Needs["ErrorBarPlots`"];
Needs["MultivariateStatistics`"];
SetOptions[ListPlot, ImageSize -> Small];

Discriminant functions

Let's build our geometric intuitions of what a simple perceptron unit does by viewing linear discriminants from a more formal point of view. Perceptron learning is an example of nonparametric statistical learning, because it doesn't require knowledge of the underlying probability distributions generating the data (such distributions are characterized by a relatively small number of "parameters", such as the mean and variance of a Gaussian distribution). Of course, how well it does will depend on the generative structure of the data. Much of the material below is covered in Duda and Hart (1973).

Linear discriminant functions: Two-category case

A discriminant function g(x) divides the input space into two category regions depending on whether g(x) > 0 or g(x) < 0. (We've switched notation: x = f.) The linear case corresponds to the simple perceptron unit we studied earlier:

(1)    g(x) = w.x + w0

where w is the weight vector and w0 is the (scalar) threshold (sometimes called the bias, although this "bias" has nothing to do with statistical "bias"). Discriminant functions can be generalized, for example to quadratic decision surfaces:

(2)    g(x) = w0 + Σi wi xi + Σi Σj wij xi xj

where x = {x1, x2, x3, ...}. We've seen how g(x) = 0 defines a decision surface, which in the linear case is a hyperplane. Suppose x1 and x2 are vectors with endpoints sitting on the hyperplane; then their difference is a vector lying in the hyperplane:

(3)    w.x1 + w0 = w.x2 + w0
       w.(x1 - x2) = 0

so the weight vector w is normal to any vector lying in the hyperplane. Thus w determines how the plane is oriented. The normal vector w points into the region for which g(x) > 0, and -w points into the region for which g(x) < 0.

Let x be a point on the hyperplane. If we project x onto the normalized weight vector w/|w|, we get the normal distance of the hyperplane from the origin:

(4)    w.x/|w| = -w0/|w|

Thus, the threshold determines the position of the hyperplane.

One can also show that the normal distance from an arbitrary vector x to the hyperplane is given by:

(5)    g(x)/|w|

So we've seen that: 1) the discriminant function divides the input space by a hyperplane decision surface; 2) the orientation of the surface is determined by the weight vector w; 3) the location is determined by the threshold w0; 4) the discriminant function gives a measure of how far an input vector is from the hyperplane.

The figure summarizes the basic properties of the linear discriminant.

In[179]:= Manipulate[
  (* x0 = {2, 1}; *)
  w = {w1, w2};
  wn = w/Norm[w];
  g[x_] := {w1, w2}.x + w0;
  (* plot the locus g(x) = 0 *)
  gg = Plot[Tooltip[x2 /. Solve[{w1, w2}.{x1, x2} + w0 == 0, x2], "discriminant"],
    {x1, -1, 3}, ImageSize -> Medium];
  ggg = Graphics[g[Dynamic[MousePosition["Graphics"]]]];
  Show[{gg, Graphics[Inset["g[x]=", {1.6, 2}]],
    Graphics[Inset[ToString[g[x0]], {2, 2}]],
    Graphics[{Tooltip[Arrow[{{0, 0}, w}], "w"],
      Tooltip[Arrow[{{0, 0}, (-w0/Norm[w])*wn}], "-w0/|w|"],
      Tooltip[{Arrow[{{0, 0}, x0}]}, "x"],
      Tooltip[{Arrow[{x0, x0 - wn*g[x0]/Norm[w]}]}, "g(x)/|w|"]}]},
   PlotRange -> {{-1, 3}, {-1, 3}}, AxesOrigin -> {0, 0}, Axes -> True,
   AspectRatio -> 1],
  {{w0, -2.5}, -6, 3}, {{w1, 1}, 0, 3}, {{w2, 2}, 0, 3},
  {{x0, {2, 1}}, Locator}, ImageSize -> Small]

Out[179]= [interactive figure: the discriminant line with arrows for w, -w0/|w|, x, and g(x)/|w|; sliders for w0, w1, w2; movable locator x0; annotation g[x] = 2.7979]
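To make the four properties concrete outside the Manipulate, here is a minimal numerical check (my own sketch, not from the notebook), using the Manipulate's default values w = {1, 2}, w0 = -2.5, and test point {2, 1}:

Clear[w, w0, g, x2];
w = {1, 2}; w0 = -2.5;
g[x_] := w.x + w0;
(* two points on the hyperplane g(x) = 0: fix x1, solve for x2 *)
p1 = {0, x2} /. First[Solve[g[{0, x2}] == 0, x2]];
p2 = {1, x2} /. First[Solve[g[{1, x2}] == 0, x2]];
w.(p1 - p2)        (* property 2: w is normal to the plane, so this is 0 *)
-w0/Norm[w]        (* property 3: distance of the plane from the origin, here 1.11803 *)
g[{2, 1}]/Norm[w]  (* property 4: signed distance of x = {2, 1} from the plane, here 0.67082 *)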
Task-dependent dimensionality reduction

Fisher's linear "discriminant"

The idea is that the original input space may be impractically huge, but if we can find a subspace (hyperplane) that preserves the distinctions between categories as well as possible, we can make our decisions in that smaller space. We will derive the Fisher linear "discriminant". This is closely related to the idea from psychology of finding "distinctive" features. E.g., consider bird identification. If I want to discriminate cardinals from other birds in my backyard, I can make use of the fact that (male) cardinals may be the only birds that are red. So even though the image of a bird can have lots of dimensions, if I project the image onto the "red" axis, I can do fairly well with just one number. How about male vs. female human faces?

Generative model: two nearby Gaussian classes

Define two bivariate base distributions:

In[5]:= (ar = {{1, 0.99}, {0.99, 1}};
         ndista = MultinormalDistribution[{0, -1}, ar];)
        (br = {{1, .9}, {.9, 2}};
         ndistb = MultinormalDistribution[{0, 1}, br];)

Find the expression for the probability distribution function of ndista:

pdf = PDF[ndista, {x1, x2}]

1.12822 E^((1/2) (-x1 (50.2513 x1 - 49.7487 (x2 + 1)) - (x2 + 1) (50.2513 (x2 + 1) - 49.7487 x1)))

Use Mean[ ] and Covariance[ ] to verify the population means and covariance matrices of ndista and ndistb:

In[7]:= Mean[ndistb]
Out[7]= {0, 1}

In[8]:= Covariance[ndista]
Out[8]= {{1, 0.99}, {0.99, 1}}

Try different covariance matrices. Should they be symmetric? Are there constraints on the determinants of ar and br?

Make a contour plot of the PDF of ndista:

In[9]:= pdfa = PDF[ndista, {x1, x2}];
        ContourPlot[pdfa, {x1, -3, 3}, {x2, -3, 3}, PlotPoints -> 64,
          PlotRange -> All, ImageSize -> Small]

Out[10]= [contour plot of pdfa over -3 < x1 < 3, -3 < x2 < 3]

In[48]:= nsamples = 500;
         a = Table[Random[ndista], {nsamples}];  (* RandomVariate[ndista] in current Mathematica *)
         ga = ListPlot[a, PlotRange -> {{-8, 8}, {-8, 8}}, AspectRatio -> 1,
           PlotStyle -> Hue[0.6]];
         b = Table[Random[ndistb], {nsamples}];
         gb = ListPlot[b, PlotRange -> {{-8, 8}, {-8, 8}}, AspectRatio -> 1,
           PlotStyle -> Hue[0.6]];
         Show[ga, gb, ImageSize -> Small]

Out[53]= [scatter plot of the two 500-point samples]

Use Mean[ ] to find the sample mean of b. What is the sample covariance of b?

In[17]:= Mean[b]
Out[17]= {0.0610542, 1.07125}

In[18]:= Covariance[b]
Out[18]= {{0.973561, 0.914097}, {0.914097, 1.98276}}

Try out different projections of the data by varying the slope (m) of a projection line

We'll use the Map[ ] function to calculate the projection of a data point (x, y) onto a unit normal (n1, n2), producing a vector in the direction of the unit vector:

In[19]:= Clear[x, y, n1, n2];
         {{x, y}}.{n1, n2}
         Map[#1*{n1, n2} &, {{x, y}}.{n1, n2}]

Out[20]= {n1 x + n2 y}

Out[21]= {{n1 (n1 x + n2 y), n2 (n1 x + n2 y)}}

In[22]:= Manipulate[wnvec = {1, m}/Sqrt[1 + m^2]; aproj
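The In[22] cell continues beyond this excerpt. Here is a minimal standalone sketch of such a projection demo (my reconstruction: the names wnvec and aproj follow the fragment above, while bproj, the hue choices, the slider range, and the combined plot are my assumptions); it projects each sample point onto the line of slope m and overlays the projected points on the scatter plots ga and gb from above:

Manipulate[
 wnvec = {1, m}/Sqrt[1 + m^2];           (* unit vector along a line of slope m *)
 aproj = Map[(#.wnvec)*wnvec &, a];      (* each point of sample a, projected onto the line *)
 bproj = Map[(#.wnvec)*wnvec &, b];
 Show[ga, gb,
  ListPlot[aproj, PlotStyle -> Hue[0.3]],
  ListPlot[bproj, PlotStyle -> Hue[0.9]],
  PlotRange -> {{-8, 8}, {-8, 8}}, AspectRatio -> 1, ImageSize -> Small],
 {{m, 1}, -5, 5}]

Dragging m shows that some projection lines keep the two clouds well separated while others collapse them together.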

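The slope you find by hand has a closed-form answer, which the Fisher derivation will produce. As a preview, here is a minimal sketch (my addition, assuming the samples a and b defined above; Sw, wFisher, and mFisher are my own names) of the standard result w ∝ Sw⁻¹ (ma - mb):

ma = Mean[a]; mb = Mean[b];
Sw = Covariance[a] + Covariance[b];   (* pooled within-class scatter, up to a scale factor *)
wFisher = Inverse[Sw].(ma - mb);
wFisher = wFisher/Norm[wFisher]       (* unit vector along the best-separating direction *)
mFisher = wFisher[[2]]/wFisher[[1]]   (* its slope, comparable to m in the Manipulate above *)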

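As a quick follow-up check (again my own sketch, assuming wFisher from the block above), projecting both samples onto the Fisher direction reduces each data point to a single number, and a histogram shows how well the two classes separate in that one dimension:

pa = a.wFisher; pb = b.wFisher;       (* 1-D coordinates of each sample along wFisher *)
Histogram[{pa, pb}, Automatic, "PDF", ImageSize -> Small]
(Mean[pa] - Mean[pb])^2/(Variance[pa] + Variance[pb])   (* Fisher criterion: larger means better separation *)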