Introduction to Neural Networks
U. Minn. Psy 5038
Lecture 10
Non-linear models
The perceptron

Initialization

In[1]:= Off[SetDelayed::write]
        Off[General::spell1]

Introduction

Last time

‡ Summed vector memories

‡ Introduction to statistical learning

Linear interpolation interpretation of linear heteroassociative learning and recall

In[583]:= f1 = {0, 1, 0}; f2 = {1, 0, 0}; g1 = {0, 1, 3}; g2 = {1, 0, 5};
          W = Outer[Times, g1, f1];

W maps f1 to g1:

In[585]:= W.f1

Out[585]= {0, 1, 3}

Similarly, the outer product of f2 and g2 maps f2 to g2:

In[586]:= W = Outer[Times, g2, f2];
          W.f2

Out[587]= {1, 0, 5}

Because of the orthogonality of f1 and f2, the sum of the two outer products, Wt, still maps f1 to g1 (and f2 to g2):

In[588]:= Wt = Outer[Times, g1, f1] + Outer[Times, g2, f2];
          Wt.f1

Out[589]= {0, 1, 3}

Define an interpolated point fi somewhere between f1 and f2, with its position determined by the parameter a:

In[590]:= fi = a*f1 + (1 - a)*f2;

Wt maps fi to some point in output space (below, we'll call it gt):

In[591]:= Wt.fi

Out[591]= {0.971435, 0.0285649, 4.94287}

(An interpolated point gt between g1 and g2 can also be defined as gt = b*g1 + (1 - b)*g2.) Let's take a look at the state-space representations of the input and output activity vectors:

In[592]:= Manipulate[
           fi = a*f1 + (1 - a)*f2;
           gt = Wt.fi;
           GraphicsRow[
            {ListPointPlot3D[{f1, f2, fi}, PlotStyle -> PointSize[0.02],
              Ticks -> None, AxesLabel -> {"fx", "fy", "fz"}],
             ListPointPlot3D[{g1, g2, gt}, PlotStyle -> PointSize[0.02],
              Ticks -> None, AxesLabel -> {"gx", "gy", "gz"}]},
            ImageSize -> Small],
           {{a, .4}, 0, 1}]

Out[592]= [Manipulate display: a slider for the parameter a and two 3D scatter plots, one showing f1, f2, and fi in the input space (axes fx, fy, fz), the other showing g1, g2, and gt in the output space (axes gx, gy, gz).]
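As a quick aside (this check is not in the original notebook): because f1 and f2 are orthonormal, Wt maps the interpolated input a*f1 + (1 - a)*f2 exactly onto the correspondingly interpolated output a*g1 + (1 - a)*g2, i.e. the output interpolation parameter b equals a. A minimal symbolic sketch of that check, assuming the definitions of f1, f2, g1, g2, and Wt above:

(* Sketch, not from the notebook: verify that Wt interpolates between g1 and g2 *)
Clear[a];  (* a may hold a numeric value from the demo above; make it symbolic again *)
Simplify[Wt.(a*f1 + (1 - a)*f2) == a*g1 + (1 - a)*g2]
(* evaluates to True *)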
‡ Generative modeling and statistical sampling

Today

‡ Non-linear models for classification

Introduction to non-linear models

By definition, linear models are limited in the class of functions they can compute--the outputs have to be linear functions of the inputs. However, as we have pointed out earlier, linear models provide an excellent foundation on which to build. On this foundation, non-linear models have moved in several directions.

Consider a single unit with output y and inputs fi.

--One way is to "augment" the richness of the input patterns with higher-order terms to form polynomial mappings, or non-linear regression, as in a Taylor series (Poggio, 1979), going from linear, to quadratic, to higher-order functions.

--The linear lateral inhibition equations can be generalized using products of input and output terms--"shunting" inhibition (Grossberg).

--A straightforward generalization of the generic connectionist model is to divide the neural output by the squared responses of neighboring units. This is a steady-state model of a version of shunting inhibition that has been very successful in accounting for a range of neurophysiological receptive-field properties in vision (Heeger et al., 1996).

--One of the simplest things we can do at this point is to use the generic connectionist neuron with its second-stage point-wise non-linearity. Recall that this is an inner product followed by a non-linear sigmoid. Once a non-linearity such as a sigmoid is introduced, it makes sense to add more than one layer of neurons. (Without a non-linearity, any linear network that feeds into another linear network is equivalent to a single linear network, with just one layer of weights. You can prove this using the rules of matrix multiplication.)

Much of the modeling of human visual pattern discrimination has used just these "rules of the game" (linear matrix multiplication followed by point non-linearities, i.e. generic connectionist neurons), with additional complexities (such as the normalization term above) added only as needed. Together with a good learning rule (such as the error back-propagation we will study later), these networks provide state-of-the-art machine vision solutions to some restricted problems, such as OCR (see the LeNet demo link in the syllabus).

A central challenge, in the above and in all methods that seek general mappings, is to develop techniques for learning the weights while avoiding over-fitting (i.e. using too many weights). We'll talk more about this problem later.

These modifications produce smooth functions. If we want to classify rather than regress, we need something abrupt. Generally, we add a sigmoidal squashing function; as the slope of the sigmoid increases, we approach a simple step non-linearity, and the neuron then makes discrete (binary) decisions. Recall the McCulloch-Pitts model of the 1940s. Let us look at the Perceptron, an early example of a network built on such threshold logic units.

Classification and the Perceptron

Classification

Previously we introduced the distinction between regression and classification in supervised learning.
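To make the idea of a threshold logic unit concrete, here is a minimal sketch (not from the notebook) of a single such unit: an inner product followed by a hard threshold. The weight vector w, bias b, and test inputs below are arbitrary values chosen only for illustration.

(* Sketch, not from the notebook: a single threshold logic unit.
   It outputs 1 if the weighted sum w.f + b is positive, and 0 otherwise,
   so it makes a discrete (binary) decision about its input f. *)
thresholdUnit[w_, b_, f_] := If[w.f + b > 0, 1, 0]

w = {1, -1}; b = 0.5;   (* assumed example weights and bias *)
thresholdUnit[w, b, #] & /@ {{2, 1}, {0, 1}, {1, 2}}
(* -> {1, 0, 0}: inputs on one side of the line w.f + b == 0 are classified
   as 1, the rest as 0 *)

The decision boundary is the set of inputs where w.f + b == 0, so a single unit of this kind implements a linear classifier--the building block from which the perceptron is constructed.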

