Slide 1: Perceptrons
Louis [email protected]
540, section 2
Slides borrowed (with modifications) from Burr Settles

Slide 2: Announcements
- Review session tonight, 4:30-5:30, CS 1325 (right here)
  - Come with questions
  - No lecture prepared
- Midterm tomorrow night, 7:15-9:15, 1240 CS
- HW 3 solution is online. Grading is not done yet.

Slide 3: Neural Networks
Neural networks (NNs) are AI models that try to mimic the brain in the way it stores knowledge and processes information. Also known as:
- Artificial Neural Networks (ANNs)
- Connectionist learning models
  - As opposed to symbolic models, like decision trees
- Parallel Distributed Processing (PDP) models

Slide 4: Neuroscience (1861-present)
Neuroscience is the study of the nervous system, particularly the functions of the brain.
- By the 19th century, it had been established that the brain played a central role in specific cognitive functions.
- Before that, people thought the heart or spleen might be the focus of cognitive activity.
Paul Broca jump-started the field with his studies of speech disorders: he isolated the speech center in the lower left hemisphere of the brain.
- Now called "Broca's Area"

Slide 5: Neuroscience
Special nerve cells called neurons had been theorized about by the late 1800s.
- At the turn of the 20th century, a staining method for actually viewing them was developed by Camillo Golgi.
- Santiago Ramon y Cajal used the staining technique to propose the structure of the nervous system.
- Golgi and Cajal shared the Nobel Prize in 1906, though they had differing views:
  - Golgi thought the brain's functions were carried out in the medium.
  - Cajal theorized about a connectionist "neuronal doctrine."

Slide 6: Neuronal Structure
[figure: diagram of a neuron]

Slide 7: Neuronal Communication
Neurons propagate information by "firing," or sending electrochemical signals along the axon.
- Axons can be 1 to 100 centimeters long!
Synapses connect the axon of one neuron to the dendrites of up to 100,000 other neurons.
- The synapses function as signal amplifiers or repressors.
If enough energy flows into a neuron from all of its synapses/dendrites, then it will fire too, sending a message along its axon to other neurons.

Slide 8: Simulated Neurons
We can create a mathematical approximation to the nature of neuronal communication:
- Represent a "neuron" as a Boolean function.
- Each neuron has an output of either +1 (fire) or 0 (don't fire; sometimes -1 is used).
- Each also has a set of inputs (i.e., other neurons, +1/0), each with an associated weight (i.e., a synapse).
- The neuron computes a weighted sum over all the inputs and compares it to some threshold t.
- If the sum is ≥ t, then output +1 (fire); otherwise 0.

Slide 9: Perceptrons
A perceptron is a simulated neuron that takes the agent's percepts (e.g., a feature vector) as inputs and maps them to the appropriate output value.
[figure: inputs x1…xn with weights w1…wn feeding a threshold unit t with output o]
The output o is the result of some activation function g(in), where in is the weighted sum of the inputs x1…xn. For now, g(in) is a simple threshold or "step" function.

Slide 10: Perceptrons - Inference
Really, the threshold t is just another weight (called the bias):
  (w1·x1) + (w2·x2) + … + (wn·xn) ≥ t
  ⇔ (w1·x1) + (w2·x2) + … + (wn·xn) - t ≥ 0
  ⇔ (w1·x1) + (w2·x2) + … + (wn·xn) + (t · (-1)) ≥ 0
[figure: the same perceptron with an extra input fixed at -1 whose weight is t]
  o(x1, …, xn) = 1 if w1·x1 + w2·x2 + … + wn·xn ≥ t, and 0 otherwise

Slide 11: Methods of Learning
- Perceptron training rule
- Delta rule

Slide 12: Perceptron Learning
A perceptron learns by adjusting its weights in order to minimize the error on the training set.
To start off, consider updating the value of a single weight on a single example x with the perceptron training rule:
  wi ← wi + Δwi, where Δwi = η (true - o) · xi
- Here η is the learning rate, a value in the range [0,1]; true is the target value for the example; and o is the perceptron's output (so (true - o) is the error).
Note: the notation used in the new version of AI: A Modern Approach is really messy and riddled with typos, so this notation will differ from the textbook's.

Slide 13: Using the Perceptron Training Rule
  wi ← wi + Δwi, where Δwi = η (true - o) · xi
Suppose a training example is correctly classified.
- What is the change in weight, Δwi?
What if a training example is incorrectly classified?
- How will the weights change?

Slide 14: Perceptron Training Rule
Proven to converge in a finite number of steps to weights that will correctly classify all training examples, provided the training examples are linearly separable.

Slide 15: Gradient Descent and the Delta Rule
- Works with an unthresholded perceptron:
    o(x1, …, xn) = w1·x1 + w2·x2 + … + wn·xn, i.e., o(x) = w · x
- The delta rule converges toward a best-fit approximation to the target concept even when the training examples are not linearly separable.
Training error, for a given data set, is defined as
  E[w] = ½ Σ_d (true_d - o_d)²
- where E[w] is the sum of squared errors for the weight vector w, and d ranges over the examples in the training set.
- This formulation of error makes a parabolic curve, and so has a global minimum.

Slide 16: Gradient Descent and the Delta Rule
If we have a perceptron with 2 weights, we want to find the pair of weights (i.e., the point in 2D weight space) where E[w] is lowest.
But the weights are continuous values, so how do we know how much to change them?

Slide 17: Gradient Descent and the Delta Rule
Find the gradient (the vector of partial derivatives):
  ∇E[w] = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]
Update the weights:
  wi ← wi + Δwi, where Δwi = -η (∂E/∂wi)
We just need to calculate the partial derivative of the error function:
  ∂E/∂wi = ∂/∂wi ( ½ Σ_d (true_d - o_d)² )
  ∂E/∂wi = Σ_d (true_d - o_d)(-x_i,d)
Putting it all together, this is called the delta rule for training:
  Δwi = η Σ_d (true_d - o_d) · x_i,d
- Often this rule is applied for each example instead of on the entire dataset.
- This makes sense: if (true - o) is positive, the weight should be increased for positive inputs xi, and decreased for negative ones.

Slide 18: On Activation Functions
Houston, we have a problem!
- We're using a simple step function as our activation function g(in).
- This isn't differentiable, so we can't compute g′(in).
- Using the delta rule will not work on a thresholded perceptron.
- To remedy this, we can use a sigmoid.
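The sigmoid fix can be sketched in a few lines. Unlike the step function, σ(z) = 1/(1 + e^(-z)) is smooth everywhere, and its derivative has the convenient closed form σ′(z) = σ(z)(1 - σ(z)), which is exactly what gradient descent needs. This is an illustrative sketch (function names are mine, not from the slides):

```python
import math

def sigmoid(z):
    """Smooth, differentiable replacement for the step activation."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z)); defined everywhere,
    unlike the derivative of the step function, which is undefined at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Near the threshold (z = 0) the sigmoid passes through 0.5 smoothly,
# and its slope there is maximal.
print(sigmoid(0.0))        # 0.5
print(sigmoid_deriv(0.0))  # 0.25
```

For large positive inputs the sigmoid saturates toward 1 (and toward 0 for large negative inputs), so it behaves like a softened version of the step function while remaining usable with the delta rule.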
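As a worked example of the perceptron training rule wi ← wi + η(true - o)·xi, here is a minimal sketch that learns the logical AND function. AND is linearly separable, so the finite-convergence guarantee applies. Variable names, the learning rate of 0.1, and the epoch count are my own illustrative choices, not from the slides:

```python
def predict(weights, x):
    """Step-function perceptron with the bias folded in as weight 0
    on a constant -1 input (the trick from the inference slide)."""
    total = weights[0] * -1 + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total >= 0 else 0

def train(examples, eta=0.1, epochs=50):
    """Perceptron training rule: w_i <- w_i + eta * (true - o) * x_i."""
    weights = [0.0, 0.0, 0.0]  # bias weight t, then w1, w2
    for _ in range(epochs):
        for x, true in examples:
            o = predict(weights, x)
            error = true - o  # 0 when the example is correctly classified
            weights[0] += eta * error * -1  # bias input is the constant -1
            for i, xi in enumerate(x):
                weights[i + 1] += eta * error * xi
    return weights

# Logical AND: linearly separable, so the rule converges.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train(data)
print([predict(w, x) for x, _ in data])  # [0, 0, 0, 1]
```

Note how the rule matches the slides' intuition: correctly classified examples leave the weights untouched (error is 0), while a misclassified example nudges each weight in the direction that reduces its error.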