CS 6375 Machine Learning: Artificial Neural Networks (ANNs)
Instructor: Yang Liu
Slides adapted from Prof. Spears.
Significant portions of this lecture are taken from Artificial Intelligence: A Modern Approach by Russell and Norvig, as well as the course textbook on ML by Mitchell.

Slide 2: Math Review
- Dot product: v = [v1, v2, ..., vn], w = [w1, w2, ..., wn]; v · w = v1w1 + v2w2 + ... + vnwn
- Partial derivative ∂f(v1, v2, ..., vn)/∂vi: hold all vj where j ≠ i fixed, then take the derivative.
- Chain rule for derivatives: df(g(x))/dx = (df/dg)(dg/dx). (The chain rule can also be applied if these are partial derivatives.)

Slide 3: Gradient Descent
- The gradient points in the direction of steepest ascent of the function f.
- Use gradient descent to find the variables minimizing an objective function, rather than trying to find a closed-form solution: wi = wi + ∆wi.

Slide 4: Artificial Neural Networks
- Can be considered simplified mathematical models of brain-like systems; they function as parallel distributed computing networks.
- Started >50 years ago. There was a rebirth of interest in neural computing in the 1980s (mathematical foundations and learning algorithms were developed).
- Recently, "deep learning" has become a hot topic.
- Used in vision, speech, decision making, and signal processing.

Slide 5: Multi-layer Feedforward Neural Network (figure)

Slide 6: Properties of ANNs
- Many neuron-like threshold switching units.
- Many weighted interconnections between units.
- Highly parallel, distributed computation.
- Weights are tuned automatically (training).
- Especially useful for learning complex functions with continuous-valued outputs and large numbers of noisy inputs, which is the type that logic-based techniques have difficulty with.

Slide 7: Digit Recognition Example (figure)

Slide 8: ANNs as Simple Computing Elements
- Each unit (node) receives signals from its input links and computes a new activation level that it sends along all output links.
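The unit computation just described, a weighted sum of the inputs passed through an activation function, can be sketched in a few lines of Python. The names `unit_output` and `sigmoid` and the example weights are illustrative choices, not taken from the slides:

```python
import math

def sigmoid(z):
    """The "soft" threshold activation function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(weights, inputs):
    in_i = sum(w * x for w, x in zip(weights, inputs))  # linear step: in_i = W . x
    return sigmoid(in_i)                                # nonlinear step: x_i = g(in_i)

# With weights 0.5 and -0.5 the inputs cancel, so in_i = 0 and g(0) = 0.5.
print(unit_output([0.5, -0.5], [1.0, 1.0]))  # 0.5
```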
- Computation is split into two steps: in_i = Σ_j W_{j,i} x_j = W · x (the linear step, a vector dot product), then x_i = g(in_i) (the nonlinear step, the activation function).

Slide 9: Node
(Figure: a node i with input links carrying x_j via weights W_{j,i}, an input function computing in_i, an activation function g, and output links carrying x_i = g(in_i).)

Slide 10: Purpose of the Activation Function g
- We want the unit to be "active" (near +1) when the "right" inputs are given.
- We want the unit to be "inactive" (near 0) when the "wrong" inputs are given.
- It's preferable for g to be nonlinear. Otherwise, the entire neural network collapses into a simple linear function.

Slide 11: Possibilities for g
- Step (threshold) function: step(x) = 1 if x > threshold, 0 if x ≤ threshold. (In the picture above, threshold = 0.)
- Sign function: sign(x) = +1 if x > 0, -1 if x ≤ 0.
- Sigmoid (logistic or squashing) function: sigmoid(x) = 1/(1 + e^-x), a "soft" threshold.
- Adding an extra input x0 = -1 and weight W_{0,j} = t (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a 0 threshold.

Slide 12: Using a Bias Weight to Standardize the Threshold
(Figure: a unit with inputs x1, x2 weighted W1, W2 and an extra input -1 weighted t.)
W1x1 + W2x2 > t is equivalent to W1x1 + W2x2 - t > 0.

Slide 13: Implementing AND
Assume Boolean (0/1) input values. With weights 1, 1 on x1, x2 and bias weight W = 1.5 on the extra input -1:
o(x1, x2) = 1 if -1.5 + x1 + x2 > 0, 0 otherwise.

Slide 14: Implementing OR
With weights 1, 1 and bias weight W = 0.5:
o(x1, x2) = 1 if -0.5 + x1 + x2 > 0, 0 otherwise.

Slide 15: Implementing NOT
With weight -1 on x1 and bias weight W = -0.5:
o(x1) = 1 if 0.5 - x1 > 0, 0 otherwise.

Slide 16: Implementing More Complex Boolean Functions
(Figure: an OR unit with weights 1, 1 and bias weight 0.5 feeds, together with x3, a second unit with weights 1, 1 and bias weight 1.5, computing (x1 or x2) and x3.)

Slide 17: Types of ANNs
- Feedforward: links are unidirectional and there are no cycles, i.e., the network is a directed acyclic graph (DAG). Units are arranged in layers, and each unit is linked only to units in the next layer.
- Recurrent: links can form arbitrary topologies. Cycles can implement memory.
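The AND, OR, and NOT threshold units above (extra input x0 = -1 carrying the bias weight, threshold 0) can be checked exhaustively in a few lines of Python; the helper name `step_unit` is an illustrative choice:

```python
def step_unit(weights, inputs):
    """Threshold unit with a bias input x0 = -1 prepended, as on the slides."""
    total = sum(w * x for w, x in zip(weights, [-1.0] + list(inputs)))
    return 1 if total > 0 else 0

# First weight in each list is the bias weight (multiplied by x0 = -1).
AND = lambda x1, x2: step_unit([1.5, 1, 1], (x1, x2))
OR  = lambda x1, x2: step_unit([0.5, 1, 1], (x1, x2))
NOT = lambda x1:     step_unit([-0.5, -1], (x1,))

for x1 in (0, 1):
    for x2 in (0, 1):
        assert AND(x1, x2) == (x1 and x2)
        assert OR(x1, x2) == (x1 or x2)
    assert NOT(x1) == 1 - x1
```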
- Recurrent network behavior can become unstable, oscillatory, or chaotic.

Slide 18: A Feedforward Network Topology
Multilayer feedforward network with a layer of input units x0, x1, x2, x3, x4, a layer of hidden units h1, h2, h3, and a layer of output units o1, o2. Assuming a step function:
h_j = 1 if Σ_k W_{k,j} x_k > 0, 0 otherwise
o_i = 1 if Σ_j W_{j,i} h_j > 0, 0 otherwise

Slide 19: Learning in ANNs
- The topology (nodes and connections) of the network is often assumed to be fixed. The weights are learned/updated. (Our main focus.)
- If the topology is not assumed to be fixed, then genetic algorithms can be used to learn it.

Slide 20: Perceptrons: Single-layer Networks (No Hidden Layer)

Slide 21: Perceptrons
- Perceptrons are single-layer feedforward networks.
- Each output unit is independent of the others, so we can assume a single output unit.
- The activation of the output unit is calculated by o = g(Σ_j W_j x_j), where x_j is the activation of input unit j, and we assume an additional weight and input to represent the threshold.

Slide 22: Expressive Limits of Perceptrons
- We have already seen perceptrons for AND, OR, NOT. Can the XOR function be represented by a perceptron?
- With bias input x0 = -1 and weights w0, w1, w2, XOR(x1, x2) would require:
  -w0 ≤ 0             (input 0 0, output 0)
  -w0 + w2 > 0        (input 0 1, output 1)
  -w0 + w1 > 0        (input 1 0, output 1)
  -w0 + w1 + w2 ≤ 0   (input 1 1, output 0)
- There is no assignment of values to w0, w1, and w2 that satisfies the above inequalities. XOR cannot be represented!

Slide 23: What Can Be Represented Using Perceptrons?
- Representation Theorem: 1-layer feedforward networks (perceptrons) can only represent linearly separable functions. That is, the decision surface separating positive from negative examples has to be a plane. Examples: AND, OR, NOT.

Slide 24: Perceptron Learning
- We do not necessarily assume here that the perceptrons have threshold units. They may have sigmoid units.
- What is the space of hypotheses being searched during learning? The set of all possible weight vectors.
- The goal is to adjust the network weights to minimize the error on the training set. The error is computed by comparing the output of the ANN with the correct class of the training examples.
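Although no single perceptron can represent XOR, a two-layer feedforward network of step units can. A minimal sketch follows; the decomposition of the hidden layer into an OR unit and an AND unit (XOR = OR but not AND) is one of several possible constructions, not taken from the slides:

```python
def step_unit(weights, inputs):
    """Threshold unit with a bias input x0 = -1 prepended."""
    total = sum(w * x for w, x in zip(weights, [-1.0] + list(inputs)))
    return 1 if total > 0 else 0

def xor(x1, x2):
    h_or  = step_unit([0.5, 1, 1], (x1, x2))   # hidden unit: x1 OR x2
    h_and = step_unit([1.5, 1, 1], (x1, x2))   # hidden unit: x1 AND x2
    # Output fires only when the OR unit is on and the AND unit is off.
    return step_unit([0.5, 1, -1], (h_or, h_and))

assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```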
- Error is a function of the weights; use gradient descent to update the weights: Wi = Wi + ∆Wi.

Slide 25: Perceptron Learning
- The algorithm consists of running the training examples through the ANN one at a time, adjusting the weights slightly after each example. [Note: weights can also be adjusted using all the examples, or a subset of the examples.]
- Each cycle through the set of all examples is called an epoch. Epochs are typically repeated until the weight changes are below some small ε. In other words, ANN training typically requires multiple training epochs over the same set of training examples.

Slide 26: Definition of Error: Sum of Squared Errors
E = (1/2) Σ_examples (t - o)²
Here, t is the correct (desired) output and o is the actual output of the neural net. Introduce Err to simplify the
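The epoch-based training loop and the squared-error measure can be sketched as follows. This is a sketch under stated assumptions: the learning rate eta, the stopping rule, and the classic perceptron update w_i ← w_i + η(t − o)x_i (used here instead of a sigmoid-unit gradient step) are illustrative choices, not prescribed by the slides:

```python
def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Cycle through the examples (one full pass = one epoch), nudging the
    weights after each example, until the sum of squared errors reaches 0."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                          # w[0] is the bias weight
    for epoch in range(max_epochs):
        sse = 0.0
        for x, t in examples:
            xs = [-1.0] + list(x)                # bias input x0 = -1
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
            for i in range(n + 1):
                w[i] += eta * (t - o) * xs[i]    # perceptron update rule
            sse += 0.5 * (t - o) ** 2            # E = 1/2 * sum (t - o)^2
        if sse == 0.0:                           # no errors left this epoch
            break
    return w

# Train on the (linearly separable) OR function.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(examples)
```

OR is linearly separable, so by the perceptron convergence theorem this loop reaches zero error in a finite number of updates; XOR would never converge here, consistent with the expressive limits above.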