UT Dallas CS 6375 - NeuralNets

CS 6375 Machine Learning: Artificial Neural Networks (ANNs)
Instructor: Yang Liu
Slides adapted from Prof. Spears. Significant portions of this lecture are taken from Artificial Intelligence: A Modern Approach by Russell and Norvig, as well as the course textbook on ML by Mitchell.

Math Review
- Dot product: for $v = [v_1, v_2, \ldots, v_n]$ and $w = [w_1, w_2, \ldots, w_n]$, $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n$.
- Partial derivative $\frac{\partial f(v_1, v_2, \ldots, v_n)}{\partial v_i}$: hold all $v_j$ with $j \neq i$ fixed, then take the derivative.
- Chain rule for derivatives: $\frac{d f(g(x))}{dx} = \frac{df}{dg}\,\frac{dg}{dx}$ (the chain rule can also be applied if these are partial derivatives).

Gradient Descent
- The gradient points in the direction of steepest ascent of the function f.
- Use gradient descent to find the variables minimizing an objective function, rather than trying to find a closed-form solution: $w_i = w_i + \Delta w_i$.

Artificial Neural Networks
- Can be considered simplified mathematical models of brain-like systems; they function as parallel distributed computing networks.
- Started more than 50 years ago; there was a rebirth of interest in neural computing in the 1980s, when mathematical foundations and learning algorithms were developed. Recently, "deep learning" has become a hot topic.
- Used in vision, speech, decision making, and signal processing.

Multi-layer Feedforward Neural Network (figure)

Properties of ANNs
- Many neuron-like threshold switching units.
- Many weighted interconnections between units.
- Highly parallel, distributed computation.
- Weights are tuned automatically (training).
- Especially useful for learning complex functions with continuous-valued outputs and large numbers of noisy inputs, the kind of problem that logic-based techniques have difficulty with.

Digit Recognition Example (figure)

ANNs as Simple Computing Elements
- Each unit (node) receives signals from its input links and computes a new activation level that it sends along all output links.
- The computation is split into two steps: the linear step $in_i = \sum_j W_{j,i}\, x_j = W_i \cdot x$ (a vector dot product), and the nonlinear step $x_i = g(in_i)$, where g is the activation function.

A Node (figure: unit i receives inputs $x_j$ over input links with weights $W_{j,i}$, computes the input function $in_i$, applies the activation function g, and sends the output $x_i = g(in_i)$ along its output links)

Purpose of the Activation Function g
- We want the unit to be "active" (near +1) when the "right" inputs are given.
- We want the unit to be "inactive" (near 0) when the "wrong" inputs are given.
- It is preferable for g to be nonlinear; otherwise, the entire neural network collapses into a simple linear function.

Possibilities for g
- Step (threshold) function: $\mathrm{step}(x) = 1$ if $x > \text{threshold}$, $0$ if $x \le \text{threshold}$ (in the slide's picture, threshold = 0).
- Sign function: $\mathrm{sign}(x) = +1$ if $x > 0$, $-1$ if $x \le 0$.
- Sigmoid (logistic or squashing) function, a "soft" threshold: $\mathrm{sigmoid}(x) = 1/(1 + e^{-x})$.
- Adding an extra input $x_0 = -1$ with weight $W_{0,j} = t$ (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a 0 threshold.

Using a Bias Weight to Standardize the Threshold
- $W_1 x_1 + W_2 x_2 > t$ is equivalent to $W_1 x_1 + W_2 x_2 - t > 0$, where the $-t$ term comes from the bias input $-1$ with weight $t$ (figure).

Implementing AND (figure: inputs $x_1$, $x_2$ with weights 1, 1 and bias input $-1$ with weight $W = 1.5$)
- $o(x_1, x_2) = 1$ if $-1.5 + x_1 + x_2 > 0$, and $0$ otherwise. Assume Boolean (0/1) input values.

Implementing OR (figure: inputs $x_1$, $x_2$ with weights 1, 1 and bias input $-1$ with weight $W = 0.5$)
- $o(x_1, x_2) = 1$ if $-0.5 + x_1 + x_2 > 0$, and $0$ otherwise.

Implementing NOT (figure: input $x_1$ with weight $-1$ and bias input $-1$ with weight $W = -0.5$)
- $o(x_1) = 1$ if $0.5 - x_1 > 0$, and $0$ otherwise.

Implementing More Complex Boolean Functions (figure)
- Example: $(x_1 \text{ or } x_2) \text{ and } x_3$. An OR unit (weights 1, 1 and bias weight 0.5) computes $x_1$ or $x_2$; its output is fed, together with $x_3$, into an AND unit (weights 1, 1 and bias weight 1.5). A code sketch of these constructions follows below.
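The two-step unit computation and the logic-gate constructions above map directly onto a few lines of code. Below is a minimal sketch in plain Python (no external libraries); the function names `step`, `sign`, `sigmoid`, `unit`, `AND`, `OR`, `NOT`, and `or_then_and` are illustrative rather than from the slides, while the weights are exactly those given on the AND/OR/NOT slides, with the threshold handled via the extra bias input $x_0 = -1$.

```python
import math

# Activation functions from the "Possibilities for g" slide.
def step(x, threshold=0.0):
    return 1 if x > threshold else 0

def sign(x):
    return 1 if x > 0 else -1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One unit: the linear step (a dot product that includes a bias input x0 = -1),
# followed by the nonlinear step g.
def unit(weights, inputs, g=step):
    # weights[0] is the bias weight t, paired with the fixed input x0 = -1,
    # so the unit fires when the weighted sum of the real inputs exceeds t.
    in_i = -weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return g(in_i)

# The logic gates from the slides, with Boolean (0/1) inputs.
def AND(x1, x2):
    return unit([1.5, 1, 1], [x1, x2])   # fires iff x1 + x2 > 1.5

def OR(x1, x2):
    return unit([0.5, 1, 1], [x1, x2])   # fires iff x1 + x2 > 0.5

def NOT(x1):
    return unit([-0.5, -1], [x1])        # in = 0.5 - x1, so fires iff x1 = 0

def or_then_and(x1, x2, x3):
    # (x1 or x2) and x3: feed the OR unit's output into an AND unit.
    return unit([1.5, 1, 1], [OR(x1, x2), x3])

if __name__ == "__main__":
    print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
    print([OR(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]
    print(NOT(0), NOT(1))                               # 1 0
    print(or_then_and(0, 1, 1))                         # 1
```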
Types of ANNs
- Feedforward: links are unidirectional and there are no cycles, i.e., the network is a directed acyclic graph (DAG). Units are arranged in layers, and each unit is linked only to units in the next layer.
- Recurrent: links can form arbitrary topologies. Cycles can implement memory, but behavior can become unstable, oscillatory, or chaotic.

A Feedforward Network Topology
- Multilayer feedforward network (figure: a layer of input units $x_0, x_1, x_2, x_3, x_4$, a layer of hidden units $h_1, h_2, h_3$, and a layer of output units $o_1, o_2$).
- Assuming a step function: $h_j = 1$ if $\sum_k W_{k,j}\, x_k > 0$ and $0$ otherwise; $o_i = 1$ if $\sum_j W_{j,i}\, h_j > 0$ and $0$ otherwise.

Learning in ANNs
- The topology (nodes and connections) of the network is often assumed to be fixed; the weights are learned/updated. This is our main focus.
- If the topology is not assumed to be fixed, then genetic algorithms can be used to learn it.

Perceptrons: Single-layer Networks (no hidden layer) (figure)

Perceptrons
- Perceptrons are single-layer feedforward networks.
- Each output unit is independent of the others, so we can assume a single output unit.
- The activation of the output unit is $o = g\!\left(\sum_j W_j x_j\right)$, where $x_j$ is the activation of input unit j, and we assume an additional weight and input to represent the threshold.

Expressive Limits of Perceptrons
- We have already seen perceptrons for AND, OR, and NOT. Can the XOR function be represented by a perceptron?
- With the bias input $x_0 = -1$ and weights $w_0, w_1, w_2$, the four cases of $\mathrm{XOR}(x_1, x_2)$ require:
  $-w_0 + w_1 \cdot 0 + w_2 \cdot 0 \le 0$ (output 0 for input 0, 0)
  $-w_0 + w_1 \cdot 0 + w_2 \cdot 1 > 0$ (output 1 for input 0, 1)
  $-w_0 + w_1 \cdot 1 + w_2 \cdot 0 > 0$ (output 1 for input 1, 0)
  $-w_0 + w_1 \cdot 1 + w_2 \cdot 1 \le 0$ (output 0 for input 1, 1)
- There is no assignment of values to $w_0$, $w_1$, and $w_2$ that satisfies these inequalities: adding the two strict inequalities gives $-2w_0 + w_1 + w_2 > 0$, while adding the other two gives $-2w_0 + w_1 + w_2 \le 0$. XOR cannot be represented!

What Can Be Represented Using Perceptrons?
- Representation Theorem: 1-layer feedforward networks (perceptrons) can only represent linearly separable functions. That is, the decision surface separating positive from negative examples has to be a plane. Examples: AND, OR, NOT.

Perceptron Learning
- We do not necessarily assume here that the perceptrons have threshold units; they may have sigmoid units.
- What is the space of hypotheses being searched during learning? The set of all possible weight vectors.
- The goal is to adjust the network weights to minimize the error on the training set. The error is computed by comparing the output of the ANN with the correct class of the training examples.
- The error is a function of the weights; use gradient descent to update the weights: $W_i = W_i + \Delta W_i$.

Perceptron Learning (continued)
- The algorithm consists of running the training examples through the ANN one at a time, adjusting the weights slightly after each example. [Note: weights can also be adjusted using all the examples, or some subset of them.]
- Each cycle through the set of all examples is called an epoch.
- Epochs are typically repeated until the weight changes are below some small $\varepsilon$. In other words, ANN training typically requires multiple training epochs over the same set of training examples.

Definition of Error: Sum of Squared Errors
- $E = \frac{1}{2} \sum_{\text{examples}} (t - o)^2$, where t is the correct (desired) output and o is the actual output of the neural net.
- Introduce Err to simplify the ...
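To make the training procedure concrete, here is a minimal sketch of the per-example gradient-descent loop described above, assuming a single sigmoid output unit with a bias input $x_0 = -1$. The sigmoid-unit gradient formula, the learning rate `eta`, the stopping threshold `eps`, and the toy OR training set are illustrative choices not spelled out in this preview (which cuts off before the error derivation); the function names `train` and `output` are likewise mine, not the slides'.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(weights, inputs):
    # Linear step with the bias input x0 = -1, then the sigmoid activation.
    in_i = -weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(in_i)

def train(examples, n_inputs, eta=0.5, eps=1e-4, max_epochs=10000):
    """Per-example gradient descent on E = 1/2 * sum over examples of (t - o)^2.

    For a sigmoid unit, dE/dW_j = -(t - o) * o * (1 - o) * x_j on one example,
    so the update is W_j <- W_j + eta * (t - o) * o * (1 - o) * x_j.
    """
    weights = [random.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
    for _epoch in range(max_epochs):           # one epoch = one pass over all examples
        max_change = 0.0
        for inputs, t in examples:
            o = output(weights, inputs)
            delta = (t - o) * o * (1 - o)      # error signal for this example
            x_with_bias = [-1] + list(inputs)  # x0 = -1 pairs with the bias weight
            changes = [eta * delta * x for x in x_with_bias]
            weights = [w + dw for w, dw in zip(weights, changes)]
            max_change = max(max_change, max(abs(dw) for dw in changes))
        if max_change < eps:                   # stop once weight changes are tiny
            break
    return weights

if __name__ == "__main__":
    # A toy, linearly separable training set: the OR function.
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = train(data, n_inputs=2)
    for x, t in data:
        print(x, t, round(output(w, x), 3))
```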

