K-State CIS 830 - Artificial Neural Networks in Data Engineering

Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence

Lecture 10: Artificial Neural Networks in Data Engineering: Overview
Wednesday, February 9, 2000
William H. Hsu, Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapter 19, Russell and Norvig

Lecture Outline
• Read Sections 4.5-4.9, Mitchell; Chapter 4, Bishop; Rumelhart et al.
• Multi-Layer Networks
  – Nonlinear transfer functions
  – Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
• Backpropagation of Error
  – The backpropagation algorithm
    • Relation to the error gradient function for nonlinear units
    • Derivation of the training rule for feedforward multi-layer networks
  – Training issues
    • Local optima
    • Overfitting in ANNs
• Hidden-Layer Representations
• Examples: Face Recognition and Text-to-Speech
• Advanced Topics (Brief Survey)
• Next Week: Chapter 5 and Sections 6.1-6.5, Mitchell; Quinlan paper

Multi-Layer Networks of Nonlinear Units
• Nonlinear Units
  – Recall: activation function sgn(w · x)
  – Nonlinear activation function: generalization of sgn
• Multi-Layer Networks
  – A specific type: Multi-Layer Perceptrons (MLPs)
  – Definition: a multi-layer feedforward network is composed of an input layer, one or more hidden layers, and an output layer
  – “Layers”: counted in weight layers (e.g., 1 hidden layer → 2-layer network)
  – Only hidden and output layers contain perceptrons (threshold or nonlinear units)
• MLPs in Theory
  – A network of 2 or more layers can represent any function (to arbitrarily small error)
  – Training even 3-unit multi-layer ANNs is NP-hard (Blum and Rivest, 1992)
• MLPs in Practice
  – Finding or designing effective networks for arbitrary functions is difficult
  – Training is very computation-intensive even when structure is “known”
[Figure: feedforward network with input layer x1, x2, x3; hidden layer h1-h4 (weight u_{1,1} shown); output layer o1, o2 (weight v_{4,2} shown)]

Nonlinear Activation Functions
• Sigmoid Activation Function
  – Linear threshold gate activation function: sgn(w · x)
  – Nonlinear activation (aka transfer, squashing) function: a generalization of sgn is the sigmoid function
      net = Σ_{i=0..n} w_i x_i = w · x
      o(x) = σ(w · x), where σ(net) = 1 / (1 + e^(-net))
  – Can derive gradient rules to train
    • One sigmoid unit
    • Multi-layer, feedforward networks of sigmoid units (using backpropagation)
• Hyperbolic Tangent Activation Function
      σ(net) = sinh(net) / cosh(net) = (e^net - e^(-net)) / (e^net + e^(-net))
[Figure: a single unit with inputs x1, ..., xn weighted by w1, ..., wn, plus a bias input x0 = 1 with weight w0]
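To make the activation-function notation above concrete, the following is a minimal sketch (mine, not from the slides) of one nonlinear unit computing o(x) = σ(w · x); the input and weight values are invented purely for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def sigmoid(net):
    """Logistic squashing function: sigma(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def tanh(net):
    """Hyperbolic tangent: sinh(net) / cosh(net) = (e^net - e^(-net)) / (e^net + e^(-net))."""
    return np.tanh(net)

# Augment the input with x0 = 1 so that w0 plays the role of the threshold (bias).
x = np.array([1.0, 0.5, -0.3, 0.8])   # x0 = 1, then x1..x3 (illustrative values)
w = np.array([-0.2, 0.7, 0.1, 0.4])   # w0 (bias), then w1..w3 (illustrative values)

net = w @ x                           # net = sum_{i=0..n} w_i * x_i = w . x
print(f"net = {net:.3f}")
print(f"sigmoid output o(x) = {sigmoid(net):.3f}")   # squashed into (0, 1)
print(f"tanh output         = {tanh(net):.3f}")      # squashed into (-1, 1)
```

Unlike sgn, both squashing functions are differentiable, which is what makes the gradient-based training rules on the following slides possible.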
Backpropagation Algorithm
• Intuitive Idea: Distribute Blame for Error to Previous Layers
• Algorithm Train-by-Backprop (D, r)
  – Each training example is a pair of the form <x, t(x)>, where x is the vector of input values and t(x) is the target output value; r is the learning rate (e.g., 0.05)
  – Initialize all weights w_i to (small) random values
  – UNTIL the termination condition is met, DO
      FOR each <x, t(x)> in D, DO
        Input the instance x to the network and compute the outputs o(x) = σ(net(x))
        FOR each output unit k, DO
          δ_k = o_k(x) · (1 - o_k(x)) · (t_k(x) - o_k(x))
        FOR each hidden unit j, DO
          δ_j = h_j(x) · (1 - h_j(x)) · Σ_{k ∈ outputs} v_{j,k} δ_k
        Update each weight w = u_{i,j} (with incoming activation a = x_i) or w = v_{j,k} (with incoming activation a = h_j):
          w_{start-layer, end-layer} ← w_{start-layer, end-layer} + Δw_{start-layer, end-layer}
          Δw_{start-layer, end-layer} = r · δ_{end-layer} · a
  – RETURN final u, v
  (a per-example code sketch of these updates follows the slide text below)
[Figure: the same 3-layer network as above, with inputs x1-x3, hidden units h1-h4, and outputs o1, o2]

Backpropagation and Local Optima
• Gradient Descent in Backprop
  – Performed over the entire network weight vector
  – Easily generalized to arbitrary directed graphs
  – Property: backprop on feedforward ANNs will find a local (not necessarily global) error minimum
• Backprop in Practice
  – Local optimization often works well (can run multiple times)
  – Often include a weight momentum term α (see the usage sketch below):
      Δw_{start-layer, end-layer}(n) = r · δ_{end-layer} · a + α · Δw_{start-layer, end-layer}(n - 1)
  – Minimizes error over the training examples - does it generalize to subsequent instances?
  – Training is often very slow: thousands of iterations over D (epochs)
  – Inference (applying the network after training) is typically very fast
    • Classification
    • Control

Feedforward ANNs: Representational Power and Bias
• Representational (i.e., Expressive) Power
  – Backprop presented for feedforward ANNs with a single hidden layer (2-layer)
  – 2-layer feedforward ANN:
    • Any Boolean function (simulate a 2-layer AND-OR network)
    • Any bounded continuous function (approximate with arbitrarily small error) [Cybenko, 1989; Hornik et al., 1989]
  – Sigmoid functions: a set of basis functions; used to compose arbitrary functions
  – 3-layer feedforward ANN: any function (approximate with arbitrarily small error) [Cybenko, 1988]
  – Functions that ANNs are good at acquiring: Network Efficiently Representable Functions (NERFs) - how to characterize? [Russell and Norvig, 1995]
• Inductive Bias of ANNs
  – Hypothesis space: n-dimensional Euclidean space (weight space)
  – Continuous (error function smooth with respect to weight parameters)
  – Preference bias: “smooth interpolation” among positive examples
  – Not well understood yet (known to be computationally hard)

• Hidden Units and Feature Extraction
  – Training procedure: hidden unit representations that minimize error E
  – Sometimes backprop will …
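As an illustration of the Train-by-Backprop updates above, here is a minimal sketch of the per-example update for a single hidden layer of sigmoid units. The names forward, backprop_step, U, and V are my own (U holds the input-to-hidden weights u_{i,j}, V the hidden-to-output weights v_{j,k}); bias terms are assumed to be folded into the input vector, and this is one straightforward reading of the slide rather than the lecture's own code.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, U, V):
    """Forward pass: hidden activations h = sigma(U x), outputs o = sigma(V h)."""
    h = sigmoid(U @ x)
    o = sigmoid(V @ h)
    return h, o

def backprop_step(x, t, U, V, r=0.05):
    """One stochastic-gradient update for a single training example <x, t(x)>."""
    h, o = forward(x, U, V)
    # Output-unit error terms: delta_k = o_k(x) (1 - o_k(x)) (t_k(x) - o_k(x))
    delta_o = o * (1.0 - o) * (t - o)
    # Hidden-unit error terms: delta_j = h_j(x) (1 - h_j(x)) sum_k v_{j,k} delta_k
    delta_h = h * (1.0 - h) * (V.T @ delta_o)
    # Weight updates: delta_w = r * delta_(end layer) * (activation feeding the weight)
    V += r * np.outer(delta_o, h)   # v_{j,k} <- v_{j,k} + r * delta_k * h_j
    U += r * np.outer(delta_h, x)   # u_{i,j} <- u_{i,j} + r * delta_j * x_i
    return U, V
```

The slide's outer UNTIL/FOR loops (repeated sweeps, or epochs, over all of D until a termination condition is met) would wrap around calls to a step like this one.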

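As a usage example, the sketch below trains a small sigmoid network on XOR with a momentum term, tying together the momentum update from the Local Optima slide and the claim that a 2-layer network can represent any Boolean function. The network sizes, learning rate r = 0.3, momentum α = 0.9, epoch count, and random seed are arbitrary illustrative choices, not values from the lecture.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# XOR training set; inputs are augmented with a bias component x0 = 1.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
U = rng.uniform(-0.5, 0.5, size=(3, 3))   # input (incl. bias) -> 3 hidden units
V = rng.uniform(-0.5, 0.5, size=(1, 3))   # hidden units -> 1 output unit
dU_prev = np.zeros_like(U)                # previous delta_w terms, for momentum
dV_prev = np.zeros_like(V)
r, alpha = 0.3, 0.9                       # learning rate and momentum (illustrative)

for epoch in range(5000):                 # "UNTIL the termination condition is met"
    for x, t in zip(X, T):                # "FOR each <x, t(x)> in D"
        h = sigmoid(U @ x)
        o = sigmoid(V @ h)
        delta_o = o * (1 - o) * (t - o)
        delta_h = h * (1 - h) * (V.T @ delta_o)
        # delta_w(n) = r * delta_(end layer) * a + alpha * delta_w(n - 1)
        dV = r * np.outer(delta_o, h) + alpha * dV_prev
        dU = r * np.outer(delta_h, x) + alpha * dU_prev
        V += dV
        U += dU
        dV_prev, dU_prev = dV, dU

for x, t in zip(X, T):
    o = sigmoid(V @ sigmoid(U @ x)).item()
    print(f"input {x[1:]} -> output {o:.3f} (target {t.item():.0f})")
```

With settings like these the outputs are usually driven close to the XOR targets, but, as the slides note, backprop only finds a local error minimum, so a particular run may need a restart or different parameters.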
