K-State CIS 830 - Artificial Neural Networks Presentation

Kansas State University
Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence
Monday, February 21, 2000

Prasanna Jayaraman
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~prasanna

Readings: "The Wake-Sleep Algorithm For Unsupervised Neural Networks" - Hinton, Dayan, Frey and Neal

Artificial Neural Networks Presentation (3 of 4): Pattern Recognition using Unsupervised ANNs
Lecture 15

Presentation Outline
•Paper
–"The Wake-Sleep Algorithm For Unsupervised Neural Networks"
–Authors: Hinton, Dayan, Frey and Neal
•Necessity of this Topic
–Supervised learning algorithms for multi-layer networks suffer from
¤the requirement of a teacher
¤the requirement of an error communication method
•Overview
–Unsupervised learning algorithm for a multi-layer network
¤Wake-Sleep Algorithm
¤Boltzmann and factorial distributions
¤Kullback-Leibler divergence
¤Training algorithms

The Core Idea
•Goal
–Economical representation and accurate reconstruction of the input.
•Aim
–To minimize the "description length".
•Idea
–Driving the neurons of the ANN with the appropriate connections in the corresponding phase achieves the desired goal.
•A Few Basic Terms
–ANN Connections
¤Recognition connections convert the input vector into a representation in the hidden units.
¤Generative connections reconstruct an approximation to the input vector from its underlying representation.

Sleep & Wake Phases
–Wake Phase
¤The units are driven bottom-up using the recognition weights, producing a representation of the input vector in all the hidden layers.
¤This "total representation" is used to communicate the input vector, d, to the receiver.
¤Generative connections are adapted to increase the probability that they would reconstruct the correct activity vector in the layer below.
¤Only the generative weights learn in this phase.
–Sleep Phase
¤Neurons are driven top-down by the generative connections, which reconstruct the representation in one layer from the representation in the layer above.
¤Recognition connections are adapted to increase the probability that they would produce the correct activity vector in the layer above.

Explanatory Figures
[Figure: a two-hidden-layer network with input units d1, d2; hidden units j11, j21, j31 and j12, j22; the total representation α is sent "to the receiver". A companion single-hidden-layer figure with input and output units contrasts the fundamentals of the wake-sleep algorithm with the basics of other training algorithms.]

Sample Figures
[Figures from the paper; no detail survives in this text preview.]

Wake-Sleep Algorithm
•The wake phase is invoked initially to create the total representation of the inputs.
•Stochastic binary units are used for training the two basic connection types of the ANN.
•The probability that unit u is on is
  \Pr(s_u = 1) = \frac{1}{1 + \exp\left(-b_u - \sum_v s_v w_{vu}\right)} = \sigma\left(b_u + \sum_v s_v w_{vu}\right)
•The binary state of each hidden unit j in total representation \alpha is s_j^\alpha.
•The activity of each unit k in the top hidden layer is communicated using the distribution p_k = \sigma(b_k) given by its generative bias.
•The activities of the units in each lower layer are communicated using the distribution p_j^\alpha = \sigma(b_j + \sum_k s_k^\alpha w_{kj}) determined by the generative weights from the layer above.
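As a concrete illustration, the bottom-up wake pass over stochastic binary units might look like the following. This is a minimal NumPy sketch; the function names, toy layer sizes, and random initialization are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary(p):
    """Sample stochastic binary states with the given on-probabilities."""
    return (rng.random(p.shape) < p).astype(float)

def wake_pass(d, rec_weights, rec_biases):
    """Drive the units bottom-up with the recognition weights, producing the
    total representation alpha: the sampled binary states of every hidden
    layer, where Pr(s_u = 1) = sigmoid(b_u + sum_v s_v * w_vu)."""
    alpha = []
    s = d
    for W, b in zip(rec_weights, rec_biases):
        s = sample_binary(sigmoid(b + s @ W))
        alpha.append(s)
    return alpha

# Toy usage: a 4-unit input vector and two hidden layers of 3 and 2 units.
d = np.array([1.0, 0.0, 1.0, 1.0])
rec_weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
rec_biases = [np.zeros(3), np.zeros(2)]
print(wake_pass(d, rec_weights, rec_biases))
```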
Wake-Sleep Algorithm (continued)
•The description length of the binary state of unit j is
  C(s_j^\alpha) = -s_j^\alpha \log p_j^\alpha - (1 - s_j^\alpha) \log(1 - p_j^\alpha)
•The description length for the entire input vector d is
  C(\alpha, d) = C(\alpha) + C(d \mid \alpha) = \sum_{l} \sum_{j \in l} C(s_j^\alpha) + \sum_{i} C(d_i \mid \alpha)
•In the sleep phase, all the recognition weights are turned off and the generative weights drive the units in a top-down fashion.
•Because the hidden units are stochastic, this produces "fantasy" vectors on the input units.
•Each generative weight is adjusted in proportion so as to minimize the expected cost, maximizing the probability that the visible vectors generated by the model would match the observed data.
•Then only the recognition weights are adjusted, to maximize the log probability of recovering the hidden activities that actually caused the fantasy.
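For a single hidden layer, the two phases reduce to the paper's local delta rules. The sketch below uses the same assumed NumPy setup as above, with invented function names and in-place updates standing in for the paper's online learning: the wake step adapts the generative weights toward reconstructing the layer below, and the sleep step dreams a fantasy and adapts the recognition weights to recover its cause.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def description_length(s, p, eps=1e-12):
    """C = -s log p - (1 - s) log(1 - p), summed over one layer (in nats)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.sum(-s * np.log(p) - (1.0 - s) * np.log(1.0 - p)))

def wake_step(d, s_top, W_gen, b_gen, lr=0.01):
    """Wake phase: the hidden states s_top came from the recognition pass;
    adapt the generative weights so they better reconstruct the layer below
    (delta rule: dW proportional to s_top * (d - p)). Returns C(d | alpha)."""
    p = sigmoid(b_gen + s_top @ W_gen)
    W_gen += lr * np.outer(s_top, d - p)
    b_gen += lr * (d - p)
    return description_length(d, p)

def sleep_step(W_gen, b_gen, b_top, W_rec, b_rec, lr=0.01):
    """Sleep phase: recognition weights are turned off; dream a 'fantasy'
    top-down from the generative model, then adapt the recognition weights
    to recover the hidden states that actually caused it."""
    s_top = (rng.random(b_top.shape) < sigmoid(b_top)).astype(float)
    p_vis = sigmoid(b_gen + s_top @ W_gen)
    fantasy = (rng.random(p_vis.shape) < p_vis).astype(float)
    q_top = sigmoid(b_rec + fantasy @ W_rec)  # recognition's guess at the cause
    W_rec += lr * np.outer(fantasy, s_top - q_top)
    b_rec += lr * (s_top - q_top)

# Toy usage: 3 top units generating a 4-unit visible layer.
W_gen = rng.normal(size=(3, 4)); b_gen = np.zeros(4); b_top = np.zeros(3)
W_rec = rng.normal(size=(4, 3)); b_rec = np.zeros(3)
s_top = (rng.random(3) < sigmoid(b_top)).astype(float)
d = np.array([1.0, 0.0, 1.0, 0.0])
print(wake_step(d, s_top, W_gen, b_gen))
sleep_step(W_gen, b_gen, b_top, W_rec, b_rec)
```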
Helmholtz Machine
•The recognition weights determine a conditional probability distribution Q(· | d) over \alpha.
•Initially, the fantasies will have a different distribution than the training data.
•Helmholtz Machine
–We restrict Q(· | d) to be a product distribution within each layer, conditional on the binary states in the layer below, so it can be computed efficiently using a bottom-up recognition network. The model that uses a bottom-up recognition network to minimize the bound is called a Helmholtz machine.
–Minimizing the cost of a representation can be done by generating a sample from the recognition network and incrementing the top-down weights. Training the recognition weights themselves is more difficult; a simple approximation is to generate a stochastic sample from the generative model and then increment each bottom-up weight to increase the log probability that the recognition weights would produce the correct activities in the layer above. This way of fitting a Helmholtz machine is called the "wake-sleep" algorithm.
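The outline's Kullback-Leibler divergence is the gap that this bound-minimizing view makes explicit: for a factorial Q(· | d), the expected description length F(d) = E_Q[C(\alpha, d)] - H(Q) exceeds the true C(d) = -log p(d) by exactly KL(Q(· | d) || P(· | d)). Below is a sketch of estimating this bound for a single hidden layer; it reuses the same invented names and assumed setup as the sketches above and is not spelled out in the preview itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_entropy(q, eps=1e-12):
    """Entropy (in nats) of a factorial distribution with on-probabilities q."""
    q = np.clip(q, eps, 1.0 - eps)
    return float(np.sum(-q * np.log(q) - (1.0 - q) * np.log(1.0 - q)))

def free_energy(d, W_rec, b_rec, W_gen, b_gen, b_top, n_samples=100, eps=1e-12):
    """Monte Carlo estimate of F(d) = E_Q[C(alpha, d)] - H(Q), the
    description-length bound the Helmholtz machine minimizes; it exceeds
    the true C(d) by exactly KL(Q(.|d) || P(.|d))."""
    q = sigmoid(b_rec + d @ W_rec)               # factorial Q(alpha | d)
    p_top = np.clip(sigmoid(b_top), eps, 1 - eps)  # generative prior on top layer
    total = 0.0
    for _ in range(n_samples):
        alpha = (rng.random(q.shape) < q).astype(float)
        p_vis = np.clip(sigmoid(b_gen + alpha @ W_gen), eps, 1 - eps)
        c_alpha = np.sum(-alpha * np.log(p_top) - (1 - alpha) * np.log(1 - p_top))
        c_d = np.sum(-d * np.log(p_vis) - (1 - d) * np.log(1 - p_vis))
        total += c_alpha + c_d
    return total / n_samples - bernoulli_entropy(q)

# Toy usage with random weights.
W_rec = rng.normal(size=(4, 3)); b_rec = np.zeros(3)
W_gen = rng.normal(size=(3, 4)); b_gen = np.zeros(4); b_top = np.zeros(3)
d = np.array([1.0, 0.0, 1.0, 0.0])
print(free_energy(d, W_rec, b_rec, W_gen, b_gen, b_top))
```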
Boltzmann & Factorial Distribution
–The recognition weights take the binary activities in one layer and stochastically produce binary activities in the layer above using a logistic function. So, for a given visible vector, the recognition weights may …