CMSC 671, Fall 2001
Class #25-26 – Tuesday, November 27 / Thursday, November 29

Today's class
• Neural networks
• Bayesian learning

Machine Learning: Neural and Bayesian
Chapter 19
Some material adapted from lecture notes by Lise Getoor and Ron Parr

Neural function
• Brain function (thought) occurs as the result of the firing of neurons
• Neurons connect to each other through synapses, which propagate action potential (electrical impulses) by releasing neurotransmitters
• Synapses can be excitatory (potential-increasing) or inhibitory (potential-decreasing), and have varying activation thresholds
• Learning occurs as a result of the synapses' plasticity: they exhibit long-term changes in connection strength
• There are about 10^11 neurons and about 10^14 synapses in the human brain

Biology of a neuron
[Figure: anatomy of a single neuron]

Brain structure
• Different areas of the brain have different functions
  – Some areas seem to have the same function in all humans (e.g., Broca's region); the overall layout is generally consistent
  – Some areas are more plastic, and vary in their function; also, the lower-level structure and function vary greatly
• We don't know how different functions are "assigned" or acquired
  – Partly the result of the physical layout / connection to inputs (sensors) and outputs (effectors)
  – Partly the result of experience (learning)
• We really don't understand how this neural structure leads to what we perceive as "consciousness" or "thought"
• Our neural networks are not nearly as complex or intricate as the actual brain structure

Comparison of computing power
• Computers are way faster than neurons…
• But there are a lot more neurons than we can reasonably model in modern digital computers, and they all fire in parallel
• Neural networks are designed to be massively parallel
• The brain is effectively a billion times faster

Neural networks
• Neural networks are made up of nodes or units, connected by links
• Each link has an associated weight and activation level
• Each node has an input function (typically summing over weighted inputs), an activation function, and an output

Layered feed-forward network
[Figure: layered network with input units at the bottom, hidden units in the middle, and output units at the top]

Neural unit
[Figure: a single neural unit, with weighted input links, an input function, an activation function, and an output]

"Executing" neural networks
• Input units are set by some exterior function (think of these as sensors), which causes their output links to be activated at the specified level
• Working forward through the network, the input function of each unit is applied to compute the input value
  – Usually this is just the weighted sum of the activation on the links feeding into this node
• The activation function transforms this input function into a final value
  – Typically this is a nonlinear function, often a sigmoid function corresponding to the "threshold" of that node
• A minimal code sketch of this forward pass appears below

Learning neural networks
• Backpropagation
• Cascade correlation: adding hidden units
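To make the forward-pass description above concrete, here is a minimal sketch of "executing" a layered feed-forward network: each unit computes the weighted sum of the activations on its incoming links and passes that sum through a sigmoid activation function. The layer sizes, weight values, and helper names are illustrative assumptions, not something taken from the original slides.

```python
import math

def sigmoid(x):
    """Nonlinear activation: squashes the weighted-sum input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate activations forward through a layered feed-forward network.

    `layers` is a list of weight matrices; layers[k][j] holds the weights on
    the links feeding unit j of layer k (one weight per unit in the previous
    layer).
    """
    activations = inputs
    for weight_matrix in layers:
        # Input function: weighted sum of the activations on the incoming links.
        # Activation function: the sigmoid "threshold" applied to that sum.
        activations = [sigmoid(sum(w * a for w, a in zip(weights, activations)))
                       for weights in weight_matrix]
    return activations

# Hypothetical network: 2 input units, 2 hidden units, 1 output unit.
hidden_weights = [[0.5, -0.6], [0.3, 0.8]]   # weights into each hidden unit
output_weights = [[1.0, -1.0]]               # weights into the single output unit
print(forward([1.0, 0.0], [hidden_weights, output_weights]))
```

Working forward layer by layer like this is exactly the "executing" step; learning (e.g., backpropagation) adjusts the weights so the outputs match training targets.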
Learning Bayesian networks
• Given training set D = {x[1], ..., x[M]}
• Find B that best matches D
  – model selection
  – parameter estimation
[Figure: data set D (rows x[1] ... x[M] over variables E, B, A, C) fed into an inducer, which outputs a Bayesian network over E, B, A, C]

Parameter estimation
• Assume known structure
• Goal: estimate BN parameters θ
  – entries in local probability models, P(X | Parents(X))
• A parameterization θ is good if it is likely to generate the observed data:
  L(θ : D) = P(D | θ) = ∏_m P(x[m] | θ)    (i.i.d. samples)
• Maximum Likelihood Estimation (MLE) principle: choose θ so as to maximize L

Parameter estimation in BNs
• The likelihood decomposes according to the structure of the network → we get a separate estimation task for each parameter
• The MLE (maximum likelihood estimate) solution: for each value x of a node X and each instantiation u of Parents(X),
  θ*_{x|u} = N(x, u) / N(u)
  where the counts N(x, u) and N(u) are the sufficient statistics
  – Just need to collect the counts for every combination of parents and children observed in the data
  – MLE is equivalent to an assumption of a uniform prior over parameter values

Sufficient statistics: Example
• Why are the counts sufficient?
[Figure: Bayesian network over Earthquake, Burglary, Alarm, Moon-phase, and Light-level, with Earthquake and Burglary as parents of Alarm]

Model selection
Goal: Select the best network structure, given the data
Input:
  – Training data
  – Scoring function
Output:
  – A network that maximizes the score

Structure selection: Scoring
• Bayesian: prior over parameters and structure
  – get balance between model complexity and fit to data as a byproduct
• Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
  (P(D | G) is the marginal likelihood; P(G) is the prior on structure)
• Marginal likelihood just comes from our parameter estimates
• Prior on structure can be any measure we want; typically a function of the network complexity
• Same key property, decomposability:
  Score(structure) = Σ_i Score(family of X_i)

Heuristic search
[Figure: a candidate network over B, E, A, C and the local moves considered from it: add the arc E → C (Δscore(C)), delete the arc E → A (Δscore(A)), reverse the arc E → A (Δscore(A))]

Exploiting decomposability
[Figure: the same search moves as on the previous slide]
• To recompute scores, only need to re-score families that changed in the last move

Variations on a theme
• Known structure, fully observable: only need to do parameter estimation
• Unknown structure, fully observable: do heuristic search through structure space, then parameter estimation
• Known structure, missing values: use expectation maximization (EM) to estimate parameters
• Known structure, hidden variables: apply adaptive probabilistic network (APN) techniques
• Unknown structure, hidden variables: too hard to solve!

Handling missing data
• Suppose that in some cases, we observe earthquake, alarm, light-level, and moon-phase, but not burglary
• Should we throw that data away??
• Idea: Guess the missing values based on the other data
[Figure: the same Earthquake / Burglary / Alarm / Moon-phase / Light-level network, with Burglary unobserved]

EM (expectation maximization)
• Guess probabilities for nodes with missing values (e.g., based on other observations)
• Compute the probability distribution over the missing values, given our guess
• Update the probabilities
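As a concrete illustration of the MLE-by-counting idea from the parameter-estimation slides above, here is a minimal sketch that collects the sufficient statistics N(x, u) and N(u) from complete data and returns θ*_{x|u} = N(x, u) / N(u). The function name, data layout, and toy burglar-alarm records are assumptions made for the example, not part of the original notes.

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by maximum likelihood from complete data.

    `data` is a list of dicts mapping variable names to observed values.
    The counts N(x, u) and N(u) are the sufficient statistics: nothing else
    about the data is needed to compute the MLE.
    """
    joint_counts = Counter()   # N(x, u)
    parent_counts = Counter()  # N(u)
    for record in data:
        u = tuple(record[p] for p in parents)
        joint_counts[(record[child], u)] += 1
        parent_counts[u] += 1
    # theta*_{x|u} = N(x, u) / N(u)
    return {(x, u): n / parent_counts[u] for (x, u), n in joint_counts.items()}

# Hypothetical toy records for the Alarm node with parents Earthquake, Burglary.
data = [
    {"Earthquake": 0, "Burglary": 1, "Alarm": 1},
    {"Earthquake": 0, "Burglary": 1, "Alarm": 1},
    {"Earthquake": 0, "Burglary": 0, "Alarm": 0},
    {"Earthquake": 1, "Burglary": 0, "Alarm": 1},
]
print(mle_cpt(data, "Alarm", ["Earthquake", "Burglary"]))
```

Because the likelihood decomposes by family, running this once per node of the network gives the full MLE parameterization.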

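Finally, a minimal sketch of the EM idea from the missing-data slides: for records where Burglary is unobserved, the E-step fills in its expected value given the observed Alarm reading and the current parameter guess, and the M-step re-estimates the parameter from those expected counts. To keep the example short, it estimates only P(Burglary = 1) and holds a hypothetical Alarm CPT fixed; a full BN learner would re-estimate every CPT on each iteration.

```python
def em_burglary_prior(records, p_alarm_given_b, iterations=20):
    """Estimate theta = P(Burglary=1) by EM when Burglary is sometimes missing.

    Each record is a pair (burglary, alarm), where burglary may be None.
    p_alarm_given_b[b] = P(Alarm=1 | Burglary=b) is held fixed here so the
    sketch stays short; it is an assumed, not learned, CPT.
    """
    theta = 0.5  # initial guess for P(Burglary=1)
    for _ in range(iterations):
        expected = 0.0
        for burglary, alarm in records:
            if burglary is not None:
                expected += burglary           # fully observed record
            else:
                # E-step: probability of the missing value given the alarm
                # reading and the current parameter guess (Bayes' rule).
                like1 = p_alarm_given_b[1] if alarm else 1 - p_alarm_given_b[1]
                like0 = p_alarm_given_b[0] if alarm else 1 - p_alarm_given_b[0]
                expected += like1 * theta / (like1 * theta + like0 * (1 - theta))
        # M-step: re-estimate theta from the expected counts.
        theta = expected / len(records)
    return theta

# Hypothetical data: (burglary, alarm); None means burglary was not observed.
records = [(1, 1), (0, 0), (None, 1), (None, 0), (None, 0)]
print(em_burglary_prior(records, {0: 0.1, 1: 0.9}))
```

Iterating the two steps is exactly the guess / compute-distribution / update cycle described on the EM slide, and each iteration is guaranteed not to decrease the likelihood of the observed data.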
