UCLA STATS 238 - Lecture 5

Course: Model, Learning, and Inference: Lecture 5
Alan Yuille
Department of Statistics, UCLA
Los Angeles, CA
[email protected]

Probability distributions on structured representations. Dynamic Programming. Learning with EM.

NOTE: NOT FOR DISTRIBUTION!!

1 Introduction

We discuss how to define probabilistic models that use richly structured probability distributions and describe how graphical models can be used to represent the dependencies among a set of variables. Then we describe dynamic programming and EM for learning.

2 Representing structured probability distributions

A probabilistic model defines the joint distribution for a set of random variables. For example, imagine that a friend of yours claims to possess psychic powers – in particular, the power of psychokinesis. He proposes to demonstrate these powers by flipping a coin and influencing the outcome to produce heads. You suggest that a better test might be to see if he can levitate a pencil, since the coin producing heads could also be explained by some kind of sleight of hand, such as substituting a two-headed coin. We can express all possible outcomes of the proposed tests, as well as their causes, using the binary random variables X_1, X_2, X_3, and X_4 to represent (respectively) the truth of the coin being flipped and producing heads, the pencil levitating, your friend having psychic powers, and the use of a two-headed coin. Any set of beliefs about these outcomes can be encoded in a joint probability distribution, P(x_1, x_2, x_3, x_4). For example, the probability that the coin comes up heads (x_1 = 1) should be higher if your friend actually does have psychic powers (x_3 = 1).

Once we have defined a joint distribution on X_1, X_2, X_3, and X_4, we can reason about the implications of events involving these variables. For example, if flipping the coin produces heads (x_1 = 1), then the probability distribution over the remaining variables is

    P(x_2, x_3, x_4 | x_1 = 1) = P(x_1 = 1, x_2, x_3, x_4) / P(x_1 = 1).    (1)

This equation can be interpreted as an application of Bayes' rule, with X_1 being the data and X_2, X_3, X_4 being the hypotheses. However, in this setting, as with most probabilistic models, any variable can act as data or hypothesis. In the general case, we use probabilistic inference to compute the probability distribution over a set of unobserved variables (here, X_2, X_3, X_4) conditioned on a set of observed variables (here, X_1).

Another common pattern of influence is explaining away. Imagine that your friend flipped the coin and it came up heads (x_1 = 1). The propositions that he has psychic powers (x_3 = 1) and that it is a two-headed coin (x_4 = 1) might both become more likely. However, while these two variables were independent before seeing the outcome of the coin flip, they are now dependent: if you were to go on to discover that the coin has two heads, the hypothesis of psychic powers would return to its baseline probability – the evidence for psychic powers was "explained away" by the presence of the two-headed coin.

[Figure 1: Directed graphical model (Bayes net) showing the dependencies among variables in the "psychic friend" example discussed in the text.]
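To make equation (1) and the explaining-away effect concrete, here is a minimal numerical sketch in Python. The conditional probability tables and function names are made up for illustration – the lecture gives no numbers – and the joint is built from the Bayes-net factorization stated in section 2.1 below. It conditions on heads and compares P(x_3 = 1 | x_1 = 1) with P(x_3 = 1 | x_1 = 1, x_4 = 1).

```python
from itertools import product

# Hypothetical priors and conditional probability tables (illustrative only;
# the lecture does not supply numbers).
P_x3 = {1: 0.01, 0: 0.99}                        # P(friend has psychic powers)
P_x4 = {1: 0.05, 0: 0.95}                        # P(two-headed coin)
P_x2_given_x3 = {1: {1: 0.9, 0: 0.1},            # P(pencil levitates | psychic)
                 0: {1: 0.0, 0: 1.0}}
P_heads_given_x3_x4 = {(1, 1): 1.0, (1, 0): 0.9,  # P(heads | psychic, two-headed)
                       (0, 1): 1.0, (0, 0): 0.5}

def joint(x1, x2, x3, x4):
    """Joint via the Bayes-net factorization P(x1|x3,x4) P(x2|x3) P(x3) P(x4)."""
    p_heads = P_heads_given_x3_x4[(x3, x4)]
    p_x1 = p_heads if x1 == 1 else 1.0 - p_heads
    return p_x1 * P_x2_given_x3[x3][x2] * P_x3[x3] * P_x4[x4]

# Equation (1): condition on observing heads (x1 = 1).
p_heads = sum(joint(1, x2, x3, x4) for x2, x3, x4 in product([0, 1], repeat=3))
posterior = {(x2, x3, x4): joint(1, x2, x3, x4) / p_heads
             for x2, x3, x4 in product([0, 1], repeat=3)}

# Explaining away: psychic powers become more likely after seeing heads,
# but fall back to the prior once the two-headed coin is also observed.
p_psychic_given_heads = sum(p for (x2, x3, x4), p in posterior.items() if x3 == 1)
p_heads_twoheaded = sum(joint(1, x2, x3, 1) for x2, x3 in product([0, 1], repeat=2))
p_psychic_given_heads_twoheaded = (
    sum(joint(1, x2, 1, 1) for x2 in (0, 1)) / p_heads_twoheaded)

print(P_x3[1])                          # prior:                  0.01
print(round(p_psychic_given_heads, 4))  # after heads:            ~0.0171
print(p_psychic_given_heads_twoheaded)  # heads + two-headed coin: ~0.01 (explained away)
```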
2.1 Directed graphical models

Directed graphical models, also known as Bayesian networks or Bayes nets, consist of a set of nodes, representing random variables, together with a set of directed edges from one node to another, which can be used to identify statistical dependencies between variables. Typically, nodes are drawn as circles, and the existence of a directed edge from one node to another is indicated with an arrow between the corresponding nodes. If an edge exists from node A to node B, then A is referred to as the "parent" of B, and B is the "child" of A. This genealogical relation is often extended to identify the "ancestors" and "descendants" of a node.

The directed graph used in a Bayes net has one node for each random variable in the associated probability distribution. The edges express the statistical dependencies between the variables in a fashion consistent with the Markov condition: conditioned on its parents, each variable is independent of all other variables except its descendants. This has an important implication: a Bayes net specifies a canonical factorization of a probability distribution into the product of the conditional distribution for each variable conditioned on its parents. Thus, for a set of variables X_1, X_2, ..., X_M, we can write P(x_1, x_2, ..., x_M) = ∏_i P(x_i | Pa(X_i)), where Pa(X_i) is the set of parents of X_i.

Figure 1 shows a Bayes net for the example of the friend who claims to have psychic powers. This Bayes net identifies a number of assumptions about the relationship between the variables involved in this situation. For example, X_1 and X_2 are assumed to be independent given X_3, indicating that once it was known whether or not your friend was psychic, the outcomes of the coin flip and the levitation experiments would be completely unrelated. By the Markov condition, we can write P(x_1, x_2, x_3, x_4) = P(x_1 | x_3, x_4) P(x_2 | x_3) P(x_3) P(x_4). This factorization allows us to use fewer numbers in specifying the distribution over these four variables: we only need one number for each variable, conditioned on each set of values taken on by its parents. In this case, this adds up to 8 numbers rather than 15.

2.2 Undirected graphical models

Undirected graphical models, also known as Markov Random Fields (MRFs), consist of a set of nodes, representing random variables, and a set of undirected edges, defining a neighbourhood structure on the graph which indicates the probabilistic dependencies of the variables at the nodes. Each set of fully connected neighbours is associated with a potential function, which varies as the associated random variables take on different values. When multiplied together, these potential functions give the probability distribution over all the variables. Unlike directed graphical models, there need be no simple relationship between these potentials and the local conditional probability distributions. Moreover, undirected graphical models usually have closed loops (if they do not, then they can be reformulated as directed graphical models).

In this section we will use X_i to refer to variables whose values can be directly observed and Y_i to refer to latent, or hidden, variables whose values can only be inferred; see figure (4). We will use the vector notation ~x and ~y
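As a concrete illustration of section 2.2, here is a minimal MRF sketch in Python, assuming a made-up three-node chain of binary variables with one pairwise potential per edge (the lecture does not specify any particular model or numbers). The product of the potentials defines the distribution up to a normalizing constant, which this sketch computes by brute force.

```python
from itertools import product

# Hypothetical pairwise potential favouring equal neighbouring values.
def psi(a, b):
    return 2.0 if a == b else 1.0

def unnormalized(x):
    """Product of the clique (edge) potentials for the chain x0 - x1 - x2."""
    return psi(x[0], x[1]) * psi(x[1], x[2])

# The potentials define the distribution only up to the normalizing constant Z
# (the partition function), computed here by summing over all 2^3 states.
states = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)
P = {x: unnormalized(x) / Z for x in states}

print(Z)             # 18.0 for these potentials
print(P[(0, 0, 0)])  # 4/18 ≈ 0.222: agreeing neighbours are most probable
print(P[(0, 1, 0)])  # 1/18 ≈ 0.056
```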

