Journal of Machine Learning Research 6 (2005) 81-127    Submitted 9/04; Revised 12/04; Published 1/05

Learning Hidden Variable Networks: The Information Bottleneck Approach

Gal Elidan    [email protected]
Nir Friedman    [email protected]
School of Engineering and Computer Science
The Hebrew University
Jerusalem, 91904, Israel

Editor: David Maxwell Chickering

Abstract

A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in sub-optimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving a formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of an easy and smooth target function to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it also allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.

Keywords: Bayesian networks, hidden variables, information bottleneck, continuation, variational methods

A preliminary version of this paper appeared in the Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 2003 (UAI '03).

© 2005 Gal Elidan and Nir Friedman.

1. Introduction

Probabilistic graphical models have been widely used to model real world domains and are particularly appealing due to their natural interpretation. Despite extensive research in learning these models from data (Pearl, 1988; Heckerman, 1998), learning with hidden (or latent) variables has remained a central challenge in learning graphical models in general, and Bayesian networks in particular. Hidden entities play a central role in many real-life problems: an unknown regulating mechanism can be the key to complex biological systems; correlating symptoms might hint at a hidden fundamental problem in a diagnostic system; an intentionally masked economic power might be the cause of related financial phenomena. Indeed, hidden variables typically serve as a summarizing mechanism that "captures" information from some of the observed variables and "passes" this information to some other part of the network. As such, hidden variables can simplify the network structure and consequently lead to better generalization.

When learning the parameters of a Bayesian network with missing values or hidden variables, the most common approach is to use some variant of the expectation maximization (EM) algorithm (Dempster et al., 1977; Lauritzen, 1995). This algorithm performs a greedy search of the likelihood surface and converges to a local stationary point (usually a local maximum). Unfortunately, in challenging real-life learning problems, there are many local maxima that can trap EM in a poor solution. Attempts to address this problem use a variety of strategies (e.g., Glover and Laguna (1993); Kirkpatrick et al. (1983); Rose (1998); Elidan et al. (2002)). When learning structure, the structural EM (SEM) algorithm (Friedman, 1997; Meila and Jordan, 1998; Thiesson et al., 1998) can adapt the network topology. In this approach, as in the classical parametric EM algorithm, we use the distribution induced by our current model to probabilistically complete the data. Unlike parametric EM, we then use the completed data to evaluate different candidate structures. This allows us to perform structure improvement steps in the M-step of a structural EM iteration. As in the case of EM, while convergence is guaranteed, the algorithm typically converges to a local maximum.
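This sensitivity of EM to initialization is easy to reproduce. The sketch below is a minimal illustration (it is not code from the paper; the helper em_gmm and all constants are purely illustrative, and it omits safeguards against degenerate components): plain EM for a two-component univariate Gaussian mixture, restarted from several random initializations.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic data drawn from two well-separated Gaussians.
    x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 200)])

    def em_gmm(x, n_iter=200, seed=0):
        """Plain EM for a 2-component 1-D Gaussian mixture (illustrative only)."""
        rng = np.random.default_rng(seed)
        mu = rng.uniform(x.min(), x.max(), 2)   # random initial means
        var = np.ones(2)                         # unit initial variances
        pi = np.array([0.5, 0.5])                # uniform initial weights
        for _ in range(n_iter):
            # E-step: posterior responsibility of each component for each point.
            dens = (pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                    / np.sqrt(2.0 * np.pi * var))
            resp = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate parameters from the probabilistically completed data.
            nk = resp.sum(axis=0)
            mu = (resp * x[:, None]).sum(axis=0) / nk
            var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
            pi = nk / len(x)
        loglik = np.log(dens.sum(axis=1)).sum()
        return mu, loglik

    # Different restarts converge to stationary points of different quality.
    for seed in range(5):
        mu, ll = em_gmm(x, seed=seed)
        print(f"seed={seed}  means={np.sort(mu).round(2)}  log-likelihood={ll:.1f}")

Restarts that initialize both means inside one cluster typically converge to a noticeably worse log-likelihood than restarts that straddle the two clusters; the gap between the best and worst runs is exactly the cost of local maxima that motivates the continuation approach developed in this paper.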
An even more challenging problem is that of model selection with hidden variables. This involves choosing the number of hidden variables, their cardinalities, and the dependencies between them and the observed entities of the domain. These decisions are crucial for achieving good generalization. In particular, in hard real-life learning problems, structural EM will perform poorly unless some prior knowledge of the interaction between the hidden and observed variables exists, or unless the cardinality of the hidden variables is (at least approximately) known. These challenging problems have received surprisingly little attention.

In this paper, we introduce a new approach to learning Bayesian networks with hidden variables. We pose the learning problem as the optimization of a target function that includes a tradeoff between two information-theoretic objectives. The first objective is to compress information about the training data. Intuitively, this is required when we want to generalize from the training data to new unseen instances. The second objective is to make the hidden variables informative about the observed attributes, to ensure that they preserve the relevant information. This objective is directly related to maximizing the likelihood of the training data. By exploring different relative weightings of these two objectives, we are able to bypass local maxima and learn better models.

Our approach builds on the information bottleneck framework of Tishby et al. (1999) and its multivariate extension (Friedman et al., 2001). This framework provides methods for constructing a set of new variables T that are stochastic functions of one set of variables Y and that at the same time provide information on another set of variables X. The intuition is that the new variables T capture the relevant aspects of Y that are informative about X.
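To make this tradeoff concrete, it may help to recall the original single-variable information bottleneck objective of Tishby et al. (1999), which the multivariate extension generalizes (the equation below is the standard form from that literature, not reproduced from this preview): one minimizes, over stochastic mappings p(t | y),

    \min_{p(t \mid y)} \; \mathcal{L} \;=\; I(T;Y) \;-\; \beta \, I(T;X),

where I(·;·) denotes mutual information. The first term forces T to compress Y, the second rewards T for preserving information about X, and the multiplier β ≥ 0 sets the relative weighting of the two objectives. Gradually shifting this weighting while tracking the solution is the kind of continuous path from an easy, smooth target function to the desired likelihood objective that the abstract describes.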
We show how to pose the learning