Solving Factored MDPs with Continuous and Discrete Variables
Appeared in the Twentieth Conference on Uncertainty in Artificial Intelligence, Banff, Canada, July 2004.

Carlos Guestrin (Berkeley Research Center, Intel Corporation), Milos Hauskrecht (Department of Computer Science, University of Pittsburgh), Branislav Kveton (Intelligent Systems Program, University of Pittsburgh)

Abstract

Although many real-world stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current state-of-the-art methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling and solving hybrid problems efficiently. We formulate these problems as hybrid Markov decision processes (MDPs with continuous and discrete state and action variables), which we assume can be represented in a factored way using a hybrid dynamic Bayesian network (hybrid DBN). This formulation also allows us to apply our methods to collaborative multiagent settings. We present a new linear program approximation method that exploits the structure of the hybrid MDP and lets us compute approximate value functions more efficiently. In particular, we describe a new factored discretization of continuous variables that avoids the exponential blow-up of traditional approaches. We provide theoretical bounds on the quality of such an approximation and on its scale-up potential. We support our theoretical arguments with experiments on a set of control problems with up to 28-dimensional continuous state space and 22-dimensional action space.

1 Introduction

Markov decision processes (MDPs) [2, 3] offer an elegant mathematical framework for representing sequential decision problems in the presence of uncertainty.
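For reference, the discounted-reward objective that all of the methods below approximate is characterized by the Bellman optimality equation (standard MDP material, not specific to this paper), where R is the reward, P the transition model, and γ the discount factor:

```latex
V^*(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]
```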
While standard solution techniques, such as value or policy iteration, scale up well in terms of the total number of states and actions, these techniques are less successful in real-world MDPs represented by state and action variables. In purely discrete settings, the running time of these algorithms grows exponentially in the number of variables, the so-called curse of dimensionality. Furthermore, many real-world problems include a combination of continuous and discrete state and action variables. In this hybrid setting, the continuous components are typically discretized, leading again to an exponential blowup in the number of variables.

In this work, we present the first MDP framework that can exploit problem structure to represent and solve large hybrid stochastic planning problems efficiently. We model these problems using a hybrid factored MDP where the dynamics are represented compactly by a probabilistic graphical model, a hybrid dynamic Bayesian network (DBN) [7]. Unfortunately, compact DBN parameterizations of hybrid MDPs do not guarantee the existence of efficient exact solutions to these problems. In hybrid settings, this issue is accentuated, as exact parametric solutions may not exist. We thus approximate the solution with a simpler parametric representation. We focus on linear approximations, where the value function is approximated by a linear combination of basis functions [1, 3]. Specifically, we use a factored (linear) value function [12], where each basis function depends on a small number of state variables. This architecture is central for obtaining efficient approximation algorithms for factored MDPs [12, 8].

In hybrid settings, each basis function may depend on both continuous and discrete state components. We show that the weights of this approximation can be optimized using a convex formulation we call hybrid approximate linear programming (HALP).
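The factored linear value function can be sketched in a few lines. This is an illustrative toy, not the paper's code: the basis functions, their scopes, and the state layout below are all made-up examples of "each basis function depends on a small number of state variables."

```python
import numpy as np

# Illustrative state: two continuous variables s[0], s[1] and one
# discrete variable s[2]. Each basis function touches only a small
# subset of these variables (its "scope").
basis_scopes = [(0,), (1,), (0, 2)]
basis_fns = [
    lambda s: s[0],                                 # linear in s[0]
    lambda s: s[1] ** 2,                            # quadratic in s[1]
    lambda s: s[0] * (1.0 if s[2] == 1 else 0.0),   # continuous-discrete coupling
]

def value(weights, state):
    """Factored linear value estimate: V(s) = sum_i w_i * f_i(s)."""
    return sum(w * f(state) for w, f in zip(weights, basis_fns))

w = np.array([0.5, -0.2, 1.0])
print(value(w, [2.0, 3.0, 1]))   # 0.5*2.0 - 0.2*9.0 + 1.0*2.0 = 1.2
```

The point of the architecture is that each f_i can be evaluated, integrated, or maximized over only the few variables in its scope, which is what makes the LP machinery below tractable.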
The HALP reduces to the approximate linear programming (ALP) formulation [15] in purely discrete settings. When applied to discrete factored MDPs, the ALP approach leads to linear programming (LP) problems with a small number of variables, but with exponentially many constraints: one constraint for each state-action pair. Guestrin et al. [8] present an efficient LP decomposition technique that exploits the structure of the factored model to represent this LP by an equivalent, exponentially smaller problem. Schuurmans and Patrascu [14] build on this decomposition technique to design an efficient constraint generation approach for the ALP problem. Alternatively, de Farias and Van Roy [5] propose a sampling approach for selecting a subset of the exponentially large constraint set.

In purely continuous settings, the HALP reduces to the formulation recently proposed by Hauskrecht and Kveton [11] for factored continuous-state MDPs (CMDPs). This formulation requires the computation of several complex integration problems, which Hauskrecht and Kveton address by identifying conjugate classes of parametric transition models and basis functions that lead to efficient closed-form solutions. Their formulation also includes a constraint for each state, now leading to an infinite number of constraints.
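The "one constraint per state-action pair" structure of the discrete ALP can be made concrete on a toy MDP. This is a minimal sketch, not the paper's formulation: the 2-state, 2-action MDP numbers are invented, and with one indicator basis per state the LP is just the exact (tabular) case.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
R = np.array([[1.0, 0.0],        # R[s, a]: made-up rewards
              [0.0, 2.0]])
P = np.array([[[0.8, 0.2],       # P[s, a, s']: made-up transitions
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])

F = np.eye(2)                    # F[s, i] = f_i(s): indicator bases
alpha = np.ones(2)               # state-relevance weights

# ALP: minimize sum_s alpha(s) V(s)
#      s.t. V(s) >= R(s,a) + gamma * E[V(s') | s, a]  for every (s, a),
# rewritten for linprog as A_ub @ w <= b_ub over basis weights w.
c = F.T @ alpha
A_ub, b_ub = [], []
for s in range(2):
    for a in range(2):
        A_ub.append(gamma * P[s, a] @ F - F[s])
        b_ub.append(-R[s, a])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2)
V = F @ res.x
print(V)                          # value estimates (here exact V*)
```

With factored bases and a factored model, the decomposition of Guestrin et al. [8] avoids ever enumerating this (s, a) loop explicitly; the toy above enumerates it only because the MDP is tiny.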
To solve the problem they apply the constraint sampling method in combination with various heuristics to generate a small set of constraints that define an LP approximation with no theoretical guarantees.

In this paper, we first provide theoretical guarantees for the solutions obtained by the HALP formulation that also apply to the purely continuous setting of Hauskrecht and Kveton. Specifically, we extend the theoretical analysis of de Farias and Van Roy [6] for discrete ALPs to the continuous and hybrid settings addressed by our HALP formulation, providing bounds with respect to the best approximation in the space of the basis functions.

Although theoretically sound, the HALP formulation may still be hard to solve directly. In particular, the HALP leads to a linear program with an infinite number of constraints. We address this problem by defining a relaxed formulation, ε-HALP, over a finite subset of constraints, using a new factored discretization technique that exploits structure in the continuous components. Once discretized, this formulation can be solved efficiently by existing factored ALP methods [8, 14]. We finally provide a bound on the loss in the quality of the solution of this relaxed ε-HALP with respect to that of the complete HALP formulation.

We illustrate the feasibility of our formulation and its solution algorithm on a sequence of control optimization problems with 28-dimensional continuous state space and 22-dimensional action space. These nontrivial dynamic optimization
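The payoff of a factored discretization can be sketched with a simple count. This is an illustrative sketch of the idea only, not the paper's ε-HALP construction: the grid resolution, variable count, and scope below are made up. A naive discretization grids the joint continuous state space, while a factored constraint only needs the grid over the few variables in its scope.

```python
import itertools
import numpy as np

eps = 0.25
grid = np.arange(0.0, 1.0 + 1e-9, eps)    # epsilon-grid for one variable: 5 points

n_vars = 10                                # 10 continuous state variables (made up)
scope = (0, 3)                             # one constraint touches only 2 of them

full = len(grid) ** n_vars                 # naive joint grid: 5^10 points
factored = len(grid) ** len(scope)         # grid over the scope only: 5^2 points

points = list(itertools.product(grid, repeat=len(scope)))
print(full, factored, len(points))         # 9765625 25 25
```

Enumerating grid points per scope rather than over the joint space is what keeps the relaxed LP's constraint set exponentially smaller, at the cost of the ε-dependent approximation loss bounded in the paper.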