An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning

Michael Bowling and Manuela Veloso
October 2000
CMU-CS-00-165

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Abstract

Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). But multiagent environments are inherently non-stationary, since the other agents are free to change their behavior as they also learn and adapt. Stochastic games, first studied in the game theory community, are a natural extension of MDPs to include multiple agents. In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single-agent reinforcement learners, and basic game theory techniques.

This research was sponsored by the United States Air Force under Cooperative Agreements No. F30602-97-2-0250 and No. F30602-98-2-0135. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force, or the US Government.

Keywords: Multiagent systems, stochastic games, reinforcement learning, game theory

1 Introduction

The problem of an agent learning to act in an unknown world is both challenging and interesting. Reinforcement learning has been successful at finding optimal control policies for a single agent operating in a stationary environment, specifically a Markov decision process.

Learning to act in multiagent systems offers additional challenges; see the surveys [17, 19, 27]. Multiple agents can be employed to solve a single task, or an agent may be required to perform a task in a world containing other agents, either human, robotic, or software ones. In either case, from an agent's perspective the world is not stationary. In particular, the behavior of the other agents may change as they also learn to better perform their tasks. This nonstationarity makes learning to act in multiagent environments a difficult problem.

However, this nonstationary scenario can be viewed as a game with multiple players. Game theory has aimed at providing solutions to the problem of selecting optimal actions in multi-player environments. In game theory there is an underlying assumption that the players have similar adaptation and learning abilities; the actions of each agent therefore affect the task achievement of the other agents. It therefore seems promising to identify and build upon the relevant results from game theory toward multiagent reinforcement learning.

Stochastic games extend the single-agent Markov decision process to include multiple agents whose actions all impact the resulting rewards and next state. They can also be viewed as an extension of game theory's simpler notion of matrix games. Such a view emphasizes the difficulty of finding optimal behavior in stochastic games, since optimal behavior depends on the behavior of the other agents, and vice versa. This model thus serves as a bridge combining notions from game theory and reinforcement learning.

A comprehensive examination of the multiagent learning techniques for stochastic games does not exist. In this paper we contribute such an analysis, examining techniques from both game theory and reinforcement learning. The analysis both helps in understanding existing algorithms and suggests areas for future work.

In Section 2 we provide the theoretical framework for stochastic games as extensions of both MDPs and matrix games. Section 3 summarizes algorithms for solving stochastic games from the game theory and reinforcement learning communities; we discuss the assumptions, goals, and limitations of these algorithms, and taxonomize them based on their game-theoretic and reinforcement learning components. Section 4 presents two final algorithms that are based on a different game-theoretic mechanism, which address a limitation of the other algorithms. Section 5 concludes with a brief summary and a discussion of future work in multiagent reinforcement learning.

2 Theoretical Framework

In this section we set up the framework for stochastic games. We first examine MDPs, a single-agent, multiple-state framework. We then examine matrix games, a multiple-agent, single-state framework. Finally, we introduce the stochastic game framework, which can be seen as the merging of MDPs and matrix games.

2.1 Markov Decision Processes

A Markov decision process is a tuple (S, A, T, R), where S is a set of states, A is a set of actions, T is a transition function S × A × S → [0, 1], and R is a reward function S × A → ℝ. The transition function defines a probability distribution over next states as a function of the current state and the agent's action. The reward function defines the reward received when selecting an action from the given state. Solving an MDP consists of finding a policy, π : S → A, which determines the agent's actions so as to maximize discounted future reward, with discount factor γ.

MDPs are the focus of much of the reinforcement learning work [11, 20]. The crucial result that forms the basis for this work is the existence of a stationary and deterministic policy that is optimal; such a policy is the target of RL algorithms.
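To make the solution concept concrete, the following is a minimal sketch of computing such a policy for a toy MDP by value iteration, one standard dynamic-programming approach (the text above does not prescribe a particular solver; the two-state transition probabilities, rewards, and discount factor below are all invented for illustration):

```python
# A minimal sketch of solving a toy MDP (S, A, T, R) by value
# iteration. The states, actions, transition probabilities, rewards,
# and gamma below are all invented for illustration.

states = [0, 1]
actions = [0, 1]
gamma = 0.9  # discount factor

# T[s][a][t] = probability of reaching state t after action a in state s
T = {
    0: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.0, 1: 1.0}},
}

# R[s][a] = immediate reward for taking action a in state s
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.0},
}

def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality backup until the values converge."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: max(R[s][a] + gamma * sum(T[s][a][t] * V[t] for t in states)
                   for a in actions)
            for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            return V

def greedy_policy(V):
    """Extract a stationary deterministic policy pi: S -> A from V."""
    return {
        s: max(actions,
               key=lambda a: R[s][a] + gamma * sum(T[s][a][t] * V[t]
                                                   for t in states))
        for s in states
    }

V = value_iteration()
print(V, greedy_policy(V))
```

Because an optimal stationary deterministic policy is guaranteed to exist, the greedy extraction at the end suffices; no randomization over actions is needed.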
2.2 Matrix Games

A matrix game or strategic game (see [14] for an overview) is a tuple (n, A₁…ₙ, R₁…ₙ), where n is the number of players, Aᵢ is the set of actions available to player i (and A is the joint action space A₁ × … × Aₙ), and Rᵢ is player i's payoff function A → ℝ. The players select actions from their available sets with the goal of maximizing their payoffs, which depend on all the players' actions. These are often called matrix games, since the Rᵢ functions can be written as n-dimensional matrices.

Unlike MDPs, it is difficult to define what it means to "solve" a matrix game. A stationary strategy can only be evaluated if the other players' strategies are known. This can be illustrated in the …
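As a concrete illustration of this dependence, below is a minimal sketch of a two-player matrix game in the notation above. Matching pennies, a standard zero-sum example, is an assumed choice here rather than one taken from the text:

```python
# A minimal sketch of a two-player matrix game. Matching pennies is
# an assumed example (not taken from this text). R1[i][j] is player
# 1's payoff when player 1 plays action i and player 2 plays action
# j; actions 0 and 1 stand for heads and tails.

R1 = [[ 1, -1],
      [-1,  1]]   # player 1 wins when the pennies match
R2 = [[-1,  1],
      [ 1, -1]]   # player 2 wins when they differ (zero-sum)

def expected_payoff(R, sigma1, sigma2):
    """Expected payoff under mixed strategies sigma1 and sigma2,
    given as probability distributions over each player's actions."""
    return sum(sigma1[i] * sigma2[j] * R[i][j]
               for i in range(len(R))
               for j in range(len(R[0])))

# The value of a fixed strategy for player 1 depends entirely on
# what player 2 plays:
always_heads = [1.0, 0.0]
print(expected_payoff(R1, always_heads, [1.0, 0.0]))  #  1.0
print(expected_payoff(R1, always_heads, [0.0, 1.0]))  # -1.0
print(expected_payoff(R1, always_heads, [0.5, 0.5]))  #  0.0
```

The same stationary strategy for player 1 (always heads) earns +1, -1, or 0 depending entirely on player 2's strategy, which is why "solving" a matrix game cannot simply mean finding one unconditionally optimal action.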

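Finally, combining the two frameworks: as the introduction describes, a stochastic game extends the MDP so that state transitions and rewards depend on the joint action of all the players, with each player having its own payoff function. A minimal sketch of that structure, with all toy values invented for illustration, might look like this:

```python
# A minimal sketch, with invented toy values, of a stochastic game:
# an MDP whose transition and reward functions depend on the joint
# action of all players, and where each player has its own rewards.

states = [0, 1]
players = [0, 1]
joint_actions = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1)]

# T[(s, ja)] = probability distribution over next states.
# (Uniform here purely to keep the example small.)
T = {(s, ja): {0: 0.5, 1: 0.5} for s in states for ja in joint_actions}

# R[i][(s, ja)] = player i's immediate reward. The two players'
# rewards are opposed here (a zero-sum choice, also for illustration).
R = [
    {(s, ja): (1.0 if ja[0] == ja[1] else -1.0)
     for s in states for ja in joint_actions},
    {(s, ja): (-1.0 if ja[0] == ja[1] else 1.0)
     for s in states for ja in joint_actions},
]

# With a single state this collapses to a matrix game; with a single
# player it collapses to an MDP -- the "merging" described above.
```

Note that nothing in this structure yet says what it means to solve it; that is exactly the difficulty emphasized in the introduction.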
