Duke CPS 296.3 - Learning to Coordinate Actions in Multi-Agent Systems

Learning to Coordinate Actions in Multi-Agent Systems

Gerhard Weiß
Institut für Informatik, Technische Universität München
Arcisstr. 21, 8000 München 2, Germany
[email protected]

Abstract

This paper deals with learning in reactive multi-agent systems. The central problem addressed is how several agents can collectively learn to coordinate their actions such that they solve a given environmental task together. In approaching this problem, two important constraints have to be taken into consideration: the incompatibility constraint, that is, the fact that different actions may be mutually exclusive; and the local information constraint, that is, the fact that each agent typically knows only a fraction of its environment. The contents of the paper are as follows. First, the topic of learning in multi-agent systems is motivated (section 1). Then, two algorithms called ACE and AGE (standing for "ACtion Estimation" and "Action Group Estimation", respectively) for the reinforcement learning of appropriate sequences of action sets in multi-agent systems are described (section 2). Next, experimental results illustrating the learning abilities of these algorithms are presented (section 3). Finally, the algorithms are discussed and an outlook on future research is provided (section 4).

1 Introduction

Multi-Agent Systems. In computer science and artificial intelligence the concept of multi-agent systems has influenced the initial developments in areas like cognitive modelling [Selfridge, 1959; Minsky, 1979], blackboard systems [Erman and Lesser, 1975], object-oriented programming languages [Hewitt, 1977], and formal models of concurrency [Petri, 1962; Brauer et al., 1987]. Nowadays multi-agent systems establish a major research subject in distributed artificial intelligence; see [Bond and Gasser, 1988; Brauer and Hernandez, 1991; Gasser and Huhns, 1989; Huhns, 1987]. The interest in multi-agent systems is largely founded on the insight that many real-world problems are best modelled using a set of agents instead of a single agent. In particular, multi-agent modelling makes it possible (i) to cope with natural constraints like the limited processing power of a single agent or the physical distribution of the data to be processed and (ii) to profit from inherent properties of distributed systems like robustness, fault tolerance, parallelism and scalability.

Generally, a multi-agent system is composed of a number of agents that are able to interact with each other and the environment and that differ from each other in their skills and their knowledge about the environment. (Usually an individual agent is assumed to consist of a sensor component, a motor component, a knowledge base, and a learning component.) There is a great variety in the multi-agent systems studied in distributed artificial intelligence [Huhns, 1987, foreword]. This paper deals with reactive multi-agent systems, where "reactive" means that the behavior and the environment of the system are strongly coupled (there is a continuous interaction between the system and its environment).

Learning. There is common agreement that there are two important reasons for studying learning in multi-agent systems: to be able to endow artificial multi-agent systems (e.g., systems of interacting autonomous robots) with the ability to automatically improve their behavior; and to get a better understanding of the learning processes in natural multi-agent systems (e.g., human groups or societies).
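For concreteness, the following is a minimal sketch of the agent composition mentioned above (a sensor component, a motor component, a knowledge base, and a learning component). The field types, the names, and the act() policy are illustrative assumptions for this sketch, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

@dataclass
class ReactiveAgent:
    # Sensor component: maps the environment to a local observation.
    sense: Callable[[Any], Any]
    # Motor component: carries out one of the agent's actions in the environment.
    execute: Callable[[Any, str], None]
    # Knowledge base: whatever the agent has stored about its world.
    knowledge_base: Dict[str, Any] = field(default_factory=dict)
    # Learning component (here simply per-action estimates of usefulness).
    estimates: Dict[str, float] = field(default_factory=dict)

    def act(self, environment) -> Optional[str]:
        """One reactive step: sense locally, pick the best-estimated action, execute it."""
        observation = self.sense(environment)
        self.knowledge_base["last_observation"] = observation
        if not self.estimates:
            return None
        action = max(self.estimates, key=self.estimates.get)
        self.execute(environment, action)
        return action
```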
In a multi-agent system two forms of learning can be distinguished [Shaw and Whinston, 1989]. First, centralized or isolated learning, i.e. learning that is done by a single agent on its own (e.g. by creating new knowledge structures or by practicing motor activities). And second, distributed or collective learning, i.e. learning that is done by the agents as a group (e.g. by exchanging knowledge or by observing other agents). This paper focuses on collective learning, and the central question addressed is: "How can each agent learn which action it shall perform under which circumstances?" In answering this question, two important constraints have to be taken into consideration [Weiß, 1993a, 1993b]. First, the incompatibility constraint, i.e. the fact that different actions may be incompatible in the sense that the execution of one action leads to environmental changes that impair or even prevent the execution of the others. And second, the local information constraint, i.e. the fact that an agent typically has only local information about the actual environmental state, and this information may differ from the information another agent has; this situation is illustrated by figure 1.

Two algorithms called the ACE algorithm and the AGE algorithm for reinforcement learning in reactive multi-agent systems are described (ACE and AGE are acronyms for "ACtion Estimation" and "Action Group Estimation", respectively). These algorithms are based on the action-oriented version [Weiß, 1992] of the bucket brigade learning model for classifier systems [Holland, 1986]. According to both algorithms the agents collectively learn to estimate the goal relevance of their actions and, based on their estimates, to coordinate their actions.

Figure 2: A blocks world task.

As described in section 2, learning proceeds by the repeated execution of the basic working cycle. A trial is defined as any sequence of at most four cycles that transforms the start into the goal configuration (successful trial), as well as any sequence of exactly four cycles that transforms the start into a non-goal configuration. At the end of each trial the start configuration is restored, and it is again presented to the agents. Additionally, at the end of each successful trial a non-zero external reward Rext is provided.

Task Analysis. As a consequence of the local information constraint, an agent may be unable to distinguish between environmental states in which its actions are useful and relevant to goal attainment and environmental states in which its actions are useless. (This situation is sometimes called the Sussman anomaly.) Consider the environmental states T, U and V shown in figure 3. Based on the usual blocks world notation, these three states can be completely described. As is easy to see, the action put(A1 B) of agent A1 is useful in state T but not useful in state V. However, because A1's local
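To make the credit-assignment idea described above more concrete, the following is a minimal sketch of a bucket-brigade-style update in the spirit of ACE: in every basic working cycle the currently selected actions pay a fraction of their estimates to the actions selected in the previous cycle, and the external reward Rext is distributed among the winners of the final cycle of a successful trial. The constant ALPHA, the concrete value of R_EXT, the data layout, and the function name are assumptions made for this sketch; they are not the paper's exact formulation, and the selection of compatible action sets from the agents' bids is left out.

```python
ALPHA = 0.1    # assumed fraction of an estimate that is paid forward ("bid") each cycle
R_EXT = 10.0   # assumed value of the external reward Rext for a successful trial

def bucket_brigade_update(estimates, previous_winners, current_winners, goal_reached=False):
    """Update goal-relevance estimates after one basic working cycle.

    estimates        -- dict mapping (agent, action) pairs to their current estimates
    previous_winners -- (agent, action) pairs whose actions were executed in the previous cycle
    current_winners  -- (agent, action) pairs selected (compatible, highest bids) in this cycle
    goal_reached     -- True if this cycle produced the goal configuration
    """
    # Each currently selected action pays its bid, a fraction of its own estimate ...
    payment = 0.0
    for key in current_winners:
        bid = ALPHA * estimates[key]
        estimates[key] -= bid
        payment += bid

    # ... and the collected payment is passed back to the previous cycle's winners,
    # so that actions which set the stage for useful successors are strengthened.
    if previous_winners:
        share = payment / len(previous_winners)
        for key in previous_winners:
            estimates[key] += share

    # At the end of a successful trial, the external reward is distributed among
    # the actions that directly produced the goal configuration.
    if goal_reached and current_winners:
        share = R_EXT / len(current_winners)
        for key in current_winners:
            estimates[key] += share
```

Under these assumptions, a trial as defined above would consist of at most four such updates, after which the start configuration is restored; how AGE estimates whole action groups rather than single actions is beyond this sketch.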

