



















Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Abstract
Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these dependencies will change rapidly as a result of the continuously changing state. Via a context-specific decomposition of the problem into smaller subproblems, coordination graphs offer scalable solutions to the problem of multiagent decision making. In this work, we apply coordination graphs to a continuous (robotic) domain by assigning roles to the agents and then coordinating the different roles. Moreover, we demonstrate that, with some additional assumptions, an agent can predict the actions of the other agents, rendering communication superfluous. We have successfully implemented the proposed method into our UvA Trilearn simulated robot soccer team which won the RoboCup-2003 World Championship in Padova, Italy.
Key words: Multiagent coordination, coordination graphs, game theory, RoboCup
1 Introduction
A multiagent (multi-robot) system is a group of agents that coexist in an environment and can interact with each other in several different ways in order to optimize a performance measure (1). Research in multiagent systems aims at providing principles for the construction of complex systems containing multiple independent agents and focuses on behavior management issues (e.g., coordination of behaviors) in such systems.
Email addresses: jellekok@science.uva.nl (Jelle R. Kok), mtjspaan@science.uva.nl (Matthijs T. J. Spaan), vlassis@science.uva.nl (Nikos Vlassis).
Preprint submitted to Elsevier Science 21 January 2005
We are interested in fully cooperative multiagent systems in which all agents share a common goal. A key aspect in such systems is the problem of coordination: the process that ensures that the individual decisions of the agents result in jointly optimal decisions for the group. In principle game theoretic techniques can be applied to solve the coordination problem (2), but this approach requires reasoning over the joint action space of the agents, whose size is exponential in the number of agents. For practical situations involving many agents, modeling n-person games becomes intractable. However, the particular structure of the coordination problem can often be exploited to reduce its complexity.
A recent approach to decrease the size of the joint action space involves the use of a coordination graph (CG) (3). In this graph, each node represents an agent, and an edge indicates that the corresponding agents have to coordinate their actions. In order to reach a jointly optimal action, a variable elimination algorithm is applied that iteratively solves the local coordination problems one by one and propagates the result through the graph using a message passing scheme. In a context-specific CG (4) the topology of the graph is first dynamically updated based on the current state of the world before the elimination algorithm is applied.
In this work we will describe a framework to coordinate multiple robots using coordination graphs. We assume a group of robotic agents that are embedded in a continuous and dynamic domain and are able to perceive their surroundings with sensors. The continuous nature of the state space makes the direct application of context-specific CGs difficult. Therefore, we appropriately ‘discretize’ the continuous state by assigning roles to the agents (5) and then, instead of coordinating the different agents, coordinate the different roles. It turns out that such an approach offers additional benefits: the set of roles not only allows for the definition of natural coordination rules that exploit prior knowledge about the domain, but also constrains the feasible action space of the agents. This greatly simplifies the modeling and the solution of the problem at hand.
Furthermore, we will describe a method that, using some additional common knowledge assumptions, allows an agent to predict the optimal action of its neighboring agents, making communication unnecessary. Finally, we work out an extensive example in which we apply coordination graphs to the RoboCup simulated soccer domain.
The setup is as follows: in Section 2 we review the coordination problem from a game-theoretic perspective, and in Section 3 we explain the concept of a coordination graph. In Section 4 we will describe our framework to coordinate agents in a continuous dynamic environment using roles without using communication. This is followed by an extensive example in the RoboCup soccer
agents i and R_j(a) > R_j(a*) for at least one agent j. That is, there is no other outcome that makes every player at least as well off and at least one player strictly better off. There are many examples of strategic games where a Pareto optimal solution is not a Nash equilibrium and vice versa (e.g., in the famous prisoner’s dilemma (2)). However, in coordination games such as the one depicted in Fig. 1 each Pareto optimal solution is also a Nash equilibrium by definition.
Formally, the coordination problem can be seen as the problem of selecting one single Pareto optimal Nash equilibrium¹ in a coordination game (1). This can be accomplished using several different methods (7): using communication, learning, or by imposing social conventions. In the first case an agent can inform the other agents of its action, restricting their choice to a simplified coordination game. If in the movie example the first agent notifies the other agent that it will select the comedy, the coordination game is simplified to the second row, which contains only one equilibrium. Secondly, learning can be used when the strategic game is played repeatedly. Each agent makes predictions about the actions of the other players based on the previous interactions and chooses its action accordingly. This approach has received much attention over the past several years (7; 8; 9). Finally, social conventions are constraints on the action choices of the agents. They can be regarded as rules that select one of all the possible equilibria. As long as such a convention is common knowledge among the agents, no agent can benefit from not abiding by it. This general, domain-independent method will always result in an optimal joint action and, moreover, it can be implemented offline: during execution the agents do not have to explicitly coordinate their actions, e.g., via negotiation. For instance, in our previous example we can create a lexicographic ordering scheme in which we first order the agents and then the actions. Assuming the ordering ‘1 ≻ 2’ (meaning that agent 1 has priority over agent 2) and ‘thriller ≻ comedy’, the second agent can derive from the social convention that the first agent will select the thriller and will therefore also choose the thriller.
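To make the last mechanism concrete, the following Python sketch selects an equilibrium by such a lexicographic convention for a two-action coordination game. The payoff numbers are illustrative assumptions (the payoff matrix of Fig. 1 is not reproduced in this excerpt); only the common ordering of agents and actions matters for the selection.

```python
from itertools import product

# Two agents choose a movie; the payoffs below are illustrative assumptions.
actions = ["thriller", "comedy"]          # convention: thriller ordered before comedy

def payoff(a1, a2):
    return 1 if a1 == a2 else 0           # common payoff: coordination succeeds or fails

def is_nash(a1, a2):
    # No agent can gain by unilaterally deviating from (a1, a2).
    return (payoff(a1, a2) >= max(payoff(b, a2) for b in actions) and
            payoff(a1, a2) >= max(payoff(a1, b) for b in actions))

equilibria = [(a1, a2) for a1, a2 in product(actions, repeat=2) if is_nash(a1, a2)]

# Social convention: agents are ordered (1 before 2) and actions are ordered
# (thriller before comedy); every agent independently picks the lexicographically
# first equilibrium, so no communication is needed.
rank = {a: i for i, a in enumerate(actions)}
chosen = min(equilibria, key=lambda e: (rank[e[0]], rank[e[1]]))
print(chosen)                             # ('thriller', 'thriller')
```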
In the above cases we assume that all equilibria can be found and coordination is the result of each individual agent selecting its individual action based on the same equilibrium. However, the number of joint actions grows exponentially with the number of agents, making it infeasible to determine all equilibria in the case of many agents. This calls for methods that first reduce the size of the joint action space before solving the coordination problem. One such approach, explained next, is based on the use of a coordination graph that captures local coordination requirements between agents.
¹ In the rest of this article, we denote a Pareto optimal Nash equilibrium simply by equilibrium, unless otherwise stated.
Fig. 2. An example coordination graph for a 4-agent problem (the edges correspond to the local payoff functions f_1, f_2 and f_3).
3 Coordination graphs
In systems where multiple agents have to coordinate their actions, it is infeasible to model all possible joint actions since this number grows exponentially with the number of agents. Fortunately, most problems exhibit the property that each agent only has to coordinate with a small subset of the other agents, e.g., in many robotic applications only robots that are close to each other have to coordinate their actions. A recent approach to exploit such dependencies involves the use of a coordination graph (CG), which represents the coordination requirements of a system (3).
The main assumption is that the global payoff function R(a) can be decomposed into a linear combination of local payoff functions, each involving only a few agents. For example, suppose that there are four agents and the following decomposition of the payoff function:
R(a) = f_1(a_1, a_2) + f_2(a_1, a_3) + f_3(a_3, a_4).
The functions f_i specify the local coordination dependencies between the actions of the agents and can be graphically depicted as in Fig. 2. A node in this graph represents an agent, denoted by A_i, while an edge defines a (possibly directed) dependency between two agents. Only interconnected agents have to coordinate their actions at any particular instance. In the decomposition of R(a), A_2 has to coordinate with A_1, A_4 has to coordinate with A_3, A_3 has to coordinate with both A_4 and A_1, and A_1 has to coordinate with both A_2 and A_3. The global coordination problem is thus replaced by a number of local coordination problems, each involving fewer agents.
In order to solve the coordination problem and find the optimal joint action a* that maximizes R(a), we can apply a variable elimination algorithm, which is almost identical to variable elimination in a Bayesian network (3; 10). The main idea is that the agents are eliminated one by one after performing a local maximization step which takes all possible action combinations of an agent’s neighbors into account.
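Before walking through the procedure, a compact Python sketch of a table-based variant of variable elimination for the decomposition above may help. The numeric payoffs in the usage example are illustrative assumptions (the paper gives no numbers for Fig. 2), and the function names are ours, not the authors' implementation.

```python
from itertools import product

ACTIONS = (False, True)   # binary actions, as in the example of Fig. 3

def eliminate(payoffs, agent):
    """One elimination step: collect every payoff table involving `agent`,
    maximize over its action, and return the reduced problem together with
    `agent`'s best-response table (its conditional strategy)."""
    involved = [(scope, tbl) for scope, tbl in payoffs if agent in scope]
    rest     = [(scope, tbl) for scope, tbl in payoffs if agent not in scope]
    new_scope = tuple(sorted({i for scope, _ in involved for i in scope} - {agent}))
    new_tbl, best_response = {}, {}
    for ctx in product(ACTIONS, repeat=len(new_scope)):
        assign = dict(zip(new_scope, ctx))
        def local(a):
            assign[agent] = a
            return sum(tbl[tuple(assign[i] for i in scope)] for scope, tbl in involved)
        best_a = max(ACTIONS, key=local)
        new_tbl[ctx], best_response[ctx] = local(best_a), best_a
    rest.append((new_scope, new_tbl))        # new payoff function over the neighbours
    return rest, (new_scope, best_response)

def variable_elimination(payoffs, order):
    """Return the joint action maximizing the sum of the local payoff tables."""
    strategies = []
    for agent in order:                      # forward pass: eliminate agents one by one
        payoffs, strategy = eliminate(payoffs, agent)
        strategies.append((agent, strategy))
    joint = {}
    for agent, (scope, best) in reversed(strategies):   # reverse pass: fix the actions
        joint[agent] = best[tuple(joint[i] for i in scope)]
    return joint

# R(a) = f1(a1,a2) + f2(a1,a3) + f3(a3,a4), with made-up payoff numbers:
def table(fn, scope):
    return (scope, {acts: fn(*acts) for acts in product(ACTIONS, repeat=len(scope))})

f1 = table(lambda a1, a2: 5 if a1 == a2 else 0, (1, 2))
f2 = table(lambda a1, a3: 4 if a1 and a3 else 0, (1, 3))
f3 = table(lambda a3, a4: 10 if a3 and not a4 else 0, (3, 4))

print(variable_elimination([f1, f2, f3], order=[2, 4, 3, 1]))
# {1: True, 3: True, 4: False, 2: True} -> global payoff 5 + 4 + 10 = 19
```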
The algorithm operates as follows. One agent is selected for elimination, and it collects all payoff functions from its neighbors. Next, this agent optimizes its
A_1: ⟨a_1 ∧ a_3 ∧ x : 4⟩
A_2: ⟨a_1 ∧ ā_2 ∧ x : 5⟩, ⟨ā_2 ∧ x : 2⟩
A_3: ⟨ā_3 ∧ a_2 ∧ x : 5⟩
A_4: ⟨a_3 ∧ a_4 ∧ x̄ : 10⟩

After conditioning on x = true the remaining rules are:
⟨a_1 ∧ a_3 : 4⟩, ⟨a_1 ∧ ā_2 : 5⟩, ⟨ā_2 : 2⟩, ⟨ā_3 ∧ a_2 : 5⟩
Fig. 3. Initial coordination graph (left) and graph after conditioning on the context x = true (right).
factorized representation of the complete payoff matrices since they are only specified for a context with non-zero payoff.
More formally, let A_1, ..., A_n be a group of agents, where each agent A_j has to choose an action a_j ∈ A_j, resulting in a joint action a ∈ A = A_1 × ... × A_n, and let X be a set of discrete state variables. The context c is then an element from the set of all possible combinations of the state and action variables, c ∈ C ⊆ X ∪ A. A value rule ⟨p; c : v⟩ ∈ P is a function p : C → ℝ such that p(x, a) = v when (x, a) is consistent with the context c of the rule, and 0 otherwise. For a particular situation, only those value rules that are consistent with the current context contribute to the global payoff: R(a) = Σ_{i=1}^{m} p_i(x, a), where m is the total number of value rules and x and a are respectively the current state and joint action.
As an example, consider the case where two persons have to coordinate their actions to pass through a narrow door. We describe this situation using the following value rule:
⟨p_1 ; in-front-of-same-door(1, 2) ∧ a_1 = passThroughDoor ∧ a_2 = passThroughDoor : −50⟩
This rule indicates that when the two agents are located in front of the same door and both select the same action (passing through the door), the global payoff value will be reduced by 50. When the state is not consistent with the above rule (i.e., the agents are not located in front of the same door), the rule does not apply and the agents do not have to coordinate their actions. By conditioning on the current state the agents discard all irrelevant rules, and as a result the CG is dynamically updated and simplified. Note that for the conditioning step each agent only needs to observe the part of the state mentioned in its own value rules.
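A minimal sketch of this conditioning step, using an assumed rule representation (the class and function names below are ours, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass
class ValueRule:
    agent: int      # agent the rule is assigned to
    state: dict     # required state variables, e.g. {"in-front-of-same-door(1,2)": True}
    actions: dict   # required agent actions, e.g. {1: "passThroughDoor"}
    value: float

def condition(rules, observed_state):
    """Discard every rule whose state part is inconsistent with the observed state;
    the remaining rules define the simplified coordination graph."""
    return [r for r in rules
            if all(observed_state.get(var) == val for var, val in r.state.items())]

# The narrow-door rule from the text:
door_rule = ValueRule(agent=1,
                      state={"in-front-of-same-door(1,2)": True},
                      actions={1: "passThroughDoor", 2: "passThroughDoor"},
                      value=-50)

# When the agents are not in front of the same door, the rule (and with it the
# coordination dependency between agents 1 and 2) disappears:
print(condition([door_rule], {"in-front-of-same-door(1,2)": False}))   # []
```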
For a more extensive example, see Fig. 3. Below the left graph all value rules, defined over binary action and context variables², are depicted together with the agent the rule applies to. The coordination dependencies between the agents are represented by directed edges, where each (child) agent has an incoming edge from the (parent) agent that affects its decision. After the agents observe the current state, x = true, they condition on the context. The rule of A_4 does not apply and is removed. As a consequence, the optimal joint action is independent of the action of A_4 and the edge to A_4 is deleted from the graph as shown in the right graph of Fig. 3. In this case, A_4 can thus select either action without affecting the global reward R(a).
After the agents have conditioned on the state variables, the agents are eliminated from the graph one by one. Let us assume that we first eliminate A_3 in the above example. A_3 first collects all rules in which it is involved and then maximizes over the rules ⟨a_1 ∧ a_3 : 4⟩ and ⟨ā_3 ∧ a_2 : 5⟩. For all possible actions of A_1 and A_2, A_3 determines its best response and then distributes the corresponding conditional strategy, in this case equal to ⟨a_2 : 5⟩ and ⟨a_1 ∧ ā_2 : 4⟩, to its parent A_2. Now a new directed edge from A_1 to A_2 is generated, since A_2 receives a rule containing an action of A_1. After this step, A_3 has no children in the coordination graph anymore and is eliminated. The procedure continues and after A_2 has distributed its conditional strategy ⟨a_1 : 11⟩ and ⟨ā_1 : 5⟩ to A_1, it is also eliminated. Finally, A_1 is the last agent left and fixes its action to a_1. Now a second pass in the reverse order is performed, in which each agent informs its children of its decision, and these then determine their final strategy. This results in the optimal joint action {a_1, ā_2, a_3, a_4} and a global payoff of 11. Note that {a_1, ā_2, a_3, ā_4} is also an optimal joint action; it depends on the individual action choice of A_4 which of the two is selected.
The outcome of the variable elimination algorithm is independent of the elimination order and the initial distribution of the rules and will always result in an optimal joint action (3). However, the execution time of the algorithm does depend on the elimination order. In the table-based approach the cost of the algorithm is linear in the number of new functions introduced (3). In the rule-based approach the cost is polynomial in the number of new rules generated in the maximization operation (4). This number is never larger and often exponentially smaller than the complexity of the table-based approach³. Computing the optimal order for minimizing the mentioned runtime costs is known to be NP-complete (11), but good heuristics exist, e.g., minimum deficiency search, which first eliminates the agents with the minimum difference between incoming and outgoing edges (12; 13).
² The action a_1 corresponds to a_1 = true and the action ā_1 to a_1 = false.
³ Do note that the rule-based approach involves an extra cost regarding the management of the sets of rules, causing its advantage to manifest primarily in problems with a fair amount of context-specific dependencies (4).
current (continuous) state, (ii) it should be sparse in order to keep the dependencies and the associated local coordination problems as simple as possible, (iii) it should be applicable in situations where communication is unavailable or very expensive.
In the remainder of this section, we will concentrate on the two main features of our proposed method, designed to fulfill the requirements mentioned above. The first is the assignment of roles to the agents in order to apply coordination graphs to continuous domains and to reduce the action sets of the different agents; the second is to predict the chosen action of the other agents, rendering communication superfluous.
4.1 Context-specificity using roles
Conditioning on a context that is defined over a continuous domain is difficult in the original rule-based CG representation. A way to ‘discretize’ the context is by assigning roles to agents (5; 16; 17; 1). Roles are a natural way of introducing domain prior knowledge to a multiagent problem and provide a flexible solution to the problem of distributing the global task of a team among its members. In the soccer domain for instance one can easily identify several roles, ranging from ‘active’ or ‘passive’ depending on whether an agent is in control of the ball or not, to more specialized ones like ‘striker’, ‘defender’, ‘goalkeeper’, etc.
In (18) a role is defined as an abstract specification of the set of activities an individual or subteam undertakes in service of the team’s overall activity. In our framework a role m ∈ M defines this set of activities as a set of value rules P_m. In the original rule-based CG each agent has only one set of value rules, which would then have to include all rules for all roles. Now, based on the given role assignment, only a subset of all value rules applies, which simplifies the coordination graph. Furthermore, the value rules in P_m pose additional constraints on the roles of the other agents contained in the value rules, reducing the number of edges in the coordination graph even further. For instance, an agent i in the role ‘goalkeeper’ that controls the ball and considers passing it to any of the agents j in the role ‘defender’ could use the following value rule:
⟨p_1^goalkeeper ; has-role-defender(j) ∧ a_i = passTo(j) : 10⟩,   ∀ j ≠ i.
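As a hypothetical sketch of how such a role-parametrized rule can be instantiated (all names below are ours, not from the paper), the has-role predicate can be resolved directly against the current role assignment, since the roles are known before the coordination step:

```python
def goalkeeper_pass_rules(i, agents, roles):
    """Instantiate the rule above for every teammate j != i currently filling the
    'defender' role; each returned tuple is (rule name, required actions, value)."""
    return [(f"p_goalkeeper_{i}_{j}",            # illustrative rule name
             {i: f"passTo({j})"},                # required action of agent i
             10)                                 # payoff value
            for j in agents if j != i and roles.get(j) == "defender"]

# Example: agent 1 is the goalkeeper, agents 2 and 3 are defenders, agent 4 is not.
print(goalkeeper_pass_rules(1, [1, 2, 3, 4],
                            {1: "goalkeeper", 2: "defender", 3: "defender", 4: "striker"}))
```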
An assignment of roles to agents provides a natural way to parametrize a coordination structure over a continuous domain. The intuition is that, instead of directly coordinating the agents in a particular situation, we assign roles to the agents based on this situation and subsequently try to ‘coordinate’ the set of roles. The other roles m′ ∈ M \ {m} mentioned in the set of value rules P_m for role m define a coordination subgraph structure on M. As such, the assigned roles induce a coordination graph between all agents, each executing a certain role.
A question which remains is how roles are assigned to agents. In this section we describe the communication-based case, in which we can use a distributed role assignment algorithm (5; 16; 1). In the next section, we will concentrate on the situation in which communication is unavailable.
The role assignment algorithm, which is common knowledge among the agents, defines a sequence M′ of roles with |M′| ≥ n which represents a preference ordering over the roles: the most ‘important’ role is assigned to an agent first, followed by the second most important role, etc. By construction, the same role can be assigned to more than one agent, but each agent is assigned only a single role. Each role m has an associated potential r_im which is a real-valued estimate of how appropriate agent i is for the role m in the current world state. These potentials r_im depend on features of the state space relevant for role m as observed by agent i. For example, relevant features for the role ‘striker’ could be the time needed to intercept the ball or the global position on the field. Each agent computes its potential for each m ∈ M′ and sends these to the other agents. Now the first m ∈ M′ is assigned to the agent that has the highest potential for that role. This agent is then no longer under consideration, the next m is assigned to another agent, and so on, until all agents have been assigned a role. This algorithm requires sending O(|M|n) messages, as each agent has to send each other agent its potential r_im for all m ∈ M′.
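A minimal sketch of this sequential assignment (the function and variable names are ours; potential() stands in for the domain-specific computation of r_im):

```python
def assign_roles(agents, ordered_roles, potential):
    """Sequentially assign the preference-ordered roles M' to the agents.
    potential(i, m) is the (domain-specific) estimate r_im of how appropriate
    agent i is for role m in the current world state."""
    assignment, unassigned = {}, set(agents)
    for role in ordered_roles:
        if not unassigned:
            break
        # The most important unfilled role goes to the best remaining agent.
        best = max(unassigned, key=lambda i: potential(i, role))
        assignment[best] = role
        unassigned.remove(best)
    return assignment

# Illustrative example with four agents and made-up potentials:
roles = ["active", "receiver", "receiver", "passive"]
print(assign_roles(range(4), roles,
                   potential=lambda i, m: -i if m == "active" else i))
# {0: 'active', 3: 'receiver', 2: 'receiver', 1: 'passive'}
```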
The roles can be regarded as an abstraction of a continuous state to a discrete context, allowing the application of existing techniques for discrete-state CGs. Furthermore, roles can reduce the action space of the agents by ‘locking out’ specific actions. For example, the role of the goalkeeper does not include the action ‘score’, and in a ‘passive’ role the action ‘shoot’ is deactivated. Such a reduction of the action space can offer computational savings, but more importantly it can facilitate the solution of a local coordination game by restricting the joint action space to a subspace that contains fewer Nash equilibria.
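In its simplest form, such a role-dependent restriction can be a lookup table (the action names below are illustrative, not the team's actual skills):

```python
# Role-dependent action sets: each role only allows a subset of the team's actions.
ROLE_ACTIONS = {
    "goalkeeper": {"catchBall", "passTo", "clearBall"},         # no 'score'
    "passive":    {"moveToStrategicPosition", "markOpponent"},  # no 'shoot'
    "active":     {"passTo", "dribble", "shoot", "score"},
}

def feasible_actions(role, all_actions):
    """Restrict an agent's action set to the actions allowed by its current role."""
    return [a for a in all_actions if a in ROLE_ACTIONS[role]]
```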
4.2 Non-communicating agents
For the role assignment each agent has to communicate its potential for a certain role to all other agents. Furthermore, the variable elimination requires that each agent receives the payoff functions of its neighboring agents, and after computing its optimal conditional strategy communicates a new payoff function back to its neighbors. Similarly, in the reverse process each agent needs to communicate its decision to its neighbors in order to reach a coordi-
In the noncommunicative case the elimination order neither has to be fixed in advance nor has to be common knowledge among all agents as in (3), but each agent is free to choose any elimination order, for example, one that allows the agent to quickly compute its own optimal action. This is possible because a particular elimination order affects only the speed of the algorithm and not the computed joint action as described earlier.
In terms of complexity, the computational costs for each individual agent are clearly increased to compensate for the unavailable communication. Instead of only optimizing its own action, in the worst case each agent has to calculate the action of every other agent in the subgraph. The computational cost per agent thus increases linearly with the number of new payoff functions generated during the elimination procedure. Communication, however, is no longer used, which allows for a speedup since these extra individual computations can now run in parallel. This is in contrast to the original CG approach, where computations need to be performed sequentially.
In summary, we can apply the CG framework without using communication when all agents are able to run the same algorithm in parallel. For this, we have to make the following assumptions:
Finally, we note that the common knowledge assumption is strong and even in cases where communication is available it cannot always be guaranteed (19). In multiagent systems without communication common knowledge can be guaranteed if all agents consistently observe the same world state, but this is also violated in practice due to partial observability of the environment (a soccer player has a limited field of view). In our case, when the agents have to agree on a particular role distribution given a particular context, the only requirement we impose is that the role assignment in a particular local context is based on those parts of the state that are, to a good approximation, fully observable by all agents involved in the role assignment. For example, in a soccer game a particular role assignment may require that the agents in a group of coordinating agents observe each other’s positions on the field, as well as the positions of their nearby opponents, and have a rough estimate of the position of the ball (e.g., knowing that the ball is far away). When such a context is encountered, a local graph is formed which is disconnected from the rest of the CG and can be solved separately.
5 Experiments
We have applied the aforementioned framework in our simulation robot soccer team UvA Trilearn (20). The main motivation was to improve upon the coordination during ball passes between teammates. First of all, we have conducted an experiment in which a complete team strategy was specified using value rules and each player selected its action based on the result of the variable elimination algorithm. In this case we made the world fully observable for all agents and used no communication between the agents. Furthermore, we incorporated this framework in our competition team, which participated in the RoboCup-2003 World Championships. For this we made some necessary modifications since during competition the world is only partially observable. Both approaches will be explained next in more detail.
5.1 Fully observable, non-communicating team
In this section we will explain how we have constructed a complete team strategy using the value rules from the CG framework. We assume that the world is fully observable⁵ such that each agent can model the complete CG algorithm by itself, as explained in Section 4.2. This is necessary since the RoboCup soccer simulation does not allow agents to communicate with more than one agent at the same time, which makes it impossible to apply the original variable elimination algorithm. It has no effect on the outcome of the algorithm. Furthermore, we used the synchronization mode to ensure that the simulator only proceeds to the next cycle when the actions of all players have been received. This is to make sure that no action opportunities are lost because of the computation time of the algorithm.
In order to accomplish coordination, all agents first perform the role assignment, which maps each agent to a role from the following ordered sequence:
M′ = {active, receiver, receiver, passive, passive, passive, passive, passive, passive, passive}.
There are four possible roles. The first and most important role is that of the active player, which performs an action with the ball. We distinguish between two different types of this role, interceptor and passer, depending on whether or not the ball can be kicked by the active player. Next, we assign the two roles
⁵ This is a configuration setting in the soccer server.