Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Non-communicative Multi-Robot Coordination in Dynamic Environments | INFSCI 3350, Papers of Information Technology

University of Pittsburgh (Pitt) - Medical Center-Health System Information Technology

Material Type: Paper; Class: DOCTORAL SEMINAR: SYSTEMS; Subject: Information Science; University: University of Pittsburgh; Term: Spring 2005;

Typology: Papers

Pre 2010

Uploaded on 09/02/2009

koofers-user-qck-1 🇺🇸

10 documents

1 / 27

This page cannot be seen from the preview

Don't miss anything!

Non-communicative multi-robot coordination

in dynamic environments

Jelle R. Kok, Matthijs T. J. Spaan, Nikos Vlassis

Informatics Institute, Faculty of Science, University of Amsterdam

Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Abstract

Within a group of cooperating agents the decision making of an individual agent

depends on the actions of the other agents. In dynamic environments, these depen-

dencies will change rapidly as a result of the continuously changing state. Via a

context-specific decomposition of the problem into smaller subproblems, coordina-

tion graphs offer scalable solutions to the problem of multiagent decision making.

In this work, we apply coordination graphs to a continuous (robotic) domain by

assigning roles to the agents and then coordinating the different roles. Moreover,

we demonstrate that, with some additional assumptions, an agent can predict the

actions of the other agents, rendering communication superfluous. We have suc-

cessfully implemented the proposed method into our UvA Trilearn simulated robot

soccer team which won the RoboCup-2003 World Championship in Padova, Italy.

Key words:

Multiagent coordination, coordination graphs, game theory, RoboCup

1 Introduction

A multiagent (multi-robot) system is a group of agents that coexist in an

environment and can interact with each other in several different ways in

order to optimize a performance measure (1). Research in multiagent systems

aims at providing principles for the construction of complex systems containing

multiple independent agents and focuses on behavior management issues (e.g.,

coordination of behaviors) in such systems.

Email addresses: jellekok@science.uva.nl (Jelle R. Kok),

mtjspaan@science.uva.nl (Matthijs T. J. Spaan), vlassis@science.uva.nl

(Nikos Vlassis).

Preprint submitted to Elsevier Science 21 January 2005

Partial preview of the text

Download Non-communicative Multi-Robot Coordination in Dynamic Environments | INFSCI 3350 and more Papers Information Technology in PDF only on Docsity!

Non-communicative multi-robot coordination

in dynamic environments

Jelle R. Kok, Matthijs T. J. Spaan, Nikos Vlassis

Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Abstract

Within a group of cooperating agents the decision making of an individual agent depends on the actions of the other agents. In dynamic environments, these depen- dencies will change rapidly as a result of the continuously changing state. Via a context-specific decomposition of the problem into smaller subproblems, coordina- tion graphs offer scalable solutions to the problem of multiagent decision making. In this work, we apply coordination graphs to a continuous (robotic) domain by assigning roles to the agents and then coordinating the different roles. Moreover, we demonstrate that, with some additional assumptions, an agent can predict the actions of the other agents, rendering communication superfluous. We have suc- cessfully implemented the proposed method into our UvA Trilearn simulated robot soccer team which won the RoboCup-2003 World Championship in Padova, Italy.

Key words: Multiagent coordination, coordination graphs, game theory, RoboCup

1 Introduction

A multiagent (multi-robot) system is a group of agents that coexist in an environment and can interact with each other in several different ways in order to optimize a performance measure (1). Research in multiagent systems aims at providing principles for the construction of complex systems containing multiple independent agents and focuses on behavior management issues (e.g., coordination of behaviors) in such systems.

Email addresses: jellekok@science.uva.nl (Jelle R. Kok), mtjspaan@science.uva.nl (Matthijs T. J. Spaan), vlassis@science.uva.nl (Nikos Vlassis).

Preprint submitted to Elsevier Science 21 January 2005

We are interested in fully cooperative multiagent systems in which all agents share a common goal. A key aspect in such systems is the problem of coor- dination: the process that ensures that the individual decisions of the agents result in jointly optimal decisions for the group. In principle game theoretic techniques can be applied to solve the coordination problem (2), but this ap- proach requires reasoning over the joint action space of the agents, whose size is exponential in the number of agents. For practical situations involving many agents, modeling n-person games becomes intractable. However, the particu- lar structure of the coordination problem can often be exploited to reduce its complexity.

A recent approach to decrease the size of the joint action space involves the use of a coordination graph (CG) (3). In this graph, each node represents an agent, and an edge indicates that the corresponding agents have to coordinate their actions. In order to reach a jointly optimal action, a variable elimination algorithm is applied that iteratively solves the local coordination problems one by one and propagates the result through the graph using a message passing scheme. In a context-specific CG (4) the topology of the graph is first dynamically updated based on the current state of the world before the elimination algorithm is applied.

In this work we will describe a framework to coordinate multiple robots using coordination graphs. We assume a group of robotic agents that are embedded in a continuous and dynamic domain and are able to perceive their surround- ings with sensors. The continuous nature of the state space makes the direct application of context-specific CGs difficult. Therefore, we appropriately ‘dis- cretize’ the continuous state by assigning roles to the agents (5) and then, instead of coordinating the different agents, coordinate the different roles. It turns out that such an approach offers additional benefits: the set of roles not only allows for the definition of natural coordination rules that exploit prior knowledge about the domain, but also constrains the feasible action space of the agents. This greatly simplifies the modeling and the solution of the problem at hand.

Furthermore, we will describe a method that, using some additional common knowledge assumptions, allows an agent to predict the optimal action of its neighboring agents, making communication unnecessary. Finally, we work out an extensive example in which we apply coordination graphs to the RoboCup simulated soccer domain.

The setup is as follows: in Section 2 we review the coordination problem from a game-theoretic perspective, and in Section 3 we explain the concept of a coordination graph. In Section 4 we will describe our framework to coordi- nate agents in a continuous dynamic environment using roles without using communication. This is followed by an extensive example in the RoboCup soc-

agents i and Rj (a) > Rj (a∗) for at least one agent j. That is, there is no other outcome that makes every player at least as well off and at least one player strictly better off. There are many examples of strategic games where a Pareto optimal solution is not a Nash equilibrium and vice versa (e.g., in the famous prisoner’s dilemma (2)). However, in coordinated games such as the one depicted in Fig. 1 each Pareto optimal solution is also a Nash equilibrium by definition.

Formally, the coordination problem can be seen as the problem of selecting one single Pareto optimal Nash equilibrium 1 in a coordination game (1). This can be accomplished using several different methods (7): using communication, learning, or by imposing social conventions. In the first case an agent can inform the other agent of its action, restricting the choice of the other agents to a simplified coordination game. If in the movie example the first agent would notify the other agent that it will select the comedy, the coordination game is simplified to the second row which contains only one equilibrium. Secondly, learning can be used when the strategic game is played repeatedly. Each agent makes predictions about the actions of the other players based on the previous interactions and chooses its action accordingly. This approach has received much attention over the past several years (7; 8; 9). Finally, social conventions are constraints on the action choices of the agents. It can be regarded as a rule to select one of all the possible equilibria. As long as this convention is common knowledge among the agents, no agent can benefit from not abiding it. This general, domain-independent method will always result in an optimal joint action and moreover, it can be implemented offline: during execution the agents do not have to explicitly coordinate their actions, e.g., via negotiation. For instance, we can create a lexicographic ordering scheme in which we first order the agents and then the actions in our previous example. Assuming the ordering ‘1 ¬ 2’ (meaning that agent 1 has priority over agent 2) and ‘thriller ¬ comedy’, the second agent can derive from the social conventions that the first agent will select the thriller and will therefore also choose the thriller.

In the above cases we assume that all equilibria can be found and coordination is the result of each individual agent selecting its individual action based on the same equilibrium. However, the number of joint actions grows exponentially with the number of agents, making it infeasible to determine all equilibria in the case of many agents. This calls for methods that first reduce the size of the joint action space before solving the coordination problem. One such approach, explained next, is based on the use of a coordination graph that captures local coordination requirements between agents.

(^1) In the rest of this article, we denote a Pareto optimal Nash equilibrium simply

by equilibrium, unless otherwise stated.

PSfrag replacements

A 1

A 2 A 3

A 4

f 1 f 2

f 3

Fig. 2. An example coordination graph for a 4-agent problem.

3 Coordination graphs

In systems where multiple agents have to coordinate their actions, it is infea- sible to model all possible joint actions since this number grows exponentially with the number of agents. Fortunately, most problems exhibit the property that each agent only has to coordinate with a small subset of the other agents, e.g., in many robotic applications only robots that are close to each other have to coordinate their actions. A recent approach to exploit such dependencies in- volves the use of a coordination graph (CG), which represents the coordination requirements of a system (3).

The main assumption is that the global payoff function R(a) can be decom- posed into a linear combination of local payoff functions, each involving only a few agents. For example, suppose that there are four agents and the following decomposition of the payoff function:

R(a) = f 1 (a 1 , a 2 ) + f 2 (a 1 , a 3 ) + f 3 (a 3 , a 4 ).

The functions fi specify the local coordination dependencies between the ac- tions of the agents and can be graphically depicted as in Fig. 2. A node in this graph represents an agent, denoted by Ai, while an edge defines a (possible directed) dependency between two agents. Only interconnected agents have to coordinate their actions at any particular instance. In the decomposition of R(a), A 2 has to coordinate with A 1 , A 4 has to coordinate with A 3 , A 3 has to coordinate with both A 4 and A 1 , and A 1 has to coordinate with both A 2 and A 3. The global coordination problem is thus replaced by a number of local coordination problems each involving fewer agents.

In order to solve the coordination problem and find the optimal joint action a∗^ that maximizes R(a), we can apply a variable elimination algorithm, which is almost identical to variable elimination in a Bayesian network (3; 10). The main idea is that the agents are eliminated one by one after performing a local maximization step which takes all possible action combinations of an agent’s neighbors into account.

The algorithm operates as follows. One agent is selected for elimination, and it collects all payoff functions from its neighbors. Next, this agent optimizes its

PSfrag replacements

A 1 A 1

A 2 A 3 A 2 A 3

A 4 A 4

A 1 〈a 1 ∧ a 3 ∧ x : 4〉〈a 1 ∧ a 2 ∧ x : 5〉 A 2 〈a 2 ∧ x : 2〉 A 3 〈a 3 ∧ a 2 ∧ x : 5〉 A 4 〈a 3 ∧ a 4 ∧ x : 10〉

〈a 1 ∧ a 3 : 4〉〈a 1 ∧ a 2 : 5〉〈a 2 : 2〉〈a 3 ∧ a 2 : 5〉

Fig. 3. Initial coordination graph (left) and graph after conditioning on the context x = true (right).

factorized representation of the complete payoff matrices since they are only specified for a context with non-zero payoff.

More formally, let A 1 ,... , An be a group of agents, where each agent Aj has to choose an action aj ∈ Aj resulting in a joint action a ∈ A = A 1 × ... × An and let X be a set of discrete state variables. The context c is then an element from the set of all possible combinations of the state and action variables, c ∈ C ⊆ X ∪ A. A value rule 〈 p ; c : v〉 ∈ P is a function p : C → IR such that p(x, a) = v when c = (x, a) is consistent with the current context and 0 otherwise. For a particular situation only those value rules contribute to the global payoff R(a) that are consistent with the current context: R(a) = ∑m i=1 pi(x, a) where^ m^ is the total number of value rules and^ x^ and^ a^ are respectively a state and joint action.

As an example, consider the case where two persons have to coordinate their actions to pass through a narrow door. We describe this situation using the following value rule:

〈p 1 ; in-front-of-same-door(1, 2) ∧ a 1 = passThroughDoor ∧ a 2 = passThroughDoor : − 50 〉

This rule indicates that when the two agents are located in front of the same door and both select the same action (passing through the door), the global payoff value will be reduced by 50. When the state is not consistent with the above rule (and the agents are not located in front of the same door), the rule does not apply and the agents do not have to coordinate their actions. By conditioning on the current state the agents discard all irrelevant rules, and as a result the CG is dynamically updated and simplified. Note that for the conditioning step each agent only needs to observe that part of the state mentioned in its own value rules.

For a more extensive example, see Fig. 3. Below the left graph all value rules, defined over binary action and context variables 2 , are depicted together with the agent the rule applies to. The coordination dependencies between the agents are represented by directed edges, where each (child) agent has an incoming edge from the (parent) agent that affects its decision. After the agents observe the current state, x = true, they condition on the context. The rule of A 4 does not apply and is removed. As a consequence, the optimal joint action is independent of the action of A 4 and the edge to A 4 is deleted from the graph as shown in the right graph of Fig. 3. In this case, A 4 can thus select either action without affecting the global reward R(a).

After the agents have conditioned on the state variables, the agents are one by one eliminated from the graph. Let us assume that we first eliminate A 3 in the above example. Agent 3 first collects all rules from its children in which it is involved and then maximizes over the rules 〈a 1 ∧ a 3 : 4〉〈a 3 ∧ a 2 : 5〉. For all possible actions of A 1 and A 2 , A 3 determines its best-response and then distributes the corresponding conditional strategy, in this case equal to 〈a 2 : 5〉〈a 1 ∧ a 2 : 4〉, to its parent A 2. Now a new directed edge from A 1 to A 2 is generated, since A 2 receives a rule containing an action of A 1. After this step, A 3 has no children in the coordination graph anymore and is eliminated. The procedure continues and after A 2 has distributed its conditional strategy 〈a 1 : 11〉〈a 1 : 5〉 to A 1 , it is also eliminated. Finally, A 1 is the last agent left and fixes its action to a 1. Now a second pass in the reverse order is performed, in which each agent distributes its strategy to its parents, who then determine their final strategy. This results in the optimal joint action {a 1 , a 2 , a 3 , a 4 } and a global payoff of 11. Note that {a 1 , a 2 , a 3 , a 4 } is also an optimal joint action. It depends on the individual action choice of A 4 which joint action is selected.

The outcome of the variable elimination algorithm is independent of the elim- ination order and the initial distribution of the rules and will always result in an optimal joint action (3). However, the execution time of the algorithm does depend on the elimination order. In the table-based approach the cost of the algorithm is linear in the number of new functions introduced (3). In the rule-based approach the cost is polynomial in the number of new rules generated in the maximization operation (4). This number is never larger and often exponentially smaller than the complexity of the table-based approach 3. Computing the optimal order for minimizing the mentioned runtime costs is known to be NP-complete (11), but good heuristics exists, e.g., minimum de- ficiency search which first eliminates the agents with the minimum difference between incoming and outgoing edges (12; 13).

(^2) The action a 1 corresponds to a 1 = true and the action a 1 to a 1 = false. (^3) Do note that the rule-based approach involves an extra cost regarding the man-

agement of the sets of rules, causing its advantage to manifest primarily in problems with a fair amount of context specific dependencies (4).

current (continuous) state, (ii) it should be sparse in order to keep the depen- dencies and the associated local coordination problems as simple as possible, (iii) it should be applicable in situations where communication is unavailable or very expensive.

In the remainder of this section, we will concentrate on the two main features of our proposed method, designed to fulfill the requirements mentioned above. The first is the assignment of roles to the agents in order to apply coordination graphs to continuous domains and to reduce the action sets of the different agents; the second is to predict the chosen action of the other agents, rendering communication superfluous.

4.1 Context-specificity using roles

Conditioning on a context that is defined over a continuous domain is difficult in the original rule-based CG representation. A way to ‘discretize’ the context is by assigning roles to agents (5; 16; 17; 1). Roles are a natural way of intro- ducing domain prior knowledge to a multiagent problem and provide a flexible solution to the problem of distributing the global task of a team among its members. In the soccer domain for instance one can easily identify several roles ranging from ‘active’ or ‘passive’ depending on whether an agent is in control of the ball or not, to more specialized ones like ‘striker’, ‘defender’, ‘goalkeeper’, etc.

In (18) a role is defined as an abstract specification of the set of activities an individual or subteam undertakes in service of the team’s overall activity. In our framework a role m ∈ M defines this set of activities as a set of value rules Pm. In the original rule-based CG each agent has only one set of value rules, which then would have to include all rules for all roles. Now, based on the given role assignment only a subset of all value rules applies which simplifies the coordination graph. Furthermore, the value rules in Pm pose additional constraints on the role of other agents contained in the value rules, reducing the edges in the coordination graph even further. For instance, agent i in role ‘goalkeeper’ who controls the ball and considers to pass the ball to any of the agents j in role ‘defender’ could result in the following value rule:

〈pgoalkeeper 1 ; has-role-defender(j) ∧ ai = passTo(j) : 10 〉, ∀j 6 = i.

An assignment of roles to agents provides a natural way to parametrize a coordination structure over a continuous domain. The intuition is that, instead of directly coordinating the agents in a particular situation, we assign roles to the agents based on this situation and subsequently try to ‘coordinate’ the set of roles. The other roles {m′} ∈ M \ m mentioned in the set of value rules

Pm for role m define a coordination subgraph structure on M. As such, the assigned roles induce a coordination graph between all agents, each executing a certain role.

A question which remains is how roles are assigned to agents. In this section we describe the communication based case, which we can exploit to use a distributed role assignment algorithm (5; 16; 1). In the next section, we will concentrate on the situation in which communication is unavailable.

The role assignment algorithm, which is common knowledge among the agents, defines a sequence M ′^ of roles where |M ′| ≥ n which represents a preference ordering over the roles: the most ‘important’ role is assigned to an agent first, followed by the second most important role, etc. By construction, the same role can be assigned to more than one agent, but each agent is assigned only a single role. Each role m has an associated potential rim which is a real-valued estimate of how appropriate agent i is for the role m in the current world state. These potentials rim depend on features of the state space relevant for role m as observed by agent i. For example, relevant features for role ‘striker’ could be the time needed to intercept the ball or the global position on the field. Each agent computes its potential for each m ∈ M ′^ and sends these to the other agents. Now the first m ∈ M ′^ is assigned to the agent that has the highest potential for that role. This agent is no longer under consideration, the next m is assigned to another agent, and so on, until all agents have been assigned a role. This algorithm requires sending O(|M |n) messages, as each agent has to send each other agent its potential rim for all m ∈ M ′.

The roles can be regarded as an abstraction of a continuous state to a discrete context, allowing the application of existing techniques for discrete-state CGs. Furthermore, roles can reduce the action space of the agents by ‘locking out’ specific actions. For example, the role of the goalkeeper does not include the action ‘score’, and in a ‘passive’ role the action ‘shoot’ is deactivated. Such a reduction of the action space can offer computational savings, but more impor- tantly it can facilitate the solution of a local coordination game by restricting the joint action space to a subspace that contains less Nash equilibria.

4.2 Non-communicating agents

For the role assignment each agent has to communicate its potential for a certain role to all other agents. Furthermore, the variable elimination requires that each agent receives the payoff functions of its neighboring agents, and after computing its optimal conditional strategy communicates a new payoff function back to its neighbors. Similarly, in the reverse process each agent needs to communicate its decision to its neighbors in order to reach a coordi-

In the noncommunicative case the elimination order neither has to be fixed in advance nor has to be common knowledge among all agents as in (3), but each agent is free to choose any elimination order, for example, one that allows the agent to quickly compute its own optimal action. This is possible because a particular elimination order affects only the speed of the algorithm and not the computed joint action as described earlier.

In terms of complexity, the computational costs for each individual agent are clearly increased to compensate for the unavailable communication. Instead of only optimizing for its own action, in the worst case each agent has to calculate the action of every other agent in the subgraph. The computational cost per agent increases thus linearly with the number of new payoff func- tions generated during the elimination procedure. Communication, however, is not used anymore which allows for a speedup since these extra individual computations may now run in parallel. This is in contrast to the original CG approach where computations need to be performed sequentially.

In summary, we can apply the CG framework without using communication when all agents are able to run the same algorithm in parallel. For this, we have to make the following assumptions:

the payoff functions of an agent i are common knowledge among all agents reachable from i,
each agent i can compute the potential rim for all agents i in its subgraph,
the action ordering is common knowledge among all agents,
for the context-specific CG all agents reachable from i also observe the state variables located in the value rules of agent i.

Finally, we note that the common knowledge assumption is strong and even in cases where communication is available it cannot always be guaranteed (19). In multiagent systems without communication common knowledge can be guar- anteed if all agents consistently observe the same world state, but this is also violated in practice due to partial observability of the environment (a soccer player has a limited field of view). In our case, when the agents have to agree on a particular role distribution given a particular context, the only requirement we impose is that the role assignment in a particular local context is based on those parts of the state that are, to a good approximation, fully observable by all agents involved in the role assignment. For example, in a soccer game the particular role assignment may require that in a group of coordinating agents observe the position of each other in the field, as well as the positions of their nearby opponents, and have a rough estimate of the position of the ball (e.g., knowing that the ball is far away). As long as such a context is encountered, a local graph is formed which is disconnected from the rest of the CG and can be solved separately.

5 Experiments

We have applied the aforementioned framework in our simulation robot soc- cer team UvA Trilearn (20). The main motivation was to improve upon the coordination during ball passes between teammates. First of all, we have con- ducted an experiment in which a complete team strategy was specified using value rules and each player selected its action based on the result of the vari- able elimination algorithm. In this case we made the world fully observable for all agents and used no communication between the agents. Furthermore, we incorporated this framework in our competition team, which participated in the RoboCup-2003 World Championships. For this we made some necessary modifications since during competition the world is only partially observable. Both approaches will be explained next in more detail.

5.1 Fully observable, non-communicating team

In this section we will explain how we have constructed a complete team strategy using the value rules from the CG framework. We assume that the world is fully observable 5 such that each agent can model the complete CG algorithm by itself as explained in Section 4.2. This is necessary since the RoboCup soccer simulation does not allow agents to communicate with more than one agent at the same time, which makes it impossible to apply the original variable elimination algorithm. This has no effect on the outcome of the algorithm. Furthermore, we used the synchronization mode to ensure that the simulator only proceeds to the next cycle when all the actions of the players are received. This to make sure no action opportunities are lost because of the computation time of the algorithm.

In order to accomplish coordination, all agents first perform the role assign- ment which maps each agent to one of the following ordered sequence of roles

M ′^ = {active, receiver, receiver, passive, passive, passive, passive, passive, passive, passive}.

There are four possible roles. The first and most important role is that of the active player which performs an action with the ball. We distinguish between two different types of roles, interceptor and passer, depending whether the ball can be kicked by the active player or not. Next, we assign the two roles

(^5) This is a configuration setting in the soccer server.

Fig. 5. A situation involving one passer and two possible receivers. The other agents are passive.

passT o(i, dir): pass the ball to a position with a fixed distance from agent i in the direction dir ∈ D = {center, n, nw, w, sw, s, se, e, ne}. The direction parameter specifies a direction relative to the receiving agent. ‘North’ is always directed toward the opponent goal and ‘center’ corresponds to a pass directly to the current agent position,
moveT o(dir): move in the direction dir ∈ D,
dribble(dir): move with the ball in direction dir ∈ D,
score: shoot to the best spot in the opponent goal (22),
clearBall: shoot the ball hard between the opponent defenders to the op- ponent side,
moveT oStratP os: move to the agent’s strategic position based on its home position and the position of the ball which serves as an attraction point.

All mentioned actions are available in the released parts of the source code of our UvA Trilearn 2003 team 7. A more detailed description of how these actions are transformed into primary actions is given in (20).

We also defined (boolean) state variables that extract important (high-level) information from the world state:

is-pass-blocked(i, j, dir) indicates whether a pass from agent i to agent j is blocked by an opponent or not. The actual position to which is passed is the position at a small fixed distance from agent j in direction dir. A pass is blocked when there is at least one opponent located within a cone from the passing player to this position.
is-empty-space(i, dir), indicates that there are no opponents within a small circle in the specified direction dir of agent i.
is-in-front-of-goal(i) indicates whether agent i is located in front of the op- ponent goal.

(^7) This fully documented source code release is freely available from our website:

http://www.science.uva.nl/~jellekok/robocup/.

Using these action and state variables, we can define the complete strategy of our team by means of value rules which specify the contribution to the global payoff in a specific context. The value rules are specified for each player i and make use of the above defined actions and state variables 8.

〈pinterc. 1 ; intercept : 10〉〈ppasser 2 ; has-role-receiver(j) ∧ ¬isPassBlocked(i, j, dir) ∧ ai = passTo(j, dir) ∧ aj = moveTo(dir) : u(j, dir) ∈ [5, 7]〉 ∀j 6 = i 〈ppasser 3 ; is-empty-space(i, n) ∧ ai = dribble(n) : 2〉〈ppasser 4 ; ai = clearBall : 0. 1 〉〈ppasser 5 ; is-in-front-of-goal(i) ∧ is-ball-kickable(i) ∧ ai = score : 10〉〈preceiver 6 ; has-role-interceptor(j) ∧ ¬isPassBlocked(j, i, dir) ∧ aj = intercept ∧ ai = moveTo(dir) : u(i, dir) ∈ [5, 7]〉 ∀j 6 = i 〈preceiver 7 ; has-role-receiver(k) ∧ ¬isPassBlocked(k, i, dir) ∧ aj = passTo(k, dir2) ∧ ak = moveTo(dir2) ∧ ai = moveTo(dir) : u(i, dir) ∈ [5, 7]〉 ∀j, k 6 = i 〈preceiver 8 ; moveToStratPos : 1〉〈ppassive 9 ; moveToStratPos : 1〉

The first five rules are related to the action options for the active player. The first rule, p 1 , indicates that intercepting the ball is the only option when per- forming the interceptor role. As a passer, there are several alternatives. Value rule p 2 represents an active pass to the relative direction dir of player j which can be performed when there are no opponents along that trajectory and the receiving agent will move in that direction to intercept the coming pass. The value that is contributed to the global payoff is returned by u(j, dir) and de- pends on the position where the receiving agent j will receive the pass (the

(^8) Note that we enumerate all rules using variables. The complete list of value rules is

the combination of all possible instantiations of these variables. In all rules, dir ∈ D.

1 ^ ^2

Fig. 6. The coordination graph at Fig. 3 after conditioning on the state variables. The passer (agent 1) decides to pass the ball to the first receiver (agent 2), while the second receiver (agent 3) moves to a good position for the first receiver to pass the ball to.

inated from the graph by maximizing its local payoff. In the case that agent 1 is eliminated first, it gathers all value rules that contain a 1 , maximizes its local payoff and distributes its conditional strategy consisting of the generated value rules

〈ppasser 10 ; a 2 = moveTo(s) ∧ a 3 = moveTo(nw) : 11〉〈ppasser 11 ; a 2 = moveTo(s) ∧ a 3 = ¬moveTo(nw) : 6〉〈ppasser 12 ; a 2 = ¬moveTo(s) : 2〉

to its parents. Note that ppasser 10 is formed by combining ppasser 2 and preceiver 7 when both agent 2 and 3 fulfill the listed actions. When agent 3 performs a different action, the payoff is still 6 when agent 2 moves south as is stated in ppasser 11. When agent 2 also performs a different action, the only remaining action is the dribble with a payoff of 2. After agent 2 and 3 have also fixed their strategy, agent 1 will perform passTo(2, s), agent 2 will execute moveTo(s) to intercept the pass and agent 3 will perform moveTo(nw) to intercept a possible future pass of agent 2. During a match, this procedure is executed after each update of the state and the agents will change their action based on the new information. If in this example for some unpredicted reason the first pass fails, the graph will automatically be updated and correspond to the new situation.

To test this approach, we played games against ourselves, with one team using explicit coordination and the other team without using any coordination dur- ing passing. The latter case was modeled by deleting the rules p 6 , p 7 from the list of value rules and removing the condition aj = moveTo(dir) from rule p 2 to indicate that it is not necessary for the receiver to anticipate the pass. Now, in the non-coordinating case a teammate moves to the interception point only after he has observed a change in the ball velocity (after someone has passed the ball) and concludes that it is the fastest teammate to the ball. Before the

100

of goalsball possession %

successful passing %# of shots on goalball each side %

(a) Coordinating team against bench- mark team.

100

of goalsball possession %

successful passing %# of shots on goalball each side %

(b) Non-coordinating team against benchmark team.

100

of goalsball possession %

successful passing %# of shots on goalball each side %

100

of goalsball possession %

successful passing %# of shots on goalball each side %

(d) Coordinating team against non- coordinating team.

Fig. 7. Mean and standard deviation of several statistics for the three tested teams. All results are averaged over 10 matches.

ball changes velocity, it has no notion of the fact that it will receive the ball and does not coordinate with the passing player. Furthermore, we also played matches against a benchmark version of our team in which the strategy was identical to our basic client implementation 9. In this team the active player would intercept the ball and immediately kick it with maximal velocity to a random corner of the opponent goal. The implementation of the kick and the intercept are identical to the two other teams, with the result that the three teams only differ with respect to their (high-level) strategy.

Since many different factors contribute to the overall performance of the team, it is difficult to measure the actual effect of the coordination with one single value. Therefore, we have concentrated on multiple statistics generated by the

(^9) This is the same behavior as the released UvA Trilearn 2003 source code. Quali-

fication for RoboCup-2003 was based on the performance of the teams against the 2002 version of this release.

Non-communicative Multi-Robot Coordination in Dynamic Environments | INFSCI 3350, Papers of Information Technology

Related documents

Partial preview of the text

Download Non-communicative Multi-Robot Coordination in Dynamic Environments | INFSCI 3350 and more Papers Information Technology in PDF only on Docsity!

Non-communicative multi-robot coordination

in dynamic environments

Jelle R. Kok, Matthijs T. J. Spaan, Nikos Vlassis

A 1

A 2 A 3

A 4

A 1 A 1

A 2 A 3 A 2 A 3

A 4 A 4

of goalsball possession %

of goalsball possession %

of goalsball possession %

of goalsball possession %