of 2
Current View
Towards Optimization-Based Multi-Agent
Collision-Avoidance under Continuous Stochastic Dynamics
Jan-Peter Calliess, Michael A. Osborne, Stephen Roberts
Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
f
jan,mosb,sjrob
g
@robots.ox.ac.uk
1 Introduction
In our ongoing work, we aim to control a team of agents so
as to achieve a prescribed goal state while being confident
that collisions with other agents are avoided. Each agent is
associated with a feedback controlled plant, whose continu-
ous state trajectories follow some stochastic differential dy-
namics. To this end we describe a collision-detection module
based on a distribution-independent probabilistic bound and
employ a fixed priority method to resolve collisions. Due
to their practical importance, multi-agent collision avoid-
ance and control have been extensively studied across differ-
ent communities including AI, robotics and control. How-
ever, these works typically assume linear and discrete dy-
namic models; by contrast, our work intends to overcome
these limitations and to present solutions for continuous
state space. While our current experiments were conducted
with linear stochastic differential equation (SDE) models
with state-independent noise (yielding Gaussian processes)
we believe that our approach could also be applicable to non-
Gaussian cases with state-dependent uncertainties.
2 Method
Model.
We assume the system contains a set
A
of agents
indexed by
a
2 f
1
;:::;
j
A
jg
. Each agent
a
’s associated
plant has a probabilistic state trajectory following a stochas-
tic controlled
D
-dimensional state dynamics (we consider
the case
D
= 2
) in the continuous interval of (future) time
I
= (
t
0
;t
f
]
. For our method to work all we need to require
is that the trajectory’s mean function
m
:
I
!
R
D
and co-
variance matrix function
 :
I
!
R
D
D
are evaluable for
all times
t
2
I
. A prominent class for which closed-form
moments can be easily derived are linear Ito-SDEs. For in-
stance, we consider the SDE
dx
a
(
t
) =
K
x
a
(
t
)
a
(
t
)
dt
+
BdW
(1)
where
K;B
2
R
D
D
are matrices
x
a
:
I
!
R
D
is the state
trajectory and
W
is a vector-valued Wiener process. Here,
K
(
x
a
a
)
will be interpreted as the control output of a
linear feedback-controller regulating the state to converge to
a desired signal
a
(
t
) =
a
0
f
0
g
(
t
) +
P
H
a
i
=1
a
i
a
i
(
t
)
where
Copyright
c
2012, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
i
:
R
!f
0
;
1
g
denotes the indicator function of the half-
open interval
a
i
= (
t
a
i
1
;t
a
i
]
[0
;T
a
]
and each
a
i
2
R
D
is a
setpoint
. If
K >
0
the agent’s state trajectory is deter-
mined by setpoint sequence
p
a
= (
t
a
i
;
a
i
)
H
a
i
=0
(aside from the
random disturbances) which we will refer to as the agent’s
plan
. For example, plan
p
a
:=
(
t
0
;x
a
0
)
;
(
t
f
;x
a
f
)
could be
used to regulate agent
a
’s
start state
x
a
0
to a given
goal state
x
a
f
between times
t
0
and
t
f
. For simplicity, we assume the
agents are always initialized with plans of this form before
coordination commences.
Task.
Each agent
a
desires to find a sequence of setpoints
(
p
a
)
such that (i) it moves from its start state
x
a
0
to its goal
state
x
a
f
along a low-cost trajectory and (ii) such that along
the trajectory its plant (with diameter
) does not collide
with any other agents’ plant in state space with at least a
given probability
1
2
(0
;
1)
.
Approach.
As the focus of our work is on collision de-
tection and avoidance in continuous domains, we opt for a
simple
fixed-priority
(FP) coordination scheme. While the
rigidity of the approach typically leads to sub-optimality in
terms of a social cost function, the simplicity and computa-
tional tractability of the method may be the reason why it
has been deployed most widely in robotics (e.g. (Bennewitz,
Burgard, and Thrun 2001)).
Specifically, each agent has a unique ranking (or prior-
ity) according to its index
a
(i.e. agent 1 has highest priority,
agent
A
lowest). When all higher-priority agents are done
planning, agent
a
is informed of their planned trajectories
which it has to avoid with a probability greater than
1
.
To this end, the two main mechanisms to develop are (I)
an algorithm capable of detecting collisions on a
continuous
time interval
I
with a sufficiently high probability and (II)
to use this information to augment the existing plan by new
setpoints in order to avoid the collision with the given con-
fidence bound
1
. Next, we briefly sketch the underlying
ideas of our approach to address these two problems.
(I) Based on a generalized Chebychev-bound (Whittle
1958), (Lyons, Calliess, and Hanebeck 2012) derived
D
distribution-independent constraints of the form
C
j
(
t
)
0
(
j
= 1
;:::;D
) depending on the components of the means
and the covariance matrix functions of two stochastic trajec-
tories, as well as on
. For any given time
t
, it was shown
that satisfaction of at least one of the constraints is a suffi-
cient condition for collision avoidance (with prob.
>
1
).
























Table 1: Left to right: Uncoordinated Agents’ plans (l), after conflict resolution with methods
WAIT
(c) and
FREE
(r).
For conservative collision detection between two agents’ tra-
jectories
x
a
;x
r
, we propose to construct a (continuous)
cri-
terion function
a
;
r
(
t
) := max
j
f
C
j
(
t
)
g
. By construction,
a collision between the trajectories with probability above
can be ruled-out if
a
;
r
attains only positive values. For col-
lision detection at time
t
all an agent needs to do is to min-
imize the continuous function
a
(
t
) := min
r
6
=
a
f
a
;
r
(
t
)
g
and (conservatively) assume a collision unless
a
(
t
)
>
0
.
(II) As long as agent
a
detects a collision at some time
t
2
I
we will invoke a conflict resolution method. As a
first method we could insert a time-setpoint pair
(
t;x
a
0
)
into
the previous plan
p
a
. Since this aims to cause the agent to
wait at its start location
x
a
0
we will call the method
WAIT
.
It is possible that multiple such insertions are necessary
until collisions are avoided. Of course, if a higher-priority
agent decides to traverse through
x
a
0
, this method is too
rigid to resolve a conflict. Alternatively, we could optimize
for the time and location of the new setpoint. Let
p
a
"
(
t;s
)
denote the plan updated by insertion of time-setpoint pair
(
t;s
)
2
I
R
D
. We propose to choose the candidate setpoint
(
t;s
)
that minimizes a function being a weighted sum of the
expected cost entailed by executing updated plan
p
a
"
(
t;s
)
and
a collision penalty
min
t
a
(
t
)
H
(
min
t
a
(
t
))
. Here,
is a large number,
H
is the Heaviside function and
a
is computed based on the assumption we were to execute
p
a
"
(
t;s
)
. Since the new setpoint can be chosen freely in time
and state-space we refer to the method as
FREE
.
3 Simulations
As a first test, we simulated a simple three-agent scenario.
Each agent’s dynamics were instantiations of an SDE of the
form of Eq. 1. The initial plans of the agents are depicted in
Tab. 1 (left). The curves represent the predicted trajectories
of Agents 1 (red), 2 (blue) and 3 (green) for the given initial
plans and are depicted for a time interval of
I
= [0
s;
2
s
]
.
The uncertainties are indicated by covariance ellipses plot-
ted at discrete points in time every .13 s. Drawing 1000 sam-
ple paths from each of the SDEs with the Euler-Maruyama
method we recorded a collision in 56% of the draws from the
SDEs under the initial plan. The collision source is mainly
the predicted collision found at the intersection between the
first two trajectories around time
:
13
s
. The altered plans due
to invoking collision avoidance method
WAIT
are shown in
the centre plot. Here, Agent 2 waited at its start location (5,0)
for about .26s to let the first agent pass by until resuming to-
wards its goal thereby avoiding the collision with the higher
priority agent 1. Simulation of the resulting plans yielded
collisions in 5% of the draws – mainly due to conflicts be-
tween 2 and 3 which could not be resolved by 3 waiting at
its start location. Finally, the
FREE
method generated plans
that caused agent 2 to circumvent 1 by arcing to the left. This
brought it closer to the 3 which in turn, decided to leave for
a while (until 2 had passed by) before returning to its goal.
None of the draws from the simulated trajectories collided.
In contrast, the
WAIT
method was computed faster (.41s vs
20s, on a standard Laptop running MATLAB) but could not
prevent all collisions.
4 Ongoing Work
Our current work focuses on the following questions: How
can we quantify the probabilities that all required minimiza-
tions succeed given a fixed computational budget? How can
we bring to bear more advanced coordination mechanisms,
such as auctions, in order to improve the social cost of the
solutions? In addition we intend to assess our method in
systems with non-linear dynamics and multiplicative noise.
We anticipate being able to speed up our method signifi-
cantly by means of improved implementation, hardware and
parallelization. Finally, we are investigating the impact of
tighter bounds for collision detection on social cost in set-
tings where such bounds are available.
Acknowledgements.
The authors are grateful for funds via the UK EPSRC “OR-
CHID” project EP/I011587/1.
References
Bennewitz, M.; Burgard, W.; and Thrun, S. 2001. Exploit-
ing constraints during prioritized path planning for teams of
mobile robots. In
IROS
.
Lyons, D.; Calliess, J.; and Hanebeck, U. 2012. Chance con-
strained model predictive control for multi-agent systems
with coupling constraints. In
American Control Conference
(ACC)
.
Whittle, P. 1958. A multivariate generalization of
Tchebichev’s inequality.
Quarterly Journal of Mathem.