Towards Optimization-Based Multi-Agent

Collision-Avoidance under Continuous Stochastic Dynamics

Jan-Peter Calliess, Michael A. Osborne, Stephen Roberts

Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK

f

jan,mosb,sjrob

g

@robots.ox.ac.uk

1 Introduction

In our ongoing work, we aim to control a team of agents so

as to achieve a prescribed goal state while being confident

that collisions with other agents are avoided. Each agent is

associated with a feedback controlled plant, whose continu-

ous state trajectories follow some stochastic differential dy-

namics. To this end we describe a collision-detection module

based on a distribution-independent probabilistic bound and

employ a fixed priority method to resolve collisions. Due

to their practical importance, multi-agent collision avoid-

ance and control have been extensively studied across differ-

ent communities including AI, robotics and control. How-

ever, these works typically assume linear and discrete dy-

namic models; by contrast, our work intends to overcome

these limitations and to present solutions for continuous

state space. While our current experiments were conducted

with linear stochastic differential equation (SDE) models

with state-independent noise (yielding Gaussian processes)

we believe that our approach could also be applicable to non-

Gaussian cases with state-dependent uncertainties.

2 Method

Model.

We assume the system contains a set

A

of agents

indexed by

a

2 f

1

;:::;

j

A

jg

. Each agent

a

’s associated

plant has a probabilistic state trajectory following a stochas-

tic controlled

D

-dimensional state dynamics (we consider

the case

D

= 2

) in the continuous interval of (future) time

I

= (

t

0

;t

f

]

. For our method to work all we need to require

is that the trajectory’s mean function

m

:

I

!

R

D

and co-

variance matrix function

:

I

!

R

D

D

are evaluable for

all times

t

2

I

. A prominent class for which closed-form

moments can be easily derived are linear Ito-SDEs. For in-

stance, we consider the SDE

dx

a

(

t

) =

K

x

a

(

t

)

a

(

t

)

dt

+

BdW

(1)

where

K;B

2

R

D

D

are matrices

x

a

:

I

!

R

D

is the state

trajectory and

W

is a vector-valued Wiener process. Here,

K

(

x

a

a

)

will be interpreted as the control output of a

linear feedback-controller regulating the state to converge to

a desired signal

a

(

t

) =

a

0

f

0

g

(

t

) +

P

H

a

i

=1

a

i

a

i

(

t

)

where

Copyright

c

2012, Association for the Advancement of Artificial

Intelligence (www.aaai.org). All rights reserved.

i

:

R

!f

0

;

1

g

denotes the indicator function of the half-

open interval

a

i

= (

t

a

i

1

;t

a

i

]

[0

;T

a

]

and each

a

i

2

R

D

is a

setpoint

. If

K >

0

the agent’s state trajectory is deter-

mined by setpoint sequence

p

a

= (

t

a

i

;

a

i

)

H

a

i

=0

(aside from the

random disturbances) which we will refer to as the agent’s

plan

. For example, plan

p

a

:=

(

t

0

;x

a

0

)

;

(

t

f

;x

a

f

)

could be

used to regulate agent

a

’s

start state

x

a

0

to a given

goal state

x

a

f

between times

t

0

and

t

f

. For simplicity, we assume the

agents are always initialized with plans of this form before

coordination commences.

Task.

Each agent

a

desires to find a sequence of setpoints

(

p

a

)

such that (i) it moves from its start state

x

a

0

to its goal

state

x

a

f

along a low-cost trajectory and (ii) such that along

the trajectory its plant (with diameter

) does not collide

with any other agents’ plant in state space with at least a

given probability

1

2

(0

;

1)

.

Approach.

As the focus of our work is on collision de-

tection and avoidance in continuous domains, we opt for a

simple

fixed-priority

(FP) coordination scheme. While the

rigidity of the approach typically leads to sub-optimality in

terms of a social cost function, the simplicity and computa-

tional tractability of the method may be the reason why it

has been deployed most widely in robotics (e.g. (Bennewitz,

Burgard, and Thrun 2001)).

Specifically, each agent has a unique ranking (or prior-

ity) according to its index

a

(i.e. agent 1 has highest priority,

agent

A

lowest). When all higher-priority agents are done

planning, agent

a

is informed of their planned trajectories

which it has to avoid with a probability greater than

1

.

To this end, the two main mechanisms to develop are (I)

an algorithm capable of detecting collisions on a

continuous

time interval

I

with a sufficiently high probability and (II)

to use this information to augment the existing plan by new

setpoints in order to avoid the collision with the given con-

fidence bound

1

. Next, we briefly sketch the underlying

ideas of our approach to address these two problems.

(I) Based on a generalized Chebychev-bound (Whittle

1958), (Lyons, Calliess, and Hanebeck 2012) derived

D

distribution-independent constraints of the form

C

j

(

t

)

0

(

j

= 1

;:::;D

) depending on the components of the means

and the covariance matrix functions of two stochastic trajec-

tories, as well as on

. For any given time

t

, it was shown

that satisfaction of at least one of the constraints is a suffi-

cient condition for collision avoidance (with prob.

>

1

).

Table 1: Left to right: Uncoordinated Agents’ plans (l), after conflict resolution with methods

WAIT

(c) and

FREE

(r).

For conservative collision detection between two agents’ tra-

jectories

x

a

;x

r

, we propose to construct a (continuous)

cri-

terion function

a

;

r

(

t

) := max

j

f

C

j

(

t

)

g

. By construction,

a collision between the trajectories with probability above

can be ruled-out if

a

;

r

attains only positive values. For col-

lision detection at time

t

all an agent needs to do is to min-

imize the continuous function

a

(

t

) := min

r

6

=

a

f

a

;

r

(

t

)

g

and (conservatively) assume a collision unless

a

(

t

)

>

0

.

(II) As long as agent

a

detects a collision at some time

t

2

I

we will invoke a conflict resolution method. As a

first method we could insert a time-setpoint pair

(

t;x

a

0

)

into

the previous plan

p

a

. Since this aims to cause the agent to

wait at its start location

x

a

0

we will call the method

WAIT

.

It is possible that multiple such insertions are necessary

until collisions are avoided. Of course, if a higher-priority

agent decides to traverse through

x

a

0

, this method is too

rigid to resolve a conflict. Alternatively, we could optimize

for the time and location of the new setpoint. Let

p

a

"

(

t;s

)

denote the plan updated by insertion of time-setpoint pair

(

t;s

)

2

I

R

D

. We propose to choose the candidate setpoint

(

t;s

)

that minimizes a function being a weighted sum of the

expected cost entailed by executing updated plan

p

a

"

(

t;s

)

and

a collision penalty

min

t

a

(

t

)

H

(

min

t

a

(

t

))

. Here,

is a large number,

H

is the Heaviside function and

a

is computed based on the assumption we were to execute

p

a

"

(

t;s

)

. Since the new setpoint can be chosen freely in time

and state-space we refer to the method as

FREE

.

3 Simulations

As a first test, we simulated a simple three-agent scenario.

Each agent’s dynamics were instantiations of an SDE of the

form of Eq. 1. The initial plans of the agents are depicted in

Tab. 1 (left). The curves represent the predicted trajectories

of Agents 1 (red), 2 (blue) and 3 (green) for the given initial

plans and are depicted for a time interval of

I

= [0

s;

2

s

]

.

The uncertainties are indicated by covariance ellipses plot-

ted at discrete points in time every .13 s. Drawing 1000 sam-

ple paths from each of the SDEs with the Euler-Maruyama

method we recorded a collision in 56% of the draws from the

SDEs under the initial plan. The collision source is mainly

the predicted collision found at the intersection between the

first two trajectories around time

:

13

s

. The altered plans due

to invoking collision avoidance method

WAIT

are shown in

the centre plot. Here, Agent 2 waited at its start location (5,0)

for about .26s to let the first agent pass by until resuming to-

wards its goal thereby avoiding the collision with the higher

priority agent 1. Simulation of the resulting plans yielded

collisions in 5% of the draws – mainly due to conflicts be-

tween 2 and 3 which could not be resolved by 3 waiting at

its start location. Finally, the

FREE

method generated plans

that caused agent 2 to circumvent 1 by arcing to the left. This

brought it closer to the 3 which in turn, decided to leave for

a while (until 2 had passed by) before returning to its goal.

None of the draws from the simulated trajectories collided.

In contrast, the

WAIT

method was computed faster (.41s vs

20s, on a standard Laptop running MATLAB) but could not

prevent all collisions.

4 Ongoing Work

Our current work focuses on the following questions: How

can we quantify the probabilities that all required minimiza-

tions succeed given a fixed computational budget? How can

we bring to bear more advanced coordination mechanisms,

such as auctions, in order to improve the social cost of the

solutions? In addition we intend to assess our method in

systems with non-linear dynamics and multiplicative noise.

We anticipate being able to speed up our method signifi-

cantly by means of improved implementation, hardware and

parallelization. Finally, we are investigating the impact of

tighter bounds for collision detection on social cost in set-

tings where such bounds are available.

Acknowledgements.

The authors are grateful for funds via the UK EPSRC “OR-

CHID” project EP/I011587/1.

References

Bennewitz, M.; Burgard, W.; and Thrun, S. 2001. Exploit-

ing constraints during prioritized path planning for teams of

mobile robots. In

IROS

.

Lyons, D.; Calliess, J.; and Hanebeck, U. 2012. Chance con-

strained model predictive control for multi-agent systems

with coupling constraints. In

American Control Conference

(ACC)

.

Whittle, P. 1958. A multivariate generalization of

Tchebichev’s inequality.

Quarterly Journal of Mathem.