Efficient Budget Allocation with Accuracy Guarantees for
Crowdsourcing Classification Tasks
Long Tran-Thanh, Matteo Venanzi, Alex Rogers & Nicholas R. Jennings
University of Southampton
{ltt08r,mv1g10,acr,nrj}@ecs.soton.ac.uk
ABSTRACT
In this paper we address the problem of budget allocation for redundantly crowdsourcing a set of classification tasks, where a key challenge is to find a trade–off between the total cost and the accuracy of estimation. We propose CrowdBudget, an agent–based budget allocation algorithm that efficiently divides a given budget among different tasks in order to achieve low estimation error. In particular, we prove that
CrowdBudget can achieve at most max{0, K/2 − O(√B)} estimation error with high probability, where
K is the number of tasks and B is the budget size. This result significantly outperforms the current best theoretical guarantee from Karger et al. In addition, we demonstrate that our algorithm outperforms existing methods by up to 40% in experiments based on real–world data from a prominent database of crowdsourced classification responses.
Categories and Subject Descriptors
I.2.11 [Distributed Artificial Intelligence]: Intelligent Agents
General Terms
Algorithms, Theory, Experimentation
Keywords
Crowdsourcing, Task Allocation, Budget Limit, Regret Bounds
1. INTRODUCTION
Crowdsourcing classification tasks, such as classification of complex images [5] or identification of buildings on maps [3], has recently become widely used as it presents a low-cost and flexible approach to solve complex classification tasks by combining human computation with agent intelligence [2]. In particular, by dedicating tasks to a population of hired users (i.e. workers) for a small fee, the taskmaster (i.e. task requester) can collect a large set of class labels from the workers and ultimately estimate classification answers from the multiple reports. To achieve this, the taskmaster redundantly allocates tasks to users (in order to reduce uncertainty, as a single response might be unreliable), and then
aggregates the responses into a final estimate using a fusion method (e.g. majority voting [1] or IBCC [10]). Within many systems, such a process of task allocation and report fusion is not trivial and is typically done by a computer agent, as it might require complex computation that humans cannot provide [5, 6].¹
¹ Hence the combination of human and agent (i.e. computer) intelligence.
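To make the aggregation step concrete, the following is a minimal sketch (in Python, not taken from the paper) of the simplest such fusion method, majority voting: each task's answer is estimated as the label reported by the majority of the workers assigned to it. The task identifiers and the response format are illustrative assumptions only.

    from collections import Counter

    def majority_vote(responses):
        # responses: dict mapping task id -> list of binary labels (0/1)
        # returns:   dict mapping task id -> estimated answer
        estimates = {}
        for task, labels in responses.items():
            counts = Counter(labels)
            # take the most frequently reported label; ties are broken arbitrarily
            estimates[task] = counts.most_common(1)[0][0]
        return estimates

    # e.g. three workers label task 'k1', two label task 'k2'
    print(majority_vote({'k1': [1, 1, 0], 'k2': [0, 0]}))  # {'k1': 1, 'k2': 0}

More sophisticated fusion methods (e.g. IBCC) replace the simple count with a model of each worker's reliability, but the interface, redundant labels in and one estimate per task out, is the same.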
Now, a key challenge within these crowdsourcing systems
is to find an efficient trade–off between the cost of redun-
dant task allocation and the accuracy of the result. In more
detail, by assigning multiple users to a single task, we can
achieve higher accuracy of answer estimation, but we might
also suffer a higher cost due to hiring more users. To date,
research work has typically focused on applications where
the cost of task allocation is uniform for each task [2, 12,
5]. In this case, the search for the aforementioned trade–off is reduced to the problem of minimising the estimation error per task given the number of users assigned to a single task. However, in many real–world scenarios, completing different tasks might require different costs that depend on the difficulty and the time required for the user to complete such a task. For example, consider a habitat monitoring project where the goal is to accurately identify the living area of some rare species (e.g. the desert tortoise from the Mojave desert, US, or the New Forest cicada in the UK). To avoid the high cost of sending expeditions of professionals, one low–budget, but efficient, way to tackle this problem is to ask people from neighbouring areas to make observations and then send reports in return for a certain amount of payment. However, as the approachability of geographical areas might vary (e.g. due to possible dangers or landscape), we might need to pay different costs to motivate people to approach different areas in order to provide reports. Furthermore, this allocation scheme has to cope with the fact that the project is limited by its funds.
Another example comes from the human computation do-
main. Suppose a system designer aims to use a crowdsourc-
ing platform to execute complex workflows of jobs [13]. In so
doing, he/she decomposes the workflows into sets of micro–
tasks, and assigns these tasks to a population of users. Now,
a typical classification task within this domain is to deter-
mine whether a micro–task is completed, and the goal is to
maximise the total accuracy of such classifications. To do so, he/she incentivises users to participate in this classification phase and compensates their effort with a certain payment. Note that the classification tasks may vary in difficulty, as some are more time consuming, while many others are trivial. Thus, users might need different sizes of payment when they face tasks with different difficulty levels. In addition, the total payment is limited, as the system designer wants to keep the cost within a finite budget.
Within the aforementioned scenarios, and many others
besides, the main challenge is to redundantly allocate a set
of tasks with different costs to an open population of users,
with respect to a budget limit, such that the estimation error is minimised. In particular, if we allocate large budgets to cheaper tasks, whose answers can typically be accurately estimated with only a few responses, we might fail to achieve high total accuracy of estimation, as a large portion of these budgets could instead have been used for other tasks whose allocated budget is insufficiently low. In contrast, it might not be efficient either to allocate much of the budget to expensive tasks, as this will leave less available for the others. Since existing methods are typically not designed to make such trade–offs, they are unlikely to be efficient in our settings.
Moreover, these methods do not provide theoretical guarantees on the estimation error, which implies that they might perform well on average, but might perform arbitrarily poorly in many specific cases. A notable exception is the work of Karger et al. [6], which provides an e^(−O(n)) bound for the average estimation error, where n is the number of users assigned to a single task. However, this bound is only asymptotic (i.e. it holds when the number of tasks tends to infinity), so it is not practical for real–world systems, where the number of tasks is typically finite. Against this background, we focus on the
budget allocation problem, in which an agent (on behalf of the taskmaster) has to allocate budgets to a set of tasks and the total allocated budget cannot exceed a certain limit. The agent’s goal is then to find an optimal budget allocation that minimises the total estimation error of the answers. Within this paper, we propose CrowdBudget as a budget allocation strategy that efficiently tackles this problem. In particular, by combining CrowdBudget with a majority voting-efficient fusion method (i.e. a method that outperforms the majority voting rule, see Section 3), our agent can provably achieve a max{0, K/2 − O(√B)} upper bound on its total estimation error with high probability, where B is the budget size and K is the number of tasks. This bound is lower than the one provided by Karger et al. Moreover, in contrast to most current approaches, our algorithm allocates the budgets in advance through the analysis of costs and expected accuracy. This mode of operation is motivated by many crowdsourcing platforms in which the task requester needs to pre-set the number of assignments per task (e.g. Amazon Mechanical Turk or CrowdFlower). Given this, we advance the state–of–the–art as follows:
• We introduce the problem of budget allocation for crowdsourcing classification tasks, in which the goal is to minimise the error of the estimated answers for a finite number of tasks, with respect to a budget limit.
• We develop CrowdBudget, an algorithm that, combined with a fusion method, provably achieves an efficient bound on the estimation error, which significantly advances the best known results.
• By comparing the performance of CrowdBudget with existing algorithms through extensive numerical evaluations on real–world data taken from a prominent crowdsourcing system, we demonstrate that our algorithm typically outperforms the state–of–the–art by achieving up to 40% lower estimation error.
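As an illustration of the problem setting above (not of the CrowdBudget algorithm itself, which is detailed in Section 4), the following minimal Python sketch shows the quantities an allocation has to reconcile: tasks with task-dependent costs, a budget limit B, and a per-task number of assignments whose total cost may not exceed B. All names and the naive round-robin rule are illustrative assumptions only.

    # Illustrative only: a naive round-robin allocation that respects a budget limit.
    # costs[k] is the (possibly different) cost of obtaining one response for task k.

    def allocate_round_robin(costs, budget):
        """Repeatedly sweep the tasks (cheapest first), adding one response to each
        task that still fits in the remaining budget; stop when nothing fits."""
        allocation = {k: 0 for k in costs}
        remaining = budget
        while True:
            progressed = False
            for k in sorted(costs, key=costs.get):
                if costs[k] <= remaining:
                    allocation[k] += 1
                    remaining -= costs[k]
                    progressed = True
            if not progressed:
                break
        return allocation

    # e.g. three tasks with different costs and a budget of 10 units
    print(allocate_round_robin({'t1': 1.0, 't2': 2.0, 't3': 4.0}, 10.0))
    # -> {'t1': 2, 't2': 2, 't3': 1}, total cost 10.0

Such a rule respects the budget limit but ignores how much each extra response actually reduces the estimation error, which is precisely the trade–off CrowdBudget is designed to optimise.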
The remainder of the paper is structured as follows. In Section 2 we discuss the related work in the domain of redundant task assignment in crowdsourcing applications. Then we formalise the model of budget allocation for task crowdsourcing in Section 3. Following this, we detail our proposed algorithm in Section 4. We also provide a theoretical performance analysis within this section. The results of the experimental evaluation are then discussed in Section 5. Section 6 concludes, and the appendix sketches the proofs.
2. RELATED WORK
Work on redundant allocation of classification tasks in crowdsourcing applications has typically focused on minimising the estimation error given the number of users assigned to a single task. In particular, Welinder et al. proposed a multidimensional model of users in order to estimate the accuracy of a particular user’s answer, and thus, to improve the estimation of the ground truth [12]. In a similar vein, a number of works used Bayesian learning techniques to predict the users’ responses, such as the work of Kamar et al. [5] and the IBCC algorithm (independent Bayesian classifier combination) [10]. Apart from these, Dai et al. used a POMDP–based (partially observable Markov decision process) approach to model the quality of the estimation [2]. In addition, Bachrach et al. relied on a machine learning based aggregator to derive an efficient estimation of the correct answer [1]. However, none of these methods will work in our setting, as they do not address the challenge of having different costs for different classification tasks. Nevertheless, they can be used as an underlying fusion method for CrowdBudget, as they provide majority voting-efficient response fusion approaches (for more details, see Section 3).
More related to our work is CrowdScreen, an algorithm proposed by Parameswaran et al. [9], which aims to find an optimal dynamic control policy with respect to both total cost and total estimation error over a finite set of tasks. However, the cost of task allocation is considered to be uniform among different tasks in the system. In addition, as with the other aforementioned approaches, there are no guarantees on performance. One notable exception that does provide theoretical guarantees is the work of Karger et al. [6]. Within this work, the authors developed an algorithm based on low–rank matrix approximation to assign tasks to users and estimate correct answers. In addition, they devised an e^(−O(n)) upper bound on the estimation error. However, this bound only holds when the number of tasks tends to infinity, which implies that their bound is not useful for most practical contexts, as opposed to our results (see Section 4 for more details).
3. MODEL DESCRIPTION
Let 1, ..., K denote the classification tasks whose outcomes have to be estimated. Within our model, we assume that the classifications are binary. Note that this assumption is reasonable, as it is true in many real–world systems. Let t_k ∈ {0, 1} be the unknown ground truth (i.e. the correct answer) of each task k. To estimate t_k, we can request responses from a (large) set of users (i.e. population of the crowd) as follows. In our model, the taskmaster does not deterministically choose specific users to perform a particular task k, but only submits a set of tasks to a crowdsourcing system, as is the case in many open crowdsourcing systems with a large population of users (e.g. MechanicalTurk or