Efficient Budget Allocation with Accuracy Guarantees for
Crowdsourcing Classification Tasks
Long Tran-Thanh, Matteo Venanzi, Alex Rogers & Nicholas R. Jennings
University of Southampton
{ltt08r,mv1g10,acr,nrj}@ecs.soton.ac.uk
ABSTRACT
In this paper we address the problem of budget allocation for redundantly crowdsourcing a set of classification tasks, where a key challenge is to find a trade–off between the total cost and the accuracy of estimation. We propose CrowdBudget, an agent–based budget allocation algorithm that efficiently divides a given budget among different tasks in order to achieve low estimation error. In particular, we prove that
CrowdBudget can achieve at most max{0, K/2 − O(√B)} estimation error with high probability, where
K is the number of tasks and B is the budget size. This result significantly outperforms the current best theoretical guarantee from Karger et al. In addition, we demonstrate that our algorithm outperforms existing methods by up to 40% in experiments based on real–world data from a prominent database of crowdsourced classification responses.
Categories and Subject Descriptors
I.2.11 [Distributed Artificial Intelligence]: Intelligent Agents
General Terms
Algorithms, Theory, Experimentation
Keywords
Crowdsourcing, Task Allocation, Budget Limit, Regret Bounds
1. INTRODUCTION
Crowdsourcing classification tasks, such as classification of complex images [5] or identification of buildings on maps [3], has recently become widely used as it presents a low-cost and flexible approach to solve complex classification tasks by combining human computation with agent intelligence [2]. In particular, by dedicating tasks to a population of hired users (i.e. workers) for a small fee, the taskmaster (i.e. task requester) can collect a large set of class labels from the workers and ultimately estimate classification answers from the multiple reports. To achieve this, the taskmaster redundantly allocates tasks to users (in order to reduce uncertainty, as a single response might be unreliable), and then
aggregates the responses into a final estimate using a fusion method (e.g. majority voting [1] or IBCC [10]). Within many systems, such a process of task allocation and report fusion is not trivial and is typically done by a computer agent, as it might require complex computation that humans cannot provide [5, 6].¹
¹ Hence the combination of human and agent (i.e. computer) intelligence.
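To make the aggregation step concrete, the following is a minimal sketch (in Python, not taken from the paper) of the simplest such fusion method, majority voting: each task's answer is estimated as the label reported by the majority of the workers assigned to it. The task identifiers and the response format are illustrative assumptions only.

    from collections import Counter

    def majority_vote(responses):
        # responses: dict mapping task id -> list of binary labels (0/1)
        # returns:   dict mapping task id -> estimated answer
        estimates = {}
        for task, labels in responses.items():
            counts = Counter(labels)
            # take the most frequently reported label; ties are broken arbitrarily
            estimates[task] = counts.most_common(1)[0][0]
        return estimates

    # e.g. three workers label task 'k1', two label task 'k2'
    print(majority_vote({'k1': [1, 1, 0], 'k2': [0, 0]}))  # {'k1': 1, 'k2': 0}

More sophisticated fusion methods (e.g. IBCC) replace the simple count with a model of each worker's reliability, but the interface, redundant labels in and one estimate per task out, is the same.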
Now, a key challenge within these crowdsourcing systems
is to find an efficient trade–off between the cost of redun-
dant task allocation and the accuracy of the result. In more
detail, by assigning multiple users to a single task, we can
achieve higher accuracy of answer estimation, but we might
also suffer a higher cost due to hiring more users. To date,
research work has typically focused on applications where
the cost of task allocation is uniform for each task [2, 12,
5]. In this case, the search for the aforementioned trade–off is reduced to the problem of minimising the estimation error per task given the number of users assigned to a single task. However, in many real–world scenarios, completing different tasks might require different costs that depend on the difficulty and the time required for the user to complete such a task. For example, consider a habitat monitoring project where the goal is to accurately identify the living area of some rare species (e.g. the desert tortoise from the Mojave desert, US, or the New Forest cicada in the UK). To avoid the high cost of sending expeditions of professionals, one low–budget, but efficient, way to tackle this problem is to ask people from neighbouring areas to make observations and then send reports in return for a certain amount of payment. However, as the approachability of geographical areas might vary (e.g. due to possible dangers or landscape), we might need to pay different costs to motivate people to approach different areas in order to provide reports. Furthermore, this allocation scheme has to cope with the fact that the project is limited by its funds.
Another example comes from the human computation do-
main. Suppose a system designer aims to use a crowdsourc-
ing platform to execute complex workflows of jobs [13]. In so
doing, he/she decomposes the workflows into sets of micro–
tasks, and assigns these tasks to a population of users. Now,
a typical classification task within this domain is to deter-
mine whether a micro–task is completed, and the goal is to
maximise the total accuracy of such classifications. To do so, he/she incentivises users to participate in this classification phase and compensates their effort with a certain payment. Note that the classification tasks may vary in difficulty, as some are more time consuming, while many others are trivial. Thus, users might need different sizes of payment when they face tasks with different difficulty levels. In addition, the total payment is limited, as the system designer wants to keep the cost within a finite budget.
Within the aforementioned scenarios, and many others
besides, the main challenge is to redundantly allocate a set
of tasks with different costs to an open population of users,
with respect to a budget limit, such that the estimation error is minimised. In particular, if we allocate large budgets to cheaper tasks, whose answers can typically be accurately estimated with only a few responses, we might fail to achieve high total accuracy of estimation, as a large portion of these budgets could instead have been used for other tasks whose allocated budget is insufficiently low. In contrast, it might not be efficient either to allocate much of the budget to expensive tasks, as this will leave less available for the others. Since existing methods are typically not designed to make such trade–offs, they are unlikely to be efficient in our settings.
Moreover, these methods do not provide theoretical guarantees on the estimation error, which implies that they might perform well on average, but might perform arbitrarily poorly in many specific cases. A notable exception is the work of Karger et al. [6], which provides an e^(−O(n)) bound for the average estimation error, where n is the number of users assigned to a single task. However, this bound is only asymptotic (i.e. it holds when the number of tasks tends to infinity), so it is not practical for real–world systems, where the number of tasks is typically finite. Against this background, we focus on the
budget allocation problem, in which an agent (on behalf of the taskmaster) has to allocate budgets to a set of tasks and the total allocated budget cannot exceed a certain limit. The agent’s goal is then to find an optimal budget allocation that minimises the total estimation error of the answers. Within this paper, we propose CrowdBudget as a budget allocation strategy that efficiently tackles this problem. In particular, by combining CrowdBudget with a majority voting-efficient fusion method (i.e. a method that outperforms the majority voting rule, see Section 3), our agent can provably achieve a max{0, K/2 − O(√B)} upper bound on its total estimation error with high probability, where B is the budget size and K is the number of tasks. This bound is lower than the one provided by Karger et al. Moreover, in contrast to most current approaches, our algorithm allocates the budgets in advance through the analysis of costs and expected accuracy. This mode of operation is motivated by many crowdsourcing platforms in which the task requester needs to pre-set the number of assignments per task (e.g. Amazon Mechanical Turk or CrowdFlower). Given this, we advance the state–of–the–art as follows:
• We introduce the problem of budget allocation for crowdsourcing classification tasks, in which the goal is to minimise the error of the estimated answers for a finite number of tasks, with respect to a budget limit.
• We develop CrowdBudget, an algorithm that, combined with a fusion method, provably achieves an efficient bound on the estimation error, which significantly advances the best known results.
• By comparing the performance of CrowdBudget with existing algorithms through extensive numerical evaluations on real–world data taken from a prominent crowdsourcing system, we demonstrate that our algorithm typically outperforms the state–of–the–art by achieving up to 40% lower estimation error.
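As an illustration of the problem setting above (not of the CrowdBudget algorithm itself, which is detailed in Section 4), the following minimal Python sketch shows the quantities an allocation has to reconcile: tasks with task-dependent costs, a budget limit B, and a per-task number of assignments whose total cost may not exceed B. All names and the naive round-robin rule are illustrative assumptions only.

    # Illustrative only: a naive round-robin allocation that respects a budget limit.
    # costs[k] is the (possibly different) cost of obtaining one response for task k.

    def allocate_round_robin(costs, budget):
        """Repeatedly sweep the tasks (cheapest first), adding one response to each
        task that still fits in the remaining budget; stop when nothing fits."""
        allocation = {k: 0 for k in costs}
        remaining = budget
        while True:
            progressed = False
            for k in sorted(costs, key=costs.get):
                if costs[k] <= remaining:
                    allocation[k] += 1
                    remaining -= costs[k]
                    progressed = True
            if not progressed:
                break
        return allocation

    # e.g. three tasks with different costs and a budget of 10 units
    print(allocate_round_robin({'t1': 1.0, 't2': 2.0, 't3': 4.0}, 10.0))
    # -> {'t1': 2, 't2': 2, 't3': 1}, total cost 10.0

Such a rule respects the budget limit but ignores how much each extra response actually reduces the estimation error, which is precisely the trade–off CrowdBudget is designed to optimise.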
The remainder of the paper is structured as follows. In Section 2 we discuss the related work in the domain of redundant task assignment in crowdsourcing applications. Then we formalise the model of budget allocation for task crowdsourcing in Section 3. Following this, we detail our proposed algorithm in Section 4. We also provide a theoretical performance analysis within this section. The results of the experimental evaluation are then discussed in Section 5. Section 6 concludes, and the appendix sketches the proofs.
2. RELATED WORK
Work on redundant allocation of classification tasks in crowdsourcing applications has typically focused on minimising the estimation error given the number of users assigned to a single task. In particular, Welinder et al. proposed a multidimensional model of users in order to estimate the accuracy of a particular user’s answer, and thus, to improve the estimation of the ground truth [12]. In a similar vein, a number of works used Bayesian learning techniques to predict the users’ responses, such as the work of Kamar et al. [5] and the IBCC algorithm (independent Bayesian classifier combination) [10]. Apart from these, Dai et al. used a POMDP–based (partially observable Markov decision process) approach to model the quality of the estimation [2]. In addition, Bachrach et al. relied on a machine learning based aggregator to derive an efficient estimation of the correct answer [1]. However, none of these methods will work in our setting, as they do not address the challenge of having different costs for different classification tasks. Nevertheless, they can be used as an underlying fusion method for CrowdBudget, as they provide majority voting-efficient response fusion approaches (for more details, see Section 3).
More related to our work is CrowdScreen, an algorithm proposed by Parameswaran et al. [9], which aims to find an optimal dynamic control policy with respect to both total cost and total estimation error over a finite set of tasks. However, the cost of task allocation is considered to be uniform among different tasks in the system. In addition, as with the other aforementioned approaches, there are no guarantees on performance. One notable exception that does provide theoretical guarantees is the work of Karger et al. [6]. Within this work, the authors developed an algorithm based on low–rank matrix approximation to assign tasks to users and estimate correct answers. In addition, they devised an e^(−O(n)) upper bound on the estimation error. However, this bound only holds when the number of tasks tends to infinity, which implies that their bound is not useful for most practical contexts, as opposed to our results (see Section 4 for more details).
3. MODEL DESCRIPTION
Let 1, ..., K denote the classification tasks whose outcomes have to be estimated. Within our model, we assume that the classifications are binary. Note that this assumption is reasonable, as it is true in many real–world systems. Let t_k ∈ {0, 1} be the unknown ground truth (i.e. the correct answer) of each task k. To estimate t_k, we can request responses from a (large) set of users (i.e. population of the crowd) as follows. In our model, the taskmaster does not deterministically choose specific users to perform a particular task k, but only submits a set of tasks to a crowdsourcing system, as is the case in many open crowdsourcing systems with a large population of users (e.g. MechanicalTurk or