Multi-winner Approval Voting Goes Epistemic
Tahar Allouche, Jérôme Lang, Florian Yger
2022-01-17

Epistemic voting interprets votes as noisy signals about a ground truth. We consider contexts where the truth consists of a set of objective winners, knowing a lower and an upper bound on its cardinality. A prototypical problem for this setting is the aggregation of multi-label annotations with prior knowledge on the size of the ground truth. We posit noise models, for which we define rules that output an optimal set of winners. We report on experiments on multi-label annotations (which we collected).

The epistemic view of voting assumes the existence of a ground truth which, usually, is either an alternative or a ranking over alternatives. Votes reflect opinions or beliefs about this ground truth; the goal is to aggregate these votes so as to identify it. Usual methods define a noise model specifying the probability of each voting profile given the ground truth, and output the alternative that is the most likely state of the world, or the ranking that is most likely the true ranking. Now, there are contexts where the ground truth consists neither of a single alternative nor of a ranking, but of a set of alternatives. Typical examples are multi-label crowdsourcing (find the items in a set that satisfy some property, e.g. the sports teams appearing in a picture) or finding the objectively k best candidates (best papers at a conference, best performances in artistic sports, the k patients with the highest probabilities of survival if assigned a scarce medical resource). The alternatives that truly belong to the ground truth are called 'winning' alternatives. Depending on the context, the number of winning alternatives can be fixed, unconstrained, or, more generally, constrained to lie in a given interval. This constraint expresses some prior knowledge on the cardinality of the ground truth. Here are some examples:

• Picture annotation via crowdsourcing: participants are shown a picture taken from a soccer match and have to identify the team(s) appearing in it. The ground truth is known to contain one or two teams.

• Guitar chord transcription: voters are base classifier algorithms [Nguyen et al., 2020] which, for a given chord, select the set of notes that constitute it. The true set of notes can contain three to six alternatives.

• Jury: participants are members of a jury which has to give an award to three papers presented at a conference: the number of objective winners is fixed to three. (In a variant, the number of awards would be at most three.)

• Resource allocation: participants are doctors and alternatives are Covid-19 patients in urgent need of intensive care; there is a limited number k of intensive care units. The ground truth consists of those patients who most deserve to be cured (for example, those with the k highest probabilities of survival if cured).

We assume that voters provide a simple form of information: approval ballots, indicating which alternatives they consider plausible winners. These approval ballots are not subject to any cardinality constraint: a voter may approve a number of alternatives that does not lie in the interval bearing on the output. This is typically the case for totally ignorant voters, who are expected to approve all alternatives.
Sometimes, the aggregating mechanism has some prior information about the likelihood of alternatives and the reliability of voters. We first study a simple case where this information is specified in the input: in the noise model, each voter has a probability p_i (resp. q_i) of approving a winning (resp. non-winning) alternative, and each alternative has a prior probability of being winning. This departs from classical voting, where voters are usually treated equally (anonymity), and similarly for alternatives (neutrality). This simple case serves as a building block for the more complex case where these parameters are not known beforehand but estimated from the votes: the votes allow us to infer information about plausibly winning alternatives, from which we infer information about voter reliabilities, which leads us to revise the information about winning alternatives, and so on until the process converges. Here we move back to an anonymous and neutral setting, since all alternatives (resp. voters) are treated equally before the votes are known.

After discussing related work (Section 2), we introduce the model (Section 3) and give an estimation algorithm (Section 4), first in the case where the parameters are known, and then in the case where they are estimated from the votes. In Section 5 we present a data gathering task and analyse the results of the experiments. Section 6 concludes.

Epistemic social choice. This field studies how a ground truth can be recovered from noisy votes, viewing voting rules as maximum likelihood estimators. Condorcet's jury theorem [Condorcet, 1785] considers n independent, equally reliable voters and two alternatives that are a priori equally likely, and states that if every voter votes for the correct alternative with probability p > 1/2, then the majority rule outputs the correct decision with a probability that increases with n and tends to 1 as n grows to infinity. See [Nitzan and Paroush, 2017] and [Dietrich, 2008] for proofs and discussion. The framework was later generalized to more than two alternatives [Young, 1988], to voters with different competences [Shapley and Grofman, 1984; Drissi-Bakhkhat and Truchon, 2004], to a nonuniform prior over alternatives [Ben-Yashar and Nitzan, 1997; Ben-Yashar and Paroush, 2001], to various noise models [Conitzer and Sandholm, 2005; Conitzer et al., 2009], to correlated votes [Pivato, 2013; Pivato, 2017], and to multi-issue domains [Xia et al., 2010]. [Meir et al., 2019] define a method that weights votes according to their average proximity to the other votes, as an estimation of their reliability. A review of the field can be found in [Elkind and Slinko, 2016].

Epistemic voting with approval ballots has scarcely been considered. [Procaccia and Shah, 2015] study noise models for which approval voting is optimal given k-approval votes, in the sense that the objectively best alternative gets elected, the ground truth being a ranking over all alternatives. [Allouche et al., 2022] do similar work for a ground truth consisting of a single candidate. [Caragiannis and Micha, 2017] prove that the number of samples needed to recover, with high enough probability, the ground truth ranking over alternatives from approval ballots is exponential if ballots are required to approve k candidates, but polynomial if the size of the ballots is randomized.

Multi-winner voting rules. These rules output a set of alternatives (of fixed cardinality or not) from a set of votes (approvals or rankings).
There have been many recent developments in the field (a recent survey is [Faliszewski et al., 2017]), mostly concerning the classical (non-epistemic) view of social choice, where votes express preferences. Multi-winner epistemic voting has received little attention. [Procaccia et al., 2012] assume a ground truth ranking over alternatives, and identify rules that output the k alternatives maximizing the likelihood of containing the best alternative, or the likelihood of coinciding with the top-k alternatives. The last section of [Xia and Conitzer, 2011] defines a noise model where the ground truth is a set of k alternatives (and the reported votes are partial orders). The only work we know of where the noise models produce random approval votes from a ground truth consisting of a set of alternatives is [Caragiannis et al., 2020]. They define a family of distance-based noise models, whose prototypical instance generates approval votes selecting an alternative in the ground truth (resp. not in the ground truth) with probability p (resp. 1 − p); as we will see further, this is a specific case of our noise model.

Crowdsourcing. [Kruger et al., 2014; Qing et al., 2014] give a social choice-theoretic study of collective annotation tasks. [Shah and Zhou, 2020] design mechanisms for incentive-compatible elicitation with approval ballots in crowdsourcing applications. Beyond social choice, collective multi-label annotation was first addressed in [Nowak and Rüger, 2010], which studies the agreement between experts and non-experts in some multi-labelling tasks, and in [Deng et al., 2014], where a scalable aggregation method is presented to solve the multi-label estimation problem.

Let N = {1, . . . , n} be a set of voters, and A = {a_1, . . . , a_m} a set of alternatives (possible objects in images, notes in chords, papers, patients...). Consider a set of L instances: an instance z consists of an approval profile A^z = (A^z_1, . . . , A^z_n), where A^z_i ⊆ A is an approval ballot for every i ∈ N. For example, in a crowdsourcing context, a task usually contains multiple questions, and an instance comprises the voters' answers to one of these questions. For each instance z, there exists an unknown ground truth S*_z belonging to S = 2^A, which is the set of objectively correct alternatives in instance z. It is common knowledge that the number of alternatives in each of them lies in the interval [l, u]: S*_z ∈ S_{l,u} = {S ∈ S : l ≤ |S| ≤ u}, for given bounds 0 ≤ l ≤ u ≤ m. Our goal is to unveil the ground truth for each of these instances, using the votes and the prior knowledge on the number of winning alternatives.

We define a noise model consisting of two parametric distributions, namely a conditional distribution of the approval ballots given the ground truth, and a prior distribution on the ground truth. Here we depart from classical noise models in epistemic social choice, as we suppose that the parameters of these distributions may be unknown and thus need to be estimated. For each voter i ∈ N, we suppose that there exist two unknown parameters (p_i, q_i) in (0, 1) such that the approval ballot A^z_i on an instance z is drawn according to the following distribution: for each a ∈ A,

P(a ∈ A^z_i | a ∈ S*_z) = p_i and P(a ∈ A^z_i | a ∉ S*_z) = q_i,

where p_i (resp. q_i) is the (unknown) probability that voter i approves a correct (resp. incorrect) alternative. Then we make the following assumptions: (1) a voter's approvals of alternatives are mutually independent given the ground truth and the parameters (p_i, q_i)_{i∈N}; (2) voters' ballots are mutually independent given the ground truth; (3) instances are independent given the parameters (p_i, q_i)_{i∈N} and the ground truths.
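To make the noise model concrete, here is a minimal Python sketch (ours, not part of the paper; names such as sample_profile are assumptions) that samples an approval profile given a ground truth and the voter parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_profile(ground_truth, p, q, m):
    """Sample one approval profile under the noise model: voter i approves
    each winning alternative with probability p[i] and each non-winning one
    with probability q[i], all approvals being independent
    (assumptions (1) and (2))."""
    profile = []
    for p_i, q_i in zip(p, q):
        ballot = {a for a in range(m)
                  if rng.random() < (p_i if a in ground_truth else q_i)}
        profile.append(ballot)
    return profile

# Example: 5 alternatives, true winners {0, 3}, three voters of varying skill.
print(sample_profile({0, 3}, p=[0.9, 0.7, 0.6], q=[0.1, 0.3, 0.4], m=5))
```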
To model the prior probability of any set S being the ground truth S*, we define parameters t_j = P(a_j ∈ S*). t_j can be understood as the prior probability of a_j belonging to the ground truth set S*, before the cardinality constraints are taken into account. These, together with an independence assumption on the events {a_j ∈ S*}, give

P(S = S*) = Π_{a_j ∈ S} t_j · Π_{a_j ∉ S} (1 − t_j).

Note that the choice of the parameters t_j is not crucial when running the algorithm for estimating the ground truth: we will see in Section 4.3 that it converges whatever their values. The distribution conditional on the prior knowledge about the size of the ground truth can be seen as a projection on the constraints followed by a normalization. It follows that

P̃(S) = P(S = S*) · 1{S ∈ S_{l,u}} / Σ_{S′ ∈ S_{l,u}} P(S′ = S*).

The ground truths associated with different instances are assumed to be mutually independent given the parameters. Two particular cases are worth discussing. First, when (l, u) = (0, m), the problem is unconstrained and we have P̃(S) = P(S = S*); in this case the problem degenerates into a series of independent binary label-wise estimations (see Subsection 4.1). Second, in the single-winner case (l, u) = (1, 1), we have P̃({a_j}) ∝ t_j / (1 − t_j); therefore, for any approval profile, we recover the same estimation problem if we simply introduce α_j = P(S* = {a_j}) with Σ_j α_j = 1, as in [Ben-Yashar and Paroush, 2001], in which case we have P̃({a_j}) = α_j.

Our aim is the intertwined estimation of the ground truths and the parameters via maximizing the total likelihood of the instances:

L(A, S, p, q, t) = Π_{z=1}^{L} P̃(S_z) · Π_{i∈N} P(A^z_i | S_z, p_i, q_i),

where A = (A^1, . . . , A^L) denotes the instances' profiles and S = (S_1, . . . , S_L) the candidate ground truths. To this aim, we will introduce an iterative algorithm whose two main steps will be presented in sequence in the next subsections, before the main algorithm is formally defined and its convergence shown. These two steps are:

• estimating the ground truths given the parameters;
• estimating the parameters given the ground truths.

Simply put, the algorithm consists in iterating these two steps until it converges to a fixed point. Since instances are independent given the parameters, we focus here on one instance with ground truth S* and profile A = (A_1, . . . , A_n). Before diving into maximum likelihood estimation (MLE), we introduce some notions and prove some lemmas. In this subsection, we suppose that the parameters (p_i, q_i)_{i∈N} and (t_j)_{j≤m} are known (later on, these parameters will be replaced by their estimates at each iteration of the algorithm). Thus, all in all, input and output are as follows:

• Input: approval profile A; parameters (p_i, q_i)_{i∈N} and (t_j)_{j≤m}.
• Output: MLE of the ground truth S*.

Definition 1 (weighted approval score). Given an approval profile (A_1, . . . , A_n), noise parameters (p_i, q_i)_{1≤i≤n} and prior parameters (t_j)_{1≤j≤m}, define, for each alternative a_j:

app_w(a_j) = Σ_{i : a_j ∈ A_i} ln( p_i(1 − q_i) / (q_i(1 − p_i)) ) + ln( t_j / (1 − t_j) ).

The scores app_w(a_j) can be interpreted as weighted approval scores for an (n + m)-voter profile where:

• each voter 1 ≤ i ≤ n has a weight w_i = ln( p_i(1 − q_i) / (q_i(1 − p_i)) ) and casts approval ballot A_i;
• for each 1 ≤ j ≤ m, there is a virtual voter with weight w_j = ln( t_j / (1 − t_j) ) who casts the approval ballot A_j = {a_j}.

While the weight of each voter i ∈ N depends on her reliability, each piece of prior information on an alternative plays the role of a virtual voter who only selects the concerned alternative, with a weight that increases as the prior parameter increases. From now on, we suppose without loss of generality that the alternatives are indexed in non-increasing order of their scores: app_w(a_1) ≥ app_w(a_2) ≥ · · · ≥ app_w(a_m).
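The following sketch (ours; function names are assumptions) computes these scores; on the data of Example 1 below it reproduces the reported values up to rounding:

```python
import math

def voter_weight(p_i, q_i):
    # w_i = ln( p_i (1 - q_i) / ( q_i (1 - p_i) ) )
    return math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))

def weighted_approval_scores(profile, p, q, t):
    """Definition 1: app_w(a_j) = sum of w_i over the voters approving a_j,
    plus the virtual-voter weight ln(t_j / (1 - t_j))."""
    m = len(t)
    w = [voter_weight(p_i, q_i) for p_i, q_i in zip(p, q)]
    return [sum(w[i] for i, A_i in enumerate(profile) if j in A_i)
            + math.log(t[j] / (1 - t[j]))
            for j in range(m)]

# Data of Example 1: 10 voters with (p, q) = (0.7, 0.4), prior 0.6 on the
# fourth alternative (d), approval counts (9, 8, 7, 5, 5).
counts = [9, 8, 7, 5, 5]
profile = [{j for j in range(5) if i < counts[j]} for i in range(10)]
scores = weighted_approval_scores(profile, [0.7] * 10, [0.4] * 10,
                                  [0.5, 0.5, 0.5, 0.6, 0.5])
print([round(s, 2) for s in scores])  # ~ [11.27, 10.02, 8.77, 6.67, 6.26]
```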
Definition 2 (threshold and partition). Define the threshold

τ_n = Σ_{i=1}^{n} ln( (1 − q_i) / (1 − p_i) )

and the partition of the set of alternatives into three sets:

S^{τ_n}_max = {a ∈ A : app_w(a) > τ_n}, S^{τ_n}_tie = {a ∈ A : app_w(a) = τ_n}, S^{τ_n}_min = {a ∈ A : app_w(a) < τ_n}

(so that A = S^{τ_n}_max ∪ S^{τ_n}_tie ∪ S^{τ_n}_min), and let k^{τ_n}_max = |S^{τ_n}_max|, k^{τ_n}_tie = |S^{τ_n}_tie|, k^{τ_n}_min = |S^{τ_n}_min|.

The next result characterizes the sets in S that are MLEs of the ground truth given the parameters.

Theorem 1. Ŝ ∈ arg max_{S∈S} L(A, S, p, q, t) if and only if there exists k ∈ [l, u] such that Ŝ is the set of the k alternatives with the highest values of app_w and:

|Ŝ ∩ S^{τ_n}_max| = min(u, k^{τ_n}_max) and |Ŝ ∩ S^{τ_n}_min| = max(0, l − k^{τ_n}_max − k^{τ_n}_tie). (1)

So the estimator Ŝ is made of some top-k alternatives, where the possible values of k are determined by Eq. (1). The first equation imposes that Ŝ includes as many elements as possible from S^{τ_n}_max (without exceeding the upper bound u), whereas the second one imposes that Ŝ includes as few elements as possible from S^{τ_n}_min (without getting below the lower bound l). An example is included in the appendix.

Proof. Since P̃(S) > 0 ⟺ S ∈ S_{l,u}, we have arg max_{S∈S} L(S) = arg max_{S∈S_{l,u}} L(S). Moreover, up to an additive constant independent of S, the log-likelihood of any S ∈ S_{l,u} reads

ℓ(S) = Σ_{a∈S} ℓ(a), where ℓ(a) = app_w(a) − τ_n.

This means that a ∈ S^{τ_n}_max if and only if ℓ(a) > 0, a ∈ S^{τ_n}_min if and only if ℓ(a) < 0, and a ∈ S^{τ_n}_tie if and only if ℓ(a) = 0. Now, let S_M be a maximizer of the likelihood; since swapping an element of S_M for an excluded element with a strictly higher score cannot decrease ℓ(S_M), we can assume that S_M is a top-k set for k = |S_M|. Suppose that |S_M ∩ S^{τ_n}_min| > max(0, l − k^{τ_n}_max − k^{τ_n}_tie). If |S_M| = l, this would mean that there are elements of S^{τ_n}_tie and S^{τ_n}_max which are not in S_M while |S_M ∩ S^{τ_n}_min| > 0, a contradiction since S_M is a top-k set. If |S_M| > l, consider a ∈ S_M ∩ S^{τ_n}_min: we have |S_M \ {a}| ≥ l and ℓ(S_M) = ℓ(S_M \ {a}) + ℓ(a) < ℓ(S_M \ {a}), which is a contradiction. With the same idea we can prove that |S_M ∩ S^{τ_n}_max| = min(u, k^{τ_n}_max). Conversely, consider an admissible set S of top-k alternatives that satisfies the constraints (1), and let S_M be an MLE which, by the first part of the proof, is a top-k′ set that also satisfies the constraints (1). Thus |S_M ∩ S^{τ_n}_max| = |S ∩ S^{τ_n}_max| = min(u, k^{τ_n}_max), and since S and S_M are top-k and top-k′ sets, S ∩ S^{τ_n}_max = S_M ∩ S^{τ_n}_max. Similarly, S ∩ S^{τ_n}_min = S_M ∩ S^{τ_n}_min, and the elements of S^{τ_n}_tie contribute nothing to ℓ. This suffices to prove that ℓ(S) = ℓ(S_M) is maximal. ∎

Notice that when (l, u) = (0, m), the problem degenerates into a collection of label-wise problems, one for each alternative: a_j is selected if a_j ∈ S^{τ_n}_max, rejected if a_j ∈ S^{τ_n}_min, and those that are on the fence can arbitrarily be selected or not.

Example 1. Consider 5 alternatives A = {a, b, c, d, e} and 10 voters N all sharing the same parameters (p, q) = (0.7, 0.4). All voters thus share the same weight w = ln( p(1 − q) / (q(1 − p)) ) ≈ 1.25, and τ_n = Σ_{i=1}^{n} ln( (1 − q)/(1 − p) ) ≈ 6.93. We consider the constraints (l, u) = (1, 4). Suppose that t_d = 0.6 and that t_j = 0.5 for all the remaining candidates, and consider the approval counts (and weighted approval scores) in the table below.

Candidate      | a     | b  | c    | d    | e
Approval count | 9     | 8  | 7    | 5    | 5
app_w          | 11.25 | 10 | 8.75 | 6.65 | 6.25

We have S^{τ_n}_max = {a, b, c}, S^{τ_n}_tie = ∅ and S^{τ_n}_min = {d, e}. By Theorem 1, there exists some k ∈ [1, 4] such that Ŝ consists of the top-k alternatives, with |Ŝ ∩ S^{τ_n}_max| = min(4, 3) = 3 and |Ŝ ∩ S^{τ_n}_min| = max(0, 1 − 3 − 0) = 0. We can thus easily check that Ŝ = arg max_{S∈S} P(S = S* | A) = {a, b, c}.
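The rule of Theorem 1 is then straightforward to implement. Here is a minimal sketch (ours; it reuses weighted_approval_scores from above and breaks ties between equal scores arbitrarily):

```python
import math

def mle_ground_truth(scores, p, q, l, u):
    """Return a most likely set under the cardinality constraints [l, u]:
    a top-k set with k = max(l, min(u, |S_max|)), which satisfies Eq. (1)."""
    tau = sum(math.log((1 - q_i) / (1 - p_i)) for p_i, q_i in zip(p, q))
    k_max = sum(1 for s in scores if s > tau)   # |S_max|
    k = max(l, min(u, k_max))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])

# On Example 1 (scores as computed above, l = 1, u = 4), tau ~ 6.93 and
# k_max = 3, so this returns the top-3 alternatives {a, b, c}.
```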
Estimating the prior parameters over alternatives. Once the ground truths are estimated at one iteration of the algorithm, the next step consists in estimating the prior parameters (t_j)_{j≤m}, the ground truths being given (in Subsection 4.3, the ground truths will be replaced by their estimates at each iteration). The next proposition gives the closed-form expression of the MLE of the prior parameter of each alternative, given the ground truth S*_z of each instance, once the prior parameters of all the other alternatives are fixed.

• Input: approval profiles (A^1, . . . , A^L), ground truths S*_z, and all but one of the prior parameters, (t_h)_{h≠j}.
• Output: MLE of t_j.

Proposition 2. For every a_j ∈ A:

arg max_{t∈(0,1)} L(A, S, p, q, t, t_{−j}) = occ(j) α_j / ( (L − occ(j)) ᾱ_j + occ(j) α_j ),

where ᾱ_j = P(l ≤ |S*| ≤ u | a_j ∈ S*), α_j = P(l ≤ |S*| ≤ u | a_j ∉ S*), so that β = ᾱ_j t_j + α_j (1 − t_j), and occ(j) is the number of instances whose ground truth contains a_j.

Proof. Fix all the sets S_z ∈ S_{l,u}, all the noise parameters (p_i, q_i)_i and all the prior parameters t_h except t_j for some j ≤ m, and let t ∈ (0, 1). Taking the log of the likelihood, seen as a function of t, we can write, up to an additive constant:

ℓ(t) = occ(j) ln t + (L − occ(j)) ln(1 − t) − L ln( ᾱ_j t + α_j (1 − t) ).

Its derivative reads:

ℓ′(t) = occ(j)/t − (L − occ(j))/(1 − t) − L (ᾱ_j − α_j) / ( ᾱ_j t + α_j (1 − t) ).

Cancelling it, we obtain the expression of Proposition 2. The derivative vanishes at a single point in (0, 1), and lim_{t→0} ℓ(t) = lim_{t→1} ℓ(t) = −∞, thus ℓ reaches a unique maximum there. ∎

We will see later that the algorithm applies Proposition 2 sequentially to estimate the alternatives' parameters one by one (see Example 2).

Estimating the voter parameters. Once the ground truths are known (or estimated), we can estimate the voters' parameters (p, q).

• Input: instances (A^1, . . . , A^L), ground truths (S*_1, . . . , S*_L).
• Output: MLE of the voter reliabilities (p, q).

The next result simply states that the maximum likelihood estimator of p_i for some voter is the fraction of the winning alternatives (over all instances) that the voter approves; the estimation of q_i is similar, with non-winning alternatives. See Example 2.

Proposition 3. Fix the sets S_z ∈ S_{l,u} and the prior parameters t_j. Then the maximum over (p, q) ∈ (0, 1)^{2n} of L(A, S, p, q, t) is reached at

p̂_i = Σ_z |A^z_i ∩ S_z| / Σ_z |S_z| and q̂_i = Σ_z |A^z_i \ S_z| / Σ_z (m − |S_z|).

The (simple) proof is omitted; it is in the Appendix.

Now the estimation of the ground truths and that of the parameters are intertwined to maximize the overall likelihood L(A, S, p, q, t) by the Alternating Maximum Likelihood Estimation (AMLE) algorithm. AMLE is an iterative procedure similar to the Expectation-Maximization procedure used in [Baharad et al., 2011], but with a coordinate-steepest-ascent-like iteration, whose aim is to jointly estimate the voter reliabilities, the alternatives' prior parameters and the instances' ground truths. The idea behind this estimation consists in alternating an MLE of the ground truths given the current estimate of the parameters, and an update of these parameters via an MLE based on the current estimate of the ground truths. Each of these steps has been discussed in the previous subsections; they are now incorporated into Algorithm 1.

Algorithm 1: AMLE procedure
Input: approval ballots (A^z_i)_{1≤z≤L, i∈N}; initial parameters θ^(0); bounds (l, u); tolerance ε.
Output: estimations (Ŝ_z), (p̂_i, q̂_i), (t̂_j).
repeat:
  for z = 1 . . . L: compute Ŝ^(v+1)_z = {a_1, . . . , a_k}, with k ∈ [l, u] chosen as in Theorem 1 (under θ^(v));
  compute (p̂^(v+1), q̂^(v+1)) as in Proposition 3 (under Ŝ^(v+1));
  for j = 1 . . . m: update t̂^(v+1)_j sequentially as in Proposition 2;
until ‖θ^(v+1) − θ^(v)‖ ≤ ε.

The algorithm continues to run until a convergence criterion is met, in the form of a bound on the norm of the change in the parameter estimates. In practice we chose the ℓ∞ norm, but any other norm could be used in Algorithm 1 since, in finite dimensions, all norms are equivalent (if a sequence converges according to one norm then it does so for any norm). We define the vector of parameters θ^(v) = (p̂^(v), q̂^(v), t̂^(v)), containing the voters' estimated noise parameters as well as the estimated prior parameters at iteration v. In particular, θ^(0) is the vector of initial values given as input.
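For concreteness, here is a sketch of the two parameter-update steps (our code; the helper card_dist and the absence of guards against degenerate cases, e.g. an estimate reaching 0 or 1, are our own choices):

```python
def card_dist(ts):
    """Distribution of |S| when each a_h belongs to S independently with
    probability ts[h] (Poisson-binomial law, by dynamic programming)."""
    dist = [1.0]
    for t in ts:
        dist = [(dist[k] * (1 - t) if k < len(dist) else 0.0)
                + (dist[k - 1] * t if k > 0 else 0.0)
                for k in range(len(dist) + 1)]
    return dist

def update_prior(j, t, truths, l, u):
    """Proposition 2: closed-form MLE of t_j given the (estimated) ground
    truths, the other prior parameters t_h (h != j) being fixed."""
    L = len(truths)
    dist = card_dist([t[h] for h in range(len(t)) if h != j])
    alpha_in = sum(dist[k] for k in range(max(l - 1, 0), u))   # P(l<=|S*|<=u | a_j in S*)
    alpha_out = sum(dist[k] for k in range(l, min(u, len(dist) - 1) + 1))
    occ = sum(1 for S in truths if j in S)
    return occ * alpha_out / ((L - occ) * alpha_in + occ * alpha_out)

def update_reliabilities(profiles, truths, m):
    """Proposition 3: p_i (resp. q_i) is the fraction of winning (resp.
    non-winning) alternatives approved by voter i, over all instances."""
    n = len(profiles[0])
    pos = sum(len(S) for S in truths)
    neg = sum(m - len(S) for S in truths)
    p = [sum(len(A[i] & S) for A, S in zip(profiles, truths)) / pos
         for i in range(n)]
    q = [sum(len(A[i] - S) for A, S in zip(profiles, truths)) / neg
         for i in range(n)]
    return p, q
```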
The choice of the exact initial values depends on the application at hand. Note that at convergence, only local optimality is guaranteed, as is classical in optimization.

Theorem 4. For any initial values θ^(0), AMLE converges to a fixed point after a finite number of iterations.

Proof. First, we have by Theorem 1 that L(A, Ŝ^(v+1), θ^(v)) = max_{S∈S} L(A, S, θ^(v)), and in particular:

L(A, Ŝ^(v+1), θ^(v)) ≥ L(A, Ŝ^(v), θ^(v)).

To prove that L(A, Ŝ^(v+1), θ^(v+1)) ≥ L(A, Ŝ^(v+1), θ^(v)), we use the fact that we update (p, q, t) by their MLEs. By Proposition 3, we have (p̂^(v+1), q̂^(v+1)) = arg max_{(p,q)} L(A, Ŝ^(v+1), p, q, t̂^(v)). Also, by Proposition 2, and since we apply it sequentially to update each t_j, we have:

L(A, Ŝ^(v+1), p̂^(v+1), q̂^(v+1), t̂^(v+1)) ≥ L(A, Ŝ^(v+1), p̂^(v+1), q̂^(v+1), t̂^(v)).

To prove convergence, it suffices to show that Ŝ^(v) = Ŝ^(v+1) for some v (which guarantees that the estimators stay unchanged thereafter). Notice that the vector of ground truths has a finite number of possible values (exactly 2^{mL}), leading the algorithm to cycle at some iteration. For the sake of simplicity, suppose that this cycle is of length 2; in other words, suppose that Ŝ^(v+2) = Ŝ^(v) for some v, which also implies that θ^(v+2) = θ^(v). So:

L(A, Ŝ^(v), θ^(v)) = L(A, Ŝ^(v+2), θ^(v+2)) ≥ L(A, Ŝ^(v+1), θ^(v+1)) ≥ L(A, Ŝ^(v+1), θ^(v)).

By optimality of Ŝ^(v+1), we also have:

L(A, Ŝ^(v+1), θ^(v)) ≥ L(A, Ŝ^(v), θ^(v)).

Hence, we get that:

L(A, Ŝ^(v+1), θ^(v)) = L(A, Ŝ^(v), θ^(v)) = max_{S∈S_{l,u}} L(A, S, θ^(v)),

and thus Ŝ^(v+1) = Ŝ^(v) = arg max_{S∈S_{l,u}} L(A, S, θ^(v)), and the estimators remain the same after any number of iterations following v. ∎

Because L(A, Ŝ^(v+1), θ^(v+1)) ≥ L(A, Ŝ^(v+1), θ^(v)) ≥ L(A, Ŝ^(v), θ^(v)), the likelihood increases at each step of the algorithm. This guarantees that whenever the execution stops, the likelihood is closer to the maximum than it initially was. Therefore the algorithm can not only be run until convergence, but can also be run as an anytime algorithm.

Example 2. Take n = 3, m = 5, l = 1, u = 2, L = 4, with a given approval profile and initial parameters.

Estimating the ground truths: the first step is the application of Theorem 1 to estimate the ground truth of each instance given the initial parameters, yielding the first estimates of the four ground truths.

Estimating the voter reliabilities: in the next step, we use these estimates of the ground truths to compute the MLEs of the voter reliabilities. For instance, voter 1 has 2 false positive labels out of a total of 12 negative labels, so q̂_1 = 2/12 ≈ 0.17.

Estimating the prior parameters: the final step of this iteration consists in updating the estimates of the prior parameters by applying Proposition 2 sequentially. First we estimate t_1 by maximum likelihood estimation. We compute ᾱ_1, α_1 and occ(a_1):

ᾱ_1 = β(0, 1, t_2, . . . , t_5) = 0.3125, α_1 = β(1, 2, t_2, . . . , t_5) = 1, occ(a_1) = 1,

where β(x, y, t_2, . . . , t_5) denotes the probability that the number of remaining alternatives in the ground truth lies between x and y. Then the maximum likelihood estimate of t_1 is:

t̂_1 = occ(a_1) α_1 / ( (L − occ(a_1)) ᾱ_1 + occ(a_1) α_1 ) = 1 / (3 × 0.3125 + 1) ≈ 0.52.

The next steps are to estimate t̂_2, . . . , t̂_5 in the same fashion (we get, for instance, t̂_5 = 0.20). Fix ε = 10^{−5}. We repeat all the steps until convergence (according to the ℓ∞ norm), which occurs here after 5 full iterations, at which point we obtain the fixed-point estimates of the four ground truths.
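Putting the pieces together, the AMLE loop itself is short. The driver below is our sketch, reusing the functions sketched above, with the ℓ∞ stopping test of Algorithm 1 (and, again, no guard against parameters reaching 0 or 1):

```python
def amle(profiles, m, l, u, p, q, t, eps=1e-5, max_iter=100):
    """Alternate the MLE of the ground truths (Theorem 1) with the MLE
    updates of the parameters (Propositions 2 and 3) until the parameter
    vector moves by less than eps in the l-infinity norm."""
    p, q, t = list(p), list(q), list(t)
    truths = None
    for _ in range(max_iter):
        truths = [mle_ground_truth(weighted_approval_scores(A, p, q, t),
                                   p, q, l, u) for A in profiles]
        new_p, new_q = update_reliabilities(profiles, truths, m)
        new_t = list(t)
        for j in range(m):                  # sequential updates (Prop. 2)
            new_t[j] = update_prior(j, new_t, truths, l, u)
        delta = max(abs(a - b) for a, b in
                    zip(new_p + new_q + new_t, p + q + t))
        p, q, t = new_p, new_q, new_t
        if delta <= eps:
            break
    return truths, (p, q), t
```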
We designed an image annotation task as a football quiz. We selected 15 pictures taken during different matches between two of the following teams: Real Madrid, Inter Milan, Bayern Munich, Barcelona, Paris Saint-Germain. In each picture, it may be the case that players from both teams appear, or players from only one team; therefore l = 1 and u = 2. Each participant is shown the instances one by one, and is asked each time to select all the teams she can spot (see Figure 1). We designed a simple incentive for participants, consisting in ranking them according to the following principle:

• The participants get one point whenever their answer contains all the correct alternatives for a picture. They are then ranked according to their cumulated points.
• To break ties, the participant who selected a smaller number of alternatives overall is ranked first.

We gathered the answers of 76 participants (only two of them spammed, by simply selecting all the alternatives). Following [Meir et al., 2019], we assign more weight to voters who are closer to the others on average, initializing the precision parameters (p_i, q_i) accordingly. This suits our context, where voter competence is highly polarized: some voters are experts and cast similar answers close to the ground truth; the others are less reliable and their answers are dispersed among all combinations. Details on the heuristic are in the Appendix.

To assess the importance of the prior information on the size of the ground truth, we tested the AMLE algorithm with free bounds (l, u) = (0, m) (referred to as AMLE_f) and with (l, u) = (1, 2) (referred to as AMLE_c). We also apply the modal rule [Caragiannis et al., 2020], which outputs the subset of alternatives that most frequently appears as an approval ballot, arg max_{S∈S} |{i ∈ N : S = A_i}|, and a variant of the label-wise majority rule, which outputs the subset of alternatives S such that a ∈ S ⟺ |{i ∈ N : a ∈ A_i}| > n/2. If this subset is empty, it is replaced by the alternative with the highest approval count; if it has more than two alternatives, then we only keep the top-2 alternatives.

We took 20 batches of n = 10 to n = 74 randomly drawn voters and applied the four methods to all of them (see Figures 2a and 2b). As classically done in the literature [Nguyen et al., 2020], we use as metrics the Hamming accuracy, (1/(mL)) Σ_{z=1}^{L} ( |S*_z ∩ Ŝ_z| + |(A \ S*_z) ∩ (A \ Ŝ_z)| ), and the 0/1 accuracy, (1/L) Σ_{z=1}^{L} 1{S*_z = Ŝ_z}, and we report their 0.95 confidence intervals. We notice that the majority and modal rules are outperformed by AMLE, which can be explained by the fact that they do not take the voters' reliabilities into account. Comparing the performances of AMLE_c and AMLE_f emphasizes the importance of the prior knowledge on the committee size for improving the quality of the estimation.
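For reference, the two baselines and the two metrics can be sketched as follows (our code; the tie-breaking inside the modal rule and the top-1/top-2 fallbacks follow the description above):

```python
from collections import Counter

def modal_rule(profile):
    # Most frequent ballot [Caragiannis et al., 2020]; ties broken arbitrarily.
    return set(Counter(frozenset(A) for A in profile).most_common(1)[0][0])

def majority_variant(profile, m):
    counts = [sum(1 for A in profile if j in A) for j in range(m)]
    S = {j for j in range(m) if counts[j] > len(profile) / 2}
    if not S:                # empty: fall back to the most approved alternative
        S = {max(range(m), key=lambda j: counts[j])}
    elif len(S) > 2:         # too large: keep only the top-2 alternatives
        S = set(sorted(S, key=lambda j: counts[j], reverse=True)[:2])
    return S

def hamming_accuracy(truths, estimates, m):
    # correctly classified labels: |S* & E| + |complements| = m - |S* xor E|
    return sum(m - len(S ^ E) for S, E in zip(truths, estimates)) \
        / (m * len(truths))

def zero_one_accuracy(truths, estimates):
    return sum(S == E for S, E in zip(truths, estimates)) / len(truths)
```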
We study multi-winner approval voting from an epistemic point of view. We propose a noise model that incorporates the prior belief about the size of the ground truth. We then derive an iterative algorithm that jointly estimates the ground truth labels, the voter noise parameters and the prior belief parameters, and we prove its convergence. Our algorithm is based on a simplification of Expectation-Maximization (EM), and its simple steps are more easily explainable to voters than EM and other similar statistical learning approaches. Although we mainly considered a general multi-instance task that fits the collective annotation framework, where each voter answers several questions on the same set of alternatives, we can nonetheless apply the same algorithm to single-instance problems (such as the allocation of scarce medical resources) where only one question is answered. In this case, the prior parameters cannot be updated, and it suffices to fix them once and for all and to alternate between the estimation of the ground truth and that of the voter parameters. In some contexts (e.g., patients in a hospital), alternatives and votes are not observed at once but streamed. To cope with this online setup, we consider extending our AMLE algorithm in the spirit of [Cappé and Moulines, 2009].

To see how the participants behave given the ranking incentives that we defined in the football quiz, we plotted the histogram of the sizes of the answers (see Figure 3). It appears that although the platform makes it possible to select every alternative, only two voters did so for all the questions. Moreover, Figures 3a and 3b show that the majority of the voters tend to select exactly the number of teams that appear in an image.

Following [Meir et al., 2019], we devised an initialisation strategy for the voters' reliabilities. In Anna Karenina, Leo Tolstoy stated that "Happy families are all alike; every unhappy family is unhappy in its own way". In the same spirit, it seems reasonable to hypothesize that accurate users tend to give similar answers, whereas inaccurate users are each inaccurate in their own way. We use the following heuristic (see Algorithm 2) for the initialization, based on the Jaccard distance, given by d_J(A, B) = 1 − |A ∩ B| / |A ∪ B|.

Remark. The formulas in Algorithm 2 guarantee that a voter's parameters (p_i, q_i) are such that her initial weight is equal to w_i, and that w_max / w_min = n, which means that, initially, the voter closest on average to the other voters counts n times as much as the voter with the biggest average distance.

Example 3. Consider the following approval profile (Table 1) for 3 voters, 5 alternatives and 4 instances. Here we have w_max = n/(n + 1) = 0.75 and w_min = 1/(n + 1) = 0.25. First, compute the mean Jaccard distance of each voter to the others: d_1 = 1.71, d_2 = 1.69, d_3 = 1.65. So d_max = d_1 = 1.71 and d_min = d_3 = 1.65, which means that voter 3 (the closest on average to the other voters) gets the biggest weight, w_3 = w_max = 0.75, and voter 1 gets the smallest weight, w_1 = w_min = 0.25. Next, compute the weight assigned to each voter by interpolating between w_min and w_max; for instance, w_2 = w_min + ((d_max − d_2)/(d_max − d_min)) (w_max − w_min) ≈ 0.42. Now we can set the initial values of the reliability parameters accordingly, i.e., such that each voter's initial weight ln( p̂_i(1 − q̂_i) / (q̂_i(1 − p̂_i)) ) is equal to w_i. After proceeding in the same fashion with all the voters, we get the initial parameters.

Since AMLE only guarantees convergence to a local maximum, which makes the result depend on the initial point, we compared the results of this initialization (Anna Karenina) with other procedures, to motivate its choice (see Figure 4); namely, we tested:

• Uniform weights: initially, all the voters in the batch are given the same weight.
• Random weights: initially, for each voter in the batch, p_i is randomly picked from (0.5, 1) and q_i is randomly picked from (0, 0.5).

We notice that these two baseline procedures show very similar performances, and that they are both outperformed by the Anna Karenina initialization.
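A sketch of this initialization follows (ours; the linear distance-to-weight mapping, the aggregation of the Jaccard distances, and the symmetric choice q_i = 1 − p_i, which makes the initial weight equal to w_i, are assumptions consistent with the Remark above; degenerate cases such as all distances being equal are not handled):

```python
import math

def jaccard(A, B):
    return 1 - len(A & B) / len(A | B) if A | B else 0.0

def anna_karenina_init(profiles):
    """Map each voter's mean Jaccard distance to the others onto a weight
    in [1/(n+1), n/(n+1)] (closer voters get bigger weights), then pick
    (p_i, q_i) realizing that weight."""
    n = len(profiles[0])
    d = [sum(jaccard(A[i], A[h]) for A in profiles for h in range(n) if h != i)
         / len(profiles) for i in range(n)]
    d_min, d_max = min(d), max(d)
    w_min, w_max = 1 / (n + 1), n / (n + 1)
    w = [w_min + (d_max - d_i) / (d_max - d_min) * (w_max - w_min) for d_i in d]
    # with q_i = 1 - p_i, ln(p(1-q)/(q(1-p))) = 2 ln(p/(1-p)) = w_i
    p = [1 / (1 + math.exp(-w_i / 2)) for w_i in w]
    q = [1 - p_i for p_i in p]
    return p, q
```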
We assessed the execution time of the AMLE algorithm with and without constraints (referred to as AMLE_c and AMLE_f), run on an Intel Core i7-10610U CPU @ 1.80GHz (4 cores, 8 threads) with 32GB of RAM. Results are shown in Figure 5. We can see that whereas the number of iterations does not seem to grow as the number of voters increases, the execution time of AMLE does, especially around 40 voters.

In addition to the Hamming and 0-1 subset accuracies, we introduce a new metric which can be considered as intermediate between them. The Hamming metric considers each label independently, and the 0-1 subset loss considers the labels jointly in a strict fashion, whereas the harmonic accuracy that we introduce considers all of an instance's labels jointly, but with different convex weights depending on the number of correctly predicted ones:

T(S, S*) = Σ_{k=1}^{|S ∩ S*|} 1/(6 − k).

So, out of the 5 labels:
• if 0 labels are correct then T = 0;
• if 1 label is correct then T = 1/5;
• if 2 labels are correct then T = 1/5 + 1/4;
• if 3 labels are correct then T = 1/5 + 1/4 + 1/3;
• if 4 labels are correct then T = 1/5 + 1/4 + 1/3 + 1/2;
• if 5 labels are correct then T = 1/5 + 1/4 + 1/3 + 1/2 + 1.

Defined as such, this accuracy favours the estimators that are able to correctly estimate most of an instance's labels, without being as rigid as the 0-1 subset accuracy. This metric is reminiscent of the Proportional Approval Voting rule for multi-winner elections, which defines the score of a subset of candidates W for a voter as 1 + 1/2 + . . . + 1/j, where j is the number of candidates in W approved by the voter. More generally, we could consider a class of metrics defined by a vector w, such that T(S, S*) = w_{|S ∩ S*|}. This class generalizes the Hamming, 0-1 and harmonic accuracies, and is reminiscent of the class of Thiele rules (see for instance [Lackner and Skowron, 2020] for an extended presentation of multi-winner approval-based committee rules). We show in Table 2 the accuracies of the considered methods when applied to the entire annotation dataset. In Figure 6 we show the evolution of the harmonic accuracies as the number of randomly picked voters in each batch increases.

References

[Ben-Yashar and Nitzan, 1997] Ruth Ben-Yashar and Shmuel Nitzan. The optimal decision rule for fixed-size committees in dichotomous choice situations: The general result. International Economic Review, 1997.
[Conitzer et al., 2009] Vincent Conitzer, Matthew Rognlie, and Lirong Xia. Preference functions that score rankings and maximum likelihood estimation. IJCAI, 2009.
[Drissi-Bakhkhat and Truchon, 2004] Mohamed Drissi-Bakhkhat and Michel Truchon. Maximum likelihood approach to vote aggregation with variable probabilities. Social Choice and Welfare, 2004.
[Faliszewski et al., 2017] Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Multiwinner voting: A new challenge for social choice theory. In Trends in Computational Social Choice, 2017.
[Nowak and Rüger, 2010] Stefanie Nowak and Stefan Rüger. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. ACM MIR, 2010.
[Xia and Conitzer, 2011] Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. IJCAI, 2011.