REFORM: Reputation Based Fair and Temporal Reward Framework for Crowdsourcing
Samhita Kanaparthy, Sankarshan Damle, Sujit Gujar
December 20, 2021

Crowdsourcing is an effective method for collecting data from a distributed human population. Researchers introduce appropriate reward mechanisms to incentivize agents to report accurately. In particular, this paper focuses on Peer-Based Mechanisms (PBMs). We observe that with PBMs, crowdsourcing systems may not be fair, i.e., agents may not receive the rewards they deserve despite investing effort and reporting truthfully. Unfair rewards may discourage participation. This paper aims to build a general framework that assures fairness for PBMs in temporal settings, i.e., settings that prefer early reports. Towards this, we introduce two general notions of fairness for PBMs, namely gamma-fairness and qualitative fairness. To satisfy these notions, our framework provides trustworthy agents with additional chances of pairing. We introduce the Temporal Reputation Model (TERM) to quantify agents' trustworthiness across tasks. With TERM as the key constituent, we present our iterative framework, REFORM, which can adopt the reward scheme of any existing PBM. We demonstrate REFORM's significance by deploying the framework with RPTSC's reward scheme. Specifically, we prove that REFORM with RPTSC considerably improves fairness while incentivizing truthful and early reports. We conduct synthetic simulations and show that our framework provides improved fairness over RPTSC.

Crowdsourcing is a popular method for requesters to collect information from many people who contribute their data through the internet, sensors, and other data streams. Crowdsourcing systems enable a crowd with diverse expertise to contribute to outsourced tasks such as rating products online, urban sensing, and collecting real-world data [1, 2, 3]. In these systems, to maintain accuracy, researchers devote a large body of work to incentivizing agents for truthful data elicitation [4, 5, 6, 7]. Typically, these tasks have no access to the ground truth; hence, we cannot verify the correctness of the agents' reports. To address this, researchers introduce Peer-Based Mechanisms (PBMs). PBMs reward an agent based on its consistency with other, often randomly chosen, agents referred to as "peers". For instance, the Robust Peer Truth Serum for Crowdsourcing (RPTSC) [8] evaluates an agent's report against a randomly paired peer's report and rewards it if they match. With this, RPTSC incentivizes agents to exert effort and report truthfully.

This section describes our crowdsourcing model and the relevant game-theoretic definitions, and introduces our novel fairness notions. We tabulate the notations used in this paper in the supplementary material for reference. The requester of the crowdsourcing system publishes a set of n tasks T = {τ_1, τ_2, ..., τ_n} along with their deadlines {δ_{τ_1}, δ_{τ_2}, ..., δ_{τ_n}} on the platform every round. These tasks have a discrete and finite answer space X. Tasks in this setting are statistically independent and a priori similar [17]. The requester assigns tasks to available agents A = {a_1, a_2, ..., a_m}, and each task in the set T is assigned to at least two agents. We consider that each agent performs a single task in a round for ease of discussion.
Whenever an agent is assigned more than one task, we apply our framework to each task the agent solves separately. Agent a_i is rewarded R_i(y_i, t_i) based on its report y_i ∈ X and the time of report, t_i. We assume that the agents in the system are rational and intelligent. In our setting, the agents solve the assigned task by exerting either high (e_H) or low (e_L) effort. Naturally, the cost of exerting high effort is greater, i.e., c(e_H) > c(e_L). We consider any effort insufficient to solve the task as low. If agent a_i does its reasonable best to solve the task, i.e., exerts high effort, it obtains an evaluation x_i. Based on its effort and reporting behavior, an agent a_i chooses between the following two strategies.

1. Trustworthy Strategy. Exert high effort (e_i = e_H) and report the true evaluation (y_i = x_i) at time t_i*, the time taken to solve the task. We refer to an agent choosing this strategy as a Trustworthy Agent (TA).
2. Random Strategy. Exert low effort (e_i = e_L) and report an answer drawn from its prior distribution at any time. We refer to this category of agents as Random Agents (RAs).

In PBMs, the agents' reports are evaluated against their peers' reports. Depending on its reward scheme, a PBM rewards an agent a_i with report y_i via the function peer-fac(y_i). Agent a_i is rewarded when its report matches that of its peer a_p, i.e., y_i = y_p, and is penalized otherwise. For instance, in PTS the reward is

$$\text{peer-fac}(y_i) = \frac{\mathbb{I}_{y_i = y_p}}{f(y_i)}.$$

Here, I_{y_i = y_p} is an indicator variable (equal to 1 if y_i = y_p and 0 otherwise), and f(y_i) is the frequency function that counts the number of occurrences of report y_i. As we consider a temporal setting, we want the agents to submit their reports at the earliest. One may note that, for any mechanism to incentivize early reporting, the reward should decrease as the time taken to submit the report increases. For instance, crowdfunding mechanisms are shown to incentivize early contributions through "refunds" that decay with time [22, 23, 24]. To incorporate this, we employ a decay factor β(t) in the reward that decays with time. In our setting, the reward comprises (i) the peer factor peer-fac(·) and (ii) the decay factor β(·). In summary, the reward agent a_i gets for reporting y_i for a task τ after time t_i ≤ δ_τ (reports submitted after the task deadline are not considered) is

$$R_i(y_i, t_i) = \text{peer-fac}(y_i) \cdot \beta(t_i). \qquad (1)$$

Note that this incentive structure induces a game among the interested agents. To study the induced game, we now state the game-theoretic definitions used in the paper. For this, we define the utility of agent a_i as

$$u_i = R_i(y_i, t_i) - c(e_i).$$

Further, let s = (s_1, s_2, ..., s_m) be the strategy profile such that agent a_i's strategy s_i = (y_i, e_i, t_i) is a tuple consisting of its report, the effort exerted, and the time taken to report. We use the subscript −i for the strategy profile without agent a_i. Agent a_i's utility when all the agents play the strategy profile s = (s_i, s_{−i}) is u_i(s_i, s_{−i}). Additionally, let t_i* be the time taken by agent a_i to solve the task. We denote by s* = (s_i*)_{∀i}, with s_i* = (x_i, e_H, t_i*), the strategy profile in which all agents choose the trustworthy strategy. And let S = {(y_i, e_i, t_i) | ∀y_i ∈ X, e_i ∈ {e_H, e_L}, ∀t_i} be the strategy space. With this, we give the following definition.

Definition 1 (Nash Incentive Compatible (NIC)). A mechanism is said to be NIC if every agent a_i employing the trustworthy strategy maximizes its utility given that all the other agents choose the trustworthy strategy, i.e.,

$$u_i(s_i^*, s_{-i}^*) \ \ge\ u_i(s_i, s_{-i}^*) \quad \forall s_i \in S,\ \forall i.$$

Note. We say that NIC is strict if the above inequality is strict.
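To make this reward structure concrete, below is a minimal Python sketch of Eq. (1): a PTS-style peer factor (the match indicator divided by the empirical frequency of the report, as defined above) multiplied by a decay factor. The exponential form of β(t) and all function names are illustrative assumptions; the paper only requires β(·) to decrease with time.

```python
import math
from collections import Counter

def peer_fac_pts(report, peer_report, all_reports):
    """PTS-style peer factor: match indicator scaled by 1 / f(report), where f is
    the empirical (relative) frequency of the report among the given reports."""
    freq = Counter(all_reports)
    f_y = freq[report] / len(all_reports)          # empirical frequency f(y_i)
    return (1.0 / f_y) if report == peer_report else 0.0

def decay(t, lam=0.1):
    """An assumed exponential decay factor beta(t); any decreasing function works."""
    return math.exp(-lam * t)

def reward(report, peer_report, all_reports, t, deadline):
    """Eq. (1): R_i(y_i, t_i) = peer-fac(y_i) * beta(t_i); late reports earn nothing."""
    if t > deadline:
        return 0.0
    return peer_fac_pts(report, peer_report, all_reports) * decay(t)

# Example: an early matching report earns more than the same report submitted later.
reports = ["a", "a", "b", "a", "c"]
print(reward("a", "a", reports, t=1.0, deadline=10))   # early match
print(reward("a", "a", reports, t=8.0, deadline=10))   # later match, smaller reward
print(reward("b", "a", reports, t=1.0, deadline=10))   # mismatch -> 0
```

Running the example shows the temporal incentive at work: the same matching report is worth less when submitted later, while a mismatched report earns nothing under the PTS-style peer factor.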
As stated, PBMs in the existing literature evaluate agents against a randomly selected peer and reward them based on their reports. One may observe that such evaluations can result in unfair rewards because a trustworthy agent may get paired with a randomly selected agent. Even when a PBM incentivizes agents to exert effort and report truthfully, if the agent's peer receives a noisy observation or is malicious, the agent who observed the correct signal can still be rewarded unfairly. As a result, a trustworthy agent may incur an unfair penalty. The lack of fairness in PBMs, i.e., trustworthy agents incurring penalties, was first observed in [25]. To quantify fairness, we now present two novel notions of fairness relevant to PBMs.

The first notion depends on the difference between the optimal and expected rewards of trustworthy agents in a PBM. We believe that this is the first general notion for quantifying fairness in PBMs. For any PBM, let M* be the optimal reward a trustworthy agent gets when its report matches a peer's report, and let E* be its expected reward. With this, we define γ-fairness as follows.

Definition 2 (γ-Fairness). For a trustworthy agent, we say a PBM is γ-Fair if the expected difference between its optimal and expected rewards is equal to 1/γ, that is,

$$\mathbb{E}\left[M^* - E^*\right] = \frac{1}{\gamma}.$$

In mechanisms that deploy reputation scores, it is desirable to prioritize agents with a better reputation over agents with a lesser reputation. A report from an agent who promptly submits the truth is always more valuable than a report from an agent with an arbitrary history of reporting. Similarly, in PBMs with reputation scores, an agent with a higher reputation should have a higher expected reward than agents with the same report but a lesser reputation. We capture this desired property with a new notion of fairness, namely Qualitative Fairness.

Definition 3 (Qualitative Fairness). Let agents a_i, a_j ∈ A submit identical reports y_i = y_j = y at the same time t. We say a PBM guarantees qualitative fairness if its rewards satisfy

$$\Omega_i \ge \Omega_j \implies \mathbb{E}[R_i(y_i = y, t) \mid \Omega_i] \ \ge\ \mathbb{E}[R_j(y_j = y, t) \mid \Omega_j].$$

Here, E[R_i(y_i = y, t) | Ω_i] is the expected reward of agent a_i with reputation score Ω_i for reporting y at time t.

To improve fairness in PBMs and satisfy the given notions, we consider and analyze two different approaches.

Approach 1. A straightforward way to overcome unfairness in PBMs is to use the average (or a weighted average) over multiple reports, say w reports, to reward a particular report. This reduces, to some extent, the penalty a trustworthy agent incurs from unfair pairings. However, it also assures higher expected rewards for random agents, increasing the overall budget, which is not desirable. Thus, we aim for a reward scheme that guarantees better fairness for trustworthy agents while discouraging random reporting. Towards this, we present our approach.

Approach 2. We provide agents with additional chances of pairing, say k, to evaluate their reports when they are paired with a less reputed agent. Consider a trustworthy agent who reported y for some task. Let the optimal reward it obtains when its report matches its peer's report (ignoring the decay factor β(·)) be g = peer-fac(y | y_p = y), and let the penalty obtained when the reports do not match be l = peer-fac(y | y_p ≠ y).

Expected reward with Approach 1: Let us consider that the probability of an agent's report matching its random peer's report is 1/2.
Now, averaging the reward across w such reports, the expected reward of an agent is

$$R\text{-}1 = \frac{1}{2}\, g + \frac{1}{2}\, l.$$

Expected reward with Approach 2: For our approach, consider the least number of additional chances of pairing, i.e., k = 2. The agent is given another chance when its report does not match and its reputation score is higher than its peer's in the first pairing. Here, we assume that an agent's reputation is higher than its peer's with probability r, 0 ≤ r ≤ 1. Hence, the expected reward is

$$R\text{-}2 = \frac{1}{2}\, g + \frac{1}{2}\left[ r \left( \frac{1}{2}\, g + \frac{1}{2}\, l \right) + (1 - r)\, l \right].$$

As we see, the expected reward R-2 is greater than R-1 and is closer to the optimal reward g. Hence, we adopt Approach 2, which guarantees better fairness than Approach 1.

Remark. The ingenuity of our approach is to give trustworthy agents additional chances of pairing to evaluate their reports, which reduces the possibility of agents being penalized due to unfair pairings. This decrease in unfair penalties leads to higher expected rewards, further improving fairness. However, we must ensure that this process does not increase the expected rewards of agents with random strategies due to the additional matching.

To decide which agent receives additional chances to pair, we use reputation scores as a metric. Research has shown that reputation scores successfully quantify the trust a crowdsourcing system should place in an individual agent based on its history. However, no reputation model in the existing literature factors in the time taken to submit the report. Consequently, we introduce our reputation model, TERM, to quantify trustworthiness in the temporal setting in the next section.

In crowdsourcing systems, mechanisms use an agent's reputation score as a measure of the trust the system places in its submitted reports. The system builds up this trust gradually over several instances of trustworthy behavior and diminishes it relatively quickly if it observes adversarial behavior. This logic applies to any trust-based system such as Amazon Mechanical Turk [26], Crowdflower [27], etc. Typically, a reputation score is required to satisfy the following:

1. It builds trust in the agent gradually with honest behavior.
2. To incorporate the temporal setting, the increase in score should be inversely proportional to the time taken to report.
3. The score's growth should slow as it approaches the extreme and should not exceed the maximum allowed score.

In general, existing reputation models in the crowdsourcing literature only factor in the reports submitted by the agents [12, 28, 2]. Towards this, we propose the Temporal Reputation Model (TERM), which assigns TERM scores to agents considering both the accuracy of the report and the time taken to submit it. For TERM to satisfy the properties mentioned above, we use the Gompertz function [12], whose variation is gradual and smooth and is well suited for the model. The Gompertz function is a particular case of the sigmoid function in which growth at the start and the end is slow. Several crowdsourcing mechanisms deploy this function to measure trust [12, 3, 29]. In TERM, we maintain the agents' history.

History (H). We maintain a history H of all the scores of every agent in each round. Let H_{i,j} denote the collection of pairs (Ω_{i,k}, |φ|_{i,k}) for rounds r_1 through r_j, where Ω_{i,j} is the TERM score and |φ|_{i,j} is the normalized round-score obtained for the report submitted in round r_j. Algorithm 1 presents TERM; here, the requester maintains the frequency f(y_i) of the report y_i. TERM calculates the normalized round-scores of agents from the reports submitted and the time taken to report (Lines 5-6).
The cumulative-score calculation uses all the obtained normalized round scores until the latest round (Line 7). We take cumulative-score as input to the Gompertz function, whose output is the TERM score (Line 8). We now formally define TERM and give its properties. Let Ω i,j be the TERM score an agent a i obtains after round r j . We define TERM score as, where, G(·) is the Gompertz function with parameters a ∈ R controls the asymptote, b ∈ R − sets the displacement, and c ∈ R − controls the growth rate of the curve. We set a = 1, b = −1, c = −1/2 for a smooth growth of TERM scores. Here, the input to Gompertz function is cummulative score ψ i,j (defined in Eq. 4) 1 Agent a i submits report y i for a task τ in round r j at time t i . 2 Input: Report y i , Time taken t i , History H i,j−1 3 Output: Updated TERM score Ω i,j 4 Randomly choose a report y p of agent a p from the same task τ . 2 )) ; TERM score calculation Further, φ i,j is round-score obtained by agent a i for submitting y i in the round r i after time t i , defined as, Where y p is the random peer's report chosen from the same task. We map all the round-scores to [−1, 1] as follows, Here, min(φ j ) and max(φ j ) denote that minimum and maximum round-scores in the round r j , respectively. We then calculate cumulative-score ψ i,j of each agent a i by taking all the normalized round-scores (|φ| i,j ) it obtained till the latest round, as follows, Trivially, λ (j−k) gradually reduces the impact of previous round-scores. As desired, TERM score is an aggregate of all the submissions made by an agent and accounts for the fact that recent submissions are more relevant. Notice that Eq. TERM used in the reputation model gradually increases with early reporting but reduces relatively fast with random reporting when the reports do not match. With this, one can observe that trustworthy reporting benefits the agents over random reporting. Hence, agents cannot manipulate their TERM score. We now consider a collusive strategy wherein all agents collude to submit the same report, i.e., single report strategy. We prove that TERM score is resistant to such a strategy. Lemma 1. TERM is resistant to single report strategy. In the next section, we introduce our novel iterative framework REFORM for crowdsourcing which uses TERM as a key component. We now present REFORM a novel iterative framework for crowdsourcing, based on Approach 2 (Section 3.4). Intuitively, REFORM incentivizes an agent to report truthfully by improving the expected reward of trustworthy agents. We achieve this increase in the expected reward by ingeniously offering reputed agents additional chance(s) of pairing. Framework 1 formally presents REFORM. Observe that in Framework 1, peer -fac(·) may be the reward scheme of any existing PBM. In REFORM, based on the selection of the reward scheme, we evaluate an agent's report against the randomly chosen peer's report from the same task and reward the agent if the reports match (Lines 7-8). For temporal setting, we use TERM scores to decide whether to offer additional chance(s) of pairing to an agent. More concretely, if REFORM Framework 2: REFORM 1 Agent a i submits a report y i for an assigned task τ at time t i ≤ δ τ in round r j . Input: peer -fac(·), k > 1, y i , t i , History H i,j−1 2 initialization: l = 0 3 while l < k do 4 Randomly choose peer report y p from the same task τ . 
if l = 1 then if y i = y p then /* reports match, agent gets optimal reward */ if Ω i,j ≤ Ω p,j ∨ l = k then /* reputation score is less or maximum chances reached, no more pairing */ Return: an agent's submitted report does not match with its peer's report, and if the agent has a TERM score higher than that of its peer without having reached the maximum number of chances k, we give it another chance to pair. Otherwise, we penalize according to the reward scheme adopted (Lines 10-11). Note. We can plug any relevant reputation model instead of TERM in REFORM, and it still helps to improve the fairness of a PBM. As we focus on crowdsourcing in temporal setting, we employ our novel reputation model TERM for the same. We next demonstrate REFORM's significance by deploying the framework over RPTSC's reward scheme. We first present the agent beliefs required in REFORM with RPTSC reward and its properties relevant to our work. Later we demonstrate its game-theoretic and fairness properties. We focus on RPTSC [8] than other PBMs because of its practicality. Briefly, RPTSC is desirable over other PBMs as it (i) does not assume any prior, (ii) incentivizes efforts and truthful reporting, (iii) uses a surprisingly common rule to reward, and (iv) is resistant to single report strategy. The mechanism evaluates agents' reports against a randomly chosen peer's report, and each report y i is rewarded as follows: where, α > 0 is a scalar constant used to tweak the reward budget. Further, I yi=yp is an indicator variable and f (y i ) is a frequency function, as defined previously. In our setting, every agent has their own private belief about other agents' reports and reputations. Thus, the agent's estimate of its expected reward depends on its belief. We identify the distributions, P (·), Q(·), and T (·), as agents' beliefs about evaluations, reports, and reputation scores, respectively; as defined next. The prior belief P p (x p ) denotes the probability with which agent a p 's evaluation is x p . Consider a p as another agent who solves the same task. Then, P p|i (x p |x i ) is a i 's posterior belief about agent a p 's evaluation being x p when its evaluation is x i . We assume that all the agents' beliefs are fully mixed, i.e., they believe that every possible event has a minimum probability (strictly > 0) of occurrence. Mathematically, for all the agents, ∀x i , x p ∈ X : 0 < P p (x p ), P p|i (x p |x i ) < 1. Since the tasks are statistically independent, if agent a i has not solved task τ k , it has no evaluation for that task, i.e., x i = ∅. Therefore, agent a i 's posterior belief about the evaluation of agent a q for task τ k is P q|i (x q |x i ) = P q|i (x q |∅) = P q (x q )∀x ∈ X . That is, the same as its prior belief. To decide its best strategy, agent a i estimates its expected reward for reporting y i based on its beliefs about the other agents' reports. For this, it transforms its beliefs about others' evaluations into beliefs about reports; (P p , P p|i ) → (Q p , Q p|i ). The posterior belief Q p|i (y p |x i ) is the probability that agent a p reports y p when agent a i 's evaluation is x i . As random agents do not evaluate the task; they do not have posterior beliefs. They report according to their prior distribution. Formally, if all the agents choose random strategy, i.e., their evaluations are ∅, we have Q p (y) = Q q (y) = P p (y) and Q p|i (y|∅) = P p|i (y|∅) = P p (y)∀y ∈ X . 
Likewise, when all the agents choose trustworthy strategy, i.e., all of them report their true evaluations, their beliefs about evaluations are the same as that of reports, Q p (y) = Q q (y) = P p (y); Q p|i (y) = P p|i (y)∀y ∈ X . Self-predicting Condition. If agent a i exerts high effort (e H ) for the task, to acquire an evaluation x i , it develops a posterior belief P p|i (y i |x i ) regarding the evaluations of peers. We assume that the posterior belief has a positive correlation with agent a i 's evaluation x i . This assumption is the self-predicting condition defined as [8] : If an agent a i 's belief satisfies the self-predicting condition, RPTSC [8] introduces a self-predictor ∆ i as a minimum value in [0, 1] such that the following holds. We redefine ∆ i as the infimum over all possible ∆ i ∈ [0, 1] satisfying: Self-predictor ∆ i characterizes an agent's belief regarding the degree of correlation of its evaluation with others' reports. The smaller the value of ∆ i , greater the belief agent a i has on its evaluation. If ∆ i ≈ 1, the agent a i is more likely to confuse between different answers. In contrast, if ∆ i ≈ 0, the answers do not correlate. We recognize an agent a i 's beliefs regarding the reputation score of agent a p with a distribution T p . T (·) is a distribution that is symmetric and identical. The distribution does not change with a particular peer, and all the agents share the same distribution, that is, Particularly, when the underlying reputation score is TERM (Eq. TERM), we denote T p (Ω i , Ω p ) = P r(Ω i ≥ Ω p ) as the probability that an agent a p has its TERM score less than Ω i . We now summarize the relevant properties of RPTSC required for the analysis of our results. To ease our notations, we use the following set of notations in the remainder of the paper. • p yi := P p (y i ); p yi := P p|i (y i |x i ); p xi := P p (x i ) In RPTSC, the expected reward of agent a i with evaluation x i and report y i is, Algorithm 3: REFORM with RPTSC 1 agent a i submits a report for an assigned task τ in round r j . Input: k > 1, report y i , time taken t i ≤ δ τ , History H i,j Result: Reward, R i (y i , t i ) 2 initialization: l = 1. Randomly sample n − 1 reports, one from each task other than τ 3 while l < k do 4 Randomly choose another report y p from the same task τ . Calculate the frequency of y i from the sampled reports, as f (y i ) = num(yi) Let M be the optimal reward of a trustworthy agent a i in RPTSC (i.e., if its report matches its peer's report, q yi = 1 in Prop. 1). This implies that, In RPTSC, the expected reward of an agent a i before evaluation when all the other agents choose trustworthy strategy is, from [8, Eq. 8], Trustworthy strategy is strict Nash Equilibrium in RPTSC, if all the agents a i , answers space X , scaling factor α and number of tasks n satisfy: To simplify the proofs, we use Assumption B 1 , which is obtained from Assumption B by putting n := n + 1. We now present the essential properties and results for REFORM with RPTSC reward scheme. Crucially, our results do not require any further assumptions, over the standard RPTSC assumptions, i.e., A and B 1 . Due to space constraints, we omit the formal proofs of the results. The corresponding proofs are available in the supplementary material. Within RPTSC assumptions, TERM scores are high for truthful and early reporting. Observe that TERM scores increase with cumulative-score, which aggregates all the normalized round scores. 
Hence, an increase in round-score increases the TERM score. Having the self-predicting condition Eq. 7, round-scores are always higher for truthful reports. With this, we give the following lemma. Lemma 2. Given all the other agents choose trustworthy strategy, TERM incentivizes an agent to report truthfully and early under Assumption B 1 (Eq. 10). Having TERM's incentive properties, we now provide the expected reward of RPTSC in REFORM framework. Lemma 3. In REFORM with RPTSC, the expected reward of agent a i with evaluation x i and report y i at time t i is, Where k = 2 and n is the number of tasks. Proof. (SKETCH). Trivially, if q yi = 0 the expected reward is 0. For q yi = 0, we calculate the expected reward as follows: Observe that when k = 1, REFORM with RPTSC has similar reward as RPTSC, and hence, the expected reward for k = 1 is the decay factor times the expected reward in RPTSC, i.e., β(t i ) × E (refer Prop. 1). When k = 2, if the agent's TERM score is less than that of its peers', there is no additional chance of pairing; therefore, the expected reward is the same as β(t i ) × E . However, if the agent's TERM score is high, the agent gets an additional chance. In this case, (i) If reports match in the first pairing, the agent gets a reward which is the decay factor times the optimal reward M (Eq. 8), i.e., β(t i ) × M . (ii) Otherwise, the agent can get an additional chance, and the reward is β(t i ) × E . All the cases mentioned above, calculated with their probabilities of occurrence, give the expected reward an agent obtains in REFORM framework with RPTSC reward. Note that the expected reward in REFORM with RPTSC is greater than RPTSC's expected reward (Prop. 1). This shows the desirability of REFORM i.e., the agents with higher TERM scores (i.e., TAs) are benefited significantly more in our framework. Naturally, the increase in the expected reward will increase as the number of chances for pairing. We formalize this intuitive result in Corollary 1. Corollary 1. In REFORM, the expected reward increases with an increase in additional chances, k. We now prove that in REFORM with RPTSC's reward, it is strict Nash equilibrium for every agent to choose trustworthy strategy. To report truthfully, each agent must first exert high efforts. To incentivize agents to exert efforts, their expected utility must be strictly greater than randomly reporting (by investing low efforts). Considering all the agents are trustworthy (i.e., q yi = p xi , q yi = p xi ), agent a i 's expected reward for reporting at time t i , before its evaluation is, We provide a random agent's expected reward, E ra , in Lemma 4. We then prove that REFORM with RPTSC incentivizes agents to exert efforts by showing that the expected utility of an agent for exerting high efforts is strictly greater than random reporting (Lemma 5). Lemma 4. In REFORM with RPTSC, the expected reward of a random agent a i with report y i at time t i when all the other agents choose trustworthy strategy is, To ensure that the agents who exert efforts are accordingly compensated, we choose the scalar constant α in RPTSC's reward such that it absorbs the delay caused due to efforts. For agent a i who reported at time t i , α should satisfy: 3 REFORM Lemma 5. In REFORM with RPTSC, an agent is incentivized to exert high efforts given all the other agents choose trustworthy strategy under Assumption A 1 (Eq. 12). Proof. 
(SKETCH) We know that the expected utility of a trustworthy agent a i for exerting efforts, before evaluation of tasks is R i (α) − c(e H ) (from Eq. 11). Moreover, the expected utility of a random agent is E ra − c(e L ) (from Lemma 4). Now from Assumption A 1 (Eq. 12), we have . The proof then follows by showing that E ra < αβ(t i ). This implies R i (α) − c(e H ) ≥ E ra − c(e L ), that is, the expected utility of an agent for exerting efforts is greater than reporting randomly. From Lemma 5, one can note that random agents are at a disadvantage. What is more, TERM score for agents who report randomly drops significantly, implying that they do not receive additional chances. Hence, even when an agent a i reports randomly at a time t i → 0 , it does not get a better reward than trustworthy agents. Lemma 6. In REFORM with RPTSC, an agent is incentivized to report truthfully when all the other agents choose trustworthy strategy. Assuming that the agents' beliefs satisfy: 4 Proof. (SKETCH) For the proof, we first assume that the expected reward of an agent a i reporting truthfully is greater than any other strategy, i.e., . Using Eqs. 6 and 13, we get p xi − p xi > −r. We know that p xi − p xi ≥ 0 (from the self-predicting condition) and −r ≤ 0. These impliy that B 2 always holds. Thus, REFORM with RPTSC reward scheme incentivizes truthful reporting. We conclude our game-theoretic analysis by proving that REFORM with RPTSC is NIC. Theorem 1. REFORM with RPTSC is strict Nash incentive compatible under assumptions A 1 (Eq. 12) and B 2 (Eq. 13). Proof. (SKETCH) From Lemmas 5 and 6, we see that REFORM with RPTSC reward incentivizes high efforts and truthful reporting. Moreover, we see that the expected reward E[R i (y i |x i ); k = 2] (Lemma 3) is proportional to decay factor β(·) which decreases with an increase in time taken for reporting. Thus, we show that exerting efforts and early truthful reporting, i.e., trustworthy strategy is a strict Nash Equilibrium in REFORM with RPTSC. Discussion. With Lemma 1, we show that TERM is resistant to single report strategy. Further, RPTSC reward is also resistant to this strategy [8, Section 4.4] . One can observe that the reward in Eq. 5 is zero in the case of single report strategy. Hence, for any appropriately chosen scalar constant α, REFORM with RPTSC reward is also resistant to single report strategy. In the next section, we show that REFORM with RPTSC reward guarantees significantly better fairness than RPTSC and satisfies qualitative fairness. Consider the following propositions using our novel fairness definition γ-fairness (Definition 2). Proof. (SKETCH) In RPTSC, the expected reward differs from optimal reward when reports do not match. That is, with probability (1 − q xi ) it does not produce an optimal reward. Proposition 4. For any a i ∈ A, REFORM with RPTSC is γ-fair with 1 Proof. (SKETCH) The reward in REFORM with RPTSC, differs from optimal reward with smaller probability, i.e., (1 − q xi )(1 − rq xi ) because of the additional chance(s) given to a trustworthy agent. From the Props. 3 and 4, we observe that γ in REFORM with RPTSC is greater than that in RPTSC. This implies that employing REFORM guarantees an expected reward closer to the optimal reward. Thus, REFORM with RPTSC is fairer for trustworthy agents compared to RPTSC. To show that REFORM with RPTSC is qualitatively fair, we use TERM scores of the agents for reputation scores in Definition 3. Theorem 2. REFORM with RPTSC satisfies qualitative fairness. Proof. 
(SKETCH) Note that, REFORM provides additional chances of pairing to the agents with high TERM scores. This reduces their chances of getting penalized from unfair pairing, leading to increased expected rewards. Thus, expected rewards are proportional to TERM scores, satisfying qualitative fairness. So far, we theoretically show that REFORM improves fairness in PBMs through its solution of providing additional chance(s) to agents. We next validate our results through synthetic simulations. In order to evaluate the performance of REFORM with RPTSC reward, we simulate our crowdsourcing model. Our main objective is to empirically observe the fairness improvements of RPTSC with REFORM; thus, we neglect the decay factor used in the reward Eq. 1. We consider our setting with sufficient homogeneous tasks and answer space X = {0, 1, 2}. To satisfy Assumptions A (Eq. 2) and A 1 (Eq. 12), we set α as 10 (for RPTSC) and 11 (for REFORM). We run our simulations for 200 rounds, where each round has n = 50 tasks and populated with m = 750 agents. We assume that the set of agents comprise (i) 60% of agents that follow trustworthy strategy (TAs) and report correct answers with a probability of at least 0.9; and (ii) 40% of agents that report randomly (RAs) and that report every possible answer with equal probability. In each round, agents report their answers for a single task. We evaluate the agents against a randomly paired peer from the same task. In both the mechanisms, the sample size chosen for reward computation is equal to the number of tasks (i.e., 50). We compare rewards for TAs and RAs by averaging over all the rounds. Here, N-REFORM and N-RPTSC are independent of α. Observe that with an increase in k, N-REFORM tends towards 1; that is, TA's reward in REFORM tends to the optimal reward. Moreover, TA's reward in REFORM is significantly higher than RPTSC, whereas the RA's rewards are almost the same. Thus, one can note that REFORM guarantees better rewards than RPTSC for TAs. We also observe γ to be 0.09 and 0.05 for REFORM and RPTSC, respectively after 200 rounds, which further highlight that REFORM with RPTSC is fairer compared to RPSTC w.r.t. γ-fairness. Since REFORM produces higher rewards for TAs, one may observe that this increases the requester's budget. However, our experiments show that this increase is marginal. E.g., for k = 2, REFORMś budget per agent is approximately 5% higher than RPTSC. Thus, REFORM ensures better fairness for a moderate increase in the budget. In this paper, we focused on designing a fair reward scheme for crowdsourcing while incorporating temporal settings. Towards this, we introduced two notions of fairness, namely γ-fairness and quantitative fairness. To address the persistent issue of fairness in PBMs, we provided additional chances of pairing for trustworthy agents. To quantify an agent's trustworthiness, we introduced the reputation model, TERM, and proved that it provides a high score for trustworthy reporting (Lemma 2). We proposed REFORM a novel iterative framework that takes the reward scheme of any existing PBMs as a plug-in along with the reputation model, TERM. We proved that REFORM with RPTSC is strict Nash incentive compatible (Theorem 1) and is resistant to single report strategy. We demonstrated that the framework REFORM improves the fairness of RPTSC. We established that REFORM with RPTSC achieves fairness with a marginal increase in the budget with the experiments. 
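To complement the experimental discussion above, here is a small, self-contained Python sketch of the kind of synthetic simulation described: trustworthy agents report the correct answer with probability 0.9, random agents report uniformly over X = {0, 1, 2}, and each report is paid an RPTSC-style reward against a random peer, with and without one REFORM-style additional pairing chance for agents whose (here crudely proxied) reputation exceeds their peer's. The population size, the reputation proxy, the frequency sampling, and α are simplifying assumptions for illustration; this is not the authors' experimental code.

```python
import random
from collections import Counter

random.seed(0)
ANSWERS = [0, 1, 2]
ALPHA = 10              # reward scale (assumed; the paper tunes alpha per mechanism)

def rptsc_reward(y, y_peer, sampled_reports):
    """RPTSC-style reward: alpha * (1{y == y_peer} / f(y) - 1), where f is the
    empirical frequency of y among sampled reports (simplified: same-task reports)."""
    freq = Counter(sampled_reports)
    f_y = max(freq[y], 1) / len(sampled_reports)
    return ALPHA * ((1.0 / f_y if y == y_peer else 0.0) - 1.0)

def simulate(num_agents=300, trustworthy_frac=0.6, truth=0, use_reform=True, k=2):
    # Trustworthy agents report the truth w.p. 0.9; random agents report uniformly.
    agents = []
    for i in range(num_agents):
        trustworthy = i < trustworthy_frac * num_agents
        if trustworthy:
            y = truth if random.random() < 0.9 else random.choice(ANSWERS)
        else:
            y = random.choice(ANSWERS)
        rep = 0.9 if trustworthy else 0.3      # crude stand-in for a TERM score
        agents.append({"trustworthy": trustworthy, "report": y, "rep": rep})

    sampled = [a["report"] for a in agents]    # proxy for reports sampled for f(.)
    ta_rewards, ra_rewards = [], []
    for a in agents:
        reward = None
        for _chance in range(k if use_reform else 1):
            peer = random.choice([b for b in agents if b is not a])
            reward = rptsc_reward(a["report"], peer["report"], sampled)
            if a["report"] == peer["report"] or a["rep"] <= peer["rep"]:
                break                          # matched, or not reputed enough to re-pair
        (ta_rewards if a["trustworthy"] else ra_rewards).append(reward)
    return sum(ta_rewards) / len(ta_rewards), sum(ra_rewards) / len(ra_rewards)

print("RPTSC-style  (avg TA, avg RA):", simulate(use_reform=False))
print("REFORM-style (avg TA, avg RA):", simulate(use_reform=True))
```

With such a setup, the average reward of trustworthy agents under the REFORM-style pairing is typically higher than under plain RPTSC pairing, while random agents' rewards barely change, mirroring the trend reported in the experiments above.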
Symbol Description T Set of tasks published in a round A Set of agents in the system X Answer space x i Evaluation of agent a i y i Report submitted by agent a i t i Time taken by agent a i to report e i Effort an agent a i exerted for a task c(e L ), c(e H ) Cost of low and high effort Reward agent a i gets for reporting y i after time t i P p (x p ) or p xp Probability that a random peer a p has evaluation equal to x p P p|i (x p |x i ) or p xp Probability that a random peer a p has evaluation x p when a i 's evaluation is Probability that a random peer a p has report equal to x p Q p|i (x p |x i ) or q xp Probability that a random peer a p has report x p when a i 's evaluation is x i Ω i,j TERM score of agent a i in the round r j H i,j History of the agent a i till the round r j φ i,j Round-score obtained by the agent a i for submitting the report in round r j T p (Ω i , Ω p ) or r Probability that a random peer a p has TERM score less than Ω i Proof. We observe that in TERM, the round-score φ i,j = Iy i =yp f (yi)ti directly depends on the report submitted y i and time taken t i by the agent a i . Consider single report strategy where all the agents report the same answer y i = y. In this case, f (y i ) = 1 and the expected round-score of the agent a i is, Suppose, out of m agents in a round, l agents report their evaluation x, and others report y. The expected round-score of agent a i who report x in a non-colluding (trustworthy) strategy is, We see that the expected round-score in colluding strategy, CS, is equal to a trustworthy strategy, TS. Therefore, any rational agent prefers to choose a trustworthy strategy, as it does not benefit from single report strategy. Thus, we claim that TERM is resistant to single report strategy. Proof. To prove the lemma, we show that TERM produces high scores for reporting truth early after exerting efforts, considering that all other agents are trustworthy. Agents exerting effort is assured by Lemma 5; this overcomes random reporting. We have seen that, TERM score obtained by agent a i is Ω i,j = G(ψ i,j ) in round r j . Trivially, TERM score increases with an increase in cumulative-score, which aggregates all the normalized round-scores. Hence, an increase in round-score increases the TERM score. Assuming two agents with identical round-scores in previous rounds, a difference in round-score of the present round will show a difference in their TERM score. From Eq. 3, we calculate the round-score as φ i,j = Iy i =yp f (yi)ti . Here, f (y i ) is the frequency function of y i , calculated as the ratio of the number of reports (say, b + 1) that match with report y i to the total number of sampled reports (say, n). That is, f (y i ) = num(yi) y∈X num(y) = b+1 n . Further, y p is the random report sampled from the same task. We have seen that a trustworthy agent's strategy is (x i , e H , t * i ), where t * i is time taken for solving the task. Round-score of a agent who solves the task, but does not report truth (i.e., with strategy (y i , e H , t i ), where t i ≥ t * i ) is, From the first inequality, we observe that the round-score of an agent when it reports truth (i.e., its evaluation x i ) is greater. And it is evident that the round-score increases with early reporting. Hence, TERM incentivizes early as well as truthful reporting. Proof. Observe that when q yi = 0 (i.e., the probability with which agent a i 's peer reports y i is 0), the expected reward of agent a i is 0. 
Now, consider agent a i with q yi > 0 and evaluation x i , reports y i after time t i . From Proposition 1, the expected reward of an agent a i for reporting y i in RPTSC is E . The expected reward, E[R i (y i |x i ); k = 2], is calculated as follows. For this, let TERM score of the agent a i in round r j be Ω i . Let agent a p with report y p and TERM score Ω p be the peer against whom a i is evaluated. As seen before, agent a i 's belief regarding the TERM scores of the any peer is same, i.e., ∀a p ∈ A, r = T p (Ω i , Ω p ). From Algorithm 3, if the TERM score Ω i of agent a i is less than Ω p , in the first chance of pairing, then agent a i does not get an additional chance to pair. In this case, the expected reward is the same as RPTSC expected reward times the decay factor, i.e., β(t i ) × E . However, if a i 's TERM score is higher than that of its peer's, then since k = 2, it gets another chance to pair. In this case, we have: (i) if y i = y p the reward is equal to optimal reward times with the decay factor, i.e., β(t i ) × M ; and (ii) if y i = y p the expected reward is equal to β(t i ) × E , as it receives an additional chance. Formally, we have, This completes the proof of the lemma. Proof. Similar to the proof given for Lemma 3, the expected reward of an agent a i with evaluation x i and report y i in REFORM with RPTSC is, From the above, we see that every term is positive, and with an increase in k expected reward increases. This proves the lemma. Proof. Note that a random agent does not exert efforts for a task, i.e., its evaluation for the task is ∅. Therefore, p yi = p yi . Now the expected reward of a random agent, when all agents are trustworthy (From Lemma 3), is as follows, The equality proves the lemma. Proof. Before evaluation of the task, agent a i 's expected utility for investing high efforts is Ref i (α) − c(e H ) and its expected utility when it reports randomly is E ra − c(e L ). We show that Ref i (α) − c(e H ) > E ra − c(e L ), to prove that REFORM incentivizes high efforts. From Lemma 4, we have the expected reward of random agent: The last inequality is because yi p yi = 1, and 0 < r(1 − p yi ) 1 − (1 − p yi ) n−1 < 1 ∀y i ∈ X . Then, From the above, we see that REFORM is α xi∈X (1 − rq xi )(1 − q xi ) 1 − (1 − q xi ) n−1 -fair. Proof. The expected reward Lemma 3 of a agent a i for reporting y i is Where r is the probability with which the peer's TERM score is less than that of agent a i 's score. Consider two agents a 1 , a 2 who reported same answers, with TERM scores Ω 1 , Ω 2 (where, Ω 1 < Ω 2 ) respectively. The beliefs about other agents having lesser TERM scores is given by T p , where T p (Ω, Ω p ) is the probability with which peer a p 's TERM score is less than Ω. Since, Assuming that both the agents have the same beliefs. The expected reward for the agents a 1 and a 2 are given as, We see that the agent's expected reward with a greater reputation is higher than the agent's expected reward with a lower reputation having the same report. Therefore, REFORM with RPTSC is qualitatively fair. 
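As a quick numerical check of the monotonicity used in this proof, here is a minimal Python sketch that evaluates the k = 2 expected reward following the proof sketch of Lemma 3 (with the decay factor set to 1) for increasing values of r, the probability that the agent's TERM score exceeds its peer's. The specific values of M, E', and the match probability q are illustrative assumptions, not taken from the paper.

```python
def expected_reward_k2(M, E_prime, q, r, beta=1.0):
    """Expected reward in REFORM with RPTSC for k = 2, per the Lemma 3 sketch:
    with prob. (1 - r) there is no extra chance (reward E'); with prob. r, a
    first-pairing match (prob. q) pays the optimal reward M, and otherwise the
    extra chance again yields E' in expectation."""
    return beta * ((1 - r) * E_prime + r * (q * M + (1 - q) * E_prime))

# Illustrative values (assumed): optimal reward M, RPTSC expected reward E' < M,
# and match probability q.
M, E_prime, q = 10.0, 6.0, 0.7
for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"r = {r:.2f} -> expected reward = {expected_reward_k2(M, E_prime, q, r):.2f}")
```

Since M > E', the computed expected reward is non-decreasing in r; two agents submitting the same report therefore earn expected rewards ordered by their reputation, which is exactly the qualitative-fairness property established above.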
Crowd sourced product reviews: A study of evaluation standards used Quality-aware and fine-grained incentive mechanisms for mobile crowdsensing Reputation-aware data fusion and malicious participant detection in mobile crowdsensing Peer prediction with heterogeneous tasks Peer prediction with heterogeneous users International Foundation for Autonomous Agents and Multiagent Systems Personalized peer truth serum for eliciting multi-attribute personal data Incentives for effort in crowdsourcing using the peer truth serum Crowdsourcing real-time traveler information systems Designing a mobile crowdsourcing system for campus safety Crowdsourcing for robust, real-time covid-19 data timely data on the contagion will be crucial to saving millions of lives, say weixing Are you contributing trustworthy data? the case for a reputation system in participatory sensing A bayesian truth serum for subjective data Enhancing reliability using peer consistency evaluation in human computation Peer truth serum: Incentives for crowdsourcing measurements and opinions Informed truthfulness in multi-task peer prediction Dominantly truthful multi-task peer prediction with a constant number of tasks Incentivizing distributive fairness for crowdsourcing workers Fairness-based multi-task reward allocation in mobile crowdsourcing system Deep bayesian trust: A dominant and fair incentive mechanism for crowd Farm: Fair reward mechanism for information aggregation in spontaneous localized settings Designing refund bonus schemes for provision point mechanism in civic crowdfunding Civic crowdfunding for agents with negative valuations and agents with asymmetric beliefs Crowdfunding public projects with provision point: A prediction market approach Incentives for truthful reporting in crowdsourcing An innovative crowdsourcing approach for amazon mechanical turk Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution Reputation-based worker filtering in crowdsourcing Improving data quality with an accumulated reputation model in participatory sensing systems The expected utility of an agent before evaluation for exerting efforts is strictly greater than the expected utility from random reporting. Thus, a random agent is incentivized to exert high efforts in REFORM. Proof. Consider agent a i with evaluation x i and report y i submitted after time t i . We assume that all other agents are trustworthy. For the strategy profile where all the agents are trustworthy, q = p ; q = p.From Lemma 3, the expected reward of agent a i for reporting y i , in REFORM with RPTSC when k = 2 isWe prove that the expected reward of a strategic agent, a i is less when it reports any value other than its evaluation x i . For this, we start with the assumption that the reward for reporting the truth is more than the reward for reporting non-truth and arrive at a noticeably obvious result.We know that p xi − p xi > 0 (From, self-predicting condition) and −r ≤ 0. Thus, the above condition is true, implying the assumption made is true. That is, the expected utility for reporting the truth is strictly more than strategic reporting. Hence, proving the lemma. Proof. From Lemmas 5 and 6, we see that the agent a i under the given assumptions is strictly incentivized to exert efforts and report truthfully. 
We know that the expected reward of an agent a i for reporting the truth, i.e., x i is,One can observe that this expected reward is proportional to β(t i ) and β(·) function increases with decrease in time t i . Thus, in REFORM with RPTSC trustworthy strategy is a strict Nash equilibrium. And hence, REFORM with RPTSC is strict Nash incentive compatible. Proof. From Proposition 1 and Eq. 8, expectation of the difference in optimal reward (M ) and expected reward (E ) for RPTSC over all possible evaluations is,From the above, we see that RPTSC is α xi∈X 1 − q xi 1 − (1 − q xi ) n−1 -fair. Proof. From Lemma 3, we have the expected reward of REFORM with ∀t, β(t) = 1 as, E[R i (x i |x i ); k = 2]