key: cord-0495490-ycc9jtoe
authors: Abouzeid, Ahmed; Granmo, Ole-Christoffer; Webersik, Christian; Goodwin, Morten
title: Socially Fair Mitigation of Misinformation on Social Networks via Constraint Stochastic Optimization
date: 2022-03-23
journal: nan
DOI: nan
sha: c069adafd1669fb9d81c30b088d2cc7005179130
doc_id: 495490
cord_uid: ycc9jtoe

Recent social network misinformation mitigation approaches tend to investigate how to reduce misinformation at a whole-network statistical scale. However, unbalanced misinformation exposures among individuals urge the study of fair allocation of mitigation resources. Moreover, the network has random dynamics which change over time. Therefore, we introduce a stochastic and non-stationary knapsack problem, and we apply its resolution to mitigate misinformation in social network campaigns. We further propose a generic misinformation mitigation algorithm that is robust to different social networks' misinformation statistics, allowing a promising impact in real-world scenarios. A novel loss function ensures fair mitigation among users. We achieve fairness by intelligently allocating a mitigation incentivization budget to the knapsack and optimizing the loss function. To this end, a team of Learning Automata (LA) drives the budget allocation. Each LA is associated with a user and learns to minimize its exposure to misinformation by performing a non-stationary and stochastic walk over its state space. Our results show that our LA-based method is robust and outperforms similar misinformation mitigation methods in how fairly the mitigation influences the network's users.

From a computation perspective, there are many approaches to combat the dissemination of misinformation. Recently, (De Beer and Matthee 2020) illustrated some of the main techniques for classifying misinformation content and how these approaches can be applied in different scenarios. However, classification methods tend to be offline and limited to the particular social network features to be learned, such as linguistics and the local political context (Lazer et al. 2018). Furthermore, such classification models have a potential for False Positive matches, which may violate human rights conventions by misjudging and questioning individuals' credibility and controlling free speech (Özdan 2021). On the other hand, recent work proposed intervention-based resolutions as an online approach to mitigate the circulation of misinformation on social media platforms. Such an approach is considered more convenient since it facilitates better collaboration between humans and technology by providing learned misinformation mitigation strategies instead of black-box classification models. For example, (Farajtabar et al. 2017) proposed a reinforcement learning-based optimization method which provides a strategy to decrease the difference between misinformation and true content exposures on Twitter, given that misinformation exposure was dominating the network. The purpose was to mitigate the effect of misinformation on network users by incentivizing the latter to spread true information. A similar method was developed to facilitate decentralized and faster computation, as proposed by (Abouzeid et al. 2021). The latter approach introduces a lightweight decentralized computation that reduces the optimization sample space and utilizes Learning Automata (LA) that learn from reinforcement feedback (Narendra and Thathachar 1974).
However, the method was evaluated according to the decrease in the difference between the dominating misinformation and the incentivized true content, averaged over the whole network. The problem with such an approach is that there are real-world scenarios where some individuals need mitigation efforts more than others, while a sub-network of individuals is already protected from high misinformation exposure. Therefore, we believe a more socially fair intervention and allocation of mitigation resources should be introduced under the framework of (Abouzeid et al. 2021). This paper proposes a robust LA-based decentralized mitigation method that addresses a wide range of possible unbalanced exposures to either misinformation or true content, seeking robustness on a variety of social network statistics. Our contribution is threefold:
• We construct a network of individual automata on top of the social network, where each automaton is associated with a single user and performs a constrained Knapsack optimization via a random walk (Pearson 1905) over the automaton's state space. The environment's non-stationarity comes from the temporal changes that occur over the whole network when an individual user responds to incentivization; this non-stationarity is particularly intricate due to the hidden dependencies in the information diffusion model.
• We propose a novel loss function to ensure that the true content incentivization budget is fairly assigned according to individual users' exposure needs. To this end, the problem is defined as a stochastic and non-stationary multi-agent Knapsack (Nicosia, Pacifici, and Pferschy 2009) optimization problem.
• We introduce two evaluation metrics (Achieved Mitigation and Achieved Fairness) to measure the efficiency and robustness of the proposed misinformation mitigation algorithm on different social networks' statistics, and we evaluate how our proposed technique is more socially fair compared to the approach proposed in (Abouzeid et al. 2021).
We conduct our empirical experiments on both synthetic and real-world social networks. Software source code and data are available at (https://github.com/Ahmed-Abouzeid/MMSS).

Information Diffusion Modelling. In order to apply intervention-based resolutions to misinformation mitigation, an information diffusion model is required to simulate the social network with which to intervene. The simulation is needed because intervening on the actual social media platforms is not feasible. We simulate the process of information diffusion by employing a Multivariate Hawkes process (MHP) as practiced by (Goindani and Neville 2020), (Abouzeid et al. 2021), and (Farajtabar et al. 2017). An MHP is a multivariate stochastic process (Chen 2016) which models the occurrence of temporal or spatio-temporal asynchronous events by capturing the mutual excitation (dependencies) between these events. To model the social network dynamics, each user is represented by two Hawkes processes (HP): one for misinformation dissemination behavior and the other for true content. The associated user HPs generate estimated random counts for both information types, given some behaviour observed in the past (e.g., estimating the number of re/tweet events given historical dependency). These counts indicate the intensity of the process at a specific time realization. Hence, an HP can be defined by its conditional intensity function λ.
The intensity function has two main components: a base intensity µ, and an exponential decay kernel function g over an adjacency matrix A. Formally, the conditional intensity function is given by:

λ_i(t_r) = µ_i + Σ_{t_s ∈ H_{t_r}} g(t_r − t_s),   (1)

where µ_i is the base intensity that models some external motivation to propagate content (independent of the relationships inferred from data), and g is a kernel function over the observed history H_{t_r} associated with user i, from the discrete time realization t_s prior to time t_r. g is concerned with the history of some influence matrix A_{i·}, where A_{ij} = 1 if there is an influence indicating that user i influences user j, and A_{ij} = 0 if not. We used an exponential decay kernel function g = A_{i·} e^{−wt} as practiced by (Farajtabar et al. 2017), where w is the decay factor, which represents the rate at which the influence is reduced over time. For all users, the base intensity vector µ and the influence matrix A can be estimated using maximum likelihood as proposed in (Ozaki 1979). To simulate all users' behaviours for each content type, an MHP is created, given that different intensity rates are generated at different discrete time realizations. Hence, at each realization, each user's behaviour is simulated as an estimated number of events (misinformation or true content) to be generated. We set the interval window between realizations to two hours. The HP simulation algorithm adopted in this study follows the modified thinning algorithm introduced by (Ogata 1981). See Appendix A.1 for a detailed explanation of the simulation evaluation metric.

Mitigated Diffusion. The core idea behind misinformation mitigation is to introduce true information to the network through incentivization. Therefore, users' associated true content HPs are modified. Hence, let x_i be the incentivization amount decided for user i; the modified HP for mitigation purposes is redefined by:

λ_i^{x_i}(t_r) = (µ_i + x_i) + Σ_{t_s ∈ H_{t_r}} g(t_r − t_s).   (2)
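To make Equations 1 and 2 concrete, the following is a minimal sketch (not the paper's implementation) of computing a user's conditional intensity under the exponential decay kernel, with an optional incentive x added to the base intensity as in Equation 2. The function and array names are illustrative assumptions.

```python
import numpy as np

def conditional_intensity(i, t, history, mu, A, w, x=0.0):
    """Sketch of Equations 1-2: intensity of user i at time t.

    history: list of (user_j, t_s) past events with t_s < t
    mu: base intensity vector
    A: influence matrix; A[i, j] = 1 when user j's events excite
       user i (the convention assumed in this sketch)
    w: exponential decay rate
    x: mitigation incentive added to the base intensity (Eq. 2)
    """
    excitation = sum(A[i, j] * np.exp(-w * (t - t_s))
                     for j, t_s in history if t_s < t)
    return mu[i] + x + excitation

# toy usage: two users, user 0 excited by user 1's posts
mu = np.array([0.1, 0.2])
A = np.array([[0, 1], [0, 0]])
events = [(1, 0.5), (1, 1.0)]  # user 1 posted at t=0.5 and t=1.0
lam_plain = conditional_intensity(0, 2.0, events, mu, A, w=1.0)
lam_boosted = conditional_intensity(0, 2.0, events, mu, A, w=1.0, x=0.3)
```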
Misinformation Impact. According to (Bradshaw and Howard 2017), at least 50% of the world's countries suffer from organized political manipulation campaigns over social media. Other examples of misinformation were observed during the Ebola outbreak in West Africa, which was believed to be three times worse than previous Ebola outbreaks (Jin et al. 2014). Therefore, research on the role of online media and border-free message passing became an emerging topic of interest in scientific communities. Furthermore, investigating such a topic is complicated and requires different perspectives of analysis. For example, recent studies (Rampersad and Althiyabi 2020) argued that the influence of social media on accepting political misinformation may differ depending on age, culture, or gender. Such social studies have actively investigated the social impact of misinformation propagation on different social media platforms such as Reddit, Facebook, and Twitter. Novel views on the problem have emerged recently. For instance, recent investigations reported that deliberation contexts promoted in social media overcome false information about health (Pulido et al. 2020). An example of such deliberation is a counterfactual campaign to spread true health information against the spread of misinformation, as practiced for the COVID-19 case on Twitter by (Abouzeid et al. 2021).

Misinformation Detection. The spread of fake news on social media was initially considered as the intentional dissemination of false content in news articles (Allcott and Gentzkow 2017). Progressively, others gave attention to the broader range of the problem (Sharma et al. 2019; Shu et al. 2017). Moreover, rumor detection (Zhang et al. 2015), malicious account classification (Zannettou et al. 2019; Shao et al. 2018), and the causal aspects of misinformation (Abouzeid et al. 2019) have been discussed. However, the majority of these methods depend heavily on linguistic or local features, which causes a lack of generality in the final resolution. To the best of our knowledge, it is hard to solve the problem in real time or without data selection-bias concerns (Ousidhoum, Song, and Yeung 2020). (Wasike 2013) expressed similar moral concerns, since fake news detection resolutions are judgemental by nature. Therefore, the need for safer online strategies that would lead to more generic and authentic resolutions is critically desirable.

Knapsack Optimization. The utilization of a Learning Automaton (LA) for Knapsack optimization problems has been widely explored in the literature. For instance, (Granmo et al. 2007) worked on optimizing the allocation of polling resources for web page monitoring when the monitoring capacity is restricted. A web page monitoring system may involve n web pages that are updated at different time intervals. Hence, to avoid polling all web pages, including the ones with no updates, the system must determine only the most important web pages, without exceeding the monitoring capacity. The work utilized a team of learning automata, where each automaton is assigned to a particular web page and learns its importance to the Knapsack's total value. Similarly, (Yazidi and Hammer 2018) dealt with a Stochastic Non-linear Fractional Equality Knapsack (NFEK) problem, which is a fundamental resource allocation problem based on incomplete and noisy information. The latter work proposed an optimal resolution to the resource allocation problem using a continuous LA without mapping the Knapsack materials onto a binary hierarchy. In that work, the proposed LA had a Reward-Inaction (R-I) learning scheme which only updates the LA action (transition) probabilities when rewarded. (Ulker and Tongur 2017) worked on another combinatorial Knapsack optimization problem, proposing a Migrating Birds Optimization (MBO) algorithm to solve a 0-1 knapsack problem (Fréville 2004).

Hawkes Processes. Hawkes process-based intervention strategies have been effectively applied to risk-minimization problems. For example, (Gupta et al. 2018) worked on the problem of invasive species spreading to new areas, which threatens the stability of ecosystems and causes major economic losses. The latter study proposed a novel approach to minimize the spread of an invasive species given a limited intervention budget, where the spread of species was modelled by a Hawkes process and the minimization task was considered a constrained Knapsack optimization problem.

Learning Automata Network. A Learning Automaton (LA) is a stochastic model suitable for learning in random environments (Narendra and Thathachar 1974). The LA learns by interacting with the random environment and updates its actions or state transitions according to the stochastic signal from the environment. Depending on the automaton design and architecture, the task is to find either an optimum/sub-optimum action or state, and the LA eventually seeks convergence to such a state or action. The advantage of utilizing an LA-based optimization is its decentralized and easy implementation.
An LA defined by its stochastic state transitions can be formally described as a Markov process (Ames 1989). Therefore, to reach equilibrium over all LA, we build a network of LA, each performing a random walk over a finite and discrete state space, where the individual optimum or sub-optimum states will be the recommended incentivization values for a misinformation mitigation campaign. The individual random walks together form a multidimensional joint random walk (Marquioni 2019) modelled by a multivariate Markov chain (Gotzamani et al. 2018). Figure 1 demonstrates the proposed LA network and the underlying multivariate Markov chain (e.g., three automata with M states each), where the joint state transitions and their probabilities are derived from the individual automata state transitions, which are dictated by a reward signal β. From a state S_i^k, LA_i has three possible state transitions, S_i^{k,k-1}, S_i^{k,k}, and S_i^{k,k+1}, indicating going to the left, staying at the same state, and moving to the right, respectively. In order to reach an optimum or sub-optimum state S_i^*, LA_i needs to learn the probabilities of its state transitions until it converges. Consequently, the optimum or sub-optimum S_i^* value will be the recommended incentivization value x_i^* with which to modify the information diffusion model (see Equation 2). LA_i has only two possible state transitions at the boundaries of its state space, when k = 0 or k = M, respectively, since it cannot move below the first or above the last state. At each interaction step t, the probability of LA_i being in a next state depends on its present state and the transition direction a_i^t. Starting with uniform initial state transition probabilities, LA_i determines the next state S_i^{t+1} and updates its state transition probability distribution vector π_i, whose entries are updated with regard to the frequency of rewarded visits. Here, a_i^t represents the applied state transition a_i^t = S_i^{k,j}, where k, j are neighbor state indices and k = j if it was a recurrent state transition. Based on a_i^t and the environment's stochastic reward β_i^t, LA_i conducts a random step over its state space. For instance, if a_i^{t_0} = S_i^{k,k+1}, the state transition function δ commits the transition from S_i^k to S_i^{k+1}, and π_i is updated at step t_1 accordingly. We denote by v_i^t and w_i^t how many times a transition was rewarded (β_i = 0) and performed for LA_i up to interaction step t, respectively. Hence, for the state indices k, j, the state transition probabilities are updated as the ratio of rewarded to performed transitions (Equations 4-6):

π_i^t(S_i^{k,j}) = v_i^t(S_i^{k,j}) / w_i^t(S_i^{k,j}).

Since each LA performs a random walk over its state space through stochastic state transitions, the optimization problem is solved by the multidimensional joint random walk over the automata network. Furthermore, the individual state transitions are dependent on each other due to the shared knapsack capacity and the interconnected influence in their environment rewards. Therefore, the probability of a particular automata network state is calculated as the joint probability of the individual automata's current states. Hence, with N being the network size, the joint probability is:

P(S_1^t, S_2^t, ..., S_N^t) = ∏_{i=1}^{N} P(S_i^t).
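The following is a minimal sketch of a single automaton's reward-driven random walk, assuming the frequency-ratio (v/w) probability update described above; the Laplace-style +1 smoothing that keeps unvisited transitions reachable is our assumption, as are all names.

```python
import random

class LearningAutomaton:
    """Sketch of one LA_i: a random walk over states 0..M whose
    transition probabilities are learned as the ratio of rewarded
    to performed transitions (v/w), as described in the text."""

    def __init__(self, num_states):
        self.M = num_states
        self.state = 0
        # per-direction counters: -1 = left, 0 = stay, +1 = right
        self.v = {-1: 0, 0: 0, 1: 0}  # rewarded transitions
        self.w = {-1: 0, 0: 0, 1: 0}  # performed transitions

    def step(self, reward_fn):
        # boundary states only allow two of the three directions
        directions = [d for d in (-1, 0, 1)
                      if 0 <= self.state + d <= self.M]
        # Laplace-smoothed v/w ratios as sampling weights
        weights = [(self.v[d] + 1) / (self.w[d] + 1) for d in directions]
        a = random.choices(directions, weights=weights)[0]
        candidate = self.state + a
        beta = reward_fn(self.state, candidate)  # 0 = reward, 1 = penalty
        self.w[a] += 1
        if beta == 0:          # commit the move only on reward,
            self.v[a] += 1     # otherwise roll back to the old state
            self.state = candidate
        return self.state

# toy usage with a dummy environment that rewards moving up to state 3
la = LearningAutomaton(num_states=5)
for _ in range(100):
    la.step(lambda s, c: 0 if s <= c <= 3 else 1)
```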
LA Environment. To learn incentivization values for the social network's users, all users' associated LA interact with a Knapsack environment which evaluates how valuable the current LA state (incentive) is for the mitigation campaign. The Knapsack evaluation is individual to each user's behaviour on the network. Users' behaviours are modeled through a multivariate Hawkes process (MHP). Hence, the LA environment has the following main properties.
• Stochastic: due to the randomness of each HP itself, which generates random counts of each user's events (e.g., re/tweets).
• Non-stationary: due to the dependencies between users' HP-generated events. For instance, when users i and j have an explicit or implicit dependency, a particular incentivization value x_i = 0 might not be optimum for user i on its own, but could be optimum when the incentivization value x_j > 0, since the latter could cause user i to be fairly exposed to true content without the need to increase x_i (i.e., to incentivize user i).
To reinforce the learning of targeted state values, each individual LA_i receives a reward signal β_i from its Knapsack environment, where β_i ∈ {1, 0} indicates a penalty or a reward, respectively. The final committed state transition for an LA_i is driven by the reward signal β_i. For instance, if LA_i randomly walks towards the right and receives a reward, it commits the transition and updates its current state. However, if LA_i receives a penalty, it rolls back the transition and stays at the state it occupied before that transition. The same update mechanism applies when LA_i randomly walks to the left. The random walk probabilities in both directions are learned according to Equations 4, 5, while recurrent state transition probabilities are updated according to Equation 6, until the LA converges to a state where the probabilities of performing random walks in both directions become almost 0. The reward signal β_i for each direction of an LA_i random walk is computed from m_i, the slope of a fairness loss function F for the associated user i, and from Φ, which indicates whether the Knapsack is currently full (Φ = 1) or not (Φ = 0): a transition is rewarded (β_i = 0) only when it moves the incentive in the direction that decreases the fairness loss, as indicated by the sign of m_i, without exceeding the Knapsack capacity, and is penalized (β_i = 1) otherwise (a sketch of this rule is given below). The Knapsack consumption starts at 0 and is increased or decreased according to each individual LA state transition, while the current Knapsack consumption is shared across the LA network. Note that x_i = S_i^k : k ≤ M, since the mitigation incentive x_i over time is represented by the current state of the associated LA, which has M states. The above definition of the environment reward for the proposed random-walk state transitions ensures convergence to optimum or sub-optimum mitigation incentive values. Figure 2 shows an example of our proposed LA state transition mechanism where the optimization environment is non-stationary and stochastic; the LA nevertheless manages to find a sub-optimal state value.
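Since the exact formula for β_i did not survive extraction, the following sketch encodes one plausible reading of the rule described above; it could serve as the reward_fn hook of the earlier automaton sketch, and all names are illustrative.

```python
def reward_signal(direction, m_i, phi):
    """One plausible reading of the reward rule: beta_i = 0 (reward)
    iff the step moves the incentive in the direction that decreases
    the fairness loss F and the Knapsack is not full.

    direction: +1 (right, more incentive) or -1 (left, less incentive)
    m_i: slope of the fairness loss F at user i's current incentive
    phi: 1 if the Knapsack is full, 0 otherwise
    """
    if direction == 1:                      # asking for more budget
        return 0 if (m_i < 0 and phi == 0) else 1
    if direction == -1:                     # releasing budget
        return 0 if m_i > 0 else 1
    return 1                                # recurrent move: no reward in this sketch
```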
Fairness Loss Function. To achieve fair mitigation, we need to consider each individual user's exposure to both misinformation and true content. Each user's exposure to a content type is calculated as how much impact that content has on the user. Therefore, the ratio between true content and misinformation impact for each user is considered. Hence, a more skewed initial distribution of these ratios requires the fair mitigation strategy to assign the incentivization budget according to users' needs (see Algorithm 1: Fair misinformation mitigation), without wasting the budget on users with already high exposure to true content. During the intervention, a ratio R_i < 1 means that user i is more exposed to misinformation. Alternatively, a ratio R_i > 1 indicates that incentivizing user i is not necessary, since the latter already has a high level of true content exposure. The exposure values used in R_i were calculated as proposed by (Abouzeid et al. 2021); see Appendix A.2 for more details. Below, we define our proposed fair misinformation mitigation loss function:

F(X) = Σ_{i=1}^{N} Σ_{j=1}^{n} (1 − R_j^{x_i})²,  subject to  Σ_{i=1}^{N} x_i ≤ C,   (12)

where N represents the number of network users and n is the number of adjacent users connected to user i, with user i considered adjacent to itself; j therefore indexes i and all its adjacent users in the summation. R_j^{x_i} represents the updated ratio between true content and misinformation after applying the incentivization value x_i to the true content HP diffusion model associated with user i. As seen in Equation 12, we square the difference 1 − R_j^{x_i} to maintain positive values in the interval [0, ∞), while the task is to minimize the loss function as close to 0 as possible (see Figure 2). It is important to highlight that the total loss is calculated through the achieved individual loss of each user during the allocation of incentives (i.e., each associated LA and its current state value). That means the total loss ensures optimum or sub-optimum assigned incentivization values over X, where X can be viewed as the set of all automata current states. Eventually, the consumption of all incentivization values (LA states) must not exceed the bound C, which represents the Knapsack capacity.
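A minimal sketch of Equation 12, assuming per-user ratios R_j are computed elsewhere (e.g., as in Appendix A.2); the adjacency representation is an illustrative choice.

```python
import numpy as np

def fairness_loss(ratios, adjacency):
    """Sketch of Equation 12: F = sum_i sum_{j in adj(i) plus i} (1 - R_j)^2.

    ratios: array of per-user true/misinformation impact ratios R_j,
            already updated for the current incentives x
    adjacency: 0/1 matrix; adjacency[i, j] = 1 if j is adjacent to i
    """
    n = len(ratios)
    loss = 0.0
    for i in range(n):
        neighbors = [j for j in range(n) if adjacency[i, j] == 1 or j == i]
        loss += sum((1.0 - ratios[j]) ** 2 for j in neighbors)
    return loss

# toy usage: 3 users, users 0 and 1 adjacent to each other
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
R = np.array([0.4, 1.0, 1.2])  # user 0 is under-exposed to true content
print(fairness_loss(R, A))
```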
Misinformation Mitigation. To obtain the optimum or sub-optimum learned state vectors of the N automata, we initialize each individual LA_i with an initial state transition probability vector π_i^{t_0} and the initial ratio R^{x_i=0}, where no incentivization value has yet been added to the associated estimated base intensity µ_i^{t_0} of the relevant HP. The initial fairness loss F_i^{t_0}(x_i = 0) is then calculated while the Knapsack is initially empty, c^{t_0} = 0. The mitigation algorithm then iterates over the whole LA network until it converges to all optimum or sub-optimum state probability vectors. The converged state values are then suggested as incentivization values for the underlying associated users on the network. The details of the misinformation mitigation procedure are shown in Algorithm 1.

In our experiments, we design six synthetic social networks {syn1, syn2, syn3, ..., syn6}, each with a unique statistical misinformation exposure distribution among users. The six networks represent possible real-world scenarios where some user groups might be much more highly exposed to misinformation than other groups on the social network. Moreover, some individuals in these groups might also be more highly exposed to misinformation than others from the same group. Allowing for these possible scenarios in our experiments stresses the evaluation of robustness for a fair misinformation mitigation resolution. We design our synthetic networks by randomly generating varying true information and misinformation event counts at both the user and network levels. Then, we set different bounds on these synthetic exposures to maintain a variety of statistics for each network. Eventually, we run our resolution on the real-world social network used in (Abouzeid et al. 2021) as another benchmark. The real-world network is a COVID-19 social network annotated for ordinary and false re/tweets from Twitter on the 28th of March, 2020. The collected re/tweets focused on discussions about COVID-19. The criterion for the misinformation annotation was whether any propagated content urged the public to use false drugs (Tesfaye et al. 2020) without any official statements from the health authorities at that time.

Within each of our experiments, we consider different mitigation incentivization budgets for the Knapsack capacity, to evaluate different levels of constraint. Due to the randomness of the experiments, we run each one multiple times and take the average as an estimate of the final outcome.

Uniform-baseline. To highlight the need for a fair misinformation mitigation method, we make an analogy with a uniform allocation of the incentivization budget. For instance, if all or almost all network users are equally over-exposed to misinformation relative to true content, a uniform distribution of the incentivization budget is theoretically an optimum fair mitigation strategy. We refer to this as Case-0. However, the more unbalanced the two content types are on the network, the more challenging it is for a uniform budget distribution to achieve the desired mitigation results. For example, if only 20% of network users were exposed to misinformation, a uniform incentivization wastes 80% of the budget, which might cause no mitigation at all, since the remaining 20% of the budget becomes insufficient to maintain R = 1 for the targeted users. We refer to this as Case-1. Another form of skewness is when the majority of users are exposed to misinformation but a subset of them is significantly more exposed than the others; in such a scenario, the uniform method will suffer as well, since this subset of users needs more incentivization than the others. We refer to this as Case-2. It is important to highlight that the purpose of the HP information diffusion model is to predict future behaviours. Therefore, the initial distribution of misinformation exposures before any future intervention is unknown, and a robust incentivization is mandatory to cope with all potential misinformation percentages.

AVG-LA-baseline. We further investigate how our LA-based resolution performs against the existing LA-based method of (Abouzeid et al. 2021). We refer to the latter as AVG-LA, while we refer to our proposed method as Fair-LA.

Mitigation Efficiency. To evaluate robustness across multiple social network scenarios, we introduce a mitigation efficiency metric, calculated as:

Achieved Mitigation = (b − a) / b,

where a and b are the misinformation percentages after and before mitigation, respectively. According to the different setups of our synthetic social networks (see Table 1), Case-1 can be observed in syn1 and syn4, while Case-2 can be observed in syn3 and syn6. As concluded from Figure 3, our proposed Fair-LA outperforms both the AVG-LA and Uniform methods in most scenarios, especially in Case-1. Moreover, when Case-2 occurs, Fair-LA still outperforms the other methods when the Knapsack capacity C is larger. From our statistical analysis on the COVID-19 network with 200 users, we observed a scenario almost equivalent to Case-0; therefore, the Uniform method performs better than the others there. However, we can observe how the efficiency gap between Fair-LA and Uniform shrinks when the Knapsack capacity is more restricted. Eventually, the STD error in the achieved mitigation efficiency percentages for Fair-LA is significantly lower than for AVG-LA, which also shows that our proposed method is more stable.
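A minimal sketch of the Achieved Mitigation metric, assuming misinformation levels are given as percentages:

```python
def achieved_mitigation(before_pct, after_pct):
    """Sketch of the mitigation efficiency metric: the relative drop
    in the misinformation percentage, (b - a) / b, expressed as a
    percentage. A value of 100 means misinformation was eliminated."""
    if before_pct == 0:
        return 0.0  # nothing to mitigate
    return 100.0 * (before_pct - after_pct) / before_pct

print(achieved_mitigation(40.0, 10.0))  # 75.0
```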
Learning Bias. In the context of our work, a learning bias means unnecessary incentivization values being assigned based on an incomplete evaluation of users' needs, due to the non-stationarity of the problem. To reduce such bias, we considered a relatively small learning rate (the automaton's state increase/decrease value) that ensures all users are visited an almost equal number of times before the whole budget is consumed. Moreover, the fairness error ensures that no user consumes more than its needs from the budget. Eventually, political polarization would reshape how the learned incentives could actually cause mitigation; hence, modeling the polarized responses to incentives should be integrated with our resolution in future work.

Desired Mitigation Baseline. As demonstrated earlier, the idea of misinformation mitigation is to introduce counter information by incentivizing users to propagate it on the network. However, a question remains about to what extent a mitigation should be considered enough. In other words, what if an exposure to counter information equal to the misinformation exposure is not enough to maintain authenticity on the network? In such a scenario, we propose a balance factor parameter, where the ratios in Equation 12 are considered fair only when approaching some balance. For instance, if the desired counter information exposure needs to be twice the amount of misinformation exposure per user, then the balance factor is set to 2 and the fairness of the ratio R is interpreted accordingly. See Appendix A.2.

Computation Speed. Due to the criticality of the misinformation problem, time is an important factor when evaluating misinformation mitigation resolutions. The complete comparison between AVG-LA and Fair-LA regarding computation speed is given in Appendix A.5.

Large Scaled Networks. Appendix A.6 demonstrates how our method can be scaled to larger networks when sampling techniques are adopted to reduce the optimization space without sacrificing mitigation efficiency.

This paper proposed a socially fair approach to misinformation mitigation on social networks. We introduced different synthetic social networks to generate diversity in scenarios where fairness is critical to how mitigation resources are consumed, unlike other methods, which did not consider the fairness perspective and therefore evaluated social networks that were not diverse enough. However, as a limitation of our work, we did not consider the problem of non-responding users in a detailed manner. For instance, some users might be too politically polarized to respond to our incentivization even if their associated HP is responsive. Therefore, we believe that a model of political polarization can be integrated with our proposed method in the future.

Figure 5: Two-hour realization interval average absolute difference error for all social networks used in the paper's experiments.

Here, E_{t_s+Δ} (defined in Appendix A.1) is the average absolute difference error between actual and predicted timestamps within a certain time realization t_s. The actual timestamps (from the test data) and the HP-predicted timestamps are represented by the intensity functions λ_i^R and λ_i^H, respectively. We used an exponential decay kernel function g = A_{i·} e^{−wt} as practiced by (Abouzeid et al. 2021) and (Farajtabar et al. 2017), where w is the decay factor representing the rate at which the influence is reduced over time. We set the intensity decay factors to w = 0.7 and w = 1 for the misinformation and normal content, respectively. For any simulated social network, the MHP simulation average absolute difference error is crucial for indicating how accurate that simulation is.
Consequently, it also indicates how reliable the results from the learned mitigation strategy are. Figure 5 demonstrates the obtained simulation errors on the seven social networks used in our experiments, where time realization intervals are set to two hours and the task is to predict the next two hours of events. The error is calculated by splitting the original timestamps into train and test data. In the train data, the HP estimation algorithm is fed with eight hours of users' event history on the network. The test data contains the next two hours in the future, which are compared with the HP predictions to calculate the average absolute difference error. Eventually, we set an error baseline of 5 for our simulation performance. The error baseline value is inspired by the simulation errors achieved in the related work (Abouzeid et al. 2021) and (Farajtabar et al. 2017). However, we believe our achieved simulation error is significantly small due to the small size of the networks. For the Hawkes parameter estimation and process simulation, we utilize the Python package Tick (Bacry et al. 2017) in our Python implementation.

The weight of either misinformation or normal content on the social network can be viewed as a count. Such counts represent how much of each type of content was generated on the network by each user at a specific time. Moreover, by considering the influence matrix A, the influencing users A_{·i} of each user i indicate the amount of exposure (impact) the latter has. Hence, to calculate the impact of network-circulated misinformation on user i, we sum the number of Hawkes process (HP) generated misinformation events for user i and the number of HP-generated misinformation events for its n influencing users derived from A_{·i}. For each discrete time realization, the impact is calculated and accumulated with the impacts of the previous time realizations. Similarly, we repeat the procedure for the HP-generated normal content events. Eventually, when user i is incentivized by the amount x_i, the normal content HP base intensity µ_i is increased by x_i, and R^{x_i} is the resulting ratio between the normal content impact (after incentivization) and the default misinformation impact (original HP with no modification). The latter amounts are represented by T_i(x_i) and F_i, respectively. The users' impact ratios are calculated as:

F_i^{t_r} = Σ_{s=1}^{r} Σ_{j} A_{ij} · (misinformation events of user j at realization t_s),
T_i^{t_r}(x_i) = Σ_{s=1}^{r} Σ_{j} A_{ij} · (normal content events of user j at realization t_s, with µ_i increased by x_i),
R_i^{t_r}(x_i) = (T_i^{t_r}(x_i) + 1) / (F_i^{t_r} + 1),

where A_{ij} = 1 if user i is influenced by user j, and A_{ij} = 0 if not, given that user i is considered to be an influencer of itself. t_s is the current time realization index of r discrete realizations, and x_i is less than or equal to the maximum allowed incentivization budget C. We add the value 1 to both the numerator and denominator of the ratio R_i^{t_r}(x_i) to avoid division by zero.
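A minimal sketch of this ratio computation, assuming HP-generated event counts are available per realization and per user; names and shapes are illustrative:

```python
import numpy as np

def impact_ratio(i, A, true_counts, misinfo_counts):
    """Sketch of the impact ratio R_i described above. Counts are
    (num_realizations x num_users) arrays of HP-generated events,
    with the true-content counts already reflecting any incentive
    x_i added to the base intensity. A[i, j] = 1 if user i is
    influenced by user j, with A[i, i] = 1 (self-influence)."""
    influencers = np.flatnonzero(A[i])
    T_i = true_counts[:, influencers].sum()     # accumulated true content impact
    F_i = misinfo_counts[:, influencers].sum()  # accumulated misinformation impact
    return (T_i + 1.0) / (F_i + 1.0)            # +1 avoids division by zero
```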
Since the definition of a good mitigation is contextual and follows a particular preference, we introduce a balance factor hyper-parameter b which determines when the value of the ratio R is considered good for the mitigation, by multiplying the denominator of the ratio R by b. For example, if the impact of normal content is 1 and the impact of misinformation is also 1, but we aim for exposure to normal content to be at least 1.5 times the exposure to misinformation, then the balance factor is b = 1.5. Given the balance factor b, the impact ratios become:

R_i^{t_r}(x_i) = (T_i^{t_r}(x_i) + 1) / (b · F_i^{t_r} + 1),

and we set b = 1.3 in all experiments done for this paper.

Our proposed misinformation mitigation framework depends on some hyper-parameters for both the information diffusion model and the mitigation algorithm. We adopt the same information diffusion model parameters as applied in (Abouzeid et al. 2021) and (Farajtabar et al. 2017). On the other hand, we conduct a grid search to obtain the best parameter values for our proposed mitigation algorithm. Moreover, since the baseline model (Abouzeid et al. 2021) depends on an even larger number of hyper-parameters and was only evaluated on the COVID-19 network, we perform another grid search to obtain its best parameter values on the six proposed synthetic networks for a fair comparison. Table 2 shows the final selected values for each model parameter; our proposed model depends on fewer hyper-parameters. We refer to our model as Fair-LA and to the baseline model as AVG-LA. Furthermore, for AVG-LA, we select the sample size |U^−| = 200, which is the whole network, to compare with the maximum performance AVG-LA can achieve. However, in the original AVG-LA experiments, the sample size was reduced to 5 and 25 as a trade-off between reducing computation time and still achieving acceptable mitigation efficiency. Fair-LA is the more judicious LA when consuming the Knapsack budget. As a consequence, the maximum allowed Knapsack budget C is not always fully consumed, since it can happen that no more users need incentivization or that their associated HPs stop changing while the base intensity µ increases. Table 3 demonstrates the achieved mitigation efficiency for both Fair-LA and AVG-LA, with the Knapsack budget consumption also given, to indicate how, with the least amount of incentivization, our proposed Fair-LA still performs much better in most scenarios.

We run all experiments on a regular CPU workstation with 8 GB of RAM and an Intel Core i5-8250 @ 1.60GHz-1.80GHz. Table 4 compares Fair-LA and AVG-LA with regard to the computation time of a single run of each experiment, averaged over the multiple runs. From the reported statistics in Table 4, we observe that the Fair-LA method's high performance comes at a cost in computation speed and convergence time, while Fair-LA remains more stable with regard to the STD error in most scenarios. However, in Appendix A.6, we show further enhancements in which a sampling technique improves the Fair-LA computation time.

A.6 Sampling. Figure 6 demonstrates how Fair-LA mitigates misinformation on a COVID-19 network with 1,164 users, compared to a poor mitigation by the Uniform method. The sampling technique did not affect the mitigation efficiency or the total fairness loss. The sampling works by evaluating a randomly sampled subset of users in the fairness loss function given in Equation 12 instead of all users. We refer to our sampling misinformation mitigation system as MMSS-Fair-LA. For experiments on relatively larger networks, we use a machine with 64 CPU cores and 128 GB of RAM. Figure 7 shows computation time and gives an idea of the time complexity of MMSS-Fair-LA when different sample sizes are used for the whole COVID-19 network. Moreover, different network sizes were evaluated for the synthetic networks with misinformation percentages around 20-30%, with a fixed sample size of 200.
However, we believe the optimum way to run MMSS-Fair-LA is on a cluster of machines, since the utilized LA network facilitates distributed computing.

References
Abouzeid et al. (2019). Causality-based Social Media Analysis for Normal Users Credibility Assessment in a Political Crisis.
Abouzeid et al. (2021). Learning Automata-based Misinformation Mitigation via Hawkes Processes.
Allcott and Gentzkow (2017). Social media and fake news in the 2016 election.
Ames (1989). The Markov process as a compositional model: A survey and tutorial.
Bacry et al. (2017). tick: a Python library for statistical learning, with a particular emphasis on time-dependent modeling.
Bradshaw and Howard (2017). Troops, trolls and troublemakers: A global inventory of organized social media manipulation.
Chen (2016). Thinning algorithms for simulating point processes.
De Beer and Matthee (2020). Approaches to Identify Fake News: A Systematic Literature Review.
Farajtabar et al. (2017). Fake news mitigation via point process based intervention.
Fréville (2004). The multidimensional 0-1 knapsack problem: An overview.
Goindani and Neville (2020). Social reinforcement learning to combat fake news spread.
Gotzamani et al. (2018). Introducing multivariate Markov modeling within QFD to anticipate future customer preferences in product design.
Granmo et al. (2007). Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation.
Gupta et al. (2018). Discrete Interventions in Hawkes Processes with Applications in Invasive Species Management. In IJCAI.
Jin et al. (2014). Misinformation propagation in the age of twitter.
Lazer et al. (2018). The science of fake news.
Marquioni (2019). Multidimensional elephant random walk with coupled memory.
Narendra and Thathachar (1974). Learning automata: a survey.
Nicosia, Pacifici, and Pferschy (2009). On Multi-Agent Knapsack Problems.
Ogata (1981). On Lewis' simulation method for point processes.
Ousidhoum, Song, and Yeung (2020). Comparative evaluation of label agnostic selection bias in multilingual hate speech datasets.
Ozaki (1979). Maximum likelihood estimation of Hawkes' self-exciting point processes.
Özdan (2021). The Right to Freedom of Expression Versus Legal Actions Against Fake News: A Case Study of Singapore.
Pearson (1905). The problem of the random walk.
Pulido et al. (2020). A new application of social impact in social media for overcoming fake news in health.
Rampersad and Althiyabi (2020). Fake news: Acceptance by demographics and culture on social media.
Shao et al. (2018). The spread of low-credibility content by social bots.
Sharma et al. (2019). Combating fake news: A survey on identification and mitigation techniques.
Shu et al. (2017). Fake news detection on social media: A data mining perspective.
Tesfaye et al. (2020). How do we combat bogus medicines in the age of the COVID-19 pandemic?
Ulker and Tongur (2017). Migrating birds optimization (MBO) algorithm to solve knapsack problem.
Wasike (2013). Social media ethical issues: role of a librarian.
Yazidi and Hammer (2018). Solving stochastic nonlinear resource allocation problems using continuous learning automata.
Zannettou et al. (2019). Who let the trolls out? Towards understanding state-sponsored trolls.
Zhang et al. (2015). Automatic detection of rumor on social network.

A.1 Information Diffusion Simulation. To simulate the dynamics of information diffusion on the social network, each user i is associated with two HPs. Therefore, the timestamps of both misinformation and normal content events are considered as inputs, one set per HP. The HP parameter estimation algorithm (Ozaki 1979) is then responsible for estimating the parameters needed for simulating future events; namely, it estimates the base intensity µ and the influence matrix A for the kernel function g, used to calculate the conditional intensity function as given in Equation 1. Eventually, the HP thinning algorithm (Ogata 1981) simulates the network dynamics for the desired future time realizations. The HP simulation generates future predictions of each user's timestamps for either misinformation or normal content. Across all of the network's users, we obtain a multivariate Hawkes process (MHP) which describes either the misinformation or the normal content diffusion on the network.
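A minimal sketch of this estimate-then-simulate pipeline using the Tick package cited above; the learner settings and the fresh (unconditional) simulation of the fitted process are our assumptions rather than the paper's exact configuration.

```python
import numpy as np
from tick.hawkes import HawkesExpKern, SimuHawkesExpKernels

# toy history: past re/tweet timestamps (hours) for each of 3 users
events = [np.array([0.5, 2.1, 4.0]), np.array([1.0, 3.3]), np.array([2.5])]

decay = 1.0  # w for normal content (the paper uses 0.7 for misinformation)
learner = HawkesExpKern(decays=decay, gofit='likelihood')  # MLE-style fit
learner.fit(events)  # estimates the baseline mu and the influence matrix A

# simulate the next realization window with the estimated parameters
simu = SimuHawkesExpKernels(adjacency=learner.adjacency,
                            decays=decay,
                            baseline=learner.baseline,
                            end_time=6.0,  # e.g., two more hours of activity
                            verbose=False)
simu.simulate()
predicted = simu.timestamps  # one array of future timestamps per user
```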
The MHP simulation performance is measured according to the below equation, as applied in (Abouzeid et al. 2021) and (Farajtabar et al. 2017):

E_{t_s+Δ} = (1/N) Σ_{i=1}^{N} |λ_i^R(t_s + Δ) − λ_i^H(t_s + Δ)|,

where λ_i^R and λ_i^H are the intensities of the actual (test data) and the HP-predicted events, respectively, and Δ is the two-hour realization interval.
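A minimal sketch of this error metric, assuming per-user intensities are summarized as event counts within the two-hour window; names are illustrative:

```python
import numpy as np

def simulation_error(actual_counts, predicted_counts):
    """Sketch of E_{t_s+Delta}: average absolute difference between
    actual (test data) and HP-predicted event counts for one
    two-hour realization window, averaged over users."""
    actual = np.asarray(actual_counts, dtype=float)
    predicted = np.asarray(predicted_counts, dtype=float)
    return np.mean(np.abs(actual - predicted))

# toy usage: per-user event counts in the next two hours
print(simulation_error([3, 1, 0], [2, 1, 1]))  # 0.666...
```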