key: cord-0674121-lo4lh3kx
authors: Mao, Yanbing; Li, Jining; Hovakimyan, Naira; Abdelzaher, Tarek; Lebiere, Christian
title: Cost Function Learning in Memorized Social Networks with Cognitive Behavioral Asymmetry
date: 2022-03-19
journal: nan
DOI: nan
sha: 99f3182a76799a3622f8affb043e209daca8ee3e
doc_id: 674121
cord_uid: lo4lh3kx

This paper investigates cost function learning in social information networks, wherein the influence of humans' memory on information consumption is explicitly taken into account. We first propose a model for social information-diffusion dynamics with a focus on the systematic modeling of asymmetric cognitive bias, represented by confirmation bias and novelty bias. Building on the proposed social model, we then propose the M$^{3}$IRL: a model and maximum-entropy based inverse reinforcement learning framework for learning the cost functions of target individuals in memorized social networks. Compared with the existing Bayesian IRL, maximum entropy IRL, relative entropy IRL and maximum causal entropy IRL, M$^{3}$IRL differs significantly: it does not depend on the Markov Decision Process principle, it needs only a single finite-time trajectory sample, and its decision variables are bounded. Finally, the effectiveness of the proposed social information-diffusion model and the M$^{3}$IRL algorithm is validated on online social media data.

Online social media platforms have strikingly enhanced the reach and scale of harmful, low-quality information spread through inauthentic behavior, often inciting civil unrest [5], causing widespread panic [6], fueling vaccine hesitancy [7], and more. Moreover, according to an investigation of COVID-19-related infodemics and public health, health misinformation fueled by rumors, stigma, and conspiracy theories has dire implications for individuals and communities if prioritized over evidence-based guidelines [8]. COVID-19 will not be the world's last catastrophe. Hence, developing a management framework for low-quality information, with a focus on fighting unreliable information sources that exhibit inauthentic behavior, is even more vital to defending security today.

To fight information sources with inauthentic behavior, game-theoretic frameworks of competitive information diffusion were initially formulated in [9], [10], where malicious information sources disperse low-quality information while defenders spread truthful information to counter its influence on public opinion. However, these game-theoretic frameworks rely on the common assumption that the game players know each other's cost functions for decision making. Therefore, a formidable challenge thwarts the feasibility of game-theoretic frameworks in reality: how to learn the implicit cost functions of game players? To pave the way to fighting malicious information sources, we thus investigate cost function learning in this paper.

Cost function learning is a challenging problem with deep roots in inverse optimal control (IOC) [11]; it was later studied in the context of inverse reinforcement learning (IRL) [12]-[16]. The fundamental difference between IOC and IRL lies in their assumptions: IOC relies on the assumption of optimal behavior [17], while IRL relaxes it to probabilistically (locally) optimal behavior [12]-[16]. Cost function learning via IOC or IRL has a wide range of application domains, for example, human motion analysis [17], imitation learning [18], human-autonomous driving [19] and human-robot collaboration [19]. However, its application in social information networks with humans in the loop has not yet been explored. The challenges that can hinder this application include:

• Human Memory Influence: Recent investigation of social (mis)information spread from the perspective of cognitive architecture reveals the significant influence of human memory on information consumption [20], [21], which contradicts the common Markov Decision Process (MDP) assumption in both IOC and IRL [12]-[19].

• Dynamic External Stimulus: Dynamic external stimuli push the cost functions of decision making to be dynamic as well. For example, before the outbreak of the COVID-19 crisis in the US, the news agency Fox News dispersed the misinformation that COVID-19 is a hoax, which was driven by the event of the US President's impeachment, and later changed its tune due to a possible lawsuit [22]. Intuitively, in dynamic scenarios, the most recent trajectory of evolving opinions can yield a much more accurate inference of cost functions than a large number of trajectory samples collected under different external stimuli. However, the existing IRL methods, with the exception of [13], need a large number of trajectory samples to approximate the partition function.

The seminal IRL with local optimality [13], albeit relying on the assumptions of an MDP and unbounded decision variables, provides a building block to address the aforementioned challenges, since

• it needs only a single finite-time trajectory, and

• it removes the assumption that the expert demonstrations are globally probabilistically optimal, thus allowing IRL to use examples that only exhibit local optimality.

We note that the IRL proposed in [13] is model based, since local optimality needs a fairly accurate model of the underlying dynamics under consideration. Inspired by the model-based IRL [13], we propose the M$^{3}$IRL: a model and maximum-entropy based inverse reinforcement learning with local optimality for memorized social information networks.

Since the proposed M$^{3}$IRL is model based, the availability of an information aggregation model that best describes how humans consume and propagate information, as well as how beliefs evolve, will give M$^{3}$IRL highly accurate cost-function inference. The dynamics of opinion formation or information spread in networks have been an important research subject for decades; see, e.g., [23], [24]. Well-known models include the DeGroot model [25] (whose roots go back to [24], [26]), which considers opinion evolution within a network in terms of the weighted average of individuals' connections, where the weights are determined by influences. The Friedkin-Johnsen model [27] incorporates individual innate opinion or subconscious bias, thereby making the model more suitable for several real-life scenarios and real applications, e.g., debiasing social influence [28]. While social information-diffusion dynamics has always been an active research area, recently, with the wide use of social media [29] in conjunction with automated news generation aided by artificial intelligence technologies [30], [31], it has gained vital importance in studying misinformation spread and political polarization. In this regard, cognitive bias, especially confirmation bias and novelty bias, plays a key role. Concretely, it is well understood that confirmation bias helps create "echo chambers" within online social networks [32], [33], in which misinformation and polarization thrive [34], [35]. Recently, Abdelzaher et al. in [36] and Xu et al. in [29] revealed the significant influence of consumer preferences for outlying content (due to novelty bias) on opinion polarization in the modern era of information overload. Hence, the challenge pertaining to information diffusion dynamics is: how to capture human cognitive bias in information consumption?

Through imposing a bounded confidence on social influence, the Hegselmann-Krause (HK) model [37] has the capability of capturing confirmation bias [38]. The HK model involves a discontinuity in the influence impact, i.e., an individual completely ignores the opinions that are "too far" from hers, which renders the steady-state analysis difficult. As a remedy, the continuous state-dependent social influence models proposed in [9], [39]-[42] to study polarization and homogeneity are also regarded as capturing confirmation bias, since both polarization and homogeneity result from the conjugate effect of confirmation bias and social influence [34], [38]. In most social problems, e.g., presidential elections and product rating, humans hold asymmetric cognitive bias. However, the HK model with a symmetric confidence boundary [37] and the continuous models [9], [39]-[42] can only capture symmetric confirmation bias, and the HK model with asymmetric confidence boundaries [37] can only partially capture asymmetric confirmation bias. Additionally, the existing information diffusion models [9], [25], [27], [37], [39]-[42] do not yet capture novelty bias. To provide a fairly accurate model of social information aggregation for the proposed M$^{3}$IRL algorithm, we investigate information diffusion dynamics with a focus on the systematic modeling of asymmetric cognitive bias: confirmation bias and novelty bias.

The contributions of this paper are summarized as follows.

• We propose a social information diffusion dynamics, which explicitly takes subconscious bias, human memory, confirmation bias and novelty bias into account. Meanwhile, we provide systematic modeling guidance for capturing the asymmetric confirmation bias and the asymmetric novelty bias.

• Building on the proposed information diffusion dynamics, we propose the M$^{3}$IRL for learning the cost functions of target individuals in memorized social networks with cognitive behavioral asymmetry, which does not rely on the MDP principle and needs only a single finite-time trajectory of public evolving opinions.

• Given a library of basis functions that constitute the cost functions in social information networks, we validate the effectiveness of the proposed social information-diffusion dynamics and M$^{3}$IRL on online social media data.

This paper is organized as follows. In Section II, we present the preliminaries, which include the social information-diffusion dynamics and the problem formulation. In Section III, we investigate the systematic modeling of the asymmetric confirmation bias and the asymmetric novelty bias. In Section IV, we propose the M$^{3}$IRL algorithm. We present validation results in Section V and conclusions in Section VI.

We let $\mathbb{R}^{n}$ and $\mathbb{R}^{m \times n}$ denote the set of n-dimensional real vectors and the set of m × n-dimensional real matrices, respectively. For a matrix W, $[W]_{i,j}$ denotes the element in row i and column j. |·| denotes the cardinality (i.e., size) of a set, the determinant of a matrix, or the absolute value of a scalar, depending on its argument. $\mathbb{N}$ denotes the set of natural numbers. We let 1, 0 and O denote the vector of all ones, the vector of all zeros and the zero matrix, respectively, with compatible dimensions. The superscript '⊤' stands for matrix or vector transposition. For vectors x and y, $[x; y] = [x^{\top}, y^{\top}]^{\top}$. $I_{m} \in \mathbb{R}^{n}$ denotes an indicator vector whose m-th element is 1 and all other elements are zeros, i.e., $[I_{m}]_{i} = 1$ if $i = m$ and $[I_{m}]_{i} = 0$ otherwise.

For $y \in \mathbb{R}$, $\Lambda(i, y)$ denotes a diagonal matrix with compatible dimensions, whose i-th diagonal entry is y while all others are zeros, i.e., $\Lambda(i, y) = \mathrm{diag}\{0, \ldots, 0, y, 0, \ldots, 0\}$.

The social system is composed of n individuals. The interaction among individuals is modeled by a digraph G = (V, E), where V = {v 1 , . . . , v n } is the set of vertices representing the individuals, and E ⊆ V × V is the set of edges of the digraph G representing the influence structure.

We consider the following model of social information diffusion or public opinion evolution, which will be used to derive the model-based cost function learning from a single finite-time trajectory.

where we clarify the notations and variables: 1] is her subconscious bias (also referred to as innate opinion), which is based on inherent personal characteristics (e.g., the socio-economic conditions in which the individual grew up and/or lives) [27].

• The state-dependent influence weight c ij (x, k, τ i ) is proposed to capture v i 's cognitive bias induced by the conjunctive confirmation bias and novelty bias, which thus is written as

where c(x i (x, k, τ i ), x j (k)) ≥ 0 is proposed to capture the novelty bias, and c(x i (x, k, τ i ), x j (k)) ≥ 0 describes the confirmation bias.

pectation from her surroundings -her neighbors and the information sources she follows -recorded in her memory over the horizon {k−τ i , k−τ i +1, . . . , k}, which is defined as

The time-varying function m i (t) in (4) indicates the influence of memory horizon on the sensed expectation, which satisfies

• The x i (x, k, τ i ) in (3) denotes individual v i 's sensed expectation from her own memory: 1] for ∀k ∈ N and ∀i ∈ V, it is determined in such a way that

Remark 1: The imposed condition (5) indicates the decaying influence of the memory horizon on the individuals' real-time sensed expectations (4) and (7), which can be a function of decaying activation (e.g., base-level activation) proposed in the cognitive architecture [20], [21].

The availability of an information aggregation model that best describes how humans consume and propagate information, as well as how beliefs evolve, will give model-based cost function learning high accuracy. It is well understood, and has been demonstrated via data mining, that confirmation bias helps create "echo chambers" within online social networks [32], [33], in which misinformation and polarization thrive [34], [35]. Meanwhile, Abdelzaher et al. in [36] and Xu et al. in [29] recently revealed the significant influence of novelty bias on opinion polarization in the modern era of information overload. While social information-diffusion dynamics has always been an active research area [9], [25], [27], [37], [39]-[42], how to systematically model the realistic asymmetric cognitive bias that humans hold in most social problems has not yet been explored. Therefore, the first problem we address pertains to the cognitive-social information dynamics, and is formally presented below.

Problem 1: What is the systematic modeling guidance for the information diffusion dynamics (1) that can capture the asymmetric confirmation bias and the asymmetric novelty bias?

Generally, individuals have implicit individual/joint objective functions for decision making (e.g., political-gain driven, profit driven and curiosity driven). Moreover, the game-theoretic frameworks for fighting malicious information sources rely on the assumption that the players know each other's cost functions for decision making [9], [10]. Therefore, gaining knowledge of the cost functions is an existing challenge that thwarts the feasibility of game-theoretic defense strategies. Meanwhile, we note that dynamic external stimuli push the cost functions of decision makers to be dynamic as well [22]. In dynamic scenarios, the most recent trajectory is more desirable than a large number of trajectory samples collected under different external stimuli. Building on the answer to Problem 1, cost function learning constitutes the second problem.

Problem 2: Given the social information-diffusion dynamics (1) that explicitly takes human memory, asymmetric confirmation bias and asymmetric novelty bias into account, how to leverage the most recent trajectory to learn the cost functions of target individuals?

With the answers to Problems 1 and 2, the proposed cost function learning framework is shown in Figure 1 .

More than 40 years of studies in cognitive and social psychology have revealed that the asymmetry effect/bias (i.e., the distance from X to Y may be estimated differently from that from Y to X) is a universal phenomenon, ranging from psychological similarity estimations [43] to social perception [44]. The most recent investigations focus on the determinants of the asymmetry bias [45] and point out that, in self-other comparisons, the asymmetry bias is displayed not only if the self is a prototype in social perception (which leads to egocentric asymmetry), but also when the other has a stereotypical cognitive representation (which leads to allocentric asymmetry). The term $\sum_{j \in V} c_{ij}(x, k, \tau_{i}) x_{j}(k)$ in the public opinion evolution model (1) explicitly describes humans' conformity behaviors in information consumption, which implicitly captures the allocentric asymmetry bias, since the surroundings' opinions are sensed as a cognitive prototype (which can also be viewed as peer pressure). To answer Problem 1, we investigate: how to systematically and mathematically describe humans' egocentric asymmetry bias in information consumption?

We first use confirmation bias as an example to describe the egocentric asymmetry bias phenomenon in opinion/belief evolution and the corresponding expected behavior that the influence weight c(x i (x, k, τ i ), x j (k)) should capture. We then extend the modeling mechanism to novelty bias. To simplify the presentation, we let x a , x b and x g (dropping x, k and τ i without loss of generality) denote the opinions of Alex, Bob and George (sensed from his memory), respectively. We suppose the topic being discussed is "COVID-19 Is a Hoax." The hierarchical representation of x i (t) is illustrated in Figure 2-(i), where 1 and −1 correspond to completely opposing and completely supporting the claim, respectively.

Confirmation bias generally refers to the cognitive behavior in which a person gives larger weight to evidence that confirms his belief and undervalues evidence that could disprove it [46]. We now formally describe the five behaviors George would exhibit in reality if he holds confirmation bias towards the opinions of his neighbors Alex and Bob. The five corresponding behavioral scenarios are shown in Figure 2

In this scenario, George should treat the opinions of Alex and Bob equally, as long as they have the same distance from his. This behavior is formally described by

15, x a = 0.8} means that Alex and Bob have the same opinion distance from George's, but George and Alex are in the same domain of opposing the claim while Bob is in the other domain of supporting it. In this scenario, George should favor Alex's opinion more. This behavior is formally described by

45, x a = 0.8} indicates that Alex and Bob have the same opinion distance from George's and all three are in the same domain of opposing the claim. However, Bob is more hesitant in his opinion and more likely to leave the current domain in his next opinion evolution step, while George and Alex are more stubborn. In this scenario, George should also favor Alex's opinion more, which is formally described by

1, x a = 0.8} means that 1) George, Alex and Bob are in the same domain of opposing the claim, 2) both Alex and Bob are stubborn in their opinions, but 3) Bob's degree of opposition is closer to George's. In this scenario, George should favor Bob's opinion more. This expected behavior is described as

1, x a = 0.8} means that 1) although George and Alex are in the same domain of opposing the claim, their degrees of opposition are far apart, i.e., Alex is much more stubborn while George is much more hesitant, and 2) Bob is in the other domain, but he, like George, is much more hesitant. In this scenario, George should favor Bob's opinion more. This expected behavior is described by

Remark 2: Taking x g > 0 as an example and considering 0 < ζ(x g , x a , x b ) < 1, the condition (12) implies that c(x g , x b ) > c(x g , x a ), i.e., George puts larger influence weight on x b than x a when the ratio of their opinion differences is larger than the threshold, i.e,

We now examine the existing models in capturing asymmetric confirmation bias.

The seminal Hegselmann-Krause model [37] , i.e.,

has been well recognized for capturing confirmation bias to some extent [29], [34], [38]. Depending on the upper confidence level ε i and lower confidence level ε i in (13b), the Hegselmann-Krause model (13) individual v i will completely abandon x a = 0.7 and take x b = −0.2 into her opinion evolution.
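The bounded-confidence behavior described above can be illustrated with a short sketch. It assumes the standard Hegselmann-Krause form, in which v i averages her own opinion with every opinion x j satisfying −ε_lower ≤ x j − x i ≤ ε_upper; this may differ in detail from (13). The value x i = 0.2 and the two confidence levels are hypothetical, chosen only so that x a = 0.7 is rejected and x b = −0.2 is accepted.

```python
def hk_neighbors(x_i, others, eps_lower, eps_upper):
    """Opinions inside v_i's confidence interval [x_i - eps_lower, x_i + eps_upper]."""
    return [x_j for x_j in others if -eps_lower <= x_j - x_i <= eps_upper]


def hk_update(x_i, others, eps_lower, eps_upper):
    """One bounded-confidence step: plain average of own and accepted opinions."""
    accepted = [x_i] + hk_neighbors(x_i, others, eps_lower, eps_upper)
    return sum(accepted) / len(accepted)


# Hypothetical values (assumptions): x_i and both confidence levels.
x_i, x_a, x_b = 0.2, 0.7, -0.2
accepted = hk_neighbors(x_i, [x_a, x_b], eps_lower=0.5, eps_upper=0.3)
next_opinion = hk_update(x_i, [x_a, x_b], eps_lower=0.5, eps_upper=0.3)
print(accepted, next_opinion)
```

The influence is binary: an opinion either enters the average with full weight or is dropped entirely, which is the discontinuity that the continuous state-dependent models try to avoid.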

To investigate the polarization and homogeneity, a variant of DeGroot model with continuous state-dependent influence is proposed in [39] , [40] :

A similar version building on Friedkin-Johnsen model [27] is proposed in [9] , [41] , [42] to capture confirmation bias, since both polarization and homogeneity are the results of the conjugate effect of confirmation bias and social influence [34] , [38] . However, the model can only capture symmetric confirmation bias. For example, given

Building on the DeGroot model [25], Dandekar et al. in [47] proposed an opinion polarization dynamics with biased assimilation. We now examine whether the model can capture asymmetric confirmation bias in the simple scenario considered in [47], where the social network consists of only two individuals: v i and v j . The proposed opinion dynamics in this scenario is

We can conclude here that the proposed polarization dynamics cannot fully capture the confirmation bias -even the symmetric bias -from the perspective of social influence weight.

We note that the Hegselmann-Krause model (13) involves a discontinuity in the influence impact: an individual is either influenced by an information source (or her neighbors) fully or not at all, depending on the opinion differences. This binary influence effect renders rigorous analysis of the fixed points difficult in general. As a remedy, the models (14) and (15) can be leveraged to obtain a tractable investigation of polarization evolution. However, their unexplored capability of capturing asymmetric confirmation bias hinders the reliability and trustworthiness of investigations in realistic scenarios where humans hold asymmetric bias. To address this challenge, this subsection first presents the formal definition of asymmetric confirmation bias, which is based on the realistic asymmetric behavior described in Section III-A. We then extend the formal definition to the asymmetric novelty bias.

The definition of asymmetric confirmation bias is formally presented as follows.

Definition 1: The influence weight c(x i (x, k, τ i ), x j (k)) in (3) is said to capture the asymmetric confirmation bias if it satisfies (8)-(12) simultaneously.

2) Asymmetric Novelty Bias: In this paper, novelty bias refers to humans' preference for outlying content. Lamberson and Soroka in [48] revealed that negative information, compared with positive information, is more "outlying", since it is far away from expectations. Inspired by this discovery, an individual's sensed novelty/outlying/surprising degree of information is measured in terms of the sensed expectation from her surroundings in memory. For example, if George holds novelty bias, he will prefer news/opinions that have a larger distance from his own surrounding expectation x g (x, k, τ g ). Following the same logic that describes the expected behaviors (8)-(12) due to asymmetric confirmation bias, the expected asymmetric behaviors due to asymmetric novelty bias are formally described by

The definition of asymmetric novelty bias is then formally presented as follows.

Definition 2: The influence weight c(x i (x, k, τ i ), x j (k)) in (3) is said to capture the asymmetric novelty bias if it satisfies (16)-(20) simultaneously.

With the definitions at hand, we next present the systematic guidance for modeling asymmetric confirmation bias and asymmetric novelty bias.

In this paper, we construct c(x g , x a ) and c(x g , x a ) to have the following general forms:

We first present the sufficient and necessary conditions of the model (21) on satisfying (8)- (12) in the following theorem, whose proof appears in Appendix A.

Theorem 1: The influence weight c(x g , x a ) given in (21) satisfies (8)- (12) if and only if

We next present the sufficient and necessary conditions of the model (22) 

Proof: The proof follows the same path as the proof of Theorem 1 and is thus omitted.

Remark 3: Theorems 1 and 2 provide guides for constructing the models of asymmetric confirmation bias and asymmetric novelty bias, respectively. For example, c(x i , x j ) = χ i − γ i |tanh(x i ) − tanh(x j )|, with γ i > 0, satisfies the conditions (23)-(27), and thus captures the asymmetric confirmation bias. The surface plot with numerical examples in Figure 3 demonstrates the model's capability of capturing the cognitive behavioral asymmetry.
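The example weight in Remark 3 can be checked numerically. The sketch below is only an illustration: the parameter values χ i = γ i = 1 and the opinion values are hypothetical, and treating a neutral George (x g = 0) as the equal-treatment case is an assumption rather than something stated in the text. It evaluates the weight for two neighbors that are equally far from George, one on his side of 0 and one on the other side.

```python
import math


def confirmation_weight(x_i, x_j, chi=1.0, gamma=1.0):
    """c(x_i, x_j) = chi - gamma * |tanh(x_i) - tanh(x_j)| (example from Remark 3)."""
    return chi - gamma * abs(math.tanh(x_i) - math.tanh(x_j))


# Hypothetical opinions: Alex and Bob are both 0.3 away from George.
x_g, x_a, x_b = 0.2, 0.5, -0.1
w_alex = confirmation_weight(x_g, x_a)  # Alex is on George's side of 0
w_bob = confirmation_weight(x_g, x_b)   # Bob is on the opposite side of 0

# Assumed equal-treatment case: a neutral George and equidistant opinions.
w_pos = confirmation_weight(0.0, 0.3)
w_neg = confirmation_weight(0.0, -0.3)
print(w_alex, w_bob, w_pos, w_neg)
```

Because tanh is steepest around 0, an equal distance in opinion space costs more weight when it crosses or approaches the neutral point, so Alex receives the larger weight even though both neighbors are equally far from George.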

To address Problem 2, we propose the M$^{3}$IRL: a model and maximum-entropy based inverse reinforcement learning with local optimality for memorized social networks with cognitive behavioral asymmetry. M$^{3}$IRL is based on the dynamics (1).

To formulate the learning problem, we separate the set of individuals V into the set of humans H and the set of targets T, i.e., V = H ∪ T and H ∩ T = ∅. In light of this separation, we write the information diffusion dynamics (1) as

We observe the public evolving opinions to learn the cost functions of targets (which can include information sources and recommendation systems). We denote an observed trajectory in a finite time interval {k, k + 1, . . . , k + l − 1} as

$T(u) \triangleq \{(x(k+1), u(k)), (x(k+2), u(k+1)), \ldots, (x(k+l), u(k+l-1))\}$, (35)

and we also define

$x \triangleq [x(k+1); x(k+2); \ldots; x(k+l)] \in \mathbb{R}^{l|H|}$, (37)

The trajectory sample (35) is denoted by T(u) rather than T(x, u) because the evolving opinions depend on the actions/opinions of the observed targets.

Following the clarifications below the model (1), the action space of observed targets is defined as (38) such that u(k) ∈ U |T| .

In the cost function learning framework, we assume that each target's cost function, denoted by r i (x, u), consists of p basis functions:

$\sum_{j=k}^{k+l-1} c_{q}(x(j), u(j)), \quad q = 1, 2, \ldots, p.$

The targets usually do not have identical importance in the public opinion evolution, due to, for example, their different numbers of followers. Motivated by this, we impose importance weights on each target's cost function, which can be leveraged to transform multiple cost functions into a single one, i.e.,

considering which and (39), we define a set of parameter vectors:

Hereto, Problem 2 can be reformulated as: given the basis functions c q (x, u), q = 1, 2, . . . , p, and the public opinion evolution model (1), infer the coefficient vectors α and θ from a single finite-time trajectory sample (35).

We let p(u|θ, α, x(k)) denote the conditional probability distribution of the time-series action u, given the coefficient vectors α and θ and the initial public opinion x(k). The maximum entropy problem is formally formulated as

$\min_{p(u|\theta,\alpha,x(k))} \int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k)) \ln p(u|\theta,\alpha,x(k)) \, du$ (44a)

subject to

$\int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k)) \, du = 1$, (44b)

$\int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k)) \, c_{q}(x,u) \, du = \bar{c}_{q}, \quad q \in \{1, \ldots, p\}$ (44c)

where $\bar{c}_{q}$ denotes the expectation of $c_{q}(x, u)$. The optimal solution is formally presented in the following lemma, whose proof is given in Appendix B.

Lemma 1: The solution of problem (44) is

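$p(u|\theta,\alpha,x(k)) = \dfrac{e^{r(u)}}{\int_{\tilde{u} \in U^{l|T|}} e^{r(\tilde{u})} \, d\tilde{u}}$, (45)

where $r(u)$ is composed of the weighted basis-function terms $\alpha_{i}\theta_{iq}c_{q}(x,u)$.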
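To make the structure of the maximum-entropy solution concrete, the following sketch normalizes a density proportional to exp(r(u)) over a bounded action set by numerical integration. It is a toy illustration under stated assumptions: a single scalar action u in [−1, 1], a hypothetical score r(u) = −(u − 0.5)^2 that is not one of the paper's cost functions, and a trapezoid rule standing in for the integral over the full action sequence.

```python
import math


def trapezoid(values, step):
    """Trapezoid-rule integral of equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def maxent_density(r, lo=-1.0, hi=1.0, n=2001):
    """Return the grid, the density p(u) = exp(r(u)) / Z on [lo, hi], and the grid step."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    unnormalized = [math.exp(r(u)) for u in grid]
    z = trapezoid(unnormalized, step)  # partition function over the bounded set
    return grid, [v / z for v in unnormalized], step


# Hypothetical scalar score peaking at u = 0.5 (an assumption for illustration).
grid, density, step = maxent_density(lambda u: -(u - 0.5) ** 2)
total_mass = trapezoid(density, step)
mode = grid[max(range(len(grid)), key=lambda i: density[i])]
print(total_mass, mode)
```

The partition function here is computed over the bounded interval only, which is the feature that distinguishes the bounded-decision setting from derivations that integrate over the whole real line.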

Remark 5: The maximum-entropy based IRL implies that the objective of collective decision making among targets is to maximize the likelihood of the observed sequences of actions and evolving opinions.

If using only a finite-time trajectory of evolving opinions for cost function learning, a local optimization of likelihood of probability distribution is needed, and the algorithm of cost function learning is thus model based.

For the joint cost function (40), we, respectively, denote its gradient and Hessian as

With the defined matrices at hand, we now make the assumptions for deriving the main results.

Assumption 1: We assume that 1) the derivative $\frac{\partial}{\partial u}\frac{\partial x}{\partial u}$ is relatively small such that it can be ignored, i.e., $\frac{\partial}{\partial u}\frac{\partial x}{\partial u} \approx O$, 2) the integration:

We present the approximation of log likelihood of joint action policy (45) via local optimization in the following theorem, whose proof appears in Appendix C.

Theorem 3: Under the Assumption 1-2), the log likelihood of the conditional probability distribution (45) , i.e., log p (u |θ , α,x (k)), is approximated as

Remark 6: Following the same proof path, the approximation of the log likelihood of conditional probability distribution was first presented in [13] as

which is derived under an implicit assumption that the decision variables are unbounded, such that the Gaussian integral, i.e., $\int_{-\infty}^{\infty} e^{-u^{2}} \, du = \sqrt{\pi}$, can be leveraged. However, the approximation via the Gaussian integral cannot be applied to the social problem studied in this paper, since the range of the decision variables is constrained to the bounded set [−1, 1].
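A quick numerical check shows how much the unbounded Gaussian integral overstates the integral over the bounded set. The sketch uses the standard identity that the integral of exp(−u^2) over [−1, 1] equals √π · erf(1); the scalar, unit-scale integrand is chosen only for illustration and is not the integrand of the paper's likelihood.

```python
import math

# Integral of exp(-u^2) over the whole real line (the Gaussian integral).
unbounded = math.sqrt(math.pi)

# Integral of exp(-u^2) over the bounded set [-1, 1]: sqrt(pi) * erf(1).
bounded = math.sqrt(math.pi) * math.erf(1.0)

ratio = bounded / unbounded  # equals erf(1)
print(unbounded, bounded, ratio)
```

For this integrand the bounded integral is roughly 1.4936 against roughly 1.7725 for the unbounded one, so substituting √π would overstate the normalizing constant by more than 15 percent.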

Moving forward is the computation of H and h for solving the log likelihood approximation (47) . Under Assumption 1-1) (also the assumption of [13] ), we obtain from (40) and (46) 

Remark 7: Under the Assumption 1-1), the Hessian matrix H in [13] is derived as

which is due to the assumption of Markov Decision Process imposed on system model.

With H and h at hand, the preference parameter vectors α and θ can be obtained by solving the following nonlinear constrained optimization problem.

where L(θ, α) is given in (47) .

In this section, as an example, we first present a library of basis functions that can cover targets' cost functions. We then leverage the proposed M$^{3}$IRL algorithm to infer the associated preference parameter vectors θ and α of targets' cost functions. The considered basis functions are

Remark 8: Under minimization, the basis functions (53)-(55) indicate the objectives of steering the public opinions to +1, −1 and 0, respectively. The basis functions (56) imply a behavioral motivation of stubbornness. The implicit motivations for decision making can be inferred from θ and α jointly.

The (47) and (52) indicate that the inference of preference parameters needs the computations of h and H. Furthermore, the relations (48) 

and $\frac{\partial^{2} r_{i}(u,x)}{\partial x \, \partial u}$, which are carried out in Appendix D.

We collected nine Twitter users' tweets from January 2021 to September 2021 to validate the effectiveness of the proposed information diffusion model and the M$^{3}$IRL algorithm. The network structure of the nine users is shown in Figure 4, where nodes 8 and 9 are identified as information sources. It follows from Figure 4 that H = {1, 2, . . . , 7} and T = {8, 9}.

The collected tweets are centered around the topic "COVID-19 Vaccine" and the tweets are sampled biweekly. The tweets are encoded as numerical values in 

The considered decaying-influence model of memory is a simplified version of the base-level function in the ACT-R declarative memory model [20], [21]:

1 BERT Twitter Sentiment Analysis: https://github.com/OthSay/bert-tweetsanalysis

2 Sentiment140 dataset: https://www.kaggle.com/kazanova/sentiment140

where τ i is individual v i 's memory horizon and d i is the fitting parameter from real data. To simplify the model fitting, we ignore humans' innate opinions in the model and let x 1 = x 2 = . . . = x 7 = x and d 1 = d 2 = . . . = d 7 = d. According to Theorems 1 and 2, the social influence models that aim to capture the asymmetric confirmation bias and novelty bias are chosen as

where α i > 0 is the fitting parameter. Under the settings, according to the dynamics (1) and the relation (3), the model (33), without consideration of innate opinions, is rewritten as

where i ∈ H. To fit the model to real data, the considered loss function is

$e = \sum_{k=1}^{18} \| x(k) - \tilde{x}(k) \|_{2}^{2}$,

where $\tilde{x}(k)$ denotes the real data of opinion at time k (given $\tilde{x}(1) = x(1)$).
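The fitting loss is a sum over time of squared Euclidean distances between the model's opinion vector and the observed one. A minimal sketch follows; the function name and the tiny two-person trajectories are hypothetical and are not the Twitter data used in the paper.

```python
def fitting_loss(predicted, observed):
    """Sum over time steps of the squared 2-norm between predicted and observed opinions."""
    if len(predicted) != len(observed):
        raise ValueError("trajectories must have the same number of time steps")
    return sum(
        sum((p - o) ** 2 for p, o in zip(x_pred, x_obs))
        for x_pred, x_obs in zip(predicted, observed)
    )


# Hypothetical trajectories: two time steps, two individuals.
predicted = [[0.1, -0.2], [0.3, 0.0]]
observed = [[0.1, -0.2], [0.5, -0.1]]
loss = fitting_loss(predicted, observed)
print(loss)
```

The first time step contributes nothing because the trajectories start from the same opinions, mirroring the convention that the model is initialized with the first observed sample.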

We let τ i = 2, ∀i ∈ H, which means that the fitted model assumes that all humans' memory horizons are 1 month, given the biweekly sampling rate. The fitted model parameters are summarized as

• x = −1 and d = 6.01.

• α 1 = 1.8, α 2 = 2.2, α 3 = 1.4, α 4 = 2.2, α 5 = 0.2, α 6 = 1 and α 7 = 2.2.

Given the fitted parameters, the fitted curves of two humans' opinion trajectories are shown in Figure 5.
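To get a feel for what a decay parameter of d = 6.01 over a horizon of τ = 2 implies, the sketch below computes memory weights under an assumed form. The exact memory function is not reproduced here, so both the power-law shape (weight proportional to age^(−d), in the spirit of ACT-R base-level activation) and the normalization to unit sum are assumptions made for illustration only.

```python
def memory_weights(horizon, decay):
    """Assumed power-law memory weights over ages 1..horizon+1, normalized to sum to 1.

    Age 1 is the current sample; age horizon+1 is the oldest remembered sample.
    The functional form is a hypothetical stand-in, not the paper's fitted model.
    """
    raw = [age ** (-decay) for age in range(1, horizon + 2)]
    total = sum(raw)
    return [w / total for w in raw]


weights = memory_weights(horizon=2, decay=6.01)
print(weights)
```

Under this assumed form, such a large decay exponent concentrates almost all of the weight on the most recent sample, so older opinions within the one-month horizon would contribute very little to the sensed expectation.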

We use the three most recent data points (sampling over 1.5 months: mid-August to end-September) to learn the cost functions. With the encoded data and fitted model parameters, we obtain from Appendix D that
$$u^\top h = 0.3662\,(\alpha_1\theta_{11} + \alpha_2\theta_{21}) - 0.4609\,(\alpha_1\theta_{12} + \alpha_2\theta_{22}) - 0.0473\,(\alpha_1\theta_{13} + \alpha_2\theta_{23}) - 0.4166\,(\alpha_1\theta_{14} + \alpha_2\theta_{24}),$$
$$u^\top H u = 0.027\,(\alpha_1\theta_{11} + \alpha_1\theta_{12} + \alpha_1\theta_{13} + \alpha_2\theta_{21} + \alpha_2\theta_{22} + \alpha_2\theta_{23}) + 0.7351\,(\alpha_1\theta_{14} + \alpha_2\theta_{24}),$$
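Both expressions are linear in the products $\alpha_i \theta_{iq}$, so they are cheap to evaluate inside an optimization loop. The sketch below assumes the coefficients are stored as a pair `alpha` and a 2×4 nested list `theta`; these names, the function name `quadratic_terms`, and the sample values are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence, Tuple


def quadratic_terms(alpha: Sequence[float],
                    theta: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Evaluate u^T h and u^T H u from the numerical expressions in the text.

    alpha = (alpha_1, alpha_2); theta[i][q] holds theta_{(i+1)(q+1)}.
    """
    # w[q] = alpha_1 * theta_{1,q+1} + alpha_2 * theta_{2,q+1}
    w = [alpha[0] * theta[0][q] + alpha[1] * theta[1][q] for q in range(4)]
    u_h = 0.3662 * w[0] - 0.4609 * w[1] - 0.0473 * w[2] - 0.4166 * w[3]
    u_hu = 0.027 * (w[0] + w[1] + w[2]) + 0.7351 * w[3]
    return u_h, u_hu


# Hypothetical coefficients, chosen only to exercise the formulas.
example_alpha = (0.5, 0.5)
example_theta = [[1.0, 1.0, 1.0, 1.0],
                 [1.0, 1.0, 1.0, 1.0]]
example_u_h, example_u_hu = quadratic_terms(example_alpha, example_theta)
```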

The preference coefficients are obtained by solving (52) with the constrained-optimization solver 'fmincon' of MATLAB:

With the basis functions (53)-(56), the learned cost functions of the two information sources are

From the learned coefficients $\alpha_8$ and $\alpha_9$ and the cost functions (58) and (59), we infer:

• The information source IS8 aims at manipulating public opinion to be against the COVID-19 Vaccine (i.e., $x(k) \in [-1, 0)^7$), which is indicated by the first two terms on the right-hand side of (58). This inference is supported by IS8's evolving opinions in Figure 6.
• The information source IS9 aims at leading public opinion to support the COVID-19 Vaccine (i.e., $x(k) \in (0, 1]^7$), which is indicated by the first term on the right-hand side of (59). This inference is supported by IS9's evolving opinions in Figure 6.
• Both information sources IS8 and IS9 prefer to spread outlying opinions or dislike spreading persistent opinions, which is implied by the third and second terms on the right-hand sides of (58) and (59), respectively. This inference is partially supported by the sharp jumps in the opinion trajectories of IS8 and IS9 in Figure 6.
• The relation $\alpha_9 = 0.7960 > \alpha_8 = 0.2040$ implies that information source IS9 has a bigger influence than information source IS8 on the evolving opinions of the observed social community. This inference is partially supported by the observation from Figure 6 that more evolving opinions lie in the supporting range (0, 1] than in the opposing range [−1, 0), and IS9 supports the COVID-19 Vaccine.

The observations and inferences demonstrate the effectiveness of the proposed M$^{3}$IRL algorithm.

In this paper, we have proposed a social information-diffusion model that explicitly takes human memory, asymmetric confirmation bias and asymmetric novelty bias into account. Based on the proposed model, we have proposed the M$^{3}$IRL algorithm to learn the cost functions of target individuals. Validations on real data suggest the effectiveness of the derived M$^{3}$IRL algorithm and the proposed public opinion evolution model.

In the future, we will investigate generalizing the cost function learning framework to large-scale social networks, incorporating communication detection and classification.

APPENDIX A: PROOF OF THEOREM 1

Sufficient Condition: Without loss of generality, we let $x_a > 0$. Considering (24), the condition (27) is equivalent to

which, in conjunction with (23) and (24) , leads to the behavior (8) .

We now consider the condition

which is the union of the conditions in (25) and (26). If $x_g > 0$, then (60) implies that $x_a - x_g = x_g - x_b > 0$ and $x_a > x_b$, which follows from (24) and (25)

. We then can obtain from (23) and (21) that

If x g < 0, with the consideration of (26), following the same steps to derive (61), we have

We note that the union of (61) and (62) is equivalent to

Meanwhile, it is straightforward to observe from (9) and (10) that their union is also equivalent to (63). We thus conclude that the conjunctive conditions (23)-(26) lead to the behavior (9) and (10).

Without loss of generality, we let $x_g \geq x_a \geq 0$. It follows from (24), together with (23) and (21), that $c(x_g, x_a) > c(x_g, x_b)$. We thus conclude that the conjunctive conditions (23) and (24) imply

In the case of 0 > x g ≥ x a , following the same steps to derive (64), we have

The results (64) and (65) indicate that the conjunctive conditions (23) and (24) result in the behavior (11) .

Let us consider the condition

If $x_g > 0$, the condition (66) implies that $x_b < 0$ and $x_a > 0$. Without loss of generality, we let $x_g < x_a - x_g$. Then, in light of (61) and (64), we, respectively, obtain

which are due to the facts: (67) and (68) indicate that there exists an $x_b < 0$ such that $|x_g - x_a| > |x_g - x_b|$ and $c(x_g, x_b) > c(x_g, x_a)$. We can thus summarize that there exists an

Also, considering (66), if $x_g < 0$, following the same logic used to derive (69), we can conclude that there exists an $x_b$ such that

Let us denote $\zeta(x_g, x_a, x_b) = \frac{|x_g - x_b|}{|x_g - x_a|} < 1$, by which it is straightforward to verify that the union of (69) and (70) is equivalent to (12). We can thus conclude that the conjunctive conditions (23)-(26) result in the behavior (12).

The necessary condition is proved by contradiction, i.e., we assume that one of the conditions (23)-(26) does not hold and then prove that the influence model $c(x_i, x_j)$ cannot capture the behaviors (9)-(12) simultaneously.

We assume that (23) does not hold, i.e.,

As a consequence, we obtain from (21) that

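which contradicts (11). We now consider the case that $g_g(f_g(x_g) - f_g(x_a))$ is non-decreasing w.r.t. $|f_g(x_g) - f_g(x_a)|$ and $f_g(c)$ is non-increasing w.r.t. $c$. Let us set $0 < x_b < x_a < x_g$. We thus have

The Lagrangian of the constrained optimization problem (44) is given as
$$L(p(u|\theta,\alpha,x(k))) = \int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k)) \ln p(u|\theta,\alpha,x(k))\, du - \lambda \left( \int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k))\, du - 1 \right) - \sum_{i \in T} \alpha_i \sum_{q=1}^{p} \theta_{iq} \int_{u \in U^{l|T|}} p(u|\theta,\alpha,x(k))\, c_q(x,u)\, du,$$
whose derivative w.r.t. $p(u|\theta,\alpha,x(k))$ is obtained as
$$\frac{dL(p(u|\theta,\alpha,x(k)))}{dp(u|\theta,\alpha,x(k))} = \ln p(u|\theta,\alpha,x(k)) + 1 - \lambda - \sum_{i \in T} \alpha_i \sum_{q=1}^{p} \theta_{iq} c_q(x,u). \quad (72)$$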

Following the KKT optimality condition, i.e., $\frac{\partial L(p(u|\theta,\alpha,x(k)))}{\partial p(u|\theta,\alpha,x(k))} \equiv 0$ for any $p(u|\theta,\alpha,x(k))$, we obtain from (72) that
$$\ln p(u|\theta,\alpha,x(k)) + 1 - \lambda - \sum_{i \in T} \alpha_i \sum_{q=1}^{p} \theta_{iq} c_q(x,u) \equiv 0,$$
which is equivalent to
$$p(u|\theta,\alpha,x(k)) = e^{\lambda - 1} e^{r(u)}, \quad (73)$$
via considering the relations (39a) and (40). Inserting (73) into (44b) leads to the relation $\int_{\tilde{u} \in U^{l|T|}} e^{r(\tilde{u})}\, d\tilde{u} = e^{1-\lambda}$, substituting which into the relation (73) yields (45).
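The resulting distribution is the exponentiated reward divided by its integral over the bounded decision set. As a rough numerical illustration, the sketch below discretizes a one-dimensional decision variable on [−1, 1] and normalizes $e^{r(u)}$ with a Riemann sum. The function name `maxent_density`, the grid resolution, and the toy quadratic reward are assumptions made for illustration; the paper's reward is the weighted sum of basis functions, which is not reproduced here.

```python
import math
from typing import Callable, List


def maxent_density(r: Callable[[float], float], grid: List[float]) -> List[float]:
    """Discretized p(u) = exp(r(u)) / integral of exp(r(u')) du' on a uniform grid."""
    du = grid[1] - grid[0]
    values = [r(u) for u in grid]
    shift = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in values]
    normalizer = sum(weights) * du  # Riemann-sum approximation of the integral
    return [w / normalizer for w in weights]


# Bounded decision set [-1, 1] with 201 grid points (an assumed resolution).
grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
# Hypothetical toy reward peaking at u = 0.3.
density = maxent_density(lambda u: -2.0 * (u - 0.3) ** 2, grid)
total_mass = sum(density) * (grid[1] - grid[0])
mode_index = density.index(max(density))
```

The density integrates to one on the grid by construction, and its mode coincides with the maximizer of the reward, which mirrors the role of the multiplier $\lambda$ as a pure normalization constant.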

APPENDIX C: PROOF OF THEOREM 3

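Considering (46), the second-order Taylor expansion of $r(\tilde{u})$ around $u$ is
$$r(\tilde{u}) \approx r(u) + (\tilde{u} - u)^\top \frac{\partial r}{\partial u} + \frac{1}{2} (\tilde{u} - u)^\top \frac{\partial^2 r}{\partial u^2} (\tilde{u} - u). \quad (74)$$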

We now can obtain 

where (75) is obtained from its previous step via considering Assumption 1-2). Substituting (74) with (76) into (45) yields
$$p(u|\theta,\alpha,x(k)) \approx \frac{e^{r(u)}}{\int_{\tilde{u} \in U^{l|T|}} e^{r(u) + (\tilde{u}-u)^\top h + \frac{1}{2} (\tilde{u}-u)^\top H (\tilde{u}-u)}\, d\tilde{u}},$$
by which we obtain $L(\theta, \alpha) \approx \log p(u|\theta,\alpha,x(k))$, where $L(\theta, \alpha)$ is given in (47).
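To see what this approximation does, note that $e^{r(u)}$ cancels between numerator and denominator, so the approximate log-likelihood depends only on the gradient $h$ and Hessian $H$ at the observed $u$. The sketch below compares the exact and the second-order log-density in one dimension on a bounded grid. It is a toy check under stated assumptions: the function names, the finite-difference step, and the quadratic test reward are hypothetical, and $h$ and $H$ are obtained here by finite differences rather than by the closed-form computations of Appendix D.

```python
import math
from typing import Callable, List


def log_density_exact(r: Callable[[float], float], u: float, grid: List[float]) -> float:
    """log p(u) with the normalizer integrated numerically over the bounded grid."""
    du = grid[1] - grid[0]
    log_z = math.log(sum(math.exp(r(v)) for v in grid) * du)
    return r(u) - log_z


def log_density_taylor(r: Callable[[float], float], u: float, grid: List[float],
                       eps: float = 1e-3) -> float:
    """log p(u) using the second-order Taylor expansion of r around u."""
    h = (r(u + eps) - r(u - eps)) / (2.0 * eps)            # gradient (scalar case)
    H = (r(u + eps) - 2.0 * r(u) + r(u - eps)) / eps ** 2  # Hessian (scalar case)
    du = grid[1] - grid[0]
    # exp(r(u)) cancels, leaving only the linear and quadratic terms.
    z = sum(math.exp((v - u) * h + 0.5 * H * (v - u) ** 2) for v in grid) * du
    return -math.log(z)


grid = [-1.0 + 2.0 * i / 400 for i in range(401)]


def toy_reward(u: float) -> float:
    # Hypothetical quadratic reward, for which the expansion is exact.
    return -2.0 * (u - 0.3) ** 2


exact_value = log_density_exact(toy_reward, 0.1, grid)
approx_value = log_density_taylor(toy_reward, 0.1, grid)
```

For a quadratic reward the two values agree up to rounding; for a general reward the second-order version is only a local approximation around the observed decision.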

APPENDIX D: COMPUTATIONS

1) Computations of $\frac{\partial r_i(u,x)}{\partial x}$ and $\frac{\partial r_i(u,x)}{\partial u}$: According to the dynamics (33), we define

where we define:

with $I_{i-3} \in \mathbb{R}^{|T|}$ and $I_{i-3-|T|} \in \mathbb{R}^{|T|}$ denoting indicators defined in Section II-A, and

With (77)-(83) at hand, $\frac{\partial r_i(u,x)}{\partial x}$ and $\frac{\partial r_i(u,x)}{\partial u}$ are computed as follows, in light of (39):

where

Leveraging (84) and (87), $\frac{\partial^2 r_i(u,x)}{\partial x^2}$ is finally computed as

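3) $\frac{\partial^2 r_i(u,x)}{\partial u^2}$ Computation: According to (82) with (83), we define
$$u_{(i,t,m)} \triangleq \frac{\partial u_{(i,t)}}{\partial u(m)} = \begin{cases} O, & i \in \{1, 2, 3\} \\ \Lambda(i-3, z_{(i,t,m)}), & \text{otherwise} \end{cases}$$
where $\Lambda(\cdot)$ is defined in Section II-A, and
$$z_{(i,t,m)} \triangleq \begin{cases} 2\bar{z}_{(i,t,m)}, & m \in \{t-1, t, t+1\} \\ 0, & \text{otherwise} \end{cases}$$
with $\bar{z}_{(i,t,t-1)}$

where $B_{(t,m,q)} \triangleq \frac{\partial^2 x(t+1)}{\partial u(m)\, \partial u(q)}$.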

In light of (85) and (93), $\frac{\partial^2 r_i(u,x)}{\partial u^2}$ is computed as

$$= v_{(i,t,m)} + \sum_{q=t}^{k+l-1} \left( A_{(q,t)}^\top v_{(i,q+1,m)} + A_{(q,t,m)}^\top s_{(i,q+1)} \right),$$

where $A_{(q,t,m)} \triangleq \frac{\partial^2 x(q+1)}{\partial x(t)\, \partial u(m)}$.

In light of (84) and (98), $\frac{\partial^2 r_i(x,u)}{\partial x \partial u}$ is computed as

Social media use in the United States: Implications for health communication

Addressing health misinformation with health literacy strategies

The emergence of COVID-19 in the US: A public health and political communication crisis

Public health and online misinformation: Challenges and recommendations

WhatsApp vigilantes: An exploration of citizen reception and circulation of WhatsApp misinformation linked to mob violence in India

Temporal trends in anti-vaccine discourse on Twitter

COVID-19-related infodemic and its impact on public health: A global social media analysis

Impact of confirmation bias on competitive information spread in social networks

Optimal investment strategies for competing camps in a social network: A broad framework

When is a linear control system optimal

Nonparametric Bayesian inverse reinforcement learning for multiple reward functions

Continuous inverse optimal control with locally optimal examples

Relative entropy inverse reinforcement learning

Maximum entropy inverse reinforcement learning

Modeling interaction via the principle of maximum causal entropy

Inverse optimal control for multiphase cost functions

Apprenticeship learning via inverse reinforcement learning

Social behavior for autonomous vehicles

Attitudinal polarization on social networks: A cognitive architecture perspective

Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow

Who shall survive?: A new approach to the problem of human interrelations

A formal theory of social power

Reaching a consensus

Mathematical models of the distribution of attitudes under controversy

Social influence and opinions

Debiasing social wisdom

The paradox of information access: On modeling social-media-induced polarization

Social-Behavioral Modeling for Complex Systems

A semi-supervised active-learning truth estimator for social networks

The science of fake news

Polarization and fake news: Early warning of potential misinformation targets

The spreading of misinformation online

Echo chambers on social media: A comparative analysis

The paradox of information access: Growing isolation in the age of sharing

Opinion dynamics and bounded confidence models, analysis, and simulation

Modeling confirmation bias and polarization

Clustering and asymptotic behavior in opinion formation

Heterophilious dynamics enhances consensus

Social system inference from noisy observations

On inference of network topology and confirmation bias in cyber-social networks

Asymmetry in the estimation of interpersonal distance and identity affirmation

Features of similarity

The asymmetry bias in me, we-others distance ratings. The role of social stereotypes

Confirmation bias

Biased assimilation, homophily, and the dynamics of polarization

A model of attentiveness to outlying news