authors: Franceschi, Jonathan; Pareschi, Lorenzo; Zanella, Mattia
title: From agent-based models to the macroscopic description of fake-news spread: the role of competence in data-driven applications
date: 2022-02-22

Fake news spreading, with the aim of manipulating individuals' perceptions of facts, is now recognized as a major problem in many democratic societies. Yet, to date, little has been understood about how fake news spreads on social networks, what the influence of the education level of individuals is, when fake news is effective in influencing public opinion, and what interventions might be successful in mitigating its effect. In this paper, starting from the kinetic multi-agent model with competence recently introduced by the first two authors, we derive reduced-order models through the notion of social closure in the mean-field approximation, which has its roots in the classical hydrodynamic closure of kinetic theory. This approach yields simplified models in which the competence and learning of the agents retain their role in the dynamics while, at the same time, the structure of the models is better suited to being interfaced with data-driven applications. Examples of different Twitter-based test cases are described and discussed.

Since the 2016 U.S. presidential election, and more recently the COVID-19 infodemic, fake news on social networks, intended to manipulate users' perceptions of events, has been recognized as a fundamental problem in open societies. As fake news proliferates, disinformation threatens democracy and efficient governance. In particular, there is empirical evidence that fake news spreads significantly "faster, deeper, and more widely" than real news [37].
In the same study, it is also highlighted that the phenomenon is not due to automated, bot-driven news dissemination but to the actions of human beings who share the news without being able to identify misinformation. It is therefore of fundamental importance to construct mathematical models capable of describing such scenarios, with a structure simple enough to be interfaced with available data, for example from social networks, while still embedding the specific features related to the ability of individuals to detect a piece of false information.

In recent years, compartmental models inspired by epidemiology have been used fruitfully to study the spreading of rumors and hoaxes. For instance, following the pioneering work of Daley & Kendall [11], in [23] SIR-type models are used in conjunction with dynamical trust rates that account for the different spreading rates in a network. Those traditional models were elaborated further in [7], where the authors also consider the impact of online groups in feeding the rumor growth once it has started.

Alongside these approaches there are more data-driven works. In this field, Twitter has been gaining consensus as a powerful source of useful and structured information. A recent example in this direction is [27], which focuses on fake news dissemination on the platform using a two-phase model: fake news initially spreads as a novel news story and, after a correction time, it is paired with a competing narrative that describes the news as fake. Twitter data in conjunction with epidemiological models have already been used to study the spread of rumors and fake news by several authors [16, 17, 26, 10], where SIS and SEIZ compartmental models were employed to fit data on the evolution of different news items.

Mounting experimental evidence highlights the strong link between digital media literacy and the ability to reliably assess the quality of online information.
This connection was identified early by communication scientists [20] and later confirmed by experimental studies, see e.g. [22, 24]. In [19], starting from an agent-based model for the dissemination of fake news in the presence of competence and using the tools of kinetic theory, novel mathematical models were proposed and discussed in the limit of a large number of agents. Previously, kinetic models that include the role of competence or knowledge had been proposed in [5, 29, 31]. The behavior of social systems composed of a large number of interacting agents has been studied in the case of opinion formation [3, 9, 14, 15, 34] and, more recently, of epidemiological dynamics [1, 2, 12]. We refer to [28] for an introduction to the subject.

The compartmental structure of the model for fake-news spreading in the presence of competence introduced in [19] is composed of four groups of individuals: the susceptible (S) agents, who are unaware of the fake news; the exposed (E) agents, who know the news but have not yet decided whether to spread it; the infectious (I) agents, who actively spread it; and the skeptical or removed (R) agents, who are aware of the news but choose not to spread it. On top of this division into compartments, the population carries a social structure described by an additional time-evolving variable that measures the competence level of the agents.

Although the model has shown the capacity to correctly describe the role of competence in the dynamics of fake news, its mathematical structure, based on kinetic partial differential equations, is generally too complex to be interfaced with the available data. To address this problem, in the present work we exploit the knowledge of the equilibrium states of the corresponding mean-field model to derive reduced-order macroscopic models based on ordinary differential equations in which the role of competence is nevertheless retained.
The new social models, thanks to their simpler structure, are more suitable for data-driven applications. We emphasize that the methodology adopted here is quite general and, in principle, points the way to introducing additional social characteristics of individuals into mathematical models whose structural complexity remains tractable.

The rest of the manuscript is organized as follows. In Section 2, we recall the basic concepts of the kinetic model describing the spread of fake news in the presence of competence. Next, in Section 3, using the local equilibrium states of the competence, we derive reduced-order models that depend on the specific shape of the interaction function. Section 4 is devoted to a series of numerical experiments in which we first validate the model and then consider data-driven applications based on Twitter. In the last section, some final considerations are reported.

In this section we present a model for the spreading of fake news in a society characterized by a heterogeneous competence of agents. Our starting point is the compartmental kinetic approach recently proposed in [19]. We suppose that the system of agents can be divided into the following epidemiologically relevant states: susceptible (S) agents are those who are unaware of the fake news; exposed (E) agents have encountered the fake news but have yet to spread it; infectious (I) agents are the actual spreaders; and, finally, removed (R) agents are not actively engaged in the spread of misinformation. In the following we denote by $\mathcal{C} = \{S, E, I, R\}$ the set of epidemiological compartments. Aiming to incorporate the effects of personal competence on the fake news dynamics, we adopt a simple mathematical setting where the state of the individuals in each compartment, at any time $t \ge 0$, is characterized solely by the competence level $x \in \mathbb{R}_+$.
Hence, we denote by $f_S(x,t)$, $f_E(x,t)$, $f_I(x,t)$ and $f_R(x,t)$ the distributions of competence at time $t \ge 0$ of susceptible, exposed, infectious and removed individuals, respectively. We neglect natality and mortality dynamics, since we consider a short time scale over which nobody enters or leaves the population during the spreading of the fake news. This assumption can be justified by the average lifespan of fake news. Therefore, we can fix the total distribution of competence of the society, $f(x,t) = \sum_{J \in \mathcal{C}} f_J(x,t)$, to be a probability density for all $t \ge 0$, i.e. $\int_{\mathbb{R}_+} f(x,t)\,dx = 1$. Consequently, the quantities $J(t) = \int_{\mathbb{R}_+} f_J(x,t)\,dx$, $J \in \mathcal{C}$, denote the fractions of the population that are susceptible, exposed, infectious or removed, respectively, at time $t \ge 0$. We also denote by $m^p_J(t)$ the moment of order $p \ge 0$ of the distribution $f_J(x,t)$, $J \in \mathcal{C}$. Without ambiguity, we indicate with $m_J(t)$, $J \in \mathcal{C}$, the mean values corresponding to $p = 1$.

Drawing inspiration from seminal models for multi-agent systems in the presence of personal competence [29, 31], we introduce a binary interaction term expressing two different processes: (i) learning processes, by which less competent agents can learn from the more competent ones; (ii) the dependence of the competence evolution on the social background in which individuals grow. The dynamics described at point (i) can be sketched by the following process: if two agents belonging to compartments $H, J \in \mathcal{C}$ and characterized by competence levels $x, x_* \in \mathbb{R}_+$ meet, their post-interaction competences $(x', x_*')$ are given by the binary interaction rule (1), where $\lambda_H(\cdot)$, $H \in \mathcal{C}$, quantifies the amount of competence lost by individuals of compartment $H$ through the natural process of forgetfulness, and $\lambda_{CH}(\cdot)$, $H \in \mathcal{C}$, models the competence gained through the interaction with members of the class $J$, with $J \in \mathcal{C}$. A possible choice for $\lambda_{CH}(x)$ is of the form $\lambda_{CH}\,\chi(x \ge \bar{x})$, where $\chi(\cdot)$ is the characteristic function and $\bar{x} \in X$ is a minimum level of competence required of the agents in order to increase their own skills through interactions. The parameter $\varepsilon$ describes the intensity of the interactions.
In (1), $\eta_{HJ}$ and $\eta_{JH}$ are centered i.i.d. random variables such that, denoting by $\langle \cdot \rangle$ their expectation, we have $\langle \eta_{HJ}^2 \rangle = \langle \eta_{JH}^2 \rangle = \sigma_{HJ}^2$. The dynamics of point (ii) is instead given by a pure drift process acting at the kinetic level and depending on the epidemiological compartment of interest.

Remark 1. It is reasonable to assume that both the processes of gain and loss of competence from the interaction with other agents in (1) are bounded from below by zero. Therefore we suppose that if $J, H \in \{S, E, I, R\}$, $\lambda_J \in [\lambda_J^-, \lambda_J^+]$ with $\lambda_J^- > 0$ and $\lambda_J^+ < 1$, and $\lambda_{CJ}(x) \in [0, 1]$, then $\eta_{HJ}$ may, for example, be uniformly distributed on a suitable bounded interval.

The combination of the two aforementioned mechanisms, together with the spreading of the fake news, is described by the kinetic model (2). In (2), the functional (3) is the local incidence rate and $\kappa(x, x_*)$ is a nonnegative contact function measuring the impact of competence on the spreading of fake news. This function is decreasing with respect to the competences $x, x_* \ge 0$ of the populations of susceptible and infected agents.

Parameter   Definition
β           contact rate between susceptible and infected individuals
1/δ         average decision time on whether or not to spread fake news
η           probability of deciding not to spread fake news
1/γ         average duration of a fake news
α           probability of remembering fake news
Table 1: Parameters definition in the SEIR model (2).

In the following we investigate the macroscopic effects of two choices of $\kappa(x, x_*)$:

(A) $\kappa(x, x_*) = \beta/(x\,x_*)$,
(B) $\kappa(x, x_*) = \beta e^{-x - x_*}$.

The two functions are both decreasing but differ strongly for $x, x_* \ll 1$. Indeed, since (A) is unbounded for small competences, it enforces the spreading of fake news among less competent agents more than (B), which is bounded on $\mathbb{R}_+$. We further remark that individuals have the highest rates of contact with people belonging to the same social class, and thus with a similar level of competence.
Furthermore, the operators $Q_{HJ}(f_H, f_J)(x, t)$, $H, J \in \mathcal{C}$, determine the thermalization of the distribution of competence characterizing each compartment. It is worth observing that the evolution of the mass fractions $J(t)$ obeys the classical SEIR model with reinfection when $Q_{HJ} \equiv 0$ and $\kappa(x, x_*) \equiv \beta > 0$. This corresponds to considering the spreading of fake news to be independent of the competence level of the agents.

In more detail, we consider the operators $Q_{HJ}$ as integral operators that modify the competence distribution through repeated interactions of type (1) among individuals. We can conveniently define these operators in the weak form (4), where $\varphi(\cdot)$ is a test function and the brackets $\langle \cdot \rangle$ indicate the expectation with respect to the random variables $\eta_{HJ}$, $\eta_{JH}$. In the model (2) the function $\gamma(x) > 0$ determines the duration of the fake news and can be strongly influenced by the competence level of the spreader. Furthermore, the function $\delta(x) > 0$ is related to the average time that an agent spends before spreading a piece of fake news, so that people with high competence invest more time in checking the reliability of information, and $\eta(x) \in [0, 1]$ characterizes the individuals' decision on whether to spread fake news. The function $\alpha(x) \in [0, 1]$ describes the probability of remembering fake news and can be thought of as less influenced by the competence variable. In Table 1 we summarize all the introduced parameters.

We now focus on the learning dynamics introduced in model (2), whose evolution is given by the nonlinear operators $Q_{HJ}(f_H, f_J)$, $H, J \in \mathcal{C}$, defined in (4). We concentrate in particular on the analysis of the asymptotic states of the learning dynamics undergoing the elementary interactions (1). We are therefore interested in the asymptotic distribution of the Boltzmann-type model (5). It is easily observed that, for $\varphi(x) = 1$, mass is conserved in (5), corresponding to the conservation of the total number of agents.
For $\varphi(x) = x$ in (5) we obtain the evolution of the average competence in each compartment, which is in general not conserved in time, whereas the total competence is conserved. Since the steady state solution of (5) is difficult to obtain, we formally derive a simplified Fokker-Planck model for which the study of the asymptotic properties is much easier. To this end, we introduce the quasi-invariant scaling (6) of the relevant parameters of the binary scheme (1), with $\tau > 0$. It is worth mentioning that this scaling is inspired by the so-called grazing collision limit of the Boltzmann equation, see [6, 36]. In the context of multi-agent systems this scaling was introduced in [8, 33]. In this regime of parameters the interactions become quasi-invariant, in the sense that the post-interaction competences $(x', x_*')$ are such that $x' - x$ and $x_*' - x_*$ are small for $\tau \ll 1$.

Hence, assuming that $\varphi$ is smooth and compactly supported, we can perform a Taylor expansion of $\varphi(x')$ around $x$, where we exploit the fact that $\langle \eta_{HJ} \rangle = 0$ and collect the higher-order contributions in a sum of remainder terms. Assuming that the third-order moment of $\eta_{HJ}$ is bounded, thanks to the smoothness of $\varphi$ the remainder vanishes in the limit $\tau \to 0^+$ for each $x \in \mathbb{R}_+$ and for all $J \in \mathcal{C}$. Therefore, in the new time scale, for $\tau \to 0^+$ and under the quasi-invariant scaling (6), the solution of model (5) converges to the solution of a Fokker-Planck-type equation in weak form. Integrating back by parts, we obtain the Fokker-Planck model (8), with $\sum_{J \in \mathcal{C}} \sigma_{HJ}^2 = \sigma^2$, coupled with suitable boundary conditions; here the total mean competence $m = \sum_{H \in \mathcal{C}} H(t)\, m_H(t)$ is a conserved quantity, as we already observed. Hence, we obtain that the large-time distribution is an inverse Gamma. In view of $S^\infty + R^\infty = 1$ we conclude that, under the introduced assumptions, only the susceptible and removed compartments contribute to the large-time state.

3 Reduced order models for fake news spread with competence

Once we have characterized the equilibrium distribution of the transition operators $Q_{HJ}(\cdot, \cdot)$, with $H, J \in \mathcal{C}$, we can study the complete system (2). The aim of this section is the derivation of observable macroscopic equations for the introduced kinetic model.
Integrating both sides of (2) with respect to $x \in \mathbb{R}_+$ and recalling that the introduced operators are mass and momentum preserving, we obtain the system (9) for the evolution of the mass fractions $J(t)$, $J \in \mathcal{C}$, whereas for the momentum we get the system (10). We observe that the obtained system is not closed, since the evolution of the mass fractions $J(t)$ and of the momentum depends on the evolution of the distribution functions $f_J(x, t)$. The closure of the system can be obtained by formally resorting to a limit procedure. Indeed, assuming that the time scale of the process of competence formation is $\varepsilon \ll 1$, the learning process of the system of agents is fast with respect to the spreading of fake news. Therefore, for $\varepsilon \ll 1$ the distribution function $f_J(x, t)$ quickly reaches the inverse Gamma equilibrium with mass fraction $J(t)$ and local mean value $m_J(t)$. In the following we obtain two different sets of macroscopic equations, depending on the contact rate function $\kappa(x, x_*)$ considered.

We consider the case (A) introduced in Section 2.2, corresponding to a strong competence-based contact function defined by $\kappa(x, x_*) = \beta/(x\,x_*)$, for which the incidence term takes the form (12). In the limit $\varepsilon \to 0^+$ we can plug $f_J^\infty(x)$ into (12), which simplifies thanks to the properties of the inverse Gamma distribution, leading to (14). Next, looking at (10) and recalling that, under the hypothesis $\lambda_J = \lambda_{CJ}$ for $J \in \mathcal{C}$, the knowledge exchange operator also preserves momentum, we obtain a system of equations for the mean values that can be simplified to (16). That is, we obtain a closed system of eight ordinary differential equations (14), (16).
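The closure step for case (A) can be illustrated with a short worked identity. The block below is only a sketch under stated assumptions: it uses a generic inverse Gamma density with shape $a > 1$ and scale $b > 0$ (the symbols $a$, $b$ and $g$ are introduced here for illustration and need not match the parametrization of the equilibrium used in the text), and it assumes that the incidence term has the usual form of a product of $f_S$ with the $\kappa$-weighted integral of $f_I$.

```latex
% Generic inverse Gamma density (illustrative symbols a, b)
\[
  g(x) = \frac{b^{a}}{\Gamma(a)}\, x^{-(a+1)} e^{-b/x}, \qquad x > 0 .
\]
% Mean and reciprocal moment, both from
% \int_0^\infty x^{-(s+1)} e^{-b/x}\,dx = \Gamma(s)\, b^{-s}, \quad s > 0
\begin{align*}
  \int_0^\infty x\, g(x)\,dx
    &= \frac{b^{a}}{\Gamma(a)} \cdot \frac{\Gamma(a-1)}{b^{a-1}}
     = \frac{b}{a-1} =: m, \\
  \int_0^\infty \frac{1}{x}\, g(x)\,dx
    &= \frac{b^{a}}{\Gamma(a)} \cdot \frac{\Gamma(a+1)}{b^{a+1}}
     = \frac{a}{b} = \frac{a}{(a-1)\, m} .
\end{align*}
% Consequence for \kappa(x, x_*) = \beta / (x x_*), assuming
% f_S = S(t)\, g_S and f_I = I(t)\, g_I with a common shape a
% and means m_S, m_I:
\[
  \int_0^\infty\!\!\int_0^\infty \frac{\beta}{x\, x_*}\,
     f_S(x,t)\, f_I(x_*,t)\, dx\, dx_*
  = \beta \left( \frac{a}{a-1} \right)^{2} \frac{S(t)\, I(t)}{m_S(t)\, m_I(t)} .
\]
```

Under these assumptions the incidence depends on the distributions only through the masses and the means, which is what makes a closed system of ordinary differential equations possible: compartments with a low mean competence contribute a larger effective contact rate.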
If, instead, we consider the case (B) of Section 2.2, corresponding to the weak competence-based contact function defined by $\kappa(x, y) = \beta e^{-x} e^{-y}$, the incidence term can be written as in (17). As discussed in Section 3.1, in the limit $\varepsilon \to 0^+$ we may plug the asymptotic distribution $f_J^\infty$ of the Fokker-Planck model (8) into (17) to obtain an expression involving $K_a(x)$, the modified Bessel function of the second kind of order $a$ evaluated at $x$. Hence, considering system (9) under the assumption of a weak competence-based contact function, we obtain the system (19). The system is closed with the help of equation (20), which is a straightforward consequence of a property of the modified Bessel functions of the second kind. Again under the assumption that $\lambda_J = \lambda_{CJ} = \lambda$ for $J \in \mathcal{C}$, integrating equation (10) with respect to $x$, with the aid of equation (20), we get the system (22) for the mean values.

In this section we numerically validate the modeling framework proposed in (2) with local incidence rate (3) in the settings (A) and (B). We stress that these forms of the contact function generate different macroscopic models, defined in (14), (16) and (19), (22), respectively, for $\varepsilon \ll 1$. Once the consistency of the approach is established, we exploit the macroscopic sets of equations for calibration purposes, based on a freely available repository on the spreading of hashtags linked to known fake news. The proposed data-oriented approach is fundamental to observing experimentally how the choice of contact function affects the estimated impact of competence on the fake news dynamics. From the numerical point of view, we exploit an implicit structure-preserving method for the Fokker-Planck operator (8), based on the schemes presented in [32]. The advantage of these methods lies in an arbitrarily accurate description of the steady-state distribution of the Fokker-Planck model of interest. Similar approaches have been investigated in a different context also in [12, 13].
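For the weak contact function of case (B), the appearance of modified Bessel functions in the closure of Section 3 can be traced back to a classical integral. The block below is a sketch with a generic inverse Gamma density of shape $a > 1$ and scale $b > 0$; the symbols $a$, $b$ and $g$ are illustrative assumptions, and the recurrence is quoted as a standard property of $K_a$ that links neighbouring orders, not as a reproduction of the specific identity (20).

```latex
% Classical integral (\beta_0, \gamma_0 > 0, any real \nu), with K_{-\nu} = K_{\nu}:
\[
  \int_0^\infty x^{\nu-1} e^{-\beta_0/x - \gamma_0 x}\,dx
  = 2 \left( \frac{\beta_0}{\gamma_0} \right)^{\nu/2}
    K_{\nu}\!\left( 2\sqrt{\beta_0 \gamma_0} \right).
\]
% Applied to g(x) = b^{a} x^{-(a+1)} e^{-b/x} / \Gamma(a):
\begin{align*}
  \int_0^\infty e^{-x} g(x)\,dx
    &= \frac{b^{a}}{\Gamma(a)} \cdot 2\, b^{-a/2} K_{a}\!\left(2\sqrt{b}\right)
     = \frac{2\, b^{a/2}}{\Gamma(a)}\, K_{a}\!\left(2\sqrt{b}\right), \\
  \int_0^\infty x\, e^{-x} g(x)\,dx
    &= \frac{b^{a}}{\Gamma(a)} \cdot 2\, b^{(1-a)/2} K_{a-1}\!\left(2\sqrt{b}\right)
     = \frac{2\, b^{(a+1)/2}}{\Gamma(a)}\, K_{a-1}\!\left(2\sqrt{b}\right).
\end{align*}
% Standard three-term recurrence connecting the orders that appear:
\[
  K_{a+1}(z) = K_{a-1}(z) + \frac{2a}{z}\, K_{a}(z).
\]
```

The first integral is the competence-weighted factor entering the mass equations, while the second is the kind of term that arises when the equations are multiplied by $x$ before integrating. Unlike case (A), the result is bounded for every admissible mean competence, consistent with the boundedness of (B) on $\mathbb{R}_+$.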
In this first test we compare the evolution of the mass fractions $J(t)$ and means $m_J(t)$, $J \in \mathcal{C}$, obtained from direct integration of $f_J(x, t)$, solution to (2), with respect to the competence $x \in \mathbb{R}_+$, with the macroscopic models (14), (16) and (19), (22) for several regimes of $\varepsilon > 0$. We start by outlining the procedure by which we solve the system of kinetic equations (2) with Fokker-Planck interaction operators. Since $\varepsilon > 0$ is assumed to be small, we adopt a time splitting procedure. In particular, upon introducing a time discretization $t^n = n\Delta t$, with $\Delta t > 0$ constant, we proceed as follows.

I. Fokker-Planck solver. At time $t = t^n$, we determine the distributions $f_H(x, t)$ for all $H \in \mathcal{C}$ by solving the interaction step for $t \in [t^n, t^{n+1/2}]$, where $Q(f_H)$ is the Fokker-Planck operator defined in Section 2.3, taken under the hypothesis $\lambda_H = \lambda_{CJ} = \lambda$. In this step we take advantage of an implicit structure-preserving (SP) scheme for Fokker-Planck equations [32], which describes the steady state of the model with arbitrary accuracy. In Figure 1 we report the results for several values of $\varepsilon$, including $\varepsilon = 1$ and $\varepsilon = 10^{-1}$. We observe that the scheme is capable of approximating the inverse Gamma analytical equilibrium $f^\infty(x)$. We also report the evolution of the $L^2$ numerical error, computed as $\| f(x, t) - f^\infty(x) \|_{L^2}$, over the time frame $[0, 2]$, from which we observe that for sufficiently small values of $\varepsilon$ the given equilibrium distribution is correctly approximated.

II. Advection-reaction step. We take the distribution obtained in the interaction step as input for the advection-reaction dynamics for $t \in [t^{n+1/2}, t^{n+1}]$. In particular, we adopted a second-order Lax-Wendroff scheme coupled with an explicit time integration.

In the tests of this subsection, unless otherwise specified, we prescribe as initial datum a distribution depending on two parameters $a_1$ and $a_2$, with $a_1 = 2(a_2 - 1)$ and $a_2 = 1.25$, together with prescribed initial mass fractions. We consider the choice of parameters $m = \sum_{H \in \mathcal{C}} H(0)\, m_H(0)$, $\lambda = 0.25$ and $\sigma = 0.01$ for (4.1).
The fake news dynamics is regulated by the following choice of parameters: $\alpha = 0.9$, $\beta = 20$, $\gamma = 0.2$ and $\delta = 0.05$. For the contact rates (A) and (B) we compare the evolution of mass fractions and mean values obtained from the integration of (2) with those derived in Section 3. We consider the time interval $[0, T]$, $T = 12$, a uniform time discretization with $\Delta t = 10^{-4}$, and $\varepsilon = 1, 10^{-4}$. In particular, Figure 2 refers to the case $\kappa(x, x_*) = \beta/(x\,x_*)$ and Figure 3 to the case $\kappa(x, x_*) = \beta e^{-x - x_*}$. In both cases we observe that, for small values of $\varepsilon$, the macroscopic models accurately describe the trends of the observable quantities of the kinetic model. The macroscopic systems of coupled ODEs have been solved through an RK4 numerical scheme with $\Delta t = 10^{-4}$.

In this test we focus on the spreading of fake news by considering available Twitter data from the repository TweetSets. In detail, we analyzed the evolution from March to November 2020 of the hashtag #facemask, related to the COVID-19 pandemic, and of the hashtags #florence#fakenews, both associated with hurricane Florence of September 2018, which caused catastrophic damage in the USA, particularly in the states of North Carolina and South Carolina. In the following we assume that the competence variable is strongly related to the education level of a country. The data for the initial distribution of education have been extrapolated from the available Italian data of the 2011 ISTAT census, and are taken as representative of a prototypical Western country [21]. As underlined in [21], the cumulative distribution of education exhibits a power-law-type tail. For this reason, as an approximation of the competence distribution we consider an inverse Gamma distribution with parameters $c_1, c_2 > 0$ obtained by data fitting. More precisely, we measure the education level on the scale $[0, 6]$, where 6 represents the education of people with a PhD (see Figure 4).
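To illustrate how the two parameters of an inverse Gamma approximation could be obtained, the following minimal sketch uses a method-of-moments estimate. This is an assumption made for illustration: the text only states that $c_1, c_2$ come from data fitting and does not specify the procedure. The identification of $c_1$ with the shape and $c_2$ with the scale, the function names, and the summary statistics (mean 3 and variance 1 on the scale $[0, 6]$) are likewise hypothetical and are not taken from the census data.

```python
import math


def inverse_gamma_pdf(x, shape, scale):
    """Density of an inverse Gamma distribution with the given shape and scale."""
    if x <= 0.0:
        return 0.0
    # Work in log space to avoid overflow in scale**shape / Gamma(shape).
    log_pdf = (shape * math.log(scale) - math.lgamma(shape)
               - (shape + 1.0) * math.log(x) - scale / x)
    return math.exp(log_pdf)


def fit_by_moments(mean, variance):
    """Method-of-moments estimate of (shape, scale).

    Uses mean = scale / (shape - 1) and variance = mean**2 / (shape - 2),
    which hold for shape > 2 (finite variance).
    """
    shape = 2.0 + mean * mean / variance
    scale = mean * (shape - 1.0)
    return shape, scale


# Hypothetical summary statistics of the education level on the scale [0, 6].
c1, c2 = fit_by_moments(mean=3.0, variance=1.0)
density_at_mean = inverse_gamma_pdf(3.0, c1, c2)
```

A least-squares or maximum-likelihood fit on the empirical histogram would be an equally reasonable choice; the moment-based version is shown only because it makes the link between the fitted parameters and the mean competence explicit.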
Once we have obtained the initial competence distribution together with the value of $m_B$, we can estimate the parameters of the models defined in (14), (16) and (19), (22). Several approaches have been proposed in the literature, see e.g. [25]. It is worth mentioning that several uncertainties are present in data linked to news monitoring. For example, the total population size is generally unknown, and the total number of Twitter accounts represents only an upper bound on the number of real active users. The approach adopted in [16], and subsequently in [17, 26], is to treat this quantity as a parameter to be determined in the minimization process along with the parameters of the models. To reduce the number of parameters to optimize we follow a different path. In particular, as an initial guess for the total population size, since the datasets used for the fitting are based on U.S. hashtags, we assume that each fake-news spreader has on average 453 followers. Hence, on average we may expect the total number of susceptible agents to be given by the total number of tweets multiplied by the average number of followers. To account both for the number of bots on Twitter, as found in [35] (and references therein), and for users whose activity may not be assiduous enough to matter during the lifespan of the considered fake news, the initial guess was further reduced by a factor of 4.

Let us denote by $\hat{I}(t)$ the number of active spreaders obtained from the data, while $I(t)$ is the number of infectious agents given by the macroscopic differential model. We then consider a cost functional measuring the discrepancy between $\hat{I}(t)$ and $I(t)$ over $[t_0, t_f]$, the time frame (in hours) during which we solve the minimization problem for the cost functional over $\alpha, \beta, \gamma, \delta \in \mathbb{R}_+$, whereas $\eta$ was kept fixed and equal to 0.5.

Table 2: Test 2A. Estimated parameters for the entire datasets for the hashtags #facemask (second and third column) and #florence#fakenews (fourth and fifth column).
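The calibration loop described above can be sketched in a few lines. The example below is not the procedure used for Table 2: it replaces the closed systems (14), (16) and (19), (22) with the competence-free SEIR model with reinfection (the special case $Q_{HJ} \equiv 0$, $\kappa \equiv \beta$ mentioned in Section 2, whose transition structure is assumed here from Table 1), it assumes a squared $L^2$ discrepancy as cost functional, it fits only $\beta$ by a coarse grid search instead of a four-parameter optimizer, and it uses synthetic data generated by the model itself. All function names and numerical values other than the Test 1 parameters are hypothetical.

```python
def seir_rhs(state, alpha, beta, gamma, delta, eta):
    """Competence-free SEIR with reinfection (assumed form, normalized masses)."""
    s, e, i, r = state
    new_exposed = beta * s * i
    return (
        -new_exposed + (1.0 - alpha) * gamma * i,   # forgotten news -> susceptible
        new_exposed - delta * e,                    # exposed decide after ~1/delta
        delta * (1.0 - eta) * e - gamma * i,        # a fraction 1-eta starts spreading
        delta * eta * e + alpha * gamma * i,        # skeptical or remembering -> removed
    )


def rk4_step(state, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, c):
        return tuple(x + c * dk for x, dk in zip(state, k))

    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(shifted(k1, dt / 2.0), *params)
    k3 = seir_rhs(shifted(k2, dt / 2.0), *params)
    k4 = seir_rhs(shifted(k3, dt), *params)
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_infectious(params, state0, dt, n_steps):
    """Return the time series of I(t) and the final state."""
    state, series = state0, [state0[2]]
    for _ in range(n_steps):
        state = rk4_step(state, dt, params)
        series.append(state[2])
    return series, state


def cost(params, observed, state0, dt):
    """Assumed cost: discrete squared L2 distance between model and data."""
    model, _ = simulate_infectious(params, state0, dt, len(observed) - 1)
    return dt * sum((m - o) ** 2 for m, o in zip(model, observed))


# Synthetic "data" from known parameters (alpha, beta, gamma, delta, eta).
true_params = (0.9, 20.0, 0.2, 0.05, 0.5)
state0 = (0.98, 0.01, 0.01, 0.0)
dt = 0.01
observed, final_state = simulate_infectious(true_params, state0, dt, 500)

# Coarse grid search over beta only, all other parameters kept fixed.
candidates = (10.0, 15.0, 20.0, 25.0, 30.0)
best_beta = min(
    candidates,
    key=lambda b: cost((0.9, b, 0.2, 0.05, 0.5), observed, state0, dt),
)
```

In a realistic setting the grid search would be replaced by a constrained optimizer over $(\alpha, \beta, \gamma, \delta)$, and the synthetic series by the tweet counts $\hat{I}(t)$ rescaled with the estimated population size.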
Since data for the evolution of the compartments S, E, R are not at our disposal, nor are the initial mean values for any of the compartments, we solved the ODE model on a preliminary time interval ending at $t_0$, the starting point of the spreading process, and beginning at a suitable unknown earlier time, starting from single exposed, infectious and removed individuals. The idea is to simulate an initial situation from which the spread of fake news can develop. Furthermore, we considered initial mean values equal to half of the mean of the background distribution of competence, i.e. $m_J(0) = 1.5$. In Figure 5 we compare the evolution of the number of tweets for the hashtag #facemask, from 3 March 2020 to 22 November 2020, and for the hashtag #florence#fakenews, from 11 September 2018 to 4 October 2018, with the evolution of the model (14), (16). The estimated parameters are reported in Table 2. In both cases, we observe that the mean competence levels evolve differently in the four compartments and, in particular, that low competence levels are associated with exposed and infectious agents, i.e., the active spreaders. This outcome reflects the intuitive idea that disinformation can be driven by the inability to recognize a piece of information as purposely false in the first place.

To better account for the impact of a competence-based contact rate function $\kappa(x, x_*)$, we also computed the associated basic reproduction number $R_t$ using the parameters $\beta$ and $\gamma$ estimated previously for both datasets, reported in Table 2. Following [4, 19], and omitting the details for brevity, we consider a generalized version of the classical reproduction number, where again we leveraged the structure-preserving scheme proposed in [32] to perform the calculations.

To analyze the impact of uncertainties in data and parameters we consider a 3D random variable $z = (z_1, z_2, z_3)$ with distribution $\rho(z)$.
We suppose that the random vector $z$ has independent components, i.e. $\rho(z) = \rho_1(z_1)\rho_2(z_2)\rho_3(z_3)$. Taking into account parametric uncertainties, we consider the estimated model parameters as functions $\beta(z_1)$, $\gamma(z_2)$, $\delta(z_3)$ of the random inputs, where we assume $z_1, z_2, z_3 \sim \mathcal{U}([-1, 1])$ and $c_\beta, c_\gamma, c_\delta > 0$. As a result, the macroscopic quantities describing the evolution of the compartments are affected by the introduced uncertainties, which increase their dimensionality: $J(z, t)$, $m_J(z, t)$, $J \in \mathcal{C}$. In order to handle the introduced uncertainties efficiently, we adopt a stochastic collocation approach based on stochastic Galerkin methods; we refer the interested reader to [38] for an introduction and to [2, 39] for applications to compartmental modelling of epidemic dynamics. This class of methods allows one to accurately quantify the propagation of stochasticity in a parametric differential model when information on the distribution of the uncertainties is available. We remark that fast convergence properties hold under suitable regularity assumptions on the solution of the problem. In detail, we construct a 3D sample $\{z_{i,k}\}_{k=0}^{M}$, $i = 1, 2, 3$, obtained in a collocation setting through Gauss-Legendre polynomials with $M = 5$ nodes.

Figure 6: Test 2A. Evolution of $R_t$ in the first 24 hours of the datasets #facemask (left) and #florence#fakenews (right) for the parameters estimated in Table 2, relative to the introduced contact functions $\kappa(x, x_*)$.

In Figure 7 we display the dynamics of the considered fake news with respect to the available data. In detail, for #florence#fakenews we consider the period from 11 September 2018 to 21 September 2018. We consider two successive prediction horizons: a 1-day horizon, where the parameters of the models are calibrated taking into account data until 20 September, and a 2-day horizon, where the calibration is based only on data until 19 September. Regarding #facemask we consider the period from 3 March to 19 May.
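The collocation step with Gauss-Legendre nodes can be sketched as follows. The example is a simplified illustration, not the implementation behind Figure 7: it assumes additive perturbations of the form $p + c\,z$ for each parameter (the exact parametrization is not reproduced here), it computes only the mean and variance of a scalar quantity of interest rather than confidence bands, and the toy quantity $\beta/\gamma$, the function names and the amplitudes are hypothetical. The nodes and weights are the standard tabulated 5-point Gauss-Legendre rule on $[-1, 1]$.

```python
# Standard 5-point Gauss-Legendre nodes and weights on [-1, 1].
GL_NODES = (-0.9061798459386640, -0.5384693101056831, 0.0,
            0.5384693101056831, 0.9061798459386640)
GL_WEIGHTS = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
              0.4786286704993665, 0.2369268850561891)


def collocation_moments(quantity, nominal, amplitudes):
    """Mean and variance of quantity(beta, gamma, delta) under independent
    uniform inputs z_i ~ U([-1, 1]), using a tensorized Gauss-Legendre rule.

    Each parameter is perturbed as nominal_i + amplitude_i * z_i (assumed form).
    """
    mean = 0.0
    second_moment = 0.0
    rule = list(zip(GL_NODES, GL_WEIGHTS))
    for z1, w1 in rule:
        for z2, w2 in rule:
            for z3, w3 in rule:
                # The uniform density on [-1, 1]^3 is 1/8.
                weight = w1 * w2 * w3 / 8.0
                params = tuple(p + c * z for p, c, z
                               in zip(nominal, amplitudes, (z1, z2, z3)))
                value = quantity(*params)
                mean += weight * value
                second_moment += weight * value * value
    return mean, second_moment - mean * mean


def toy_ratio(beta, gamma, delta):
    """Toy quantity of interest: the ratio beta / gamma."""
    return beta / gamma


# Nominal values borrowed from Test 1; amplitudes are hypothetical.
mean_ratio, var_ratio = collocation_moments(
    toy_ratio, nominal=(20.0, 0.2, 0.05), amplitudes=(2.0, 0.02, 0.005)
)
```

In an application, `quantity` would run the macroscopic ODE model for the given parameters and return, for example, $I(z, t)$ at a fixed time; the same weights then give the expected value at each time step. With only $5^3 = 125$ model evaluations, the tensorized rule is exact for polynomial dependence up to degree 9 in each variable.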
For #facemask we likewise consider two successive prediction horizons: a 1-week horizon, where the parameters of the models are calibrated taking into account data until 12 May, and a 2-week horizon, where the calibration is based on data until 5 May. We highlight in dashed black and magenta the expected value of the predicted number of tweets, $\mathbb{E}[I(z, t)] = \int_{[-1,1]^3} I(z, t)\,\rho_1(z_1)\rho_2(z_2)\rho_3(z_3)\,dz_1\,dz_2\,dz_3$. Together with the expected trends we plot the 95% confidence intervals (CI) with respect to the random parameters $\beta(z_1)$, $\gamma(z_2)$ and $\delta(z_3)$. The blue shaded band refers to the variability in $\gamma(z_2)$, the green one to the variability in $\delta(z_3)$, and the red one to the variability in $\beta(z_1)$.

In this test we perform a retrospective analysis to study how the background could influence the dissemination of fake news as a result of a different learning process. We recall that the background modifies, through the learning dynamics, the effectiveness of the level of knowledge in identifying fake news. As a consequence, high values of the background correspond to a high level of effectiveness of the competence, while low values make it difficult to identify fake news. Indirectly, the background acts as a control term that limits the spread of misinformation. This can also be interpreted as a process of education specific to the identification of fake news, which makes it possible to limit the so-called knowledge neglect phenomenon [18]. We consider the two datasets for the hashtags #facemask and #florence#fakenews with the estimated parameters reported in Table 2, and we increase the value of the competence level attained by the background, i.e. $m_B$, while keeping the other parameters fixed in the dynamics defined by (14), (16) and (19), (22). Hence, we performed the test with both the strong and the weak competence-based contact function; the results are summarized in Figure 8.
In all cases, increasing the competence of the background reduces the spread of fake news, leading to a decrease in the cumulative number of tweets of infectious agents proportional to the increase in the value of $m_B$. Indeed, increasing the competence of the background yields an evident decrease in the overall misinformation for both examples considered, #facemask and #florence#fakenews.

Figure 8: Test 3. Total number of infectious agents for the hashtags #facemask (left) and #florence#fakenews (right) as a function of the competence background. In both cases, we employed the parameters reported in Table 2.

Despite the digital transformation of governments and the modernization of public administration, a global decline in democracy is occurring. The spread of fake news created for the purpose of polarizing society in certain directions poses a risk to democratic institutions. The role of individuals' knowledge, and of the ability to use it in identifying false information, is deemed of paramount importance. In this paper, starting from a model for fake-news dissemination in the presence of heterogeneous agents with different levels of competence, we used the tools of kinetic theory to derive reduced-order models that retain the effects of competence in the dynamics and that, thanks to their simplified structure, can be interfaced with data. The starting model is inspired, as is much of the literature on fake news, by epidemiology, and is therefore based on a compartmental structure. The introduction of competence makes it possible to analyze complex phenomena of great relevance in contemporary society, such as the effectiveness of control actions taken to limit the spread of fake news and the role of knowledge neglect in misinformation.
The methodology adopted in this article is fully general and depends closely on the equilibrium state of the social variable and the social interaction function at the basis of fake-news spreading. As a consequence, additional social variables that play a key role in the spread of misinformation may be embedded in the dynamics using similar arguments. Having a model that can be interfaced with the available data allowed us to present some preliminary examples of applications to the case of fake-news spreading on Twitter.

The datasets generated during the current study are available from the corresponding author on reasonable request. The datasets analysed during the current study are freely available at https://doi.org/10.5281/zenodo.1289426.

[1] Kinetic modelling of epidemic dynamics: social contacts, control with uncertain data, and multiscale spatial dynamics
[2] Control with uncertain data of socially structured compartmental epidemic models
[3] Opinion dynamics over complex networks: kinetic modelling and numerical methods
[4] Hyperbolic compartmental models for epidemic spread on networks with uncertain data: Application to the emergence of Covid-19 in Italy
[5] Concentration effects in a kinetic model with wealth and knowledge exchanges
[6] The Boltzmann Equation and its Applications
[7] Rumor spreading dynamics with an online reservoir and its asymptotic stability
[8] On a kinetic model for a simple market economy
[9] Reducing complexity of multiagent systems with symmetry breaking: an application to opinion dynamics with polls
[10] Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter
[11] Epidemics and rumors
[12] Kinetic models for epidemic dynamics with social heterogeneity
[13] Optimal control of epidemic spreading in presence of social heterogeneity
[14] Opinion dynamics: inhomogeneous Boltzmann-type equations modelling opinion leadership and political segregation
[15] On a kinetic opinion formation model for pre-election polling
[16] Epidemiological Modeling of News and Rumors on Twitter. SNAKDD '13: Proceedings of the 7th Workshop on Social Network Mining and Analysis
[17] Misinformation Propagation in the Age of Twitter
[18] Knowledge does not protect against illusory truth
[19] Spreading of fake news, competence, and learning: kinetic modeling and numerical approximation
[20] Digital Literacy
[21] Pareto tails in socio-economic phenomena: a kinetic description
[22] A digital media literacy intervention increases discernment between mainstream and false news in the United States and India
[23] SIR Rumor spreading model with trust rate distribution
[24] Digital readiness gaps. Pew Research Center
[25] The reproductive number of COVID-19 is higher compared to SARS coronavirus
[26] Using an Epidemiological Model to Study the Spread of Misinformation during the Black Lives Matter Movement
[27] Modeling the spread of fake news on Twitter
[28] Interacting Multiagent Systems: Kinetic equations and Monte Carlo methods
[29] Wealth distribution and collective knowledge. A Boltzmann approach
[30] Mean-field control variate methods for kinetic equations with uncertainties and applications to socio-economic sciences
[31] Kinetic models of collective decision-making in the presence of equality bias
[32] Structure preserving schemes for nonlinear Fokker-Planck equations and applications
[33] Kinetic models of opinion formation
[34] Opinion modeling on social media and marketing aspects
[35] Bots and online hate during the COVID-19 pandemic: case studies in the United States and the Philippines
[36] On a new class of weak solutions to the spatially homogeneous Boltzmann and Landau equations
[37] The spread of true and false news online
[38] Numerical Methods for Stochastic Computations: A Spectral Method Approach
[39] A data-driven epidemic model with social structure for understanding the COVID-19 infection on a heavily affected Italian Province

The authors declare no competing interests.