key: cord-0465342-vkxzqwh4 authors: Pinto, Sebastián; Trevisan, Marcos; Balenzuela, Pablo title: Reconstructing social sensitivity from evolution of content volume in Twitter date: 2021-12-22 journal: nan DOI: nan sha: 126a2c67b753c6fb2faf0156e5493babd613662b doc_id: 465342 cord_uid: vkxzqwh4

The consumption of news produces uneven social reactions. In most cases, people share information and discuss their opinions; public interest therefore remains confined to the field of debate. A few cases, in contrast, fuel the collective sensibility and give rise to social movements. To explain the dynamics that underlie the emergence of these reactive states, we set up a simple mathematical model for public interest in terms of media coverage and social interactions. We test the model on a series of events related to violence in the US during 2020. The volume of tweets and retweets is used as a proxy of public interest, and the volume of news as a proxy of media coverage. We show that the model successfully fits the data and allows inferring a measure of social engagement that correlates with human mobility data. Our findings suggest that this low-dimensional model captures the basic ingredients that regulate social responses capable of igniting social mobilizations.

The continuous expansion of the digital environment creates new and faster ways to exchange information and opinions [1]. At the same time, it also provides access to unprecedented amounts of data, allowing the quantitative investigation of the forces that underlie the diffusion of information [2] and the formation of public interest [3, 4]. Dynamical systems have been particularly successful in identifying collective mechanisms that give rise to public opinion [5, 6]. Using variables that describe the expansions and contractions of content volume, these models explain empirical data remarkably well [7].
In the domain of social media, the emergence of extreme opinions from moderate initial conditions has recently been reported [8, 9]. But extreme social reactions also appear beyond the domain of opinions and debates. Normally, people react to the news by sharing information and discussing opinions. On a few occasions, however, and under heightened social sensitivity, a reactive state may emerge, giving rise to street demonstrations, protests and riots [10]. Although riots and uprisings have been extensively studied and modeled [11, 12], their unfolding remains unclear. In this work we set up a deliberately simple model for public interest, modulated by the media coverage [13] and the social interactions within the system [14] [15] [16]. We capitalize on the paradigmatic model developed by Granovetter [17], based on the concept of critical mass, which represents the fraction of interested people needed to induce interest in the rest of the population.

* spinto@df.uba.ar

Is it possible to reveal the emergence of reactive states from content volume extracted from the digital media? We investigate this in connection with a series of highly sensitive events that took place in the US during 2020. The Black Lives Matter movement [18] encompasses events of different nature, as reflected by the large range of reactions in social media (Figure 1a). Here we analyze a subset of the events well covered by media sources, displayed in chronological order in Figure 1b. The time evolution of these events is shown in Figure 1c: black curves show the volume of tweets and retweets containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery and Andrés Guardado. Red filled curves correspond to the volume of tweets from the 29 most followed official media accounts containing the same keywords. To interpret these time traces, we derived a minimal model with variables that can be related to the collected data (see Methods).
In our model, the public interest p is modulated by mass media coverage C and social interactions S. The model reads

$$\frac{dp}{dt} = \gamma \left[ e\,C + (1-e)\,S(p) - p \right] \qquad (1)$$

Media coverage and public interest are coupled variables. A closed model would require another equation for the evolution of media coverage modulated by the public interest. We do not explicitly model this, but instead we feed equation 1 with the experimental time traces of media coverage C(t). When the exposure to the media is maximum (e = 1), the public interest is modulated only by the media coverage, with a time scale controlled by γ. In the general case, when e ∈ (0, 1), social interactions contribute to the formation of the public interest. These interactions depend on agents with different degrees of involvement. Following the ideas of Granovetter, we assign a threshold τ to each agent, which represents the minimum fraction of interested people needed to induce interest in the agent. In this framework, the fraction of reactive people can be computed as the cumulative distribution of thresholds S(p) = P(τ < p), which we call social engagement. Assuming a normal distribution of thresholds τ ∼ N(µ, σ),

$$S(p) = \frac{1}{2}\left[ 1 + \mathrm{erf}\left( \frac{p-\mu}{\sqrt{2}\,\sigma} \right) \right] \qquad (2)$$

When µ is low, small groups can trigger the interest of the rest of the system. On the contrary, high values of µ require a bigger fraction of interested people to induce interest in the rest of the population. We therefore identify the quantity 1 − µ as the social sensitivity of the population.

Let us summarize the principal components of our model. On the one hand, we have two variables that quantify the volume of opinions and information shared by people: the public interest p and the media coverage C. On the other hand, we have the social sensitivity 1 − µ and the social engagement S, two variables that describe direct interaction among people. Direct interactions are difficult to quantify. Equations 1 and 2 provide a mechanism to tackle this and infer the dynamics of the social sensitivity 1 − µ(t) and the social engagement S(t) from the data.
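As an illustration of how equations 1 and 2 can be integrated numerically, the following minimal sketch uses scipy. It assumes that equation 1 takes the form dp/dt = γ[eC + (1 − e)S(p) − p], which is consistent with the equilibria p_eq = (1 − e)S(p_eq) + eC given in the analytical formulation. The function names, the Gaussian coverage pulse and all parameter values are hypothetical and chosen only for illustration; they are not the fitted values of the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import erf


def social_engagement(p, mu, sigma):
    """Equation 2: normal CDF of thresholds, i.e. P(tau < p)."""
    return 0.5 * (1.0 + erf((p - mu) / (np.sqrt(2.0) * sigma)))


def simulate_interest(t_eval, coverage, e, gamma, mu, sigma, p0=0.0):
    """Integrate the assumed form of equation 1.

    coverage and mu are callables of time, t -> C(t) and t -> mu(t).
    """
    def rhs(t, p):
        s = social_engagement(p, mu(t), sigma)
        return gamma * (e * coverage(t) + (1.0 - e) * s - p)

    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), [p0],
                    t_eval=t_eval, max_step=0.1)
    return sol.y[0]


# Hypothetical inputs: a Gaussian pulse of coverage and a constant mu.
t = np.linspace(0.0, 30.0, 301)
pulse = lambda tt: np.exp(-0.5 * ((tt - 10.0) / 2.0) ** 2)
p = simulate_interest(t, pulse, e=0.4, gamma=1.0,
                      mu=lambda tt: 0.6, sigma=0.1)

# Limiting case e = 1: interest simply relaxes towards the coverage level.
p_media_only = simulate_interest(t, lambda tt: 0.5, e=1.0, gamma=1.0,
                                 mu=lambda tt: 0.6, sigma=0.1)
```

In the limiting case e = 1 the social term drops out and p relaxes exponentially towards C at rate γ, which gives a simple sanity check of the integration.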
To infer these quantities, we fit the model using the volume of tweeted news as a proxy for the coverage C(t), and the volume of tweets and retweets as a proxy for the public interest p(t) (see Methods). The upper panels of Figure 2 show the best fitting curves for the public interest. The fitting parameters are summarized in Table I. We find that the exposure is rather stable across events, e = 0.38 ± 0.08. Although media coverage is important, this means that people are mainly exposed to the social environment, at least for this type of event. Unlike the exposure, the time scale γ increases as the events accumulate over time. The first four events (Arbery, Floyd, Brooks and Guardado) occurred one immediately after the other (Figure 1b), speeding up the dynamics of public interest along the sequence. After a pause of about two months, the same speeding up effect is seen for Taylor.

To quantify the performance of our model, we compare its goodness of fit with two null models: one in which coverage is predicted by public interest alone, and the opposite one where public interest is predicted by coverage alone (see Methods). In Figure 3 we show the mean square errors for the three models. Comparison of the null models shows that public interest tends to predict coverage better than coverage predicts public interest. This is also apparent from the time series (Figure 1c), where the response of the media is delayed with respect to the public interest. Our model performs better than the null models, explaining this delay by an increase in the social sensitivity 1 − µ(t).

The inferred dynamics of the social variables are shown in the lower panels of Figure 2. The two social variables are of a different nature. While the engagement S(t) is a threshold-based variable whose dynamics can be expected to be fast, the social sensitivity 1 − µ(t) represents the slower, more gradual build-up of interested people across the whole population.
We find that the social sensitivity changes appreciably over periods of ∼15 days which is, as expected, longer than the typical time scales of the media coverage and public interest (see Methods).

Figure 3. Performance of the model. We compare the goodness of fit with two null models across the six events analyzed here. In one model, public interest alone predicts coverage (p → C) and in the other, coverage alone predicts public interest (C → p). Our model explains the data better than both null models with the same number of fitting parameters.

Model fitting yields periods of time where a macroscopic fraction of agents becomes interested (bottom panels of Figure 2). To investigate the relation of these reactive periods with the emergence of street demonstrations, we collected mobility measures across the US territory [19]. In Figure 4 we show attendance at recreation places, groceries, pharmacies and public transport stations in the counties and periods of time when the events took place. We find that the social sensitivity correlates tightly with mobility patterns for the most massive events using a lag of 3 days. In the case of Floyd, social sensitivity correlates with all four mobility measures, with a peak in the mean Spearman's rank coefficient of r = −0.82; in the case of Taylor, r = −0.47 for two of the mobility measures; for Blake, r = −0.48 for only one measure (p < 0.05 in all cases). The last three events were less massive and, accordingly, we find no significant correlations with social sensitivity. Taken together, these results suggest that our low-dimensional approximation of the Granovetter model captures the basic ingredients that regulate social responses of very different magnitudes, which are indeed capable of igniting social mobilizations.
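The lagged rank correlation between social sensitivity and a mobility series can be computed along the following lines. This is only a sketch: the direction of the lag (mobility following sensitivity) is an assumption, since the text states only that a lag of 3 days was used, and the function name and the synthetic series are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def lagged_spearman(sensitivity, mobility, lag):
    """Spearman correlation between sensitivity[t] and mobility[t + lag].

    A positive lag pairs each sensitivity value with the mobility value
    observed `lag` days later (assumed direction of the lag).
    """
    x = np.asarray(sensitivity, dtype=float)
    y = np.asarray(mobility, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    r, pval = spearmanr(x, y)
    return r, pval


# Synthetic daily series: mobility mirrors sensitivity 3 days later.
rng = np.random.default_rng(0)
sens = rng.normal(size=30)
mob = -np.roll(sens, 3)          # mob[t] = -sens[t - 3] for t >= 3
r_lag3, p_lag3 = lagged_spearman(sens, mob, lag=3)
r_lag0, p_lag0 = lagged_spearman(sens, mob, lag=0)
```

On this synthetic example the correlation at lag 3 is a perfect anticorrelation by construction, while the unlagged correlation is much weaker; scanning over several lags is one way to choose the lag reported.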
The model implements the hypothesis that agents become involved through media exposure and also through the presence of a critical mass of interested agents in the system, which leads us to characterize the social sensitivity of the population.

Fluctuating interactions among people in massive social events are difficult to quantify. In this work we set up a simple mathematical model that allows us to infer social dynamics from content volume representing public interest and media coverage. We then test our model on Twitter volume data related to the Black Lives Matter movement. We find that this formulation fits the experimental series better than two models in which public interest and coverage explain each other in the absence of social interactions. Data fitting allows us to infer a measure associated with the social susceptibility. Crucially, we show that the evolution of this variable is highly correlated with variations in mobility data due to protests and riots.

A possible limitation of our model is related to the assumption of uniform mixing [20] in pairwise interactions, given that the public interest time series were collected from Twitter, which is indeed highly structured. The topology of social networks plays a key role when dealing with opinions of different sign that give rise to echo chambers [21, 22]. In our work, however, we are dealing with the volume of keywords, regardless of ideological leanings. We show that, at least for the highly sensitive events analyzed here, the structure of the network can be disregarded, in line with similar models that assume uniform mixing and successfully explain the dynamics of time series related to different hashtags in Twitter [5] [6] [7].

Simple as it is, our model provides direct and interpretable measures of social engagement. We are witnessing a rapid development of algorithms that are capable of organizing massive amounts of data based on statistical relationships.
However, this growth has not been matched by the development of dynamical models capable of generalizing our knowledge [23]. We hope that this work contributes to our understanding of public interest, showing the potential of a simple model to explain social reactions within and outside the digital environment.

We collected all the available tweets (in English) containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery, Andrés Guardado, Sean Monterrosa, Daniel Prude, Deon Kay, Walter Wallace Jr., Dijon Kizzee, Andre Hill, Dolal Idd, Marcellis Stinnette and Hakim Littleton, in a period of one month around a significant event related to each topic. Tweets were collected using the Twitter API v2 [24]. We also collected the tweets with the same keywords from the group of most followed news accounts in Twitter [25]: @cnnbrk, @nytimes, @CNN, @BBCBreaking, @BBCWorld, @TheEconomist, @Reuters, @WSJ, @TIME, @ABC, @washingtonpost, @AP, @XHNews, @ndtv, @HuffPost, @BreakingNews, @guardian, @FinancialTimes, @SkyNews, @AJEnglish, @SkyNewsBreak, @Newsweek, @CNBC, @France24_en, @guardiannews, @RT_com, @Independent, @CBCNews, @Telegraph. Twitter data is available at https://shorturl.ae/AcUge.

Mobility measures correspond to the US county associated with each event. From all mobility-related time series we extract the trend to compare with social engagement.

We provide here a brief context of the analyzed events. George Perry Floyd Jr. was murdered by a police officer in Minneapolis (Ramsey County), Minnesota, on May 25, 2020. Breonna Taylor was fatally shot in Louisville (Jefferson County), Kentucky, on March 13, 2020. On September 23, several protests occurred after the charging decision in Taylor's death was announced. Jacob S. Blake was shot and seriously injured by a police officer in Kenosha County, Wisconsin, on August 23. Rayshard Brooks was murdered on June 12, 2020 in Atlanta (Fulton County), Georgia.
Ahmaud Arbery was murdered on February 23, 2020 in Glynn County, Georgia. The case gained wide attention on May 7, after a video of the shooting that led to his death went viral. Andrés Guardado was killed by a Deputy Sheriff in Los Angeles County, California, on June 18, 2020.

We first normalized both public interest p and media coverage C with respect to their peak values. To find a timescale for the dynamics of the social sensitivity, we parameterized µ(t) as a cubic spline of N equally spaced nodes within a 1-month period. The fitting error either falls abruptly at N = 5 (Floyd and Blake) or does not change significantly in the range 4 ≤ N ≤ 9 (Taylor, Brooks, Arbery and Guardado). We therefore fixed the value N = 5, for which µ changes appreciably on a timescale of ∼15 days. The media coverage was interpolated in order to obtain a continuous signal. Interpolation and numerical integration of equations 1 and 2 were performed with the library scipy [26]. Parameter fitting was performed using a grid search in the parameter γ ∈ [10^{-1}, 120] in combination with a minimization routine for the rest of the parameters (e ∈ [0, 1] and nodes of µ ∈ [−1, 2]). The routine consists of integrating the model and varying the parameters until a convergence criterion is reached. We used Sequential Least Squares Programming for bounded problems in scipy to minimize the mean square error between the output of the model and the data. Confidence intervals provided in Table I and shown in Figures 2 and 4 correspond to fitting solutions with an error up to 10% of the best solution in each case, except for Taylor and Brooks, where solutions with a fitting error up to 50% of the best solution were reported.

We compare the goodness of fit with two null models. In one of them, coverage is predicted by public interest, p → C, and in the other it is the other way around, C → p.
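The fitting procedure of the model (spline-parameterized µ(t), grid search over γ, and Sequential Least Squares Programming for e and the spline nodes) can be sketched as follows. This is a simplified illustration, not the code used for the reported results: it assumes equation 1 has the form dp/dt = γ[eC + (1 − e)S(p) − p], it keeps σ fixed at a hypothetical value SIGMA, it uses linear interpolation of the coverage, and the function names, the γ grid and the synthetic data are all placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize
from scipy.special import erf

SIGMA = 0.1  # hypothetical fixed dispersion of thresholds (assumption)


def model_interest(t, coverage, e, gamma, mu_nodes, p0):
    """Integrate the assumed equation 1 with mu(t) given by a cubic spline."""
    node_times = np.linspace(t[0], t[-1], len(mu_nodes))
    mu = CubicSpline(node_times, mu_nodes)

    def rhs(tt, p):
        c = np.interp(tt, t, coverage)  # continuous coverage signal
        s = 0.5 * (1.0 + erf((p - mu(tt)) / (np.sqrt(2.0) * SIGMA)))
        return gamma * (e * c + (1.0 - e) * s - p)

    sol = solve_ivp(rhs, (t[0], t[-1]), [p0], t_eval=t, max_step=1.0)
    return sol.y[0]


def fit_event(t, coverage, interest, gammas, n_nodes=5):
    """Grid search over gamma; SLSQP over e and the nodes of mu."""
    bounds = [(0.0, 1.0)] + [(-1.0, 2.0)] * n_nodes
    x0 = np.full(1 + n_nodes, 0.5)
    best = None
    for gamma in gammas:
        def mse(x):
            p_model = model_interest(t, coverage, x[0], gamma,
                                     x[1:], interest[0])
            return np.mean((p_model - interest) ** 2)

        res = minimize(mse, x0, method="SLSQP", bounds=bounds,
                       options={"maxiter": 20})
        if best is None or res.fun < best["mse"]:
            best = {"gamma": gamma, "e": res.x[0],
                    "mu_nodes": res.x[1:], "mse": res.fun}
    return best


# Synthetic test case: generate data from the model, then refit it.
t = np.linspace(0.0, 30.0, 31)
coverage = np.exp(-0.5 * ((t - 8.0) / 2.0) ** 2)
interest = model_interest(t, coverage, 0.4, 1.0, np.full(5, 0.6), p0=0.0)
best = fit_event(t, coverage, interest, gammas=[0.5, 1.0])
```

Refitting synthetic data generated by the model itself is a convenient check that the optimizer respects the bounds and returns a finite error before applying the routine to real time series.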
Both null models were set up as nonlinear functions approximated by 7th-order polynomials, $C(t) = \sum_{n=1}^{7} a_n\, p^n(t)$ and $p(t) = \sum_{n=1}^{7} b_n\, C^n(t)$, without zeroth-order term ($a_0 = b_0 = 0$). In this way, the null models match the number of fitting parameters of the model (e, γ and $\mu(t_i)$, with 1 ≤ i ≤ 5).

Analytical formulation of the model

Equation 1 is an analytical approximation of the threshold-based model proposed by Mark Granovetter [17] with the addition of an external field. In this model, agents adopt a binary state s which we interpret as interest (s = 1) or non-interest (s = 0) in a given topic. The dynamics of the system is described in terms of the fraction of interested agents $p = \sum_i s_i / N$, where N is the size of the system. Each agent also has an associated threshold $\tau_i$, which is the fraction of interested agents needed to induce interest in agent i. The thresholds are random variables between 0 and 1 taken from a probability density $f(\tau)$. The external field, in turn, is introduced through a parameter C ∈ [0, 1] independent of the state of the system.

With these ingredients, the dynamics of the system is as follows: the fraction of interested agents p can change because a random agent i interacts with the media with probability e and becomes interested ($s_i = 1$) in a given topic with probability C or disinterested ($s_i = 0$) with probability 1 − C; otherwise, with probability 1 − e, the agent observes the system. In this last case, if the fraction of interested agents is greater than the threshold of the agent ($p \geq \tau_i$), then it becomes interested ($s_i = 1$); otherwise, it becomes disinterested ($s_i = 0$). Agents' states are synchronously updated, independently of their initial state.

Following [27], we derive the analytical expression for the dynamics of p shown in equation 1. Let $q(p_k, t)$ be the probability that the fraction of interested agents at time t is $p_k = k/N$.
The master equation for $q(p_k, t)$ is written in terms of $Q(1|p_k)$ and $Q(0|p_k)$, the transition probabilities that a given agent becomes interested or disinterested given $p_k$. Following the update rule of the model, these probabilities are given by

$$Q(1|p_k) = e\,C + (1-e)\,S(p_k), \qquad Q(0|p_k) = e\,(1-C) + (1-e)\,[1 - S(p_k)],$$

where $S(p_k)$ is the cumulative distribution function of the thresholds, $S(p_k) = \int_0^{p_k} f(\tau)\,d\tau$, which by definition is the fraction of agents whose threshold is below $p_k$ ($S(p_k) \equiv P(\tau < p_k)$). In the limit of infinite population (N → ∞), $p_k \to p$, where p is now the fraction of interested agents and a continuous variable in [0, 1]. In this limit, the terms of the master equation are approximated using the small parameter η = 1/N. Replacing these approximations in the master equation and neglecting terms of order $\eta^2$, we obtain a first-order evolution equation for q(p, t). For a well-defined initial condition, $q(p, 0) = \delta(p - p_0)$ (where δ(x) is the Dirac delta), and re-scaling time t → Nt, the solution of this equation (pages 53-54 of [28]) is $q(p, t) = \delta(p - p(t))$, where p(t) obeys

$$\frac{dp}{dt} = e\,C + (1-e)\,S(p) - p, \qquad p(0) = p_0.$$

In particular, if the thresholds are normally distributed with mean µ and dispersion σ, S(p) ≡ S(p|µ, σ). Finally, by adding a constant γ that allows adjusting the timescale, equation 1 is obtained.

Equation 1 has equilibria given by $p_{eq} = (1-e)\,S(p_{eq}) + e\,C$. The stability of these points is given by the sign of

$$(1-e)\,\left.\frac{dS}{dp}\right|_{p_{eq}} - 1,$$

where it can be observed that the parameter C plays no role in setting the stability.

For reference, we summarize here the variables and parameters of the model mentioned throughout the manuscript:

p: Public interest
C: Media coverage
S: Social engagement
1 − µ: Social sensitivity
e: Media exposure
γ: Timescale

Measures of media activity and public interest were collected from all the available tweets (in English) containing the keywords George Floyd, Breonna Taylor, Jacob Blake, Rayshard Brooks, Ahmaud Arbery, Andrés Guardado, Sean Monterrosa, Daniel Prude, Deon Kay, Walter Wallace Jr., Dijon Kizzee, Andre Hill, Dolal Idd, Marcellis Stinnette and Hakim Littleton, in a period of one month around a significant event related to each topic, using the Twitter API v2 [24].
In particular, media coverage was estimated by collecting tweets with the mentioned keywords from the group of most followed news accounts in Twitter [25]: @cnnbrk, @nytimes, @CNN, @BBCBreaking, @BBCWorld, @TheEconomist, @Reuters, @WSJ, @TIME, @ABC, @washingtonpost, @AP, @XHNews, @ndtv, @HuffPost, @BreakingNews, @guardian, @FinancialTimes, @SkyNews, @AJEnglish, @SkyNewsBreak, @Newsweek, @CNBC, @France24_en, @guardiannews, @RT_com, @Independent, @CBCNews, @Telegraph. Twitter data is available at https://shorturl.ae/AcUge.

To validate the measure of media coverage, we compare this quantity with information directly obtained from media articles. In particular, we tracked news articles related to the main events analyzed in the paper from five main media outlets: The New York Times, Fox News, USA Today, Washington Post and Huffington Post. Figure 5 shows that the coverage reported in the main manuscript is similar to the number of articles in which the keyword is mentioned and also to the number of mentions. Figure 6 shows the correlation between reported media coverage and the number of mentions in the articles. A coefficient higher than 0.8 is obtained in all cases, except for Guardado, suggesting that both approaches to measuring media activity are equivalent. The difference in the Guardado case is due to the fact that only a few articles were found in the analyzed media.
Novelty and collective attention
Meme-tracking and the dynamics of the news cycle
Quantifying time-dependent media agenda and public opinion by topic modeling
Analyzing mass media influence using natural language processing and time series analysis
Mass media and the contagion of fear: the case of Ebola in America
Event triggered social media chatter: A new modeling framework
Accelerating dynamics of collective attention
Emergence of polarized ideological opinions in multidimensional topic spaces
Modeling echo chambers and polarization dynamics in social networks
A social identity model of riot diffusion: From injustice to empowerment in the 2011 London riots
Riots and Uprisings: Modelling Conflict between Centralised and Decentralised Systems
Epidemiological modelling of the 2005 French riots: A spreading wave and the role of contagion
The power of information networks: New directions for agenda setting
Statistical physics of social dynamics
The undecided have the key: interaction-driven opinion dynamics in a three state model
Polarizing crowds: Consensus and bipolarization in a persuasive arguments model
Threshold models of collective behavior
All Lives Matter, but so Does Race
Inferring models of opinion dynamics from aggregated jury data
The echo chamber effect on social media
Quantifying echo chamber effects in information spreading over political communication networks
Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences of the United States of America
Effects of mixing in threshold models of social behavior

Coverage measured as Twitter activity (coverage reported in the main manuscript, blue dots), number of articles (black diamonds) and number of mentions (red squares) from five main media outlets.

Measuring coverage by tracking Twitter activity is very similar to counting the number of mentions in media articles, so both are valid approaches to estimate media activity.

This research was partially funded by the Universidad de Buenos Aires (UBA), the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) through grant PIP-11220200102083CO, and the Agencia Nacional de Promoción de la Investigación, el Desarrollo Tecnológico y la Innovación through grant PICT-2020-SERIEA-00966.