How Epidemic Psychology Works on Social Media: Evolution of responses to the COVID-19 pandemic
Aiello, Luca Maria; Quercia, Daniele; Zhou, Ke; Constantinides, Marios; Šćepanović, Sanja; Joglekar, Sagar
2020-07-26

Disruptions resulting from an epidemic might often appear to amount to chaos but, in reality, can be understood in a systematic way through the lens of "epidemic psychology". According to the father of this research field, Philip Strong, not only is the epidemic biological; there is also the potential for three social epidemics: of fear, moralization, and action. This work is the first study to empirically test Strong's model at scale. It does so by studying the use of language in 39M social media posts in the US about the COVID-19 pandemic, the first pandemic to spread this quickly not only on a global scale but also online. We identified three distinct phases, which parallel Kuebler-Ross's stages of grief. Each of them is characterized by different regimes of the three social epidemics: in the refusal phase, people refused to accept reality despite the increasing numbers of deaths in other countries; in the suspended reality phase (which started after the announcement of the first death in the country), people's fear translated into anger about the looming feeling that things were about to change; finally, in the acceptance phase (which started after the authorities imposed physical-distancing measures), people found a "new normal" for their daily activities. Our real-time operationalization of Strong's model makes it possible to embed epidemic psychology in any real-time model (e.g., epidemiological and mobility models).

In our daily lives, our dominant perception is of order. But every now and then chaos threatens that order: epidemics dramatically break out, revolutions erupt, empires suddenly fall, and stock markets crash.
Epidemics, in particular, present not only collective health hazards but also special challenges to mental health and public order that need to be addressed by social and behavioral sciences 1 . Almost 30 years ago, in the wake of the AIDS epidemic, Philip Strong, the founder of the sociological study of epidemic infectious diseases, reflected: "the human origin of epidemic psychology lies not so much in our unruly passions as in the threat of epidemic disease to our everyday assumptions." 2 In the recent COVID-19 pandemic 3 (an ongoing pandemic of a coronavirus disease), it has been shown that the main source of uncertainty and anxiety has indeed come from the disruption of what Alfred Schutz called the "routines and recipes" of daily life 4 (e.g., every simple act, from eating at work to visiting our parents, takes on new meanings). Yet, the chaos resulting from an epidemic turns out to be more predictable than one would initially expect. Philip Strong observed that any new health epidemic resulted in three social epidemics: of fear, moralization, and action. The epidemic of fear represents the fear of catching the disease, which comes with suspicion against alleged disease carriers and which, in turn, may spark panic and irrational behavior. The epidemic of moralization is characterized by moral responses both to the viral epidemic itself and to the epidemic of fear, which may result in either positive reactions (e.g., cooperation) or negative ones (e.g., stigmatization). The epidemic of action accounts for the rational or irrational changes of daily habits that people make in response to the disease or as a result of the two other social epidemics. Strong was writing in the wake of the AIDS/HIV crisis, but he based his model on studies that went back to Europe's Black Death in the 14th century.
Importantly, he showed that these three social epidemics are created by language and incrementally fed through it: language transmits the fear that the infection is an existential threat to humanity and that we are all going to die; language depicts the epidemic as a verdict on human failings and as a divine moral judgment on minorities; and language shapes the means through which people collectively intend to act, however pointlessly, against the threat. Hitherto, there has never been any large-scale empirical study of whether the use of language during an epidemic reflects Strong's model, not least because of a lack of data. COVID-19 has recently changed that: it has been the first epidemic in history in which people around the world have been collectively expressing their thoughts and concerns on social media. As such, researchers have had an unprecedented opportunity to study this epidemic in new ways: social media posts have been analyzed in terms of content and behavioral markers 5, 6 , and used to track the diffusion of COVID-related information 7 .

Coding Strong's model

Back in the 1990s, Philip Strong was able not only to describe the psychological impact of epidemics on social order but also to model it. He observed that the early reaction to major fatal epidemics is a distinctive psycho-social form and can be modeled along three main dimensions: fear, morality, and action. During a large-scale epidemic, basic assumptions about social interaction and, more generally, about social order are disrupted; more specifically, they are disrupted by the fear of others, competing moralities, and the responses to the epidemic. Crucially, all three elements are created, transmitted, and mediated by language: language transmits fears, elaborates on the stigmatization of minorities, and shapes the means through which people collectively respond to the epidemic 2, 40, 41 . We operationalized Strong's epidemic psychology theoretical framework in two steps.
First, three authors hand-coded Strong's seminal paper 2 using line-by-line coding 42 to identify keywords that characterize the three social epidemics. For each of the three social epidemics, the three authors generated independent lists of keywords that were conservatively combined by intersecting them. The words that were left out by the intersection were mostly synonyms (e.g., "catching disease" as a synonym for "contagion"), so we did not discard any important concept. According to Strong, the three social epidemics are intertwined and, as such, the concepts that define one specific social epidemic might be relevant to the remaining two as well. For example, suspicion is an element of the epidemic of fear but is tightly related to stigmatization as well, a phenomenon that Strong describes as typical of the epidemic of moralization. In our coding exercise, we adhered as much as possible to the description in Strong's paper and obtained a strict partition of keywords across social epidemics. In the second step, the same three authors mapped each of these keywords to language categories, namely sets of words that reflect how these concepts are expressed in natural language (e.g., words expressing anger or trust). We took these categories from existing language lexicons widely used in psychometric studies: the Linguistic Inquiry Word Count (LIWC) 25 , Emolex 43 , the Moral Foundation Lexicon 37 , and the Prosocial Behavior Lexicon 44 . The three authors grouped similar keywords together and mapped groups of keywords to one or more language categories. This grouping and mapping procedure was informed by previous studies that investigated how these keywords are expressed through language. These studies are listed in Table 1. (One example row: daily habits, which concern mainly people's experience of home, work, leisure, and movement between them 39 , map to the leisure (LIWC) category, with peak values .05, .16, .10 across the three phases.)

Table 1. Operationalization of Strong's epidemic psychology theoretical framework.
From Strong's paper, three annotators extracted keywords that characterize the three social epidemics and mapped them to relevant language categories from existing language lexicons used in psychometric studies. Category names are followed by the name of their corresponding lexicon in parentheses. We support the association between keywords and language categories with examples of supporting literature. To summarize how the use of the language categories varies across the three temporal states, we computed the peak values of the different language categories (days when their standardized fractions reached the maximum), and reported the percentage increase at peak compared to the average over the whole time period; in each row, the maximum value is highlighted in bold. To find occurrences of these language categories in our Twitter data, we matched them against the text in each tweet. We considered that a tweet contains a language category c if at least one of the tweet's words (or word stems) belonged to that category. For each day, we computed the fraction of users who posted at least one tweet containing a given language category over the total number of users who tweeted during that day. We experimentally checked that each day had a number of data points sufficient to obtain valid metrics (i.e., the minimum number of distinct users per day is above 72K across the whole period of study). To allow for a fair comparison across categories, we z-standardized each fraction by computing the number of standard deviations from the fraction's whole-period average. Figure 1 (A-C) shows how the standardized fractions of all the language categories changed over time. The cell color encodes values higher than the average in red, and lower in blue. We partitioned the language categories according to the three social epidemics.
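The matching and standardization steps above can be sketched as follows. The helper names are ours, not the paper's code, and the whitespace tokenizer is deliberately minimal:

```python
import numpy as np

def contains_category(text, words, stems=()):
    """True if any token of the tweet is a category word or starts with a
    category word stem (LIWC-style stem matching)."""
    tokens = text.lower().split()
    return any(t in words or any(t.startswith(s) for s in stems) for t in tokens)

def standardized_fraction(users_with_category, total_users):
    """Daily fraction of active users who used the category, expressed as the
    number of standard deviations from the whole-period average (z-score)."""
    frac = np.asarray(users_with_category, float) / np.asarray(total_users, float)
    return (frac - frac.mean()) / frac.std()
```

In this sketch, each day contributes one count of users who posted at least one matching tweet and one count of all users who tweeted that day, mirroring the per-user (rather than per-tweet) normalization described above.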
To identify phases characterized by different combinations of the language categories, we determined change-points: periods in which the standardized fractions considerably vary across all categories at once. To quantify such variations, we computed the daily average squared gradient of the standardized fractions of all the language categories. The squared gradient is a measure of the rate of instantaneous change (increase or decrease) of a given point in a time series 45 . Figure 1 D shows the value of the average squared gradient over time; peaks in the curve represent days of high local variation. We marked the peaks above one standard deviation from the mean as change-points. We found two change-points that coincide with two key events: February 27th, the day of the announcement of the first infection in the country; and March 24th, the day of the announcement of the 'stay at home' orders. These change-points identify three phases, which are described next by dwelling on the peaks of the different language categories (days when their standardized fractions reached the maximum) and reporting the percentage increase at peak (the increase is compared to the average over the whole period of study, and its peak is denoted by 'max peak' in Table 1 ). The first phase (refusal phase) was characterized by anxiety and fear. Death was frequently mentioned, with a peak on February 11 of +45% compared to its average during the whole time period. The pronoun they was used in this temporal state more than average; this suggests that the focus of discussion was on the implications of the viral epidemic for 'others', as this was when no infection had been discovered in the US yet. All other language categories exhibited no significant variations, which reflected an overall situation of 'business-as-usual'.
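A minimal sketch of this change-point rule, assuming a matrix of z-standardized daily fractions with one row per language category (`np.gradient` stands in for the paper's gradient computation):

```python
import numpy as np

def change_points(z_matrix, n_sigmas=1.0):
    """Flag days whose average squared gradient, taken over all language
    categories, exceeds the mean by more than n_sigmas standard deviations."""
    grads = np.gradient(z_matrix, axis=1)   # instantaneous change per category
    avg_sq = (grads ** 2).mean(axis=0)      # daily average squared gradient
    threshold = avg_sq.mean() + n_sigmas * avg_sq.std()
    return np.flatnonzero(avg_sq > threshold)
```

Applied to the series described above, days flagged this way are the candidate phase boundaries.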
The second phase (suspended reality phase) began on February 27th with an outburst of negative emotions (predominantly anger), right after the first COVID-19 contagion in the US was announced. The abstract fear of death was replaced by expressions of concrete health concerns, such as words expressing risk and mentions of bodily sensations. On March 13th, the federal government announced the state of national emergency, followed by the enforcement of state-level 'stay at home' orders. During those days, we observed a sharp increase in the use of the pronoun I and of swear words (with a peak of +54% on March 18th), which hints at a climate of discussion characterized by conflict and polarization. At the same time, we observed an increase in the use of words related to the daily habits affected by the impending restriction policies, such as motion, social activities, and leisure. Mentions of words related to home peaked on March 16th (+38%), the day when the federal government announced social distancing guidelines to be in place for at least two weeks. The third phase (acceptance phase) started on March 24th, the day after the first physical-distancing measures were imposed by law. The increased use of words of power and authority likely reflected the emergence of discussion around the new policies enforced by government officials and public agencies. As the death toll rose steadily, hitting the mark of 1,000 people on March 26th, expressions of conflict faded away and words of sadness became predominant. In those days of hardship, a sentiment of care for others and expressions of prosocial behavior became more frequent (+19% and +25%, respectively). Last, mentions of work-related activities peaked as many people either lost their jobs or were compelled to work from home as a result of the lockdown.
The language categories capture broad concepts related to Strong's epidemic psychology theory, but they do not allow for an analysis of the fine-grained topics within each category. To study them, for each of the 87 combinations of language category and phase (29 language categories × 3 phases), we listed the 100 most retweeted tweets (e.g., the most popular tweets containing anxiety posted in the refusal phase). To identify overarching themes, we followed two steps that are commonly adopted in thematic analysis 46, 47 . We first applied open coding to identify key concepts that emerged across multiple tweets; specifically, one of the authors read all the tweets and marked them with keywords that reflected the key concepts expressed in the text. We then used axial coding to identify relationships between the most frequent keywords and summarize them into semantically cohesive themes. Themes were reviewed recursively rather than linearly, by re-evaluating and adjusting them as new tweets were parsed. Table 2 summarizes the most recurring themes, together with some of their representative tweets. The thematic analysis revealed that the topics discussed in the three phases resemble the five stages of grief 15 : the refusal phase was characterized by denial, the suspended reality phase by anger mixed with bargaining, and the acceptance phase by sadness together with forbearance. More specifically, in the refusal phase, statements of skepticism were re-tweeted widely (Table 2, row 1). The epidemic was frequently depicted as a "foreign" problem (r. 2) and all activities kept business as usual (r. 3). In the suspended reality phase, the discussion was characterized by outrage against three main categories: foreigners (r. 4), political opponents (r. 5), and people who adopted different behavioral responses to the outbreak (r. 6).
This level of conflict corroborates Strong's postulate of the "war against each other". Science and religion were two prominent topics of discussion. A lively debate raged around the validity of scientists' recommendations (r. 7). Some social groups put their hopes on God rather than on science (r. 8). Mentions of people self-isolating at home became very frequent, and highlighted the contrast between judicious individuals and careless crowds (r. 9). Finally, during the acceptance phase, the outburst of anger gave way to the sorrow caused by the mourning of thousands of people (r. 10). By accepting the real threat of the virus, people became more open to finding collective solutions to the problem and to overcoming fear with hope (r. 11). Although a positive attitude towards the authorities seemed prevalent, some people expressed disappointment with the restrictions imposed (r. 12). Those who were isolated at home started imagining a life beyond the isolation, especially in relation to reopening businesses (r. 13).

Figure 1. (A-C) Each row in the heatmaps represents a language category (e.g., words expressing anxiety) that our manual coding associated with one of the three social epidemics. The cell color represents the daily standardized fraction of people who used words related to that category: values higher than the average are red and those lower are blue. Categories are partitioned into three groups according to the type of social epidemic they model: Fear, Morality, and Action. (D) Average gradient (i.e., instantaneous variation) of all the language categories; the peaks of the gradient identify change-points: dates around which a considerable change in the use of multiple language categories happened at once. The dashed vertical lines that cross all the plots represent these change-points. (E-H) Temporal evolution of four families of indicators we used to corroborate the validity of the trends identified by the language categories. We checked internal validity by comparing the language categories with a custom keyword-search approach and two deep-learning NLP tools that extract types of social interactions and mentions of medical symptoms. We checked external validity by looking at mobility patterns in different venue categories as estimated by the GPS geo-localization service of the Foursquare mobile app. The timeline at the bottom of the figure marks some of the key events of the COVID-19 pandemic in the US, such as the announcement of the first recorded COVID-19 infection.

Table 2. Recurring themes in the three phases, found by means of thematic analysis of tweets. Themes are paired with examples of popular tweets. Excerpt for the acceptance phase:
10, sadness: "We deeply mourn the 758 New Yorkers we lost yesterday to COVID-19. New York is not numb. We know this is not just a number-it is real lives lost forever."
11, we-focus, hope: "We are thankful for Japan's friendship and cooperation as we stand together to defeat the #COVID19 pandemic.", "During tough times, real friends stick together. The U.S. is thankful to #Taiwan for donating 2 million face masks to support our healthcare ", "Now more than ever, we need to choose hope over fear. We will beat COVID-19. We will overcome this. Together."
12, authority: "You can't go to church, buy seeds or paint, operate your business, run on a beach, or take your kids to the park. You do have to obey all new 'laws', wear face masks in public, pay your taxes. Hopefully this is over by the 4th of July so we can celebrate our freedom."
13, resuming work: "We need to help as many working families and small businesses as possible. Workers who have lost their jobs or seen their hours slashed and families who are struggling to pay rent and put food on the table need help immediately. There's no time to waste."
To assess the validity of our approach, we compared the previous results with the output of alternative text-mining techniques applied to the same data (internal validity), and with people's mobility in the real world (external validity). We processed the very same social media posts with three alternative text-mining techniques (Figure 1 E-G). In Table 3, we reported the three language categories with the strongest correlations with each behavioral marker. First, to allow for interpretable and explainable results, we applied a simple word-matching method that relies on a custom lexicon containing three categories of words reflecting consumption of alcohol, physical exercising, and economic concerns, as those aspects have been found to characterize the COVID-19 pandemic 48 . We measured the daily fraction of users mentioning words in each of those categories (Figure 1 E). In the refusal phase, the frequency of any of these words did not significantly increase. In the suspended reality phase, the frequency of words related to the economy peaked, and that related to alcohol consumption peaked shortly after. Table 3 shows that economy-related words were highly correlated with the use of anxiety words (r = 0.73), which is in line with studies indicating that the degree of apprehension for the declining economy was comparable to that of health-hazard concerns 49, 50 . Words of alcohol consumption were most correlated with the language dimensions of body (r = 0.70), feel (r = 0.62), and home (r = 0.58); in the period when health concerns were at their peak, home isolation caused a rising tide of alcohol use 51, 52 . Finally, in the acceptance phase, the frequency of words related to physical exercise rose significantly; this happened at the same time as the use of positive words expressing togetherness was at its highest: affiliation (r = 0.95), posemo (r = 0.93), we (r = 0.92). All these results match our previous interpretations of the peaks for our language categories.
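The correlation step above can be sketched with a hypothetical helper that ranks language-category time series by Pearson's r against a behavioral-marker series:

```python
import numpy as np

def top_correlated(marker, category_series, k=3):
    """Rank language-category time series by Pearson correlation with a
    behavioral-marker time series; return the k strongest."""
    corrs = {name: float(np.corrcoef(marker, s)[0, 1])
             for name, s in category_series.items()}
    return sorted(corrs.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Running this once per marker (economy, alcohol, exercise, and so on) yields the per-marker "top three categories" layout of Table 3.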
Second, since it is unclear whether using a standard word-count analytic system would allow for the distinction among the three different types of social epidemics, we used a deep-learning Natural Language Processing tool that mines conversations according to how humans understand them in the real world 53 . The tool can classify any textual message according to types of interaction that are close to human-level understanding. In particular, we studied over time the three types most frequently found: expressions of conflict (expressions of contrast or diverging views), social support (emotional aid and companionship), and power (expressions that denote or describe a person's power over the behavior and outcomes of another). Figure 1 F shows the min-max normalized scores of the fraction of people posting tweets labeled with each of these three interaction types. In the refusal phase, conflict increased; this was when anxiety and blaming foreigners were recurring themes on Twitter. In the suspended reality phase, conflict peaked (similar to anxiety words, r = 0.88), yet, since this was when the first lockdown measures were announced, initial expressions of power and of social support gradually increased as well. Finally, in the acceptance phase, social support peaked. Support was most correlated with the categories of affiliation (r = 0.98), positive emotions (r = 0.96), and we (r = 0.94) (Table 3); power was most correlated with prosocial (r = 0.95), care (r = 0.94), and authority (r = 0.94). Again, our previous interpretations concerning the existence of a phase of conflict followed by a phase of social support were further confirmed by the deep-learning tool, which, as opposed to our dictionary-based approaches, does not rely on word matching. Third, we used a deep-learning tool that extracts mentions of medical entities from text 54 .
When applied to a tweet, the tool accurately extracts medical symptoms in the form of n-grams from the tweet's text (e.g., "cough", "feeling sick"). Out of all the entities extracted, we focused on the 100 most frequently mentioned and grouped them into two families of symptoms: those related to physical health (e.g., "fever", "cough", "sick") and those related to mental health (e.g., "depression", "stress") 3 . The min-max normalized fractions of people posting tweets containing mentions of these symptoms are shown in Figure 1 G. In the refusal phase, the frequency of symptom mentions did not change. In the suspended reality phase, instead, physical symptoms started to be mentioned, and they were correlated with the language categories expressing panic and physical health concerns: swear (r = 0.83), feel (r = 0.77), and negate (r = 0.67). In the acceptance phase, mentions of mental symptoms became most frequent. Interestingly, mental symptoms peaked when the Twitter discourse was characterized by positive feelings and prosocial interactions: affiliation (r = 0.91), we (r = 0.88), and posemo (r = 0.85); this is in line with recent studies that found that the psychological toll of COVID-19 has similar traits to post-traumatic stress disorders and that its symptoms might lag several weeks behind the period of initial panic and forced isolation [55] [56] [57] . To test for the external validity of our language categories, we compared their temporal trends with mobility data. We used the data collection that Foursquare made publicly available in response to the COVID-19 crisis through the visitdata.org website. The data consists of the daily number of people in the US visiting each of 35 venue types, as estimated by the GPS geo-localization service of the Foursquare mobile app.
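The marker series in Figure 1 (E-H) are drawn on a common 0-1 scale; a one-line min-max normalization suffices (our helper, assuming the series is not constant):

```python
import numpy as np

def min_max(series):
    """Rescale a time series linearly so its minimum maps to 0 and maximum to 1."""
    s = np.asarray(series, float)
    return (s - s.min()) / (s.max() - s.min())
```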
We picked three venue categories, Grocery shops, Travel & Transport, and Outdoors & Recreation, to reflect three different types of fundamental human needs 58 : the primary need of getting food supplies, the secondary need of moving around freely (or to limit mobility for safety), and the higher-level need of being entertained. In Figure 1 H, we show the min-max normalized number of visits over time. The periods of higher variation in the normalized number of visits match the transitions between the three phases. In the refusal phase, people's mobility did not change. In the suspended reality phase, instead, travel started to drop, and grocery shopping peaked, supporting the interpretation of a phase characterized by a wave of panic-induced stockpiling and a compulsion to save oneself (it co-occurred with the peak use of the pronoun I, r = 0.80) rather than to help others. Finally, in the acceptance phase, the panic around grocery shopping faded away, and the number of visits to parks and outdoor spaces increased. To embed our operationalization of epidemic psychology into real-time models (e.g., epidemiological models, urban mobility models), our measures need to work at any point in time during a new pandemic; yet, given their current definitions, they do not, because they are normalized over the whole period of study (Figure 1 A-C). To fix that, we designed a new composite measure that does not rely on full temporal knowledge, and a corresponding detection method that determines which of the three phases one is in at any given point in time. For each phase, this parsimonious measure is composed of the language dimensions that positively and negatively characterize the phase. More specifically, it is composed of two dimensions: the dimension most positively associated with the phase (expressed in percent change) minus that most negatively associated with it (e.g., (death - I) for the refusal phase).
To identify such dimensions, we trained three logistic regression binary classifiers (one per phase) that use the percent changes of all the language dimensions at time t to estimate the probability that t belongs to phase i (P phase i (t)). On average, the classifiers were able to identify the correct phase for 98% of the days. The regression coefficients were then used to rank the language categories by their predictive power. Table 4 shows the top three positive beta coefficients and the bottom three negative ones for each of the three phases. For each phase, we subtracted the bottom category from the top one without considering their beta coefficients, as these would require, again, full temporal knowledge. The top and bottom categories of all phases belong to the LIWC lexicon. The resulting composite measure has change-points (Figure 2) similar to the full-knowledge measure's (Figure 1), suggesting that the real-time and parsimonious computation does not compromise the original trends. In a real-time scenario, transitions between phases are captured by changes of the dominant measure; for example, when the refusal curve is overtaken by the suspended reality curve. In addition, we correlated the composite measures with each of the behavioral markers we used for validation (Figure 1 E-H) to find which markers are most typical of each phase. We reported the correlations in Table 3. During the refusal phase, conflictual interactions were frequent (r = 0.58) and long-range mobility was common (r = 0.62); during the suspended reality phase, as mobility reduced 59, 60 , people hoarded groceries and alcohol 51, 52 and expressed concerns for their physical health (r = 0.81) and for the economy 49, 50 ; last, during the acceptance phase, people ventured outdoors, started exercising more, and expressed a stronger will to support each other (r = 0.90), in the wake of a rising tide of deaths and mental health symptoms (r = 0.85) 55-57 .
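This procedure can be sketched in numpy; `fit_logistic` is a minimal gradient-descent stand-in for the paper's (unspecified) logistic regression implementation, and the category names in the example are illustrative:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Minimal binary logistic regression; returns per-feature coefficients."""
    Xb = np.c_[np.ones(len(X)), X]              # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient step on log-loss
    return w[1:]                                # drop the intercept

def phase_composites(X, y, names):
    """X: (days, categories) percent changes; y: phase id per day.
    Per phase: fit a one-vs-rest classifier, rank categories by coefficient,
    and return (top, bottom, composite = top series minus bottom series)."""
    out = {}
    for phase in np.unique(y):
        coef = fit_logistic(X, (y == phase).astype(float))
        top, bottom = int(np.argmax(coef)), int(np.argmin(coef))
        out[int(phase)] = (names[top], names[bottom], X[:, top] - X[:, bottom])
    return out
```

The composite series uses only the raw percent changes of the two selected categories, so once the categories are fixed it can be computed day by day without full temporal knowledge.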
New infectious diseases break out abruptly, and public health agencies try to rely on detailed planning yet often find themselves improvising beyond their playbooks. They are constantly confronting not only the health epidemic but also the three social epidemics. Measuring the effects of epidemics on societal dynamics and population mental health has long been an open research problem, and multidisciplinary approaches have been called for 61 .

Table 4. Top three positive and bottom three negative beta coefficients of the logistic regression models for the three phases. The categories in bold are those included in our composite temporal score.

As our method is easy to use and can be applied to any public stream of data, it has direct practical implications: it improves the ability to monitor whether people's behavior and perceptions align with or diverge from the expectations and recommendations of governments and experts, thus informing the design of more effective interventions 1 . Since our language categories are not tailored to a specific epidemic (e.g., they do not reflect any specific symptom an epidemic is associated with), our approach can be applied to a future epidemic, provided that the set of relevant hash-tags associated with the epidemic is known; this is a reasonable assumption to make, though, considering that consensus on Twitter hash-tags is reached quickly 62 , and that several epidemics that occurred in the last decade sparked discussions on Twitter from their early days [63] [64] [65] . Our method could complement the numerous cross-sectional studies on the negative psychological impact of health epidemics 3, 66 . Those studies are usually conducted on a small to medium scale and are costly to carry out; our approach could integrate them with real-time insights from large-scale data. For computer science researchers, our method could provide a starting point for developing more sophisticated tools for monitoring social epidemics.
Furthermore, from the theoretical standpoint, our work provides the first operationalization of Strong's theoretical model of epidemic psychology and shows its applicability to social media data. Moreover, starting from Strong's epidemic psychology, our analysis showed the emergence of phases that parallel Kuebler-Ross's stages of grief. This demonstrates the centrality of psychological responses to major life trauma alongside any potential physical danger. Thus, future research could integrate and apply the two perspectives not just to pandemics, but to large-scale disasters and other tragedies. Finally, and more importantly, our real-time operationalization of Strong's model makes it possible to embed epidemic psychology in any real-time model for the first time. Future work could improve on ours in five main aspects. First, we focused only on one viral epidemic, without being able to compare it to others. That is mainly because no other epidemic had an online scale comparable to COVID-19's. Yet, if one were to obtain past social media data from the outbreaks of diseases like Zika 63 , Ebola 64 , and the H1N1 influenza 65 , one could apply our methodology in those contexts as well, and identify similarities and differences. For example, one could study how mortality rates or speed of spreading influence the representation of Strong's epidemic psychology on social media. Second, our geographical focus was the entire United States and, as such, was coarse and limited in scope. Our collected data did not provide sufficient coverage for each individual state in the US. If we were to obtain such high-coverage data, we could relate differences between states to large-scale events (e.g., a governor's decisions, prevalence of cases, media landscape, and residents' cultural traits). In particular, recent studies suggested that the public reaction to COVID-19 varied across US states depending on their political leaning 67, 68 .
One could also apply our methodology to other English-speaking countries to investigate how cultural dimensions 69 and cross-cultural variations in personality traits 70 might influence the three social epidemics. Third, the period of study is limited, yet it proved sufficient to discover a clear sequence of collective psychological phases. Future work could explore longer periods to ultimately assess the social epidemics' long-term effects. Fourth, our study is limited to Twitter, mainly because Twitter is the largest open stream of real-time social media data. The practice of using Twitter to model the psychological state of a country carries its own limitations. Despite Twitter's rather high penetration in the US (around 20% of adults, according to the latest estimates 71 ), its user base is not representative of the general population 72 . Additionally, Twitter is notoriously populated by bots 73, 74 : automated accounts that are often used to amplify specific topics or viewpoints. Bots have played an important role in steering the discussion around several events of broad public interest 75, 76 , and it is reasonable to expect that they play a role in COVID-related discussions too, as some recent studies suggest 11 . To partly discount their impact, since bots tend to have anomalous levels of activity (especially retweeting 75 ), we performed two tests. First, we computed all our measures at user level rather than tweet level, which counters anomalous levels of activity. Second, we replicated our temporal analysis excluding retweets and obtained very similar results. In the future, one could attempt to adapt our framework to different sources of online data, for example to web search queries, which have proven useful to identify different phases of the public reaction to the COVID-19 pandemic 77 . Last, as Strong himself acknowledged in his seminal paper: "any sharp separation between different types of epidemic psychology is a dubious business."
Our work has operationalized each social epidemic independently. In the future, modeling the relationships among the three epidemics might identify hitherto hidden emergent properties.

We collected tweets related to COVID-19 from two sources. First, from an existing dataset of 129,911,732 COVID-related tweets 78 , we gathered 57,287,490 English tweets posted between February 1st and April 16th by 11,318,634 unique users. We augmented this dataset with our own collection of tweets, obtained by querying the Twitter Streaming API continuously from March 15th until April 16th with a list of keywords aligned with the previous data collection 78 : coronavirus, covid19, covid 19, coronaviruslockdown, coronavirusoutbreak, herd immunity, herdimmunity. The Streaming API returns a sample of up to 1% of all tweets. This second crawl yielded 96,576,543 English tweets. By combining the two collections, we obtained 143,325,623 unique English tweets posted by 17,862,493 users. As we shall discuss in the remainder of this section, we normalized all our measures so that they are not influenced by the fluctuating volume of tweets over time. We focused our analysis on the United States, the country where Twitter penetration is highest. To identify Twitter users living there, we parsed the free-text location description of their user profiles (e.g., "San Francisco, CA"). We did so using a set of custom regular expressions that match variations of the expression "United States of America", as well as the names of 333 US cities and 51 US states (and their combinations). Albeit not always accurate, matching location strings against known location names is a tested approach that yields good results for coarse-grained localization at the state or country level 79 . Overall, we located 3,710,489 unique users in the US, who posted 38,950,828 tweets; this is the final dataset we used for the analysis.
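The profile-location matching described above can be sketched as follows. This is a minimal, hypothetical illustration: the patterns below are a tiny subset standing in for the full lists of 333 city and 51 state names used in the actual pipeline.

```python
import re

# Hypothetical sketch of the profile-location matcher: a handful of
# patterns standing in for the full lists of city and state names.
US_PATTERNS = [
    r"\bunited states( of america)?\b", r"\busa?\b",
    r"\bcalifornia\b", r",\s*ca\b",
    r"\bnew york\b", r",\s*ny\b",
]
US_REGEX = re.compile("|".join(US_PATTERNS), re.IGNORECASE)

def is_us_user(profile_location: str) -> bool:
    """Coarse country-level localization from a free-text profile field."""
    return bool(US_REGEX.search(profile_location))

print(is_us_user("San Francisco, CA"))   # True
print(is_us_user("London, UK"))          # False
```

As the paper notes, this kind of string matching is only approximately accurate, but it is adequate for country-level localization.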
The number of active users per day varies from a minimum of 72k on February 2nd to a maximum of 1.84M on March 18th, with an average of 437k. The median number of tweets per user over the whole period is 2. A small number of accounts tweeted a disproportionately high number of times, reaching a maximum of 15,823 tweets; those were clearly automated accounts, which were discarded by our approach. We selected our language categories from four lexicons:

Linguistic Inquiry Word Count (LIWC) 25 . A lexicon of words and word stems grouped into over 125 categories reflecting emotions, social processes, and basic functions, among others. The LIWC lexicon is based on the premise that the words people use to communicate can provide clues to their psychological states 25 . It allows written passages to be analyzed syntactically (how the words are used together to form phrases or sentences) and semantically (an analysis of the meaning of the words or phrases).

Emolex 43 . A lexicon that classifies 6k+ words and stems into the eight primary emotions of Plutchik's psychoevolutionary theory 80 .

Moral Foundation Lexicon 37 . A lexicon of 318 words and stems grouped into 5 categories of moral foundations 81 (harm, fairness, in-group, authority, and purity), each of which is further split into expressions of virtue or vice.

Pro-social behavior 44 . A lexicon of 146 pro-social words and stems, which have been found to be frequently used when people describe pro-social goals 44 .

We considered that a tweet contained a language category c if at least one of the tweet's words or stems belonged to that category. The tweet-category association is binary and disregards the number of matching words within the same tweet. That is mainly because, in short snippets of text (tweets are limited to 280 characters), multiple occurrences are rare and do not necessarily reflect the intensity of a category 82 .
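The binary tweet-category matching can be sketched as follows. The lexicon below is a toy example in LIWC style (entries ending in "*" are stems matched by prefix); the words and categories are illustrative, not actual lexicon content.

```python
# Toy lexicon in LIWC style: "*" marks a stem (prefix match); the
# entries and categories are illustrative, not the real lexicon.
TOY_LEXICON = {
    "anxiety": {"worr*", "fear", "nervous*"},
    "death": {"dead", "die*", "mourn*"},
}

def tokenize(text: str) -> list:
    return [w.strip(".,!?;:()\"'").lower() for w in text.split()]

def categories_in(tweet: str) -> set:
    """Return the set of categories with at least one matching token."""
    tokens = tokenize(tweet)
    matched = set()
    for category, entries in TOY_LEXICON.items():
        for entry in entries:
            if entry.endswith("*"):
                hit = any(t.startswith(entry[:-1]) for t in tokens)
            else:
                hit = any(t == entry for t in tokens)
            if hit:  # binary association: one match is enough
                matched.add(category)
                break
    return matched

print(categories_in("So worried about the news"))  # {'anxiety'}
```

Note that the function stops at the first match per category, reflecting the binary association described above.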
For each language category c, we counted the number of users U_c(t) who posted at least one tweet at time t containing that category. We then obtained the fraction of users who mentioned category c by dividing U_c(t) by the total number of users U(t) who tweeted at time t:

f_c(t) = U_c(t) / U(t).

Computing the fraction of users rather than the fraction of tweets prevents biases introduced by exceptionally active users, thus capturing more faithfully the prevalence of different language categories in our Twitter population. This also helps discount the impact of social bots, which tend to have anomalous levels of activity (especially retweeting 75 ). Different categories might be verbalized with considerably different frequencies. For example, the language category "I" (first-person pronoun) from the LIWC lexicon naturally occurred much more frequently than the category "death" from the same lexicon. To enable comparison across categories, we standardized all the fractions:

z_c(t) = (f_c(t) − µ(f_c)) / σ(f_c),

where µ(f_c) and σ(f_c) represent the mean and standard deviation of the f_c(t) scores over the whole time period, from t = 0 (February 1st) to t = T (April 16th). These z-scores also ease the interpretation of the results, as they represent the relative variation of a category's prevalence compared to its average: they take on values higher (lower) than zero when the original value is higher (lower) than the average. We compared the results obtained via word-matching with a state-of-the-art deep learning tool for Natural Language Processing designed to capture fundamental types of social interactions from conversational language 53 .
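The per-day prevalence f_c(t) = U_c(t)/U(t) and its standardization can be sketched as follows; the tweets are toy (user, day, categories) tuples, as a lexicon matcher might produce.

```python
from collections import defaultdict
import statistics

# Toy data: (user, day, matched categories) per tweet.
tweets = [
    ("u1", 0, {"anxiety"}),
    ("u2", 0, set()),
    ("u1", 1, {"death"}),
    ("u3", 1, {"death"}),
]

def category_fractions(tweets, category):
    active, mentioning = defaultdict(set), defaultdict(set)
    for user, day, cats in tweets:
        active[day].add(user)          # U(t): users who tweeted on day t
        if category in cats:
            mentioning[day].add(user)  # U_c(t): users mentioning c
    days = sorted(active)
    # f_c(t) = U_c(t) / U(t)
    return [len(mentioning[d]) / len(active[d]) for d in days]

def standardize(f):
    # z_c(t) = (f_c(t) - mean) / std over the whole period
    mu, sigma = statistics.mean(f), statistics.pstdev(f)
    return [(x - mu) / sigma for x in f]

f_anxiety = category_fractions(tweets, "anxiety")
print(f_anxiety)               # [0.5, 0.0]
print(standardize(f_anxiety))  # [1.0, -1.0]
```

Counting distinct users per day (sets, not tweet counts) is what makes the measure robust to hyperactive accounts.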
This tool uses Long Short-Term Memory neural networks (LSTMs) 83 that take as input a 300-dimensional GloVe representation of words 84 . Out of the ten interaction types that the tool can classify 85 , only three were detected frequently (with likelihood > 0.5) in our Twitter data: conflict (expressions of contrast or diverging views 86 ), social support (giving emotional or practical aid and companionship 87 ), and power (expressions that mark a person's power over the behavior and outcomes of another 88 ). Given a tweet's textual message m and an interaction type i, we used the classifier to compute the likelihood score l_i(m) that the message contains that interaction type. We then binarized the confidence scores using a threshold-based indicator function:

I_i(m) = 1 if l_i(m) > θ_i, and 0 otherwise.

Following the original approach 53 , we used a different threshold for each interaction type, as the distributions of their likelihood scores tend to vary considerably. We conservatively picked θ_i as the value of the 85th percentile of the distribution of the confidence scores l_i, thus favoring precision over recall. Last, similar to how we constructed temporal signals for the language categories, we counted the number of users U_i(t) who posted at least one tweet at time t containing interaction type i. We then obtained the fraction of users who mentioned interaction type i by dividing U_i(t) by the total number of users U(t) who tweeted at time t:

f_i(t) = U_i(t) / U(t).

Last, we min-max normalized these fractions, considering the minimum and maximum values over the whole time period [0, T]:

f̃_i(t) = (f_i(t) − min f_i) / (max f_i − min f_i).

To identify medical symptoms on Twitter in relation to COVID-19, we resorted to a state-of-the-art deep learning method for medical entity extraction 54 . When applied to tweets, the method extracts n-grams representing medical symptoms (e.g., "feeling sick"). This method is based on the Bi-LSTM sequence-tagging architecture introduced by Huang et al.
89 in combination with GloVe word embeddings 84 and RoBERTa contextual embeddings 90 . To optimize entity-extraction performance on noisy textual data from social media, we trained the sequence-tagging architecture on the Micromed database 91 , a collection of tweets manually labeled with medical entities. The hyper-parameters we used are: 256 hidden units, a batch size of 4, and a learning rate of 0.1, which we gradually halved whenever there was no performance improvement after 3 epochs. We trained for a maximum of 200 epochs, or until the learning rate became too small (≤ 0.0001). The final model achieved an F1-score of 0.72 on Micromed. The F1-score is a performance measure that combines precision (the fraction of extracted entities that are actually medical entities) and recall (the fraction of medical entities present in the text that the method is able to retrieve). We based our implementation on Flair 92 and PyTorch 93 , two popular deep learning libraries in Python. For each unique medical entity e, we counted the number of users U_e(t) who posted at least one tweet at time t that mentioned that entity. We then obtained the fraction of users who mentioned medical entity e by dividing U_e(t) by the total number of users U(t) who tweeted at time t:

f_e(t) = U_e(t) / U(t).

Last, we min-max normalized these fractions, considering the minimum and maximum values over the whole time period [0, T]:

f̃_e(t) = (f_e(t) − min f_e) / (max f_e − min f_e).

Comparison with mobility traces

Foursquare is a local search and discovery mobile application that relies on users' past mobility records to recommend places they might like. The application uses GPS geo-localization to estimate the user's position and to infer the places they visited. In response to the COVID-19 crisis, Foursquare made publicly available the data gathered from a pool of 13 million US users.
These users were "always-on" during the period of data collection, meaning that they allowed the application to gather geo-location data at all times, even when the application was not in use. The data (published through the visitdata.org website) provide, for each US state s, daily counts of visits to venues of different categories j, which we denote v_{j,s}(t). We then averaged the values across all states:

v_j(t) = (1/S) Σ_s v_{j,s}(t),

where S is the total number of states. By weighting each state equally, we obtained a measure that is more representative of the whole US territory, rather than being biased towards high-density regions. All our temporal indicators are affected by large day-to-day fluctuations. To extract more consistent trends out of our time series, we applied a smoothing function, a common practice when analyzing temporal data extracted from social media 94 . Given a time-varying signal x(t), we apply a "boxcar" moving average over a window of the previous k days:

x*(t) = (1/k) Σ_{i=t−k..t} x(i).

We selected a window of one week (k = 7). Weekly time windows are typically used to smooth out both day-to-day variations and weekly periodicities 94 . We applied the smoothing to all the time series: the language categories (z*_c(t)), the mentions of medical entities (f*_e(t)), the interaction types (f*_i(t)), and the Foursquare visits (v*_j(t)). To identify phases characterized by different combinations of the language categories, we identified change-points: periods in which the values of all categories varied considerably at once. To quantify such variations, for each language category c, we computed ∇(z*_c(t)), namely the daily average squared gradient of the smoothed standardized fractions of that category 95 . To calculate the gradient, we used the Python function numpy.gradient. The gradient provides a measure of the rate of increase or decrease of the signal; we consider the absolute value of the gradient to account for the magnitude of change rather than its direction.
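The trend-extraction steps (boxcar smoothing, gradient magnitude, and the peak-based change-point criterion described in the text) can be sketched on a synthetic series as follows.

```python
import numpy as np
from scipy.signal import find_peaks

# Sketch of the pipeline on a synthetic standardized series: a boxcar
# average over the previous k days, the absolute gradient, and peaks
# of the gradient above its mean plus one standard deviation.
def boxcar_smooth(x, k=7):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        out[t] = x[max(0, t - k): t + 1].mean()  # previous k days + today
    return out

# Toy z-scores: flat, then an abrupt level shift at day 30, plus noise.
rng = np.random.default_rng(3)
z = np.concatenate([np.zeros(30), 3 * np.ones(36)])
z += 0.05 * rng.standard_normal(z.size)

grad = np.abs(np.gradient(boxcar_smooth(z)))   # magnitude of change
threshold = grad.mean() + grad.std()
change_points, _ = find_peaks(grad, height=threshold)
print(change_points)  # indices near the level shift at day 30
```

Because the gradient is near zero in the flat regions, only the days around the level shift exceed the mean-plus-one-standard-deviation threshold.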
To identify periods of consistent change, as opposed to quick instantaneous shifts, we applied temporal smoothing (Equation 10) also to the time series of gradients, and we denote the smoothed squared gradients with ∇*. Last, we averaged the gradients of all language categories to obtain the overall gradient over time:

∇(t) = (1/|C|) Σ_c ∇*(z*_c(t)),

where C is the set of language categories. Peaks in the time series ∇(t) represent the days of highest variation, and we marked them as change-points. Using the Python function scipy.signal.find_peaks, we identified peaks as the local maxima whose value is higher than the average plus one standard deviation, as is common practice 96 . For each language category c, we first computed the average value of f_c during the first day of the epidemic, namely µ_[0,1](f_c). During the first day, 86k users tweeted. We experimented with longer periods (up to a week and 0.4M users) and obtained qualitatively similar results. We used the averages computed on this initial period as reference values for later measurements. The assumption behind this approach is that the modeler would know the set of relevant hashtags in the initial stages of the pandemic, which is reasonable considering that this was the case for all the major pandemics that occurred in the last decade 63, 64, 65 . Starting from the second day, we then calculated the percent change of the f_c values compared to the historical average:

∆%_c(t) = 100 · (f_c(t) − µ_[0,1](f_c)) / µ_[0,1](f_c).

Finally, we combined the ∆%_c values of the selected categories to create measures that capture the average relative change of the prevalence of verbal expressions typical of each of the three temporal phases:

∆%_Refusal = ∆%_death − ∆%_I (13)
∆%_Suspended reality = ∆%_swear − ∆%_death (14)
∆%_Acceptance = ∆%_sad − ∆%_anxiety (15)

These categories were selected among those that proved most predictive of a given phase. Specifically, we trained three logistic regression classifiers (one per phase).
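The percent-change computation and the composite phase scores of Equations 13, 14, and 15 can be sketched as follows. The daily fractions below are toy values, not measurements.

```python
# Toy daily fractions f_c(t) for the categories used in the
# composite scores (illustrative values only).
f = {
    "death":   [0.010, 0.012, 0.020],
    "I":       [0.300, 0.290, 0.280],
    "swear":   [0.050, 0.060, 0.075],
    "sad":     [0.040, 0.041, 0.050],
    "anxiety": [0.030, 0.036, 0.033],
}

def pct_change(series, ref_index=0):
    """Percent change of f_c(t) relative to the first-day reference."""
    ref = series[ref_index]
    return [100.0 * (x - ref) / ref for x in series]

d = {c: pct_change(s) for c, s in f.items()}

def phase_scores(t):
    # Equations 13, 14, and 15 from the text.
    return {
        "refusal": d["death"][t] - d["I"][t],
        "suspended_reality": d["swear"][t] - d["death"][t],
        "acceptance": d["sad"][t] - d["anxiety"][t],
    }

print(phase_scores(2))
```

Each score is a difference of two percent changes, so a positive value means the phase's characteristic category is growing faster than its counterpart.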
For each phase, we marked with label 1 all the days that were included in that phase and with label 0 those that were not. Then, we trained a logistic regression classifier to predict the label of day t from the ∆%_c(t) values of all categories. During training, the logistic regression classifier learned a coefficient for each category. We included in Equations 13, 14, and 15 the categories with the top positive and top negative coefficients.

References
1. Using social and behavioural science to support covid-19 pandemic response
2. Epidemic psychology: a model
3. The psychological impact of quarantine and how to reduce it: rapid review of the evidence
4. The Structures of the Life-world
5. Assessment of public attention, risk perception, emotional and behavioural responses to the covid-19 outbreak: social media surveillance in china
6. The impact of covid-19 epidemic declaration on psychological consequences: a study on active weibo users
7. The covid-19 social media infodemic
8. Covid-19 infodemic: More retweets for science-based information on coronavirus than for false information
9. #Covid-19 on twitter: Bots, conspiracies, and social media activism
10. Coronavirus goes viral: quantifying the covid-19 misinformation epidemic on twitter
11. Prevalence of low-credibility information on twitter during the covid-19 outbreak
12. Evidence from internet search data shows information-seeking responses to news of local covid-19 cases
13. Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (covid-19) epidemic among the general population in china
14. A nationwide survey of psychological distress among chinese people in the covid-19 epidemic: implications and policy recommendations
15. On death and dying
16. Big data for infectious disease surveillance and modeling
17. Digital epidemiology
18. Social factors in epidemiology
19. Risk perception in epidemic modeling
20. Effect of risk perception on epidemic spreading in temporal networks
21. Quantifying mental health signals in twitter
22. Measuring emotional expression with the linguistic inquiry and word count
23. The language of emotion in short blog texts
24. Detecting anxiety through reddit
25. The psychological meaning of words: Liwc and computerized text analysis methods
26. Trust and suspicion
27. Detecting cognitive distortions through machine learning text analytics
28. Effects of prayer and religious expression within computer support groups on women with breast cancer
29. Evaluation of computerized text analysis in an internet breast cancer support group
30. Language use in eating disorder blogs: Psychological implications of social online activity
31. What do you say before you relapse? how language use in a peer-to-peer online discussion forum predicts risky drinking among those in recovery
32. Talk to me: foundations for successful individual-group interactions in online communities
33. Measuring online affects in a white supremacy forum
34. Trauma history and linguistic self-focus moderate the course of psychological adjustment to divorce
35. The language of autocrats: Leaders' language in natural disaster crises
36. Hate lingo: A target-based linguistic analysis of hate speech in social media
37. Liberals and conservatives rely on different sets of moral foundations
38. Enhancing the measurement of social effects by capturing morality
39. Understanding individual human mobility patterns
40. Stigma: Notes on the management of spoiled identity
41. The language of fear: Communicating threat in public discourse
42. Thematic coding and categorizing. Analyzing qualitative data
43. Crowdsourcing a word-emotion association lexicon
44. Moral actor, selfish agent
45. Data mining: concepts and techniques
46. Using thematic analysis in psychology
47. Interpretative phenomenological analysis
48. With millions stuck at home, the online wellness industry is booming
49. Coronavirus perceptions and economic anxiety
50. Covid-19-related economic anxiety is as high as health anxiety: Findings from the usa, the uk, and israel
51. Covid-19 hangover: a rising tide of alcohol use disorder and alcohol-associated liver disease
52. Covid-19 and alcohol: a dangerous cocktail
53. Ten social dimensions of conversations and relationships
54. Extracting medical entities from social media
55. The mental health consequences of covid-19 and physical distancing: The need for prevention and early intervention
56. The effect of covid-19 on youth mental health
57. Ptsd as the second tsunami of the sars-cov-2 pandemic
58. A theory of human motivation
59. Staying at home: mobility effects of covid-19
60. Mapping county-level mobility pattern changes in the united states in response to covid-19
61. Multidisciplinary research priorities for the covid-19 pandemic: a call for action for mental health science
62. The emergence of consensus: a primer
63. How people react to zika virus outbreaks on twitter? a computational content analysis
64. Ebola, twitter, and misinformation: a dangerous combination
65. Pandemics in the age of twitter: content analysis of tweets during the 2009 h1n1 outbreak
66. The 2014 ebola outbreak and mental health: current status and recommended response
67. Political beliefs affect compliance with covid-19 social distancing orders
68. Political partisanship influences behavioral responses to governors' recommendations for covid-19 prevention in the united states
69. Cultures and organizations: Software of the mind
70. Personality maturation around the world: A cross-cultural examination of social-investment theory
71. Share of u.s. adults using social media, including facebook, is mostly unchanged since 2018
72. Spatial, temporal, and socioeconomic patterns in the use of twitter and flickr
73. The rise of social bots
74. Online human-bot interactions: Detection, estimation, and characterization
75. Social bots distort the 2016 us presidential election online discussion
76. Weaponized health communication: Twitter bots and russian trolls amplify the vaccine debate
77. Applications of google search trends for risk communication in infectious disease management: A case study of covid-19 outbreak in taiwan
78. Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set
79. A twitter geolocation system with applications to public health
80. The emotions
81. Moral foundations theory: The pragmatic validity of moral pluralism
82. Mining the social web: data mining Facebook
83. Long short-term memory
84. GloVe: Global vectors for word representation
85. Coloring in the links: Capturing social ties as they are perceived
86. An integrative theory of intergroup conflict
87. Universal Dimensions of Social Cognition: Warmth and Competence
88. Exchange and Power in Social Life
89. Bidirectional lstm-crf models for sequence tagging
90. A robustly optimized bert pretraining approach
91. Identifying diseases, drugs, and symptoms in twitter
92. FLAIR: An easy-to-use framework for state-of-the-art NLP
93. Automatic differentiation in PyTorch
94. From tweets to polls: Linking text sentiment to public opinion time series
95. New introduction to multiple time series analysis
96. Simple algorithms for peak detection in time-series

Acknowledgements
We thank Sarah Konrath, Rosta Farzan, and Licia Capra for their useful feedback on the manuscript.