key: cord-0543746-6sx7htyc authors: Tommasel, Antonela; Godoy, Daniela; Rodriguez, Juan Manuel title: Tracking the evolution of crisis processes and mental health on social media during the COVID-19 pandemic date: 2020-11-22 journal: nan DOI: nan sha: eaa27ee5e29a8be63bb79c3d467bb1492e62ac94 doc_id: 543746 cord_uid: 6sx7htyc
The COVID-19 pandemic has affected all aspects of society, not only bringing health hazards, but also posing challenges to public order, governments and mental health. Moreover, it is the first pandemic in history in which people from around the world use social media to massively express their thoughts and concerns. This study aims at examining the stages of crisis response and recovery as a sociological problem by operationalizing a well-known model of crisis stages in terms of a psycho-linguistic analysis. Based on a large collection of Twitter data spanning from March to August 2020 in Argentina, we present a thematic analysis of the differences in the language used in social media posts, and look at indicators that reveal the different stages of a crisis and the country's response to them. The analysis was combined with a study of the temporal prevalence of mental health conversations across the time span. Beyond the Argentinian case study, the proposed approach and analyses can be applied to any large-scale public data. This approach can provide insights for the design of public health policies oriented to monitoring and eventually intervening during the different stages of a crisis, and thus mitigating the adverse mental health effects on the population.
The COVID-19 crisis has affected all aspects of society, not only bringing health hazards, but also posing challenges to public order, governments and mental health. Crises can serve as both threats and opportunities: despite the tangible risks to the public, they also raise awareness of the threats, which can be used to steer people towards productive and socially beneficial behaviors (52).
In recent years, social media has become part of the daily life of millions of people, as an important medium for exchanging messages and for reporting events as they occur. In this sense, the COVID-19 pandemic is the first one in history in which people from around the world have been massively expressing their thoughts and concerns on social media. Hence, there is an unprecedented opportunity to study this pandemic in light of the social media activity it generates, and how the propagation of COVID-related content connects with existing knowledge about crisis processes, mental health and other societal behaviors (e.g., emotions, crime). Several studies have aimed at identifying, modeling and understanding the varying stages through which crises arise, evolve and dissipate (52). These models have been useful for understanding the response efforts, as each stage has its particular needs, thus requiring distinct strategies and resources (38). Although useful, these models are limited in practice when it comes to explaining how particular stages are faced by a social collective or community during the development of a crisis. A common approach is to study the needs and opportunities of a given community, as manifested by its individuals, government actors and other environmental factors. In this context, the analysis of the textual exchanges in social media provides rich information about the online (and perhaps offline) behaviors of individuals. This analysis can provide insights into how the perception of the crisis is evolving, how individuals are coping with the crisis, and what their needs are, among others. In addition, social media is changing how people manifest and communicate aspects related to their mental health state.
For example, nowadays individuals are more prone to self-identify as suffering from a disorder and to communicate with others sharing similar experiences, which makes it possible to observe the mechanisms underlying mental health conditions during crises from different perspectives. Similarly, as governments struggle to develop effective messaging strategies to support society, being able to analyze how society perceives and responds to those messages becomes crucial for decision makers. In this work, we present a study and a supporting approach for characterizing crises like the COVID-19 pandemic based on the usage of language in social media. The approach allows human actors to monitor the evolution of a crisis through its different stages, and eventually plan for interventions, helping to mitigate the mental health effects of the crisis. To that end, our study aims at examining: i) the prevalence and evolution of mental health markers, ii) the evolution of emotions, and iii) the stages of crisis response as a sociological problem. A key aspect of our approach is the operationalization of a model of crisis stages in terms of lexicons for psycho-linguistic text analysis. Particularly, we performed a thematic analysis of the differences in language use in social media posts with respect to different crisis stages, based on a large collection of Twitter data collected from March to August 2020, containing commonly used hashtags or associated with specific user accounts related to Argentina (55). This analysis was combined with a study of the temporal prevalence of mental health conversations (for example, related to depression) across the time span, which sheds light on the relationship between the crisis stages and the mental health of individuals. Different NLP techniques were used for pre-processing the Twitter data. Furthermore, we developed a heatmap-like visual metaphor for tracking the evolution of the crisis stages as a function of different dimensions of the target model.
We believe that this work can contribute to a better understanding of the manifestation of psychological processes related to crises as they are reflected on Spanish-language social media. Thus, it can support the design of public health policies oriented to preserving the mental well-being of individuals during crises. Furthermore, the proposed approach is not tied to the Argentinian case study, and it can be applied to other large-scale data streams or even incorporate alternative disaster models. The rest of this paper is organized as follows. Section 2 presents background concepts of both mental health and sociological crisis models, as well as some related work. Section 3 describes the approach for operationalizing the selected psycho-social theories and applying it to the collected tweets. Then, Section 4 presents the performed analysis. Finally, Section 5 draws the conclusions of this study, including its limitations and future lines of work.
Since the beginning of the health crisis due to the COVID-19 pandemic, social media has been a rich source of information for both analyzing the phenomenon and mitigating its effects. Datasets of different sizes and characteristics have been released to support the study of the social phenomenon around this pandemic. The sampling and collection of social media data enable researchers (and other actors) to examine different aspects of people's reactions to the pandemic, as well as their direct and indirect consequences, as expressed through the use of language in social texts. As an example, Li et al. (30) analyzed psychological characteristics in a two-week period before and after the declaration of the COVID-19 outbreak in China on January 20th 2020. Original Weibo posts from that period were sampled to explore the impact of COVID-19 on people's mental health. LIWC (44) categories were compared between the week before and the week after the mentioned date, considering categories related to emotions (e.g.
positive or negative emotions, anxiety, anger) and concerns (e.g. health, family, friends). As expected, the study showed that negative emotions (anxiety, depression, and indignation) and sensitivity to social risks both increased, whereas positive emotions (Oxford happiness) decreased. The focus of concerns also drifted, as people were more concerned about health and family, and less about leisure and friends. Hou et al. (27) also analyzed Weibo posts using LIWC to assess public emotional responses to epidemiological events, government announcements, and control measures in a period between December 2019 and February 2020. The psycho-linguistic features observed in the study were negative emotions (i.e. anxiety, sadness, anger), and risk perception (i.e. drives). All these features reached three peaks during the analyzed period, each manifesting within 24 hours after the triggering event took place. Aiello et al. (2) empirically tested at scale the model proposed by Strong (53), according to which any new health epidemic results in three social epidemics: of fear, moralization, and action. The authors characterized the three social epidemics based on the use of language on social media by means of lexicons, and their goal was to embed epidemic psychology in real-time models (e.g., epidemiological and mobility models). Recently, social media has also been used to understand health outcomes through quantitative techniques that predict the presence of specific mental disorders and symptomatology, such as depression, suicidality, and anxiety (8). Evidence indicates that the rigorous application of even simple natural language processing, computational linguistics and psycho-linguistics techniques can yield insights into mental health disorders (9, 12, 34). Some works have explored these techniques in the context of human-made as well as natural disasters. For instance, Gruebner et al.
(23) aimed to identify specific basic emotions from Twitter for the greater New York City area during Hurricane Sandy in 2012. Lin and Margolin (32) used geo-coded tweets over an entire month to study how Twitter users from different cities expressed three different emotions (fear, sympathy and solidarity) in reaction to the Boston Marathon bombing in 2013. (13) created a depression lexicon using a labeled collection of Twitter posts associated with symptoms, among other dimensions. Furthermore, a crisis lexicon called CrisisLex (41) was created by sampling Twitter communications that can lead to greater situational awareness during this kind of crisis. These works are mostly oriented to providing real-time tools based on social media to assist during emergency situations and, eventually, to providing guidelines for first responders.
Disasters and crises have been described as occurring in phases or stages (DeWolfe (15), Neal (38)), which assign order and rationality to the complex reality of disasters and the human responses to them. Phases aim at identifying periods in the unfolding of a crisis, serving to classify the impacts or the actions that take place to address such impacts (29). Considering a temporal dimension, (16) made a fourfold division of disaster phases: Preparedness, Response, Recovery, and Mitigation. The first stage, preparedness, involves actions aimed at eliminating or reducing the effects of a potential disaster. The response phase occurs in the immediate aftermath of a disaster and involves actions in response to the challenges caused by disasters (e.g. lack of communications). Then, a phase of recovery takes place in which things start to return to normal. Mitigation, in turn, refers to sustained actions to reduce or eliminate long-term risks from a disaster occurrence and its effects.
Nonetheless, phases should not be considered as discrete events, in which social change is based on a unique episode, but as a series or cycle of events. In this context, a linear division between phases might not represent the reality of every reported or analyzed event (38), as it might not be easy to find a standard set of measures that identify or quantify how a society evolves through the crisis. In this sense, phases might overlap, and should not include an objective time definition. Instead, as each combination of crisis and society is different, the duration of and transition between phases should be adjusted to the "social time", which accounts for the needs or opportunities of societies (17, 38). As the stages of crises develop, a range of personal emotions often emerges in response to the situation itself as well as to the social disruption and uncertainty that it causes in people. A crisis not only disrupts the quality of life but also creates a burden of mental health conditions (52), because exposure to a crisis can be a stressor that affects individuals' expectations about the future, challenging their world views and triggering emotional reactions (51). For example, works such as (54) found initial evidence that people post about their depression on social media, and observed that words about symptoms dominate. The posts often provide details about sleep, eating habits, and other forms of physical ailment, all of which are known to be associated with the occurrence of depressive episodes. Likewise, in (3) depression is predicted starting from nine categories associated with possible symptoms: sadness, loss of interest, appetite, sleep, thinking, guilt, tiredness, movement and suicidal ideation. In this context, social media offers the opportunity of monitoring and understanding the mechanisms underlying mental health conditions during crises at a massive scale.
This kind of monitoring is also the first step towards proposing actions that can provide "virtual" support to affected individuals. From the preceding works, we recognize the potential of social media posts for understanding the different stages of a crisis and mental health-related aspects in a community. One of the challenges here is the bi-directional linkage between a given social theory (or model) and the "reality" as reflected in the texts of the posts, so that the theory becomes actionable for the crisis. On the one hand, a given theory can provide a structure or framework to reason about the vast flow of posts in social media. On the other hand, the empirical data extracted from the posts can serve to instantiate the different parts of a theory for a given crisis (such as the COVID-19 pandemic), and help us to track the evolution of the phases or stages prescribed by that theory. A systematic analysis of the data can suggest courses of action for managing the crisis, or even signal adjustments to the underlying theory. In this work, we take a step towards this vision by studying a large dataset of COVID-19 tweets and, moreover, by proposing an analysis pipeline that integrates text processing, psycho-linguistic lexicons, and visualization techniques.
This section describes the proposed approach for characterizing the crisis phases from language usage on social media and for analyzing the prevalence of emotions and mental health discussions. The approach is schematized in Figure 1, and it involves the following steps: 1) data collection and pre-processing, 2) operationalization of sociological theories into lexicons, 3) matching and scoring the pre-processed tweets according to the lexicons, and 4) analysis of the results (time series and heatmaps). This approach is guided by three research questions, each one related to whether psycho-linguistic techniques can provide evidence of different aspects of the elements under analysis, i.e. mental health, emotions and crisis.
The remainder of this section provides details of the steps of the proposed approach. Analyses are based on the SpanishTweetsCOVID-19 data collection, which is a large-scale sample of data shared on Twitter during the COVID-19 pandemic in Argentina. We chose Twitter as it is one of the most commonly used social media sites, and its role as a public, global and real-time communication medium provides a glimpse of contemporary society as such (57). Twitter additionally enables easy access to its data, in comparison to the data of other social media sites. The collection provides a broad perspective on the dynamics during this health crisis in Spanish-speaking countries, but centered on Argentina. The dataset consists of a collection of Spanish tweets, complemented with geographical information and the possibility of deriving content-based relations between users from the tweet sharing activity. SpanishTweetsCOVID-19 includes more than 150 million tweets, collected between March 1 and August 30 2020, and it is publicly available in Mendeley (55) 1 . The raw data belonging to the 145 million Twitter posts were retrieved from the Twitter API using the Faking it! 2 tool, which internally uses Twitter4J for easily integrating with the Twitter API 3 . The collection process was based on the Twitter Streaming service, which provides real-time access to the shared tweets. The stream was filtered according to the parameters shown in Table 1. To be retrieved, tweets had to be identified as written in Spanish, and either include any of the selected keywords or refer to a selected user belonging to the official Argentinian government offices or media. We also considered the retrieval of tweets located inside the Argentina geographical bounding box. Figure 2 shows the monthly distribution of the tweets collected using the queries in Table 1 and their type (original, retweets and replies).
The number of tweets in March represents less than 1% of the total collected tweets, and thus it is not included in the figure. As can be observed, original tweets were only 19% of the dataset, whilst retweets accounted for 70%. The remaining tweets correspond to replies. Interestingly, the number of collected tweets reached its peak in June. For the purpose of our analyses, only original tweets and replies were considered, as they are the ones in which writers provide sufficient content to evaluate aspects related to the crisis evolution and mental health. Table 2 summarizes the statistics of the collected dataset, excluding retweets. Hashtags are only present in 19% of the tweets. An inspection of the data showed that neither official government accounts, politicians nor media included hashtags in the majority of their tweets. The median number of shared tweets per user was 2, whilst 40% of the accounts tweeted over the median. The collected tweets were pre-processed to remove special characters, URLs and mentions. Hashtags were kept without the numeral symbol and were split into their constituent words. After an inspection of randomly selected tweets, spelling corrections were not applied, as most of the tweets were correctly written and we did not detect an abuse of abbreviations. As our approach relies on word matching, and the original Empath lexicon was not lemmatized, we did not apply neither
The language used in social media for expressing opinions, describing personal situations and communicating with others provides signals about a person's state of mind and situation. In this sense, lexicons are a rich tool for analyzing language in social texts across a broad range of categories, including emotions, concerns and health-related issues. Existing language lexicons have been widely used in psychometric studies as well as sentiment analysis.
Some examples are the Linguistic Inquiry and Word Count (LIWC) (44), EmoLex (35), the Prosocial Behavior Lexicon (21) and the LEW list (20), among others. When no training data is available, the availability of domain-specific lexicons plays a fundamental role in the automated analysis of texts. Thus, accurate lexicons can be of valuable guidance to understand the person behind the text. In this study, the psycho-linguistic analysis of social texts was based on the multiple categories provided by two specific lexicons:
• Empath (18, 19) covers a broad, human-validated set of 200 emotional and topical categories drawn from common concepts in the ConceptNet (33) knowledge base and Parrott's hierarchy of emotions (43). Categories have a number of seed terms representing the concept, which can be further expanded to obtain similar categorical terms. Empath categories have been shown to be highly correlated with similar categories in LIWC and EmoLex. As the lexicon is only available in English, we automatically translated the words associated with the categories using the IBM Watson Language Translator 4 . This decision was supported by the positive experiences reported by Perczek et al. (45), who analyzed the reliability of Spanish translations from English for psychometric scales and lexicons. Then, each of the authors independently checked the translation for inconsistencies. In addition, considering the gendered nature of Spanish, whenever a word in a category was associated with a specific gender, we added the words corresponding to the other gender, and also a gender-neutral version (if one existed).
• SentiSense Affective Lexicon (10, 11) consists of synsets from WordNet 5 labeled with an emotional category. Particularly, this lexicon consists of 2,190 synsets labeled with 14 emotional categories derived from the ones proposed by Arnold (4), Plutchik (46) and Parrott (43).
Since SentiSense Affective Lexicon is integrated with the WordNet Spanish version, it can be directly applied to the analysis of our tweets. These categories are the ones used for the analysis of emotions. The operationalization of the crisis stage and mental health theories was accomplished in three steps, namely: i) adaptation and expansion of lexicons based on a thematic encoding and categorization; ii) contextualization of the expanded lexicons based on semantic similarity; and iii) association between the expanded lexicons and the Empath categories that will later be used in the analysis of the pre-processed tweets. The contextualization of the expanded lexicons was based on FastText 6 embeddings. The goal here was to capture the context in which the words in the lexicon appear, in order to gather a more accurate picture of how individuals express and manifest aspects related to crisis evolution and mental health. To this end, for each word in our lexicons, we selected the top-10 most similar terms in FastText, based on the traditional Wikipedia model. Finally, we matched each of the expanded terms to the Empath categories to automatically retrieve the top-10 most prevalent categories for each lexicon, i.e. the categories with the highest number of words shared with the lexicon. We refer to the selected categories as markers for the associated lexicon. The original lexicons, expansions and the translation of Empath categories can be found in the corresponding companion repository 7 . The following subsections describe the operationalizations of the mental health and crisis stages. In public mental health terms, the main psychological impact of the COVID outbreak has been elevated rates of stress and anxiety (6). However, as a consequence of public health policies, such as social distancing, and of the uncertainty generated by social and economic situations, a rise in depression, levels of loneliness, addictions and suicidal behaviors can be expected.
Thus, in this study we focused on anxiety, stress and depression as the three main concerns regarding mental health in the population. As previously mentioned, several works (3, 13, 54) have relied on lexicons to automatically analyze the level of depression in texts from social networks. Based on such lexicons, each of the authors hand-coded the characterizations, symptoms and manifestations of each selected disorder as defined by the National Institute of Mental Health 8 and the Anxiety and Depression Association of America 9 . This coding generated independent lists of keywords that were combined using a voting strategy. Then, the resulting lexicons were augmented using FastText to include not only descriptions of the manifestations of the selected mental health disorders, but also additional words that are commonly used or provide context to the manifestations, as in (31, 34). Finally, we matched each of the expanded lexicons to the Empath categories. Table 3 shows a summary of the words associated with each disorder, and the corresponding Empath categories. Several authors have studied the sociological responses to natural disaster events (16, 49), based on surveys, interviews and narratives. Nowadays, social media presents new ways to explore communication during a crisis (39). Following the crisis stages defined by Neal (38), we attempted to characterize them based on the communicative ways in which social collectives organize themselves (49). Starting from the crisis stages defined by Drabek (16), Richardson (49) and DeWolfe (15), as well as the crisis lexicon proposed by Olteanu et al. (41), each of the authors hand-coded each of the described stages. We obtained keywords related to the four traditional crisis stages, namely: preparedness (including the aspects related to planning and warning), response (including the aspects related to impact, and the heroic and disillusionment sub-stages), recovery and mitigation.
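As an illustration only, the lexicon expansion and marker-selection steps used for both the mental health and the crisis lexicons could be sketched as follows in Python. The function names, the `nearest_neighbours` mapping (a hypothetical stand-in for a pre-computed FastText similarity lookup) and the toy Spanish word lists are assumptions made for this sketch; they are not taken from the companion repository.

```python
def expand_lexicon(seed_words, nearest_neighbours, topn=10):
    """Add, for each seed word, up to `topn` similar terms.

    `nearest_neighbours` is a hypothetical dict mapping a word to a list of
    similar words ordered by decreasing similarity (e.g. pre-computed from
    word embeddings).
    """
    expanded = set(seed_words)
    for word in seed_words:
        expanded.update(nearest_neighbours.get(word, [])[:topn])
    return expanded


def top_markers(expanded_lexicon, categories, k=10):
    """Rank categories by the number of words shared with the lexicon.

    `categories` maps a category name to its list of words. Categories that
    share no word with the lexicon are never returned.
    """
    lexicon = set(expanded_lexicon)
    overlap = {name: len(lexicon & set(words))
               for name, words in categories.items()}
    # Sort by decreasing overlap; ties are broken alphabetically.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, shared in ranked[:k] if shared > 0]


# Toy data, invented for illustration.
seeds = ["miedo", "ansiedad"]
neighbours = {"miedo": ["temor", "pánico"], "ansiedad": ["angustia"]}
toy_categories = {
    "fear": ["miedo", "temor", "pánico"],
    "nervousness": ["ansiedad", "angustia"],
    "sports": ["fútbol"],
}
markers = top_markers(expand_lexicon(seeds, neighbours), toy_categories)
```

In this toy example the "sports" category shares no word with the expanded lexicon, so it is not selected as a marker.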
From the crisis lexicon we only kept those words related to actions, sentiments, and emotions, removing all words that were related to a particular type of crisis (e.g. tornado, flood, explosion, storms, among others). The selected keywords include not only the functional aspects of each of the stages for the different actors (e.g. individuals, society, government), but also aspects related to the perceptions and concerns during a crisis (e.g. the false sense of security that is commonly felt during the preparedness phase), and aspects related to mental health. As previously mentioned, the resulting lexicons were augmented using FastText and matched to the Empath categories to retrieve the 10 most prevalent categories for each stage. Table 4 shows a summary of the words associated with each stage, and the corresponding Empath markers. As can be observed, some markers are shared across stages, meaning that the concepts or keywords that describe each stage might be relevant to the others. This implies that stages are not linearly divided, as Kelly (29) and Neal (38) stated. In the context of our research questions, we analyzed the prevalence of the Empath categories (or markers) associated with each lexicon, by matching the categories to the text in each tweet. A tweet was considered to match a marker if at least one of its words belonged to such marker. The matching is binary and does not count the number of matching words in a tweet: given the restrictions on tweet length, multiple occurrences are likely to be rare, so a count might not adequately reflect the actual intensity of the category. To summarize the prevalence of a category on a particular day, we computed the percentage of tweets on such day that matched such category. Once we computed the matching for a marker over the full time span, we obtained its time series distribution.
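The matching and daily prevalence computation described above can be illustrated with a minimal Python sketch. It assumes that tweets are already pre-processed into lists of tokens and paired with a day label; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def matches_marker(tokens, marker_words):
    """A tweet matches a marker if at least one token belongs to it."""
    return any(token in marker_words for token in tokens)


def daily_prevalence(tweets, marker_words):
    """Percentage of tweets per day that match the marker.

    `tweets` is an iterable of (day, tokens) pairs; `marker_words` is the
    set of words of one marker (category).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, tokens in tweets:
        totals[day] += 1
        if matches_marker(tokens, marker_words):
            hits[day] += 1
    return {day: 100.0 * hits[day] / totals[day] for day in totals}


# Toy data, invented for illustration.
fear_words = {"miedo", "temor"}
toy_tweets = [
    ("2020-04-01", ["tengo", "miedo"]),
    ("2020-04-01", ["hola", "mundo"]),
    ("2020-04-02", ["miedo", "miedo", "temor"]),
]
prevalence = daily_prevalence(toy_tweets, fear_words)
```

Note that the third toy tweet counts once, even though it contains three matching words, which mirrors the binary matching criterion.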
On average, 189,260 tweets were shared per day in the collected dataset, with a maximum monthly average of 396,228 in June, and a minimum average of 800 in March. To reduce the impact of day-to-day variations and weekly periodicity in the obtained time series, and thus better expose their characteristics, we applied a smoothing with a window of one week. This smoothing causes the time series to respond more slowly to recent changes, which in turn favors the observation of more consistent behaviors over longer periods of time (as opposed to instantaneous shifts). Given the smoothed time series, we proceeded to identify phases characterized by the different subsets of Empath markers associated with each of the lexicons. To do so, we searched the time series for break points or peaks, which are defined as points in time at which the values of the involved categories varied together. These variations were presumably caused by COVID-related events. Among the identified peaks, we only kept those whose prominence was higher than the mean plus one standard deviation (42). Due to the smoothing, the events leading to a peak should be searched for not only on the exact day the peak appears, but also on the days leading up to it. Peaks were not computed over the smoothed time series, but over the smoothed gradient of the time series distribution. Gradients allowed us to measure the magnitude of change (either an increment or a decrement) of the time series. This section presents the psycho-linguistic analysis of: i) the prevalence and evolution of mental health markers, ii) the evolution of emotions, and iii) the stages of crisis response as a sociological problem.
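The smoothing and peak-selection procedure of the approach can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that the text does not specify: the smoothing is taken to be a trailing moving average, the gradient a first difference whose absolute value is used, and the threshold the mean plus one standard deviation of the prominences of all candidate peaks. All function names are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(values, window=7):
    """Trailing moving average; early points use the days available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def gradient(values):
    """Day-over-day change; the first element is set to 0."""
    return [0.0] + [values[i] - values[i - 1] for i in range(1, len(values))]


def prominence(values, i):
    """Height of peak `i` above the higher of its two surrounding valleys."""
    peak = values[i]
    left_min = right_min = peak
    j = i - 1
    while j >= 0 and values[j] <= peak:  # walk left until a higher point
        left_min = min(left_min, values[j])
        j -= 1
    j = i + 1
    while j < len(values) and values[j] <= peak:  # same to the right
        right_min = min(right_min, values[j])
        j += 1
    return peak - max(left_min, right_min)


def select_prominent_peaks(values):
    """Strict local maxima whose prominence exceeds mean + one std."""
    candidates = [i for i in range(1, len(values) - 1)
                  if values[i - 1] < values[i] > values[i + 1]]
    if not candidates:
        return []
    proms = [prominence(values, i) for i in candidates]
    threshold = mean(proms) + pstdev(proms)
    return [i for i, p in zip(candidates, proms) if p > threshold]


def detect_change_points(daily_prevalence, window=7):
    """Peaks of the smoothed absolute gradient of the smoothed series."""
    smoothed = moving_average(daily_prevalence, window)
    change = moving_average([abs(g) for g in gradient(smoothed)], window)
    return select_prominent_peaks(change)
```

With this thresholding rule, a series with a single candidate peak yields no selected peak, since its prominence equals the mean; a fixed minimum prominence could be used instead if that behavior is undesirable.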
The first research question is about the extent to which tweets contain references to mental health problems, how such references evolve over time with the COVID-19 pandemic, and whether perceptible changes could be observed in the prevalence of the categories associated with the mental health problems. In particular, we analyzed references to anxiety, depression and stress, which are three of the most commonly analyzed mental illnesses or disorders. Figure 3 presents the temporal distribution of the Empath categories (i.e. markers) for the three disorders during the span March-June, as well as the peaks detected using such markers. The darker the area, the higher the prevalence of the associated markers. The analysis does not include the span July-August due to the different orders of magnitude across the prevalence of markers 10 . As Argentina moved into 100 days of lock-down (late June), the markers showed an increment in their prevalence, which hindered the analyses and eclipsed the changes in the preceding months. Table 5 shows the correspondence between the discovered peaks and the events in Argentina related to the COVID-19 pandemic and the accompanying political and economic situation. As regards anxiety (Figure 3a), the time series shows the appearance of darker areas in the days following the confirmation of the first COVID case and prior to the suspension of activities and the official declaration of lock-down (between March 7th and March 14th), particularly affecting the markers of confusion, horror, disappointment and nervousness. On the other hand, the least affected categories are suffering and health. This coincides with the appearance of the first peak around March 8th. The burst in anger in early March could have been caused by a COVID-related political event by which the executive branch of government was assigned "special faculties" to discretionally dictate acts of law and make budget allocations.
This was also accompanied by statements of the Health Minister underestimating the virus. These observations match the ones made by Aiello et al. (2) after the detection of the first contagion in the US, when there was a peak of anger, fear and anxiety, which decreased after the declaration of lock-down. After the announcement and start of the first lock-down (March 20th - March 21st), it can be observed that the least prevalent marker was disappointment, and that there was a mild decrease in the intensity of nervousness, sadness, suffering and health-related tweets. An analysis of the tweets corresponding to such days shows the faith that people had in the government decisions, the belief in lock-down as something positive, and how grateful they were for how the government was taking care of the population. These observations agree with the results of a survey 11 collected in Argentina at the end of March, in which people reported an increased sense of security against COVID and a higher sense of awareness regarding the prevention actions, when compared to the start of lock-down. Then, as the lock-down continued, we can observe an increment in the conversations related to health and suffering, followed by confusion, fear and anger. Reaching the 100 days of lock-down (June 27th), the prevalence of suffering, health and anger reached their highest values for a continuous week. In a context with high anxiety markers, the increment in the prevalence of the health category could be related to the phenomenon known as "health anxiety" (6). This phenomenon arises from the misinterpretation of perceived body sensations and changes in combination with the consumption of inaccurate or exaggerated information from the media (5, 6). These observations are in accordance with Asmundson and Taylor (6), who reported an increment in health anxiety, particularly in areas where the number of people affected by COVID-19 was continuously increasing.
At the individual level, health anxiety can manifest as maladaptive behaviors (e.g., the avoidance of health care even with genuine symptoms 12, or the hoarding of particular sanitary items 13). Then, at the societal level, it can lead to mistrust of public authorities, which in turn can influence the success or failure of the public health strategies put in place. In this context, it is critical for decision makers to understand how health anxiety can influence the responses of individuals and society to the health recommendations (48). As Table 5 shows, the appearance of anxiety peaks could be explained by a sequence of events. Most of them are related to the situation of the capital city of Argentina (Ciudad Autónoma de Buenos Aires, CABA) and the largest province (Buenos Aires), which defined the lock-down path for the rest of the country. The mapping also includes events related to the debt negotiation (which altered the exchange rate of the ARS and thus the prices of goods), and to citizen demonstrations in reaction to decisions of the executive branch of government. When individually analyzing the markers, it can be observed that although their peaks appear in similar time spans, they do not share the same prominence. Instead, the health marker presents the most prominent peaks, followed by fear.

When observing depression (Figure 3b), the tendencies are similar to those of anxiety due to the shared markers. Nonetheless, the distributions show a higher prevalence of most categories since early-April, when it was announced that the lock-down was extended for 15 extra days and that the allowed activities would be limited. The topics discovered around those days express concerns regarding both health and the economic situation. The hashtag "#quedateencasa" (stay at home) still appears in the discovered topics. The highest prevalence is observed for sadness and suffering across most of the time span.
When reaching the 100 days under lock-down in late-June, a high prevalence can be observed for 60% of the categories. On the contrary, emotional, torment and disappointment did not show a high prevalence across the time span. The first lock-down and its extensions were the most restrictive in terms of the activities that were allowed at a country level. In this sense, the population started to manifest their discouragement regarding the adopted measures. The peaks for depression are close to those of anxiety, with the addition of a new one on March 27th. As Table 5 shows, that date marks the first week of lock-down and a televised announcement of the president in which he declared the first lock-down extension for two additional weeks. When individually analyzing the markers, both suffering and sadness showed the most prominent peaks.

Finally, as regards stress (Figure 3c), most of the categories show their highest prevalence for the longest time span. The categories associated with this disorder match subsets of the categories associated with both anxiety and depression, with the addition of the stress category. As for depression, the detected peaks match those of anxiety. As for anxiety, the peaks of the individual markers are dominated by the health marker, followed by suffering and fear.

Recent evidence has suggested that people who are kept under lock-down experience significant levels of anxiety, anger, confusion, depression and stress (50). In this context, the variations observed in the markers associated with each of the selected mental disorders as the COVID-19 lock-down [...] (40) determined through surveys and interviews changes in stress and depression markers through the crisis period. In addition to the direct COVID-19 situation, the high levels of anxiety and stress could also be related to the economic consequences of lock-down (37). Finally, the evolution of the manifestations of the mental health disorders also showed a correspondence with the psychological states of a crisis (15, 16), by which it is expected to first have an anxious phase, followed by stress and depression.

12 In this regard, the official government communications asked people to stay at home and avoid going to hospitals, which resulted in the population avoiding hospitals even when showing symptoms, and hospitals suspending services.
13 To avoid this, in mid-April, the government fixed the prices of hand sanitizer.

Sentiment analysis has been shown to be an effective tool to detect social media content that can contribute to situational awareness, as it can help to understand the dynamics of individuals (24), for example, how individuals are coping with the causes and effects of the crisis, what their main concerns are, or the emotional burden of the crisis. Nonetheless, some studies (36, 58) have argued that it is not enough to solely focus on mental health issues or a global sentiment score, as they might miss the complexity of the full range of emotional responses to not only the direct effects of crises, but also to the social, economic and environmental changes caused by them. Hence, assessing multiple emotion dimensions might provide insights regarding how a crisis or disaster is experienced by the affected individuals (22). In this sense, we consider a subset of the SentiSense emotions, which are shown in Figure 4 for the span March-August, along with the detected peaks for both the positive (i.e. surprise, calmness, joy, love, hope, like and anticipation) and the negative (i.e. despair, hate, anger, sadness, fear and disgust) emotions 14 . The darker the color, the higher the prevalence of the emotion for that day. Unlike the mental health markers, the emotions did not show an order-of-magnitude increment in their prevalence during July and August.
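The per-day emotion prevalence and the positive/negative grouping described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this section: "prevalence" is taken to be the share of a day's tweets containing at least one lexicon term for an emotion, tokenization is a plain whitespace split, and `TOY_LEXICON` is a hypothetical handful of Spanish words standing in for the actual SentiSense lexicon. Only the polarity grouping comes from the text.

```python
from collections import defaultdict

# Hypothetical toy lexicon for illustration only; NOT the SentiSense lexicon.
TOY_LEXICON = {
    "fear": {"miedo", "temor"},
    "anger": {"bronca", "furia"},
    "hope": {"esperanza"},
    "joy": {"alegria", "feliz"},
}

# Polarity grouping as listed in the text.
POSITIVE = {"surprise", "calmness", "joy", "love", "hope", "like", "anticipation"}
NEGATIVE = {"despair", "hate", "anger", "sadness", "fear", "disgust"}


def daily_prevalence(tweets, lexicon):
    """Return {day: {emotion: share of that day's tweets mentioning the emotion}}.

    `tweets` is an iterable of (day, text) pairs. Prevalence is ASSUMED to be
    the fraction of tweets with at least one lexicon term for the emotion.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for day, text in tweets:
        tokens = set(text.lower().split())
        totals[day] += 1
        for emotion, terms in lexicon.items():
            if tokens & terms:
                hits[day][emotion] += 1
    return {
        day: {emotion: hits[day][emotion] / totals[day] for emotion in lexicon}
        for day in totals
    }


def polarity_prevalence(per_emotion):
    """Sum one day's per-emotion shares into positive and negative totals."""
    positive = sum(v for e, v in per_emotion.items() if e in POSITIVE)
    negative = sum(v for e, v in per_emotion.items() if e in NEGATIVE)
    return {"positive": positive, "negative": negative}


sample = [
    ("2020-03-20", "tengo miedo pero hay esperanza"),
    ("2020-03-20", "que bronca"),
    ("2020-03-21", "feliz en casa"),
    ("2020-03-21", "hoy nada"),
]
prevalence = daily_prevalence(sample, TOY_LEXICON)
```

Each row of a heatmap such as the one in Figure 4 would then correspond to one emotion, and each column to one day of `prevalence`. A tweet that mentions several emotions counts once for each of them, so the per-day shares need not sum to one.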
The emotion evolution shows the appearance of areas with a higher prevalence of emotions in the days following the confirmation of the first case until the announcement of the lock-down, affecting both negative and positive emotions. The positive emotions could be related to the previously mentioned confidence in the decisions made by the government and, at the same time, the feeling of being taken care of. There was also a small change in anticipation, which could be related to a sense of uncertainty regarding what to expect during lock-down. The prevalence of emotions is similar to that of the detected Empath categories. For example, in both cases anger and sadness showed a high prevalence in mid-March, around the time of the first official prevention measure and lock-down announcements. In early-June the lock-down was extended for the sixth time until the end of June, with new allowed activities for the big cities, including running, individual outdoor sports and shopping. Unlike the analyzed mental health traits, most emotions showed a low prevalence period around mid-June, which could be related to a sense of optimism for the newly authorized activities.

Table 6: Events associated with peaks observed for the emotions analysis.

Then, in late-June, contagions increased, leading to the following extension of the lock-down and the start of the most emotional period. The peaks for the negative emotions are dominated by changes in fear, disgust, and (to a lesser extent) hate, which achieved their biggest changes in early-March (the first COVID related announcements, including the first lock-down) and mid- to late-June (when reaching the 100 days of lock-down), which match the changes observed for the mental health traits. Then, disgust and fear changed in the same magnitude in June. Changes in the remaining emotions were less prominent. Despair was the emotion with the smallest changes.
Small peaks were observed for most emotions around the dates of the lock-down extension announcements. By July, none of the negative emotions showed prominent peaks, indicating a sustained level of emotions, with no sudden changes. Table 6 complements Table 5 by matching the discovered emotion peaks with events. As regards the positive emotions, peaks show similar tendencies to those for the negative ones. Both early-March and late-June showed the highest peak density and prominence. Unlike for the negative emotions, in late-July and mid-August there appeared a few small surprise and calmness peaks. These peaks match different political and economic events. For example, there is a peak around the celebration of Argentina's Independence Day (July 9th) and the day the president presented the plan for recovering the economic situation after the pandemic (July 11th). Then, the surprise peaks in August match the new extensions of the lock-down and the authorization of new activities in the capital city. These observations are in agreement with those of Gruebner et al. (24), who found differences in the level of discomfort (represented by six negative emotions: anger, confusion, disgust, fear, sadness, and shame) of social media users before, during and after a disaster. Particularly, they observed that negative emotions tended to increase during and after the crisis, in contrast with the emotions during the warning or planning phase before the crisis. Similarly, Gruebner et al. (22) observed a prevalence of fear and surprise after the disaster (in this case, this could mean the days after the announcement of the first lock-down), and an excess of sadness. Additionally, the authors determined that the emotions showing the highest prevalence (they referred to them as emotions showing an excess of risk) were anger, fear, sadness, disgust and surprise, which match the emotions dominating the peaks in our observations.
Moreover, the evolution of emotions shows a behavior similar to that observed by Aiello et al. (2) for the fear social epidemic in the two-month period between the detection of the first case and the declaration of lock-down, in which there is a prevalence of fear and anger, followed by sadness and some positive emotions. The observation of the changes in the prevalence of emotions and the evidence of their interrelations, in agreement with other works in the literature, allow us to answer RQ2, reinforcing the role of social media as a means for monitoring emotions and, consequently, the negative mental health outcomes derived from them. The evolution of the emotions could also be associated with the psychological states of crisis (15, 16), by which it is expected to have a sense of anticipation in the days leading to the prime event of the crisis (in this case, it could include the days between the confirmation of the first case and the announcement of the lock-down), followed by a period of negative emotions associated with states of anxiety. Then, there can be brief periods of optimism, which are again followed by a prevalence of negative emotions and stress traits.

Based on the described operationalization of the traditional crisis stages (Drabek (16), Neal (38)), Figure 5 presents the temporal distribution of references to each of the particular markers associated with the classical disaster stages (15, 38): preparedness, response, recovery, and the shared markers across stages, for the span March-June, along with the detected peaks. As for the mental health analysis, during July and August we observed different orders of magnitude across the markers, which hinders the observation of changes in the previous months. The superposition of the categories shared by preparedness, response and recovery shows their coincidences and their interwoven nature.
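The peaks mentioned throughout this analysis are detected on daily prevalence series. The reference list includes "Simple algorithms for peak detection in time-series", but this section does not state which variant or parameters were used, so the following is only a sketch in the spirit of that family of methods: each point is scored by how much it rises above its `k` neighbours on each side, and points whose score exceeds the mean positive score by more than `h` standard deviations are kept. The function names, the window `k` and the threshold `h` are illustrative assumptions.

```python
from statistics import mean, pstdev


def spike_score(series, i, k):
    """Average of the largest rise from the k left and k right neighbours to point i."""
    left = [series[i] - series[j] for j in range(max(0, i - k), i)]
    right = [series[i] - series[j] for j in range(i + 1, min(len(series), i + k + 1))]
    if not left or not right:
        return 0.0  # boundary points are not scored
    return (max(left) + max(right)) / 2


def detect_peaks(series, k=3, h=1.0):
    """Return indices of points that stand out from their local neighbourhood.

    k and h are ASSUMED parameters (window size and threshold in standard
    deviations); the values used in the study are not given in this section.
    """
    scores = [spike_score(series, i, k) for i in range(len(series))]
    positive = [s for s in scores if s > 0]
    if not positive:
        return []
    m, sd = mean(positive), pstdev(positive)
    candidates = [i for i, s in enumerate(scores) if s > 0 and s - m > h * sd]
    # Among candidates closer than k positions, keep only the highest one.
    peaks = []
    for i in candidates:
        if peaks and i - peaks[-1] <= k:
            if series[i] > series[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


# Synthetic daily prevalence with two clear bursts (illustrative values only).
daily = [2, 3, 2, 4, 3, 2, 15, 3, 2, 3, 4, 2, 3, 14, 2, 3, 2]
burst_days = detect_peaks(daily, k=2, h=1.0)
```

The detected indices can then be mapped back to calendar dates and compared with the events listed in the peak tables. Raising `h` keeps only the most prominent peaks, which is one way to reproduce the distinction made in the text between prominent and small peaks.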
The mitigation phase is not included in the analysis as it is related to activities after the recovery of the current event, and in preparation for future similar events. Due to the nature of the pandemic, the current state of the health situation in Argentina, and a preliminary analysis of the categories associated with this phase, it does not seem that Argentinian society has yet arrived at this stage. Table 7 complements Table 5 and Table 6 by matching the discovered stage peaks with events.

As regards the preparedness phase (Figure 5a, 5d), as for anxiety, the distribution shows the appearance of areas of high prevalence of anticipation, trust and nervousness in the days following the confirmation of the first COVID case and prior to the first official announcements. The peaks for this phase match the areas with high prevalence in March. Aggression could be related to anger, as both serve as a self-defensive mechanism for coping with the situation. Anger could also be explained in terms of the reactions to the political events previously described for anxiety. When individually analyzing the categories, communication presents the most prominent peaks from March until mid-April, matching the time span in which the government made an effort to promote safety measures for preventing contagions, and the announcements of the first extension of lock-down and the first locally produced COVID tests. During this time, the government promoted the idea that Argentina was leading the fight against COVID and that its efforts were renowned worldwide, which could have boosted the sense of trust. In June, when the contagions increased again after the first relaxation of the lock-down, another communication peak appeared, which could be related to the need to remind individuals of both the prevention measures 15 and the consequences of the disease 16 .
When observing response (Figure 5b, Figure 5d), a peak appeared after the confirmation of both the first contagions and the first death. Nonetheless, markers started to show a high prevalence around mid-April (matching the second peak), when lock-down was once again extended, contagions started to gradually grow, and individuals started to be aware of the possibilities of contagion and the consequences of the disease. As previously mentioned, the prevalence of health could be related not only to the diffusion of prevention guidelines or the description of physical symptoms or other health complications, but also to the phenomenon of health anxiety. According to Richardson (49), during a crisis, people might suffer "survivor's guilt", which might manifest as shame for still having a job or for being able to work (the lock-down in Argentina prohibited all activities but a small subset of exceptions), which might explain the high relevance of shame. Then, shame or even self-guilt can be followed by a period in which there is a need to blame someone. In this case, around mid- and late-June, when several activities were allowed in CABA, the runners were blamed for the new contagions and became the target of a blaming campaign in social media. This situation matches a detected peak. Then, the remaining dark areas match the spans over which the president made announcements regarding the health situation in Argentina and extended the lock-down.

Figure 6: Prevalence of the Empath categories associated with crime over the span March-June.

Several theories have explored the relationship between crisis and the prevalence of crime (47) [...] the social cohesion and the collective efficacy, rendering the community unable to self-monitor and sanction antisocial behavior. In turn, this implies that the rate of criminality tends to rise due to the lack of controls.
There is a third group of theories (such as Routine Activity) that suggest that changes in crime rates are the result of changes in the socio-structural organization of everyday activities, reflecting the convergence of three necessary elements: the availability of crime targets, the absence of guardians (such as the police) and the presence of motivated offenders. An analysis of the Empath categories related to crime (shown in Figure 6) revealed that crime did not seem to be a concern during the early stages of lock-down, which is in line with the government declarations in late-March and mid-April claiming that theft had decreased due to the reduced movement in the cities. Nonetheless, during late-April and May the prevalence of the categories increased (along with the category related to government mentions), which could be associated with the release of prisoners for fear of contagion, and with the rise in property crimes, in which individuals started to usurp private land 17 . Later in August, the government admitted a rise in criminality.

In social terms, the recovery stage (Figure 5c, Figure 5d) covers the attempts to return to a "normal life" mixed with fear, anxiety, depression, rage, irritation with changes in the daily life, and the appearance of conflicts at the community level, among other characteristics (16). In this sense, for this stage, there are two non-overlapping markers, irritability and rage, which showed a high and continuous prevalence starting in mid-May. By late-May, Argentina had already endured 5 lock-down extensions and the number of contagions was still on the rise. In parallel, the press conferences led by the government included comparisons to other countries (e.g. Sweden, Chile, Spain, and Brazil) to show how well the health situation was being managed, which were debunked.
The fact that the markers associated with response and recovery show a high prevalence around the same time spans shows the interwoven nature of the phases, and that recovery actions were carried out while society was still enduring the impact of the pandemic. For example, while the number of contagions was still rising, the government tried to better equip the health system by providing more budget and equipment to the most affected regions. The high prevalence of the markers associated with the three stages during July and August could be perceived as a return to the early stages of the pandemic as, at that time, the reported contagions and deaths, as well as the positivity rate, started to rise. At the same time, the economic situation continued to worsen as several sectors were unable to resume working. This situation is then expressed in social media by the observed exacerbated prevalence of markers associated with mental health and emotions, as previously shown. According to the observed changes in social media activity, the identified mental states and emotions, and the stationary prevalence of the derived markers, it can be stated that Argentina has traversed the three classical stages of crises, thus providing evidence for answering RQ3. Based on the analysis, the preparedness stage spanned between early-March and mid-April, covering the time when society was still trying to make sense of the situation, and the first prevention measures were adopted after the first cases, deaths and the first lock-down periods. Then, the response stage started in mid-April, when the lock-down was well established, and the development of national tests and the first economic and political measures were announced, for example, the creation of a universal salary to help those who were not able to work under the restrictions.
Finally, while continuing in the response stage, a brief recovery stage started in late-May and early-June, with the first lift of restrictions in the big cities, signaling the first brief steps towards a new normality. This situation was then reversed in July, and response took prevalence again as the number of daily contagions rose and more restrictive lock-downs were applied. To further reinforce both the differences and relations between the three stages, Table 8 reports the maximum percentage difference between the prevalence of the marker in each of the stages compared to the median prevalence of the marker across the span March-June. In each row, the maximum value is highlighted. As the table shows, in most cases the highest percentage difference of a marker is observed for its corresponding stage. The two exceptions are aggression and health, both showing a higher prevalence in the recovery stage. For those markers that are shared across stages, the average prevalence differences across phases are lower, i.e. the category usage is more evenly distributed across phases than for those categories associated with a unique phase.

The COVID-19 pandemic has profoundly affected all aspects of society, not limited to physical health, but also mental health, economics (e.g. affecting employment conditions, financial insecurity and poverty) and even twisting political decisions. At the time of writing this paper, Argentina has just endured 200 days of lock-down, while being among the top-10 countries with the most contagions. Even though some restrictions have been lifted, the restriction on moving freely around the country still holds, and students have not yet returned to schools or universities.
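The Table 8 comparison described above (for each marker, the maximum percentage difference between its prevalence within a stage and its median prevalence over March-June) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the difference is taken as a signed relative change with respect to the median, stages are represented as index ranges over a list of daily values, and the stage boundaries and numbers in the example are invented for readability.

```python
from statistics import median


def max_pct_difference(prevalence, stages):
    """For one marker, return {stage: max % difference from the overall median}.

    `prevalence` is a list of daily values over the whole span; `stages` maps a
    stage name to a (start, end) index range, end exclusive. The signed relative
    difference used here is an ASSUMPTION about how the table was computed.
    """
    baseline = median(prevalence)
    if baseline == 0:
        raise ValueError("median prevalence is zero; percentage difference is undefined")
    return {
        stage: max((value - baseline) / baseline * 100 for value in prevalence[start:end])
        for stage, (start, end) in stages.items()
    }


def dominant_stage(differences):
    """Name of the stage with the highest value, i.e. the highlighted cell of a row."""
    return max(differences, key=differences.get)


# Illustrative numbers and stage windows only; not values from the study.
marker_prevalence = [2.0, 4.0, 3.0, 8.0, 4.0, 6.0, 5.0]
stage_windows = {"preparedness": (0, 2), "response": (2, 5), "recovery": (5, 7)}
row = max_pct_difference(marker_prevalence, stage_windows)
```

Applying `dominant_stage` to each marker's row and comparing the result with the stage the marker was assigned to gives the kind of check summarized in the text, where most markers peak in their own stage and a few do not.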
In this context, measuring the effects of the pandemic on individuals and societal dynamics is vital to understand the policies used to manage the pandemic, which should aim at achieving the right balance between disease control and the mitigation of the negative socio-economic effects (26). Moreover, government agencies should provide accurate information on the state of the pandemic, refute rumours in a timely manner and reduce the impact of misinformation, which should result in a sense of public security and trust (50). Most studies on the effect of crises and disasters have relied solely on post-disaster data, which were obtained through surveys or questionnaires (24) and conducted at a small-to-medium scale, which may cause important information to be missed. We believe that social media can complement these studies by enabling opportunities to track individual behavior and perceptions over longer periods of time, from pre- to post-crisis, and at a large scale. Moreover, analyzing social media can also help to monitor the spread of the disease, the awareness about the disease and its symptoms, and the responses to health and government recommendations and policies, thus helping in the design of effective communication interventions (14, 56) to provide reassurance and practical advice. From a theoretical point of view, our analyses provide a contextualization of the traditional stages of a crisis, showing their corresponding manifestations in social media, which in turn verifies the processes and patterns previously observed through surveys. In addition, it helps to discover mental health markers, which are necessary for the design of health prevention policies (26).
The analysis approach presented in this study can, in principle, be applied to any large-scale data stream and has practical implications for the monitoring of individuals' behaviors, perceptions, mental health and emotions, which are useful indicators to understand mechanisms and propose interventions. Given that the defined lexicons are independent of the nature of a crisis (i.e. the lexicons do not reflect particularities of the crisis at hand, but rather characterize mental health markers and the generic stages of crises), the analysis, process and techniques can be applied to future crises and disasters, provided that social media data can be collected. There are also interesting aspects to consider in future work. First, we characterized the COVID-19 pandemic in the context of Argentina, disregarding its effect on neighbouring countries. It should be possible to compare how the pandemic manifested in other countries and whether the different policies adopted by each government, its political orientation and the cultural dimensions can condition the evolution of emotions and crisis stages. Second, we should continue the analyses to include the current developments and, after life returns to the "new normality", we could better assess the long-term effects of the pandemic. Third, even though we focused the analyses on Twitter data, the government also has an official presence on other social media sites, in which the same content is shared, but the individuals consuming it could be different. A challenge here is how to include the different sites in the analyses to account for different perspectives and extend the scope of the analyses. In turn, it would be possible to compare whether users on different sites behave similarly. Fourth, there are ethical concerns regarding the assessment of mental health markers in social media (25). Even though all collected data is publicly available, there could be risks associated with users' privacy.
If not properly managed, the interventions based on these analyses could lead to discrimination, stigmatization and violence. Hence, a clear policy of data transparency and protection should be defined and enforced. Fifth, given the existence of cases of fake news that were propagated (even by the government) during the early stages of the pandemic, it would be interesting to analyze the prevalence and propagation of misinformation and disinformation, and how they affect public perceptions and coping with the crisis.

1. Mount Saint Helens's ashfall: Evidence for a disaster stress reaction
2. How epidemic psychology works on social media: Evolution of responses to the COVID-19 pandemic
3. Predicting depression levels using social media posts
4. Emotion and personality
5. Coronaphobia: Fear and the 2019-nCoV outbreak
6. How health anxiety influences responses to viral outbreaks like COVID-19: What all decision-makers, health authorities, and health care professionals need to know
7. Resilience, COVID-19-related stress, anxiety and depression during the pandemic in a large population enriched for healthcare providers
8. Methods in predictive techniques for mental health status on social media: a critical review
9. Quantifying mental health signals in Twitter
10. Using an emotion-based model and sentiment analysis techniques to classify polarity for reputation
11. SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis
12. Social media as a measurement tool of depression in populations
13. Predicting depression via social media
14. The pandemic of social media panic travels faster than the COVID-19 outbreak
15. Training manual for mental health and human service workers in major disasters. US Department of Health and Human Services, Substance Abuse and Mental
16. Human system responses to disaster: An inventory of sociological findings
17. Organized behaviour in disaster (Lexington, MA: Heath & Co)
18. Empath: Understanding topic signals in large-scale text. CHI '16
19. Lexicons on demand: Neural word embeddings for large-scale text analysis
20. EmoTag: Automated mark up of affective information in texts
21. Moral actor, selfish agent
22. A novel surveillance approach for disaster mental health
23. A novel surveillance approach for disaster mental health
24. Spatio-temporal distribution of negative emotions in New York City after a natural disaster as seen in social media
25. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences
26. Multidisciplinary research priorities for the COVID-19 pandemic: a call for action for mental health science. The Lancet Psychiatry
27. Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: Social media surveillance in China. medRxiv
28. Mental health status of people isolated due to Middle East respiratory syndrome
29. Simplifying disasters: Developing a model for complex non-linear events. The Australian Journal of Emergency Management
30. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users
31. What does social media say about your stress
32. The ripple of fear, sympathy and solidarity during the Boston bombings
33. ConceptNet - A practical commonsense reasoning tool-kit
34. Evaluating and improving lexical resources for detecting signs of depression in text
35. Crowdsourcing a word-emotion association lexicon
36. Psychosocial effects of the 2001 UK foot and mouth disease epidemic in a rural population: qualitative diary based study
37. The correlation between stress and economic crisis: a systematic review
38. Reconsidering the phases of disasters
39. Social media usage patterns during natural hazards
40. A prospective study of depression and posttraumatic stress symptoms after a natural disaster: the 1989 Loma Prieta earthquake
41. CrisisLex: A lexicon for collecting and filtering microblogged communications in crises
42. Simple algorithms for peak detection in time-series
43. Emotions in social psychology: Essential readings (editor)
44. Linguistic inquiry and word count: LIWC
45. Coping, mood, and aspects of personality in Spanish translation and evidence of convergence with English versions
46. Emotion: Theory, research, and experience, chapter A general Psychoevolutionary Theory of Emotion
47. Modeling the relationship between natural disasters and crime in the United States
48. COVID-19 and mental health: A review of the existing literature. Asian Journal of Psychiatry
49. The phases of disaster as a relationship between structure and meaning: A narrative analysis of the 1947 Texas City explosion
50. Prevalence of stress, anxiety, depression among the general population during the COVID-19 pandemic: a systematic review and meta-analysis
51. Natural disasters and existential concerns: A test of Tillich's theory of existential anxiety
52. Variability in Twitter content across the stages of a natural disaster: Implications for crisis communication
53. Epidemic psychology: a model
54. Exploring healthcare opportunities in online social networks: Depressive moods of users captured in Twitter
55. SpanishTweetsCOVID-19: A social media enriched COVID-19 Twitter Spanish dataset
56. Using social and behavioural science to support COVID-19 pandemic response
57. Twitter and society [Digital Formations
58. Flood of emotions: emotional work and long-term disaster recovery

June 22nd
June 15th. The governor of Provincia de Buenos Aires wants to impose an even harsher lock-down since, according to estimations, the health system will collapse in 35 days. The economy ministry is confident of reaching an agreement for the debt restructuring.
June 16th. The government suggests that the lock-down might be in place for at least three more months.
June 17th. Restrictions for running according to the number of the national ID. New circulation permit in place. The president blames the runners for the new surge of contagions.
June 20th. Protesters against the expropriation of a company.
June 22nd. A plan for restricting the lock-down is announced. Almost 50k confirmed cases.