key: cord-0453226-h81kwsvh
authors: Chai, Yuchen; Palacios, Juan; Wang, Jianghao; Fan, Yichun; Zheng, Siqi
affiliations: Massachusetts Institute of Technology; Chinese Academy of Science
title: Measuring daily-life fear perception change: a computational study in the context of COVID-19
date: 2021-07-27
journal: nan
DOI: nan
sha: 770e03d7a8a5b3c1543d39f5dd3cb68685044f21
doc_id: 453226
cord_uid: h81kwsvh

COVID-19, as a global health crisis, has triggered fear with unprecedented intensity. Besides the fear of getting infected, the outbreak of COVID-19 also created significant disruptions in people's daily life and thus evoked intense psychological responses not directly related to COVID-19 infections. Here, we construct an expressed fear database using 16 million social media posts generated by 536 thousand users between January 1st, 2019 and August 31st, 2020 in China. We employ deep learning techniques to detect the fear emotion within each post and apply topic models to extract the central fear topics. Based on this database, we find that sleep disorders ("nightmare" and "insomnia") take up the largest share of fear-labeled posts in the pre-pandemic period (January 2019-December 2019), and that this share increases significantly during COVID-19. We identify health- and work-related concerns as the two major sources of fear induced by COVID-19. We also detect gender differences, with females generating more posts containing the daily-life fear sources during the COVID-19 period. This research adopts a data-driven approach to trace back public emotion, which can be used to complement traditional surveys to achieve real-time emotion monitoring, discern societal concerns, and support policy decision-making.

Fear is one of the six basic emotions [1], and it is commonly considered to be a brief episode of response to a given threat, either physical or psychological [2, 3]. Fear is not merely generated by direct exposure to a threat to oneself [4].
It can also spread indirectly through social transmission [5]. The perception of fear influences the decision-making process [4] and ultimately translates into behavioral change that helps individuals avoid or confront the threat [6, 7, 8]. However, besides its benefits, fear can also lead to chaos in society. For instance, panic buying is a typical response to the uncertainty of crises, and it depletes public resources rapidly and unnecessarily [9]. In other cases, fear evoked by the social and political environment can lead to violence and protests [10]. Against this background, it is crucial for policy-makers to understand the causes and development of fear in order to address the underlying problems and calm public anxiety [2]. With a good understanding of the sources and evolution of fear, this seemingly negative emotion could even serve as a valuable tool for public agencies to promote socially desirable actions, such as conservation behaviors to mitigate climate change [11] and social distancing behaviors during epidemics [12].

Previous studies have found that disasters and crises can induce fear [13]. COVID-19, as a global health crisis, has triggered widespread fear with unprecedented intensity [14, 15]. Such fear is not merely driven by virus infection. As the pandemic has been accompanied by non-pharmaceutical policy interventions, the outbreak of COVID-19 has had enormous impacts on people's daily life and has evoked intense psychological responses not directly related to COVID-19 infections [16, 17].

Researchers and policy-makers mainly rely on surveys to measure fear perception [18]. However, surveys have limitations, such as limited scalability, potential sample bias, high cost, and significant time delays [19]. These drawbacks are especially prominent in the context of COVID-19, when public sentiment evolved rapidly and timely interventions were critical for saving lives.
When coupled with machine learning techniques, social media platforms can serve as a valuable tool that enables the monitoring of public emotions with high temporal and spatial granularity. For example, using social media posts, Dodds et al. [20] explored the temporal pattern of emotions for 63 million users non-invasively; Mitchell et al. [21] estimated the geographical distribution of happiness using geotagged Twitter data. A recent study also shows a high correlation between emotion measures derived from social media and traditional surveys [22], supporting the validity of such NLP methods for measuring emotion.

In this paper, we study how COVID-19 triggered fear about different aspects of people's daily life (i.e., in posts that do not directly mention virus-related words). We do so by using the Bidirectional Encoder Representations from Transformers (BERT) model to detect fear content and BERTopic to extract fear-related topics on a self-constructed social media dataset.

We collect the social media data through the application programming interface (API) of Sina-Weibo, the largest microblogging social media platform in China. The data contains 16 million original posts from a cohort of 536 thousand active users between January 1st, 2019, and August 31st, 2020. Besides the raw content, we collect the exact posting time, the number of likes, and the number of re-posts for each post. To ensure data quality, we follow several rules when collecting data and constructing the research database: 1) We only collect posts from users who registered before January 1st, 2019; 2) We exclude posts generated by institutional accounts (e.g., companies and organizations) from our sample; 3) We drop users whose post counts are within the top 10% to reduce the influence of extreme posters; 4) We randomly select and scrutinize 50 thousand posts to identify advertisements with a fixed format (for instance: I am the 3545th to celebrate the shopping festival, please join us!).
We then apply regular expressions to remove advertisements in these formats from all posts; 5) We apply a series of functions to remove URLs, emojis, special characters, and hash symbols from the posts to reduce the impact of irrelevant information.

We retrieve the publicly accessible personal information from the profile page of each individual in our sample, including the birth date, gender, number of fans, number of followers, and the registration location. Supplementary

Natural Language Processing (NLP) is a computational method that translates unstructured large-scale text data into structured measures [23]. Sentiment analysis, a sub-area of NLP, is designed to evaluate the emotional status embedded in text [24]. An increasing number of studies use measures generated by these methods to detect changes in perceptions or attitudes on social media, towards either general or specific topics [25]. In this study, we use BERT, a text classification model developed by Google [26], to classify each post into six categories of emotions (i.e., Anger, Fear, Happiness, Sadness, Surprise, and Others). Specifically, we finetune a pre-trained BERT model provided by [27]

To understand why people express fear on social media, we apply a topic model to discover the abstract topics within the dataset. Such methods are widely used by researchers to understand public attention and opinions [29, 30]. BERTopic is a state-of-the-art machine learning method that leverages BERT embeddings, uniform manifold approximation and projection (UMAP) dimensionality reduction, hierarchical density-based spatial clustering of applications with noise (HDBSCAN), and class-based term frequency-inverse document frequency (c-TF-IDF) [31] to identify interpretable topics. Using a pre-trained multi-lingual sentence embedding model to encode the text, we apply BERTopic to non-COVID-19 fear posts to identify the fear sources in people's daily life.
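As an illustration only (this is not the code used in the study), the c-TF-IDF step that turns clusters of posts into ranked topic words can be sketched in plain Python. The sketch assumes the weighting commonly given for BERTopic, tf(t, c) × log(1 + A / f(t)), where tf(t, c) is the count of term t in topic c, f(t) is its count across all topics, and A is the average number of tokens per topic; the library's own implementation may differ in details such as normalization. The function names and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Score terms per topic with a class-based TF-IDF weighting.

    docs_by_topic maps a topic id to a list of tokenized posts.
    Assumed weighting: tf(t, c) * log(1 + A / f(t)).
    """
    # Merge all posts of a topic into one "class document" and count terms.
    tf = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_by_topic.items()
    }
    # f(t): frequency of each term across all topics.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # A: average number of tokens per topic.
    avg_tokens = sum(total.values()) / len(tf)
    return {
        topic: {
            term: count * math.log(1 + avg_tokens / total[term])
            for term, count in counts.items()
        }
        for topic, counts in tf.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per topic (ties broken alphabetically)."""
    return {
        topic: [term for term, _ in
                sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for topic, weights in scores.items()
    }


# Toy example with two hypothetical fear topics.
toy = {
    "sleep": [["nightmare", "scared"], ["nightmare", "insomnia"]],
    "money": [["money", "rent", "scared"], ["money", "job"]],
}
toy_scores = c_tf_idf(toy)
print(top_terms(toy_scores, n=2))
```

In this toy example the word "scared" appears in both topics and is therefore down-weighted relative to topic-specific words such as "insomnia", which is the behavior that makes the top-ranked words informative labels for each topic.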
We apply the model to COVID-19 posts as well to support the analysis. To decide the best topic size, we compute the coherence score while varying the number of clusters. As shown in Supplementary Figure C.1, topic sizes of 60 and 30 are chosen for the two groups, respectively. We re-run the algorithm and display the most informative words of each topic (using c-TF-IDF) in Supplementary Table C

We find that fear emotion is relatively stable across 2019. On average, 2.45% of posts are classified as fear posts (i.e., posts dominated by fear emotion) each day, while the share reaches a peak of 9.1% on January 23rd, 2020 (the date on which Wuhan, the epicenter, announced its lockdown). The share of fear posts drops afterward and remains at 2.64% of total posts after April 8th, 2020, slightly higher than the 2019 baseline.

BERTopic automatically splits the data into meaningful clusters. There are 60 fear topics unrelated to COVID-19, falling into six large categories (See Supplementary Table C transportation and work/ education. To identify changes in fear, we conduct t-tests to compare the fear share by topic during and after the peak of the COVID-19 pandemic with the same periods in 2019. Specifically, we define the pandemic periods in China as follows: (1) the COVID-19 peak period started on January 20th, 2020 and ended on April 8th, 2020 (i.e., the date when Wuhan, the epicenter of the COVID-19 pandemic in China, re-opened); (2) the post-COVID-19 period started on April 9th, 2020 and ended on August 31st, 2020.

Health- and work-related topics show the largest changes during COVID-19. For health-related topics, we find that topics regarding sleep (i.e., nightmare and insomnia) have the largest share in fear posts during our sample period of two years. On average, 10% and 7% of fear posts are related to nightmares and insomnia, respectively.
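The period comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the text does not specify the unit of observation or the t-test variant, so the sketch assumes daily topic shares as observations and uses Welch's unequal-variance t statistic. The names `daily_share`, `shares_in`, `same_window_2019`, and `welch_t` are hypothetical, and the numbers are made up. Only the test statistic and degrees of freedom are computed, because the Python standard library has no Student-t distribution for the p-value.

```python
import math
from datetime import date
from statistics import mean, variance

# Period definitions taken from the text.
PEAK = (date(2020, 1, 20), date(2020, 4, 8))
POST = (date(2020, 4, 9), date(2020, 8, 31))


def same_window_2019(window):
    """Shift a 2020 window to the same calendar dates in 2019."""
    start, end = window
    return start.replace(year=2019), end.replace(year=2019)


def shares_in(daily_share, window):
    """Collect the daily topic shares that fall inside a window (inclusive)."""
    start, end = window
    return [share for day, share in daily_share.items() if start <= day <= end]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up daily shares of one topic among fear posts.
daily_share = {
    date(2019, 1, 21): 0.10, date(2019, 2, 1): 0.12, date(2019, 3, 1): 0.11,
    date(2020, 1, 21): 0.15, date(2020, 2, 1): 0.17, date(2020, 3, 1): 0.16,
    date(2020, 5, 1): 0.13,
}
during = shares_in(daily_share, PEAK)
baseline = shares_in(daily_share, same_window_2019(PEAK))
t_stat, dof = welch_t(during, baseline)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A positive t statistic here means the topic's share among fear posts was higher in the 2020 window than in the matching 2019 window; the same call with `POST` gives the post-peak comparison.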
As shown in Figure 2A, during the COVID-19 peak period, fear posts about "nightmare" increased significantly, reaching a share of 16% of all fear posts. Though this share dropped after the COVID-19 peak period, it remained significantly higher than in the same period in 2019 until the end of August, indicating a long-lasting impact. Since "nightmare" can refer not only to an unpleasant dream but also to a disastrous event, we further explore the posting time within a day to check whether these fear posts are likely to be sleep-related. We assume that if "nightmare" is used to describe a bad dream, people are more likely to post in the morning, right after a bad night's sleep. The results in Supplementary Figure D.2 indeed show that posts about "nightmare" are concentrated in the early morning, and that the posting times within a day are similar in 2019 and 2020, indicating that there is no significant change in word usage. "Insomnia", i.e., being unable to sleep, displays a similar spike during the COVID-19 peak period (Figure 2B), suggesting that people were more likely to have difficulty falling asleep. However, the share of "insomnia" posts returned to its pre-pandemic level soon after April. Besides sleep disorders, among health topics, we also notice a significant drop in posts mentioning "cold and fever" (Figure 2C), and a significant increase in posts mentioning "lose weight" (Figure 2D) and "eye" (Figure 2E).

Besides health, work is one of the key areas in which the COVID-19 pandemic had a significant impact. Many researchers have identified the economic impacts of COVID-19 [14, 32]. The lockdown policy could curb infections but at the same time prevented people from going to work. The share of posts mentioning "money" has increased significantly since the beginning of COVID-19, suggesting rising financial concerns.
After checking the content of these posts, we find that the pandemic made people pay more attention to the importance of having money.

Researchers have identified significant gender differences during the COVID-19 period in aspects such as risk perception, time use, and compliance with social distancing policies [33, 34]. Here, we explore the differences between genders regarding fear perceptions and topics. Females in general have a higher tendency to express fear. On average, a female user generates 2.27 fear posts during our research period, while a male user generates only 1.89 fear posts. For each topic, we apply four t-tests to detect the gender differences in the COVID-19 peak period and the post-COVID-19 period between 2019 and 2020 (see Supplementary Table D.4).

Regarding the fear related to "nightmare", we find that both genders post more during the COVID-19 period, with a larger and more significant increase for females (coefficient = 0.251, P-value = 0.005) compared to males (coefficient = 0.103, P-value = 0.193). After the COVID-19 peak period, both genders continue to show a significantly higher frequency of nightmare-related fear posts (with coefficients of 0.270 and 0.197 for females and males, respectively). The insomnia topic shows a similar pattern: females had a significant increase in posting during the COVID-19 period (coefficient = 0.1, P-value = 0.057). The results from the two sleep-related topics suggest that females are more likely to have sleep disorders during COVID-19, and that this impact lasts for months.

We also detect differential changes by gender in the "cold and fever" topic. Cold and fever are prevalent in winter, as shown by the peaks at the beginning of 2019 and 2020. However, unexpectedly, the number of posts drops quickly during the COVID-19 period. Among non-COVID-19 related posts, females reduce their posts on the "cold and fever" topic more than males do.
The reason for this difference is that females have a higher tendency to associate cold and fever symptoms with COVID-19, which again reflects potentially higher mental stress among females during the pandemic.

Another pattern we find is related to losing weight. In this topic, males post less during the COVID-19 period, while females increase their posting after the COVID-19 peak period. This suggests that people in our sample were less concerned about body shape during the peak pandemic period, yet soon started to pay more attention to it once they needed to resume work and social activities. The increasing concern about weight loss could also indicate a reduction in physical activity, as found in previous studies [34].

Regarding monetary topics, both males and females increase their posting during the COVID-19 period, with a larger increase for males (coefficient = 0.042, P-value = 0.064) compared to females (coefficient = 0.034, P-value = 0.051). This concern becomes more significant after the COVID-19 peak period (male coefficient = 0.090, P-value = 0.000; female coefficient = 0.062, P-value = 0.000). The work-related results show that, in contrast to health-related topics, males pay more attention to the economic side, indicating a different type of stress. This result could serve as a potential explanation of why men have had a higher suicide rate during the COVID-19 period [35].

In conclusion, our study shows that COVID-19 has altered people's fear perception towards daily-life topics unrelated to virus infection, and that this change in perception can last for months after the peak pandemic period. We find that the daily-life fear topics that changed significantly during the COVID-19 period can best be classified into three clusters: (1) symptoms of fear (such as "nightmare", "insomnia"), (2) fear related to other health problems (such as "lose weight", "eye"), (3) fear about socioeconomic consequences (such as "money").
Our results have important implications. First, the significant increases in fear towards these topics indicate an increase in the mental distress and anxiety caused by COVID-19. Our results show that fear posts related to "nightmare", the largest non-COVID-19 related fear source, take up a significantly higher proportion of fear posts even months after the peak of the pandemic. The deterioration in sleep quality brought about by mental distress during COVID-19 could pose latent risks to the population's physical and psychological health, and should receive added attention. Second, our results suggest that COVID-19 and related policies induced health and financial concerns. Staying at home was accompanied by a reduction in physical activity and an increase in screen time, thus inducing more fear posts about weight and eye problems. The increased attention to "money" indicates that people also faced higher economic burdens during the pandemic. These results reveal the importance of paying attention to the broader social consequences of COVID-19 for people's daily life, instead of focusing solely on COVID-19 related posts when analyzing the fear response. Finally, our findings that females are more affected by COVID-19 in general while males are more concerned with work-related issues point to the importance of further exploring the reasons that underlie sub-group differences in fear responses. Such investigations can assist the design of tailored policies for vulnerable populations.

Our work leverages large-scale social media data coupled with computational methods to track emotional responses on a larger scale and with higher temporal granularity than traditional surveys. Although we conduct this research retrospectively, the same method could be used to achieve real-time emotion monitoring, thus serving as a helpful tool to discern societal concerns and aid policy decision-making.
Our method also has several limitations. First, users of social media platforms might not represent the whole population. Research has found that social media users are younger and more concentrated in big cities [36], a pattern we also observe in our sample. Second, we use the fear expressed within posts as a proxy for the fear emotion. Whether expressed emotion accurately represents the inner emotional state is still a nascent research area without a clear conclusion. Third, even if the expressed fear represents the actual feeling of users, we only observe changes in the number of posts with fear as the dominant emotion. Our algorithm does not, at the current stage, directly measure the fear intensity of each post. Fourth, compared to a carefully designed survey, using a data-driven method to automatically extract information from unstructured social media posts involves unavoidable measurement errors, since the neural network can only capture general knowledge from training samples and neglects outliers. We hope that our work can motivate future studies to explore the value of computational methods for understanding human emotions and behaviors.

To better display the relative relationships between provinces, we truncate the value at 4 to color the map.

Following Lyu et al. [28], we constructed a multi-class emotion dataset to train the BERT model. It consists of three parts: the Natural Language Processing and Chinese Computing (NLPCC) emotion analysis dataset, the Evaluation of Weibo Emotion Classification Technology of Social Media Processing 2020 (SMP2020-EWECT) dataset, and our own dataset. Though we focus on the fear emotion, we still retain five other emotion categories, i.e., anger, surprise, happiness, sadness, and others.
We construct our labeling dataset by combining two publicly available datasets of posts labeled with emotion tags and one self-constructed dataset.

The hosts of NLPCC2014 released thousands of posts (each composed of multiple sentences) collected from Weibo. For each sentence within a post, there is a manually labeled tag indicating the expressed emotion, i.e., anger, disgust, fear, happiness, like, sadness, surprise, and others. To be consistent with the SMP2020-EWECT dataset, we drop the disgust tag and convert the like tag into happiness. In total, we obtain 45 thousand labeled sentences.

SMP2020-EWECT aims to detect the emotion within each Weibo post during the COVID-19 period. Each post is labeled with one of the tags anger, fear, happiness, sadness, surprise, and others, and the dataset covers two topics: usual topics and COVID-19 topics. For each topic, the host provides training, evaluation, and testing datasets. We hold out the testing datasets of the usual topics and the COVID-19 topics for model evaluation and combine the remaining datasets for training. In total, we obtain 40 thousand labeled sentences for training, and 5 thousand and 3 thousand for testing on the usual and COVID-19 topics, respectively.

Due to the unbalanced distribution of emotions across posts and the lack of training samples, we decided to build our own dataset as well. We randomly selected 10 thousand unique posts from the preprocessed post dataset and assigned 5 thousand posts, without duplication, to each of two hired research assistants (RAs) for manual labeling. The RAs were told to label any post in which they could discern fear, anger, sadness, or surprise.

provided by SMP2020-EWECT with two sub-topics, i.e., topics unrelated to COVID-19 and topics related to COVID-19. Results show that the model achieves 74.43% accuracy on the validation dataset. On the testing datasets, the overall accuracy is 75.84% (usual topics) and 74.00% (COVID-19 topics). Fear detection achieves 84% and 74%, respectively.
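The label harmonization and the per-emotion evaluation described above can be illustrated with a short sketch. It is an assumption-laden example rather than the study's code: `harmonize_nlpcc` and `per_class_accuracy` are hypothetical helper names, the sample sentences are invented, and "fear detection" accuracy is read here as the share of gold-standard fear posts that the model also labels as fear, which is only one possible interpretation of the reported figures.

```python
from collections import Counter

# Six-class scheme shared with SMP2020-EWECT (taken from the text).
TARGET_LABELS = ("anger", "fear", "happiness", "sadness", "surprise", "others")


def harmonize_nlpcc(label):
    """Map an NLPCC2014 tag to the six-class scheme; None means 'drop'."""
    label = label.lower()
    if label == "disgust":      # dropped, as described in the text
        return None
    if label == "like":         # merged into happiness, as described in the text
        return "happiness"
    if label not in TARGET_LABELS:
        raise ValueError(f"unexpected label: {label}")
    return label


def harmonize_dataset(samples):
    """Apply the mapping to (text, label) pairs and discard dropped samples."""
    mapped = ((text, harmonize_nlpcc(label)) for text, label in samples)
    return [(text, label) for text, label in mapped if label is not None]


def per_class_accuracy(gold, pred):
    """Share of each gold class that is predicted correctly (assumed metric)."""
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {label: correct[label] / total[label] for label in total}


# Invented sentences, for illustration only.
raw = [("sentence 1", "like"), ("sentence 2", "disgust"), ("sentence 3", "fear")]
print(harmonize_dataset(raw))

gold = ["fear", "fear", "anger", "fear"]
pred = ["fear", "others", "anger", "fear"]
print(per_class_accuracy(gold, pred))
```

Keeping the mapping in one function makes it easy to check that the merged training data uses exactly the six tags on which the classifier is evaluated.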
Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations
Emotion as a multicomponent process: A model and some cross-cultural data
The biology of fear
Social threat learning transfers to decision making in humans
Assessment of social transmission of threats in humans using observational fear conditioning
Social learning of fear
Avoidance versus confrontation of fear
Predicting fear and perceived health during the COVID-19 pandemic using machine learning: A cross-national longitudinal study
Pandemic buying: Testing a psychological model of over-purchasing and panic buying using data from the United Kingdom and the Republic of Ireland during the early phase of the COVID-19 pandemic
Emotions in Politics: The Affect Dimension in Political Tension
Fear and hope in climate messages
Measuring voluntary and policy-induced social distancing behavior during the COVID-19 pandemic
Young adults' fear of disasters: A case study of residents from Turkey, Serbia and Macedonia
Economic hardship and mental health complaints during COVID-19
The dynamics of fear at the time of covid-19: a contextual behavioral science perspective
Threat of COVID-19 and emotional state during quarantine: Positive and negative affect as mediators in a cross-sectional study of the Spanish population
Fear of COVID-19" trigger future career anxiety? An empirical investigation considering depression from COVID-19 as a mediator
The Fear of COVID-19 Scale: Development and Initial Validation
Strengths and weakness of online surveys
Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter
The geography of happiness: connecting twitter sentiment and expression, demographics, and objective characteristics of place
Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods
Text as Data
Deep learning for sentiment analysis: A survey
Air pollution lowers Chinese urbanites' expressed happiness on social media
Pretraining of Deep Bidirectional Transformers for Language Understanding
A Robustly Optimized BERT Pretraining Approach
Sentiment Analysis on Chinese Weibo Regarding COVID-19. Natural Language Processing and Chinese Computing
Using Social Media to Mine and Analyze Public Opinion Related to COVID-19 in China
Public Perception of the COVID-19 Pandemic on Twitter: Sentiment Analysis and Topic Modeling Study
COVID-19-Related Suicides in Bangladesh Due to Lockdown and Economic Factors: Case Study Evidence from Media Reports
A multicountry perspective on gender differences in time use during COVID-19
Gender differences in COVID-19 attitudes and behavior: Panel evidence from eight countries
Lifestyle and mental health disruptions during COVID-19
Men, Suicide, and Covid-19: Critical Masculinity Analyses and Interventions
Social media usage