key: cord-0203184-p3egbdrw authors: Li, Xiaoya; Zhou, Mingxin; Wu, Jiawei; Yuan, Arianna; Wu, Fei; Li, Jiwei title: Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions date: 2020-05-29 journal: nan DOI: nan sha: 7829f75b1b5a1302524296474794a2b6d068abd2 doc_id: 203184 cord_uid: p3egbdrw

At the time of writing, the ongoing pandemic of coronavirus disease (COVID-19) has severely affected society, the economy and people's daily lives. People constantly express their opinions on various aspects of the pandemic on social media, making user-generated content an important source for understanding public emotions and concerns. In this paper, we perform a comprehensive analysis of the affective trajectories of the American people and the Chinese people based on Twitter and Weibo posts between January 20th, 2020 and May 11th, 2020. Specifically, by identifying people's sentiments, emotions (i.e., anger, disgust, fear, happiness, sadness, surprise) and the emotional triggers (e.g., what a user is angry/sad about), we are able to depict the dynamics of public affect in the time of COVID-19. By contrasting two very different countries, China and the United States, we reveal sharp differences in people's views on COVID-19 in different cultures. Our study provides a computational approach to unveiling public emotions and concerns about the pandemic in real time, which could help policy-makers better understand people's needs and thus make better-informed policy.

The emergence of COVID-19 in early 2020 and its subsequent outbreak have affected and changed the world dramatically. According to the World Health Organization (WHO), by mid-May 2020, the number of confirmed COVID-19 cases had reached 5 million, with a death toll of over 300,000 worldwide. Governments have introduced several mandatory rules to prevent the spread of the coronavirus, such as social distancing, bans on social gatherings, store closures and school closures.
Despite their positive effects on slowing the spread of the pandemic, these rules have nevertheless severely affected society, the economy and people's everyday lives. There have been anti-lockdown and anti-social-distancing protests in many places around the world. Given these difficult situations, it is crucial for policy-makers to understand people's opinions toward the pandemic so that they can (1) balance the concern of stopping the pandemic on the one hand against keeping people in good spirits on the other, and (2) anticipate people's reactions to certain events and policies so that they can prepare in advance. More generally, a close look at public affect during the time of COVID-19 could help us understand people's reactions and thoughts in the face of extreme crisis, which sheds light on humanity in moments of darkness.

People constantly post about the pandemic on social media such as Twitter, Weibo and Facebook. They express their attitudes and feelings regarding various aspects of the pandemic, such as medical treatments, public policy and their worries. Therefore, user-generated content on social media provides an important source for understanding public emotions and concerns. In this paper, we provide a comprehensive analysis of the affective trajectories of the American people and the Chinese people based on Twitter and Weibo posts between January 20th, 2020 and May 11th, 2020. We identify fine-grained emotions (including anger, disgust, fear, happiness, sadness and surprise) expressed on social media based on the user-generated content. Additionally, we build NLP taggers to extract the triggers of different emotions, e.g., why people are angry or surprised, what they are worried about, etc. We also contrast public emotions between China and the United States, revealing sharp differences in public reactions towards COVID-19 related issues in different countries.
By tracking the change of public sentiment and emotion over time, our work sheds light on the evolution of public attitudes towards this global crisis. This work contributes to the growing body of research on social media content in the time of COVID-19. Our study provides a way of extracting public emotion towards the pandemic in real time, and could potentially lead to better decision-making and the development of wiser interventions to fight this global crisis. The rest of this paper is organized as follows: we briefly go through some related work in Section 2. We then present the analyses of topic trends on Weibo and Twitter (Section 3), the extracted emotion trajectories (Section 4) and the triggers of those emotions (Section 5). We finally conclude this paper in Section 6.

At the time of writing, analyses of people's discussions and behaviors on social media in the context of COVID-19 have attracted increasing attention. [1] analyzed tweets concerning COVID-19 by selecting important 1-grams based on rank-turbulence divergence and compared the language used in early 2020 with that used a year earlier. The authors observed the first peak of public attention to COVID-19 around January 2020 with the first wave of infections in China, and the second peak later when the outbreak hit many western countries. [2] released the first COVID-19 Twitter dataset. [3] provided a ground truth corpus by annotating 5,000 texts (2,500 short + 2,500 long texts) in the UK and showed people's worries about their families and economic situations. [4] viewed emotions and sentiments on social media as indicators of mental health issues, which result from self-quarantining and social isolation. [5] revealed an increasing amount of hateful speech and conspiracy theories towards specific ethnic groups, such as Chinese people, on Twitter and 4chan. Other researchers started looking at the spread of misinformation on social media [6] , [7] .
[8] provided an in-depth analysis of the diffusion of misinformation concerning COVID-19 on five different social platforms.

Discrete Emotion Theory [9] , [10] , [11] holds that all humans have an innate set of distinct basic emotions. Paul Ekman and his colleagues [12] proposed that the six basic emotions of humans are anger, disgust, fear, happiness, sadness, and surprise. Ekman explains that different emotions have particular characteristics expressed in varying degrees. Researchers have debated the exact categories of discrete emotions. For instance, [13] proposed eight classes of emotions: love, mirth, sorrow, anger, energy, terror, disgust and astonishment.

Automatically detecting sentiments and emotions in text is a crucial problem in NLP, and there has been a large body of work on annotating texts with sentiments and building tools to automatically identify emotions and sentiments [14] , [15] , [16] , [17] . [18] created the first annotated dataset for four classes of emotions (anger, fear, joy, and sadness), in which each text is annotated not only with a label for the emotion category, but also with the intensity of the emotion expressed, based on the Best-Worst Scaling (BWS) technique [19] . A follow-up work by [20] created a more comprehensively annotated dataset from tweets in English, Arabic, and Spanish. The dataset covers five different sub-tasks: emotion classification, emotion intensity regression, emotion intensity ordinal classification, valence regression and valence ordinal classification.

There have been a number of studies on extracting aggregated public mood and emotions from social media [21] , [22] , [23] , [24] . Facebook introduced Gross National Happiness (GNH) to estimate the aggregated mood of the public using the LIWC dictionary. Results show a clear weekly cycle of public mood. [25] and [26] specifically investigate the influence of geographic places and weather on public mood from Twitter data.
The mood indicators extracted from tweets are highly predictive and robust [23] , [27] . Therefore, they have been used to predict real-world outcomes such as economic trends [24] , [28] , [29] , [30] , stock market movements [31] , [32] , influenza outbreaks [33] , and political events [34] , [35] , [36] , [37] .

In this section, we present the general trends for COVID-19-related posts on Twitter and Weibo. We first present the semi-supervised models we used to detect COVID-19 related tweets. Next we present the analysis of the topic trends on the two social media platforms.

For Twitter, we first obtained 1% of all tweets that are written in English and published between January 20th, 2020 and May 11th, 2020. The next step is to select tweets related to COVID-19. The simplest way, as in [2] , [7] , is to use a handcrafted keyword list to obtain tweets containing words found in the list. However, this method leads to low precision and low recall. For precision, user-generated content that mentions a keyword is not necessarily related to COVID-19. For example, the keyword list used in [2] includes the word China, and it is not surprising that a large proportion of the posts containing "China" are not related to COVID-19. For recall, keywords for COVID-19 can change over time and might be missing from the keyword list. To tackle this issue, we adopt a bootstrapping approach. The bootstrapping approach is related to previous work on semi-supervised data harvesting methods [38] , [39] , [40] , in which we build a model that recursively uses seed examples to extract patterns, which are then used to harvest new examples. Those new examples are further used as new seeds to get new patterns.
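The harvest, filter and re-seed cycle described above can be illustrated with a minimal sketch. Everything named here is hypothetical: `bootstrap_keywords`, the whitespace tokenizer, and the `is_relevant` callback, which stands in for a trained relevance classifier. The frequency-difference score used to pick new keywords is also only a stand-in for a proper saliency measure.

```python
from collections import Counter
from typing import Callable, Iterable


def tokens(post: str) -> set[str]:
    """Lowercased whitespace tokens of a post (a deliberately crude tokenizer)."""
    return set(post.lower().split())


def bootstrap_keywords(
    posts: list[str],
    seed_keywords: Iterable[str],
    is_relevant: Callable[[str], bool],
    rounds: int = 3,
    top_k: int = 5,
) -> set[str]:
    """Alternate between harvesting posts by keyword and re-deriving keywords."""
    keywords = {k.lower() for k in seed_keywords}
    for _ in range(rounds):
        # Harvest: every post that mentions at least one current keyword.
        harvested = [p for p in posts if keywords & tokens(p)]
        # Filter: ask the (stand-in) classifier which harvested posts are relevant.
        labels = [is_relevant(p) for p in harvested]
        pos = Counter(w for p, ok in zip(harvested, labels) if ok for w in tokens(p))
        neg = Counter(w for p, ok in zip(harvested, labels) if not ok for w in tokens(p))
        # Re-seed: keep words seen more often in relevant than in rejected posts.
        # Assumption: this frequency difference replaces a real saliency score.
        ranked = sorted(pos, key=lambda w: (pos[w] - neg[w], w), reverse=True)
        new_keywords = {w for w in ranked[:top_k] if pos[w] > neg[w]}
        if not new_keywords:
            break
        keywords = new_keywords
    return keywords
```

On a toy collection, an ambiguous seed such as "china" tends to drop out once it appears as often in rejected posts as in relevant ones, which is the precision problem the bootstrapping is meant to address. A real pipeline would also filter stop words before re-seeding.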
To be specific, we first obtained a starting seed keyword list by (1) ranking words based on tf-idf scores from eight COVID-19 related Wikipedia articles; and (2) manually examining the ranked word list, removing those words that are apparently not COVID-19 related, and using the top 100 words among the remaining items. Then we retrieved tweets that mention those keywords. Next, we randomly sampled 1,000 tweets from the collection and manually labeled them as either COVID-19 related or not. The labeled dataset is split into training, development and test sets with ratio 8:1:1. A binary classification model is trained on the labeled dataset to classify whether a post that mentions COVID-related keywords is actually COVID-related. The model is trained using BERT [41] and optimized using Adam [42] . Hyperparameters such as the batch size and learning rate are tuned on the development set. Next, we obtain a new seed list by picking the most salient words that contribute to the positive category in the binary classification model, based on the first-order derivative saliency scores [43] , [44] , [45] . This marks the end of the first round of the bootstrapping. Next we used the new keyword list to harvest a new dataset of posts that mention the keywords, from which 1,000 posts are selected and labeled to retrain the binary classification model. We repeat this process three times.

We report the intensity scores for Weibo and Twitter in Figure 1 . We split all tweets by date, where $X_t$ denotes all tweets published on day $t$. The value of intensity is the number of posts classified as COVID-related divided by the total number of retrieved posts, i.e., $|X_t|$. On Weibo, we observe a peak in late January and February, then a drop, followed by another rise in March, and a gradual decline afterwards.
The trend on Chinese social media largely reflects the progress of the pandemic in China: the outbreak of COVID-19 and the spread from Wuhan to the rest of the country correspond to the first peak. The subsequent drop reflects the promising progress in containing the virus, followed by a minor relapse in March. For Twitter, we observe a small peak that is aligned with the news from China about the virus. The subsequent drop reflects the decline of attention to the outbreak in China. The curve has risen progressively since March, corresponding to the outbreak in the US. At the time of writing, we have not observed any sign of a drop in the intensity score of COVID-19-related posts.

In this section, we present the analyses of the evolution of public emotion in the time of COVID-19. We first present the algorithms we used to identify the emotions expressed in a given post. Next we present the results of the analyses.

We adopted the well-established emotion theory of Paul Ekman [12] , which groups human emotions into 6 major categories, i.e., anger, disgust, fear (referred to as worry in the analyses below), happiness, sadness, and surprise. Given a post from a social network user, we assign one or multiple emotion labels to it [46] , [47] . This setup is quite common in text classification [48] , [49] , [50] , [51] , [52] . For emotion classification of English tweets, we take advantage of the labeled datasets from SemEval-2018 Task 1e [20] , in which a tweet was associated with either the "neutral" label or with one or multiple emotion labels by human evaluators. SemEval-2018 Task 1e contains eleven emotion categories in total, i.e., anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise and trust, and we only use the datasets for a six-way classification, i.e., anger, disgust, fear, happiness, sadness, and surprise.
Given that the domain of the dataset used in [20] covers all kinds of tweets, and our domain of research covers only COVID-related tweets, there is a gap between the two domains. Therefore, we additionally labeled 15k COVID-related tweets following the guidelines in [20] , where each tweet can take either the neutral label or one or multiple emotion labels. Since one tweet can take on multiple emotion labels, the task is formalized as a multi-label classification task, in which six binary (one vs. the rest) classifiers are trained. We used the description-based BERT model [53] as the backbone, which achieves state-of-the-art performance on a wide variety of text classification tasks. More formally, let us consider a to-be-classified tweet $x = \{x_1, \cdots, x_L\}$, where $L$ denotes the length of the text $x$. Each $x$ will be tagged with one or more class labels $y \in Y = [1, N]$, where $N = 6$ denotes the number of predefined emotion classes (the six emotion categories). To compute the probability $p(y|x)$, each input text $x$ is concatenated with the description $q_y$ to generate $\{[CLS]; q_y; [SEP]; x\}$, where [CLS] and [SEP] are special tokens. The description $q_y$ is the Wikipedia description for each of the emotions. For example, $q_y$ for the category anger is "Anger, also known as wrath or rage, is an intense emotional state involving a strong uncomfortable and hostile response to a perceived provocation, hurt or threat." Next, the concatenated sequence is fed to the BERT model, from which we obtain the contextual representation $h_{[CLS]}$. $h_{[CLS]}$ is then transformed to a real value between 0 and 1 using the sigmoid function, representing the probability of assigning the emotion label $y$ to the input tweet $x$; the parameters of this transformation, $W_1$, $W_2$, $b_1$ and $b_2$, are optimized during training. Classification performances for different models are presented in Table 3 .
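The text names the sigmoid output and the parameters $W_1$, $W_2$, $b_1$, $b_2$ but does not show the formula for $p(y|x)$ itself. One form consistent with that description is a two-layer head applied to $h_{[CLS]}$. The ReLU nonlinearity between the two layers is an assumption made here for illustration; the text does not state which nonlinearity, if any, is used.

```latex
p(y \mid x) = \sigma\bigl(W_2 \, \mathrm{ReLU}(W_1 h_{[\mathrm{CLS}]} + b_1) + b_2\bigr),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Under this reading, one such probability is computed per emotion $y$, each with its own description $q_y$ in the input, which matches the six one-vs-rest classifiers described above.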
For emotion $y$, its intensity score $S(t, y)$ for day $t$ is the average probability (denoted by $P(y|x)$) of assigning label $y$ to all the texts in that day, $X_t$. For non COVID-related texts, $P(y|x)$ is automatically set to 0. We thus have: $S(t, y) = \frac{1}{|X_t|} \sum_{x \in X_t} P(y|x)$.

For Chinese emotion classification, we used the labeled dataset in [54] , which contains 15k labeled microblogs from Weibo. In addition to the dataset provided by [54] , we labeled 20k COVID-related microblogs. The combined dataset is then used to train a multi-label classification model based on the description-BERT model [53] . Daily emotion scores for Weibo are computed in the same way as for Twitter.

The time series of intensity scores of six different emotions, i.e., sadness, anger, disgust, worry, happiness, surprise, for Weibo and Twitter are shown in Figures 2 and 3 , respectively. For Weibo, as can be seen, the trend of worry is largely in line with the trend of the general intensity of the COVID-related posts. It reached a peak in late January, and then gradually went down, followed by a small relapse in mid-March. For anger, the intensity first went up steeply at the initial stage of the outbreak, stayed high for two weeks, and then had another sharp increase around February 8th. The peak on February 8th was due to the death of Wenliang Li, a Chinese ophthalmologist who issued warnings about the virus. The intensity for anger then gradually decreased, with no relapse afterwards. The intensity for disgust remained relatively low across time. For sadness, the intensity reached its peak at the early stage of the outbreak, then gradually died out with no relapse. For surprise, the intensity went up first, mostly because the public was surprised by the new virus and the unexpected outbreak, and then gradually went down. The intensity for happiness remained relatively low across time, with a small peak in late April, mostly because the countrywide lockdown was over.
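The daily emotion intensity defined above (the average of $P(y|x)$ over all posts of a day, with non COVID-related posts contributing 0) can be sketched as follows. The function name `emotion_intensity` and the `(day, is_covid_related, prob)` tuple layout are assumptions made for this illustration; in practice `prob` would come from the emotion classifier.

```python
from collections import defaultdict


def emotion_intensity(posts):
    """Compute S(t, y) for one emotion y.

    posts: iterable of (day, is_covid_related, prob) tuples, where prob is
    the classifier's P(y|x) for that post. Returns {day: S(t, y)}.
    """
    totals = defaultdict(float)  # sum of P(y|x) over the posts of each day
    counts = defaultdict(int)    # |X_t|: all posts of the day, related or not
    for day, is_covid_related, prob in posts:
        counts[day] += 1
        # Posts that are not COVID-related contribute a probability of 0.
        totals[day] += prob if is_covid_related else 0.0
    return {day: totals[day] / counts[day] for day in counts}
```

Because the denominator counts every post of the day, a day with few COVID-related posts gets a low score even if those posts are strongly emotional, so the score tracks both how much people talk about the pandemic and how they feel about it.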
For Twitter, the intensity for worry rose briefly in late January, followed by a drop. The intensity then went up steeply in mid-March in response to the outbreak in the United States, reaching a peak around March 20th, then decreased slightly and remained steady afterwards. The intensity for anger kept going up after the outbreak in mid-March, with no drop observed. The trend for sadness is mostly similar to that of the overall intensity. For surprise, the curve went up first after the outbreak in early March, reaching a peak around March 20th, then dropped, and remained steady afterwards. For happiness, the intensity remained low over time.

Twitter data. In order to extract the emotional triggers from Twitter's noisy text, we first annotate a corpus of tweets. For ease of annotation, each emotion is associated with only a single trigger: the person/entity/event that a user has a specific emotion towards/with/about. A few examples are shown as follows with target triggers surrounded by brackets.

• Angry protesters are traveling 100's of miles to join organized rallies over COVID-19

In order to build an emotional trigger tagger, we annotated 2,000 tweets in total, and split them into training, development and test sets with ratio 8:1:1. We treat the problem as a sequence labeling task, using Conditional Random Fields for learning and inference with BERT-MRC features [55] . Compared with the vanilla BERT tagger [41] , the BERT-MRC tagger has the strength of encoding the description of the to-be-extracted entities, e.g., what they are worried about. As this description provides prior knowledge about the entities, it has been shown to outperform vanilla BERT even when less training data is used. In addition to the representation features from BERT-MRC, we also considered the Twitter-tuned POS features [56] , the dependency features from a Twitter-tuned dependency parsing model [57] and the Twitter event features [58] .
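To make the sequence labeling setup concrete, the sketch below converts a bracket-annotated tweet into per-token tags of the kind a CRF tagger is trained on. The BIO scheme, the `TRIGGER` tag name and the function `brackets_to_bio` are assumptions for illustration; the text does not specify which tagging scheme was used, and the example sentence in the usage note is invented.

```python
import re


def brackets_to_bio(annotated: str) -> list[tuple[str, str]]:
    """Turn 'worried about [my job]' into (token, tag) pairs with BIO tags."""
    pairs = []
    # The capture group keeps the bracketed spans in the split result.
    for piece in re.split(r"(\[[^\]]*\])", annotated):
        if piece.startswith("[") and piece.endswith("]"):
            # Inside a trigger span: first token is B-, the rest are I-.
            span_tokens = piece[1:-1].split()
            tags = ["B-TRIGGER"] + ["I-TRIGGER"] * (len(span_tokens) - 1)
        else:
            # Outside any trigger span.
            span_tokens = piece.split()
            tags = ["O"] * len(span_tokens)
        pairs.extend(zip(span_tokens, tags))
    return pairs
```

For a hypothetical annotation such as `"So worried about [my job] right now"`, the tokens `my` and `job` receive `B-TRIGGER` and `I-TRIGGER`, and every other token receives `O`. Since each emotion is associated with a single trigger, an annotated tweet would contain at most one bracketed span.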
The precision and recall for segmenting emotional triggers on English tweets are reported in Table V . The precision and recall for segmenting triggering event phrases are reported in Table 3 . We observe a significant performance boost with linguistic features such as POS and dependency features. This is mainly due to the small size of the labeled dataset. The best model achieves an F1 score of 0.66.

Since different extracted tokens may refer to the same concept or topic, we would like to cluster the extracted trigger mentions. Supervised classification is unsuitable for this purpose since (1) it is hard to predefine a list of potential triggers of people's anger or worry; (2) it is extremely labor-intensive to annotate tweets with worry types or anger types; and (3) these types may change over time. For these reasons, we decided to use semi-supervised approaches that automatically induce worry/anger types that match the data. We adopt an approach based on LDA [59] . It was inspired by work on unsupervised information extraction [60] , [58] , [61] . We use the emotion anger to illustrate how trigger mentions are clustered. Each extracted trigger mention for anger is modeled as a mixture of anger types. Here we use subcategory, type, and topic interchangeably, all referring to a cluster of similar mentions. Each topic is characterized by a distribution over triggers, in addition to a distribution over the dates on which a user talks about the topic. Taking dates into account encourages triggers that are mentioned on the same date to be assigned to the same topic. We used collapsed Gibbs sampling [62] for inference. For each emotion, we ran Gibbs sampling with 20 topics for 1,000 iterations, obtaining the hidden variable assignments in the last iteration. Then we manually inspected the top mentions for different topics and discarded the incoherent ones.
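A minimal sketch of collapsed Gibbs sampling for this kind of clustering is given below. It is a simplified variant under stated assumptions, not the model used in the paper: each mention is assigned a single topic (rather than being a mixture), topics have one distribution over trigger strings and one over dates, and the symmetric Dirichlet hyperparameters `alpha`, `beta` and `gamma` are illustrative values.

```python
import random
from collections import defaultdict


def gibbs_cluster(mentions, num_topics, iters=200,
                  alpha=1.0, beta=0.1, gamma=0.1, seed=0):
    """mentions: list of (trigger, date) pairs. Returns one topic id per mention."""
    rng = random.Random(seed)
    num_triggers = len({w for w, _ in mentions})
    num_dates = len({d for _, d in mentions})
    # Random initial assignment plus the count tables the sampler maintains.
    z = [rng.randrange(num_topics) for _ in mentions]
    n_k = [0] * num_topics                                  # mentions per topic
    n_kw = [defaultdict(int) for _ in range(num_topics)]    # topic-trigger counts
    n_kd = [defaultdict(int) for _ in range(num_topics)]    # topic-date counts
    for (w, d), k in zip(mentions, z):
        n_k[k] += 1
        n_kw[k][w] += 1
        n_kd[k][d] += 1
    for _ in range(iters):
        for i, (w, d) in enumerate(mentions):
            # Remove mention i from the counts before resampling its topic.
            k = z[i]
            n_k[k] -= 1
            n_kw[k][w] -= 1
            n_kd[k][d] -= 1
            # Collapsed conditional: topic prior x trigger term x date term.
            weights = [
                (n_k[j] + alpha)
                * (n_kw[j][w] + beta) / (n_k[j] + num_triggers * beta)
                * (n_kd[j][d] + gamma) / (n_k[j] + num_dates * gamma)
                for j in range(num_topics)
            ]
            k = rng.choices(range(num_topics), weights=weights)[0]
            z[i] = k
            n_k[k] += 1
            n_kw[k][w] += 1
            n_kd[k][d] += 1
    return z
```

The date term is what rewards putting mentions from the same day into the same topic: a topic that already holds many mentions from date `d` gets a larger weight for a new mention from `d`, even if the trigger string itself is new to that topic.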
The daily intensity score for a given subcategory $k$ belonging to emotion $y$ is given as follows: $S(t, y, k) = \frac{1}{|X_t|} \sum_{x \in X_t} P(y|x)\, p(k|x)$, where $p(k|x)$ is computed based on the parameters of the latent variable model.

We report the top triggers for different emotions in Table II . We adopt a simple strategy of reporting the most frequent triggers for different emotions. For sadness, the most frequent triggering events and topics are testing positive, and the death of family members and friends. For anger, the top triggers are shutdown, quarantine and other mandatory rules. People also express their anger towards public figures such as President Donald Trump and Mike Pence, along with China and Chinese people. For worry, the top triggers include jobs, getting the virus, payments and families. For happiness, the top triggers are recovering from the disease, city reopening and returning to work. For surprise, the public are mostly surprised by the virus itself, its spread and the mass deaths it caused.

Next we report the results of the mention clustering for anger and worry in Tables 4 and 5 , respectively. The unsupervised clustering reveals clearer patterns in the triggering events: top subcategories for anger include China, with racist words such as chink and chingchong; lockdown and social distancing; public figures like President Donald Trump and Mike Pence; treatments in hospitals; and the increasing cases and deaths. Table 4 displays the change of intensity scores for the subcategories of anger. We observe a sharp increase in public anger toward China and Chinese people around March 20th, coinciding with President Donald Trump calling the coronavirus the 'Chinese virus' in his tweets. Public anger towards the lockdown sharply escalated in mid-March, but decreased slightly after late April when some places started to reopen. Top subcategories for worry include symptoms of COVID-19, finance and economy, families, jobs and food, and increasing cases and deaths.
Table 5 displays the change of intensity scores for subcategories of worry. People increasingly worried about families over time. It is interesting to see that the worry about finance and economy started going up in mid-February, earlier than other subcategories.

In this paper, we perform analyses of topic trends, sentiments and emotions of the public in the time of COVID-19 on social media. By tracking the change of public emotions over time, our work reveals how the general public reacts to different events and government policies. Our study provides a computational approach to understanding public affect towards the pandemic in real time, and could help create better solutions and interventions for fighting this global crisis.

How the world's collective attention is being paid to a pandemic: COVID-19 related 1-gram time series for 24 languages on Twitter
COVID-19: The first public coronavirus Twitter dataset
Measuring emotions in the COVID-19 real world worry dataset
Mental health problems and social media exposure during COVID-19 outbreak
An early look on the emergence of sinophobic behavior on web communities in the face of COVID-19
Assessing the risks of "infodemics" in response to COVID-19 epidemics
#COVID-19 on Twitter: Bots, conspiracies, and social media activism
The COVID-19 social media infodemic
A general psychoevolutionary theory of emotion
The laws of emotion
Emotions in social psychology: Essential readings
An argument for basic emotions
Emotion classification using web blog corpora
Foundations and trends in information retrieval
Sentiment analysis: Detecting valence, emotions, and other affectual states from text, in Emotion measurement
Document modeling with gated recurrent neural network for sentiment classification
WASSA-2017 shared task on emotion intensity
Best-worst scaling: A model for the largest difference judgments
Proceedings of the 12th international workshop on semantic evaluation
Capturing global mood levels using blog posts
ARSA: A sentiment-aware model for predicting sales performance using blogs
Measuring the happiness of large-scale written expression: Songs, blogs, and presidents
Widespread worry and the stock market
The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place
What a nasty day: Exploring mood-weather relationship from Twitter
Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter
Tweetsmart: Hedging in markets through Twitter
Using Twitter sentiments and search volumes index to predict oil, gold, forex and markets indices
Predicting stock market indicators through Twitter "I hope it is not as bad as I fear"
Twitter mood predicts the stock market
Predicting stock market fluctuations from Twitter
Early stage influenza detection from Twitter
A method of automated nonparametric content analysis for social science
From tweets to polls: Linking text sentiment to public opinion time series
Predicting elections with Twitter: What 140 characters reveal about political sentiment
Studying political microblogging: Twitter users in the 2010 Swedish election campaign
Weakly supervised user profile extraction from Twitter
Fully unsupervised discovery of concept-specific relationships by web mining
Learning arguments and supertypes of semantic relations using recursive patterns
BERT: Pre-training of deep bidirectional transformers for language understanding
Adam: A method for stochastic optimization
Visualizing higher-layer features of a deep network
Deep inside convolutional networks: Visualising image classification models and saliency maps
Visualizing and understanding neural models in NLP
Emotion analysis as a regression problem - dimensional models and their implications on emotion representation and metrical evaluation
GoEmotions: A dataset of fine-grained emotions
Text classification using machine learning techniques
A survey of text classification algorithms
Character-level convolutional networks for text classification
A C-LSTM neural network for text classification
Bag of tricks for efficient text classification
Description based text classification with reinforcement learning
Fine-grained emotion classification of Chinese microblogs based on graph convolution networks
A unified MRC framework for named entity recognition
Named entity recognition in tweets: An experimental study
A dependency parser for tweets
Open domain event extraction from Twitter
Latent Dirichlet allocation
Template-based information extraction without the templates
Major life event extraction from Twitter based on congratulations/condolences speech acts
Finding scientific topics