key: cord-0165282-hjudp6ul
authors: Mukherjee, Rajdeep; Poddar, Sriyash; Naik, Atharva; Dasgupta, Soham
title: How Have We Reacted To The COVID-19 Pandemic? Analyzing Changing Indian Emotions Through The Lens of Twitter
date: 2020-08-20
journal: nan
DOI: nan
sha: 77059527665d343fcad60ffb9e399ce0fb7230f3
doc_id: 165282
cord_uid: hjudp6ul

Since its outbreak, the ongoing COVID-19 pandemic has caused unprecedented losses to human lives and economies around the world. As of 18th July 2020, the World Health Organization (WHO) has reported more than 13 million confirmed cases including close to 600,000 deaths across 216 countries and territories. Despite several government measures, India has gradually moved up the ranks to become the third worst-hit nation by the pandemic after the US and Brazil, thus causing widespread anxiety and fear among her citizens. As the majority of the world's population continues to remain confined to their homes, more and more people have started relying on social media platforms such as Twitter for expressing their feelings and attitudes towards various aspects of the pandemic. With rising concerns about mental well-being, it becomes imperative to analyze the dynamics of public affect in order to anticipate any potential threats and take precautionary measures. Since the affective states of the human mind are more nuanced than mere binary sentiments, here we propose a deep learning-based system to identify people's emotions from their tweets. We achieve competitive results on two benchmark datasets for multi-label emotion classification. We then use our system to analyze the evolution of emotional responses among Indians as the pandemic continues to spread its wings. We also study the development of salient factors contributing towards the changes in attitudes over time. Finally, we discuss directions to further improve our work and hope that our analysis can aid in better public health monitoring.

The coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, has taken the shape of a pandemic since its outbreak in late December 2019, causing enormous losses to societies and economies around the world [12]. Although the enforcement of several protective measures such as lockdowns and social distancing has had positive effects on containing the spread of the virus, the loneliness caused by self-confinement and reduced access to family, friends, and other social support systems has posed severe threats to the physical and mental wellbeing of people around the globe [2, 8, 10]. While dealing with such unprecedented challenges, as more and more people express their emotions and opinions towards various aspects of the pandemic through social media platforms such as Twitter and Facebook, it becomes essential to monitor the dynamics of public affect from such user-generated textual content in order to understand general concerns and anticipate any potential threats.

Human-written texts such as tweets reflect an author's affective state or thought process. Although emotions and sentiments are closely related, merely classifying the subjective content of a tweet into positive, negative or neutral sentiment categories may not be sufficient to understand a person's true feelings. Under trying situations such as the COVID-19 pandemic, emotions can be even more complicated [23]. A person might feel optimistic about vaccine development, thankful towards the doctors, or both, and a single positive sentiment label would not distinguish between these states. Therefore, in order to capture people's emotions from their tweets, we build a transformer-based [21] supervised multi-label emotion classifier that achieves state-of-the-art results on two benchmark datasets, AIT [19] and SenWave [23].

With a record number of cases being reported daily and more than 1 million confirmed cases to date, India has now become the third worst-hit nation by the pandemic after the US and Brazil. We therefore use our classifier, trained on the COVID-19-specific SenWave dataset, to monitor the evolution of emotional attitudes among Indians towards the ongoing pandemic by analyzing tweets posted between March 1st 2020 and July 5th 2020. We also study the development of salient factors or triggers contributing towards the changes in emotions over time. We hope that our proposed system can aid the concerned authorities in the real-time identification of people's mental health conditions and concerns from their social media posts, and in making timely interventions in fighting the global crisis.

arXiv:2008.09035v1 [cs.SI] 20 Aug 2020

Studies based on social media data around the COVID-19 pandemic have seen a steep surge over the past few months. Prior works [1, 13, 14] have stressed the need for automatic detection and monitoring of public affect from user-generated content, especially during trying times such as the current pandemic. Although recent annotated benchmark datasets [5, 18, 19] have facilitated the training and development of supervised deep learning-based classifiers for automated sentiment analysis and emotion detection, their results are not directly applicable to COVID-19 analysis [23] due to considerable differences in vocabulary between the training and testing domains. The majority of recent works around COVID-19, such as [11, 22], have thus relied on feature-engineering-based methods or supervised methods with limited training data [12]. As the pandemic continues to spread its wings, we draw our motivation from [14, 19, 23] for building a multi-label emotion classifier and studying the evolution of public emotions towards COVID-19 from an Indian perspective, through the lens of Twitter.

• AIT: The Affect in Tweets dataset (AIT for short) was created in [19] as part of SemEval-2018 Task 1. This non-COVID-19 dataset consists of 7724 English tweets, each labeled either as neutral (no emotion) or with one or more of 11 emotions: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, and trust.
• SenWave: We consider the 10K English tweets from the largest fine-grained annotated COVID-19 tweets dataset, created by the authors in [23]. Here, each tweet is labeled with one or more of 10 emotions: optimistic, thankful, empathetic, pessimistic, anxious, sad, annoyed, denial (towards conspiracy theories), official report, and joking.
• Twitter_IN: We use version 17 of the COVID-19 Twitter chatter dataset released by the authors in [3]. Twitter_IN consists of around 1.4 lakh (140,000) English tweets from India posted between January 25th 2020 and July 4th 2020.

The version of SenWave obtained from the authors contains an additional label, surprise. Although we use all 11 categories to train our models, we leave out surprise (since it is not documented in [19]) and official report (since it is not an affective state) from further consideration while performing our analysis with Twitter_IN.

We preprocessed the raw tweets in all the datasets, first by removing mentions (@user) and URLs. We then filtered out noisy symbols such as "RT" (retweet). Since hashtags can provide meaningful semantics, we removed the # from each #word and kept the word.

We used a mapping 1 between emoticons and their meanings, and the emoji 2 package, to replace emoticons and emojis appearing in the tweets with their semantic phrases. We then used a slang_translator 3 to convert online slang, commonly used in short text message conversations, into the corresponding full phrases. For example, "CUL8R" gets replaced by "see you later". We further expand contractions using contractions 4. Finally, we filtered out punctuation, line breaks, tabs, redundant blank characters, etc., since they do not provide any relevant lexical or semantic information.

1 https://en.wikipedia.org/wiki/List_of_emoticons/
2 https://pypi.org/project/emoji/
3 https://github.com/rishabhverma17/sms_slang_translator
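The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the exact pipeline: the small `EMOTICONS`, `SLANG`, and `CONTRACTIONS` dictionaries and the `preprocess` function are hypothetical stand-ins for the emoticon mapping, slang translator, and contractions resources cited in the text, and emoji replacement is omitted.

```python
import re
import string

# Hypothetical, tiny lookup tables standing in for the external resources.
EMOTICONS = {":)": "happy face", ":(": "sad face"}
SLANG = {"CUL8R": "see you later", "BRB": "be right back"}
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "it's": "it is"}


def preprocess(tweet: str) -> str:
    """Clean one raw tweet, following the order of steps given in the text."""
    text = re.sub(r"@\w+", " ", tweet)            # remove mentions
    text = re.sub(r"https?://\S+", " ", text)     # remove URLs
    text = re.sub(r"\bRT\b", " ", text)           # remove the retweet marker
    text = re.sub(r"#(\w+)", r"\1", text)         # drop '#', keep the word
    for emoticon, meaning in EMOTICONS.items():   # emoticon -> phrase
        text = text.replace(emoticon, f" {meaning} ")
    words = []
    for token in text.split():
        core = token.strip(string.punctuation)    # ignore edge punctuation
        if core.upper() in SLANG:                 # slang -> full phrase
            words.append(SLANG[core.upper()])
        elif core.lower() in CONTRACTIONS:        # expand contractions
            words.append(CONTRACTIONS[core.lower()])
        else:
            words.append(token)
    text = re.sub(r"[^\w\s]", " ", " ".join(words))  # strip punctuation
    return re.sub(r"\s+", " ", text).strip()         # collapse whitespace


cleaned = preprocess("RT @user: CUL8R! Can't wait :) #StaySafe https://t.co/abc")
```

Here `cleaned` is "see you later cannot wait happy face StaySafe". Emoticons are replaced before punctuation is stripped, since stripping first would destroy them.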

We used RoBERTa-Base [15] as the backbone of our multi-label emotion classifier. A tweet t to be classified is first broken down by the RobertaTokenizer into its constituent tokens. A special token [CLS] is prepended to obtain the final sequence [CLS], t_1, t_2, ..., t_n, which is then passed through 12 layers of attention-based transformers to obtain a 768-dimensional contextualized representation of the tweet, h_[CLS]. Motivated by prior works [19], we simultaneously obtain a 194-dimensional feature vector derived from affect lexicons using Empath [7]. We concatenate the two to obtain a 962-dimensional vector, which is then passed through a fully connected layer with 11 output neurons. At each neuron, we use the tanh activation function with a threshold of 0.33 to predict the presence or absence of one of the 11 emotions. Hereafter, we refer to this model as EC ROBERTA.
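The classification head described above (concatenation, one fully connected layer, tanh, threshold) can be illustrated with a small standard-library sketch. It takes the 768-dimensional tweet representation and the 194-dimensional Empath vector as given, so no transformer is run here. The function name `predict_emotions`, the zero weights, and the strict `>` comparison against the threshold are assumptions made for illustration.

```python
import math

HIDDEN_DIM = 768   # size of the contextualized tweet representation
EMPATH_DIM = 194   # size of the Empath lexicon feature vector
NUM_LABELS = 11    # one output neuron per emotion
THRESHOLD = 0.33   # decision threshold on the tanh output


def predict_emotions(h_cls, empath, weights, bias, threshold=THRESHOLD):
    """Return one 0/1 decision per emotion from the concatenated features."""
    if len(h_cls) != HIDDEN_DIM or len(empath) != EMPATH_DIM:
        raise ValueError("unexpected feature size")
    features = list(h_cls) + list(empath)      # 768 + 194 = 962 values
    decisions = []
    for row, b in zip(weights, bias):          # one row of weights per label
        score = math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        decisions.append(1 if score > threshold else 0)
    return decisions


# Toy usage with untrained (all-zero) weights, so only the bias matters.
h_cls = [0.0] * HIDDEN_DIM
empath = [0.0] * EMPATH_DIM
weights = [[0.0] * (HIDDEN_DIM + EMPATH_DIM) for _ in range(NUM_LABELS)]
bias = [0.5, 0.2] + [-1.0] * (NUM_LABELS - 2)
labels = predict_emotions(h_cls, empath, weights, bias)
```

With these toy values only the first label fires, since tanh(0.5) is about 0.46 while tanh(0.2) is about 0.20. Because each neuron is thresholded independently, a tweet can receive several emotions or none.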

A variant of our model with BERT-Base [6] as the backbone is referred to as EC BERT. Among our non-BERT variants, EC CNN consists of five parallel 1-D convolutions, with kernel sizes ranging from 2 to 6 and the ReLU activation function, each followed by a 1-D max-pooling layer. All five outputs are merged to obtain a single representation vector for the tweet. EC LSTM consists of a single layer of unidirectional LSTM cells with 256 hidden units. The hidden state from the last LSTM cell is taken as the tweet representation vector. For both of these models, the obtained tweet vector is passed through a hidden layer of 128 neurons followed by a fully connected layer with 11 output neurons. Labels at each neuron are predicted with the sigmoid activation function.
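To make the EC CNN encoder more concrete, the sketch below runs five parallel 1-D convolutions with kernel sizes 2 to 6 over a sequence of word vectors, applies ReLU, pools each output, and concatenates the results. It is a simplified illustration in plain Python, not the trained model. The number of filters per kernel size, the use of global max pooling over time, and the names `conv_relu_maxpool` and `encode_tweet` are assumptions, since the text does not specify them.

```python
KERNEL_SIZES = (2, 3, 4, 5, 6)  # the five parallel convolution widths


def conv_relu_maxpool(embeddings, kernel):
    """Slide one filter (k rows of d weights) over the tokens, apply ReLU,
    and keep the maximum activation over all positions."""
    k = len(kernel)
    best = 0.0  # ReLU outputs are never below zero
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        activation = sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def encode_tweet(embeddings, filters):
    """filters maps kernel size -> list of kernels; outputs are concatenated."""
    vector = []
    for size in KERNEL_SIZES:
        for kernel in filters[size]:
            vector.append(conv_relu_maxpool(embeddings, kernel))
    return vector


# Toy usage: six tokens with 1-dimensional embeddings, one all-ones filter
# per kernel size (hypothetical values chosen to be easy to check by hand).
tokens = [[1.0], [2.0], [3.0], [-1.0], [0.5], [2.0]]
filters = {size: [[[1.0]] * size] for size in KERNEL_SIZES}
tweet_vector = encode_tweet(tokens, filters)
```

In this toy case each filter simply sums a window of tokens, so `tweet_vector` is `[5.0, 6.0, 5.0, 6.5, 7.5]`: the largest window sum for each kernel size.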

Apart from Macro-F1 and Micro-F1, both [19] and [23] consider Jaccard Accuracy as the principal metric for evaluating models designed for multi-label emotion detection. [23] additionally considers Label Ranking Average Precision (LRAP), Hamming Loss, and a relaxed accuracy measure called Weak Accuracy, or just "Accuracy", to evaluate the models. Table 1 compares our results with those of the baselines for the task of multi-label emotion classification when the models are trained on the AIT dataset. An ablation study performed on EC CNN demonstrates the advantage of using richer word representations such as GloVe [20] over Word2Vec [17]. It further establishes the importance of using affect representations from text, such as Empath embeddings, for sentiment analysis tasks. Among our non-BERT variants, EC LSTM performs the best. We obtain our best results with EC ROBERTA when trained end-to-end for 5 epochs with the AdamW optimizer [16] and a learning rate of 2e-5. As observed, we comfortably outperform the scores of NTUA-SLP [4], which achieved the top rank for this subtask as part of SemEval-2018 Task 1.
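For reference, the sketch below shows one common way to compute Jaccard Accuracy, Hamming Loss, Micro-F1, and Macro-F1 from binary label matrices. It is an illustrative standard-library implementation, not the official evaluation script. Scoring a tweet with no gold and no predicted labels as 1 for Jaccard Accuracy, and scoring an undefined per-label F1 as 0, are assumptions; LRAP is omitted because it needs ranking scores rather than hard decisions.

```python
def jaccard_accuracy(gold, pred):
    """Mean over tweets of |G intersect P| / |G union P|."""
    total = 0.0
    for g, p in zip(gold, pred):
        g_set = {i for i, v in enumerate(g) if v}
        p_set = {i for i, v in enumerate(p) if v}
        union = g_set | p_set
        # Assumption: an empty gold set matched by an empty prediction is 1.
        total += 1.0 if not union else len(g_set & p_set) / len(union)
    return total / len(gold)


def hamming_loss(gold, pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(gv != pv for g, p in zip(gold, pred) for gv, pv in zip(g, p))
    return wrong / (len(gold) * len(gold[0]))


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: undefined F1 -> 0


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) over all labels."""
    counts = []
    for j in range(len(gold[0])):
        tp = sum(1 for g, p in zip(gold, pred) if g[j] and p[j])
        fp = sum(1 for g, p in zip(gold, pred) if not g[j] and p[j])
        fn = sum(1 for g, p in zip(gold, pred) if g[j] and not p[j])
        counts.append((tp, fp, fn))
    micro = _f1(*(sum(c[i] for c in counts) for i in range(3)))
    macro = sum(_f1(*c) for c in counts) / len(counts)
    return micro, macro


# Toy example with three tweets and three labels.
gold = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
pred = [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
jacc = jaccard_accuracy(gold, pred)
ham = hamming_loss(gold, pred)
micro_f1, macro_f1 = f1_scores(gold, pred)
```

In the toy example, Jaccard Accuracy, Micro-F1, and Macro-F1 all come out to 2/3, and Hamming Loss to 2/9.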

With a better Macro-F1 score, we narrowly surpass the results of the current top-ranked system on the leaderboard for this subtask, with the competition currently running in its post-evaluation phase. As can be observed from Table 2, EC ROBERTA, when trained on the SenWave dataset, further achieves state-of-the-art results on four out of six metrics, including Jaccard Accuracy, when compared with [23].

Since there were only three reported cases of COVID-19 in India before March 2020, and there are very few tweets in Twitter_IN for the months of January and February, we consider all tweets posted on or after March 1st 2020 for our analysis. We first use our EC ROBERTA classifier, trained on the SenWave dataset, to predict the emotions for all tweets under consideration from the Twitter_IN dataset. A few tweets with their predicted emotions are listed in Table 3. Figure 2 shows the percentage distribution of tweets, on a weekly basis, containing one or more of a total of nine emotions. We observe that annoyed and thankful show contrasting trends. The first and second peaks of annoyance correspond to the declaration of the nationwide lockdown and the Tablighi Jamaat gatherings, respectively. Although people initially showed their gratitude towards the doctors and health workers for their efforts in dealing with such unprecedented challenges, the sense of thankfulness gradually declined as people started to feel more helpless and anxious with the growing number of cases each day. Here, we highlight that the number of tweets posted varies greatly across weeks. In Figure 3, we therefore create bins of unequal duration, each containing exactly 5000 tweets (except the last one, which contains around 4,300 tweets), and observe the trends for the six most relevant emotions. We observe that the trends of optimism and pessimism complement each other well as people gradually start to adjust to the new normal of life. In order to extract relevant factors or triggers affecting public sentiments and emotions over time, we make use of ABAE, an unsupervised autoencoder-based neural topic model proposed by Ruidan He et al. in [9].
We separately run the ABAE algorithm with the tweets of each of the six emotions captured in Table 4 and report their top contributing aspects. The extracted aspect terms for each emotion are filtered and assigned meaningful sub-categories by means of a many-to-many mapping. In Figure 4, we plot the trends of the subcategories for annoyed (Fig. 4a) and optimism (Fig. 4b).

Table 3: Tweets with their predicted emotions.
• Tweet: it is a serious matter and thousands of students live are in danger due to increasing cases of COVID19 Cancel exams and promote the students on moderation policy. Predicted emotions: Optimistic
• Tweet: Media is so obsessed with a particular community that they even misspell coronavirus. Predicted emotions: Annoyed, Joking, Surprise
• Tweet: Very nice Measures taken for The Development Of Economy keeping in view the safest side. Predicted emotions: Thankful, Optimistic
• Tweet: The first Covid 19 positive from Meghalaya Dr John Sailo Rintathiang passed away early this morning Sailo 69 who was also the owner of Bethany hospital was tested positive on April 13 2020. Predicted emotions: Sad, Official Report
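The fixed-size binning used for Figure 3 can be sketched as follows: tweets are sorted by time, split into consecutive bins of 5000 tweets, and the share of tweets carrying each emotion is computed per bin. This is a minimal illustration, not the actual analysis code. The function name `emotion_share_per_bin` and the representation of predictions as sets of emotion names are assumptions.

```python
from collections import Counter


def emotion_share_per_bin(predictions, bin_size=5000):
    """predictions: chronologically ordered list of sets of emotion names.

    Returns one dict per bin mapping emotion -> percentage of tweets in that
    bin labeled with it. The last bin may hold fewer than bin_size tweets.
    """
    shares = []
    for start in range(0, len(predictions), bin_size):
        chunk = predictions[start:start + bin_size]
        counts = Counter(emotion for labels in chunk for emotion in labels)
        shares.append({e: 100.0 * c / len(chunk) for e, c in counts.items()})
    return shares


# Toy usage with a bin size of 2 instead of 5000.
toy_predictions = [
    {"annoyed"},
    {"annoyed", "sad"},
    {"thankful"},
    set(),
    {"sad"},
]
trend = emotion_share_per_bin(toy_predictions, bin_size=2)
```

Because a tweet can carry several emotions, the percentages within a bin need not sum to 100. Binning by tweet count rather than by week keeps every point on the trend line backed by the same number of tweets.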

In Fig. 4a, we observe a clear peak from March 28th 2020 to April 8th 2020 due to the religious gatherings of a certain community, which triggered widespread criticism and hatred from the public. In Fig. 4b, the plots show a high level of community gratitude in general, with occasional peaks which may be attributed to events aimed at raising solidarity among the public. For the technological measures, we see a gradual increase and a peak near the launch date of the Aarogya Setu app, developed by the Indian Government to help our citizens in this pandemic.

To summarize, we first build a multi-label emotion classifier that achieves state-of-the-art results on two benchmark datasets, AIT (non-COVID) and SenWave (COVID-specific). We then use our trained model to predict the emotions from India-specific tweets posted between March 1st 2020 and July 4th 2020. Using these predictions, we study the evolution of emotions among Indians towards the ongoing COVID-19 pandemic. We further study the development of salient factors contributing towards the changes in attitudes and draw interesting inferences. In the future, we would like to extend our work with the Valence-Arousal-Dominance (VAD) model and take a multi-task approach to building our classifiers.

Moral Panic through the Lens of Twitter: An Analysis of Infectious Disease Outbreaks

Yuning Ding, and Gerardo Chowell. 2020. A large-scale COVID-19 Twitter chatter dataset for open scientific research -an international collaboration

NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning

Gaurav Nemade, and Sujith Ravi. 2020. GoEmotions: A Dataset of Fine-Grained Emotions

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Empath: Understanding Topic Signals in Large-Scale Text

Mental health problems and social media exposure during COVID-19 outbreak

An Unsupervised Neural Attention Model for Aspect Extraction

COVID 19: Impact of lock-down on mental health and tips to overcome

CoronaVis: A Real-time COVID-19 Tweets Analyzer

Measuring Emotions in the COVID-19 Real World Worry Dataset

Detecting Signs of Depression in Tweets in Spanish: Behavioral and Linguistic Analysis

Analyzing COVID-19 on Online Social Media: Trends, Sentiments and Emotions

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Decoupled Weight Decay Regularization

Distributed Representations of Words and Phrases and their Compositionality

SemEval-2018 Task 1: Affect in Tweets

Glove: Global Vectors for Word Representation

Machine learning on Big Data from Twitter to understand public reactions to COVID-19