key: cord-1032910-edee95m7 authors: nan title: Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning on COVID-19 Related Tweets date: 2020-09-28 journal: IEEE Access DOI: 10.1109/access.2020.3027350 sha: 6bc6374c5cf6d414b086c9fe3562c8939158497b doc_id: 1032910 cord_uid: edee95m7

How different cultures react and respond to a crisis is largely determined by a society’s norms and its political will to combat the situation. Often, the decisions made are necessitated by events, social pressure, or the need of the hour, and may not represent the nation’s will. While some are pleased with them, others might show resentment. Coronavirus (COVID-19) brought a similar mix of emotions from nations towards the decisions taken by their respective governments. Over the past couple of months, social media was bombarded with posts containing both positive and negative sentiments on COVID-19, the pandemic, the lockdown, and related hashtags. Despite being geographically close, many neighboring countries reacted differently from one another. For instance, Denmark and Sweden, which share many similarities, stood poles apart in the decisions taken by their respective governments. Yet, their nations’ support was mostly unanimous, unlike the South Asian neighboring countries, where people showed a lot of anxiety and resentment. The purpose of this study is to analyze the reaction of citizens from different cultures to the novel Coronavirus and people’s sentiment about the subsequent actions taken by different countries. Deep long short-term memory (LSTM) models used for estimating the sentiment polarity and emotions from extracted tweets have been trained to achieve state-of-the-art accuracy on the sentiment140 dataset. The use of emoticons provided a unique and novel way of validating the supervised deep learning models on tweets extracted from Twitter.
The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.

The world is seeing a paradigm shift in the way we conduct our daily activities amid the ongoing coronavirus pandemic, be it online learning, the way we socialize and interact, or the way we conduct business and shop. Such global catastrophes have a direct effect on our social life; however, not all cultures react and respond to a crisis in the same way. Even under normal circumstances, research suggests that people across different cultures reason differently [1]. For instance, Nisbett, in his book ''The geography of thought: How Asians and Westerners think differently... and why'', stated that East Asians think on the basis of their experience, dialectically and holistically, while Westerners think logically, abstractly, and analytically [2]. This cultural behavior and attitude are governed by many factors, including the socio-economic situation of a country, its faith and belief system, and lifestyle. In fact, the COVID-19 crisis revealed large cultural differences between countries that seem alike with respect to language, shared history, and culture. For example, even though Denmark and Sweden are two neighboring countries that speak almost the same language and share a lot of culture and history, they stand at extreme ends of the spectrum in how they reacted to the coronavirus [3]. Denmark and Norway imposed more robust lockdown measures, closing borders, schools, and restaurants and restricting gatherings and social contact, while Sweden took a relaxed approach to the outbreak, keeping its schools, restaurants, and borders open. Social media platforms play an essential role during an extreme crisis, as individuals use these communication channels to share ideas, opinions, and reactions with others in order to cope with and react to the crisis.
Therefore, in this study, we focus on exploring collective reactions to events expressed on social media. Particular emphasis is given to analyzing people's reactions to global health-related events, especially the COVID-19 pandemic, as expressed on Twitter, because of the platform's widespread popularity and the ease of access through its API. To this end, tweets posted by thousands of Twitter users within four weeks of the start of the corona crisis are analyzed to understand how different cultures were reacting and responding to the coronavirus. Additionally, an extended version of a publicly available tweets dataset was also used. A new model for sentiment and emotion analysis is proposed.

Distinguishing emotions and classifying them into distinctive groups and categories is an area of research widely studied in affective science, leading to several theories and models. Emotion classification models fall into two groups: (i) those that treat emotions as discrete and (ii) those that place emotions on a dimensional basis. Discrete emotion theory, like the one presented by Tomkins [4], identifies eight discrete emotions: surprise, interest, joy, rage, fear, disgust, shame, and anguish. These emotions are thought to be cross-culturally recognizable. That is, these basic emotions are biologically determined emotional responses whose expression and recognition are the same for all individuals regardless of ethnic or cultural differences [5]. Further experiments conducted by Paul Ekman in a cross-cultural study identified six basic emotions: anger, disgust, fear, happiness, sadness, and surprise [6]. According to Ekman, these emotions possess particular characteristics that allow them to be expressed to varying degrees. Dimensional models, on the other hand, group emotions according to one or more dimensions such as arousal, valence, and intensity.
These include the PANA model by Watson and Tellegen [7], the Circumplex model by Russell [8], the PAD emotional state model, and Plutchik's model [9]. Plutchik's model is the most renowned three-dimensional hybrid of both basic and complex categories, in which emotions with varying intensities can be combined to form emotional dyads. He arranged eight emotions (joy, trust, fear, surprise, sadness, disgust, anger, and anticipation) in a wheel and combined them into twenty-four primary, secondary, and tertiary dyads, as shown in Figure 1. This study is limited to six primary emotions (joy, surprise, sad, fear, anger, and disgust), as shown in Figure 2.

FIGURE 1. Plutchik's wheel of emotion depicting the relationship between primary and related emotions [9].

The model in this work takes advantage of natural language processing (NLP) and deep neural networks and comprises two main stages. The first stage involves a sentiment polarity classifier that classifies tweets as positive or negative. The output of the first stage is then used as input to an emotion classifier that aims to assign a tweet either to one of the positive emotion classes (joy and surprise) or to one of the negative emotion classes (sad, disgust, fear, anger). Figure 2 shows the abstract model of the proposed system for sentiment and emotion analysis on tweets' text. Our primary objective with this study is to understand how different cultures behave and react given a global crisis. The state of the questions addressed about cultural differences as a techno-social system reveals potentialities in societal attitudinal, behavioral, and emotional predictions.
In the present investigation, to examine the behavioral and emotional factors that describe how societies react under different circumstances, the general objective is to analyze the potential of NLP-based sentiment and emotion analysis techniques for answering the following research questions (RQ).

The contributions of this study are:
• Proposed a multi-layer LSTM assessment model for classifying both sentiment polarity and emotions.
• Achieved state-of-the-art accuracy on the Sentiment140 polarity assessment dataset.
• Validation of the model for emotions expressed via emoticons.
• Provide interesting insights into collective reactions to the coronavirus outbreak on social media.

The rest of the article is organized as follows. Section II presents the research design and study dimensions. Related work is presented in section III. The data collection procedure and data preparation steps are described in section IV, whereas the sentiment and emotion analysis model is presented in section V. Section VI presents the results, followed by discussion and analysis in section VII. Lastly, section VIII concludes the paper with potential future research directions.

The study is conducted using a quantitative (experimental) research methodology on users' tweets posted after the onset of the corona crisis. The investigation required collecting users' posts on Twitter from early February 2020, when the first few cases were reported worldwide and in each respective country, until the end of April 2020, a period of ten to twelve weeks. The reason for using only the initial few weeks is that people usually get accustomed to the situation over time, and an initial phase is enough to grasp the general behavior of the masses towards a crisis and the policies adopted by the respective governments. Several measurements taken during data collection require cataloging for training the deep learning models and for further analysis. These are discussed in the next subsection.
The following dimensions are used to facilitate the interpretation of the results:
• Demography-(d): country / region under study. This study focuses on two neighbouring countries from South Asia, two from the Nordic region, and two from North America.
• Timeline-(t): the days from the initial reported cases in the country up to 4-12 weeks.
• Culture-(c): East (South-East Asia) vs. West (Nordic/America).
• Polarity-(p): sentiment classified as either positive or negative.
• Emotions-(e): feelings expressed as joy, surprise (astonished), sad, disgust, fear and anger.
• Emoticons-(et): emotions expressed through graphics for the emotions listed above.

Python scripts are used to query the Twitter API through Tweepy to fetch users' tweets and extract the feature set for cataloging. NLTK is used to preprocess the retrieved tweets. NLP-based deep learning models are developed to predict sentiment polarity and users' emotions using TensorFlow and Keras as the back-end deep learning engine. The Sentiment140 and Emotional Tweets datasets are used to train Classifier A and Classifiers B/C, respectively, as discussed in section V. Visualization, LSTM model predictions, and correlation are used as instruments to analyze the results. The results of sentiment and emotion recognition are validated through an innovative approach that exploits emoticons extracted from the tweets, a widely accepted means of expressing one's feelings.

Deep learning models for sentiment detection are employed in this study. A deep neural network (DNN) consists of an input layer, an output layer, and a set of hidden layers with multiple nodes. The training process of a DNN consists of a pre-training step and a fine-tuning step.
The pre-training step consists of weight initialization in an unsupervised manner via a generative deep belief network (DBN) on the input data [10], followed by network training in a greedy way by taking two layers at a time as a restricted Boltzmann machine (RBM), given as:

$$E(v, h) = \sum_{k} \frac{(v_k - a_k)^2}{2\sigma_k^2} - \sum_{l} b_l h_l - \sum_{k}\sum_{l} \frac{v_k}{\sigma_k} w_{kl} h_l \tag{1}$$

where $\sigma_k$ is the standard deviation, $w_{kl}$ is the weight value connecting the visible units $v_k$ and the hidden units $h_l$, and $a_k$ and $b_l$ are the biases for the visible and hidden units, respectively. Equation 1 represents the energy function for the Gaussian-Bernoulli RBM. The joint probability of the hidden and visible units is defined as:

$$p(v, h) = \frac{e^{-E(v, h)}}{Z} \tag{2}$$

where $Z$ is the normalizing constant over all configurations of $v$ and $h$. A contrastive divergence algorithm is used to estimate the trainable parameters by maximizing the expected log probability [10], given as:

$$\theta^{*} = \arg\max_{\theta} \; \mathbb{E}\left[\log \sum_{h} p(v, h)\right] \tag{3}$$

where $\theta$ represents the weights, biases and standard deviation. In the fine-tuning step, the network parameters are adjusted in a supervised manner using the back-propagation technique. Back-propagation provides an expression for the partial derivative $\partial C / \partial w$ of the cost function $C$ with respect to any weight $w$ (or bias $b$) in the network. The quadratic cost function can be defined as:

$$C = \frac{1}{2n} \sum_{x} \lVert y(x) - \alpha^{L}(x) \rVert^{2} \tag{4}$$

where $n$ is the total number of training examples, the sum runs over the training samples $x$, $y = y(x)$ is the corresponding desired output, $L$ denotes the number of layers in the network, and $\alpha^{L} = \alpha^{L}(x)$ is the vector of activations output from the network when $x$ is input.

The proposed sentiment assessment model employs LSTM, which is a variant of a recurrent neural network (RNN). LSTMs help preserve the error that can be back-propagated through time and layers. They allow an RNN to keep learning over many time steps by maintaining a constant error. An RNN maintains memory, which distinguishes it from feedforward networks. LSTMs hold information outside the normal flow of the RNN in a gated cell. The process of carrying memory forward can be expressed mathematically as:

$$h_t = \phi(W x_t + U h_{t-1}) \tag{5}$$

where $h_t$ is the hidden state and $x_t$ is the input at time $t$.
W is the weight matrix, and U is the transition matrix. φ is the activation function.

There is a large body of literature concerning people's reactions to events expressed in social media, which can generally be distinguished by the type of event the response relates to and by the aim of the study [11]. Types of events cover natural disasters, health-related events, criminal and terrorist events, and protests, to name a few. The recently emerging field of sentiment analysis and affective computing deals with exploiting social media data to capture public opinion about political movements, responses to marketing campaigns, and many other social events [12]. Studies have been conducted for various purposes, including examining the spreading pattern of information on Twitter about Ebola [13] and the coronavirus outbreak [14], tracking and understanding public reaction during pandemics on Twitter [15], [16], investigating the insights that Global Health can draw from social media [17], and conducting content and sentiment analysis of tweets [18].

Sentiment analysis on Twitter data has been an area of wide interest for more than a decade. Researchers have performed sentiment polarity assessment on Twitter data for various application domains, such as donations and charity [19], students' feedback [20], stocks [21]-[23], predicting elections [24], and understanding various other situations [25]. Most approaches found in the literature perform lexicon-based sentiment polarity detection via a standard NLP pipeline (pre-processing steps) and POS tagging for SentiWordNet, MPQA, SenticNet or other lexicons. These approaches compute the polarity score of a tweet's text as the sum of the polarities conveyed by the micro-phrases $m$ that compose it [26], where the contribution of each term is scaled by a weight $w_{pos}(term_j)$ that is greater than 1 if $pos(term_j)$ is an adverb, verb, or adjective, and 1 otherwise.
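The weighted lexicon scoring described above can be illustrated with a minimal Python sketch. The toy lexicon and its scores, the tag names, and the emphasis weight of 1.5 are assumptions made for this example only; the text states just that the weight exceeds 1 for adverbs, verbs, and adjectives, and a real system would take prior polarities from a lexicon such as SentiWordNet, MPQA, or SenticNet.

```python
# Toy prior-polarity lexicon (hypothetical scores, for illustration only).
LEXICON = {"good": 0.6, "awesome": 0.9, "bad": -0.6, "scary": -0.7, "lockdown": -0.2}

# Part-of-speech tags whose terms are emphasized (tag names are assumed).
EMPHASIZED_POS = {"ADV", "VERB", "ADJ"}
EMPHASIS_WEIGHT = 1.5  # assumed value; the text only requires it to be > 1


def pos_weight(pos_tag):
    """Return the weight w_pos: > 1 for adverbs, verbs, adjectives, else 1."""
    return EMPHASIS_WEIGHT if pos_tag in EMPHASIZED_POS else 1.0


def micro_phrase_polarity(tagged_terms):
    """Sum the weighted prior polarities of the (term, pos_tag) pairs in one micro-phrase."""
    return sum(LEXICON.get(term.lower(), 0.0) * pos_weight(tag)
               for term, tag in tagged_terms)


def tweet_polarity(micro_phrases):
    """Polarity of a tweet as the sum over its micro-phrases."""
    return sum(micro_phrase_polarity(phrase) for phrase in micro_phrases)


# Example input: a tweet already split into two POS-tagged micro-phrases.
example = [
    [("lockdown", "NOUN"), ("is", "VERB"), ("scary", "ADJ")],
    [("stay", "VERB"), ("good", "ADJ")],
]
score = tweet_polarity(example)
label = "positive" if score > 0 else "negative"
```

Terms missing from the lexicon contribute zero, so the sign of the final score is driven only by the opinion-bearing words and their part-of-speech emphasis.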
The abundance of literature on the subject led Kharde and Sonawane [27] and others [28]-[30] to present surveys of conventional machine learning and lexicon-based methods and of deep learning-based techniques, respectively, for analyzing tweets for polarity assessment, i.e., positive, negative, and neutral. The authors in [31] address the issue of spreading public concern about epidemics using Twitter data. A sentiment classification approach comprising two steps is used to measure people's concerns. The first step distinguishes personal tweets from the news, while the second step separates negative from non-negative tweets. To achieve this, two main types of methods were used: 1) an emotion-oriented, clue-based method to automatically generate training data, and 2) three different Machine Learning (ML) models, compared to determine the one that gives the best accuracy. Akhtar et al. [32] proposed a stacked ensemble method for predicting the intensity present in an opinion. For example, the proposed model is able to differentiate the positive intensities of 'good' and 'awesome'. The authors propose three models based on a convolutional neural network (CNN), long short-term memory (LSTM), and a gated recurrent unit (GRU) to evaluate emotion in the generic domain and sentiment in the financial domain. Exploratory sentiment classification in the context of COVID-19 tweets is investigated in the study conducted by Samuel et al. [33]. Two machine learning techniques, namely Naïve Bayes and Logistic Regression, are used to classify positive and negative sentiment in tweets. Moreover, the performance of these two algorithms for sentiment classification is tested using two groups of data containing tweets of different lengths. The first group comprises shorter tweets with fewer than 77 characters, and the second one contains longer tweets with fewer than 120 characters.
Naïve Bayes achieved an accuracy of 91.43% for shorter tweets and 57.14% for longer tweets, whereas Logistic Regression performed worse, with an accuracy of 74.29% for shorter tweets and 52% for longer tweets. The authors in [34] explore the Twitter sentiment of Indians after the lockdown imposed in response to the COVID-19 outbreak. A total of 24,000 tweets collected from March 25th to March 28th, 2020 using the two prominent keywords #IndiaLockdown and #IndiafightsCorona are used for analysis. The results revealed that even though negative sentiments were expressed about the lockdown, tweets containing positive sentiments were clearly present. Hasan et al. [35] utilized the Circumplex model, which characterizes affective experience along two dimensions, valence and arousal, for detecting emotions in Twitter messages. The authors build a lexicon dictionary of emotions from the emotional words in LIWC (Linguistic Inquiry & Word Count). They extracted uni-grams, emoticons, negations and punctuation as features to train conventional machine learning classifiers in a supervised manner. They achieved an accuracy of 90% on tweets. The study conducted by Fung et al. [36] examines how people reacted to the Ebola outbreak on Twitter and Google. A random sample of tweets is examined, and the results showed that many people expressed negative emotions, anxiety, and anger, at levels higher than those expressed for influenza. The findings also suggested that Twitter can provide valuable information on people's anxiety, anger, or negative emotions, which could be used by public authorities and health practitioners to provide relevant and accurate information related to the outbreak. The authors in [37] investigate people's emotional response during the Middle East Respiratory Syndrome (MERS) outbreak in South Korea. They used eight emotions to analyze people's responses.
Their findings revealed that 80% of the tweets were neutral, while anger and fear dominated the tweets concerning the disease. Moreover, anger increased over time, mostly directed at the Korean government, while fear and sadness responses declined over time. This observation, as per the authors, was understandable, as the government was taking strict actions to prevent infection and the number of new MERS cases decreased as time went by. Another important finding was that surprise, disgust, and happiness remained more or less constant. A similar study is conducted by the researchers in [14]. The study focuses on emotional reactions during the COVID-19 outbreak by exploring tweets. A random sample of 18,000 tweets is examined for positive and negative sentiment along with eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The findings showed an almost equal number of positive and negative sentiments, as most of the tweets contained both panic and comforting words. Fear was the number one emotion that dominated the tweets, followed by trust in the authorities. Emotions such as sadness and anger were also prevalent.

We used two tweet datasets in this study for sentiment polarity detection and emotion recognition: the trending hashtag (#) data that we collected ourselves, explained in subsection IV-A, and the Kaggle dataset presented in subsection IV-B. We additionally used the Sentiment140 [38] and Emotional Tweets [39] datasets to train our proposed deep learning models. The reasons for using these two particular datasets for training are (i) the availability of manually labeled state-of-the-art datasets and (ii) the lack of labeled tweets extracted from Twitter. The focus of our study is six neighboring countries from three continents having similar cultures and circumstances. These include Pakistan, India, Norway, Sweden, USA, and Canada.
We specifically opted for these six countries for cross-cultural analysis due to their size, the approaches adopted by their respective governments, their popularity, and their cultural similarity. We retrieved and collected trending hashtag (#) tweets ourselves because of the lack of publicly available datasets for the initial period of the COVID-19 outbreak. For instance, #lockdown was trending across the globe during February 2020; #StayHome was trending in Sweden, while COVID-19 was trending throughout the period February - April 2020. Figure 3 shows the total number of tweets per country for trending hashtags between 3rd February and 29th February 2020. For this study, we only retrieved the trending hashtag tweets from the six countries mentioned earlier for the initial phase of the pandemic. The standard Twitter search API, accessed through Tweepy, is used to fetch users' tweets. Multiple queries are executed via Tweepy containing the trending keywords #Coronavirus, #COVID_19, #COVID19, #COVID19Pandamic, #Lockdown, #StayHomeSaveLives, and #StayHome for the period T_p running from S_d to E_d, where S_d is the starting date, i.e., the date when the first corona case was reported in a given country/region, and E_d is the end date. The keywords are chosen based upon the trending keywords during T_p. Only tweets in English for a given region are cataloged for further processing, containing Tweet ID, text, user name, time, and location. The PRISMA approach is adopted in this study to query COVID-19 related tweets and to filter out the irrelevant ones. The following pre-processing steps are applied to clean the retrieved tweets:
1) Removal of mentions and colons from tweet text.
2) Replacement of consecutive non-ASCII characters with a space.
3) Tokenization of tweets.
4) Removal of stop-words and punctuation via the NLTK library.
5) Tokens are appended to obtain cleaned tweets.
6) Extraction of emoticons from tweets.
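The six cleaning steps listed above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' script: it uses only the standard library, so the small `STOP_WORDS` set stands in for the NLTK stop-word list, and the regular expressions (including the emoji ranges) are assumptions. Emoticons are extracted from the raw text before step 2, since replacing non-ASCII characters would otherwise erase them.

```python
import re
import string

# Small stand-in for the NLTK English stop-word list (assumed subset).
STOP_WORDS = {"a", "an", "the", "is", "are", "at", "in", "on", "of", "to", "and", "we"}
PUNCTUATION = set(string.punctuation)

MENTION_RE = re.compile(r"@\w+")                 # @user mentions
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")      # runs of non-ASCII characters
TOKEN_RE = re.compile(r"#?\w+|[^\w\s]")          # words, hashtags, single punctuation marks
# Assumed emoji ranges covering common pictographs and symbols.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_tweet(text):
    """Return (cleaned_text, emoticons) for one raw tweet."""
    emoticons = EMOJI_RE.findall(text)                     # step 6, done on the raw text
    text = MENTION_RE.sub("", text).replace(":", "")       # step 1
    text = NON_ASCII_RE.sub(" ", text)                     # step 2
    tokens = TOKEN_RE.findall(text)                        # step 3
    kept = [t for t in tokens                              # step 4
            if t.lower() not in STOP_WORDS and t not in PUNCTUATION]
    return " ".join(kept), emoticons                       # step 5


cleaned, emoticons = clean_tweet("@WHO: We are staying at home \U0001F600 #StayHome!")
```

Keeping the emoticons as a separate field, rather than dropping them with the other non-ASCII characters, is what later allows them to serve as weak labels for validation.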
The following items are cataloged for each tweet: Tweet ID, Time, Original Text, Cleaned Text, Polarity, Subjectivity, User Name, User Location and Emoticons. A total of 27,357 tweets were extracted after pre-processing and filtering, as depicted in Table 1.

Since data preparation left us with only a small number of tweets from the Nordic countries, we further included tweets for the period of March to April 2020 from a publicly available dataset. Table 1 shows the number of tweets per country under consideration for the Kaggle dataset from 12th March to 30th April 2020. The total number of tweets is 460,286, out of which USA tweets contribute 73%. The hashtags applied to retrieve the Kaggle dataset tweets include #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #covid19, and #covid_19. From 17th March until the end of April, two more hashtags were included, i.e., #epitwitter and #ihavecorona.

We used the Sentiment140 dataset from Stanford [38] for training our sentiment polarity assessment classifier, Classifier A, presented in section V-A. The dataset includes two class labels, positive and negative. Each label contains 0.8 million tweets, for a total of 1.6 million tweets. We opted for this dataset to train our deep learning models in a supervised manner because of the unavailability of labeled tweets related to COVID-19.

The Emotional Tweets dataset is utilized in this study to train Classifier B and Classifier C for emotion recognition, described in V-B and V-C, respectively. The tagging process of this dataset is reported by Saif et al. in [39]. The dataset contains six classes, as summarized in Table 2. The first two labels, joy and surprise, are positive emotions, whereas the remaining four, sadness, fear, anger, and disgust, are negative emotions. The dataset comprises a total of 21,051 labeled tweets.
The literature reports many attempts at sentiment analysis of tweets, but very few at emotion classification. Sentiment analysis on tweets refers to the classification of an input tweet text into sentiment polarities, including positive, negative and neutral, whereas emotion classification refers to classifying tweet text into emotion labels, including joy, surprise, sadness, fear, anger and disgust. Sentiment polarity certainly conveys meaningful information about the subject of the text; however, emotion classification is the next level. It indicates, if the sentiment about the subject is negative, to what extent it is negative: being negative with anger is a different state of mind than being negative and disgusted. Therefore, it is important to extend the task of sentiment polarity classification to the next level and identify the emotion within negative and positive sentiment polarities. Although the literature reports many attempts at sentiment polarity detection as well as emotion analysis, to the best of our knowledge it does not present any attempt that combines both tasks in a two-stage architecture as proposed in this article. The rest of this section explains the working of each of the components in the abstract model depicted in Figure 2. All the models and Jupyter Notebooks developed for this article are available on the paper's GitHub repository.

The first stage classifier in our model classifies an input tweet text as having either positive or negative polarity. For this, we employed the Sentiment140 dataset explained in section IV-C, the most popular dataset for such polarity classification tasks. For developing our first stage model, we padded each input tweet to ensure a uniform size of 280 characters, which is the standard maximum tweet size. To establish a baseline, a simple deep neural network based on an embedding layer, a max-pooling layer, and three dense layers of 128, 64, and 32 outputs was developed.
The last layer uses sigmoid as its activation function, as it performs better in binary classification, whereas all intermediate layers use ReLU. For this baseline model, the 1.6 million tweets are split into training and test sets, with 10% of the tweets (160,000 tweets) set aside for testing the model. The remaining 90% of the tweets were further divided in a 90/10 ratio for training and model validation, respectively. Model training and validation were set to ten epochs; however, the model overfits immediately after two epochs, so it was retrained for two epochs to avoid overfitting. The training and validation accuracy of the baseline model was 96% and 81%, respectively. Table 3 summarizes the training and validation accuracy for each of the five proposed models along with the model structures. Figure 4 shows the structure of the best performing model, i.e., the LSTM with FastText model. Table 4 shows the F1 and accuracy scores on the test set, i.e., 10% of the dataset, comprising 160,000 tweets equally divided into positive and negative polarities. The table also presents the previously best-reported accuracy and F1 score on the dataset, as reported in [40]. The model proposed in this article based on FastText outperforms all other models, including the previously best-reported accuracy. Therefore, we choose this model as our first stage classifier to classify tweets into positive and negative polarities. The author of [40] uses different variants of convolutional neural networks with different word embeddings, including BERT; the best accuracy reported is 81.1%, using a CNN with pre-trained word embeddings. Mohammad et al. [41] also collected tweets from different sources, including Sentiment140, and applied SVMs to features extracted from tweets, such as n-grams, emoticons, and number of hashtags. They reported an F1 score of 69.02%.

Once the polarity from Classifier A is positive, the next step is to identify positive emotions in the tweet.
In order to extract tweet emotions, we use the Emotional Tweets dataset presented in section IV-D. If the label from the first stage Classifier A is positive, the text is passed to Classifier B to determine the exact positive emotion, joy or surprise. In order to extract positive emotions from the positive tweets, the negative emotion labels were removed for Classifier B, leaving only two positive labels, joy and surprise. Repeating the same experiments as for Classifier A, the performance of five models was tested on this classification task. The test accuracy for each of these models is reported in Table 5. The model based on the Glove.twitter.27B.300d pre-trained embedding with LSTM outperforms the other four models; therefore, we use LSTM with GloVe embedding at this stage.

The final classifier at the second stage is Classifier C, which classifies negative polarity tweets into negative emotions. As reported in Table 2, there are four labels in the negative emotion category; however, we drop the fourth category, disgust, as it has very few instances and the resulting class imbalance degrades performance. We performed experiments with the remaining three labels on our five models. Table 6 summarises the models' performance on the 10% test data. Once again, the classifier based on LSTM with the pre-trained embedding Glove.twitter.27B.300d outperforms the other four models; therefore, we use it for classifying negative polarity tweets into the negative emotions sadness, anger and fear. Figure 5 shows the structure of the model for Classifiers B and C. The performance of Classifiers B and C, unlike that of Classifier A, cannot be compared with a previously reported best accuracy, as our work requires splitting the Emotional Tweets dataset into two sub-datasets: the first includes only positive emotions (joy and surprise) and the second only negative emotions (sad, anger, fear).
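The two-stage routing and the split of the emotion dataset described above can be summarized in a short Python sketch. The classifiers are represented here by plain callables that return a label; these stand-ins, the function names, and the label strings are hypothetical placeholders for the trained LSTM models, not the authors' code.

```python
POSITIVE_EMOTIONS = ("joy", "surprise")
NEGATIVE_EMOTIONS = ("sadness", "anger", "fear")  # disgust is dropped, as in the text


def split_emotion_dataset(rows):
    """Split (text, emotion) rows into the positive and negative sub-datasets."""
    positive = [(text, label) for text, label in rows if label in POSITIVE_EMOTIONS]
    negative = [(text, label) for text, label in rows if label in NEGATIVE_EMOTIONS]
    return positive, negative


def classify_tweet(text, classifier_a, classifier_b, classifier_c):
    """Stage 1 assigns polarity; stage 2 routes to the matching emotion classifier."""
    polarity = classifier_a(text)
    if polarity == "positive":
        return polarity, classifier_b(text)   # joy or surprise
    return polarity, classifier_c(text)       # sadness, anger or fear


# Hypothetical stand-ins for the trained models, used only to exercise the routing.
def demo_a(text):
    return "positive" if "thank" in text.lower() else "negative"


def demo_b(text):
    return "joy"


def demo_c(text):
    return "fear"


rows = [("so happy", "joy"), ("what?!", "surprise"), ("ugh", "disgust"), ("scared", "fear")]
positive_rows, negative_rows = split_emotion_dataset(rows)
result = classify_tweet("Thank you, nurses", demo_a, demo_b, demo_c)
```

Because each second-stage classifier only ever sees tweets of one polarity, it can be trained on the corresponding sub-dataset alone, which is why rows labeled disgust appear in neither split.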
VOLUME 8, 2020

As there were no pre-trained classifiers available in the literature for these sub-datasets, we train our own classifiers for this work. GloVe (Global Vectors for Word Representation), used in Classifiers B and C, is a model for word representation trained on five corpora: a 2010 Wikipedia dump with 1 billion tokens; a 2014 Wikipedia dump with 1.6 billion tokens; Gigaword 5, which has 4.3 billion tokens; the combination Gigaword5 + Wikipedia2014, which has 6 billion tokens; and 42 billion tokens of web data from Common Crawl. The process of learning GloVe word embeddings is explained in [42]. Similarly, the FastText word embedding used in our Classifier A is an extension of the word2vec model. FastText represents words as character n-grams. For example, to represent the word computer with n = 3, the FastText representation is <co, com, omp, mpu, put, ute, ter, er>. More detailed information on the integration of general-purpose word embeddings like GloVe and FastText with deep learning within a classification system can be found in [43].

In order to ensure that the correct model choices for Classifiers A, B and C have been made, further experiments were performed to assess the accuracy of the latest trends in text classification. This section reports BERT, GRU and BiLSTM results on the Sentiment140 and Emotional Tweets datasets. BERT, developed by Google, is a pre-trained language model built on a bidirectional Transformer encoder. Like a word embedding, BERT is a text representation technique, and it performs well in text classification, generation and summarization. Figure 6 shows a summary of the BERT model applied to the positive emotions of the Emotional Tweets dataset, i.e., joy and surprise. The accuracy on the 10% test data for the joy and surprise classes is 78.12%, which is lower than the 81% of LSTM + GloVe Twitter. Similarly, a summary of the BERT model applied to the negative emotions of the Emotional Tweets dataset, i.e., sad, anger and fear, is shown in Figure 7.
Once again, the accuracy achieved by the BERT model on the 10% test data is 64.8%, which is lower than that of LSTM + GloVe as reported in Table 6. Finally, Figure 8 shows the BERT model on Sentiment140; the accuracy achieved by the model on the 10% test instances is 77%, which is less than the 82.4% achieved by LSTM + FastText on the same dataset. Our next set of experiments applied BiLSTM to Classifiers A, B and C. Table 7 shows the results of BiLSTM on the Sentiment140 and Emotional Tweets datasets. In each of these cases, BiLSTM does not perform better than LSTM + FastText on Sentiment140 or LSTM + GloVe on the Emotional Tweets dataset; therefore, it is not considered for the actual COVID-19 tweet classification. Similarly, our last set of experiments in search of an appropriate model for text classification implemented a GRU. Figure 9 shows the GRU model we used on the Sentiment140 and Emotional Tweets datasets, and Table 8 shows its performance on both datasets. Based on our experiments, we conclude that Classifier A, which classifies an input tweet as positive or negative and is trained on Sentiment140, should be based on LSTM with FastText as the pre-trained word embedding, whereas Classifiers B and C, which classify positive tweets into joy and surprise and negative tweets into sadness, fear and anger, should be based on LSTM with the GloVe Twitter pre-trained word embedding.
The lack of ground truth, i.e., labeled tweets in the queried COVID-19 test dataset, required the use of emoticons as a mechanism to validate the detected polarities and emotions. We, therefore, propose using emoticons extracted from tweets to check whether the polarity and emotion predicted for a tweet agree with the sentiment depicted by its emoticons.
It may not be a perfect system, but it offers a way to assess the accuracy of our proposed classifiers on more than a million tweets in a weakly supervised manner. The use of emoticons in sentiment analysis is not new. In fact, there is an abundance of literature that supports the notion of utilizing emoticons in sentiment analysis [44], [45]. However, rather than using emoticons for sentiment detection, we use them to validate our model's performance. The emoticons were grouped into six categories, as described in Table 2. The type and description of the emoticons used in each category are given in Table 9. We had a total of 460,286 English-language tweets from the six selected countries. Of these, 443,670 tweets did not contain any emoticon, whereas 11,110 tweets used a positive emoticon (joy, surprise) and 5,674 used a negative emoticon (sad, disgust, anger, fear). The remaining tweets used a mix of emoticons, such as joy with disgust or anger with surprise; these usages were considered sarcastic expressions of emotion and were therefore excluded from the validation process. We tested our Model #2 presented in Table 4 (based on LSTM + FastText trained on Sentiment140) on the remaining 16,784 positive and negative tweets, using them as test data to assess model accuracy. The emoticons were treated as the actual labels, and the model predicted labels from the tweet text. The model achieved an accuracy of 76% and an F1 score of 78%. This indicates that our model is reasonably consistent with the users' sentiments as expressed through emoticons. The good accuracy achieved in the validation phase is explained by the fact that our validation process is the same as the process used to prepare the Sentiment140 dataset, the dataset on which our model for sentiment polarity assessment is based.
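The emoticon-based validation described above can be sketched in a few lines. Everything named here is an assumption made for illustration: the emoticon sets are small hypothetical samples (the groups actually used are listed in Table 9), and since the text does not state how the F1 score was averaged, the sketch computes a plain binary F1 for the positive class.

```python
from typing import List, Optional, Tuple

# Hypothetical emoticon groups; the real groups are described in Table 9.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-(", ":'("}


def emoticon_label(text: str) -> Optional[str]:
    """Return 'positive' or 'negative', or None for no emoticon or mixed use."""
    tokens = set(text.split())
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos == has_neg:                 # neither present, or both (mixed, excluded)
        return None
    return "positive" if has_pos else "negative"


def accuracy_and_f1(gold: List[str], pred: List[str],
                    positive: str = "positive") -> Tuple[float, float]:
    """Accuracy and binary F1 (positive class) of predictions against emoticon labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy example: emoticons act as labels, a model would supply the predictions.
gold = ["positive", "positive", "negative", "negative"]
pred = ["positive", "negative", "negative", "negative"]
acc, f1 = accuracy_and_f1(gold, pred)
```

In practice the emoticons would be stripped from the text before prediction, so that the model is scored on the words alone.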
The proposed Model #2, which achieved state-of-the-art polarity assessment accuracy on the Sentiment140 dataset, was used to detect polarity and emotions in the trending hashtag data. Figure 10 shows a side-by-side, country-wise comparison of sentiment polarity detection for the initial period of four weeks. The sentiments are normalized to the range 0-1 as the sum of tweets per day divided by the total number of tweets for a given country. As can be seen from the graphs in Figure 10, only a few tweets concerning the coronavirus outbreak were posted during most of February. There were also a few days on which no tweets were posted, especially in Pakistan and India. It is interesting to note that the number of tweets increased rapidly only in the last 2-3 days of February, and that all six countries show this growing trend among Twitter users of sharing their positive and negative attitudes about the coronavirus. The graphs for neighboring Sweden and Norway (top row) and for Canada and the USA (middle row) show a similar pattern of tweet emotions, unlike those for Pakistan and India (bottom row). In India, people's reaction seems quite strong, as is evident from the average number of positive and negative posts (yellow and blue horizontal lines). The reason could be the early outbreak of COVID-19 in India, i.e., on 30 January 2020. A similar pattern was observed for Canada, probably because its first positive case was reported around the same time.
A. POLARITY ASSESSMENT ANALYSIS BETWEEN NEIGHBOURING COUNTRIES
Figure 11 gives an overview of the side-by-side, country-wise sentiment for both negative and positive polarity. The sentiments are normalized to the range 0-1. As can be seen in Figure 11 (top-left), the attitude of Swedes towards the coronavirus outbreak changed over time. The peak of negative comments expressed on Twitter was registered on March 22.
This was a day before the Prime Minister made a rare public appearance to address the nation about the coronavirus outbreak. It is fascinating to note that on the day of the Prime Minister's speech there was an equal number of positive (top-right) and negative (top-left) sentiments, while a day later positive emotions dominated the tweets, showing Swedes' trust in the Government with respect to the outbreak. There is an equal number of negative sentiments for Norway and Sweden over the entire period, whereas the average polarity for positive sentiments is higher for Sweden than for Norway. A gradual decline in the positive trend for Norway can be observed in the top-right plot of the figure. Until May 1st, 2020, the positive sentiments (blue line) for Norway were above the average (orange line), after which they started to decline. Figure 12 shows the actual number of persons who tested positive in Norway during the same period (data source 7). The percentage of positive cases in the chart is based on the total number of persons tested each day. The number of registered positive cases started increasing in the second week of March 2020 and continued to rise until the first week of April, after which it dropped. This is in line with the sentiments expressed by the users, which started to decline during the same week (Figure 11, top-left and top-right). The trends in positive and negative sentiments for Pakistan and India, and for the USA and Canada, are very similar, as is evident from the middle and bottom charts in Figure 11. A closer look at the average sentiments for Pakistan and India reveals that Indians expressed more negative sentiments than Pakistanis (middle-left). Also, a significant number of positive posts appeared for Pakistan (middle-right), which suggests that people showed some trust in the Government's decisions.
This can partially be attributed to the Pakistani Prime Minister's addresses to the nation on the coronavirus on multiple occasions (March 17th and March 22nd) before the lockdown. It is worth noting that the first case in India was reported on 30 January 2020 and in Pakistan on 26 February 2020; however, both countries went into lockdown around the same time, i.e., on 21 March for Pakistan and 24 March for India. Table 10 shows when the first COVID-19 case was reported in each country and the day it went into lockdown. It is also worth mentioning that at the beginning of April the number of tweets declined, and so did the sentiment representation, which dropped below the average for all countries except Sweden, where a significant number of positive sentiments can still be observed (top-right). Moreover, Pakistan had the fewest negative sentiments (i.e., avg = 0.201, yellow line, middle-left), whereas Swedes were more positive (i.e., avg = 3.98, yellow line, top-right). This could be attributed to the fact that most businesses ran as usual in Sweden. In the case of Pakistan, the number of cases during the initial period was still low, as anticipated by the Government. Additionally, people largely did not observe the standard operating procedures enforced by the state, even though the country was in lockdown. A similar trend was observed in India; however, the Government there imposed a much stricter shutdown, although it came quite late given that the first case was reported in late January, which may have triggered more negative posts than positive ones. We observed a visible difference between the sentiments expressed by the people of Norway and Sweden (Figure 11). We therefore analyze these two countries in more detail in our study of emotional behavior. The results are depicted in Figure 13.
Positive emotions are presented in the left-hand figures and negative emotions in the right-hand figures; the graphs show which emotions dominate over a period of time. The graphs are scaled from 0 to 25 for better readability and represent the cumulative emotions stacked on top of each other. As we can see from Figure 13 (top-left), in both countries joy dominates the positive tweets, whereas sadness and fear are the most commonly shared negative emotions, with anger being shared less. The pattern for Norway, in particular, is in line with the actual statistics on positive cases reported by the Norwegian Institute of Public Health (NIPH) (Figure 12). Additionally, we analyze the Pearson correlation between neighboring countries to examine the sentiment polarity and emotion trends during the COVID-19 lockdown. As can be seen in Table 11, there is a high correlation between the USA and Canada (US-CA) and between Pakistan and India (PK-IN), unlike between Norway and Sweden (NO-SW). The correlation for NO-SW is around 50% for negative and 40% for positive sentiments. This shows that the sentiments expressed on Twitter by the people of these two countries differed during the same period. A possible reason for this is the different approaches that the two countries took to the outbreak. A similar trend can be observed for the emotions reported in Table 12. Pakistan and India have the highest correlation across all five emotions, followed by the USA and Canada, while Norway and Sweden have the lowest correlation, as is evident from the emotions ''surprise'' and ''anger''.
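For readers who want to reproduce this kind of comparison, the Pearson correlation between two countries' daily sentiment series can be computed as in the sketch below. The function name `pearson` and the two short series are hypothetical and serve only to illustrate the formula; they are not data from this study.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Hypothetical normalized daily positive-sentiment values for two countries.
country_a = [0.10, 0.20, 0.35, 0.30, 0.15]
country_b = [0.12, 0.22, 0.30, 0.33, 0.18]
r = pearson(country_a, country_b)
```

The two series must cover the same days in the same order; days on which one country has no tweets need an explicit value (for example zero) before the coefficient is computed.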
A possible explanation for this is the response of people to respective Governments' decision on
Following the sentiment and emotions detected by the proposed model and the analysis of the results presented in the previous subsections, for (RQ1) it is safe to assume that NLP-based deep learning models can provide some, if not complete, cultural and emotional insight into cross-cultural trends. It is still difficult to say to what extent, because for the non-native English-speaking countries the number of tweets was far smaller than for the USA, too small for statistically significant observations. (RQ2) Nevertheless, the general observations of users' concerns and their responses to the respective Governments' decisions on COVID-19 resonate with the sentiments analyzed from the tweets. (RQ3) It was observed that there is a very high correlation between the sentiments expressed in neighbouring countries within a region (Tables 11 and 12). For instance, Pakistan and India, like the USA and Canada, have similar polarity trends, unlike Norway and Sweden. (RQ4) Both positive and negative emotions were observed equally concerning #lockdown; however, in Pakistan, Norway, and Canada the average number of positive tweets was higher than that of negative ones (Figures 10 and 11).
Although the work carried out in this study draws a reasonable portrait of different cultures' reactions to the COVID-19 pandemic, it is impossible to cover all aspects of such a vast domain. This section presents the limitations of our work.
1) This work compares the performance of different word embeddings such as GloVe and BERT and uses different variants of RNN, including LSTM, BiLSTM and GRU; however, it does not assess the performance of other deep neural networks such as convolutional neural networks (CNN) and their variants. A different architecture may well perform better on the datasets used in this work, i.e., Sentiment140 and the Emotional Tweets dataset.
2) The word embedding models deployed in this work, including BERT, GloVe and GloVe Twitter, fail to capture the context. The proposed architecture is not able to understand the context, especially when it is sarcastic. The human brain has an extraordinary capability to understand the use of similar words in a literal as well as in a sarcastic sense. The proposed work is not able to discriminate between such contexts.
3) This work considers only tweets in English, for the reason that rich resources, including datasets and word embeddings, are available in the language, whereas resource-poor languages like Urdu, Hindi and other regional languages are not covered in this work. There is a sizeable population on social media using local languages to express their opinions and emotions.
4) This work uses only Twitter for sentiment and emotion extraction, whereas other social media platforms such as Facebook and Instagram are not covered, in order to keep this work at a manageable level of complexity. Any attempt to fully understand citizens' sentiment towards a phenomenon should consider a variety of social media platforms.
5) As Twitter allows a maximum of 280 characters in a tweet, there is an increasing trend of writing long tweets in image format. Our work is not able to process text available in image format.
6) Another method of expressing emotions and sentiment on Twitter and other social media platforms is through images and video. These images or videos may not include any text but would still convey users' sentiments about a topic; at present, our work does not extract emotions or sentiments from multimedia content.
7) Another popular trend on social media is the use of Roman Urdu or Roman Hindi, which refers to writing these languages in the English alphabet. Our work does not address these aspects of the languages.
This article aimed to find the correlation between the sentiments and emotions of people in neighboring countries amidst the coronavirus (COVID-19) outbreak, based on their tweets. Deep learning LSTM architectures utilizing pre-trained embedding models, which achieved state-of-the-art accuracy on the Sentiment140 dataset and the Emotional Tweets dataset, are used to detect both sentiment polarity and emotions from users' tweets on Twitter. Initial tweets right after the pandemic outbreak were extracted by tracking the trending hashtags during February 2020. The study also utilized the publicly available Kaggle tweet dataset for March and April 2020. Tweets from six neighboring countries are analyzed using NLP-based sentiment analysis techniques. The paper also presents a unique way of validating the proposed model's performance via emoticons extracted from users' tweets. We further cross-checked the detected sentiment polarity and emotions against various published sources on the number of positive cases reported by the respective health ministries and against published statistics. Our findings showed a high correlation between the polarity of tweets originating from the USA and Canada, and from Pakistan and India. In contrast, despite many cultural similarities, the tweets posted following the corona outbreak in two Nordic countries, i.e., Sweden and Norway, showed quite the opposite polarity trend. Although joy and fear dominated in both countries, the positive polarity dropped below the average for Norway much earlier than for Sweden. This may be due to the lockdown imposed in Norway for a good month and a half before the Government decided to ease the restrictions, whereas the Swedish Government opted for herd immunity, which was equally supported by the Swedes. Nevertheless, the average number of positive tweets was higher than the average number of negative tweets for Norway.
The same trend was observed for Pakistan and Canada, where positive tweets outnumbered negative ones. We further observed that the number of negative and positive tweets started dropping below the average in the first and second weeks of April for all six countries. This study also suggests that NLP-based sentiment and emotion detection can not only help identify cross-cultural trends but also plausibly link actual events to users' emotions expressed on social platforms with high certitude, and that, despite socio-economic and cultural differences, there is a high correlation between the sentiments expressed during a global crisis such as the coronavirus pandemic. Deep learning models, on the other hand, can be further enriched with semantically rich representations using ontologies, as presented in [46], [47], to grasp opinions from tweets more effectively. Furthermore, a word is known by the company it keeps, much as is the case with human beings. For example, in 'kids are playing cricket' and 'you are playing with my emotions', the word 'playing' has a different meaning depending on which other words are in its company. The word embedding used in this article does not capture the word context. ELMo is a large-scale context-sensitive word embedding model that can be explored in the future to improve the performance of Classifiers A, B and C in the proposed model [48]. Moreover, advanced seq2seq-type language models can be explored as word embeddings in future work. To date (i.e., the first week of May 2020), the pandemic is still spreading in other parts of the world, including Brazil and Russia. It would be interesting to observe longer patterns of tweets across more countries to detect and assess people's behavior in dealing with such calamities. In particular, tweets and posts on other social media platforms in local languages such as Urdu, Hindi and Swedish may reveal even more interesting patterns related to the pandemic.
We hope and believe that this study will provide a new perspective to readers and the scientific community interested in exploring cultural similarities and differences from public opinions given a crisis, and that it could influence decision makers in transforming and developing efficient policies to better tackle the situation, safeguarding people's interests and the needs of society.
REFERENCES
[1] Are there cross-cultural differences in reasoning?
[2] The Geography of Thought: How Asians and Westerners Think Differently... and Why
[3] Why is Denmark's Coronavirus Lockdown so Much Tougher Than Sweden's?
[4] Affect Imagery Consciousness: The Positive Affects
[5] From affect programs to dynamical discrete emotions
[6] An argument for basic emotions
[7] Toward a consensual structure of mood
[8] A circumplex model of affect
[9] A general psychoevolutionary theory of emotion, in Theories of Emotion
[10] A fast learning algorithm for deep belief nets
[11] A conceptual framework for studying collective reactions to events in location-based social media
[12] Affective computing and sentiment analysis
[13] How did Ebola information spread on Twitter: Broadcasting or viral spreading?
[14] Informational flow on Twitter-Corona virus outbreak-topic modelling approach
[15] Twitter informatics: Tracking and understanding public reaction during the 2009 swine flu pandemic
[16] How people react to Zika virus outbreaks on Twitter? A computational content analysis
[17] What insights can global health draw from social media?
[18] Pandemics in the age of Twitter: Content analysis of Tweets during the 2009 H1N1 outbreak
[19] Sentiment analysis of Twitter data
[20] Weakly supervised framework for aspect-based sentiment analysis on students' reviews of MOOCs
[21] Real-time sentiment analysis of Twitter streaming data for stock prediction
[22] Sentiment analysis of Twitter data for predicting stock market movements
[23] Integrating StockTwits with sentiment analysis for better prediction of stock price movement
[24] Prediction and analysis of Indonesia presidential election from Twitter using sentiment analysis
[25] CNN for situations understanding based on sentiment analysis of Twitter data
[26] A comparison of lexicon-based approaches for sentiment analysis of microblog posts
[27] Sentiment analysis of Twitter data: A survey of techniques
[28] Deep learning for sentiment analysis: A survey
[29] Like it or not: A survey of Twitter sentiment analysis methods
[30] Techniques for sentiment analysis of Twitter data: A comprehensive survey
[31] Twitter sentiment classification for measuring public health concerns
[32] How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble [Application Notes]
[33] COVID-19 public sentiment insights and machine learning for tweets classification
[34] Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India
[35] Emotex: Detecting emotions in Twitter messages
[36] Ebola and the social media
[37] Analyzing emotions in Twitter during a crisis: A case study of the 2015 Middle East respiratory syndrome outbreak in Korea
[38] Twitter sentiment classification using distant supervision
[39] WASSA-2017 shared task on emotion intensity
[40] Sentiment analysis of tweets using deep neural architectures
[41] NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets
[42] GloVe: Global vectors for word representation
[43] Integrating word embeddings and document topics with deep learning in a video classification framework
[44] Semantic textual similarity of sentences with emojis
[45] The impact of sentiment on content post popularity through emoji and text on social platforms
[46] The impact of deep learning on document classification using semantically rich representations
[47] An improved concept vector space model for ontology based classification
[48] Deep learning based text classification: A comprehensive review
ALI SHARIQ IMRAN (Member, IEEE) received the master's degree in software engineering and computing from the National University of Science and Technology (NUST), Pakistan, in 2008, and the Ph.D. degree in computer science from the University of Oslo (UiO), Norway, in 2013. He is currently associated as an Associate Professor with the Department of Computer Science, Norwegian University of Science and Technology (NTNU), Norway. He is also a member of the Norwegian Colour and Visual Computing Laboratory (Colourlab), Norway Section. His research interests include deep learning technology and its application to signal processing, natural language processing, and the semantic web.
He has over 65 peer-reviewed journal and conference publications to his name and has served as a Reviewer for many reputed journals over the years, including IEEE ACCESS as an Associate Editor.
SHER MUHAMMAD DAUDPOTA received the master's and Ph.D. degrees from the Asian Institute of Technology, Thailand, in 2008 and 2012, respectively. He is currently serving as a Professor of computer science with Sukkur IBA University, Pakistan. Alongside his Computer Science contribution, he is also a Quality Assurance Expert in higher education. He has reviewed more than 50 universities in Pakistan for quality assurance on behalf of Higher Education Commission in the role of an Educational Quality Reviewer. He is the author of more than 35 peer-reviewed journal and conference publications. His research interests include deep learning, natural language processing, and video and signal processing.
ZENUN KASTRATI received the master's degree in computer science through the EU TEMPUS Programme developed and implemented jointly by the University of Pristina, Kosovo, the Universite de La Rochelle, France, and the Institute of Technology Carlow, Ireland; and the Ph.D. degree in computer science from the Norwegian University of Science and Technology (NTNU), Norway, in 2018. He is currently associated as a Postdoctoral Research Fellow with the Department of Computer Science and Media Technology, Linnaeus University, Sweden. Prior to joining Linnaeus University, he was employed as a Lecturer with the University of Pristina. He is the author of several papers published in international journals and conferences and has served as a Reviewer for many reputed journals. His research interests include artificial intelligence with a special focus on NLP, machine learning, semantic web, and learning technologies.
RAKHI BATRA received the B.S. degree in computer science and the M.S.
degree in data and knowledge engineering from Sukkur IBA University, Sukkur, Pakistan, in 2015 and 2019, respectively. Since 2018, she has been working as the Assistant Manager of the ORIC Department, Sukkur IBA University. Her research interests include knowledge discovery, data mining, artificial intelligence, and deep learning.