key: cord-0457570-j7ruhpw1
authors: Abdukhamidov, Eldor; Juraev, Firuz; Abuhamad, Mohammed; AbuHmed, Tamer
title: An Exploration of Geo-temporal Characteristics of Users' Reactions on Social Media During the Pandemic
date: 2021-03-24
journal: nan
DOI: nan
sha: 56557508af20d35e409c159e8b5bedd6f26615a0
doc_id: 457570
cord_uid: j7ruhpw1

During the outbreak of the COVID-19 pandemic, social networks have become the preeminent medium for communication, social discussion, and entertainment. Social network users regularly express their opinions about the impacts of the coronavirus pandemic. Social networks therefore serve as a reliable source for studying the topics, emotions, and attitudes that users discuss during the pandemic. In this paper, we investigate the reactions and attitudes of people towards topics raised on social media platforms. We collected two large-scale COVID-19 datasets from Twitter and Instagram, covering six and three months, respectively. The paper analyzes the reactions of social network users from several perspectives, including sentiment analysis, topic detection, emotions, and the geo-temporal characteristics of our dataset. We show that the dominant sentiment on social media is neutral, while the topics most discussed by social network users concern health issues. The paper examines the countries that attracted more posts and reactions from people, as well as the distribution of health-related topics discussed in the most mentioned countries. We shed light on the temporal shift of topics over countries. Our results show that posts from the top-mentioned countries influence and attract more reactions worldwide than posts from other parts of the world.

COVID-19 has spread rapidly around the world since December 2019. The disease is considered a pandemic that has impacted many countries across all inhabited continents. Due to its consequences (infection, increasing death rate), governments developed policies to slow the spread of the virus. Quarantine and social distancing are among the emergency measures taken by governments. Such measures have led to the suspension of social events, a move to online lectures for pupils and for college and university students, reduced working hours at workplaces, and a shift to telecommuting. As a result, social networks have become the main platform for people to express opinions and share information. The use of social media, such as Instagram and Twitter, has grown due to measures like social distancing.

• We collected a large-scale dataset of users' posts and tweets from Twitter and Instagram.

• Sentiment analysis has been conducted for social networks based on five sentiment categories: very negative, negative, neutral, positive, and very positive.

• Topic modeling analysis on the sentiment categories was carried out to identify overall public reactions to COVID-19 related tweets and posts.

• Exploration of geo-temporal characteristics based on countries was performed to recognize the countries that are mentioned most in COVID-19 tweets and posts (referred to as top countries).

• We also performed sentiment analysis of top countries to determine people's emotional responses to the most mentioned countries.

• Exploiting the applied topic modeling technique, we extracted the major topics discussed in connection with the top mentioned countries in our dataset.

• An analysis based on word2vec embedding was conducted to discover the shift of words and topics used with top countries over time.

• The study provided a locality analysis of top countries to find the countries where tweets and Instagram posts about top countries are published.

The rest of the paper is organized as follows. Section 2 explores the related work on studying people's reactions on social networks during the COVID-19 pandemic. Section 3 presents the methods used for the data collection and a description of the collected data. Section 4 shows our sentiment analysis on the collected dataset. The per-sentiment topic modeling is described in Section 5. We provide the geo-temporal analysis on the collected dataset in Section 6 and conclude in Section 7.

Since January 2020, several studies have explored the impact of COVID-19 on people's daily lives. In this section, we review the main related work in social network analysis for COVID-19. The work of Schild et al. [10] was among the first studies to analyze people's Sinophobic behavior on social media. Their work studied two datasets of posts from Twitter and 4chan's /pol/ board that were published between November 1, 2019, and March 22, 2020. Both datasets were used to explore whether there was a notable change in the spread of Sinophobic content among social network users. The authors trained three word2vec models over the social network data to study the use of words and the major differences in the most similar words, and to observe the development of new terms. The authors observed that COVID-19 caused the emergence of new Sinophobic slurs and a trend of blaming China for the COVID-19 pandemic.

Ordun et al. [11] addressed prevalent COVID-19 related questions about trends, news-making events, topics, retweets, and COVID-19 networks by analyzing 23,830,322 tweets posted between March 24, 2020, and April 9, 2020. The authors revealed the top trends in tweets using Keyword Trend Analysis. They adopted a topic modeling stage using Latent Dirichlet Allocation (LDA) to spot the events that cause spikes in COVID-19 tweets. Moreover, they used Uniform Manifold Approximation and Projection (UMAP) to find unique topics. The authors also applied network modeling to derive how social media reacts to the spread of COVID-19. Their findings pointed out the huge attention given to the live White House Coronavirus Briefings and to topics related to healthcare and government reactions. We note that the authors used a dataset pre-filtered with 13 healthcare-related terms from the Twitter Streaming API, and that removing the retweets reduced the dataset size by nearly 77%, from 23,830,322 tweets to 5,506,223 tweets.

Li et al. [12] conducted a comprehensive analysis of datasets collected from Twitter and Weibo and posted between January 20, 2020, and May 11, 2020. Six emotions (anger, disgust, fear, happiness, sadness, and surprise) were identified based on the user-generated content. The authors compared people's emotions in the United States and China to understand the different reactions towards COVID-19. Using NLP, the authors found the causes of public emotions, e.g., the reasons for anger, surprise, and worry. The study indicated a strong contrast in people's opinions about COVID-19 in different countries. However, the study was limited to two countries, i.e., the USA and China. The authors concluded with a suggestion to use real-time emotion analysis to inform methods and procedures in the fight against the global crisis.

Sharma et al. [13] proposed a dashboard for misinformation tracking on Twitter. Their work follows coronavirus-related discussions over time, based on Twitter data between March 1, 2020, and May 3, 2020, to determine incorrect and deceptive content. The authors conducted an analysis of public sentiment on specific data filtered by keywords such as "#workfromhome" and "#socialdistance". They also analyzed topics in Twitter conversations, extracted hashtags emerging in different countries, and utilized tweets about countries to estimate public perception. However, the dataset includes tweets from only two months, limiting insight into users' long-term behavior on social media.

Lamsal [5] collected a large-scale Twitter dataset of English tweets related to COVID-19 for the period between March 20, 2020, and July 17, 2020. In this work, the author used the dataset and a filtered version that includes only geo-tagged tweets for sentiment analysis and network analysis. Tweets collected between April 24, 2020, and July 17, 2020, were utilized to study significant drops in the average sentiment over the period by generating a sentiment trend graph. The dataset consisting of only geo-tagged tweets was used to conduct network analysis and to generate sentiment-based world and regional maps. The study identified 12 different communities based on the usage of similar hashtags within the dataset. The author also presented a set of popular hashtags and their associated communities.

We collected two large-scale COVID-19 related datasets from Twitter and Instagram to study the topics discussed and people's emotions on social networks during the pandemic. This section describes the collected datasets.

Twitter For our Twitter data collection, we gathered 131,083,839 tweets posted between January 21, 2020, and June 19, 2020. We leveraged several tools, such as Hydrator [25] and Twarc [26], to rehydrate the Tweet IDs made publicly available by Chen et al. [22]. Moreover, we utilized twint [27], an advanced Twitter scraping and OSINT tool in Python, to enrich our data collection. To concentrate on pandemic-related tweets, we targeted tweets with hashtags such as "#covid19", "#corona", "#staysafe", "#covid_19", "#covid2019", "#lockdown", "#stayhome", "#quarantinelife", "#coronacrisis", "#coronavirus", "#quarantine", etc. After filtering out retweets, i.e., focusing on unique tweets, the dataset was reduced by around 63% (i.e., from 131,083,839 to 48,387,435 unique tweets). Table 2 shows some details about the collected tweets in terms of length and count.
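The reported reduction can be checked with a few lines of arithmetic. The following is only a sanity check on the two counts quoted above; the helper name `reduction_percent` is introduced here for illustration and is not part of the original pipeline.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of records removed when a dataset shrinks from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


total_tweets = 131_083_839   # all collected tweets, as reported in the text
unique_tweets = 48_387_435   # tweets left after filtering out retweets

print(f"{reduction_percent(total_tweets, unique_tweets):.1f}%")  # prints 63.1%
```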

Instagram The Instagram posts were gathered using the open-source project Instaloader [28] and the publicly available Instagram post IDs provided by Zarei et al. [29]. Our collection of Instagram posts includes a total of 3,843 posts from January 5, 2020, to March 30, 2020. Concentrating on pandemic-related posts, we collected posts with hashtags such as "#coronavirus", "#covid19", "#covid_19", and "#corona". We filtered out non-English posts and focused on analyzing posts written in English. The Instagram dataset contains 2,052 English posts in total. Table 3 shows some details about the collected Instagram dataset.

Although several studies have conducted sentiment analysis on social network data [16, 13, 24] , the main purpose of this research is to conduct a comprehensive analysis to understand people's worldwide reaction to the pandemic. Therefore, this study aims to incorporate sentiment analysis to gain insights about people's reactions to topics and trends in different time periods so it becomes feasible to discover geo-temporal patterns of users' behavior. This section describes the methods and results of the sentiment analysis on our datasets.

Methods Before conducting our analysis on the collected datasets, we adopted a pre-processing stage which includes cleaning the data, removing URL links, emails, user mentions, punctuation, and stopwords, and converting emojis and emoticons into words. As deep learning techniques have shown powerful performance in the intelligent processing of data in several domains [30, 31, 32, 33], we leveraged the Stanford deep learning based NLP toolkit CoreNLP for the sentiment analysis [34]. The CoreNLP toolkit provides an extensible pipeline supporting core NLP tasks. Its sentiment component is based on a compositional model over binarized trees of sentences using deep learning [35], and the toolkit contains a collection of pre-trained models. Figure 1 illustrates the CoreNLP architecture, starting from providing an input content and ending with an output of all the analysis information as a JSON file. The output includes the sentiment score, represented as a value ranging from zero to four, where four means that the input is very positive and zero means that the input is very negative.
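The cleaning steps and the zero-to-four score scale described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the regular expressions, the tiny `STOPWORDS` set, and the function names are hypothetical, and the emoji-to-word conversion is omitted because the text does not say which mapping was used.

```python
import re
import string

# Hypothetical patterns; the paper does not specify its exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
MENTION_RE = re.compile(r"@\w+")

# Illustrative subset only; a real run would use a full stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at"}

# Index i is the label for sentiment score i (0 = very negative, 4 = very positive).
SENTIMENT_LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def clean_text(text: str) -> str:
    """Remove URLs, emails, user mentions, punctuation, and stopwords."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)    # emails first, so "@domain" is not left behind
    text = MENTION_RE.sub(" ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in text.lower().split() if tok not in STOPWORDS]
    return " ".join(tokens)


def label_for_score(score: int) -> str:
    """Map a sentiment score in 0..4 to its category name."""
    if not 0 <= score <= 4:
        raise ValueError(f"sentiment score out of range: {score}")
    return SENTIMENT_LABELS[score]


print(clean_text("Stay safe! Visit https://example.com or mail me@example.org @who #covid19"))
print(label_for_score(0), "|", label_for_score(4))
```

Removing punctuation also strips the "#" from hashtags, so "#covid19" is kept as the plain token "covid19"; whether the original pipeline treated hashtags this way is an assumption.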

Twitter We performed the analysis on all tweets based on the five sentiment categories (very negative, negative, neutral, positive, and very positive). Figure 2 (a) represents the distribution of sentiments in all the collected tweets between January 24, 2020, and June 24, 2020. Observing the sentiment over the data collection timeline, Figure 2 shows that most of the tweets are neutral in general. However, most tweets in June are negative. Significant spikes occurred in March, April, and June. We also treat the rise in January as a spike because the increase, although small, is clearly marked. These spikes are highlighted in the figure with vertical yellow dashed lines to indicate which part of the dataset is used in the next section to identify the topics discussed under each sentiment type.

Instagram The same sentiment categories (very negative, negative, neutral, positive, very positive) are applied to the Instagram dataset. Figure 2 (b) shows the sentiment analysis of posts collected between January 7, 2020, and March 31, 2020. The figure shows that the neutral category dominates the others. In contrast with the Twitter results, the second most common category is positive, whereas it is negative in Figure 2 (a). The result has two spikes, in the February and March posts (highlighted with vertical black dashed lines).

In this section, we conduct the topic modeling task on data collected during spikes of users' reactions, i.e., covering different sentiment types (negative, positive, neutral). Based on the results of our sentiment analysis, we re-categorized sentiments into three types, instead of five, to discover the topics discussed on the social media platforms that attracted these sentiments. This per-sentiment topic modeling enables the understanding of people's reactions to certain topics.
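One plausible way to collapse the five categories into three is shown below. The text does not state how the categories were merged, so the grouping used here (scores 0 and 1 become negative, 2 stays neutral, 3 and 4 become positive) is an assumption, and `collapse_score` is a hypothetical helper name.

```python
from collections import Counter


def collapse_score(score: int) -> str:
    """Collapse a five-level sentiment score (0..4) into three classes.

    Assumed grouping: 0-1 -> negative, 2 -> neutral, 3-4 -> positive.
    """
    if not 0 <= score <= 4:
        raise ValueError(f"sentiment score out of range: {score}")
    if score < 2:
        return "negative"
    if score > 2:
        return "positive"
    return "neutral"


# Toy scores, not taken from the datasets.
scores = [0, 1, 2, 2, 2, 3, 4]
distribution = Counter(collapse_score(s) for s in scores)
print(distribution)
```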

Methods For topic modeling, we utilized the popular Latent Dirichlet Allocation (LDA) [36] method. We leverage the implementation of LDA in the Gensim package along with the Mallet [37] implementation to extract the topics discussed during the spike periods of the various sentiment types. LDA with Mallet is an efficient technique for extracting topics with sufficient topic segregation. Figure 3 shows a brief description of the LDA model. As shown in Figure 3, the two corpus-level parameters (documents in corpus and words in the document) represent a Bayesian method for randomly sampling a mixture of topics for each document. First, the LDA process goes through each document (tweet or post) and randomly assigns each word in it to one of the N topics (N is set by us). Then LDA revisits each word in all documents and calculates the proportion of words in each document that are assigned to each topic, as well as the proportion of assignments of each word to each topic. By repeating this process, LDA moves words between topics to find the most suitable one. At the end of the process, good-quality topics that are clear, meaningful, and segregated are extracted.
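The iterative procedure described above (random initial assignment, followed by repeated reassignment of each word based on document-topic and topic-word proportions) corresponds to collapsed Gibbs sampling, which can be sketched in plain Python. This is a toy illustration and not the Gensim or Mallet implementation used in the study; the function name `lda_gibbs`, the hyperparameter values, and the example documents are assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                 # words of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                               # total words in topic k
    assignments = []

    # Step 1: assign every word to a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Step 2: repeatedly revisit each word and resample its topic.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1
                # Weight of topic k: (share of topic k in this document)
                # times (share of word w within topic k), both smoothed.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1
    return assignments, doc_topic, topic_word


# Toy corpus for illustration only.
docs = [
    ["mask", "hospital", "cases", "hospital"],
    ["market", "jobs", "economy", "jobs"],
    ["cases", "mask", "deaths"],
    ["economy", "market", "unemployed"],
]
assignments, doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(counts.items(), key=lambda kv: -kv[1])[:3]
    print(f"topic {k}:", [w for w, c in top if c > 0])
```

The sampled topics depend on the random seed and on `alpha` and `beta`, so no particular output is shown.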

To improve the generation of topics, we have taken into consideration some key factors, such as the quality of the presented text, the number of topics per tweet or post, the total number of topics, and the various parameters of the algorithm. For the quality of the presented text, we adopt a preprocessing stage to make the input content ready for the analysis. This preprocessing stage includes the removal of stopwords, emails, user mentions, extra spaces, URL links, and punctuation. It also includes tokenization, lemmatization, and the extraction of unigrams, bigrams, and trigrams. that generated 20 topics with a 0.344 coherence score. Similarly, Sharma et al. [13] identified 20 different topics using topic modeling that is based on compressed text classification.

Remarks We only apply topic modeling to Twitter and Instagram data during the spike periods shown in Figure 2. Since the size of the data in those periods differs, we have different numbers of topics per spike period. The generated topics are then assigned to a set of predefined general categories, namely, Economy, Health, Social, Politics, and Tourism. Table 4 shows the results of our topic modeling technique. The table gives the total number of topics with the corresponding coherence scores, as well as the number of topics that belong to each general category. It should be noted that a new category called "Fashion" is added to the table, as our analysis of the Instagram dataset generated this topic.

Twitter Figure 4 shows the result of topic modeling on the Twitter dataset. The figure is divided into two parts by a vertical line: the left part shows the keywords that express each topic, and the right part shows the distribution of each topic in the analyzed dataset (in percent). The topics are grouped by three sentiment classes: negative, positive, and neutral. It is interesting to note that the rank distribution of topics is the same for all sentiment types. The figure shows that tweets about social-related topics are the most common among all categories, with about 50.4% of the total tweets. The most observed keywords of the various topics describe people's intent to notify and advise others about quarantine rules and ways to stay safe. On the other hand, the least discussed topic is travel, as shown for the negative and neutral groups with 0.2% and 2%, respectively. For the positive category, the least discussed topic is the politics category, with 0.4% of the total tweets. Our analysis shows that most tweets belong to the neutral type, populating proportionally the topics observed by the topic modeling task. Tweets assigned to the economy-related topic attract more negative sentiment, with 4.6% of the total tweets compared to 3.3% and 3.1% for the neutral and positive sentiments, respectively. This trend in people's reactions to economy-related topics reflects their concerns about the impact of COVID-19 on the economy.

Figure 3: Plate notation of Latent Dirichlet Allocation (LDA) to discover topics discussed on both social networks.

Instagram Figure 5 shows 

We started our geo-temporal analysis by identifying countries' mentions on posts from Instagram and Twitter to explore the geo-temporal patterns in our data collection. Afterward, we investigate people's reactions to topics discussed in those countries. Our study provides answers to the following questions: (1) what countries are mainly attached to the pandemic posts on social networks? (2) what are the top topics connected to the top-mentioned countries during the pandemic? (3) what are the geo-temporal trends and patterns observed across topics and localities during the pandemic on social networks?

This section provides analysis and methods of tracking countries in posts on social media platforms. This process aims to define the top-mentioned countries in our data collection for further geo-temporal analysis.

Methods To analyze and track countries mentioned on Twitter and Instagram, we utilize the named entity recognition (NER) technique implemented by Stanford CoreNLP [34]. The Stanford NER tagger labels named entities in text, recognizing people, values, dates, places, and organizations. Using the Stanford NER tagger, we determine country mentions in all tweets and posts in our datasets. Afterward, we sort the countries in descending order by number of mentions to select the top-5 mentioned countries in our data collection.
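The counting and ranking step can be sketched as follows, assuming the NER output has already been converted into a list of (entity text, label) pairs per tweet or post. That input format, the label name `"COUNTRY"`, and the helper `top_countries` are assumptions made for illustration; the toy entities are not taken from the datasets.

```python
from collections import Counter


def top_countries(tagged_docs, n=5, country_label="COUNTRY"):
    """Count country mentions over all documents and return the n most mentioned.

    `tagged_docs` is assumed to be a list of documents, each a list of
    (entity_text, entity_label) pairs produced by an NER tagger.
    """
    counts = Counter()
    for entities in tagged_docs:
        for text, label in entities:
            if label == country_label:   # ignore cities, people, dates, ...
                counts[text.strip().lower()] += 1
    return counts.most_common(n)         # sorted by mentions, descending


tagged_docs = [
    [("China", "COUNTRY"), ("Wuhan", "CITY")],
    [("china", "COUNTRY"), ("Italy", "COUNTRY")],
    [("China", "COUNTRY"), ("Italy", "COUNTRY"), ("USA", "COUNTRY")],
]
print(top_countries(tagged_docs, n=2))
```

Lower-casing merges spelling variants such as "China" and "china"; mapping aliases like "US" and "USA" to one country would need an extra lookup table that is not shown here.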

Twitter We note that the adopted NER technique identifies not only country names but also cities and other places in a given input. However, we focus only on the mentions of countries to select the top mentioned countries. Based on our analysis, our dataset includes mentions of 180 countries, and China is the most mentioned country with 3,035,389 mentions. The next four top mentioned countries are the USA (695,131), India (541,851), the UK (521,290), and Italy (283,096). Figure 6 depicts the results of the number of mentions of countries in our dataset. Figure 7 . Figures 6 and 7 show that the top-mentioned countries on both social networks are almost the same, except for Spain and the UK. This supports the fact that these are the key countries in which a high number of coronavirus infections were reported within the corresponding periods [38] . To gain insights into people's reactions and attitudes towards topics discussed in these countries, we conducted a detailed analysis of the top-mentioned countries including: sentiment analysis, topic modeling, word2vec-based analysis, and locality analysis.

This section provides sentiment analysis on data records observed with the top-5 mentioned countries in our data collection. This analysis enables the understanding of attitudes and reactions towards topics related to these countries during the pandemic.

Twitter Figure 8 illustrates the sentiment analysis of the top-5 countries, i.e., China, USA, India, UK, and Italy, from the Twitter dataset. Negative sentiment in tweets mentioning the top-5 countries is more prominent than other sentiments over time. Figure 8 (a) shows that China started with more neutral tweets in January 2020. Interestingly, all countries except China experienced growth in the number of tweets in March, as shown in Figure 8. The figure also shows that the USA, India, and the UK charts are similar with respect to

Instagram Figure 9 illustrates the sentiment analysis of posts mentioning the top-5 countries, i.e., China, India, USA, Italy, and Spain, using our Instagram dataset. Figure 9 shows that there are few posts mentioning the countries on Instagram. Based on the analysis, reactions fluctuate, as shown in Figure 9, and there is therefore no dominant sentiment type over time. Nevertheless, there are more occurrences of positive posts in the USA and Italy charts (see Figures 9 (c) and (d)).

To further analyze the data of the top-5 countries, we conducted a topic modeling task to identify the major topics discussed on social media in relation to these countries. Identifying country-specific topics enables understanding how the people of these countries perceive the pandemic and to what extent their perception differs across localities. To this end, we employed LDA to extract country-specific topics for the top-5 mentioned countries in our dataset. Using the default implementation and settings of LDA in the Gensim LDA MultiCore model [36], we explored various LDA models to extract different numbers of topics, ranging from 2 to 30. For our analysis, we selected the best-performing models, i.e., the models with the highest coherence score, to extract and identify the topics. The results of our study show the emergence of a variety of topics, such as safety messages, work, travel, COVID-19-related news, government response, emergency funding, and prevention measures, which are similar to the findings of Sharma et al. [13] and Ordun et al. [11].
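The model selection described above (train one model per candidate number of topics from 2 to 30 and keep the one with the highest coherence score) can be sketched generically. The callables `train_model` and `coherence_of` are hypothetical placeholders for the training and scoring routines, and the stand-in functions in the usage example are invented so that the sketch runs without a topic modeling library.

```python
def select_best_model(train_model, coherence_of, topic_counts=range(2, 31)):
    """Train one model per topic count and return the best (count, model, score).

    `train_model(k)` is assumed to return a model trained with k topics and
    `coherence_of(model)` its coherence score (higher is better).
    """
    best = None
    for k in topic_counts:
        model = train_model(k)
        score = coherence_of(model)
        if best is None or score > best[2]:
            best = (k, model, score)
    return best


# Stand-ins for illustration only: the "model" is just its topic count,
# and the fake coherence curve peaks at 7 topics.
def fake_train(k):
    return {"num_topics": k}


def fake_coherence(model):
    return 0.5 - 0.01 * (model["num_topics"] - 7) ** 2


best_k, best_model, best_score = select_best_model(fake_train, fake_coherence)
print(best_k, round(best_score, 3))
```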

Twitter Figure 10 depicts the result of the topic modeling task on the Twitter dataset. The results show that health-related topics are the most-discussed topics across countries, except for the UK, where most tweets were related to the social category. The health-related tweets mentioning China constitute 19.2% of the total tweets, indicating people's interest in China's health situation, including hospitals, the spread of the virus, the number of cases, and how China handled the COVID-19 pandemic. Common concerns and trends in the health-related topics across all countries include COVID-19-related news, such as virus cases, the number of deaths, and virus spread. The high popularity of health-related topics indicates that people are concerned about the COVID-19 pandemic. In the economy-related category, which constitutes 16.3% of the total tweets in our dataset for the top-5 countries, the trend centered on the virus's economic aspects and implications. While our analysis highlighted more than four topics for tweets mentioning China, the USA, India, and the UK, only two topic categories were extracted from tweets mentioning Italy during the data collection period. Tweets mentioning Italy are related to the health and social categories, with 1.8% and 1.1% of the total dataset, respectively. More specifically, tweets mentioning Italy focused on COVID-19 cases, deaths, and social aspects of life during the pandemic. Generally, the social-related tweets across different countries indicate users' attempts to inform others about prevention measures, such as staying at home, caring, supporting, sharing, wearing a mask, etc.

Instagram The result of the topic modeling analysis on the Instagram dataset is displayed in Figure 11 . The dominant topic categories across countries include health, social, fashion, and tourism, distributed with the percentages 43.3%, 20.9%, 6.7%, and 29.2%, respectively. Health-related posts are prominent with mentions of countries such as China, the USA, India, and Italy (as shown in Figure 11 ). Similar to trends from the Twitter dataset, users express their concern about the number of virus cases spread across different countries. Moreover, Figure 11 shows that travel-related topics are the second most discussed topics in Instagram posts mentioning the top-5 countries with a total of 29.2% of the entire dataset.

Our exploration of social media topics has shown that people expressed interest in aspects and issues in specific countries more than others. In previous sections, we discussed the topics mentioning the top-5 countries in social media. This section aims to identify the temporal patterns of used terms when mentioning the top-5 countries. It is interesting to explore the frequent words associated with a specific country and whether such words persisted or shifted during the pandemic. To this end, we apply a word embedding technique to investigate the context of terms associated with the top-5 countries in each month and for the entire collection period.

We utilize the Word2Vec model, which adopts a shallow neural network to embed words for the analysis. Word2vec has two main algorithms: continuous bag of words (CBOW) and skip-gram (see Figure 12). The difference between the two algorithms is that CBOW predicts a target word from its context, while skip-gram predicts the context from a given word. For our implementation of the Word2Vec model, we used the Gensim library [36] with the default settings for hyperparameters such as window, size, sample, alpha, min_alpha, negative, and workers. However, we set the min_count hyperparameter to 20, which means that words occurring fewer than 20 times are ignored.
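The difference between the two algorithms, and the effect of a minimum count threshold, can be illustrated by generating the training examples each one would see. This is a simplified sketch in plain Python, not Gensim's implementation (which, among other things, samples the effective window size); the function names and the toy sentences are assumptions.

```python
from collections import Counter


def filter_rare(sentences, min_count):
    """Drop words that occur fewer than `min_count` times in the whole corpus."""
    freq = Counter(w for sent in sentences for w in sent)
    return [[w for w in sent if freq[w] >= min_count] for sent in sentences]


def skipgram_pairs(sentence, window):
    """Skip-gram: each (center word, one context word) pair is a training example."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        pairs.extend((center, sentence[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_pairs(sentence, window):
    """CBOW: each (list of context words, target word) pair is a training example."""
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        context = [sentence[j] for j in range(lo, hi) if j != i]
        if context:
            pairs.append((context, target))
    return pairs


sentence = ["china", "virus", "wuhan"]
print(skipgram_pairs(sentence, window=1))
print(cbow_pairs(sentence, window=1))
print(filter_rare([["a", "b", "a"], ["a", "c"]], min_count=2))
```

With `min_count=2`, the words "b" and "c" are dropped because each occurs only once; the study uses a threshold of 20 in the same way on the full corpus.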

Word2Vec captures different degrees of similarity between words, and it uses vector arithmetic to reproduce syntactic and semantic patterns. Therefore, we use it to identify key terms and top countries on both social networks. Schild et al. [10] used a similar technique, where the authors investigated the spread of Sinophobic slurs in social networks based on specific keywords such as "china", "chinese", "virus" etc. The difference between the work of Schild et al. [10] , and ours is that we use word embeddings to explore the dynamics and temporal patterns of used words in the social network over time, considering their association with different countries.

The results of our analysis using Twitter and Instagram datasets are shown in Figures 13 and 14 , respectively. The figures show the words colored in black and orange. The black-colored words represent persistent words through the data collection period, and the orange-colored words represent trending words that shifted from January to June. Information about the most frequent words on both social networks is given in Tables 5 and 6 , respectively.
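One way to separate persistent from shifting words is to compare the lists of words most similar to a country name across monthly models. The sketch below assumes those per-month neighbor lists are already available; treating "persistent" as present in every month and "shifted" as present in the last month but absent from the first is an assumption, since the text does not give a formal definition. The helper name and the toy lists are hypothetical.

```python
def split_persistent_and_shifted(neighbors_by_month):
    """Split a country's neighbor words into persistent and shifted sets.

    `neighbors_by_month` maps month labels (in chronological order) to the
    list of words most similar to the country name in that month's model.
    """
    if not neighbors_by_month:
        return set(), set()
    monthly_sets = [set(words) for words in neighbors_by_month.values()]
    persistent = set.intersection(*monthly_sets)   # present in every month
    shifted = monthly_sets[-1] - monthly_sets[0]   # new by the final month
    return persistent, shifted


# Toy neighbor lists for illustration only.
neighbors_by_month = {
    "january": ["virus", "wuhan"],
    "march": ["virus", "lockdown"],
    "june": ["virus", "chinavirus", "lockdown"],
}
persistent, shifted = split_persistent_and_shifted(neighbors_by_month)
print(sorted(persistent), sorted(shifted))
```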

Twitter Figure 13 shows the color-coded words that persisted or shifted during the data collection period of Twitter posts mentioning different countries. We noticed a shift in the words used in association with different countries. For example, the term "wuhan_virus" was increasingly associated with mentions of "China" over time, as were other terms, such as "originate_China" and "chinavirus", indicating people's interest in communicating news about the virus, as COVID-19 was first documented in Wuhan, China. Other terms, such as "silence_whistleblower", have shown a trend over time in tweets mentioning China. The phrase "silence_whistleblower" referred to the doctor who revealed the COVID-19 outbreak in China, e.g., one tweet says: "Sad: Chinese doctor who worked with late whistleblower dead from #coronavirus". Our analysis also highlights terms that were associated with the USA, including terms that are not related to coronavirus but to events that occurred during the pandemic. For example, the terms "civil_unrest" and "civil_war" were associated with the death of George Floyd on May 25, 2020, and its consequences nationwide. Another trend was captured for the phrase "million_unemployed", which was most likely related to an article published in The New York Times [39] pointing out the unemployment numbers in the USA, specifically mentioning that 51 million Americans were unemployed since the start of the COVID-19 pandemic. The following tweet is an example of this case: "Coronavirus: 3.3 million more unemployed in a week in the United States, a historic record . . . ". Similar coronavirus-related trends were also evident in association with other countries, such as India, the UK, and Italy.

Figure 13: (Twitter) Words that are related to top countries. Black words represent words that remained unchanged between January and June. Orange words represent words that shifted towards the countries during the same period.

Instagram The result of word representations and trends using the Instagram posts is shown in Figure 14. The results show that some terms, such as "smile", "love", "instagood", "art", "follow", etc., remained unchanged within the data collection period. However, some other terms, such as "travel", "help", and "stay_safe", shifted across different countries. We also observed trends in the use of terms related to coronavirus over time. This means that people started publishing posts related to COVID-19, indicating their attention to the spread of the pandemic. Some of these posts show that Instagram users were trying to compare past diseases, e.g., severe acute respiratory syndrome (SARS), with COVID-19. This may be the reason for observing the keyword "sars" in association with China, especially in reference to the beginning of the SARS outbreak in China (November 2002, Xu et al. [40]).

To study the localities of users interacting on social media through posts mentioning the top-mentioned countries, we further investigated the posts and locations of users showing interest in different topics associated with these countries.

Figure 14: (Instagram) Words that are related to top countries. Black colored words represent words that remained unchanged between January and March. Orange colored words represent words that shifted towards the countries during the same period.

Taking those numbers of countries into account (the numbers do not include the count of top countries), it can be said that China was mentioned by users from more places than other countries, while the USA is the second highest. Since our results and analysis showed that the most discussed topics are related to the health category, we can deduce that most countries discussed the health situation in China. Even though a smaller number of countries mention Italy, people discussed only its health and social aspects. It should be underlined that people from 50 different locations focused on the social aspects of the UK, while the health category is dominant for the other top countries.

Instagram Figure 16 depicts the results of our locality analysis of posts associated with the top countries. The figure shows that there are fewer localities where people mentioned the top countries than in the Twitter data. For example, Spain is not mentioned by any user from other countries, while China, the USA, and India each have only four separate external locations where people published posts about them. Italy is discussed by users from only two other countries, i.e., Brazil and the USA. Users from Turkey, the UK, Rwanda, the USA, and China paid attention to topics that are related to travel. Health-related topics from the USA were the main point of interest to users from South Korea, France, Turkey, Rwanda, and the USA.

Figure 16: Countries where users published posts about China, India, the USA, Italy, and Spain on Instagram.

First, we identified people's overall reactions as deduced from sentiment analysis. We found that users of both social networks published mostly neutral tweets and posts, although the number of negative tweets is also considerable. Second, we applied topic modeling to the data of each sentiment type (negative, neutral, positive) in the specific periods that exhibited spikes. Third, we performed the country analysis to observe the countries under discussion and found that China, the USA, India, the UK, and Italy attracted the most attention from Twitter users. Interestingly, those countries, except for the UK, together with Spain, are also the most discussed countries among Instagram users. Fourth, by displaying each country's sentiment analysis on both social media platforms, we detected spikes in specific time periods and identified the topics that led to those spikes. It can be inferred that topics related to economy, politics, health, social, and tourism are the reasons for those spikes on Twitter. In contrast, the health, fashion, social, and tourism categories dominate on Instagram. Based on Figures 10 and 11, we can conclude that health topics dominate the other categories on both platforms. Fifth, we detected the words that remained unchanged or shifted towards the top countries by carrying out the word2vec analysis. The results showed that COVID-19 affected the top countries and that the terms associated with those countries experienced a shift of meaning. Thus, terms related to coronavirus became linked with the country names on Twitter and Instagram. Sixth, we measured the impact areas of the top countries by identifying the locations from which retweets and posts were published. According to our analysis, users from 66 countries (Twitter) and another four countries (Instagram) focused on China more than on any other country.
In conclusion, since our analysis provides detailed insight into the COVID-19 datasets of both social networks, this paper can be used to understand people's reactions and attention.

Waleed Alasmary, and Abdulaziz Alashaikh. COVID-19 open source data sets: a comprehensive survey

A gradient boosting machine learning approach in modeling the impact of temperature and humidity on the transmission rate of covid-19 in india

SEIAQRDT model for the spread of novel coronavirus (COVID-19): A case study in India. Applied Intelligence

Kalman filter based short term prediction model for COVID-19 spread

Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence

OptCoNet: an optimized convolutional neural network for an automatic diagnosis of COVID-19. Applied Intelligence

COVIDetectioNet: COVID-19 diagnosis system based on X-ray images using features selected from pre-learned deep features ensemble. Applied Intelligence

Stacked-autoencoder-based model for COVID-19 diagnosis on CT images. Applied Intelligence

Deep neural network to detect COVID-19: one architecture for both CT Scans and Chest X-rays. Applied Intelligence

Go eat a bat

Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs

Analyzing COVID-19 on online social media: Trends, sentiments and emotions. CoRR, abs

COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations

Using twitter and web news mining to predict COVID-19 outbreak

A large-scale COVID-19 twitter chatter dataset for open scientific research - An international collaboration

"Infodemic": Leveraging High-Volume Twitter Data to Understand Early Public Sentiment for the Coronavirus Disease

A first look at COVID-19 information and misinformation sharing on twitter

Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset

The COVID-19 social media infodemic

Coronavirus Goes Viral: Quantifying the COVID-19 Misinformation Epidemic on Twitter

How the world's collective attention is being paid to a pandemic: COVID-19 related 1-gram time series for 24 languages on twitter

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

Prevalence of low-credibility information on twitter during the COVID-19 outbreak. CoRR, abs

CoronaVis: A Real-time COVID-19 Tweets Data Analyzer and Data Repository

Hydrator [Computer Software]

Twint - twitter intelligence tool

A First Instagram Dataset on COVID-19

Robust hybrid deep learning models for alzheimer's progression detection. Knowledge-Based Systems

Intensive care unit mortality prediction: An improved patient-specific stacking ensemble model

Autosen: Deep-learning-based implicit continuous authentication using smartphone sensors

Multi-χ: Identifying multiple authors from source code files

The Stanford CoreNLP Natural Language Processing Toolkit. Aclweb.Org

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Software Framework for Topic Modelling with Large Corpora

MALLET: A Machine Learning for Language Toolkit

World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard

About 30 Million Workers Are Collecting Jobless Benefits

Epidemiologic clues to SARS origin in China

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011198).