key: cord-0154574-1ffo9eoq authors: Wahab, Omar Abdel; Mustafa, Ali; Bamatakina, André Bertrand Abisseck title: Trends, Politics, Sentiments, and Misinformation: Understanding People's Reactions to COVID-19 During its Early Stages date: 2021-06-25 journal: nan DOI: nan sha: 193af9d93b7da7b7a322791da48be9be1f72ae24 doc_id: 154574 cord_uid: 1ffo9eoq

The sudden outbreak of COVID-19 resulted in large volumes of data being shared on different social media platforms. Analyzing and visualizing these data is essential to understanding in depth the pandemic's impacts on people's lives and people's reactions to them. In this work, we conduct a large-scale spatiotemporal data analytic study to understand people's reactions to the COVID-19 pandemic during its early stages. In particular, we analyze a JSON-based dataset collected from news/message boards/blogs in English about COVID-19 over a period of 4 months, for a total of 5.2M posts. The data were collected from December 2019 to March 2020 from several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK. Our study mainly aims to understand which implications of COVID-19 interested social media users the most and how these interests varied over time, the spatiotemporal distribution of misinformation, and public opinion toward public figures during the pandemic. Our results can be used by many parties (e.g., governments, psychologists) to make better-informed decisions that take into account the actual interests and opinions of the people.

The abrupt outbreak of COVID-19 has created a global crisis that has affected not only our physical health but also our mental health and way of living [14, 9]. As a result of the pandemic, social media usage has undeniably gone up.
In fact, the stay-at-home orders that followed the rapid outbreak of COVID-19 pushed us to rely more and more on the Internet, not only for entertainment but also to work from home, to pursue our education virtually, and to catch up with family and friends. Moreover, as shopping centres, stores and restaurants closed their doors for months, most of our shopping activities shifted online. For example, a survey conducted by the media and research organization Digital Commerce 360 among over 4,500 North American members of the Influenster community (a product discovery and review platform for consumers) reported that social media consumption increased by up to 72% and that posting activity increased by up to 43% during the pandemic. In addition, social media has been the number one communication platform for health professionals, governments, universities and organizations to deliver pandemic-related information to the public [4, 7]. Thus, the pandemic and the subsequent nationwide lockdowns have caused an unprecedented surge in social media usage across the world. Consequently, it becomes crucial to perform in-depth social media analyses to extract useful insights about the COVID-19 pandemic and people's reactions to it. Given that traditional survey methods are time-consuming and expensive to conduct [16], there is a clear need for proactive and timely data analytic studies to understand and respond to the rapidly emerging effects of the pandemic on our physical and mental health. Several social media analysis studies [11, 6, 3, 15, 10, 1, 12] have recently been conducted in an attempt to understand the impacts of COVID-19 on people's lives and attitudes. Our work aims to complement these studies by providing a large-scale spatiotemporal analysis of people's reactions to the COVID-19 pandemic during its early stages.
Our work differs from these studies in two respects: (1) unlike most of these studies, which rely on Twitter data, we analyze data collected from many social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK, some of which have not been included in earlier studies; (2) we focus our study on the first four months of the pandemic in an attempt to understand how people's reactions and opinions regarding the pandemic evolved over time. We conduct a large-scale study on a dataset [5] that contains 5.2M posts collected from news/message boards/blogs about COVID-19 over a period of 4 months (December 2019 to March 2020). The goal is to understand how people reacted to the COVID-19 pandemic during its early stages and how the pandemic affected people's opinions on several matters. To attain this goal, we formulate the following research questions that we aim to answer through our study:
1. How did the number of COVID-19-related posts on social media evolve over time?
2. Which Web sites were the most consulted by social media users to get updates on COVID-19?
3. How did the interests of social media users in the ramifications of COVID-19 vary across the first four months of the pandemic?
4. How did the spread of illegitimate information evolve over time?
5. Which countries were the most targeted by the posts shared on social media?
6. What are the temporal and geographic distributions of illegitimate information?
7. Which public figures were the most mentioned on social media?
8. What were the public sentiments toward the most mentioned public figures?
In Section 2, we review related studies and highlight the unique contributions of this work. In Section 3, we first describe the environment and tools used to conduct our study and then present and discuss the results. Finally, in Section 4, we summarize the main findings of the paper.
In [11], the authors examine the volume, content, and geospatial distribution of tweets related to telehealth during the COVID-19 pandemic. To do so, they used public Twitter data on telehealth in the United States collected from March 30, 2020 to April 6, 2020. The tweets were analyzed using a mixture of cluster analysis and natural language processing techniques. The study suggests that social media plays an important role in promoting telehealth-favoring policies to counter mental health problems in highly affected areas. In [6], the authors study the impact, advantages and limitations of using social networks during the COVID-19 pandemic. The authors conclude that social media is important to foster the dissemination of important information, diagnostics, treatments and follow-up protocols. However, they also note that social media can be misused to spread fake data, pessimistic information and myths, which could contribute to increasing depression and anxiety among people. In [3], the authors perform a large-scale analysis of COVID-19-related data shared on Instagram, Twitter, Reddit, YouTube and Gab. In particular, they investigate the engagement of social media users with COVID-19 and provide a comparative evaluation of the evolution of the discourse on each social media platform. The main finding of the article is that the interaction patterns of each platform, along with the distinctiveness of each platform's audience, are crucial factors in the spread of information and misinformation. In [15], the authors investigate the propagation, authors and content of false information related to COVID-19. To do so, they gathered 1,500 fact-checked tweets associated with COVID-19 for the period from January to mid-July 2020, of which 1,274 are false and 226 are partially false.
The study suggests that (1) verified Twitter accounts, including those of organisations and celebrities, contributed to generating or propagating misinformation; (2) tweets with false information often tend to discredit legitimate information on social media; and (3) authors of false information use less cautious language, seeking to harm others. In [10], the authors develop a Web application, called CoVerifi, to assess the credibility of COVID-19-related news. CoVerifi combines machine learning [17] and human feedback to evaluate the credibility of the news. It first enables users to vote on the content of the news, resulting in a labelled dataset. A Bidirectional Long Short-Term Memory (LSTM) model is then trained on this dataset to predict future false information. In [1], the authors propose a Markov-inspired computational method to characterize topics in tweets within a specific period in Brazil. The proposed solution seeks to address the abuse of social media from three perspectives: (1) providing a better understanding of fact-checking actions during the pandemic; (2) studying the contradictions between the pandemic and political agendas; and (3) detecting false information. In [12], the authors conduct a large-scale study on Twitter-generated data spanning a period of two months. The study concludes that social media have been used to mislead users, divert them to extraneous topics, and promote incorrect medical measures and information. On the bright side, the authors note the importance of credible social media users in different roles (e.g., influencers, content developers) in the battle against the COVID-19 pandemic.
The unique contributions of our work compared to these studies are twofold: (1) while most studies base their analysis on data collected from Twitter, we rely on data collected from various social media platforms, i.e., Facebook, LinkedIn, Pinterest, StumbleUpon and VK, some of which are not considered in previous studies; (2) our study covers the first four consecutive months of the pandemic, with the goal of understanding how people's reactions and opinions on matters related to the pandemic evolved over time. Thus, the insights extracted from our study are original, given the social media platforms and time interval we consider.

Our analysis is done on a JSON-based dataset [5] collected from news/message boards/blogs about COVID-19 over a period of 4 months, for a total of 5.2M posts. The time frame of the data is December 2019 to March 2020. The posts are in English and mention at least one of the following: "Covid", "CoronaVirus" or "Corona Virus". To analyze the dataset, we employ MongoDB, a document-oriented, distributed, JSON-based database platform. More specifically, we write the code in the form of MapReduce queries in MongoDB, which helps us analyze large volumes of data in a distributed way and generate useful aggregated results. We present hereafter the results of our analysis in terms of the number of posts related to COVID-19 over time; the number of news items published per Web site per month; the geographic distribution of shared news; geographic and temporal trends in fake news; and opinions about public figures.

We study in Fig. 1 the evolution of the number of COVID-19-related posts on the studied social media platforms during the first four months of the pandemic. We notice from the figure that the number of posts grew exponentially from December 2019 to March 2020. This might be justified by the fact

In Fig. 2, we give a breakdown of the Web sites that were cited on the social media platforms as sources of information for December 2019 (Fig. 2a), January 2020 (Fig. 2b), February 2020 (Fig. 2c) and March 2020 (Fig. 2d). Fig. 2a shows that the medical Really Simple Syndication (RSS) feed provider was the most cited Web site in December 2019, by a large margin over the other Web sites. This indicates that, at that stage of the pandemic, people were mostly interested in learning more about this new virus from a medical perspective. As for January 2020, Fig. 2b shows that the most cited Web site was MarketScreener, followed by BNN Bloomberg. Knowing that MarketScreener is an international stock market and financial news Web site and that BNN Bloomberg is Canada's Business News Network reporting on finance and markets, we conclude that in the second month of the pandemic, people were more interested in the impacts of the pandemic on local and global financial markets. On the other hand, Figs. 2c and 2d show that the trend changed in February and March 2020, when the most cited Web sites became news-oriented ones such as Fox News, Yahoo News and The Guardian. This indicates that during this period, people started to consult news-related sites more often to get updates on the emergency measures adopted by governments and on the impacts of the pandemic on the political situation, such as the 2020 United States presidential election [2, 8].

Figure 3: Starting from January 2020, the US accounts for more than half of the news shared on social media (panel (b): January 2020).

We measure in Fig. 3 the geographic distribution of the news shared on social media for each of the four studied months. Starting with December 2019 (Fig. 3a), Germany accounted for 38% of the news, followed by the United States (29%), Hong Kong (9%), Canada (8%), the United Kingdom and Ireland (4%), Switzerland (3.5%), South Korea (0.025%), and Luxembourg and France (0.01%). In January 2020 (Fig. 3b), the US accounted for more than half of the news (53%), followed at a large distance by the United Kingdom (9%). Some new countries also started to appear in the shared news, such as India, Australia, Singapore, the Philippines and South Africa. In February 2020 (Fig. 3c) and March 2020 (Fig. 3d), the geographic distribution remained almost the same, with the US in the lead at 63% in February 2020 and 62% in March 2020.

In Fig. 4, we study the evolution of fake news spread across the first four months of the pandemic. News items are classified into three categories: legitimate, probably legitimate and fake. In the dataset, each shared news item is associated with a spam score in the interval [0, 1], expressed below as a percentage. A spam score quantifies the percentage of news with features similar to news items that were already classified as illegitimate. To classify the news, we adopt the method proposed by Link Explorer, which is based on the following criteria:
• News items with a spam score between 1% and 30% are considered legitimate.
• News items with a spam score between 31% and 60% are considered probably legitimate.
• News items with a spam score between 61% and 100% are considered illegitimate.
Fig. 4 shows that the spread of fake news increased considerably over time, from 2% of the shared news in December 2019 to 13% in March 2020.
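The banding rule above and the per-month shares plotted in Fig. 4 can be sketched as a small map/reduce computation. This is a minimal in-memory sketch, not the authors' MongoDB MapReduce queries: Python's built-in `map` and `functools.reduce` only mimic the two phases, the record fields `published` and `spam_score` are hypothetical names, and a score of exactly 0 (below the 1% lower bound of the first band) is assumed to count as legitimate.

```python
from collections import Counter
from functools import reduce

def classify(spam_score: float) -> str:
    """Assign a spam score in [0, 1] to one of the three bands used for Fig. 4."""
    # The 30% and 60% cut-offs are compared on the [0, 1] scale to avoid
    # floating-point surprises (0.3 * 100 is not exactly 30.0).
    # Assumption: a score of exactly 0 is treated as legitimate.
    if spam_score <= 0.30:
        return "legitimate"
    if spam_score <= 0.60:
        return "probably legitimate"
    return "illegitimate"

def map_post(post: dict) -> tuple:
    """Map phase: emit ((month, band), 1) for a single post."""
    month = post["published"][:7]  # "YYYY-MM" prefix of an ISO-8601 timestamp (assumed format)
    return (month, classify(post["spam_score"])), 1

def reduce_pair(acc: Counter, pair: tuple) -> Counter:
    """Reduce phase: sum the emitted counts per (month, band) key."""
    key, count = pair
    acc[key] += count
    return acc

def illegitimate_share(posts: list) -> dict:
    """Fraction of each month's posts that fall into the illegitimate band."""
    counts = reduce(reduce_pair, map(map_post, posts), Counter())
    totals = Counter()
    for (month, _band), n in counts.items():
        totals[month] += n
    return {m: counts[(m, "illegitimate")] / t for m, t in sorted(totals.items())}

# Toy records for illustration only (not taken from the dataset).
posts = [
    {"published": "2019-12-05T10:00:00Z", "spam_score": 0.10},
    {"published": "2019-12-11T14:20:00Z", "spam_score": 0.20},
    {"published": "2019-12-20T08:30:00Z", "spam_score": 0.75},
    {"published": "2019-12-28T19:45:00Z", "spam_score": 0.05},
    {"published": "2020-03-01T12:00:00Z", "spam_score": 0.45},
    {"published": "2020-03-02T09:15:00Z", "spam_score": 0.90},
]
print(illegitimate_share(posts))  # {'2019-12': 0.25, '2020-03': 0.5}
```

Keying the reduce phase on (month, band) keeps a single pass over the posts sufficient for both the per-band counts and the monthly totals; the same key could be swapped for (month, country) or (month, Web site) to obtain the breakdowns behind Figs. 2 and 3.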
The proportion of fake news thus increased more than sixfold (from 2% to 13%) over a period of four months. In Fig. 5, we study the geographic distribution of fake news: 70% of the fake news came from the United States, followed by India (9%), the United Kingdom (5%), Australia (5%), the Philippines (4%), Canada (4%) and China (3%). It is worth mentioning that

Finally, we identify in Fig. 6 the public figures who were the most mentioned on social media during the first four months of the pandemic and provide a detailed breakdown of the public's overall sentiment toward them. Specifically, the eight most mentioned public figures on the considered social media platforms in that period were: Justin Trudeau (Prime Minister of Canada), Narendra Modi (Prime Minister of India), Joe Biden (presidential candidate in the United States elections during the analyzed period), Andrew Cuomo (Governor of New York), Bernie Sanders (United States Senator), Mike Pence (Vice President of the United States during the analyzed period), Boris Johnson (Prime Minister of the United Kingdom during the analyzed period) and Donald Trump (President of the United States during the analyzed period). To perform the sentiment analysis, we use the AFINN lexicon [13], which contains over 3,300 words, each associated with a polarity score (i.e., positive, negative or neutral). For Justin Trudeau, 89% of the authors were neutral, 9% negative and 2% positive. For Narendra Modi, 86% of the authors were neutral, 12% negative and 2% positive. For Joe Biden, 71% of the authors were neutral, 20% negative and 9% positive. For Andrew Cuomo, 86% of the authors were neutral, 12% negative and 2% positive.
For Bernie Sanders, 83% of the authors were neutral, 15% negative and 2% positive. For Mike Pence, 85% of the authors were neutral, 13% negative and 2% positive. For Boris Johnson, 89% of the authors were neutral, 9% negative and 2% positive. For Donald Trump, 77% of the authors were neutral, 18% negative and 5% positive. Overall, we conclude from Fig. 6 that the most controversial public figures (those attracting the highest shares of both positive and negative sentiment) were Joe Biden and Donald Trump, who were in fierce competition for the 2020 United States presidential election. This also hints that the COVID-19 pandemic had an effect on people's general opinion of the candidates in the 2020 United States presidential election.

We analyze in this work a dataset that contains posts in English about COVID-19 from news/message boards/blogs for the period December 2019 to March 2020, collected from several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK. Our results suggest that (1) the number of posts related to COVID-19 increased exponentially from December 2019 to March 2020; (2) the interests of social media users shifted from health-oriented in December 2019 to economics-oriented in January 2020, and to news-oriented in February and March 2020; (3) the proportion of fake news increased more than sixfold from December 2019 to March 2020; (4) most of the news, including the illegitimate news, originated from the United States; (5) people mostly had a neutral sentiment toward public figures, with negative sentiments outweighing positive ones; and (6) the most controversial public figures, with the most positive and negative sentiments in the studied period, were Joe Biden and Donald Trump.
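The three-way sentiment labels behind Fig. 6 and findings (5) and (6) can be approximated with a lexicon-sum rule. This is a minimal sketch, assuming that a post's polarity is the sum of the AFINN valences of its words (positive above zero, negative below zero, neutral otherwise); it is not the authors' exact pipeline. The small `LEXICON` dictionary is a hypothetical stand-in for the full AFINN word list, its values are illustrative only, and the sketch counts posts rather than distinct authors and matches a public figure with a simple case-insensitive substring test.

```python
import re
from collections import Counter

# Hypothetical excerpt standing in for the full AFINN word list (word -> integer
# valence in [-5, 5]); the values are illustrative and may differ from AFINN's.
LEXICON = {"good": 3, "great": 3, "praise": 3, "bad": -3, "crisis": -3, "failure": -2}

TOKEN = re.compile(r"[a-z']+")

def polarity(text: str) -> str:
    """Label a post positive, negative or neutral from its summed word valences."""
    score = sum(LEXICON.get(tok, 0) for tok in TOKEN.findall(text.lower()))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def sentiment_breakdown(posts: list, figure: str) -> dict:
    """Percentage of posts mentioning `figure` that receive each sentiment label."""
    mentions = [p for p in posts if figure.lower() in p.lower()]
    if not mentions:
        return {}
    labels = Counter(polarity(p) for p in mentions)
    return {lab: 100 * labels[lab] / len(mentions) for lab in ("neutral", "negative", "positive")}

# Toy posts for illustration only (not taken from the dataset).
posts = [
    "Officials praise the good response led by Trudeau",
    "Trudeau announces new travel rules",
    "Critics call the Trudeau plan a failure amid the crisis",
    "Trudeau speaks to reporters",
]
print(sentiment_breakdown(posts, "Trudeau"))  # {'neutral': 50.0, 'negative': 25.0, 'positive': 25.0}
```

Because most words in a typical post carry no lexicon entry, a summed-score rule of this kind naturally labels many posts neutral, which is consistent with the large neutral shares reported for every public figure.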
This work is partially funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number RGPIN-2020-04707 and by the Université du Québec en Outaouais (UQO).

Fake news agenda in the era of COVID-19: Identifying trends through fact-checking content
COVID-19 misinformation and the 2020 US presidential election. The Harvard Kennedy School Misinformation Review
The COVID-19 social media infodemic
Provision of pandemic disease information by health sciences librarians: a multisite comparative case series
Free dataset from news/message boards/blogs about coronavirus (4 month of data -5
Social media influence in the COVID-19 pandemic
Role of social media in COVID-19 pandemic
When is it democratic to postpone an election? Elections during natural disasters, COVID-19, and emergency situations
COVID-19 pandemic: Lessons learned and future directions
CoVerifi: A COVID-19 news verification system
Social media data analytics on telehealth during the COVID-19 pandemic
Critical impact of social networks infodemic on defeating coronavirus COVID-19 pandemic: Twitter-based study and research directions
Mental health and psychosocial considerations during the COVID-19 outbreak
An exploratory study of COVID-19 misinformation on Twitter
Social media insights into US mental health during the COVID-19 pandemic: Longitudinal analysis of Twitter data
Federated machine learning: Survey, multi-level classification, desirable criteria and future directions in communication and networking systems