key: cord-1032308-inbx7p9v authors: Zou, C.; Wang, X.; Xie, Z.; Li, D. title: Public Reactions towards the COVID-19 Pandemic on Twitter in the United Kingdom and the United States date: 2020-07-28 journal: medRxiv : the preprint server for health sciences DOI: 10.1101/2020.07.25.20162024 sha: 802eb24f78f43e2fa7c9ea01a2be5652c73288cb doc_id: 1032308 cord_uid: inbx7p9v Background: The coronavirus disease 2019 (COVID-19) has spread globally since December 2019. Twitter is a popular social media platform with active discussions about the COVID-19 pandemic. The public reactions on Twitter about the COVID-19 pandemic in different countries have not been studied. This study aims to compare the public reactions towards the COVID-19 pandemic between the United Kingdom and the United States from March 6, 2020 to April 2, 2020. Data: The numbers of confirmed COVID-19 cases in the United Kingdom and the United States were obtained from the 1Point3Acres website. Twitter data were collected using COVID-19 related keywords from March 6, 2020 to April 2, 2020. Methods: Temporal analyses were performed on COVID-19 related Twitter posts (tweets) during the study period to show daily trends and hourly trends. The sentiment scores of the tweets on COVID-19 were analyzed and associated with the policy announcements and the number of confirmed COVID-19 cases. Topic modeling was conducted to identify related topics discussed with COVID-19 in the United Kingdom and the United States. Results: The number of daily new confirmed COVID-19 cases in the United Kingdom was significantly lower than that in the United States during our study period. There were 3,556,442 COVID-19 tweets in the United Kingdom and 16,280,065 tweets in the United States during the study period. The number of COVID-19 tweets per 10,000 Twitter users in the United Kingdom was lower than that in the United States. 
The sentiment scores of COVID-19 tweets in the United Kingdom were less negative than those in the United States. The topics discussed in COVID-19 tweets in the United Kingdom were mostly about gratitude to the government and health workers, while the topics in the United States were mostly about the global COVID-19 pandemic situation. Conclusion: Our study showed correlations between the public reactions towards the COVID-19 pandemic on Twitter and the confirmed COVID-19 cases as well as the policies related to the COVID-19 pandemic in the United Kingdom and the United States.

A novel coronavirus disease, known as COVID-19, was identified in China in December 2019 [1]. This virus can transmit from person to person, and the most common symptoms include fever, dry cough, and tiredness [2]. On March 11, 2020, the World Health Organization declared the outbreak a pandemic [3]. As of June 16, States had the first confirmed cases on January 22, 2020, and the United Kingdom had the first confirmed cases on January 31, 2020 [4]. Both countries had their first COVID-19 cases in late January, but by June 14, the United States (2,141,057 COVID-19 cases) had around 7 times as many confirmed cases as the United Kingdom (298,139 cases) [4]. Thus, it is important to examine the differences in public reactions to the COVID-19 pandemic between these two countries.

Since the outbreak of H1N1 (a novel influenza A virus) in 2009, the internet has been the most frequently used source of information for the public to learn about the pandemic [5]. Social media platforms provide unique resources where people can access and share information. Twitter is a social media platform with 330 million monthly active users, where users can post brief text messages (up to 280 characters) known as "tweets" [6]. Mining the data on Twitter can help us understand the public's opinions and behavioral responses to the COVID-19 pandemic.

All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 28, 2020. https://doi.org/10.1101/2020.07.25.20162024 doi: medRxiv preprint

Most social media studies on COVID-19 focused on one particular country [7], misinformation [8]-[10], or mental health [11]. A cross-region comparison study analyzed the Twitter content about COVID-19 posted by world leaders in G7 countries, but it did not evaluate its association with the public [12]. To the best of our knowledge, attempts to discover public reactions to COVID-19 on social media across countries are still lacking. In this study, statistical analysis and text mining techniques were applied to COVID-19 related tweets to understand how Twitter users responded to the COVID-19 pandemic in the United Kingdom and the United States. Our findings suggest that Twitter users in different countries react differently to the COVID-19 pandemic, which may be related to the severity of the pandemic and to the related policies and news in each country.

The numbers of confirmed COVID-19 cases in the United Kingdom and the United States from March 6, 2020 to April 2, 2020 were obtained from the 1Point3Acres website [4]. The Twitter dataset used for this study was generated by a crawler using the Twitter streaming API from March 6, 2020 to April 2, 2020 [13]. A set of keywords, "CORONA", "corona", "COVID19", "covid19", "covid", "coronavirus", "Coronavirus", "CoronaVirus", and "NCOV", was used to collect COVID-19 related tweets. The total number of COVID-19 tweets in the dataset was 155,028,779. Using a Python script, the tweets without COVID-19 related keywords in the text were filtered out. Because Corona Extra is a beer produced by Cervecería Modelo, tweets that mentioned Corona as a beer were also filtered out.
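The keyword filtering described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: case-insensitive substring matching, the function names `contains_any` and `filter_tweets`, and the `drop_keywords` parameter are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence

# Collection keywords quoted in the text. Matching is case-insensitive here,
# which is an assumption (the source lists several case variants explicitly).
COVID_KEYWORDS = ("corona", "covid19", "covid", "coronavirus", "ncov")


def contains_any(text: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs as a substring of text, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_tweets(
    tweets: Iterable[str],
    keep_keywords: Sequence[str] = COVID_KEYWORDS,
    drop_keywords: Sequence[str] = (),
) -> List[str]:
    """Keep tweets that match a keep keyword and match no drop keyword."""
    kept = []
    for tweet in tweets:
        if not contains_any(tweet, keep_keywords):
            continue  # no COVID-19 related keyword in the text
        if contains_any(tweet, drop_keywords):
            continue  # hypothetical exclusion term present (e.g. beer-related)
        kept.append(tweet)
    return kept


sample = [
    "Stay safe, COVID19 is spreading",
    "Nice weather today",
    "Corona beer on sale",
]
print(filter_tweets(sample, drop_keywords=("beer",)))
```

Substring matching is simple but coarse: a short exclusion term can also remove relevant tweets, so a real pipeline might prefer word-boundary matching.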
Another set of keywords, "dealer", "deal", "supply", "beer", "drink", "drank", "drunk", "store", "promo", "promotion", "customer", "discount", "sale", "free shipping", "sell", "$", "%", "dollar", "offer", "percent off", "save", "price", "wholesale", was used to filter out unrelated, promotional, or commercial tweets. After filtering out tweets irrelevant to COVID-19, the dataset contained 85,953,249 tweets. The names of the country, the states, and the 50 most populous cities, such as "New York, New York" for New York City and "California" for the state of California, were used to select tweets from the United States. The names of the country such as England,

A temporal analysis was performed to investigate the daily and hourly changes in the number of COVID-19 tweets during the study period. The temporal trend of the number of tweets per 10,000 Twitter users during the study period was calculated as the daily number of tweets normalized by the number of users in the region. We also analyzed the hourly trend of the COVID-19 tweets by calculating the

was used to calculate the sentiment score for each tweet [14]. The word "virus" was removed from the text before sentiment analysis because this study focused on public attitudes about the virus and this word itself could influence the sentiment.
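The two aggregations in this part of the methods, tweets per 10,000 users and sentiment scoring after removing the word "virus", can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, "virus" is removed only as a standalone word, and the scorer is passed in as a plain function so the example runs without a sentiment library.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def tweets_per_10k_users(daily_counts: Dict[str, int], n_users: int) -> Dict[str, float]:
    """Normalize daily tweet counts to tweets per 10,000 users in a region."""
    if n_users <= 0:
        raise ValueError("n_users must be positive")
    return {day: count * 10_000 / n_users for day, count in daily_counts.items()}


# Assumption: only the standalone word "virus" is removed, so "coronavirus" is kept.
_VIRUS = re.compile(r"\bvirus\b", flags=re.IGNORECASE)


def strip_virus(text: str) -> str:
    """Remove the word 'virus' and collapse the leftover whitespace."""
    return " ".join(_VIRUS.sub(" ", text).split())


def daily_mean_sentiment(
    tweets: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Average the per-tweet score by day; tweets are (day, text) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for day, text in tweets:
        totals[day] += score(strip_virus(text))
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


def toy_score(text: str) -> float:
    """Placeholder scorer for the example only; not a real sentiment model."""
    return -1.0 if "scary" in text.lower() else 1.0


example = [
    ("2020-03-06", "This virus is scary"),
    ("2020-03-06", "Stay strong"),
    ("2020-03-07", "Thank you NHS"),
]
print(tweets_per_10k_users({"2020-03-06": 500}, n_users=1_000_000))
print(daily_mean_sentiment(example, toy_score))
```

In practice, `score` could wrap a lexicon-based analyzer such as the `compound` value returned by `SentimentIntensityAnalyzer().polarity_scores(text)` in the `vaderSentiment` package. Whether the study used the compound score is not stated, so that choice is an assumption. Grouping by hour instead of by day only requires changing the key in the (day, text) pairs.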
To analyze the sentiment during the study period, VADER was applied to the dataset to calculate the daily average sentiment scores of COVID-19 tweets in the United Kingdom and the United States. To analyze the sentiment over a 24-hour period, the hourly average sentiment scores of all COVID-19 tweets were calculated with VADER.

A tool for Latent Dirichlet Allocation (LDA) topic models, Gensim parallelized LDA, was applied to identify the COVID-19 related topics on Twitter in the United Kingdom and the United States [15]. Because training LDA models is time-consuming, 30% of tweets were randomly sampled from the datasets. Before the model was built, the text was split into sentences and further split into words. All characters were converted to lowercase to ensure consistency. Unrelated text, including email addresses, newlines, extra spaces, distracting single quotes, and URLs, was removed from the data. Stopwords from the Natural Language Toolkit and the words "virus", "corona", "coronavirus", "ncov", "covid19" and "covid" were also removed as uninformative for topic discovery [16]. Words were lemmatized and stemmed to their root forms using spaCy because different forms of the same word have the same meaning in this study [17]. LDA is unsupervised, and prior to running the model, the number of topics that exist in the corpus is unknown. Models with different numbers of topics were built and the model with the best performance was chosen to represent the data. An intertopic distance map from pyLDAvis and topic coherence from Gensim were used to evaluate the models.
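The text cleaning that precedes topic modeling can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the regular expressions and the function name `clean_tweet` are assumptions, the short stopword set is a stand-in for the NLTK list, and lemmatization with spaCy is omitted so that the example has no external dependencies.

```python
import re
from typing import FrozenSet, List

# Domain words removed in the text before topic modeling.
DOMAIN_WORDS = frozenset({"virus", "corona", "coronavirus", "ncov", "covid19", "covid"})

# Placeholder stopwords; the study used the NLTK stopword list instead.
STAND_IN_STOPWORDS = frozenset({"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "at"})

_URL = re.compile(r"https?://\S+|www\.\S+")
_EMAIL = re.compile(r"\S+@\S+")
_TOKEN = re.compile(r"[a-z0-9]+")


def clean_tweet(
    text: str,
    stopwords: FrozenSet[str] = STAND_IN_STOPWORDS | DOMAIN_WORDS,
) -> List[str]:
    """Lowercase, strip URLs, emails and single quotes, tokenize, drop stopwords."""
    text = text.lower()
    text = _URL.sub(" ", text)
    text = _EMAIL.sub(" ", text)
    text = text.replace("'", "")
    # Tokenizing on alphanumeric runs also discards newlines and extra spaces.
    tokens = _TOKEN.findall(text)
    return [token for token in tokens if token not in stopwords]


print(clean_tweet("Thank the NHS staff! #COVID19\nhttps://t.co/abc123"))
```

The resulting token lists are the kind of input that a bag-of-words dictionary and an LDA implementation such as Gensim's `LdaMulticore` expect. Training one model per candidate number of topics and comparing them with Gensim's `CoherenceModel` would then correspond to the model selection described above.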
In the intertopic distance map, topics were plotted as circles in a two-dimensional plane, with centers determined by the computed distance between topics [18]. If two circles overlapped, the corpus could have been modeled with fewer topics. The topic coherence score evaluates a single topic by measuring the degree of semantic similarity between high-scoring words in the topic. A higher topic coherence score indicates a better model. Through an iterative process, the United Kingdom data had the highest coherence score with 10 topics, and the United States data had the highest coherence score with 12 topics.

Kingdom and the United States

The number of COVID-19 tweets in the United Kingdom and the United States had different patterns in a 24-hour period (Figure 4A). In the United Kingdom, the number of COVID-19 tweets increased from 5 AM until 9 AM, and then slowly dropped. In
States, which indicates that COVID-19 was a less popular topic on Twitter in the

real-world news events [24]. In this study, we demonstrated that the number of COVID-19 related tweets and their sentiment scores were correlated with significant events about COVID-19.

In this study, we showed that the sentiment scores of COVID-19 related tweets had a diurnal pattern. Dzogang et al. showed that two independent factors could explain 85% of the variance across tweets' 24-hour profiles in the United Kingdom: one starting at 5 AM/6 AM that correlates with positive emotions, and another starting at 3 AM/4 AM that correlates with negative affect and social concerns [25]. Consistent with this, our study showed that the sentiment scores of COVID-19 related tweets in the United Kingdom reached their lowest point at 3 AM and started to increase rapidly from 5 AM. In addition, we showed that the number of COVID-19 tweets in the United Kingdom had a diurnal pattern, with the highest number of tweets at 9 AM and the lowest at 3 AM. The reasons behind this observation need further investigation. As for the United States, Golder et al. showed that there was a morning rise and nighttime peak in positive affect, and a sharp drop in negative affect during the overnight hours on Twitter [26].
We found a similar pattern in the number of COVID-19 tweets, but the sentiment of the COVID-19 tweets did not have an obvious pattern in the United States compared to the United Kingdom. This is probably a consequence of emotional numbness when the United States had many more COVID-19 cases than the United Kingdom.

There were several limitations in our study. First, although Twitter is one of the most popular social media platforms [27], Twitter users may not represent the whole population. Second, our study only focused on Twitter data from March 6 to April 2, 2020. COVID-19 is an ongoing pandemic and public reactions towards COVID-19 might evolve after April 2, 2020. Third, the geographic information used in our study might be biased because the geolocation that users give in their profiles could be inaccurate.

By analyzing COVID-19 related tweets from March 6 to April 2, 2020 in the United Kingdom and the United States, we showed in a timely manner the differences in public attitudes towards COVID-19 between the two countries, which might correlate with the number of COVID-19 cases and with important policies and news related to COVID-19. Our study provides some evidence of a correlation between the severity of the COVID-19 pandemic and public attitudes towards it, especially regarding how different policies in different countries affect those attitudes.

WHO | Novel Coronavirus - China
Coronavirus
What is pandemic? Why did WHO just declare one? - The Washington Post
Early Assessment of Anxiety and Behavioral Response to Novel Swine-Origin Influenza A(H1N1)
Q1-2019-Slide-Presentation.pdf
Conversations and Medical News Frames on Twitter: Infodemiological Study on COVID-19 in South Korea
Coronavirus Goes Viral: Quantifying the COVID-19
COVID-19 and the 5G Conspiracy Theory: Social Network Analysis of Twitter Data
Dataset on dynamics of Coronavirus on Twitter
Mental health problems and social media exposure during COVID-19 outbreak
World leaders' usage of Twitter in response to the COVID-19 pandemic: a content analysis
Overview
gensim: topic modelling for humans
Natural Language Toolkit - NLTK 3.5 documentation
spaCy · Industrial-strength Natural Language Processing in Python
Rishi Sunak promises to guarantee £330bn loans to business
A Guide to State Coronavirus Reopenings and Lockdowns - WSJ
Coronavirus: Prince Charles tests positive but 'remains in good health' - BBC News
Coronavirus: Prime Minister Boris Johnson tests positive - BBC News
Event extraction using behaviors of sentiment signals and burst structure in social media
Diurnal variations of psychometric indicators in Twitter content
Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures
The project described in this publication was supported by the University of Rochester Clinical and Translational Science Award UL1 TR002001 from the National Center for Advancing Translational Sciences of the National Institutes of Health (DL).