key: cord-0939566-6w2jxbkv authors: Tsao, Shu-Feng; Chen, Helen; Tisseverasinghe, Therese; Yang, Yang; Li, Lianghua; Butt, Zahid A title: What social media told us in the time of COVID-19: a scoping review date: 2021-01-28 journal: Lancet Digit Health DOI: 10.1016/s2589-7500(20)30315-0 sha: 4ef8b319b9119b59b40b7743837793b055b9c776 doc_id: 939566 cord_uid: 6w2jxbkv

With the onset of the COVID-19 pandemic, social media has rapidly become a crucial communication tool for information generation, dissemination, and consumption. In this scoping review, we selected and examined peer-reviewed empirical studies relating to COVID-19 and social media during the first outbreak from November, 2019, to November, 2020. From an analysis of 81 studies, we identified five overarching public health themes concerning the role of online social media platforms and COVID-19. These themes focused on: surveying public attitudes, identifying infodemics, assessing mental health, detecting or predicting COVID-19 cases, analysing government responses to the pandemic, and evaluating quality of health information in prevention education videos. Furthermore, our Review emphasises the paucity of studies applying machine learning to COVID-19-related social media data and the scarcity of studies documenting real-time surveillance systems developed with social media data on COVID-19. For COVID-19, social media can have a crucial role in disseminating health information and tackling infodemics and misinformation.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the resulting COVID-19, is a substantial international public health issue. As of Jan 18, 2021, an estimated 95 million people worldwide had been infected with the virus, with about 2 million deaths. 1 As a consequence of the pandemic, social media is becoming the platform of choice for expressing public opinions, perceptions, and attitudes towards various events or public health policies regarding COVID-19.
2 Social media has become a pivotal communication tool for governments, organisations, and universities to disseminate crucial information to the public. Numerous studies have already used social media data to help to identify and detect outbreaks of infectious diseases and to interpret public attitudes, behaviours, and perceptions. [3] [4] [5] [6] Social media, particularly Twitter, can be used to explore multiple facets of public health research. A systematic review identified six categories of Twitter use for health research, namely content analysis, surveillance, engagement, recruitment, use as part of an intervention, and network analysis of Twitter users. 5 However, this review searched Twitter data only with broad research terms, such as health, medicine, or disease, and did not focus on specific diseases, such as COVID-19. Another article analysed tweets on COVID-19 and identified 12 topics that were categorised into four main themes: the origin, source, effects on individuals and countries, and methods of decreasing the spread of SARS-CoV-2. 7 In this study, data were not available for tweets related to COVID-19 before February, 2020, thereby missing the initial part of the epidemic, and the tweet data were limited to the period from Feb 2 to March 15, 2020. Social media can also be used effectively to communicate health information to the general public during a pandemic. Emerging infectious diseases, such as COVID-19, almost always result in increased use and consumption of media of all forms by the general public for information. 8 Therefore, social media has a crucial role in people's perception of disease exposure, resultant decision making, and risk behaviours. 9, 10 As information on social media is generated by users, such information can be subjective or inaccurate, and frequently includes misinformation and conspiracy theories.
11 Hence, it is imperative that accurate and timely information about emerging threats, such as SARS-CoV-2, is disseminated to the general public. A systematic review explored the major approaches used in published research on social media and emerging infectious diseases. 12 The review identified three major approaches: assessment of the public's interest in, and responses to, emerging infectious diseases; examination of organisations' use of social media in communicating about emerging infectious diseases; and evaluation of the accuracy of medical information related to emerging infectious diseases on social media. However, this review did not focus on studies that used social media data to track and predict outbreaks of emerging infectious diseases. Analysing and disseminating information from peer-reviewed, published research can guide policy makers and public health agencies in designing interventions for accurate and timely knowledge translation to the general public. Therefore, keeping in view the limitations of the existing research mentioned above, we did a scoping review with the aim of understanding the roles that social media has had since the beginning of the COVID-19 crisis. We investigated public attitudes and perceptions towards COVID-19 on social media, information about COVID-19 on social media, use of social media for prediction and detection of COVID-19, the effects of COVID-19 on mental health, and government responses to COVID-19 on social media. Our objective was to identify and analyse studies on social media that were related to COVID-19 and focused on five themes: infodemics, public attitudes, mental health, detection or prediction of COVID-19 cases, government responses to the pandemic, and quality of health information in videos. The primary reviewer (S-FT) screened the title and abstract of each article to decide whether it met the inclusion criteria.
If the criteria were met, the article was included; otherwise, it was excluded. Paragraphs in articles were assigned a code representing one of the five themes (eg, I for infodemic), and each article was then assigned the code held by the majority of its paragraphs. Next, quotes were sorted under each code by applying Ose's method. 15 Braun and Clarke's thematic analysis method was used; it involved searching for text that matched the identified predictors (ie, codes) from the quantitative analysis and discovering emergent codes that were relevant to the study objective or that had been identified in the relevant literature review. 16 Finally, we categorised the codes into main themes. These codes and themes were compared and clarified by S-FT, ZAB, and YY to draw conclusions around the main themes. S-FT is fluent in English and Mandarin. The secondary reviewer (ZAB) is fluent in English, and the tertiary reviewers and domain experts (YY and HC) are both fluent in English and Mandarin. Any discrepancies among reviewers were discussed with the research team to reach consensus. With the application of appropriate search filters, a total of 2405 articles were retrieved from the identified databases: PubMed (1084 articles), Scopus (1021 articles), and PsycINFO (300 articles). Among these, 670 duplicates were excluded. Of the remaining 1735 articles, 1434 were deemed to be non-empirical, such as comments, editorial essays, letters, opinions, and reviews. These exclusions left 301 articles for full-text review on the basis of the title and abstract screening results. After the full-text review, 81 articles were included in this scoping review (figure 1).
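The screening counts and the majority-vote coding rule described above are simple enough to check mechanically. The sketch below is a hypothetical illustration, not the authors' workflow: only the code I (infodemic) comes from the text, the other single-letter codes are assumptions, and ties are broken in favour of the code seen first, a rule the review does not specify.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical theme codes; only "I" (infodemic) is given in the text.
THEME_CODES = {"I": "infodemic", "P": "public attitudes", "M": "mental health",
               "D": "detection or prediction", "G": "government responses"}


def article_code(paragraph_codes: List[str]) -> str:
    """Assign an article the most frequent paragraph code.

    Counter.most_common orders equal counts by first appearance, so a tie
    goes to the code seen first (an assumed tie-break rule).
    """
    if not paragraph_codes:
        raise ValueError("article has no coded paragraphs")
    return Counter(paragraph_codes).most_common(1)[0][0]


def screening_flow(retrieved: Dict[str, int], duplicates: int,
                   non_empirical: int, included: int) -> Dict[str, int]:
    """Reproduce the record counts at each screening stage."""
    total = sum(retrieved.values())
    after_dedup = total - duplicates
    full_text = after_dedup - non_empirical
    return {"retrieved": total, "after_dedup": after_dedup,
            "full_text": full_text, "excluded_at_full_text": full_text - included}


flow = screening_flow({"PubMed": 1084, "Scopus": 1021, "PsycINFO": 300},
                      duplicates=670, non_empirical=1434, included=81)
print(flow)
print(THEME_CODES[article_code(["I", "P", "I", "M"])])
```

The reported counts are internally consistent: 2405 retrieved, 1735 after removing duplicates, 301 for full-text review, and therefore 220 excluded at the full-text stage (301 minus 81).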
Keywords
• "COVID-19": "Betacoronavirus", "severe acute respiratory syndrome", "covid 2019", "COVID-19", "corona-19", "n-cov", "novel coronavirus", "sars-cov", or "wuhan 2019"
• "Social media": "Twitter", "tweet", "retweet", "facebook", "weibo", "sina", "youtube", "webcast", "user comment", "online post", "online discussion", "social network", "social media", "online community", or "mobile app"
Indexed terms
• "COVID-19": "Betacoronavirus", "coronavirus infections", "severe acute respiratory syndrome", "coronavirus disease 2019", "COVID-19", or "covid-19"
• "Social media": "Social networking", "social media", "mobile applications", "blogging", "social networking (online)", "online social network", "webcast", "mobile computing", or "social network"

The table summarises the 81 articles that were selected on COVID-19 and social media. All articles were written in English. Data from Twitter (45 articles) and Sina Weibo (16 articles) were by far the most frequently studied. To categorise the selected articles, we adopted a novel framework called Social Media and Public Health Epidemic and Response (SPHERE) and developed a modified version of the SPHERE framework to organise the themes for our scoping review (figure 2).
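The keyword lists above can be combined into a Boolean search string. This is a sketch under an assumption the text does not state, namely that terms are joined with OR within each concept and the two concepts are joined with AND; the generic syntax below omits the database-specific field tags that PubMed, Scopus, and PsycINFO each use, and only a subset of the terms is shown.

```python
from typing import Sequence

# Subsets of the keyword lists above; the remaining terms would be added the same way.
COVID_TERMS = ["Betacoronavirus", "severe acute respiratory syndrome",
               "covid 2019", "COVID-19", "novel coronavirus", "wuhan 2019"]
SOCIAL_MEDIA_TERMS = ["Twitter", "tweet", "facebook", "weibo",
                      "youtube", "social media", "online discussion"]


def or_group(terms: Sequence[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concepts: Sequence[str]) -> str:
    """AND together one OR-group per concept (assumed combination rule)."""
    return " AND ".join(or_group(terms) for terms in concepts)


print(build_query(COVID_TERMS, SOCIAL_MEDIA_TERMS))
```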
98 According to WHO, the term infodemic, a combination of information and epidemic, refers to a fast and widespread dissemination of both accurate and inaccurate information about an epidemic, such as COVID-19. As the pandemic grew, so did its infodemic. 42 Gallotti and colleagues analysed over 100 million tweets and found that, even before the onset of the COVID-19 pandemic, infodemics threatened public health, although not to the same extent. 37 Pulido and colleagues sampled and analysed 942 tweets and found that although false information appeared in a higher number of tweets, it had fewer retweets and lower engagement than did tweets comprising scientific evidence or factual statements.
41 Kouzy and colleagues 39 […] colleagues 40 collected and reviewed 2102 news articles that were circulated on the internet. Their analysis showed that fake news was shared over 2 million times, which accounted for 23·1% (2 352 585 of 10 184 351) of total shares between Dec 31, 2019, and April 30, 2020. 40 Similarly, another quantitative study by Galhardi and colleagues, comparing the proportion of fake news shared on WhatsApp, Instagram, and Facebook in Brazil, showed that fake news was mainly shared on WhatsApp. 36 A UK study by Ahmed and colleagues analysed 22 785 tweets posted by 11 333 Twitter users with #FilmYourHospital to identify and evaluate the source of this conspiracy theory on Twitter. 32 Their work showed that ordinary people were the major drivers behind the spread of conspiracy theories. 32 Another study investigated the 5G and COVID-19 conspiracy theory circulating on Twitter with a random subsample of 233 tweets. The content analysis showed that 34·8% (81) of tweets linked 5G and COVID-19 and 32·2% (75) condemned the theory.
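The proportions reported in this paragraph can be re-derived from their numerators and denominators. The helper below simply rounds to one decimal place, as the text does; it is a checking aid, not part of any reviewed study.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * part / whole, digits)


# Fake-news shares out of all shares, then the two 5G tweet categories out of 233.
print(percent(2_352_585, 10_184_351))
print(percent(81, 233), percent(75, 233))
```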
33 Similar research by Bruns and colleagues investigated 89 664 distinct Facebook posts in Australia that were related to this conspiracy theory from Jan 1 to April 12, 2020, by use of time series and network analysis. 35 The results showed that the conspiracy theory went viral after March 19, 2020, with an unusual coalition forming among various groups on Facebook. Islam and colleagues analysed 2311 infodemic reports that were related to COVID-19 from Dec 31, 2019, to April 5, 2020, and showed that misinformation was mainly driven by rumours, stigma, and conspiracy theories circulating on various social media and other online platforms. 38 Associations between infodemics and bot activity on social media are another important research direction. One study analysed 12 million tweets from the USA and 15 million tweets from the Philippines from March 5 to March 19, 2020; in both countries, bot activity was positively related to the rate of hate speech in communities that were denser and more isolated than others. 43 Brennen and colleagues qualitatively analysed 96 samples of visuals (ie, images or videos) from January to March, 2020, and categorised misinformation into six trends, noting that, fortunately, artificial intelligence deepfake techniques (ie, techniques used to make synthetic videos that closely resemble real videos) had not been involved so far. 34 Three themes emerged under this category: public attitudes, mental health, and detection or prediction of COVID-19 cases. Public attitudes and mental health reflect the public's perceptions of the pandemic and its mental health effects; detection or prediction of COVID-19 cases includes typical surveillance studies that propose ways to detect or predict COVID-19 cases. 48 selected articles gauged the attitudes and emotions that were expressed by social media users regarding the COVID-19 pandemic, mainly by use of content and sentiment analysis.
Twitter accounted for 33 articles and Sina Weibo accounted for 8 articles. Public attitudes can be further divided into the following sub-themes: public sentiment towards the COVID-19 pandemic and interventions, stigma and racism, and ageism. To learn about public sentiment towards the overall COVID-19 pandemic and its interventions, Abd-Alrazaq and colleagues 7 analysed 167 073 unique English tweets that were divided into four categories: origin, source, regional and global effects on people and society, and methods to reduce transmission of SARS-CoV-2. Tweets regarding economic loss had the highest mean number of likes, whereas tweets about travel bans and warnings had the lowest. 7 Kwon and colleagues investigated 259 529 English tweets in the USA using trending and spatiotemporal analyses and noted that tweets about social disruptiveness had the highest number of retweets, whereas tweets about COVID-19 interventions had the highest number of likes. 73 A content analysis of 522 Reddit comments showed that the topic of symptoms accounted for 27% (141) of all comments, followed by the topic of prevention (25% [131]). 74 Likewise, another content analysis of 155 353 unique English tweets showed that the most mentioned topic was "peril of COVID-19". 76 Additionally, a study that examined 126 049 English tweets by use of sentiment analysis and latent Dirichlet allocation for topic modelling showed that the most commonly expressed emotion was fear and the most common topic was economic and political effects. 79 Al-Rawi and colleagues studied emojis in over 50 million tweets and identified five primary subjects: morbidity fears, health concerns, employment and financial issues, praise for front-line workers, and unique gendered emoji use. 51 Samuel and colleagues investigated 293 597 tweets with sentiment analysis and noted more positive emotions than negative emotions towards the reopening of the US economy.
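Many of the studies above rely on sentiment analysis. As a minimal sketch of its simplest variant, lexicon-based scoring, the code below sums word polarities and maps the total to a label; the lexicon and its weights are hypothetical, and the reviewed studies used a range of tools and trained models that this does not reproduce.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical toy lexicon: positive weights for positive words, negative for negative.
LEXICON: Dict[str, int] = {
    "fear": -2, "death": -2, "sad": -2, "lockdown": -1, "worried": -1,
    "hope": 2, "grateful": 2, "thanks": 2, "safe": 1, "reopen": 1,
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Label a post by the sign of its summed lexicon score."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_distribution(posts: Iterable[str]) -> Counter:
    """Count how many posts receive each label."""
    return Counter(sentiment(post) for post in posts)


posts = ["So grateful for nurses, thanks!",
         "Fear of death during lockdown",
         "Testing centre opens today"]
print(label_distribution(posts))
```

Lexicon scoring ignores context such as negation ("not safe"), which is one limitation that trained classifiers aim to address.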
86 Analysing 2 558 474 English tweets by use of clustering and network analyses, Odlum and colleagues found that African Americans shared positive sentiments and encouraged virtual discussions and prevention behaviours. 82 A study investigated gender differences in discussion topics by analysing 3 038 026 English tweets. 88 The results showed that tweets from women were more likely to be about family, physical distancing, and health care, whereas tweets from men were more likely to be about sports cancellations, pandemic severity, and politics. In Canada, Xue and colleagues analysed 1 015 874 tweets via latent Dirichlet allocation to identify nine themes about family violence. 93 In Australia, Yigitcanlar and colleagues analysed 96 666 tweets and showed that the public's attitudes could be captured efficiently through social media analytics. 94 One qualitative content analysis of 30 profiles from Instagram, Twitter, and TikTok in Brunei identified five types of attitudes towards physical distancing: fear, responsibility, annoyance, fun, and resistance. 80 In Turkey, to show the effects of social media on human psychology and behaviour, Arpaci and colleagues 52 used evolutionary clustering analysis on 43 million tweets posted between March 22 and March 30, 2020. The study suggested that high-frequency word clusters, such as death, test, spread, and lockdown, denoted the public's underlying fear of infection and death from the virus, whereas terms such as stay home and social distancing corresponded to behavioural shifts. 52 A study in Luzon, Philippines, 84 in which sentiment analysis was done by use of natural language processing, showed that most Filipino Twitter users expressed negative emotions towards COVID-19, and the negative mood grew stronger over time in lockdown.
84 Sentiment analysis of 107 990 English tweets showed that negative feelings towards the COVID-19 pandemic dominated, and topic modelling showed three major themes in people's concerns: the COVID-19 pandemic emergency, how to control COVID-19, and reports on COVID-19. 54 Another study analysed 373 908 Belgian tweets and retweets and showed that the public relied on the EU coalition to tackle the pandemic. 72 De Santis and colleagues analysed 1 044 645 tweets to identify daily hot topics in Italy that were related to the COVID-19 pandemic and developed a framework for prospective research. 62 One thematic analysis study of 1 920 593 Arabic tweets in Egypt showed that negative emotions and sadness were high in tweets showing affective discussions, and the dominant themes included the outbreak of the pandemic, metaphysical responses, signs and symptoms in confirmed cases, and conspiracism. 64 In Singapore, Lwin and colleagues examined 20 325 929 tweets using sentiment analysis and showed that public emotions shifted over time: from fear to anger and from sadness to gratefulness. 77 Chang and colleagues examined over 1·07 million Chinese texts from various online sources in Taiwan using deductive analysis and found that negative sentiments mainly came from online news that used stigmatising language linked with the COVID-19 pandemic. 56 In India, one study investigated 410 643 tweets via sentiment analysis and latent Dirichlet allocation and showed that positive sentiments were substantially more frequent overall than negative sentiments, although this difference diminished at the individual level. 61 Another study, which applied sentiment analysis to 29 554 tweets from the second lockdown (ie, April 15-May 3, 2020) and 47 672 tweets from the third lockdown (ie, May 4-May 17, 2020), uncovered positive attitudes towards the second lockdown but negative attitudes towards the third lockdown in India.
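Latent Dirichlet allocation appears in several of these studies as the topic-modelling step. The sketch below is a compact collapsed Gibbs sampler, one standard way to fit LDA, assuming documents are already tokenised into integer word ids; the hyperparameters (alpha, beta, iterations) and the toy corpus are illustrative, and none of the reviewed studies is claimed to have used this implementation.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Fit LDA by collapsed Gibbs sampling; returns (doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments: List[List[int]] = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional: p(z = t) ∝ (n_dt + alpha)(n_tw + beta) / (n_t + V*beta).
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word: List[List[int]], vocab: List[str], n: int = 3) -> List[List[str]]:
    """Return the n most frequent words in each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in topic_word]


# Hypothetical toy corpus with two obvious themes.
vocab = ["lockdown", "stay", "home", "distancing", "jobs", "economy", "market", "loss"]
docs = [[0, 1, 2, 3], [1, 2, 0, 0], [4, 5, 6, 7], [5, 7, 4, 6]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab), iters=100)
print(top_words(topic_word, vocab))
```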
57 One study analysed 868 posts from Reddit in India and found that 50% (434) of sentiments were neutral, 22% (191) positive, and 28% (243) negative. 63 A study in South Korea examined 43 832 unique users and their relations on Twitter by use of content and network analyses and showed that tweets including medical news were more popular than tweets containing non-medical news. 83 A study from Ireland analysed 203 756 tweets through topic modelling and found that war was the most frequently used frame for the pandemic. 90 In the USA, Damiano and colleagues qualitatively analysed 600 English tweets and showed neutral sentiment across most tweets. 59 Politics also had an essential role in shaping people's opinions. 59 A study of 19 803 tweets from Democrats and 11 084 tweets from Republicans in the USA, using random forest, showed that Democrats put more emphasis on public health and direct aid to US workers, whereas Republicans put more emphasis on national unity, China, and businesses. 67 Results of a study involving various online data sources from Italy, the UK, the USA, and Canada showed that media was the major driver of the public's attention, but attention decreased as the media became saturated with news about COVID-19. 66 Compared with other users, Reddit users focused more on health, data related to the new disease, and preventive interventions. Researchers in Spain studied 22 223 tweets by use of topic modelling and network analysis. 95 They identified eight frames and noted that the entire pandemic could be divided into three periods: pre-crisis, lockdown, and recovery. Using 563 079 English Reddit posts that were related to COVID-19, Jelodar and colleagues proposed a novel method to detect meaningful latent topics and classify comment sentiment. 69 Samuel and colleagues examined over 900 000 tweets to compare the classification accuracy of logistic regression and Naive Bayes methods.
85 They found that Naive Bayes achieved an accuracy of 91%, compared with 74% for the logistic regression model. 85 Han and colleagues analysed 1 413 297 Sina Weibo posts and observed that the public paid attention to information regarding the epidemic, especially in metropolitan areas. 68 Zhao and colleagues studied 4056 topics from the Sina Microblog hot search list and noted that public emotions shifted from negative to neutral to positive over time and that five major public concerns existed: the situation of new COVID-19 cases and its effects, front-line reporting of the pandemic and the prevention and control measures, expert interpretation and discussion of the source of infection, medical services on the front line of the pandemic, and focus on the pandemic and the search for suspected cases. 96 Li and colleagues 75 did an observational infoveillance study with a linear regression model by analysing 115 299 Sina Weibo posts. The results showed that the number of Sina Weibo posts positively correlated with the number of reported cases of COVID-19 in Wuhan. Additionally, the qualitative analysis classified the topics into the following four overarching themes: cause of the virus, epidemiological characteristics of COVID-19, public responses, and others. 75 Chen and colleagues examined relationships between citizen engagement through government social media and media richness, dialogic loop, content type, and emotion valence. 58 In this study, citizen engagement through government social media refers to the sum of shares, likes, and comments, so the higher the sum, the greater the engagement. Media richness quantifies how much information a sender transfers to a receiver via a medium and is based on media richness theory (ie, "the potential information load of communication media, emphasising the abilities of promoting shared meaning").
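Samuel and colleagues compared Naive Bayes with logistic regression for tweet classification. The sketch below covers only the Naive Bayes side: a multinomial model with add-one (Laplace) smoothing plus an accuracy function. The training posts and labels are hypothetical, and the study's actual features and preprocessing are not described here.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.class_totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            log_prob = math.log(count / n_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    log_prob += math.log((self.word_counts[label][word] + 1)
                                         / (self.class_totals[label] + vocab_size))
            scores[label] = log_prob
        return max(scores, key=scores.get)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_texts = ["hope we reopen safely", "grateful and hopeful",
               "fear and panic everywhere", "lockdown fear again"]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)
test_texts = ["so hopeful and grateful", "fear of lockdown"]
predictions = [model.predict(t) for t in test_texts]
print(predictions, accuracy(["positive", "negative"], predictions))
```

Working in log probabilities avoids numerical underflow when many per-word probabilities are multiplied for long posts.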
101 Dialogic loop, or dialogic communication theory, is defined as an approach that promotes a dialogue between a speaker and audience. According to the American Psychological Association, emotion valence refers to "the value associated with a stimulus, expressed on a continuum from pleasant to unpleasant or from attractive to aversive". 100 For instance, happiness is typically considered to have pleasant valence. Chen and colleagues analysed 1411 posts that were related to COVID-19 from Healthy China, an official account of the National Health Commission of China on Sina Weibo. Findings showed an inverse association between media richness and citizen engagement through government social media, indicating that posts with plain text had higher citizen engagement than did posts with pictures or videos. A positive association between dialogic loop and citizen engagement through government social media was noted; 96% (1355 of 1411) of these posts had hashtags and 25% (353 of 1411) contained questions. In terms of media richness, when posts had both high media richness and positive emotion, citizen engagement through government social media increased, whereas when posts had high media richness and negative emotion, citizen engagement decreased. Regarding content type, when posts were related to the latest news about the pandemic, stronger negative emotions led to increased citizen engagement through government social media. 58 Yin and colleagues 65 proposed a new multiple-information susceptible-discussing-immune model to analyse the propagation of public opinion on COVID-19 from Sina Weibo posts that were collected from Dec 31, 2019, to Feb 27, 2020. The researchers reported that the reproduction rate of this proposed model reached 1·78 in the early stage of COVID-19 but decreased to around 0·97 and was maintained at this level.
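Yin and colleagues' model is a multiple-information extension of compartmental epidemic models. To illustrate what a reproduction rate above or below 1 means, the sketch below uses a much simpler single-information, SIR-style susceptible-discussing-immune model. The rates beta and gamma are hypothetical values chosen only so that beta/gamma equals the two reported values (1·78 and 0·97); this is not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SDIParams:
    beta: float   # hypothetical rate at which susceptible users start discussing
    gamma: float  # hypothetical rate at which discussing users stop (become immune)

    @property
    def reproduction_number(self) -> float:
        # New discussers recruited per discusser in a fully susceptible population.
        return self.beta / self.gamma


def simulate_sdi(p: SDIParams, population: float, initial_discussing: float,
                 steps: int) -> List[Tuple[float, float, float]]:
    """Discrete-time S-D-I dynamics (an SIR analogue); returns (S, D, I) per step."""
    s, d, i = population - initial_discussing, initial_discussing, 0.0
    history = [(s, d, i)]
    for _ in range(steps):
        new_discussing = p.beta * s * d / population
        new_immune = p.gamma * d
        s = s - new_discussing
        d = d + new_discussing - new_immune
        i = i + new_immune
        history.append((s, d, i))
    return history


early = SDIParams(beta=0.356, gamma=0.2)  # beta/gamma = 1.78
later = SDIParams(beta=0.194, gamma=0.2)  # beta/gamma = 0.97
for label, params in [("early", early), ("later", later)]:
    traj = simulate_sdi(params, population=1_000_000, initial_discussing=10, steps=3)
    print(label, round(params.reproduction_number, 2), [round(d, 2) for _, d, _ in traj])
```

With a ratio above 1 the number of discussing users grows at first; with a ratio just below 1 it slowly declines, which is consistent with the slow approach to stability described in the text.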
This result suggested that the volume of information on COVID-19 would continue to increase slowly until it stabilised; however, this stability would depend on how much information on COVID-19 is received. Wang and colleagues 89 analysed 999 978 randomly selected Sina Weibo posts that were related to COVID-19 using an unsupervised Bidirectional Encoder Representations from Transformers model for sentiments and a term frequency-inverse document frequency model for topic modelling. The authors identified four public concerns: the virus origin, symptoms, production activity, and public health control in China. 89 Xi and colleagues examined 241 topics with their views and comments via thematic and temporal analysis and noted that older adults contributing to the community was the most frequent theme in the first phase of COVID-19 in China (ie, Jan 20-Feb 20, 2020). 91 The theme of older patients in hospitals was most frequent in the second (ie, Feb 21-March 17, 2020) and third phases (ie, March 18-April 28, 2020). Using Wilcoxon tests, Su and colleagues examined posts from 850 Sina Weibo users and 14 269 tweets from Italy. 87 The findings showed that, after lockdowns, Italian people paid more attention to leisure, whereas Chinese people paid more attention to the community, religion, and emotions. Analysing the top 200 accounts from WeChat via regressions and content analysis, Ma and colleagues showed that both non-medical and medical reports had positive effects on people's behaviours. 78 Using Kendall's tau-b rank test, Xie and colleagues investigated relations among the Baidu Attention Index, daily Google Trends, and numbers of COVID-19 cases and deaths. 92 Daily Google Trends were correlated with seven indicators, whereas the daily Baidu Search Index was correlated with only three indicators. 92 Zhu and colleagues analysed 1 858 288 Sina Weibo posts and noted that topics changed over time but that political and economic posts attracted greater attention than did other topics.
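Xie and colleagues used Kendall's tau-b, which measures rank agreement while correcting for ties, a common situation with integer-valued daily counts. The sketch below implements the standard O(n^2) definition, tau-b = (P - Q) / sqrt((P + Q + T)(P + Q + U)), where P and Q count concordant and discordant pairs and T and U count pairs tied only in x or only in y; the example series are hypothetical.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b with tie correction, by direct O(n^2) pair counting."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both series: counted in neither tie term
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one series is constant")
    return (concordant - discordant) / denom


# Hypothetical daily search-index ranks and case counts, each with one tie.
search_index = [1, 2, 2, 3]
daily_cases = [1, 2, 3, 3]
print(kendall_tau_b(search_index, daily_cases))  # 0.8
```

The quadratic loop is fine for the few hundred daily observations typical of these studies; library implementations use an O(n log n) algorithm for larger inputs.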
97 Regarding stigma and racism, Kim 71 analysed 27 849 individual tweets in South Korea by use of binary logistic regression to gauge network size and semantic network analysis to capture contextual and subjective factors. The results indicated that the size of personal social networks was inversely correlated with impolite language use; that is, users with larger social networks were less likely to post uncivil messages on Twitter than were users with smaller social networks. This study suggested that the size of the social network influenced the language choices of social media users in their postings. 71 One study compared public stigma before and after the introduction of the terms Chinese virus or China virus in 16 535 English tweets from before and 177 327 tweets from after the introduction. 55 The results showed an approximately ten-fold increase in the USA, both nationally and at the state level, from 0·38 tweets per 10 000 people referencing the two terms before introduction to 4·08 tweets per 10 000 people after introduction. A similar study examined 339 063 tweets from non-Asian respondents via local polynomial regression and interrupted time-series analysis. 60 The findings showed that, when stigmatising terms such as Chinese virus were used by the media (starting from March 8, 2020), the bias index (ie, Implicit Americanness Bias) began to increase, and such bias was more profound in conservatives than in members of any other political subgroup. Nguyen and colleagues analysed 3 377 295 tweets that were related to race in the USA using sentiment analysis and uncovered a 68·4% increase in negative tweets referring to Asian people, whereas tweets referring to other races remained stable. 81 Regarding ageism, a study 70 investigating Twitter content that was related to both COVID-19 and older adults analysed a random sample of 351 English tweets.
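The size of the increase can be checked directly from the two reported rates per 10 000 people:

```latex
\frac{4{\cdot}08 \text{ tweets per } 10\,000 \text{ people}}{0{\cdot}38 \text{ tweets per } 10\,000 \text{ people}} \approx 10{\cdot}7
```

That is, the rate after the terms were introduced was between ten and eleven times the rate before.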
21·1% (74) of the tweets implied diminished regard for older adults by downplaying or dismissing concerns over the high fatality of COVID-19 in this population. 70 Similar research examined 188 tweets via thematic analysis and showed that 90% (169) of tweets opposed ageism, whereas 5% (9) favoured ageism and 5% (10) were neutral. 53 Two of the 81 reviewed studies, both based in China, focused on assessing the mental health of social media users. 44, 45 A cross-sectional study 44 investigated the relationship between anxiety and social media exposure, which is theoretically defined as "the extent to which audience members have encountered specific messages". 102 The researchers distributed an online survey based on the Chinese version of the WHO-Five Well-Being Index for depression and the Chinese version of the Generalized Anxiety Disorder Scale for anxiety. Respondents included 4872 Chinese citizens aged 18 years and older from 31 provinces and autonomous regions in China. After controlling for all covariates through multivariable logistic regression, the study showed that frequent social media exposure increased the odds of anxiety, suggesting that frequent social media exposure might contribute to mental health problems during the COVID-19 outbreak. 44 To explore how people's mental health was influenced by COVID-19, Li and colleagues 45 analysed posts from 17 865 active Sina Weibo users to compare sentiments before and after the declaration of the COVID-19 outbreak by the National Health Commission in China on Jan 20, 2020. The researchers identified increased negative sentiments, including anxiety, depression, and indignation, after the declaration, and decreased positive sentiments as expressed in the Oxford happiness score. Additionally, cognitive indicators showed increased sensitivity to social risks but decreased life satisfaction after the declaration.
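The anxiety study reports odds adjusted through multivariable logistic regression. For readers less familiar with odds ratios, the sketch below computes the simpler unadjusted odds ratio from a 2×2 table, with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical and are not taken from the study, and an unadjusted ratio will generally differ from the adjusted estimate.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf 95% CI for a 2x2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se_log)
    high = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, low, high


# Hypothetical counts: frequent vs infrequent social media exposure, anxiety yes/no.
print(odds_ratio(30, 70, 15, 85))
```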
45 Six of the 81 studies investigated the detection or prediction of COVID-19 outbreaks with social media data. Qin and colleagues 22 attempted to predict the number of new suspected or confirmed COVID-19 cases by collecting social media search indexes for symptoms (eg, dry cough, fever, and chest distress), coronavirus, and pneumonia. The data were analysed by use of subset selection, forward selection, lasso regression, ridge regression, and elastic net. Results showed that the optimal model was constructed via subset selection. The lagged social media search indexes were predictors of new suspected COVID-19 cases and could detect them 6-9 days before confirmation of new cases. 22 To evaluate the possibility of early prediction of COVID-19 cases via internet searches and social media data, Li and colleagues 17 used the keywords coronavirus and pneumonia to retrieve corresponding trend data from Google Trends, Baidu Search Index, and Sina Weibo Index. Using lag correlation, they found that, on all three platforms, the correlation between trend data for the keyword coronavirus and the number of laboratory-confirmed cases was highest 8-12 days before an increase in confirmed COVID-19 cases. Similarly, the correlation between trend data for the keyword coronavirus and new suspected COVID-19 cases was highest 6-8 days before an increase in new suspected cases. The correlation between trend data for the keyword pneumonia and new suspected cases was highest 8-10 days before an increase in new suspected COVID-19 cases across the three platforms. 17 Peng and colleagues studied 1200 Sina Weibo records using spatiotemporal analysis, kernel density analysis, and ordinary least squares regression and noted that scattered infection, community spread, and full-scale outbreak were the three phases of early COVID-19 transmission in Wuhan, China. 21 Older people are at high risk of severe COVID-19 and accounted for over 50% of help-seeking posts on Sina Weibo.
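Li and colleagues located the lead time of search and Weibo trends by lag correlation. A minimal sketch, assuming daily series of equal length and using the Pearson coefficient (the exact coefficient the study used is not stated here): for each lag, search volume on day t is paired with case counts on day t + lag, and the lag with the highest correlation is reported. The example data are synthetic.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_correlations(search: Sequence[float], cases: Sequence[float],
                     max_lag: int) -> Dict[int, float]:
    """Correlate search volume on day t with case counts on day t + lag."""
    if len(search) != len(cases):
        raise ValueError("series must cover the same days")
    return {lag: pearson(search[:len(search) - lag], cases[lag:])
            for lag in range(max_lag + 1)}


def best_lag(correlations: Dict[int, float]) -> int:
    """Lag (in days) at which the correlation peaks."""
    return max(correlations, key=correlations.get)


# Synthetic example: cases echo the search index three days later.
search = [float((day * day) % 17 + day) for day in range(40)]
cases = [0.0] * 3 + search[:-3]
corr = lag_correlations(search, cases, max_lag=10)
print(best_lag(corr), round(corr[3], 3))
```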
To identify patients with COVID-19 with poor prognosis, Liu and colleagues analysed Sina Weibo messages from 599 patients, supplemented by telephone follow-ups. 18 The findings suggested risk factors including older age, diffuse distribution of pneumonia, and hypoxaemia. A regression study of Google Trends searches, Wikipedia page views, and tweets showed that the number of COVID-19 cases could be modelled from current Wikipedia page views, tweets from a week earlier, and Google Trends searches from two weeks earlier. To model the number of deaths, each of the three variables had to be lagged by one additional week relative to the case model. 19 To inoculate the public against misinformation, public health organisations and governments should create and spread accurate information on social media, because social media has had an increasingly important role in policy announcements and health education. Six of 81 articles were categorised as government responses because they examined how government messages and health education material were generated and consumed on social media platforms. Two studies analysed data from Sina Weibo, 23, 27 and the other four analysed data from Twitter. [28] [29] [30] [31] Zhu and colleagues 23 measured the attention of Chinese netizens (ie, citizens of the net) to COVID-19 by analysing 1101 Sina Weibo posts. They found that Chinese netizens paid little attention to the disease until the Chinese Government acknowledged and declared the COVID-19 outbreak on Jan 20, 2020. Thereafter, social media traffic was high when Wuhan, China, began its quarantine (Jan 23-Jan 24, 2020), during a Red Cross Society of China scandal (Feb 1, 2020), and after the death of Li Wenliang (Feb 6-Feb 7, 2020). 23 Li and colleagues 27 collected 36 746 Sina Weibo posts, used support vector machines, naive Bayes, and random forest to identify and categorise situational information, and used linear regression to identify features predicting the number of reposts.
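Naive Bayes, one of the classifiers Li and colleagues applied, can be sketched briefly. This is a generic multinomial naive Bayes with add-one (Laplace) smoothing, not the authors' implementation; the category labels and example posts are hypothetical, and the whitespace tokeniser is an assumption (real Sina Weibo text would first need Chinese word segmentation).

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: whitespace tokens; Chinese posts would need word segmentation.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> MultinomialNaiveBayes:
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_score(self, doc: str, label: str) -> float:
        """Unnormalised log posterior: log P(label) + sum of log P(token | label)."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for token in tokenize(doc):
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(doc, label))


if __name__ == "__main__":
    # Hypothetical training posts and category labels.
    docs = [
        "need hospital bed for my father urgent help",
        "please help my mother needs a bed",
        "rumour is false officials deny claim",
        "this claim is false do not believe rumour",
        "city announces new quarantine measures",
        "government announces hospital opening measures",
    ]
    labels = ["help_seeking", "help_seeking", "counter_rumour",
              "counter_rumour", "notification", "notification"]
    model = MultinomialNaiveBayes().fit(docs, labels)
    print(model.predict("urgent help need bed"))  # help_seeking
```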
Except for posts categorised as counter rumours (ie, posts opposing rumours), they found that posts with higher word counts received more reposts. Likewise, posts from unverified users received more reposts than did posts from verified users in all categories except counter rumours. For counter rumours, reposts increased with the number of followers and when the followers were from urban areas. 27 A qualitative content analysis of 203 tweets investigated how G7 leaders used Twitter for matters concerning the COVID-19 pandemic. 29 Of these tweets, 166 (81·8%) were informative, 48 (23·6%) were linked to official government resources, 19 (9·4%) were morale-boosting, and 14 (6·9%) were political. 29 To assess political partisan polarisation regarding COVID-19 in Canada, Merkley and colleagues 28 randomly sampled 1260 tweets from the social media accounts of 292 federal members of parliament and collected 87 Google Trends for the search term coronavirus. They also surveyed 2499 Canadian respondents aged 18 years and older. Regardless of party affiliation, members of parliament emphasised the importance of physical distancing and proper hand hygiene in coping with the COVID-19 pandemic, and their tweets neither exaggerated concerns nor spread misinformation about COVID-19. Search interest in COVID-19 among municipalities was strongly determined by socioeconomic and urban factors rather than by Conservative Party vote share. 28 Sutton and colleagues studied 149 335 tweets from public health, emergency management, and elected officials and observed that the underlying emotion of messages shifted in both positive and negative directions over time. 30 Wang and colleagues investigated 13 598 tweets that were related to COVID-19 via temporal and network analyses.
31 They categorised 16 types of messages and identified inconsistent and incongruent messages on four crucial prevention topics: mask wearing, risk assessment, stay-at-home orders, and disinfectants or sanitisers. Eight chosen studies investigated the quality of YouTube videos with COVID-19 prevention information, where quality was measured by the number of recommended prevention behaviours covered in the videos (eg, wearing a facemask, washing hands, and physical distancing). Basch and colleagues 24 did a cross-sectional study of the 100 most-viewed YouTube videos uploaded in January, 2020, that matched the keyword coronavirus and were in English, had English subtitles, or were in Spanish. These 100 videos generated over 125 million views in total. However, fewer than 33 videos included any of the seven key prevention behaviours recommended by the US Centers for Disease Control and Prevention. 24 A follow-up study with the same criteria and a successive sampling design gathered the 100 most-viewed YouTube videos in each of January and March, 2020. 25 In total, the January sample generated over 125 million views, and the March sample over 355 million views. Yet fewer than 50 videos in either sample contained any of the prevention behaviours recommended by the US Centers for Disease Control and Prevention. 25 Additionally, a study investigated the 100 most-viewed YouTube videos about do-it-yourself hand sanitiser and showed that the average number of daily calls about paediatric poisoning increased substantially in March, 2020, compared with the previous 2 years. 46 To analyse the quality of information on YouTube about the COVID-19 pandemic and to compare English and Mandarin Chinese content, Khatri and colleagues 26 collected 150 videos with the keywords 2019 novel coronavirus and Wuhan virus in English and Mandarin.
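The coverage-based notion of video quality used in these studies amounts to counting, for each video, how many recommended behaviours it mentions. The sketch below assumes each video has already been hand-coded into a set of behaviour labels; the labels, view counts, and the view-weighted share are illustrative assumptions rather than the published coding schemes.

```python
from __future__ import annotations


def coverage(mentioned: set[str], recommended: set[str]) -> int:
    """Number of recommended behaviours a video mentions (its quality score)."""
    return len(mentioned & recommended)


def summarise(videos: list[tuple[int, set[str]]],
              recommended: set[str]) -> dict[str, float]:
    """Summarise hand-coded videos given as (view_count, behaviours_mentioned) pairs."""
    scores = [coverage(mentioned, recommended) for _, mentioned in videos]
    total_views = sum(views for views, _ in videos)
    # Views accrued by videos that mention at least one recommended behaviour.
    views_with_any = sum(views for (views, _), s in zip(videos, scores) if s > 0)
    with_any = sum(1 for s in scores if s > 0)
    return {
        "videos": len(videos),
        "with_any": with_any,
        "mean_coverage": sum(scores) / len(videos),
        "total_views": total_views,
        "view_share_with_any": views_with_any / total_views,
    }


if __name__ == "__main__":
    # Hypothetical subset of recommended behaviours (examples named in the text).
    recommended = {"wear_facemask", "wash_hands", "physical_distancing"}
    videos = [
        (1000, {"wash_hands"}),
        (500, set()),
        (250, {"wash_hands", "wear_facemask", "other_advice"}),
    ]
    print(summarise(videos, recommended))
```

Weighting by views separates two questions the studies touch on: how many videos carry prevention advice, and how much of the audience actually sees it.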
The DISCERN score and the medical information and content index were used as reliable measures of the quality of health information. The mean DISCERN reliability score was low: 3·12 of 5·00 for English videos and 3·25 for Mandarin videos. The mean cumulative medical information and content index score of useful videos was also low: 6·71 of 25·00 for English videos and 6·28 for Mandarin videos. 26 In Spain, a similar study of 129 videos in Spanish found that information in videos about preventing COVID-19 was usually incomplete and differed by type of authorship (ie, mass media, health professionals, individual users, and others). 47 Likewise, a study in South Korea found that misleading videos accounted for 37·14% (39 of 105) of the most-viewed videos and had more likes, fewer comments, and longer viewing times than did useful videos. 48 Two studies in Turkey investigated the quality of YouTube videos with COVID-19 information related to dentistry. 49 One study [...] 300 views and showed moderate quality and useful information in these videos. 49 The other study, however, rated 24 of 55 (43·6%) English videos as poor quality, whereas only 2 (3·6%) videos were of good quality. 50 Studies on social media data revealed, to some extent, our attitudes and mental state during the COVID-19 crisis. These studies also showed how we generated, consumed, and propagated information on social media platforms when facing the rapid spread of SARS-CoV-2 and extraordinary containment measures. In our Review, public attitudes accounted for nearly 59% (48 of 81) of the reviewed articles. In terms of social media platforms, 56% (45 of 81) of the chosen articles used data from Twitter, followed by Sina Weibo (20% [16 of 81]). Machine learning methods, such as latent Dirichlet allocation and random forest, were applied in research that studied public attitudes.
We identified six themes on the basis of our modified SPHERE framework: infodemics, public attitudes, mental health, detection or prediction of COVID-19 cases, government responses to the pandemic, and quality of prevention education videos. However, a limitation common to all chosen studies is that social media data are difficult to compare because of differences in quality, such as formats, metrics, or even the definitions of common variables (eg, how long a post must appear on an individual's screen to be counted as a view). For instance, the definition of a view on one social media platform is likely to differ from that on another. Moreover, unlike Twitter and Sina Weibo, not every social media platform offers accessible data. To address these challenges, the selected studies controlled for many factors, including social media platforms, languages, locations, time, misspellings, keywords, or hashtags. However, such search strategies introduced many study limitations, such as non-representative samples, selection bias, and cross-sectional or retrospective study designs. We also observed that, despite the large amount of available data, most studies across all domains analysed small samples, except for four studies under the theme of public attitudes that analysed over one million posts with machine learning methods. Additionally, data from Twitter and Sina Weibo accounted for over 70% (59 of 81) of our selected studies. Research examining other social media platforms, including Facebook, Instagram, TikTok, Snapchat, and WhatsApp, is scarce because of barriers to data availability and accessibility. We also identified future research topics that are needed for each category during the COVID-19 pandemic.
From an infodemics perspective, additional research is needed to investigate how misinformation, rumours, and fake news (eg, anti-mask-wearing reports) undermine prevention measures and compromise public health, although social media companies, such as Twitter and Facebook, have started to remove accounts that spread misinformation. Bot posts are another topic to be addressed, and studies evaluating effective counter-infodemic interventions are also needed. Articles regarding public attitudes towards the COVID-19 pandemic have shown sentiments that shifted over time. Such sentiments could be a useful indicator when evaluating interventions, such as physical distancing and wearing masks, that aim to reduce the risk of COVID-19 infection. However, public sentiments had not been incorporated into many intervention studies by the time of this Review. When a disease such as COVID-19 starts spreading and causing negative sentiments, timely, proper, and effective risk communication, especially through social media, is needed to help ease people's anxiety or negative attitudes regarding the pandemic. Mental health is another issue that requires further investigation. Our chosen studies did not examine mental health issues by age, although symptoms and interventions tend to vary with age. Public health measures, such as physical distancing, that were implemented in the COVID-19 pandemic exacerbated risk factors and adverse health behaviours at the individual and population levels. Studies showed that social media data were useful for detecting mental health issues at the population level. Given the early outbreak of COVID-19 in China and the prevalence of social media use there (eg, Sina Weibo and WeChat), two studies reported increased mental health problems among the Chinese population. 44, 45 A similar trend of deteriorating mental health could occur in other regions.
At the time of writing, British Columbia, Canada, had recorded its highest number of overdose deaths in a single month (170 deaths in May, 2020). 103 In terms of surveillance of the COVID-19 pandemic, six chosen studies showed methods to detect or predict the number of COVID-19 cases with social media data. According to our Review, unlike for other infectious diseases, such as influenza and malaria, no real-time surveillance had been developed with social media data for COVID-19. Possibly, the pandemic evolved so rapidly that developing COVID-19 vaccines or therapies was prioritised over real-time surveillance with social media. In addition, the scarcity of accurate and reliable data sources might have discouraged the development of real-time COVID-19 surveillance. Moreover, whether COVID-19 is a one-time event or will become seasonal, like influenza, is unknown. If COVID-19 becomes seasonal, establishing a real-time model to monitor the disease with social media data could be meaningful and useful. Government responses distributed via social media have become increasingly crucial in combating infodemics and promoting accurate and reliable information for the public. However, little is known about how efficiently and effectively these official responses lead to changes in public beliefs or behaviours. It also remains unknown whether government posts reach more social media users, or have greater effects on them, than do infodemics. YouTube has served as one of the major platforms for spreading information about the control of COVID-19. Nonetheless, our chosen studies showed that most YouTube videos were of poor quality because they contained few of the preventive measures recommended by governments or public health organisations. This poor quality is worrisome if accurate and reliable videos and other types of information are not created and disseminated in a timely manner.
Therefore, videos, especially from public health authorities, should include accurate and reliable medical and scientific information and use relevant hashtags to reach a large audience, generate a high number of views, and increase responses. Moreover, the video studies we selected were limited to YouTube. Additionally, a substantial proportion of the studies used data from Sina Weibo, which, although used by many people, is exclusive to China and might lead to an over-representation of a single country in this Review. In summary, although our Review has limitations inherited from the chosen studies, we recognised six themes that have been studied so far and identified future research directions. Our adopted framework can serve as a fundamental and flexible guideline when studying social media and epidemiology. Our Review identified various topics, themes, and methodological approaches in studies on social media and COVID-19. Among the six identified themes, public attitudes accounted for most of the articles. Among the selected studies, Twitter was the leading social media platform, followed by Sina Weibo. Few studies used machine learning methods, whereas most used traditional statistical methods. Unlike for influenza, we found no studies documenting real-time surveillance developed with social media data on COVID-19. Our Review also identified studies related to COVID-19 on infodemics, mental health, and prediction. For COVID-19, accurate and reliable information through social media platforms can have a crucial role in tackling infodemics, misinformation, and rumours. Additionally, real-time surveillance from social media about COVID-19 can be an important tool in the armamentarium of interventions by public health agencies and organisations. HC and ZAB conceived the Review. S-FT searched for and screened articles, analysed data, and wrote the first draft of the manuscript.
TT refined the search strategy and searched for articles. YY refined the search strategy and created the tables. LL assisted in screening. ZAB and HC supervised the review process and prepared the final draft for submission. All authors contributed to the interpretation of results, manuscript preparation, and revisions. All authors read and approved the final manuscript. We declare no competing interests. The search strategies used index terms, where applicable, and free-text terms to capture two concepts: social media, including both general terms and specific platform names (eg, Twitter, Facebook, Sina Weibo, and YouTube); and COVID-19. For each database, both indexed terms (ie, MeSH and Emtree) and natural language keywords were combined with Boolean operators (ie, AND, OR, and NOT) and truncation (panel). Because each database has distinctive search functionality, individually tailored search statements were developed with appropriate search filters for each database. Final search statements, along with a list of search results, were downloaded from each database. Articles were included if they discussed the use of social media for COVID-19 research and were original, empirical studies. Only peer-reviewed articles, including peer-reviewed preprints, in English or Mandarin were included. The decision to include Chinese publications was based on the fact that COVID-19 cases were first reported in Wuhan, China, and that many initial and relevant studies were published in Mandarin; we therefore wanted to capture most studies regarding the use of social media for COVID-19 research. Only articles published between Nov 1, 2019, and Nov 4, 2020, were eligible. Reviews, opinion pieces, books, book chapters, articles and preprints that were not peer-reviewed, and articles written in languages other than English or Mandarin were automatically excluded.
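The Boolean structure described above (synonyms for one concept joined with OR, concepts joined with AND, optional exclusions with NOT) can be sketched as a small query builder. The terms are drawn from the platform names mentioned in the text, but the actual search statements in the panel are not reproduced here; the function names, the `*` truncation symbol, and the phrase quoting are assumptions, since syntax differs between databases.

```python
from __future__ import annotations


def or_block(terms: list[str]) -> str:
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[list[str]], exclude: list[str] | None = None) -> str:
    """AND together one OR block per concept; optionally append a NOT block."""
    query = " AND ".join(or_block(terms) for terms in concepts)
    if exclude:
        query += " NOT " + or_block(exclude)
    return query


if __name__ == "__main__":
    # Illustrative terms only; "*" is an assumed truncation symbol.
    social_media = ["social media", "Twitter", "Facebook", "Sina Weibo", "YouTube"]
    covid = ["COVID-19", "SARS-CoV-2", "coronavirus*"]
    print(build_query([social_media, covid]))
    # ("social media" OR Twitter OR Facebook OR "Sina Weibo" OR YouTube) AND (COVID-19 OR SARS-CoV-2 OR coronavirus*)
```

In practice each database would need its own variant (field tags, truncation symbol, indexed terms such as MeSH or Emtree), which is why the authors tailored search statements per database.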
The final reference list was generated on the basis of originality and relevance to the broad scope of this review.

1. Coronavirus resource center: world map.
2. Social networks' engagement during the COVID-19 pandemic in Spain: health media vs. healthcare professionals.
3. Using Twitter for public health surveillance from monitoring and prediction to public response.
4. Automatically appraising the credibility of vaccine-related web pages shared on social media: a Twitter surveillance study.
5. Twitter as a tool for health research: a systematic review.
6. How organisations promoting vaccination respond to misinformation on social media: a qualitative investigation.
7. Top concerns of tweeters during the COVID-19 pandemic: infoveillance study.
8. Managing and sharing H1N1 crisis information using social media bookmarking services.
9. Effective uses of social media in public health and medicine: a systematic review of systematic reviews.
10. Influence of social media platforms on public health protection against the COVID-19 pandemic via the mediating effects of public health awareness and behavioral changes: integrated model.
11. The causes and consequences of COVID-19 misperceptions: understanding the role of news and social media.
12. Social media and outbreaks of emerging infectious diseases: a systematic review of literature.
13. Scoping studies: towards a methodological framework.
14. Scoping studies: advancing the methodology.
15. Using Excel and Word to structure qualitative data.
16. Using thematic analysis in psychology.
17. Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data.
18. Characteristics and outcomes of a sample of patients with COVID-19 identified through social media in Wuhan, China: observational study.
19. A Google-Wikipedia-Twitter model as a leading indicator of the numbers of coronavirus deaths.
20. COVID-19 coronavirus pandemic.
21. Exploring urban spatial features of COVID-19 transmission in Wuhan based on social media data.
22. Prediction of number of cases of 2019 novel coronavirus (COVID-19) using social media search index.
23. Limited early warnings and public attention to coronavirus disease 2019 in China.
24. Preventive behaviors conveyed on YouTube to mitigate transmission of COVID-19: cross-sectional study.
25. The role of YouTube and the entertainment industry in saving lives by educating and mobilizing the public to adopt behaviors for community mitigation of COVID-19: successive sampling design study.
26. YouTube as a source of information on 2019 novel coronavirus outbreak: a cross-sectional study of English and Mandarin content.
27. Characterizing the propagation of situational information in social media during covid-19 epidemic: a case study on Weibo.
28. A rare moment of cross-partisan consensus: elite and public response to the COVID-19 pandemic in Canada.
29. World leaders' usage of Twitter in response to the COVID-19 pandemic: a content analysis.
30. COVID-19: Retransmission of official communications in an emerging pandemic.
31. Examining risk and crisis communications of government agencies and stakeholders during early-stages of COVID-19 on Twitter.
32. COVID-19 and the "Film Your Hospital" conspiracy theory: social network analysis of Twitter data.
33. COVID-19 and the 5G conspiracy theory: social network analysis of Twitter data.
34. Beyond (mis)representation: visuals in COVID-19 misinformation.
35. 5G? or both?': the dynamics of COVID-19/5G conspiracy theories on Facebook.
36. Fact or fake? An analysis of disinformation regarding the COVID-19 pandemic in Brazil.
37. Assessing the risks of 'infodemics' in response to COVID-19 epidemics.
38. COVID-19-related infodemic and its impact on public health: a global social media analysis.
39. Coronavirus goes viral: quantifying the COVID-19 misinformation epidemic on Twitter.
40. Fake news and COVID-19 in Italy: results of a quantitative observational study.
41. COVID-19 infodemic: more retweets for science-based information on coronavirus than for false information.
42. Global infodemiology of COVID-19: analysis of Google web searches and Instagram hashtags.
43. Bots and online hate during the COVID-19 pandemic: case studies in the United States and the Philippines.
44. Mental health problems and social media exposure during COVID-19 outbreak.
45. The impact of COVID-19 epidemic declaration on psychological consequences: a study on active Weibo users.
46. Hand sanitizer in a pandemic: wrong formulations in the wrong hands.
47. Characteristics of YouTube videos in Spanish on how to
48. Evaluation of Korean-language COVID-19-related medical information on YouTube: cross-sectional infodemiology study.
49. Analysis of dentistry YouTube videos related to COVID-19.
50. An analysis of YouTube videos as educational resources for dental practitioners to prevent the spread of COVID-19.
51. COVID-19 and the gendered use of emojis on Twitter: infodemiology study.
52. Analysis of twitter data using evolutionary clustering during the COVID-19 pandemic.
53. Calculated ageism: generational sacrifice as a response to the COVID-19 pandemic.
54. Public perception of the COVID-19 pandemic on Twitter: sentiment analysis and topic modelling study.
55. Creating COVID-19 stigma by referencing the novel coronavirus as the "Chinese virus" on Twitter: quantitative analysis of social media data.
56. Blaming devices in online communication of the COVID-19 pandemic: stigmatizing cues and negative sentiment gauged with automated analytic techniques.
57. COVID-19 pandemic lockdown: an emotional health perspective of Indians on Twitter.
58. Unpacking the black box: how to promote citizen engagement through government social media during the COVID-19 crisis.
59. A content analysis of coronavirus tweets in the United States just prior to the pandemic declaration.
60. "The China Virus" went viral: racially charged coronavirus coverage and trends in bias against Asian Americans.
61. Characterizing public emotions and sentiments in COVID-19 environment: a case study of India.
62. An infoveillance system for detecting and tracking relevant topics from Italian tweets during the COVID-19 event.
63. Analysing COVID-19 news impact on social media aggregation.
64. How do Arab tweeters perceive the COVID-19 pandemic?
65. COVID-19 information propagation dynamics in the Chinese Sina-microblog.
66. Collective response to media coverage of the COVID-19 pandemic on Reddit and Wikipedia: mixed-methods analysis.
67. Elusive consensus: polarization in elite communication on the COVID-19 pandemic.
68. Using social media to mine and analyze public opinion related to COVID-19 in China.
69. Deep Sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach.
70. an evaluation of tweets about older adults and COVID-19.
71. Effects of social grooming on incivility in COVID-19.
72. #Coronavirus: monitoring the Belgian Twitter discourse on the severe acute respiratory syndrome coronavirus 2 pandemic.
73. Defining facets of social distancing during the COVID-19 pandemic: Twitter analysis.
74. Addressing immediate public coronavirus (COVID-19) concerns through social media: utilizing Reddit's AMA as a framework for public engagement with science.
75. Data mining and content analysis of the Chinese social media platform Weibo during the early COVID-19 outbreak: retrospective observational infoveillance study.
76. Constructing and communicating COVID-19 stigma on Twitter: a content analysis of tweets during the early stage of the COVID-19 outbreak.
77. Global sentiments surrounding the COVID-19 pandemic on Twitter: analysis of Twitter trends.
78. Effects of health information dissemination on user follows and likes during COVID-19 outbreak in China: data and content analysis.
79. An "infodemic": leveraging high-volume Twitter data to understand early public sentiment for the coronavirus disease 2019 outbreak.
80. Creative production of 'COVID-19 social distancing' narratives on social media.
81. Exploring U.S. shifts in anti-Asian sentiment with the emergence of COVID-19.
82. Application of topic modeling to tweets as the foundation for health disparity research for COVID-19.
83. Conversations and medical news frames on twitter: infodemiological study on COVID-19 in South Korea.
84. Sentiment analysis of filipinos and effects of extreme community quarantine due to coronavirus (COVID-19) pandemic.
85. COVID-19 public sentiment insights and machine learning for tweets classification.
86. Feeling positive about reopening? New normal scenarios from COVID-19 US reopen sentiment analytics.
87. Examining the impact of COVID-19 lockdown in Wuhan and Lombardy: a psycholinguistic analysis on Weibo and Twitter.
88. COVID-19 tweeting in English: gender differences.
89. COVID-19 sensing: negative sentiment analysis on social media in China via BERT model.
90. Framing COVID-19: how we conceptualize and discuss the pandemic on Twitter.
91. A thematic analysis of Weibo topics (Chinese twitter hashtags) regarding older adults during the COVID-19 outbreak.
92. An extensive search trends-based analysis of public attention on social media in the early outbreak of COVID-19 in China.
93. The hidden pandemic of family violence during COVID-19: unsupervised learning of tweets.
94. How can social media analytics assist authorities in pandemic-related policy decisions? Insights from Australian states and territories.
95. Analyzing Spanish news frames on Twitter during COVID-19-a network study of El País and El Mundo.
96. Chinese public's attention to the COVID-19 epidemic on social media: observational descriptive study.
97. Analysis of spatiotemporal characteristics of big data on social media sentiment with COVID-19 epidemic topics.
98. From "infodemics" to health promotion: a novel framework for the role of social media in public health.
99. WHO. Novel coronavirus (2019-nCoV): situation report-13. 2020.
100. Your emotional brain on resentment.
101. Organizational information requirements, media richness and structural design.
102. Measuring media exposure in a changing communications environment.
103. records highest number of fatal overdoses in a single month, with 170 deaths.