key: cord-0978941-4szg1nbu authors: Xie, Tiantian; Tan, Tao; Li, Jun title: An Extensive Search Trends-Based Analysis of Public Attention on Social Media in the Early Outbreak of COVID-19 in China date: 2020-08-26 journal: Risk Manag Healthc Policy DOI: 10.2147/rmhp.s257473 sha: ac454f91f364c266f5784c392037e420073ee79d doc_id: 978941 cord_uid: 4szg1nbu

BACKGROUND: Pneumonia caused by a novel coronavirus (COVID-19) broke out at the end of 2019 in Wuhan, China. Many cases were subsequently reported in other cities, which aroused strong reverberations on the Internet and social media around the world.

OBJECTIVE: The aim of this study was to investigate the reaction of global Internet users to the outbreak of COVID-19 and to evaluate the possibility of using Internet monitoring as an instrument for handling communicable diseases and responding to public health emergencies.

METHODS: Disease-related data were retrieved from China's National Health Commission (CNHC) and the World Health Organization (WHO) for January 10 to February 29, 2020. Daily Google Trends (GT) and daily Baidu Attention Index (BAI) values for the keyword "Coronavirus" were collected from their official websites. Rumors that emerged during this outbreak were mined from the Chinese National Platform to Refute Rumors (CNPRR) and the Tencent Platform to Refute Rumors (TPRR). Kendall's Tau-b rank correlation was applied to test the bivariate correlations among the two indexes, epidemic trends, and rumors.

RESULTS: After the outbreak of COVID-19, both daily BAI and daily GT increased rapidly and remained at a high level for about 10 days. When major events occurred, daily BAI, daily GT, and the number of rumors simultaneously reached new peaks. Our study indicates that these indexes and rumors are statistically related to disease-related indicators.
Information symmetry was also found to help significantly in eliminating false news and preventing rumors from spreading across social media during the outbreak.

CONCLUSION: Compared with traditional methods, Internet monitoring could be particularly efficient and economical in the prevention and control of epidemics and rumors, because it reflects public attention and attitudes, especially in the early period of an outbreak.

Since December 2019, several cases of pneumonia of unknown cause have been reported in Wuhan. 1 Within just a few months, the novel coronavirus developed into a global pandemic. As of March 31, 2020, over 809,000 confirmed cases had been reported worldwide, of which 81,518 were identified in China. Immediately after the outbreak in Wuhan, several exported cases were confirmed in South Korea, Japan, Thailand, Iran, and the EU owing to the high transmissibility of COVID-19. 2 The most common symptoms were fever, generalized weakness, and dry cough, with clinical presentations closely resembling viral pneumonia. The incubation period of the virus in the human body is generally 3-7 days (up to 14 days). 3 Based on current data, this new coronavirus is less fatal than people initially feared. At present, the mortality of COVID-19 in China is 3.67% according to the situation report of China's National Health Commission (CNHC), 4 compared with 9.6% for SARS and 34.4% for MERS reported by the World Health Organization (WHO). 5, 6

The COVID-19 pandemic has had a large impact on social media behavior across the global Internet community, particularly during the early outbreak in China, when global attention was focused on the situation in Wuhan and Hubei province, the epicenter of the outbreak. China has exhibited fast Internet growth in recent years.
According to the latest China Internet Development Statistical Report published by the China Internet Network Information Center (CNNIC), 7 there were 854 million Internet users (61.2% of the total population) and 847 million mobile Internet users as of June 2019. Meanwhile, social media has become the primary source through which the vast majority of Internet users access news and real-time information, in China and across the world. With the expansion of Wi-Fi coverage, the maturity of fourth-generation (4G) networks, and the growth of fifth-generation (5G) networks, Chinese Internet users spent an average of 27.9 hours per week online. At the global level, there are 3.8 billion netizens (over 50% of the world population) according to the latest Internet Trends Report by Mary Meeker; Internet penetration is 48% in Asia Pacific, 78% in Europe, 32% in Africa and the Middle East, 62% in Latin America and the Caribbean, and 89% in North America. 8 With increasing global Internet penetration, people have become accustomed to obtaining information and expressing their opinions online. The participation and attention of Internet users can, to some extent, represent public reactions and attitudes to an event. As the influence of the world's netizens grows, online public opinion plays an important role in how the public responds to and participates in public events; for instance, in France's Yellow Vests movement in 2019, people spontaneously used social media to express their dissatisfaction and organize protest marches. 9

In recent years, Internet search trends have been widely used in scientific research in European and American countries. Many previous studies have been based on these data, for instance in forecasting tourism demand, 10 predicting unemployment, 11 and monitoring public health and major mental illness.
[12] [13] [14] Monitoring public reactions through the Internet and major social media platforms (Twitter and Facebook) has become a popular topic, especially in the prevention and control of infectious diseases such as measles, 15 Ebola, 16 and other diseases. [17] [18] [19] As in the regions mentioned above, researchers in many Asian countries have also shown that Internet search trends and social media data can serve as an important and effective means of assessing public attention, risk perception, and behavioral responses to epidemic outbreaks, from the SARS outbreak in 2002 20 to the COVID-19 outbreak. [21] [22] [23] The reaction of netizens can generally represent the public response to an emergency, and people in different countries often have different concerns; this provides a useful tool for better understanding public concerns and for preventing and controlling epidemic outbreaks. [17] [18] [19] 24 Rational public opinion and correct information from the authorities and authoritative experts can enhance people's understanding of outbreaks and relieve their panic. However, the transmission of much contradictory and false information may cause information overload and confusion, 25 which can arouse fear and anxiety that may in turn nourish rumors. For example, during epidemic outbreaks it is difficult for people to know whether face masks can be reused, since a thousand posts give a thousand different answers. [26] [27] [28] Today, people obtain information in different ways, but web search engines are a common choice for Internet users around the world. Baidu is the most popular search engine in China: 90.9% of Internet users reported that they prefer Baidu as their default search engine, according to the latest Chinese Internet Search Engine Usage Research Report.
29 In other countries and regions outside China, however, people prefer Google, which has a market share of 92.07% according to the latest Search Engine Market Share Worldwide Report. 30 Based on quantitative analyses of data mined from both the Baidu and Google indexes, this study argues that Internet monitoring is a convenient and cost-effective way to assess public reactions, which can provide evidence to governments and the public worldwide for handling public health emergencies during epidemic outbreaks.

To examine public attention to the outbreak of COVID-19, we used search indexes, which take keywords as statistical objects and calculate a weighted sum of search frequency for a specific area and time. In short, the more Internet searches there are for a keyword, the higher its search index, which can illustrate the degree of public concern. 31 The Baidu Attention Index (BAI) and Google Trends (GT), two major search indexes, were retrieved for January 10 to February 29, 2020, for comparison with disease-related factors. This outbreak caused widespread concern around the world, especially on the Internet. 32 Therefore, our study sample was global netizens (Internet users) during the study period of this coronavirus outbreak. Through correlation tests, we found that in the early stage of the outbreak, the correlation between the BAI for the keyword "Coronavirus" (in Chinese) and the epidemic trends was significant, whereas the correlations for "New coronary pneumonia" and "Epidemic" were not. Similarly, we found that for GT, the keyword "Coronavirus" reflected public concern about the epidemic during the study period better than other keywords such as "Corona" or "COVID-19".
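The search-index idea described above (a weighted sum of one keyword's search frequency for a given area and day) can be illustrated with a minimal Python sketch. It rests on stated assumptions: the source labels ("pc", "mobile") and the weights are hypothetical, because the actual Baidu weighting is not public. The second helper mimics the way Google Trends reports values relative to the peak of the selected period and region (peak = 100), so daily GT values are relative rather than absolute search volumes.

```python
from typing import List, Mapping, Sequence


def weighted_search_index(counts: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Weighted sum of one keyword's search counts for a single area and day.

    `counts` maps a search source (hypothetical labels such as "pc" or
    "mobile") to the number of searches; `weights` holds an illustrative
    weight per source. Real Baidu/Google weightings are not public.
    """
    unknown = set(counts) - set(weights)
    if unknown:
        raise KeyError(f"no weight given for sources: {sorted(unknown)}")
    return sum(weights[source] * n for source, n in counts.items())


def scale_to_peak(series: Sequence[float]) -> List[int]:
    """Rescale a series so its maximum becomes 100 (Google Trends-style relative values)."""
    peak = max(series, default=0)
    if peak <= 0:
        return [0 for _ in series]   # nothing to scale against
    return [round(100 * v / peak) for v in series]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not the study's data.
    day = weighted_search_index({"pc": 1_000, "mobile": 3_000}, {"pc": 1.0, "mobile": 1.0})
    print(day)                           # 4000.0
    print(scale_to_peak([10, 20, 40]))   # [25, 50, 100]
```

Because peak-relative values depend on the window chosen, two GT downloads over different date ranges are not directly comparable without rescaling.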
Based on these keyword comparisons, we collected the BAI for "Coronavirus" in Chinese from the Baidu Index web page on March 11; 33 at the same time, we collected daily GT for the keyword "Coronavirus" from the Google Trends web page as a supplementary data source. 34 In addition, we collected 385 rumors identified and confirmed by the Chinese authorities during the study period from the Chinese National Platform to Refute Rumors (CNPRR) and the Tencent Platform to Refute Rumors (TPRR). 35, 36 The former is the official Chinese rumor-refuting platform, on which rumors are verified and confirmed by public authorities; the latter was established by Tencent, China's largest Internet company, to seek out misinformation by monitoring top-ranked topics on social platforms and flagging them so that they are not reposted. All data used in this research are officially published on the Internet and contain no private information.

To characterize public attention, we plotted the curves of epidemic trends based on the collected data and compared daily BAI and daily GT over time with the disease-related data to explore changes in public attention. Previous studies have shown that both Spearman's and Kendall's rank correlation coefficients are appropriate for analyzing the correlation between two continuous or discrete ordinal variables. 24, 37, 38 The data used in this study are continuous but not normally distributed; for this reason, Kendall's Tau-b rank correlation coefficient was used to test statistical correlation. Statistical significance was set at P<0.05. Rumors during the COVID-19 outbreak were classified by date and main content. Correlation analyses were performed with the commercial software Statistical Package for the Social Sciences (SPSS package for Windows, v25.0.0, IBM Corporation, Armonk, NY, US). All data were checked for completeness and accuracy before analysis.
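For readers who want to reproduce this kind of test outside SPSS, the following is a minimal pure-Python sketch of Kendall's Tau-b with a two-sided asymptotic P value (normal approximation using the standard tie-corrected variance of S, the concordant-minus-discordant count). It is not the authors' code, the helper names are our own, and SPSS's internal routine may differ in detail; `scipy.stats.kendalltau` (whose default variant is "b") can serve as a cross-check.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def _tie_terms(values: Sequence[float]) -> Tuple[int, int, int]:
    """Sum t(t-1), t(t-1)(t-2) and t(t-1)(2t+5) over groups of t tied values."""
    a = b = c = 0
    for t in Counter(values).values():
        a += t * (t - 1)
        b += t * (t - 1) * (t - 2)
        c += t * (t - 1) * (2 * t + 5)
    return a, b, c


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (tau_b, two-sided asymptotic P value) for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be the same length (at least 3)")

    s = 0          # concordant pairs minus discordant pairs
    pairs_x = 0    # pairs not tied on x
    pairs_y = 0    # pairs not tied on y
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])   # sign of x_i - x_j
            dy = (y[i] > y[j]) - (y[i] < y[j])   # sign of y_i - y_j
            s += dx * dy
            pairs_x += dx != 0
            pairs_y += dy != 0
    if pairs_x == 0 or pairs_y == 0:
        return float("nan"), float("nan")   # a constant series has no rank order

    tau_b = s / math.sqrt(pairs_x * pairs_y)

    # Variance of S under H0 (no association), corrected for ties in x and y.
    m = n * (n - 1)
    tx, tx2, tx5 = _tie_terms(x)
    ty, ty2, ty5 = _tie_terms(y)
    var_s = ((m * (2 * n + 5) - tx5 - ty5) / 18
             + tx * ty / (2 * m)
             + tx2 * ty2 / (9 * m * (n - 2)))
    z = s / math.sqrt(var_s)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return tau_b, p_value
```

For example, `kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])` gives a Tau-b of 4/6 (about 0.667), since five of the six pairs are concordant and one is discordant. The O(n²) pair loop is deliberate: with about 50 daily observations per series it is fast and keeps the definition visible.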
On January 11, 2020, the Wuhan Municipal Health Commission (WMHC) initially reported 41 cases of human infection with an unknown new coronavirus. 39 Afterwards, more cases were successively reported in other provinces. With Chinese New Year approaching, large-scale population movement promoted the spread of the coronavirus. The number of cumulative confirmed cases was 440 on January 21, 2020, and rose alarmingly to 830 two days later. Subsequently, the Chinese government announced the quarantine of Wuhan and took a series of measures to reduce population movement. However, the number of infected patients still increased dramatically. By February 6, 31,211 cases had been reported in 31 provinces of China. By mid-February, the growth of confirmed cases was gradually slowing and the cumulative case cure rate was continuously rising. By February 29, 79,968 cumulative confirmed cases, 41,675 cured cases (cumulative case cure rate=52.11%), and 2873 deaths (cumulative case fatality rate=3.59%) had been reported in China (Figure 1).

Overseas Public Attention to the Outbreak of COVID-19 in China

At the beginning, the daily GT for the keyword "coronavirus" remained stable. From January 20, it increased rapidly over the next 8 days owing to the outbreak of COVID-19 in China. When the WHO declared the outbreak a public health emergency of international concern (PHEIC) on January 30, 40 the daily GT reached a peak of 31 the following day. Subsequently, the daily GT gradually declined with small fluctuations until February 20. From February 21, coronavirus infections grew exponentially outside China, so daily GT increased again and reached another peak of 55 on February 28. We carried out correlation analysis to investigate the relationship between disease-related indicators and daily GT as public attention rose and fell.
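The disease-related indicators that enter these correlations are simple transformations of daily cumulative counts. The sketch below is an illustration, not the authors' code: the helper names are hypothetical, and it uses the crude ratios cured/confirmed and deaths/confirmed, which reproduce the February 29 figures reported above.

```python
from typing import List, Sequence, Tuple


def daily_new(cumulative: Sequence[int]) -> List[int]:
    """Illustrative helper: daily new counts from daily cumulative counts (first day dropped)."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def cumulative_rates(confirmed: int, cured: int, deaths: int) -> Tuple[float, float]:
    """Illustrative helper: crude cumulative cure rate and case fatality rate, in percent."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    cure_rate = round(100 * cured / confirmed, 2)
    fatality_rate = round(100 * deaths / confirmed, 2)
    return cure_rate, fatality_rate


# Cumulative figures for China on February 29, 2020, as reported in the text.
cure, cfr = cumulative_rates(confirmed=79_968, cured=41_675, deaths=2_873)
print(f"cure rate = {cure}%, case fatality rate = {cfr}%")
# cure rate = 52.11%, case fatality rate = 3.59%
```

Applied day by day, `daily_new` yields series such as new confirmed or new death cases, and `cumulative_rates` yields the cumulative cure and fatality rate series that are then ranked against daily GT or BAI.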
The correlation analysis of daily GT showed a statistically significant positive correlation between the following sets of data: daily GT and cumulative confirmed cases (Kendall's Tau-b rank correlation, P=0.027). In other words, daily GT increases as each of the indicators above increases. We also found a negative correlation between daily GT and the cumulative case cure rate (Kendall's Tau-b rank correlation coefficient=-0.285, P=0.005). However, there was no correlation between daily GT and the cumulative mortality rate (Kendall's Tau-b rank correlation coefficient=0.090, P=0.361). The results are shown in Figures 2 and 3. In summary, daily GT was associated with multiple indicators, suggesting that overseas netizens were eager to obtain as much information as possible about COVID-19 through the Internet in the early stage of the outbreak in China.

Public Attention of China Mainland to the Outbreak of COVID-19

Prior to January 19, 2020, information and news related to the novel coronavirus were not widely disseminated to the public, so the BAI remained at a relatively low level. COVID-19 began to spread to areas outside Hubei province from January 19, and the daily BAI for the Chinese keyword increased sharply, peaking at 2,330,851 on January 25. Then, the daily BAI decreased with fluctuations. As provinces one by one launched first-level responses to the major public health emergency, the daily BAI remained at a high level of more than 1,500,000 from January 26 to January 29. Afterwards, the daily BAI steadily declined, falling below the median level of about 1,000,000 on February 15. We conducted correlation analysis on daily BAI and disease-related indicators, which showed that daily BAI was positively correlated with new confirmed cases (Kendall's Tau-b rank correlation coefficient=0.581, P<0.001) (Figure 4A).
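The descriptive landmarks used in this subsection (the peak day of an index and the first day after the peak on which it falls below the median of the series) can be located with a few lines of Python. This is a sketch with a hypothetical function name; we assume the median is taken over the whole study period, which the text does not state explicitly, and the values in the usage block are toy numbers, not the study's BAI data.

```python
from datetime import date, timedelta
from statistics import median
from typing import Optional, Sequence, Tuple


def peak_and_drop_below_median(
    days: Sequence[date], values: Sequence[float]
) -> Tuple[date, float, float, Optional[date]]:
    """Illustrative helper: (peak day, peak value, series median, first post-peak day below median)."""
    if not values or len(days) != len(values):
        raise ValueError("days and values must be non-empty and of equal length")
    peak_idx = max(range(len(values)), key=values.__getitem__)   # first maximum wins ties
    med = median(values)
    drop_day = next(
        (days[i] for i in range(peak_idx + 1, len(values)) if values[i] < med),
        None,   # the series never falls below its median after the peak
    )
    return days[peak_idx], values[peak_idx], med, drop_day


if __name__ == "__main__":
    start = date(2020, 1, 10)
    toy = [1, 5, 9, 7, 4, 2]   # toy values, not the study's BAI data
    days = [start + timedelta(days=k) for k in range(len(toy))]
    print(peak_and_drop_below_median(days, toy))
```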
Daily BAI also had a negative correlation with the cumulative case cure rate (Kendall's Tau-b rank correlation coefficient=0.586, P<0.001) (Figure 4B). It was also positively correlated with new death cases (Kendall's Tau-b rank correlation coefficient=0.209, P=0.003) (Figure 5A). We found no correlation between daily BAI and the other indicators, for example cumulative confirmed cases, new cured cases, cumulative case cure rate, and cumulative case fatality rate. To summarize, daily BAI generally showed a trend similar to daily GT, but not an identical one (Figure 5B). Daily BAI was correlated with only three indicators (by comparison, seven indicators were correlated with daily GT). This suggests that Chinese people, who were at the epicenter, knew more about the epidemic situation than foreign netizens and cared mainly about new infections and the cure rate; in other words, they focused on whether the epidemic was being effectively controlled.

Epidemic-related rumors began to circulate as COVID-19 broke out and spread. From January 10 to February 29, we collected a total of 385 rumors from CNPRR and TPRR. Most of them were spread through group chats and official accounts on WeChat, the most popular messaging and social media app in China. Sina Weibo, a Chinese microblogging website, was another major site for the spread of rumors. Of these rumors, 40.3% (155/385) disseminated incorrect information on how to prevent and treat the new coronavirus; 29.1% (112/385) were about the spread of infections in different cities, 37 of which related to Wuhan. Additionally, 5.7% (22/385) were about shortages of daily necessities and medical supplies such as food and masks, and 2.9% (11/385) falsely used the name of Professor Nanshan Zhong, who heads China's COVID-19 expert team and was a key figure in the successful fight against SARS in 2003.
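The percentages above are shares of the 385 collected rumors. The short sketch below reproduces them from the counts given in the text; the category labels are our paraphrases, and counts for any remaining categories are not reported, so they are not included.

```python
from typing import Dict, Mapping

TOTAL_RUMORS = 385   # rumors collected from CNPRR and TPRR, January 10 to February 29

# Counts as reported in the text; labels are paraphrased.
rumor_counts = {
    "incorrect prevention/treatment advice": 155,
    "spread of infections in different cities": 112,
    "shortages of daily necessities and medical supplies": 22,
    "falsely attributed to Professor Nanshan Zhong": 11,
}


def shares(counts: Mapping[str, int], total: int) -> Dict[str, float]:
    """Percentage of `total` for each category, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, pct in shares(rumor_counts, TOTAL_RUMORS).items():
    print(f"{name}: {pct}%")
```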
Professor Zhong was the first expert to suggest that COVID-19 could be transmitted from person to person, and people regarded his opinions as authoritative and instructive. Our investigation of the rumors suggests that, on the one hand, rumors are produced and spread to attract people's attention or drive more visits to rumor makers' social media homepages. On the other hand, the uncertainty of COVID-19 breeds anxiety, which on its own has been linked to rumor spreading. 41 Statistical correlation analysis between rumors and disease-related indicators showed that the number of rumors had a significant positive correlation with daily BAI (Kendall's Tau-b rank correlation coefficient=0.660, P<0.001) (Figure 6A). Furthermore, the number of rumors was positively correlated with new confirmed cases (Kendall's Tau-b rank correlation coefficient=0.552, P<0.001) (Figure 6B) and with new death cases (Kendall's Tau-b rank correlation coefficient=0.330, P<0.001) (Figure 7A). However, the number of rumors was negatively correlated with the cumulative case cure rate (Kendall's Tau-b rank correlation coefficient=−0.369, P=0.001) (Figure 7B). Generally speaking, the number of COVID-19 rumors moved together with daily BAI. The growth of new confirmed cases and new death cases seemed to increase the number of rumors, and the severity of the epidemic was also associated with the number of rumors. Fear and insecurity due to a lack of information at the initial stage of the outbreak led people to pay close attention to the epidemic; however, as the cure rate increased and more information was made available to the public, people became optimistic about the development of the epidemic, so their panic eased and the number of rumors decreased.

Internet-based analysis is a powerful tool in the new era.
Data from Baidu, Google, and other search engines are used by more and more scholars around the world in academic research. 38, [42] [43] [44] [45] [46] [47] [48] [49] In this research, we used Internet search trends to probe public attention to COVID-19. Both daily BAI and daily GT began to rise sharply around January 19, after the outbreak, reached a peak within a short period, and then remained high for approximately 10 days. Afterwards, both indexes decreased, likely reflecting the initial positive effect of epidemic control. It is worth mentioning that daily GT rose again on February 20, because the epidemic began to spread widely in other countries. Our study showed that in the early stages of COVID-19, public attention increased as new cases were confirmed but diminished as the cure rate rose. This suggests that a rapidly spreading epidemic promptly draws public attention, which decreases as its threat decreases. Therefore, governments need to take appropriate measures and provide more information to address public concerns as early as possible, by monitoring public reactions through the Internet and social media. Daily GT was positively correlated with cumulative confirmed cases, cumulative cured cases, and cumulative death cases, but daily BAI was not. The likely reason is that the early outbreak of COVID-19 was in China, so Chinese netizens naturally learned about early news of the epidemic more quickly than people in other countries; they did not have to deliberately search the Internet for information, because they could easily get the latest data from TV, newspapers, their communities, and their families.

Figure 5 (A) The daily BAI for keyword "Coronavirus" compared with new death cases from January 10 to February 29, 2020. (B) The daily BAI for keyword "Coronavirus" compared with daily GT for keyword "Coronavirus" from January 10 to February 29, 2020.
The outbreak of COVID-19 was accompanied by a variety of rumors. According to our research, the number of rumors is positively correlated with new confirmed cases and new death cases, and follows the same trend as daily BAI. Furthermore, breaking news often pushed the number of rumors to a peak. This indicates that the uncertainty and severity of a new epidemic are major factors in encouraging rumors, which easily attract public attention. Moreover, the Internet and social media have become vehicles for transmitting rumors: the vast majority of the rumors we collected came from social media platforms such as WeChat and Weibo. Social media is therefore a double-edged sword. Although it can spread rumors, it is also a convenient tool for refuting false messages if authoritative information is released in time. In addition, if governments invite experts who have earned public attention to clarify misinformation during the early period of an outbreak, this could effectively help to dispel unnecessary panic and prevent the spread of rumors.

Monitoring is an important policy component in the prevention and control of infectious diseases such as COVID-19. Traditional epidemic monitoring mainly relies on data provided by hospitals, medical institutions, and Centers for Disease Control and Prevention (CDC) at different levels. [50] [51] [52] Yet this monitoring system has some deficiencies. Firstly, the relevant data are collected and processed level by level, which may lead to a loss of important information during the process and a delay in obtaining analysis results. Secondly, traditional monitoring consumes a great deal of manpower and resources, yet the data are rarely published or made available to the public. 19, 24 Internet monitoring has greatly improved the data acquisition process and can overcome these deficiencies in three respects.
First, billions of netizens learn about the latest epidemic situation through the Internet, so data collected in this way can represent the attention of a large share of the population. Second, search trends are generated from the public's daily search behavior and therefore record public attention automatically. Compared with traditional methods such as epidemiological surveys and telephone interviews, [49] [50] [51] [52] [53] this data acquisition method can significantly reduce time and effort. Third, the data are openly shared and can be downloaded by anyone free of charge, which minimizes costs.

Our study has some limitations. First, the data used in this research were mined only from Baidu and Google, the two giants of the search market; this does not mean that data from niche search engines are unimportant. Second, the attention paid to the COVID-19 outbreak by people living in regions with low Internet penetration was not captured.

Our research shows that Internet users in China and other countries paid great attention to the COVID-19 outbreak, but their focuses were slightly different. During the first 10 days of the outbreak, public attention increased rapidly and remained at a high level. This was a key period for governments to release relevant and authoritative information to minimize the spread of groundless rumors on social media. Publishing accurate information through major social media platforms can quickly clarify rumors and prevent their further spread. In sum, Internet monitoring is a quick and convenient way to reflect public attention, which could greatly enhance the ability of governments and the public to address public health emergencies. Our study suggests that governments should attach more importance to monitoring Internet search trend platforms such as the Baidu Attention Index and Google Trends during epidemic outbreaks.
In particular, in the early stages of a new coronavirus outbreak, governments need to make proper use of public attention data and take responsible measures in a timely manner to popularize scientific knowledge about fighting the epidemic, with the aim of relieving people's concerns and clarifying confusion and misunderstandings among the general public.

A novel coronavirus from patients with pneumonia in China
WHO. Novel coronavirus (2019-nCoV) situation Report-43
Clinical Features of 69 Cases with Coronavirus Disease
Daily situation Report of COVID-19
Summary of probable SARS cases with onset of illness from 1
Statistical Report on Internet Development in China. China Internet Network Information Center
Internet Trends
France's yellow vests: A self-mobilised mass movement with insurrectionist overtones
Forecasting tourism demand with composite search index
Predicting unemployment in short samples with internet job search query data
Monitoring a toxicological outbreak using Internet search query data
Do seasons have an influence on the incidence of depression? The use of an internet search engine query data as a proxy of human affect
Seasonal trends in restless legs symptomatology: evidence from Internet search query data
Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013
Ebola virus disease and social media: a systematic review
Infodemiology and infoveillance: tracking online health information and cyberbehavior for public health
infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet
The review on public health media surveillance and risk research
The rumouring of SARS during the 2003 epidemic in China
Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: social media surveillance in China
The COVID-19 risk perception: A survey on socioeconomics and media attention
Using Twitter and web news mining to predict COVID-19 outbreak
Importance of internet surveillance in public health emergency control and prevention: evidence from a digital epidemiologic study during avian influenza A H7N9 outbreaks
The dark side of information: overload, anxiety and other paradoxes and pathologies
Rumor surveillance and avian influenza H5N1
Rumor-Related and Exclusive Behavior Coverage in Internet News Reports Following the 2009 H1N1 Influenza Outbreak in Japan
Rumors of disease in the global village: outbreak verification
Chinese search engine market research report. China Internet Network Information Center
Search Engine Market Share Worldwide Report
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - the latest 2019 novel coronavirus outbreak in Wuhan, China
A novel coronavirus outbreak of global health concern
Chinese National Platform to Refute Rumors
TPRR Tencent Platform to Refute Rumors
Chinese public attention to the outbreak of ebola in west africa: evidence from the online big data platform
Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes
detail/30-01-2020-statement-on-the-second-meeting-ofthe-international-health-regulations-(2005)-emergency-committeeregarding-the-outbreak-of-novel-coronavirus-(2019-ncov)
Combating rumor spread on social media: the effectiveness of refutation and warning
Dengue Baidu Search Index data can improve the prediction of local dengue epidemic: A case study in Guangzhou, China
Early detection of an epidemic erythromelalgia outbreak using Baidu search data
Evaluation of Internet-Based Dengue Query Data: Google Dengue Trends
Predicting the Present with Google Trends
Predicting tick-borne encephalitis using Google Trends
Using Google Trends and ambient temperature to predict seasonal influenza outbreaks
Using the Baidu Search Index to Predict the Incidence of HIV
Impacts of media coverage on the community stress level in Hong Kong after the tsunami on 26
Perceived risk, anxiety, and behavioural responses of the general public during the early phase of the Influenza A (H1N1) pandemic in the Netherlands: results of three consecutive online surveys
Public perceptions, anxiety, and behaviour change in relation to the swine flu outbreak: cross sectional telephone survey
Widespread public misconception in the early phase of the H1N1 influenza epidemic
The impact of communications about swine flu (influenza A H1N1v) on public responses to the outbreak: results from 36 national telephone surveys in the UK

All authors made substantial contributions to conception and design, acquisition, analysis, and interpretation of data; took part in drafting the article and revising it critically for important intellectual content; gave final approval of the version to be published; and agree to be accountable for all aspects of the work. This research was funded by CHINA SCHOLARSHIP COUNCIL, grant number 201708070092. The authors have no conflicts of interest with any individuals or organizations.