key: cord-262310-z0m6uuzf authors: Effenberger, Maria; Kronbichler, Andreas; Shin, Jae Il; Mayer, Gert; Tilg, Herbert; Perco, Paul title: Association of the COVID-19 pandemic with Internet Search Volumes: A Google Trends™ Analysis date: 2020-04-17 journal: Int J Infect Dis DOI: 10.1016/j.ijid.2020.04.033 sha: doc_id: 262310 cord_uid: z0m6uuzf

Abstract

Objectives: To assess the association of public interest in coronavirus infections with the actual number of infected cases for selected countries across the globe.

Methods: We performed a Google Trends™ search for “Coronavirus” and compared Relative Search Volume (RSV) indices to the number of COVID-19 cases reported by the European Center for Disease Control (ECDC) using time-lag correlation analysis.

Results: Worldwide public interest in coronavirus reached its first peak at the end of January, when numbers of newly infected patients started to increase exponentially in China. The worldwide Google Trends™ index reached its peak on the 12th of March 2020, at a time when numbers of infected patients started to increase in Europe and COVID-19 was declared a pandemic. At this time, general interest in China and also in the Republic of Korea had already decreased significantly compared to the end of January. Correlations between RSV indices and the number of new COVID-19 cases were observed across all investigated countries, with the highest correlations observed at a time lag of -11.5 days, i.e. the highest interest in coronavirus was observed 11.5 days before the peak of newly infected cases. This pattern was very consistent across European countries and also held true for the US. In Brazil and Australia, the highest correlations were observed at a time lag of -7 days. In Egypt, the highest correlation was observed at a time lag of 0, potentially indicating that in this country numbers of newly infected patients will increase exponentially in the course of April.
Conclusions: Public interest indicated by RSV indices can help to monitor the progression of an outbreak such as the current COVID-19 pandemic. Public interest is on average highest 11.5 days before the peak of newly infected cases.

A novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causes a new disease named Corona Virus Disease 2019 (COVID-19). It was first detected in December 2019 in Wuhan (Hubei, China) (Wang et al., 2020). Due to high virulence and a high proportion of asymptomatic cases, the outbreak has spread all over the world. On April 5th 2020 the World Health Organization (WHO) reported 1 133 758 confirmed cases. To date, a cumulative mortality rate of 5.5% (62 784 deaths) has been reported.

The internet is increasingly used as a source of health care information. Infodemiology and infoveillance are essential public health informatics methods used to analyze search behavior on the internet. Infodemiology is defined as "science of distribution and determinants of information in an electronic medium, specifically the internet, or in a population, with the ultimate aim to inform public health and public policy", while the primary aim of infoveillance is surveillance (Eysenbach, 2009). Infodemiology and infoveillance of epidemiological data are important to increase situational awareness and to guide suitable interventions (Rivers et al., 2019). The analysis of relative internet search volumes (RSV) gives information on the extent of public attention (Arora et al., 2019, Kaleem et al., 2019, Ling and Lee, 2016), with Google Trends™ being one of the most widely used tools for this purpose. RSV are used for real-time analyses of the transmissibility, severity, and natural history of an emerging pathogen, as observed with severe acute respiratory syndrome (SARS), the 2009 influenza pandemic, and Ebola (Chowell et al., 2009, Cleaton et al., 2016).
The analyses of confirmed cases are particularly useful to infer key epidemiological parameters, such as the incubation and infectious periods, as well as ongoing outbreaks or an outbreak probability. In addition, Google Trends™ data might be used to forecast an increase in infected cases. Google Trends™ data showed a linear time series pattern with official dengue reports, indicating a potential use to monitor public interest before an increase of cases and during the outbreak (Husnayain et al., 2019). Besides infectious diseases, Google Trends™ has been successfully used to forecast increases in suicide risk (Barros et al., 2019). In this study, we investigated the public interest in COVID-19 since December 31st 2019, comparing Google Trends™ data to data on newly infected COVID-19 cases.

Retrieving outbreak and confirmed cases numbers from the WHO

Data on confirmed COVID-19 cases were retrieved on the 5th of March from the European Center for Disease Control (ECDC) for the time from the 31st of December 2019 until the 1st of April 2020 (https://www.ecdc.europa.eu/en/publicationsdata/download-todays-data-geographic-distribution-covid-19-cases-worldwide). Worldwide data were retrieved as well as data for the following countries: China, Republic of Korea, Japan, Iran, Italy, Austria, Germany, the United Kingdom (UK), the United States (US), Egypt, Australia, and Brazil.

Retrieving Google Trends™ data on COVID-19

The Google Trends™ tool was used to retrieve data on internet user search activities in the context of COVID-19. Google Trends™ enables researchers to study trends and patterns of Google™ search queries (Arora et al., 2019). It was implemented in [...] Trends™ expresses the absolute number of searches relative to the total number of searches over the defined period of interest (Arora et al., 2019).
The retrieved Google Trends™ index ranges from 0 to 100, with 100 being the highest relative search term activity for the specified search query in the time period of interest. Further information on Google Trends™ can be found on the respective help page.

Worldwide interest in coronavirus started on January 20th and reached its first peak on January 31st, a few days after word spread of the outbreak in Wuhan, China. The increasing numbers of cases around the globe prompted the WHO to declare the coronavirus outbreak a pandemic on March 11th, leading to an increase in public interest, currently peaking on March 12th 2020 (Figure 1). The data on newly confirmed cases, overall confirmed cases, and overall deaths worldwide as well as for the aforementioned countries under study are summarized in Table 1. There are two peaks: one sharp increase in numbers when cases in China were counted based on clinical diagnosis rather than a confirmatory laboratory test, and a second peak on March 16th due to cases around the globe. The worldwide initial peak is associated with a strong increase of confirmed cases in China.

In China, a maximum of Google Trends™ RSV was observed at the end of January, with a 5.47-fold increase of cases between January 24th and January 28th. Afterwards, with rigorous measures, the relative increase in new cases was slower, and a decrease of new cases was first reported to the WHO on February 6th, with the exception of the sharp increase mentioned above. The RSV trend followed a similar path, with a steady number of search enquiries around 25% of the maximal interest during the last weeks (Figure 2). Correlation analysis indicates highest public interest in COVID-19 on average around 11.5 days before the maximum of newly infected cases was reported (Figure 3).
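The time-lag correlation described here can be illustrated with a minimal sketch. The text does not specify how the authors computed it, so everything below is an assumption: Pearson correlation is used, the function names are hypothetical, the series are invented daily values, and the sign convention (a negative lag means search interest leads case counts) is chosen only to match the way lags are reported in this article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treated as 0 in this sketch
    return cov / sqrt(var_x * var_y)


def lagged_correlation(rsv, cases, lag):
    """Correlate rsv[t] with cases[t - lag]; a negative lag means RSV leads cases."""
    if lag < 0:
        shift = -lag
        x, y = rsv[:len(rsv) - shift], cases[shift:]
    elif lag > 0:
        x, y = rsv[lag:], cases[:len(cases) - lag]
    else:
        x, y = rsv, cases
    return pearson(x, y)


def best_lag(rsv, cases, max_lag):
    """Return (lag, r) with the highest correlation for lags in [-max_lag, max_lag]."""
    candidates = range(-max_lag, max_lag + 1)
    return max(((k, lagged_correlation(rsv, cases, k)) for k in candidates),
               key=lambda pair: pair[1])


# Invented daily series: the case curve repeats the RSV curve 3 days later.
rsv = [0, 5, 20, 60, 100, 70, 40, 25, 20, 15, 10, 10]
cases = [0, 0, 0, 0, 5, 20, 60, 100, 70, 40, 25, 20]
lag, r = best_lag(rsv, cases, max_lag=5)
print(lag, round(r, 3))  # -3 1.0
```

For the invented series the best lag is -3 days, because the case curve is an exact copy of the search curve shifted by three days. With real data the maximum correlation would be lower, and a fractional average such as 11.5 days would only arise after averaging integer lags across countries.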
In countries in proximity to China, such as the Republic of Korea or Japan, a high volume of search queries was observed during or shortly after the peak was reached in China. A considerably smaller peak was observed in countries in the European Union or the US (Figure 2). In the Republic of Korea, a first Google Trends™ index peak was observed at the end of January, only slightly shifted compared to the peak in China, with a second peak observed on February 23rd (Figure 2). This second peak in Korea preceded the peak in newly infected cases by 7 days (Figure 3). Japan's RSV started to increase on February 24th, with a peak on February 27th, also followed by an increase in confirmed COVID-19 cases. In Iran, the most affected country in the Middle East, a strong increase of RSV was observed on February 18th, with a peak between the 20th and 22nd of February. The Iranian increase of RSV occurred five days before the first confirmed cases in Iran, showing a strong association with and prediction of the outbreak, which followed five to seven days later. Egypt, the first country on the African continent with a confirmed COVID-19 case, showed a small RSV peak during the outbreak in China. Furthermore, the RSV has steadily increased since February 20th, with an observed leap in interest on April 1st. Australia showed a similar pattern, with an increase in RSV during the first outbreak in China, followed by a decrease and then another increase from February 23rd, followed by increasing new COVID-19 cases 10 days later (Figure 2). In European countries, especially in Italy, a small peak in the Google Trends™ analysis was found during the outbreak in China and a climax was found on February 23rd 2020, a few days before the number of new COVID-19 cases started to increase exponentially.
Similar trends were observed in Austria, Germany and the UK, with a delay of several days and a second peak, which was accompanied by an increase in case numbers in the following days. The highest RSV peak was reached in mid-March, which is in line with rigorous government policies regarding the rapid spread. The UK and Australia show very similar patterns, with the highest correlations between RSV indices and newly diagnosed cases found at time lags of -12 and -7 days, respectively (Figure 3). In the US, a steady increase of Google™ search queries since February 27th was observed, followed by an outbreak since March 2nd. The peak of search queries was on March 3rd. A new increase in RSV is found in Brazil, followed by increasing numbers of newly confirmed cases of COVID-19 (Figure 2).

In our study, we found a significant increase in RSV using Google Trends™ for COVID-19 worldwide, with a peak of RSV around 11.5 days prior to the peak in newly diagnosed cases in different countries all over the world. As such, Google Trends™ can be used to predict outbreaks worldwide and provides a valuable picture of the outbreak of COVID-19 in real time. Close monitoring and continued evolution of enhanced communication strategies are needed to provide general populations and the vulnerable populations most at risk with actionable information for self-protection, including identification of symptoms (Heymann et al., 2020). The application of internet data in health care research, also known as infodemiology, is a promising new field and may complement and extend the current data sources and foundations (Mavragani and Ochoa, 2019). The attention to COVID-19 increased days to weeks before the actual peak of the outbreak, not only worldwide but also in most of the countries investigated in this study. This strongly supports our finding that RSV is a useful tool to monitor local and global outbreaks of infectious diseases.
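The lead of search interest over case counts can also be read directly off two daily series as the distance between their peak dates. The following is a minimal sketch of that calculation; the helper names are hypothetical, the values are invented for illustration and are not data from this study, and it is not implied that the authors derived their lags this way.

```python
from datetime import date, timedelta


def peak_date(start, values):
    """Date of the first maximum in a daily series that begins on `start`."""
    peak_index = max(range(len(values)), key=values.__getitem__)
    return start + timedelta(days=peak_index)


def lead_time_days(start, rsv, new_cases):
    """Days from the RSV peak to the case peak; positive when interest peaks first."""
    return (peak_date(start, new_cases) - peak_date(start, rsv)).days


# Invented example values, one per day starting on 2020-02-20.
start = date(2020, 2, 20)
rsv = [30, 55, 80, 100, 90, 70, 60, 50, 45, 40, 38, 35]
new_cases = [1, 2, 4, 9, 20, 45, 80, 130, 190, 260, 310, 290]
print(peak_date(start, rsv), lead_time_days(start, rsv, new_cases))  # 2020-02-23 7
```

A peak-to-peak distance is simpler than a lagged correlation but more sensitive to single-day spikes, such as a reporting change that inflates one day's case count.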
The internet, through search engines and social media, is the largest source of real-time data on outbreaks. RSV has been used before to detect outbreaks, such as the severe influenza outbreak in 2009 (Cook et al., 2011). Most countries and the WHO provide awareness-raising and educational programs on COVID-19 via the internet. The strong association between RSV and increasing outbreak numbers may be due to the implementation of such programs in the different countries. The impact of web-based research has grown continuously over the past decade (Jun et al., 2018). Google Trends™ offers a largely unbiased approach that includes millions of users and has been widely used for health issues. Analyses of public attention in different fields have been published recently (e.g. osteoarthritis, breast cancer or COPD) (Boehm et al., 2019, Jellison et al., 2018, Kaleem et al., 2019). Furthermore, infodemiology and Google Trends™ are used to generate awareness profiles and are a suitable substitute for classical data collection, such as surveys (Jun et al., 2018). Most commonly, Google Trends™ is used to monitor disease control and awareness in cancer, HIV or stroke, but also in rare diseases like antiphospholipid syndrome or systemic lupus erythematosus (Ling and Lee, 2016, Mahroum et al., 2019, Sciascia and Radin, 2017, Sciascia et al., 2018). Google Trends™ can also be used to assess the success of awareness programs and to predict infectious outbreaks worldwide (McLean et al., 2019, Patel et al., 2020).

There are also some potential limitations of this study. There is no information about the individual searches for the analyzed topics.
The selection of spellings or terms might affect the results and conclusions. The importance of accuracy in defining the search queries is exemplified by searching Google Trends™ for the topic "pneumonia". Pneumonia is associated with COVID-19, although it does not specifically represent COVID-19. Thus, using the query "pneumonia" may be useful to analyze symptom-related curiosity, but does not sufficiently represent COVID-19 outbreaks. The number of studies based on Google Trends™ is increasing, but so far there is no standardized procedure for data collection. More guidance by Google™ would be warranted to assist researchers in establishing an optimal search strategy (Nuti et al., 2014). Although Google search is accessible worldwide, data from other search tools used in certain countries, for example Baidu in China, might lead to more accurate estimations of public interest. It was for example shown that a high Baidu Search Index (BSI) predicted dengue fever outbreaks in Guangzhou and to a lesser degree in Zhongshan, indicating that BSI might complement traditional dengue fever surveillance in China (Liu et al., 2016). In our study we decided to make use of data from one common framework.

In conclusion, infodemiology and RSV provide a tool to anticipate outbreaks of COVID-19 and of other infectious diseases. Information on public interest could be used to monitor the outbreak in northern European countries, Africa or the Americas.
cases, with highest interest observed on average 11.5 days before the peak of newly reported COVID-19 cases

Google Trends: Opportunities and limitations in health and health policy research
The Validity of Google Trends Search Volumes for Behavioral Forecasting of National Suicide Rates in Ireland
Using Google Trends to investigate global COPD awareness
Severe Respiratory Disease Concurrent with the Circulation of H1N1 Influenza
Characterizing Ebola Transmission Patterns Based on Internet News Reports
Assessing Google flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic
Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet
Technical Advisory Group for Infectious H. COVID-19: what is next for public health?
Correlation between Google Trends on dengue fever and national surveillance report in Indonesia
Using Google Trends to assess global public interest in osteoarthritis
Ten years of research change using Google Trends: From the perspective of big data utilizations and applications
Google Search Trends in Oncology and the Impact of Celebrity Cancer Awareness
Disease Monitoring and Health Campaign Evaluation Using Google Search Activities for HIV and AIDS, Stroke, Colorectal Cancer, and Marijuana Use in Canada: A Retrospective Observational Study
Using Baidu Search Index to Predict Dengue Outbreak in China
Capturing public interest toward new tools for controlling human immunodeficiency virus (HIV) infection exploiting data from Google Trends
Google Trends in Infodemiology and Infoveillance: Methodology Framework
Internet search query analysis can be used to demonstrate the rapidly increasing public awareness of palliative care in the USA
The use of google trends in health care research: a systematic review
Success of Prostate and Testicular Cancer Awareness Campaigns Compared to Breast Cancer Awareness Month According to Internet Search Volumes: A Google Trends Analysis
Using "outbreak science" to strengthen the use of models during epidemics
What can Google and Wikipedia can tell us about a disease? Big Data trends analysis in Systemic Lupus Erythematosus
Infodemiology of antiphospholipid syndrome: Merging informatics and epidemiology

The authors declare no conflicts of interest.
Not applicable