key: cord-0769043-voeptwrw authors: Liu, J.; Huang, T.; Xiong, H.; Huang, J.; Zhou, J.; Jiang, H.; Yang, G.; Wang, H.; Dou, D. title: Analysis of Collective Response Reveals that COVID-19-Related Activities Start From the End of 2019 in Mainland China date: 2020-10-16 journal: nan DOI: 10.1101/2020.10.14.20202531 sha: 7df35b29b3c01097c1becdcc1af67d00fc1923d3 doc_id: 769043 cord_uid: voeptwrw

While the COVID-19 outbreak is making an impact at a global scale, the collective response to the pandemic becomes the key to analyzing past situations, evaluating current measures, and formulating future predictions. In this paper, we analyze the public reactions to the pandemic using search engine data and mobility data from Baidu Search and Baidu Maps, respectively, paying particular attention to the early stage of the pandemic to find early signals in the collective response to COVID-19. First, we correlate the number of confirmed cases per day with the daily search queries of a large number of keywords through Dynamic Time Warping (DTW) and Detrended Cross-Correlation Analysis (DCCA), where the keywords that rank highest on the most critical days are considered the most relevant to the pandemic. We then categorize the ranking lists of keywords according to the region of the search, such as Wuhan, Mainland China, the USA, and the whole world. Through the analysis of search data, we find that the COVID-19-related collective response did not begin earlier than the end of 2019 in Mainland China. Finally, we confirm this observation using human mobility data, where we compare massive mobility traces, including the real-time population densities inside key hospitals and inter-city travel departing from or arriving in Wuhan, from 2018 to 2020. No significant changes were observed before December 2019.
As reported by the World Health Organization (WHO), a global pandemic, COVID-19, has rapidly spread all over the world since December 2019, e.g., within China (1), the United States (2), and European countries (3). The COVID-19 pandemic is considered the greatest challenge for humankind since World War II (4). The outbreak of COVID-19 has been closely monitored by governments, researchers, and digital tracing applications (5, 6). Although research as early as January 2, 2020 (7) confirmed that the Huanan seafood market in Wuhan was epidemiologically associated with most (27 of 44) of the patients from Mainland China, the starting point of the collective response to the COVID-19 pandemic remains unclear (8).

To analyze the collective response to COVID-19, digital information turns out to be critical, efficient, and effective (6). For example, mobile phone applications (Apps) can be easily used to conduct contact tracing and notification upon case confirmation (6). With data collected from mobile phone Apps, e.g., Baidu Migration (12), statistical analysis has been carried out for COVID-19-related studies (9-11). Further, search engines, e.g., Baidu (13), have been used to understand social responses to the pandemic from the keywords of massive search queries while keeping users well-informed during the outbreak of COVID-19. In addition, mobility data from Baidu Maps also demonstrates the collective response from the perspective of mobility (14, 15), where we can compare the massive mobility traces under the pandemic with the regular patterns.

For the purpose of determining the starting moment of COVID-19, (9-11) infer the possible origin dates of COVID-19 using dynamical systems and the numbers of cases confirmed over time. Compared to these studies, our work revisits the early stage of the pandemic from the perspective of collective response, using search data and mobility data collected from massive mobile users in a participatory fashion.
To uncover the latent factors of epidemiological dynamics, (7, 16-26) fit parameterized models to the numbers of cases confirmed over time, and interpret the models through the physical and epidemiological meaning of every parameter. Compared to these studies, our work makes no assumptions about the epidemiological dynamics and instead provides a model-free analysis of the search and mobility data: we simply track the collective response by ranking the frequent search keywords and detecting changes in population inflow/outflow between Chinese cities, in a data-driven fashion.

Finally, this work incorporates direct evidence through multi-modal data fusion, while existing studies (33) rely on inferences and predictions based on indirect measurements. For example, (33) estimated and compared the volumes of visits to hospitals in Wuhan in 2018 and 2019 using remote sensing data from satellites, then claimed to have detected abnormalities. We directly sample the volumes of visits to hospitals in Wuhan from 2018 to 2020 by tracking the massive mobility traces collected from Baidu Maps, and confirm that there was no abnormality in hospital visit volumes before December 2019 in Wuhan.

Using search index data, we analyze correlations between the daily search frequency of keywords related to respiratory diseases and the trends of confirmed cases. As both the search index data and the number of confirmed cases are time series, i.e., each data item is associated with a time stamp, we use DTW and DCCA as analytical tools. DTW adjusts the time stamps to match sequences that are similar but out of phase (35). DCCA uses detrended covariance to investigate power-law cross-correlations between different simultaneously recorded time series (30). Both methods are therefore suitable for analyzing time-series data.
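To make the role of DTW concrete, the following is a minimal sketch of the classic dynamic-programming DTW distance. It is an illustration only: the absolute-difference local cost, the absence of a warping-window constraint, the function name `dtw_distance`, and the toy series are all assumptions, since the exact DTW configuration is not specified here.

```python
from math import inf


def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences.

    Assumption for this sketch: the local cost is |a_i - b_j| and the
    warping path is unconstrained (no window).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data (hypothetical): a case curve, a search curve with the same
# shape but one day earlier, and a flat search curve.
cases = [0, 0, 1, 3, 7, 4, 1, 0]
search_shifted = [0, 1, 3, 7, 4, 1, 0, 0]
search_flat = [2, 2, 2, 2, 2, 2, 2, 2]

print(dtw_distance(cases, search_shifted))  # out-of-phase but same shape
print(dtw_distance(cases, search_flat))     # different shape
```

In this toy example the shifted series is matched at zero cost even though a day-by-day comparison would show large differences, which is the property that makes DTW useful when search activity leads or lags the confirmed-case counts. A smaller distance corresponds to a stronger match between the two series.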
For every keyword, we use DTW and DCCA to estimate the correlation between the daily search index of the keyword and the number of confirmed cases per day during the pandemic. We consider the keywords with the highest correlations to be the most relevant evidence, and the correlations estimated by DTW and DCCA confirm each other. We rank the keywords in four regions, i.e., Wuhan, Mainland China, the US, and the world, separately, where the common top five keywords are:
• "Covid," "epidemic situation," "mask," "pneumonia," and "sterilization" (based on DTW);
• "Covid," "epidemic situation," "mask," "sterilization," and "nucleic acid" (based on DCCA).
In addition, we find that COVID-19-related symptoms (33), such as "diarrhea" (16th) and "cough" (20th), are included within the top keywords.

medRxiv preprint doi: https://doi.org/10.1101/2020.10.14.20202531; this version posted October 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

To analyze the early stage of the COVID-19 epidemic, we study the correlation of the search index of the high-ranked keywords between any two of the four regions (Wuhan, China, the US, and the world), using DCCA for each year between 2014 and 2020. We take "mask" as an example, as it is high-ranked in all four regions.
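As an illustration of the per-year, cross-region comparison described above, the following minimal sketch computes a DCCA cross-correlation coefficient between two equally long series. The coefficient form (detrended covariance divided by the product of the detrended fluctuations), the box size `box`, the linear detrending within overlapping boxes, and all function names are assumptions made for this sketch; these details are not specified in the text.

```python
def _profile(x):
    """Cumulative sum of deviations from the mean (the 'profile')."""
    mean = sum(x) / len(x)
    out, running = [], 0.0
    for value in x:
        running += value - mean
        out.append(running)
    return out


def _linear_residuals(y):
    """Residuals of y after a least-squares line fit against 0..len(y)-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def dcca_coefficient(x, y, box):
    """DCCA cross-correlation coefficient in [-1, 1] for one box size."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if box < 3 or box > len(x):
        raise ValueError("box must satisfy 3 <= box <= len(x)")
    px, py = _profile(x), _profile(y)
    f_xy = f_xx = f_yy = 0.0
    # Slide overlapping boxes over both profiles and detrend each box.
    for start in range(len(x) - box + 1):
        rx = _linear_residuals(px[start:start + box])
        ry = _linear_residuals(py[start:start + box])
        f_xy += sum(a * b for a, b in zip(rx, ry))
        f_xx += sum(a * a for a in rx)
        f_yy += sum(b * b for b in ry)
    if f_xx == 0.0 or f_yy == 0.0:
        raise ValueError("a series has no detrended fluctuation")
    return f_xy / (f_xx * f_yy) ** 0.5


# Toy data (hypothetical): a series compared with itself and its mirror.
x = [1, 4, 2, 8, 5, 7, 3, 9, 6, 2, 4, 8]
print(dcca_coefficient(x, x, box=4))
print(dcca_coefficient(x, [-v for v in x], box=4))
```

A value near 1 indicates that the two regions' search indices fluctuate together around their local trends, while a negative value, as reported for 2020, indicates that they move in opposite directions. In practice the coefficient would be computed once per keyword, region pair, and year.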
The correlations between Wuhan and the world, Wuhan and the US, the world and China, and the US and China changed significantly and dropped to negative values in 2020, while the overall search index increased dramatically at the same time, compared with other years. This indicates the occurrence of a massive event that changed the common search trends at a global scale. Interestingly, prior to COVID-19, we found a short period in 2015 with negative correlations between the same four pairs of regions. In that year, an outbreak of Middle East Respiratory Syndrome (MERS) occurred in Korea (36) and affected several Asian countries. Similar patterns repeat in 2020. From the search volumes shown in Figure 1b, we find that the search volumes in Wuhan and China before November 2019 were still lower than those in the US and the world. This result indicates that there was no mask-related epidemic in Wuhan or China before November 2019. In addition, we find that the search volume of "mask" increased from December 2019 and reached its peak in January 2020 in Wuhan and China.

After great efforts were put into inhibiting the spread of COVID-19, the pandemic was mostly under control in China after May 2020. However, on June 11th, the first case related to the Xinfadi Market in Beijing was found, which led to a second wave of the epidemic in Beijing. This second wave persisted for more than a month and was contained around the end of July. With previous experience in tackling COVID-19, this second wave was met with a swift response, not only in news coverage but also in the implementation of measures.
We investigate how the public reacted to this resurgence of COVID-19 throughout this entire period by examining the search frequencies of keywords, and find that Beijing's case is consistent with the previous national scenario. The general public in Beijing paid close attention to the event: the search volume of a majority of keywords, e.g., "nucleic acid," "epidemic," "mask," "sore throat," and "fever," surged following the outbreak on June 11th; the search index of "nucleic acid" shown in Figure 2 is an example. However, a few keywords, e.g., "stuffy nose," "respiratory failure," and "respiratory distress," which are mostly related to critically ill patients, did not show a noticeable change in search volume. This accords with the fact that, although the outbreak aroused concerns at first, quick containment actions prevented it from resulting in a large number of severely ill patients. By July, the search trends had fallen back to the levels of earlier months, since the epidemic was fully under control after mid-July.

In this section, we look for abnormal population mobility trends from June 2018 to March 2020, exploring the starting point of the collective response to COVID-19. We exploit the mobility data from Baidu Maps, which consists of the inflow to and outflow from Wuhan, and the population densities of multiple hospitals in Wuhan, from June 2018 to March 2020.
Each data point representing inflow refers to the number of people who entered the city on that specific date and, correspondingly, each outflow data point refers to the number of people who left the city on that date. We separate inflow and outflow into distinct diagrams to allow a sharper comparison across time scales. In addition, all absolute values of the mobility data were standardized so that different periods can be compared on an equal baseline. For a few dates with missing data, we use the data from the previous date as an estimate.

Figure 4: The population outflow from Wuhan (2018-2020). The outflow of 2019 is between January 1st and December 31st. The outflow of 2018 is between June 1st and December 31st, which is aligned with that of 2019 using the Solar Calendar. The outflow of 2020 is between December 22nd, 2019 and March 31st, 2020, which is aligned with that of 2019 using the Lunar Calendar.

As shown in Figure 3, we find that the collective mobility-related response to the pandemic happened around January 24th (Spring Festival), 2020. For the inflow data shown in Figure 3, we align inflow traces from January 1st, 2018 to December 31st, 2018 along with January [...] Yet, the overall trend is steady, without significant fluctuations.
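The two preprocessing steps mentioned earlier in this section, filling a missing date with the previous date's value and standardizing the series, can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are represented as `None`, the standard deviation is the population standard deviation, and the function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def forward_fill(values):
    """Replace each missing entry (None) with the most recent known value."""
    filled, last = [], None
    for value in values:
        if value is None:
            if last is None:
                raise ValueError("no earlier value available to copy")
            value = last
        filled.append(value)
        last = value
    return filled


def z_scores(values):
    """Standardize a series to zero mean and unit standard deviation.

    Assumption: population standard deviation (divide by N).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant series cannot be standardized")
    return [(value - mu) / sigma for value in values]


# Hypothetical daily inflow counts with two missing days.
raw_inflow = [2, 4, None, 4, 5, None, 7, 9]
filled = forward_fill(raw_inflow)
print(filled)
print(z_scores(filled))
```

On the standardized scale, a value of about -2, as reported for the period after the travel restrictions, means that the daily flow is roughly two standard deviations below the mean of the observation period.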
The line representing the inflow of 2020 contains the inflows from December 22nd, 2019 to March 2020. Note that in Figure 3, we align the yearly traces according to the Lunar Calendar for better comparison. We can see that both peaks happened around Spring Festival due to the national holidays. It is after Spring Festival that we witness a significant decline in inflow, approaching a standardized score of -2, as the pandemic began and China announced travel restrictions. From the population inflow traces, we conclude that there is no obvious abnormal trend before Spring Festival in Wuhan compared to the previous year's data, which supports our finding.

As shown in Figure 4, the analysis of outflow also indicates that the collective mobility-related response to the pandemic happened around Spring Festival. In Figure 4, the two lines of 2018 and 2019 largely overlap, with peaks around national holidays similar to those in the inflow diagrams. Zooming in on the latter part of 2019, the outflows from October to December follow parallel trends, indicating no aberrant departures or public panic. The increasing slopes of the outflow are similar in 2019 and 2020, as Wuhan is a city with abundant migrant workers. Yet, we can see an abnormal hump in the outflow on January 11th, 2020, where the outflow is clearly above the previous year's level, probably suggesting the onset of COVID-19 and more people leaving the city to avoid it.
It is after Spring Festival that we witness a significant decline in outflow, approaching a standardized score of approximately -2, implying that the start of the collective response to the pandemic is likely to have happened around Spring Festival. Finally, the combination of the analyses of inflow and outflow further confirms that the collective mobility-related response to the pandemic happened three days before Spring Festival, i.e., on January 21st, 2020. To identify a point as the start of the collective response to the pandemic, we need to find abnormal trends with high outflow and low inflow: if people were aware of abnormal events, people within the city would want to leave and people outside the city would not want to enter. Yet, we do not observe any significant declining trend in inflow or rising trend in outflow during the second half of 2019. We find an abnormal peak of outflow around January 11th, 2020, and a higher increase in outflow than in the previous year, peaking around January 21st, 2020. The data also reflect highly efficient implementation of government measures, as both inflow and outflow tumbled after the travel restrictions (from/to Wuhan) were announced on January 23rd, 2020, hitting an unprecedented level of almost -2, and this low level persisted afterwards.

In addition, the analysis of the population in the hospitals of Wuhan suggests that the collective response did not start before the end of 2019. By concentrating on the figures of hospitals,
we align the data based on years. It was suspected that COVID-19 infections had begun in the second half of the year (33), so we mainly focus on the latter part of this timeline. The two lines of 2018 and 2019 show similar trends. Generally, we see a nearly parallel shape from June to mid-September. It is after mid-September that we see a discernible climbing trend in the population visiting hospitals, and the higher level persists as a plateau until the end of [...]

Search index is an important indicator for monitoring the outbreak of a pandemic. In this work, we analyzed the correlation of the search index corresponding to COVID-19-related keywords. We first ranked the keywords based on the correlation between the search index and the number of confirmed cases of COVID-19. We found that "mask," "cough," and "diarrhea" are within the 28 highly ranked keywords but not within the top rankings among all. Then, we analyzed the correlation between any two of the four regions (Wuhan, Mainland China, the US, and the world). We found that the correlations and the search index did not change much (because of COVID-19) before December 2019 in Wuhan or in Mainland China. We thus argue that the collective activities started in December 2019 in Mainland China.
Furthermore, we analyzed the mobility data of Wuhan and the population in 26 hospitals of Wuhan to confirm our claim. By comparing the trends in the second half of 2019 with those of 2018, we did not find abnormal deviations from the previous year, indicating that the collective response to the pandemic did not start before December 2019. Additionally, the increasing outflow and decreasing inflow that would signal the pandemic were not observed in the Baidu Migration data for Wuhan in the second half of 2019, further supporting the claim.

In this section, we present the details of data normalization, the correlation analysis of other keywords, and some other details. To visualize the search index data of keywords in different regions in a single figure, we plotted figures using normalized data.
In Figures 1, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S14, and S15, we visualized the search index data of different keywords in four regions, i.e., Wuhan, China, the US, and the world. Given the search index of a keyword in a region from January 2014 to June 2020, we normalized the data using Z-score standardization. We first calculated the mean and the standard deviation of the search index over this period as µ and σ, respectively. Then we normalized the search index in each region as
$$x' = \frac{x - \mu}{\sigma},$$
where $x$ is the raw search index on a given day and $x'$ is its normalized value. In this way, the search index of each keyword in the study is normalized to almost the same scale, so that all regions fit into the same figure. Similarly, data normalization was also applied when we calculated the inflow, outflow, and the population in the hospitals of Wuhan from June 2018 to March 2020.

In this section, we present the detailed ranking results using DTW and DCCA. Table S1 presents the ranking of the top 28 keywords in Wuhan using DTW and DCCA. From the table, we can see that the keywords ranked in the top 10 by both DTW and DCCA are "COVID," "Epidemic situation," "Nucleic acid," "Pneumonia," "Sterilization," and "Infection." In addition, "Mask" is highly ranked (4th) using DTW. Table S2 presents the ranking of the top 28 keywords in China using DTW and DCCA. The common highly ranked keywords are "COVID," "Nucleic acid," "Runny nose," "Sterilization," and "Hypoxemia" based on DTW and DCCA. In addition, "Mask" is also highly ranked using DTW (8th) and DCCA (12th).
Table S3 presents the ranking of the top 28 keywords in the US using DTW and DCCA. The common highly ranked keywords are "COVID," "Mask," "Sterilization," "Body temperature," and "Epidemic" based on DTW and DCCA. "Mask" is highly ranked using both DTW (2nd) and DCCA (1st). Table S4 presents the ranking of the top 28 keywords all over the world using DTW and DCCA. The common highly ranked keywords are "COVID," "Mask," "Sterilization," "Body temperature," "Epidemic situation," "Respiratory distress," and "Respiratory failure" based on DTW and DCCA. "Mask" is highly ranked using both DTW (2nd) and DCCA (2nd).

In this section, we present the correlation analysis of "Mask," "Cough," and "Diarrhea." "Cough" and "Diarrhea" are considered key symptoms of COVID-19 in (33). The search volumes of "mask" in the four regions are shown in Figures S1 and S2. Similar to "mask," the correlations for "cough," shown in Figure S3, were positive and did not change much between 2014 and 2019. In 2015, the correlations became less significant, which may also have been caused by the outbreak of Middle East Respiratory Syndrome (MERS), as "cough" is one of its most important symptoms (36). However, the correlations turned negative in 2020. In addition, the search index also significantly increased in [...] compared with the US and the world. This is due to the fact that early confirmed cases were found in December 2019 (3, 37).
Figure S4: Comparison of the search volume of "cough" between two regions in 2019 with normalized data.

The correlation analysis of "Diarrhea" is shown in Figure S6. To analyze the decrease in the correlation from 2019, we zoom in on 2019 and 2020. Figure S7 shows that the search volumes actually decreased in Wuhan and China from June 2019 to November 2019. This result indicates that there was no evidence to support an outbreak of a diarrhea-related epidemic in Wuhan or China before December 2019. In addition, Figure S8 shows that the search volume of "diarrhea" increased from December 2019 and reached its peak in January 2020 in Wuhan and in February 2020 in China. Similar to "cough," the search volumes of "diarrhea" spiked in Wuhan and China in December 2019 compared to the US and the world. This is also due to the fact that early confirmed cases were found in December 2019 (3, 37).

Figure S7: Comparison of the search volume of "diarrhea" between two regions in 2019 with normalized data.
In Figures S9, S10, and S11, we visualize correlations using [...]

We examined the correlations of search trends between every pair of the four geographical regions, i.e., China, Wuhan, the US, and the world, using Spearman's rank correlation (38). With the correlation coefficients, we apply the hypothesis test of the "significance of the correlation coefficient" to decide whether the correlation between two search trends is significant. We use a significance level of α = 0.05. Thus, we consider a correlation with a p-value less than 0.05 to be "significantly correlated"; otherwise, the correlation is considered "insignificant." By counting the number of queries with significantly correlated trends, we obtain an indicator of the overall correlation of epidemic-related search queries between two regions. To clearly identify the changing trends from 2014 to 2020, we separate the data into 7 subsets according to the corresponding year.
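The counting procedure above can be sketched as follows for one pair of regions and one year. This is an illustration under stated assumptions rather than the exact test used here: Spearman's coefficient is computed as the Pearson correlation of average ranks, and the two-sided p-value uses a large-sample normal approximation of the usual t statistic, which is reasonable for long daily series but only approximate for short ones. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Return (rho, approximate two-sided p-value); series must not be constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    rho = max(-1.0, min(1.0, cov / sqrt(var_x * var_y)))
    if 1.0 - rho * rho < 1e-12:  # perfectly monotone relationship
        return rho, 0.0
    t = rho * sqrt((len(x) - 2) / (1.0 - rho * rho))
    # Assumption: normal approximation instead of the exact t distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return rho, p_value


def count_significant(region_a, region_b, alpha=0.05):
    """Count keywords whose trends in two regions are significantly correlated."""
    return sum(
        1
        for keyword in region_a
        if keyword in region_b
        and spearman(region_a[keyword], region_b[keyword])[1] < alpha
    )


# Hypothetical search indices for two keywords in two regions.
wuhan = {"mask": list(range(30)), "cough": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
world = {"mask": [v ** 2 for v in range(30)],
         "cough": [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]}
print(count_significant(wuhan, world))
```

Repeating the count for each of the seven yearly subsets and each pair of regions yields one number per pair and year, which is the kind of indicator plotted as a yearly trend.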
Figure S12 shows the yearly trends in the number of significant queries between two of the four examined regions. Except for 2020, during which the number of queries with significantly correlated trends between the world and the USA decreases to [...] Wuhan are significantly correlated. This concurs with the fact that national attention was highly focused on COVID-19 during the first five months of 2020. The analysis results indicate that the collective response to the COVID-19 pandemic exists in 2020.

For over 30 search queries related to the COVID-19 pandemic, we analyze the daily search index trend in Beijing from March 1st, 2020 to July 20th, 2020 for each keyword, as Beijing was at the center of the epidemic. From the analysis, we are able to categorize them into two main categories: one group that has an abnormal rise in accordance with the pandemic, shown in Figure S14, and another that has a relatively mild reaction, shown in Figure S15. Some representative queries that belong to the first category include "nucleic acid," "epidemic," "mask," "sore throat," "fever," "fatigue," "pneumonia," "diarrhea," "coronavirus," "Xinguan (stands for COVID)," "patient," and "body temperature." These words mostly rank high in our previous correlation analysis, indicating the consistency and effectiveness of the analytical tools above. Let us take the word "epidemic" as an example.
The search trend of this word declined steadily throughout the period until the start of June, with the highest point between [...] engaged in testing and containment measures in relevant locations in Beijing. The entire set of abnormal trends closely matches the developing situation of the second wave of the pandemic, indicating the close relationship between search trends and epidemic circumstances and the effectiveness of using search volume data as a tool for exploring public reactions. Other search queries, such as "fever" and "nucleic acid," show very similar patterns, with either a flat or a decreasing trend over the preceding months and an obvious, sudden rise starting around June 10th or June 11th. One intriguing observation is that the keyword "diarrhea" is also in category one, with a significant peak from June 10th to June 19th. However, "diarrhea" ranked relatively low in the previous DCCA and DTW rankings, meaning that the general public did not consider it very pertinent to the virus.
Yet, its acute rise in search volume in the second wave of the epidemic differs from the search trend changes in the first national wave. One reason may be that the center of the Beijing epidemic was Xinfadi, one of the most important food markets in that region, so people in Beijing may have become more concerned with food safety and food-related searches. Similar patterns of abnormal increase also apply to other common symptom words such as "fatigue" and "fever," whose sharp rises in search volume match the period of the second wave of the pandemic, within approximately 15 days after the first announcement of the news on June 11th, 2020.

Furthermore, there are certain words that do not show the abnormal patterns discussed above, with no abrupt increase in their overall trend from March to July 2020. For instance, the keyword "stuffy nose" has a search trend with a peak around the second half of March, followed by a flat trend throughout the period with normal daily fluctuations in a stable range. Although the peak in the week of June 11th, which marks the start of the second wave of the pandemic, was mildly wider than other fluctuations, it showed only a slightly higher search volume and a quick fall back to normal levels. It is hard to perceive significant traces of the second wave of the epidemic directly from the diagram. In a similar fashion, search queries such as "respiratory failure" and "respiratory distress" display no significant deviation from the general trend since March, with no peculiar fluctuations.
Although these words are considered relevant to COVID-19, the symptoms they describe are more typical of severely ill patients. Consequently, it is reasonable to see no significant change in these keywords: Beijing had only a few critically ill patients compared to the first national wave of the pandemic, so far fewer people searched for these symptoms of serious illness. Other keywords, including "respiratory symptom," "dehydration," and "affection," also show no obvious change in search volume trend during this second epidemic in Beijing. One crucial search query, "cough," also displays a less significant change in trend than other keywords. Upon closer examination, there is a hump from early June to the end of June, with the highest peak on June 11th. Yet most peaks in the period from June 11th to July 20th are still at levels equivalent to the high points in April and May, and thus the second wave in Beijing did not cause a tremendous increase in search volume for "cough." Among the keywords, "cough" is therefore an exception: its behavior differs from the other search trend changes. In the first national wave of the epidemic, it ranked very high in both the DCCA and DTW rankings; but in the second wave, the public did not react strongly by searching significantly more often for "cough." The full reasons are still under investigation, but one may be that the number of confirmed cases was not very high, owing to the efficient measures taken to contain the spread. Another possibility is that public education had already covered early on that coughing could be a symptom, while the knowledge that other symptoms, such as diarrhea, could derive from the novel coronavirus emerged later. In fact, our techniques lend credence to the notion that looking proactively at which diagnostic terms are being searched for could potentially help identify symptoms that are not already known to be part of the pandemic.
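The distinction drawn above between category-one keywords (a flat or declining baseline followed by a sudden rise around June 11th) and keywords with no abnormal pattern can be illustrated with a simple spike score. The following is a minimal sketch, not the procedure used in this analysis: the function names, the 30-day baseline, and the threshold of three standard deviations are assumptions chosen for illustration, while the 15-day window follows the approximate duration of the rise noted above. The series in the example are synthetic, not Baidu Search volumes.

```python
from statistics import mean, stdev


def abnormal_rise_score(volumes, event_index, baseline_days=30, window_days=15):
    """Number of baseline standard deviations by which the post-event peak
    exceeds the pre-event mean (all parameters are illustrative assumptions)."""
    if baseline_days < 2 or event_index < baseline_days:
        raise ValueError("not enough history before the event for a baseline")
    baseline = volumes[event_index - baseline_days:event_index]
    window = volumes[event_index:event_index + window_days]
    if not window:
        raise ValueError("no observations on or after the event day")
    spread = stdev(baseline)
    if spread == 0:
        # Perfectly flat baseline: any rise is unbounded, no rise scores zero.
        return float("inf") if max(window) > baseline[0] else 0.0
    return (max(window) - mean(baseline)) / spread


def is_category_one(volumes, event_index, threshold=3.0, **kwargs):
    """Flag a keyword whose post-event peak clears the assumed threshold."""
    return abnormal_rise_score(volumes, event_index, **kwargs) >= threshold


# Synthetic daily volumes: 30 quiet days, then 15 days after the event.
quiet_days = [100, 102, 98, 101, 99] * 6
spiking = quiet_days + [150, 220, 300, 280, 240] + [200, 170, 150, 130, 120] * 2
steady = quiet_days + [101, 99, 102, 98, 100] * 3

print(is_category_one(spiking, event_index=30))
print(is_category_one(steady, event_index=30))
```

A score of this kind only captures the size of the rise relative to recent fluctuations. A keyword such as "cough," whose June peaks are comparable to its April and May peaks, would additionally need a longer baseline to be classified sensibly.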
Apart from "cough," most other words show strong consistency with our previous analysis, as search queries ranked higher than 15th place are mostly in category one.

In this section, we present the population data for shelter (fangcang) hospitals (39) and the Huanan seafood market. We calculate the total population of the shelter hospitals and the population of the Huanan seafood market. As shown in Figure S16, the shelter hospital data exhibit patterns highly similar to those of the hospitals described in the section Analysis of Population Densities in Hospitals of Wuhan. After these areas were converted into shelter hospitals, strict restrictions on people entering and leaving were enacted; thus, the population data fell off enormously. There is no significantly aberrant deviation in the trend of the population data of late 2019 relative to the data of late 2018. This conclusion also applies to the population densities in the Huanan seafood market shown in Figure S17, which show no abnormal deviation from the previous year's pattern. Consequently, the latter part of 2019 did not witness any aberrant signal that could be linked to the collective response to the COVID-19 pandemic.
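The year-over-year comparison described above can be sketched as a relative deviation between two aligned daily series. This is an illustrative sketch only: the function names, the 25% tolerance, and the synthetic numbers are assumptions, and the snippet does not reproduce the actual Baidu Maps data or the alignment used for Figures S16 and S17.

```python
def year_over_year_deviation(current, reference):
    """Relative change of each day versus the aligned day one year earlier."""
    if len(current) != len(reference):
        raise ValueError("series must cover the same number of days")
    if any(r <= 0 for r in reference):
        raise ValueError("reference values must be positive")
    return [(c - r) / r for c, r in zip(current, reference)]


def aberrant_days(current, reference, tolerance=0.25):
    """Indices of days whose deviation from the reference year exceeds
    the tolerance (0.25 is an assumed value, not taken from the paper)."""
    deviations = year_over_year_deviation(current, reference)
    return [i for i, d in enumerate(deviations) if abs(d) > tolerance]


# Synthetic daily population counts for the same four calendar days.
late_2018 = [1000, 1050, 980, 1020]
late_2019 = [1010, 1040, 990, 1000]    # follows the previous year's path
restricted = [1010, 1040, 200, 150]    # sharp drop after access is restricted

print(aberrant_days(late_2019, late_2018))
print(aberrant_days(restricted, late_2018))
```

In practice the two years would first need to be aligned so that weekdays and holidays correspond, since population counts at hospitals and markets have strong weekly and seasonal cycles.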
Figure S16: Population data of shelter hospitals in Wuhan.
Figure S17: Population data of seafood markets in Wuhan.

Greatest test since World War Two, says UN chief
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
COVID-19 statistics
Encyclopedia of Biostatistics