key: cord-1025563-96wxbmec authors: Li, Kun; Zhang, Yangyang; Wang, Chao title: Estimate the Trend of COVID-19 Outbreak in China: a Statistical and Inferential Analysis on Provincial-level Data date: 2021-12-31 journal: Procedia Computer Science DOI: 10.1016/j.procs.2021.04.092 sha: 9aa7bfdba6b14d04568b47a6933d5d16aea9419e doc_id: 1025563 cord_uid: 96wxbmec

The ongoing COVID-19 epidemic has spread with strong transmission power to every part of China. Analysis of its trend is urgently needed as the Chinese government makes plans and policies for epidemic control. This paper provides a process for estimating the trend of the COVID-19 outbreak using provincial-level data on confirmed cases. Building on previous studies, we introduce an effective and practical method to compute accurate basic reproduction numbers ($R_0$s) in each province-level division of China. The statistical results show a continuous downward trend of the $R_0$s in China and confirm that China has made significant progress in epidemic control by lowering the provincial $R_0$s from 10 or above to 3.21 or less. In the inferential analysis, we introduce an effective AR(n) model for trend forecasting. The inferential results imply that the nationwide epidemic risk will fall to a safe level by the end of April in China, which matches the actual situation. The results provide a more accurate method and more accurate information about COVID-19.

The outbreak of COVID-19, which began in Hubei, China at the end of December 2019, quickly spread to become a nationwide and then global pandemic. A number of recent studies have been dedicated to analyzing the trend of COVID-19. For example, Jung et al. (2020) take the basic reproduction number as a key measure to estimate the risk of death from COVID-19 infection [1]. Roosa et al. (2020) generate short-term forecasts of cumulative reported cases in the Guangdong and Zhejiang provinces of China and detect different trends across locations [2].
Inspired by these recent publications, we find that the diversity of COVID-19 trends across areas deserves attention, even though the majority of cases (56,249) had been reported in Hubei as of February 16th, 2020. In this paper, we explore the diversified trends of COVID-19 in the province-level divisions of China. First, we introduce an effective and accurate method to estimate the basic reproduction numbers. Second, we use the daily data of cumulative confirmed COVID-19 cases and a news-gathering technique to acquire accurate inputs for the key component factors in the computation of the basic reproduction numbers. As a contribution, we provide a time-series array of detailed daily basic reproduction numbers for each province-level division of China. Third, using the computed provincial-level data, we conduct a statistical and inferential analysis.

Our statistical results indicate that the outbreak of COVID-19 was still in the process of being contained and had not reached a steady state with all individuals healthy in any region of China. However, the regional basic reproduction numbers had converged toward each other. We construct an autoregressive (AR) model and quantify how the historical basic reproduction numbers account for the future values. Our model states that the nationwide trend of COVID-19 would fall to a safe level in late April 2020.

According to Anderson and May (1992), Wang and Zhao (2012), and Pastor-Satorras et al. (2015), the basic reproduction number $R_0$ is one of the most popular and important indicators in epidemic studies [3, 4, 5]. It quantifies the infection risk of an epidemic by estimating the expected number of secondary cases caused by an initial infective case. The basic reproduction number has been widely used for different epidemics. In this study, we use the method of Zhou et al. (2020) to calculate the basic reproduction numbers of COVID-19 [6].
The basic reproduction number $R_0$ can be computed by

$$R_0 = 1 + \lambda T_g + \rho (1 - \rho) (\lambda T_g)^2 \tag{1}$$

To compute $R_0$, we need information about the three component factors: $\lambda$, $T_g$ and $\rho$. $\lambda$ is the growth rate of the confirmed cases at day $t$ since the onset of the infection, written as

$$\lambda = \frac{\ln Y(t)}{t} \tag{2}$$

where $Y(t)$ is the cumulative number of COVID-19 confirmed cases by day $t$. $T_g$ refers to the generation time, which is defined as the time interval between symptom onset in an index case and symptom onset in a secondary case. $\rho$ is the ratio of the incubation period to $T_g$.

In this section, we compute the inputs of the three component factors, $\lambda$, $T_g$ and $\rho$, to calculate $R_0$ for each date and each province-level division of China. We collect the numbers of confirmed cases in each of the 34 province-level divisions of China. The main data source is the National Health Commission of China (NHC China), which provides a real-time platform that updates the numbers of COVID-19 confirmed cases sorted by province-level division (including Hong Kong, Macau and Taiwan). Our observation period starts on January 21st, 2020, the first day on which NHC China provided the real-time platform, one day after the State Council of China launched the nationwide alert for COVID-19 epidemic control, and ends on February 16th, 2020. We also acquire data from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.

First, we compute $\lambda$. According to (2), we collect $Y(t)$, the cumulative number of COVID-19 confirmed cases, for each date and each province-level division. We then track the onsets of the infection to determine $t$. Previous studies use a single date as the onset for every province-level division. As an improvement, we use the precise onset date of COVID-19 in each province-level division. For each division, we find the date on which the first case was confirmed in the official announcements of NHC China and its provincial health commission branch, and define the day before it as the onset ($t = 0$).
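The growth-rate step described above can be illustrated with a minimal Python sketch. It assumes the growth rate takes the form $\lambda = \ln Y(t) / t$, as in the cited method of Zhou et al. (2020), and that the onset is the day before the first confirmed case. The function names `onset_date` and `growth_rate` and the case counts in the usage lines are hypothetical and serve only as illustration; they are not the authors' code or data.

```python
import math
from datetime import date, timedelta


def onset_date(first_confirmed: date) -> date:
    """Onset (t = 0) is taken as the day before the first confirmed case."""
    return first_confirmed - timedelta(days=1)


def growth_rate(cumulative_cases: int, day: date, onset: date) -> float:
    """Growth rate lambda = ln(Y(t)) / t, with t counted in days since onset.

    Assumption: this is the functional form of the growth rate; the inputs
    are the cumulative confirmed cases Y(t) and the observation date.
    """
    t = (day - onset).days
    if t <= 0:
        raise ValueError("the observation day must fall after the onset date")
    if cumulative_cases < 1:
        raise ValueError("at least one confirmed case is required")
    return math.log(cumulative_cases) / t


# Hypothetical division: first case confirmed on January 22nd, 2020,
# and 100 cumulative cases by February 11th, 2020 (illustrative numbers).
onset = onset_date(date(2020, 1, 22))
lam = growth_rate(100, date(2020, 2, 11), onset)
print(f"t = {(date(2020, 2, 11) - onset).days} days, lambda = {lam:.4f}")
```

Under this form, a division with a single cumulative case has $\ln 1 = 0$ and therefore a zero growth rate on every day after its onset.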
To track this information precisely and thus fix the onset date in each province-level division of China, we use the iFinD database for news subscription, a news-gathering technique that is widely used in financial studies [7]. To our knowledge, this is the first study that provides accurate onset dates for each province-level division of China. The accurate onsets improve the reliability and accuracy of our results.

Table 1 and Figure 1 summarize the variety of COVID-19 onsets across province-level divisions. As the origin of the COVID-19 outbreak, Hubei had its first confirmed case as early as December 29th, 2019, according to the official report of the Chinese Center for Disease Control and Prevention (CCDC) published on January 28th, 2020. Following Hubei, Beijing and Guangdong detected their first confirmed cases on January 19th, 2020, and Shanghai one day later. Thus, the three most developed provinces and municipalities raised the epidemic alarm ahead of the rest of China. The broader outbreak of COVID-19 started on January 21st, when nine more provinces and municipalities announced their first confirmed cases. On January 22nd, 14 province-level divisions had their first confirmed cases, including Hong Kong and Macau. By then COVID-19 had spread to the majority of China. Tibet was the last province to report a first confirmed case, on January 29th, 2020.

Given $Y(t)$ and $t$, we compute $\lambda$, the daily updated growth rate of the confirmed cases between January 21st and February 16th, for each province-level division of China. We also compute the $\lambda$s for China as a whole, taking the onset in Hubei as the onset for China.

Next, we estimate $\rho$ and $T_g$. Previous studies state that the $\rho$ of SARS is between 0.5 and 0.8 [8] [9]. Some recent studies treat SARS as a comparable target for COVID-19 and thus set the $\rho$ of COVID-19 equal to 0.65 [5] [24] under an optimistic estimation.
However, according to the trend of COVID-19 in China during our observation period, this new epidemic has had a larger impact than SARS in 2003. Therefore, we set the $\rho$ of COVID-19 equal to 0.5 in this study, so as to maximize the coefficient $\rho (1 - \rho)$ in (1) and thus depict the basic reproduction number at a more reasonable level. As discussed previously, $\rho$ is the ratio of the incubation period to $T_g$, that is, $\rho = \text{incubation period} / T_g$. The generation time can therefore be computed as $T_g = \text{incubation period} / \rho$. Previous studies usually use the incubation period of SARS as a substitute when the incubation period of COVID-19 is unavailable. According to the latest research, the median incubation period was 3.0 days [9], so we choose 3.0 days as the incubation period of COVID-19. Thus, the generation time is 6.0 days. In summary, the function of $R_0$ is

$$R_0 = 1 + 6\lambda + 0.25\,(6\lambda)^2 \tag{3}$$

We input the data for $\lambda$, $T_g$ and $\rho$ to calculate the daily updated $R_0$ in each province-level division of China. It is common to take one (unity) as the epidemic threshold. If $R_0 < 1$, the relative size of the epidemic is negligibly small and the epidemic will die out soon. If $R_0 > 1$, an infective individual can cause an outbreak in a population of a certain size and will keep generating new infected individuals. In summary, $R_0$ can be considered the expected number of cases directly generated by one case of an infection in a population.

We first look at the nationwide trend. Figure 2 depicts the nationwide daily $R_0$ and the newly confirmed cases. The peak of $R_0$ appeared on January 28th, 2020, when it reached 3.50. Since then, $R_0$ has decreased steadily. Increases in newly confirmed cases did not cause $R_0$ to rebound to a higher level, even when surges of confirmed cases occurred on February 2nd (5257 new cases confirmed), February 7th (6483 new cases confirmed), and February 13th (15216 new cases confirmed).
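The computation of the basic reproduction number from the growth rate can be sketched in a few lines of Python. The sketch assumes the formula of the cited Zhou et al. (2020) method, $R_0 = 1 + \lambda T_g + \rho (1 - \rho)(\lambda T_g)^2$, together with the parameter choices stated in the text ($\rho = 0.5$ and a 3.0-day incubation period, hence $T_g = 6.0$ days). The function name `basic_reproduction_number` and the growth rate of 0.2 per day in the usage line are hypothetical.

```python
def basic_reproduction_number(
    growth_rate: float,
    incubation_days: float = 3.0,
    rho: float = 0.5,
) -> float:
    """R0 = 1 + lambda*Tg + rho*(1 - rho)*(lambda*Tg)^2.

    Assumptions: rho is the ratio of the incubation period to the
    generation time Tg, so Tg = incubation_days / rho (6.0 days with
    the default values used in the text).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    generation_time = incubation_days / rho
    x = growth_rate * generation_time
    return 1.0 + x + rho * (1.0 - rho) * x ** 2


# Illustrative growth rate of 0.2 per day (a hypothetical value).
print(round(basic_reproduction_number(0.2), 2))

# A zero growth rate sits exactly on the epidemic threshold of one.
print(basic_reproduction_number(0.0))
```

With the default parameters the expression simplifies to $(1 + 3\lambda)^2$, so $R_0$ equals one exactly when the growth rate is zero and rises monotonically with any positive growth rate.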
As of February 16th, 2020, $R_0$ had fallen to 2.83, implying that one case of COVID-19 may lead to 2.83 new cases on average. Although the latest $R_0$ is still higher than one, the downward trend clearly shows the positive outcomes of the current epidemic prevention and control.

Next, we study the $R_0$s in the province-level divisions. As of February 16th, 2020, the lowest $R_0$ was equal to one, in Tibet, where only one confirmed case had appeared, while the $R_0$s in the rest of China ranged between 1.60 and 3.21. The results are in line with our findings from the nationwide data. They indicate that in every region of China the outbreak of COVID-19 was still in the process of being contained and had not reached a steady state with all individuals healthy. Although the provincial $R_0$s are still higher than one, their changes clearly show the outcomes of epidemic prevention and control.

Figure 3 presents the highest $R_0$ (peak $R_0$) in each province-level division of China during the observation period. The average peak $R_0$ of the 34 province-level divisions is 12.01. Fifteen of them had peak $R_0$s between one and 10. Nineteen provinces or municipalities had peak $R_0$s above 10, and three of them (Chongqing, Hainan and Zhejiang) even reached levels around 30. By contrast, the latest average $R_0$ on February 16th had fallen to 2.56, roughly one fifth of the average peak $R_0$ that occurred about twenty days earlier. Figure 3 also shows that the latest $R_0$s are very close to each other, indicating that epidemic prevention and control have been synchronized across different parts of China.

In this section we build a forecasting model of the $R_0$s. The descriptive statistical results above depict the downward trend of $R_0$ at both the nationwide and provincial levels, implying that the $R_0$s may follow certain time-varying processes.
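As a rough illustration of how such a time-varying process can be modeled, the following sketch fits an autoregressive model of order n by ordinary least squares and iterates it forward until the forecast drops below the threshold of one. It is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the helper names (`solve_linear`, `fit_ar`, `forecast`) are hypothetical, the lag order is fixed by hand rather than chosen from partial autocorrelations, and the input series is synthetic (a geometric decay starting at 3.5), not the paper's data.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(series, n):
    """Least-squares fit of y_t = c + phi_1*y_{t-1} + ... + phi_n*y_{t-n}.

    Returns [c, phi_1, ..., phi_n], obtained from the normal equations.
    """
    rows = [[1.0] + [series[t - i] for i in range(1, n + 1)]
            for t in range(n, len(series))]
    targets = [series[t] for t in range(n, len(series))]
    k = n + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def forecast(series, coef, steps):
    """Iterate the fitted recursion forward, feeding forecasts back in."""
    n = len(coef) - 1
    history = list(series)
    for _ in range(steps):
        nxt = coef[0] + sum(coef[i] * history[-i] for i in range(1, n + 1))
        history.append(nxt)
    return history[len(series):]


# Synthetic, hypothetical series: 20 daily values decaying by 5% per day.
observed = [3.5 * 0.95 ** k for k in range(20)]
coef = fit_ar(observed, 1)
path = forecast(observed, coef, 30)
first_below_one = next((k + 1 for k, v in enumerate(path) if v < 1.0), None)
print("coefficients:", coef)
print("forecast steps until the value drops below one:", first_below_one)
```

Solving the normal equations directly keeps the sketch dependency-free; an actual analysis would more likely rely on a statistics library and would select the lag order from the data.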
Following Box and Jenkins (1976) and Ruud (2000) [10, 11], we construct the AR model and quantify how the current outcomes of the $R_0$s account for the future values. We first use the partial autocorrelation to determine the appropriate lag length. After determining the lag length $n$, we construct the AR(n) model for one array of $R_0$s, as described in (4):

$$R_{0,t} = c + \sum_{i=1}^{n} \phi_i R_{0,t-i} + \varepsilon_t \tag{4}$$

where $c$ is a constant, the $\phi_i$ are the autoregressive coefficients and $\varepsilon_t$ is the error term.

We use the AR(n) model to forecast the nationwide trend of COVID-19. As depicted in Figure 4, $R_0$ will keep moving down by a certain amount every day and eventually fall below one on approximately April 19th, in other words, in about two months. Compared with the two-month time frame of the 2003 SARS outbreak [12], this result is reasonable.

We use the actual data to reexamine the validity of this estimated trend. Figure 5 depicts the actual trend of $R_0$ from January to the end of April 2020. The actual $R_0$ shows a decreasing trend and gradually approaches one. The actual result is consistent with the result we estimated using the first month of data, which confirms the validity and reasonableness of our estimating model.

This paper provides a process for estimating the trend of the COVID-19 outbreak using provincial-level data on confirmed cases. We present daily updated basic reproduction numbers ($R_0$s) for each province-level division of China. We employ an effective and practical method to compute the $R_0$s. The three component factors of this method can be obtained from the daily updated data of cumulative confirmed COVID-19 cases and a news-gathering technique. The resulting $R_0$s show a monotonic downward trend at the nationwide level since January 28th regardless of sudden surges of newly confirmed cases, which confirms the effects of the current epidemic prevention and control in China. However, the latest $R_0$s were still above the unity threshold, indicating that the outbreak of COVID-19 had not yet been contained into a steady state with all individuals healthy in any region of China.
Through the statistical analysis, we also find that China has made significant progress in epidemic prevention and control. In some provinces the peak $R_0$s ran up to 33, and more than half of the province-level divisions (19 of 34) had peak $R_0$s above 10. The Chinese government successfully lowered the provincial $R_0$s to 3.21 or below and kept the efforts of epidemic prevention and control aligned across provinces. In the inferential analysis section, we introduce an effective AR(n) model for trend forecasting and use it to forecast the nationwide trend. Our results imply that the epidemic risk will fall to a safe level by the end of April in China, which matches the actual situation. The results provide a more accurate method and more accurate information about COVID-19.

Real-Time Estimation of the Risk of Death from Novel Coronavirus (COVID-19) Infection: Inference Using Exported Cases
Short-term Forecasts of the COVID-19 Epidemic in Guangdong and Zhejiang
Infectious diseases of humans: Dynamics and control
Basic reproduction numbers for reaction-diffusion epidemic models
Epidemic processes in complex networks
Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
Reaction to News in the Chinese Stock Market: A Study on Xiong'an New Area Strategy
Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong
Severe acute respiratory syndrome-Singapore
An introduction to classical econometric theory
Characteristics of and public health responses to the coronavirus disease 2019 outbreak in China

The authors, including Kun Li (https://orcid.org/0000-0001-5862-1959), Yangyang Zhang (https://orcid.org/0000-0002-7492-3022), and Chao Wang (https://orcid.org/0000-0002-6096-7550), gratefully acknowledge the support from the National Natural Science Foundation of China (Grant No. 71803012).