key: cord-0923110-8b1esll4 authors: Huang, Ganyu; Pan, Qiaoyi; Zhao, Shuangying; Gao, Yucen; Gao, Xiaofeng title: Prediction of COVID-19 Outbreak in China and Optimal Return Date for University Students Based on Propagation Dynamics date: 2020-04-07 journal: J Shanghai Jiaotong Univ Sci DOI: 10.1007/s12204-020-2167-2 sha: 40ae26f6095a1e1bf5e3e4d5652403c2234d2051 doc_id: 923110 cord_uid: 8b1esll4 On 12 December 2019, a novel coronavirus disease, named COVID-19, began to spread around the world from Wuhan, China. It is useful and urgent to consider the future trend of this outbreak. We establish the 4+1 penta-group model to predict the development of the COVID-19 outbreak. In this model, we use the collected data to calibrate the parameters, and let the recovery rate and mortality change according to the actual situation. Furthermore, we propose the BAT model, which is composed of three parts: simulation of the return rush (Back), analytic hierarchy process (AHP) method, and technique for order preference by similarity to an ideal solution (TOPSIS) method, to figure out the best return date for university students. We also discuss the impacts of some factors that may occur in the future, such as secondary infection, emergence of effective drugs, and population flow from Korea to China. 
Nomenclature
c - The average number of contacts of an exposed person without isolation each day
n - The number of individuals
n_D - The death toll
n_E - The number of exposed individuals
n_I - The number of infectious individuals
n_R - The number of recovered individuals
n_S - The number of susceptible individuals
N - The total population of China
p - Intensity of isolation for exposed individuals
r - Correlation coefficient
R^2 - Coefficient of determination
t_0 - Moment when the government began to take measures
t - Outbreak duration
α - Incubation rate
β - Infectious rate of contacts of an exposed person
γ - Recovery rate
μ - Pneumonia mortality
0 Introduction
On 12 December 2019, the first patient with unexplained pneumonia was admitted to hospital in Wuhan. Since then, a novel coronavirus disease, named COVID-19, has spread around the world, and the number of infected patients has been growing exponentially. It is known that the novel coronavirus is infectious, has a high affinity for human respiratory tract cells, and can be transmitted from person to person. Thus, it is useful and urgent to predict the course of this outbreak with mathematical modeling. In addition, students are still staying at home to prevent the spread of COVID-19. Here we establish a model to predict the spread of COVID-19 and infer the most suitable return date for university students.
Traditionally, compartment models are used to predict outbreaks of infectious diseases. Before building our model, we review related work, which we classify into two categories: models with external floating population and those without. The conventional models in these papers are the SEIR, SEIAR and SEIJR models. Each has its advantages. For example, the SEIR model is easy to implement, and the SEIJR model explicitly separates isolated individuals from the other groups.
However, these models share a common disadvantage: their long-term predictions are not accurate because they cannot fit the real situation over a long period. In this paper, we consider factors that influence the COVID-19 outbreak over the long term and establish the 4+1 penta-group model. On the basis of the traditional SEIR model, we add a compartment, i.e., dead individuals, and take into account additional parameters, such as the moment when the government began to take measures and the intensity of isolation for exposed individuals. Moreover, we use the collected data to calibrate some of the model parameters so that they match the actual situation as closely as possible. Then, we establish an analytical model, called the BAT model, which uses simulation of the return rush (Back), the analytic hierarchy process (AHP) method and the technique for order preference by similarity to an ideal solution (TOPSIS) method. By combining these two models, we can predict the development of the epidemic and weigh the pros and cons of different return dates.
Existing work on prediction of the COVID-19 outbreak can be classified into models without external floating population and those with. Previous related work adopts differential equations as the basic form for simulation. The models without external floating population are, broadly, the SEIR, SEIAR and SEIJR models. The comparison of these models and our model is shown in Fig. 1.
SEIR
As a traditional infectious disease model, the SEIR model describes the relationship between susceptible, exposed, infectious, and recovered individuals. Fan et al. [1], Geng et al. [2], and Zhou et al. [3] proposed some of the most classic SEIR models. They directly applied the traditional SEIR model [4] without any changes to simulate the outbreak in Wuhan and other areas.
The SEIR model takes little account of the actual situation, so its long-term prediction is far from the actual value.
SEIJR
The SEIJR model is roughly the same as the traditional SEIR model, but the population is divided into susceptible individuals, asymptomatic individuals in the incubation period, infectious individuals with symptoms, isolated individuals under treatment, and recovered individuals. Read et al. [5] adopted this idea. This model explicitly separates isolated individuals from the other populations and describes the status quo more realistically. Nevertheless, precise data on each group are hard to collect, which makes it difficult to calibrate the parameters. Therefore, its long-term prediction is also far from the actual value.
SEIAR
The SEIAR model differs from the SEIJR model in that it has a compartment of asymptomatic individuals in place of isolated individuals. Bai et al. [6] followed this approach, and this model has characteristics similar to those of the SEIJR model.
In addition to these, we find an SEIR-based model with external floating population, which considers the zoonotic force of infection and the daily number of travelers. Wu et al. [7] adopted this model. Its simulation closely matches the early stage of the epidemic. However, as the outbreak progresses, the simulation drifts away from reality, which makes the model unsuitable for long-term forecasting.
All of these models have their strengths, but none of them does well in long-term prediction because of limitations in the parameters or in model accuracy. Based on the above experience, our 4+1 penta-group model takes into account the long-term nature of the outbreak. We add dead individuals to our model, since precise data on them can be collected to calibrate mortality, and we also add the time of isolation initiation and the intensity of isolation, given the long-term impact of the measures that the government has taken. In general, we establish a model that can predict the long-term situation of the outbreak.
We establish two models to predict the spread of COVID-19 and determine the most suitable return time. First, we establish the 4+1 penta-group model to predict the future trend of the COVID-19 outbreak in China. Then, we propose the BAT model to simulate the return rush and consider several factors to obtain the best return time.
Model
For the 4+1 penta-group model, we assume that there is no floating population, considering that the government has issued notices and taken measures to keep people at home. Moreover, we assume that the recovery rate is positively correlated with the level of medical care, and that the mortality is negatively correlated with the level of medical care.
The 4+1 penta-group model includes six sub-models that describe the flow relationship of each population [8]. It reflects the complete changes of the five groups of people and their overall relationship. In this paper, we use differential equations to simulate the flow of people [9]. The first model, Eq. (1), reflects the change in susceptible individuals; the second reflects the change in exposed individuals; the third reflects the change in infectious individuals; and the fourth reflects the change in recovered individuals. In this model, we separate n_D from n_R to predict the death toll precisely. Moreover, we will discuss the effect of secondary infection in Subsection 4.1. The new compartment also benefits that discussion, because secondary infection can never occur in dead individuals. The fifth model reflects the change in dead individuals, and the sixth reflects the overall relationship.
The BAT model is composed of three parts: simulation of the return rush (Back), the AHP method and the TOPSIS method. First of all, we assume that the infectious rate is positively correlated with the population density, measured by the per capita floor space.
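The compartment flows of the 4+1 penta-group model described above can be illustrated with a minimal sketch. The equation bodies are not available in this text, so the right-hand sides below are an assumed SEIR-style form in which exposed individuals are the source of infection and isolation scales the contact rate by (1 - p) from day t_0 onward; they should not be read as the authors' Eqs. (1)-(6). The names `Params`, `derivatives` and `simulate` are hypothetical, γ and μ are held constant for simplicity (the paper lets them vary with time), and all values other than N, 1/α, c and t_0 are illustrative placeholders rather than the fitted values.

```python
from dataclasses import dataclass


@dataclass
class Params:
    N: float = 1_400_050_000.0   # total population (from the text)
    alpha: float = 1 / 6.1388    # incubation rate (from the text)
    c: float = 20.0              # daily contacts of an exposed person (from the text)
    t0: float = 42.0             # day on which measures begin (from the text)
    beta: float = 0.02           # placeholder infectious rate, NOT the fitted value
    p: float = 0.8               # placeholder isolation intensity, NOT the fitted value
    gamma: float = 0.05          # placeholder constant recovery rate
    mu: float = 0.005            # placeholder constant mortality


def derivatives(state, t, prm):
    """Assumed flows S -> E -> I -> (R or D); the five rates sum to zero."""
    s, e, i, r, d = state
    isolation = (1.0 - prm.p) if t >= prm.t0 else 1.0
    new_exposed = prm.beta * prm.c * isolation * e * s / prm.N
    return (
        -new_exposed,                               # susceptible
        new_exposed - prm.alpha * e,                # exposed
        prm.alpha * e - (prm.gamma + prm.mu) * i,   # infectious
        prm.gamma * i,                              # recovered
        prm.mu * i,                                 # dead
    )


def simulate(prm, days, steps_per_day=20, e0=1.0):
    """Forward-Euler integration; returns one state tuple per day (day 0 included)."""
    dt = 1.0 / steps_per_day
    state = (prm.N - e0, e0, 0.0, 0.0, 0.0)
    history = [state]
    for day in range(days):
        for k in range(steps_per_day):
            t = (day * steps_per_day + k) * dt
            rates = derivatives(state, t, prm)
            state = tuple(x + dt * dx for x, dx in zip(state, rates))
        history.append(state)
    return history


history = simulate(Params(), days=120)
peak_day = max(range(len(history)), key=lambda day: history[day][2])
print("infectious peak on day", peak_day)
```

Because the five rates sum to zero, the total of the five compartments stays equal to N, which plays the role of the overall relationship in the sixth sub-model. A production version would more likely use an adaptive ODE solver than fixed-step Euler.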
On the first day of the Spring Festival travel rush, before the extended leave notice, people were sent at Shanghai Hongqiao Railway Station. Moreover on 5 February 2020, 50 000 people were at the same place. Without extended holidays, the per capita area of 440 million people will increase m times: m = 1.942 8, estimated from the data of Shanghai Hongqiao Railway Station. Then we derive the impacted infectious rate, and we assume that the number of contacts would quadruple: β' = mβ and c' = 4c. We then replace the original parameters in the model with these values.
Here we take into account the impact of both the epidemic situation and the delay in returning to work and school, and use the AHP method to get the weight of each factor. The factors considered in our analysis are the end date of the outbreak, the total number of cases, the economic impact, and students' graduation. We establish a weight matrix for these factors, as shown in Fig. 2. Then we set up a scoring system to obtain the best return date using the TOPSIS method.
When constructing the judgment matrix, we compare the factors in pairs using the consistent matrix method [10] [11]. A relative scale is adopted to reduce the difficulty of comparing factors with different properties and thus to improve accuracy. Through the analysis of the questionnaire results, the importance levels of the four relevant factors are determined with a score-based method ranging from equally important (denoted as 1) to extremely more important (denoted as 9), as shown in Fig. 2. In terms of importance relative to the end time, the scores of the number of cases, the economic impact and students' graduation are 5, 4 and 0.33, respectively.
Also, we set up a scoring system to obtain the best return date using the TOPSIS method [12] [13]. We first determine that both the economic impact and students' graduation are related to the return time. A delayed return clearly affects students' graduation and also harms the economy.
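The pairwise comparison step of the AHP method described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the weights are approximated with the row geometric-mean method, and consistency is checked with Saaty's consistency ratio. The first column of the judgment matrix follows the scores quoted in the text (5, 4 and 0.33 relative to the end time, with 0.33 read as the score of students' graduation); every other entry is a hypothetical value chosen only so that the matrix is reciprocal, because Fig. 2 is not reproduced here. The names `ahp_weights` and `consistency_ratio` are also hypothetical.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

FACTORS = ["end date", "total cases", "economic impact", "graduation"]


def ahp_weights(matrix):
    """Approximate the priority vector by normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """CR = CI / RI, with lambda_max estimated as the mean of (A w)_i / w_i."""
    n = len(matrix)
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Entry [i][j] says how much more important factor i is than factor j.
# Column 0 follows the text; the remaining entries are hypothetical.
judgment = [
    [1.0,     1.0 / 5, 1.0 / 4, 3.0],
    [5.0,     1.0,     2.0,     7.0],
    [4.0,     1.0 / 2, 1.0,     5.0],
    [1.0 / 3, 1.0 / 7, 1.0 / 5, 1.0],
]

weights = ahp_weights(judgment)
cr = consistency_ratio(judgment, weights)
for name, w in zip(FACTORS, weights):
    print(f"{name}: {w:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

A consistency ratio below 0.1 is the usual threshold for accepting a judgment matrix. The resulting weights are what the scoring step multiplies with the normalized factor values of each candidate return date.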
Then, the end time of the outbreak is subtracted from that of the original case to get a 3 × 3 matrix. Furthermore, we normalize this matrix and add up the products of the weight and the value of each factor in the matrix. We take this sum as the score. Comparing those scores, we finally derive the most suitable return date with the BAT model.
To predict the future situation of the outbreak, we need to determine the specific values of the parameters in the 4+1 penta-group model [14]. We set the time of the emergence of the first case as t = 1 d. According to the time when much news about COVID-19 emerged and people started to pay attention, we set t_0 as 42 d (January 23, 2020). The mean time from symptom onset to isolation is 6.138 8 d (interval: 5.967 6-6.320 0 d) [15], from which we obtain 1/α = 6.138 8. Under traffic control, considering that people in Hubei were forbidden to leave Hubei, and that the city's buses, subways, ferries, and long-distance passenger transport were suspended from 10:00 am on 23 January 2020, we assume that each person has daily contact only with family members at home, i.e., a census household size of 3.1 people per household (from the national data). The total population of China is N = 1 400 050 000, while the population of Hubei is 59 170 000, so the average number of contacts is c = 20, which is calculated by employing a decentralized average and is considered to be the number of people exposed to an exposed person without isolation.
For the recovery rate and pneumonia mortality, we apply the latest data to Eqs. (4) and (5) and get the variation of the recovery rate and pneumonia mortality. At the same time, we can see that these two parameters depend on the time t. At last, we use the nonlinear least squares method to get the β and p that best match the actual data.
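The least squares calibration of β and p described above can be sketched as follows. This is an illustration of the principle only: a brute-force grid search stands in for a proper nonlinear least squares solver, the "observed" curve is synthetic data generated by the sketch itself rather than real case counts, and the compartment equations are an assumed SEIR-style form (infection driven by exposed individuals, with contacts scaled by 1 - p from day t_0), not the authors' equations. The function names and the constant γ and μ values are hypothetical.

```python
def infectious_curve(beta, p, days, c=20.0, alpha=1 / 6.1388, gamma=0.05,
                     mu=0.005, t0=42, n_total=1_400_050_000.0,
                     steps_per_day=10):
    """Daily infectious counts from the assumed compartment sketch (Euler)."""
    dt = 1.0 / steps_per_day
    s, e, i = n_total - 1.0, 1.0, 0.0
    curve = []
    for day in range(days):
        # Isolation only reduces the contact rate once measures have begun.
        rate = beta * c * ((1.0 - p) if day >= t0 else 1.0)
        for _ in range(steps_per_day):
            new_e = rate * e * s / n_total
            s, e, i = (s - dt * new_e,
                       e + dt * (new_e - alpha * e),
                       i + dt * (alpha * e - (gamma + mu) * i))
        curve.append(i)
    return curve


def sum_squared_error(predicted, observed):
    """Objective minimized by the fit."""
    return sum((a - b) ** 2 for a, b in zip(predicted, observed))


def grid_search_fit(observed, betas, ps):
    """Return (sse, beta, p) for the grid point with the smallest error."""
    best = None
    for beta in betas:
        for p in ps:
            predicted = infectious_curve(beta, p, len(observed))
            sse = sum_squared_error(predicted, observed)
            if best is None or sse < best[0]:
                best = (sse, beta, p)
    return best


betas = [0.010 + 0.002 * k for k in range(11)]   # candidate infectious rates
ps = [0.50 + 0.05 * k for k in range(9)]         # candidate isolation intensities

# Synthetic "data": generated from one grid point so the fit can be checked.
observed = infectious_curve(betas[5], ps[6], days=70)
sse, beta_hat, p_hat = grid_search_fit(observed, betas, ps)
print("fitted beta:", beta_hat, "fitted p:", p_hat)
```

Under this assumed form, β is determined by the growth before t_0 and p by the change in growth afterwards, which is why the observation window has to extend beyond day 42 for both parameters to be identifiable.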
When the sum of the squared deviations of the predicted values from the actual values is minimized, we obtain β = 0.019 5, p = 0.052. We now have all the parameters and set the initial conditions. With the initial conditions substituted, we obtain the predicted results of the infections of the COVID-19 outbreak in China, as shown in Fig. 4.
To verify the validity of our model, we use the coefficient of determination and the correlation coefficient between the predicted curve and the actual data as the reliability evaluation criteria of the model. The closer the two coefficients are to 1, the higher the correlation is, and the more reliable the predicted results are. The final results are R^2 = 0.905 29 and r = 0.995 38; both are greater than 0.9, so the correlation is very high. Our model can therefore be considered highly reliable.
Applying the seven-day rule of no new infections to the prediction results, we find that the epidemic will end on 2 May 2020, with a total of 103 321 cases. The number of exposed individuals peaked at 22 540 on 8 February 2020. The number of infectious individuals peaked at 58 125 on 22 February 2020.
We assume that the general return rush lasts for seven days, and use the BAT model to evaluate 46 candidate return dates (from March 1 to April 15). We find that the best time to return to work or study is from March 15 to March 22, which means universities will start on March 23. Judging from the prediction results, we cannot relax our vigilance at this stage but should wait patiently for the epidemic to end, although the epidemic has passed its inflection point.
In the following analysis, we assume that the government and people will continue to take strict precautionary measures. Assuming that all cured individuals become susceptible again and share the same infectious rate, while the other equations remain the same, we rewrite Eqs.
(1) and (6) to account for secondary infections. The new framework is shown in Fig. 5. We apply Eqs. (9) and (10) to get the results shown in Fig. 6. The prediction model gives the outbreak prediction results and simultaneously evaluates secondary infection occurring at different times. The results are almost the same in every case, which shows that there is no need to panic if secondary infection occurs. As long as we continue to take protective measures, the overall situation will not be significantly affected.
Predictably, the emergence of effective therapies and specific drugs will significantly improve γ and reduce μ. We assume that the emergence of effective drugs influences the two parameters in one of three ways: only the recovery rate increases, only the pneumonia mortality decreases, or both the recovery rate and the pneumonia mortality change. Here, we increase the cure rate to 0.5 and reduce the pneumonia mortality by a tenth. We take three values as the possible time for effective drugs to appear: t = 80 d (March 1, 2020), t = 90 d (March 11, 2020), and t = 100 d (March 21, 2020), and get the results shown in Table 1. While the improvement of the cure rate and the reduction in the mortality will increase the number of cured people and reduce the number of deaths, a delay in the appearance of effective treatment and drugs will increase the number of deaths and decrease the number of cured individuals. These two parameters have a relatively small impact on the end of the outbreak and the total number of infections.
We take the exchange between the first-tier city Shanghai and the Republic of Korea as an example to discuss the influence of the population flow from Japan and South Korea on the COVID-19 outbreak in China. South Korea will suspend most flights to China, according to statements from major Korean airlines.
We take February 4, the date of the announcement, as the time node. Six routes remain open after the time node, which means that 20 flights can be taken. The previous 59 routes have about 33 flights. The numbers of flights from Shanghai to Korea are 16 and 30, and we assume that the Korean authorities allow only asymptomatic passengers on flights both before and after the decision. Moreover, in the context of a large population, we use foreign predictive case data to figure out the ratio of exposed people to susceptible people. On the basis of this ratio, the numbers of exposed and susceptible people arriving in China by plane are estimated. They are then added to the corresponding groups of people in Shanghai to obtain the final prediction result.
Korean Air usually uses the Boeing 777, which can carry between 305 and 550 passengers. We take the mid-value, 428 people, and obtain that the daily population flows of Korea and Shanghai around February 4 are 1 712 and 428, respectively. We denote the daily population flow by A and the proportion of the latent population in the sum of the latent and susceptible populations by k, and write the corresponding expressions for Shanghai.
Using the previous model, we change the data to the infection data of Shanghai and compare the obtained results with the predicted results that do not consider the flow between Korea and Shanghai. We find almost no difference: the results differ only in the second digit after the decimal point. Therefore, we conclude that although the news will report heavily on the entry of sick foreigners, such entry has a relatively small impact on overall epidemic control. We should not be panicked by such reports, nor should we relax because there is no report. We should pay more attention to protection against exposed individuals, the main source of infection.
In this paper, we establish the 4+1 penta-group model to predict the spread of COVID-19 and infer the most suitable return date for university students using the BAT model. Firstly, we develop a basic SEIR-based model for predicting the spread of new viruses. Secondly, with the help of methods such as AHP and TOPSIS and taking into account various factors, we establish the BAT model to obtain an optimal time for the return to work and study, which is of great practical significance. Our estimates perform much better in the long run than the estimates of other forecasting models. Our innovations include introducing the isolation intensity parameter and the isolation start time, and adding the dead population to the SEIR model. Mortality and cure rate are obtained as functions of time by fitting rather than being treated as constants, which better reflects the actual situation.
References
[1] SEIR-based novel pneumonia transmission model and inflection point prediction analysis
[2] Analysis of the role of current prevention and control measures in the epidemic of new coronavirus based on SEIR model
[3] Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
[4] Global stability analysis on one type of SEIR epidemic model with floating population
[5] Novel coronavirus 2019-nCoV: Early estimation of epidemiological parameters and epidemic predictions
[6] Early transmission dynamics of novel coronavirus pneumonia epidemic in Shaanxi Province
[7] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study
[8] Study on the stability of infectious disease dynamics model
[9] Survey of transmission models of infectious diseases
[10] Research on computation methods of AHP weight vector and its applications
[11] Fuzzy analytic hierarchy process
[12] A review of the comprehensive multiindex evaluation method
[13] The improved method for TOPSIS in comprehensive evaluation
[14] Parameter identification for a stochastic SEIRS epidemic model: Case study influenza
[15] Modelling the epidemic trend of the 2019-nCoV outbreak in Hubei Province