key: cord-0834639-b3j3kg73
authors: Wang, Peipei; Zheng, Xinqi; Li, Jiayang; Zhu, Bangren
title: Prediction of Epidemic Trends in COVID-19 with Logistic Model and Machine Learning Technics
date: 2020-07-01
journal: Chaos Solitons Fractals
DOI: 10.1016/j.chaos.2020.110058
sha: f3a79301cbde72fa46f4e2ac561a70b3727a4f0c
doc_id: 834639
cord_uid: b3j3kg73

COVID-19 has had a huge impact on the world, with more than 8 million people infected in more than 100 countries. To contain its spread, a number of countries have published control measures. However, it is not known when the epidemic will end globally and in individual countries. Predicting the trend of COVID-19 is therefore an extremely important challenge. We integrate the most recent COVID-19 epidemiological data available before June 16, 2020 into the Logistic model to fit the cap of the epidemic trend, and then feed the cap value into the Fbprophet model, a machine-learning-based time series prediction model, to derive the epidemic curve and predict the trend of the epidemic. Three significant points are summarized from our modeling results for the world as a whole and for Brazil, Russia, India, Peru and Indonesia. Under mathematical estimation, the global outbreak will peak in late October, with an estimated 14.12 million people infected cumulatively.

An outbreak of atypical pneumonia [coronavirus disease 2019 (COVID-19)] caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) since late December 2019 has had a huge impact on people's lives and work. The virus may spread from bats to humans through another intermediate host and cause severe respiratory syndrome [1], characterized by strong human-to-human transmission through the air [2, 3]. The World Health Organization (WHO) declared an international emergency on 31 January, 2020.
Since its initial identification, and despite strict control measures, the disease has become a global pandemic that poses a major threat and challenge to world health and the economy [4]; it has spread to over 100 countries across the world (Figure 1). As of 3:33 PM on 16 June, 2020, a total of 8,044,683 COVID-19 cases had been reported worldwide, with 437,131 deaths and 3,883,243 recoveries, giving an overall case fatality rate of 5.43% [5]. Johns Hopkins University provides up-to-date data [6]. The infectivity of COVID-19 is greater than that of influenza, with an estimated R0 value (the basic reproduction number, representing viral infectivity) of 2.28 [7]. Therefore, it is of great significance to predict the trend of infections worldwide.

Many scholars have developed methods for forecasting the trend of COVID-19, both for severely affected countries and for the world as a whole [8, 9], drawing on mathematical models, infectious disease models, and artificial intelligence models. Models based on mathematical statistics, machine learning and deep learning have been applied to the prediction of time series of epidemic development [10, 11]. The Logistic model is often used in regression fitting of time series data because of its simple principle and efficient calculation. For example, in the case of the coronavirus, Logistic growth is characterized by a slow increase at the beginning, a fast growth phase approaching the peak of the incidence curve, and a slow growth phase approaching the end of the outbreak, i.e., the maximum number of infections. Wu et al. [12] [...]

The rest of this paper is structured as follows. In Section 2, the data sources and our proposed method are introduced in detail. Experiments and an analysis of the results are given in Section 3. Section 4 presents the discussion of our research. Finally, in Section 5, the main contribution of this paper is summarized.

[...] across the whole of China, retrieved from an archived news site (SOHU) [14], was used for Logistic training.
The Logistic model originated from the modeling of population growth in ecology [15]. As an improvement on the Malthus population model [16], in 1838 Pierre François Verhulst published the logistic equation:

$$\frac{dQ}{dt} = rQ\left(1 - \frac{Q}{K}\right)$$

where Q, r and K indicate the population size, the intrinsic growth rate and the maximum population size that the environment can carry, respectively. dQ/dt represents the growth rate of the population. r and K are constants [...]

[...]

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$

where g(t) is a trend function used to analyze the non-periodic changes of the time series, s(t) is a periodic term reflecting periodic changes such as weekly or yearly seasonality, h(t) is the influence of an occasional day or days, such as a holiday, and \epsilon_t is the error term.

We create an instance of the Prophet class and then call its fit and predict methods. The input to Prophet is always a time series with two features: the date ds and the value y. In our study, ds is the calendar date and y is the number of accumulated cases in a particular country.

In this paper, we integrate the most recent COVID-19 epidemiological data available before June 16, 2020 into the Logistic model to fit the cap of the epidemic [...]

To model the Logistic growth of COVID-19, Q, r and K in Equation [...]

Logistic growth is characterized as follows [23]:

$$Q(t) = \frac{K}{1 + a e^{-bt}}$$

where Q(t) is the number of cases at time t, a is a constant, b can be considered as the incubation rate, and K is the cap value, i.e., the maximal number of cases for Q(t). Therefore, the number of cases at the very beginning is K/(1 + a), and the key point is t = ln(a)/b, at which the cumulative curve turns and rapid increases in the number of cases are replaced by slow increases. We initialize a, b and K randomly and update them using Nonlinear Least Squares [24]. The [...]

If the current day in the time series is earlier than t_fast, the key point is still ahead and the growth is still accelerating, possibly exponentially. Otherwise, the spread of the virus has been brought under control and the growth is heading towards its end.
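The properties of the Logistic growth curve described above can be checked with a minimal Python sketch. It assumes the standard form Q(t) = K / (1 + a e^(-bt)), which agrees with the stated initial value K/(1 + a) and key point ln(a)/b, and it assumes that t_fast denotes that key point. The function names and the parameter values are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
import math


def logistic_cases(t, K, a, b):
    """Cumulative cases Q(t) = K / (1 + a * exp(-b * t))."""
    return K / (1.0 + a * math.exp(-b * t))


def fastest_growth_time(a, b):
    """Key point ln(a) / b, where the cumulative curve inflects."""
    return math.log(a) / b


def growth_phase(current_day, a, b):
    """Classify the current day relative to the key point (assumed t_fast)."""
    if current_day < fastest_growth_time(a, b):
        return "accelerating"  # key point still ahead
    return "decelerating"      # growth is heading towards the cap


# Hypothetical parameters, for illustration only.
K, a, b = 1_000_000.0, 200.0, 0.1
t_fast = fastest_growth_time(a, b)

print("cases at t = 0:", logistic_cases(0, K, a, b))
print("t_fast (days):", t_fast)
print("cases at t_fast:", logistic_cases(t_fast, K, a, b))
print("phase on day 10:", growth_phase(10, a, b))
```

At the key point the term a e^(-bt) equals 1, so the curve has reached exactly half of the cap K. In practice a, b and K would be estimated from the reported series by nonlinear least squares rather than set by hand.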
We set the time t_max at which cases reach their maximum to three times t_fast for the first kind of growth, and to 20 days after the current day for the second kind of growth. Then the estimated top number of infections Q_top can be calculated as follows:

$$Q_{top} = \frac{K}{1 + a e^{-b t_{max}}}$$

The aforementioned top number of infections Q_top is then fed into the Prophet model together with the actual time series data. We forecast around five to six months ahead using Prophet, with 95% prediction intervals and the logistic growth type. Seasonality-related parameters are not tuned and no additional regressors are used. In [...]

Three time series are constructed from our collected data, namely the confirmed cases, recovered cases and death cases series. In our experiment, we assume that each of these three sequences has a peak; in other words, the epidemic will eventually end. The number of active confirmed cases is equal to the number of accumulated confirmed cases minus the numbers of recovered cases and deaths. We first apply the Logistic model to fit the curve and calculate the time with the fastest growth rate, and then use Prophet to make a prediction.

It is worth noting that there are three significant points in our forecasting results, as described in Table 1. The first is the maximum number of existing infections (the time when the blue line reaches its peak in Figure 5), i.e., epidemic [...]

Table 1
Epidemic peak point: This peak means that the number of active infections has reached its top value; from then on, the number of active cases will decrease.
The fastest growth point: After this point, the epidemic gradually slows down and finally stabilizes.
Turn point: This point occurs when the number of cumulative cured cases exceeds the number of active confirmed cases, marking an early victory in the control of the epidemic.

We chose Brazil, Russia, India, Peru and Indonesia as the forecast countries, in addition to the global forecast.
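The relation between the three series and two of the significant points (the epidemic peak point and the turn point) can be illustrated with a short Python sketch. The daily figures and the function names are invented for illustration and are not data from this study; the sketch only shows the arithmetic, assuming the three series are aligned day by day.

```python
def active_cases(confirmed, recovered, deaths):
    """Active cases = cumulative confirmed - cumulative recovered - cumulative deaths."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]


def epidemic_peak_index(active):
    """Index of the day on which active cases are highest (first one if tied)."""
    return max(range(len(active)), key=active.__getitem__)


def turn_point_index(recovered, active):
    """First day on which cumulative recovered exceeds active cases, or None."""
    for day, (r, a) in enumerate(zip(recovered, active)):
        if r > a:
            return day
    return None


# Invented cumulative series, one value per day, for illustration only.
confirmed = [10, 40, 90, 130, 150, 160]
recovered = [0, 5, 20, 60, 100, 130]
deaths = [0, 1, 3, 5, 6, 7]

active = active_cases(confirmed, recovered, deaths)
print("active cases:", active)
print("epidemic peak day:", epidemic_peak_index(active))
print("turn point day:", turn_point_index(recovered, active))
```

With forecast series in place of the invented lists, the same two functions would locate the predicted peak and turn point dates.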
We can summarize our basic predictions as follows. As shown in Table 2, the fastest growth point of those five countries has already passed. With current intervention, the total epidemic size in Brazil is [...]

[...] values were compared separately (Figure 6). The results showed an overall good fit between our projected and reported data. According [...]

[...] current management capacity, especially in ICU care [26]. Once the outbreak exceeds the combined national health resources, it will take a long time to recover. In our predicted results, by late July or mid-August 2020 the healing rate will increase, and Brazil will have about 1.7 million confirmed cases at the end.

Our study highlighted another key point, the strict control measures adopted [...]

[...] 1.35%, 4.90% and 2.98%, respectively, and there are still new cases every day (Figure 7). Because of the rapid outbreak in South American and European countries, the medical system nearly collapsed within a short period of time [27]. After strict control, the growth rate of the epidemic gradually slowed down, and the day on which the number of cumulative cured cases exceeded the number of existing confirmed cases came in mid-June, which means that South American and European countries still have a long way to go in strictly controlling the outbreak.

While more data is needed to make more detailed predictions, these models could help predict future confirmed cases, provided the spread of the virus does not change in an unexpected way. This virus is new and spreads easily. This characteristic may affect all our predictions, but to the best of our knowledge at the time of writing, the proposed model is effective.

In this article, a forecasting method with the Logistic and Prophet models is [...]

However, as shown in Figure 6, all of our predictions are based on the assumption that the outbreak will reach a maximum, and the epidemic curve is modeled as a full Logistic curve.
In the real world, there may be some small peaks during the pandemic due to different government interventions and different levels of public cooperation. Besides, when forecasting the epidemic in some countries, the effects of imported cases and the spatial influence between countries are not taken into account. To address the aforementioned limitations, the following aspects are worthy [...]

Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2
Clinical characteristics of coronavirus disease 2019 in China
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Will COVID-19 generate global preparedness?
Coronavirus disease (COVID-19) situation report 147
An interactive web-based dashboard to track COVID-19 in real time
Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Modelling and predicting the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy
Analysis and forecast of COVID-19 spreading in China, Italy and France
Statistical analysis of forecasting COVID-19 for upcoming month in Pakistan
Generalized logistic growth modeling of the COVID-19 outbreak in 29 provinces in China and in the rest of the world
Forecasting at scale
Combatting SARS (in Chinese)
Simulation of rice biomass accumulation by an extended logistic model including influence of meteorological factors
An essay on the principle of population
SARS epidemiology modeling
LSTM network: a deep learning approach for short-term traffic forecast
An improved neural network-based approach for short-term wind speed and power forecast
Multiple-instance learning approach via Bayesian extreme learning machine
Automatic forecasting procedure
Logistic model-based forecast of sales and generation of obsolete computers in the U.S.
An adaptive nonlinear least-squares algorithm
COVID-19 projections
Arboviral diseases and COVID-19 in Brazil: Concerns regarding climatic, sanitation and endemic scenario
The WHO Regional Office for Europe, the European Observatory on Health Systems and Policies, COVID-19 Health System Response Monitor (HSRM)

There is no conflict of interest in this work.