key: cord-292537-9ra4r6v6 authors: Liu, Fenglin; Wang, Jie; Liu, Jiawen; Li, Yue; Liu, Dagong; Tong, Junliang; Li, Zhuoqun; Yu, Dan; Fan, Yifan; Bi, Xiaohui; Zhang, Xueting; Mo, Steven title: Predicting and analyzing the COVID-19 epidemic in China: Based on SEIRD, LSTM and GWR models date: 2020-08-27 journal: PLoS One DOI: 10.1371/journal.pone.0238280 sha: doc_id: 292537 cord_uid: 9ra4r6v6

In December 2019, the novel coronavirus pneumonia (COVID-19) emerged in Wuhan, Hubei Province, China. The epidemic quickly broke out and spread throughout the country. It has since become a pandemic that affects the whole world. In this study, three models were used to fit and predict the epidemic situation in China: a modified SEIRD (Susceptible-Exposed-Infected-Recovered-Dead) dynamic model, a neural network method LSTM (Long Short-Term Memory), and a GWR (Geographically Weighted Regression) model reflecting spatial heterogeneity. Overall, all three models performed well, with high accuracy. The APE (absolute percent error) of the dynamic SEIRD predictions for China had been ≤ 1.0% since mid-February. The LSTM model showed comparable accuracy. The GWR model took into account the influence of geographical differences, with R² = 99.98% in fitting and 97.95% in prediction. A Wilcoxon test showed that none of the three models outperformed the other two at the significance level of 0.05. The parametric analysis of the infectious rate and recovery rate demonstrated that China's national policies had effectively slowed down the spread of the epidemic. Furthermore, the models in this study have broad implications for other countries seeking to predict the short-term and long-term trends of COVID-19 and to evaluate the intensity and effect of their interventions.

Novel coronavirus pneumonia (coronavirus disease 2019, COVID-19) first broke out in Wuhan, Hubei Province, China, in December 2019, and the epidemic then spread to the rest of the world.
In research on COVID-19 so far, comparisons of the gene sequence of the virus with those of mammalian coronaviruses have suggested that its source may be related to bats, snakes, mink, Malayan pangolins, turtles and other wild animals [1–4]. COVID-19 can also cause severe respiratory illness with symptoms such as fever and cough [5], and transmission is possible after symptoms of lower respiratory disease appear [6]. However, unlike SARS-CoV and MERS-CoV, the virus causing COVID-19 was isolated from airway epithelial cells of patients [6], and its mechanism of receptor recognition is not consistent with that of SARS [7]. Therefore, the pathogenicity of COVID-19 is lower than that of SARS [8], and its transmissibility is higher than that of SARS [9]. In addition, this new coronavirus presents human-to-human transmission [10], and close contact could lead to group outbreaks [11]. As of July 7th, 2020, 85,359 confirmed cases and 4,648 deaths had been reported in China [12]. In addition to China, over 200 countries and regions in the world had reported a total of 11,630,898 confirmed cases and 538,512 deaths [12].

The outbreak of COVID-19 happened right before the Lunar New Year, which is the typical Chinese Spring Festival travel period. With a population of over 11 million, Wuhan is one of the major transportation hubs in China as well as a core city of the Yangtze River Economic Belt. The time and location of the outbreak further led to the rapid spread of the epidemic in China [13]. Since there is still no vaccine or antiviral drug specifically for COVID-19, the government's policies and actions play an important role in flattening the epidemic curve [14]. From the perspective of public health, the interventions of the Wuhan government have achieved the purpose of reducing the flow of people and the risk of exposure to diagnosed patients, and have also effectively slowed down the spread of the epidemic [15].
Nevertheless, COVID-19 can be transmitted by asymptomatic carriers [16], and some of the recovered patients may still be virus carriers [17]. In order to implement non-pharmaceutical interventions more effectively, we used a combination of epidemiological methods and mathematical or statistical modeling tools to provide valuable insights and predictions as benchmarks.

For the study of infectious diseases like COVID-19, SARS, and Ebola, most of the literature used descriptive research or modeling methods to assess indicators and analyze the effect of interventions, such as combining migration data to evaluate the potential infection rate [18, 19], understanding the impact of factors like environmental temperature and vaccines that might be linked to the diseases [20, 21], using the basic and time-varying reproduction numbers (R0 and Rt) to estimate the changing transmission dynamics of an epidemic [22–27], calculating and predicting the fatality risk at each stage of an outbreak [28–30], or providing suggestions and interventions from risk management and other related aspects based on the results of modeling tools or historical lessons [31–39]. Some literature used only one kind of model to simulate and predict the course of diseases. For instance, some studies used relatively common epidemiological dynamics models like SEIR or SIRD to forecast epidemic trends and peaks in certain provinces or even the world [9, 40–44]; others applied other types of statistical models, such as logistic growth models or time series approaches, to analyze the epidemic situation [45, 46], or developed new models to support more complex trajectories of epidemics or to predict the number of confirmed cases and the spatial progression of outbreaks [47–49]. Several studies further extended the basic epidemic dynamic models.
For example, some combined a border protection mechanism with the SEIR model to better identify high-risk groups and infected cases [50]; some added the effect of media or awareness to basic models to assess whether these outside influences could change the transmission mode of infectious diseases [51, 52]; and others, following the transmission routes contained in dynamic models, used a multiplex network model or transmission network topology to analyze the outbreak scale and epidemic spread more accurately [53, 54]. A small number of studies combined the analysis capabilities of two types of models, such as the SEIR model and a recurrent neural network (RNN) model, to determine whether certain interventions could affect the results of outbreak control [55]. However, in our literature search we did not find any COVID-19 study that used geographically weighted regression (GWR). There is also little understanding of how different algorithms compare in their efficacy at predicting the epidemic curve.

In this study, SEIRD, an extension of the SEIR model, was used to simulate the epidemic situation in China and to predict the number of confirmed and cured cases in each province and several major Chinese cities. An LSTM model combined with traffic data and a GWR model were used to predict the number of confirmed patients. Specifically, the GWR model, which reflects geographical differences, was used to predict the development of the epidemic and to analyze the impact of geographical factors. This paper also compares the characteristics and prediction ability of these models. In the absence of vaccines and drugs for COVID-19, it makes sense to use multiple models to describe the situation and the intensity of non-pharmaceutical interventions needed, and thereby to simulate and guide the control of outbreaks.
Daily updated COVID-19 epidemiological data used in this study were retrieved from the National Health Commission of China [12] and accessed via https://github.com/wybert/openwuhan-ncov-illness-data. The daily number of outbound travelers from Wuhan and the relevant migration indices from January to March were collected from an online platform called Baidu Qianxi [56]. The demographic data and medical resources data were taken from the China Urban Statistical Yearbook published by the National Bureau of Statistics, as shown in S1 Table.

This study used the SEIRD model; the changes in the status of the susceptible (S), exposed (E), infected (I), recovered (R) and dead (D) populations within the total population (N) are shown in Fig 1. According to the medical characteristics and clinical trials of COVID-19, both confirmed patients and asymptomatic carriers have the ability to transmit the virus. Therefore, susceptible people have a certain chance of becoming infected after they come into contact with exposed or infected individuals [43]. Carriers in the exposed status may develop obvious symptoms after the incubation period and be diagnosed, or they may recover. The final status of individuals can basically be divided into two categories: one is recovery through the combined effects of hospital treatment and the patient's own immune response, and the other is death without effective treatment. In the model formula, the infectious rate β needs to be adjusted in real time to adapt to the trend of disease development. In the middle and late stages of the epidemic, the number of daily new cases decreased significantly due to the positive influence of government policies. Thus, to better fit the model, we added an attenuation factor desc to β. Based on the basic SEIRD model formulas [57, 58], our modified model is given by Eqs (1)–(6).
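The equations themselves did not survive in this copy of the text, so the following is only a minimal sketch of how an SEIRD system with a decaying infectious rate could be simulated, not the authors' Eqs (1)–(6). It assumes the compartment flows described in the paragraph above (S to E through contact with E or I; E to I or R; I to R or D) and uses the symbols β, α, γ1, γ2, k and desc with the meanings stated in the text. The decay form β(t) = β0·exp(−desc·t), the forward-Euler integration, the function name `simulate_seird` and all numeric values are illustrative assumptions.

```python
import math


def simulate_seird(n, e0, i0, beta0, alpha, gamma1, gamma2, k, desc,
                   days, dt=0.1):
    """Integrate an illustrative SEIRD model with forward Euler.

    Returns one (S, E, I, R, D) tuple per day, starting with day 0.
    ASSUMPTION: beta(t) = beta0 * exp(-desc * t); the exact decay form
    used in the paper is not recoverable from this text.
    """
    s, e, i, r, d = float(n - e0 - i0), float(e0), float(i0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r, d)]
    for step in range(days * steps_per_day):
        t = step * dt
        beta = beta0 * math.exp(-desc * t)    # assumed attenuation of beta
        new_exposed = beta * s * (e + i) / n  # contact with E or I infects S
        ds = -new_exposed
        de = new_exposed - (alpha + gamma1) * e  # E -> I (alpha), E -> R (gamma1)
        di = alpha * e - (gamma2 + k) * i        # I -> R (gamma2), I -> D (k)
        dr = gamma1 * e + gamma2 * i
        dd = k * i
        s, e, i, r, d = (s + ds * dt, e + de * dt, i + di * dt,
                         r + dr * dt, d + dd * dt)
        if (step + 1) % steps_per_day == 0:
            history.append((s, e, i, r, d))
    return history


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to exercise the code.
    run = simulate_seird(n=1_000_000, e0=100, i0=10, beta0=0.6, alpha=0.2,
                         gamma1=0.02, gamma2=0.1, k=0.005, desc=0.05,
                         days=120)
    s, e, i, r, d = run[-1]
    print(f"day 120: S={s:.0f} E={e:.0f} I={i:.0f} R={r:.0f} D={d:.0f}")
```

Setting `desc=0` recovers a constant β, which gives a simple way to compare trajectories with and without the attenuation that the text attributes to government interventions.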
Here, the parameter t denotes the time; β is the infectious rate; α is the rate at which the exposed become infected; γ1 is the recovery rate for the exposed; γ2 is the recovery rate for the infected; k is the mortality rate; and "desc" is the attenuation factor for β, so that β decays exponentially when 0