key: cord-0801344-tqx6z9c9
authors: Guo, Qingchun; He, Zhenfang
title: Prediction of the confirmed cases and deaths of global COVID-19 using artificial intelligence
date: 2021-01-07
journal: Environ Sci Pollut Res Int
DOI: 10.1007/s11356-020-11930-6
sha: b2de217ab4b8ec5bcbbd936fdc9be2e6589394e6
doc_id: 801344
cord_uid: tqx6z9c9

The outbreak of coronavirus disease 2019 (COVID-19) has seriously affected the environment, ecology, economy, society, and human health. As the global epidemic becomes more serious, predicting and analyzing the confirmed cases and deaths of COVID-19 has become an important task. We develop an artificial neural network (ANN) to model the confirmed cases and deaths of COVID-19. The data on confirmed cases and deaths from January 20 to November 11, 2020 are collected from the World Health Organization (WHO). The prediction model is verified and evaluated with three statistical indicators: root mean square error (RMSE), correlation coefficient (R), and mean absolute error (MAE). The size of the training and test datasets of confirmed cases and deaths employed in the model is optimized. The best simulation performance in terms of RMSE, R, and MAE is obtained using the 7 past days’ cases as input variables in the training and test datasets. The estimated R values for the training and test datasets are 0.9948 and 0.9683, respectively. A comparison of training algorithms shows that the trainbr algorithm performs better than the others in reproducing the number of confirmed cases and deaths. This study shows that the ANN model is suitable for predicting future confirmed cases and deaths of COVID-19. Using the ANN model, we also predict the confirmed cases and deaths of COVID-19 from June 5, 2020 to November 11, 2020.
During the predicting period, the R, RMSE, and MAE for new confirmed cases of COVID-19 are 0.9848, 17,554, and 12,229, respectively; the R, RMSE, and MAE for new confirmed deaths of COVID-19 are 0.8593, 631.8, and 463.7, respectively. The predicted confirmed cases and deaths of COVID-19 are very close to the actual confirmed cases and deaths. The results show that continuous and strict control measures should be taken to prevent the further spread of the epidemic.

In December 2019, an outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), appeared in Wuhan (Li et al. 2020a; Zhou et al. 2020; Zhu et al. 2020). The epidemic spread quickly, with 51,251,907 cases and 1,270,930 deaths reported globally, and more than 216 countries or regions reporting cases (as of November 11, 2020, WHO). Common symptoms are fever, cough, and myalgia or fatigue. The proportion of black patients with COVID-19 in the USA is much higher than that of white patients (Price-Haywood et al. 2020). Human-to-human transmission plays a role in the disease (COVID-19). Environmental monitoring of two hospitals in Wuhan revealed airborne hot spots of SARS-CoV-2 RNA. Summer does not stop the spread of SARS-CoV-2 (Baker et al. 2020). Although researchers believe that bats are the natural host of the novel coronavirus (SARS-CoV-2), the final origin of the virus is not yet fully established. The intermediate hosts of SARS-CoV-2 may be pangolins (Lam et al. 2020), cats (Halfmann et al. 2020), ferrets, dogs (Sit et al. 2020), and hamsters (Sia et al. 2020). To understand the epidemic potential of SARS-CoV-2, it is important to assess the infectivity of unrecorded infections. It is estimated that 86% of infections before the travel restrictions imposed in China on January 23, 2020 were not recorded.
Seventy-nine percent of recorded infections originated from unrecorded infections, which resulted in the rapid spread of SARS-CoV-2 and made it difficult to contain (Li et al. 2020b). Real-time mobility data and case data from Wuhan show that the initial population mobility explains the spatial distribution of cases well. After prevention and control measures were taken, the growth rate in most regions became negative, and the control measures greatly reduced the spread of the disease. A global metapopulation disease transmission model has been used to predict the influence of travel restrictions on the domestic and international spread of COVID-19. The closure of Wuhan slowed the outbreak of COVID-19 in China, giving the country 3-5 days to prepare. At the international level the effect was more marked: by the middle of February, imported cases had been reduced by nearly 80% (Chinazzi et al. 2020).

Most empirical models use hypothetical parameters, so they are not well suited to COVID-19 data, and predictions of future COVID-19 cases made with them may not be very accurate. There is still great uncertainty about the number of new confirmed cases in the future. When the epidemic will end is a question the public urgently wants answered. Artificial intelligence (AI) has proven feasible for capturing nonlinear relationships. It has been widely used in many fields (Cassimon et al. 2020; Guo et al. 2020; He et al. 2014), and AI can quickly diagnose COVID-19. To overcome the limitations of epidemiological models, we develop an AI model for real-time prediction of new confirmed cases of COVID-19 all over the world. Our goal is to provide better estimation methods that help medical and government agencies respond effectively and adjust in a timely manner during the epidemic. We hope our results will inform the world about the future trend of the epidemic.
Data on the daily new confirmed cases and new confirmed deaths of COVID-19 around the world from January 21, 2020 to November 11, 2020 are obtained from the World Health Organization (WHO). The worldwide data are divided into training, testing, and predicting sequences: training data run from January 21 to May 22, 2020, testing data from May 23 to June 4, 2020, and forecasting data from June 5 to November 11, 2020.

Artificial intelligence (AI) is used to predict the number of new confirmed cases and new confirmed deaths of COVID-19. An artificial neural network (ANN) with one hidden layer that employs a backpropagation algorithm is constructed (Rumelhart et al. 1986). The architecture of the ANN contains three layers (input layer, hidden layer, output layer), and each layer contains a number of nodes. C(1), …, C(n) are the daily new confirmed cases used as input variables, and C(n + 1) is the number of new confirmed cases predicted for +1 day (Fig. 1). C_i is the node input, z is the node output, W_ji is the weight, D is the node excitation threshold, and v and f are the basic and activation functions, respectively. A node computes the weighted sum of its inputs as

$$v = \sum_{i=1}^{n} W_{ji} C_i - D$$

The activation function then gives the output as

$$z = f(v)$$

Fig. 1 The architecture of ANN predicting the COVID-19 epidemic

To determine the relationship between previous days' cases and the next day's variation, the ANN model is used to estimate the global number of patients with COVID-19. The input parameters are the past days' global patients, and the output variable is the prediction of global patients with COVID-19 for +1 day. The activation functions employed in the hidden layer and output layer are the hyperbolic tangent sigmoid function (tansig) and the rectified linear unit (ReLU, or poslin), respectively. To avoid overfitting and validate the reliability of the developed model, we use 90% of the cases for training and 10% of the cases for testing.

The performance of the ANN is assessed with three metrics: root mean square error (RMSE), correlation coefficient (R), and mean absolute error (MAE). The R values are used to determine the model accuracy, and the RMSE and MAE values are used to measure the residuals between predictions and actual cases:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(A_j - G_j\right)^2}$$

$$R = \frac{\sum_{j=1}^{N}\left(A_j - \bar{A}\right)\left(G_j - \bar{G}\right)}{\sqrt{\sum_{j=1}^{N}\left(A_j - \bar{A}\right)^2 \sum_{j=1}^{N}\left(G_j - \bar{G}\right)^2}}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|A_j - G_j\right|$$

where A_j denotes the actual number of new confirmed cases of COVID-19, G_j denotes the predicted number of new confirmed cases of COVID-19, \bar{A} is the mean of the actual cases, \bar{G} is the mean of the predicted cases, and N is the number of days evaluated.

The cycles of new confirmed cases of COVID-19 are calculated using wavelet analysis (Fig. 2). The wavelet variance diagram reflects how the wave energy of a time series is distributed across scales, so it can be used to determine the main periods of new confirmed cases of COVID-19. There are five obvious peaks in the wavelet variance diagram, corresponding to time scales of 7, 32, 71, 96, and 103 days (Fig. 2). These are the cycles of new confirmed cases of COVID-19. The maximum peak corresponds to the 7-day time scale, which means that the oscillation with a period of about 7 days is the strongest and is the first main cycle; the second peak corresponds to the 32-day time scale, which is the second main cycle; the third, fourth, and fifth peaks correspond to the 71-day, 96-day, and 103-day time scales, which are the third, fourth, and fifth main cycles in turn. The fluctuations of these five periods control the variation of new confirmed cases over the whole time domain. Many studies have shown theoretically that a three-layered ANN can describe any nonlinear mapping relation with precision (He et al. 2014).
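The three evaluation metrics can be illustrated with a short Python sketch. It follows the standard definitions of RMSE, MAE, and the Pearson correlation coefficient R; the function names and the toy values are assumptions made for illustration and are not taken from the study's data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - g) for a, g in zip(actual, predicted)) / n


def correlation(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    n = len(actual)
    a_mean = sum(actual) / n
    g_mean = sum(predicted) / n
    cov = sum((a - a_mean) * (g - g_mean) for a, g in zip(actual, predicted))
    var_a = sum((a - a_mean) ** 2 for a in actual)
    var_g = sum((g - g_mean) ** 2 for g in predicted)
    return cov / math.sqrt(var_a * var_g)


# Toy values (hypothetical, not WHO data).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 4.0, 4.0]
print(rmse(actual, predicted), mae(actual, predicted), correlation(actual, predicted))
```

RMSE penalizes large daily errors more heavily than MAE, while R only measures how well the predicted series tracks the shape of the actual series, so the three values are best read together.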
A typical neural network consisting of three layers was applied to forecast the global COVID-19 epidemic. The numbers of neurons in the input and hidden layers are decided by trial and error. Figure 3 shows the optimization of network topologies. The performance of various numbers of days in the input layer and nodes in the hidden layer is compared (Tables 1 and 2). Tables 1 and 2 show the variables and the R values of the proposed ANN model, together with the simulation of the number of new confirmed cases of COVID-19 during the training and test periods. Interestingly, using only the most recent 7 days gives the best simulation of global patients with COVID-19. Using more than 7 days degrades the model and produces unreliable estimates. The final model therefore uses the 7 past days' data: seven variables were selected as model input, and the number of nodes in the hidden layer was set to 6. The resulting network topology (7-6-1) performs better than the others.

Training algorithms of the ANN model are also chosen by trial and error. Figure 4 shows the optimization of training algorithms for predicting the COVID-19 epidemic. With trainbr (Bayesian regularization backpropagation), the simulated values are close to the actual values in both the training period and the testing period, and trainbr performs best in predicting the COVID-19 epidemic. Table 3 shows the performance of the training algorithms, revealing that the trainbr algorithm performs best in simulating global patients with COVID-19. Table 3 also shows the simulation performance of the developed ANN model using trainbr. The simulated cases are very close to the actual cases. To check for overfitting, we compare performance on the training and test datasets. The R values are similar, which indicates that the model has no overfitting issue. The RMSE for the ANN model using trainbr is 3859.4 for the training dataset and 3102.9 for the test dataset. The R values for the ANN using trainbr during the training and test periods are 0.9948 and 0.9683, respectively.
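As a rough illustration of the 7-6-1 topology, the following Python sketch builds 7-day input windows from a daily series and runs one forward pass through a network with a tanh hidden layer (the mathematical counterpart of tansig) and a ReLU output node (the counterpart of poslin). The function names `make_windows` and `forward`, the convention of subtracting thresholds, the random untrained weights, and the toy series are all assumptions for illustration. The study trained its network with trainbr, which is not reproduced here.

```python
import math
import random


def make_windows(series, n_lags=7):
    """Pair each run of n_lags consecutive days with the following day's value."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return inputs, targets


def forward(x, w_hidden, d_hidden, w_out, d_out):
    """One forward pass: tanh hidden layer, ReLU output node.

    Thresholds are subtracted from the weighted sums (an assumed convention).
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) - d)
        for row, d in zip(w_hidden, d_hidden)
    ]
    v_out = sum(w * h for w, h in zip(w_out, hidden)) - d_out
    return max(0.0, v_out)


# Toy, already-scaled daily series (hypothetical values, not WHO data).
series = [0.10, 0.12, 0.15, 0.14, 0.18, 0.21, 0.20, 0.24, 0.27, 0.26]
inputs, targets = make_windows(series, n_lags=7)

# Untrained placeholder weights for a 7-6-1 network.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(6)]
d_hidden = [rng.uniform(-1, 1) for _ in range(6)]
w_out = [rng.uniform(-1, 1) for _ in range(6)]
d_out = rng.uniform(-1, 1)

prediction = forward(inputs[0], w_hidden, d_hidden, w_out, d_out)
print(len(inputs), targets, prediction)
```

Because raw daily case counts are large, inputs would normally be scaled before reaching the tanh layer; the toy series above is assumed to be scaled already. Trying other values of `n_lags` and other hidden-layer sizes corresponds to the trial-and-error search over topologies described in the text.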
The MAE values for the ANN using trainbr are 2303.7 and 2090.6, respectively. Table 4 shows that the transfer function combination tansig-poslin is better than the others during the training, testing, and predicting periods. Purelin and poslin are a linear transfer function and a positive linear transfer function, respectively. Tansig and logsig are a hyperbolic tangent sigmoid transfer function and a logarithmic sigmoid transfer function, respectively.

In the forecast period, the global confirmed cases and deaths of COVID-19 for the next day are predicted recursively, using the previous days' predicted cases and deaths as inputs. Figure 5 shows the predicted new confirmed cases of COVID-19 all over the world. The training and testing cases are from January 20 to June 4, 2020, and the prediction runs from June 5 to November 11, 2020. Over these 5 months, the average number of predicted new confirmed cases of COVID-19 is 271,761 per day worldwide, and the average number of actual cases is 271,528. Figure 6 shows the predicted total confirmed cases of COVID-19 worldwide. The total confirmed cases continue to grow during the predicting period. The actual cumulative number of confirmed cases is more than 51 million by November 11, 2020, and the forecast total is similar to the actual total. Figure 8 shows the predicted total global deaths of COVID-19. The actual number of total deaths is 1,270,930 by November 11, 2020, and the forecast number of total deaths is 1,258,700. In summary, the predicted deaths are very close to the actual deaths. Table 6 shows the predicting performance for new confirmed cases of COVID-19 in 10 countries. During the predicting period, the R, RMSE, and MAE for USA are 0.9696, 5139.1, and 4074.6, respectively; the R, RMSE, and .
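The forecasting procedure, in which each next-day prediction is fed back as an input for the following day, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `recursive_forecast`, `cumulative`, and the stand-in predictor `mean_of_window` are hypothetical names, and the stand-in simply averages the window where the study would call its trained 7-6-1 ANN.

```python
def recursive_forecast(history, predict_next, n_lags=7, horizon=5):
    """Forecast `horizon` days ahead, feeding each prediction back as input."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest day, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def cumulative(daily, start_total=0.0):
    """Turn daily counts into running totals, starting from start_total."""
    totals, running = [], start_total
    for value in daily:
        running += value
        totals.append(running)
    return totals


def mean_of_window(window):
    """Hypothetical stand-in for a trained model: the mean of the window."""
    return sum(window) / len(window)


# Toy daily counts (hypothetical values, not WHO data).
history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
daily_forecast = recursive_forecast(history, mean_of_window, n_lags=7, horizon=5)
total_forecast = cumulative(daily_forecast, start_total=sum(history))
print(daily_forecast)
print(total_forecast)
```

Since every forecast after the first depends on earlier forecasts rather than observations, errors can accumulate over a long horizon, which is one reason to compare predicted and actual totals at the end of the forecast period.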
A set of models using 9 different machine learning algorithms for predicting the rise in new cases, with an average accuracy of 87.9 ± 3.9%, was developed for 10 high-population, high-density countries. The highest accuracy, 99.93%, was achieved for Ethiopia using ARMA averaged over the next 5 days (Khakharia et al. 2020). Not every machine learning algorithm gave a very high accuracy in predicting the cases for each country. Machine learning models based on decision trees, random forests, logistic regression, and support vector machines have been developed to predict early signs of infection containment, with accuracies between 76.2 and 92.9% (Kasilingam et al. 2020). Although our model performs better than other predicting models, it is unfortunate that cumulative confirmed cases are following an increasing trend.

Many machine learning methods and artificial intelligence techniques have been employed to forecast the number of confirmed cases of COVID-19. However, accurate prediction by machine learning methods faces many challenges. It is difficult to train accurate machine learning models with small datasets. Deep learning methods succeed because of large training datasets, which are not available for the task of predicting COVID-19 confirmed cases. It is also difficult to select suitable architectures and parameters for deep neural networks with small datasets. It is argued that many countries are not doing enough testing, so accurate counts of confirmed cases cannot be obtained in these countries. Using poor-quality datasets to train machine learning algorithms will lead to wrong conclusions (Ahmad et al. 2020).

We predict the number of confirmed cases and deaths of COVID-19 for the next day using artificial intelligence. The simulation of the confirmed cases and deaths of COVID-19 is highly accurate, so the ANN can be used for such simulations.
The results show that the trainbr algorithm performs best. Seven input variables are selected to create the ANN model. The ANN with known parameters can also be used to predict the number of cases in future outbreaks of the virus. The lowest RMSE is achieved by using the previous 7 days' data in the training and test stages. The error decreases as input parameters are added, up to seven input parameters for +1 day prediction; the RMSE then goes up slightly as more input parameters are used. We provide a simple AI model that helps policy makers and researchers understand the confirmed cases and deaths of COVID-19 in the next 3 months, based on specific estimates from global historical data. The actual data on confirmed cases of COVID-19 match the AI predictions well, which strongly suggests that the model is suitable for simulating the epidemic caused by SARS-CoV-2. These results help authorities to control the COVID-19 epidemic.

Without any measures, if the relative risk of infection is 1.5, 2.0, or 3.0, the COVID-19 death toll in the next year will be 146,996, 293,991, or 587,982, respectively (Banerjee et al. 2020). In a baseline scenario, the basic reproduction number of COVID-19 was 2.68, and the model predicted that 75,815 people had been infected in Wuhan as of January 25, 2020. If the transmission characteristics of COVID-19 do not change significantly, the outbreaks in other major cities in China will lag that of Wuhan by 1-2 weeks. Reported data from January 11 to February 10, 2020 were used to calibrate a susceptible-infected-recovered-dead model and to predict the evolution of the epidemic in Hubei. The model predicted that, as of February 29, at least 45,000 people would be infected and 2700 people would die in Hubei. In fact, about 67,000 people had been infected and 2800 people had died in this period (Anastassopoulou et al. 2020).
The suspension of urban public transport, the closure of places of entertainment, and the prohibition of public gatherings are associated with reductions in COVID-19 cases. Without the Wuhan travel ban and China's emergency response, more than 70,000 people would have been infected with the virus outside Wuhan by February 19, 2020. China's prevention and control measures seem to have succeeded in breaking the transmission chain and preventing contact between infectious and susceptible people. The three major non-pharmaceutical interventions used in China not only contained the epidemic in China but also bought time for the rest of the world. Without a strong combination of non-pharmaceutical interventions, the number of COVID-19 cases in China might have exceeded 7 million (Lai et al. 2020). A spatio-temporal "risk source" model uses population mobility data to predict confirmed cases and identify high-risk areas at an early stage (Jia et al. 2020). If the UK government did nothing, it could face more than 500,000 deaths. Without intervention, the USA could face 2.2 million deaths (Adam 2020). An AI app that lets individuals report their own symptoms predicts whether they have COVID-19 with an accuracy of nearly 80% (Menni et al. 2020). If the average growth rate of 30.6% observed in the USA during the preceding 14 days persisted, there would be 3.9 million cases by April 12, 2020 (Perc et al. 2020). After the initial epidemic, COVID-19 will break out again in winter 2020. The USA may still need long-term or intermittent social distancing interventions until 2022. SARS-CoV-2 should continue to be monitored, as a new outbreak could occur as late as 2025 (Kissler et al. 2020). Three biomarkers of COVID-19 are used to predict the mortality of COVID-19 patients at least 10 days in advance, with an accuracy of over 90%. An ANN model with an EEMD-based decomposition technique has been developed for predicting the COVID-19 epidemic.
That ANN model attains R² values of 0.9997 and 0.99982 in the training and testing periods, respectively, and an R² of 0.99981 in validation (Hasan 2020). Cloud computing and machine learning (ML) have been deployed to forecast the COVID-19 epidemic, and the results show the severity of the global spread of COVID-19 (Tuli et al. 2020a).

The outbreak of COVID-19 has seriously affected the environment, ecology, economy, and society (Kluge et al. 2020). COVID-19 is a major threat to the world economy. In March 2020, the outbreak caused severe turmoil in the US stock market, triggering the circuit breaker mechanism four times in one month. There is an urgent need to know what the future transmission trend of COVID-19 might be. We examined the feasibility of using AI with past days' cases as input variables to predict global confirmed cases. At present, the risk of a world economic recession comes mainly from the spread of the epidemic, and the recovery depends on when the epidemic is contained worldwide. Specific drugs for COVID-19 have not yet been successfully developed. It is urgent to launch joint action to fight the epidemic. China's experience will undoubtedly provide effective help for the global fight against the epidemic.

In future work, we will predict the epidemic trend of COVID-19 for different countries using deep learning approaches, such as the recurrent neural network (RNN), the gated recurrent unit (GRU), and the long short-term memory (LSTM), and compare how these models perform across diverse demographics. We plan to arrive at a single standard model that can be used for any country, which may be a combination of different algorithms.

People should gather less. They should avoid crowded places, especially places with poor ventilation, and reduce unnecessary trips. When going out, they should use personal protection and wash their hands frequently.
In densely populated public places, people should keep a social distance from others, and wearing medical surgical masks is recommended. Only by adhering to the concept of a human community, following the trend, responding to the times, rationally avoiding the pitfalls of protectionism, strengthening international joint defense and joint control, coordinating macroeconomic policies, and strengthening global supply chain cooperation can the international community work together to promote the early recovery of the world economy.

Authors' contributions Q.G. analyzed the data and wrote the manuscript. Z.H. performed model building. All the authors read and approved the final manuscript.

Data availability Extra data are available on reasonable request by emailing guoqingchun@lcu.edu.cn.

Competing interests The authors declare that they have no competing interests.

Ethics approval and consent to participate All data were obtained with the agreement of the World Health Organization (WHO). The data in this article are obtained from an open database of the World Health Organization (links are provided in the resources section). These can be accessed locally for educational purposes.

Consent to publish Not applicable.

References
Special report: the simulations driving the world's response to COVID-19
The number of confirmed cases of Covid-19 by using machine learning: methods and challenges. Arch Comp Methods Eng
Anastassopoulou C, Russo L, Tsakris A, Siettos C (2020) Data-based analysis, modelling and forecasting of the COVID-19 outbreak
Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic
Estimating excess 1-year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study
Designing resource-constrained neural networks using neural architecture search targeting embedded devices
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Time series forecasting of COVID-19 transmission in Canada using LSTM networks
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
Air pollution forecasting using artificial and wavelet neural networks with meteorological conditions
Transmission of SARS-CoV-2 in domestic cats
A methodological approach for predicting COVID-19 epidemic using EEMD-ANN hybrid model
Comparative study of artificial neural networks and wavelet artificial neural networks for groundwater depth data forecasting with various curve fractal dimensions
Population flow drives spatio-temporal distribution of COVID-19 in China
Exploring the growth of COVID-19 cases using exponential modelling across 42 countries and predicting signs of early containment using machine learning
Outbreak prediction of COVID-19 for dense and populated countries using machine learning
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
Prevention and control of noncommunicable diseases in the COVID-19 response
The effect of human mobility and control measures on the COVID-19 epidemic in China
Effect of non-pharmaceutical interventions to contain COVID-19 in China
Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins
Feng Z (2020a) Early transmission dynamics in
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
Aerodynamic analysis of SARS-CoV-2 in two Wuhan hospitals
Real-time tracking of self-reported symptoms to predict potential COVID-19
Forecasting COVID-19
Hospitalization and mortality among black patients and white patients with Covid-19
Learning representations by back-propagating errors
Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2
Pathogenesis and transmission of SARS-CoV-2 in golden hamsters
Infection of dogs with SARS-CoV-2
An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China
Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing
Modelling for prediction of the spread and severity of COVID-19 and its association with socioeconomic factors and virus types. medRxiv, 2020.06
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
An interpretable mortality prediction model for COVID-19 patients
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
A novel coronavirus from patients with pneumonia in China

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations