key: cord-0799270-65scocda authors: Djeddou, M.; Hameed, I. A.; Hellal, A.; Nejatian, A. title: Predictive modeling of COVID-19 New Confirmed Cases in Algeria using Artificial Neural Network date: 2021-04-04 journal: nan DOI: 10.1101/2021.03.29.21254532 sha: abea35b33ba6a78fdbbd74e2238f0c13868aa55e doc_id: 799270 cord_uid: 65scocda This study investigates the potential of a simple artificial neural network for the prediction of COVID-19 New Confirmed Cases in Algeria (CNCC). Four different ANN models were built (GRNN, RBFNN, ELM, and MLP). The performance of the predictive models is evaluated based on four numerical parameters, namely root mean squared error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and Pearson correlation coefficient (R). Taylor diagram was also used to examine the similarities and differences between the observed and predicted values obtained from the proposed models. The results showed the potential of the multi-layer perceptron neural network (MLPNN) which exhibited a high level of accuracy in comparison to the other models. Bacteria, parasites and viruses, can cause serious diseases for human's health but the most serious diseases are usually caused by pathogenic viruses and among these viruses, coronavirus are the large family of pathogenic viruses. Infection with these types of viruses can cause respiratory, liver, gastrointestinal and neurological diseases. They are distributed among humans, birds, cattle, mice, bats and other wildlife [1] [2] . In December 2019, the World Health Organization (WHO) received notifications from Chinese authorities for many respiratory illnesses linked to people who visited a local market for seafood and wildlife in Wuhan city, Hubei Province, China [3] . Virological investigation suggests that the causative agent of this pneumonia is a new coronavirus (COVID-19) [4] . The city of Wuhan has become the main spread focus of a new coronavirus called COVID-19 (previously it was called 2019-nCoV), this new coronavirus has made hundreds of thousands of victims around the world, and become the global pandemic threatening human health. Different researchers employed statistical and mathematical modeling approaches to predict the spreading of the outbreak in different countries. Dehesh et al. [6] proposed an Auto Regressive Integrated Moving Average (ARIMA) model to predict the trend of confirmed cases in different countries using data collected from Johns Hopkins University official website. Furthermore, Zhao et al. [7] proposed a mathematical model to estimate the real number of COVID-19 cases in the first half of January 2020. They concluded that the number of unreported cases resulted in 469 cases from 1 to 15 January 2020 determined by the maximum likelihood estimation. Nishiura et al. [8, 10] proposed an estimation model for the infection rate of COVID-19 in Wuhan, China using the data of 565 Japanese citizens who had been evacuated from Wuhan from 29 to 31 January 2020. Machine learning and artificial intelligence have become very important tools in real applications especially with the era of big data. Al-qaness et al. [9] proposed an improved adaptive neurofuzzy inference system (ANFIS) to forecast the COVID-19 confirmed cases of the upcoming ten days. The outcomes showed good performances. Genetic evolutionary programming has been applied to forecast the spread of COVID-19 outbreak in India. The study show the capabilities of the proposed model to forecasting both total cases and death cases with highly accuracy [10] . A NARANN model was applied to forecast the total COVID-19 cases in Egypt. The model has a better accuracy than that of conventional ARIMA model [11] . Artificial neural network-based curve fitting techniques was used for prediction and forecasting of the Covid-19 number of rising cases and death cases in India, USA, France, and UK. The results have shown that ANN can efficiently forecast the future cases of COVID 19 outbreak of any country [12] . A multi-layer perceptron neural network (MLPNN) model was proposed to estimate and forecast the number of confirmed and recovered cases of COVID-19 in the upcoming days until September 17, 2020 in KSA [13] . An extreme learning machine (ELM) was used to predict the COVID-19 new cases in Algeria. The results showed that the proposed ELM model achieved an adequate level of prediction accuracy with smallest errors [14] . In this study, four artificial neural network models namely (radial basis function neural networks "RBFNN", generalized regression neural networks (GRNN), Extreme learning machine (ELM), and multi-layer perceptron neural network (MLPNN) are proposed for predictive modeling of COVIS-19 new confirmed cases (CNCC) in Algeria. The pandemic Covid-19 in Algeria spreads from February 25, 2020, when an Italian citizen was tested positive for SARS-CoV-2, and then from March 1, an outbreak of contagion is formed in the department of Blida [15] . Sixteen members of the same family were infected with the coronavirus at a wedding party following contact with Algerian nationals in France [16] . The Blida department becomes the epicenter of the coronavirus epidemic in Algeria [17] . Confirmed Covid-19 cases was then detected, there were 2769 deaths and 100195 confirmed cases in Algeria as of January 3rd, 2021. Artificial Neural Networks (ANNs) present a very flexible, robust and general method of modeling data. An ANN can be trained on experimental data to model a nonlinear mapping realized by some system. One of the advantages of an ANN model, over traditional nonlinear regression approaches, is that it is not necessary to first find a suitable parametric form of the regression model for the problem at hand. A one hidden layer multilayer perceptron can act as a universal function approximator [19] . A schematic diagram of the MLPNN architecture is shown in figure 2. ANNs may be defined as structures comprised of densely inter-connected adaptive simple processing elements (called artificial neurons or nodes) that are capable of performing massively parallel computations for data processing and knowledge representation [20] [21] . Radial basis function neural networks (RBFNN) are widely used for predictive modeling in engineering fields [22] [23] . RBFNN as show in figure 3 , is represented as a three-layered architecture. First layer is the input layer that receives the inputs. The middle layer is known as the hidden layer, which contains a non-linear RBF activation function. The third one is the output layer that makes the prediction. The formulation of a RBFNN output is as follows: Where: x is input vector, yi is the network's i th output, k is the number of neurons in the hidden layer, cj denotes the center of the j th hidden neuron, wij represents the weight of the link from the j th neuron in the hidden layer to the i th neuron in the output layer, and ‖. ‖ is the Euclidian norm;  is the radial basis function which is used in the neurons of hidden layer. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 4, 2021. This parameter is a multidimensional radial basis function describing the difference between an input vector and a predefined center vector. The most common radial basis functions used is the Gaussian Function which is defined as follows: Where σj is the width of the jth hidden neuron, Finding the centers, widths, and the weights connecting hidden neurons to the output is the key for constructing and training the RBF-NN. Both the dimensionality and the distribution of the input patterns affect the number of the hidden neurons. If the dimensionality is reduced, the number of hidden neurons will also be decreased [24] . Among the advantages of the RBFNN, their simple and compact architecture, which means there are fewer parameters that need to be optimized. The RBFN has good tolerance to the noise in the dataset and a high learning efficiency [25] [26] . The Generalized Regression Neural Networks (GRNN) was firstly proposed by Specht 1991 [27] . A GRNN is a variation of the radial basis neural networks, which is based on kernel regression networks [28] [29] . One of the advantages of this network is its consistency; the estimation error approaches zero, with only slight restrictions on the function as the size of the training set becomes large. More details for the GRNN model are given in [27] and its schematic diagram is shown in figure 4 . GRNN does not require an iterative training procedure as back propagation networks. It approximates any arbitrary function between input and output vectors, drawing the function estimate directly from the training data. In addition, it is consistent that as the training set size becomes large, the estimation error approaches zero, with only mild restrictions on the function [30] . A hidden neuron layer is created to hold the input vector. The weight between the newly created hidden neuron and the output neuron is assigned the target value. A GRNN is based on the following formula: Where: y is the output of the estimator; x is the estimator input vector; [ | ] is the expected value of output, given the input vector x, and f(x,y) is the joint probability density function (pdf) of x and y. The function value is estimated optimally as follows: Where: wij is the target output corresponding to input training vector xi and output j; The major difference between GRNN and RBF neural networks is the method that the weights wij, are determined. Instead of training weights, the GRNN assigns the target value directly to the weights wij, from the training set associated with input training vector and a component of its corresponding output vector. The basic theory of ELM can be given as follows: For M arbitrary distinct inputs (xi,yi) with xi ∈ ℝd and yi ∈ ℝ; a standard SLFN with N hidden nodes and activation function f can be modelled as the following: where wi are the input weights to the ith neuron in the hidden layer, bi the biases and bi are the output weights. In the case where the SLFN would perfectly approximate the data, the relation is ∑ + , ∈ {1,2,3, … . , } (11) which can compactly be written as, = (12) Where H the hidden layer output matrix is defined as, Where: = ( … . . ) and = ( … . . ) Considering the randomly initialized first layer of the ELM and the training inputs, the hidden layer output matrix H can . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 4, 2021. ; https://doi.org/10.1101/2021.03.29.21254532 doi: medRxiv preprint be computed. Given H and the target outputs, output weight b can be solved by finding the least square solution to the linear system defined by Eq. (12) . This solution is given by = ℳ℘ , where ℳ℘ is the Moore-Penrose generalized inverse of the matrix H. More details on the ELM algorithm can be found in the original paper [31] [32] . The root mean square error (RMSE), mean absolute error (MAE), the Pearson correlation coefficient (R) and the Nash-Sutcliffe efficiency (NSE), which were applied in this study. Four different models were developed for modeling the COVID-19 New Confirmed Cases (CNCC) in Algeria, as shown in Table 2 . The models was developed using simple ANN (GRNN, RBFNN, ELM and MLPNN) . The identification of a maximum number of hidden neurons, iterations, transfer function and better structure is a very important task to design the different ANN models in order to develop the best model. The development of all the models was performed under MATLAB 2019b environment. The obtained results of the four models using GRNN, RBFNN, ELM and MLPNN models for the CNCC prediction are presented in Table 2 . From the overall comparison, it can be observed that all the model combinations demonstrate acceptable performance in the modeling horizon; this can be proved by considering the NSE values that are greater than 0.80 in both training and testing, except for ELM and GRNN models in testing phase. Table 2 shows that the best model produces the highest prediction accuracy was MLPNN model in terms of RMSE, MAE, R and NSE for CNCC prediction. Among the models, MLPNN with, RMSE (21.3122), MAE (16.2280), R (0.9958) and NSE (0.9844) values in the training phase, had the highest level of accuracy in comparison to GRNN, RBFNN and ELM. The fitting between the observed and predicted CNCC is shown in figure 5 . It can be clearly seen that the overall performance of MLPNN combinations is superior to GRNN, RBFNN, and ELM models combinations. Also, it is predicted values trends demonstrating better agreements with the observed values than other three models. In terms of overall percentage accuracy in training phase, MLPNN outperformed and increased the predictive performance up to 18.46%, 6.76% and 13.21% with regard to the GRNN, RBFNN and ELM models, respectively. In testing phase, the MLP model outperforms GRNN, RBFNN and ELM by a relative small increase of the errors RMSE (28.9663), MAE (18.9621) and increase of performance parameters NSE (0.9878), respectively. The fitting between the observed and predicted CNCC values is shown in figure 6 . The behavior of all models was influenced by the effect that highest values of CNCC are in testing phase. In order to capture the detail of the performance of the proposed predictive models using different ANN configuration (RBFNN, GRNN, ELM, and MLPNN), a twodimensional method that exhibits how the proposed models matches the observed and corresponding predicted values of CNCC. Taylor diagram is constructed to visualize the information in figure 8 . The Taylor diagram is also the most widely recommended diagram for accuracy comparison due to the advantageous nature of combining and quantifying multiple statistical performance metrics by comparing the similarity between the observed and predicted values in one diagram. However, in accordance with the visualized graphical interpretation, the MLPNN model was closer to the target observed values in comparison with the other models. Lastly, a comparative study is presented in figure 9 , in terms of CNCC predicted values using the two best models in terms of performances (RBFNN and MLPNN) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted April 4, 2021. ; https://doi.org/10.1101/2021.03.29.21254532 doi: medRxiv preprint To summarize the discussion section, the proposed multilayer perceptron neural networks was found to have excellent prediction skills for COVIS-19 New Confirmed Cases (CNCC) with regard to the others single artificial neural networks models. Every day new confirmed cases of COVID-19 are observed in Algeria. Hence, effective techniques are required to study and analyze the data trends of COVID-19 cases for public health facilities management. This paper presented the application of different artificial neural networks models for the prediction of COVID-19 New Confirmed Cases (CNCC). Four different simple neural networks models (GRNN, RBFN, ELM, and MLPNN) were built for CNCC prediction. The predictive results revealed the potential of MLPNN with a high level of accuracy over the comparable single models for all the considered variables with an exception in terms of performances for ELM models. The results also depicted that both MLPNN and RBFNN models demonstrated prediction skill and can, therefore, serve as reliable models in public health facilities management. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 4, 2021. ; https://doi.org/10.1101/2021.03.29.21254532 doi: medRxiv preprint Emerging coronaviruses: Genome structure, replication, and pathogenesis Review of bats and SARS Novel Coronavirus (2019-nCoV) Origin and evolution of pathogenic coronaviruses Novel Coronavirus: Where We are and What We Know Forecasting of COVID-19 Confirmed Cases in Different Countries with ARIMA Models Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak The Rate of Underascertainment of Novel Coronavirus (2019-nCoV) Infection: Estimation Using Japanese Passengers Data on Evacuation Flights Optimization Method for Forecasting Confirmed Cases of COVID-19 in China Time series analysis and forecast of the COVID-19 pandemic in India using genetic programming Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks Forecasting of Covid-19 cases based on prediction using artificial neural network curve fitting technique Artificial Neural Networks for Prediction of COVID-19 in Saudi Arabia Predictive Modelling of COVID-19 New Cases in Algeria using An Extreme Learning Machines (ELM), medRxiv preprint Rues de plus en plus vides, hôpitaux dépassés… 16 membr newspaperes d'une même famille contaminés en Algérie Blida se confine : «Le pire est à craindre… Neural Networks: A Comprehensive Foundation Guide to Neural Computing Applications New Approach for Modeling Generalizes Microbial Groth Curves Using Artificial Neural Networks A predictive model based on rbf neural network Time Series Prediction Using Radial Basis Function Neural Network Fast learning in networks of locally-tuned processing units Selection of proper neural network sizes and architectures-a comparative study Advantages of radial basis function networks for dynamic system design A general regression neural network Public transportation trip flow modeling with generalized regression neural networks Prediction of plasma etching using a randomized generalized regression neural network Forecast of daily mean, maximum and minimum temperature time series by three artificial neural network methods Extreme learning machine: Theory and applications Trends in extreme learning machines: A review