key: cord-1006617-sv5ftud5
authors: Amar, Lamiaa A.; Taha, Ashraf A.; Mohamed, Marwa Y.
title: Prediction of the final size for COVID-19 epidemic using machine learning: A case study of Egypt
date: 2020-08-25
journal: Infect Dis Model
DOI: 10.1016/j.idm.2020.08.008
sha: 5681d7bc101234a30d5367c0872837e99e505a06
doc_id: 1006617
cord_uid: sv5ftud5

COVID-19 is spreading across the globe as a major epidemic and has infected many individuals in Egypt. The World Health Organization states that COVID-19 can spread from one person to another very quickly through contact and respiratory spray. Egypt and all other countries therefore need effective steps to investigate this disease and to limit the effects of the epidemic. In this paper, the real COVID-19 dataset for Egypt from February 15, 2020, to June 15, 2020, is analysed in order to predict the number of patients that will be infected with COVID-19 and to estimate the final size of the epidemic. We applied seven regression-based models to the COVID-19 dataset: exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial, and logit growth. The exponential, fourth-degree, fifth-degree, and sixth-degree polynomial regression models performed best, especially the fourth-degree model, which can help the government prepare its procedures one month ahead. In addition, we applied the well-known logit growth regression model and obtained the following epidemiological insights. First, the epidemic peak could possibly be reached on 22 June 2020 and the final time of the epidemic on 8 September 2020. Second, the final total size is 1.6676E+05 cases. Government intervention over a relatively long interval is necessary to minimize the final epidemic size.
Outbreaks of the novel coronavirus disease (COVID-19) worldwide have led researchers and scientists in different fields to look for ways to address the challenges of this virus and to work on overcoming the epidemic. By the end of June 2020, more than 10 million infected cases had been reported in 188 regions and territories since the first declaration in December 2019 in Wuhan City, Hubei region, China [1]. The number of identified cases has been increasing rapidly all over the world, so research projects have faced the new challenge of forecasting the peak of the epidemic to help governments make decisions that limit the spread of the disease. In Egypt, the number of reported cases has increased daily since the first case was declared on 14 February 2020. By June 2020, more than 60 thousand infections and about three thousand deaths had been reported, and the Health and Population Ministry has published daily reports since the start of the outbreak. According to the ministry reports, the most important statistics on the current situation in Egypt compared to the world are as follows. First, Egypt ranked 77th in deaths as a share of the total number of individuals infected with the virus. The ratio is 4.346%, after Lithuania (4.29%) and Guatemala (4.27%); Egypt is preceded by Curacao, America (4.348%). Second, with respect to the recovery rate, Egypt ranked 184th with a ratio of 27%, followed by Eritrea (26.1%). Third, with respect to total cases per million people, Egypt ranked 103rd (682 cases/million) among all countries and regions worldwide. Finally, Egypt ranked 23rd with respect to the number of individuals infected with the virus among 215 regions and countries around the globe [2].
When the coronavirus emerged in Wuhan city, Egypt, as one of the most attractive tourism countries worldwide, began its preventive procedures against this fatal virus. Isolation departments in fever hospitals were immediately assigned to deal with such cases. The Health and Population Ministry played an important role in raising awareness and in monitoring the global epidemiological situation around the clock; the new virus was described and explained to people at all private clinics and hospitals. The ministry asked citizens countrywide to report cases immediately and then referred them to the nearest chest or fever hospital. The ports of Alexandria, Red Sea, Damietta, and Port Said declared a state of emergency to face the coronavirus, in conjunction with the quarantine departments in each port. To keep people and visitors safe, quarantine officials were present to inspect and examine all arrivals on boats and ships received at the ports, particularly those from countries where the disease had appeared and spread. An operating control room was established in coordination with the quarantine departments to continue checking arrivals. A person suspected of having been in contact with a coronavirus patient is isolated immediately. Among the recent successful experiences, the Egyptian Health Minister announced the success of an experiment in which critical cases were injected with plasma from recovered patients, which also increased the rate of recovery and discharge from hospital. The challenge now is how to estimate the peak of the pandemic while keeping in mind all the efforts that have been made in all directions. The major challenges associated with COVID-19 are to deliver work that helps overcome the epidemic and to take the precautions needed to educate people and support the government efforts made to stabilize the country.
The challenge now for researchers all over the world is how to estimate the peak of the epidemic while keeping in mind all the efforts made in all directions. In this section, we review some related works in this direction. The authors in [3] use an Epidemic Calculator based on the SEIR compartmental model, together with the regular reports released by the Egyptian Ministry of Health and Population (14 February 2020 to 11 May 2020). For the highest estimated mortality rate (7.7 percent), the number of hospitalized people was predicted to peak in mid-June, with a total of 20,126 hospitalized individuals and 12,303 expected deaths. The authors recommend reinforcing the Egyptian preventive and control measures to improve the case fatality rate (CFR) and to keep the number of cases as low as possible as the peak is reached. They stress that appropriate quarantine measures should be retained until the end of June 2020. Machine learning and statistical modeling approaches were used to predict and estimate the ending stage of COVID-19 in Kuwait, in particular with time-varying infection rates and individual contact numbers [4]. The results indicate that the estimated reproduction number in Kuwait is 2.2, with data up to 19 April 2020 and before the repatriation plan. They also indicate that a high contact rate among the population means that the epidemic peak has not yet been reached and that the country needs stricter intervention measures. Moreover, [5] predicts the peak date and simulates the variations that could result from the social behaviour of Egyptians during Ramadan (the holy month). The peak will depend mainly on people's behaviour with respect to social distancing and hygiene measures. The lockdown strategies in Egypt have had a positive effect in delaying the epidemic peak, giving the health sector more time to contain the situation.
The Egyptian government should monitor the reported cases daily, along with the behaviour of citizens in the coming month, to identify the proper strategies for flattening the curve as much as possible. Numerical approaches and a logistic model are used in [6] for COVID-19 analysis. The researchers use three well-known numerical methods (Euler's method and the Runge-Kutta methods of order two (RK2) and order four (RK4)) to solve the model equations and offer notes relevant to global health care. The numerical results may be used to estimate the future numbers of susceptible, recovered, and quarantined individuals, in support of efforts to develop intervention services and further prevention. In [7], a challenge to the sustainable development process was studied through the classification of confirmed cases of COVID-19. One of the Artificial Intelligence (AI) techniques, the group method of data handling (GMDH) type of neural network, was used for binary classification modelling. The proposed model was developed as a case study in China's Hubei province. The maximum, minimum, and average daily temperature, the density of a city, the relative humidity, and the wind speed were taken as the input dataset, and the number of confirmed cases over 30 days was taken as the output dataset. The proposed binary classification model provides high accuracy in predicting the reported cases. Furthermore, regression analysis was performed on the pattern of reported cases relative to the variations of the daily weather parameters (wind, humidity, and average temperature). The results showed that relative humidity and the maximum daily temperature had the greatest impact on the actual cases. The relative humidity, 77.9% on average, affected the confirmed cases positively, and the average daily temperature, 15.4 °C, affected them negatively.
The study in [8] compares the COVID-19 data from India against several countries, as well as key states in the US with a major outbreak, and finds that the basic reproduction number R0 for India is in the expected range of 1.4-3.9. The rate of growth of infections in India is very close to that in Washington and California. Exponential and classical susceptible-infected-recovered (SIR) models based on current data are used to produce short-term and long-term predictions. From the SIR model, it is estimated that India will enter stability by the end of May 2020 with a final epidemic size of nearly 13,000. However, if India enters the community transmission stage, the approximation will be invalid. The effect of social distancing is also measured by analyzing data from various geographical locations, again under the assumption of no community transmission. New AI and big data applications have been provided to researchers and communities to improve the COVID-19 epidemic situation, together with further studies on stopping the COVID-19 outbreak and controlling the virus [9]. That paper presented a survey of the state-of-the-art solutions in the fight against the COVID-19 pandemic. In previous studies, researchers relied on various proposed methods and analysed the results based on a subset of parameters and models. The main contributions of our study are as follows:
1-Using machine learning and the best regression analysis model to predict the rate of spread of COVID-19 for a month in Egypt.
2-Presenting mathematical models to predict the spread of COVID-19 in Egypt, estimate the epidemic size, and predict the ending phase of the epidemic.

A. Study Area
Egypt is an African country located in the Eastern Mediterranean region according to the classification of the World Health Organization (WHO) and is categorized as a lower-middle-income country by the World Bank.
The total population of Egypt is almost 100 million individuals, and almost 8% of them are over 60 years old. About 1.7% of the entire population lives under the national poverty line. The health systems in Egypt, as in other African countries, have limited resources to confront the pandemic. Egypt has a physician density of 0.79 physicians/1000 individuals and a bed capacity of 1.6 beds/1000 individuals. The demographic structure of Egypt differs from that of European and Asian countries: the median age of Egyptians is 24.6 years (the median age for Chinese is 38.4 years), and only 4.23% of Egyptians are about 65 years old or more. The experience of the infected countries in Europe and Asia showed that elderly individuals over 60 years and individuals with debilitating diseases are the most vulnerable to severe forms of COVID-19. In this respect, the young Egyptian population may act as a defensive line that limits the spread of the pandemic.

Data sources
Prevalence data of COVID-19 are reported daily by the Egyptian Ministry of Health and Population [10] and by www.ourworldindata.org/coronavirus-source-data. Figure (1) presents the distribution of confirmed cases and deaths from COVID-19 in Egypt for the period from 15 February to 15 June 2020. It is easy to observe that the spread follows exponential growth, which needs to be controlled. Its future epidemiological progression is still ambiguous, as it spreads randomly.

Regression Models
Regression model analysis is a subset of Machine Learning (ML) algorithms [11]. A variety of regression models is available, including linear and non-linear forms such as multiple linear regression. Some of these models follow parametric and others non-parametric approaches to statistical inference.
Regression analysis is a modeling technique used in epidemiologic research to estimate relationships among sets of variables. ''The regression analysis techniques are a set of ML methods that allow us to forecast continuous results variable (Y) based on one or multiple predictor variables (X). It assumes a linear relationship among the results and the predictor variables''. A number of regression analysis techniques have been applied to forecast the accumulated confirmed COVID-19 cases within 15 days, the final size of the epidemic, and the final time of the epidemic in Egypt. In this study, we consider the following models.

The exponential model is used for epidemic cases in which growth starts slowly and then accelerates rapidly without bound, or in which decay begins rapidly and then slows down as the value approaches zero. The model is described by equation (1).

A polynomial term turns a linear regression model into a curve, but the model still qualifies as a linear model. The quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial models were used. The n-th order polynomial model in one variable is given by the equation:

$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$ (2)

where n = 2, ..., 6 represents the degree of the model. The coefficients $a_1, a_2, \dots, a_n$ are called the parameters of the regression analysis.

The logit model (or logistic model) is a technique borrowed by machine learning from the field of statistics. It is a regression model that is widely used in mathematical epidemiology to estimate the growth rate of an epidemic [12]. The model assumes exponential growth at the beginning of the epidemic, followed by a steady increase, and finally a declining growth rate. The logit model is presented by equations (3) and (4), where $C_0$ is the number of cases at t = 0 and K > $C_0$ is assumed.
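The model equations referred to above are not reproduced in this text. As a hedged reconstruction, the following block writes out standard forms that are consistent with the symbols used in this section ($K$, $C_r$, $A$, $C_0$) and with the logistic model of [12]. The exponential form in (1), the symbols $a$ and $b$, and the assignment of the numbers (3) and (4) to the rate equation and its solution are assumptions, not the authors' confirmed notation.

```latex
% (1) Exponential model (assumed two-parameter form)
y = a\, e^{b x}

% (3) Logistic growth rate and (4) its closed-form solution
\frac{dC}{dt} = C_r\, C \left(1 - \frac{C}{K}\right)
\qquad\Longrightarrow\qquad
C(t) = \frac{K}{1 + A\, e^{-C_r t}}, \quad A = \frac{K - C_0}{C_0}, \quad C_0 = C(0)

% Peak of the growth rate: differentiate (3) once more and set it to zero
\frac{d^2 C}{dt^2} = C_r \left(1 - \frac{2C}{K}\right) \frac{dC}{dt} = 0
\;\Longrightarrow\;
C_{Peak} = \frac{K}{2}, \qquad
t_{Peak} = \frac{\ln A}{C_r}, \qquad
\left.\frac{dC}{dt}\right|_{Peak} = \frac{C_r K}{4}
```

Under these assumptions, the peak of daily new cases occurs when half of the final size $K$ has been reached, which is why an estimate of $K$ also fixes the expected peak date.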
Here, C is the accumulated number of cases, $C_r$ is the rate of infection, K is the final epidemic size, and t is the time. The growth rate $dC/dt$ reaches its maximum when $d^2C/dt^2 = 0$. The peak time $t_{Peak}$, the number of cases at the peak $C_{Peak}$, and the peak growth rate are defined by the corresponding formulas. If $C_1, C_2, \dots, C_f$ represent the number of cases at times $t_1, t_2, \dots, t_{final}$, then the final size predictions of the epidemic based on these data are $K_1, K_2, \dots, K_f$, and the predicted final epidemic size is obtained from equation (8) by the iterated Shanks transformation [15]. The logit model presented in equation (4) contains three coefficients, K, $C_r$, and A, which should be determined by regression analysis because of the nonlinearity of the model.

• Correlation coefficient. The correlation coefficient measures the strength of a linear relationship between two variables. According to Karl Pearson, the coefficient of correlation is a measure of the degree of the linear relationship between two random variables X and Y. Its values range between -1.0 and 1.0, and it is denoted by "r". The Pearson product-moment correlation is calculated with the formula:

$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$ (9)

Here, r is calculated between the date and the number of real cases in Egypt. The value is interpreted as follows:
o r = 1: there is a direct relationship between the input and output variables (if the input variable increases, the output variable increases).
o r = 0: there is no linear relationship between the input and output variables.
o r = -1: there is an inverse relationship between the input and output variables (if the input variable increases, the output variable decreases, and vice versa).

• Residuals. Residuals measure the quality of fit of the suggested models. A residual is the difference between the observed value of the response variable (Y) and the value predicted by the proposed model: $e_i = y_i - \hat{y}_i$.

• Adjusted R². In this study, we calculated both the simple and the adjusted $R^2$ to determine whether the extra terms improve the predictive power of the proposed methods.
The adjusted $R^2$ for polynomial regression is defined by the following formula:

$R^2_{adj} = 1 - \frac{SS_{residual} / (n - d - 1)}{SS_{total} / (n - 1)}$

where n is the number of observations in the training dataset and d is the degree of the polynomial in the regression model. $SS_{residual}$ represents the sum of the squared residuals from the regression, and $SS_{total}$ represents the sum of the squared differences from the mean of the dependent variable.

In this study, we have taken a real dataset for COVID-19 after the outbreak of the epidemic in Egypt. The first case of the COVID-19 epidemic was found in Egypt on 15 February 2020. Things escalated in March, and by the end of March cases had been reported all over the country, with loss of human lives. In the last quarter of March the Prime Minister issued a package of precautionary decisions, including the closure of all shops and establishments that provide entertainment or recreation and the suspension of studies, because the number of students in schools and universities is approximately 25 million. Nevertheless, the COVID-19 epidemic in Egypt grew in exponential form from 15 February 2020 to 15 June 2020. The machine learning approaches discussed here output the possible number of cases for the next 15 days. In this study, different regression approaches were used to fit the confirmed cumulative cases in Egypt from the start of the outbreak on 15 February 2020 until 15 June 2020 and to produce a short-term forecast that helps the government plan prevention measures in Egypt. We used seven regression analysis models for the COVID-19 dataset, namely exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial, and logit growth. The machine learning approaches were implemented using a Python library.
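The fitting and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the choice of NumPy, the helper names (`fit_polynomial`, `fit_exponential`, `r_squared`, `adjusted_r_squared`), the exponential form y = a·exp(b·t), and the synthetic cubic series standing in for the Egyptian data are all assumptions. The split mirrors the one used in the study: 107 days (15 February to 31 May) for training and the last 15 days for testing.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_polynomial(t, y, degree):
    """Least-squares polynomial of the given degree."""
    # Polynomial.fit rescales t internally, which keeps degree 5-6 fits stable.
    return Polynomial.fit(t, y, degree)


def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by a straight-line fit to log(y); needs y > 0."""
    b, log_a = np.polyfit(t, np.log(y), 1)  # slope first, then intercept
    a = np.exp(log_a)
    return lambda x: a * np.exp(b * np.asarray(x, dtype=float))


def r_squared(y, y_hat):
    ss_residual = np.sum((y - y_hat) ** 2)
    ss_total = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_residual / ss_total


def adjusted_r_squared(y, y_hat, d):
    """Adjusted R^2 for n observations and a model of degree d."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - d - 1)


# Synthetic cumulative counts, for illustration only (NOT the Egyptian data).
days = np.arange(122, dtype=float)  # 15 Feb to 15 Jun 2020 is 122 days
cases = 1.0 + 0.025 * days ** 3

# First 107 days (15 Feb to 31 May) for training, last 15 days for testing.
t_train, y_train = days[:107], cases[:107]
t_test, y_test = days[107:], cases[107:]

# Pearson correlation between day index and cumulative cases.
pearson_r = np.corrcoef(days, cases)[0, 1]

models = {f"degree {d}": (fit_polynomial(t_train, y_train, d), d)
          for d in range(2, 7)}
models["exponential"] = (fit_exponential(t_train, y_train), 1)  # one predictor

results = {}
for name, (model, d) in models.items():
    fitted = model(t_train)
    rmse_test = np.sqrt(np.mean((y_test - model(t_test)) ** 2))
    results[name] = (adjusted_r_squared(y_train, fitted, d), rmse_test)
    print(f"{name:12s} adj-R2={results[name][0]:.4f} test RMSE={rmse_test:.1f}")
```

Comparing the test RMSE across degrees is what separates a model that merely fits the training window from one that extrapolates well over the following 15 days.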
First of all, the correlation coefficient was calculated between the date and the number of confirmed COVID-19 cases in Egypt from 15 February to 15 June 2020 to test the correlation between them. The correlation coefficient is r = 0.8435, which is close to 1 and indicates a strong statistical correlation between the two variables, the date and the number of confirmed COVID-19 cases. The regression models for epidemic analysis were trained and then tested on real data, using the date and the number of confirmed cases for the corresponding day presented in Table 1. The Egypt dataset was separated into a training dataset from 15 February 2020 to 31 May 2020 and a testing dataset from 1 June to 15 June 2020. In this regard, we used the exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial regression models. In these regression models, the independent variable is the date and the dependent variable is the number of confirmed cases. The coefficients of equations (1) and (2) were calculated from the training dataset for all regression models, as shown in Table 2 and in the corresponding fitted equations.

Figure (2a) shows the actual cases and the fitted results of the proposed regression models (exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial, and logit growth) for the training dataset of COVID-19 in Egypt (15 February to 31 May). Figure (2b) compares the actual and predicted results of the same models for the testing dataset (1 June to 15 June). We observed from the figures that the results of the fourth-degree, fifth-degree, and sixth-degree polynomial methods are very close to the actual results.

Fig. 2. Comparison of the real cases and the predicted results of the proposed models (exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial): (a) on the training dataset; (b) on the testing dataset of Egypt COVID-19.

In regression analysis, residuals play an important role in the analysis of the COVID-19 outbreak data in Egypt. The residuals of the exponential, quadratic, third-degree, fourth-degree, fifth-degree, and sixth-degree polynomial models were calculated and plotted in Figure 3. We observed that the exponential, fourth-degree, fifth-degree, and sixth-degree polynomial regression models show the best residual patterns. Figure 2 and Figure 3 together show that the best fitted results and residuals were obtained with the exponential, fourth-degree, fifth-degree, and sixth-degree polynomial models. These models therefore gave excellent results for predicting the next 15 days. The fourth-degree regression model also gave an excellent prediction for the next month, as shown in figure (4), so it is very useful for one-month-ahead prediction of the COVID-19 outbreak in Egypt and can support government decisions.

We used the logit growth regression approach to fit the confirmed cumulative cases in Egypt from the start of the outbreak on 15 February 2020 until 15 June 2020; the model was fitted on the training dataset and its prediction was compared with the testing data, as shown in figure (5). Figure (6) shows that the estimated final time of the epidemic, $t_{final}$, is probably 8 September 2020. The Shanks transformation equation was used to predict the final epidemic size K. The logit model prediction reaches a final size of almost 1.6676E+05 cases.
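The final-size and peak estimates above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the logistic curve is taken in the standard form C(t) = K / (1 + A·exp(-r·t)), a closed-form three-point estimate of K (which follows algebraically from that curve for equally spaced observations) is swapped in for the nonlinear regression, and the parameter values are hypothetical, not the fitted values of Table 3. The function names are illustrative.

```python
import math
from datetime import date, timedelta


def logistic(t, K, r, A):
    """Cumulative cases C(t) = K / (1 + A * exp(-r * t))."""
    return K / (1.0 + A * math.exp(-r * t))


def final_size_three_points(c1, c2, c3):
    """Final size K from three cumulative counts at equally spaced times.

    Follows from the logistic curve; used here in place of nonlinear
    regression. Meaningful only when c2**2 > c1 * c3, i.e. once growth
    has started to slow down.
    """
    return c2 * (c1 * c2 - 2.0 * c1 * c3 + c2 * c3) / (c2 ** 2 - c1 * c3)


def shanks(k_prev, k_curr, k_next):
    """One Shanks step on three successive estimates of K.

    Applying it repeatedly to the resulting sequence gives the iterated form.
    """
    return (k_next * k_prev - k_curr ** 2) / (k_next - 2.0 * k_curr + k_prev)


def peak_day(r, A):
    """Day on which dC/dt is largest: t_peak = ln(A) / r, where C = K / 2."""
    return math.log(A) / r


# Hypothetical parameters for illustration (not the fitted values of Table 3).
K_true, r_true, A_true = 150000.0, 0.06, 3000.0
curve = [logistic(t, K_true, r_true, A_true) for t in range(0, 181)]

k_est = final_size_three_points(curve[100], curve[110], curve[120])
t_peak = peak_day(r_true, A_true)
peak_date = date(2020, 2, 15) + timedelta(days=round(t_peak))  # day 0 = 15 Feb
print(f"K = {k_est:.0f}, peak on day {t_peak:.1f} ({peak_date})")
```

On real data the successive estimates of K fluctuate as new days are added, which is the situation in which a Shanks step is useful: it accelerates a sequence that is converging roughly geometrically toward its limit.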
Table (3) presents the coefficients A, K, and $C_r$ of equation (4) and the phases of the epidemic time that were estimated by all regression analysis models. Notes: the coronavirus epidemic passes through phases, as shown in figure (6):
1-The first phase: start of case infection and slow growth of the epidemic.
3-Third phase: steady state and slow growth (peak).
4-Fourth phase: start of decrease.
The simulation was carried out with the estimated parameters, namely the start phase of the epidemic, the peak date of the epidemic, the start of the ending phase of the epidemic, and the root mean square error. Table 4 presents the calculated sum of squares of regression (SSR), coefficient of determination ($R^2$), and adjusted $R^2$ for all proposed models, which highlights the best-fitting models.

A forecast of the spread of COVID-19 in Egypt was carried out using various statistical and machine learning modeling approaches. The forecast was based on the data from 15 February 2020 until 15 June 2020. The models predicted the outbreak of COVID-19 in Egypt for the next 15 days and one month, as well as the final size of the infected cases and the final time of the epidemic. We found that the best of the proposed models, namely the exponential, fourth-degree, fifth-degree, and sixth-degree polynomial, give the best residuals and predictions for the next 15 days, and that the fourth-degree model also gives an excellent prediction for one month. These models are very useful to the Egyptian government for managing the COVID-19 outbreak over the coming months. The study aimed to investigate and assess the effectiveness of the preventive measures taken by the government of Egypt to control the spread of COVID-19. By applying the logit growth regression model to the daily reported cases of COVID-19, we estimated that the epidemic could peak on 22 June 2020 and reach its final time on 8 September 2020.
Of course, this type of peak forecasting carries essential uncertainty because of possible large changes in social and natural (climate) conditions. Moreover, our result suggests that the epidemic of COVID-19 in Egypt will not end quickly.

Prediction of the Epidemic Peak of Coronavirus Disease in Japan, 2020
Forecasting the peak of novel coronavirus disease in Egypt using current confirmed cases and deaths
Forecasting the Spread of COVID-19 in Kuwait Using Compartmental and Logistic Regression Models
Prediction of the Epidemic Peak of Covid19 in Egypt, medRxiv, the preprint server for health sciences
Analysis coronavirus disease (COVID-19) model using numerical approaches & logistic model
Investigating a Serious Challenge in the Sustainable Development Process: Analysis of Confirmed cases of COVID-19 (New Type of Coronavirus) Through a Binary Classification Using Artificial Intelligence and Regression Analysis
Predictions for COVID-19 outbreak in India using epidemiological models
AI and Big Data for Coronavirus COVID-19 Pandemic: A Survey on the State-of-the-Arts
Digital technology and COVID-19
Mathematical Population Dynamics and Epidemiology in Temporal and Spatio-Temporal Domains
Estimation of the Final Size of the Coronavirus Epidemic by the Logistic Model
Age-Structured Population Dynamics in Demography and Epidemiology
Advanced mathematical methods for scientists and engineers I: asymptotic methods and perturbation theory

33 39 76 14 39 41 40 33 47 54 69 0 206 85 3 249 238 139 95 145 126 125 160 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 15 49 55 59 60 80 93 93 110 126 166 196 210 256 285 294 366 442 456 495 536 576 609 656 710 779 779 985 1070 1073 1322 1560 1699 1794 1939 2065 2190 2350 155 168 171 188 112 189 157 0 169 433 227 0 463 260 226 269 358 298 272 348 388 387 393 495 488 436 346 347 338 398 399 491 510 535 720 745 774 783 827 652 702 789 910 1127 1289 1367 1536 1399 1152 1079 1152 1348 1497 1467 1365 1385 1455 1442
1578 1667 1618 2505 2673 2844 3032 3144 3333 3490 3490 3659 4092 4319 4319 4782 5042 5268 5537 5895 6193 6465 6813 7201 7588 7981 8476 8964 9400 9746 10093 10431 10829 11228 11719 12229 12764 13484 14229 15003 15786 16613 17265 17967 18756 19666 20793 22082 23449 24985 26384 27536 28615 29767 31115 32612 34079 35444 36829 38284 39726 41304 42980 44598