key: cord-0962617-zx0uu07s
authors: Ding, Wei; Wang, Qing-Guo; Zhang, Jin-Xi
title: Analysis and prediction of COVID-19 epidemic in South Africa
date: 2021-01-28
journal: ISA Trans
DOI: 10.1016/j.isatra.2021.01.050
sha: 731f801d56d9b929cbbe0843f92449f55a7d88e8
doc_id: 962617
cord_uid: zx0uu07s

The coronavirus disease-2019 (COVID-19) has been spreading rapidly in South Africa (SA) since its first case on 5 March 2020. In total, 674339 confirmed cases and 16734 mortality cases were reported by 30 September 2020, and the pandemic has had severe impacts on the economy and life. In this paper, analysis and long-term prediction of the epidemic dynamics of SA are made, which could assist the government and public in assessing the past Infection Prevention and Control Measures and designing future ones to contain the epidemic more effectively. A Susceptible-Infectious-Recovered model is adopted to analyse the epidemic dynamics. The model parameters are estimated over different phases with the SA data. They indicate variations in the transmissibility of COVID-19 across the phases and thus reveal weaknesses of the past Infection Prevention and Control Measures in SA. The model also shows that the transient behaviours of the daily growth rate and the cumulative removal rate exhibit periodic oscillations. Such dynamics indicate that the underlying signals are not stationary and that conventional linear and nonlinear models would fail for long-term prediction. Therefore, a large class of mappings with rich functions and operations is chosen as the model class, and the evolutionary algorithm is utilized to obtain the optimal model for long-term prediction.
The resulting models on the daily growth rate, the cumulative removal rate and the cumulative mortality rate predict that the peak and inflection point will occur on 4 November 2020 and 15 October 2020, respectively; that the virus will cease spreading on 28 April 2021; and that the ultimate numbers of COVID-19 cases and mortality cases will be 785529 and 17072, respectively. The approach is also benchmarked against other methods and shows better accuracy in long-term prediction.

The first case of COVID-19 was reported in Wuhan, China in December 2019. COVID-19 then spread rapidly to nearly all parts of the world. In eight months, [...] more urgently needed. To this end, modelling this epidemic is necessary.

The dynamical behavior of the COVID-19 spread was analyzed in [1-8], with a focus on the cases in China [1-3], Japan [4], South Korea [5], Iran [6], Italy [7] and India [8]. The effectiveness of infection prevention and control measures (IPCMs) was evaluated in [9-12]. Among these, the effectiveness of the quarantine of Wuhan was assessed by calculating the contact rate of latent individuals with the SEIR model [9]. The conclusion is that quarantine and isolation effectively reduced the potential peak number of COVID-19 infections and delayed the date of peak infection. Similarly, the impact of the disease control measures in Wuhan was studied using a modified SEIR model with non-constant transmission rates [10]. In addition, two simple approaches to data analysis were adopted to evaluate the influence of the intervention measures [11, 12]. Specifically, the second derivative of the function of the cumulatively diagnosed cases was calculated [11] to show the effect of the massive interventions in China, and a stochastic model that predicts the cumulative number of laboratory-confirmed patients was introduced [12] to simulate the evolution of the epidemic under intervention measures.
It is noted that their estimation of the transmission parameter was made under many assumptions on the epidemiological model, e.g., the number of exposed cases in the incubation period. Further, asymptomatic cases and infected cases still in incubation make the reported daily number of confirmed cases inaccurate. Therefore, the aforementioned approaches to evaluating the epidemic situation are oversimplified and not accurate, as shown by the recent data of the epidemic.

Since COVID-19 continues to spread around the world, it is necessary to model its dynamics to predict its future trend. The existing epidemic models can be divided into two categories, i.e., the first-principle model [13-17] and the data-driven model [18-29]. The first-principle model is able to clearly show how and why an input has an effect on the output. Building such a model requires specific knowledge that is, however, difficult to acquire. For example, to predict the status of one person via an epidemic model in a network, we have to know the statuses of those who have been in contact with him/her, and determine the probability with which the person is infected by them. In addition, human interventions, e.g., precautions taken by individuals, isolation of suspected cases, and development of ascertainment infections, need to be explicitly specified in advance. Otherwise, the prediction may be far from the true case [20].

Data-driven modelling, which builds the relationship between the system inputs and outputs without explicit domain knowledge, is sometimes preferable. An exponential model was obtained from the number of daily cumulative cases in the early phase of the outbreak in China, and it gives the basic reproduction number [21].
Similarly, another data-driven model was developed [22], which is matched to the mean and standard deviation of the number of reported daily cumulative cases on the Diamond Princess cruise ship with a gamma distribution and also gives the basic reproduction number. The end time and the total numbers of infectious cases and mortality cases of COVID-19 in China were predicted by different types of data-driven models, i.e., the logistic model, the Bertalanffy model and the Gompertz model [23]. The social media search indices (SMSIs) were taken into consideration, which were fitted by the data of the confirmed cases via a subset selection model [24]. In Castorina [25], a generalized Gompertz law was found to predict the maximum number of infected individuals in China, Singapore, South Korea and Italy. In Li [26], the Gaussian distribution theory was utilized to analyze and predict the transmission of COVID-19. Besides, prediction algorithms were also provided [27, 28] [...] are not stationary and conventional linear and nonlinear models would fail for long-term prediction. Therefore, a large class of mappings with rich functions and operations is chosen as the model class, and the evolutionary algorithm is utilized to obtain the optimal model for long-term prediction. The resulting models on the DGR, the CRR and the cumulative mortality rate (CMR) predict that the peak and inflection point will occur on 4 November 2020 and 15 October 2020, respectively; that the virus will cease spreading on 28 April 2021; and that the ultimate numbers of COVID-19 cases and mortality cases will be 785529 and 17072, respectively. The approach is also benchmarked against other methods and shows better accuracy in long-term prediction.

The rest of the paper is organised as follows. Section 2 introduces SA with the epidemic and data descriptions. The epidemic analysis and long-term prediction are presented in Sections 3 and 4, respectively.
The conclusions are drawn in Section 5.

SA is located in the southernmost region of Africa, with a long coastline that stretches more than 2500 km along the South Atlantic and Indian Oceans. With a total area of 1221037 km², SA is the 24th largest country in the world. The interior of SA consists of a vast, in most places almost flat, plateau with an altitude of between 1000 m and 2100 m, with a generally temperate climate. It is bordered to the north by the neighboring countries of Namibia, Botswana, and Zimbabwe and to the east and northeast by Mozambique and Eswatini, and it surrounds the enclaved country of Lesotho [30]. According to the Worldometer elaboration of the latest United Nations data in 2020, the population of SA is estimated at 59308690, which ranks 25th in the world.

SA is a developing country with a mixed economy. In 2019, its GDP was worth 350 billion US dollars, ranking 42nd in the world. It has been burdened by relatively high rates of crime, poverty, and unemployment, and is also ranked among the top ten countries in the world for income inequality. In 2015, 71% of net wealth was held by the richest 10% of the population, whereas the poorest 60% held only 7% of the net wealth, with a Gini coefficient of 0.63 [32].

The health system of SA comprises the public sector and the private sector. The public health services are divided into primary, secondary and tertiary levels through health facilities that are located in and managed by the provincial departments of health. The health care system of SA has more than 400 public hospitals and 200 private hospitals, and consumes about 8.8% of the country's GDP. Nonetheless, the vacancy rates for doctors and nurses are estimated at 56% and 46%, respectively. Moreover, 84% of the population depends on the public healthcare system, which is the preferred government health provision within a primary health care approach. However, only 21% of doctors work in it [33].
In addition, SA has an estimated seven million people living with HIV, more than any other country in the world [34]. Thus, health care in SA is beset with chronic human resource shortages and limited resources.

COVID-19 spread to nearly all countries after it broke out in Wuhan, China in December 2019. The first known patient of COVID-19 in SA was confirmed [...] Table 1. The total population in 2020 is 59308690 [36]. The cases are plotted in Fig. 1 [...] shown in Table 2. Note that the length of each phase is the same as that of SA. Since the COVID-19 spread in China was mainly confined to Hubei province before April 2020, the population size of Hubei province is used, which is 59270000 according to the Institute of National Statistics of China (INSC) [38].

Let $x(i)$ be the number of CICs on the $i$-th day, $i = 1, 2, \ldots, N$, and define the number of DNCs and DGR as [...] CMCs on the $i$-th day, $i = 1, 2, \ldots, N$, and define the CMR as $Z(i) = z(i)/x(i)$, $i = 1, 2, \ldots, N$. Let $W(i) = y(i) + z(i)$, $i = 1, 2, \ldots, N$, be the number of cumulative removed cases (CRCs) on the $i$-th day, and define the CRR as $w(i) = W(i)/x(i)$, $i = 1, 2, \ldots, N$. Let $I(i) = x(i) - y(i) - z(i)$, $i = 1, 2, \ldots, N$, be the number of ACs on the $i$-th day.

With the SA and China data, we calculate their DGR, CCR and CMR. To show the effect of lockdown, these rates from the lockdown date are plotted in Fig. 2 [...] where $\beta$ denotes the effective contact rate, and $\gamma$ represents the removal rate, i.e., the inverse of the expected infection duration for COVID-19. Here, the reason for choosing $\gamma$ as $1/14$ is as follows. On the one hand, the WHO indicates that the recovery time of people with mild symptoms of COVID-19 is about two weeks [41]. On the other hand, mild cases (including asymptomatic cases) account for 96.79% to 99.49% of the total infectious cases in SA [42].
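The SIR equations themselves are not reproduced in this text, so the following is only a minimal sketch that assumes the standard form dS/dt = -beta*S*I/P, dI/dt = beta*S*I/P - gamma*I and dW/dt = gamma*I, where P is the total population and W the removed compartment. The function name `simulate_sir`, the forward-Euler step size and the value beta = 0.2 are illustrative assumptions; only gamma = 1/14 and the population figure are taken from the text.

```python
def simulate_sir(beta, gamma, population, infected0, days, steps_per_day=10):
    """Integrate the standard SIR equations with a forward-Euler scheme.

    Returns one (S, I, W) tuple per day, starting with day 0.
    """
    s = float(population - infected0)
    i = float(infected0)
    w = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, w)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            w += new_removals
        history.append((s, i, w))
    return history


GAMMA = 1.0 / 14.0      # removal rate chosen in the text
POPULATION = 59308690   # SA population in 2020, as quoted in the text
BETA = 0.2              # hypothetical contact rate, for illustration only

trajectory = simulate_sir(BETA, GAMMA, POPULATION, infected0=1, days=60)
s_end, i_end, w_end = trajectory[-1]
print(f"day 60: infectious ~ {i_end:.0f}, removed ~ {w_end:.0f}")
```

Because every person leaving one compartment enters another, S + I + W stays equal to P throughout the run, which gives a quick sanity check on any implementation of the model.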
In the initial phase of the epidemic, the infectious population accounts for a small fraction of the total population, and thus $S \approx P$. Substituting $S = P$ in [...] whose solution is given by [...] $\beta$ is estimated by the least-squares method as [...] where $\hat{I}(t)$ is the prediction from (5) and $I(t)$ is the recorded number; $n_1$ and $n_2$ respectively denote the first and last days in a phase, e.g., during the level-4 lockdown in SA. In the SIR model, $\beta$ indicates the transmission rate of an epidemic. To measure the capacity of epidemic spreading, the basic reproduction number, $R_0$, which denotes the average number of secondary infections produced by an infected host in a completely susceptible population [43], is introduced as follows [...]

To obtain $R_0$ for COVID-19 in different phases, the incidence data of SA is [...] Table 1. It indicates that although $\hat{R}_0$ is decreasing during lockdown in SA, its drop rate, $(\hat{R}_0(T_{k-1}) - \hat{R}_0(T_k))/\hat{R}_0(T_{k-1})$, is also gradually decreasing, where $T_k$ denotes the time period of the level-$k$ lockdown.

The same analysis is carried out for the China case, where $S(0) = 59269999$, $I(0) = 1$ and $W(0) = 0$. $\hat{\beta}$ and $\hat{R}_0$ are given in Table 2, which shows that the drop rate of $\hat{R}_0$ in China is higher than that in SA in the middle and later periods of lockdown.

Note that $R_0$ is obtained under the assumption that everyone is susceptible. If only a part of the population is susceptible, the effective reproduction number, [...] Applying the above approach to our data, $R(t)$ is plotted in Fig. 3, which shows that the epidemic of COVID-19 in SA is not stable and that $R(t)$ is with [...] However, Fig. 4 shows $R(t) < 1$ after the 34th day of lockdown, which means that COVID-19 in China is under control and will be extinguished. Therefore, the IPCMs in China work better than those in SA.

It is found from the above analysis that although $R_0$ in SA decreased over the [...] quarantined.
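The phase-wise estimation of beta described in this section can be illustrated with a short sketch. It assumes that setting S = P in the standard SIR equations gives dI/dt = (beta - gamma) I, so that I(t) = I(n1) exp((beta - gamma)(t - n1)), and that the estimate minimises the sum of squared differences between this prediction and the recorded I(t) over days n1 to n2. The relation R0 = beta/gamma is the usual SIR expression and is likewise an assumption here, since the corresponding equations are not reproduced in this text. The grid search is a simple stand-in for whichever optimiser was actually used, and the data are synthetic.

```python
import math


def predict_infectious(beta, gamma, i_start, n_days):
    """Early-phase SIR solution I(t) = I(n1) * exp((beta - gamma) * (t - n1))."""
    return [i_start * math.exp((beta - gamma) * k) for k in range(n_days)]


def estimate_beta(recorded, gamma, beta_max=1.0, step=0.0005):
    """Least-squares estimate of beta by a plain grid search over [0, beta_max]."""
    best_beta, best_sse = 0.0, float("inf")
    n_grid = int(round(beta_max / step))
    for j in range(n_grid + 1):
        beta = j * step
        predicted = predict_infectious(beta, gamma, recorded[0], len(recorded))
        sse = sum((p - r) ** 2 for p, r in zip(predicted, recorded))
        if sse < best_sse:
            best_beta, best_sse = beta, sse
    return best_beta


GAMMA = 1.0 / 14.0
TRUE_BETA = 0.25  # hypothetical value used only to generate synthetic data
synthetic = predict_infectious(TRUE_BETA, GAMMA, i_start=100.0, n_days=30)

beta_hat = estimate_beta(synthetic, GAMMA)
r0_hat = beta_hat / GAMMA  # assumed standard SIR relation R0 = beta / gamma
print(f"beta_hat = {beta_hat:.4f}, R0_hat = {r0_hat:.2f}")
```

Running the same fit separately on each lockdown phase would give one estimate of beta and R0 per phase, which is the kind of comparison reported in Tables 1 and 2.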
The SIR model was also applied to long-term epidemic prediction in China and South Korea [2, 17, 20]. The recent data shows, however, that the predictions of the approaches in [2, 20] are not accurate. We attempt to predict the epidemic trend in SA. The recent data in SA, however, reveals that the prediction accuracy of SIR is not satisfactory, as will be shown in Section 4.

The long-term forecasting of COVID-19 in many countries has been well studied. Among these, the work on China has received considerable attention, including the exponential model [21], the logistic model [23] and the Gompertz model [25]. To the best of our knowledge, however, no work on the long-term forecasting of COVID-19 in SA has been reported to date in the literature. [...] 2020, it is found that the DGR, which is related to the DNCs and CICs, shows a tendency of periodic oscillation. In addition, the CRR, which is the sum of the CMR and CCR, tends to rise, while the CMR basically remains flat. As seen, the epidemic dynamics of the DGR, CRR and CMR are nonlinear and different from each other. The evolutionary algorithm [45] is highly capable of learning the unknown dynamics of a nonlinear coupled system and does not need the model structure to be specified a priori, which is required by the existing long-term forecasting methods. Thus, it is adopted to train models of the DGR, CRR and CMR for epidemic prediction in SA. Algorithm 1 outlines the overall modelling process. The initial generation of the population, $f^0 = \{f^0_1, f^0_2, \ldots, f^0_q\}$, is randomly created. By iteratively applying the genetic operators, i.e., selection, crossover and mutation, a series of new generations of the population, $f^k$, $k = 1, 2, \ldots$, is produced. The optimal one, $f^*$, is obtained [48] as [...] where $f^k(i)$, $i \in [m_1, m_2]$, is the predicted value on the $i$-th day, and $\bar{u} = \frac{1}{m_2 - m_1 + 1} \sum_{i=m_1}^{m_2} u(i)$ is the mean of $u(i)$. The output $u(i)$, $i = m_1, m_1 + 1, \ldots, m_2$, is used to train the above model.
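The fitness evaluation and selection step of such an evolutionary search can be sketched as follows. The sketch assumes that the fitness is the usual coefficient of determination, r^2 = 1 - sum((u(i) - f(i))^2) / sum((u(i) - mean(u))^2), which is consistent with the r^2 values reported for the fitted models but is not spelled out in this text. The candidate models, the synthetic observations and the helper names `r_squared` and `select_best` are hypothetical; crossover and mutation, which generate new candidates, are deliberately left out.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination of `predicted` against `observed`."""
    mean_u = sum(observed) / len(observed)
    ss_res = sum((u - f) ** 2 for u, f in zip(observed, predicted))
    ss_tot = sum((u - mean_u) ** 2 for u in observed)
    return 1.0 - ss_res / ss_tot


def select_best(candidates, days, observed):
    """Return (name, fitness) of the candidate model with the highest r^2."""
    scored = []
    for name, model in candidates.items():
        predicted = [model(i) for i in days]
        scored.append((name, r_squared(observed, predicted)))
    return max(scored, key=lambda item: item[1])


# Hypothetical training window m1..m2 and synthetic observations u(i).
days = list(range(1, 31))
observed = [0.3 * math.exp(-0.05 * i) + 0.01 * math.sin(i) for i in days]

# Hypothetical candidate models f(i); a real run would create and mutate these.
candidates = {
    "constant": lambda i: 0.1,
    "linear": lambda i: 0.3 - 0.008 * i,
    "decaying_exp": lambda i: 0.3 * math.exp(-0.05 * i),
}

best_name, best_fitness = select_best(candidates, days, observed)
print(best_name, round(best_fitness, 4))
```

In a full evolutionary run this scoring would be repeated for every generation, with the best-scoring candidates kept as parents of the next one.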
This model is used to make predictions, i.e., substituting $i = 181, 182, \ldots, 200$ into (10) gives $X(i)$. It is depicted in Fig. 6 [...] This model is used to make predictions for $w(i)$, which are depicted in Fig. 9. Consider the CMR with the same procedure as above. The optimal prediction model is found with $r^2 = 0.997$ and [...] (12) This model is used to make predictions for $Z(i)$, which are depicted in Fig. 10.

To predict when the epidemic ends, the CRC and AC are calculated into the future, respectively, by [...] They are plotted in Fig. 11. It is observed that the predicted AC would be less than 10000 after 18 August 2021. We treat the rising inflection point as the point at which the curvature of the ACs changes sign. It is predicted that [...] Table 3. The predicted epidemic curves of CICs and ACs are plotted in Figures 12 and 13, respectively. It is seen that our approach has a lower R and a higher $r^2$, indicating a higher prediction accuracy.

Furthermore, a novel model is developed to forecast the long-term epidemic [...] cases could be 785529, in which there would be 17072 people losing their lives. Using historical incidence data of SA, the experimental result illustrates the effectiveness of our approach, and the comparative experimental result shows a higher prediction accuracy of our approach than the others.

Epidemic analysis of COVID-19 in China by dynamical modeling
Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model
Risk estimation and prediction by modeling the transmission of the novel coronavirus (COVID-19) in mainland China excluding Hubei province
Prediction of the epidemic peak of coronavirus disease in Japan
Transmission potential and severity of COVID-19 in South Korea
Estimation of coronavirus disease 2019 (COVID-19) burden and potential for international dissemination of infection from Iran
COVID-19 and Italy: what next?
Prediction of the peak, effect of intervention, and total infected by COVID-19 in India
The effectiveness of quarantine of Wuhan city against the corona virus disease 2019 (COVID-19): A well-mixed SEIR model analysis
A mathematical model for the novel coronavirus epidemic in Wuhan, China
First two months of the 2019 coronavirus disease (COVID-19) epidemic in China: real-time surveillance and evaluation with a second derivative model
Analysis and prediction of the 2019 novel coronavirus pneumonia epidemic in China based on an individual-based model
AAEDM: Theoretical dynamic epidemic diffusion model and COVID-19 Korea pandemic cases
Modeling epidemics spreading on social contact networks
A parallel sliding region algorithm to make agent-based modeling possible for a large-scale simulation: modeling hepatitis C epidemics in Canada
Time to extinction for the SIS epidemic model: new bounds on the tail probabilities
Estimation of the final size of the coronavirus epidemic by the SIR model
Leader-follower H∞ consensus of linear multi-agent systems with aperiodic sampling and switching connected topologies
Neural network based country wise risk prediction of COVID-19
Estimations of the coronavirus epidemic dynamics in South Korea with the use of SIR model
Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak
Estimation of the reproductive number of novel coronavirus (COVID-19) and the probable outbreak size on the Diamond Princess cruise ship: A data-driven analysis
Prediction and analysis of coronavirus disease 2019
Prediction of number of cases of 2019 novel coronavirus (COVID-19) using social media search index
Data analysis on coronavirus spreading by macroscopic growth laws
Propagation analysis and prediction of the COVID-19
Outbreak trends of coronavirus disease-2019 in India: A prediction
Neural network based country wise risk prediction of COVID-19
Analysis and synthesis of networked control systems: A survey of recent advances and challenges
Lesotho: Year in review 1996 - Britannica online encyclopedia, Encyclopædia Britannica
Time running out. The report of the study commission on U.S. policy toward Southern Africa
Tourism industry perspectives on climate change in South Africa
Health and health care in South Africa - 20 years after Mandela
Epidemiology of HIV in South Africa - results of a national, community-based survey
COVID-19 coronavirus pandemic
National Statistics of China, Statistical communique of the Hubei province on the 2019 national economic and social development
Coronavirus: China's
Extending the SIR epidemic model
director-general-s-opening-remarks-at-the-media-briefing-on-covid
Cost-effectiveness of public health strategies for COVID-19 epidemic control in South Africa: a microsimulation modelling study
Notes on R0
Real time Bayesian estimation of the epidemic potential of emerging infectious diseases
Distilling free-form natural laws from experimental data
Genetic programming IV: Routine human-competitive machine intelligence
Evolutionary extreme learning machine with sparse cost matrix for imbalanced learning
The coefficient of determination r² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded
Eureqa: software review, Genetic programming and evolvable machines