key: cord-0583229-0nv5znfl authors: Pékpé, Komi Midzodzi; Zitouni, Djamel; Gasso, Gilles; Dhifli, Wajdi; Guinhouya, Benjamin C. title: From SIR to SEAIRD: a novel data-driven modeling approach based on the Grey-box System Theory to predict the dynamics of COVID-19 date: 2021-05-29 journal: nan DOI: nan sha: 446f988e2d8137595dcc58b65d63098ba8eaa1ca doc_id: 583229 cord_uid: 0nv5znfl

Common compartmental models for COVID-19 are based on a priori knowledge and numerous assumptions. Additionally, they do not systematically incorporate asymptomatic cases. Our study aimed at providing a framework for data-driven approaches, by leveraging the strengths of the grey-box system theory or grey-box identification, known for its robustness in problem solving under partial, incomplete, or uncertain data. Empirical data on confirmed cases and deaths, extracted from an open source repository, were used to develop the SEAIRD compartment model. Adjustments were made to fit current knowledge on the behavior of COVID-19. The model was implemented and solved using an Ordinary Differential Equation solver and an optimization tool. A cross-validation technique was applied, and the coefficient of determination $R^2$ was computed in order to evaluate the goodness-of-fit of the model. Key epidemiological parameters were finally estimated and we provided the rationale for the construction of the SEAIRD model. When applied to Brazil's cases, SEAIRD produced an excellent agreement with the data, with an $R^2 \geq 90\%$. The probability of COVID-19 transmission was generally high ($\geq 95\%$). On the basis of 20 days of modeling data, the incidence rate of COVID-19 was as low as 3 infected cases per 100,000 exposed persons in Brazil and France. Within the same time frame, the fatality rate of COVID-19 was the highest in France (16.4%), followed by Brazil (6.9%), and the lowest in Russia ($\leq 1\%$).
SEAIRD represents an asset for modeling infectious diseases in their dynamical stable phase, especially for new viruses when pathophysiology knowledge is very limited.

In December 2019, an outbreak of an emerging disease (COVID-19) due to a novel coronavirus, SARS-CoV-2, began in Wuhan, China and quickly spread to a substantial number of countries [1, 2]. The COVID-19 pandemic, as a major global health threat, was declared by the WHO on 11 March 2020 [2]. The disease is spreading rapidly across the globe, affecting millions of people and pushing governments to take drastic measures to contain the outbreak. For example, mitigation measures to slow transmission through infection prevention and control, and social distancing, have been introduced with different timing and pace in countries worldwide. These measures have proved useful in slowing the transmission of COVID-19 in the general population and, more specifically, in the vulnerable populations of older adults and individuals with chronic conditions (i.e., hypertension, diabetes, cardiovascular disease, chronic respiratory disease, compromised immune status, cancer and obesity), although the pandemic is still growing. Notably, no decisively effective treatment exists for people who fall ill with COVID-19, although early supportive therapies can improve outcomes. Thus, preventive strategies and other public health endeavors are to be sustained. One asset to adequately support these interventions may be to get the clearest and most realistic picture of the dynamics of COVID-19, including by taking into account the impact of the different mitigation or suppression measures at work in countries.
Unlike highly data-hungry statistical approaches, which may not be completely suitable in such a situation of data scarcity, common mathematical modeling used in epidemiology for infectious diseases relies on SIR (Susceptible, Infected, and Recovered or Removed)-type models [3], though modeling approaches such as the ARIMA model [4], also coupled with polynomial functions [5], deep learning [6], or even deep learning in combination with a compartment model [7] have been applied to predict COVID-19 cases. There are many current examples of the application of compartment modeling to the COVID-19 epidemic [8-11]. However, compartment models suffer from a number of issues, including their many a priori assumptions and the need for a thorough knowledge of the circulating virus, which was difficult to obtain at the time of conducting this study, due to the novelty of SARS-CoV-2.

In order to compensate for the dearth of data and the uncertainties around the mechanisms of action of SARS-CoV-2 and of its related disease, COVID-19, we postulate that the grey-box system identification theory (GBSIT) [12, 13], developed in the 1980s, could be an asset to tackle these challenges. Indeed, this theory is one of the most robust in situations of prediction and decision-making in the presence of partial, incomplete, or uncertain information [13]. Because of its strong ability to solve uncertain problems and to provide good predictions under limited knowledge and scarce data [14], GBSIT appears to be a relevant way to describe the dynamics of an emerging disease such as COVID-19. Interestingly, the COVID-19 dynamical model can be seen as a switching system with non-linear modes [12], where the switches are triggered by control measures set by authorities. As such, the switching model coupled with the grey-box approach provides flexibility in characterizing the dynamics of the epidemic across its different phases.
Indeed, grey-box modeling makes it possible to derive the active mode of the switched system, and must be applied in this case in the stable dynamic phase of the epidemic (i.e. early stages, before control measures and/or between two measures taken to halt the propagation of a virus) [12, 13]. It can also be used in the changing phases of the epidemic (when the mode switches) to estimate trends and to investigate, for instance, the effectiveness of actions taken to tackle the outbreak. One major advantage of such a data-driven approach is its ability to operate under limited a priori knowledge of the studied phenomenon.

Furthermore, because of their reliance on the calculation of the basic reproduction number $R_0$ (i.e., the average number of secondary cases arising from a typical primary case in an entirely susceptible population), compartment models often translate into considerable discrepancies between findings [15]. $R_0$ is arguably the most important quantity in disease modeling, and there exists a rich mathematical theory supporting how $R_0$ can be computed for a range of SIR-type models with varying degrees of complexity [16]. However, $R_0$ should not be viewed as the ultimate target of modeling; other important parameters (e.g., the rates of infection, recovery, and death in a susceptible population) often support quick public health responses and also need to be estimated.

Another concern with compartment models is the difficulty of considering asymptomatic cases. Previous estimates suggest that between 5% and 80% of people testing positive for COVID-19 have no symptoms [17, 18]. It is crucial to better characterize the magnitude of the contribution of asymptomatic people to the spread of SARS-CoV-2 in order to develop better strategies to halt this epidemic, even without taking into account the possibility of reinfection of some recently infected people [19-22].
The current study builds upon the work by Hsu and Hsieh [23], who developed a modeling framework to integrate asymptomatic cases in outbreak dynamics. Our aim is to provide an extension of the SEIR (Susceptible-Exposed-Infected-Removed) compartment model that includes asymptomatic cases in order to better fit both the behavior of COVID-19 and the mitigation measures adopted by countries. Unlike the usual approach in compartment modeling, another specific feature of the current study is the use of empirical data collected only on the cases and deaths due to COVID-19 (which respectively reflect the transmission and virulence of the virus) to estimate all the parameters required for the analysis of the COVID-19 dynamics. Finally, the analytical method applied in this study, based on GBSIT, may be sound in supporting policy decisions even if the understanding of the activity of this new virus is only partial, incomplete, or uncertain.

Our modeling approach comprises five components: (i) choice and processing of COVID-19 data; (ii) selection of the most appropriate compartmental modeling for COVID-19; (iii) building the required adaptation to fit current knowledge on the circulation and transmission of SARS-CoV-2; (iv) estimating the targeted parameters using an identification method together with an optimization function; (v) predicting the dynamics of active cases, recovered cases, infective cases, asymptomatic infective cases and deaths.

We present a model that considers human transmission of SARS-CoV-2, a strain less well known than those of the past, with the following assumptions (where $t$ is the time stamp):
(i) Infective persons can be classified in two categories: those with symptoms, denoted by $I(t)$, and those without any clinical presence of symptoms, called asymptomatic or subclinical infective cases, denoted by $A(t)$.
(ii) When a change in the behavior of people occurs due to a public response to the outbreak [24], the contact rate (reflecting the level of risky behaviors) decreases with the increase in the cumulative number of removed persons.
(iii) A homogeneously mixing population is assumed.

Even though our modeling approach is free from the key biological assumptions, namely that (a) the birth rate and death rate are equal and given by $\eta$, (b) all individuals are capable of reproducing and are equally subject to mortality, and (c) all individuals are born susceptible to infection, we keep them in mind in building and discussing the modeling strategy.

We propose the SEAIRD model, a compartment model that explicitly incorporates asymptomatic cases in the analysis of the epidemic evolution. Our SEAIRD model consists of the following time-dependent variables: $S(t)$: susceptible individuals; $E(t)$: exposed (incubating) population; $A(t)$: asymptomatic infective people; $I(t)$: symptomatic infective individuals; $R(t)$: removed persons (i.e. completely and/or temporarily recovered from COVID-19); $D(t)$: dead people. A flow chart of the model is given in Figure 1. Let us define the parameters as follows:
- $\alpha$ is the fraction of the exposed population $E$ progressing to class $I$, or proportion of symptomatic infections, with $0 < \alpha < 1$.
- $\beta$ is the probability of COVID-19 transmission (i.e. the infection rate).
- $\delta$ is the ratio of the infective force of infectious asymptomatic people to the infective force of infectious (symptomatic) people.
- $\gamma_1$ is the recovery probability of infectious people.
- $\gamma_2$ is the recovery probability of asymptomatic people.
- $\mu$ is the progression rate from the exposed class to the infective class (i.e. infected symptomatic and asymptomatic people).
- $\theta$ is the fatality rate.
- $\eta$ is the birth and death rate (assumed to be equal here).
- $N$ is the population size of each country.
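To make the roles of these compartments and parameters concrete, the following is a minimal Python sketch of how they could be wired into a system of ordinary differential equations. The specific flow terms are an assumption modeled on standard SEIR-type models with an asymptomatic class; they are not taken from the authors' equations. The names `Params`, `seaird_rhs`, `rk4_step` and `simulate` are hypothetical, and a fixed-step fourth-order Runge-Kutta scheme stands in for an adaptive ODE solver.

```python
from dataclasses import dataclass


@dataclass
class Params:
    alpha: float       # fraction of exposed people who become symptomatic
    beta: float        # transmission probability (infection rate)
    delta: float       # infective force of A relative to I
    gamma1: float      # recovery probability of symptomatic people
    gamma2: float      # recovery probability of asymptomatic people
    mu: float          # progression rate from E to the infective classes
    theta: float       # fatality rate
    eta: float = 0.0   # birth and death rate (assumed equal)
    N: float = 1.0     # population size


def seaird_rhs(x, p):
    """Assumed right-hand side of the SEAIRD system (illustrative only)."""
    S, E, A, I, R, D = x
    force = p.beta * S * (I + p.delta * A) / p.N   # new exposures per day
    dS = p.eta * p.N - force - p.eta * S
    dE = force - (p.mu + p.eta) * E
    dA = (1.0 - p.alpha) * p.mu * E - (p.gamma2 + p.eta) * A
    dI = p.alpha * p.mu * E - (p.gamma1 + p.theta + p.eta) * I
    dR = p.gamma1 * I + p.gamma2 * A - p.eta * R
    dD = p.theta * I
    return (dS, dE, dA, dI, dR, dD)


def rk4_step(x, p, h):
    """One classical Runge-Kutta step of size h (in days)."""
    k1 = seaird_rhs(x, p)
    k2 = seaird_rhs(tuple(xi + 0.5 * h * k for xi, k in zip(x, k1)), p)
    k3 = seaird_rhs(tuple(xi + 0.5 * h * k for xi, k in zip(x, k2)), p)
    k4 = seaird_rhs(tuple(xi + h * k for xi, k in zip(x, k3)), p)
    return tuple(
        xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def simulate(x0, p, days, steps_per_day=10):
    """Return the state (S, E, A, I, R, D) at day 0, 1, ..., days."""
    x = tuple(x0)
    trajectory = [x]
    h = 1.0 / steps_per_day
    for _ in range(days):
        for _ in range(steps_per_day):
            x = rk4_step(x, p, h)
        trajectory.append(x)
    return trajectory
```

With $\eta = 0$ the six assumed flows cancel pairwise, so the total $S + E + A + I + R + D$ stays equal to $N$, which is a convenient sanity check for any variant of these equations.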
The considered model is a non-linear dynamical model, i.e. a system of ordinary differential equations governing the six compartments above. For the sake of simplicity, we introduced the so-called state-space vector [12]:

$$X(t) = [S(t), E(t), A(t), I(t), R(t), D(t)]^\top,$$

which gathers the variables of interest. The SEAIRD model involves several parameters to be estimated. We relied on a data-driven approach by leveraging the daily released data to perform the estimation of the parameters. Data on daily infected cases (and probably data on deaths) might be subject to noise (i.e. recording errors). Hence, we introduced the cumulative infective cases $\bar{I}(t)$, defined as:

$$\bar{I}(t) = \int_0^t I(s)\,ds.$$

Note that the integration of $I(t)$, which acts as a low-pass filter, could reduce the noise. Similarly, $D(t)$ might be less prone to noise, since $D(t)$ represents the cumulative dead cases on the time interval $[0, t]$. Variables $\bar{I}(t)$ and $D(t)$ are discretized with a sampling period of one day to reduce the computation cost.

Let $\phi = (\alpha, \beta, \delta, \gamma_1, \gamma_2, \mu, \theta)$ be the parameter vector of the SEAIRD model. Based on empirical data, we can estimate $\phi$ using the grey-box identification method [12], an approach widely used for modeling physical systems [25]. Let $y(t) = [\bar{I}(t), D(t)]^\top$ be the vector containing the real cumulative number of infective cases $\bar{I}(t)$ and the cumulative number of dead cases $D(t)$ at day $t$, and let $\hat{y}(t)$ be the predicted counterpart of $y(t)$ by our model at the same day. Since these predictions are obtained with parameters $\phi$, we write $\hat{y}(t, \phi)$. To find the optimal parameter vector which fits our model to the collected data $y(t)$ over a time window $[T_0, T_1]$, we considered an optimization problem over $\phi$ that minimizes the weighted discrepancy between $y(t)$ and $\hat{y}(t, \phi)$ over that window, where $w_t \in \mathbb{R}^2$ is the weight vector that can be used to balance infected and death cases. The estimated parameter vector $\hat{\phi}$ was finally applied to forecast the course of the COVID-19 disease. The model was implemented and solved using Matlab 2020a, with the Ordinary Differential Equation solver (ODE45) and the optimization toolbox.
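As an illustration of the quantities just described, the short Python sketch below builds cumulative series from daily counts and evaluates a weighted fitting cost between observed and predicted pairs of cumulative infective and dead cases. The squared-error form of the cost and the use of a single constant weight pair (rather than a time-varying $w_t$) are assumptions made for the example; the function names `cumulative` and `weighted_sse` are hypothetical.

```python
from itertools import accumulate


def cumulative(daily_counts):
    """Running sum of daily counts: the discrete analogue of integrating
    over [0, t], which smooths day-to-day recording noise."""
    return list(accumulate(daily_counts))


def weighted_sse(y_obs, y_pred, w=(1.0, 1.0)):
    """Weighted sum of squared errors over a time window.

    y_obs and y_pred are sequences of (cumulative infective, cumulative dead)
    pairs, one pair per day. w balances the two components. The squared-error
    form is an assumption of this sketch.
    """
    total = 0.0
    for (i_obs, d_obs), (i_hat, d_hat) in zip(y_obs, y_pred):
        total += w[0] * (i_obs - i_hat) ** 2 + w[1] * (d_obs - d_hat) ** 2
    return total
```

A cost of this kind could be handed to any numerical optimizer, with the predicted pairs produced by integrating the model for a candidate parameter vector. Giving a larger weight to deaths compensates for their much smaller magnitude relative to case counts.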
The optimization toolbox implements four different algorithms (Interior Point, Sequential Quadratic Programming, Active Set, and Trust Region Reflective), which serve to estimate the SEAIRD model parameters. Two initial conditions were set for this minimization problem: the initial parameter vector $\phi_0$ and the initial state vector $X(0)$. Empirically, we found that $\phi_0$ did not influence the estimation of $\phi$, contrary to $X(0)$. Here, $X(0)$ was chosen to maximize the coefficient of determination, $0 \leq R^2 \leq 1$.

We applied the proposed method (with a cross-validation technique [12]) to estimate the COVID-19 dynamics in Brazil. We carried out a sensitivity analysis to examine the influence of a likely underestimation of the reported number of infected people. Finally, a comparative analysis was conducted by estimating similar parameters in other selected countries.

The epidemiologic data were taken from an open source repository operated by the European Centre for Disease Prevention and Control (ECDC). The database provides the daily number of new cases and deaths for different countries. In order to preclude perturbations to the modeling approach due to changes in behaviors, as a consequence of policy measures taken by countries, we only considered data on the dynamical stable phase of the epidemic, i.e. a period during which the control measures in place remained stable (see Figure 2). The model accuracy was assessed using only the data of Brazil, by calculating the coefficient of determination $R^2$ as the goodness-of-fit criterion of the model.

As mentioned earlier, we applied the proposed modeling technique to COVID-19 data gathered for Brazil over 42 days, from 4 April to 16 May 2020 (see Figure 2). This period was chosen because of the relative stability in the control measures taken by the Brazilian government in response to the COVID-19 outbreak. Thus, the model parameters were estimated in a stable zone, as required.
Finally, we used 50% of the collected data (4 to 25 April 2020) for model estimation and 50% (25 April to 16 May 2020) for model validation. The overall modeling procedure is summarized in Algorithm 1.

A range of sensitivity analyses was conducted to illustrate the robustness of the model with regard to possible weaknesses linked to data collection on "confirmed" cases. In fact, currently reported confirmed cases may be quite far from the actual number of cases infected by COVID-19 [26], and might not include the asymptomatic ones. Therefore, we re-estimated the model parameters and compared the predictive ability of the model by assuming that the actual infected cases might be higher than the reported confirmed cases by: (i) 5%; (ii) 10%; and lastly (iii) 20%.

The modeling approach developed on the basis of Brazilian data was finally replicated on data gathered for France, India, Russia, South Africa and the USA to highlight the common points and differences. Using the World Wide governments response to COVID-19 outbreak time chart [27], the appropriate time window (i.e. the time scale) for the estimation in each country was chosen so that perturbations to the modeling (due to sudden changes in the control measures decided in each country) could be ruled out. As such, the relevant windows were 26 March 2020 to 16 April 2020, 4 April 2020 to 25 April 2020, 15 April 2020 to 5 May 2020, 4 April 2020 to 25 April 2020, and 4 April 2020 to 25 April 2020 for France, India, Russia, South Africa, and the USA, respectively.
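The estimation/validation split, the goodness-of-fit criterion and the under-reporting scenarios described above can be sketched in a few lines of Python. This is only an illustration on a single scalar series: the helper names `split_half`, `r_squared` and `inflate` are hypothetical, and the standard definition of the coefficient of determination is assumed.

```python
def split_half(series):
    """Split a daily series into an estimation half and a validation half."""
    mid = len(series) // 2
    return series[:mid], series[mid:]


def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_obs is not constant (otherwise SS_tot is zero).
    """
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def inflate(reported_cases, percent):
    """Scale reported cases up by a given percentage to mimic under-reporting."""
    factor = 1.0 + percent / 100.0
    return [c * factor for c in reported_cases]


# Sensitivity scenarios from the text: reported cases assumed too low by 5%, 10%, 20%.
SCENARIOS = (5, 10, 20)
```

In such a workflow, the parameters would be estimated on the first half, predictions compared with the second half through `r_squared`, and the whole procedure repeated on `inflate(cases, s)` for each `s` in `SCENARIOS`.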
Algorithm 1. Require: data $D = \{y(t)\}_{t=T_0}^{T_f}$ (collected over a period of relative stability in the containment measures, starting at $T_0$ and ending at $T_f$), weights $w$, and a threshold $\tau$ where $0 < \tau < 1$. Output: estimated parameters $\hat{\phi}$. The coefficient of determination $R^2$ is computed on the validation set $D_{test}$, where $\bar{y}_{test}$ is the mean value over $D_{test}$ and $\hat{y}(t, \hat{\phi})$ is the prediction. Step 5 checks whether $R^2 \geq \tau$.

When applied to the data of Brazil, the SEAIRD model produced a coefficient of determination $R^2$ of 93% and 92% for the estimation and validation sets, respectively. This suggests a good agreement and adequacy of the model to the data. The infection is mainly supported by two major compartments: the infectious and asymptomatic classes. The probability $\beta$ of COVID-19 transmission in Brazil was 99.5%. In this instance, asymptomatic cases may spread SARS-CoV-2 50 times more than symptomatic people. The fatality rate ($\theta$) in Brazil was about 7% (see Table 1). As shown in Table 1, symptomatic people recovered with a probability of 99%, while the recovery probability of asymptomatic people was 0.09%.

The sensitivity analysis suggested high deviations from the raw estimates when reported cases of COVID-19 were assumed to be underestimated by 5%, 10%, or 20% (see Table 1). The pattern of the deviation is not clear. In some cases, an underestimation of the number of reported confirmed cases of 5% can produce greater deviations than an error of 10% or 20% (e.g., the incidence of infected cases, $\mu$). The effect was the greatest on $\gamma_2$, which can differ by up to 7581% from the raw estimate (see Table 1). $\gamma_1$ and $\theta$ were the least affected by potential errors in the confirmed cases. Finally, under errors in reporting the number of infected cases, $\delta$ remained within a range of about 20% below or above its raw estimate.

When compared on the basis of about 20 days of modeling data, the largest differences between countries were found in the $\delta$, $\theta$ and $\gamma_2$ parameters (see Table 2).
Nonetheless, even if the incidence rate of COVID-19 was generally low across countries, as exemplified by the parameter $\mu$, some differences were apparent. With about 5 cases per 1 million exposed persons, South Africa exhibited the lowest incidence rate, followed by Brazil and France (3 cases per 100,000 exposed persons), Russia (4.5 cases per 100,000 exposed persons), the US (6 cases per 100,000 exposed persons) and India (7 cases per 100,000 exposed persons). The infective force of asymptomatic cases (as compared to the infective force of symptomatic cases) was the lowest in Russia (5.1) and India (5.3), and the highest in Brazil (42.3), with intermediate values found in South Africa (16.6), France (13.9) and the US (8.5). As shown in Table 2, France exhibited the highest fatality rate from COVID-19 (16.4%), followed by Brazil (6.9%). The fatality rate was the lowest in Russia ($\leq 1\%$) (Table 2).

We proposed a data-driven approach to estimate key parameters for the COVID-19 epidemic in Brazil and a number of other selected countries. Estimated epidemiological parameters from the model, such as the mortality rate or the incidence of COVID-19, are consistent with what has been published in the extensive literature on this disease during the last few weeks [28]. Furthermore, symptomatic people have a high probability of "recovering" or of temporarily losing their COVID-19 symptoms, as reinfection and persistence have recently been reported [20, 21]. For asymptomatic individuals, no clear conclusion can be drawn about their recovery potential [19, 22]. Whether they may continue to support the transmission of the virus with a high viral load remains to be explored. Caution is needed when attempting a direct comparison of estimated parameters between countries. Indeed, the counting practices for COVID-19 cases and the testing strategies (e.g., type and number of tests, testing policies) may vary greatly from one country to another.
Nonetheless, the fatality rates found herein are in line with findings from meta-analytic approaches on the topic. The most recent update from the Centre for Evidence-Based Medicine (CEBM) of Oxford, as of 9 June 2020, points to France as the top country in terms of deaths due to COVID-19, with a rate of 18.94% (95% CI: 18.75%-19.14%), which is close to the 16.44% from the SEAIRD model estimation. Equivalent findings for Brazil, India, Russia, South Africa, and the US are 5.25% (95% CI: 5.20%-5.30%) and 5.25% (95% CI: 5.20%-5.30%), respectively [28].

Our analytical strategy was based firstly on the choice of a SIR-type model, which was then adapted to better mirror the behavior of the COVID-19 disease. As a result, we built up the SEAIRD (Susceptible / Exposed / Asymptomatic / Infected / Recovered / Dead) compartments, an extension of the SEIR (Susceptible / Exposed / Infected / Removed) compartments. This adaptation allowed us to take into account: i) the clinically reported latency between the moment of a possible contact with SARS-CoV-2 and the development of COVID-19 symptoms (i.e. the transition from the exposed status to that of an infected symptomatic person), and ii) the importance of asymptomatic (infective) people in the propagation of this virus.

Secondly, the mathematical approach used herein was based on the GBSIT [13]. The structure of a grey-box model is built on a combination of knowledge (as in white-box models) and empirically collected data (as in black-box models). In this context of both evolving knowledge and novelty of the pathogenic agent, grey-box modeling has the potential to take maximum advantage of existing data even if they are partial or incomplete. Thus, with minimal pathophysiological knowledge about SARS-CoV-2, it was possible to identify important compartments that are then used to determine the transmission pattern and virulence of COVID-19.
Only two variables, empirically collected from 4 April 2020 to 16 May 2020, were needed to derive important parameters that may support public health decision making.

A similar modeling approach, taking into account the compartment of asymptomatic patients, has recently been released by Liu and colleagues [7]. The proposal by Liu et al. was unknown to us at submission. Nevertheless, it is worth recalling that, first, the modeling approach we propose makes it possible to better portray the changing dynamics of the epidemic according to the collective control measures decided by local authorities. The inner principle is that of hybrid systems, which are able to admit different dynamics depending on the actions they undergo. Secondly, our approach is also based on limited available data (i.e. confirmed cases and deaths) from open access databases, in combination with the partial knowledge available at the time of building our SEAIRD model. Finally, as did Liu and coauthors [7], our modeling is not only an endeavor to address the role of asymptomatic people in the spread of COVID-19; it also anticipated the possibility of reinfection by SARS-CoV-2 or some of its various lineages, a projection that was not common in published modeling strategies. The reinfection claim is now no longer a hypothesis but a result substantiated by several studies [19-22, 29, 30].

While the "adaptive" SEAIRD model of Liu et al. was powered to provide accurate predictions one to two weeks in advance [7], the predictive horizon of the SEAIRD model herein may be longer, depending on the duration of the dynamical stable phases of the epidemic. For instance, our SEAIRD model can give accurate predictions for over 2 months in the case of the US (see Supplementary materials). On the other hand, our SEAIRD model, which is robust in the stable phases, might give poor outputs when used in the changing phases of the epidemic dynamics.
This standpoint is exemplified when one attempts to compare the results provided by the "adaptive" SEAIRD [7] to those of our SEAIRD model over the period from 1 March 2020 to 29 March 2020. As shown in Figure 3, this period falls exactly within a changing dynamics phase of the epidemic in the US, i.e. a time interval separating two control measure decisions. As such, a direct comparison with the output of the "adaptive" SEAIRD model by Liu et al. may be hard, or even impossible, to perform, due to differences in modeling approach and implementation. Thus, these two different SEAIRD models can be viewed as complementary: one is suitable for the changing phases and the other for the stable phases of an emerging epidemic.

Because Brazil was a hot spot of the COVID-19 pandemic at the time of conducting the current study, data from Brazil were first put forward for illustrative purposes. Not only is Brazil an emblem of the growth of this pandemic phenomenon [31, 32], but also, because of the relative permissiveness and/or flexibility of the measures in place, the country appears to be a genuine case from which to learn more about how asymptomatic people are spreading the virus and feeding the pool of ill people. The exact proportion of asymptomatic people is actually unknown, due to the symptom-based screening that is currently favored. Some studies suggested that between 5% and 80% of those who get a positive test for COVID-19 may be asymptomatic [17, 18]. By definition, an infected person with the disease symptoms may express illness in a way that not only generates infectious aerosols but also reduces his/her contact with others if he/she is sufficiently ill to be in bed. It follows that the continued transmission of the disease is mainly supported by subclinical cases, and especially by asymptomatic people. One consequence of such a situation is that the impact of some public health endeavors, including mitigation or case-patient isolation, would be severely diminished [33].
As in the case of Brazil, these measures are jeopardised by the higher infective force of asymptomatic patients as compared to the symptomatic ones. To gain in efficiency and effectiveness, control measures against COVID-19 should be more stringent, and may depend on the capability of countries to trace, identify and care for asymptomatic and/or mildly symptomatic cases. This is all the more important since recent findings suggested that, contrary to common beliefs, the viral load of SARS-CoV-2 is similar, if not superior, in asymptomatic and mildly symptomatic patients as compared to their symptomatic counterparts [34, 35]. Additionally, earlier findings underscored that up to 55% of SARS-CoV-2 transmission may be caused by unidentified infected persons [18, 36]. This figure is in line with our results, which show that the infective force of asymptomatic people is about 50 times that of symptomatic people, especially in Brazil.

There are some limitations to our proposed modeling approach. First, it may be less relevant at the very early stages of an epidemic, when collected data may be too noisy or too poor to be consistent with the inherent behavior of an epidemic. Furthermore, early data collected in an emerging epidemic such as that of COVID-19 may not be as good as those collected later, due to continuous improvement in field work, as well as the refinement of diagnostic tools, and so on. In the case of COVID-19 especially, it should be acknowledged that the criteria used to determine the infection status have substantially evolved. From the use of nucleic acid testing, guidance then changed to put emphasis on clinical signs or chest CT scans, and now on serological assays. As a consequence, what is called "confirmed cases" of COVID-19 may vary somewhat according to the type of test(s) used to define them at a specific period of the epidemic.
Future studies comparing these different definitions are warranted to secure the comparison of data collected on different time scales. In the meantime, the application of a weighting factor (i.e. a forgetting factor) should be pertinent to handle the issue of recent data being more accurate and valid than the earliest ones. Furthermore, as with any data-driven approach, the parameters derived from our model depend mainly on the quality of the data used. As such, the fact that our analytical strategy considered only data provided by the European Centre for Disease Prevention and Control (ECDC) might put the outcome of our model under the threat of potential errors in this data repository. A final limitation is associated with the fact that our modeling has assumed a homogeneous population. Such an assumption does not allow taking into account the importance of age groups in the transmission and outcome of this epidemic, as compared to findings from a recently published study with a different modeling approach on COVID-19 data [5]. Future work is therefore required, in which new variables (e.g., age pyramid, age-related contamination pattern of COVID-19) would be considered in the modeling strategy.

Despite these limitations, our model fits the data closely and may describe well the behavior of any epidemic phenomenon in its dynamical stable phase. Because our SEAIRD model combines simplicity with a minimal number of input data, which increases its usability and capacity for generalization, we believe that the proposed approach holds some promise. It can be used not only in the current COVID-19 epidemic, but also more generally in future epidemics, notably in the presence of novel viral pathogens for which there may exist neither a treatment nor advanced pathophysiology knowledge. Future developments of the SEAIRD model should include the dynamics in different age groups.

This study is not supported by any specific funding. We declare no competing interests.
All epidemiologic data used were taken from an open source repository operated by the European Centre for Disease Prevention and Control (ECDC) (https://www.ecdc.europa.eu/en/covid-19-pandemic). The executable version of the code is available at https://github.com/midzodzi/Codym/find/main .

We would like to thank the anonymous teams involved in data collection and in the development of data repositories all around the world.

[1] Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges
[2] WHO (2020) WHO Director-General's opening remarks at the media briefing on COVID-19-11
[3] A contribution to the mathematical theory of epidemics
[4] Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: auto-regressive integrated moving average (ARIMA) and seasonal auto-regressive integrated moving average (SARIMA)
[5] Forecasting of COVID-19 per regions using ARIMA models and polynomial functions
[6] COVID-19 in Iran: forecasting pandemic using deep learning
[7] A new SEAIRD pandemic prediction model with clinical and epidemiological data analysis on COVID-19 outbreak
[8] Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases
[9] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[10] Siettos C (2020) Data-based analysis, modelling and forecasting of the COVID-19 outbreak
[11] Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th
[12] Gray box identification using difference of convex programming
[13] System Identification: Theory for the User
[14] Fifteen years of grey system theory research: a historical review and bibliometric analysis
[15] American hospital capacity and projected need for COVID-19 patient care
[16] An introduction to compartmental modeling for the budding infectious disease modeler
[17] Covid-19-what-proportion-are-asymptomatic. Updated 6th
[18] Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
[19] Munir W (2020) A patient with asymptomatic SARS-CoV-2 infection who presented 86 days later with COVID-19 pneumonia possibly due to reinfection with SARS-CoV-2
[20] PCR assays turned positive in 25 discharged COVID-19 patients
[21] Against COVID-19 Post-Acute Care Study Group (2020) Persistent symptoms in patients after acute COVID-19
[22] The importance and challenges of identifying SARS-CoV-2 reinfections
[23] On the role of asymptomatic infection in transmission dynamics of infectious diseases
[24] Modeling intervention measures and severity-dependent public response during severe acute respiratory syndrome outbreak
[25] An effective gray-box identification procedure for multicore thermal modeling
[26] Real-time tentative assessment of the epidemiological characteristics of novel coronavirus infections in Wuhan, China, as at 22
[27] Oxford COVID-19 government response tracker, Blavatnik School of Government
[28] Global COVID-19 case fatality rates. Updated 9th
[29] SARS-CoV-2 immunity and reinfection
[30] Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection
[31] Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic
[32] Sales FCS, et al. (2020) Evolution and epidemic spread of SARS-CoV-2 in Brazil
[33] Nonpharmaceutical interventions for pandemic influenza, international measures
[34] SARS-CoV-2 viral load in upper respiratory specimens of infected patients
[35] Association of initial viral load in SARS-CoV-2 patients with outcome and symptoms
[36] Estimation of COVID-19 outbreak size in Italy