key: cord-0804339-izfghh4y
authors: nan
title: A Generalized Mechanistic Model for Assessing and Forecasting the Spread of the COVID-19 Pandemic
date: 2021-01-18
journal: IEEE Access
DOI: 10.1109/access.2021.3051929
sha: cea64bf7db302b3a54a0cdd949e9fbd4872ab526
doc_id: 804339
cord_uid: izfghh4y

Since early 2020, the world has been afflicted with an unprecedented global pandemic. SARS-CoV-2, the virus that causes COVID-19, has levied massive economic and public health costs across many countries. Due to its virulence, the pathogen is propagating rapidly throughout the world, which makes it incredibly challenging for officials to contain its spread. Therefore, there is a pressing need for national and local authorities to have tools that help them assess and extrapolate the future trends of the spread of COVID-19, so that they may make rational and informed decisions that minimize public harm. Mechanistic models are prominent mathematical tools used to characterize epidemics. In this paper, we propose a generalized mechanistic model with eight states characterizing the evolution of the COVID-19 pandemic from a susceptible state to discharged states, passing through quarantined and hospitalized states. The parameters of the model are determined by solving a fitting optimization problem with three observed inputs: the numbers of confirmed infected, deceased, and recovered cases. The model's objective function is weighted over the training days so as to guide the fitting algorithm towards the latest pandemic period and lead to more accurate trend predictions for a stronger forecast. We solve the fitting problem with the Levenberg-Marquardt algorithm; we compare the resulting model to the one obtained with another state-of-the-art fitting algorithm, as well as to another compartmental model widely used in the literature. We test the model on the COVID-19 data from four highly afflicted countries.
The fitting algorithm has been validated graphically and through numerical metrics, and the results show accurate fits for most of the countries. Once the model's parameters are estimated, forecasting results are derived and uncertainty regions of the expected scenarios are provided.

The first cases of COVID-19 were announced in late 2019 in Wuhan, in the Hubei province of China. The outbreak quickly spread within China due to the massive transportation and large population mobility before the Chinese Spring Festival, and subsequently it spread all over the world. The WHO declared the outbreak to be a Public Health Emergency of International Concern on January 30th, 2020 and recognized it as a pandemic on March 11th, 2020. Coronaviruses are generally not dangerous; however, COVID-19 comes with a considerable mortality rate. Currently, more than 12 million cases of COVID-19 are confirmed, resulting in more than 550 thousand deaths in 216 different countries, areas, and territories, as indicated by the WHO through the COVID-19 situation dashboard website. These numbers are expected to increase dramatically. Most of these cases are located in the United States, where more than 3 million confirmed cases and more than 110 thousand confirmed deaths have been reported, representing around 25% and 24% of the global results, respectively [5].

In addition to the high fatality rates, the rapid spread of COVID-19 has had tremendous social and economic implications all over the world [6] [7] [8] [9] [10]. For example, R. Y. Kim identified the pandemic as an accelerator of the structural change in consumption and the digital transformation in the marketplace. He also explored how the pandemic accelerated the growth of e-commerce [11]. The global economy has also been impacted by the use of physical distancing as the easiest way to limit the spread of the disease.
Thus, non-essential services have been shut down and, hence, hundreds and even thousands of people have lost their jobs. Furthermore, since most of the world's nations have closed their borders, international trade has collapsed. Additionally, the impact of the COVID-19 pandemic has reached the telecommunication industry. Internet Providers (IPs) have recently reported a massive increase in traffic due to the lock-down and stay-home orders applied by several states in the USA and other regions and countries [12].

In just a few months, COVID-19 has changed the world, and specifically our habits and daily lives, and in many countries the situation does not seem likely to return to normal soon. Therefore, it is very important to understand the behavior and expected trend of this disease and to predict its spread. Forecasting the future evolution of this pandemic will also help assess its future consequences on different social and economic sectors.

In the literature, mathematical modelling has been widely used to model pandemics, since it provides a quantitative framework that scientists can use to build hypotheses on the potential underlying mechanisms explaining patterns in the real observed data at both spatial and temporal scales [13] [14] [15] [16] [17], and especially to model the COVID-19 outbreak [18] [19] [20] [21] [22] [23]. Mathematical models may have very different complexity levels depending on the number of variables and parameters used to characterize the dynamic states of the system, their spatial and temporal resolutions, e.g., discrete vs. continuous time, and their design, e.g., deterministic or stochastic [24].
The two main classes of mathematical models used to characterize pandemics are phenomenological and mechanistic models:
• Phenomenological models describe the relationship between the patterns of the data using an empirical approach, without a specific basis in physical laws or mechanisms, i.e., phenomenological models serve to describe and explain the observed interactions between the patterns as they are. Two useful phenomenological models to characterize pandemic growth patterns exist: the Generalized Growth Model (GGM) and the Generalized Richards Model (GRM) [24] [25] [26] [27] [28]. Although they are commonly used, these models have limitations because of their subjectivity: establishing the reliability and validity of these approaches can be challenging, and obtaining accurate results requires elaborate effort.
• Mechanistic models explain patterns in the observed data while considering key physical laws or mechanisms involved in the dynamics of the investigated problem. The Susceptible-Infected-Recovered (SIR) and Susceptible-Exposed-Infectious-Recovered (SEIR) models are the least complex mechanistic models [29], and they are frequently used to characterize pandemics [24], [30] [31] [32] [33].

Tuen Wai Ng et al. studied the SARS pandemic and compared the performance of the SIR and SEIR models in characterizing it. Then, they proposed a new SEIRP model by adding a protected state and evaluated its performance on multiple use-cases of the SARS outbreak [34]. Another variant of mechanistic models is used in [35] to study the Ebola pandemic, where Tae Sug Do et al. presented a model called SLIRD to better understand the spread of the disease and tested their model on the outbreak in Nigeria. Moreover, the authors of [36] presented a calibration of a SEIR-SEI pandemic model to describe the dynamics of the 2016 Zika virus outbreak in Brazil.
The authors of [19] investigated the lifting of control measures in Wuhan, China, especially social distancing, and the effect of returning to work using an SEIR model. Recently, the authors of [37] presented a generalized SEIR model to characterize the COVID-19 outbreak in different hot spots in China and applied inverse inference to the starting date of the outbreak. Similarly, the study in [38] examined the baseline SIR model along with the SEIR model to characterize the spread in Canada. This study modelled social distancing through the isolation of a subset of the susceptible population. Another recent study, presented in [39], proposed a comprehensive solution to estimate the mortality due to SARS-CoV-2 infection. An SEIR mathematical model describing infection transmission and death was developed to estimate the case-fatality, the symptomatic case-fatality, and the overall infection fatality ratios. The mechanistic model has been applied to different impacted regions of the world with the objective of comparing the different indicators and assessing the mortality dynamics per age group. Moreover, the authors of [21] presented a SIDARTHE model, an extension of the SIR model, which distinguishes between detected and undetected cases and between different Severities of Illness (SoI), and applied it to characterize the spread in Italy. In [40], the authors presented a basic SEIR model of the COVID-19 spread in India and used it to simulate and evaluate different scenarios of the spread based on the transmission rate; finally, they proposed potential future evolutions of the spread and possible mitigation.

Along with mathematical modelling, the authors of [41] employed a machine learning algorithm, namely the decision tree algorithm, to study the indicators responsible for the evolution of the COVID-19 pandemic using multi-sourced data. The study identified the population density as one of the most important determinants of infection spread.
Moreover, the authors of [42] addressed the forecasting of the COVID-19 spread by identifying the threatening factors using supervised machine learning models. They focused on forecasting the number of newly infected cases, the number of deaths, and the number of recoveries in the ten days following the training period.

Although the previously mentioned studies succeeded, to some extent, in modeling the pandemic, they only focus on specific countries or regions, and their models are not generalized to fit different regions simultaneously. The fitting is also usually done using a single target data set containing the number of confirmed cases. Moreover, the presented models are usually based on simplistic models, e.g., SIR and SEIR, and do not provide a detailed view of the different scenarios that a contaminated person may face.

In this paper, we propose to investigate the evolution of COVID-19 in different countries of the world using an extended mechanistic model. The developed model is composed of eight states: protected, susceptible, exposed, infectious, recovered, hospitalized, quarantined, and deaths. These states aim to characterize the possible behaviour of the population and its effect on the pandemic spread. The evolution of the pandemic is then modeled as an Ordinary Differential Equation (ODE) system interconnecting the different states. In order to forecast the evolution of COVID-19, we proceed with an identification process that aims to estimate the values of the coefficients of the model. To this end, a fitting optimization problem is developed having as inputs three target data sets, i.e., observations: confirmed infected cases, death cases, and recovered cases. The objective function to be optimized is modeled as a weighted non-linear least squares function and is minimized with the Levenberg-Marquardt (LM) algorithm. The weights are added to guide the focus of the fitter towards a particular period of the trend.
The model is fitted and tested on multiple countries and shows accurate performance. The impact of the weights on the fitting performance has also been evaluated. The performance of the fitting algorithm is also compared to that of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) [43] algorithm for various metrics. The estimated parameters are then used to forecast the future evolution of the pandemic given the current observations. Then, uncertainty regions are provided to visualize the expected upper and lower limits of the future evolution of the number of cases and deaths.

To the best of our knowledge, we are the first to characterize a pandemic with a model presenting a loop between the states while taking into consideration all possible movements between them. Moreover, we are the first to employ a weighted fitting process that supports the forecasting process by focusing on the latest trends of the spread. We also provide a comparison of the estimated, best, and worst cases with regard to the infection rate of the disease to estimate how well the countries operated against the pandemic. To validate our forecast results, we compare the performance of our proposed model to that of a Random Walk (RW) approach with drift [44] and show that our proposed model outperforms the RW in most of the investigated scenarios. Finally, we validate the efficiency of our proposed model against the state-of-the-art SEIRD model composed of five compartments, namely susceptible, exposed, infected, recovered, and deceased cases.

The rest of the paper is organized as follows. In Section II, we present the developed model and the different parameters that characterize the COVID-19 spread. Then, we present the proposed weighted fitting technique in Section III. Section IV is devoted to evaluating the performance of the fitting algorithm as well as the expected evolution of the COVID-19 spread in the upcoming period using the proposed fitting technique.
The paper is concluded in Section VI.

In this section, we present a generalized model to characterize the COVID-19 pandemic, composed of eight different states or phenotypes. These states are meant to characterize the whole life cycle of an infected case: before the infection, during the infection, and after being discharged, i.e., either deceased or recovered, as shown in Fig. 1. Each state in our model characterizes, at a given unit of time t (in our case, a day), a specific population's behaviour as follows:
• S(t): A state representing the total number of susceptible cases that can be infected with COVID-19.
• P(t): A state representing the total number of protected cases that are taking the required precautions, i.e., quarantine, social distancing, etc. This state represents the category of people that are very unlikely to be infected with COVID-19.
• E(t): A state representing the total population that has been exposed to the pandemic and is infected but not yet reported. The population within this state is supposed to be infected but not yet infectious, i.e., in a latent period. An exposed case may or may not be reported.
• I(t): A state representing the number of confirmed infected cases with infectious capabilities that are not respecting the stay-home rules, if any, and are propagating the disease.
• Q(t): A state representing the number of confirmed cases that are quarantined and are not able to infect other people. In this paper, we suppose that not all the confirmed cases are quarantined.
• H(t): A state representing the number of confirmed cases that are hospitalized. This state includes the cases that require an Intensive Care Unit (ICU) bed to be treated. In this paper, we suppose that the hospitalized cases are quarantined and, hence, do not transmit the disease to other people.
• D(t): A state representing the total number of deaths due to the COVID-19 pandemic.
• R(t): A state representing the total number of cases recovered from COVID-19. We assume that recovered cases gain immunity against the disease and cannot be infected again [45].

Given the definitions of the states presented earlier, we define the total population, denoted by P̄, as P̄ = S(t) + P(t) + E(t) + I(t) + Q(t) + H(t) + R(t) + D(t) for each day t. We can also define the total number of confirmed infected cases, denoted by $I_{rep}$, and the total number of infected but not reported cases, denoted by $I_{Nrep}$, for each day t, as follows:

To characterize the possible interactions between the previously presented states, we represent each possible transition from one compartment to another with an arrow, as shown in Fig. 1. Each arrow is labeled with a rate describing the amount of time required for the transition to take place, multiplied by the population of the group of individuals that the transition applies to [46]. The rates between the different compartments are defined as follows:
• $\delta^{-1}$: The length of the incubation period, i.e., the average period from exposure to symptom onset.
• $\sigma^{-1}$: The expected amount of time after which the health of an infected person gets worse and requires hospitalization.
• $\gamma^{-1}$: The average quarantine period.
• $\alpha^{-1}$: The average period for hospitalized cases to be moved from an ICU bed to quarantine.
• −1 : The expected number of days for a quarantined case to get worse and require hospitalization.
• λ: The infected population's fatality rate.
• ν: The hospitalized population's fatality rate.
• χ: The quarantined population's fatality rate.
• µ: The quarantined population's recovery rate.

We also employ two other parameters to characterize the disease and the possible protection methods that healthy people may employ to protect themselves:
• τ: A protection rate quantifying the possible protection measures, i.e., social distancing, mask usage, quarantine compliance, etc.
• β: An infection rate reflecting the expected number of people that an infected person infects per day.

In our model, we suppose that an infected case can recover only after passing through the quarantine state, because COVID-19 usually has symptoms, even light ones, that will oblige contaminated persons, sooner or later, to stay home. In other terms, we assume that COVID-19 symptoms of varying degrees (light to severe) appear in all infected cases and force them to stay home, i.e., to be quarantined for at least one day before fully recovering from the disease. Since our model is based on a constant infection rate β, placing an infected person in quarantine before final recovery allows the model to learn that an infected case in state I will not infect people every day. This assumption brings the model closer to reality, since an infected person may or may not infect others on a given day, and it helps, when needed, to determine accurate basic reproduction number values despite the constant parameters. Other studies considering a similar scenario, where recovery is only possible through quarantine, can be found in [37], [47].

Moreover, we assume that an infected case may require intensive care as soon as its contamination is confirmed and, hence, can be immediately hospitalized in state H. Infected cases may also choose to be quarantined in state Q, in which case they are not able to spread the disease, or they may keep contaminating other people and spreading the disease in state I. We also assume that hospitalized cases are quarantined by default, so they do not spread the virus, but they may pass away while in intensive care; otherwise, they may be moved to the quarantine state Q before their final discharge, i.e., either to the recovered state R or to the death state D.
Moreover, in the model, we suppose that an infected case may loop between the quarantined and hospitalized states multiple times. Also, we assume that infected cases may die while quarantined or hospitalized at any instant of time; otherwise, they will recover. Finally, our analysis covers COVID-19 cases exclusively; hence, we only consider deaths due to COVID-19. The proposed model, shown in Fig. 1, works as a closed system and is designed to track the local spread inside multiple countries to help officials adopt appropriate protective policies against this pandemic, such as border opening dates and stay-home orders. It can be translated into an ODE system that summarizes the rates of change between the different states. The ODE system is expressed as follows:

An initialization of the system is needed in order to solve the ODE system (2). The initial conditions are denoted by $S_0$, $P_0$, $E_0$, $I_0$, $Q_0$, $H_0$, $D_0$, and $R_0$ for S(t), P(t), E(t), I(t), Q(t), H(t), D(t), and R(t), respectively.

This model will be used to characterize the spread of the COVID-19 pandemic. To this end, curve fitting will be applied using three target data sets to estimate the parameters of the model. Once they are estimated, the ODE system will be solved by integrating each equation in (2) over the desired period to determine the population of each state with respect to the initial conditions provided. Lastly, our proposed model has many novel contributions compared to previous studies on the COVID-19 spread. In Table 1, we introduce a high-level comparison between our model and those used in these studies.

The goal of this section is to present the curve fitting model and discuss the algorithm employed to solve the corresponding optimization problem, i.e., the Levenberg-Marquardt method. Finally, we discuss the techniques employed to validate the efficiency of our model.

The optimization phase is based on three target real-world data sets (real data or observations), collected from official data sources.
Each data set represents a vector of length N, where N is the number of training days, i.e., the period during which real data is observed and collected. The vectors are defined as follows:
• Deaths ($D_r$): the elements of this vector indicate the total number of confirmed deaths due to COVID-19 officially reported at each day.
• Recovered ($R_r$): the elements of this vector indicate the total number of recovered cases officially reported at each day.
• Infected ($I_r$): each element of $I_r$ contains the total number of confirmed infected cases officially reported. This accounts for previous infected cases plus the new cases reported that day.

In this paper, we investigate a multiple data set optimization problem, in which we aim to fit the curve of the model to the actual data by minimizing an objective function that takes into consideration the errors between the real and estimated data at each training day. In the sequel, we denote by n the $n$th training day, where $n = 1, \dots, N$. Hence, and for notation purposes only, we use this discrete notation n instead of the continuous one t used in (2). Finally, we propose to employ a weighted Non-Linear Least Squares (NLLS) method where the objective function, which calculates the Mean Squared Error (MSE) between the predicted and the real data values given the estimated values of the fitting variables, is expressed as follows:

where $I_r(n)$, $R_r(n)$, and $D_r(n)$ are the $n$th elements of the real data of the infected, recovered, and deaths vectors, respectively. The estimated vectors $I_r(n, \theta) = I(n, \theta) + Q(n, \theta) + H(n, \theta) + R(n, \theta) + D(n, \theta)$, $R(n, \theta)$, and $D(n, \theta)$ are the predicted values of $I_r(n)$, $R_r(n)$, and $D_r(n)$, respectively, given a value of the parameter vector θ. The optimized vector θ includes the list of all target parameters as follows:

and finally, $w_n$ are the weights of the data, where $w_n \in [0, 1]$ and $\sum_{n=1}^{N} w_n = 1$.
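The weighted objective described above can be illustrated with a minimal sketch. The displayed equation is not available in this text, so the exact form below (a weighted sum of squared errors over the three series, with no extra normalization) is an assumption, and the function name `weighted_objective` and the dictionary keys are hypothetical.

```python
def weighted_objective(weights, observed, predicted):
    """Weighted sum of squared errors over the three target series.

    ASSUMPTION: the exact normalization used in the paper is not shown in
    this text; this sketch simply weights the squared errors of each day.
    `observed` and `predicted` are dicts with the hypothetical keys
    "infected", "recovered" and "deaths", each holding N values, and
    `weights` holds N non-negative values that sum to 1.
    """
    series = ("infected", "recovered", "deaths")
    total = 0.0
    for n, w_n in enumerate(weights):
        # Squared error of day n, accumulated over the three data sets.
        day_error = sum((observed[k][n] - predicted[k][n]) ** 2 for k in series)
        total += w_n * day_error
    return total
```

With uniform weights this reduces to an ordinary mean of the per-day squared errors, while any other choice of weights shifts the emphasis of the fit towards selected days.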
Notice that $I_r(n, \theta)$ is expressed as the sum of all the states that involve infected, recovered, and deceased persons, as per the definition of $I_r(n)$. The idea of employing weights is to steer the fitting process so that the model prioritizes the latest trends of the spread for each country and is not misled by the initial periods. This ensures that the forecast follows the same latest trends and provides forecast results that are close to reality. While uniform weights maintain the same importance level for the observed results of every day, the piecewise constant, linear, and exponential weights increase the importance of the latest observed results and de-emphasize the earliest ones uniformly, linearly, and exponentially, respectively. A suitable choice of the weights, one that does not completely neglect the first observations, will not necessarily lead to overfitting, since the predictions for a given day t are highly correlated with the data of the most recent period. In other words, the active infected cases reported in the last few days are those who are going to spread the disease and contaminate a part of the susceptible population S. This correlation between day t and the previous days is captured by weights that increase with time, so that our model learns two major properties that help the forecast: i) the smooth evolution of the weights helps optimize the parameters of the model and so learn the relationship between the states, and ii) the latest trends help achieve accurate forecasting results.

The objective function of the fitting problem is weighted according to the training days. Hence, the values of the weights make it possible to orient the focus of the fitting model towards some specific days more than the others.
This allows, for example, relaxing the focus of the fitting model on the first periods of the training, where the number of infected cases is very low. Indeed, the ultimate objective is to predict the evolution of the COVID-19 spread right after the training phase. In this paper, we test the following three configurations of weights:
• Uniform function: this is the typical model, which is investigated in most of the previous studies. In this model, a uniform weight is assigned to each training day, i.e., $w_n = 1/N, \forall n$. Hence, all the days are treated similarly by the fitting model.
• Piecewise constant function: in this case, some consecutive days are given more priority than the others, e.g., $w_1 = \dots = w_{n_0} = p_1$ and $w_{n_0+1} = \dots$, In this example, the fitting model will focus less on the first $n_0$ days than on the remaining days. Hence, larger fitting errors will be tolerated for the first days.
• Non-decreasing function: in this model, the first day is assigned the lowest weight value while the last day is assigned the highest weight value. The weights are assigned to the remaining days following a non-decreasing function, e.g., a linear or exponential function. Hence, the fitting model will progressively give more importance to the last days. This is a very important choice, since the first days of the pandemic are usually characterised by a low number of deaths, infected cases, and recovered persons. Moreover, as we aim to forecast the future evolution of the pandemic, it is worthwhile to have an accurate fit at the end of the fitting period so as to obtain accurate trends in the last portion of each data set.

The ODE system presented in (2) could be expressed in the following non-linear matrix form: where Note that Ẋ(t) is the derivative with respect to time of X(t). Finally, (7) as shown at the bottom of the next page.
Hence, the Non-Linear Least Squares optimization problem for the proposed mechanistic model for COVID-19 characterization is given as follows:

Due to the non-convexity of the problem, the NLLS optimization problem given in (8) cannot be analytically and optimally solved. In this case, there are two major approaches. It is possible to exploit meta-heuristic algorithms to solve the NLLS problem [50] [51] [52], or to employ numerical optimization algorithms that attempt to reach local minima with gradient-based techniques. The latter approach is adopted in this paper. The main representative of this class of algorithms is the Newton algorithm. A downside of this algorithm is that it is considerably time-consuming due to the need for line searches and the computation of the Hessian matrix. To solve NLLS estimation problems, it is often better to exploit the quadratic structure of the cost function, as is done by the Gauss-Newton (GN) [53] and Levenberg-Marquardt (LM) [54], [55] algorithms, which are, by far, the most popular NLLS optimization algorithms. In this study, we exploit the fitting algorithm to estimate the parameters presented in (2) that minimize the squared error between the real data $I_r(n)$, $R_r(n)$, and $D_r(n)$ and their corresponding predicted values.

The LM algorithm [54] is an iterative technique to solve NLLS problems. This method is a combination of the Gradient Descent algorithm, which is efficient in early iterations but slow close to the best-fit values, and the GN algorithm, which is inefficient in early iterations but performs well close to the best-fit values. The LM method uses steepest descent in early iterations and then gradually switches to the GN approach, which means that in many cases it finds a solution even if it starts very far from the final minimum, making convergence to a local minimum more robust than with either method alone.
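For reference, the relation between the GN and LM updates can be written in standard textbook notation. The symbols below are assumptions introduced for illustration and are not taken from the equations of this paper: $r_k$ is the stacked residual vector (observed minus predicted values) at iteration k, $J_k$ is the Jacobian of the predictions with respect to θ, $W$ is the diagonal matrix of the weights, and $\lambda_k \ge 0$ is the damping factor.

```latex
% Weighted Gauss-Newton step at iteration k
\left(J_k^{\top} W J_k\right)\Delta\theta_k = J_k^{\top} W\, r_k
% Levenberg-Marquardt step: the same normal equations plus a damping term
\left(J_k^{\top} W J_k + \lambda_k I\right)\Delta\theta_k = J_k^{\top} W\, r_k,
\qquad \theta_{k+1} = \theta_k + \Delta\theta_k
```

When $\lambda_k$ tends to zero the step approaches the GN step, and when $\lambda_k$ is large it approaches a short step along the negative gradient, which is one common way to describe how LM interpolates between the two methods.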
In fact, the LM method can be seen as a generalization of the GN algorithm, as its normal equations for estimating the fitting parameters are derived from those of the GN method with an additional damping factor that is adjusted at every iteration. LM is slightly more computationally demanding than GN, but it often converges from initializations that are far away from the solution, where GN often fails [56]. LM shows global convergence properties and is, therefore, the preferred method in common NLLS problems and consequently for our problem [56]. Finally, to ensure an efficient fitting, especially since the optimization target function is non-convex, we consider multiple starting point optimization (i.e., multiple starting points of θ) and select the best fitting solution among the tested ones.

Model validation is one of the most important steps in the model building process. Our model validation process is composed of two levels:
• Graphical validation: In this level, we validate the model by comparing the original data and the fitted data in the same graph. This method is important to visualize the overall variation of the model through time.
• Numerical validation: In this level, we calculate numerical metrics to evaluate the error between the estimation and the real data and to verify the goodness of the fit.

The employed metrics are classified into three types [57]:
• Scale-dependent metrics: These metrics provide insights about the difference between the predicted and measured values. The most commonly used are the Mean Absolute Error (MAE) and the Normalized Root Mean Squared Error (NRMSE), normalized by the difference between the maximum and minimum of the actual data. Note that, in practice, our observed data are not constant, so this difference is nonzero. Their expressions are given as follows:
• Percentage-error metrics: These metrics provide insights about the percentage of the difference between the approximated and observed values.
The most commonly used is the Symmetric Absolute Percent Error (SAPE). Its expression and its mean are given as follows: $\mathrm{SAPE}_n = \frac{|y_n - \hat{y}_n|}{(y_n + \hat{y}_n)/2}, \forall n$,
• Relative-error metrics: These metrics provide insights about the error with regard to the real observed values. The most commonly used are the Mean Relative Absolute Error (MRAE) and the coefficient of determination $R^2$, which are expressed as follows:

where ȳ is the mean value of the observed data. Similarly to the percentage-error metrics, the relative-error metrics take into account the percentage change between the estimated and the observed values of the data. However, they provide insights on the quality of the training in general by considering the order of magnitude of the data, which may vary significantly between different parts of the training period. The values of these metrics are independent of the population of the studied countries. A perfect fit should achieve an MRAE and an $R^2$ close to 0 and 1, respectively. Our model is tested and validated using those two levels of validation in the following section.

In this section, we present the simulation results, investigate the performance of the proposed model, and validate the use of the proposed fitting technique. To this end, we start by scrutinizing the COVID-19 spread in Russia to highlight the effectiveness of the training with the LM algorithm and the use of different weights in the fitting objective function. Afterwards, we present the fitting results for all the other investigated countries, namely Brazil, Italy, and the USA, using the proposed approach and the best weight combination. We also compare the performance of the employed LM algorithm to that of another fitting approach, i.e., BFGS [43]. The proposed mechanistic model is investigated using a real-world data set containing the number of reported infected, deceased, and recovered cases for the different countries.
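The validation metrics discussed above could be computed along the following lines. This is a minimal sketch using the usual textbook definitions of MAE, NRMSE (normalized by the range of the observed data), mean SAPE, and $R^2$; the function names are hypothetical, and MRAE is left out because its exact expression is not available in this text.

```python
import math

def mae(y, y_hat):
    """Mean Absolute Error between observed y and predicted y_hat."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def nrmse(y, y_hat):
    """RMSE divided by (max - min) of the observed data (assumed nonzero)."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return math.sqrt(mse) / (max(y) - min(y))

def mean_sape(y, y_hat):
    """Mean of |y_n - y_hat_n| / ((y_n + y_hat_n) / 2) over all days.

    ASSUMPTION: a day where both values are zero contributes an error of 0.
    """
    terms = []
    for a, b in zip(y, y_hat):
        denom = (a + b) / 2.0
        terms.append(abs(a - b) / denom if denom != 0 else 0.0)
    return sum(terms) / len(terms)

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Each function takes two equally long sequences, for instance the reported and fitted cumulative deaths over the training days.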
The Novel Coronavirus (COVID-19) Cases Data set used in this paper is obtained from the Humanitarian Data Exchange website (https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases). The initialization values of each country are given in Table 2. In this study, the training period starts on January 22nd, 2020 and ends on June 4th, 2020 unless otherwise stated. For some countries, namely Russia and Brazil, we suppose that $E_0 = 1$ even though no confirmed cases are reported, to take into consideration the delay before an exposed case is reported, i.e., the time from exposure to the virus until the appearance of symptoms. Finally, the estimated parameters obtained from the fitting are used to forecast the future pandemic spread for each country by integrating the ODE system over the desired period. The selected countries are chosen due to their high number of confirmed cases reported to the WHO. They are among the most infected countries and exhibit varying trends, which might be challenging for our model. The forecast of the spread for each country covers a period of t = 365 days, starting from January 22nd, 2020.

This work is implemented using Python 3.7. As for the ODE integrator, we use the SciPy library, where we employ the 'odeint' function that solves systems of first-order ordinary differential equations using the 'lsoda' implementation of the FORTRAN library 'odepack'. For the fitting, we employ the Python library 'Lmfit', which solves the formulated optimization problem (P) using a predefined algorithm (e.g., the LM and BFGS algorithms). Taking advantage of the 'odeint' function, we use a multi-start optimization: we run 'Lmfit' from randomly selected initialization values and keep those that achieve the best fitting performance (i.e., that minimize the objective function of the optimization problem (P)). Note that we selected the same initialization values for both fitting algorithms (LM and BFGS).
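As a rough illustration of the fitting strategy described above (a damped least-squares solver restarted from several random initializations), the following pure-Python sketch fits a hypothetical two-parameter exponential curve. It is not the Lmfit/odeint code used in this work: the toy model, the sampling ranges, the damping schedule, and all function names are assumptions made for the example.

```python
import math
import random


def model(theta, t):
    """Toy stand-in for the ODE output: y(t) = a * exp(r * t)."""
    a, r = theta
    return a * math.exp(r * t)


def cost(theta, ts, ys):
    """Sum of squared residuals; returns inf if the model overflows."""
    try:
        return sum((model(theta, t) - y) ** 2 for t, y in zip(ts, ys))
    except OverflowError:
        return math.inf


def lm_fit(theta0, ts, ys, lam=1e-2, iters=200):
    """Levenberg-Marquardt iterations for the two-parameter toy model."""
    theta, best = tuple(theta0), cost(theta0, ts, ys)
    for _ in range(iters):
        a, r = theta
        res = [model(theta, t) - y for t, y in zip(ts, ys)]
        jac = [(math.exp(r * t), a * t * math.exp(r * t)) for t in ts]
        # Gauss-Newton normal equations: (J^T J) d = -J^T res ...
        h11 = sum(j[0] * j[0] for j in jac)
        h12 = sum(j[0] * j[1] for j in jac)
        h22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * e for j, e in zip(jac, res))
        g2 = sum(j[1] * e for j, e in zip(jac, res))
        # ... with the diagonal inflated by the damping factor lam.
        m11, m22 = h11 * (1 + lam), h22 * (1 + lam)
        det = m11 * m22 - h12 * h12
        if det == 0:
            break
        d1 = (-g1 * m22 + g2 * h12) / det
        d2 = (-g2 * m11 + g1 * h12) / det
        trial = (a + d1, r + d2)
        trial_cost = cost(trial, ts, ys)
        if trial_cost < best:
            # Accept the step and reduce damping (closer to Gauss-Newton).
            theta, best, lam = trial, trial_cost, max(lam / 10, 1e-12)
        else:
            # Reject the step and increase damping (shorter, safer step).
            lam *= 10
            if lam > 1e12:
                break
    return theta, best


def multi_start(ts, ys, n_starts=8, seed=0):
    """Run the local fit from several random starts and keep the best one."""
    rng = random.Random(seed)
    starts = [(rng.uniform(0.5, 5.0), rng.uniform(0.0, 0.6))
              for _ in range(n_starts)]
    fits = [lm_fit(start, ts, ys) for start in starts]
    return min(fits, key=lambda fit: fit[1])


ts = list(range(10))
ys = [2.0 * math.exp(0.3 * t) for t in ts]  # synthetic, noise-free data
best_theta, best_cost = multi_start(ts, ys)
```

In this sketch a step is kept only when it lowers the cost, so each local run can never end worse than its starting point; the multi-start wrapper then simply keeps the run with the lowest final cost.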
During this phase, we employ the LM algorithm to solve the fitting optimization problem and estimate the ODE system parameters. To this end, we investigate the effect of employing weights in the fitting process. In this paper, three forms of weights are considered: piecewise constant, linear, and exponential weights. The expressions of the employed normalized weight functions are given, respectively, as follows: (13c) Note that the weight functions are normalized over the number of training days (N = 135 days) so that their values sum to 1. Fig. 2 plots the curves corresponding to the considered normalized weight functions. In Fig. 3, we represent the variation of the MSE with respect to the training iterations for the case of Russia, employing the three different weight functions and the LM algorithm. It is shown that, in all cases, the LM algorithm plateaus at a minimum value, which indicates that the fitting has converged. These minimum values differ between the weight functions but are very close to each other. Note that the MSE plotted in this figure considers the different data sets all together, i.e., the infected, deceased, and recovered data. Indeed, it corresponds to the weighted objective function expressed in (3), as achieved during the training process. Graphically, we visualize in Fig. 4 the fitting of the different observed states separately using the LM algorithm for the different weight functions. These graphs confirm that the exponential and linear weights outperform the piecewise constant weights. Indeed, although the fitted curves do not exactly match the real data at the beginning of the pandemic, they successfully follow them over the latest training days, which makes the models ready to forecast with respect to these trends.
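The effect of the three weight shapes can be sketched as follows. The exact expressions used in this work are not reproduced here, so the functional forms below (three increasing levels, a linear ramp, and an exponential with a hypothetical rate of 3) and the weighted squared-error objective are assumptions chosen only to satisfy the stated properties: non-decreasing in time and summing to 1 over N = 135 days.

```python
import math


def normalize(raw):
    """Scale the raw weights so that they sum to 1."""
    total = sum(raw)
    return [value / total for value in raw]


def piecewise_constant_weights(n_days, n_levels=3):
    """Increasing levels 1..n_levels, each held over a block of days."""
    return normalize([1 + (n * n_levels) // n_days for n in range(n_days)])


def linear_weights(n_days):
    """Weights growing linearly with the day index."""
    return normalize([n + 1 for n in range(n_days)])


def exponential_weights(n_days, rate=3.0):
    """Weights growing exponentially with the day index (rate is assumed)."""
    return normalize([math.exp(rate * (n + 1) / n_days) for n in range(n_days)])


def weighted_sse(y, y_hat, w):
    """Weighted sum of squared errors over the training days."""
    return sum(wn * (a - b) ** 2 for wn, a, b in zip(w, y, y_hat))


N = 135
w_pc = piecewise_constant_weights(N)
w_lin = linear_weights(N)
w_exp = exponential_weights(N)

# The same 10-case miss costs more on the last day than on the first day.
truth = [100.0] * N
early_miss = [110.0] + [100.0] * (N - 1)
late_miss = [100.0] * (N - 1) + [110.0]
early_cost = weighted_sse(truth, early_miss, w_exp)
late_cost = weighted_sse(truth, late_miss, w_exp)
```

With such weights, an error of the same size is penalized more heavily when it occurs late in the training period, which is what steers the fit towards the latest trend.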
The fitting results for each weight function using the LM algorithm are numerically corroborated in Table 3, where we provide the achieved average values for the different validation metrics. The table shows that the exponential weights yield a lower error than the other investigated weights in terms of all the evaluation metrics: scale-dependent, percentage-error, and relative-error metrics. For instance, the MAE with the exponential weights is around 44215, which is 51% less than the MAE obtained with the piecewise constant weight function. In terms of $R^2$, which reflects the goodness of fit, the exponential weights provide the highest value, 0.94, which is very close to 1, compared to the $R^2$ achieved by the piecewise constant weight function (i.e., $R^2 = 0.86$). This supports the efficiency of fitting with the exponential weights. Indeed, as mentioned earlier, non-decreasing weight functions allow the fitting algorithm to focus progressively more on the latest training days. Because the pandemic starts with a very small number of cases, the fitting errors are thereby greatly reduced where the numbers of infected, deceased, and recovered cases are relatively high. Finally, in Fig. 5, we plot the daily values of the APE, RAE, and AE metrics for each training day using the exponential weights. The figure clearly illustrates the focus of the fitting on the last days of the training. Indeed, the percentage error is significantly reduced for the different states during that period. The RAE again confirms the efficiency of the fitting: the relative error is less than 2% for the whole training period, except for a few particular spots where the fit is not perfect. Finally, the AE highlights the absolute error for each day and state. Although it may reach the order of thousands, this error remains small compared to the real total number of infected and recovered cases.
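For readers who wish to reproduce this kind of numerical validation, the metrics can be computed along the following lines. This is a minimal sketch assuming the standard definitions of MAE, range-normalized RMSE, SMAPE (the mean of the daily SAPE), and $R^2$; the MRAE is left out because its exact expression is not restated here, and the sample values are hypothetical.

```python
import math


def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def nrmse(y, y_hat):
    """RMSE normalized by the range (max - min) of the observed data."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return math.sqrt(mse) / (max(y) - min(y))


def smape(y, y_hat):
    """Mean of the daily Symmetric Absolute Percent Error (SAPE)."""
    sape = [abs(a - b) / ((a + b) / 2) for a, b in zip(y, y_hat)]
    return sum(sape) / len(sape)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


observed = [100.0, 200.0, 400.0, 800.0]  # hypothetical reported counts
fitted = [110.0, 190.0, 420.0, 780.0]    # hypothetical model output

print(mae(observed, fitted))        # 15.0
print(nrmse(observed, fitted))
print(smape(observed, fitted))
print(r_squared(observed, fitted))
```

Note that the NRMSE and $R^2$ helpers divide by the spread of the observed data, so they are only defined when the observations are not constant.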
Throughout the rest of this section, we proceed with the exponential weights to perform the fitting process.

In this section, we present the results of the proposed weighted fitting technique using the LM algorithm for different countries. Afterwards, we compare the efficiency of the proposed fitting approach to that of another fitting algorithm, i.e., BFGS. We provide the fitting results for several countries to show that our model is capable of providing accurate fitting in many situations. Indeed, we chose Italy as one of the first affected countries, which witnessed one of the highest death rates during the first four months of the pandemic and which is now showing a stable and much safer situation. Russia is also a very interesting scenario to study due to its unique spread profile (i.e., Russia witnessed a tremendous spike in COVID-19 infections after a period characterized by a low infection rate). Finally, we investigate the case of Brazil, which is currently witnessing its spike. The results presented in Fig. 4, Fig. 6, and Fig. 7 show that the model adapts to the latest trend of the data for all the previously mentioned countries with some minor errors. Indeed, due to the existence of three different data sets having different orders of magnitude and trends, the fitting algorithm may not exactly fit all the curves, but globally the fitting is effective, as confirmed by the achieved metrics presented in Table 4. Italy shows the weakest results among the three investigated countries. Although it achieves an $R^2$ similar to that of Russia, the fitting of Italy reaches around twice the MRAE value obtained with Russia. This is also confirmed graphically in Fig. 6 for the infected cases. Notice also that the MAE of Italy is low compared to those of Russia and Brazil because of the low number of reported infected cases as well as its small population.
The fitting of Brazil is the most accurate, as confirmed graphically and by the achieved evaluation metrics ($R^2 = 0.99$ and MRAE = 0.34). Minor errors are noticed mainly during the first period of the pandemic, as tolerated by the utilized weight function. Furthermore, in this study, we compare the performance of the proposed fitting algorithm to that of another fitting algorithm employed in [43], where the BFGS algorithm is used to solve the fitting problem. In Fig. 8, we plot an example of the fitting for the case of Russia using the exponential weight function and the BFGS algorithm. It is shown that BFGS partially fails to fit the three data sets, especially over the latest days of the infected and recovered cases. In Table 5, we provide a performance comparison between the LM and BFGS algorithms for all the considered metrics and the investigated countries. We notice that LM considerably outperforms BFGS, especially for Italy and then for Russia. However, both algorithms achieve similar results for the case of Brazil, with a certain advantage for BFGS in terms of MAE. In general, the LM algorithm provides much more accurate results than BFGS.

In this study, we consider the USA as a special case. Indeed, unlike other countries, which are controlled by a single authority, the USA is composed of fifty different states, each governed independently and in a different manner. For example, some states, e.g., NY and NJ, imposed a lockdown and a stay-home order at the beginning of the pandemic (early March), while others did not or imposed them late, e.g., FL. Moreover, the Black Lives Matter protests in different US states were unpredictable, and their impact on the spread of COVID-19, which may be substantial, cannot be measured or characterized.
Hence, treating the USA as a single entity is not appropriate and may lead to inaccurate results due to the heterogeneity of the states. Therefore, one possible option is to investigate each state separately as a single homogeneous entity. However, since state-level data are not available in the database that we used for the other countries, we do not include such results in this study. Instead, we propose to deal with the USA case by dividing the fitting period into two periods, or phases, so that the model is aware that two waves exist: one dominated by the northeastern states, NY, NJ, and CT, during the first four months of the pandemic, and the other dominated by southern states such as TX, FL, and CA, starting from June 13th. In this scenario, we obtain relatively accurate fitting results, as shown in Figs. 9 and 10, with $R^2 = 0.992$ and $R^2 = 0.993$, respectively, as given in Table 6. In the latter table, we also provide the fitting results for the whole period (Phase I and Phase II combined), where two COVID-19 waves are registered. As explained earlier, the fitting over the whole period was not successful, with a poor $R^2 = 0.36$. Nevertheless, over each of the two periods, the model successfully fits the real data and shows acceptable results despite the previously mentioned characteristics of the studied country. In other words, different parameters of the mechanistic model are estimated for each period, unlike for the other countries studied in the previous sections. Hence, for the USA, we only show that our model is able to fit the reported cases representing the spread for these two periods. Otherwise, combining the two periods would require the adoption of model parameters that vary with time, e.g., β(n) instead of a constant β. Such a study is more elaborate and will be the focus of our future extension of this work. The estimated parameters of the model for each country are provided in Table 7.
These values are used to forecast the future evolution of the pandemic in the next section.

In this section, we extrapolate the COVID-19 spread in the studied countries using our fitted models to forecast its evolution during the next period. We also discuss in detail the evolution of each state. In addition, we provide the upper and lower bounds of the expected spread, representing the worst- and best-case scenarios for these countries based on our model. We choose to investigate the forecasting results of Russia and Italy in particular, as they have already reached their peaks, unlike Brazil. The case of the USA is also omitted due to the inaccurate fitting results, whose reasons we discussed earlier. Finally, we compare the forecasting performance of our model to that of the RW model. The forecasts per state for Russia and Italy are shown in Figs. 11 and 12, respectively. In these figures, the total sick represents the sum of the populations of the states E, I, Q, and H, which are already contaminated with the virus. The confirmed cases stand for all the reported cases, represented by the states I, H, and Q. Based on the current spread parameters, the fitted model expects that the cumulative number of infected cases will reach about seven million and 250 thousand in Russia and Italy, respectively. The forecast results for Italy show that the infected cases are reported as soon as the exposed cases start to appear, as shown by the evolution of the pink (exposed cases) and light green (confirmed cases) curves of Fig. 12. Moreover, the big gap between the confirmed cases and the exposed cases (the number of confirmed cases is always higher than that of exposed cases) suggests that the country is efficiently identifying the infected cases, which supports the effectiveness of the prevention approach employed by this country to avoid a wider propagation of the virus. However, for Russia, Fig.
11 shows that the number of exposed cases is higher than the number of confirmed cases, which explains the huge number of cumulative cases that are expected to be reported by the end of the forecast period. Nevertheless, these results may change in practice. The local authorities in Russia may impose stricter policies, which may limit the propagation of the virus. Recall that our forecasts are based on the real data observed in each country; hence, they extrapolate the evolution of the model states from the observed spread data. Consequently, if a country does not adopt an effective strategy at the beginning of the pandemic, the estimated results might be relatively critical, as our model expects for Russia. Also, in Figs. 11 and 12, we observe that the majority of confirmed cases are in the quarantined or hospitalized state, i.e., either quarantined at home (red curve) or quarantined in hospitals (black curve), which supports the effectiveness of the stay-home orders taken in these countries and of their policies to limit the spread. However, we can notice that, in Italy, the quarantining process starts around 45 days after the beginning of the training period, i.e., around March 10th, the day when the stay-home order was announced. In Russia, the lockdown starts after around 75 days (end of March), and the number of cases is expected to be very high if no actions are taken to stop the spread. By the start of 2021, we expect that few infected cases will be reported and that the spread will be significantly reduced in both countries.

Since our fitting model is trained on the first four months of the pandemic spread, the forecast results may not be accurate in practice. Indeed, further policies may be ordered and social distancing may be applied more rigorously; such actions may slow down the spread.
In other scenarios, people may no longer properly apply the COVID-19 prevention guidelines, which may speed up the propagation of the virus. Therefore, in Figs. 13 and 14, we provide the uncertainty regions that may characterize the COVID-19 spread in terms of the number of active cases and the number of deaths. The objective is to use the parameters estimated by our model and vary the infection rate, β, which reflects the daily number of people that an infected person can infect. By varying this parameter, we can evaluate the best- and worst-case scenarios that may occur for these countries. In Figs. 13 and 14, we also provide the data observed during the seventeen days right after the fitting period (fit/forecast split in the figures). The objective is to show that the fitting and the resulting forecast are accurate and follow the same trends as those observed in reality in terms of reported infected and deceased cases. Regarding the expected evolution, we notice that, for Italy, the number of active cases will significantly diminish and the number of deceased persons will approach its maximum, increasing only slightly. However, for Russia, if no actions are taken by the authorities and officials, our model expects that the number of infected cases (new cases plus active cases) will keep increasing until reaching a peak in August, after which it will start decreasing significantly. Moreover, the number of deaths is expected to continue growing and to slow down starting from September.

In the sequel, we provide two comparisons of our proposed model with a reference model, namely the RW with drift approach. The comparison consists of forecasting the data for the 31 days right after the fitting period. We compare the results graphically and numerically to evaluate the efficiency of the proposed model.
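The uncertainty regions of Figs. 13 and 14, obtained by varying the infection rate β around its fitted value, can be illustrated with a much simpler stand-in. The sketch below uses a five-state SEIRD model integrated with a forward Euler scheme instead of the eight-state model and the 'odeint' solver of this work; every parameter value, the ±20% perturbation of β, and the function names are hypothetical choices for the example.

```python
def simulate_seird(beta, days, n_pop=1_000_000, e0=10.0,
                   sigma=0.2, gamma=0.1, mu=0.01, dt=0.1):
    """Forward-Euler SEIRD run; returns daily active cases and deaths."""
    s, e, i, r, d = n_pop - e0, e0, 0.0, 0.0, 0.0
    active, deaths = [], []
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n_pop * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            new_deaths = mu * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_deaths
            r += new_recovered
            d += new_deaths
        active.append(i)
        deaths.append(d)
    return active, deaths


def scenario_band(beta_fit, rel_change=0.2, days=365):
    """Best, expected, and worst scenarios from perturbing beta only."""
    betas = {
        "best": beta_fit * (1 - rel_change),
        "expected": beta_fit,
        "worst": beta_fit * (1 + rel_change),
    }
    return {label: simulate_seird(beta, days) for label, beta in betas.items()}


band = scenario_band(beta_fit=0.3)
peak_active = {label: max(active) for label, (active, _) in band.items()}
final_deaths = {label: deaths[-1] for label, (_, deaths) in band.items()}
```

The curves of the "best" and "worst" runs then delimit a region around the expected trajectory, in the same spirit as the bounds shown for the active cases and deaths.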
The comparison with the RW model targets only the infected and deceased states due to the lack of data for the other states, which prevents us from building their respective RW models. Initially, the RW model is built using the data observed during the fitting period. Then, it is used to forecast the evolution of the number of active and deceased cases. We have generated two random forecasting attempts, denoted by RW model (Test I) and RW model (Test II). In addition, we provide the forecast based on the estimated mean (RW mean) and the forecasts based on a 75% prediction interval, named RW upper bound and RW lower bound, within which the predicted observations fall with a 75% chance. Figs. 15 and 16 plot the forecast numbers of deceased and infected cases for Russia and Italy, respectively, using our proposed model (with exponential weights) and the RW model. In each figure, six curves are presented with different colors: the observed data during the whole period (red), our proposed model during the fitting and forecasting periods (black), the RW mean (yellow), the RW model (Test I) and (Test II) (black and brown, respectively), and finally, the 75% prediction interval identified by the two green limits. The curves related to the RW model are plotted during the fitting period. The split between the fitting and forecasting periods is indicated by the purple vertical line. For fairness, we train both models (proposed and RW) over the same fitting period (135 days) and investigate their prediction performance over a forecasting period of 31 days. The numerical results related to Figs. 15 and 16 are given in Tables 8 and 9, respectively, where the NRMSE and SMAPE metrics for each forecasting attempt are provided. The results show that the RW model has a wide range of possible realizations due to the huge variation of the observed data during the fitting phase.
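The reference forecasts described above (mean, 75% prediction interval, and random attempts) can be sketched as follows for a random walk with drift. This is a simplified illustration, not the implementation used for Figs. 15 and 16: it assumes Gaussian increments, ignores the uncertainty of the estimated drift, and uses a short hypothetical series; `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_rw_drift(series):
    """Estimate the drift and the increment spread from daily differences."""
    steps = [b - a for a, b in zip(series, series[1:])]
    return mean(steps), stdev(steps)


def forecast_rw_drift(series, horizon, level=0.75):
    """Return (lower, mean, upper) for each of the next `horizon` days."""
    drift, sigma = fit_rw_drift(series)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided interval
    last = series[-1]
    rows = []
    for h in range(1, horizon + 1):
        center = last + h * drift
        half_width = z * sigma * math.sqrt(h)  # spread grows with sqrt(h)
        rows.append((center - half_width, center, center + half_width))
    return rows


def sample_rw_path(series, horizon, seed):
    """Draw one random forecasting attempt from the fitted random walk."""
    drift, sigma = fit_rw_drift(series)
    rng = random.Random(seed)
    value, path = series[-1], []
    for _ in range(horizon):
        value += rng.gauss(drift, sigma)
        path.append(value)
    return path


observed = [10.0, 12.0, 15.0, 19.0, 22.0, 27.0, 31.0, 36.0]  # hypothetical
rows = forecast_rw_drift(observed, horizon=31)
test_1 = sample_rw_path(observed, horizon=31, seed=1)
test_2 = sample_rw_path(observed, horizon=31, seed=2)
```

Because the interval half-width grows with the square root of the horizon, individual random attempts can drift far from the mean forecast, which is consistent with the wide range of realizations noted above.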
A random attempt to forecast the observed data may not necessarily provide accurate results, as shown in the tables. For instance, in terms of NRMSE for the case of Russia, our model, which provides a deterministic forecast, achieves NRMSE values equal to 0.99 and 0.066 for the infected and deceased cases, while the RW attempts are quite far from these results, as indicated in Table 8. Through these tables, we also investigate the forecasting results using two different weight functions: the uniform (Uni.) weights, i.e., no weighting, and the exponential (Exp.) weights. It is shown that the NRMSE and SMAPE values obtained with the exponential weights are slightly lower than those obtained with the uniform weights. This result is due to the more accurate fitting obtained with the exponential weights, which guarantees a certain correlation between the forecast observations and the fitted ones. However, if we consider the mean value, which is very representative of the RW model, we can notice that our proposed model outperforms it in most of the cases (e.g., infected cases for Italy and infected and deceased cases for Russia). In many cases, the RW model fails to accurately predict the observed data due to its wide prediction interval, even if the RW mean shows accurate results. On the contrary, the deterministic prediction of our model has efficiently predicted the future pandemic spread. The achieved metrics given in Tables 8 and 9 clarify and corroborate the graphical results and show that our approach outperforms the RW model in most of the cases.

In this section, we compare our proposed model to a simple model with five states only (Susceptible, Exposed, Infected, Recovered, Deaths), as described in [58] and referred to in this paper as the SEIRD model. The graphical results shown in Fig. 17 present the forecasting attempt of the observed data using our proposed model in addition to weighted and unweighted versions of the SEIRD model.
At the training phase, we can notice that the three models perform with approximately the same level of efficiency. On the contrary, at the forecasting phase, we can easily notice that our model outperforms the two versions of the SEIRD model. In fact, our proposed model follows the trend of the observed data during the forecasting period. Additionally, from these figures, we can deduce the effect of the weights on the forecast performance. In fact, the forecast of the number of infected cases shows that the weighting technique helps improve the forecast of the observed data, although it reduces the efficiency of the fitting. Indeed, if the model is trained to perfectly follow the training data, then it will over-fit the training data and perform badly during the forecast. Table 10 reports the evaluation metrics, introduced in Section III-C2, of the proposed model as well as the weighted and unweighted versions of the SEIRD model. The numerical results in Table 10 confirm the graphical results and show that the proposed model outperforms the two investigated versions of the SEIRD model. For instance, in terms of NRMSE, the proposed model achieves 0.032 while the SEIRD model cannot reach below 0.11 for the forecast deaths. Similar observations are made for the SMAPE metric and for the forecast infected cases as well.

In this study, we have proposed a mechanistic model for assessing and forecasting the spread of the COVID-19 pandemic. The pandemic of interest is very novel and, even though it has already spread all over the world, scientists still do not have enough knowledge to model it properly, especially due to the lack of data and evidence about its contamination process. In this study, we have made some assumptions to conceive our model, which may impact its performance. First, we have considered a closed system, which does not distinguish between local and non-local cases. Hence, the spread is assumed to originate from a first local case or a first imported case.
At the time of this study, there are no available data that provide accurate and precise statistics about non-local cases and whether they are responsible for the virus spread. Therefore, the model only considers the local spread of the virus and discards any cases imported from outside the country of interest. The study also assumes long-term immune memory. In other words, recovered cases are assumed to be immunized against the virus and hence cannot be contaminated again. Indeed, the available data set does not provide detailed information about these scenarios, and there is no strong evidence for short-term immune memory. Another future improvement of this study could be the explicit consideration of asymptomatic cases [59]. In this study, we have assumed that every case is symptomatic at least for a short period of time, e.g., 1 day, and hence each infected case must pass through a quarantine state, even for a short period. Statistically, around 30% of patients who tested positive were asymptomatic and may spread the virus. In this study, we did not distinguish between asymptomatic and symptomatic cases due to the absence of related data for each of the investigated countries. Therefore, we have supposed that COVID-19 symptoms of varying severity (light to severe) appear in all infected cases and force them to stay home, i.e., to be quarantined for at least one day before being totally healed from the disease. This study is based on an ODE system with time-invariant parameters and, in particular, a fixed infection rate β. Considering that an infected person is placed in quarantine before final recovery allows our model to learn that an infected case in state I does not infect people every day, unlike in typical SIR and SEIR models.
This assumption makes our model closer to reality by allowing it to automatically learn that an infected person may or may not infect people daily, and it helps, when needed, to determine accurate basic reproduction number values regardless of the constant parameters. Other works considering a similar scenario, where recovery is only possible through quarantine, can be found in [37]. However, a more elaborate and accurate model would consider a time-varying infection rate, with which more precise results could be obtained. Finally, our model assumes that the investigated countries have centralized governmental systems. Hence, human decisions, such as border openings, lockdowns, indoor services, and mask mandates, are assumed to have uniform consequences on the disease spread. As a result, the future evolution of the pandemic within federal countries, such as the USA, cannot be accurately predicted with our model, because every state has its own regulations and policies; it is more convenient to investigate each state separately and then devise a generalized model for the whole country. All these assumptions make our forecasting results valid for a short period of time, e.g., 30 days, as shown by our forecasting results. Our study does not claim long-term prediction of the disease spread unless similar human factors and management practices are maintained.

In this study, we developed a mechanistic model composed of eight states to characterize the COVID-19 spread in different countries. An ODE system is formulated to mathematically model the interactions between the different states. A curve fitting using real-world observed data sets is developed to determine accurate fitting solutions and estimate the parameters of the ODE model. We use the LM algorithm to solve the fitting problem and estimate the ODE system parameters. We have shown that the employed algorithm outperforms a reference algorithm, i.e., BFGS.
The fitting technique is presented along with the model and aims to strengthen its forecasting results by focusing on the latest trends of the data, hence giving accurate results that follow these trends. Multiple countries with different spread trends are investigated with the proposed work. We show that our model is able to fit the real-world data of different countries exhibiting different spread trends, namely Russia, Brazil, Italy, and the USA. Despite the challenges and the various human factors, e.g., social distancing, quarantine, and wearing masks, which are not directly characterized by our model and which may affect the spread and its evolution, it is shown that accurate forecasting results are obtained, e.g., for Italy and Russia. The proposed model is not only one of the few extended mechanistic models that include all eight states that an individual may go through during the illness cycle but also, to the best of our knowledge, the only model that employs a weighted fitting technique to follow the latest trends; in addition, it is trained on three target data sets. However, this work can be extended by employing parameters that change over time to better fit the variations of the data, so that the challenges related to the case of the USA can be addressed. Moreover, once more detailed data become available, it will be very interesting and challenging to extend our model and investigate more complicated and elaborate scenarios, e.g., considering the re-contamination of healed cases and incoming non-local cases.
Pathological findings of COVID-19 associated with acute respiratory distress syndrome
A bibliometric analysis of corona pandemic in social sciences: A review of influential aspects and conceptual structure
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Aerosol and surface stability of SARS-CoV-2 as compared with SARS-CoV-1
Dynamics of the COVID-19 contagion and mortality: Country factors, social media, and market response evidence from a global panel analysis
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study
Emerging 2019 novel coronavirus (2019-nCoV) pneumonia, Radiology
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: A descriptive study
Situation report of the 1st July
A comprehensive review of the COVID-19 pandemic and the role of IoT, drones, AI, blockchain, and 5G in managing its impact
The impact of COVID-19 on consumers: Preparing for digital sales
Impact of coronavirus pandemic crisis on technologies and cloud computing applications
Key transmission parameters of an institutional outbreak during the 1918 influenza pandemic estimated by mathematical modelling, Theor
Strategies for containing a global influenza pandemic
Cost effectiveness of vaccination against pandemic influenza in European countries: Mathematical modelling analysis
Modelling an influenza pandemic: A guide for the perplexed
Modelling the impact of an influenza pandemic on critical care services in England
Early dynamics of transmission and control of COVID-19: A mathematical modelling study
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: A modelling study
A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Data-based analysis, modelling and forecasting of the COVID-19 outbreak
Modelling transmission and control of the COVID-19 pandemic in Australia
Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts
Mathematical models to characterize early epidemic growth: A review
Using phenomenological models for forecasting the 2015 Ebola challenge
Generalized logistic growth modeling of the COVID-19 outbreak: Comparing the dynamics in the 29 provinces in China and in the rest of the world
A generalized-growth model to characterize the early ascending phase of infectious disease outbreaks
SEIR evolutionary simulation model of the infectious disease emergency
Mechanistic modelling of multiple waves in an influenza epidemic or pandemic
Mechanistic models of infectious disease and their impact on public health
A mathematical model of Ebola virus based on SIR model
Infectious disease spread analysis using stochastic differential equations for SIR model
A double epidemic model for the SARS propagation
Modeling the spread of Ebola
Calibration of a SEIR-SEI epidemic model to describe the Zika virus outbreak in Brazil
Epidemic analysis of COVID-19 in China by dynamical modeling
A study of the COVID-19 impacts on the Canadian population
Estimation of SARS-CoV-2 mortality during the early stages of an epidemic: A modeling study in Hubei, China, and six regions in Europe
Dynamical modelling and analysis of COVID-19 in India
Risk assessment of COVID-19 based on multisource data from a geographical viewpoint
COVID-19 future forecasting using supervised machine learning models
Modeling the epidemic outbreak and dynamics of COVID-19 in Croatia
The reproduction number of COVID-19 and its correlation with public health interventions
COVID-19 and postinfection immunity: Limited evidence, many remaining questions
Modelling strong control measures for epidemic propagation with Networks-A COVID-19 case study
Modifying the network-based stochastic SEIR model to account for quarantine
SEIR and regression model based COVID-19 outbreak predictions in India
Real-time differential epidemic analysis and prediction for COVID-19 pandemic
Epidemic model analyzed via particle swarm optimization based homotopy perturbation method
SEIR modeling of the Italian epidemic of SARS-CoV-2 using computational swarm intelligence
Parameter estimation in ordinary differential equations modeling via particle swarm optimization
Nonlinear least squares optimization of constants in symbolic regression
NOVIFAST: A fast algorithm for accurate and precise VFA MRI T1 mapping
A brief description of the Levenberg-Marquardt algorithm implemented by Levmar
Convergence of Gauss-Newton's method and uniqueness of the solution
A new typology design of performance metrics to measure errors in machine learning regression algorithms
Monitoring Italian COVID-19 spread by a forced SEIRD model
Evolutionary game theory modeling to represent the behavioral dynamics of economic shutdowns and shield immunity in the COVID-19 pandemic