key: cord-0749698-kidzfxzg
authors: Farkas, Csaba; Iclanzan, David; Olteán-Péter, Boróka; Vekov, Géza
title: Estimation of parameters for a humidity-dependent compartmental model of the COVID-19 outbreak
date: 2021-02-18
journal: PeerJ
DOI: 10.7717/peerj.10790
sha: 1f4967f4b6066b5ebd4242e2fc1f24d8690ae3fc
doc_id: 749698
cord_uid: kidzfxzg

Building an effective and highly usable epidemiology model presents two main challenges: finding an appropriate, sufficiently realistic model that takes into account complex biological, social and environmental parameters, and efficiently estimating the parameter values with which the model can accurately match the available outbreak data and provide useful projections. The reproduction number of the novel coronavirus (SARS-CoV-2) has been found to vary over time, potentially being influenced by a multitude of factors such as varying control strategies, changes in public awareness and reaction or, as a recent study suggests, sensitivity to temperature or humidity changes. To take these constantly evolving factors into consideration, the paper introduces a time-dynamic, humidity-dependent SEIR-type extended epidemiological model with range-defined parameters. Using primarily the historical data of the outbreak from Northern and Southern Italy, and with the help of stochastic global optimization algorithms, we are able to determine a model parameter estimation that provides a high-quality fit to the data. The time-dependent contact rate showed a quick drop to a value slightly below 2. Applying the model to the COVID-19 outbreak in the northern region of Italy, we obtained parameters that suggest a slower shrinkage of the contact rate to a value slightly above 4. These findings indicate that model fitting and validation, even on a limited amount of available data, can provide useful insights and projections, and can uncover aspects that, upon improvement, might help mitigate the spread of the disease.

In November 2019 the virus named SARS-CoV-2 appeared in Wuhan, the capital city of Hubei province, a metropolis with 11 million inhabitants. On January 22, 2020 an outbreak with a massive infection count took place, which was later declared a pandemic by the World Health Organization (WHO).

Physicists, epidemiologists, and mathematicians are trying to model the evolution of the currently raging outbreak, considering various models from basic propagation ones like local optima grows exponentially with system size, and becomes enormous for a model with 25 free parameters, as the one proposed in this paper (see "An Epidemiological Model for the SARS-CoV-2 Outbreak"). A random-restart gradient-based descent algorithm would need a proportional number of restarts, making the method unfeasible.

Therefore, we need a method that can escape local optima and enables a better exploration and exploitation of the search space. In this article, we propose an iterated Particle Swarm Optimization (Kennedy & Eberhart, 1995) method (IPSO) where in each iteration the method explores only the vicinity of the best parameter estimation found previously. This approach significantly reduces the search space at each step and results in a gradual improvement process, where the method manages to advance to a significantly better local optimum in almost every iteration. The gradual improvement seems to be enabled by the so-called "big-valley" structure (Reeves, 2007) of the optimization problem, where local optima (good parameter estimations) occur close to each other and to a global optimum. IPSO can deliver a well-fitting parameter estimation for complex models with more than 20 parameters within a few hours of compute time on a single processor.

Finally, in order to make this study available to everyone, we have developed a tool (https://seir-visualisation.vercel.app/) that visualizes the output of the proposed model. Therefore, visualization is also a significant part of our work. The aim of the visualization dashboard is to extend our limited study to many further cases with freely adjustable input. We recognize that existing studies become obsolete quite fast, because the conditions and circumstances regarding the pandemic change drastically (https://www.the-scientist.com/features/why-r0-is-problematic-for-predicting-covid-19spread-67690). A case study of the evolution of the pandemic in Italian regions enables a better understanding of the virus's factors and properties. However, if we cannot easily apply the model to new countries and regions, a significant part of our work's benefits is lost. We believe that this tool is an adequate way to adapt our study to new processes.

Since Kermack, McKendrick & Walker (1927) introduced their so called SIR mathematical model (Susceptible-Infected-Recovered compartmental model), which is composed of three differential equations, that is,

epidemiology started to make use of differential equation based models. These became one of the most powerful tools to determine infection counts and their evolution over time. Capasso & Serio (1978) generalized the original deterministic SIR model by replacing the linear interaction term with a nonlinear function, which allows a more realistic and detailed control over the dependence on the number of infectious persons. They point out that psychological effects can be taken into account this way, too. A variant of the SIR model is the following so-called SEIR model (for more information on the SEIR models, see Anderson & May (1992))

The purpose of the SEIR model is to take into account the latent period (1/k). For several infections there is a significant incubation period during which the individual has been infected but is not yet infectious. Hence, the host cannot be categorized as susceptible, infectious, or recovered; we need to introduce a new category for these individuals who are infected but not yet infectious. These individuals are referred to as "exposed" (E). These types of models compute the theoretical number of people, sorting the cases into different groups, such as suspected cases, exposed cases, infectious cases and recovered cases. These epidemiological models are often used to model the spread of different diseases, such as the spread of smallpox on Easter Island in 1863 (Koss, 2019), of Severe Acute Respiratory Syndrome (SARS) (Naheed, Singh & Lucy, 2014) or of Influenza A (H1N1) (Liu et al., 2015).
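To make the role of the exposed compartment concrete, the following is a minimal sketch of the textbook SEIR system with bilinear incidence, integrated with a classical Runge-Kutta scheme. The equations used (dS/dt = −βSI, dE/dt = βSI − kE, dI/dt = kE − γI, dR/dt = γI, without vital dynamics), the recovery rate γ and all numeric values are assumptions chosen for illustration; they are not the extended model or the fitted parameters of this paper.

```python
def seir_rhs(state, beta, k, gamma):
    """Right-hand side of the textbook SEIR system (population fractions)."""
    s, e, i, r = state
    new_infections = beta * s * i        # bilinear incidence beta*S*I
    return (
        -new_infections,                 # dS/dt
        new_infections - k * e,          # dE/dt, 1/k is the latent period
        k * e - gamma * i,               # dI/dt
        gamma * i,                       # dR/dt
    )


def rk4_step(state, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = seir_rhs(state, *params)
    k2 = seir_rhs(shifted(state, k1, dt / 2), *params)
    k3 = seir_rhs(shifted(state, k2, dt / 2), *params)
    k4 = seir_rhs(shifted(state, k3, dt), *params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_seir(days, dt=0.1, beta=0.5, k=1 / 5, gamma=1 / 7,
                  initial=(0.999, 0.0, 0.001, 0.0)):
    """Return the daily states (S, E, I, R); all defaults are hypothetical."""
    steps_per_day = round(1 / dt)
    state, trajectory = initial, [initial]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, k, gamma)
        trajectory.append(state)
    return trajectory


trajectory = simulate_seir(days=160)
# Day on which the infectious fraction I is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
```

Because the exposed individuals must first pass through E at rate k, the infectious curve in this sketch rises later than it would in a plain SIR model with the same β and γ.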

Further extensions occur when we assume that immunity lasts only for a limited period before the individual is once again susceptible (SIS, SEIRS model). For further reference, the book of Keeling & Rohani (2008) provides a more in-depth introduction to the modeling of infectious diseases. The authors start from the simplest of mathematical models and they show how the inclusion of appropriate elements of biological complexity leads to improved understanding of disease dynamics and control.

In Fisman et al. (2013), the authors propose the SIR-based IDEA (Incidence Decay and Exponential Adjustment) model, where the prediction of the incidence count evolution is based on the time-dependent decay of the reproduction number, adjusted by a discount factor that reflects public health interventions and social concern. Despite its simplicity, the model can be very efficient for a small basic reproduction number $R_0$, but it is less accurate in the case of constantly changing transmission and contact parameters and complex public health control strategies.

Due to the urgency and the potentially devastating threat it poses, several studies have already attempted to model and predict the severity of COVID-19. In the following we summarize, without claiming completeness, the growing literature about the SARS-CoV-2 outbreak. Studies use both basic and more advanced epidemiological models, the latter incorporating a higher number of relevant parameters, such as medical case history, social and personal interactions, travel statistics, etc. While it is not possible to capture all aspects, the parameters that heavily define the spreading of a virus should be accounted for whenever possible. A detailed enumeration of such parameters can be found in Moore et al. (2016).

Commonly used epidemiological models for the SARS-CoV-2 outbreak are the above-mentioned modified SIR and SEIR-type models. Some studies that simulate the geographical movement of people in the vicinity of the outbreak sustain the parameter values of the basic epidemiological model. It is usually important to approximate the model's parameters correctly and to fit them to the measured data, in order to make the model as accurate as possible. Tang et al. (2020a) presented one of the first mathematical models (an SEIR-type model) for the novel coronavirus SARS-CoV-2 outbreak. Besides the usual categories (susceptible S, exposed E, infectious with symptoms I, recovered R), the authors use several other compartments that are crucial from the modeling point of view: a separate compartment for the infectious but not yet symptomatic individuals, denoted by A, and a compartment for the hospitalized individuals, denoted by H. As a result of the measures taken to reduce the spread of the disease, the following compartments are also included in their model: quarantined susceptible, denoted by $S_q$, isolated exposed, $E_q$, and isolated infected, $I_q$. The model's advantage is that it captures isolation and its effects, which is one of the most important tools for slowing the spread of a disease. The methods used to approximate the model parameters were Markov Chain Monte Carlo (MCMC) and Metropolis-Hastings (M-H). The algorithms ran for more than 70,000 iterations, and the prediction has an absolute error of 1,622-12,512 by the 7th day, depending on which contact rate is used (no reduced contact, contact reduced by 50%, contact reduced by 90%). This compares with a daily average of 231.71-1,787.42, which means that the errors of the predicted infectious cases per day can be significant.

The above mentioned article was recently updated by Tang et al. (2020b) . More precisely, the authors considered a time-dependent dynamic model which is built on the principles that different injunctions can significantly contribute to decreasing the contact rate c among persons, and the time period of diagnosis became shorter due to the increasing numbers of SARS-CoV-2 testing kits. This model describes more accurately the real-world situation, however, in the study only three parameters were estimated, the other ones were adopted from previous papers, which did not provide a good fit to data.

In Lin et al. (2020) a model is proposed that extends the basic SEIR model by modeling the public perception of risk regarding the number of severe and critical cases and deaths (D), and the cumulative number of cases, both reported and unreported (C). The paper considered the transmission rate as a time-dependent function, which incorporates the impact of governmental action. This model also takes into account the number of individuals who left Wuhan before the lock-down.

Other studies, such as Tang et al. (2020c) and Zhou et al. (2020), also use similar epidemiological models. Tang et al. (2020c) use time-dependent functions for the contact rate and the quarantined rate of exposed individuals, which incorporate prevention and control strategies, resulting in a more accurate estimation. In Zhou et al. (2020), the authors investigate the effect of the media, which brings a new equation into the system for a variable M that represents the cumulative density of awareness programs driven by media reports. This study used only a simple linear regression method.

In Wu, Leung & Leung (2020) the epidemic was simulated with the help of an SEIR-type epidemiological model. They studied the disease in Wuhan and the public health risk of a pandemic based on travel into and out of the city. In Dolbeault & Turinici (2020) the authors studied a variant of the SEIR models to interpret some qualitative features of the statistics of the COVID-19 epidemic in France. The authors observed that after the lock-down, social distancing is not enough to control the outbreak. A possible explanation is that the lock-down creates social heterogeneity. In He, Tang & Rong (2020) a discrete-time stochastic epidemic model with binomial distributions was used to study the transmission of the disease. In Carcione et al. (2020) the authors implemented a new SEIR-type model in order to calculate the total number of infected individuals and the total number of fatalities. Taking into account the lack of suitable data and the uncertainty of the different parameters, they avoided a rigorous case study.

In Giordano et al. (2020) it is pointed out that to end the global SARS-CoV-2 pandemic multiple population-wide strategies have to be implemented, including social distancing, testing and contact tracing. They proposed a SEIR-type model as well which discriminates between infected individuals depending on whether they have been diagnosed and on the severity of their symptoms. This distinction is an important issue because the isolated individuals typically do not spread the virus.

In Liu et al. (2020b) the authors use two epidemiological models to study the importance of the latency period. In order to validate their results, they fit the models' parameters to empirical data. For a similar approach see also López & Rodo (2020). Later, in Liu et al. (2020c), the authors developed an SEIR-type model of the COVID-19 epidemic to project the spread of the virus based on the early reported cases.

In Griette, Liu & Magal (2020) the authors developed a mathematical model to predict a bound for the ending date of the COVID-19 epidemics in mainland China with strong quarantine and testing measures for a sufficiently long time.

All these models are useful in highlighting critical factors, opening them up for discussions and analysis, and often narrowing down their distribution. However, without exact parameter estimations and at least a rough fit to available data, it is hard to objectively compare different models.

The study of Boldog et al. (2020) analyzes the situation from another perspective, using a complex system of three different models. The probabilistic system analyzes the effect of each individual imported case from importation until extinction. The first model in the system is an $SE_2I_3R$ model, which provides C (the cumulative number of cases). This result is used to find the importation ratio for a country that has connectivity θ and C imported cases from China, and a baseline local reproductive number $R_{loc}$. The effective infection scenario is modeled through a Galton-Watson branching process evaluating the branches, thus quantifying the potential risk of an outbreak in the destination country. Based on the θ and $R_0$ values, the findings show the optimal direction for a given country to minimize the risk of an outbreak. Countries with high θ and low $R_0$ should lower their reproductive number; if instead the basic reproductive number is high and the connectivity with China is low, they should work on lowering the connection rate even further through screenings or even bans.

The effectiveness of travel bans and their effect on possible outbreaks outside the base area is discussed in several articles. In a very recent article (Chinazzi et al., 2020), the authors analyze the travel restrictions implemented in Wuhan using GLEAM (the Global Epidemic and Mobility Model), optimizing its performance with Approximate Bayesian Computation. Their method approximates the posterior distribution of the basic reproductive number $R_0$, analyzing the risk of importation of cases into areas other than Wuhan. The findings suggest that even when travel is restricted by 90%, as was the case in Wuhan, other outbreaks may still occur with some latency. The model also uses a transmission reduction factor r, which represents the control actions for reducing the transmission rates; these are shown to be more effective than the travel restrictions.

Figure 1 presents the evolution of the daily number of reported infectious cases in Northern and Southern Italy for the period 24/01/2020-24/06/2020. The trend suggests that Italy managed to successfully curb the contact rate and effectively restrict the spreading of the infection. This statement can be reconciled with a recent study (Li et al., 2020c), where the authors' findings indicate that a radical increase in the identification and isolation of currently undocumented infections would be needed to fully control the novel coronavirus SARS-CoV-2.

Using epidemiological models, scientists have established a good understanding of the spread and control of infectious diseases from both mathematical and ecological points of view (Anderson & May, 1992; Kermack, McKendrick & Walker, 1927; Keeling & Rohani, 2008) (see also references therein). As seen in (1) and (2), the classical Susceptible-Infected-Recovered compartmental models assume that the disease transmission is β·SI (where β, S and I denote the transmission rate, the number of susceptible and the number of infectious individuals, respectively). During the study of the cholera epidemic spread in Bari, Capasso & Serio (1978) observed that the incidence rate may increase more slowly than linearly as I increases, which is captured by the following non-linear function: $f(S, I) = \frac{\beta S I}{1 + k_1 I}$, $k_1 > 0$. After their study, several nonlinear functions were used in epidemic models; for more details see Adnani, Hattaf & Yousfi (2013), Kaddar (2009), Zhou, Xiao & Li (2007) (see also references therein).

These observations opened new directions in epidemiological modeling, that is, several authors used new, sophisticated non-linear functions to describe the modeled phenomena more accurately; see for instance Lahrouz et al. (2012), Xiao & Ruan (2007), Cui, Sun & Zhu (2008). For example, in Cui, Sun & Zhu (2008), the authors used the non-linear function $\beta e^{-mI(t)} S(t) I(t)$ (which is a non-monotonic and non-concave function) to incorporate the media impact in their model.
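The qualitative difference between these incidence terms can be seen in a short sketch that evaluates the bilinear term β·S·I, the saturated Capasso-Serio term β·S·I/(1 + k_1·I) and the media-dependent term β·e^{−mI}·S·I side by side. The function names and all numeric values below are hypothetical and serve only as an illustration.

```python
import math


def bilinear_incidence(s, i, beta):
    """Classical mass-action term beta*S*I."""
    return beta * s * i


def saturated_incidence(s, i, beta, k1):
    """Capasso-Serio form beta*S*I / (1 + k1*I) with k1 > 0."""
    return beta * s * i / (1.0 + k1 * i)


def media_incidence(s, i, beta, m):
    """Cui, Sun & Zhu form beta*exp(-m*I)*S*I."""
    return beta * math.exp(-m * i) * s * i


# Hypothetical values, chosen only to show the shapes of the three terms.
beta, k1, m, s = 0.4, 0.05, 0.02, 1000.0
for infectious in (1.0, 50.0, 200.0):
    print(
        infectious,
        bilinear_incidence(s, infectious, beta),
        saturated_incidence(s, infectious, beta, k1),
        media_incidence(s, infectious, beta, m),
    )
```

For fixed S, the saturated term approaches the ceiling β·S/k_1 as I grows, while the media-dependent term peaks at I = 1/m and then declines, which is the non-monotonic behaviour mentioned above.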

In addition to the use of new non-linear functions, the following question can be formulated: in a very general case what assumptions on the function f should guarantee a global stability result for the proposed model?

A possible answer to this question can be found in Wang, Zhang & Liu (2018) (see also Xiao, Tang & Wu (2015)), where the authors considered a very general non-linear f(S, I) with the following assumptions: ($f_2$) $\frac{\partial f_1(S, I)}{\partial S} > 0$ and $\frac{\partial f_1(S, I)}{\partial I} \le 0$ for all S, I ≥ 0.

With the help of the above assumptions, a global stability result is established in Wang, Zhang & Liu (2018).

Taking into account the above studies, we propose the following SEIRS-type model, which builds upon the models from Tang et al. (2020a) and Xiao, Tang & Wu (2015) :

where, as usual, S, E, I and R denote the proportion of the population that is susceptible, exposed, infectious, and recovered at time t, respectively. Similarly to Tang et al. (2020a), we also consider the following functions: A(t) denotes the number of pre-symptomatic cases and H(t) the number of hospitalized cases at time t. Furthermore, $S_q(t)$ denotes the number of quarantined susceptible, $E_q(t)$ the number of isolated exposed, and $I_q(t)$ the number of isolated infected individuals at time t. In such a general framework, we have that $w(S, I, A) = f(S, I, A) + g(S, I, A) + h(S, I, A)$.

The parameters σ and λ describe the transition rate of exposed individuals to the infected class and the rate at which the quarantined uninfected were released into the wider community, respectively, while the parameter ϱ represents the probability of having symptoms among infected individuals. The parameters $\delta_I$ and $\delta_q$ denote the transition rates of symptomatic infected and quarantined exposed individuals to the quarantined infected class. The parameters $\gamma_I$, $\gamma_A$ and $\gamma_H$ represent the recovery rates of symptomatic, asymptomatic and quarantined infected individuals, and finally $\gamma_R$ is the rate at which immunity is lost and recovered individuals move into the pre-symptomatic class (according to a recent NHK-World Japan report and Li et al. (2020a)). We assume that the natural birth and natural death rates are equal.

From their definitions, it follows that the models presented in Tang et al. (2020a) use the following functions:

The motivation for such a choice is the following (Castillo-Chavez, Castillo-Garsow & Yakubu (2003); Xiao, Tang & Wu (2015)): individuals move from quarantined cases with proportion 1 − q to $S_q$ and with proportion q to $E_q$. If the transmission probability is β and the contact rate is c, then the infected quarantined individuals move to $E_q$ at a rate of βcq and the uninfected quarantined individuals move to $S_q$ at a rate of (1 − β)cq. Infected individuals who are not quarantined move to E at a rate of βc(1 − q). The parameter θ represents the relative transmission probability of pre-symptomatic individuals to infected individuals.

In the hope of capturing more accurately the effect of the factors that influence the outbreak progression, we choose the following functions in (3):

Such choice of functions w, f, g, h, incorporates biological, social or environmental processes that could account for temporal changes in transmission rate, in contact rate and in quarantined rate of exposed individuals.

When an epidemiological outbreak occurs, many preemptive actions can be taken to mitigate the spreading. Once people become informed, they can change their behavior, for instance by working from home, practicing social distancing and taking actions such as frequent hand washing, wearing protective apparel, disinfecting, etc., all of which contribute to preventing the spread. When the media interacts with the susceptible population, it starts influencing them to take appropriate measures to minimize the chances of getting infected. This media influence is initially low and increases as the infection increases. This observation suggests the following contact rate function:

where b < 1 and $c_0$ denotes the initial contact rate, while $c_a$ denotes the minimum contact rate under the current control strategies. Thus, $c(0) = c_0$ and $\lim_{t \to \infty} c(t) = c_a$.

Control measures taken in case of a possible outbreak include local and international travel bans, isolation at residence, and quarantine. To account for such countermeasures, we propose an increasing function q(t) for the quarantined rate of exposed individuals, defined by

where the initial quarantined rate is $q(0) = q_0$, and the maximum quarantined rate under the current control strategies is $\lim_{t \to \infty} q(t) = q_1$. Finally, for the transmission rate we propose the following function, which is defined by

where $\beta_0$ denotes the baseline transmission rate, AH(t) denotes the absolute humidity, and ξ is the amplitude of the response of the transmission rate to absolute humidity changes (recent studies indicate that temperature and humidity play a significant role in influenza transmission (Lowen et al., 2007; Shaman et al., 2010; He et al., 2013; Wang et al., 2020); for COVID-19 see (2008)). Note that for the absolute humidity (AH) we use the so-called Clausius-Clapeyron equation (for more information see Iribarne & Godson (1973)), that is, where T(t) denotes the temperature at time t, while RH(t) denotes the relative humidity in percent (0-100).

Using the next generation matrix method (Diekmann, Heesterbeek & Metz, 1990; Van den Driessche & Watmough, 2002; Tang et al., 2020a), one can define the effective daily reproduction number as

Note that R(t) is the number of new infections caused per day by a single infected individual during their infectious period. Thus, from (7) one has that

From the definition of the function c(t), it follows that

In a similar way, one has that $q_0 \le q(t) \le \lim_{t \to \infty} q(t) = q_1$.

Combining the above outcomes, it follows that $c_a (1 - q_1) M \le \frac{R(t)}{\beta(t)} \le c_0 (1 - q_0) M$, or $c_a (1 - q_1) M \beta(t) \le R(t) \le c_0 (1 - q_0) M \beta(t)$.

The above inequality together with the definition of the function β shows, that

AHðtÞ :

In order to get an upper bound for R(t), we consider the function $f : [-20, 50] \to \mathbb{R}$ defined by where $T_{max} = \max_t T(t)$ and $T_{min} = \min_t T(t)$. Thus, combining the above outcomes, one can see that

where $RH_{max} = \max_t RH(t)$, $RH_{min} = \min_t RH(t)$ and

Understanding how to control and eliminate infectious diseases is one of the main goals of mathematical epidemiology. We know that a disease can cause an epidemic if and only if the basic reproduction number (the expected number of secondary cases caused by a primary case in a fully susceptible population, denoted by R) is greater than 1 (Van den Driessche & Watmough, 2002). In order to fight the currently raging outbreak, we need to reduce R to less than 1. According to He et al. (2013) and Ma & Ma (2006), the stability of the system can be connected with the basic reproduction number R of the time-averaged system (obtained by replacing the time-varying parameters with their long-term time averages), that is,

$$R = \hat{\beta} \cdot \hat{c} \cdot (1 - \hat{q}) \left[ \frac{\varrho}{\gamma_I + \delta_I} + \frac{\theta (1 - \varrho)}{\gamma_A} \right] S_0,$$

where $\hat{g}$ represents the time average of the function g, that is,

A simple calculation yields $\hat{c} = c_a$ and $\hat{q} = q_1$ (Fig. 11). Thus, $R = \hat{\beta} \cdot c_a \cdot (1 - q_1) M$.
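Since the time-averaged reproduction number is a product of a few scalars, a short sketch can make its dependence on the control parameters concrete. It assumes that M collects the bracketed factor of the averaged formula, M = [ϱ/(γ_I + δ_I) + θ(1 − ϱ)/γ_A]·S_0, which is what the two expressions for R imply when compared; the function names and every numeric value are hypothetical and are not fitted values from this study.

```python
def transmission_multiplier(rho, gamma_i, delta_i, theta, gamma_a, s0):
    """Assumed form M = [rho/(gamma_I + delta_I) + theta*(1 - rho)/gamma_A] * S_0."""
    return (rho / (gamma_i + delta_i) + theta * (1.0 - rho) / gamma_a) * s0


def averaged_reproduction_number(beta_avg, c_a, q_1, m):
    """R = beta_avg * c_a * (1 - q_1) * M for the time-averaged system."""
    return beta_avg * c_a * (1.0 - q_1) * m


# Hypothetical inputs with S_0 normalised to 1 for readability.
m_value = transmission_multiplier(
    rho=0.5, gamma_i=0.1, delta_i=0.1, theta=0.5, gamma_a=0.25, s0=1.0
)
r_loose = averaged_reproduction_number(beta_avg=0.2, c_a=4.0, q_1=0.5, m=m_value)
r_strict = averaged_reproduction_number(beta_avg=0.2, c_a=2.0, q_1=0.5, m=m_value)
print(m_value, r_loose, r_strict)
```

In this toy setting, halving the long-term contact rate c_a halves R, which illustrates why the limiting contact rate is the main lever in the bound above.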

The correlation provided by Wang et al. (2020) refers to this averaged reproduction number and, if R is numerically solvable, we are confident that our model will show the same sensitivity to temperature changes as the measurements in the aforementioned study. However, we can make the following estimate for R:

Similarly as above,

In the rest of this section we deal with the initial values of the proposed model. All the initial values of the functions and parameters of the model are detailed in Tables 1 and 2. Since the first case in Europe was confirmed in Bordeaux on 24 January, we use the following initial conditions: $E = I = A = S_q = E_q = H = R = 0$, and S = 28,328,178 for Northern Italy and S = 17,757,000 for Southern Italy. For each parameter we provide the acceptable value ranges and, where available, we indicate their source.

Of all these parameters, only $S_0$ and λ are kept fixed; all the others are approximated, starting from their initial values.

We obtained the data regarding the cumulative number of laboratory-confirmed COVID-19 cases, the cumulative number of recoveries, the cumulative number of deaths and the cumulative number of hospitalized patients in Northern and Southern Italy from the dataset provided by the Italian Department of Civil Protection (http://www.protezionecivile.it/web/guest/department). The database is continuously updated and made freely available through a GitHub repository (https://github.com/pcm-dpc/COVID-19/blob/master/dati-regioni/dpc-covid19-ita-regioni.csv).

We counted the following regions as Northern Italy: Aosta Valley, Emilia-Romagna, Friuli-Venezia Giulia, Liguria, Lombardy, Piedmont, Veneto, Trentino-Alto Adige and Tuscany. The regions of Southern Italy are Abruzzo, Apulia, Basilicata, Calabria, Campania, Molise, Sicily and Sardinia.

Temperature and humidity data were retrieved from AccuWeather Inc. (https://www.accuweather.com/) monthly view/daily data and from the API of www.worldweatheronline.com, taking the data of each region's center or of the city's airport, when available. For Northern Italy and Southern Italy, the values of the regions listed above were averaged, and the averages were used as the daily temperature and humidity.
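The averaged daily temperature and relative humidity can be converted to absolute humidity as sketched below. The sketch uses a widely used Clausius-Clapeyron-based approximation with a Magnus-type saturation vapour pressure; the constants (6.112 hPa, 17.67, 243.5 °C, 2.1674) are those of this common approximation and are an assumption here, since the exact expression used in the study is not reproduced in the text. The regional readings are hypothetical.

```python
import math


def absolute_humidity(temp_c, rel_humidity_pct):
    """Approximate absolute humidity in g/m^3.

    temp_c: air temperature in degrees Celsius.
    rel_humidity_pct: relative humidity in percent (0-100).
    """
    # Saturation vapour pressure in hPa (Magnus-type approximation).
    saturation_hpa = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    # Ideal-gas conversion of the partial vapour pressure to a mass density.
    return saturation_hpa * rel_humidity_pct * 2.1674 / (273.15 + temp_c)


def mean(values):
    """Plain arithmetic mean, as used for averaging regional readings."""
    return sum(values) / len(values)


# Hypothetical same-day readings from three regions: (temperature, humidity).
readings = [(8.0, 75.0), (10.5, 68.0), (9.0, 80.0)]
daily_temp = mean([t for t, _ in readings])
daily_rh = mean([rh for _, rh in readings])
daily_ah = absolute_humidity(daily_temp, daily_rh)
print(round(daily_ah, 2))
```

Note that averaging temperature and relative humidity first and converting afterwards is only one possible order of operations; averaging per-region absolute humidity values would give slightly different numbers because the conversion is non-linear.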

In mathematical biology, modeling complex phenomena by ordinary differential equations poses, among others, two main challenges: (i) finding the appropriate model with the right balance of complexity and explanatory power and (ii) correctly estimating the parameters of the selected model. Model parameter estimation often proves difficult for large models, due to the multimodality and combinatorial explosion induced by the many non-linearly linked parameters.

More versatile optimization methods, like evolutionary algorithms and other metaheuristics, are usually more successful but also computationally much more expensive as they search through the entire combinatorial parameter space.

Genetic Algorithms (GA) have been successfully utilized to optimize the parameters of epidemiological models for the Severe Acute Respiratory Syndrome (SARS) (Yan & Zou, 2008).

Table 2 (partial): parameter, definition, range, source.
q_0: initial quarantined rate of exposed individuals; range 5.631e−10 to 5.631e−2.
σ: transition rate of exposed individuals to the infected class; range 1/15-1/3; source WHO.
λ: rate at which the quarantined uninfected were released into the wider community.
Coefficient for contact rate function; range (65-1).

Particle Swarm Optimization (PSO) might be better suited to fit epidemiological models to data, outperforming evolutionary algorithms. PSO is a population-based meta-heuristic where the potential solutions are called "particles" and the population of solutions forms a "swarm". PSO roughly mimics the social behavior of birds flocking or fish schooling for food, hoping to capture the swarm intelligence emerging from the cooperation between individuals (Kennedy & Eberhart, 1995; Clerc, 2010). The method has more stable convergence characteristics than other stochastic methods and it is easily parallelizable. Because the method does not rely on gradients, it is well-suited to optimizing non-linear or discontinuous systems (Jalilzadeh et al., 2009). This property makes it amenable to ODE model parameter estimation, where small parameter variations can result in significantly different prediction errors.

Solutions in PSO are encoded as n-tuples of real numbers. At the start of the method each particle is randomly placed in the n-dimensional space. The method's core concept is that the particles are flown through $\mathbb{R}^n$ in discrete time steps, and at each iteration the particles respond to the quality of the best solutions by being accelerated towards them. PSO keeps track of the overall best solution (and its value) seen so far and of the best solution (and its value) currently present in the swarm. At each iteration, the acceleration toward these solutions is weighted by random terms, in order to enable a better exploration of the solution space.
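For orientation, the update described above is often written in the canonical inertia-weight form below. The symbols are assumptions introduced here and do not appear in the text: $x_i^{(k)}$ and $v_i^{(k)}$ are the position and velocity of particle i at step k, ω is an inertia weight, $\varphi_1, \varphi_2$ are acceleration coefficients, $r_1, r_2$ are random numbers drawn uniformly from [0, 1], g is the best solution seen so far by the swarm, and $p_i$ is the second attractor (the particle's own best position in the canonical form; the variant described above would use the best solution currently present in the swarm instead).

```latex
\begin{aligned}
v_i^{(k+1)} &= \omega\, v_i^{(k)}
  + \varphi_1 r_1 \left(p_i - x_i^{(k)}\right)
  + \varphi_2 r_2 \left(g - x_i^{(k)}\right),\\
x_i^{(k+1)} &= x_i^{(k)} + v_i^{(k+1)}.
\end{aligned}
```

The random factors $r_1$ and $r_2$ are what the text refers to as the random weighting of the acceleration terms.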

As PSO is a Hessian-free optimization method that does not require the derivative of the error function, we choose to satisfy the $L_1$ norm optimality criterion, minimizing the sum of absolute errors.

Formally, we wish to:

$$\arg\min_{P} \sum_{t=0}^{D-1} \left\| I(t) - \left( \bar{C}(t) - \bar{R}(t) - \bar{D}(t) \right) \right\| \quad (9)$$

where P is the parameter set of the model we want to estimate (some parameter values might be already known and fixed), D is the number of days for which we have data available, I(t) is the number of infected people predicted by our model on day t, and $\bar{C}$, $\bar{R}$ and $\bar{D}$ are the reported numbers (data) for the cumulative confirmed cases, the cumulative recoveries and the cumulative deaths by day t, respectively.

As the model fitting is based on measuring the error on I(t), we can estimate the parameters present on the highlighted, black paths from Fig. 2, that is, the directed edges before I. All outgoing parameters from I can only be approximated in aggregate (sum of $\delta_I$, $\gamma_I$), and $\gamma_R$, $\gamma_H$ cannot be assessed. Therefore our analysis models β(t), c(t), q(t), σ, ϱ (for more details on the parameters see "An Epidemiological Model for the SARS-CoV-2 Outbreak" and Table 2) and their effects on I(t). Since, according to the official numbers, death cases are not divided by the compartment from which they come (H or I), in this work we neglect the death rate, that is, we assume α = 0.
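The fitting criterion (9) amounts to comparing the predicted number of infected people with the reported active cases (confirmed minus recovered minus deceased) day by day. A minimal sketch is given below; the function name and the sample numbers are hypothetical, and in an actual fit the predicted series would come from solving the model for a candidate parameter set P.

```python
def l1_fit_error(predicted_infected, confirmed, recovered, deaths):
    """Sum over all days t of |I(t) - (C(t) - R(t) - D(t))|."""
    if not (len(predicted_infected) == len(confirmed)
            == len(recovered) == len(deaths)):
        raise ValueError("all series must cover the same days")
    return sum(
        abs(i_t - (c_t - r_t - d_t))
        for i_t, c_t, r_t, d_t in zip(
            predicted_infected, confirmed, recovered, deaths
        )
    )


# Hypothetical three-day example (cumulative reported counts).
confirmed = [10, 25, 60]
recovered = [0, 2, 10]
deaths = [0, 1, 3]
predicted = [12, 20, 50]          # model output I(t) for some parameter set
error = l1_fit_error(predicted, confirmed, recovered, deaths)
print(error)
```

Here the reported active cases are 10, 22 and 47, so the daily absolute errors are 2, 2 and 3.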

The placement of local optima in real-world optimization problems is usually not random; high-quality solutions tend to be clustered close together (Reeves, 2007; Ochoa et al., 2011). Iterated local search methods (Lourenço, Martin & Stützle, 2019) exploit this structured distribution of local optima, advancing from one local optimum to a better one with the help of random perturbations. Therefore, we propose an iterated PSO method (IPSO) where the particles only explore the vicinity of an established good solution, greatly reducing the search space. Once the exploration around one solution has finished and a better local optimum has been identified, the method moves on to explore the proximity of that solution.

Reduction of the solution space around one solution is achieved by:

1. Restricting the PSO to a search on [0, 1]^n.

2. Using the particles not as direct solutions, but as scaling factors that remap the solution whose neighborhood we are currently exploring, as presented in Algorithm 1.
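The body of Algorithm 1 is not reproduced in this text, so the following is only a minimal sketch of one way such a remapping could work, not the authors' listing. The function name `remap_particle`, the `width` parameter (the relative size of the explored neighborhood) and the clipping to `bounds` are all assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def remap_particle(
    particle: Sequence[float],
    center: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    width: float = 0.5,
) -> List[float]:
    """Map a particle in [0, 1]^n to parameters near the solution `center`."""
    if not (len(particle) == len(center) == len(bounds)):
        raise ValueError("particle, center and bounds must have the same length")
    remapped = []
    for p, s, (low, high) in zip(particle, center, bounds):
        if not 0.0 <= p <= 1.0:
            raise ValueError("particle coordinates must lie in [0, 1]")
        # Assumed scheme: p = 0.5 leaves the parameter unchanged, p = 0 and
        # p = 1 scale it by (1 - width) and (1 + width) respectively.
        scale = 1.0 + width * (2.0 * p - 1.0)
        # Keep the candidate inside the admissible parameter range.
        remapped.append(min(max(s * scale, low), high))
    return remapped


# Hypothetical two-parameter example.
unchanged = remap_particle([0.5, 0.5], [2.0, 0.1], [(0.0, 10.0), (0.0, 1.0)])
stretched = remap_particle([1.0, 0.0], [2.0, 8.0], [(0.0, 10.0), (0.0, 10.0)])
```

With a scheme of this kind the swarm always searches the same unit hypercube, while the region of parameter space it covers moves with the current best solution.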

To minimize the compute time for the parameter estimation, the number of particles and the number of generations per PSO run are gradually increased with each iteration. In this way, more compute resources are allocated to the later phases, when the method has to improve on already good parameter estimations.

We estimated all relevant parameters of our model (see Fig. 2) based on the epidemiological, weather and humidity data for Northern and Southern Italy for the interval 24/01/2020-23/05/2020 inclusive, totaling 120 days. The 99 day period of 24/05/2020-31/08/2020 inclusive was reserved for validation. The initial conditions for the functions from Table 1 were chosen because the first case in Europe was confirmed in Bordeaux on 24 January (https://www.bbc.com/news/world-europe-52526554). Two more cases were confirmed in Paris by the end of the day; all of them originated from China. Until the first case in Northern Italy, on 21/02/2020 (https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Italy), we chose every initial value as 0.

For both parameter set estimations, the IPSO was run for 20 iterations. For each iteration i, the particle swarm size was set to 100 · i and the number of generations to 5 · i.
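The growth of the compute budget under this schedule can be tabulated with a short sketch. The function name `ipso_schedule` is hypothetical, and the evaluation count assumes one error evaluation per particle per generation, which may differ from the bookkeeping of the PSO implementation actually used.

```python
from typing import List, Tuple


def ipso_schedule(
    iterations: int = 20, swarm_step: int = 100, generation_step: int = 5
) -> List[Tuple[int, int]]:
    """Return (swarm size, generations) for IPSO iterations 1..iterations."""
    return [(swarm_step * i, generation_step * i) for i in range(1, iterations + 1)]


schedule = ipso_schedule()
first_run, last_run = schedule[0], schedule[-1]

# Assumption: one error evaluation per particle per generation.
total_evaluations = sum(size * generations for size, generations in schedule)
```

Under that assumption the last iteration alone (2,000 particles for 100 generations) accounts for 200,000 of roughly 1.4 million evaluations, which reflects the stated intent of spending more resources on refining already good estimates.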

The dashboard is a serverless webpage, developed in Svelte (https://svelte.dev), that includes three main components. The first component is the data entry part, the second part concerns the data visualization, and lastly, the third one is a differential equation solver.

Data entry is further divided into two major parts. Users can enter specific data by selecting values from ranges. The values of parameters that are not time dependent can be modified by range inputs (see Fig. 3). Parameters that vary over time, such as temperature, humidity, etc., are defined with the help of interactive charts, where users can place and move around so-called "control points" (see Fig. 4). The points are linked using cubic spline interpolation, which fits the given data with piecewise polynomial functions while avoiding overfitting (Xie, 1984). The resulting spline interpolation defines the parameters' values over time.
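How a handful of control points turns into a smooth time-dependent parameter can be illustrated with a small pure-Python cubic spline. This is a sketch under assumptions: it uses natural boundary conditions (zero second derivative at both ends), which may differ from what the dashboard uses, and the name `natural_cubic_spline` and the sample points are hypothetical.

```python
import bisect
from typing import Callable, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1  # number of segments
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two control points with matching y values")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("x coordinates must be strictly increasing")

    # Second derivatives at the control points; natural ends keep m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        # Tridiagonal system for the interior points, solved by elimination.
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
            for k in range(n - 1)
        ]
        for k in range(1, n - 1):
            factor = h[k] / diag[k - 1]
            diag[k] -= factor * h[k]
            rhs[k] -= factor * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x: float) -> float:
        # Pick the segment containing x; values outside the range reuse the end segments.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Hypothetical control points: (day, parameter value).
spline = natural_cubic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
midpoint_value = spline(0.5)
```

The returned function passes through every control point, so moving a point in the chart changes the curve in a predictable way.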

Algorithm 1: Remapping of the particles to parameter estimations.

Complementing the easy manual parameter definition, the visualization dashboard also provides a way to meticulously fine-tune the input data. The Parameter import/export section allows the user to import initial values and parameter data in the JSON format, and to export the current settings in the same format. After an export, the parameters in the JSON file can be modified to the desired precision, and the user can then re-import the file.

The data visualization is realized using the Chart.js open-source JavaScript library (see Fig. 5). The user can choose which functions to visualize, making the diagram less crowded and more understandable.

The dashboard implements a "reactive" user interface, meaning that the user immediately sees the effect of each change to the inputs. Therefore, in order to satisfy the real-time update constraints, it was paramount to be able to compute the model's outputs very efficiently. The classic fourth-order Runge-Kutta (RK4) method, implemented in the Runge-Kutta library (https://www.npmjs.com/package/runge-kutta), suited our needs.
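For readers unfamiliar with RK4, the following Python sketch shows the classic scheme on a system of ordinary differential equations. It is a generic illustration, not the API of the JavaScript library used by the dashboard: the names `rk4_step` and `integrate` are hypothetical, and the exponential decay test problem stands in for the model equations.

```python
import math
from typing import Callable, List, Sequence

Derivative = Callable[[float, Sequence[float]], List[float]]


def rk4_step(f: Derivative, t: float, y: Sequence[float], dt: float) -> List[float]:
    """Advance the state y by one classic fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [
        yi + dt / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f: Derivative, y0: Sequence[float], t0: float, t1: float, steps: int) -> List[float]:
    """Integrate dy/dt = f(t, y) from t0 to t1 with a fixed number of steps."""
    dt = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


# Stand-in problem: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t).
decay = integrate(lambda t, y: [-y[0]], [1.0], 0.0, 1.0, 100)
decay_error = abs(decay[0] - math.exp(-1.0))
```

Each step needs only four evaluations of the right-hand side, which is what makes a fixed-step scheme like this cheap enough for interactive updates.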

The model parameter estimations are presented in Tables 3 and 4. Table 4 contains the approximated parameter values. As depicted in Fig. 6 , the estimated parameters provide a very good fit to the reported daily infectious cases data, both on the training and the beginning of the validation period.

The Root Mean Square Error, defined as

$$\mathrm{RMSE}_D = \sqrt{\frac{1}{D} \sum_{t=1}^{D} \left( \bar{I}(t) - I(t) \right)^2}$$

with D denoting the number of days in the prediction range and Ī(t) and I(t) being the actual and modeled active cases on the t-th day, is presented in Fig. 7 for both the training and validation periods.
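The RMSE over a range of D days can be computed as in the sketch below. The function name `rmse` and the toy series are assumptions for illustration and are not taken from the outbreak data.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], modeled: Sequence[float]) -> float:
    """Root mean square error between actual and modeled active cases."""
    if len(actual) != len(modeled) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared_error = sum((a - m) ** 2 for a, m in zip(actual, modeled))
    return math.sqrt(squared_error / len(actual))


# Toy series (hypothetical): the model is off by 3 cases on each of two days.
toy_rmse = rmse([0.0, 0.0], [3.0, 3.0])
```

Unlike the sum of absolute errors used for fitting, the RMSE penalizes large deviations on individual days more heavily, so it is a stricter check on the validation period.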

In this section we present some aspects of the model parameters.

a) Similarly to the results reported for Italy in Kozioł, Stanisławski & Bialic (2020), the modeled transmission, depicted in Fig. 6, is higher than the actual one for the first 5 weeks, while in the second half of the validation period the actual transmission is significantly higher than the modeled one. A possible explanation put forward in Kozioł, Stanisławski & Bialic (2020) is that these discrepancies arise, respectively, from the absence or relatively low number of daily tests at the beginning of the outbreak, and from the gradual easing of restrictions and the onset of the second wave. Kozioł, Stanisławski & Bialic (2020) reported an RMSE_120 of 3,283. As seen in Fig. 7, for both regions the RMSE stays at the same level, or even slightly decreases, in the first 60 days of the validation period. Towards the end of the validation period the RMSE rises quickly for both regions, as the model is not able to predict the onset of the second wave.

c) The proposed model assumes that the contact rate (Eq. (6)) is a monotonically decreasing function; therefore it cannot account for multiple waves that might occur after social distancing measures are relaxed. Several studies investigate ways of modeling multi-wave epidemic outbreaks. The proposed models fit waves to the measured data and extrapolate potential outcomes and repeated outbreaks; nevertheless, they are not able to predict future subepidemics, only to fit past or ongoing waves and extrapolate them. A mechanistic model (Camacho & Cazelles, 2013) tries to benefit from knowledge about the immune response of infected individuals, specifically that there exist two types of immune response, providing short-term and long-term immunity, which opens the possibility of reinfection. Another model evaluates regions based on a classification of the population as susceptible in the first wave or in later waves. That research finds that the geolocalization and travel of individuals, combined with delayed immune reactions, can provide the basis of multiple waves (Kaxiras & Neofotistos, 2020).

Indeed, our model cannot handle multi-wave pandemics. The aim of the proposed model is rather to study an ongoing epidemic or subepidemic, providing a useful parameter array that helps identify possible directions of public reaction aimed at changing the given epidemic curve. Subsequent outbreaks, whether seasonal or induced by an ineffective ending of the current outbreak (depending on public health status, social migration and implemented public actions), are thus parametrized by the final status and outcomes of the previous wave. Like the studies mentioned above, we are not yet ready to predict or extrapolate multiple waves ahead with sufficient precision based on the current epidemiological status. As a future research direction, we are considering a more complex model that eventually forecasts or combines multi-wave situations.

d) In the case of Northern Italy, the initial contact rate is slightly higher than in the case of South Italy. It is possible that in that period tourism was more prevalent in Northern Italy than in South Italy due to winter activities. Looking at the approximated contact rate parameter and the evolution of c(t) in Fig. 8, we can observe a quick drop. Italian officials implemented strict restrictions on movement, which led in time to a major decrease of the effective contact rate. Home confinement led to a reduction of the contact rate of as much as 90%. By modeling the contact rate as a function that decreases under the influence of governmental actions, we obtain a shrinkage that reaches 50% in approximately 90 days and tends toward the minimum contact rate (c_a), which is 92.29% smaller than the initial contact rate. Our model's results seem to confirm the effectiveness of the measures implemented in Italy. As in both regions the contact rate converges to the minimum contact rate (c_a), we can confirm that preventative measures were successful. In another scenario, many asymptomatic or unknown infections are missed by the official case counts, as they cannot be recorded (see Liu et al., 2020d). Therefore, even when in reality the contact rate has dropped more, the share of unaccounted cases that become critical and need hospitalization keeps I(t) inflated.

As the model has no awareness of unaccounted cases, it compensates for the fast growth in I(t) by maintaining a high contact rate. Most probably, both factors play a role in the elevated c_a and b values.

e) A pioneering step in epidemiological studies was the inclusion of two of the main environmental parameters, temperature and humidity, as basic natural factors influencing virus spreading (Lowen et al., 2007; He et al., 2013). Experiments conducted with the influenza virus on animals showed a connection between the virus transmission rate, temperature and humidity: decreasing temperature and increasing humidity lead to an increased virus transmission rate (Lowen et al., 2007). Lowen et al. (2007) included this theory in their SIR model using the temperature as a relaxing factor in the time-dependent transmission probability function. The median value for ξ in He et al. (2013) is 0.057. However, the present paper suggests that warmer temperatures slow COVID-19 transmission, but not significantly, as the ξ values suggest both for Northern Italy (0.0044) and Southern Italy (0.0023). This statement is supported by studies such as Sehra et al. (2020) and Gupta, Raghuwanshi & Chanda (2020). The introduction of the time-dependent transmission probability β(t) improved our fit to the empirical data and also indicates that weather conditions do not have a significant effect on the spread of COVID-19 (the obtained graph of the function β can be seen in Fig. 9). From the definition of the daily reproduction number (the obtained function can be seen in Fig. 10), we see that

We ran the approximation process for the minimum, maximum and average daily temperature and humidity of the given time period, which also suggests that temperature does not have a significant effect. However, as it takes its highest values when we use the minimal temperatures, bounds were set for the daily reproduction number. Note that our proposed parameter estimation method does not approximate the values of the parameters δ_I and γ_I individually. However, we can estimate the sum of δ_I and γ_I (see Fig. 2), and for γ_A we used the value calculated in Tang et al. (2020a). As a final remark on R(t), the definition of the daily reproduction number for the model incorporating the disease-induced mortality rate would be

$$R_\alpha(t) = \left[ \frac{\rho\,\beta(t)\,c(t)\,(1 - q(t))}{\delta_I + \alpha + \gamma_I} + \frac{\theta\,(1 - \rho)\,\beta(t)\,c(t)\,(2 - q(t))}{\gamma_A} \right] S_0.$$

Since α > 0, it follows that R_α(t) ≤ R(t). The approximation process for the first wave (until 2 July 2020) gives the values 0.2369 for Northern Italy and 0.1323 for Southern Italy.
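The inequality can be made explicit with a short calculation. It assumes that R(t) is the expression for the mortality-inclusive reproduction number with α set to 0, which matches the assumption α = 0 made for the fitted model, and that ρ, β(t), c(t) and S_0 are nonnegative with q(t) ≤ 1. The second term, with denominator γ_A, does not contain α and cancels in the difference:

```latex
\begin{aligned}
R(t) - R_\alpha(t)
  &= \rho\,\beta(t)\,c(t)\,\bigl(1 - q(t)\bigr)\,S_0
     \left[ \frac{1}{\delta_I + \gamma_I} - \frac{1}{\delta_I + \alpha + \gamma_I} \right] \\
  &= \rho\,\beta(t)\,c(t)\,\bigl(1 - q(t)\bigr)\,S_0\,
     \frac{\alpha}{(\delta_I + \gamma_I)(\delta_I + \alpha + \gamma_I)} \;\ge\; 0 .
\end{aligned}
```

A positive mortality rate therefore shortens the time spent in the infected compartment and can only lower the reproduction number under these assumptions.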

Figure 10. The graph of the function R(t): Northern Italy (A), South Italy (B).

Figure 11. The graph of the function q(t): Northern (A) and Southern Italy (B).

f) There are still many unknowns and uncertainties regarding SARS-CoV-2, including one of the key epidemiological parameters, the incubation period and its distribution. Correct assessment of the incubation period helps determine the appropriate duration of quarantine and indicates how far contact tracing of suspected individuals should go. In the absence of data on the SARS-CoV-2 incubation period, initial parameter estimations for COVID-19 epidemiological models (Tang et al., 2020a, 2020b, 2020c) have assumed a longer incubation period. Field reports and later studies suggested shorter incubation periods, of 5.8 days (Backer, Klinkenberg & Wallinga, 2020), 5.2 days (Li et al., 2020b), 5 days, 4 days (Read et al., 2020) and even 3 days (Guan et al., 2020; Lin et al., 2020). Currently, the WHO (https://www.who.int/news-room/q-a-detail/q-a-coronaviruses) states that "most estimates of the incubation period for SARS-CoV-2 range from 1-14 days, most commonly around 5 days." The estimated parameter for the transition rate of exposed individuals to the infected class in the proposed model is 0.1706, resulting in a mean incubation period of 5.86 days. This value almost matches the estimated value of 5.8 days (Backer, Klinkenberg & Wallinga, 2020) gathered from travelers with confirmed 2019-nCoV infection in the Wuhan, China region in the early stage of the outbreak.
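The step from the transition rate to the mean incubation period can be written out as follows. It relies on the standard compartmental-model assumption that individuals leave the exposed class at a constant rate σ, so that the time spent there is exponentially distributed with mean 1/σ; the symbol σ for the transition rate is our reading of the parameter list.

```latex
\text{mean incubation period} \;=\; \frac{1}{\sigma} \;=\; \frac{1}{0.1706\ \text{day}^{-1}} \;\approx\; 5.86\ \text{days}
```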

The proposed parameter estimation procedure measures the error on I(t); therefore it can correctly approximate only the parameters that directly influence it, namely the ones related to the contact rate, the transmission probability, the incubation period and the probability of having symptoms among infected individuals. With more detailed data, including a breakdown of the number of isolated individuals, the number of deceased individuals per compartment and the number of recovered individuals (not only among hospitalized people), the error function could be extended to minimize errors on these values as well, providing even better insight into the epidemiological dynamics described by our proposed model.

According to the WHO reports (www.who.int), time-dependent mathematical models can help understand and predict the transmission of the outbreak, support the impact evaluation of existing and future public health interventions, and provide a better grasp of the severity of ongoing outbreaks. Factors that have a major influence on the spread of SARS-CoV-2 may evolve and change over time. To account for such changes, in this paper we propose a time-dynamic, temperature-dependent SEIR-type extended epidemiological model.

The paper also proposes an Iterated Particle Swarm Optimization (IPSO) method that efficiently explores only the vicinity of already established local optima. The method searches iteratively around better and better local optima and is able to accurately estimate all the required parameters of the model in a reasonable running time.

By matching the historical data closely, the model can provide more accurate near-projection of the trend and peak time estimations. Analyzing and comparing the estimated contact rate and the estimated transmission probability from different outbreaks might also advance the understanding of the unfolding trends.

Stability analysis of a stochastic SIR epidemic model with specific nonlinear incidence rate

Particle swarm optimization

The impact of media on the control of infectious diseases

On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations

Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model

An IDEA for short term outbreak projection: nearcasting using the basic reproduction number

Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

Estimating the last day for COVID-19 outbreak in mainland China

Clinical characteristics of 2019 novel coronavirus infection in China

Effect of weather on COVID-19 spread in the US: a prediction model for India in 2020

Inferring the causes of the three waves of the 1918 influenza pandemic in England and Wales

A discrete stochastic model of the COVID-19 outbreak: forecast and control

Atmospheric thermodynamics

Output feedback UPFC controller design by using quantum particle swarm optimization

On the dynamics of a delayed SIR epidemic model with a modified saturated incidence rate

Multiple epidemic wave model of the COVID-19 pandemic: modeling study

Modeling infectious diseases in humans and animals

Particle swarm optimization

A contribution to the mathematical theory of epidemics

SIR models: differential equations that support the common good

Fractional-order SIR epidemic model for transmission prediction of COVID-19 disease

Complete global stability for an SIRS epidemic model with generalized non-linear incidence and vaccination

Coronavirus infections and immune responses

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)

A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action

The effectiveness of age-specific isolation policies on epidemics of influenza A (H1N1) in a large city in central South China

Modeling the situation of COVID-19 and effects of different containment strategies in China with dynamic differential equations and parameters estimation

A COVID-19 epidemic model with latency period

A model to predict COVID-19 epidemics with applications to South Korea

Understanding unreported cases in the COVID-19 epidemic outbreak in Wuhan, China, and the importance of major public health interventions

A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics

Iterated local search: framework and applications

Influenza virus transmission is dependent on relative humidity and temperature

The role of absolute humidity on transmission rates of the COVID-19 outbreak

Epidemic threshold conditions for seasonally forced SEIR models

Effects of temperature variation and humidity on the death of COVID-19 in Wuhan

Deep learning via Hessian-free optimization

Identifying future disease hot spots: infectious disease vulnerability index

Numerical study of SARS epidemic model with the inclusion of diffusion in the system

A survey of truncated-Newton methods

A simplex method for function minimization

Compartmentalized mathematical model to predict future number of active cases and deaths of COVID-19

Numerical optimization

High temperature and high humidity reduce the transmission of COVID-19

An SEIR epidemic model with relapse and general nonlinear incidence rate with application to media impact

School closure and mitigation of pandemic (H1N1) 2009, Hong Kong

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study

Global analysis of an epidemic model with nonmonotone incidence rate

Media impact switching surface during an infectious disease outbreak

Quadratic and cubic spline interpolation

Optimal and sub-optimal quarantine and isolation control in SARS epidemics

Optimal policies for control of the novel coronavirus (COVID-19)

Effects of media reporting on mitigating spread of COVID-19 in the early phase of the outbreak

Bifurcations of an epidemic model with non-monotonic incidence rate of saturated mass action

We would like to thank the reviewers for their thoughtful comments and efforts towards improving our manuscript.

Funding Csaba Farkas has been supported by the Sapientia Foundation Institute for Scientific Research, Romania, Project No. 17/11.06.2019. Boróka Olteán-Péter has been supported by the Sapientia Hungariae Foundation-Collegium Talentum project. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The following grant information was disclosed by the authors: Sapientia Foundation Institute for Scientific Research, Romania: 17/11.06.2019. Sapientia Hungariae Foundation.

The authors declare that they have no competing interests.

Csaba Farkas conceived and designed the experiments, performed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft. David Iclanzan conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. Boróka Olteán-Péter performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. Géza Vekov analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

The following information was supplied regarding data availability: Initial condition for the proposed model is available at GitHub: https://github.com/pcm-dpc/COVID-19. The dataset provided by Johns Hopkins University Center is also available at GitHub: https://github.com/CSSEGISandData/COVID-19. PSO is available at Distributed Evolutionary Algorithms in Python (DEAP): https://deap.readthedocs.io/en/master/examples/pso_basic.html.