key: cord-0775747-adynorbt authors: Firmino, Paulo Renato Alves; de Sales, Jair Paulino; Gonçalves Júnior, Jucier; da Silva, Taciana Araújo title: A non-central beta model to forecast and evaluate pandemics time series date: 2020-08-23 journal: Chaos Solitons Fractals DOI: 10.1016/j.chaos.2020.110211 sha: 1c35f6ce3f00db0e8249c129124d5ba86588127a doc_id: 775747 cord_uid: adynorbt

Governments, researchers, and health professionals have been challenged to model, forecast, and evaluate pandemic time series (e.g. new coronavirus SARS-CoV-2, COVID-19). The main difficulty is the level of novelty imposed by these phenomena. Information from previous epidemics is only partially relevant. Further, the spread is location-dependent, reflecting a number of dynamic social, political, economic, and environmental factors. The present paper aims to provide a relatively simple way to model, forecast, and evaluate the time incidence of a pandemic. The proposed framework makes use of the non-central beta (NCB) probability density function. Specifically, a probabilistic optimisation algorithm searches for the best NCB model of the pandemic, according to the mean square error metric. The resulting model allows one to infer, among others, the general peak date, the ending date, and the total number of cases, as well as to compare the level of difficulty imposed by the pandemic among territories. Case studies involving COVID-19 incidence time series from countries around the world suggest the usefulness of the proposed framework in comparison with some of the main epidemic models from the literature (e.g. SIR, SIS, SEIR) and established time series formalisms (e.g. exponential smoothing, ETS; autoregressive integrated moving average, ARIMA).

Pandemics have been one of the main threats to the sustainable development of territories.
Most recently, in December 2019, a number of novel coronavirus-infected pneumonia (NCIP) cases caused by infection with a novel coronavirus, SARS-CoV-2 [1], were recorded in Wuhan, a large metropolitan city in China; the disease is named COVID-19 hereafter. Accelerated by human migration, exported cases have been reported in several regions of the world, including Europe, Asia, America, and Oceania [2]. As of 17 August 2020, an estimated total of 21,549,706 confirmed cases of COVID-19, including 767,158 deaths, had been reported [3]. Mathematically describing and predicting the spread of a pandemic is an arduous task that obviously offers no guarantee of success. For instance, pandemic incidence time series are sensitive to a number of environmental, social, political, technological, and economic variables. Thus, any variation of the underlying scope might alter [...] infectious-removed'), SIS ('susceptible-infectious-susceptible'), and so on. Sustained oscillations in SIR and related models based on differential equations are frequently described using delay differential equations, periodic forcing terms involving sine or cosine functions, and/or age structure [10]. However, fitting epidemiological models to real data poses significant challenges because most epidemics do not conform to the assumptions underlying the basic model formulations. In particular, populations are rarely spatially homogeneous, and disease transmission varies with age and other individual-level factors [11]. Regarding the COVID-19 pandemic, SEIR variants [12, 13] have been introduced to predict its chronological incidence from the preceding time series. In fact, time series forecasting methods relying on historical surveillance data have been proposed to detect abnormal behaviour of infectious diseases [14]. Exponential smoothing [15] [16] [17] [18], autoregressive integrated moving average (ARIMA) [19] [20] [21], and decomposition methods [14, 22] are commonly considered.
Multiplicative SARIMA models for quarterly measles infections [23] have also been considered. Along these lines, some authors [24] have compared SARIMA and self-exciting threshold autoregressive (SETAR) models when forecasting monthly pneumonia cases. Furthermore, combined approaches [25, 26] and machine learning techniques [27] [28] [29] have also been applied to health time series forecasting. In the context of coronaviruses, statistical models for MERS-CoV [4, 30] and SARS [31] have been attractive. Regardless of the time series formalism adopted, one can distinguish two types of frameworks: short-term and middle-long-term predictors. Among the former, Roosa et al. [32] modelled the number of cases of a disease in Chinese provinces to predict 5, 10, and 15 days ahead via generalised logistic growth, Richards growth, and sub-epidemic wave models. An improved adaptive neuro-fuzzy inference system (ANFIS) was also introduced for the incidence of COVID-19 in China over a 10-day horizon [33]. ARIMA models were in turn proposed to predict COVID-19 daily incidence [34]. COVID-19 forecasting models based on convolutional neural networks were also proposed for next-day prediction [35]. It must be emphasised that short-term forecasts are useful to assist operational management. On the other hand, middle-long-term frameworks, though more challenging, are paramount to plan and control pandemic intervention processes. The present paper offers an alternative way to study pandemics in this challenging scenario. A middle-long-term forecasting approach based on the non-central beta (NCB) probability distribution is designed to deal with disease time incidence. The framework assumes that, regardless of the real trajectory of the pandemic time incidence, at least one cycle of three phases is expected: (i) exponential increase, (ii) plateau, and then (iii) decrease.
The shape of the proposed model is optimally adjusted to the available incidence time series and then extrapolated to infer the number of daily cases over a time horizon determined by the analyst. Thus, a number of relevant statistics (e.g. the global peak date, the end date, the total number of infected cases, and the speed at which new cases occur during Phases (i) and (iii)) can be inferred. The rest of the paper is organised as follows. Section 2 introduces the insights underlying the proposed approach, highlighting the aforementioned phases in the COVID-19 daily incidence time series of thirteen countries around the world. Section 3 then presents the proposed method in detail, with emphasis on the near-optimal fit of the NCB-based model to the available trajectory of the pandemic daily incidence. Section 4 shows the promising performance of the NCB-based models in comparison with alternatives from the literature (SIR, SEIR, SIS, exponential smoothing (ETS), and ARIMA) when modelling and forecasting the COVID-19 daily incidence time series of thirteen countries. Cases involving multiple peaks are also addressed. This section also uses the NCB framework to compare the challenge imposed by COVID-19 on the countries taken into account. Section 5 offers some concluding remarks. As previously mentioned, regardless of the level of novelty of a given pandemic, the shape of the time incidence is expected to involve three consecutive phases: (i) exponential increase, (ii) plateau, and then (iii) decrease. This dynamic cycle can involve a single peak or multiple peaks, reflecting virus mutations, variations in public health intervention policies, technological improvements, and so on (e.g. Spanish flu [36], H1N1 [37], Zika fever [38], dengue and chikungunya [38], and COVID-19 [39]). Regarding COVID-19, Fig. 1 sketches the daily incidence in countries around the world until 2020-06-26.
The time series are provided by Johns Hopkins University [40]. One can see that Argentina, Brazil, and India are in Phase (i). In turn, Iran and the United States of America (US) seem to be transitioning between Phases (i) and (iii), with more than one relevant peak. The remaining countries seem to be in Phase (iii), overcoming COVID-19. Especially for the countries in Phase (i), predicting when the incidence time series will reach Phases (ii) and (iii) is a hard task. Common time series formalisms, such as ARIMA, ETS, support vector regression, and artificial neural networks [41], are usually unable to predict the plateau and decreasing phases when trained on data from Phase (i) only. Further, these approaches struggle to perform middle-long-step-ahead predictions from small training data sets. The present paper aims to address these issues by adapting probability density functions that can shape one cycle of Phases (i), (ii), and (iii) even when only data from Phase (i) are available. The one-cycle exponential increase-plateau-decrease shape of pandemic evolution in some territories is also common to a number of probability density functions (PDFs). In fact, a PDF, say $f(x)$, cannot assume negative values and must integrate to one [42], leading to exponentially decreasing or bounded tails. The NCB family provides a flexible distribution and has wide applications in statistical analysis [43]. It can fit symmetric as well as asymmetric behaviours, also allowing exponential increase and decrease around a maximum point. Mathematically, the NCB PDF is given by (Nadarajah [44])

$$f_X(x \mid \alpha, \beta, \lambda) = \sum_{j=0}^{\infty} \frac{e^{-\lambda/2} (\lambda/2)^{j}}{j!} \, \frac{\Gamma(\alpha + \beta + j)}{\Gamma(\alpha + j)\, \Gamma(\beta)} \, x^{\alpha + j - 1} (1 - x)^{\beta - 1}, \qquad (1)$$

in which $x \in [0, 1]$ is an instance of the NCB-distributed random variable taken into account (say $X$), $\alpha$ ($> 0$) and $\beta$ ($> 0$) are shape parameters, $\lambda$ ($\ge 0$) is the non-centrality parameter, and $\Gamma(y) = \int_0^{\infty} z^{y-1} e^{-z}\, dz$ is the gamma function [42].
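The series in Eq. (1) can be evaluated numerically by truncating the Poisson mixture once its weights become negligible. The Python sketch below is not the authors' code (they relied on R's dbeta [45]); it assumes the Poisson($\lambda/2$) weighting written in Eq. (1), works in log space to avoid overflowing the gamma terms, and, as a simplification, returns 0 at the endpoints $x = 0$ and $x = 1$, which is exact only when $\alpha > 1$ and $\beta > 1$. The function name `ncb_pdf` and its tolerance arguments are illustrative.

```python
import math


def ncb_pdf(x: float, alpha: float, beta: float, lam: float,
            tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Eq. (1): non-central beta density as a Poisson(lam/2) mixture of Beta(alpha + j, beta)."""
    if not 0.0 < x < 1.0:
        # Simplification: density set to 0 at the endpoints (exact only if alpha > 1 and beta > 1).
        return 0.0
    half = lam / 2.0
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(max_terms):
        if half == 0.0:
            if j > 0:
                break  # lambda = 0: only the j = 0 term (classical beta) remains
            log_w = 0.0
        else:
            # log of the Poisson weight e^(-lam/2) (lam/2)^j / j!
            log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        a = alpha + j
        # log of Gamma(a + beta) / (Gamma(a) Gamma(beta)) * x^(a - 1) * (1 - x)^(beta - 1)
        log_b = (math.lgamma(a + beta) - math.lgamma(a) - math.lgamma(beta)
                 + (a - 1.0) * log_x + (beta - 1.0) * log_1mx)
        term = math.exp(log_w + log_b)
        total += term
        # The weights peak near j = lam/2; stop once past that point and terms are negligible.
        if j > half and term < tol * total:
            break
    return total


print(ncb_pdf(0.3, 2.0, 5.0, 0.0))   # classical Beta(2, 5) density at 0.3
print(ncb_pdf(0.3, 2.0, 5.0, 10.0))  # same shapes, mode shifted to the right by lambda = 10
```

With $\lambda = 0$ the first call reproduces the Beta(2, 5) density, $30 \cdot 0.3 \cdot 0.7^4 \approx 2.1609$, in line with the statement that the NCB reduces to the classical beta in that case.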
When $\lambda = 0$, the NCB distribution reduces to the classical beta distribution; otherwise, the greater $\lambda$, the further the mode of $X$ shifts to the right. Thus, skewness depends on the combination of $\lambda$ and the pair $(\alpha, \beta)$. For instance, taking $\lambda = 0$, $\alpha = \beta$ yields symmetric distributions, while $\alpha > \beta$ ($\alpha < \beta$) implies negative (positive) skew. In the case of pandemics, negative (positive) skew reflects cases in which the period of increasing infection probability (Phase (i)) is longer (shorter) than the period of decreasing infection probability (Phase (iii)). Therefore, a negative skew might be preferred, allowing more time to plan and review intervention policies during Phase (i), and presenting [...] dbeta function of R [45]. The present paper thus aims to adapt Eq. (1) to fit the behaviour of a one-cycle pandemic incidence time series. Accordingly, situations involving multiple local peaks are treated as transient periods, between Phases (i) and (ii) or between Phases (ii) and (iii). It is argued that this reasoning, though limiting, maintains the simplicity of the framework. However, unlike the usual practice of fitting PDFs to a given frequency distribution, here an NCB PDF-based approach is fitted to the time trajectory of the pandemic incidence. The proposed framework is summarised in Fig. 2. Three steps are considered: Pre-processing, Modelling, and Forecasting. In the Pre-processing step, the available incidence time series (of size $N$), say $u = (u_1, \ldots, u_t, \ldots, u_n, \ldots, u_N)$, is first partitioned into two sets. The training set involves the first $n$ points, $n < N$. For instance, based on the training time series, one can compute the cumulative pandemic incidence until instant $n$, say $Cum_n = \sum_{t=1}^{n} u_t$.
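For $\lambda = 0$, the sign rule stated at the start of this paragraph can be checked against the closed-form skewness of the classical beta distribution, $2(\beta - \alpha)\sqrt{\alpha + \beta + 1} / ((\alpha + \beta + 2)\sqrt{\alpha\beta})$. The short Python sketch below only covers this $\lambda = 0$ case (for $\lambda > 0$ the skewness also depends on $\lambda$ and has to be computed numerically); the helper `phase_reading` is a hypothetical illustration of the interpretation given in the text, not part of the original framework.

```python
import math


def beta_skewness(alpha: float, beta: float) -> float:
    """Skewness of the classical beta distribution, i.e. the NCB with lambda = 0."""
    return (2.0 * (beta - alpha) * math.sqrt(alpha + beta + 1.0)
            / ((alpha + beta + 2.0) * math.sqrt(alpha * beta)))


def phase_reading(alpha: float, beta: float) -> str:
    """Hypothetical helper: translate the sign of the skew into the Phase (i)/(iii) reading."""
    s = beta_skewness(alpha, beta)
    if s < 0:
        return "Phase (i) longer than Phase (iii)"   # negative skew
    if s > 0:
        return "Phase (i) shorter than Phase (iii)"  # positive skew
    return "Phases (i) and (iii) of equal length"    # symmetric case, alpha == beta


for a, b in [(3.0, 3.0), (5.0, 2.0), (2.0, 5.0)]:
    print(a, b, round(beta_skewness(a, b), 4), phase_reading(a, b))
```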
The remaining $N - n$ points are left for evaluating the performance of the prediction model. Besides $n$, the analyst must determine the time horizon of the study, say $TH$ ($> N$). The resulting NCB model will then forecast the incidence time series from instant 1 to instant $TH$. Based on $TH$, the time indexes are normalised in order to allow the use of the NCB PDF. Let the set of normalised time indexes be given by $\{x_t = (t - 1)/(TH - 1) : t = 1, \ldots, TH\}$.

Fig. 2. The proposed framework. In the Pre-processing step, the time indexes ($t$) are normalised, which allows one to fit the corresponding incidence time series ($u_t$) via an NCB-based model, say $\hat{u}_t$, in the Modelling step. In the Forecasting step, the model $\hat{u}_t$ is used to predict the pandemic incidence time series over the time horizon determined by the analyst.

In turn, in the Modelling step, the training set is used to compute the near-optimal NCB model, $\hat{u}_t$. Here, each observed value $u_t$ of the time series is modelled as

$$\hat{u}_t = \mathrm{Int}\left[\widehat{TIP}_{obs} \, f_X(x_t \mid \hat{\alpha}_{obs}, \hat{\beta}_{obs}, \hat{\lambda}_{obs}) \, \Delta\right], \qquad (2)$$

in which $\hat{\theta}_{obs}$ represents the estimate of the parameter $\theta$ in the light of the training set, $\Delta = x_2 - x_1$ reflects the length of the interval involving each NCB PDF evaluation, and

$$\widehat{TIP}_{obs} = \frac{Cum_n}{F_X(x_n \mid \hat{\alpha}_{obs}, \hat{\beta}_{obs}, \hat{\lambda}_{obs})} \qquad (3)$$

is the estimate of the total number of confirmed infected cases in the territory during $TH$, with $F_X(x_n \mid \cdot)$ the estimate of the cumulative probability $P(X \le x_n)$. One can notice that Eq. (3) is based on the idea that $X$ reflects the normalised time until contamination. Thus, if $Cum_n$ corresponds to the proportion $100 \times F_X(x_n \mid \cdot)\%$, then $Cum_n / F_X(x_n \mid \cdot)$ will correspond to 100%. Finally, $\mathrm{Int}[z]$ rounds $z$ to its nearest integer. It is worth mentioning that Eq. (2) infers the expected value of the pandemic incidence at instant $t$, considering that the total number of cases, $\widehat{TIP}_{obs}$, occurs until $TH$. To explain further, let $T$ be the random cumulative time to confirm one infection since the first case date, i.e. $X = (T - 1)/(TH - 1)$.
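A minimal Python sketch of the Pre-processing and Modelling steps for fixed $(\alpha, \beta, \lambda)$ follows: it normalises the time indexes, computes $\Delta$, $Cum_n$, and $\widehat{TIP}_{obs}$ via Eq. (3), and evaluates Eq. (2) up to $TH$. It is an illustration under stated assumptions, not the authors' implementation: $F_X$ is approximated by a midpoint rule because no closed-form CDF is used here, the density follows Eq. (1) but is set to 0 at $x = 0$ and $x = 1$ (exact for $\alpha, \beta > 1$), the function names are hypothetical, and the training counts in the example are made up.

```python
import math
from typing import List, Sequence, Tuple


def ncb_pdf(x: float, a: float, b: float, lam: float) -> float:
    """Eq. (1) as a truncated Poisson(lam/2) mixture of Beta(a + j, b); 0 outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    half, total, j = lam / 2.0, 0.0, 0
    while True:
        log_w = -half + (j * math.log(half) if half > 0 else 0.0) - math.lgamma(j + 1)
        log_b = (math.lgamma(a + j + b) - math.lgamma(a + j) - math.lgamma(b)
                 + (a + j - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))
        term = math.exp(log_w + log_b)
        total += term
        if half == 0.0 or (j > half and term < 1e-15 * total) or j >= 10_000:
            return total
        j += 1


def ncb_cdf(x: float, a: float, b: float, lam: float, m: int = 2000) -> float:
    """Numerical stand-in for F_X(x): midpoint rule over [0, x]."""
    h = x / m
    return h * sum(ncb_pdf((k + 0.5) * h, a, b, lam) for k in range(m))


def ncb_forecast(u_train: Sequence[float], th: int, a: float, b: float,
                 lam: float) -> Tuple[float, List[int]]:
    """Return (TIP_hat, [u_hat_1, ..., u_hat_TH]) for given (alpha, beta, lambda)."""
    n = len(u_train)
    xs = [(t - 1) / (th - 1) for t in range(1, th + 1)]  # x_t = (t - 1) / (TH - 1)
    delta = xs[1] - xs[0]                                 # Delta = x_2 - x_1
    cum_n = sum(u_train)                                  # Cum_n
    tip = cum_n / ncb_cdf(xs[n - 1], a, b, lam)           # Eq. (3)
    # Eq. (2): Int[TIP_hat * f_X(x_t) * Delta]; Python's round() sends .5 ties to the even integer
    u_hat = [round(tip * ncb_pdf(x, a, b, lam) * delta) for x in xs]
    return tip, u_hat


u_train = [0, 1, 1, 3, 5, 8, 13, 21, 30, 42]  # made-up training counts, n = 10
tip, u_hat = ncb_forecast(u_train, th=200, a=3.0, b=6.0, lam=2.0)
peak_day = max(range(len(u_hat)), key=u_hat.__getitem__) + 1
print(f"TIP_hat = {tip:.0f}, peak at day {peak_day}")
```

Since $\sum_t f_X(x_t)\,\Delta \approx 1$ over the whole horizon, the forecasts $\hat{u}_1, \ldots, \hat{u}_{TH}$ add up to roughly $\widehat{TIP}_{obs}$, up to rounding.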
In fact, supposing that the normalised time $X$ follows an NCB distribution, the probability of confirming one case between subsequent instants $t$ and $t + 1$ is estimated by $p_t = P(x_t \le X < x_{t+1}) \approx f_X(x_t \mid \hat{\alpha}_{obs}, \hat{\beta}_{obs}, \hat{\lambda}_{obs})\, \Delta$. Thus, supposing a binomial distribution for the random incidence between $t$ and $t + 1$, say $U_t \sim \mathrm{Binomial}(\widehat{TIP}_{obs}, p_t)$, one has the expected value $E(U_t) = \widehat{TIP}_{obs} \cdot p_t$, leading to Eq. (2). Therefore, once $n$ ($< N$) and $TH$ ($> N$) are fixed, an optimisation method can be adopted to obtain the best estimates of the NCB-based predictor parameters $(\alpha, \beta, \lambda)$. The resulting optimisation problem has the mean square error (MSE) as the fitness function to be minimised:

$$MSE = \frac{1}{n} \sum_{t=1}^{n} (u_t - \hat{u}_t)^2. \qquad (4)$$

The MSE accounts for both accuracy (bias) and efficiency (variance) [42]. In the present work, the probabilistic optimisation method named generalised simulated annealing (GenSA) [46] is adopted. The NCB-based framework has been used to model, forecast, and compare COVID-19 daily incidence time series from thirteen countries (Argentina, Brazil, China, Germany, India, Iran, Italy, Japan, France, South Korea, Spain, United Kingdom, and US). The time series are maintained and updated daily by Johns Hopkins University collaborators [40]. The experiment is divided into two parts. First, the goodness of fit of near-optimal NCB, epidemic models [47] (SEIR, SIR, and SIS), and established time series formalisms (ARIMA [48] and ETS [49]) is compared according to a number of performance metrics. Then, the NCB framework is used to compare the level of difficulty imposed by COVID-19 on the countries. The computer used to execute the modelling and forecasting exercises is a notebook with a Windows 10 Home (64-bit) operating system, an Intel i7 processor at 2.6 GHz, and 8 GB of RAM. The design of each experiment is presented below, followed by specific results and comments.
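The fitness function of Eq. (4) and the parameter search can be prototyped as follows. This Python sketch makes several assumptions that are not from the paper: $F_X(x_n)$ is approximated by a left Riemann sum of the density, and the optimiser is a plain simulated annealing with Gaussian moves, a $1/(1 + k)$ cooling schedule, and a relative Metropolis acceptance rule, swapped in for GenSA only to keep the example dependency-free (in Python, `scipy.optimize.dual_annealing` implements generalised simulated annealing and would be closer in spirit). The bounds, starting point, synthetic data, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]  # (alpha, beta, lambda)


def ncb_pdf(x: float, a: float, b: float, lam: float) -> float:
    """Eq. (1) as a truncated Poisson(lam/2) mixture of Beta(a + j, b); 0 outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    half, total, j = lam / 2.0, 0.0, 0
    while True:
        log_w = -half + (j * math.log(half) if half > 0 else 0.0) - math.lgamma(j + 1)
        log_b = (math.lgamma(a + j + b) - math.lgamma(a + j) - math.lgamma(b)
                 + (a + j - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))
        term = math.exp(log_w + log_b)
        total += term
        if half == 0.0 or (j > half and term < 1e-15 * total) or j >= 10_000:
            return total
        j += 1


def train_mse(p: Params, u: Sequence[float], th: int) -> float:
    """Eq. (4) over the n training points, with u_hat_t built from Eqs. (2)-(3)."""
    a, b, lam = p
    n, delta = len(u), 1.0 / (th - 1)
    dens = [ncb_pdf((t - 1) * delta, a, b, lam) for t in range(1, n + 1)]
    # Assumption: F_X(x_n) approximated by a left Riemann sum of f_X over [0, x_n].
    f_xn = delta * sum(dens[:-1])
    if f_xn <= 0.0:
        return math.inf
    tip = sum(u) / f_xn  # Eq. (3)
    return sum((ut - round(tip * d * delta)) ** 2 for ut, d in zip(u, dens)) / n


def anneal(u: Sequence[float], th: int, start: Params,
           bounds: List[Tuple[float, float]], iters: int = 300,
           seed: int = 1) -> Tuple[Params, float]:
    """Plain simulated annealing (not GenSA): Gaussian moves clipped to the bounds."""
    rng = random.Random(seed)
    cur, cur_f = start, train_mse(start, u, th)
    best, best_f = cur, cur_f
    for k in range(iters):
        temp = 1.0 / (1.0 + k)  # simple cooling schedule, chosen for illustration
        cand = tuple(min(hi, max(lo, c + rng.gauss(0.0, 0.05 * (hi - lo))))
                     for c, (lo, hi) in zip(cur, bounds))
        f = train_mse(cand, u, th)
        # Always accept improvements; accept worse moves with a relative Metropolis probability.
        if f <= cur_f or rng.random() < math.exp(-(f - cur_f) / (temp * max(cur_f, 1e-12))):
            cur, cur_f = cand, f
        if cur_f < best_f:
            best, best_f = cur, cur_f
    return best, best_f


TH, true_params = 200, (3.0, 6.0, 4.0)  # made-up "true" parameters for a synthetic check
d = 1.0 / (TH - 1)
u_obs = [round(5000 * ncb_pdf((t - 1) * d, *true_params) * d) for t in range(1, 41)]  # n = 40
bounds = [(1.01, 20.0), (1.01, 20.0), (0.0, 20.0)]
start = (2.0, 2.0, 1.0)
best, best_f = anneal(u_obs, TH, start, bounds)
print(best, best_f, train_mse(start, u_obs, TH))
```

Because the best solution is only replaced on strict improvement, the returned MSE never exceeds that of the starting point; with so few iterations, however, the search will not generally recover the synthetic parameters exactly.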
Table 1 summarises the tuning parameters adopted for achieving the near-optimal NCB, SEIR, SIS, and SIR forecasting models. It must be highlighted that the framework introduced in Section 3 has been adapted (Eq (1)) to SEIR, SIS, and SIR [...]. SEIR has also involved the pair per capita death rate and transition rate from exposed to infectious [...]. This optimisation phase has been implemented with the GenSA package [50] of R. For SEIR, SIR, and SIS, the EpiDynamics package of R [51] has been used. The following settings were adopted: the maximum number of calls of each MSE-based fitness function was (GSA.max.call =) 3E+04, the maximum running time was (GSA.max.time =) 10 seconds, the maximum number of iterations of the algorithm was (GSA.max.it =) 5E+03, the initial temperature was (GSA.temperature =) 1E+08, and the algorithm would stop when there was no improvement after (GSA.nb.stop.improvement =) 20 steps. It was assumed that $TH = 600$, allowing one to predict the daily incidence over 600 days since the first infection. In turn, the size of the training set is [...], in which the time series size ($N$) is country-dependent. For ARIMA and ETS, the forecast package of R [52] was used; its auto.arima and ets functions provided the near-optimal ARIMA and ETS models. The maximum number of models considered in the stepwise search was (nmodels =) 5E+03. Table 2 summarises the time consumed by the models during training and test exercises, per country and on average. One can see that ARIMA modelling is the cheapest framework, followed by ETS and then NCB. In turn, SIR, SEIR, and SIS required the maximum time allowed by the GSA optimiser. Tests with GSA.max.time greater than 10 seconds did not lead to significant changes in the fitted models, although SIR, SEIR, and SIS fully consumed GSA.max.time. Fig. 3 shows the available COVID-19 incidence time series and the respective model forecasts.
The vertical orange dashed line separates the training and test data sets. The models were fitted on the training set and then challenged to infer the test series. One can see the difficulty of the predictors in fitting the pandemic incidence trajectory, though some adherence can be verified. It is argued that any change in national and local intervention policies might affect the pandemic trajectory, leading to fluctuations of the target series around the expected values inferred by the models, mainly in the training set. In turn, predicting the incidence trajectory of countries transitioning between Phases (i) and (ii) has been especially intriguing. Nevertheless, the target has lain within the bounds of the forecasts, except for Argentina (Fig. 3(a)), whose incidence has always been underestimated. Further, the performance of SEIR, ARIMA, and ETS in predicting the transition between Phases (i) and (iii) has usually been poor. As previously mentioned, ARIMA and ETS might be more useful for short-term (e.g. one-step-ahead) than middle-long-term forecasts, thus tending to present small oscillations over the latter. In turn, SEIR has usually predicted Phase (ii) for the series, with Germany, France, and the United Kingdom as exceptions. Finally, cases like Iran and the US have been particularly tricky, since they are transitioning between Phases (i) and (iii) in a very peculiar way, suggesting multiple peaks. Regarding the quality of the predictors, several performance metrics are considered. Let $u_t$ be the incidence of COVID-19 at day $t$ and let $\hat{u}_t$ be the respective forecast for the target $u_t$. Further, let $N$ be the number of observations of the incidence time series taken into account. Besides MSE, there are a number of metrics for evaluating the discrepancies between $u_t$ and $\hat{u}_t$, for $t = 1, 2, \ldots, N$.
Here, the following metrics are considered for evaluating the quality of near-optimal NCB, epidemic (i.e. SEIR, SIR, and SIS), and usual time series (i.e. ARIMA and exponential smoothing, ETS) models: MSE; Mean Absolute Percentage Error (MAPE); Average Relative Variance (ARV); Index of Disagreement (ID); Theil's U (Theil); Wrong Prediction on Change of Direction (WPOCID); Intercept of the linear fit between $\hat{u}_t$ and $u_t$ (Reg_Intercept); Slope Coefficient of the linear fit between $\hat{u}_t$ and $u_t$ (Reg_Slope); Indeterminacy Coefficient of the linear fit between $\hat{u}_t$ and $u_t$ (WR2) [41, 53]; and an Aggregate Performance Metric (APM). See Eqs (4), (6)-(13). Except for Reg_Intercept and Reg_Slope, whose ideal values are 0 and 1, respectively, the greater the value of a given metric, the worse the model. MAPE measures the model accuracy in relative terms. ARV compares the performance of the predictor with that of the simple mean of the past values of the series, $\bar{u}_t$. Theil's U compares the performance of the predictor with that of the Random Walk model (in which $u_t$ is inferred by $u_{t-1}$). Further, WPOCID measures the model quality in forecasting the tendency (direction of change) of the target time series. In turn, $WR2 = 1 - R^2$, as well as Reg_Intercept and Reg_Slope, are related to the linear model adjusted to the pairs $u_t$ and $\hat{u}_t$ via least-squares estimation. In this way, one can consider the general equation $u_t = Reg\_Intercept + Reg\_Slope \times \hat{u}_t$ [41].

Table 2. Time consumption (in seconds) for training and testing near-optimal NCB, SIR, SEIR, SIS, ARIMA, and ETS models for each COVID-19 daily incidence time series taken into account (Argentina, Ar; Brazil, Br; China, Ch; France, Fr; Germany, Ge; India, In; Iran, Ir; Italy, It; Japan, Ja; South Korea, KS; Spain, Sp; United Kingdom, UK; US).

Fig. 3. Prediction of the national daily incidence of COVID-19 since the first record, according to the Johns Hopkins University data set [40]. The vertical orange dashed line separates training and test series.
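The Python sketch below implements commonly used forms of these metrics to make their definitions concrete. The exact expressions of Eqs. (6)-(13) are not reproduced in this section, so every formula in the block is an assumption: MAPE is expressed as a fraction and skips days with $u_t = 0$; ARV divides the squared forecast errors by those of the running mean of past values $\bar{u}_t$; ID follows Willmott's index-of-disagreement form; Theil's U divides by the squared errors of the random walk; and WPOCID is one minus the fraction of correctly predicted directions of change.

```python
from typing import Sequence


def mape(u: Sequence[float], uh: Sequence[float]) -> float:
    """MAPE as a fraction; days with u_t = 0 are skipped (assumption)."""
    pairs = [(a, b) for a, b in zip(u, uh) if a != 0]
    return sum(abs((a - b) / a) for a, b in pairs) / len(pairs)


def arv(u: Sequence[float], uh: Sequence[float]) -> float:
    """Squared errors relative to the mean of past values, u_bar_t = mean(u_1, ..., u_{t-1})."""
    num = sum((u[t] - uh[t]) ** 2 for t in range(1, len(u)))
    den = sum((u[t] - sum(u[:t]) / t) ** 2 for t in range(1, len(u)))
    return num / den


def index_of_disagreement(u: Sequence[float], uh: Sequence[float]) -> float:
    """Willmott-style index of disagreement (0 = perfect agreement)."""
    m = sum(u) / len(u)
    num = sum((b - a) ** 2 for a, b in zip(u, uh))
    den = sum((abs(b - m) + abs(a - m)) ** 2 for a, b in zip(u, uh))
    return num / den


def theil_u(u: Sequence[float], uh: Sequence[float]) -> float:
    """Squared errors relative to the random walk forecast u_hat_t = u_{t-1}."""
    num = sum((u[t] - uh[t]) ** 2 for t in range(1, len(u)))
    den = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    return num / den


def wpocid(u: Sequence[float], uh: Sequence[float]) -> float:
    """Fraction of steps whose forecast change does not match the observed direction of change."""
    hits = sum(1 for t in range(1, len(u)) if (u[t] - u[t - 1]) * (uh[t] - uh[t - 1]) > 0)
    return 1.0 - hits / (len(u) - 1)


u, uh = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
for metric in (mape, arv, index_of_disagreement, theil_u, wpocid):
    print(metric.__name__, metric(u, uh))
```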
Thus, the Reg_Intercept and Reg_Slope coefficients represent the additive and multiplicative errors of the forecasts $\hat{u}_t$ of $u_t$, respectively. In this case, there is a constant error, Reg_Intercept, independent of the forecast, and a proportional error, Reg_Slope, related to the prediction [41]. In turn, $R^2$, the coefficient of determination, reflects the performance of the model in capturing the variability of the time series [41]. $R^2$ is defined as [...], in which $\bar{u}$ is the average of the observed series. Thus, an ideal predictor would present WR2 = 0, Reg_Intercept = 0, and Reg_Slope = 1, leading to $u_t = 0 + 1 \cdot \hat{u}_t$. In order to provide a general analysis of these metrics, the aggregate performance metric (APM) is considered [...], in which $\min_i$ and $\max_i$ are, respectively, the observed minimal and maximal values of $Metric_i$ among the adjusted models under study. For instance, Tables 3 and 4 summarise the performance of the near-optimal predictors when fitting and forecasting COVID-19 incidence in Brazil, respectively. The second and third columns of the tables highlight the models with the worst and best figures, respectively. One can see that NCB beat the remaining models in every metric during training, except for MAPE and WPOCID, for which SEIR and ARIMA were the best, respectively. During the test phase, NCB was outperformed only by SIR, in terms of WPOCID. Thus, from an aggregate point of view (last two lines of the tables), NCB has been attractive. Nevertheless, the large MSE values reflect the challenge of fitting and predicting this series. On the other hand, the NCB model has been able to capture ($R^2 = 1 - WR2 =$) 92.3% of the variability of the Brazilian series during the training phase. In turn, NCB presented a MAPE of 0.377 in the test phase. Tables 5 and 6 allow one to compare the performance of the models across the thirteen countries taken into account, in terms of APM.
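The regression-based metrics and the aggregation can be sketched in Python as below. The least-squares intercept, slope, and $WR2 = 1 - R^2$ follow the description above, with $R^2$ computed as one minus the residual sum of squares of the linear fit over the total sum of squares around $\bar{u}$ (an assumption, since the original expression is not reproduced here). The `apm` function is hypothetical: it assumes the APM averages min-max normalised metrics, so that 0 marks a model that is best in every metric and 1 one that is worst in every metric. Metrics whose ideal value is not their minimum (Reg_Intercept, Reg_Slope) would first have to be turned into deviations, e.g. $|Reg\_Intercept|$ and $|1 - Reg\_Slope|$.

```python
from typing import Dict, Sequence, Tuple


def regression_metrics(u: Sequence[float], uh: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit u_t = Reg_Intercept + Reg_Slope * u_hat_t; returns (intercept, slope, WR2)."""
    n = len(u)
    mu, mh = sum(u) / n, sum(uh) / n
    sxx = sum((b - mh) ** 2 for b in uh)
    sxy = sum((b - mh) * (a - mu) for a, b in zip(u, uh))
    slope = sxy / sxx
    intercept = mu - slope * mh
    sse = sum((a - intercept - slope * b) ** 2 for a, b in zip(u, uh))
    sst = sum((a - mu) ** 2 for a in u)
    r2 = 1.0 - sse / sst  # share of the variability of u_t captured by the linear fit
    return intercept, slope, 1.0 - r2


def apm(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical APM: mean of min-max normalised metrics (0 = best model, 1 = worst)."""
    metrics = list(next(iter(scores.values())).keys())
    out = {model: 0.0 for model in scores}
    for m in metrics:
        vals = [s[m] for s in scores.values()]
        lo, hi = min(vals), max(vals)
        for model, s in scores.items():
            out[model] += 0.0 if hi == lo else (s[m] - lo) / (hi - lo)
    return {model: v / len(metrics) for model, v in out.items()}


print(regression_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))  # forecasts off by a factor of 2
print(apm({"A": {"MSE": 1.0, "MAPE": 0.1},
           "B": {"MSE": 3.0, "MAPE": 0.3},
           "C": {"MSE": 2.0, "MAPE": 0.1}}))
```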
In general, NCB has been the best model, and it has never been the worst alternative. On the other hand, SEIR has presented poor performance in comparison with the remaining models, mainly during the training phase. Though evidently limited for multiple-peak pandemic time series, the proposed NCB approach can be useful for summarising and comparing the difficulty imposed by these diseases across territories. Fig. 4 shows the projected time trajectory of the COVID-19 daily incidence when data until 2020-06-26 are taken as the training set for the NCB models. Fig. 5 allows one to compare the shapes of the NCB models across these national incidence series. Table 7 presents specific figures. Besides $n$ and the respective training cumulative incidence ($Cum_n$), the table also shows the inferred near-optimal NCB PDF parameters ($\hat{\alpha}_{obs}$, $\hat{\beta}_{obs}$, $\hat{\lambda}_{obs}$), the cumulative pandemic incidence over $TH$ days ($\widehat{TIP}_{obs}$), and remarkable instants, i.e. the starting ($date_0$), global peak ($date_m$), and finish ($date_{end}$) dates. For instance, until 2020-06-26, the country with the most confirmed cases was the US, with ($Cum_n =$) 2,467,554 infected, followed by Brazil. In turn, considering the available data, Argentina is expected to assume the worst position by the end of the pandemic, at ($date_{end} =$) 2021-09-11, with a total of ($\widehat{TIP}_{obs} =$) 40,676,197 cases. In fact, special attention must also be paid to Argentina and India, which seem to be at the beginning of the pandemic trajectory. The most peculiar NCB shape belongs to Iran, with the smallest $\beta$ estimate and a similar value for $\alpha$. The global peak in this country is predicted to occur near ($date_m =$) 2020-11-13. Further, despite the increasing incidence in the US during the last days, the NCB framework estimates that the country is facing Phase (iii). In fact, the proposed NCB approach might not fit multiple-peak incidence time series, as seems to be the case for Iran and the US.
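Reading remarkable instants off a fitted curve is a matter of date arithmetic. In the Python sketch below, $date_m$ is taken as the day of the largest fitted daily incidence, which is one natural reading of the global peak; the rule used for $date_{end}$ (the last day on or after the peak with at least one expected case) is purely hypothetical, since the criterion behind Table 7 is not stated in this section. The curve in the example is made up.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def remarkable_dates(date_0: date, u_hat: Sequence[int]) -> Tuple[date, Optional[date]]:
    """Return (date_m, date_end) from a fitted daily curve u_hat_1..u_hat_TH starting at date_0."""
    # Global peak: first day with the largest fitted incidence (index i corresponds to day t = i + 1).
    i_peak = max(range(len(u_hat)), key=lambda i: u_hat[i])
    date_m = date_0 + timedelta(days=i_peak)
    # Hypothetical end rule: last day at or after the peak with at least one expected case.
    active = [i for i in range(i_peak, len(u_hat)) if u_hat[i] >= 1]
    date_end = date_0 + timedelta(days=active[-1]) if active else None
    return date_m, date_end


u_hat = [0, 1, 3, 7, 4, 2, 1, 0, 0]  # made-up fitted curve
print(remarkable_dates(date(2020, 2, 26), u_hat))
```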
In turn, it is suggested that the US has presented the greatest difference in length between Phases (i) and (iii), i.e. the period in which the infection probability increases is clearly shorter than the period in which it decreases.

Table 7. Estimates of the GenSA-based near-optimal NCB models for the national COVID-19 daily incidence time series until 2020-06-26 with respect to Argentina, Brazil, China, France, Germany, India, Iran, Italy, Japan, South Korea (KS), Spain, United Kingdom (UK), and US.

Fig. 5(b) allows one to compare the thirteen countries in the same terms, via skewness and kurtosis estimates of the fitted NCB infection probability distributions. As previously mentioned, the greater the absolute value of the skewness of the probability distribution, the greater the difference between the lengths of Phases (i) and (iii). On the other hand, the greater the kurtosis, the faster the epidemic cycle. Thus, one can infer that China has presented the fastest cycle, though with the largest difference between the lengths of Phases (i) and (iii). In fact, the duration of Phase (i) was shorter than that of Phase (iii), reflecting the worst scenario for the health system. In turn, Iran would face the longest epidemic cycle, followed by India, Brazil, and Argentina. The similarity of the COVID-19 relative incidence time series in countries like South Korea and Germany must also be highlighted; it might reflect similar effectiveness of the intervention policies adopted in these countries. Pandemics have been a public health issue for organisations and governments around the world. For instance, COVID-19 has played a decisive role in profound cultural, economic, and social changes. Thus, predicting and comparing incidence trajectories among territories is paramount. The present paper has provided a relatively simple method for performing these two exercises.
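Fig. 5(b) relies on the skewness and kurtosis of each fitted NCB distribution. When no closed form is at hand, both can be obtained from numerically integrated central moments, as in the Python sketch below. It assumes the series form of Eq. (1), uses a midpoint rule on [0, 1], and reports Pearson's (non-excess) kurtosis; whether Fig. 5(b) uses excess or non-excess kurtosis is not stated here. The function names and the parameter values in the example are illustrative, not the Table 7 estimates.

```python
import math
from typing import Tuple


def ncb_pdf(x: float, a: float, b: float, lam: float) -> float:
    """Eq. (1) as a truncated Poisson(lam/2) mixture of Beta(a + j, b); 0 outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    half, total, j = lam / 2.0, 0.0, 0
    while True:
        log_w = -half + (j * math.log(half) if half > 0 else 0.0) - math.lgamma(j + 1)
        log_b = (math.lgamma(a + j + b) - math.lgamma(a + j) - math.lgamma(b)
                 + (a + j - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))
        term = math.exp(log_w + log_b)
        total += term
        if half == 0.0 or (j > half and term < 1e-15 * total) or j >= 10_000:
            return total
        j += 1


def ncb_shape(a: float, b: float, lam: float, m: int = 4000) -> Tuple[float, float]:
    """Skewness and (non-excess) kurtosis of NCB(a, b, lam) from midpoint-rule moments."""
    xs = [(k + 0.5) / m for k in range(m)]
    w = [ncb_pdf(x, a, b, lam) / m for x in xs]  # quadrature weights f(x) dx
    mass = sum(w)  # close to 1; dividing by it absorbs the quadrature error
    mean = sum(x * wi for x, wi in zip(xs, w)) / mass
    c2, c3, c4 = (sum((x - mean) ** p * wi for x, wi in zip(xs, w)) / mass for p in (2, 3, 4))
    return c3 / c2 ** 1.5, c4 / c2 ** 2


for params in [(2.0, 5.0, 0.0), (3.0, 6.0, 4.0)]:
    skew, kurt = ncb_shape(*params)
    print(params, round(skew, 4), round(kurt, 4))
```

For $\lambda = 0$ the first case reproduces the known Beta(2, 5) values (skewness about 0.596, kurtosis 2.88), which is a convenient sanity check before applying the function to fitted parameters.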
It is expected that the proposed NCB approach complements analogous studies at the regional and national levels and might be useful in the assessment of plans and emerging disease outbreaks. Though limited to one-peak shape fitting, the NCB approach has performed better than near-optimal versions of established epidemic models (e.g. SIR, SEIR, SIS) as well as time series formalisms (e.g. ARIMA and ETS), both for fitting past incidence time series and for forecasting future values. The results of the methods with respect to a number of performance metrics support this argument. In turn, the NCB probability distribution shape has proven useful for summarising and comparing the pandemic incidence trajectory among countries, via kurtosis and skewness estimates. From that, caution with respect to Iran, Argentina, and China, for the sake of illustration, has been suggested. The method has proven to be cheap, demanding less than 7 seconds on an intermediate notebook to model and forecast the COVID-19 daily incidence in each of the thirteen countries, for a time horizon of 600 days. Thus, given a database platform that provides daily updates of the incidence time series (e.g. Johns Hopkins University [40]), the proposed models can be easily updated. The Pand-Pred user interface, freely provided at www.mesor.com.br, makes use of this reasoning. A conceptual limitation of the NCB-based framework is the assumption that the incidence of the disease on a given day follows a binomial distribution. Thus, one positive diagnosis is considered independent of another on that day, which may depart from reality. In addition, the need to set a maximum time horizon for the end of the pandemic cycle may lead to an underestimation of the spread of the disease in the territory. Moreover, the inability to shape several peaks is a disadvantage of the NCB models.
Countries like Iran and the US seem to demand a more flexible approach in the case of COVID-19. Thus, ongoing research is dedicated to developing modelling alternatives, such as mixtures of PDFs, adapted artificial neural networks, support vector regression, and copula formalisms. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
[2] Assessing the impact of reduced travel on exportation dynamics of novel coronavirus infection (COVID-19)
[3] Coronavirus disease (COVID-19) outbreak situation
[4] Agent-based modeling for super-spreading events: a case study of MERS-CoV transmission dynamics in the Republic of Korea
[5] A contribution to the mathematical theory of epidemics
[6] Stochastic epidemic metapopulation models on networks: SIS dynamics and control strategies
[7] Predicting hepatitis B monthly incidence rates using weighted Markov chains and time series methods
[8] Seasonality and trend forecasting of tuberculosis prevalence data in Eastern Cape, South Africa, using a hybrid model
[9] Dengue disease mapping in Malaysia based on stochastic SIR models in human populations
[10] Emergence of oscillations in a simple epidemic model with demographic data
[11] Discrete stochastic analogs of Erlang epidemic models
[12] Short-term predictions and prevention strategies for COVID-2019: a model based study
[13] Epidemic analysis of COVID-19 in China by dynamical modeling
[14] Applications and comparisons of four time series models in epidemiological surveillance data
[15] Epidemiological analysis of hemorrhagic fever with renal syndrome in China with the seasonal-trend decomposition method and the exponential smoothing model
[16] Modelling the frequency of depression using Holt-Winters exponential smoothing method
[17] Application of exponential smoothing model in predicting incidence of scarlet fever in Shanghai
[18] Time series parameter prediction for ICU patient
[19] Assessing and forecasting of epidemiological data using time series analysis
[20] Application of an autoregressive integrated moving average model for predicting the incidence of hemorrhagic fever with renal syndrome
[21] Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model
[22] Piecewise finite series solutions of seasonal diseases models using multistage Adomian method
[23] Assessing and forecasting of epidemiological data using time series analysis
[24] Comparative analysis of SARIMA and SETAR models in predicting pneumonia cases in Kenya
[25] Time-series analysis of tuberculosis from
[26] Neural-net classification for spatio-temporal descriptor based depression analysis
[27] Application of a long short-term memory neural network: a burgeoning method of deep learning in forecasting HIV incidence in Guangxi, China
[28] Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology
[29] Diagnosis of dementia by machine learning methods in epidemiological studies: a pilot exploratory study from South India
[30] Estimation of basic reproduction number of the Middle East respiratory syndrome coronavirus (MERS-CoV) during the outbreak in South Korea
[31] Temporal variability and social heterogeneity in disease transmission: the case of SARS in Hong Kong
[32] Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th
[33] Optimization method for forecasting confirmed cases of COVID-19 in China
[34] Application of the ARIMA model on the COVID-2019 epidemic dataset
[35] Multiple-input deep convolutional neural network model for COVID-19 forecasting in China
[36] Comparative estimation of the reproduction number for pandemic influenza from daily case notification data
[37] National influenza surveillance in the Philippines from 2006 to 2012: seasonality and circulating strains
[38] Surveillance for chikungunya and dengue during the first year of chikungunya virus circulation in Puerto Rico
[39] Report 9: impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
[40] COVID-19 data set from the Johns Hopkins University Center for Systems Science and Engineering
[41] A temporal-window framework for modeling and forecasting time series
[42] Statistical inference. 2nd ed.
[43] Non-central bivariate beta distribution
[44] Sums, products, and ratios of non-central beta variables. Commun Stat Theory Methods 2005
[45] R: a language and environment for statistical computing. R Foundation for Statistical Computing
[46] Generalized simulated annealing
[47] Modeling infectious diseases in humans and animals
[48] Time series analysis: with applications in R
[49] Forecasting with exponential smoothing: the state space approach
[50] Generalized simulated annealing for efficient global optimization: the GenSA package for R
[51] EpiDynamics: dynamic models in epidemiology
[52] forecast: forecasting functions for time series and linear models
[53] Correcting and combining time series forecasters

This work was partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq).