key: cord-185121-f6vjm4j4
authors: Paiva, Henrique Mohallem; Afonso, Rubens Junqueira Magalhaes; Caldeira, Fabiana Mara Scarpelli de Lima Alvarenga; Velasquez, Ester de Andrade
title: A computational tool for trend analysis and forecast of the COVID-19 pandemic
date: 2020-10-20
journal: nan
DOI: nan
sha: 
doc_id: 185121
cord_uid: f6vjm4j4

Purpose: This paper proposes a methodology and a computational tool to study the COVID-19 pandemic throughout the world and to perform a trend analysis to assess its local dynamics. Methods: Mathematical functions are employed to describe the number of cases and demises in each region and to predict their final numbers, as well as the dates of maximum daily occurrences and the local stabilization date. The model parameters are calibrated using a computational methodology for numerical optimization. Trend analyses are run, allowing the effects of public policies to be assessed. Easy-to-interpret metrics of the quality of the fitted curves are provided. Country-wise data from the European Centre for Disease Prevention and Control (ECDC) concerning the daily number of cases and demises around the world are used, as well as detailed data from Johns Hopkins University and from the Brasil.io project describing individually the occurrences in United States counties and in Brazilian states and cities, respectively. The U.S. and Brazil were chosen for a more detailed analysis because they are the current foci of the pandemic. Results: Illustrative results for different countries, U.S. counties and Brazilian states and cities are presented and discussed. Conclusion: The main contributions of this work lie in (i) a straightforward model of the curves to represent the data, which allows automation of the process without requiring interventions from experts; (ii) an innovative approach for trend analysis, whose results provide important information to support authorities in their decision-making process; and (iii) the developed computational tool, which is freely available and allows the user to quickly update the COVID-19 analyses and forecasts for any country, United States county or Brazilian state or city present in the periodic reports from the authorities.

Mathematical models have been widely used to study the transmission dynamics of infectious diseases, enabling the understanding of the disease spread and the optimization of disease control [34].

Forecasting models are used to predict future behavior as a function of past data. This is a widely used method in the implementation of epidemic mathematical models, since it is necessary to know the past behavior of a disease to understand how it will evolve in the future. Accurate forecasts of disease activity could allow for better preparation, such as public health surveillance, development and use of medical countermeasures, and hospital resource management [8] .

A similar approach is trend analysis, which allows future behavior to be predicted accurately, especially in the short run. A trend is a change over time exhibited by a random variable [28]; trend analysis extracts the direction of a trend from past behavior, which makes it possible to predict future data. For better effectiveness, the predictions should be updated periodically, as soon as new data are available.

The technique of trend analysis is widely used in several areas of science, such as finance [2] [48] and meteorology [28] [36]. In the context of health systems, trend analysis was used by Zhao et al. [57] to analyze malignant mesotheliomas in China, aiming to provide data for their prevention and control; by Soares et al. [46] to predict testicular cancer mortality in Brazil; by Zahmatkesh et al. [56] to forecast the occurrences of breast cancer in Iran; by Mousavizadeh et al. [33] to forecast multiple sclerosis in a region of Iran; and by Yuan et al. [55] to analyze and predict the cases of type 2 diabetes in East Asia.

Modeling and prediction of the dynamics of the COVID-19 pandemic is a subject of great interest. Therefore, a myriad of papers on this theme have been published over the last months, exploiting different modeling approaches, such as compartment models [18], time series analysis [44], artificial intelligence [41] [54], and regression-based models [17] [40]. For this purpose, some research groups extended previous epidemiological models to describe the COVID-19 pandemic: Lin et al. [25] created a conceptual model for the COVID-19 outbreak in Wuhan, China, using components from the 1918 influenza pandemic in London, while Paiva et al. [38] proposed a dynamic model to describe the COVID-19 pandemic, based on a model previously developed for the MERS epidemic. This list is far from exhaustive. For a detailed survey of different modeling approaches in this context, the reader is referred to review papers such as [26] and [32].

It is important to note that the behavior of the pandemic may vary greatly in the different regions of the world, due to characteristics such as different social habits (higher or lower physical interaction between citizens), capacity of the local health system, different governmental actions, and so on. Therefore, the parameters of a mathematical model need to be tailored to the region where the disease behavior is being studied. Furthermore, even in the same region, the conditions may vary very quickly, in a matter of weeks or even days (for instance, following the decree or release of a lockdown, or the saturation of the available intensive care unit vacancies in the hospitals); thus, the model parameters would need to be updated very often, usually by an expert. However, these analyses might take time and require dedicated work from highly qualified personnel, thus decreasing their availability. It is natural to expect that such analyses are run periodically at the country level, but the same may not be a reality locally at every municipality. Therefore, in this scenario, it is useful to have a computational tool to perform a quick and automatic analysis and forecast of the disease conditions in any region, following the periodic updates published by the authorities. This is the purpose of the present paper.

In the present paper, the fundamental curve used to describe the historical data is an asymmetric sigmoid, i.e., letting the independent variable be t, the dependent variable is given by a function f: ℝ → ℝ [42]:

$$f(t) = \frac{A}{\left(1 + \nu e^{-\delta (t - t_p)}\right)^{1/\nu}} \qquad (1)$$

with the parameters A ≥ 0, ν > 0, δ > 0 and $t_p$ ≥ 0. In the present work, the independent variable t is the time in days, whereas the dependent variable is either the cumulative number of individuals who tested positive for SARS-CoV-2 or the cumulative number of individuals deceased with the disease as the cause. Notice that

$$\lim_{t \to \infty} f(t) = A, \qquad (2)$$

i.e., the modeling of the cumulative number of cases/demises by ( 1 ) implies convergence to a final value A.

However, the convergence is asymptotic; therefore, it is interesting to know when a certain fraction of the final number of infected/deceased has been reached. For that purpose, let a time instant $\tau_\alpha$ be such that a particular value $f(\tau_\alpha)$ is reached:

$$f(\tau_\alpha) = \alpha A \qquad (3)$$

where the parameter α ∈ (0,1). Then, by replacing (1) for $f(\tau_\alpha)$ in (3), one may solve to find:

$$\tau_\alpha = t_p - \frac{1}{\delta} \ln\left(\frac{\alpha^{-\nu} - 1}{\nu}\right) \qquad (4)$$

Therefore, from (4) one can determine the (finite) instant when a certain proportion of the final number of cases/demises is reached, which is a useful figure to evaluate whether the contamination can be considered over or not. In this paper, the settling date of the contamination is adopted as $\tau_{0.98}$, corresponding to the day on which the number of occurrences reaches 98% of its final value. The settling ratio of 98% is a standard value used in the analysis of dynamic systems [10].
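As a sanity check of the settling-time expression, the following Python sketch assumes that the sigmoid has the form f(t) = A / (1 + ν e^{−δ(t − t_p)})^{1/ν} and solves f(τ) = αA for τ in closed form. The function names and the parameter values are illustrative assumptions and are not taken from the tool described in this paper.

```python
import math

def sigmoid(t, A, nu, delta, t_p):
    """Assumed asymmetric sigmoid: A / (1 + nu*exp(-delta*(t - t_p)))**(1/nu)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)

def settling_time(alpha, nu, delta, t_p):
    """Instant at which the sigmoid reaches the fraction alpha of A (0 < alpha < 1)."""
    return t_p - math.log((alpha ** (-nu) - 1.0) / nu) / delta

# Illustrative parameters (not fitted to any real data set).
A, nu, delta, t_p = 1000.0, 0.5, 0.1, 50.0
tau = settling_time(0.98, nu, delta, t_p)
print(f"settling day: {tau:.2f}")
print(f"fraction reached: {sigmoid(tau, A, nu, delta, t_p) / A:.4f}")
```

Evaluating the sigmoid at the returned instant should give back 98% of A, which is a quick way to confirm that the closed-form expression and the curve definition agree.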

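The rate at which the number of infections/demises grows can be calculated by differentiation of (1) with respect to the independent variable t, which yields

$$\frac{df(t)}{dt} = \frac{A \delta e^{-\delta (t - t_p)}}{\left(1 + \nu e^{-\delta (t - t_p)}\right)^{(1+\nu)/\nu}} \qquad (5)$$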

As a matter of fact, the value of (5) on a particular day t is an important indicator for healthcare infrastructure decision-making concerning the number of infected individuals: a higher value indicates that the upcoming period might stress the healthcare infrastructure, whereas a comparatively lower value indicates that the number of new cases might be accommodated with the existing infrastructure. The number of individuals who are cured each day and discharged from the facilities can be compared with the rate of newly infected individuals; if the former is greater than the latter, then the capacity of the facilities is enough to treat the ill and they will not be endangered by lack of proper treatment.
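The daily rate can be evaluated directly once the sigmoid parameters are known. The Python sketch below assumes the derivative A δ e^{−δ(t − t_p)} / (1 + ν e^{−δ(t − t_p)})^{(1+ν)/ν} and locates the day with the highest rate on an integer grid; the function name and the parameter values are hypothetical.

```python
import math

def daily_rate(t, A, nu, delta, t_p):
    """Assumed derivative of the asymmetric sigmoid with respect to t."""
    u = math.exp(-delta * (t - t_p))
    return A * delta * u / (1.0 + nu * u) ** ((1.0 + nu) / nu)

# Illustrative parameters (not fitted to any real data set).
A, nu, delta, t_p = 10000.0, 0.5, 0.1, 60.0
days = range(0, 201)
peak_day = max(days, key=lambda t: daily_rate(t, A, nu, delta, t_p))
print(peak_day)  # 60
```

In this example the largest daily rate falls on the day equal to the parameter t_p, regardless of the value chosen for ν.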

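Differentiating (5) once more with respect to t yields

$$\frac{d^2 f(t)}{dt^2} = \frac{A \delta^2 e^{-\delta (t - t_p)} \left(e^{-\delta (t - t_p)} - 1\right)}{\left(1 + \nu e^{-\delta (t - t_p)}\right)^{(1+2\nu)/\nu}} \qquad (6)$$

A sign change in (6) occurs at t = $t_p$: the second derivative is positive for t < $t_p$, zero for t = $t_p$ and negative for t > $t_p$. Therefore, this point corresponds to the maximum rate, i.e., the peak of the daily number of either infected or deceased individuals. Replacing t = $t_p$ in (1) yields

$$f(t_p) = \frac{A}{(1 + \nu)^{1/\nu}} \qquad (7)$$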

Notice from (7) that ν = 1 entails $f(t_p)$ = 0.5A, i.e., the sigmoid curve crosses half of the final value at t = $t_p$. This is deemed a symmetric sigmoid. For the sake of understanding, consider two other illustrative values of ν:

a) for ν = 2, from (7), $f(t_p)$ = A/√3 ≈ 0.58A, that is, the inflection happens at a later stage, when roughly 58% of the final value has been reached; b) for ν = 0.5, from (7), $f(t_p)$ = 4A/9 ≈ 0.44A, in other words, the inflection happens at an earlier stage, when only approximately 44% of the final value has been reached.

It is clear from these examples and from (1) that the value of ν controls the degree of asymmetry in the sigmoid curve, with ν = 1 representing a curve that is symmetric about the vertical straight line t = $t_p$. This is illustrated in Figure 1(a), where (1) is shown for three values of ν, whereas Figure 1(b) shows (5), i.e., the rate. It is interesting to remark that the value of ν impacts the symmetry of the derivative, with ν = 1 yielding a symmetric bell-shaped curve, with acceleration and deceleration phases occurring at the same rate. When ν < 1, the deceleration phase of the sigmoid is slower than the acceleration phase; when ν > 1, the opposite occurs.
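The three illustrative values of ν discussed above can be checked numerically. The following Python sketch assumes the sigmoid f(t) = A / (1 + ν e^{−δ(t − t_p)})^{1/ν}, which reproduces the quoted fractions; the values of δ and t_p are arbitrary and do not affect the result.

```python
import math

def sigmoid(t, A, nu, delta, t_p):
    """Assumed asymmetric sigmoid: A / (1 + nu*exp(-delta*(t - t_p)))**(1/nu)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)

# Fraction of the final value A that has been reached at the inflection t = t_p.
for nu in (0.5, 1.0, 2.0):
    frac = sigmoid(t=50.0, A=1.0, nu=nu, delta=0.1, t_p=50.0)
    print(f"nu = {nu}: f(t_p)/A = {frac:.3f}")
# nu = 0.5: f(t_p)/A = 0.444
# nu = 1.0: f(t_p)/A = 0.500
# nu = 2.0: f(t_p)/A = 0.577
```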

Because of their capability of representing processes with asymmetric acceleration and deceleration phases, asymmetric sigmoid curves are well suited to represent the data of a pandemic. Many factors can contribute to the asymmetry between acceleration and deceleration phases besides the very nature of the disease spread, such as the introduction of policies by health authorities in order to slow down the spread, e.g., reduced social contact. Therefore, this extra degree of freedom brought by the asymmetric sigmoid curve is useful to better represent the data. Moreover, the added complexity with regard to a symmetric curve is due only to the necessity of estimating a single additional parameter, namely ν. In our context, there are three main sigmoid parameters of interest, which are described in Table 1. The next section presents the algorithm used to estimate the parameters A, ν, δ and $t_p$ based on measured data from either the number of newly infected individuals per day or the number of deceased per day.

The parameters A, ν, δ and $t_p$ are estimated by solving a constrained optimization problem in which the Integral Time Square Error (ITSE) [10] is minimized, where the error is the difference between the value of f(t) given by (1) and the corresponding data y(t) obtained from the authorities for the same day.

We consider a time window t ∈ {0, 1, …, $t_{end}$} for which the data y(t) are available at each day. There is a small abuse of notation in restricting the real-valued variable t to assume only integer values coinciding with the number of the day. Let the vector of parameters to be estimated be defined as

$$\Theta = \begin{bmatrix} A & \nu & \delta & t_p \end{bmatrix}^T \qquad (8)$$

where the symbol $\bullet^T$ indicates the transpose of a vector $\bullet$. The optimal value of the vector Θ is given as

$$\Theta^* = \arg\min_{\Theta} \sum_{t=0}^{t_{end}} t \left[ f(t, \Theta) - y(t) \right]^2 \quad \text{subject to } A \ge 0,\ \nu > 0,\ \delta > 0,\ t_p \ge 0 \qquad (9)$$

where the argument Θ was explicitly included in f(t, Θ) to emphasize that the parameters may be varied during the optimization process. Note that, for optimization purposes, strict inequalities cannot be implemented; therefore, for the constraints ν > 0 and δ > 0, an arbitrarily small positive real number μ > 0 is chosen and the constraints are approximated as ν ≥ μ and δ ≥ μ. After the optimization problem is solved to yield $\Theta^*$, the optimal values $A^*$, $\nu^*$, $\delta^*$ and $t_p^*$ are fixed values used to build the curve.
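A minimal Python sketch of the cost function may help to make the optimization problem concrete. It assumes the time-weighted squared error that the name ITSE suggests, i.e., the sum over days of t·[f(t, Θ) − y(t)]², together with the sigmoid form A / (1 + ν e^{−δ(t − t_p)})^{1/ν}. The names `itse_cost`, `MU` and `BOUNDS`, as well as the value chosen for μ, are illustrative assumptions.

```python
import math

def sigmoid(t, A, nu, delta, t_p):
    """Assumed asymmetric sigmoid: A / (1 + nu*exp(-delta*(t - t_p)))**(1/nu)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)

def itse_cost(theta, y):
    """Time-weighted sum of squared errors between the curve and the data y[0..t_end]."""
    A, nu, delta, t_p = theta
    return sum(t * (sigmoid(t, A, nu, delta, t_p) - y_t) ** 2
               for t, y_t in enumerate(y))

# Strict inequalities nu > 0 and delta > 0 approximated by nu >= MU and delta >= MU.
MU = 1e-6  # hypothetical value of the small positive number
BOUNDS = [(0.0, None), (MU, None), (MU, None), (0.0, None)]  # A, nu, delta, t_p

# Synthetic data generated by a known parameter vector: the cost is zero there.
theta_true = (1000.0, 0.5, 0.1, 50.0)
y = [sigmoid(t, *theta_true) for t in range(101)]
print(itse_cost(theta_true, y))                    # 0.0
print(itse_cost((1100.0, 0.5, 0.1, 50.0), y) > 0)  # True
```

A cost function and bounds of this kind are what a bound-constrained nonlinear solver would receive as inputs; the weighting by t gives more importance to the most recent days.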

The function f t, Θ is nonlinear in the parameters Θ, and the cost function exacerbates that further, rendering the optimization problem nonlinear. Moreover, the inequality constraints introduce additional difficulty, rendering the analytical solution of the optimization problem impractical. Therefore, numerical methods must be used.

One class of methods that are suitable for nonlinear constrained optimization is the so-called Sequential Quadratic Programming (SQP) [14] [20] [21]. SQP iteratively approximates the general nonlinear cost function in (9) by a quadratic one, and the constraints by linear ones, which entails a Quadratic Programming (QP) problem. QPs can be solved to global optimality in finite time; therefore, each iteration of the SQP method takes finite time. The solution of the underlying QP approximation is then used to build the next iterate, for which another QP is solved, hence the name Sequential Quadratic Programming. SQP presents good convergence properties, converging quadratically to the optimal solution when the active set does not change [35]. The implementation of SQP used in the present work is that of the function fmincon [31], from the Optimization Toolbox™ of MATLAB®.

A second wave of spread cannot be ruled out. On the contrary, researchers argue that lifting the social distancing measures might indeed lead to a resurgence of infections [1] [24] [29] [53].

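In order to describe the occurrence of multiple epidemiological waves, we propose to employ a sum of sigmoids. For this purpose, let $N_s$ be the adopted number of sigmoids. Equation (1) is then generalized to

$$f(t) = \sum_{k=1}^{N_s} f_k(t) \qquad (10)$$

where

$$f_k(t) = \frac{A_k}{\left(1 + \nu_k e^{-\delta_k (t - t_{p,k})}\right)^{1/\nu_k}} \qquad (11)$$

Similarly, the vector of parameters Θ, originally given by (8), is generalized to a column vector with $4N_s$ parameters defined as:

$$\Theta = \begin{bmatrix} \Theta_1^T & \Theta_2^T & \cdots & \Theta_{N_s}^T \end{bmatrix}^T \qquad (12)$$

where

$$\Theta_k = \begin{bmatrix} A_k & \nu_k & \delta_k & t_{p,k} \end{bmatrix}^T \qquad (13)$$

With these extended definitions, equation (9) can still be used to estimate the value of $\Theta^*$ by considering the inequalities applied to each $A_k$, $\nu_k$, $\delta_k$ and $t_{p,k}$, k = 1, 2, …, $N_s$.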
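The generalization to several waves can be sketched in a few lines of Python. The code assumes that each wave follows the sigmoid A_k / (1 + ν_k e^{−δ_k(t − t_{p,k})})^{1/ν_k} and that the parameter vector is stored as a flat list with four entries per wave; this layout and the function names are illustrative assumptions.

```python
import math

def sigmoid(t, A, nu, delta, t_p):
    """Assumed asymmetric sigmoid: A / (1 + nu*exp(-delta*(t - t_p)))**(1/nu)."""
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)

def sum_of_sigmoids(t, theta):
    """theta = [A_1, nu_1, delta_1, tp_1, A_2, nu_2, ...] with 4*N_s entries."""
    if len(theta) % 4 != 0:
        raise ValueError("theta must contain four parameters per sigmoid")
    return sum(sigmoid(t, *theta[i:i + 4]) for i in range(0, len(theta), 4))

# Two illustrative waves (values are not fitted to any real data set).
theta = [1000.0, 1.0, 0.1, 40.0,
         500.0, 0.5, 0.2, 120.0]
print(round(sum_of_sigmoids(1000.0, theta), 6))  # 1500.0
```

For large t each term tends to its own A_k, so the sum tends to the total of the individual final values.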

It is important to establish the number of sigmoids $N_s$. For this purpose, an evaluation of the number of switches between acceleration and deceleration phases is performed. The rationale behind this assessment is that each sigmoid results in a single acceleration phase and a single deceleration phase, with a clear switching point between them, as discussed in Section 2.1. Therefore, the number of sigmoids can be estimated by counting such switches. However, this counting requires careful consideration, as one is dealing with real, noisy data. Moreover, recall that identifying acceleration/deceleration requires the second derivative of the cumulative number of either infected or deceased individuals. As is well known, differentiation amplifies the effect of noise in the measurements [10]. Therefore, to prevent noise from artificially increasing the number of switches, a common approach is to consider a deadzone [10] in the difference between acceleration and deceleration. Let S be the set of switching instants between acceleration and deceleration phases; then, for each t = 0, 1, …, $t_{end}$ − 1, the following logic is used to identify switches with a deadzone:

where the parameter ϵ can be adjusted to provide a compromise between noise and detection sensitivity. In the present work, the value was set to ϵ = 3 ⋅ 10 • persons/day 2 .
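One possible way of implementing a switch detector with a deadzone is sketched below in Python. This is an assumption-laden illustration and not the logic used by the authors: the second derivative is approximated by a second-order finite difference, the phase is only updated when this value leaves the band [−ϵ, ϵ], and only transitions from acceleration to deceleration are counted, so that one sigmoid yields one switch. The function names and the value of `eps` are hypothetical.

```python
import math

def logistic_cumulative(t, A, delta, t_p):
    """Symmetric sigmoid used only to generate synthetic cumulative data."""
    return A / (1.0 + math.exp(-delta * (t - t_p)))

def find_switches(y, eps):
    """Days at which cumulative data y switch from acceleration to deceleration."""
    switches = []
    phase = 0  # +1 accelerating, -1 decelerating, 0 not yet determined
    for t in range(1, len(y) - 1):
        accel = y[t + 1] - 2.0 * y[t] + y[t - 1]  # discrete second derivative
        if accel > eps:
            phase = 1
        elif accel < -eps:
            if phase == 1:
                switches.append(t)
            phase = -1
        # Inside the deadzone |accel| <= eps the previous phase is kept.
    return switches

one_wave = [logistic_cumulative(t, 10000.0, 0.2, 50.0) for t in range(121)]
two_waves = [logistic_cumulative(t, 10000.0, 0.2, 50.0)
             + logistic_cumulative(t, 8000.0, 0.2, 150.0) for t in range(251)]
print(len(find_switches(one_wave, eps=1.0)))   # 1
print(len(find_switches(two_waves, eps=1.0)))  # 2
```

A wider deadzone suppresses more spurious switches caused by noise, at the price of possibly missing a weak wave.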

Thus, the number of switches is given by the cardinality of S:

$$N_s = |S| \qquad (15)$$

Recall, from Table 1, that there are three parameters of interest. The final number of occurrences may be obtained as:

$$A = \sum_{k=1}^{N_s} A_k \qquad (16)$$

On the other hand, when a sum of sigmoids is used, there are no analytical expressions to determine the other two parameters of interest, i.e., the date of maximum number of daily occurrences $t_p$ and the settling date $\tau_{0.98}$. In this case, a numerical search algorithm has to be used to find each of these parameters.

The optimization problems to determine these parameters can be posed as follows:

These two optimization problems are solved using the Nelder-Mead algorithm [23] . It should be noted that each problem has only one independent variable (time). Therefore, the search algorithm converges very quickly to the desired solution.
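The two one-dimensional searches can be illustrated with the Python sketch below. It does not use the Nelder-Mead algorithm: as a simpler stand-in, the peak date is found by a coarse scan over integer days refined with a golden-section search, and the settling date is found by bisection, which is valid because the cumulative curve is increasing. Each wave is assumed to follow the sigmoid A / (1 + ν e^{−δ(t − t_p)})^{1/ν}, and all names and parameter values are hypothetical.

```python
import math

def sigmoid(t, A, nu, delta, t_p):
    return A / (1.0 + nu * math.exp(-delta * (t - t_p))) ** (1.0 / nu)

def rate(t, A, nu, delta, t_p):
    u = math.exp(-delta * (t - t_p))
    return A * delta * u / (1.0 + nu * u) ** ((1.0 + nu) / nu)

def total(t, waves):
    return sum(sigmoid(t, *w) for w in waves)

def total_rate(t, waves):
    return sum(rate(t, *w) for w in waves)

def golden_max(func, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if func(c) > func(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

def peak_date(waves, horizon):
    """Coarse scan over integer days, then local refinement around the best day."""
    best = max(range(horizon + 1), key=lambda t: total_rate(t, waves))
    return golden_max(lambda t: total_rate(t, waves), max(best - 1, 0), best + 1)

def settling_date(waves, alpha=0.98, horizon=10000.0, tol=1e-6):
    """Bisection on the increasing cumulative curve until it reaches alpha * sum(A_k)."""
    target = alpha * sum(w[0] for w in waves)
    lo, hi = 0.0, horizon
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid, waves) < target:
            lo = mid
        else:
            hi = mid
    return hi

# Two illustrative waves given as (A, nu, delta, t_p) tuples.
waves = [(1000.0, 1.0, 0.2, 40.0), (3000.0, 1.0, 0.2, 150.0)]
print(round(peak_date(waves, 400), 2))
print(round(settling_date(waves), 2))
```

The coarse scan is what protects the refinement against converging to the peak of a smaller wave when several waves are present.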

The rationale to select the use of one or multiple sigmoids will be explained in a following section, which discusses the complexity of the model.

Two criteria are used to evaluate the degree of fidelity of the fitted curves to the data. The first is the so-called Root Mean Square Error (RMSE), defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{t_{end} + 1} \sum_{t=0}^{t_{end}} \left[ y(t) - f(t, \Theta^*) \right]^2} \qquad (19)$$

From (19) the name RMSE becomes clear, as it involves the square root of the mean of the squared error. Notice that, in (19), the values of the curve with the optimal parameters, $f(t, \Theta^*)$, are used to calculate the error between the data and the value returned by the fitted curve. Moreover, the term $t_{end}$ + 1 reflects the number of terms in the summation, as the index t starts at 0 and ends at $t_{end}$. The RMSE is used in statistical analysis as a compact measure of the degree of fidelity between the fitted curve and the data. The lower the value of the RMSE, the better the fitted curve matches the data [3].

In this paper, a normalized version of the RMSE is used, obtained as:

$$\mathrm{RMSE}_{\mathrm{norm}} = \frac{\mathrm{RMSE}}{A} \qquad (20)$$

where A is the final number of occurrences, as defined in (1) and (16) for one and multiple sigmoids, respectively. This normalization is adopted to allow a fair comparison of the RMSE of different curves.
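The first criterion can be computed with a few lines of Python. The sketch assumes that the normalization consists of dividing the RMSE by the final number of occurrences A; the function names and the toy data are illustrative.

```python
import math

def rmse(y, f):
    """Root mean square error between data y and fitted values f (same length)."""
    n = len(y)
    return math.sqrt(sum((y_t - f_t) ** 2 for y_t, f_t in zip(y, f)) / n)

def normalized_rmse(y, f, A):
    """RMSE divided by the final number of occurrences A (assumed normalization)."""
    return rmse(y, f) / A

y = [0.0, 2.0, 4.0]  # toy data
f = [1.0, 2.0, 3.0]  # toy fitted values
print(round(rmse(y, f), 4))                  # 0.8165
print(round(normalized_rmse(y, f, 4.0), 4))  # 0.2041
```

Dividing by A makes the index dimensionless, so that a large country and a small city can be compared on the same scale.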

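A second criterion to determine the quality of the representation of the data by the fitted curve, generally applied in statistics, is the squared correlation coefficient, which varies between 0 and 1; the latter value means that there exists a perfect linear relationship between the data and the fitted curve points, whereas the former means that there is no linear relationship between them. First, let us define the covariance of the data as

$$\sigma_{yf} = \frac{1}{t_{end} + 1} \sum_{t=0}^{t_{end}} \left[ y(t) - \mu_y \right] \left[ f(t, \Theta^*) - \mu_f \right] \qquad (21)$$

where $\mu_y$ and $\mu_f$ are the mean values of y(t) and $f(t, \Theta^*)$, respectively, i.e.

$$\mu_\bullet = \frac{1}{t_{end} + 1} \sum_{t=0}^{t_{end}} \bullet(t) \qquad (22)$$

in which the symbol $\bullet$ can be replaced by either y(t) or $f(t, \Theta^*)$, yielding $\mu_y$ and $\mu_f$, respectively. Similarly, the variances of y(t) and $f(t, \Theta^*)$ are

$$\sigma_y^2 = \frac{1}{t_{end} + 1} \sum_{t=0}^{t_{end}} \left[ y(t) - \mu_y \right]^2 \qquad (23)$$

$$\sigma_f^2 = \frac{1}{t_{end} + 1} \sum_{t=0}^{t_{end}} \left[ f(t, \Theta^*) - \mu_f \right]^2 \qquad (24)$$

The squared correlation coefficient $R^2$ can then be determined from (21)-(24) as:

$$R^2 = \frac{\sigma_{yf}^2}{\sigma_y^2 \, \sigma_f^2} \qquad (25)$$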
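The squared correlation coefficient can be sketched in Python as the squared covariance divided by the product of the variances. The function name and the toy data are illustrative; the normalization factor used for the covariance and the variances cancels out in the ratio.

```python
def squared_correlation(y, f):
    """Squared correlation coefficient between data y and fitted values f."""
    n = len(y)
    mu_y = sum(y) / n
    mu_f = sum(f) / n
    cov = sum((a - mu_y) * (b - mu_f) for a, b in zip(y, f)) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    var_f = sum((b - mu_f) ** 2 for b in f) / n
    return cov ** 2 / (var_y * var_f)

# A perfect linear relationship gives a value of one (up to rounding).
print(round(squared_correlation([1.0, 2.0, 3.0, 4.0], [5.0, 7.0, 9.0, 11.0]), 6))  # 1.0
# A weaker relationship gives a value between zero and one.
print(round(squared_correlation([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]), 6))             # 0.25
```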

Additional criteria are defined to evaluate whether the data are enough to allow the convergence of the estimated values of the parameters $\Theta^*$. This is carried out by fitting the sigmoid curves to the data for each possible value of n ∈ {10, 11, …, $t_{end}$}. Thus, instead of using all available data as in (9), windows of varying length are used; the minimum length of a window is adopted as 10 to ensure a minimum amount of data to calibrate the curve. Therefore, the sigmoid parameters Θ are estimated within different windows as

$$\Theta^*(n) = \arg\min_{\Theta} \sum_{t=0}^{n} t \left[ f(t, \Theta) - y(t) \right]^2 \quad \text{subject to } A_i \ge 0,\ \nu_i > 0,\ \delta_i > 0,\ t_{p,i} \ge 0$$

where i = 1, 2, …, $N_s$, depending on the number of sigmoids.

The main parameters in Table 1 are then determined from $\Theta^*(n)$ as follows:

• For a single sigmoid, $A^*(n)$ and $t_p^*(n)$ are directly extracted from $\Theta^*(n)$ in view of (8), whereas $\tau_\alpha^*(n)$ is calculated by (4) employing $t_p^*(n)$, $\delta^*(n)$ and $\nu^*(n)$ extracted from $\Theta^*(n)$ considering (8).

• For multiple sigmoids, (16)-(18) are used to determine $A^*(n)$, $t_p^*(n)$ and $\tau_{0.98}^*(n)$.

Then, the relative variation of the estimated values of these parameters is calculated for each time window and multiplied over the time window, composing indices to evaluate whether the data are enough to confirm the suitability of the sigmoid that was fitted. These indices are defined as

where the symbol $\bullet$ represents one of the parameters of interest, namely, $A^*(n)$, $t_p^*(n)$ or $\tau_\alpha^*(n)$, for a time window up to k days of data. It is clear that, if the data are enough and a suitable set of parameters is found, then each of the terms in the product in (26) approaches one. Therefore, the closer the value of $\gamma_k(\bullet)$ is to one, the more stable the estimate. Moreover, the "min" in (26) ensures that each term in the product is less than or equal to one, from which it follows that $\gamma_k(\bullet)$ ≤ 1. Analyzing $\gamma_k(\bullet)$ for different values of k makes it possible to conclude whether or not convergence has occurred within windows of variable size. We adopt windows of size 7, 14 and 21 days, in order to verify the stability of the predictions over the last one, two and three weeks.
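A possible reading of this stability index is sketched below in Python. It is an assumption, since the exact limits of the product are not reproduced here: for each of the last k days, the ratio between consecutive estimates is taken in whichever direction makes it at most one, and the ratios are multiplied. The function name and the sample values are hypothetical, and the estimates are assumed to be positive.

```python
def stability_index(estimates, k):
    """Product of min(prev/curr, curr/prev) over the last k daily updates.

    estimates[n] is the value of one parameter of interest estimated with the
    data available up to day n; all values are assumed to be positive.
    """
    recent = estimates[-(k + 1):]
    gamma = 1.0
    for prev, curr in zip(recent, recent[1:]):
        gamma *= min(prev / curr, curr / prev)
    return gamma

stable = [5000.0] * 30
drifting = [37000.0 + 250.0 * n for n in range(29)]  # steadily increasing forecast
print(stability_index(stable, 21))              # 1.0
print(round(stability_index(drifting, 21), 3))
```

With this reading, a forecast that keeps drifting in one direction accumulates a product that is visibly below one, while a forecast that has stopped changing yields exactly one.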

From the previous discussion, it is possible to choose among different curve types (symmetric or asymmetric) and numbers (single or multiple sigmoids). This plays an important role both in the accuracy of the fit and in the complexity of the models (as per the different amounts of parameters to be estimated with each choice).

It should be noted that the choice of a more complex model without a significant increase in the accuracy may lead to the problem of model overfitting, that is, an exaggeration while fitting of the training data that may compromise the generalization of the model predictions [47] . In order to avoid this problem, criteria should be established to enable a compromise between accuracy and complexity. These criteria are described in this section.

Particularly for regions where the contagion is in its early stage, there are not enough data to observe a deceleration phase. Therefore, in this case the data are insufficient to support the estimation of the asymmetric curves. In these situations, the symmetric curves can be used in the fitting, and an automated decision has to be made on whether to present results with a symmetric or an asymmetric curve. The criterion for this decision considers a compromise between complexity and quality of the fitting results. The complexity is deemed higher for the asymmetric curve, as it requires the estimation of one additional parameter, namely ν. As for the quality, it is evaluated through the following ratio:

$$\eta = \frac{1 - R_a^2}{1 - R_s^2} \qquad (27)$$

where $R_a^2$ represents the squared correlation coefficient obtained from fitting an asymmetric curve and $R_s^2$ the one yielded by a symmetric curve.

Recall that the value of the squared correlation coefficient ranges from zero to one and that this latter value implies a perfect relationship between the data and the curve. Furthermore, the symmetric sigmoid may be considered a particular case of the asymmetric sigmoid, obtained by imposing ν = 1. Since the asymmetric sigmoid contains one additional parameter, and therefore one more degree of freedom for optimization, a better fit is expected, leading to a value of $R_a^2$ higher than $R_s^2$. Therefore, the value of η is expected to be lower than one.

Nevertheless, a value of η very close to one indicates a low increase in the squared correlation coefficient, which may not be enough to justify the increase in the model complexity.

To prevent a division by zero, if $R_s^2$ = 1 (up to numerical precision), then the symmetric curve is selected, as it has a perfect fit and the asymmetric curve cannot yield better results. This case has been considered for robustness of the computational code; nevertheless, since this is a perfect situation, it is not expected to occur in practice.

In the remaining cases, to balance between complexity and a more accurate fit, the criterion for selection follows the rule:

With this choice, a more substantial gain in the accuracy of the fit has to be obtained to justify a more complex curve.
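The selection logic can be illustrated with the Python sketch below. It assumes that the ratio is η = (1 − R_a²) / (1 − R_s²) and that the asymmetric curve is selected only when η falls below some threshold; the threshold value, the tolerance used to detect a perfect symmetric fit and the function name are all hypothetical and are not taken from the paper.

```python
def select_curve(r2_asym, r2_sym, threshold, tol=1e-12):
    """Choose between the symmetric and the asymmetric sigmoid.

    threshold and tol are hypothetical tuning values, not those of the paper.
    """
    if 1.0 - r2_sym <= tol:
        # Perfect symmetric fit: avoid the division by zero and keep the simpler curve.
        return "symmetric"
    eta = (1.0 - r2_asym) / (1.0 - r2_sym)
    return "asymmetric" if eta < threshold else "symmetric"

print(select_curve(0.999, 0.990, threshold=0.5))  # asymmetric
print(select_curve(0.991, 0.990, threshold=0.5))  # symmetric
print(select_curve(1.000, 1.000, threshold=0.5))  # symmetric
```

A lower threshold demands a larger reduction of the unexplained variance before the additional parameter ν is accepted.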

Given the number $N_s$ of sigmoids, two fits are performed, i.e., using one and using $N_s$ sigmoids. The criterion to decide upon the use of one or multiple sigmoids is similar to the one defined in the previous section for symmetric and asymmetric sigmoids.

We define a new ratio $\eta_2$ as

$$\eta_2 = \frac{1 - R_m^2}{1 - R_1^2}$$

where $R_1^2$ and $R_m^2$ represent the squared correlation coefficient obtained from fitting one and multiple sigmoids to the calibration data, respectively, provided that $R_1^2$ is different from one. As in the previous case, if $R_1^2$ = 1 (which is a perfect situation, not expected to occur in practice), then only one sigmoid is adopted.

The criterion to choose between one and multiple sigmoids is then:

The methodology proposed in the current paper is summarized by the flowchart presented in Figure 2.

Flowchart summarizing the methodology proposed in the current paper.

The computer program described in this paper was developed using MATLAB ® 2020a, with the Optimization Toolbox TM and the MATLAB Compiler TM .

The program uses as data sources the reports published in spreadsheet format on the websites of the European Centre for Disease Prevention and Control (ECDC) [12], of Johns Hopkins University [9] and of the Brasil.io project [4].

The ECDC reports contain country-wise data for the countries of the world, while the reports of Johns Hopkins University and of the Brasil.io project present data for United States counties and for Brazilian states and cities, respectively.

The data report the number of newly infected and deceased people on each date. These numbers are reported separately for each region, which allows an independent analysis to be performed for each of them.

The computer program may be downloaded from the following link, where the data files updated until 12-Aug-2020 are also available.

The folder contains a "readme" file, which explains the main features and the preliminary steps to use the program. We emphasize that the program may be installed and run directly from the operating system, regardless of whether the user has a licensed MATLAB® installation. Users who have MATLAB® and the required packages installed may directly run a different file from the package without any installation.

A Graphical User Interface (GUI), illustrated in Figure 3 and Figure 4 , will appear. These figures present the main screen of the GUI with data from the European Centre for Disease Prevention and Control and from Johns Hopkins University, respectively. A zoom was applied to these figures to allow for a better reading of their contents; that is the reason why the names of some U.S. counties appear truncated and why the predictions in the bottom of the figures appear incomplete.

When running the GUI for the first time, the user is advised to initially select the option "File: Download New Data File" of the main menu, as described in further detail below.

The main menu contains the following options:

• "File: Download New Data File": this option opens the window presented in Figure 6, which covers the data from the ECDC, from Johns Hopkins University and from the Brasil.io project. The user may choose to download the data file directly or to access one of these sites, using the default web browser. The data files can be obtained by direct download; however, we opt to also provide an option to access the corresponding websites, in view of the work performed by the people responsible for them.

• An option to run the trend analysis; examples of the results of this analysis are presented in a later section.

• "Figures: Export Figure": this option is used to export the graphs of the main screen to a new figure, in order to facilitate its editing and copying to external software.

• An option presenting information about the interface.

• A standard option to close the interface.

On the left of the main screen ( Figure 3 and Figure 4 ), there is a list of all available regions, which will be henceforth called the region list. In this list, when analyzing data from the ECDC or from Johns Hopkins University, the user can select the name of the desired country or of the desired U.S. county (in English); when analyzing data from the Brasil.io project, the user can select the name of the Brazilian states and cities (in Portuguese). Brazilian states are identified by their two-letter acronym. The names of the cities are presented without accents; for instance, the cities of "São Paulo", "Santa Bárbara d'Oeste" and "Santa Fé" are identified as "Sao Paulo", "Santa Barbara d'Oeste" and "Santa Fe", respectively.

Above the region list, there is an edit field where the user can type the name of a region to look for on the list. The user can select the region name in full or in part, and may also employ regular expressions.

Furthermore, a vertical bar "|" can be used to represent the "or" operator, to perform a search for more than one region; for instance, fran|germ|italy will restrict the countries in the list to France, Germany and Italy. An empty string is used in the search bar to restore the complete list of regions. Note the search for "Illinois" in [...].

Below the region list, there are two options: "Show predictions" and "Show dates". "Show predictions" is used to enable or disable the mathematical modeling (if disabled, only historical data will be shown). If "Show dates" is disabled, then sequential numbers are shown on the axes of the graphs, instead of dates.
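The behavior of the search field can be mimicked with a regular expression filter. The Python sketch below is only an illustration of the idea (the tool itself is written in MATLAB), and it assumes that the search is case-insensitive and matches any part of the region name; the function name and the sample list are hypothetical.

```python
import re

def filter_regions(regions, pattern):
    """Keep the regions whose names match the pattern anywhere, ignoring case."""
    if not pattern:
        return list(regions)  # an empty search string restores the complete list
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in regions if regex.search(name)]

regions = ["Brazil", "France", "Germany", "Italy", "Spain"]
print(filter_regions(regions, "fran|germ|italy"))  # ['France', 'Germany', 'Italy']
print(filter_regions(regions, ""))                 # the complete list
```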

In the lower left corner of the GUI, there is an edit field where the user can specify the number of days for testing. For instance, if the user specifies a value of 7 days, then the model is calibrated with all data available until one week before the data acquisition, and the remaining days are used to test the model, allowing a comparison between the predictions of the model and the observed data.
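The separation between calibration and test data can be sketched as follows in Python. The function name is hypothetical, and the sketch only illustrates the idea of holding out the most recent days; it is not the code of the tool.

```python
def split_calibration_test(y, n_test_days):
    """Hold out the last n_test_days observations to test the calibrated model."""
    if n_test_days <= 0:
        return list(y), []
    return list(y[:-n_test_days]), list(y[-n_test_days:])

daily_data = list(range(30))  # placeholder for 30 days of observations
calibration, test = split_calibration_test(daily_data, 7)
print(len(calibration), len(test))  # 23 7
```

The model would then be fitted only to the calibration part, and its predictions for the held-out days compared with the test part.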

Below the graphs of the main screen (Figure 3 and Figure 4), the following predictions are presented, for either cases or demises: final number, date of maximum daily occurrences and settling date (as defined in Table 1). Furthermore, the equation of the best sigmoid (or set of sigmoids, in case more than one wave is identified) matching the accumulated data is presented, as well as the indices RMSE and $R^2$, defined in the previous section. The predictions are presented in editable fields, such that the user can copy their texts and paste them into external software. An example of such information is presented in Table 2.

In order to illustrate the use of the tool to perform predictions, Figure 7 to Figure 

As previously mentioned, the trend analysis is run when the user selects the corresponding option in the main menu. Examples of figures resulting from such analysis are presented in Figure 10 to Figure 12, which correspond to the Brazilian city of São Paulo (SP), the U.S. county of Cook, Illinois, and the Brazilian state of São Paulo, respectively. Each of these figures contains three subfigures, showing the predicted value of the three parameters of interest described in Table 1.

The abscissa of the graphs indicates the date of the estimation, meaning that all data available until that date were used to estimate the value of the parameter under study. It can be seen that, as expected, the values of the estimated parameters vary with the amount of data used to estimate them.

In the title of each subfigure, the values of $\gamma_7$, $\gamma_{14}$ and $\gamma_{21}$ are presented, indicating how stable each prediction is, considering the last one, two and three weeks, respectively. A value of γ closer to one indicates a more stable prediction.

Similarly, Figure 11-(a) indicates that the dynamics of the pandemic in the U.S. county of Cook, Illinois, followed a similar pattern. There were oscillations until the end of April, followed by an increasing trend until May 17th, a slightly decreasing tendency and finally a stabilized predicted value of approximately 5000 demises since June 17th.

On the other hand, Figure 12 indicates that the pandemic is not yet stabilized in the Brazilian state of SP (São Paulo). An increasing tendency can be seen over the last weeks. For instance, Figure 12 -(a) shows that the predicted number of demises was approximately 37000 on July 14th and changed to 44000 on August 11th, indicating an increase of approximately 20% in 28 days.

The stability of such predictions over the last one, two and three weeks may be verified by the values of γ₁, γ₂ and γ₃ shown in the title of each figure. It can be seen that these values are close to one, indicating stabilized or near-to-stabilization predictions.

The values of γᵢ are intended to represent the convergence of the estimations. As an additional feature, they may also be used as a measure of the quality of the prediction: higher values of γ indicate that the pandemic has been following the same predicted behavior over the last weeks. For instance, when analysing the values of γ₃ presented in Figure 10-(a) and Figure 12-(a), it can be seen that such values are γ₃ = 0.963 and γ₃ = 0.873 for the city and for the state of São Paulo, respectively. One may infer from these numbers that the predictions obtained in the last three weeks are more stable in the city of São Paulo than in the state with the same name. This is the same conclusion that was reached by analysing the curves, as described in the previous paragraphs.
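The behaviour of such an index can be illustrated with a short Python sketch. The formal definition of γ is given earlier in the paper and is not reproduced here; the index below (ratio between the smallest and the largest daily estimate in the window) is a hypothetical stand-in chosen only because it shares the stated property of approaching one for a stable prediction. The series are synthetic and the function name is an assumption.

```python
def stability_index(estimates, window_days):
    """Hypothetical index: min/max of the last `window_days` positive daily estimates."""
    recent = estimates[-window_days:]
    if len(recent) < window_days:
        raise ValueError("not enough estimates for the requested window")
    return min(recent) / max(recent)

# Synthetic daily predictions of the final number (illustrative values only).
steady = [5000.0 + 10.0 * (i % 3) for i in range(28)]   # nearly flat
rising = [37000.0 + 250.0 * i for i in range(29)]       # steady increase over 28 days

for name, series in (("steady", steady), ("rising", rising)):
    g1, g2, g3 = (stability_index(series, 7 * weeks) for weeks in (1, 2, 3))
    print(f"{name}: gamma_1={g1:.3f} gamma_2={g2:.3f} gamma_3={g3:.3f}")
```

Under this assumed definition, a nearly flat sequence of predictions yields values very close to one for all three windows, whereas a steadily increasing sequence yields smaller values that decrease as the window grows.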

It should be emphasized that the values in Figure 10 -(a), Figure 11 -(a) and Figure 12 -(a) do not refer to the number of demises on the day of the analysis, but rather to the predicted final number of demises, estimated on the basis of all data available until that date. This is an innovative approach for trend analysis in this context and, to the best of the authors' knowledge, has not been proposed before.

Additionally, the same analyses presented here for the number of demises can be run for the number of cases.

The stabilized results for the city of São Paulo and for the county of Cook, Illinois, make it possible to conclude that the actions of the local governments to control the pandemic are taking effect. It is of public interest to determine how the disease will spread in each city after the restriction measures are relaxed. For this purpose, the trend analysis should be run again. Should a new increasing tendency be observed, the authorities would be advised to reinstate some containment measures.

It is important to point out that these results, although helpful, should be validated by medical experts and not be considered alone when deciding public policies.

The trend analysis may be run for a country, a state, a county or a city. It provides more useful information when it is run for smaller administrative regions such as a county or a city, because it supports decisions by local authorities based on data specific to the region under consideration.

A limitation of the proposed approach is that it is not adequate to analyze the pandemic in very small cities or counties, because the number of infections and demises is usually very low, which prevents a good fit by the mathematical model proposed here. However, for medium- and large-sized cities or counties, informative results are expected, such as the ones presented here for the U.S. counties of Cook, Illinois and Los Angeles, California and for the Brazilian cities of Brasilia (DF) and São Paulo (SP).

The results presented here are illustrative and correspond to the scenario on the date when the data were acquired, that is, on 12-Aug-2020. These analyses should always employ updated data to increase their reliability. Therefore, the authors recommend that these studies be repeated periodically, at least on a weekly basis. The developed computer program makes this task easy to perform.

This paper proposed a methodology and a computational tool to forecast the COVID-19 pandemic throughout the world, providing useful resources for health-care authorities. A user-friendly Graphical User Interface (GUI) in MATLAB ® was developed and can be downloaded online for free use. An innovative approach for trend analysis was presented.

Resources in the computational tool allow the user to quickly run analyses for the desired regions. Additional options give access to the official websites of the European Centre for Disease Prevention and Control, of Johns Hopkins University and of the Brasil.io project, in order to download new data as soon as they are published online. To this date, these institutions have been updating their reports on a daily basis.

The analyses run by the program are intended only as an aid and the results should be interpreted with care. They do not replace a careful analysis by experts. Nevertheless, such results may be a very useful tool to assist the authorities in their decision-making process.

The proposed program is in continuous development, and features added in the future will be published and described on the project webpage. The authors would appreciate any feedback and suggestions to improve the computational tool.

The program, in its current version, is able to process detailed information about U.S. counties and about Brazilian states and cities. These two countries were chosen because they have continental dimensions and are currently the focus of the COVID-19 pandemic. Nevertheless, the same resource could be extended to other countries. For this purpose, the main requirement would be to write code that reads the data files of another country and converts them to the format recognized by the program, which is a simple task.
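A converter of this kind could look like the minimal Python sketch below. The file format recognized by the program is not described in this section, so both the input layout (a CSV file with the hypothetical columns `date`, `new_cases` and `new_deaths`) and the output layout (one record per date with accumulated totals) are assumptions made purely for illustration.

```python
import csv
import io
from datetime import date

def daily_to_cumulative(csv_text):
    """Read hypothetical daily counts and return accumulated totals sorted by date."""
    rows = sorted(csv.DictReader(io.StringIO(csv_text)),
                  key=lambda r: date.fromisoformat(r["date"]))
    total_cases = total_deaths = 0
    out = []
    for r in rows:
        total_cases += int(r["new_cases"])
        total_deaths += int(r["new_deaths"])
        out.append({"date": r["date"], "cases": total_cases, "deaths": total_deaths})
    return out

# Made-up sample in arbitrary date order, as may happen in published reports.
sample = """date,new_cases,new_deaths
2020-08-02,15,1
2020-08-01,10,0
2020-08-03,20,2
"""

for row in daily_to_cumulative(sample):
    print(row)
```

Sorting by date before accumulating keeps the totals correct even when the source file is not in chronological order.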

Future work can employ the same methodology and adapt the computer tool to describe the dynamics of other epidemics around the world. In the recent past, no pandemic was as severe as COVID-19, but there were occurrences of other diseases such as Influenza A and MERS-CoV. Should a similar epidemic occur again, the computer program described here would be a useful tool.

The authors acknowledge the European Centre for Disease Prevention and Control (ECDC), Johns Hopkins University and the Brasil.io project for making the COVID-19 data publicly available and for allowing its use for research purposes.