key: cord-0554744-pmi0j8ox
authors: Schumacher, Fernanda L.; Ferreira, Clecio S.; Prates, Marcos O.; Lachos, Alberto; Lachos, Victor H.
title: A robust nonlinear mixed-effects model for COVID-19 deaths data
date: 2020-07-02
journal: nan
DOI: nan
sha: 0b6f76028c97bea2136c6668bb10805344d3be7c
doc_id: 554744
cord_uid: pmi0j8ox

The analysis of complex longitudinal data such as COVID-19 deaths is challenging due to several inherent features: (i) Similarly-shaped profiles with different decay patterns; (ii) Unexplained variation among repeated measurements within each country, these repeated measurements may be viewed as clustered data since they are taken on the same country at roughly the same time; (iii) Skewness, outliers or skew-heavy-tailed noises are possibly embodied within response variables. This article formulates a robust nonlinear mixed-effects model based in the class of scale mixtures of skew-normal distributions for modeling COVID-19 deaths, which allows the analysts to model such data in the presence of the above described features simultaneously. An efficient EM-type algorithm is proposed to carry out maximum likelihood estimation of model parameters. The bootstrap method is used to determine inherent characteristics of the nonlinear individual profiles such as confidence interval of the predicted deaths and fitted curves. The target is to model COVID-19 deaths curves from some Latin American countries since this region is the new epicenter of the disease. Moreover, since a mixed-effect framework borrows information from the population-average effects, in our analysis we include some countries from Europe and North America that are in a more advanced stage of their COVID-19 deaths curve.

Some studies for modeling the COVID-19 data are based on nonlinear models. Tsallis and Tirnakli (2020) proposed a nonlinear model based on volume of stock-markets to predict the number of active cases:

where C > 0, α > 0, β > 0, γ > 0, q > 1 and t 0 ≥ 0. The constant t 0 indicates the first day of appearance of the epidemic in that particular region; it is conventionally chosen to be zero for China; for other countries, it is the number of days elapsed between the appearance of the first case in China and the first case in that country. The normalizing constant C reflects the total population of that particular country. We refer to Tsallis and Tirnakli (2020) , for more details about the interpretation of the parameters α, β, γ, q.

The CovidLPTeam (2020) proposed an hierarchical Bayesian non-Gaussian and non-linear model to capture the dynamics of the epidemic (last accessed on the 26 th of June on 2020). The daily counts are assumed to come from a Poisson distribution and the daily pandemic evolving by a generalized logistic dynamic:

where d is responsible to capture subnotification on specific day(s), c control the infection rate, f is an asymmetry parameter, a, b and f control the asymptote of the curve, given by a b f , with the peak occurring at time t = − ln (b/f ) c . However, to the best of our knowledge, most models developed up to this date are univariate, not taking into account a possible structure of dependence on the disease among countries or regions. Also, as the onset of the disease occurs at different periods between countries, those at a more advanced stage of the disease can provide valuable information to countries at an early stage.

In recent years, nonlinear mixed-effects (NLME) models have been proposed for modeling many complex longitudinal data (Lindstrom and Bates, 1990; Wu, 2010) . However, one often assumes that both random error and random effects are normally distributed, which may not always give reliable results if the data exhibit excessive skewness and heavy tailedness, as is the case of COVID-19 deaths data. In this paper, we present a novel class of asymmetric NLME models that provides an efficient estimation of the parameters in the analysis of longitudinal data. We assume that, marginally, the random effects follow a scale mixtures of skew-normal distribution (SMSN, Branco and Dey, 2001) and that the random errors follow a symmetric scale mixtures of normal distribution (SMN, Lange and Sinsheimer, 1993) providing an appealing robust alternative to the usual normal distribution in NLME models. We propose an approximate likelihood analysis for maximum likelihood (ML) estimation via an EM-type algorithm that produces accurate ML estimates and significantly reduces the numerical difficulty associated with the exact ML estimation. The newly approximate procedures are applied to COVID-19 deaths data and it is showed that models with skewheavy-tailed assumption may provide more reasonable results and these may be important for COVID-19 research in providing quantitative guidance to better understand the stages and future development of the disease.

The purpose of this study is to predict the number of deaths caused by COVID-19 to short and long term, in countries at early stage such as Peru, Mexico, Chile, Brazil and Colombia along with some countries at a more advanced stage of the disease such as Belgium, Italy, the USA and the United Kingdom. Furthermore, a general bootstrap method is used for constructing confidence intervals for the fitted COVID-19 deaths curves and its mode (date of the peak) for the selected countries. The method proposed in this paper is implemented in R (R Core Team, 2019), and the codes are available for download from Github (https://github.com/fernandalschumacher/NLMECOVID19).

The rest of the paper is organized as follows. In Section 2, we describe the motivating COVID-19 deaths datasets obtained from the John Hopkins repository. In Section 3, we present the methodology and associated ML estimation procedure via the EM-algorithm. In Section 4, we present our result, where we forecast the total number of deaths in the selected countries for the short term 30 and 60 days and long term 90 and 150 days with starting point at June 25 th 2020. Finally, the paper concludes with some discussions in Section 5.

The Johns Hopkins University through the Center for Systems Science and Engineering created a COVID-19 repository (https://github.com/CSSEGISandData/COVID-19/) that is updated daily with data from many locations of the world. The repository combine data from a variety of official agencies and others reliable sources and unify them.

Due to different testing capacity between countries, subnotification is a challenge to understand the true contamination numbers of COVID-19 in the population. Despite that subnotification might also happen in number of deaths, they are less likely to be affected by detection biases. In fact, recent studies use the number of deaths as a proxy measure for COVID-19 cases (see, for example, Maugeri et al., 2020; Ribeiro and Bernardes, 2020; Amaro et al., 2020) . Figure 1 shows the reported number of daily deaths in different countries. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Figure 1 : Number of daily reported deaths since first death for the nine countries considered in this study, until 24 th of June 2020, with a LOESS non-parametric estimated curve superimposed.

As it can be seen the selected countries are in different stages of the COVID-19 pandemic. Belgium, Italy, the United Kingdom (UK), and the United States of America (US) present a controlled number of deaths by the disease. Other countries like Brazil, Mexico, and Peru seem to be around their peak, while Chile and Colombia apparently are in the increasing part of the curve. Figure 1 also presents a LOESS (Cleveland, 1979) fit and a 30 days prediction for each series. Non-parametric curves are alternatives for model fitting because of their flexibility. As it can be seen, the curves have difficulties to capture the rapid increase in deaths in some countries but overall provide a reasonable fit. However, when using this model to make prediction we can clearly see that their extrapolation capacities are not reliable, making necessary the usage of a more adequate statistical parametric model to capture the nature of the pandemic and provide sensible fit and prediction.

Although that Peru moved quickly to lock down its citizens as the pandemic took hold in early March and has extended the lockdown until the end of June 2020, cases nonetheless exploded in May, reaching a peak of more than 8, 000 cases per day late in May, in part explained because the informal employment reaches 73% of the Peruvian labor market.

Peru has now reported 268, 602 cases and 8, 761 deaths of COVID-19 (Worldometers.info, 2020) , the second-highest number of confirmed cases of the disease in Latin America, behind Brazil, and the seventh-highest globally.

In Mexico, the accumulated COVID-19 cases reached 202, 951 and a death total of 25, 060 until June 26 th . Moreover, the statistics show that the curve is ascending (Worldometers.info, 2020) . Torrealba-Rodriguez et al. (2020) presented a modeling and prediction of accumulated cases of COVID-19 infection in Mexico using the models of Gompertz and Logistic and a framework of Artificial Neural Network. These models predict the peak between May 8 th and June 25 th , which might be unrealistic as presented in Section 4.

Colombia is in an ascending stage of the curve of cases and deaths, having accumulated 80, 599 cases and 2, 654 deaths until June, 26 th according to Worldometers.info (2020) . Rivera-Rodriguez and Urdinola (2020) used a SEIR model to estimate the number of patients that would required Intensive Care Units (ICU) care (critical), and only hospital care (severe) in order to manage their limited resources. De Castro (2020) used a SIR model to estimate the number of cases and deaths in Colombia using data of Italy and South Korea where the peak of infections would be by June 29 th . As shown in Figure 1 , Colombia presents an increasing pattern and the peak should be observed somewhere in the future.

For Chile, the statistics are of 263, 360 accumulated cases and a total of 5, 068 deaths until June 26 th (Worldometers.info, 2020). It is important to observe that there are two days with observations far above the standard: number of deaths of 649 (June 8 th ) and number of cases of 36, 179 (June 18 th ).

Lastly, Brazil presented 1, 244, 419 cases and 55, 304 deaths until June 26 th according to Worldometers.info (2020) and is the country with second-highest number of confirmed cases and deaths in the world. CovidLPTeam (2020) estimates that the Brazilian peak of deaths should happen in June 06 th and a estimated total number of death of 99, 932 which is closely related to the predictions presented in Section 4. From Figure 1 we can see that Brazil is the country in more advanced stage of the evolution of the disease in the Latin American region.

The different stages of spread of the disease are an important information that should be used into modeling. In the next section we introduce our methodology that jointly accommodates the different stages of the diseases and borrow information of the different time series to provide a more robust and reliable fit and prediction.

3 The model

The idea of the SMSN distributions originated from an early work by Branco and Dey (2001) , which included the skew-normal (SN) distribution as a special case. We say that a p × 1 random vector Y follows a SN distribution with p × 1 location vector µ, p × p positive definite dispersion matrix Σ and p × 1 skewness parameter vector λ, (often known as the shape parameter) and write Y ∼ SN p (µ, Σ, λ), if its probability density function (pdf) is given by

where φ p (.; µ, Σ) stands for the pdf of the p-variate normal distribution with mean vector µ and dispersion matrix Σ, N p (µ, Σ) say, and Φ(.) is the cumulative distribution function (cdf) of the standard univariate normal. Note for λ = 0, (2) reduces to the symmetric N p (µ, Σ)-pdf, while for non-zero values of λ, it produces a perturbed (asymmetric) family of N p (µ, Σ)-pdf's. Let Z = Y − µ. Since aZ ∼ SN p (0, a 2 Σ, λ), for all scalar a > 0, the SMSN family can be defined as follows: a SMSN distribution is that of a p−dimensional random vector

where U is a positive random variable with the cdf H(u; ν) and pdf h(u; ν), and independent of the SN p (0, Σ, λ)random vector Z. Here ν is a scalar or vector parameter indexing the distribution of the mixing scale factor U . Given U = u, Y follows a multivariate skew-normal distribution with location vector µ, scale matrix u −1 Σ and skewness parameter vector λ, i.e., Y|U = u ∼ SN p (µ, u −1 Σ, λ). Thus, by (2), the marginal pdf of Y is

The notation Y ∼ SMSN p (µ, Σ, λ; H) will be used when Y has pdf (4). The asymmetrical class of SMSN distributions includes the skew-t (ST), the skew-slash (SSL), and the skew-contaminated normal (SCN). All these distributions have heavier tails than the skew-normal and can be used for robust inferences. When λ = 0, the SMSN class reduces to the scale mixtures of normal (SMN) class, which is represented by the pdf f 0 (y) = ∞ 0 φ p (y; µ, u −1 Σ)dH(u; ν). We use the notation Y ∼ SMN p (µ, Σ; H) when Y has distribution in the SMN class. We refer to Schumacher et al. (2020a) for details and additional properties related to this class of distributions.

In this section, we present the models and methods in general forms, illustrating that our methods may be applicable in other applications as well. Denote the number of subjects by n and the number of measurements on the ith subject by n i . For notational convenience, let x ij (i = 1, 2, . . . , n; j = 1, 2, . . . , n i ) be a vector incorporating independent variables such as number of ICU beds, β ij = (β 1ij , ..., β sij ) , β = (β 1 , . . . , β r ) (r > s). The NLME model can be written as

where the subscript i is the subject index, y i = (y i1 , ..., y ini ) , with y ij being the response value for individual i at time t ij , η i (t ij , β ij ) = (η(t i1 , β i1 ) , . . . , η(t ini , β ini )) , with η(·) being a nonlinear known function,

where c = − 2 π k 1 , with k 1 = E{U −1/2 }, ∆ = D 1/2 δ, with δ = λ/ 1 + λ λ, σ 2 is the unknown withinsubject variance, D = D(α) is the q × q variance-covariance matrix of b i , which depends on unknown and reduced parameters α of dimension v × 1 and λ = (λ 1 , . . . , λ q ) is a q × 1 vector of skewness parameters for the random effects. Using the definition of a SMSN random vector and (6), it follows that marginally (Schumacher et al., 2020a) b i iid ∼ SMSN q (c∆, D, λ; H) and i ind ∼ SMN ni (0, σ 2 I ni ; H), i ∈ {1, . . . , n},

so that E{b i } = E{ i } = 0. Thus this model considers that the i 's, related to within-subject errors are symmetrically distributed, while the distribution of random effects is assumed to be asymmetric and with mean zero. In addition, under this consideration the regression parameters are all comparable.

Let θ = (β , σ 2 , α , λ ) (ℵ × 1), where ℵ = r + 1 + v + q, then classical inference on the parameter vector θ is based on the marginal distribution for Y = (y 1 , . . . , y n ). Thus, the integrated likelihood of (5)-(6) for θ in this case is given by

which generally does not have a closed form expression because the model function is not linear in the random effects.

In the normal case, to make the numerical optimization of the likelihood function in a tractable problem, different approximations to (8) have been proposed. Some of these methods consist of taking a first-order Taylor expansion of the model function around the conditional models of the random effects b (Lindstrom and Bates, 1990) . Following this idea, the marginal distribution of Y i , for i ∈ {1, . . . , n}, can be approximated as

∼ " denotes approximated in distribution.

The approximated empirical Bayes estimator of b i , denoted by b i , obtained by the conditional mean of

where

We refer to Lachos et al. (2010) and Schumacher et al. (2020a) , for further details.

In this section, we demonstrate how to use the EM algorithm (Dempster et al., 1977) to obtain approximate maximum likelihood (ML) estimator of a SMSN-NLME model. We denote the current estimates of (β, b i ) by ( β, b i ). In this case, the linearization procedure (Wu, 2010) consists of taking the first-order Taylor expansion of η i around the current parameter estimate β and the random effect estimates b i , which is equivalent to iteratively solving the following linear mixed-effect (LME) model

∼ SMN ni (0, σ 2 e I ni H). The model defined in (11) can be seen as a slight modification of the general SMSN-LME model proposed by Lachos et al. (2010) and Schumacher et al. (2020a) , where a simple and efficient EM-type algorithm for iteratively computing ML estimates of the parameters in the SMSN-LME model has been proposed and for which we use the skewlmm package in R (Schumacher et al., 2020b ; R Core Team, 2019). The approximated likelihood function, derived from (9), can be easily computed as a byproduct of the EM-algorithm and is used for monitoring convergence and for model selection, such as, the Akaike (AIC) and Bayesian Information Criterion (BIC) (Wit et al., 2012) .

Suppose now that we are interested in the prediction of y + i , a future υ × 1 vector of measurement of Y i , given the

The minimum mean square error (MSE) predictor of y + i , defined as the conditional expectation y + i given Y i and θ, is given next.

Let b i be an expansion point in a neighborhood of b i , y + i be an υ ×1 vector of future measurement of Y i (or possibly missing), x + i and t + i be an υ × r matrix of known prediction regressors. Then, under the SMSN-NLME model the predictor (or minimum MSE predictor) of y + i can be approximated as

where

(2) i ) and

. The expression for τ −1i are given in Schumacher et al. (2020a) .

Bootstrap methods are statistical tools used in many statistical problems such as calculus of bias, variance, confidence intervals, sampling distributions of the estimators, among others. For technical details, we refer to Efron and Hastie (2016) . In this paper, we use the parametric Bootstrap to calculate a confidence interval for the mode (date of the peak) of COVID-19 deaths by country. Also, we construct confidence bands to the estimated COVID-19 deaths curve along the time. We generate M random samples of length n of the model (eq. 5-6) using the estimated parameters and for each random sample, we fit the proposed model and calculate the statistical mode, the fitted and predicted curve ( Y t ). So, for each country, we will have M o 1 , ..., M o M estimates of the mode and Y t1 , ..., Y t M . We calculate the percentiles α/2 and 1 − α/2 to construct a confidence interval of level α for the mode and for the fitted curve of the COVID-19 deaths. In this notes, we use α = 5%.

To propose the confidence interval for the peak of deaths, we select the two dates such that the 97.5% percentile curve coincides with the highest value of the 2.5% percentile curve. Thus, we guarantee that all curves belonging to the confidence band will peak within this interval.

In order to analyse the COVID-19 data described in Section 2, we propose to fit the SMSN-NLME model given in (5) with the derivative of the generalized logistic model as nonlinear function, which can be written as follows:

where α 2i = exp{β 2 + b 2i }, α 3i = exp{β 3 + b 3i }, and α k = exp{β k }, for k = 1, 4, with the exponential transformation being used to ensure positiveness of the parameters. Note that this function is similar to the one considered in the univariate approach of CovidLPTeam (2020), as presented in (1), except that in (13) random effects are included to enable a multivariate approach and borrow information between the different time series.

For numerical stability, we use the linear transformation y ij = z ij /k z , where z ij is the number of registered deaths at the ith country and jth day since first death, and k z = 33.944 is chosen to be the sample standard deviation from the Colombia data, which is the smaller one in the observed data. For model selection, we consider AIC and BIC, as described in Section 3.3, for distributions normal (N), skew-normal (SN), student's-t (t), and skew-t (ST). As shown in Table 1 , the distribution with lowest AIC and BIC is the ST, which will be used in the analysis hereafter. It is worth noting that the fitted ST distribution has ν = 1.568 (Table 2) , evidencing the need for heavy tail models. To obtain long term evolution estimates, we computed the 300-days-ahead prediction for all countries considered in this study. Figure 2 presents the fitted model, along with the prediction, the real data and confidence intervals information. To preserve the individual properties observed for each country, the random error was generated conditioned on the mixing scale factor u i = E{U i | θ, y i }, i = 1, . . . , 9, which is estimated as a byproduct of the EM algorithm and is responsible to preserve the observed characteristics of the series for each country. Additionally, the 95% confidence intervals were obtained as described in Subsection 3.5, where M = 600 samples were generated, from which the model estimate or prediction for 13 samples resulted in numerical error and/or non-convergence. For the 587 remaining samples we performed a 15% trim of the series to remove those noise series that were not able to replicate the characteristics of the data. To identify such series we used as metric the mean square error (MSE) between the fitted values and observed data and removed those such that the MSE have the highest values. This procedure was performed to prevent convergence problems to affect the interval estimates, resulting in a final of 499 samples. 3.800 λ 2 13.915 ν 1.568 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Apr 12 Peak 95% CI: (Apr 05, Apr 21) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Apr 22 Peak 95% CI: (Apr 14, May 02) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Jun 16 Peak 95% CI: (May 29, Jul 27) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Apr 01 Peak 95% CI: (Mar 27, Apr 08) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Jun 08 Peak 95% CI: (May 24, Jun 27) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Aug 17 Peak 95% CI: (Jun 10, Apr 20)

q q q q q q q q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Peak: Jul 15 Peak 95% CI: (Jun 13, Oct 05) q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Figure 2 : Fitted and predicted curve for the ST-NLME model (blue line), along with real data (black), and bootstrap 95% interval estimates for the curve (shaded blue area) and for the peak (shaded grey area), for each country.

From Figure 2 , we can see that for countries in more advanced stage of the evolution of COVID-19, the confidence intervals are narrow. On the other hand, for countries in early stage such as Chile and Colombia, the uncertainty regarding the disease evolution is much bigger, which is reflected by the wider confidence intervals and very vague information about the peak. Table 3 reports predictions for the total number of deaths for each country and up to 30, 60, 90, and 150-days-ahead of the date from the latest observation considered (2020-06-24), using the fitted ST-NLME model (our selected model), along with its 95% confidence interval, obtained from the bootstrap results. As can be seen, based on the observed data, the European countries and the US have the deaths by COVID-19 practically controlled, since the death estimates are stable in all future prediction times. Brazil, Mexico and Peru are around its peak (Figure 3 ). Their prediction mildly increase with time and they present a little more imprecision as observed by their 95% confidence interval estimates if compared to the controlled countries. Finally, Chile and Colombia seem to be in the increasing phase. The predicted number of deaths changes drastically between the time intervals with a wide uncertainty. Table 4 shows the total estimated number of deaths at the end of the pandemic and the estimated peak date by each country. This results corroborates and reinforce the analysis presented in the previous paragraph. Moreover, Table 4 presents the estimates of the parameters of the logistic dynamic of the model. As it can be seen,α 4 1 showing a large right skewness for the pandemic dynamics. In other words, the increase phase occurs much faster than the decrease phase as observed in many places. This is a clear effect of borrowing strength from curves in different stages. For example, for the curves from the Latin American countries, where observations are mainly on the left side of the Table 3 : Prediction for 30, 60, 90, and 150-days-ahead of the total number of deaths using the fitted ST-NLME model. Values inside parenthesis are the 95% confidence intervals.

Total expected by 2020-07-25 Total expected by 2020-08-24 peak, estimation of the skewness parameter is unstable due to the lack of information after the peak. Although in similar scale, we can see some variation between α 2i and α 3i , i = 1, . . . , 9, which are important to capture the unique characteristic of each time series and provide a good fit and meaningful predictions. Understanding the dynamics of the pandemic and being able to predict its future behavior is of extreme importance. While many countries and consortia try to create an effective vaccine for the disease, the best current alternative is to flatten the contamination curve to guarantee that the health systems are able to provide the necessary care for the population without being overwhelmed. With the use of countries that have controlled the pandemic number of deaths we strongly believe that our methodology can provide more reliable and meaningful prediction that allow policies makers in Latin America to make effective decisions.

Extension of the presented methodology to the number of cases can also be performed, however, because of the large subnotification in the data and due to the different testing capability among countries, this approach is challenging. A possible alternative to overcome this fact is to incorporate into modeling time changing covariates that adequately capture the subnotification dynamics of each country. Another interesting future development and work is to model the cases and deaths by COVID-19 jointly since these series are necessarily correlated. Moreover, the WHO has warned that the coronavirus may never go away and concerns are growing about a second (perhaps a third) wave of infections. Thus, a natural generalization of our method is to consider a finite mixture of SMSN-NLME models (Lachos et al., 2017; Zeller et al., 2019 ). An in-depth investigation of such extensions are beyond the scope of the present paper, but certainly an interesting topic for (near) future research.

We believe that the proposed methods may have a significant impact on COVID-19 research because, in the presence of skewness and heavy tails in the data, appropriate statistical inference for COVID-19 research can lead to more accurate and reliable clinical decisions, in addition, our model can be useful to guide some policies that need to be taken by the selected Latin American governments in order to overcome the COVID-19 pandemic. To the best of our knowledge, this is the first attempt in working on such general distributional structure related to COVID-19 deaths data. Our proposed method is quite general and is not limited to the analysis of the selected countries and reported deaths, thus we have made the R codes available for download from Github

Global analysis of the covid-19 pandemic using simple epidemiological models

A general class of multivariate skew-elliptical distributions

Robust locally weighted regression and smoothing scatterplots

CovidLP: short and long term prediction for COVID-19. Departamento de Estatística. UFMG, Brazil

Sir model for COVID-19 calibrated with existing data and projected for colombia

Maximum likelihood from incomplete data via the EM algorithm

JF Salvando Todos Team, 2020. Plataforma Estatstica JF para COVID-19. Departamento de Estatística. UFJF, Brazil

Likelihood based inference for skew-normal independent linear mixed models

Finite mixture modeling of censored data using the multivariate student-t distribution

Normal/independent distributions and their applications in robust regression

Nonlinear mixed-effects models for repeated-measures data

Estimation of unreported novel coronavirus (sars-cov-2) infections from reported deaths: A susceptible-exposed-infectious-recovered-dead model

Estimating covid-19 cases and reproduction number in brazil

R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing

Estimate of underreporting of COVID-19 in Brazil by Acute Respiratory Syndrome hospitalization reports. Notas Tcnicas Cedeplar-UFMG. Cedeplar, Universidade Federal de Minas Gerais

Predicting hospital demand during the covid-19 outbreak in bogota, colombia

Scale mixtures of skew-normal linear mixed models with withinsubject serial dependence

skewlmm: Scale mixtures of skew-normal linear mixed models

Modeling and prediction of covid-19 in mexico applying mathematical and computational models

Predicting covid-19 peaks around the world

an introduction to model uncertainty

Covid-19 coronavirus pandemic

Mixed Effects Models for Complex Data

Finite mixture of regression models for censored data based on scale mixtures of normal distributions

This paper was written while Marcos O. Prates was a visiting professor in the Department of Statistics at the University of Connecticut (UConn). In addition to the support of UConn, the professor would like also to thank the CovidLP group for the discussion about the topic and the Conselho Nacional de Desenvolvimento Cientfico e Tecnolgico (CNPq) for partial financial support. Fernanda L. Schumacher acknowledges the partial support of Coordenao de Aperfeioamento de Pessoal de Nvel Superior -Brasil (CAPES) -Finance Code 001, and by Conselho Nacional de Desenvolvimento Cientfico e Tecnolgico -Brasil (CNPq).