key: cord-0691869-8odxnup7 authors: Naimoli, Antonio title: Modelling the persistence of Covid-19 positivity rate in Italy date: 2022-01-07 journal: Socioecon Plann Sci DOI: 10.1016/j.seps.2022.101225 sha: b2a9b5659f6d96d3421d16f22b2a83b72c5a3009 doc_id: 691869 cord_uid: 8odxnup7 The current Covid-19 pandemic is severely affecting public health and global economies. In this context, accurately predicting its evolution is essential for planning and providing resources effectively. This paper aims at capturing the dynamics of the positivity rate (PPR) of the novel coronavirus using the Heterogeneous Autoregressive (HAR) model. The use of this model is motivated by two main empirical features arising from the analysis of PPR time series: the changing long-run level and the persistent autocorrelation structure. Compared to the most frequently used Autoregressive Integrated Moving Average (ARIMA) models, the HAR is able to reproduce the strong persistence of the data by using components aggregated at different interval sizes, remaining parsimonious and easy to estimate. The relative merits of the proposed approach are assessed by performing a forecasting study on the Italian dataset. As a robustness check, the analysis of the positivity rate is also conducted by considering the case of the United States. The ability of the HAR-type models to predict the PPR at different horizons is evaluated through several loss functions, comparing the results with those generated by ARIMA models. The model confidence set is used to test the significance of differences in the predictive performances of the models under analysis. Our findings suggest that HAR-type models significantly outperform ARIMA specifications in terms of forecasting accuracy. We also find that the PPR could represent an important metric for monitoring the evolution of hospitalizations, as the peak of patients in intensive care units occurs within 12–16 days after the peak in the positivity rate. 
This can help governments in planning socio-economic and health policies in advance.

Following China's report (31 December 2019) of a cluster of cases of pneumonia of unknown aetiology (later identified as a new coronavirus, SARS-CoV-2) in the city of Wuhan, the World Health Organisation (WHO) declared the novel coronavirus outbreak a Public Health Emergency of International Concern on 30 January 2020. On 11 March 2020, the Covid-19 outbreak was identified by the WHO as a global pandemic. As of 01 December 2020, Covid-19 had infected more than 64.06 million people and caused 1.54 million deaths worldwide since its emergence. One and economic activities (Bloise and Tancioni, 2021; Ehlert, 2021). Along these lines, Panarello and Tassinari (2021), making use of a containment index, sanction data, and Google's movement trends across Italian provinces, provided evidence of a deterrent effect on mobility given by the increase in the sanction rate and the positivity rate among the population. Overall, the spread of the novel coronavirus has severely affected the integrity of the global economic, social, financial, and behavioral system. Analyzing the relationships between epidemiological and economic models, Verikios (2020) pointed out that, because of the greater uncertainty surrounding its nature and the stringent preventive measures taken by governments, Covid-19 is likely to be of longer duration and consequently more severe in its economic effects than previous pandemics. The current outbreak has turned out to be the first extraordinary long-term disruption of the global supply chain (Ivanov, 2020; Ivanov and Dolgui, 2020; Koonin, 2020), and, differently from the past, all supply chain players have been severely affected by this pandemic (Paul and Chowdhury, 2020a; Gunessee and Subramanian, 2020).
For example, for some supply chains, demand for necessary items such as personal protective equipment and beauty and personal care products has increased (Ivanov, 2020; Paul and Chowdhury, 2020a) . Conversely, for other industries, such as transportation and manufacturing, demand and supply have decreased dramatically, causing a halt in production (Gray, 2020; Majumdar et al., 2020) . In response to the current vulnerability of the entire supply system, several resilience strategies have been suggested to mitigate the impacts of Covid-19 and to recover from the current pandemic (see, e.g. Paul and Chowdhury, 2020a; Taqi et al., 2020; De Silva et al., 2021; Lozano-Diez et al., 2020; Paul and Chowdhury, 2020b; Deaton and Deaton, 2020; Ivanov and Das, 2020; Remko, 2020, among others) . The rapid spread of Covid-19 has also significantly affected sustainability, raising several environmental, economic, and social issues (Dente and Hashimoto, 2020; van Barneveld et al., 2020; Sarkis, 2020; Sharma et al., 2020; Queiroz et al., 2020; Rendana et al., 2021; Ibn-Mohammed et al., 2021) . However, in this negative scenario, it is worth highlighting some positive aspects mainly related to environmental sustainability: improved air quality, low carbon dioxide and greenhouse gas emissions, decreased energy use and environmental pollution (Dente and Hashimoto, 2020; Ibn-Mohammed et al., 2021; van Barneveld et al., 2020; Sarkis et al., 2020) . In this context, it is necessary to formulate functional planning for the health infrastructure and services in order to curb the spreading of the Covid-19 pandemic. An accurate forecast of the epidemiological trends is essential for health system management and government reform planning. Therefore, to support non-pharmaceutical intervention policies during the Covid-19 outbreak, several models have been proposed for fitting and forecasting the epidemic evolution (see, e.g., Albani et al., 2021; Giordano et al., 2020; Sun et al., 2020) . 
In the past, the Autoregressive Integrated Moving Average (ARIMA) model has been widely employed for forecasting time series of epidemic diseases. For example, ARIMA models have been successfully applied to estimate the incidence of Severe Acute Respiratory Syndrome (SARS) (Earnest et al., 2005), malaria (Gaudart et al., 2009), tuberculosis (Zheng et al., 2015), influenza viruses (He and Tao, 2018) and brucellosis (Cao et al., 2020). This class of models is also being used to estimate and predict the evolution of the ongoing pandemic. Benvenuto et al. (2020) applied an ARIMA model to worldwide data to predict the epidemiological trend of the prevalence and incidence of Covid-19. Singh et al. (2020) used it to predict confirmed cases, deaths, and recoveries for the top 15 countries. Sahai et al. (2020) employed ARIMA models on the daily time series of total infected cases for the US, Brazil, India, Russia and Spain to forecast the spread of Covid-19. Monllor et al. (2020) applied it to analyze the series of infected persons in China, Italy and Spain, finding a common pattern of disease spread. Ceylan (2020) used ARIMA specifications to predict the epidemiological trend of total confirmed cases of Covid-19 in Italy, Spain, and France. Most of these papers that rely on ARIMA specifications model the dynamics of the total number of infected cases, deaths or recoveries.

To support policymakers in defining guidelines for the management of health systems, as well as to facilitate the development of plans for economic recovery, this paper investigates the dynamics of the Covid-19 positivity rate, defined as the number of new positive cases divided by the number of total tests. Our empirical analysis reveals that the positivity rate is characterized by a slowly moving long-run level and a highly persistent autocorrelation structure. ARMA models are not well suited to model the long-term behavior of time series.
Long memory processes are characterized by a high-order correlation structure indicating persistent dependence between distant observations, implying that the effect of shocks takes a very long time to disappear. The conventional ARMA process is often referred to as a short memory process as it is unable to capture the dynamics of a long memory series. On the other hand, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) process, allowing the order of integration of a series to take on fractional values, provides a useful tool for modelling and forecasting time series with long memory properties (Baillie, 1996) . However, being a fractional integration model, the ARFIMA is not trivial to estimate and lacks a clear economic interpretation. This leads us to introduce a new approach to directly model and forecast the Covid-19 positivity rate. Namely, the time series behavior of the PPR can be adequately captured by the Heterogeneous Autoregressive (HAR) model by Corsi (2009) . In this context, the HAR model represents an attractive alternative because of its computational simplicity, ease of interpretation, and remarkable forecasting performance. Although formally not belonging to the class of long memory models, the HAR model is able to closely mimic the observed long memory behavior by using variables aggregated at different interval sizes. Therefore, differently from the standard ARIMA specifications, HAR-type models are based on an additive cascade of components, from high-frequencies to low-frequencies, allowing to capture both the high degree of persistence (through the long-term component) and short-term dynamics (through the short-and medium-term components) that characterize the PPR behavior. 
Along with conventional HAR-type models, we also consider the possibility of selecting relevant lagged components through flexible HAR specifications based on the least absolute shrinkage and selection operator (lasso) (Tibshirani, 1996) and the adaptive lasso (Zou, 2006). The aim of this paper is to assess the usefulness of the HAR as a new modelling approach to predict the spread of Covid-19 by capturing the short-, medium- and long-term dynamics of the PPR. Accurate short- and long-term predictions of the PPR can be essential both for developing non-pharmaceutical strategic planning by policymakers to address the current health emergency and for shaping new policies to overcome the severe negative impacts experienced by businesses and supply chains because of the pandemic. The positivity rate measures both the severity of the outbreak and the limitations of testing. That is, the PPR is a useful measure of whether sufficient testing has been done and of the current level of SARS-CoV-2 transmission in the community. Therefore, this approach could provide a useful tool both for monitoring the spread of the virus and for guiding policymakers in undertaking actions to curb the spread of the disease. HAR-type models prove to be particularly useful because, on the one hand, they are able to adequately predict the short-, medium- and long-term trend of the positivity rate and, on the other hand, the lasso-based HAR specifications are completely data-driven, thus reducing uncertainty in the choice of predictor lags. Therefore, the proposed approach provides a reliable tool that simplifies the decision-making process by moving towards a single data-driven direction. The usefulness of the HAR approach in forecasting the Covid-19 positivity rate is evaluated through an application to the Italian dataset, as Italy was the first European country to be seriously affected by the pandemic. As of 30 March 2020, more than 101 thousand people had tested positive for Covid-19 3.
The empirical application shows that HAR-type models outperform the most popular ARIMA models revealing that the improvements are especially significant for longer forecast horizons as detected by the model confidence set (MCS) (Hansen et al., 2011) . These findings are also confirmed by analyzing the positivity rate of the United States, which was considered as a reference country to pursue a robustness check of the proposed approach. Finally, the positivity rate exhibits predictive ability with respect to hospitalizations. The remainder of the paper is organized as follows. Section 2 presents the Heterogeneous Autoregressive model and its lasso-based extensions. Section 3 describes the data and the main non-pharmaceutical measures adopted by the Italian government. Section 4 illustrates the results of the empirical study and robustness checks. Section 5 presents a broader discussion of the positivity rate along with some limitations and caveats. Finally, Section 6 summarizes the findings with concluding remarks. Inspired by the Heterogeneous Market Hypothesis (Müller et al., 1993) and the asymmetric propagation of volatility between long-term and short-term horizons, Corsi (2009) proposed the HAR model to parsimoniously capture the strong persistence typically observed in Realized Volatility (RV) (Andersen and Bollerslev, 1998) by the sum of lagged RV components aggregated over different interval sizes. The HAR model is commonly used in modelling the dynamics of financial volatility as it is able to reproduce the main stylized facts of financial data such as the long memory and asymmetric propagation of volatility over time. In most empirical applications, the HAR model is specified as an additive cascade of three volatility components aggregated over different time intervals, that is daily, weekly and monthly, which implies a fixed (1,5,22) lag structure. However, the structure (1,5,22) may not fully reflect the characteristics of the data. 
Thus, determining the optimal lag structure of the HAR could significantly improve the predictive ability of the model. In this direction, Audrino and Knaus (2016) showed that the HAR-implied lag structure can be recovered asymptotically by the lasso only if the HAR is the underlying data generating process (DGP). On the other hand, differently from Audrino and Knaus (2016), who employed the lasso in the Autoregressive (AR) framework, Audrino et al. (2019) used the adaptive lasso to investigate whether the lag structure implied by the HAR can be identified. However, their results highlight the difficulty of outperforming the forecast performance of the standard HAR model based on the daily, weekly and monthly components.

The HAR model can be easily estimated by ordinary least squares (OLS) and shows remarkably good volatility forecasting performance. Therefore, it is widely used to model the dynamics of RV, but it has never been used to predict the evolution of pandemics. In light of its inherent characteristics, we propose to apply the HAR modelling approach to capture the slowly decaying autocorrelation structure (also known as long memory) of the Covid-19 positivity rate series. Let $PPR_t$ be the positivity rate at time $t$. The HAR model for the daily $PPR_t$, written here in the one-step-ahead form that matches the restricted AR(28) representation discussed later in this section, can be specified as

$$PPR_t = \beta_0 + \beta_1 PPR^{(1)}_t + \beta_2 PPR^{(7)}_t + \beta_3 PPR^{(28)}_t + \varepsilon_t, \qquad (1)$$

where $PPR^{(k)}_t = k^{-1} \sum_{j=1}^{k} PPR_{t-j}$ is the $k$-period average of daily PPR and $\varepsilon_t$ is a zero mean innovation. In essence, this specification states that tomorrow's PPR is a weighted sum of daily, weekly and monthly averages of PPRs, which can be characterized by different dynamics of virus infection and transmission over time.
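The construction of the aggregated components and the OLS fit can be illustrated with a minimal sketch. It assumes the one-step-ahead form with the (1, 7, 28) lag structure and an intercept; the function names (`k_period_average`, `har_design`, `fit_har_ols`, `predict_next`) are hypothetical and are not taken from the paper, and the small Gaussian-elimination solver simply stands in for any standard least squares routine.

```python
from typing import List, Sequence, Tuple


def k_period_average(x: Sequence[float], t: int, k: int) -> float:
    """Average of the k observations preceding time t: x[t-1], ..., x[t-k]."""
    return sum(x[t - k:t]) / k


def har_design(x: Sequence[float], lags: Tuple[int, ...] = (1, 7, 28)):
    """Rows [1, k-period averages] and the matching targets x[t]."""
    start = max(lags)
    rows = [[1.0] + [k_period_average(x, t, k) for k in lags]
            for t in range(start, len(x))]
    targets = [x[t] for t in range(start, len(x))]
    return rows, targets


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit_har_ols(x: Sequence[float], lags: Tuple[int, ...] = (1, 7, 28)) -> List[float]:
    """OLS estimates (beta_0, beta_1, ...) from the normal equations X'X b = X'y."""
    rows, y = har_design(x, lags)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_next(x: Sequence[float], beta: Sequence[float],
                 lags: Tuple[int, ...] = (1, 7, 28)) -> float:
    """One-step-ahead forecast from the most recent k-period averages."""
    t = len(x)
    return beta[0] + sum(b * k_period_average(x, t, k)
                         for b, k in zip(beta[1:], lags))
```

With a list `ppr` of daily positivity rates, `fit_har_ols(ppr)` would return four coefficients and `predict_next(ppr, beta)` the forecast for the following day. The sketch makes the parsimony of the model visible: only one coefficient per aggregation interval is estimated, regardless of how many daily lags each average spans.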
For example, to take into account the different periods of time between infection and development of clinical symptoms, as well as the transmission periods, the model in Equation 1 can be further extended with components aggregated over additional intervals. Thus, the HAR model is parsimonious, approximates long memory in a very simple way, and can be consistently estimated by OLS. In this context, it becomes crucial to define the lag structure and the maximum order of the model.

It is worth noting that the HAR can be represented as a constrained AR(p) model (Corsi, 2009). Considering the HAR process introduced in Equation 1, we can write it as a restricted AR(28) process, namely

$$PPR_t = \beta_0 + \sum_{i=1}^{28} \phi_i PPR_{t-i} + \varepsilon_t, \qquad \phi_i = \begin{cases} \beta_1 + \frac{1}{7}\beta_2 + \frac{1}{28}\beta_3 & \text{for } i = 1 \\ \frac{1}{7}\beta_2 + \frac{1}{28}\beta_3 & \text{for } i = 2, \ldots, 7 \\ \frac{1}{28}\beta_3 & \text{for } i = 8, \ldots, 28 \end{cases}$$

In contrast to the fixed daily-weekly-monthly time scale, extensions of the HAR model have been proposed to allow for potentially different predictive information arising from a different lag structure. In this direction, to investigate whether a more general lag structure provides more accurate predictions than the fixed (1,5,22) lag index, Audrino and Knaus (2016) and Audrino et al. (2019) compared the standard HAR model to a lasso-based method. The lasso (least absolute shrinkage and selection operator) method aims to select a subset of important covariates by shrinking the coefficients of redundant ones towards zero. Let the daily $PPR_t$ be denoted by $x_t$, with $(x_t, \ldots, x_{t-p+1})$ the predictor variables. Then, the lasso (Tibshirani, 1996) estimator of the AR(p) model

$$x_{t+1} = c + \sum_{i=1}^{p} \phi_i x_{t-i+1} + \varepsilon_{t+1}$$

is obtained as

$$(\hat{c}, \hat{\phi}) = \arg\min_{c, \phi} \sum_{t} \Big( x_{t+1} - c - \sum_{i=1}^{p} \phi_i x_{t-i+1} \Big)^2 + \lambda \sum_{i=1}^{p} |\phi_i|,$$

where $\lambda \geq 0$ is the tuning parameter which controls the strictness of the penalty term, with $\lambda = 0$ leading to the OLS estimator. The solution for the constant $c$ is $\hat{c} = \bar{x}$, which is zero for demeaned data. It has been shown that the lasso suffers from some drawbacks due to the lack of oracle properties.
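To make the shrinkage mechanism concrete, the lasso on demeaned AR(p) regressors can be sketched with cyclic coordinate descent and soft-thresholding. This is an illustrative implementation only, not the routine used in the paper: the objective is scaled as (1/(2n)) times the residual sum of squares plus the penalty, which is an assumption made here and only rescales the tuning parameter, and the helper names (`soft_threshold`, `lasso_coordinate_descent`, `ar_design`) are hypothetical. The design below indexes the target as x[t] with lags x[t-1], ..., x[t-p], which is the same model up to a shift of the time index.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 without intercept.

    X and y are assumed to be demeaned, so the constant is zero.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residuals for the current coefficients (all zero at start)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            if new_bj != b[j]:
                delta = new_bj - b[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def ar_design(x: Sequence[float], p: int) -> Tuple[List[List[float]], List[float]]:
    """Demean x, then build rows (x[t-1], ..., x[t-p]) with target x[t]."""
    mean = sum(x) / len(x)
    z = [v - mean for v in x]
    X = [[z[t - i] for i in range(1, p + 1)] for t in range(p, len(z))]
    y = [z[t] for t in range(p, len(z))]
    return X, y
```

Calling `lasso_coordinate_descent(*ar_design(ppr, p), lam)` for increasing values of `lam` sets more and more lag coefficients exactly to zero, which is how a lag structure can be selected from the data rather than imposed.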
On the other hand, the adaptive lasso (Zou, 2006) estimator fulfils the oracle property in the sense introduced by Fan and Li (2001), as it allows asymptotically consistent and efficient variable selection and provides asymptotically unbiased and normally distributed estimates of the non-zero coefficients. Minimizing over the constant $c$ and the AR coefficients $\phi_1, \ldots, \phi_p$, the adaptive lasso estimator is given by

$$(\hat{c}, \hat{\phi}) = \arg\min_{c, \phi} \sum_{t} \Big( x_{t+1} - c - \sum_{i=1}^{p} \phi_i x_{t-i+1} \Big)^2 + \lambda \sum_{i=1}^{p} \lambda_i |\phi_i|,$$

where the weights $\lambda_i$ can be computed as the inverse of the absolute value of the corresponding preliminary ridge regression or OLS estimator. The ordinary lasso is obtained as a special case for $\lambda_i = 1$, $\forall i = 1, \ldots, p$.

K-fold cross-validation is used to determine the optimal tuning parameter $\lambda$. Specifically, the data are randomly divided into $K$ groups $(G_1, \ldots, G_K)$ and, for each group, the mean squared error is estimated on the validation set by

$$CV_k(\lambda) = \frac{1}{|G_k|} \sum_{t \in G_k} \big( x_t - \hat{x}^{(-k)}_t(\lambda) \big)^2,$$

where $\hat{x}^{(-k)}_t(\lambda)$ denotes the prediction from the model estimated without the observations in $G_k$. For each tuning parameter value, the average error over all folds is computed as

$$CV(\lambda) = \frac{1}{K} \sum_{k=1}^{K} CV_k(\lambda),$$

and thus the optimal $\lambda$ is chosen by minimizing the $CV(\lambda)$ function, that is $\hat{\lambda} = \arg\min_{\lambda} CV(\lambda)$.

The Flexible HAR (FHAR) of Audrino et al. (2019) can be estimated by applying the adaptive lasso procedure to select the active terms to be included in the model. Motivated by these theoretical developments, this paper aims to apply the HAR and its lasso extensions to Covid-19 data to predict its spread and provide a good alternative to other models that have been proposed to study the dynamics of the ongoing pandemic.

The data used in this paper refer to the daily number of confirmed Covid-19 cases and daily total tests in Italy between 24 February and 20 December 2020, for a total of 301 days. The data have been downloaded from the official website 5 of the Civil Protection Department (Presidency of the Council of Ministers). Table 1 provides summary statistics for the new positive cases, the number of tests (swabs) performed and the PPR at a daily level in Italy for the full sample period.
The occurrence of new cases of a disease developing in a population over a period of time, also known as "incidence" in epidemiology, can be used to map the frequency with which Covid-19 develops in a community 6 . These peaked at 40,902 during the second wave (October -November 2020) along with the number of tests performed in a single day, 254,908. On the other hand, the positivity rate touched 46.21% during the first wave (February -March 2020), reaching its minimum of 0.23% in June during the Phase 2, characterized by an easing of previously adopted restrictive measures. A possible explanation for the high positivity rate in the very early phase of the epidemic is that on 25 February 2020, the Italian Ministry of Health issued more stringent testing policies. That is, testing was prioritized for patients with more severe clinical symptoms who were suspected of having Covid-19 and required hospitalization. Consequently, testing was limited for people who were asymptomatic or had mild symptoms. This strategy inevitably led to a high percentage of positive tests (Onder et al., 2020) . Figure 1 displays the time series of the daily positivity rate given by (new positive cases/total tests)×100 for Italy. It also shows that the PPR exhibits persistence, i.e. large changes in the positivity rate are often followed by other large changes and small changes are often followed by small changes. The presence of long memory can be identified by a data-driven empirical approach in terms of the persistence of observed autocorrelations. This feature is highlighted in Figure 2 which displays the daily and weekly autocorrelation functions for the PPR up to lag 40. The correlograms show that the autocorrelations exhibit a clear pattern of slow decay and persistence. In particular, the sample autocorrelations reveal the presence of a hyperbolic decay rate, which is much slower than the usual geometric rate associated with stationary ARMA processes. 
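The sample autocorrelations behind a correlogram of this kind can be computed directly, as in the minimal sketch below. The function `sample_acf` uses the usual biased estimator, with the full-sample sum of squares in the denominator. The two helper functions are only reference curves for comparison: a geometric decay of the form phi^k, typical of a stationary AR(1), and a hyperbolic decay of the form k^(2d-1), the asymptotic shape associated with fractionally integrated processes. The parameter values used with them are illustrative assumptions, not estimates from the PPR data.

```python
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations for lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def geometric_decay(phi: float, k: int) -> float:
    """Reference curve phi**k, the autocorrelation of a stationary AR(1)."""
    return phi ** k


def hyperbolic_decay(d: float, k: int) -> float:
    """Reference curve k**(2d - 1), the long-memory rate for 0 < d < 0.5."""
    return float(k) ** (2.0 * d - 1.0)
```

Plotting `sample_acf(ppr, 40)` against the two reference curves is one simple way to see whether the observed autocorrelations die out quickly, as in the geometric case, or linger at distant lags, as in the hyperbolic case.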
Also, it was not possible to reject the null hypothesis of a unit root at the 5% level using the Augmented Dickey-Fuller test. However, the classical trend stationary I(0) and unit root I(1) representations may be too restrictive with respect to the low-frequency dynamics of the series. This compelling evidence of long memory, i.e. the historical PPRs have a persistent impact on the future PPR, suggests that the Covid-19 positivity rate can be adequately modelled through HAR specifications.

To better understand the dynamics of the contagion and the importance of PPR behavior in guiding decisions about reopening schools and businesses, we briefly report the main measures taken by the Italian government to contain the epidemic 7. For a more extensive analysis of policy interventions implemented by the Italian government and their impact on health and non-health outcomes, see Berardi et al. (2020). The Italian government confirmed the first cases of the disease in the country on 30 January 2020, when the novel coronavirus was detected in two Chinese tourists visiting Italy. At the request of the Italian Health Authorities, all flights to and from the People's Republic of China (PRC), including Hong Kong, Macao and Taipei, were suspended. Once the first internal outbreak was discovered, one of the first measures adopted was the quarantine of 11 municipalities in Northern Italy located in Lombardy and Veneto. On 23 February, the Council of Ministers decreed the total closure of the municipalities with active outbreaks. This is also confirmed by Figure 1, which shows how the positivity rate continues to grow after 23 February, reaching its maximum of 46.21% on 09 March. During this time, there was a succession of different measures aimed at containing the epidemic, and on 09 March the so-called Phase 1 began, with the country locked down until 03 May 2020. Italy was the first country to implement a national quarantine due to the 2020 coronavirus outbreak.
As a result, the positive percentage rate began to slowly decline towards zero and on 26 April, the then-Prime Minister Giuseppe Conte announced the so-called Phase 2, that would start from 04 May. Phase 2 was characterized by a gradual relaxation of previous containment measures. Italy therefore tried to restart by reopening bars, restaurants and shops. All while observing the new safety rules, ranging from social distancing to the use of face masks. As it can clearly be seen from Figure 1 , the infection curve tends to flatten out and so from 15 June (end of Phase 2) to 07 October, Phase 3 of coexistence with Covid-19 began. Following the rise of the epidemic curve in the autumn, renewed restrictions were progressively introduced, mainly concerning commercial and private activities rather than restricting movement. This led to the second wave of the pandemic, with the positivity rate rising and new restrictive measures being introduced between 08 October and 05 November. Starting from 03 November 2020, the Regions and Autonomous Provinces of Trento and Bolzano have been classified into three areas, namely red, orange and yellow, according to the degree of risk, for which specific restrictive measures were envisaged. This classification is based on the ordinances issued by the Ministry of Health. To mitigate the effects of Covid-19 and make appropriate health, economic and social system decisions, it is crucial to understand the pandemic evolution. The effective reproductive number (R t ) is a parameter that has been widely used to measure the transmissibility of the ongoing epidemic infection. Therefore, as a further tool to understand the actions enacted by the Italian government to counter the spread of the coronavirus, the R t is also estimated. 
The R t is a fundamental epidemiological parameter that characterizes the temporal dynamics of infectious disease as measuring the average number of secondary cases caused by an infected individual in a population composed of both susceptible and non-susceptible individuals (Wallinga and Teunis, 2004; Cori et al., 2013; Delamater et al., 2019) . The Covid-19 R t has been estimated by the Cori et al. (2013) approach, using the EpiEstim package (Cori, 2020) and R Core Team (2020) software. To explicitly take into account the uncertainty in the serial interval (SI) distribution, the mean and standard deviation of the SI (time interval between the onset of symptoms in the primary and secondary cases) are allowed to vary according to truncated normal distributions, employing parameters estimated from existing studies (see, Liu et al. (2020) ; Nishiura et al. (2020) and Du et al. (2020) , among others). Therefore, the R t was estimated on sliding weekly windows, with values drawn from a Gamma distribution, with the mean and variance sampled from 1,000 truncated normal distributions for which we used an average mean serial interval of 4.8 days (sd = 2.3, min = 3.8, max = 5.8), and an average standard deviation of 2.3 days (sd = 2.3, min = 1.3, max = 3.3). The resulting weekly R t series is reported in Figure 3 . It clearly shows the peak of transmissibility during the first wave (February-March 2020), while between April and around mid-June the R t index remains below one, indicating that the spread of infection is decreasing. However, with the start of Phase 3 (15 June), a slight but steady increase in the national transmission index was noted, reaching a summer peak in the period 15 August -31 August where R t reached 1.59 (95% CI: 1.41 -1.75) with an incidence calculated as daily cases. It is worth noting that the estimates have a large stochastic variability, especially with regard to the summer period, which is overall characterized by a small number of cases. 
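The core of the Cori et al. (2013) estimator can be sketched in a few lines. This is a simplified illustration, not the EpiEstim implementation used in the paper: it takes a single fixed discretised serial interval distribution `w` instead of sampling the SI mean and standard deviation from truncated normal distributions, and it assumes a Gamma prior with shape `a = 1` and scale `b = 5`. Under these assumptions the posterior of R_t over a window is Gamma with shape a plus the sum of incidence and rate 1/b plus the sum of total infectiousness. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def infectiousness(incidence: Sequence[float], w: Sequence[float], t: int) -> float:
    """Total infectiousness Lambda_t = sum_{s>=1} I_{t-s} * w_s (w[0] is unused)."""
    last = min(t, len(w) - 1)
    return sum(incidence[t - s] * w[s] for s in range(1, last + 1))


def rt_posterior(incidence: Sequence[float], w: Sequence[float], t: int,
                 window: int = 7, a: float = 1.0, b: float = 5.0) -> Tuple[float, float]:
    """Posterior Gamma (shape, scale) of R_t on days t-window+1, ..., t."""
    days = range(t - window + 1, t + 1)
    shape = a + sum(incidence[s] for s in days)
    rate = 1.0 / b + sum(infectiousness(incidence, w, s) for s in days)
    return shape, 1.0 / rate


def rt_posterior_mean(incidence: Sequence[float], w: Sequence[float], t: int,
                      window: int = 7, a: float = 1.0, b: float = 5.0) -> float:
    """Posterior mean of R_t, i.e. shape times scale."""
    shape, scale = rt_posterior(incidence, w, t, window, a, b)
    return shape * scale
```

When the window contains few cases, both sums are small and the prior carries more weight, so the posterior is wide. This is consistent with the large variability of the summer estimates noted above.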
Because of the increase in the number of infections, on 16 August 2020, the Minister of Health Roberto Speranza signed an ordinance ordering the closure of discos and dance halls and making it compulsory, from 6 p.m. to 6 a.m., to wear masks even in public spaces. In September, with the start of the new school year, classroom activities resumed. To reduce the risk of infection, school staff were allowed to undergo serological testing. Finally, during the second wave (October-November 2020), an $R_t$ index above 1.5 was recorded for most of October, falling below 1 from mid-November until the end of December. As of 20 December 2020, a total of 622,760 people tested positive for Covid-19, with 68,799 deaths and 1,261,626 patients discharged/healed, nationwide.

In this section, we conduct several empirical studies to compare the in-sample and out-of-sample performance of the HAR with the commonly used ARMA models. The data employed for the empirical analysis consist of the daily Covid-19 positivity rate recorded in Italy between 24 February and 20 December 2020, with a full-sample period of 301 days. Regarding the Flexible-HAR (FHAR) based on the adaptive lasso, following Audrino et al. (2019), the maximum lag order is set to p = 50, while the tuning parameter $\lambda$ is chosen by five-fold cross-validation 8. The weights $\lambda_i$ in the adaptive lasso are calibrated as the inverse of the absolute value of the corresponding preliminary ridge regression estimator (Audrino et al., 2019). We also estimate a Flexible-HAR based on the lasso method to select the appropriate HAR lag length, resulting in the Lasso-HAR (LHAR) model. All the lasso estimates are obtained using the glmnet package (Friedman et al., 2010).

For the purpose of comparing the in-sample and out-of-sample performance of the analyzed models, we consider nine loss functions (Patton, 2011), namely MSE, MSE_log, MSE_sd, MSE_prop, MAE, MAE_log, MAE_sd, MAE_prop and QLIKE, each computed from $x_t$, the observed $PPR_t$, and $\hat{x}_t$, the prediction obtained by the HAR-type or ARIMA-type models.
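As a hedged illustration, the loss functions can be written out under the assumption that they follow the standard definitions in Patton (2011). The exact expressions used in the paper are not reproduced in this text, so the forms below should be read as the conventional ones rather than as the paper's own. The function name `patton_losses` and the dictionary keys are hypothetical, and all values are assumed to be strictly positive so that logarithms, square roots and ratios are defined.

```python
import math
from typing import Callable, Dict, Sequence


def patton_losses(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Average losses between observed values x and predictions xh (all > 0)."""
    pairs = list(zip(actual, forecast))
    n = len(pairs)

    def mean(loss: Callable[[float, float], float]) -> float:
        return sum(loss(x, xh) for x, xh in pairs) / n

    return {
        "MSE": mean(lambda x, xh: (x - xh) ** 2),
        "MSE_log": mean(lambda x, xh: (math.log(x) - math.log(xh)) ** 2),
        "MSE_sd": mean(lambda x, xh: (math.sqrt(x) - math.sqrt(xh)) ** 2),
        "MSE_prop": mean(lambda x, xh: (x / xh - 1.0) ** 2),
        "MAE": mean(lambda x, xh: abs(x - xh)),
        "MAE_log": mean(lambda x, xh: abs(math.log(x) - math.log(xh))),
        "MAE_sd": mean(lambda x, xh: abs(math.sqrt(x) - math.sqrt(xh))),
        "MAE_prop": mean(lambda x, xh: abs(x / xh - 1.0)),
        "QLIKE": mean(lambda x, xh: math.log(xh) + x / xh),
    }
```

Evaluating several losses side by side matters because they penalise errors differently: the proportional and logarithmic versions weight errors relative to the level of the series, whereas MSE and MAE are dominated by periods with a high positivity rate.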
In addition, we further assess the significance of differences in forecasting performance of all competing models by means of the MCS (Hansen et al., 2011). The MCS relies on a sequence of statistical tests to identify, at a certain confidence level (1 − α), the set of superior models with respect to some appropriately chosen measures of predictive ability. The MCS p-values are obtained from 5,000 bootstrap resamples generated by a block-bootstrap procedure, estimating the optimal block length through the method described in Patton et al. (2009).

Table 2 shows the results of the model comparison in terms of in-sample accuracy. In particular, we compare 15 ARIMA specifications with the HAR based on different lag structures, together with LHAR and FHAR. To determine whether differencing in the PPR series is required, we use the Augmented Dickey-Fuller (ADF) test, which suggests that one difference is needed to make the data stationary. This leads to using ARIMA(p,1,q) specifications. (The values of p and q chosen by minimizing the Corrected Akaike's Information Criterion (AICc) are p = 3 and q = 3. However, in our analysis, we consider multiple ARMA specifications because, for example, a model that fits in-sample data well will not necessarily provide good out-of-sample forecasts.)

Table 2 note: The table reports the average values of the different loss functions for the models under analysis. The lowest value of the loss in each column is displayed in bold. The sample runs from 24 February 2020 to 20 December 2020. Note that for LHAR and FHAR the lags are not imposed, but the selected lag structure allowed by the lasso and adaptive lasso methods is reported, respectively.

The empirical results in Table 2 highlight that the lag structure selected by the adaptive lasso for the FHAR is (1,7,27), which is in line with the canonical daily-weekly-monthly lag structure of the standard HAR model, while the lasso suggests using an additional biweekly/triweekly lag for the LHAR, i.e. (1,7,19,25). It is worth noting that in seven out of nine cases the loss functions considered are minimized by HAR-type models. In particular, the LHAR minimizes MAE, MAE_sd, MAE_prop, MSE and MSE_sd, whereas the HAR(1,7,14,21,28) minimizes MSE_prop and QLIKE. Finally, the ARIMA(2,1,2) is the specification that minimizes MAE_log and MSE_log (even though for MAE_log the LHAR returns a very similar result). In Figure 4, we plot the actual (black line) and the estimated daily PPR given by the LHAR model (red line) and the ARIMA(2,1,2) model (green line). It can easily be seen that while the LHAR better follows the dynamics of the positivity rate in both low and high infection periods, the ARIMA(2,1,2), being smoother, is not able to fully capture all variations and peaks in the actual PPR, especially in periods characterized by a high viral spread rate. These considerations remain essentially valid for the other HAR and ARIMA specifications, which have not been shown for ease of interpretation.

Figure 4: Comparison of actual (black) and in-sample prediction of the Lasso-HAR (red) and of ARIMA(2,1,2) (green) for the daily positivity rate of Italy.

To investigate the predictive ability of the models, we conduct an h-step-ahead rolling window study at the forecasting horizons h = 1, h = 3 and h = 7. The forecasts are obtained by re-estimating the model parameters at each step with a rolling window of 200 observations (2/3 of the sample). To compare the forecasting performances, we consider the set of loss functions specified in Section 4, while to assess the significance of differences among the competing models we refer to the MCS relying on the semi-quadratic statistic and the confidence levels of 75% and 90% 10.

The table reports the average values of the different loss functions for the models under analysis. Bold numbers indicate the best performing model by each criterion at the forecast horizon h = 1.
The numbers shaded in gray and light-gray denote that the corresponding models are included in the 75% and 90% MCS, respectively. We use a rolling window of 200 observations to estimate the coefficients of the models at each step.
We first consider the case of one-day-ahead PPR forecasts (h = 1), with results shown in Table 3. Overall, HAR-type models perform significantly better than ARIMA models. The lowest loss values are always returned by the LHAR, with the single exception of the MSE, which is minimized by the HAR(1,7,14,21,28). The superiority of the HAR models is also confirmed by the MCS, as the only ARIMAs falling in the 75% MCS are the ARIMA(2,1,3) for MAE and MAE_sd and the ARIMA(3,1,2) for MAE. Some other ARIMA specifications with p, q ≥ 2 are included in the less restrictive 90% MCS in a few isolated cases. On the other hand, the LHAR and FHAR are the only models that always enter the set of superior models for all the forecast criteria considered.
For the remaining forecast horizons, ARIMA models perform poorly in general. Table 4 and Table 5 report the forecast performance of all considered models for h = 3 and h = 7 periods ahead, respectively. It can easily be seen that the superiority of the HAR models remains unchanged over longer forecast horizons. Among the seven HAR-type models, the LHAR is dominant in predicting the PPR at h = 3, since it minimizes the loss functions in eight out of nine cases and is the only model that always enters the 75% MCS (Table 4). At the same time, the FHAR provides better forecast accuracy at the weekly (h = 7) horizon, achieving the lowest losses for all criteria used and being the only model always included in the 75% MCS (Table 5). In summary, the out-of-sample results clearly show that HAR-type specifications outperform ARIMA models in predicting the positivity rate at the considered forecast horizons h = 1, h = 3 and h = 7.
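The rolling-window exercise described above can be illustrated with a minimal sketch. It is not the code used in the paper: it assumes a direct h-step HAR regression estimated by ordinary least squares, it uses a simulated persistent series in place of the PPR, and it implements only three losses (MAE, MSE, and QLIKE in the form y/f − log(y/f) − 1) as stand-ins for the nine loss functions of Section 4. All function names are hypothetical.

```python
import math
import random


def har_features(y, t, lags):
    """HAR regressors at time t: the mean of the last `lag` values, per lag."""
    return [sum(y[t - lag + 1 : t + 1]) / lag for lag in lags]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, targets):
    """OLS through the normal equations; rows already contain the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def rolling_har_forecasts(y, lags=(1, 7, 27), h=1, window=200):
    """Return (actual, forecast) pairs, re-estimating the model at each step."""
    max_lag = max(lags)
    pairs = []
    # `end` is the index of the last observation known at forecast time.
    for end in range(window - 1, len(y) - h):
        start = end - window + 1
        rows, targets = [], []
        for t in range(start + max_lag - 1, end - h + 1):
            rows.append([1.0] + har_features(y, t, lags))
            targets.append(y[t + h])
        beta = ols(rows, targets)
        x_now = [1.0] + har_features(y, end, lags)
        pairs.append((y[end + h], sum(b * x for b, x in zip(beta, x_now))))
    return pairs


def mae(pairs):
    return sum(abs(a - f) for a, f in pairs) / len(pairs)


def mse(pairs):
    return sum((a - f) ** 2 for a, f in pairs) / len(pairs)


def qlike(pairs):
    # Assumed QLIKE form; requires strictly positive actuals and forecasts.
    return sum(a / f - math.log(a / f) - 1.0 for a, f in pairs) / len(pairs)


# Simulated persistent, positive series standing in for the PPR (assumption).
random.seed(0)
y = [10.0]
for _ in range(300):
    y.append(1.0 + 0.9 * y[-1] + random.gauss(0.0, 0.3))

for h in (1, 3, 7):
    pairs = rolling_har_forecasts(y, h=h)
    print(h, round(mae(pairs), 3), round(mse(pairs), 3), round(qlike(pairs), 5))
```

With 301 observations and a window of 200, the sketch produces 101 one-step-ahead forecasts. The lag index (1,7,27) is the one reported for the FHAR; any other lag structure can be passed through the `lags` argument.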
Also, it should be noted that, overall, both the Lasso-HAR and the Flexible-HAR, which take into account uncertainty in model specification, outperform the standard HAR based on a fixed lag index at each forecast horizon and by each criterion. Therefore, the above results suggest that allowing a more general specification of the HAR is successful, probably because the lasso-based models include only active predictors, letting the lag structure approximate the long memory observed in the data.
Table 5 note: The table reports the average values of the different loss functions for the models under analysis. Bold numbers indicate the best performing model by each criterion at the forecast horizon h = 7. The numbers shaded in gray and light-gray denote that the corresponding models are included in the 75% and 90% MCS, respectively. We use a rolling window of 200 observations to estimate the coefficients of the models at each step.
We investigate the robustness of the proposed approach by considering the evolution of the PPR for the United States (US). The 2019 coronavirus pandemic has led to massive social upheaval around the world and in the US. As mentioned above, the first outbreak of the virus occurred in the city of Wuhan in China's Hubei province in December 2019, and the virus then spread to Asia, Europe and North America between January and March 2020. By the end of March, there were more than 700,000 confirmed cases of Covid-19 worldwide and more than 34,000 people had died from causes related to the virus, with the US reaching more confirmed cases than any other country, surpassing China and Italy with more than 86,000 positive tests (Johns Hopkins University, 2020). The data for our analysis of the US PPR were downloaded from the Our World in Data website (footnote 11) (Roser et al., 2020). Since data on testing are not available for the early phase of the pandemic, the sample period runs from 01 March to 20 December 2020.
As with the Italian PPR, forecasts are obtained by recursively estimating model parameters every day over a 200-day rolling window. Accordingly, the out-of-sample period runs from 16 September to 20 December 2020. To streamline the presentation, the tables with the forecasting results are reported in the Appendix, and only the main findings are discussed here. Considering the forecasting horizons h = 1, h = 3 and h = 7, the results for the US confirm what was found for Italy. In particular, it turns out that LHAR and FHAR are the only models that consistently enter the MCS regardless of the chosen forecast horizon, always minimizing each of the nine loss functions considered. For h = 1 and h = 3 no other models are included in the MCS, while for h = 7 some HAR specifications with a fixed lag index enter the set of superior models, but only for MAE-type loss functions. Overall, for short forecast horizons, ARMA-type models and HAR-type models with a fixed lag structure tend to have similar performance, but for longer forecast horizons HAR specifications prevail. On the other hand, the lasso-based HAR specifications outperform competing models in each forecasting scenario, capturing both short- and long-run PPR dynamics.
As discussed above, Covid-19 is an infection characterized by a high percentage of asymptomatic cases. Several studies have shown that more than 40% of cases may show no symptoms. This means that no country knows the true total number of people affected by Covid-19; all we know is the infection status of those who have been tested. As a result, testing is a crucial element in understanding the spread of the ongoing pandemic. More informative than a simple count of the total number of tests, and used in conjunction with data on confirmed cases, the positivity rate represents a key metric for understanding the pandemic, as it measures both the severity of the epidemic and the limitations of testing.
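As a concrete illustration of the metric, the daily positivity rate can be computed as the percentage of tests that return a positive result. The sketch below assumes this simple ratio of daily new positives to daily new tests, which is only one of several possible definitions, and it assumes that days with no tests or with negative corrections in the counts are left undefined. The function name and the example figures are hypothetical.

```python
def positivity_rate(new_positives, new_tests):
    """Daily positivity rate in percent; None where it cannot be computed."""
    rates = []
    for pos, tests in zip(new_positives, new_tests):
        # Assumption: missing values, zero tests and negative corrections
        # make the ratio meaningless, so the day is left undefined.
        if pos is None or tests is None or tests <= 0 or pos < 0:
            rates.append(None)
        else:
            rates.append(100.0 * pos / tests)
    return rates


# Hypothetical daily counts, for illustration only.
print(positivity_rate([50, 120, 0], [1000, 1500, 0]))  # [5.0, 8.0, None]
```

Leaving problematic days undefined, rather than setting them to zero, avoids introducing artificial troughs into the series before any smoothing or modelling step.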
According to the WHO, before a country can loosen restrictions or begin reopening, the positivity rate for a comprehensive testing program should be 5% or less for at least 14 days. High positivity rates occur when, for example, the only people being tested are patients with more severe clinical symptoms who are suspected of having Covid-19 and have required hospitalization. Consequently, a high PPR means that countries should probably aim for a larger and more comprehensive testing program, and suggests that it is not a reasonable time to relax restrictions designed to reduce coronavirus transmission. At the same time, because a high positivity rate suggests high rates of infection in the community caused by rapid transmission of the virus, it indicates that it may be useful to impose restrictions to slow the spread of the disease. A low test positivity rate may be the result of a testing volume large enough that mild and asymptomatic cases, as well as exposed contacts, are monitored. On the other hand, low positivity rates can be the result of enacting different types of public health interventions, such as encouraging remote working, closing schools, banning mass gatherings, and restricting eating in public places, along with stay-at-home measures ranging from permissive to total lockdown.
Such a situation requires the parallel development of at least two dimensions: on the one hand, policymakers should plan ahead for needs in terms of medical facilities and equipment, while, on the other hand, analytical tools and models allowing the generation of reliable forecasts and future scenarios should be developed. For example, analyzing the data in Figure 5, which shows the time series of the normalized 7-day moving average of the PPR and of patients in Intensive Care Units (ICU) for Italy, it turns out that the peak of patients in ICU occurs between 12 and 16 days after the peak of the PPR. Similar scenarios also arise at the regional level.
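The lead-lag comparison between the PPR peak and the ICU peak can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for Figure 5: it uses a trailing 7-day moving average, min-max normalization, a single epidemic wave, and synthetic bell-shaped series whose centres are set 14 days apart. All names and data are hypothetical.

```python
import math


def moving_average(x, k=7):
    """Trailing k-day moving average; the first k - 1 days are dropped."""
    return [sum(x[i - k + 1 : i + 1]) / k for i in range(k - 1, len(x))]


def normalize(x):
    """Min-max scaling to [0, 1], one possible normalization (assumption)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def peak_lag(ppr, icu, k=7):
    """Days between the peak of smoothed PPR and the peak of smoothed ICU."""
    ppr_s = normalize(moving_average(ppr, k))
    icu_s = normalize(moving_average(icu, k))
    # Normalization puts both series on a common scale for plotting;
    # it does not move the location of the peaks.
    return icu_s.index(max(icu_s)) - ppr_s.index(max(ppr_s))


# Synthetic single-wave series: PPR centred on day 40, ICU on day 54.
days = range(100)
ppr = [math.exp(-(((t - 40) / 10.0) ** 2)) for t in days]
icu = [100.0 * math.exp(-(((t - 54) / 12.0) ** 2)) for t in days]

print(peak_lag(ppr, icu))  # 14
```

With real data covering several waves, the comparison would have to be carried out wave by wave, since a global maximum mixes peaks from different periods.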
See also Fenga and Gaspari (2021). In this perspective, the approach proposed in this paper could help decision makers to plan public health policies in advance, because the PPR could have predictive capacity with respect to hospitalizations, allowing the intensity of these interventions to be adjusted over the course of the epidemic. In addition, the ability of the HAR to reproduce the short- and medium/long-term trend of the positivity rate could help avoid both the immediate economic costs of lockdown and the societal costs of social distancing measures. For example, a long-term upward prediction of the PPR could lead to a testing strategy on a larger population scale to monitor and reduce viral transmission. This will inevitably also have repercussions on the social networks of the various actors in the supply chain.
Understanding the dynamics of the current epidemic is essential for the development of non-pharmaceutical interventions and thus for reducing the health, economic and social impacts caused by the pandemic. This paper focuses on the positivity rate as it represents a crucial metric for understanding the Covid-19 outbreak. The positivity rate offers a measure of how adequately countries are testing and provides insight into the current level of coronavirus transmission in the community. Since ARMA models have been found to be poorly suited to modelling the long-term behavior of time series, in this paper we propose to use the HAR approach to model the slowly-moving long-run level and the highly persistent autocorrelation structure that characterize the Covid-19 positivity rate. The empirical study of the Italian positivity rate, along with robustness checks on the US data, shows that HAR models generally outperform ARIMA specifications under various criteria and forecast horizons.
The forecasting superiority of the HAR emerges from the MCS, where the standard HAR and its lasso-based alternatives significantly outperform ARIMA models at the forecast horizons h = 1, h = 3 and h = 7. In particular, the gains widen as the forecast horizon increases. Also, the out-of-sample results indicate that the more general lasso-based HAR lag structure is preferable to the fixed HAR lag structure. These results are confirmed by the in-sample analysis, as allowing for model specification uncertainty under the HAR framework leads to improvements in model fit, minimizing the loss functions considered. Thus, this approach is particularly useful as it allows for accurate forecasting of short-, medium-, and long-term trends of the positivity rate. In this regard, monitoring the trend in the positivity rate and in ICU admissions suggests that the PPR might have predictive ability with respect to hospitalizations, because peaks in the positivity rate precede peaks in hospitalizations, which occur on average 12 to 16 days later. Generating accurate forecasts of the PPR at different horizons is relevant for reducing uncertainty around interventions, helping to optimize the allocation of resources and investments. Therefore, understanding the trend in Covid-19 positivity rates would allow governments to modify their social and health policies in advance. Also, since the model components are chosen in a completely data-driven fashion, the uncertainty and arbitrariness associated with model specification are significantly reduced. In this respect, lasso-based HAR-type models simplify the decision-making process by leading it towards a common direction driven by the dynamics of the data. However, it is worth noting that policy decisions should not be determined by the positivity rate alone, but also by other health, economic, demographic, environmental, and climate variables.
This is because the positivity rate provides some useful information about testing capacity and the spread of the virus in the community, but it can also depend on factors such as how it is calculated, the accessibility of testing and the timeliness of laboratories in providing results. Of course, no single metric gives us a complete picture of the prevalence of Covid-19 in the community. Therefore, each week, we need to monitor PPR trends along with other metrics such as recovered and active cases, percent change in new cases, hospitalizations and deaths.
In the United States, there are no federal standards for reporting Covid-19 test data. This makes it impossible to offer a single view of testing data at the national level, and consequently test data are reported differently. In addition, there are several possible ways to calculate test positivity. For example, the Johns Hopkins University & Medicine webpage "Differences in Positivity Rates" outlines four possible ways to calculate positivity rates. Regarding Italy, the Italian Civil Protection provides comprehensive data at the regional level on some variables of interest, such as swabs analyzed, cumulative confirmed cases, home isolation cases, hospitalized cases, ICU cases, and deaths. However, regionalization of the health care system and data fragmentation pose challenges in the management of the Covid-19 outbreak in Italy. This has led to the enactment of several regional policies, especially in terms of testing strategies. Consequently, in order to trace the real extent of Covid-19 infection, official data must be interpreted with caution, considering several aspects. For example, because there are some inconsistencies and delays in the transmission of these data, on some days negative values of positive cases, tests, and deaths are reported. In addition, short-term fluctuations could affect the reliability of daily data.
These fluctuations may be the result of laboratory delays (laboratory saturation) or calendar effects (typically, the number of tests tends to decrease on weekends). Thus, these inconsistencies should be considered to account for data variability. In this direction, in addition to official national bulletins, it might be useful to cross-reference information from different data sources.
Currently, a complete picture of the drivers of Covid-19 spread that clarifies the causes of the variability in infections across provinces and regions within countries is still lacking. Although human-to-human transmission is recognized as the primary vehicle for virus transmission, several studies have argued that virus circulation may also be associated with geographical, environmental, and socio-economic factors. Accordingly, using information on these factors could allow the definition and refinement of epidemiological modelling and thus the design of appropriate policy responses to manage this threat to population health and, more generally, to socio-economic systems. In this perspective, the proposed approach could be improved by including health, environmental, and socio-economic variables in the HAR models. Along these lines, as a direction for future research, it could be useful to investigate, and potentially include in HAR models, the relationships between the positivity rate and epidemiological variables (effective reproductive number R_t); demographic parameters (social interactions, age, and sex); environmental and climatic factors (humidity, wind speed, and temperature); pollution indicators (air quality); and socio-economic activities (economic and social interactions within and between countries). At the same time, it might be appropriate to consider data inconsistencies.
Considering these variables when generating forecasts could provide decision makers with better guidance in establishing additional control measures, loosening restrictions, enhancing benefits, and preventing the failure of measures already taken.
I would like to thank the Editor, the Associate Editor and two anonymous Referees for their highly constructive comments, which made this paper more valuable.
The author has no conflicts of interest to disclose.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The table reports the average values of the different loss functions for the models under analysis. Bold numbers indicate the best performing model by each criterion at the forecast horizon h = 1. The numbers shaded in gray and light-gray denote that the corresponding models are included in the 75% and 90% MCS, respectively. We use a rolling window of 200 observations to estimate the coefficients of the models at each step.
The table reports the average values of the different loss functions for the models under analysis. Bold numbers indicate the best performing model by each criterion at the forecast horizon h = 3. The numbers shaded in gray and light-gray denote that the corresponding models are included in the 75% and 90% MCS, respectively. We use a rolling window of 200 observations to estimate the coefficients of the models at each step.
The table reports the average values of the different loss functions for the models under analysis. Bold numbers indicate the best performing model by each criterion at the forecast horizon h = 7. The numbers shaded in gray and light-gray denote that the corresponding models are included in the 75% and 90% MCS, respectively. We use a rolling window of 200 observations to estimate the coefficients of the models at each step.
Estimating, monitoring, and forecasting covid-19 epidemics: a spatiotemporal approach applied to nyc data Answering the skeptics: Yes, standard volatility models do provide accurate forecasts Flexible har model for realized volatility Lassoing the har model: A model selection perspective on realized volatility dynamics Long memory processes and fractional integration in econometrics The unprecedented stock market reaction to covid-19 Correlation between climate indicators and covid-19 pandemic in new york, usa Application of the arima model on the covid-2019 epidemic dataset The covid-19 pandemic in italy: Policy and technology impact on health and non-health outcomes The geography of covid-19 spread in italy and implications for the relaxation of confinement measures Predicting the spread of covid-19 in italy using machine learning: Do socio-economic factors matter? International trade as critical parameter of covid-19 spread that outclasses demographic, economic, environmental, and pollution factors Can commercial trade represent the main indicator of the covid-19 diffusion due to human-to-human interactions? a comparative analysis between italy, france Relationship of meteorological factors and human brucellosis in hebei province, china Estimation of covid-19 prevalence in italy, spain, and france. Science of The Total Environment A case study on strategies to deal with the impacts of covid-19 pandemic in the food and beverage industry Enhancing supply resilience in the covid-19 pandemic: a case study on beauty and personal care retailers The relation between length of lockdown, numbers of infected people and deaths of covid-19, and economic growth of countries: Lessons learned to cope with future pandemics similar to covid-19 and to constrain the deterioration of economic system The role of air pollution (pm and no2) in covid-19 spread and lethality: a systematic review The spread of 2019-ncov in china was primarily driven by population density. 
comment on "association between short-term exposure to air pollution and covid-19 infection: Evidence from china EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves A new framework and software to estimate time-varying reproduction numbers during epidemics A Simple Approximate Long-Memory Model of Realized Volatility Examining risks and strategies for the spice processing supply chain in the context of an emerging economy Food security and canada's agricultural system challenged by covid-19 Complexity of the basic reproduction number (r0) Covid-19: A pandemic with positive and negative outcomes on resource and waste flows and stocks. Resources, Conservation and Recycling 161 Influence of population density, temperature, and absolute humidity on spread and decay durations of covid-19: A comparative study of scenarios in china, england, germany, and japan Serial interval of covid-19 among publicly reported confirmed cases Using autoregressive integrated moving average (arima) models to predict and monitor the number of beds occupied during a sars outbreak in a tertiary hospital in singapore The socio-economic determinants of covid-19: A spatial analysis of german county level data The role of environmental factors to transmission of sars-cov-2 (covid-19) Variable selection via nonconcave penalized likelihood and its oracle properties Test positivityevaluation of a new metric to assess epidemic dispersal mediated by non-symptomatic cases Role of the chronic air pollution levels in the covid-19 outbreak risk in italy Predictive capacity of covid-19 test positivity rate Regularization paths for generalized linear models via coordinate descent Modelling malaria incidence with environmental dependency in a locality of sudanese savannah area, mali Modelling the covid-19 epidemic and implementation of population-wide interventions in italy Agriculture, transportation, and the covid-19 crisis Ambiguity and its coping mechanisms in supply chains lessons 
from the covid-19 pandemic and natural disasters The model confidence set A cross-country database of covid-19 testing Epidemiology and arima model of positive-rate of influenza viruses among children in wuhan, china: A nine-year retrospective study A critical analysis of the impacts of covid-19 on the global economy and ecosystems and opportunities for circular economy strategies Viable supply chain model: integrating agility, resilience and sustainability perspectives-lessons from and thinking beyond the covid-19 pandemic Coronavirus (covid-19/sars-cov-2) and supply chain resilience: A research note Viability of intertwined supply networks: extending the supply chain resilience angles towards survivability. a position paper motivated by covid-19 outbreak Transmission of covid-19 virus by droplets and aerosols: A critical review on the unresolved dichotomy Cornonavirus resource center Covid-19 and financial markets: A panel analysis for european countries Novel coronavirus disease (covid-19) outbreak: Now is the time to refresh pandemic plans The contribution of pre-symptomatic infection to the transmission dynamics of covid-2019 Designing a resilient supply chain: An approach to reduce drug shortages in epidemic outbreaks Covid-19 debunks the myth of socially sustainable supply chain: A case of the clothing industry in south asian countries Covid-19 infection process in italy and spain: Are data talking? evidence from arma and vector autoregression models Fractals and intrinsic time-a challenge to econometricians. Unpublished manuscript Why italy first? health, geographical and planning aspects of the covid-19 outbreak Serial interval of novel coronavirus (covid-19) infections. 
International journal of infectious diseases Case-fatality rate and characteristics of patients dying in relation to covid-19 in italy Prevalence of asymptomatic sars-cov-2 infection: a narrative review One year of covid-19 in italy: are containment policies enough to shape the pandemic pattern? Socio-Economic Planning Sciences Correction to "automatic block-length selection for the dependent bootstrap" by d. politis and h. white Volatility forecast comparison using imperfect volatility proxies A production recovery plan in manufacturing supply chains for a high-demand item during covid-19 Strategies for managing the impacts of disruptions during covid-19: an example of toilet paper A dictionary of epidemiology Impacts of epidemic outbreaks on supply chains: mapping a research agenda amid the covid-19 pandemic through a structured literature review R: A Language and Environment for Statistical Computing Research opportunities for a more resilient post-covid-19 supply chain-closing the gap between research findings and industry practice Air pollutant levels during the large-scale social restriction period and its association with case fatality rate of covid-19 Coronavirus pandemic (covid-19). Our world in data Arima modelling & forecasting of covid-19 in top five affected countries Supply chain sustainability: learning from the covid-19 pandemic A brave new world: Lessons from the covid-19 pandemic for transitioning to sustainable supply and production. 
Resources, Conservation, and Recycling 159 Developing a framework for enhancing survivability of sustainable supply chains during and post-covid-19 pandemic Prediction of the covid-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (arima) model Covid-19 and air pollution and meteorology-an intricate relationship: A review Forecasting the long-term trend of covid-19 epidemic using a dynamic model Strategies to manage the impacts of the covid-19 pandemic in the supply chain: implications for improving economic and social sustainability Regression shrinkage and selection via the lasso The covid-19 pandemic: Lessons on building more equal and sustainable societies The dynamic effects of infectious disease outbreaks: The case of pandemic influenza and human coronavirus Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures Public health criteria to adjust public health and social measures in the context of covid-19: annex to considerations in adjusting public health and social measures in the context of covid-19 Financial markets under the global pandemic of covid-19 Forecast model analysis for the morbidity of tuberculosis in xinjiang, china The adaptive lasso and its oracle properties