Probabilistic forecast reconciliation under the Gaussian framework

Shanika L. Wickramasuriya

20 March 2021

Abstract: Forecast reconciliation of multivariate time series is the process of mapping a set of incoherent forecasts into coherent forecasts that satisfy a given set of linear constraints. Commonly used projection-matrix-based approaches for point forecast reconciliation are OLS (ordinary least squares), WLS (weighted least squares), and MinT (minimum trace). Although point forecast reconciliation is a well-established field of research, the literature on generating probabilistic forecasts subject to linear constraints is somewhat limited. Available methods follow a two-step procedure. First, they draw future sample paths from the univariate models fitted to each series in the collection (which are incoherent). Second, they use a projection-matrix-based approach or an empirical-copula-based reordering approach to account for contemporaneous correlations and linear constraints. The projection matrices are estimated either by optimizing a scoring rule such as the energy or variogram score, or simply by using a projection matrix derived for point forecast reconciliation. This paper proves that (a) if the incoherent predictive distribution is Gaussian, then MinT minimizes the logarithmic scoring rule; and (b) the logarithmic score of MinT for each marginal predictive density is smaller than that of OLS. We illustrate these theoretical results using a set of simulation studies. We also evaluate them using the Australian domestic tourism data set.

Multivariate time series forecasting problems often have a set of linear constraints to be satisfied.
For example, regional tourism demand (measured as the number of visitor nights spent away from home) must sum to state-level demand, which must in turn sum to the overall tourism demand of the country. We refer to these structures as hierarchical time series. A simple method to ensure these constraints is to forecast all the series at the most disaggregated level and then sum them to form forecasts for the other aggregated series in the structure. We refer to this method as bottom-up (BU) (see Orcutt et al., 1968; Dunn et al., 1976; Shlifer & Wolff, 1979; Pennings & Dalen, 2017; Bertani et al., 2020, among others). This method ignores the complicated relationships that exist between series in the structure, and it can perform poorly on highly disaggregated data which have a low signal-to-noise ratio. To overcome these difficulties, forecast reconciliation was proposed by Hyndman et al. (2011) and later developed by van Erven & Cugliari (2015), Hyndman et al. (2016), Ben Taieb & Koo (2019) and Wickramasuriya et al. (2019) to achieve coherence in the forecasts for a given hierarchy. These methods first generate independent forecasts for each series (we refer to these as base forecasts), and then reconcile these to make them coherent (i.e., the forecasts follow the same set of linear constraints present in the data). Hyndman et al. (2011) formulated reconciliation as a regression model in which the base forecasts were modeled as the sum of the expected values of the future outcomes and an error term. Other subsequent work formulated reconciliation as an optimization problem intended to minimize various quadratic loss functions. Recent work established relationships between the methods proposed in Hyndman et al. (2011), Ben Taieb & Koo (2019) and Wickramasuriya et al. (2019). Panagiotelis et al. (2020a) provided a geometrical interpretation of some of these methods by nesting them within the class of projections.
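The coherence constraint and the bottom-up method described above can be illustrated with a minimal sketch. The toy hierarchy (Total = A + B) and the numbers are purely illustrative, not taken from the paper:

```python
import numpy as np

# Summing matrix for a toy hierarchy: Total = A + B.
# Rows are ordered [Total, A, B]; columns index the bottom series [A, B].
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

# Hypothetical bottom-level point forecasts.
b_hat = np.array([10.0, 25.0])

# Bottom-up reconciliation: each aggregate forecast is forced to equal the
# sum of its children, so coherence holds by construction.
y_tilde = S @ b_hat
assert y_tilde[0] == y_tilde[1] + y_tilde[2]
```

Any vector of the form S b is coherent; forecasting each of the three series independently would generally not satisfy the constraint, which is what reconciliation corrects.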
One shortcoming of point forecasts is their inability to provide information about any departures from the predicted outcome, limiting their use in decision-making. As a consequence, probabilistic forecasts in the form of probability distributions over future quantities of interest have become widely used in many fields: economics (Clement, 2004; Rossi, 2014; Clements, 2018; Liu et al., 2021) , meteorology (Gneiting et al., 2008; Leutbecher & Palmer, 2008; Sloughter et al., 2013; Leutbecher, 2019) , energy (Jeon & Taylor, 2012; Hong et al., 2016; Ben Taieb et al., 2016) , and retail (Kolassa, 2016; Berry et al., 2020) . Even though reconciliation methods for point forecasts have been developed over the last decade, the literature on probabilistic forecast reconciliation is rather limited. Shang (2017) proposed to compute point-wise prediction intervals for infant mortality rates using the maximum entropy bootstrapping method. They generated bootstrap samples for each series in the structure independently, and then the best fitted ARIMA (autoregressive integrated moving average) model was identified for each generated bootstrap sample. Assuming that the fitted models closely follow the true data generating process, the future sample paths were simulated. These were then reconciled using the orthogonal projection matrix proposed by Hyndman et al. (2011) for point forecast reconciliation. A drawback of this approach is that the bootstrapped samples do not account for the inherent contemporaneous correlations among the series nor satisfy the linear constraints present in the data. Ben Taieb et al. (2020) proposed an algorithm to compute coherent probabilistic forecasts in a bottom-up fashion. The algorithm obtains the forecast distribution of each aggregate series as a convolution of the forecast distributions of the corresponding disaggregated series, and dependencies between forecast distributions are incorporated through the use of empirical copulas. 
Gamakumara (2020) extended the definitions in point forecast reconciliation to probabilistic forecast reconciliation. The definitions they provided are general and can accommodate any continuous mapping from incoherent to coherent probabilistic forecasts. They provided conditions under which the linear mapping is a projection and favored an oblique projection similar to that of Wickramasuriya et al. (2019) when the predictive density is Gaussian. They also showed that for a coherent data generating process, the logarithmic scoring rule is improper with respect to the incoherent base probabilistic forecasts. Hence they recommended using the energy or variogram score if we wish to compare incoherent and reconciled probabilistic forecasts. When distributional assumptions are unlikely to hold, Gamakumara (2020) proposed a non-parametric bootstrapping approach to generate future sample paths for each series in the structure and then reconcile them using projections. Even though this method is similar to Shang (2017), it does not bootstrap the observed series. It assumes that the models fitted to the observed data closely follow the true data generating process and then generates future sample paths from the fitted univariate models by block bootstrapping the in-sample residuals. In contrast to Shang (2017), this method accounts for the contemporaneous correlations. The results of this study also tend to favor an oblique-type projection matrix similar to that of Wickramasuriya et al. (2019). Rather than using existing projection matrices for point forecast reconciliation, Panagiotelis et al. (2020b) proposed to optimize either the energy or variogram score to find the reconciliation weights. They allowed for any linear mappings from incoherent to coherent probabilistic forecasts, which do not necessarily lead to a projection matrix. The simulations were performed for Gaussian and non-Gaussian (using copula) errors driving the ARIMA processes.
In the experiments, incoherent probabilistic forecasts were drawn jointly from a multivariate Gaussian distribution with parameters given by the incoherent point forecasts and the covariance matrix of the in-sample incoherent forecast errors, or following the non-parametric procedure in Gamakumara (2020). The results revealed that the performance of an oblique-type projection matrix as in Wickramasuriya et al. (2019) and that of a score-optimized mapping are similar. This pattern is also observed in the empirical application. In this paper, we intend to fill a few gaps in probabilistic forecast reconciliation. First, we show theoretically that when the incoherent (base) predictive distribution is jointly Gaussian, then among all the projection matrices, the oblique projection matrix used in Wickramasuriya et al. (2019) minimizes the logarithmic score for the coherent predictive distribution. Second, we prove that the univariate logarithmic score after applying an oblique projection matrix is smaller than that from the orthogonal projection matrix used in Hyndman et al. (2011) for each marginal Gaussian reconciled predictive distribution in the structure. The rest of the paper is structured as follows. Section 2 presents the notation, the definition of probabilistic forecast reconciliation, a review of projection matrices used in the point forecast reconciliation setting, and scoring rules for evaluating probabilistic forecasts. Section 3 introduces the theoretical derivations. Sections 4 and 5 show the results from simulations and the Australian domestic tourism data set, respectively. Section 6 concludes with a short discussion. Let y_t ∈ R^m be a vector of all observations collected at time t from each series in the structure, and b_t ∈ R^n be a vector formed only from the observations collected at time t at the most disaggregated level. These are connected via

y_t = S b_t, (1)

where S is a matrix of order m × n consisting of the aggregation constraints (for hierarchical time series) present in the structure.
Due to the constraints present in S, y_t lies in an n-dimensional subspace of R^m, which we refer to as the "coherent subspace" and denote by s. This subspace is spanned by the columns of S. To clarify this notation, consider the structure given in Figure 1. Define a generic series within the structure as X, with y_{X,t} denoting the value of series X at time t and y_t (without a subscript) denoting the aggregate of the series at the most disaggregated level at time t. For the structure given in Figure 1, m = 8, n = 5, b_t = [y_{AA,t}, y_{AB,t}, y_{AC,t}, y_{BA,t}, y_{BB,t}]', y_t = [y_t, y_{A,t}, y_{B,t}, y_{AA,t}, y_{AB,t}, y_{AC,t}, y_{BA,t}, y_{BB,t}]', and

S = [1 1 1 1 1
     1 1 1 0 0
     0 0 0 1 1
         I_5   ],

where I_k denotes an identity matrix of order k × k. These notations can be easily extended to any large collection of time series subject to any aggregation constraints. We should also emphasize that the definition of S and b_t can differ depending on the application (Shang, 2017; Jeon et al., 2019). To describe the concept of coherence and probabilistic forecast reconciliation, we adapt the notation and formal definitions introduced in Panagiotelis et al. (2020a). Let (R^n, F_{R^n}, µ) be a probability space, where F_{R^n} is the Borel σ-algebra on R^n. This triple can be taken as the probabilistic forecast for the bottom-level series. Define s: R^n → R^m to be premultiplication by S as noted in Eq. (1). Then a σ-algebra F_s can be constructed from the collection of sets s(B) for all B ∈ F_{R^n}.

Table 1: Choices of G from the point forecasting literature.

OLS [Hyndman et al. (2011)]: (S'S)^{-1} S'
WLS: (S'Λ̂_1^{-1} S)^{-1} S'Λ̂_1^{-1}
MinT(Sample) [Wickramasuriya et al. (2019)]: (S'Ŵ_{1,sam}^{-1} S)^{-1} S'Ŵ_{1,sam}^{-1}
MinT(Shrink) [Wickramasuriya et al. (2019)]: (S'Ŵ_{1,shr}^{-1} S)^{-1} S'Ŵ_{1,shr}^{-1}

Ŵ_{1,sam} and Ŵ_{1,shr} are the sample and shrinkage (Schäfer & Strimmer, 2005) covariance matrices, respectively, of the 1-step-ahead in-sample base forecast errors, and Λ̂_1 = diag(Ŵ_{1,sam}).

Definition 2.1 (Coherent probabilistic forecasts).
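The Figure 1 structure and the coherent subspace can be checked numerically. A small sketch, using illustrative bottom-level values that are not taken from the paper:

```python
import numpy as np

# Summing matrix for the Figure 1 hierarchy: m = 8 series, n = 5 bottom series,
# with rows ordered as y_t = [Total, A, B, AA, AB, AC, BA, BB].
S = np.vstack([
    np.ones((1, 5)),                      # Total = AA + AB + AC + BA + BB
    np.array([[1., 1., 1., 0., 0.],       # A = AA + AB + AC
              [0., 0., 0., 1., 1.]]),     # B = BA + BB
    np.eye(5),                            # bottom series map to themselves
])

b_t = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # hypothetical bottom-level values
y_t = S @ b_t

# y_t lies in the n-dimensional coherent subspace spanned by the columns of S.
assert S.shape == (8, 5) and np.linalg.matrix_rank(S) == 5
assert np.isclose(y_t[0], y_t[1] + y_t[2])  # Total = A + B
```

The rank check confirms that the coherent subspace has dimension n = 5, so any coherent y_t is determined entirely by its bottom-level components.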
Given the triple (R^n, F_{R^n}, µ), we can define the coherent probability space (s, F_s, µ̆) satisfying the following property:

µ̆(s(B)) = µ(B), for all B ∈ F_{R^n}.

Let (R^m, F_{R^m}, µ̂) be a probability space referring to the incoherent probabilistic forecast for all m series in the structure and ψ: R^m → s be a continuous mapping function.

Definition 2.2 (Probabilistic forecast reconciliation). The reconciled probability measure of µ̂ with respect to ψ is a probability measure µ̃ on s with σ-algebra F_s satisfying

µ̃(A) = µ̂(ψ^{-1}(A)), for all A ∈ F_s.

We can also define the mapping ψ as a composition of two transformations, say s ∘ g, where g: R^m → R^n is a continuous function. A few choices of g from the point forecasting literature are given in Table 1, where g involves premultiplication by a matrix G ∈ R^{n×m} such that SG is a projection matrix. Gamakumara (2020) showed that SG is a projection matrix if and only if SGS = S or, equivalently, GS = I_n holds. Hyndman et al. (2011) and Wickramasuriya et al. (2019) treated these as a set of constraints ensuring unbiased reconciled forecasts provided that the base forecasts are unbiased.

This section briefly reviews the scoring rules that can evaluate the performance of different probabilistic forecast reconciliation methods. Scoring rules provide summary measures of the predictive performance of distributions. They address both calibration and sharpness simultaneously and provide a mechanism for ranking competing forecast methods. Following Gneiting et al. (2008), we define negatively oriented scoring rules that a forecaster wishes to minimize. Let P be the forecaster's predictive distribution and Z ∼ Q, for which z ∈ R is a realization. A scoring rule s(P, z) is said to be a proper scoring rule if

E_Q[s(Q, Z)] ≤ E_Q[s(P, Z)],

where E_Q[·] denotes that the expectation is taken with respect to Q. These scoring rules can evaluate the efficiency of marginal probabilistic reconciled forecasts.
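The defining property of the Table 1 mappings, GS = I_n, can be verified directly for all three choices of G. A minimal sketch, with a randomly generated positive definite matrix standing in for the estimated error covariance Ŵ_1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Figure 1 summing matrix (m = 8, n = 5).
S = np.vstack([np.ones((1, 5)),
               [[1., 1., 1., 0., 0.], [0., 0., 0., 1., 1.]],
               np.eye(5)])

# Stand-in for the covariance of 1-step-ahead in-sample base forecast errors.
A = rng.standard_normal((8, 8))
W1 = A @ A.T + 8 * np.eye(8)  # positive definite by construction

def g_matrix(S, W):
    """G = (S' W^{-1} S)^{-1} S' W^{-1}; W = I gives OLS, W = diag(W1) gives
    WLS, and the full W1 gives MinT."""
    Wi = np.linalg.inv(W)
    return np.linalg.solve(S.T @ Wi @ S, S.T @ Wi)

for W in (np.eye(8), np.diag(np.diag(W1)), W1):
    G = g_matrix(S, W)
    # GS = I_n guarantees unbiasedness is preserved, and SG is idempotent,
    # i.e. a projection onto the coherent subspace.
    assert np.allclose(G @ S, np.eye(5))
    SG = S @ G
    assert np.allclose(SG @ SG, SG)
```

Each SG leaves already-coherent vectors unchanged, which is exactly the projection property the definitions above require.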
In this section, we discuss three scoring rules: the logarithmic score, the continuous ranked probability score and the interval score. The first two can evaluate the full predictive distribution; the last is helpful for evaluating quantile predictions. The logarithmic score is the most widely used scoring rule when the predictive distribution P has a known density function p. It is defined as LS(P, z) = − log p(z). The logarithmic score places a strong penalty on low-probability events and can therefore be more sensitive to outliers. The continuous ranked probability score is defined as the squared difference between the predictive and the empirical cumulative distribution function (CDF), and is given by

CRPS(P, z) = ∫ [P(x) − 1(x ≥ z)]^2 dx,

where 1 is the indicator function. For predictive CDFs with a finite first moment, the CRPS can be written as

CRPS(P, z) = E|X − z| − (1/2) E|X − X*|,

where X and X* are independent random variables with distribution P. Closed-form analytical expressions for the CRPS exist for most classical parametric distributions. For instances where the CDF is not available, the expectations can be approximated using the empirical CDF P̂_N(ω) = (1/N) Σ_{i=1}^{N} 1(x_i ≤ ω), where x_1, x_2, . . . , x_N is a collection of N random draws taken from the predictive distribution. To evaluate the univariate central 100 × (1 − α)% prediction intervals from various reconciliation methods, we can use the interval score defined by

IS(P, z) = (u − l) + (2/α)(l − z) 1(z < l) + (2/α)(z − u) 1(z > u),

where l and u are the α/2 and 1 − α/2 quantiles, respectively. This scoring rule rewards narrower prediction intervals while incurring a penalty if the observation is not captured by the interval. Univariate scoring rules cannot account for the dependencies that exist between the series in the structure. Therefore, we also consider three multivariate scoring rules: the logarithmic score, the energy score and the variogram score. We stated the expression for the logarithmic score in Section 2.2.1; the only modification is that we now substitute a multivariate predictive density.
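The three univariate scoring rules above translate directly into code. A sketch of generic implementations (not the authors' code), with the Gaussian case used for the logarithmic score and the sample-based approximation used for the CRPS:

```python
import numpy as np

def log_score(mu, sigma, z):
    # Negatively oriented logarithmic score of N(mu, sigma^2) evaluated at z.
    return 0.5 * np.log(2 * np.pi * sigma**2) + (z - mu)**2 / (2 * sigma**2)

def crps_sample(x, z):
    # Empirical CRPS: E|X - z| - 0.5 E|X - X*|, approximated from draws x.
    x = np.asarray(x, dtype=float)
    term1 = np.abs(x - z).mean()
    term2 = np.abs(x[:, None] - x[None, :]).mean()
    return term1 - 0.5 * term2

def interval_score(l, u, z, alpha):
    # Rewards narrow central (1 - alpha) intervals; penalises missed
    # observations at rate 2/alpha.
    return (u - l) + (2 / alpha) * ((l - z) * (z < l) + (z - u) * (z > u))
```

For example, a 95% interval (0, 1) scores 1.0 when the observation falls inside it, but 41.0 when the observation is 2, reflecting the 2/α = 40 penalty per unit of miss.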
Gamakumara (2020) showed that the logarithmic score is improper with respect to the class of incoherent measures if the true data generating process is coherent. Hence we cannot make a reliable comparison between incoherent and coherent predictive densities. Another property of the logarithmic score is that, for any coherent density, the score for the entire structure differs from that for the most disaggregated level only by a fixed quantity which depends on S. Therefore, if one probabilistic forecast reconciliation approach achieves a lower expected score than another at the most disaggregated level, the same ordering is preserved for the entire structure. The energy score is the multivariate generalization of the CRPS and is defined by

ES(P, z) = E‖X − z‖_2 − (1/2) E‖X − X*‖_2,

provided E‖X‖_2 is finite, where z ∈ R^m is the observation vector, X, X* ∈ R^m are independent random vectors with distribution P, and ‖·‖_2 is the l_2 norm. We generally use Monte Carlo methods when analytical expressions for these expectations are not readily available:

ÊS(P, z) = (1/N) Σ_{i=1}^{N} ‖x_i − z‖_2 − (1/(2(N − 1))) Σ_{i=1}^{N−1} ‖x_i − x_{i+1}‖_2,

where x_1, x_2, . . . , x_N is a collection of N random draws taken from the predictive distribution. This formulation is computationally more efficient than the multivariate extension of the empirical counterpart of the CRPS (Gneiting et al., 2008). Pinson & Tastu (2013) and Scheuerer & Hamill (2015) noted in their studies that the ability of the energy score to discriminate misspecified correlations can be limited. To overcome this drawback, Scheuerer & Hamill (2015) proposed an alternative score based on the pairwise differences of the components of an m-dimensional vector. If the p-th absolute moments are finite, the variogram score of order p is given by

VS_p(P, z) = Σ_{i=1}^{m} Σ_{j=1}^{m} w_{ij} ( |z_i − z_j|^p − E|X_i − X_j|^p )^2,

where z_i is the i-th component of z, X_i is the i-th component of X having the distribution P, and w_{ij} are non-negative weights. Similarly to the energy score, we approximate the expectation by its sample counterpart.
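Sample-based versions of both multivariate scores can be sketched as follows. These are generic Monte Carlo approximations of the formulas above (not the authors' code); the simple all-pairs estimator is used for the energy score rather than the consecutive-pairs shortcut:

```python
import numpy as np

def energy_score(draws, z):
    # draws: (N, m) sample from the predictive distribution P; z: observation.
    # ES(P, z) ≈ mean ||x_i - z||_2 - 0.5 * mean ||x_i - x_j||_2.
    d = np.asarray(draws, dtype=float)
    t1 = np.linalg.norm(d - z, axis=1).mean()
    t2 = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=2).mean()
    return t1 - 0.5 * t2

def variogram_score(draws, z, p=0.5, w=None):
    # VS_p(P, z) = sum_ij w_ij (|z_i - z_j|^p - E|X_i - X_j|^p)^2,
    # with the expectation replaced by its sample counterpart.
    d = np.asarray(draws, dtype=float)
    m = d.shape[1]
    if w is None:
        w = np.ones((m, m))  # unit weights, as used later in the experiments
    score = 0.0
    for i in range(m):
        for j in range(m):
            e_p = np.abs(d[:, i] - d[:, j])**p
            score += w[i, j] * (np.abs(z[i] - z[j])**p - e_p.mean())**2
    return score
```

A degenerate predictive distribution concentrated on the observation receives a score of zero under both rules, which is a useful sanity check for any implementation.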
The simulation results of Scheuerer & Hamill (2015) suggested setting p = 0.5. We also set w_{ij} = 1 for all i, j in our experiments.

Let the h-step-ahead base probabilistic forecasts be given by N(ŷ_{t+h|t}, W_h), where ŷ_{t+h|t} ∈ R^m is the vector of h-step-ahead base forecasts for each series in the structure, made using observations up to and including time t and arranged in the same order as y_t, and W_h = E[(y_{t+h} − ŷ_{t+h|t})(y_{t+h} − ŷ_{t+h|t})']. Suppose there exists a projection matrix SG_h onto the column space of S that gives the h-step-ahead reconciled probabilistic density N(SG_h ŷ_{t+h|t}, SG_h W_h G_h' S'). As we know the parametric form of the density of the probabilistic forecasts, we can use the logarithmic score to find the optimal choice of the G_h matrix. In other words, we are interested in solving the following constrained optimization problem:

min_{G_h} E[−log f̃(y_{t+h})] subject to G_h S = I_n, (2)

where f̃(·) is the density of the h-step-ahead reconciled forecasts.

Lemma 1. Let W_h be a positive definite matrix. Then G_h W_h G_h' is also positive definite if SG_h is a projection matrix onto the column space of S.

Proof. As projection matrices are idempotent,

rank(SG_h) = tr(SG_h) = tr(G_h S) = tr(I_n) = n,

where tr(·) denotes the trace of a square matrix. On the other hand, rank(S) = n, hence

n = rank(SG_h) ≤ rank(G_h) ≤ n,

so rank(G_h) = n. Therefore, G_h W_h G_h' is full-rank and positive definite.

Theorem 1. Let W_h be a positive definite matrix. The optimal G_h matrix which minimizes Eq. (2) subject to G_h S = I_n is given by

G_h = (S' W_h^{-1} S)^{-1} S' W_h^{-1}.

Proof. Consider the logarithmic score of the density of the h-step-ahead reconciled forecasts:

LS = (n/2) log(2π) + (1/2) log det₊(SG_h W_h G_h' S') + (1/2)(y_{t+h} − SG_h ŷ_{t+h|t})' (SG_h W_h G_h' S')⁻ (y_{t+h} − SG_h ŷ_{t+h|t}), (3)

where det₊(A) and A⁻ denote the pseudo-determinant and pseudo-inverse of the positive semi-definite matrix A, respectively. Consider the second term in Eq. (3):

det₊(SG_h W_h G_h' S') = det(S'S G_h W_h G_h') = det(S'S) det(G_h W_h G_h').

The first equality follows from the fact that SG_h W_h G_h' S' and S'S G_h W_h G_h' are isospectral (i.e., both quantities share the same non-zero eigenvalues). The second equality follows from the fact that S'S and G_h W_h G_h' are symmetric and positive definite matrices.
Consider the third term in Eq. (3). Since y_{t+h} lies in the coherent subspace, we can write y_{t+h} = S b_{t+h}, and

(SG_h W_h G_h' S')⁻ = S (S'S)^{-1} (G_h W_h G_h')^{-1} (S'S)^{-1} S',

which follows from Fact 6.4.8 of Bernstein (2005). The logarithmic score of the predictive density can therefore be rewritten as

LS = (n/2) log(2π) + (1/2) log det(S'S) + (1/2) log det(G_h W_h G_h') + (1/2)(b_{t+h} − G_h ŷ_{t+h|t})' (G_h W_h G_h')^{-1} (b_{t+h} − G_h ŷ_{t+h|t}).

Taking expectations, the quadratic term contributes a constant, so the expected logarithmic score differs from (1/2) log det(G_h W_h G_h') only by terms that do not depend on G_h. Therefore, the constrained minimization problem given in Eq. (2) can be restated as

min_{G_h} log det(G_h W_h G_h') subject to G_h S = I_n. (4)

We decompose G_h as given below so that the constraint is always satisfied:

G_h = (S'S)^{-1} S' + X_h (I_m − S(S'S)^{-1} S').

The reason for approaching it in this manner is to avoid using Lagrange multipliers in the objective function. The unconstrained minimization problem then becomes

min_{X_h} L(X_h) = log det(G_h(X_h) W_h G_h(X_h)'). (5)

The first-order condition of Eq. (5) gives the critical point X_h*, which leads to the optimal matrix

G_h* = (S' W_h^{-1} S)^{-1} S' W_h^{-1}.

We then evaluate the Hessian of L(X_h) to ensure that X_h* corresponds to a minimum. The Hessian, which involves the commutation matrix K_{mn}, evaluated at the critical point is a Kronecker product of two positive definite matrices and is therefore positive definite. There is only one critical point, X_h*, hence it corresponds to the global minimum of the optimization problem given in Eq. (5).

Theorem 2. Under the Gaussian assumption, the expected logarithmic score for the reconciled marginal predictive density of a given series in the structure is smaller for MinT than for OLS.

Proof. Let S' = [S_1 S_2 . . . S_m]. For a given projection matrix SG_h, the h-step-ahead marginal reconciled predictive density of a series X_i in the structure is N(S_i' G_h ŷ_{t+h|t}, S_i' G_h W_h G_h' S_i), where S_i denotes the aggregation constraint corresponding to series X_i for i = 1, 2, . . . , m. The expected logarithmic score becomes

E[LS_i] = K + (1/2) log(S_i' G_h W_h G_h' S_i),

where K = (1/2)(log(2π) + 1). Using Theorem 1 from Wickramasuriya (2021), we know that

S_i' G_h^{MinT} W_h G_h^{MinT}' S_i ≤ S_i' G_h^{OLS} W_h G_h^{OLS}' S_i,

hence the expected logarithmic score for MinT is smaller than that for OLS.

To evaluate the performance of different reconciliation methods on predictive distributions, we follow the same simulation setups, assuming that the base predictive distribution for the series in the structure is jointly Gaussian.
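The restated problem in Eq. (4), and the claim of Theorem 1, can be checked numerically. A sketch, with a randomly generated positive definite W standing in for the true error covariance, and feasible alternatives generated via the same constraint-preserving decomposition used in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)

# Figure 1 summing matrix (m = 8, n = 5).
S = np.vstack([np.ones((1, 5)),
               [[1., 1., 1., 0., 0.], [0., 0., 0., 1., 1.]],
               np.eye(5)])
m, n = S.shape
A = rng.standard_normal((m, m))
W = A @ A.T + m * np.eye(m)  # stand-in positive definite "true" covariance
Wi = np.linalg.inv(W)

def logdet(M):
    return np.linalg.slogdet(M)[1]

G_mint = np.linalg.solve(S.T @ Wi @ S, S.T @ Wi)  # (S'W^{-1}S)^{-1} S'W^{-1}
G_ols = np.linalg.solve(S.T @ S, S.T)             # (S'S)^{-1} S'

obj_mint = logdet(G_mint @ W @ G_mint.T)
obj_ols = logdet(G_ols @ W @ G_ols.T)
assert obj_mint <= obj_ols + 1e-12

# Any feasible G can be written as G_mint + X (I - S G_mint): the constraint
# GS = I_n holds automatically, and no random choice of X beats MinT.
for _ in range(20):
    X = rng.standard_normal((n, m))
    G = G_mint + X @ (np.eye(m) - S @ G_mint)
    assert np.allclose(G @ S, np.eye(n))
    assert obj_mint <= logdet(G @ W @ G.T) + 1e-9
```

The cross term between G_mint and the perturbation vanishes, so every feasible alternative adds a positive semi-definite matrix to G_mint W G_mint', which can only increase the log-determinant.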
We consider a hierarchy with four series at the bottom level, which are aggregated in groups of size two to form all the aggregated series. The structure has seven series in total. We assume a stationary first-order vector autoregressive (VAR(1)) process to generate the observations at the bottom level:

b_t = A b_{t−1} + ε_t,

where the coefficient matrix A is formed from A_1 and A_2, two 2 × 2 matrices with eigenvalues z_{1,2} = 0.6[cos(π/3) ± i sin(π/3)] and z_{3,4} = 0.9[cos(π/6) ± i sin(π/6)], respectively. We also assumed that ε_t ∼ N(0, Σ), where the entries of Σ are governed by a correlation parameter ρ ∈ {0, ±0.1, ±0.2, . . . , ±0.8}. We also consider a slightly larger hierarchy. The structure consists of two levels and 43 series in total. The 36 bottom-level series are aggregated in groups of size six to form six series at level 1, which are then aggregated to form the total series. We assume a VAR(1) process to generate the observations at the bottom level, with a coefficient matrix identical to that used in earlier simulation studies. Two representations for the correlation matrix of the Gaussian innovation process are considered: (a) all correlations are non-negative; (b) a mixture of positive and negative correlations is allowed. A compound symmetric correlation matrix is used for each block of size six at the bottom level. The correlation coefficient for each block is chosen from a uniform distribution on the interval (0.2, 0.7), and correlations between blocks are introduced using the algorithms developed by Hardin et al. (2013). The covariance matrix is constructed by sampling the standard deviations from a uniform distribution on the interval (√2, √6). Some of these covariances are made negative to allow for a mixture of positive and negative correlations. For each setup, we generated T = 101 or 501 observations for the bottom-level series, with the last observation being withheld as the test set.
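A VAR(1) bottom level with the stated spectra can be simulated as follows. This is a sketch under explicit assumptions: the rotation-matrix construction is one concrete choice consistent with the given complex eigenvalues (the paper's exact A_1 and A_2 may differ by a similarity transform), the block-diagonal arrangement and the equicorrelated Σ are illustrative:

```python
import numpy as np

def rotation(r, theta):
    # Real 2x2 matrix with eigenvalues r(cos θ ± i sin θ).
    return r * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

A1 = rotation(0.6, np.pi / 3)   # eigenvalues 0.6[cos(π/3) ± i sin(π/3)]
A2 = rotation(0.9, np.pi / 6)   # eigenvalues 0.9[cos(π/6) ± i sin(π/6)]
A = np.block([[A1, np.zeros((2, 2))],
              [np.zeros((2, 2)), A2]])   # assumed block-diagonal VAR(1)

# Assumed equicorrelated innovation covariance with correlation rho.
rho = 0.5
Sigma = np.full((4, 4), rho)
np.fill_diagonal(Sigma, 1.0)
L = np.linalg.cholesky(Sigma)

# Simulate T observations after a burn-in period.
rng = np.random.default_rng(42)
T, burn = 500, 100
b = np.zeros((T + burn, 4))
for t in range(1, T + burn):
    b[t] = A @ b[t - 1] + L @ rng.standard_normal(4)
b = b[burn:]

# All eigenvalue moduli (0.6 and 0.9) are below one, so the process is stationary.
assert max(abs(np.linalg.eigvals(A))) < 1
```

The aggregated series would then be obtained as y_t = S b_t for the seven-series summing matrix.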
Using the remaining observations as the training set, base forecasts are then generated from the best-fitted ARMA (autoregressive moving average) models obtained by minimizing the AICc (corrected Akaike information criterion). We used the default settings in the automated algorithm of Hyndman & Khandakar (2008), which is implemented in the forecast package for R. The base forecasts are then reconciled using the projection matrices given in Table 1. For each reconciliation method, we use two different covariance estimators: sample and shrinkage. In light of Theorem 1, we use both the univariate and multivariate scoring rules discussed in Section 2.2 for evaluation. We use N = 10000 random draws from the predictive distributions to compute the energy and variogram scores. We repeat each simulation setup 1000 times. In the following sections, we report the percentage relative improvement in each scoring rule for a particular method relative to that for the bottom-up method which uses the sample covariance matrix. A negative (positive) value indicates that the method performs better (worse) than the bottom-up method. We also considered T = 101, 301, and real roots for the matrices A_1 and A_2 in the first simulation setup. However, to save space, we do not present all the results in this paper; the omitted results follow a similar pattern and are available upon request. In the figures that follow, the left panel shows the results obtained using the sample covariance estimator, whereas the right panel shows those obtained using the shrinkage covariance estimator. For the logarithmic score, we do not report the percentage relative improvements for base forecasts, as the score is improper with respect to incoherent measures if the true data generating process is coherent. In addition, we compute the logarithmic score based only on the bottom-level series, as the score for the full structure differs from it only by a constant. It can be observed that for T = 101, MinT shows the best performance irrespective of the covariance matrix and the scoring rule used.
This pattern becomes more prominent when T = 501. The superior performance of MinT over OLS under the logarithmic score is also evident from Theorem 1. Figures 4 and 5 show the percentage relative improvements of different forecast reconciliation methods when evaluated using the logarithmic score for each series in the structure when T = 101 and T = 501, respectively. We also repeated the analysis using other univariate scoring rules such as the CRPS, and the 80% and 95% interval scores. The conclusions from these scoring rules are qualitatively similar, hence we report them in Appendix A. For T = 101, no reconciliation method performs consistently best across all the correlation coefficients and the two choices of covariance estimator considered in this study. However, overall, MinT performs better than the other methods. MinT with the shrinkage estimator seems slightly better than MinT with the sample covariance matrix: in instances where MinT with the sample covariance matrix is worse than OLS, MinT with the shrinkage estimator performs better.

[Figure: panels labelled "Covariance: Sample" and "Covariance: Shrinkage".]
MinT with the sample covariance matrix does not perform well for T = 101. This may be due to the fact that the sample covariance matrix is a poor estimate of the truth for high-dimensional data, which can have disastrous effects on the calculation of the logarithmic score. As the sample size increases, it performs similarly to MinT with the shrinkage estimator.
The corresponding results are similar and are given in Appendix A. We do not report the results for the non-negative error correlation structure as they are qualitatively similar. As we noted in Section 4.1, for T = 101, no reconciliation method consistently outperforms the rest for all the series in the structure. For MinT, the shrinkage estimator performs better than the sample covariance matrix. As the sample size increases, the performance of MinT dominates for most of the series. In comparison to OLS, MinT is superior for almost all the series, as we would expect. We consider the Australian domestic tourism demand data set to build a hierarchical structure. We measure domestic tourism demand using "visitor nights", the total number of nights spent by Australians away from home.
The data are managed by Tourism Research Australia and are collected through the national visitor survey, conducted by computer-assisted telephone interviews. The information is gathered from an annual sample of 120,000 Australian residents aged 15 years or over. The data are monthly time series spanning the period from January 1998 to December 2019. A two-level structure is considered, where the total number of visitor nights in Australia is disaggregated based on geography. The first level of disaggregation is by seven states and the second is by 77 regions. Table 5 in Appendix B provides more details about the structure. We begin with a training size of 120 observations. Using this training data, the best-fitted ARIMA and ETS models are obtained for each series in the structure by minimizing the AICc, and then 1-step-ahead base forecasts are computed. Assuming that the base predictive density is Gaussian, the reconciled probabilistic forecasts are obtained using the G matrices given in Table 1. We repeat this procedure by rolling the training window forward one observation at a time until the end of the sample. We evaluate the accuracy of both the point and probabilistic forecasts using the MSE and scoring rules, respectively. For the probabilistic forecasts, we use multivariate scoring rules (logarithmic score, energy score and variogram score) and univariate scoring rules (logarithmic score, continuous ranked probability score and interval score). From the point forecasting results, we would expect MinT to perform better than OLS, on average. The rankings of these two methods might have changed because the estimation of the covariance matrix is challenging for high-dimensional data. Table 4 shows the accuracy of the probabilistic forecasts for the ARIMA and ETS models using the multivariate scoring rules. The results are given separately for the two choices of covariance estimator.
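The rolling-window scheme described above can be sketched as an index generator; the helper name and the use of slices are illustrative, not taken from the paper's code:

```python
def rolling_origin_indices(n_obs, train_size):
    # Yields (train_slice, test_index) pairs, rolling the training window
    # forward one observation at a time and always forecasting 1 step ahead.
    for end in range(train_size, n_obs):
        yield slice(end - train_size, end), end

# Jan 1998 - Dec 2019 monthly data gives 264 observations; the first window
# uses observations 0..119 and forecasts observation 120.
n_obs, train_size = 264, 120
splits = list(rolling_origin_indices(n_obs, train_size))
assert len(splits) == n_obs - train_size           # 144 evaluation origins
assert splits[0][1] == 120 and splits[-1][1] == 263
```

At each origin, the models would be refitted on the window, base forecasts reconciled with each G matrix, and the scores accumulated over the 144 test points.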
The figures represent the percentage improvements in the different scoring rules relative to the bottom-up method using the sample covariance matrix. A negative (positive) entry indicates a decrease (increase) in predictive accuracy relative to the predictive distribution from the bottom-up method. The bold entries identify the best-performing methods. We emphasize that the logarithmic score is computed based only on the joint predictive distribution of the bottom-level series. We do not present results of the logarithmic score for the base predictive density because the score is improper for incoherent densities when the true data generating process is coherent.

The percentage relative improvements of the reconciliation methods using the sample covariance matrix vary over a wide range when evaluated with the logarithmic score. Among them, BU is the best and MinT is the worst. No such prominent behavior is observed when the shrinkage covariance estimator is used. This may be because the sample covariance matrix provides a poor estimate for high-dimensional data, with adverse effects on the computation of the logarithmic score. This can also be seen in Table 3, where MinT(Sample) is worse than MinT(Shrink) for the bottom-level series (i.e., regions). It is surprising that the logarithmic score cannot differentiate BU from the other reconciliation approaches even when the shrinkage covariance estimator is used, given that BU was the worst-performing method for point forecast reconciliation in the previous analysis. As for the energy score, OLS is the best, MinT is mostly second best, and BU is the worst-performing method regardless of the covariance matrix used. This is the same ordering that we noted for point forecast reconciliation.
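The multivariate scores used in this evaluation can be computed as follows; this is a minimal sketch assuming a Gaussian predictive density, with the energy and variogram scores estimated from Monte Carlo samples of the predictive distribution. The predictive density, realisation and sample size below are hypothetical, not taken from the tourism application.

```python
import numpy as np

def log_score(y, mu, Sigma):
    """Negative log predictive density of N(mu, Sigma) at the
    realisation y (lower is better)."""
    d = len(mu)
    diff = y - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return 0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def energy_score(y, samples):
    """Monte Carlo estimate of the energy score from predictive
    samples (one draw per row); lower is better."""
    m = samples.shape[0]
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    half = m // 2  # independent halves for the pairwise expectation
    term2 = np.mean(np.linalg.norm(samples[:half] - samples[half:2 * half], axis=1))
    return term1 - 0.5 * term2

def variogram_score(y, samples, p=1.0):
    """Variogram score of order p with unit weights; lower is better."""
    d = len(y)
    vs = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            e_term = np.mean(np.abs(samples[:, i] - samples[:, j]) ** p)
            vs += (np.abs(y[i] - y[j]) ** p - e_term) ** 2
    return vs

# Hypothetical predictive density and realisation.
rng = np.random.default_rng(0)
mu = np.array([10.0, 6.0, 4.0])
Sigma = np.eye(3)
y = np.array([10.3, 6.2, 4.1])
samples = rng.multivariate_normal(mu, Sigma, size=4000)

ls = log_score(y, mu, Sigma)
es = energy_score(y, samples)
vs = variogram_score(y, samples)
```

In the reconciled case, mu and Sigma would be the reconciled mean SGŷ and covariance SGWG'S', and the samples would be drawn from that reconciled density.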
For the variogram score with the sample covariance matrix, OLS or WLS is the best-performing reconciliation method while MinT is the worst, whereas for the variogram score with the shrinkage covariance matrix, MinT is the best and OLS or WLS is second best.

Figure: Percentage relative improvements in predictive accuracy evaluated using univariate scoring rules for each series in the structure. The univariate scoring rules used are the logarithmic score (LS), continuous ranked probability score (CRPS) and interval score (IS). The base predictive distributions are obtained from ARIMA models. The series are sorted according to the performance of the OLS method evaluated using the logarithmic score.

In this paper, we aimed to study the properties of probabilistic forecast reconciliation methods, as they have attracted a lot of attention in recent years.
We theoretically showed that if the base predictive distribution is jointly Gaussian, then, among all projection matrices, MinT minimizes the logarithmic score of the reconciled predictive distribution. In addition, the logarithmic score for each marginal Gaussian predictive density after applying MinT is smaller than that of OLS. The simulations also revealed that these relationships hold as the sample size increases. The performance can be affected in small samples, as obtaining a precise estimate of the covariance matrix is challenging for high-dimensional data. In our real data application, the logarithmic score was greatly affected by the covariance matrix used, whereas the energy and variogram scores yielded comparable results.

The author greatly appreciates valuable comments and insights from Professor Rob J Hyndman.

A Univariate scoring rules
Figure: caption fragment — "… (−1, 1). The series are sorted according to the performance of MinT. We set α = 0.05."
Figure 17: Percentage relative improvements in predictive accuracy evaluated using univariate scoring rules for each series in the structure. The univariate scoring rules used are the logarithmic score (LS), continuous ranked probability score (CRPS) and interval score (IS). The base predictive distributions are obtained from ETS models. The series are sorted according to the performance of the OLS method evaluated using the logarithmic score.