title: Modeling tail risks of inflation using unobserved component quantile regressions
author: Pfarrhofer, Michael
date: 2021-03-05

Abstract. This paper proposes methods for Bayesian inference in time-varying parameter (TVP) quantile regression (QR) models featuring conditional heteroskedasticity. I use data augmentation schemes to render the model conditionally Gaussian and develop an efficient Gibbs sampling algorithm. Regularization of the high-dimensional parameter space is achieved via flexible dynamic shrinkage priors. A simple version of TVP-QR based on an unobserved component model is applied to dynamically trace the quantiles of the distribution of inflation in the United States, the United Kingdom and the euro area. In an out-of-sample forecast exercise, I find the proposed model to be competitive and to perform particularly well for higher-order and tail forecasts. A detailed analysis of the resulting predictive distributions reveals that they are sometimes skewed and occasionally feature heavy tails.

Predictive inference is often concerned with producing point forecasts of the conditional mean of some economic or financial series. However, recent contributions to the forecasting literature stress the importance of taking higher-order moments of the predictive distribution into account. Tail risks in particular have been studied in the finance literature for some time, but interest in the tails of distributions of key macroeconomic variables is a comparatively recent phenomenon. In an influential paper, Adrian et al. (2019) find time-varying downside risks in the distribution of GDP growth conditional on economic and financial conditions. Adams et al. (2021) use a similar framework to quantify risks to several other key macroeconomic variables, and come to the same conclusions: risks are time-varying, potentially asymmetric, and partly predictable.[1]

A popular approach to modeling tail risks is quantile regression (QR). QRs are designed to estimate the conditional quantiles of some endogenous variable and were originally proposed by Koenker and Bassett (1978). In this paper I add to the literature on QRs featuring time-varying parameters (TVPs) in a unified state space framework. TVPs have proven useful in addressing parameter change in macroeconomic and financial series in the context of structural inference, and often improve predictive accuracy when interest centers on forecasting.[2] Several previous contributions consider varying coefficient QRs (see De Rossi and Harvey, 2006; Kim, 2007; Wang et al., 2009; Oka and Qu, 2011); others rely on approximate or nonparametric approaches to account for structural breaks when estimating higher-order moments of the distribution of some series (see Cai and Stander, 2008; Taddy and Kottas, 2010; Gerlach et al., 2011; Chen and Gerlach, 2013; Liu, 2016; Wu and Zhou, 2017; Gonçalves et al., 2020; Lim et al., 2020; Griffin and Mitrodima, 2020). A recent related paper is Korobilis et al. (2021), who study time-varying inflation risks conditional on a set of macroeconomic and financial indicators in the euro area using a variant of TVP-QR.

The conventional Bayesian QR originates from Yu and Moyeed (2001), who establish a correspondence to classical methods by specifying the likelihood function as an asymmetric Laplace distribution. One shortcoming of this approach is that it complicates the setup of an efficient Gibbs sampling algorithm.
This is due to the fact that conditional posterior distributions of the model parameters are available in well-known form only for specific prior choices.

[1] Other recent papers that address tail risks of macroeconomic variables are Manzan (2015), Giglio et al. (2016), and De Nicolò and Lucchetta (2017).
[2] Examples using TVP models for structural inference are Primiceri (2005), Mumtaz and Theodoridis (2018), and Paul (2020). Recent forecasting applications featuring TVP models are, for instance, D'Agostino et al. (2013), Aastveit et al. (2017), and Yousuf and Ng (2021).

As a solution, Kozumi and Kobayashi (2011) introduce auxiliary variables to approximate the asymmetric Laplace distribution, which renders the model conditionally Gaussian. This approximation is the point of departure for using state space methods within the class of QR models. I show how to extend the Bayesian QR to feature TVPs using conventional methods for Gaussian state space models, involving several reparameterizations and approximations of the conditional likelihood. The assumption of a constant scale parameter of the asymmetric Laplace likelihood is relaxed, allowing for conditional heteroskedasticity within quantiles. While QR is itself a method to account for heteroskedastic data features, this extension adds a further layer of flexibility. In particular, it allows the model to decide whether parameter change is attributed to the conditional quantile or whether quantiles feature time-variation in their error component, a crucial aspect when incorporating TVPs to discriminate signals from noise (see, e.g., the discussion in Sims, 2001). Since quantile estimates are not guaranteed to be monotonic in the baseline version of the model, I consider an auxiliary Gaussian process regression to provide estimates of noncrossing quantiles.

A special case of the Bayesian TVP-QR is applied to model the distribution of inflation and to produce tail forecasts. Forecasts of the conditional mean of inflation, but also of its dispersion and tail risks, are of crucial importance to policy makers in central banks and practitioners in the private sector. Several related approaches using QRs have been proposed to model the full distribution of inflation, or varying degrees of persistence in specific quantiles in autoregressive frameworks conditional on a set of other macroeconomic indicators (see, e.g., Wolters and Tillmann, 2015; López-Salido and Loria, 2020). From a forecasting perspective, some papers suggest improvements in predictive accuracy by directly modeling conditional quantiles of inflation (e.g., Manzan and Zerom, 2013; Korobilis, 2017; Ghysels et al., 2018; Korobilis et al., 2021). Most of the preceding literature on quantile models of inflation includes explanatory variables based on stylized models such as the Phillips curve and small information sets (e.g., Manzan and Zerom, 2013; López-Salido and Loria, 2020), or model/variable selection approaches in higher-dimensional data environments (e.g., Korobilis, 2017). While the methods proposed in this paper apply to the general case featuring regressors, the empirical application considers a QR-version of a comparatively simple TVP model that has had great success in forecasting the conditional mean of inflation: the unobserved component (UC) model of Stock and Watson (2007).
Variants and extensions of this model featuring several additional unobserved factors (e.g., Chan et al., 2013; Jarociński and Lenza, 2018), or the UC stochastic volatility in mean (UC-SVM) model proposed by Chan (2017), exhibit strong overall forecast performance for inflation. These models mainly target the conditional mean of inflation; probabilistic error distributions allow computing measures of density forecast accuracy and quantiles as a by-product. In this paper, I shed light on whether it is beneficial to model the quantiles of inflation explicitly within the class of unobserved component models, and how the resulting shapes of the predictive distributions differ along key dimensions such as skewness or heavy tails. Special emphasis is put on investigating the performance of such approaches in different parts of the predictive distribution.

Define $q_p(x_t) = x_t'\beta_{pt}$ as the $p$th quantile regression function of $y_t$ conditional on $x_t$, for $p \in (0,1)$, such that

$$y_t = x_t'\beta_{pt} + \epsilon_{pt}. \quad (1)$$

The regression coefficients are collected in the $K \times 1$-vectors $\{\beta_{pt}\}_{t=1}^T$. They vary over time and are specific to the $p$th quantile. The error term $\epsilon_{pt}$ with density $f_p(\bullet)$ has its $p$th quantile equal to zero. Following Yu and Moyeed (2001), the density $f_p(\bullet)$ is chosen to be the asymmetric Laplace ($\mathcal{AL}_p$) distribution, due to the correspondence between frequentist and Bayesian inference that this likelihood implies. Kozumi and Kobayashi (2011) use a mixture representation of the $\mathcal{AL}_p$ distribution to cast a constant parameter version of (1) as a conditionally Gaussian model. In this paper, in addition to TVPs, I extend the Bayesian QR to feature a time-varying scale parameter similar to a stochastic volatility model. To achieve this, define auxiliary variables $v_{pt} \sim \mathcal{E}(\sigma_{pt})$, which follow an exponential distribution with time-varying scaling $\sigma_{pt}$, and $u_t \sim \mathcal{N}(0,1)$. The model in (1) can be written as:

$$y_t = x_t'\beta_{pt} + \theta_p v_{pt} + \tau_p\sqrt{\sigma_{pt} v_{pt}}\, u_t. \quad (2)$$

Standard choices for the constants are $\theta_p = (1-2p)/\{p(1-p)\}$ and $\tau_p^2 = 2/\{p(1-p)\}$, which render (2) distributionally equivalent to (1) under the $\mathcal{AL}_p$ likelihood (Kozumi and Kobayashi, 2011). In a Gibbs sampling algorithm, the term $\theta_p v_{pt}$ shifts the location of the observations in $y_t$, while $\tau_p\sqrt{\sigma_{pt} v_{pt}}$ reweights them such that $\beta_{pt}$ targets the relationship between $y_t$ and $x_t$ with respect to the $p$th quantile. To see this more clearly, define $\tilde{y}_t = (\tau_p\sqrt{\sigma_{pt} v_{pt}})^{-1}(y_t - \theta_p v_{pt})$ and $\tilde{x}_t = (\tau_p\sqrt{\sigma_{pt} v_{pt}}\, I_K)^{-1} x_t$, with $I_K$ denoting an identity matrix of size $K$. Conditional on $v_{pt}$ and $\sigma_{pt}$, (2) can be written as a standard TVP regression:

$$\tilde{y}_t = \tilde{x}_t'\beta_{pt} + u_t, \quad u_t \sim \mathcal{N}(0,1). \quad (3)$$

The parameters $\theta_p$, $\tau_p$ and $v_{pt}$ are the usual quantities shifting and reweighting observations in Bayesian QR. The time-varying scales $\sigma_{pt}$ introduce additional flexibility by allowing for both quantile and time-specific differences in volatilities. This allows the model to discriminate between time-varying signals and noise, and induces a varying degree of smoothness in quantile estimates.
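To make the mixture representation concrete, the following minimal Python check (not part of the original paper, which ships R-code) simulates errors via (2) and verifies that their $p$th empirical quantile is approximately zero; the constants $\theta_p$ and $\tau_p$ are the standard Kozumi and Kobayashi (2011) choices stated above.

```python
import numpy as np

def simulate_al_errors(p, sigma, n=200_000, seed=1):
    """Draw AL_p(0, sigma) errors via the exponential-normal mixture in (2)."""
    rng = np.random.default_rng(seed)
    theta = (1 - 2 * p) / (p * (1 - p))   # standard location-shift constant
    tau = np.sqrt(2 / (p * (1 - p)))      # standard reweighting constant
    v = rng.exponential(scale=sigma, size=n)  # v_pt ~ E(sigma), mean sigma
    u = rng.standard_normal(n)                # u_t ~ N(0, 1)
    return theta * v + tau * np.sqrt(sigma * v) * u

eps = simulate_al_errors(p=0.9, sigma=1.5)
print(np.quantile(eps, 0.9))  # close to zero: the p-th quantile of AL_p sits at the origin
```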
I introduce time-variation in the quantile-specific regression coefficients and the logarithmic scale parameters via random walk state equations:

$$\beta_{pt} = \beta_{pt-1} + \eta_{pt}, \quad \eta_{pt} \sim \mathcal{N}(0_K, \Omega_{pt}), \quad (4)$$
$$\log(\sigma_{pt}) = \log(\sigma_{pt-1}) + e_{pt}, \quad e_{pt} \sim \mathcal{N}(0, \varsigma_p^2), \quad (5)$$

with the $K \times K$-matrix $\Omega_{pt} = \text{diag}(\omega_{p1,t}, \ldots, \omega_{pK,t})$ collecting independent state innovation variances on its diagonal and $\varsigma_p^2$ referring to the state innovation variance of the scale parameters. Note that combining (3) and (4) yields a state space model with standard normal measurement errors. This enables the use of all standard methods for Gaussian state space models that are available in frameworks specified in terms of the conditional mean. Time-variation of the $k$th coefficient in $\beta_{pt}$ is governed by $\omega_{pk,t}$ for $k = 1, \ldots, K$.

There are several options to model these variances, each of them consequential for the dynamic evolution of the states. The default option is to disregard varying degrees of time variation and rely on a constant specification, that is, $\omega_{pk,1} = \ldots = \omega_{pk,T} = \omega_{pk}$. A common choice in this case, subsequently labeled iG, is to assume independent inverse Gamma priors, $\omega_{pk} \sim \mathcal{G}^{-1}(m, n)$. In the empirical application, I choose a weakly informative prior with $m = n = 0.1$.

To impose a time-varying degree of shrinkage, the first option is to construct a prior which detects the necessity of time variation on a $t$-by-$t$ basis. The prior I assume is $\omega_{pk,t} = \lambda_{pk}^2\phi_{pk,t}^2$ with $\lambda_{pk} \sim \mathcal{C}^{+}(0,1)$ and $\phi_{pk,t} \sim \mathcal{C}^{+}(0,1)$, where $\mathcal{C}^{+}$ refers to the half-Cauchy distribution. This specification implies no persistent evolution of $\omega_{pk,t}$, and for this reason I refer to it as the static horseshoe prior (labeled shs). In other words, I impose an overall degree of shrinkage $\lambda_{pk}$ towards constancy, with scalings $\phi_{pk,t}$ providing local adaptiveness for periods where shifts in the model parameters are required. This prior is closely related to the one used in Korobilis et al. (2021).

Persistence in the shrinkage process can be achieved by the dynamic horseshoe prior (labeled dhs). Here, I assume $\omega_{pk,t} = \lambda_{p0}\lambda_{pk}\phi_{pk,t}$ and consider this quantity on the log-scale, $\psi_{pk,t} = \log(\lambda_{p0}\lambda_{pk}\phi_{pk,t}) = \log(\omega_{pk,t})$, to obtain the joint law of motion

$$\psi_{pk,t} = \mu_{\psi,pk} + \varphi_{pk}(\psi_{pk,t-1} - \mu_{\psi,pk}) + \nu_{pk,t}, \quad \nu_{pk,t} \sim \mathcal{Z}(c, d, 0, 1). \quad (6)$$

Following Kowal et al. (2019), this representation establishes a dynamic version of the horseshoe prior (for $c = d = 1/2$), where $\lambda_{p0}$ acts as a global shrinkage parameter specific to quantile $p$, $\lambda_{pk}$ is a covariate-specific shrinkage parameter, $\phi_{pk,t}$ a covariate and time-specific shrinkage parameter, and $\mathcal{Z}$ denotes the Z-distribution. The priors on the shrinkage parameters are $\lambda_{p0} \sim \mathcal{C}^{+}(0, 1/TK)$ and $\lambda_{pk} \sim \mathcal{C}^{+}(0,1)$.

The prior setup is completed by assuming an inverse Gamma prior $\sigma_p \sim \mathcal{G}^{-1}(a/2, b/2)$ for the case of a time-invariant scale parameter (labeled TIS) of the $\mathcal{AL}_p$ distribution (i.e., for $\sigma_{p1} = \ldots = \sigma_{pT} = \sigma_p$), and an inverse Gamma prior on the state innovation variance of the logarithmic time-varying process of the scale parameter (labeled TVS), $\varsigma_p^2 \sim \mathcal{G}^{-1}(e, f)$. For the empirical application, I set $a = b = 0.1$, $e = 3$ and $f = 0.3$ to achieve weakly informative priors.

The resulting posterior distributions and details on the sampling algorithm are provided in Appendix A. Disregarding a number of draws as burn-in, the MCMC algorithm delivers draws from the desired posterior distributions. In the empirical application of this paper, I discard the initial 3,000 draws as burn-in and retain every third of the 9,000 subsequent draws for posterior and predictive inference. Note that there is no dependence between coefficients across quantiles for QRs as in (1), so estimation for different values of $p$ can easily be parallelized. The recursive forecast exercise in this paper, for instance, employs a high-performance cluster which distributes the individual quantile regressions across nodes and collects them afterwards. The R-code associated with this paper also provides routines to distribute the computational burden to any number of available CPUs on a single machine, as sketched below.
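The embarrassingly parallel structure can be illustrated as follows; this Python sketch is independent of the paper's R-code, and fit_quantile is a hypothetical stand-in for the quantile-specific Gibbs sampler of Section 2.1.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

P_GRID = [round(p, 2) for p in np.linspace(0.05, 0.95, 19)]  # the 19 quantile levels

def fit_quantile(p):
    """Stand-in for one quantile-specific MCMC run; would return posterior draws."""
    return {"p": p, "draws": None}  # dummy output for illustration only

def fit_all_quantiles(max_workers=4):
    # The P quantile regressions share no parameters, so they can run fully in parallel.
    # On spawn-based platforms, call this from under an `if __name__ == "__main__":` guard.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(P_GRID, pool.map(fit_quantile, P_GRID)))
```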
This feature enables the individual quantile-specific models to be estimated in roughly the same time as a single model targeting the conditional mean. However, this comes with a caveat: disjoint models of quantiles may be problematic, since independent estimates of the quantile function are not guaranteed to be monotonic. To ensure that the estimated quantiles are monotonic, i.e., $q_{p_1}(y_t|x_t) < q_{p_2}(y_t|x_t)$ for $p_1 < p_2$, I rely on an auxiliary Gaussian process regression (GPR, see Williams and Rasmussen, 2006) and post-process the collected MCMC draws for all considered quantiles using the two-step approach discussed in Rodrigues and Fan (2017).

Let $\mathbf{p} = \{0.05, 0.10, \ldots, 0.90, 0.95\}$ denote the $P = 19$ quantiles of interest, and let $\tilde{p} \in \tilde{\mathbf{p}}$ index the same grid when evaluating induced quantiles. The approach of Rodrigues and Fan (2017) involves constructing a $P \times P$ matrix $Q_t(y_t|x_t)$ with elements $Q_{t,(p,\tilde{p})}(y_t|x_t)$. This matrix collects induced quantiles, that is, the quantile function of the fitted $\mathcal{AL}_p$ model evaluated on the full grid of quantiles $\tilde{\mathbf{p}}$. The variables $\hat{\beta}_{pt}$ and $\hat{\sigma}_{pt}$ refer to the posterior mean estimates of the respective quantities. Consequently, this auxiliary model provides $P-1$ additional estimates of the quantile for each $p$, since the raw quantile estimates $x_t'\hat{\beta}_{pt}$ are collected on the diagonal of $Q_{t,(p,\tilde{p})}(y_t|x_t)$. This additional information to unify the independent quantile regressions is exploited via the following GPR:

$$q_t = f_t + \eta_t, \quad \eta_t \sim \mathcal{N}(0, \Sigma_t), \quad f_t \sim \mathcal{N}(0, \Upsilon_t),$$

where $q_t$ stacks the induced quantiles collected in $Q_t(y_t|x_t)$ and $f_t$ is the latent quantile curve. The covariance matrix $\Sigma_t$ is diagonal and features the posterior variances of the corresponding elements $Q_{t,(p,\tilde{p})}(y_t|x_t)$, divided by the number of retained MCMC draws. The Gaussian process covariance is chosen such that it reflects a decreasing function of the distance between quantiles via a squared exponential kernel. The elements of the square matrix $\Upsilon_t$, $\upsilon_{t,(p,\tilde{p})}$, are thus defined as

$$\upsilon_{t,(p,\tilde{p})} = s^2 \exp\left(-\frac{(p - \tilde{p})^2}{2 w_t^2}\right),$$

where $w_t$ is the bandwidth and $s^2$ is a variance hyperparameter of the prior. The variance is set to $s^2 = 100$ to yield a comparatively uninformative prior. The final quantile estimate is given by the fitted values of the GPR at all desired levels $p$. Rodrigues and Fan (2017) show that this corresponds to a weighted average of the underlying induced quantiles, which is consistent and exhibits favorable empirical properties.

It remains to discuss the choice of the bandwidth $w_t$. Rodrigues and Fan (2017) show that there always exists a bandwidth which guarantees noncrossing quantiles, and note that as $w_t \to \infty$ one obtains equal weights for the induced quantiles, while the case $w_t \to 0$ puts non-zero weights only on the raw quantile estimates. Consequently, I select the minimal $w_t$ that results in noncrossing quantiles for all $p$ and $t = 1, \ldots, T$. This approach is subsequently referred to as GPt. An alternative is to choose $w_t = w$ for all $t$, which is the default setup for constant parameter QRs in Rodrigues and Fan (2017). The latter choice is labeled GP and typically results in smoother quantile estimates for TVP-QR over time, since $w$ will be the maximum of the set of minimal values $\{w_t\}_{t=1}^T$ that guarantee noncrossing quantiles.
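A stylized Python sketch of this adjustment step (my own reading of Rodrigues and Fan (2017), not the paper's implementation): all $P \times P$ induced quantiles are treated as noisy observations of a latent quantile curve, the GP posterior mean is computed on the grid with the squared exponential kernel above, and the bandwidth is enlarged until the fitted quantiles are monotone, mimicking GPt. The function names and the geometric bandwidth search are assumptions for illustration.

```python
import numpy as np

def gp_adjust(Q, V, p_grid, s2=100.0, w=0.05):
    """GP posterior mean on the grid. Q[i, j]: induced quantile of model p_i at level
    p_j (raw estimates on the diagonal); V: matching noise variances for Sigma_t."""
    P = len(p_grid)
    levels = np.tile(np.asarray(p_grid), P)   # evaluation level of each stacked observation
    y = np.asarray(Q).flatten()               # stack all induced quantiles
    k = lambda a, b: s2 * np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * w ** 2))
    K_xx = k(levels, levels) + np.diag(np.asarray(V).flatten())
    K_sx = k(np.asarray(p_grid), levels)
    # Posterior mean: a weighted average of the induced quantiles.
    return K_sx @ np.linalg.solve(K_xx, y)

def noncrossing_quantiles(Q, V, p_grid, w0=1e-3, factor=1.5, max_iter=60):
    """Increase the bandwidth until the fitted quantile curve is strictly increasing."""
    w = w0
    for _ in range(max_iter):
        fitted = gp_adjust(Q, V, p_grid, w=w)
        if np.all(np.diff(fitted) > 0):
            return fitted, w
        w *= factor
    raise RuntimeError("no noncrossing bandwidth found on the search grid")
```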
This section first provides descriptive statistics and stylized facts of inflation dynamics, which subsequently motivate the baseline model specification in the empirical part of this paper.

[Figure 1: Inflation in the US, the UK and the EA, alongside histograms and smoothed density estimates of all observations over time. Red dots ($\pi_t < \pi_{0.05}$ or $\pi_t > \pi_{0.95}$) and blue dots ($\pi_{0.05} \le \pi_t < \pi_{0.10}$ or $\pi_{0.90} \le \pi_t < \pi_{0.95}$) mark observations in the tails of the distribution, with $\pi_p$ denoting the $p$th unconditional quantile over time.]

Inflation is defined as $\pi_t = 400\log(P_t/P_{t-1})$, where $P_t$ denotes the price index at time $t$. Figure 1 shows inflation for all three economies, alongside a histogram and smoothed density estimate of all observations over time. Panel (a) of Fig. 1 shows $\pi_t$ for the US. The majority of observations above the 90th and 95th percentile of the unconditional distribution over time occur during the late 1970s and early 1980s. During the 1950s, a period of high volatility, I observe several values below the 10th and 5th percentile. The majority of the remaining observations allocated in the interval $\pi_{0.05} \le \pi_t < \pi_{0.10}$ occur during or just after recessions until the early 1960s. After the Volcker chairmanship of the Federal Reserve ended in 1987, all observations for inflation in the tails of the unconditional distribution are located below the 10th percentile. This suggests a shift from upside risks of excessive inflation during the 1970s and 1980s towards downside risks of deflation in later periods of the sample. This stylized fact may be linked to the decoupling of inflation and inflation volatility discussed in Chan (2017). Considering the density plot on the right-hand side, the dynamic evolution of inflation translates into a slightly right-skewed unconditional distribution with heavy tails. Investigating the rolling window quantile estimates suggests substantial movements and asymmetries with a varying degree of persistence. Turning to the UK in panel (b) and the EA in panel (c), the corresponding evidence suggests only a minor relevance of asymmetries, but indicates that the median of inflation appears to decline. Some periods, such as the Great Recession and the European sovereign debt crisis, feature markedly wider distributions.

The proposed model specification is motivated by the literature on forecasting the conditional mean of inflation. Popular and successful approaches, particularly for forecasting, are variants of the unobserved component model with stochastic volatility (see, e.g., Stock and Watson, 2007; Chan et al., 2013; Chan, 2017; Chan et al., 2018; Jarociński and Lenza, 2018). The simplest case of this model assumes a persistent (unobserved/latent) trend for the mean, usually augmented with some form of conditional heteroskedasticity. Such models aim at providing an accurate model for the conditional mean, which, combined with a potentially time-varying variance, yields estimates of the conditional quantiles of inflation as a by-product. In this paper, I ask how explicit models of quantiles within this class compare to mean-based specifications. Interest centers both on overall forecast performance and on how the competing models perform in different parts of the distribution. The proposed econometric framework thus extends the conventional unobserved component model for the conditional mean by introducing analogous but explicit unobserved components for different quantiles of the distribution of inflation, using the methods developed in Section 2. This model, labeled UCQR, generalizes the UC-SV model of Stock and Watson (2007) to modeling the conditional quantiles of inflation. The UCQR model is given by:

$$\pi_t = \alpha_{pt} + \epsilon_{pt}, \quad \alpha_{pt} = \alpha_{pt-1} + \eta_{pt}, \quad \eta_{pt} \sim \mathcal{N}(0, \omega_{pt}),$$

with $\epsilon_{pt}$ following an $\mathcal{AL}_p$ distribution; this is the special case of the general framework with $x_t = 1$ and a scalar quantile-specific trend. I rely on the Gaussian approximation of the $\mathcal{AL}_p$ distribution using auxiliary variables discussed in detail in Sub-Section 2.1. The competing model specifications include both a time-invariant scale (TIS) and a time-varying scale (TVS) version of UCQR.
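For intuition, the following Python sketch simulates the UCQR data-generating process in its TVS variant, again using the standard mixture constants from Section 2.1; all parameter values are arbitrary illustrations rather than estimates from the paper.

```python
import numpy as np

def simulate_ucqr(p, T=300, omega=0.01, varsigma2=0.05, sigma0=1.0, seed=2):
    """Simulate a quantile-specific random-walk trend with AL_p errors (TVS version)."""
    rng = np.random.default_rng(seed)
    theta = (1 - 2 * p) / (p * (1 - p))   # standard AL mixture constants
    tau = np.sqrt(2 / (p * (1 - p)))
    alpha = np.cumsum(np.sqrt(omega) * rng.standard_normal(T))       # trend alpha_pt
    log_sig = np.log(sigma0) + np.cumsum(np.sqrt(varsigma2) * rng.standard_normal(T))
    sigma = np.exp(log_sig)                                           # sigma_pt, as in (5)
    v = rng.exponential(scale=sigma)                                  # v_pt ~ E(sigma_pt)
    u = rng.standard_normal(T)
    pi = alpha + theta * v + tau * np.sqrt(sigma * v) * u             # measurement equation
    return pi, alpha, sigma
```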
To assess the robustness of this approach with respect to priors, I consider the dynamic horseshoe (dhs), the static horseshoe (shs) and a conventional inverse Gamma (iG) prior to determine $\omega_{pt}$. The paper also provides a comparison between unprocessed and processed quantile estimates. Unprocessed estimates (i.e., noncrossing of quantiles is not guaranteed) are referred to as raw. They are shown alongside those adjusted using the two-stage GPR procedure discussed in Section 2.2. GP refers to the bandwidth parameter $w$ being fixed over time, GPt indicates a time-varying $w_t$.

The UCQR model is compared to mean-based versions of the UC model. In particular, I consider the following competing specifications:

- Unobserved component with stochastic volatility (UC-SV). This model assumes a measurement equation of the form $\pi_t = \alpha_t + \epsilon_t$, with error term $\epsilon_t \sim \mathcal{N}(0, \exp(h_t))$, a random walk state equation $\alpha_t = \alpha_{t-1} + \sqrt{\omega_{\alpha t}}\, e_{\alpha t}$ and log-volatility process $h_t = h_{t-1} + \varsigma_h e_{ht}$. It is the specification of Stock and Watson (2007), and the natural competitor to UCQR given that it is specified analogously, but targets solely the conditional mean rather than conditional quantiles.

- Unobserved component with stochastic volatility in mean (UC-SVM). This model assumes a measurement equation of the form $\pi_t = \alpha_t + \gamma_t\exp(h_t) + \epsilon_t$, with error term $\epsilon_t \sim \mathcal{N}(0, \exp(h_t))$, analogous random walk state equations for $\alpha_t$ and $\gamma_t$, and for the stochastic volatilities $h_t = h_{t-1} + \varsigma_h e_{ht}$, with $e_{st}$ following independent standard normal distributions for $s \in \{\alpha, \gamma, h\}$. This specification was originally used in Chan (2017).

UC-SV and UC-SVM are implemented and estimated as described in Huber and Pfarrhofer (2021) with a dynamic horseshoe prior. An overview of all model specifications and adjustments is provided in Table 1. [Table 1 notes: dynamic horseshoe (dhs), static horseshoe (shs), inverse Gamma (iG) prior; stochastic volatility (SV), stochastic volatility in mean (SVM); time-invariant scale (TIS), time-varying scale (TVS) parameter; raw refers to unprocessed estimates of the conditional quantiles, GP to the adjustment using the Gaussian process regression with the bandwidth $w$ fixed over time, GPt to the Gaussian process regression with time-varying bandwidth $w_t$.]

The charts in Fig. 2 show several interesting differences with respect to the three models.

To assess the predictive accuracy of the competing models, I rely on a pseudo out-of-sample exercise using an expanding window. The sample is split into training and holdout periods. I estimate the models using data from the first training sample to produce $h$-step-ahead forecasts, and evaluate these using the corresponding realized values in the holdout. Subsequently, an additional observation from the holdout is added to the training sample and the procedure is iterated until the holdout is exhausted. The initial sample is chosen such that it consists of 50 quarterly observations for all three economies, resulting in differently sized holdout periods.

I consider the quantile weighted continuous ranked probability score (CRPS) to measure the overall accuracy of the density forecasts at different points of the distribution, alongside conventional log predictive scores (LPSs). Let $\pi_{t+h}$ denote realized inflation $h$ steps ahead and $\hat{q}_{p,t+h}$ the corresponding quantile forecast. I first define the quantile score (QS, see, e.g., Giacomini and Komunjer, 2005; Gneiting and Raftery, 2007) for quantile $p$ at time $t$ for $h$ steps ahead, which is computed as:

$$\text{QS}_{p,t+h} = 2\left(p - \mathbb{I}(\pi_{t+h} \le \hat{q}_{p,t+h})\right)\left(\pi_{t+h} - \hat{q}_{p,t+h}\right),$$

where $\mathbb{I}$ is the usual indicator function. Raw values for QSs to assess tail forecast performance at specific quantiles are provided in Appendix B. Quantile weighted CRPSs, following Gneiting and Ranjan (2011), are defined as:

$$\text{CRPS}_{t+h}(w_p) = \int_0^1 w_p\, \text{QS}_{p,t+h}\, dp,$$

with non-negative weights $w_p$ on the unit interval putting emphasis on specific parts of the distribution. Four weighting schemes are considered for CRPSs: (a) equal, $w_p = 1$; (b) tails, $w_p = (2p-1)^2$; (c) left tail, $w_p = (1-p)^2$; (d) right tail, $w_p = p^2$, with the integral approximated over the grid $p \in \{0.05, 0.10, \ldots, 0.90, 0.95\}$.
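A compact implementation of these two metrics, under the assumption stated above that the integral is replaced by an average over the 19-point grid:

```python
import numpy as np

P_GRID = np.linspace(0.05, 0.95, 19)

def quantile_score(y, q, p):
    """QS_p: scaled check loss of realization y under quantile forecast q."""
    return 2.0 * (p - float(y <= q)) * (y - q)

def weighted_crps(y, q_hat, scheme="equal"):
    """Quantile-weighted CRPS, approximating the integral by a grid average.

    q_hat: array of 19 quantile forecasts, ordered along P_GRID."""
    weights = {"equal": np.ones_like(P_GRID),
               "tails": (2.0 * P_GRID - 1.0) ** 2,
               "left":  (1.0 - P_GRID) ** 2,
               "right": P_GRID ** 2}[scheme]
    qs = np.array([quantile_score(y, q, p) for q, p in zip(q_hat, P_GRID)])
    return float(np.mean(weights * qs))
```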
To recover a smooth estimate of the entire predictive distribution from the individual QRs, I apply kernel smoothing based on a Gaussian kernel to the estimated quantiles (see also Gaglianone and Lima, 2012; Korobilis, 2017). This procedure can be exploited to compute log predictive scores for the QR-based models.
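A sketch of this step, treating the 19 quantile estimates as a pseudo-sample and smoothing them with a Gaussian kernel; the bandwidth rule below is SciPy's default and an assumption on my part, not a choice documented in the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def smoothed_predictive(q_hat):
    """Gaussian-kernel smoothing of the estimated quantiles into a full density."""
    return gaussian_kde(np.asarray(q_hat))

def log_predictive_score(y_realized, q_hat):
    """LPS: log of the smoothed predictive density evaluated at the realization."""
    return float(np.log(smoothed_predictive(q_hat)(y_realized)[0]))
```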
Tables 2 to 4 report the quantile weighted CRPS variants and LPSs for the US, UK and EA as averages over the holdout. They are benchmarked relative to the UC-SV model (as ratios for CRPSs, and differences for average LPSs). The row for the benchmark displays the actual metrics; all other entries are expressed relative to it.

Investigating the results in detail, UC-SV and UC-SVM appear to be tough benchmarks to improve upon for all three considered economies. UC-SVM typically performs slightly better than the conventional UC-SV. These results confirm the findings of Chan (2017) and Huber and Pfarrhofer (2021). The conditional-mean based models are particularly strong for short-horizon forecasts, with the most favorable metrics in all parts of the distribution targeted by the quantile weighted CRPSs. In fact, the best performing specification for one-quarter ahead forecasts in terms of CRPSs is UC-SVM in almost all cases across the three economies. For multi-step ahead forecasts, the additional distributional flexibility of the quantile-based models tends to pay off in terms of predictive accuracy. This finding is especially pronounced at the three-year ahead horizon. One of the UCQR specifications is identified as the best performing model measured with the CRPS variants for all considered higher-order forecasts, with the exception of one-year ahead forecasts for the US. The quantile weighted versions of CRPSs target different parts of the distribution, and thus allow investigating where overall gains stem from in more detail. The results indicate that UCQR, again for higher-order forecasts, yields pronounced gains, especially when focusing on either the left, the right or both tails via the weighting scheme. Within the class of quantile-based models, some differences with respect to the considered economy are apparent. Allowing for conditional heteroskedasticity (TVS) within quantiles results in more accurate forecasts for the US and the EA. In these two economies, it is worth mentioning that the homoskedastic variant (TIS) fails to improve upon the mean-based models in many cases, while TVS shows improvements ranging between 10 and 20 percent lower CRPSs depending on the economy, horizon and weighting scheme. This is different in the UK, where TIS seems to be better suited to model the predictive distribution of inflation. However, these gains are more modest, and in general, the differences in forecast metrics are smaller for the UK than for the US or EA. Interestingly, LPSs and CRPSs do not necessarily agree on model selection. One of the UCQR models is still identified as the best performing specification, but by the LPS metric, it is usually one of the homoskedastic QRs.

Differences in predictive accuracy between prior specifications for UCQR are often muted. In many cases one of the two dynamic shrinkage priors is identified as the best performing model. In cases where the iG prior works best, it does so typically at a small margin. This corroborates the findings in Huber and Pfarrhofer (2021), who show that imposing shrinkage in small-scale time-varying parameter models never severely hurts predictive accuracy, but in many instances yields improvements. It is worth mentioning that the dynamic horseshoe (dhs) prior performs well most consistently on average compared to the static horseshoe (shs). Turning to the consequences of post-processing quantile estimates to achieve a noncrossing solution, I find that using a fixed bandwidth in the GPR usually yields results similar to those allowing for time-variation in this parameter. Interestingly, the raw quantile estimates, which are not guaranteed to be monotonic, exhibit the best forecast performance in a small number of cases. In a nutshell, adjusting quantiles ex post neither helps nor hurts predictive accuracy much.

Summarizing, four key findings are worth noting. First, variants of UCQR perform particularly well for higher-order and tail forecasts. For one-quarter ahead forecasts, the mean-based competing models often indicate better performance, but only at small margins, and UCQR is competitive. Second, there is some heterogeneity with respect to the considered economies. Allowing for conditional heteroskedasticity within quantiles pays off for the US and EA, while homoskedastic scales are sufficient for the case of the UK. Third, imposing dynamic shrinkage helps predictive accuracy in many cases. The dynamic horseshoe exhibits the most consistent improvements across forecast metrics, horizons and economies. Finally, adjusting for noncrossing quantiles has only minor implications with respect to predictive accuracy.

To shed light on which features of the resulting predictive distributions yield gains in tail forecast performance, I investigate their shapes in more detail in the following. Note that the kernel smoothing of the estimated quantiles used to compute LPSs can also be employed to generate random samples from the estimated predictive distributions. To compute and investigate higher-order moments of these distributions, I focus on one-year ahead forecasts for UC-SV, UC-SVM, UCQR-TIS-GPt-dhs and UCQR-TVS-GPt-dhs, and use the following procedure. First, I generate a sample of 3,000 random numbers from the respective predictive distribution, which I then use to compute the mean, variance, excess kurtosis and skewness of this sample. This step is iterated 1,000 times. Second, to obtain numerical standard errors for the estimates of these moments via a type of bootstrap procedure, I compute the 5th and 95th percentiles over these 1,000 replicates. The resulting empirical moments are shown in Figure 3.
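This procedure can be sketched as follows, reusing smoothed_predictive from the earlier snippet; the sample sizes follow the text, while everything else is illustrative.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def moment_bands(q_hat, n_draw=3000, n_rep=1000, seed=0):
    """Replicate moments of the smoothed predictive distribution; return 5/95 percentile bands."""
    rng = np.random.default_rng(seed)
    kde = smoothed_predictive(q_hat)          # Gaussian-kernel fit to the quantile estimates
    stats = np.empty((n_rep, 4))
    for r in range(n_rep):
        draws = kde.resample(n_draw, seed=rng)[0]  # random numbers from the predictive density
        stats[r] = [draws.mean(), draws.var(), kurtosis(draws), skew(draws)]
    lo, hi = np.percentile(stats, [5, 95], axis=0)
    return stats.mean(axis=0), lo, hi  # point estimates plus numerical error bands
```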
Starting with the mean of the predictive distribution, several stylized facts hold across all three economies. While UC-SV and UC-SVM often show similar paths, UC-SVM typically produces more pronounced high-frequency movements in the context of large-variance shocks. This feature is due to feedback effects between inflation and its volatility. For the US, Chan (2017) shows that the level-volatility relationship was positive pre-1980, but turned insignificant or negative afterwards. In other words, large-variance shocks coincided with higher inflation early in the sample, while large-variance shocks later in the sample relate to downward movements of inflation. Comparing the two mean-based competing models to the QRs, the predictive mean from UCQR-TIS often exhibits a similar path, while shifts in the location of the predictive distribution in the case of UCQR-TVS often occur with a delay and are less pronounced. This finding is similar to the in-sample evidence, and overall, the predictive mean for the latter model displays a less noisy and smoother evolution. The more aggressive fit of the mean-based models, in light of the forecast results, yields gains in predictive accuracy at the short horizon. However, a less aggressive fit appears to pay off for higher-order forecasts.

Turning to the variances, several findings are worth noting. In general, time-varying variances for the US and UK are clearly featured in all competing models. This requirement is mostly dictated by the comparatively long sample vis-à-vis the EA, spanning different economic phases (see, e.g., Clark, 2011). By contrast, variances in predictive distributions for the EA are comparatively stable, which corroborates findings by Jarociński and Lenza (2018) and Huber et al. (2020), who detect only limited evidence in favor of stochastic volatility for inflation in the EA. Note that this finding suggests that models for the conditional mean could safely neglect time-varying volatilities, while conditional heteroskedasticity within quantiles via TVS receives empirical support. The UC-SVM model overall yields the narrowest predictive distributions, apart from during the Great Recession in the US. This may be explained by the notion that the volatility process is featured as a predictor, providing a better fit and thereby also reducing the predictive variance. Interestingly, for the US and the EA, predictive variances in normal economic times are typically slightly larger for UCQR-TVS compared to the benchmark models, which can be related to the less aggressive fit with respect to the location of the distribution. During the Great Recession, or the Covid-19 pandemic for the case of the US, however, the predictive distributions are narrower than those of the mean-based models or UCQR-TIS.

Turning to the higher-order moments (excess kurtosis and skewness), differences between model specifications are more striking. Lower variances during recessionary episodes may in part be explained by noting that particularly UCQR-TVS signals periods of substantial excess kurtosis, pointing to heavier than normal tails of the predictive distribution. Moreover, the resulting distributions are skewed (positively and negatively) during different parts of the samples across economies. Interestingly, UCQR-TIS usually exhibits excess kurtosis values around zero and less pronouncedly skewed predictive distributions, indicating that these features arise from allowing for conditional heteroskedasticity within quantiles. Given the comparatively worse performance of UCQR-TVS in the case of the UK, the forecast results suggest that such features are irrelevant for improving density forecast accuracy there, while they yield gains for the US and the EA. It is worth discussing several periods in more detail. Strikingly, excess kurtosis in the predictive distribution for the US occurs after the high-inflation period of the late 1970s.
The skewness parameter in conjunction with these heavier than normal tails in the mid 1980s suggests substantial upward inflation risk during the Volcker chairmanship. Later in the sample, during the global financial crisis, non-zero excess kurtosis is again measured, but with a negatively skewed distribution, pointing to disinflationary pressures and downward risk. This notion may be linked to the sign-switch in the relationship between inflation and its volatility identified in Chan (2017). By contrast, the skewness parameter for UCQR-TIS in the UK is positive for most of the sample, pointing mainly towards upward risk early in the sample which has declined since the early 1990s. For the EA, some periods of skewed distributions are featured, particularly during the European debt crisis, but they are not as pronounced as those for the other economies. Summarizing, this analysis indicates that forecast gains for the UCQR models can in part be explained by allowing for more flexible predictive distributions featuring both heavier than normal tails and skewness. While such features improve longer-horizon and tail forecasts in the US and the EA, this is not necessarily the case for the UK.

To complement this discussion of tail risks in inflation, and to quantify both upside and downside risk, I compute the probabilities of three different inflation scenarios: (a) Deflation, $\Pr(\pi_{t+h} < 0)$; (b) Target, $\Pr(1 \le \pi_{t+h} \le 3)$; and (c) Excessive, $\Pr(\pi_{t+h} > 4)$. An interesting aspect of this exercise is that while increased variances for UC-SV and UC-SVM usually widen the predictive distribution symmetrically around the mean, this is not necessarily the case for UCQR, which might affect the evolution of scenario probabilities. Moreover, as suggested above, there are several differences between estimates of the predictive mean, variance, excess kurtosis and skewness. All of these may affect the probability of inflation lying within the specified bounds. I again use 1,000 replicates of samples from the predictive distributions and compute the numerical 5th and 95th percentiles. The results are shown in Figure 4.
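Given draws from a predictive distribution (for instance via resampling from the smoothed quantile fit above), the scenario probabilities are simple empirical frequencies, as in this sketch:

```python
import numpy as np

def scenario_probabilities(draws):
    """Empirical probabilities of the three inflation scenarios from predictive draws."""
    draws = np.asarray(draws)
    return {"deflation": float(np.mean(draws < 0.0)),
            "target":    float(np.mean((draws >= 1.0) & (draws <= 3.0))),
            "excessive": float(np.mean(draws > 4.0))}
```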
The previous finding of UCQR-TVS producing smoother moments of predictive distributions carries over to this exercise. Overall, all models tend to agree on the respective probabilities of the considered scenarios, with some differences in the case of the EA. For the US, deflationary risks were virtually non-existent between 1970 and 1985. They appear to be a threat particularly in more recent years, with similarities to the case of the EA. Conversely, probabilities close to one of excessive inflation are visible during the high-inflation periods of the 1980s for the US and the UK, but show overall declines with minor peaks for the rest of the sample, while being only minor in the EA. This is consistent with the overall trend towards more stable prices in the last three decades. Notable upward risks occur prior to the early 1990s recession in the US and the ERM crisis of 1992 in the UK; moreover, I detect inflationary risks just before the Great Recession for both economies, and also for the EA. Turning to the Target scenario, which reflects the ability of central banks to keep inflation anchored at the respective target, I find several interesting patterns. Both in the US and the UK, there is a trend towards increasing probabilities of keeping inflation within the interval between one and three percent since the 1980s and the 1990s, respectively.

A theoretical argument for this break is provided by noting that the Volcker disinflation in the case of the Federal Reserve, and the adoption of inflation targeting by the Bank of England in 1992, constitute policy regime changes in the approach to central banking. For a theoretical model discussing how policy regime changes break the link between inflation and inflation volatility and, with it, stabilize price dynamics, see Cukierman and Meltzer (1986). Further increases in the probability of keeping inflation on target in recent years may be linked to more transparent central bank communication and expectation management strategies (see, e.g., Blinder et al., 2008). Compared to the Federal Reserve and the Bank of England, the European Central Bank exhibits stable and higher probabilities of fulfilling its target as measured by UCQR (albeit over a much shorter sampling period). It is worth mentioning that a slight downward trend in target probabilities is observable for the EA, particularly in the aftermath of economic crises, coinciding with upward movements in deflationary pressures.

References

Have standard VARs remained stable since the crisis?
Forecasting macroeconomic risks
Vulnerable growth
Central bank communication and monetary policy: A survey of theory and evidence
Quantile self-exciting threshold autoregressive time series models
Macroeconomic and Financial Risks: A Tale of Volatility
Measuring sovereign contagion in Europe
Capturing Macroeconomic Tail Risks with Bayesian Vector Autoregressions
On Gibbs sampling for state space models
The stochastic volatility in mean model with time-varying parameters: An application to inflation modeling
A new model of inflation, trend inflation, and long-run inflation expectations
Efficient simulation and integrated likelihood estimation in state space models
A new model of trend inflation
Semi-parametric quantile estimation for double threshold autoregressive models with heteroskedasticity
Tail Forecasting with Multivariate Bayesian Additive Regression Trees
Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility
Investigating Growth at Risk Using a Multi-country Non-parametric Quantile Factor Model
A theory of ambiguity, credibility, and inflation under discretion and asymmetric information
Macroeconomic forecasting and structural change
Forecasting tail risks
Time-varying quantiles
Modeling and forecasting macroeconomic downside risk
Data augmentation and dynamic linear models
Constructing density forecasts from quantile regressions
Asymmetry in unemployment rate forecast errors
Bayesian time-varying quantile forecasting for value-at-risk in financial markets
Quantile-based inflation risk models
Evaluation and combination of conditional quantile forecasts
Systemic risk and the macroeconomy: An empirical evaluation
Strictly proper scoring rules, prediction, and estimation
Comparing density forecasts using threshold- and quantile-weighted scoring rules
Dynamic quantile linear models: A Bayesian approach
A Bayesian quantile time series model for asset returns
Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models
Inducing sparsity and shrinkage in time-varying parameter models
Dynamic shrinkage in time-varying parameter stochastic volatility in mean models
A multi-country dynamic factor model with stochastic volatility for euro area business cycle analysis
Bayesian Analysis of Stochastic Volatility Models
Gibbs sampling methods for Bayesian quantile regression
Sparse signal shrinkage and outlier detection in high-dimensional quantile regression with variational Bayes
Markov switching quantile autoregression, Federal Reserve System, Finance and Economics Discussion Series
A simple sampler for the horseshoe estimator
Forecasting the distribution of economic variables in a data-rich environment
Are macroeconomic variables useful for forecasting the distribution of US inflation?
The changing transmission of uncertainty shocks in the US
Estimating structural changes in regression quantiles
The time-varying effect of monetary policy on asset prices
When is Growth at risk?
Time varying structural vector autoregressions and monetary policy
Regression adjustment for noncrossing Bayesian quantile regression
Comment on Sargent and Cogley's 'Evolving US Postwar Inflation Dynamics'
Why has US inflation become harder to forecast?
A Bayesian nonparametric approach to inference for quantile regression
Quantile regression in partially linear varying coefficient models
Gaussian processes for machine learning
The changing dynamics of US inflation persistence: A quantile regression approach
Bayesian Multiple Quantile Regression for Linear Models Using a Score Likelihood
Nonparametric inference for time-varying coefficient quantile regression
Boosting high dimensional predictive regressions with time varying parameters
Bayesian quantile regression

Appendix A. Posterior distributions and sampling algorithm

The likelihood based on (2) allows for constructing an efficient Gibbs sampling algorithm, as discussed in Kozumi and Kobayashi (2011). In particular, the conditional posterior distribution (with $\bullet$ denoting conditioning on all other parameters of the model and the data) for the auxiliary variable $v_{pt}$ is given by:

$$v_{pt}|\bullet \sim \mathcal{GIG}\left(\frac{1}{2},\; \frac{(y_t - x_t'\beta_{pt})^2}{\tau_p^2\sigma_{pt}},\; \frac{\theta_p^2}{\tau_p^2\sigma_{pt}} + \frac{2}{\sigma_{pt}}\right), \quad (A.1)$$

where $\mathcal{GIG}(\lambda, \chi, \psi)$ is the generalized inverse Gaussian distribution with density proportional to $x^{\lambda-1}\exp\{-(\chi/x + \psi x)/2\}$. The time-varying scale parameter $\sigma_{pt}$ can be sampled using a random walk Metropolis-Hastings algorithm similar to Jacquier et al. (2002) for the stochastic volatility model. It is worth mentioning that the approximation of the $\mathcal{AL}_p$ distribution moves the scale parameter to the mean of the model, which implies that this algorithm shares commonalities with samplers for stochastic volatility in mean models.

For notational simplicity, define $h_{pt} = \log(\sigma_{pt})$. The prior on the initial state is $h_{p0} \sim \mathcal{N}(m_0, \varsigma_0^2)$, with $m_0 = 0$ and $\varsigma_0^2 = 1$ in the empirical application. The conditional posterior of the initial state under this prior is given by $h_{p0}|\bullet \sim \mathcal{N}(\bar{m}_0, \bar{S}_0)$, with moments $\bar{m}_0 = \bar{S}_0(m_0/\varsigma_0^2 + h_{p1}/\varsigma_p^2)$ and $\bar{S}_0 = (\varsigma_0^2\varsigma_p^2)/(\varsigma_0^2 + \varsigma_p^2)$. The state equation (5) serves as the conditional prior for $h_{pt}$ at each point in time. Let $r$ refer to the currently accepted draw of the respective quantity and define $z_{pt} = v_{pt}/\sigma_{pt}^{(r)}$; see Kozumi and Kobayashi (2011) for details. Combining the conditional priors with the likelihood defined in (2), a simple accept/reject sampling algorithm with acceptance probability $\zeta_{pt}$ on a $t$-by-$t$ basis can be derived using a random walk proposal, $h_{pt}^{(*)} \sim \mathcal{N}(h_{pt}^{(r)}, c)$, with $c$ being a tuning parameter. The proposed value $h_{pt}^{(*)}$ is accepted with probability $\zeta_{pt}$; otherwise, the previous draw $h_{pt}^{(r)}$ is retained. After obtaining the full history of $\{h_{pt}\}_{t=1}^T$, the state innovation variance $\varsigma_p^2$ can be sampled from its inverse Gamma conditional posterior using standard moments for Bayesian linear regression models. The conditional posterior for a time-invariant scale parameter $\sigma_p$ is:

$$\sigma_p|\bullet \sim \mathcal{G}^{-1}\left(\frac{a}{2} + \frac{3T}{2},\; \frac{b}{2} + \sum_{t=1}^{T}\left[\frac{(y_t - x_t'\beta_{pt} - \theta_p v_{pt})^2}{2\tau_p^2 v_{pt}} + v_{pt}\right]\right). \quad (A.2)$$

Given the state innovation variances in $\Omega_{pt}$, conventional Kalman-filter based methods such as forward-filtering backward-sampling (FFBS, see Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) or faster alternatives (see, e.g., Chan and Jeliazkov, 2009; Hauzenberger et al., 2021) can be used to draw the full history of the time-varying, quantile-specific regression coefficients $\{\beta_{pt}\}_{t=1}^T$. Time-variation in the state innovation variances of the regression coefficients, collected in $\Omega_{pt}$, is governed by (6). Given $\beta_{pt}$, a mixture representation of the Z-distribution using Pólya-Gamma (denoted by $\mathcal{PG}$) random variables can be employed: $\nu_{pk,t}|\xi_{pk,t} \sim \mathcal{N}(\xi_{pk,t}^{-1}(c-d)/2, \xi_{pk,t}^{-1})$, with $\xi_{pk,t} \sim \mathcal{PG}(c+d, 0)$, for $k = 1, \ldots, K$. The approximation renders (6) conditionally Gaussian, and an appropriate algorithm can again be used to sample the full history of the state innovation variances. This procedure is similar to popular approaches in stochastic volatility models (see, e.g., Kim et al., 1998) and is described in detail in Kowal et al. (2019). For the iG prior, posterior moments of the state innovation variances have a standard textbook form and are not reproduced here; corresponding moments for the shs prior can be found in Makalic and Schmidt (2015) and Huber and Pfarrhofer (2021).

After initializing the sampler, the Markov chain Monte Carlo (MCMC) algorithm iterates through the following steps:

(1) Draw the full history of the time-varying, quantile-specific regression coefficients $\{\beta_{pt}\}_{t=1}^T$ conditional on all other parameters using FFBS, based on the observation and state equations given by (3) and (4).
(2) Using the mixture representation of the Z-distribution and conditional on $\{\beta_{pt}\}_{t=1}^T$, draw the full history of the state innovation variances $\{\omega_{pk,t}\}_{t=1}^T$ for $k = 1, \ldots, K$.
(3) Draw the auxiliary variables $\{v_{pt}\}_{t=1}^T$ from their generalized inverse Gaussian conditional posteriors in (A.1).
(4) Sample the scale parameters. In the time-varying case, draw $\{h_{pt}\}_{t=1}^T$ via the accept/reject step based on the likelihood implied by (2), while the law of motion given by (5) defines the conditional prior at each point in time. In the time-invariant case, $\sigma_p$ is sampled from its inverse Gamma conditional posterior with moments provided in (A.2).
(5) Forecasts for the dynamic shrinkage process and the TVPs can be obtained via simulation-based methods. For multi-step ahead forecasts, one may specify the vector $x_t$ accordingly and produce direct forecasts.

The produced MCMC output can be used in the context of the GPR discussed in Section 2.2 to post-process raw estimates and achieve noncrossing quantiles.

A.1.1. This section provides several additional in-sample empirical results. I produce the same charts as in Fig. 2 using an inverse Gamma prior (iG) on the state innovation variances rather than the dynamic horseshoe prior.
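As an illustration of step (3) of the algorithm in Appendix A, the following Python sketch draws $v_{pt}$ from the GIG posterior in (A.1). The mapping to SciPy's geninvgauss parameterization and the algebraic simplification of the second GIG parameter are my own derivation, consistent with Kozumi and Kobayashi (2011), and not code from the paper.

```python
import numpy as np
from scipy.stats import geninvgauss

def draw_v(y, fit, p, sigma, rng=None):
    """Draw v_pt | . ~ GIG(1/2, chi, psi) as in (A.1), with fit = x_t' beta_pt.

    GIG(lam, chi, psi) has density proportional to x^(lam-1) exp(-(chi/x + psi*x)/2),
    which maps to scipy as geninvgauss(lam, sqrt(chi*psi), scale=sqrt(chi/psi))."""
    tau2 = 2.0 / (p * (1.0 - p))
    chi = (y - fit) ** 2 / (tau2 * sigma)        # from the squared term in the likelihood
    psi = 1.0 / (2.0 * p * (1.0 - p) * sigma)    # theta_p^2/(tau_p^2 sigma) + 2/sigma, simplified
    return geninvgauss.rvs(0.5, np.sqrt(chi * psi), scale=np.sqrt(chi / psi), random_state=rng)
```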