Proper scoring rules for evaluating asymmetry in density forecasting

Matteo Iacopini, Francesco Ravazzolo, Luca Rossini

June 19, 2020

Abstract

This paper proposes a novel asymmetric continuous probabilistic score (ACPS) for evaluating and comparing density forecasts. It extends the proposed score and defines a weighted version, which emphasizes regions of interest, such as the tails or the center of a variable's range. The ACPS is of general use in any situation where the decision maker has asymmetric preferences in the evaluation of the forecasts. In an artificial experiment, the implications of varying the level of asymmetry are illustrated. Then, the proposed score is applied to assess and compare density forecasts of relevant macroeconomic variables (unemployment rate) and of commodity prices (oil and electricity prices), with a particular focus on the recent COVID crisis period.

1 Introduction

Macroeconomic forecasting has always been of pivotal importance for central bankers, policymakers and researchers. Nowadays, the vast majority of the research in macroeconomics and finance focuses on the development and implementation of forecasting techniques that minimize the expected squared forecast error (Gneiting (2011)). This approach is grounded on the implicit assumption of using a symmetric loss function in evaluating the accuracy of a forecast. Despite being common practice, the use of symmetric loss functions in forecasting is unrealistic, especially in policy institutions, where the policymakers could have a specific aversion to positive or negative deviations of a forecast from the target.

Consider a policymaker who is interested in forecasting unemployment. Suppose that, if the predicted unemployment rate exceeds a given threshold, she will be forced to adopt a new expansionary economic policy. It is highly likely that the policymaker is more averse to forecasts that overestimate the unemployment rate, while she may be more relaxed with respect to forecasts that underestimate it. This calls for the design of a more general class of loss functions and scoring rules that account for asymmetry, in order to guide the process of making and assessing forecasts.

To the best of our knowledge, a measure that properly incorporates asymmetry in density forecast evaluation does not exist in the literature. The main goal of this paper is to propose novel and practical forecast evaluation tools that can fill this gap and answer the increasing demand from policymakers and central bankers. We achieve this by introducing an innovative asymmetric scoring rule that is able to measure and evaluate heterogeneous aversion to different deviations of a density forecast from the target. We derive some properties of the new scoring rule and, in particular, demonstrate that it is a proper scoring rule. Moreover, we provide threshold- and quantile-weighted versions that allow the user to emphasize the performance of the forecast in regions of interest to the policymaker.

Within the literature on point forecasting, Christoffersen and Diebold (1996, 1997) proposed some asymmetric loss functions. In the former paper, they studied the optimal prediction problem under general loss structures and characterized the optimal predictor under an asymmetric loss function, focusing on the linex and the linlin asymmetric functions.
In the latter paper, the authors provided an illustration of an asymmetric loss in the context of GARCH processes. More recently, scholars have begun to empirically investigate the degree of loss function asymmetry of central banks and other international institutions. Among others, Elliott et al. (2005) and Patton and Timmermann (2007) proposed formal methods to infer the degree of asymmetry of the loss function and to test the rationality of forecasts. Within this stream of literature, Artis and Marcellino (2001) found that IMF and OECD forecasts of the deficit of G7 countries are biased towards over- or under-prediction relative to mean square error (MSE) forecasts. Regarding European institutions' forecasts, Christodoulakis and Mamatzakis (2008, 2009) found evidence of asymmetric loss. In another study, Dovern and Jannsen (2017) documented that the GDP growth forecasts made by professional forecasters tend to exhibit systematic errors and tend to overestimate GDP growth. Moreover, Boero et al. (2008) interpreted the tendency to over-predict GDP growth as a signal that policymakers exhibit greater fear of under-prediction than over-prediction, thus suggesting that their judgements are based on an asymmetric loss. More recently, Tsuchiya (2016) examined the asymmetry of the loss functions of the Japanese government, the IMF and private forecasters for Japanese growth and inflation forecasts. In the framework of forecast combination, Elliott and Timmermann (2004) showed that the optimal combination weights differ significantly under asymmetric loss functions and skewed error distributions as compared to those obtained with mean squared error loss. Finally, Demetrescu and Hoke (2019) studied factor-augmented forecasting under an asymmetric point loss function.

An alternative and more universal approach to forecasting is the provision of a predictive density, known as probabilistic or density forecasting (see Elliott and Timmermann (2016a, ch. 8)). Two key aspects of density forecasts are the statistical compatibility between the forecasts and the realized observations (calibration) and the concentration of the predictive distributions (sharpness). The aim of probabilistic forecasting is to maximize sharpness, subject to calibration (Gneiting and Ranjan (2013)). Density forecasting is more complex than point forecasting, since the estimation problem requires constructing the whole predictive distribution, rather than a specific functional thereof (e.g., mean or quantile). Several reasons have been suggested for preferring density over point forecasts (e.g., Elliott and Timmermann (2016b)). First, point forecasting is often associated with the mean of a distribution and is optimal for highly restricted loss functions, such as the quadratic loss function, but insufficient for any prospective user having a different loss. Moreover, the value of a point forecast can be increased by supplementing it with some measure of uncertainty, and complete probability distributions over outcomes provide information helpful for making economic decisions; see, for example, Anscombe (1968) and Zarnowitz (1969) for early works and the discussions in Granger and Pesaran (2000), Timmermann (2006) and Gneiting (2011). Finally, in recursive forecasting with nonlinear models the full predictive density matters, since the nonlinear effects typically depend not only on the conditional mean, but also on where future values occur in the set of possible outcomes.
A natural way to evaluate and compare competing density forecasts is the use of proper scoring rules, which assess calibration and sharpness simultaneously and encourage honest and careful forecasting. Despite the wide literature on the class of proper scoring rules for probabilistic forecasts of categorical and binary variables (e.g., see Savage (1971), Schervish (1989)), the advances for continuous variables are more limited. Motivated by these facts, we aim at designing a novel asymmetric proper scoring rule to be used for evaluating density forecasts of continuous variables, which is the typical case in macroeconomics and finance exercises (e.g., predicting variables such as unemployment, inflation, log-returns, GDP growth, and realized volatility).

Gneiting and Raftery (2007) proposed the continuous ranked probability score (CRPS) as a proper scoring rule for probabilistic forecasts of continuous variables, and more recently, Gneiting and Ranjan (2011) extended the CRPS by introducing a threshold- and a quantile-weighted version (tCRPS and qCRPS, respectively). These scoring rules give more emphasis to the performance of the density forecast in a selected region of the domain, B, by assigning more weight to the deviations from the observations made in B. The major drawback of both the CRPS and its weighted versions is the symmetry of the underlying reward scheme, meaning that they assign equal reward to positive and negative deviations of a probabilistic forecast from the target. This comes from the fact that the CRPS is built on the Brier score and inherits some of its properties, such as properness and symmetry. Similarly, since both weighted versions of the CRPS essentially consist in re-weighting the CRPS over the domain of the variable of interest, they inherit its symmetry. Winkler (1994) made a first effort towards asymmetric scoring rules and proposed a general method for constructing asymmetric proper scoring rules starting from symmetric ones. However, this approach is limited to forecasting binary variables, and continuous variables were not investigated. We address this issue and contribute to the literature on proper scoring rules for evaluating density forecasts by proposing a novel asymmetric proper scoring rule which assigns different penalties to positive and negative deviations from the true density.

The main contribution of this paper is twofold. First, we define a new proper scoring rule which assigns an asymmetric penalty to deviations from the target density. Moreover, we provide threshold- and quantile-weighted versions of it. We then compare the performance of these scores with the CRPS and its weighted versions. Second, we use the proposed score to evaluate density forecasts in three relevant applications in macroeconomics (unemployment) and commodity prices (oil and electricity prices) with data updated to the COVID crisis period. The variables have experienced large volatilities, with sizeable spikes and negative energy prices. The key result of this paper is the provision of a tool able to account for the decision maker's preferences in the evaluation of density forecasts, both in terms of domain- and error-weighting schemes. Domain-weighting gives heterogeneous emphasis to the performance on different regions, while error-weighting asymmetrically rewards negative and positive deviations from the target value.
The proposed weighted asymmetric scoring rule combines the two schemes and allows one to evaluate the performance of the forecasting density from both perspectives.

The rest of the paper is organized as follows. Section 2 presents a novel asymmetric scoring rule for density forecasts. Then Section 3 discusses its main properties and illustrates a comparison with the (weighted) CRPS in simulated experiments. Finally, Section 4 provides different applications to forecasting US macroeconomic variables (unemployment rate) and commodity prices (oil and electricity prices). The article closes with a discussion in Section 5. The MATLAB code for implementing the proposed scoring rules is available at: https://github.com/matteoiacopini/acps

2 Asymmetric Proper Scoring Rules for Density Forecasting

The evaluation and comparison of probabilistic forecasts typically relies on proper scoring rules. Informally, a scoring rule is a measure that summarises the goodness of a probabilistic forecast by combining the predictive distribution and the value that actually materializes. One can think of it as a measure of distance between the probabilistic forecast and the actual value. We consider positively oriented scoring rules: if a probabilistic forecast P1 obtains a higher score than P2, this means that P1 yields a more accurate forecast than P2. Therefore, the score can be interpreted as a reward to be maximized.

In more formal terms, following the notation of Gneiting and Raftery (2007), consider the problem of making probabilistic forecasts on a general sample space Ω. Let A be a σ-algebra of subsets of Ω, and let P be a convex class of probability measures on (Ω, A). A probabilistic forecast is any probability measure P ∈ P. A function on Ω taking values in the extended real line [−∞, +∞] is said to be P-quasi-integrable if it is measurable with respect to A and quasi-integrable with respect to all P ∈ P (see Bauer (2011)). A scoring rule is any extended real-valued function S : P × Ω → [−∞, +∞] such that S(P, ·) is P-quasi-integrable for all P ∈ P. In practice, if P is the forecast density and the event ω materializes, then the forecaster's reward is S(P, ω).

In order to be effectively used in the evaluation of scientific forecasts, scoring rules have to be proper, meaning that they have to reward accurate forecasts. Suppose the true density of the observations is Q and denote the expected value of S(P, ω) under Q(ω) by

S(P, Q) = ∫_Ω S(P, ω) Q(dω).

Then the scoring rule S is strictly proper if S(Q, Q) ≥ S(P, Q), where the equality holds if and only if P = Q, thus implying that the forecaster obtains the highest reward by predicting P = Q. If instead S(Q, Q) ≥ S(P, Q) holds for all P and Q, then the scoring rule is said to be proper.

The vast majority of the proper scoring rules proposed in the literature are symmetric, that is, they reward positive and negative deviations from the target in the same way. For example, suppose a forecast P1 overestimates the true density and a forecast P2 underestimates it by the same amount. If these forecasts are evaluated under a symmetric scoring rule, then they receive the same score. A symmetric loss is unsatisfactory for many real-world situations where the decision maker has a preference for, or an aversion towards, a particular kind of error. We aim at filling this gap by defining a new asymmetric proper scoring rule for continuous variables, which is suited for the evaluation and comparison of density forecasts and penalises one side of the deviation from the target more heavily than the other.
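As a worked illustration of the properness requirement (not taken from the paper, but using the binary quadratic score that also drives the proof of Theorem 1 below), the following derivation checks that honest reporting uniquely maximizes the expected quadratic score for a binary event with true success probability q:

```latex
% Expected positively oriented quadratic score when reporting p,
% with S(p | 1) = 1 - (1 - p)^2 and S(p | 0) = 1 - p^2:
\[
  \mathbb{E}_q\!\left[S(p)\right]
  = q\bigl(1-(1-p)^2\bigr) + (1-q)\bigl(1-p^2\bigr)
  = 1 - q(1-p)^2 - (1-q)p^2 ,
\]
\[
  \frac{\partial}{\partial p}\,\mathbb{E}_q\!\left[S(p)\right]
  = 2q(1-p) - 2(1-q)p
  = 2(q-p) ,
\]
% which vanishes only at p = q, where the second derivative is -2 < 0,
% so the expected score is uniquely maximized by the honest report p = q.
```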
Definition 1 (Asymmetric Continuous Probability Score). Let c ∈ (0, 1) represent the level of asymmetry, such that c = 0.5 implies a symmetric loss, while c < 0.5 penalises the left tail more, and c > 0.5 the right tail. Let P be the probabilistic forecast and y the realized (ex-post) value. We define the asymmetric continuous probability score (ACPS) as

ACPS(P, y; c) = ∫_{−∞}^{y} S^A_c(P(u), 0) du + ∫_{y}^{+∞} S^A_c(P(u), 1) du,   (1)

where S^A_c is the asymmetric binary scoring rule constructed in the proof of Theorem 1 below.

The following result shows the properness of our new score for every level of asymmetry.

Theorem 1 (Properness). The asymmetric scoring rule ACPS defined in eq. (1) is strictly proper for any c ∈ (0, 1).

Proof. The strict properness derives from the fact that the ACPS can be obtained from the quadratic score for binary outcomes, which is strictly proper, via two transformations that preserve properness; see Winkler (1994) and Matheson and Winkler (1976). Specifically, let p ∈ (0, 1) be a probabilistic forecast of success in a binary experiment and let S be the quadratic rule, that is

S(p, i) = 1 − (i − p)²,

where i ∈ {0, 1} denotes the binary outcome (i = 1 for a success). Notice that S is a strictly proper and symmetric scoring rule. Following Winkler (1994), one can obtain a strictly proper asymmetric scoring rule for binary outcomes via the transformation

S^A_c(p, i) = (S(p, i) − S(c, i)) / (1 − c)²   if p ≥ c,
S^A_c(p, i) = (S(p, i) − S(c, i)) / c²         if p < c,

where c ∈ (0, 1) denotes the level of asymmetry. Then, following Matheson and Winkler (1976), to obtain an asymmetric scoring rule for continuous variables, we assume that the subject assigns a probability distribution function P(x) to a continuous variable of interest. Fix an arbitrary real number u to divide the real line into two intervals, I1 = (−∞, u] and I2 = (u, ∞), and define as a success the event that the realization falls in I1. Since P(u) ∈ (0, 1) for any u ∈ R, we can evaluate the binary scoring rule S^A_c at p = P(u), thus obtaining a different value S^A_c(P(u)) for each u. Finally, the dependence of the scoring rule on the arbitrary value of u is removed by integrating over all u, which yields eq. (1). Notice that one can obtain a different (strictly) proper asymmetric scoring rule as long as the baseline score is (strictly) proper.

The integrals in eq. (1) can be numerically approximated by truncating the domain to [u_min, y] and [y, u_max], such that

ACPS(P, y; c) ≈ Σ_{j=1}^{J} w_j S^A_c(P(u_j), 𝟙{y ≤ u_j}),   (2)

where u_min = u_1 < · · · < u_J = u_max is a grid of points covering the truncated domain and the w_j are the corresponding quadrature weights.

To gain insight into the shape of the ACPS for varying levels of asymmetry, c, we consider two examples: one with several probabilistic forecasts and the other with a fixed forecast.

Example 1. Let us consider several Gaussian probabilistic forecasts P. In Fig. 1 we show the value of the score on a range of asymmetry values c ∈ {0.05, 0.275, 0.50, 0.725, 0.95}, for a given observation y whose true density is a standard Gaussian. When the density forecast is Gaussian with the same mean as the target, the score is an inverse U-shaped function of the asymmetry level c. This is essentially due to the symmetry of the Gaussian distribution around its mean, since the probability mass in excess on the right tail is exactly equal to the mass lacking on the left one. However, notice that a higher score is assigned to N(0, 1), as compared to N(0, 16). Instead, the density forecasts N(−3, 1) and N(3, 1) receive a high penalty for high and small levels of c, respectively. This shows that values of c close to 1 heavily penalise forecasting densities that put more mass on the left part of the support as compared to the target, and conversely for values of c close to 0. [Insert Figure 1 here]
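Before the second example, the following minimal sketch shows how the truncated quadrature in eq. (2) can be computed. The helper names (quadratic_score, asym_binary_score, acps), the truncation bounds and the plain Riemann sum are illustrative choices of ours, not the authors' implementation (their reference MATLAB code is available at the repository linked above).

```python
# Sketch of the ACPS quadrature in eq. (2): binary quadratic score,
# Winkler-type asymmetry transform, and summation over thresholds u.
import numpy as np
from scipy import stats

def quadratic_score(p, i):
    """Positively oriented quadratic score S(p, i) = 1 - (i - p)^2, i in {0, 1}."""
    return 1.0 - (i - p) ** 2

def asym_binary_score(p, i, c):
    """Winkler-type asymmetric transform S_c^A of the quadratic score."""
    norm = np.where(p >= c, (1.0 - c) ** 2, c ** 2)   # normalization T(c)
    return (quadratic_score(p, i) - quadratic_score(c, i)) / norm

def acps(cdf, y, c, u_min=-15.0, u_max=15.0, n_grid=4001):
    """Riemann-sum approximation of eq. (2) on a truncated grid of thresholds."""
    u = np.linspace(u_min, u_max, n_grid)
    i = (u >= y).astype(float)                        # success indicator 1{y <= u}
    integrand = asym_binary_score(cdf(u), i, c)
    return integrand.sum() * (u[1] - u[0])

# Example: score a standard Gaussian forecast under different asymmetry levels.
for c in (0.05, 0.5, 0.95):
    print(c, acps(stats.norm(0, 1).cdf, 0.3, c))
```

With this sketch, the inverse U-shaped pattern of Example 1 for a well-centred symmetric forecast can be reproduced by scoring stats.norm(0, 1).cdf over a grid of c values.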
Example 2. Let us consider an alternative case where we keep the probabilistic forecast fixed at N(2, 1) and inspect the value of the ACPS for alternative target densities. As expected (see Fig. 2), when the true density assigns more mass on the left part of the support as compared to the N(2, 1), the forecast receives a very low score, especially for c close to 0. Conversely, when the underlying true density is N(3, 1), the forecast receives a higher reward for c = 0.05, since its CDF is basically a left-shifted version of the target. [Insert Figure 2 here]

In addition to asymmetric preferences towards under- or overestimation, a decision maker is usually concerned with a precise forecast in a specific range of all possible values. Therefore, it is important to have a tool that allows one to assign heterogeneous weights to various regions of the set of possible values of the variable. This calls for a scoring rule able to account for both error-weighting, i.e. asymmetric preferences, and domain-weighting of density forecasts. Gneiting and Ranjan (2011) modified the CRPS by re-weighting the loss according to a user-specified weight function, which allows the decision maker to select the regions of greater concern. By exploiting the representation of the CRPS in terms of quantile functions, they define the threshold-weighted (tCRPS) and quantile-weighted (qCRPS) score functions as follows:

tCRPS(P, y) = ∫_{−∞}^{+∞} (P(z) − 𝟙{y ≤ z})² w(z) dz,   (3)
qCRPS(P, y) = ∫_0^1 QS_α(P⁻¹(α), y) v(α) dα,   with   QS_α(q, y) = 2(𝟙{y < q} − α)(q − y),   (4)

where w(z) ≥ 0 and v(α) ≥ 0 are the weight functions, QS_α is the quantile score and α ∈ (0, 1) is the quantile level. Table 1 reports some examples of weighting functions for the case of real-valued variables of interest; notice that the uniform weight, w(z) = 1 and v(α) = 1, leads back to the standard CRPS. See Lerch et al. (2017) for discussion and applications of these scoring rules. [Insert Table 1 here]

The definition of the ACPS in eq. (1) can be modified to address this issue and obtain a threshold-weighted and a quantile-weighted asymmetric scoring rule, as follows.

Definition 2 (Threshold-weighted ACPS). Let G(du) be a positive measure (notice that G(du) is not required to be a probability measure). We define the threshold-weighted asymmetric continuous probability score (tACPS) as

tACPS(P, y; c) = ∫_{−∞}^{y} S^A_c(P(u), 0) G(du) + ∫_{y}^{+∞} S^A_c(P(u), 1) G(du),   (5)

where c ∈ (0, 1) is the level of asymmetry, P is the probabilistic forecast and y the value that materializes.

Definition 3 (Quantile-weighted ACPS). Let p(u) denote the probability density function of P(u) and let P⁻¹(α) be the corresponding quantile function at α ∈ [0, 1]. Let V(dα) be a positive measure on the unit interval. We define the quantile-weighted asymmetric continuous probability score (qACPS) as

qACPS(P, y; c) = ∫_0^1 S^A_c(α, 𝟙{P(y) ≤ α}) (1 / p(P⁻¹(α))) V(dα).   (6)

As stated for the ACPS, we can provide evidence of the properness of the two novel scores defined in eq. (5) and eq. (6).

Theorem 2 (Properness of tACPS, qACPS). For any c ∈ (0, 1), it holds that: a) the threshold-weighted asymmetric continuous probability score tACPS in eq. (5) is strictly proper; b) the quantile-weighted asymmetric continuous probability score qACPS in eq. (6) is strictly proper.

Proof. The result follows from Theorem 1 and Matheson and Winkler (1976).

Both tACPS and qACPS can be computed by approximating eq. (5) and eq. (6) in a way analogous to eq. (2). The main advantage of the tACPS and qACPS consists in the ability to consider two levels of asymmetry: in terms of the loss at each point, and over different regions of the domain. This is fundamental to answer the need of the decision maker who is concerned with the performance of the forecast in a given interval of possible values (e.g., the right tail) and who has an aversion to particular deviations from the target (e.g., aversion to underestimation).
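To see how the domain weight enters, here is a minimal sketch of the threshold-weighted score of Definition 2, under the assumption G(du) = w(u) du for a nonnegative weight function w (here the right-tail weight Φ(u) from Table 1). It reuses asym_binary_score from the previous sketch, and the name tacps and quadrature settings are again illustrative.

```python
# Sketch of the threshold-weighted ACPS in eq. (5) with G(du) = w(u) du.
from scipy.stats import norm

def tacps(cdf, y, c, weight, u_min=-15.0, u_max=15.0, n_grid=4001):
    u = np.linspace(u_min, u_max, n_grid)
    i = (u >= y).astype(float)                         # 1{y <= u}
    integrand = asym_binary_score(cdf(u), i, c) * weight(u)
    return integrand.sum() * (u[1] - u[0])

# Emphasize the right tail while heavily penalising underestimation (c = 0.95).
print(tacps(norm(0, 1).cdf, 0.3, c=0.95, weight=norm.cdf))
```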
Tab. 2 provides a summary of some key differences between the CRPS and the ACPS, and the corresponding weighted versions. [Insert Table 2 here]

Remark 2 (Multivariate case). The proposed asymmetric scores can be easily generalized to multivariate settings. To this aim, denote with Q the class of the Borel probability measures on Rⁿ and let F ∈ Q be a probabilistic forecast identified via its cumulative distribution function, P. Let c ∈ (0, 1) represent the level of asymmetry and denote with y = (y_1, . . . , y_n)′ the multivariate value that materializes. The multivariate version of the asymmetric continuous probability score is defined as

ACPS(F, y; c) = ∫_{Rⁿ} S^A_c(P(u), 𝟙{y ≤ u}) du,   (7)

where 𝟙{y ≤ u} = 𝟙{y_1 ≤ u_1, . . . , y_n ≤ u_n} and du = du_1 · · · du_n. Moreover, one can define a multivariate threshold-weighted ACPS by substituting the product of Lebesgue measures in eq. (7) with a positive measure G(du) on Rⁿ.

This section investigates the performance of the proposed asymmetric scoring rule and compares it with the CRPS. In order to assess the performance of our measure, we consider different target densities: (i) Gaussian, (ii) Student-t, (iii) Gamma, (iv) Beta. Figures 3 and 4 report the rankings obtained with a Gaussian and a Student-t target, respectively. Both figures show that the ACPS rewards the forecast density which corresponds to the ground truth, for all levels of asymmetry. In addition, we find that the ranking of the competing probabilistic forecasts changes according to the value of c, due to the different penalty assigned to asymmetric deviations from the target. [Insert Fig. 4 here]

To investigate this aspect further, Fig. 5 presents the ranking of forecasts when none of the candidates corresponds to the true density, which is N(2, 4). The CRPS indicates N(3, 1) as the best forecast, as does the ACPS for values of c around 0.5. However, when the ACPS assigns more weight to the asymmetric loss, that is, for c close to the boundary of (0, 1), the ranking changes significantly. For c = 0.05, that is, when great importance is given to underestimation of the target, the N(0, 1) is preferred, while N(0, 16) is the best in the opposite case, when c = 0.95. [Insert Fig. 5 here]

Many economic and financial variables in levels are inherently positive (e.g., GDP, volatility) or take values on a bounded interval (e.g., interest rate, unemployment rate). To account for these cases, we investigate the performance of the ACPS in simulated experiments where the target density is either Gamma or Beta. [Insert Fig. 6 here] Fig. 6 presents the results for a Ga(2, 1) target density. By looking at the worst performing densities according to the ACPS, we find that Ga(1, 1) is assigned the highest penalty for values c ≤ 0.725, while Ga(1, 2) becomes the worst for c = 0.95. This reflects the fact that for c ≤ 0.725 the asymmetric score penalizes underestimation more, while for c = 0.95 it gives more weight to overestimation. Similar results are found in Fig. 7 with a positively skewed Beta target density, Be(1, 2). [Insert Fig. 7 here]

We explore the properties of the proposed asymmetric scoring rule further by considering a threshold-weighted version and comparing it with the threshold-weighted CRPS. The goal is to disentangle the different roles of the domain-weighting scheme, which reflects the interest of the decision maker in having good forecasts within a specific interval of values, and of the error-weighting scheme, which corresponds to the decision maker's loss in case of under- or overestimation.
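A compact way to reproduce ranking exercises of the kind shown in Figures 3 to 5 is sketched below. The random seed, the use of exact candidate CDFs instead of the paper's M = 500 simulated draws, and the averaging of scores over observations are our assumptions; acps is the helper from the earlier sketch.

```python
# Sketch of a ranking exercise: N = 100 observations from the N(2, 4)
# target (standard deviation 2), four Gaussian candidate forecasts,
# ranked by average ACPS for each asymmetry level c.
rng = np.random.default_rng(0)
obs = rng.normal(2.0, 2.0, size=100)
candidates = {
    "N(0,1)":  stats.norm(0, 1),
    "N(-3,1)": stats.norm(-3, 1),
    "N(3,1)":  stats.norm(3, 1),
    "N(0,16)": stats.norm(0, 4),
}
for c in (0.05, 0.275, 0.5, 0.725, 0.95):
    avg = {name: np.mean([acps(d.cdf, y, c) for y in obs])
           for name, d in candidates.items()}
    best_to_worst = sorted(avg, key=avg.get, reverse=True)  # higher = better
    print(f"c = {c}: {best_to_worst}")
```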
Consider a simulated experiment where N = 100 observations are drawn from a Normal distribution N(1, 4) and several forecasting densities are approximated using M = 500 draws. We consider the domain-weighting schemes in Tab. 1, using 5 alternative asymmetry levels c ∈ {0.05, 0.275, 0.50, 0.725, 0.95}. In Tab. 3 we find that the asymmetric penalty imposed by the ACPS plays a significant role for all domain-weighting schemes considered. For a uniform weight, the ACPS agrees with the CRPS for c = 0.5, i.e. the symmetric case, but rewards the density forecasts differently for alternative values of the asymmetry level c. When the interest is focused on the right tail of the distribution, the threshold-weighted CRPS and ACPS agree, but when the attention is on the left tail, the two scoring rules perform remarkably differently. The CRPS favours the standard Normal over the N(3, 1), while the ACPS rewards the latter for all c ≥ 0.275.

The key insight obtained from this simulated exercise concerns the importance of the domain- and error-weighting schemes. The former assigns a heterogeneous weight to the performance on different intervals, while the latter asymmetrically rewards negative and positive deviations from the true value. The threshold-weighted asymmetric scoring rule, tACPS, combines the two schemes and allows the user to evaluate the performance of the forecasting density from both perspectives. This is important to decision makers, who are usually interested in a specific range of all possible values, thus calling for heterogeneous domain-weighting, and who have asymmetric preferences towards under- or overestimation, which motivates an asymmetric score. [Insert Table 3 here]

In the empirical applications, we adopt a framework similar to Amisano and Giacomini (2007) and Gneiting and Ranjan (2011), and consider the task of comparing density forecasts in a time series context. We use a fixed-length rolling window to provide a density forecast for the h-step-ahead future observations. We focus on three different applications related to macroeconomics (e.g., employment growth rate) and to commodity prices (oil prices and electricity prices). We compare several univariate models, such as the autoregressive (AR) model, the Markov-switching (MS) AR model and the time-varying parameter (TVP) AR model. We use the AR(1) as the benchmark model; we then specify 12 lags for the employment growth rate (i.e., 1 year of monthly observations) and 20 lags for oil (i.e., 1 month of daily observations). Regarding the electricity prices, we include 7 lags (i.e., 1 week of daily observations) and, following common practice in the literature, we restrict the lags to t − 1, t − 2, and t − 7, which correspond to the previous day, two days before, and one week before the delivery time. For the MS-AR model we consider only 1 lag, while for the TVP-AR model we use 1 and 2 lags. For both the AR and TVP-AR, we consider three specifications of the variance: constant volatility and time-varying volatility in the form of stochastic volatility with Gaussian and Student-t errors. For the MS-AR, we impose an identification constraint on the error variance.

In the first application, we aim at forecasting the monthly US total nonfarm seasonally adjusted employment growth rate, downloaded from the FRED database. We consider the growth rate of monthly employment in the US from January 1980 to April 2020.
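Before turning to the results, the rolling-window comparison just described can be sketched as follows, assuming a Gaussian AR(1) benchmark estimated by OLS. The helper names and the window handling are placeholders rather than the paper's exact specifications (which also include MS-AR and TVP-AR models with stochastic volatility); np, stats and acps come from the earlier sketches.

```python
# Schematic rolling-window evaluation of one-step-ahead density forecasts.
def ar1_predictive_cdf(window):
    """One-step-ahead Gaussian predictive CDF from an OLS AR(1) fit."""
    x, y_next = window[:-1], window[1:]
    X = np.column_stack([np.ones_like(x), x])          # intercept and lag
    beta, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    sigma = (y_next - X @ beta).std(ddof=2)            # residual std. dev.
    return stats.norm(beta[0] + beta[1] * window[-1], sigma).cdf

def rolling_average_acps(series, window_len, c):
    """Average ACPS of one-step-ahead forecasts over all vintages."""
    scores = [acps(ar1_predictive_cdf(series[t - window_len:t]), series[t], c)
              for t in range(window_len, len(series))]
    return np.mean(scores)
```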
We see evidence of some spikes, in particular a strong fall in April 2020 due to the present COVID situation (see Figure 2 available in the supplementary material). We use a rolling-window approach of 20 years (thus 240 observations) and we forecast h = 1 and h = 12 months ahead (the latter corresponding to 1 year) using a recursive forecasting exercise.

For oil prices, we analyse daily West Texas Intermediate (WTI) data (no weekends) from 02 January 2012 to 07 May 2020, in order to include the recent turmoil in the analysis. Indeed, sudden large drops in demand and storage scarcity resulted in negative WTI oil prices at the end of April 2020. As in the first application, we use a rolling window, here of 4 years, and we forecast h = 1 and h = 5 days ahead by applying recursive techniques.

In the third application, we consider the problem of forecasting the day-ahead electricity prices in Germany, one of the largest and leading energy markets. In electricity markets, the phenomenon of negative prices (when allowed to occur, as in Germany, where there is no price floor) has become more frequent due to the increasing share of electricity generated from renewable energy sources (RES) and the current impossibility of storing it (see Figure 2 in the supplementary material). Indeed, worldwide energy policies have supported, and are still fostering, green generation to reduce carbon emissions and mitigate climate change. From a technical point of view, RES have induced prices to be null or even negative. We analyse daily data (with weekends) from 01 January 2014 to 08 May 2020. For the forecasting analysis, we consider a rolling window of 3 years and recursive techniques for predicting h = 1 and h = 7 days ahead.

Tab. 4 shows the ranking of the probabilistic forecasts over vintages and across models for all three datasets. Regarding the employment growth rate, the best model over vintages is the TVP-AR with 2 lags, but with different time-varying volatility specifications. In particular, for an asymmetry level c less than 0.5, the Gaussian stochastic-volatility model performs better at h = 1 step ahead, while the Student-t stochastic-volatility model is the best for c greater than or equal to 0.5. In Fig. 8, we report the best model in each vintage for the two horizons, where the black line refers to the CRPS, and the red and yellow lines to the ACPS with c = 0.05 and c = 0.95, respectively. The graph shows large instability in the best model, in particular when using the CRPS. The ACPS rules seem to prefer one of the alternative models for more consecutive vintages. For example, by looking at the relative frequency with which each model is selected as the best one, we find that for c = 0.05 the AR(12)-SV is the best model in 31% of the vintages for both h = 1 and h = 12, while for c = 0.95 the best model changes across horizons, moving from the AR(12)-SV for 1 month ahead to the TVP-AR(1) for 1 year ahead, although these are not the best specifications on average. Therefore, the uncertainty about the best model remains large with all scoring rules.

Moving to the oil prices, in the middle panel of Tab. 4 we find that, across vintages, for 1 day ahead the TVP-AR(2) with Gaussian stochastic volatility is almost always the best model, for all asymmetry levels c.
These results change completely when we consider forecasting 1 week ahead, where the best models for c < 0.5 are the non-linear models and the AR with 20 lags, while for c ≥ 0.5 the best model is the benchmark. Fig. 9 confirms the large uncertainty in the ordering of the best model; the ACPS is less variable in this selection than the CRPS. Despite showing the results for a single model, this figure presents some interesting insights. By looking at the scores between April 17 and April 21, we find that the forecast performs worst for c = 0.05 and best for c = 0.95, indicating that the density forecast assigns more mass on the right part of the support as compared to the density of the observations. This situation is similar to the yellow line in Fig. 1. Surprisingly, the ranking is reversed between April 21 and April 24, where the forecast receives a higher score under c = 0.05. This suggests that the density forecast is likely to be a left-shifted version of the observation density, similarly to the orange line in Fig. 1. These results highlight how accounting for asymmetry in forecast evaluation may lead to dramatically different implications. By looking at the period until April 21, a decision maker averse to overestimation of the oil price is likely to discard the AR(20) in favour of alternative models for making forecasts. Conversely, another agent facing the same decision problem, equipped with the same data and models, but averse to underestimation, is likely to settle on the AR(20). Moreover, these insights highlight an important value added of the ACPS as compared to symmetric scores. By looking at the variation of the ranking according to the ACPS over time, it is possible to infer the relative dynamics of the forecasting and observation densities. In the case previously mentioned, between April 17 and April 21 the forecast tends to overestimate (i.e., its CDF is to the right of the observations' CDF), while it tends to underestimate between April 21 and April 24 (i.e., its CDF is to the left of the observations' CDF). Under a symmetric score it is not possible to grasp these insights, since negative and positive deviations from the target are equally penalized.

The bottom panel of Tab. 4 reports the results for the electricity prices. As in the previous cases, TVP-AR models provide more accurate forecasts, but here there is more uncertainty about the lag and error specification. The high volatility, spikes and negative values of electricity prices drive different results depending on the user's level of asymmetry. At h = 1 and c = 0.05, the TVP-AR(2) is the best model; for higher values of c, the TVP-AR(2)-SV is the preferred one. At h = 7, the AR(7)-SV for c = 0.05, the TVP-AR(1)-tSV for c = 0.275 and 0.5, and the TVP-AR(2)-SV for c > 0.5 give the highest ACPS. Fig. 11 again indicates more stable performance of some models when accounting for asymmetry, relative to using the symmetric CRPS.

This paper has introduced a novel asymmetric proper score for probabilistic forecasts of continuous variables, the ACPS. Its main application is the evaluation and comparison of density forecasts. In addition, we have proposed threshold- and quantile-weighted versions of the asymmetric score, which, by reweighting the domain, allow for a further level of asymmetry in the evaluation of forecasts. The definition of the ACPS is sufficiently flexible to be used in a variety of univariate contexts, and carries over to the multivariate case.
The latter deserves further investigation and is an open field for future research. We provide a tool able to account for the decision maker's preferences in the evaluation of density forecasts, both in terms of domain- and error-weighting schemes. In an artificial data exercise, we have shown the good performance of our proposed asymmetric score for different continuous target distributions. In relevant macroeconomic and energy applications, we evaluate our score across different models and for different horizons, and we improve on the quality of the forecasts by providing an effective tool for density forecast evaluation. More generally, the proposed score, the ACPS, is of use in any situation where the decision maker has asymmetric preferences in the evaluation of forecasts, and it can thus be applied to a much wider range of applications.

Table 1: Examples of weight functions for the threshold-weighted and quantile-weighted CRPS, for variables supported on the real line. φ and Φ denote the probability density and cumulative distribution functions of the standard Normal distribution, respectively, with x ∈ R and α ∈ (0, 1).

Emphasis      Threshold weight function w(x)     Quantile weight function v(α)
uniform       1                                  1
center        φ(x)                               α(1 − α)
tails         1 − φ(x)/φ(0)                      (2α − 1)²
right tail    Φ(x)                               α²
left tail     1 − Φ(x)                           (1 − α)²

[Table 3 caption fragment: observations drawn from N(1, 4); forecasting densities are N(0, 1), N(−3, 1), N(3, 1), N(0, 16).]

References

Amisano, G. and Giacomini, R. (2007). Comparing density forecasts via weighted likelihood ratio tests.
Anscombe, F. J. (1967). Topics in the investigation of linear relations fitted by the method of least squares.
Artis, M. and Marcellino, M. (2001). Fiscal forecasting: The track record of the IMF, OECD and EC.
Bauer, H. (2011). Measure and Integration Theory.
Boero, G., Smith, J. and Wallis, K. F. (2008). Evaluating a three-dimensional panel of point forecasts: the Bank of England Survey of External Forecasters.
Christodoulakis, G. and Mamatzakis, E. (2009). An assessment of the EU growth forecasts under asymmetric preferences.
Christodoulakis, G. and Mamatzakis, E. (2008). Assessing the prudence of economic forecasts in the EU.
Christoffersen, P. F. and Diebold, F. X. (1996). Further results on forecasting and model selection under asymmetric loss.
Christoffersen, P. F. and Diebold, F. X. (1997). Optimal prediction under asymmetric loss.
Demetrescu, M. and Hoke, S. H. (2019). Predictive regressions under asymmetric loss: Factor augmentation and model selection.
Dovern, J. and Jannsen, N. (2017). Systematic errors in growth expectations over the business cycle.
Elliott, G. and Timmermann, A. (2004). Optimal forecast combinations under general loss functions and forecast error distributions.
Elliott, G. and Timmermann, A. (2016a). Economic Forecasting.
Elliott, G. and Timmermann, A. (2016b). Forecasting in economics and finance.
Elliott, G., Komunjer, I. and Timmermann, A. (2005). Estimation and testing of forecast rationality under flexible loss.
Gneiting, T. (2011). Making and evaluating point forecasts.
Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation.
Gneiting, T. and Ranjan, R. (2011). Comparing density forecasts using threshold- and quantile-weighted scoring rules.
Gneiting, T. and Ranjan, R. (2013). Combining predictive distributions.
Granger, C. W. J. and Pesaran, M. H. (2000). Economic and statistical measures of forecast accuracy.
Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F. and Gneiting, T. (2017). Forecaster's dilemma: Extreme events and forecast evaluation.
Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions.
Patton, A. J. and Timmermann, A. (2007). Testing forecast optimality under unknown loss.
Savage, L. J. (1971). Elicitation of personal probabilities and expectations.
Schervish, M. J. (1989). A general method for comparing probability assessors.

Forecasting density    N(0,1)   N(−3,1)   N(3,1)   N(0,16)
CRPS                     1         4         3         2
ACPS(·, ·; 0.05)         1         2         4         3
ACPS(·, ·; 0.275)        1         3         4         2
ACPS(·, ·; 0.5)          1         4         3         2
ACPS(·, ·; 0.725)        1         4         3         2
ACPS(·, ·; 0.95)         1         4         2         3

Figure 3: Ranking of probabilistic forecasts. Results from S = 1 simulation of N = 100 observations. Density estimated with M = 500 draws from the forecasting distribution. Target is N(0, 1) (black); forecasting densities are: N(0, 1) (blue), N(−3, 1) (orange), N(3, 1) (yellow), N(0, 16) (purple).

Figure 4: Ranking of probabilistic forecasts. Results from S = 1 simulation of N = 100 observations. Density estimated with M = 500 draws from the forecasting distribution.
Target is t(0, 1, 5) (black); forecasting densities are: t(−3, 1, 3) (blue), t(2, 1, 3) (orange), t(0, 1, 5) (yellow), t(4, 1, 15) (purple).

Forecasting density    N(0,1)   N(−3,1)   N(3,1)   N(0,16)
CRPS                     3         4         1         2
ACPS(·, ·; 0.05)         1         3         4         2
ACPS(·, ·; 0.275)        2         4         1         3
ACPS(·, ·; 0.5)          3         4         1         2
ACPS(·, ·; 0.725)        3         4         1         2
ACPS(·, ·; 0.95)         3         4         2         1

Figure 5: Ranking of probabilistic forecasts. Results from S = 1 simulation of N = 100 observations. Density estimated with M = 500 draws from the forecasting distribution. Target is N(2, 4) (black); forecasting densities are: N(0, 1) (blue), N(−3, 1) (orange), N(3, 1) (yellow), N(0, 16) (purple).

Figure 6: Ranking of probabilistic forecasts. Results from S = 1 simulation of N = 100 observations. Density estimated with M = 500 draws from the forecasting distribution. Target is Ga(2, 1) (black).

Table 4 (bottom panel): Rankings of the models for the German (EEX) electricity prices at horizons h = 1 and h = 7, by scoring rule and asymmetry level c. Models, in column order: AR(1), AR(1)-SV, AR(1)-tSV, AR(7), AR(7)-SV, AR(7)-tSV, AR(1)-MS, TVP-AR(1), TVP-AR(1)-SV, TVP-AR(1)-tSV, TVP-AR(2), TVP-AR(2)-SV, TVP-AR(2)-tSV.

Horizon 1
ACPS(·, ·; 0.05):    6  13   5  10  12   3   9   2   4  11   1   7   8
ACPS(·, ·; 0.275):   6   5  13  12   4  10   9  11   8   2   3   1   7
ACPS(·, ·; 0.5):    13   5  12  10   6   9   4  11   8   2   3   1   7
ACPS(·, ·; 0.725):  12  10   9  13   5   6  11   8   4   2   3   1   7
ACPS(·, ·; 0.95):   12   5   9  13  10   6   2   3   8  11   4   1   7
CRPS:                5  13  12  10   6   9   4  11   8   2   3   7   1

Horizon 7
ACPS(·, ·; 0.05):    4  11   8   6   1  13   7  10   9  12   5   3   2
ACPS(·, ·; 0.275):  13  10  11  12   9   8   6   4   5   1   7   2   3
ACPS(·, ·; 0.5):    13  10  12   9  11   8   6   5   4   1   7   2   3
ACPS(·, ·; 0.725):  13  10  11  12   8   9   6   5   4   3   2   1   7
ACPS(·, ·; 0.95):    8  11  13  10   6  12   5   9   4   3   2   1   7
CRPS:               13  10  12   9  11   8   6   5   4   1   7   2   3