key: cord-0183026-47ghxd51
authors: Caballero, Renzo; Kebaier, Ahmed; Scavino, Marco; Tempone, Ra'ul
title: A Derivative Tracking Model for Wind Power Forecast Error
date: 2020-06-29
journal: nan
DOI: nan
sha: 1524f2f571cfaa9464f19baa204c17c7737a82e6
doc_id: 183026
cord_uid: 47ghxd51

Reliable wind power generation forecasting is crucial for applications such as the allocation of energy reserves, optimization for electricity price, and operation scheduling of conventional power plants. We propose a data-driven model based on parametric Stochastic Differential Equations (SDEs) to capture the real asymmetric dynamics of wind power forecast errors. Our SDE framework features time-derivative tracking of the forecast, time-varying mean-reversion parameter, and an improved state-dependent diffusion term. The methodology we developed allows the simulation of future wind power production paths and to obtain sharp empirical confidence bands. All the procedures are agnostic of the forecasting technology, and they enable comparisons between different forecast providers. We apply the model to historical Uruguayan wind power production data and forecasts between April and December 2019.

Despite the Covid-19 pandemic causing "the biggest shock to the global energy system in more than seven decades" the recent report released on April 30, 2020, by the International Energy Agency (https://www.iea.org/reports/globalstresses that projections of energy demand will fall 6% in 2020 -seven times the decline after the 2008 global financial crisis -It is emphasized that "renewables are set to be the only energy source that will grow in 2020, with their share of global electricity generation projected to jump thanks to their priority access to grids and low operating cost" (IEA Press release nodate).

Although the growth of renewables in electricity generation in 2020 is set to be lower than in previous years, solar photo-voltaic (PV) and wind are on track to help lift renewable electricity generation by 5% in 2020, aided by higher output from hydropower.

Wind and solar energy are expanding renewable generation capacity, experiencing record growth in the last years. Sultana et al. 2017 . We recall the importance of accurate forecasts to use green energies optimally.

Reliable wind power generation forecasting is crucial for the following applications (see, for example, Giebel et al. 2011 , p. 5, Chang 2014 , p. 162, Zhou et al. 2013 ):

• Allocation of energy reserves such as water levels in dams or oil, and gas reserves.

• Operation scheduling of controllable power plants.

• Optimization of the price of electricity for different parties such as electric utilities, Transmission system operator (TSOs), Electricity service providers (ESPs), Independent power producers (IPPs), and energy traders.

• Maintenance planning such as that of power plants components and transmission lines.

Different methods have been applied to wind power forecasting. They can be generally categorized as follows: physical models, statistical methods, artificial intelligence methods, and hybrid approaches. The output of such methods is usually a deterministic forecast. Occasionally probabilistic forecasts are produced through uncertainty propagation in the data, parameters, or forecast ensembles. However, there is a lack of simulating and producing data-driven stochastic forecasts based on forecasting models. It is crucial to capture the forecast's actual performance as it has been known that different forecasting technologies exhibit different behavior for different wind farms and seasons (Chang 2014) . This is due to many factors that forecast are challenged to capture, such as the surrounding terrains of the wind farm and the condition of the blades such as icing, wear and tear, or dirt. It is known that complex terrains in both off shore and on shore locations decrease the accuracy of wind power forecasts significantly (Schicker et al. 2017 ). It also has been shown that the performance of forecasts varies from month to month. Thus the performance of wind power forecasts is location and time dependent.

Many approaches have been taken to evaluate the uncertainty of a given forecast. There are two types of errors: level errors and phase errors. The use of mean or median errors in this context may be misleading as wind power forecast errors are asymmetric. This is a natural consequence of wind power being non-negative and bounded by the maximum capacity of production. This is important as the associated cost to power forecast errors is also asymmetric due to different costs for up and down power regulations, which are determined by the electricity market (Tsitsiklis and Yunjian Xu 2015) . We propose to model wind power forecast errors using parametric stochastic differential equations (SDEs) whose solution defines a stochastic process. This resultant stochastic process describes the time evolution dynamics of wind power forecast errors while capturing properties such as a correlation structure and the inherent asymmetry. Additionally, the model we propose is agnostic of the forecasting technology and serves to complement forecasting procedures by providing a data-driven stochastic forecast. Hence, we can evaluate wind power forecasts according to their real-world performance, and we can compare different forecasting technologies. Most notably, we can simulate future wind power production given a deterministic wind power forecast. Future wind power production using Monte Carlo methods, as well as the analytic form of the proposed SDE, can be used in optimal control problems involving wind power production.

Some interesting works have been devoted to probabilistic forecasting related to renewable energies based on stochastic differential equations, among them (Møller, Zugno, and Madsen 2016) on wind power forecast, (Iversen et al. 2014) and (Badosa et al. 2018 ) on forecasts of solar irradiance. Here, we propose an improved model featuring time derivative tracking of the forecast, time-dependent mean reversion, modified diffusion, and non-Gaussian approximations. We apply the model to Uruguayan wind power forecasts together with historical wind power production data pertaining to the year 2019.

The rest of the paper is organized as follows. In Section 2, we introduce the main characteristics of a real data set encompassing the normalized wind power production in Uruguay between April and December 2019, with the most accurate predictions, as highlighted in our posterior analysis, performed by one out of the three sources of forecast providers. The significant steps for constructing phenomenological models of the normalized wind power production and the forecast error based on stochastic differential equations are described in Section 3. The application of the Lamperti transform with unknown parameters in Section 4 leads to model the forecast error through a stochastic differential equation with a unit diffusion coefficient. In Section 5, we write down the expressions for the likelihood functions of the forecast error in its original space and the Lamperti space. We also derive simple approximations of the likelihood functions. Section 5 concludes with the description of the optimization algorithm to compute approximate maximum likelihood estimates, including the case where we expand the model comprising an initial transition from the time the forecast is performed to the time of the first forecast.

In Section 6, we apply our proposed numerical estimation procedures to the Uruguay wind and forecast dataset, comparing two alternative models and assessing, for the best candidate model, the performance of three different forecast providers. Section 7 concludes the paper. The proofs of the existence, strong uniqueness, and boundedness of the SDE solutions used to model normalized wind power production and its forecast error are given in the Appendix.

In recent years, Uruguay has triggered a remarkable change in its energy matrix. In (IRENA 2019, p.23), Uruguay was among those countries showcasing innovation, like Denmark, Ireland, Germany, Portugal, and Spain, with proven feasibility of managing annual variable renewable energy (VRE) higher than 25% in power systems.

According to (REN21 2019, pp.118-119) , in 2018, Uruguay achieved 36% of its electricity production from variable wind energy and solar PV, raising the share of generation from wind energy more than five-fold in just four years, from 6.2% in 2014 to 33% in 2018.

At present, Uruguay is fostering even higher levels of wind penetration by boosting regional power trading with Argentina and Brazil. In this rapidly evolving scenario, it is essential to analyze national data on wind power production with wind power short-term forecastings to orientate and assess the strategies and decisions of wind energy actors and businesses.

Our study is based on publicly available data (source: Administrator of Electric Market) on the wind power production in Uruguay between April and December 2019, that we adequately normalized with respect to the present 1474 MW maximum installed wind power capacity. Each day, wind power production recordings are available every ten minutes. In this work, we also considered data from three different forecast providers, available each day starting at 1 pm. The next Figure 2 shows the wind power real production during four segments 24 hours selected from the observation period together with their corresponding hourly short-term forecast, computed by a forecast provider. For the sake of visualization clarity, this section relies only on forecasts from one provider, called "provider A" from now on, ranked as the most accurate forecast provider, as it emerged from our posterior analysis. 

Forecast Real Production Figure 2 : Four 24-hour segments with the wind power real production in Uruguay (orange line) recorded every ten minutes, and the hourly wind power production forecasted by provider A (blue line).

A view of the global discrepancy between the real production and the forecasted production, during the nine months observation period, is summarized through the forecast error histograms in Figure 3 , where we also partitioned the forecast errors according to three contiguous categories of normalized generated power. Low normalized generated power corresponds to the range [0, 0.3], mid-power refers to the range ]0.3, 0.6], and highpower to the range ]0.6, 1]. We may observe that all the histograms in Figure 3 exhibit skewed patterns, to a different extent, as well as extreme observations. The presence of these features can be partly explained. The data analysis highlighted that, during several 24-hour segments, the system operators decided to reduce or even cease the wind power production. Indeed, as recalled in (IRENA 2018, p.8) , "Uruguay experiences high curtailment levels because generation exceeds demand." Despite the large country's interconnection capacity with Argentina and Brazil, there is no active cross-border market; the energy is traded via ad hoc short-term agreements. (IRENA 2018, p. 3) "Even with interconnection capacity exceeding peak demand, the power system experiences high VRE curtailment, mostly at night when wind generation exceeds demand."

The curtailment of the wind power production imposed by the system operators has a strong influence on the forecast error. To build a model that, driven by the available forecast, allows the inclusion of true power production with a prescribed degree of uncertainty, it is necessary to remove the data segments affected by wind curtailment.

Once we removed all the 24-hour segments showing wind curtailment, we set up a dataset containing 147 daily segments. In the absence of the curtailment intervention, the forecast error histograms shown below in Figure 4 , can appreciate skewness reduction, except for low power forecast error histogram. In this stage of data preprocessing, we obtain another useful result by applying the first-order difference operator to the forecast errors. The forecast error transition histograms, displayed in the next Figure 5 , will later constitute a reference for the visual assessment of the global fit of the proposed models. The histograms in Figure 5 feature a non-Gaussianity trait and provide initial input for the model-building stage, which is described in the next section. Later, in Section 4, guided from inferring the unknown model parameters, we will also propose transforming data as a strategy that naturally leads to an alternative model.

After analyzing the available dataset, we are now in the condition to build a type of phenomenological model for the normalized wind power generation forecasts that, in its most general form, is a stochastic process X = {X t , t ∈ [0, T ]} defined by the following stochastic differential equation (SDE):

where • a(·; p t ,ṗ t , θ) : [0, 1] → R denotes a drift function,

• θ is a vector of unknown parameters,

• {W t , t ∈ [0, T ]} is a standard real-valued Wiener process.

In this work, (p t ) t∈[0,T ] is to be considered a deterministic forecast for the normalized wind power generation, which is provided by Administrator of Electric Market.

Our goal is to achieve a specification of the model (1) to follow the available normalized wind power forecasts closely while ensuring its unbiasedness with respect to the forecast.

Let (p t ) t∈[0,T ] be the available prediction function for the normalized wind power, which is an input to this approach. To start with the model specification, first, we introduce a time-dependent drift function that features the mean-reverting property as well as derivative tracking:

where (θ t ) t∈[0,T ] is a positive deterministic function, whose range depends on θ, as will be explained shortly. Now, looking at the normalized wind power generation forecast process X t , modeled as solution to the Itô stochastic differential equation (1) with the drift specified in (2)

Taking expectation on (3) we obtain

At this stage, the process defined by (1) with drift (2) satisfies the two following properties:

• it reverts to its mean p t , with a time-varying speed θ t that is proportional to the deviation of the process X t from its mean,

• it tracks the time derivativeṗ t .

Remark: Observe that a mean-reverting model without derivative tracking shows a delayed path behavior. For instance, consider the diffusion model (1) 

0ṗ s e θ 0 s ds. The next Figure (6) illustrates how different behave the estimated confidence bands for two diffusion models with and without derivative tracking, fitting the same daily segment. 

99% confidence interval 90% confidence interval 50% confidence interval Forecast Real production Figure 6 : Pointwise confidence bands fitted, for the same daily segment, through diffusion models without derivative tracking (plot on the left) and with derivative tracking (plot on the right).

The forecast and production wind power data of Uruguay are normalized with respect to the installed power capacity during the period of observation. Thus, the mean-reverting level lies in [0, 1], and the process X t must take values in the same interval, a requirement that is not automatically fulfilled through the derivative tracking. To impose that the state space of X t is [0, 1], we may choose a convenient diffusion term, and require that the time-varying parameter θ t satisfies an ad-hoc condition.

Let θ = (θ 0 , α), and choose a state-dependent diffusion term that avoids the process exiting from the range [0, 1] as follows:

where α > 0 is an unknown parameter that controls the path variability. This diffusion term belongs to the Pearson diffusion family and, in particular, it defines a Jacobi type diffusion. It is useful to recall that (Forman and Sorensen 2008, p. 440) a Pearson diffusion is a stationary solution to a stochastic differential equation of the

where θ > 0, and a, b, and c are parameters such that the square root is well defined when X t is in the state space. These parameters, together with µ, the mean of the invariant distribution, determine the state space of the diffusion as well as the shape of the invariant distribution. An exhaustive classification of the (stationary) Pearson diffusions is presented in (Forman and Sorensen 2008, pp. 440-443) where, in particular, it is discussed the case a < 0 and b(x; θ) = 2aθx(x − 1), where the invariant distribution is a Beta distribution with parameters µ −a , 1−µ −a , that leads to the well-known Jacobi diffusions, so-called because the eigenfunctions of the infinitesimal generator of these processes are the Jacobi polynomials (see, for example, Leonenko and Phillips 2012 , pp. 2860 -2861 .

It is worth mentioning that Jacobi diffusions have been successfully applied in several disciplines, among them finance (see Valéry and Gouriéroux 2011 and references therein) and neuroscience (D'Onofrio, Tamborrino, and Lansky 2018).

However, a distinctive feature in our proposed model

is that the drift term contains the time-varying parameter θ t , rendering the solution X t of (7) to a nonstationary and time-inhomogeneous process. To ensure that the process X t is the unique strong solution of (7) for all t ∈ [0, T ] with state space [0, 1] a.s., the mean-reversion time-varying parameter must satisfy the condition:

The proof of this theoretical statement is presented in Section 8.

Remark: Condition (B) shows that the time-varying parameter θ t becomes unbounded when p t = 0 or p t = 1. Therefore, we consider the following truncated prediction function

Remark: In this work, the analysis of the three different forecast datasets shows that there exists, for any forecast provider, a small ǫ > 0 to define the truncated prediction function fulfilling the above condition.

From now on, we will keep the notation p t to denote the truncated prediction function (8), unless specified otherwise.

After applying to (7) the simple change of variables

we may introduce the following model for the forecast error of the normalized wind power production:

4 State independent diffusion term: Lamperti transform

Our model (9) for the forecast error has a diffusion term that depends on the state variable V t . Under the conditions that permit the use of Itô's formula on a well-chosen transformation of the process V , John Lamperti (Lamperti 1964) first showed that the transformed process is again a diffusion process that is solution to a SDE with unit coefficient for the diffusion term. The vast literature nowadays refers to this result as the so-called Lamperti transform (see, for example, Iacus 2008, pp. 40-41; Møller and Madsen 2010; Panik 2017, pp. 199-203; Särkkä and Solin 2019, pp. 98-100) , which is a basic tool to obtain a SDE for the transformed process whose diffusion term does not depend anymore on the state variable. A remarkable effect of removing the state dependency from the random noise term is to increase the numerical stability of the simulated paths of the transformed process. For this reason, some estimation methods of the unknown parameters of non-linear SDE models incorporated the Lamperti's change of variable as part of a more complex approximation procedure (for example, in the case of one-dimensional diffusions, the local linearization method in Shoji and Ozaki 1998, or the expansion method in Aït-Sahalia 2002, later extended to time-inhomogeneous SDEs in Egorov, Li, and Yuewu Xu 2003) . We consider the following Lamperti transform with unknown parameters

that, after applying Itô's formula on h(V t , t; θ), leads to the following SDE with state independent unit diffusion term

After (11), we obtain that the process Z t satisfies the SDE

To summarize the effect of the Lamperti transform visually, we can see in the next Figure (7) how the forecast error transition histograms (without curtailment) modify in comparison with Figure (5) .

In this case, the Lamperti transform has been applied using the optimal estimates of the parameters in the SDE model (9), obtained applying our numerical procedure detailed later in Subsection 5.4. The shape of the forecast error transition histograms after applying the Lamperti transform has similarities with the Gaussian distribution, motivating toward the use of Gaussian-like approximations of the unknown density transition functions of the process Z t .

Remark: Moreover, this obtained Gaussian distribution supports the validity of the choice of our model diffusion coefficient given by (6). 

Let ρ(v|v j,i−1 ; θ) be the conditional probability density of V t j +i∆ ≡ V j,i given V j,i−1 = v j,i−1 evaluated at v, where θ = (θ 0 , α) are the unknown model parameters.

The Itô process V defined by the SDE (9) is Markovian, and the likelihood function of the sample V M,N +1 can be written as the following product of transition densities:

where t j,i ≡ t j + i∆ for any j = 1, . . . , M and i = 0, . . . , N .

Remark: In the last subsection of this section, we will extend the statistical model (13) by adding the transition that occurs during the time interval, say of length δ, between the epoch when the forecast is done and the first epoch (1 pm) of each day-ahead forecast. To this purpose, the likelihood function (13) must include for any of the M paths an additional factor, say ρ 0 (V j,0 |V j,−δ ; θ, δ), expressing the conditional density of the early transition. The parameter δ can be calibrated together or after the estimation of θ, suggesting an optimal time for the scheduling of the forecasts.

The exact computation of the likelihood (13) relies on the availability of a closed-form expression for the transition densities of V that, on the basis of the Markovian property of V , are characterized for t j,i−1 < t < t j,i , as solutions of the Fokker-Planck-Kolmogorov equation (Iacus 2008, p. 36; Särkkä and Solin 2019, pp. 61-68) :

subject to the initial conditions ρ

However, closed-form solutions to initial-boundary value problems for time-inhomogeneous diffusions can be obtained only in a few cases (see, for example, Egorov, Li, and Yuewu Xu 2003, Section 3.1). In our case, solving numerically (14) for the transition densities of the process V at every transition step is computationally expensive. Several numerical techniques have been devised to obtain estimates for the unknown parameters of continuous-time SDE models with discrete observations (see, for example, Preston and Wood 2012 for likelihood-based inference techniques, Sørensen 2012 for an estimating function approach). As explained in the next subsection, we have considered approximate likelihood methods, similar in spirit to (Särkkä and Solin 2019, Section 11.4).

Gaussian approximations to the transition densities of nonlinear time-inhomogeneous SDEs are available through different algorithms (Särkkä and Solin 2019, Chapter 9). However, as Figure 5 may suggest at first glance, the choice of a Gaussian density could be inadequate when straightly applied to approximate the transition density of the forecast error V of the normalized wind power production.

Therefore, we propose to use a surrogate transition density for V other than Gaussian. The moments of the SDE model (9) are then matched to the surrogate density moments.

For m ≥ 2, using Itô's lemma we derive

For any t ∈ [t j,i−1 , t j,i [, the first two moments of V , m 1 (t) and m 2 (t) ≡ E V 2 t , can be computed by solving the following system

with initial conditions m 1 (t j,i−1 ) = v j,i−1 and m 2 (t j,i−1 ) = v 2 j,i−1 .

A suitable candidate for a surrogate transition density of V is a Beta distribution on a compact interval parameterized by two positive shape parameters, ξ 1 , ξ 2 . Recall that the choice of the beta proxy distribution is a natural choice as it is the invariant distribution of the Jacobi type processes. For any t ∈ [t j,i−1 , t j,i [, we approximate the transition densities of the process V using a Beta distribution. We equal the first two central moments of V with the corresponding moments of the Beta surrogate distribution on [−1 + ǫ, 1 − ǫ] with shape parameters ξ 1 , ξ 2 .

The shape parameters are given by

where µ t = m 1 (t) and

The approximate log-likelihoodl(·; v M,N +1 ) of the observed sample v M,N +1 can be expressed as

where the shape parameters ξ 1 (t − j,i ) and ξ 2 (t − j,i ), according to (17), depend on the limit quantities µ(t − j,i ; θ) and σ 2 (t − j,i ; θ) as t ↑ t j,i that are computed solving numerically the initial-value problem (16). B(ξ 1 , ξ 2 ) denotes the Beta distribution with parameters ξ 1 and ξ 2 .

The transition density of the process Z, which has been defined through the Lamperti transformation (10) of V , can be conveniently approximated by a Gaussian surrogate density.

The drift coefficient a(Z t ; p t ,ṗ t , θ) of the process Z that satisfies (12) is nonlinear. After linearizing the drift around the mean of Z, µ Z (t) ≡ E [Z t ], we obtain the following system of ODEs to compute, for any t ∈ [t j,i−1 , t j,i [, the approximations of the first two central moments of Z, sayμ

with initial conditionsμ Z (t j,i−1 ) = z j,i−1 andṽ Z (t j,i−1 ) = 0 , and where

The approximate Lamperti log-likelihoodl Z ·; z M,N +1 for the observed sample z M,N +1 is given bỹ

where the limitsμ Z (t − j,i ; θ) andṽ Z (t − j,i ; θ) are computed solving numerically the initial-value problem (19).

In this subsection, we aim to infer the model's parameters using optimization techniques. We start by finding an initial guess close enough to the optimal value, and from that point, start the optimization.

To guarantee the good behave for our optimization algorithm, we aim to start the optimization as close as we can from the optimal parameters. We use least square minimization and quadratic variation over the data to find an initial guess (θ * 0 , α * ).

• Least square minimization: We consider the observed data v M,N +1 with length between observations ∆, where i ∈ {0, . . . , N } and j ∈ {1, . . . , M }. For any t ∈ [t j,i−1 , t j,i [, the random variable (V j,i |v j,i−1 ) has a conditional mean that can be approximated by the solution of the system

If we have a total of M × N transitions, we can write the regression problem for the conditional mean with L 2 loss function aŝ

As Equation (21) is convex in c, it is enough to verify the first order optimality conditions. From

We approximate θ 0 by Equation (22) setting θ * 0 =ĉ.

• Quadratic variation: We approximate the quadratic variation of the Itô's process V

As initial guess for the diffusion variability coefficient θ 0 α, we choose

where ∆ is the length of the time interval between two consecutive measurements.

To find the optimal parameters, we minimize the negative log-likelihood (negative version of (18)) using the derivative-free function fminsearch from MATLAB R2019b over the parameters (θ 0 , α). At each step of the iteration, we:

• Use the training dataset to find the SDE's first and second moments as explained in Subsection 5.2.

• Match the proxy distribution moments with the SDE's moments.

• Evaluate the negative log-likelihood using the training dataset.

Let v M,N +1 be the observed data, and h(v j,i , t j,i ; θ) the Lamperti transform of the observation v j,i . As we can see in Section 4, the transformed observations z M,N +1 depend on the vector θ.

The problem of maximizing the approximated Lamperti log-likelihood (20), i.e.,

is not totally defined as the data z M,N +1 depend on θ. However, we propose to find a fixed point θ ⋆ such that

At a fixed point, the likelihood has a maximum for the data set corresponding to that point. We are interested in finding solutions for (24). If θ ⋆ solves (24), then

For each θ ⋆ , we define the relative error function Ψ :

and remark that θ ⋆ is a fixed point if and only if Ψ(θ ⋆ ) = 0. To find minimizers for Ψ, we proceed in the same way as described in Subsection 5.4.2, but using Gaussian surrogate densities.

We observe that for most days, the forecast error at time t j,0 = 0 is not zero. According to the forecasts procedure, we may assume that there is a time in the past t j,−δ < t j,0 , such that the forecast error V j,−δ = 0.

For any j = 1, . . . , M , we extrapolate backward linearly the truncated prediction function to get its value at time t j,−δ , p j,−δ , and set v t j,−δ = 0. We assume that the initial transition (V j,0 |v j,−δ ; θ, δ) has a Beta distribution and apply to it the same moment matching method used above. Given a vector of parameters θ, we estimate δ solving the following problem arg max δL δ θ, δ; v M,1 = arg max

whereL δ is the approximated δ−likelihood. To solve this problem, we repeat the steps described in Subsection 5.4.2, with the additional initial step of creating the linear extrapolation for p j,−δ at each j ∈ {1, 2, . . . , M }.

As remarked, we extend the statistical model (13) to include the extra parameter δ. The approximated complete likelihoodL c , which estimates the vector (θ 0 , α, δ), is given bỹ

whereL θ; v M,N +1 is the non-log version of (13). As we can provide initial guesses for θ and δ, we have a starting point for the numerical optimization of the approximated complete likelihood (27).

6 Application to the April-December 2019 Uruguay wind and forecast dataset

Our statistical analysis starts with partitioning the 147 segments of normalized wind power production, each 24-hours long. We select 73 non-contiguous segments for the models' calibration procedure, assigning them to the training set. The other 74 non-contiguous segments compose the test set. Such an allocation mechanism guarantees independence among the segments, matching the assumption we did in Section 5 to formulate the statistical models. Additional cross-correlation tests were performed to ensure this assumption. All the following results involving a single provider refer to provider A. Furthermore, all calibrations involve the training sets and all simulations, the test sets. Following the instruction for the initial guesses from Subsection 5.4 and assuming that

we obtain the initial guess (θ * 0 , α * , δ * ) ≈ (1.54, 0.072, 073).

As an auxiliary verification, we plot the negative log-likelihood (negative version of (18)) as a function of the parameters, and we use additional minimization functions from MATLAB R2019b. Moreover, we realized an additional inference utilizing the test sets to guarantee the robustness of our numerical methods. Figure 8 : Negative log-likelihood's level sets for the training sets (plot on the left), and for the test sets (plot on the right). All optimal values are located over the curve θ 0 α = 0.097 and θ 0 α = 0.089 for the training and test sets, respectively.

On Figure ( 8), we can see the level sets for the negative log-likelihood for both training and test sets. The numerical values of each relevant point can be seen in Table (1) . We set the optimal parameters in the V -space (θ V 0 , α V ) = (1.93, 0.050), as it is where the negative log-likelihood for the training sets reaches its minimum value. Figure 8 .

We observe that all the local (possibly global) minimizers are located over the curves θ 0 α = 0.097 and θ 0 α = 0.089 for the training and test sets, respectively. This effect shows that the optimization is more sensitive to the diffusion than to the drift.

In the Z-space, we obtain the optimal parameters (θ Z 0 , α Z ) = (1.87, 0.043) as minimizers for the relative error function (25) using fminsearch from MATLAB R2019b and the training sets.

To verify and compare these two vector of parameters, (i.e., (θ V 0 , α V ) and (θ Z 0 , α Z )), we simulate error paths in the V −space. We simulate five error paths for each day in the test set and construct histograms with the transitions. The histograms can be seen in Figure (9) . We observe a slightly better approximation using (θ Z 0 , α Z ). Figure 9 : Probability histograms for error transitions. Using provider A, we overlap the real transitions from the test set with the simulated ones from the V −space SDE. On the left, simulations use (θ V 0 , α V ). On the right, simulations use (θ Z 0 , α Z ).

We compare two candidate models to find the best-fit that maximizes the retained information, the Model 1, introduced in (Elkantassi, Kalligiannaki, and Tempone 2017, p.383) , and our proposed model (7), from hereafter called Model 2.

• Model 1: This model does not feature derivative tracking:

with θ 0 > 0, α > 0.

• Model 2: This model features derivative tracking and time-varying mean-reversion parameter:

with θ 0 > 0, α > 0 and θ t satisfying condition (B) .

To show the better performance of Model 2, we have computed the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for the two considered models, and any combination of the three different forecast providers with three approximate likelihood methods, the one based on the Beta surrogate density in theV -space (Subsection 5.2), the one based on the Gaussian surrogate density in the Z-space (Subsection 5.3), and the Shoji-Ozaki local linearization method (Shoji and Ozaki 1998) . Table ( 2) summarizes these results, also reporting the estimate of the variability diffusion coefficient αθ 0 . It is worth observing that the best fitting is achieved with Model 2 and adopting Beta distributions as proxies of the transition densities. The optimal estimates of the parameters of Model 2, for the three forecast providers, when using Beta surrogates for the transition density are presented next:

Forecast Provider Parameters (θ 0 , α) Product θ 0 α Once derived optimal estimates of the parameters of the complete likelihood for Model 2, we obtain empirical pointwise confidence bands for wind power production. Figure (11) shows the empirical pointwise confidence bands for wind power production for each day of interest, assuming Model 2 specification, a given forecaster, and 5000 simulations per day. Figure 11 : Empirical pointwise confidence bands for the wind power production using the approximate MLEs for Model 2 (θ 0 , α, δ) = (2.22, 0.044, 0.054). Blue line: real production.

As a final verification, we study the behavior of δ as a function of the vector θ. Given a parameter vector, we calculate an initial guess for δ solving problem (26). Even when it is a guess, it helps us understand the meaning of this additional parameter qualitatively.

We choose as domain the most significant values of θ 0 and θ 0 α, regarding the previous numerical results. In Figure ( 12) we can observe that:

• The initial time δ decreases as θ 0 α increases. This is a consequence of the increment in the diffusion as θ 0 α increases. As there is more diffusion, less time is needed for the initial transition density to cover the initial error observations.

• The initial time δ increases as θ 0 increases. As we increment θ 0 , the mean reversion becomes larger and reduces the variance for the initial transition density. Then, more time is needed for the initial transition density to cover the initial error observations. Figure 12 : Initial value for δ as a function of the elements of the parameter vector θ.

In this work, we propose a methodology to assess the short-term forecast of the normalized wind power, which is agnostic of the wind power forecasting technology.

To this purpose, we built a phenomenological stochastic differential equation model for the normalized wind power production forecast error, with time-varying mean-reversion parameter in the linear drift coefficient, and state-dependent and time non-homogenous diffusion coefficient. We also used the Lamperti transform with unknown parameters to provide a version of the proposed model with a unit diffusion coefficient, increasing its stability properties.

We used approximate likelihood-based methods for the models' calibration, and developed original algorithms to derive optimal estimates of the unknown parameters. The likelihood approach allowed for the extending of the SDE models in a very effective way, incorporating an early transition with an additional parameter that accounts for the forecast's uncertainty at the beginning of each future period. As a result, we obtained a robust procedure for synthetic data generation that, using the available forecast input, embraces future wind power production paths through empirical pointwise bands with prescribed confidence.

On the basis of historical data of wind power production and forecast from different sources, our method came up with an objective tool for forecast assessment and comparison by performing the model selection stage. The application of the modeling procedure, inference through numerical optimization, and model selection through information criteria, to the wind power production dataset in Uruguay between April and December 2019, with three different providers, shows the excellent performance of our proposed model, which preserves the asymmetry of wind power forecast errors and their correlation structure.

We conclude that our SDE model, featuring a time-derivative tracking of the forecast, a time-dependent meanreversion parameter, and a state-dependent diffusion term that suitably adjusts to the problem under study, contributes toward the management of renewable energies efficiently. This methodology paves the way for stochastic optimal control methods enabling principled decision making under uncertainty in the presence of complex energy matrices.

For a time horizon T > 0, a parameter α > 0, and (θ t ) t∈[0,T ] a positive deterministic function, let us consider the model given by dX t = ṗ t − θ t (X t − p t ) dt + 2αθ 0 X t (1 − X t ) dW t , t ∈ [0, T ]

where (p t ) t∈[0,T ] denotes the prediction function that satisfies 0 ≤ p t ≤ 1 for all t ∈ [0, T ]. This prediction function is assumed to be a smooth function of the time so that sup t∈[0,T ] |p s | + |ṗ s | < +∞.

The following proofs are based on standard arguments for stochastic processes that can be found e.g. in Alfonsi 2015 and Karatzas and Shreve 1998 that we adapted to the setting of our model (30). Proof. Let us first consider the following SDE for t ∈ [0, T ]

According to Proposition 2.13, p.291 of Karatzas and Shreve 1998, under assumption (A) there is a unique strong solution X t to (31). Moreover, as the diffusion coefficient is of linear growth, we have for all p > 0 E sup

Then, it remains to show that for all t ∈ [0, T ], X t ∈ [0, 1] a.s. For this aim, we need to use the so-called Yamada function ψ n that is a C 2 function that satisfies a bunch of useful properties:

|ψ n (x)| → n→+∞ |x|, xψ ′ n (x) → n→+∞ |x|, |ψ n (x)| ∧ |xψ ′ n (x)| ≤ |x| ψ ′ n (x) ≤ 1, and ψ ′′ n (x) = g n (|x|) ≥ 0 with g n (x)x ≤ 2 n for all x ∈ R.

See the proof of Proposition 2.13, p. 291 of Karatzas and Shreve 1998 for the construction of such function. Applying It's formula we get ψ n (X t ) = ψ n (x 0 ) + t 0 ψ ′ n (X s )(ṗ s + θ s p s − θ s X s ds + t 0 ψ ′ n (X s ) 2αθ 0 |X s (1 − X s )| dW s + αθ 0 t 0 g n (|X s |)|X s (1 − X s )| ds. Now, thanks to (A), (32), and to the above properties of ψ n and g n , we get E ψ n (X t ) ≤ ψ n (x 0 ) + t 0 ṗ s + θ s p s − θ s E[ψ ′ n (X s )X s ] ds + 2αθ 0 n t 0 E |1 − X s | ds.

Therefore, letting n tends to infinity, we use Lebesgue's theorem to get

Besides, taking the expectation of (31), we get E [X t ] = x 0 + t 0 ṗ s + θ s p s − θ s E [X s ] ds, and thus we have

Then, Gronwall's lemma gives us E |X t | = E [X t ] and thus for any t ∈ [0, T ] X t ≥ 0 a.s. The same arguments work to prove that for any t ∈ [0, T ] Y t := 1 − X t ≥ 0 a.s. since the process (Y t ) t∈[0,T ] is solution to

Then similarly, we need to assume thatṗ t + θ t p t ≥ 0. This completes the proof.

Theorem 8.2. Assume that assumptions of Theorem 8.1 hold with x 0 ∈]0, 1[. Let τ 0 := inf{t ∈ [0, T ], X t = 0} and τ 1 := inf{t ∈ [0, T ], X t = 1} with the convention that inf ∅ = +∞. Assume in addition that for all t ∈ [0, T ], p t ∈]0, 1[ and that

Then, τ 0 = τ 1 = +∞ a.s.

where M t = t 0 2αθ 0 (1−Xs) Xs dW s is a continuous martingale. Then as for all t ∈ [0, T ], we haveṗ t + θ t p t − θ 0 α ≥ 0, we deduce that X t ≥ x 0 exp αθ 0 t − t 0 θ s ds + M t .

By way of contradiction let us assume that {τ 0 < ∞}, then letting t → τ 0 we deduce that lim t→∞ 1 {τ 0 <∞} M t∧τ 0 = −1 {τ 0 <∞} ∞ a.s. This leads to a contradiction since we know that continuous martingales likewise the Brownian motion cannot converge almost surely to +∞ or −∞. It follows that τ 0 = ∞ almost surely. Next, recalling that the process (Y t ) t≥0 given by Y t = 1 − X t is solution to

we deduce using similar arguments as above τ 1 = ∞ a.s. provided that θ t (1 − p t ) −ṗ t − αθ 0 ≥ 0.

Remark: As the diffusion coefficient of X t given by x → 2αθ 0 x(1 − x) is strictly positive for all x ∈]0, 1[, the condition (B) ensures that the transformation between Z t and X t is bijective, so that we deduce the properties of existence and uniqueness of Z t from those of X t . The application of Itô's formula in Section 4 is subjected to the condition (B) that avoids the process X t hits the boundaries of the interval ]0, 1[, otherwise the Lamperti transform is not applicable.

Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach

Closed-form likelihood expansions for multivariate diffusions

Affine Diffusions and Related Processes: Simulation, Theory and Applications

Estimation in Stochastic Differential Equations with a State Dependent Diffusion Term

Day-Ahead Probabilistic Forecast of Solar Irradiance: A Stochastic Differential Equation Approach

boot: Bootstrap R (S-Plus) Functions. R package version

A Literature Review of Wind Forecasting Methods

The Jacobi diffusion process as a neuronal model

Maximum likelihood estimation of time-inhomogeneous diffusions

Inference and Sensitivity in Stochastic Wind Power Forecast Models

The Pearson Diffusions: A Class of Statistically Tractable Diffusion Processes

The State of the Art in Short-Term Prediction of Wind Power. A Literature Overview

Seeing the Wood for the Trees: A Critical Evaluation of Methods to Estimate the Parameters of Stochastic Differential Equations

Simulation and Inference for Stochastic Differential Equations: With R Examples

IEA Press release

Uruguay P ower System F lexibility assessment: IRENA F lexT ool Case Study

Innovation landscape for a renewable-powered future: Solutions to integrate variable renewables

Probabilistic forecasts of solar irradiance using stochastic differential equations

Hybrid estimators for stochastic differential equations from reduced data

Hybrid estimation for an ergodic diffusion process based on reduced data

Hybrid multi-step estimators for stochastic differential equations based on sampled data

Estimation of an ergodic diffusion from discrete observations

A simple construction of certain diffusion processes

High-order approximation of Pearson diffusions processes

From State Dependent Diffusion to Constant Diffusion in Stochastic Differential Equations by the Lamperti Transform

Probabilistic Forecasts of Wind Power Generation by Stochastic Differential Equation Models

Stochastic differential equations : an introduction with applications in population dynamics modeling

Approximation of transition densities of stochastic differential equations by saddlepoint methods applied to small-time Ito-Taylor sample-path expansions

R: A Language and Environment for Statistical Computing

Short term wind power prognosis with different success criteria

Renewables

ltm: An R package for Latent Variable Modelling and Item Response Theory Analyses

Applied Stochastic Differential Equations

Short-range wind speed predictions for complex terrain using an interval-artificial neural network

Estimation for nonlinear stochastic differential equations by a local linearization method

Estimating functions for diffusion-type processes

A review on state of art development of model predictive control for renewable energy applications

Pricing of fluctuations in electricity markets

Adaptive estimation of an ergodic diffusion process based on sampled data

Adaptive Bayes type estimators of ergodic diffusion processes from discrete observations

A quasi-likelihood approach based on eigenfunctions for a boundedvalued Jacobi process (Working Paper)

Application of probabilistic wind power forecasting in electricity markets