key: cord-0220682-bpq65ho9 authors: Cramer, Eike; Paeleke, Leonard; Mitsos, Alexander; Dahmen, Manuel title: Normalizing Flow-based Day-Ahead Wind Power Scenario Generation for Profitable and Reliable Delivery Commitments by Wind Farm Operators date: 2022-04-05 journal: nan DOI: nan sha: 9883a782baa62169535e01af2a1dd88f3b53705d doc_id: 220682 cord_uid: bpq65ho9 We present a specialized scenario generation method that utilizes forecast information to generate scenarios for the particular usage in day-ahead scheduling problems. In particular, we use normalizing flows to generate wind power generation scenarios by sampling from a conditional distribution that uses day-ahead wind speed forecasts to tailor the scenarios to the specific day. We apply the generated scenarios in a simple stochastic day-ahead bidding problem of a wind electricity producer and run a statistical analysis focusing on whether the scenarios yield profitable and reliable decisions. Compared to conditional scenarios generated from Gaussian copulas and Wasserstein-generative adversarial networks, the normalizing flow scenarios identify the daily trends more accurately and with a lower spread while maintaining a diverse variety. In the stochastic day-ahead bidding problem, the conditional scenarios from all methods lead to significantly more profitable and reliable results compared to an unconditional selection of historical scenarios. The obtained profits using the normalizing flow scenarios are consistently closest to the perfect foresight solution, in particular, for small sets of only five scenarios. Since the liberalization of the electricity markets, electricity is traded on the electricity SPOT markets (European Power Exchange, 2021; Mayer and Trück, 2018) . The auction-based format of the day-ahead markets requires electricity producers and large scale consumers to specify fixed amounts of electricity they want to buy or sell one day prior to the delivery (European Power Exchange, 2021). Thus, renewable electricity producers have to account for the uncertain and non-dispatchable nature of renewable electricity generation from wind and photovoltaic when submitting their bids (Perez-Arriaga and Batlle, 2012; Mayer and Trück, 2018; Mitsos et al., 2018) . To find profitable solutions, operators often leverage optimization techniques from the process systems engineering (PSE) community Grossmann, 2021) . In particular, scheduling optimization identifies cost-optimal operational setpoints and leverages variable electricity prices (Schäfer et al., 2020; Leo et al., 2021) . To address the uncertainty stemming from the uncertain renewable electricity production and volatile price curves, scheduling problems are often implemented as stochastic programs that include the uncertainty in the problem formulation (Conejo et al., 2010; Grossmann, 2021) . Typically, stochastic programs are based on scenarios, e.g., possible realizations of renewable production trajectories (Conejo et al., 2010; Morales et al., 2013; Chen et al., 2018a) . The PSE community has been on the forefront of finding solutions to scheduling problems and stochastic programs for decades (Grossmann and Sargent, 1978; Halemane and Grossmann, 1983; Pistikopoulos and Ierapetritou, 1995; Sahinidis, 2004) . Thus, energy system scheduling problems are solved successfully by the PSE community Schäfer et al., 2019 Schäfer et al., , 2020 . Many PSE examples address electricity procurement for power-intensive processes and demand-side-management Leo et al., 2021) . Examples with energy focus include Garcia-Gonzalez et al. (2008) , who derive a stochastic bidding problem for a wind producer with pumped hydro storage, and Liu et al. (2015) , who propose a model to obtain bidding curves for a micro-grid considering distributed generation. In their book, Conejo et al. (2010) derive a wind producer bidding problem considering both price and production uncertainties. While most works focus on optimization problem formulations, obtaining high-quality scenarios is also critical for operational success. The scenarios for stochastic programming either stem from historical data or specialized scenario generation methods (Conejo et al., 2010) . Kaut and Wallace (2003) state that different finite scenario sets should consistently give results close to the perfect foresight case and that the optimal objective value should be approximately equal throughout the different scenario sets (Kaut and Wallace, 2003) . Critically, the scenarios must fit the given time horizon, i.e., for a day-ahead bidding schedule optimization, the stochastic problem requires scenarios that span the time frame between 00:00 am and 11:59 pm of the following day. Established methods for scenario generation often utilize univariate, i.e., step-by-step prediction, approaches like classical autoregressive models (Sharma et al., 2013) or autoregressive neural networks (Vagropoulos et al., 2016; Voss et al., 2018) . As opposed to univariate models, multivariate modeling techniques model a series of time steps in a single prediction step. This makes them particularly suitable for day-ahead operation problems as the multivariate predictions best capture the correlations throughout the day (Ziel and Weron, 2018) and can be set up to model the distribution of full 24 h trajectories directly. Prominent multivariate scenario generation models are Gaussian copulas Staid et al., 2017; Camal et al., 2019) as well as deep generative models like generative adversarial networks (GANs) (Chen et al., 2018b; Jiang et al., 2018; Wei et al., 2019) and variational autoencoders (VAEs) (Zhanga et al., 2018) . Despite their widespread application, the training success of both GANs and VAEs is sometimes poor and their loss functions are difficult to judge as they are not directly concerned with the quality of the generated data (Salimans et al., 2016; Borji, 2019) . Furthermore, GANs and VAEs often result in a mode collapse, i.e., the models converge to a single feasible scenario instead of describing the true probability distribution . Besides GANs and VAEs, normalizing flows are another type of deep generative model (Papamakarios et al., 2021) . The major advantage of normalizing flows is their training via direct log-likelihood maximization, which leads to interpretable loss functions and stable convergence (Rossi, 2018) . In prior works, normalizing flows performed well for scenario generation of residential loads Ge et al., 2020) as well as wind and photovoltaic electricity generation (Dumas et al., 2022; Cramer et al., 2022b) . Many authors argue that their scenario generation approach samples high-quality scenarios Chen et al., 2018b; . However, a connection of scenario generation to downstream applications in stochastic programming is missing in most contributions. Exceptions are Zhanga et al. (2018) and Wei et al. (2019) who both solve operational problems for wind-solar-hydro hybrid systems. However, their respective VAE and GAN are restricted to unconditional scenario generation, i.e., they sample unspecific scenarios without considering the day-ahead setting and without including forecasts or other available information. For a day-ahead bidding problem, this can potentially lead to suboptimal solutions based on an unrealistic scenario set containing many unlikely scenarios. Meanwhile, conditional scenario generation incorporates forecast and other available information to specifically tailor the scenarios to the following data, i.e., the conditional scenarios better describe the trends of the following day and maintain a lower spread. Examples of conditional scenario generation are the Gaussian copula approach by and the normalizing flow by Dumas et al. (2021) , where only Dumas et al. (2021) solve a stochastic optimization problem using quantiles derived from the conditional normalizing flow. Herein, we extend our previous work on normalizing flow-based scenario generation (Cramer et al., 2022b) to perform conditional scenario generation Dumas et al., 2021) of wind power generation with wind speed forecasts as conditional inputs, i.e., we use the wind speed forecast to generate day-ahead wind power generation scenarios that are specifically tailored to the given day. We then apply the generated scenarios in a stochastic day-ahead wind electricity producer bidding problem based on Garcia-Gonzalez et al. (2008) and Conejo et al. (2010) . We compare the results obtained using the normalizing flow scenarios with unconditional historical scenarios and two other multivariate conditional scenario generation approaches, namely, the well-established Gaussian copula and the recently very popular Wasserstein-GAN (Chen et al., 2018b) . Our analysis shows that all conditional scenario generation methods result in significantly more profitable decisions compared to the historical data, and that the profits obtained using the normalizing flow scenarios are closest to the perfect foresight solution. Unlike Wei et al. (2019) or Dumas et al. (2021) , we also perform a statistical investigation of the reliability of the scenarios based on the criteria formulated by Kaut and Wallace (2003) . In particular, we show that normalizing flows result in the lowest variance of objective values for very small scenario sets of only five scenarios. To our knowledge, this is the first work to investigate the reliability of different day-ahead scenario generation models for application in stochastic programming. The remainder of this work is organized as follows: Section 2 details the concept of conditional density modeling using normalizing flows. Then, Section 3 details the conditional day-ahead scenario generation method and reviews the input-output relation of normalizing flows, Gaussian copulas, and W-GANs. Section 4 draws a comparison of historical scenarios and scenarios generated using the three different methods based on the analysis outlined in Pinson and Girard (2012) and Cramer et al. (2022a) . Section 5 introduces the stochastic bidding problem and analyzes the obtained profits and the reliability of the different scenario sets. Finally, Section 6 concludes this work. 2 Conditional density estimation using normalizing flows Normalizing flows are data-driven, multivariate probability distribution models that use invertible neural networks T : R D → R D to describe a data probability density function (PDF) as a change of variables of a D-dimensional Gaussian distribution (Kobyzev et al., 2020; Papamakarios et al., 2021) : Here, x ∈ X ⊂ R D are samples of the data and z ∈ N (0 D , I D ) are the corresponding vectors in the Gaussian distribution with I D being the D-dimensional identity matrix. New data x is generated by drawing samples z from the known Gaussian distribution and transforming them via the forward transformation T (·). Since the transformation between the data and the Gaussian is a change of variables, the PDF of the data can be expressed explicitly via the inverse transformation T −1 (·) using the change of variables formula (Papamakarios et al., 2021) : Here, J T −1 is the Jacobian of the inverse transformation T −1 , and p X and φ are the PDFs of the data and the Gaussian, respectively. Intuitively, Equation (1) describes a projection of the data onto the Gaussian and a scaling of the distribution's volume to account for the constant probability mass. If the transformation T is a trainable function, the normalizing flow can be trained via direct log-likelihood maximization using the log of Equation (1). To describe a conditional PDF p X|Y (x|y) with conditional inputs y ∈ Y , i.e., the joint PDF of X and Y where the realization y ∈ Y is known, the transformation T and its inverse T −1 must accept the conditional information vector y in addition to the transformed variables z and x, respectively (Winkler et al., 2019 ): x = T (z, y) z = T −1 (x, y) If T remains differentiable for any fixed value of the conditional inputs y, the likelihood can still be described using the change of variables formula: In this work, we employ the real non-volume preserving transformation (RealNVP) , which is based on a composition of affine coupling layers. In each coupling layer, one half of the data vector undergoes an affine transformation, which is parameterized via functions of the remaining half of the data vector: x 1:D/2 =z 1:D/2 x D/2+1:D =z D/2+1:D exp(s(z 1:D/2 , y)) + t(z 1:D/2 , y) Winkler et al., 2019) , with conditioner models s I , t I , s II , and t II , Gaussian sample vector z, data sample vector x, intermediate sample vector z I , conditional input vector y. The indices 1 : D/2 and D/2 + 1 : D refer to the two halves of the data vectors, respectively. The dashed lines indicate the flow of the conditional input data y. Here, denotes element-wise multiplication, the indices 1 : D/2 and D/2 + 1 : D refer to the two halves of the data vectors, respectively, and s and t are the so-called conditioner models that are implemented as neural networks. The clever design of the affine coupling layer results in lower triangular Jacobians. Hence, the Jacobian determinant required for the likelihood computation is simply given by the product over the diagonal elements. The log-form Jacobian determinant used for training then is: s(z 1:D/2 , y) Large and highly expressive normalizing flows can be built using compositions of Equation (3) in an alternating manner. Figure 1 shows an illustrative sketch of an exemplary RealNVP architecture with two conditional affine coupling layers. In Cramer et al. (2022b) , we showed that normalizing flows sample uncharacteristically noisy scenarios when applied to sample for the distributions of renewable electricity time series, due to their inherent lower-dimensional manifold structure. To address the issue, we proposed dimensionality reduction based on the principal component analysis (PCA). In this work, we use PCA to reduce the dimensionality of the data x and the Gaussian samples z. The conditional input vectors y are not affected by the PCA. For more information on the effects of manifolds we refer to Brehmer and Cranmer (2020) This work addresses scenario generation with a particular focus on applications in day-ahead scheduling problems. Thus, all scenarios describe a possible realizations covering the time between 00:00 am and 11:59 pm of the following day. In particular, we generate day-ahead wind power generation scenarios and use day-ahead forecasts of wind speeds as conditional inputs to narrow down the range of possible trajectories and make the scenarios specific to the following day. For reference, we include a comparison to historical data, which represents unconditional scenarios, i.e., randomly drawn sampled from the full distribution p X (x) that does not consider the wind speed forecasts. Meanwhile, the scenario generation methods aim to fit models of the full conditional PDF p X|Y (x|y) that are valid for every possible wind power realization x ∈ X and every possible day-ahead wind speed forecast y ∈ Y . In the application, the wind speed predictions are known one day prior to the scheduling horizon and the scenario generation models are evaluated for fixed conditional inputs y. Normalizing flows, Gaussian copulas, and W-GANs all employ multivariate modeling approaches, i.e., the models generate full daily trajectories in a vector form Ziel and Weron, 2018; Chen et al., 2018b) . All models use multivariate Gaussian samples z and the wind speed forecast vectors, i.e., the conditional information y, as inputs to generate wind power generation scenario vectors x. For a given fixed wind speed forecast y = const., sampling and transforming multiple Gaussian samples z results in a set of wind power generation scenarios, i.e., the Gaussian acts as a source of randomness to generate sets of scenarios instead of point forecasts: Here, T (·) can be any of the scenario generation models. For more details on the evaluation of the different models, we refer to our supplementary material and the papers by and Chen et al. (2018b) . All models generate capacity factor scenarios, i.e., the actual production scaled to installed capacity, of the 50 Hertz transmission grid in the years 2016 to 2020 (Open power systems data, 2019). The year 2019 is set aside as a test set to avoid complications in the stochastic programming case study due to the unusual prices resulting from the COVID-19 pandemic (Michał Narajewski, 2020; Badesa et al., 2021) . To avoid including test data in the scenario sets, the unconditional historical scenarios are drawn from the training set. The 15 min recording interval renders 96-dimensional scenario vectors that fit the 24 h time horizon of a day-ahead bidding problem. The day-ahead wind speed forecasts have hourly resolution and are obtained from the reanalysis data set "Land Surface Forcings V5.12.4" of MERRA-2 (Global Modeling and Assimilation Office (GMAO), 2015) which is based on previously recorded historical data. We use the predictions at the coordinates 53.0 • N, 13.0 • E, in the center of the 50 Hertz region. Note that due to potential wind speed forecast errors and agglomeration effects in the power generation, there is no direct known functional relationship between the wind speed forecast and the realization of regionally distributed power generation. Due to numerically singular Jacobians and non-invertible transformations (Behrmann et al., 2021; Cramer et al., 2022b) , full-space normalizing flows fail to accurately describe the distribution of daily wind time series trajectories residing on lower-dimensional manifolds (Cramer et al., 2022b) . Therefore, we use PCA (Pearson, 1901) to reduce the data dimensionality following our recent contribution (Cramer et al., 2022b) . We select the number of principal components based on the explained variance ratio, i.e., the amount of information maintained by the PCA (Pearson, 1901) . For an explained variance ratio of 99.95%, we obtain 18 principal components to represent the original 96-dimensional scenario vectors. The adversarial training algorithm for the W-GAN did not converge consistently for the considered learning problem. Thus, the results presented below are drawn from the best performing model out of 20 different trained models w.r.t. the metrics outlined in Section 4. For more detailed information on the implementation, we refer to the supplementary material. We start by analyzing the scenarios without a specific application in mind. To this end, we present some examples, analyze the described distributions, and investigate whether the models can identify the correct daily trends. Figure 2 shows example scenarios for two randomly selected days of the test year 2019. The left, center-left, center-right, and right columns show historical scenarios and scenarios sampled from the conditional normalizing flow, the Gaussian copula, and the W-GAN, respectively. The historical scenarios are randomly selected from the training set and are, therefore, unspecific to the respective days. Thus, they fail to identify the daily trends and show large discrepancies for both days. For both example days in Figure 2 the normalizing flow identifies and follows the general trend of the realized wind capacity factor. For the presented examples, the realization lies within the span of the scenarios. Similarly, the Gaussian copula also identifies the trend of the realization. However, there are some scenarios with significantly higher or lower capacity factors in the case of both days, i.e., the Gaussian copula appears prone to sample outliers that do not follow the trend. The W-GAN-generated scenarios fail to identify the trend and, instead, appear tightly agglomerated and only represent the daily average of the realization, which can be observed for the morning hours of the first day and, to a lesser extend, on the afternoon hours of the second day. The failed identification of the trend is likely due to a mode collapse of the W-GAN, which is a frequently observed phenomenon with GANs . Mode collapse happens when the adversarial training algorithm converges to a small range of realistic scenarios but fails to identify the true distribution. Note that due to the multivariate modeling approach of generating full daily trajectories, this type of deviation may occur at any time step throughout the day. To gain insight into the quality of the full scenario sets, we analyze whether the scenario generation methods are able to reproduce the probability distributions and the frequency behavior of the actual time series by looking at the full year of 2019 in comparison to the eventual realization. To this end, we look into the marginal PDF (Parzen, 1962) , the quantile distribution in Q-Q plots (Chambers, 2018) , and the power spectral density (PSD) (Welch, 1967) . For a detailed introduction to the interpretation of PDF and PSD, we refer to our previous work (Cramer et al., 2022a) . Figure 3 shows the marginal PDFs (left), the Q-Q plots (center), and the PSD (right) of the historical-and the generated scenarios from the normalizing flow, the Gaussian copula, and the W-GAN in comparison to the realizations in 2019. In Figure 3 , the historical scenarios and the normalizing flow scenarios describe the test set PDF well and show good matches of the quantile distribution in the Q-Q-plot. Meanwhile, the Gaussian copula produces a broader PDF with a much lower peak than the realization, whereas the W-GAN's PDF shows a shift towards higher values. The Q-Q-plot also shows the shift of the W-GAN generated distribution, as there is an offset between the W-GANs and the other quantile lines. The poor distribution match by the Gaussian copula is likely due to the linear quantile regression that is unable to represent the nonlinear relation between the predicted wind speed and the capacity factor. Furthermore, the copula relies on linear interpolation of quantiles which can inflate the PDF in the tails and, thus, lead to outlier sampling . The W-GAN can theoretically model any distribution (Goodfellow et al., 2014) . In our analysis, however, the adversarial training algorithm was very difficult to handle with the time series data and often resulted in poor fits. The presented results are the best of 20 training runs in terms of matching the criteria in Figure 3 . Meanwhile, both the Gaussian copula and the normalizing flow with PCA converge consistently and typically yield the presented results after the first training attempt. The Q-Q-plot reveals that all methods yield distributions with longer tails than the realizations, i.e., 3 . Distribution and fluctuational behavior of generated wind capacity factor scenarios from historical training data ("Historical"), normalizing flow ("Normalizing Flow"), Gaussian copula ("Copula"), and W-GAN ("W-GAN") in comparison to historical test data ("Realization") (Open power systems data, 2019). Left: marginal probability density function (PDF) estimated using kernel density estimation (Parzen, 1962) , center: quantile-quantile plots (Q-Q-plots) (Chambers, 2018) , right: power spectral density (PSD) estimated using Welch's method (Welch, 1967) . they produce scenarios with higher capacity factors than the maximum realized capacity factor. The reason is that for days with the highest capacity factor of the year, even higher capacity factors are still feasible as the realizations never reach the full installed capacity. Also, the log tail of the PDF makes the offset appear inflated as it only occurs for the 99-th and 100-th percentile. Note that both Copula and W-GAN are restricted to sample from the [0,1] interval via the boundaries of the inverse CDF ) and the tanh output activation function, respectively. Meanwhile, the normalizing flow has no such restriction and yields some scenarios surpassing 1, which leads to the normalizing flow having the strongest deviation in the Q-Q-plot. Although these scenarios are theoretically infeasible, they have a very low probability and can efficiently be removed in postprocessing. The PSD in Figure 3 shows a good match of the frequency behavior by the historical scenarios, the normalizing flow, and the Gaussian copula. The W-GAN is close to the overall power-law, i.e., the slope of the PDF curve, of the data, but fails to match the exact frequency behavior. In addition to the analysis of the full scenario sets in Figure 3 , we also compute the energy score (ES) for each day in 2019. ES is a quantitative measure for the assessment of multivariate scenario generation models that compares the conditional scenario set with the respective realization (Gneiting et al., 2008; Pinson and Girard, 2012) : Here, x is the realization vector,x s are the scenario vectors, N S is the number of scenarios, and || · || 2 is the 2-norm. The energy score is a negatively oriented score, i.e., lower values indicate better results. The two parts of the energy score reward closeness to the realization and diversity of the scenario set, respectively. In Figure 4 , we display the energy score for each day in 2019 as well as boxplots that showcase the overall energy score distributions for the historical data and the three different models. The normalizing flow energy score is lower on average compared to the Gaussian copula and the W-GAN, indicating a better fitting of the realizations and more diverse scenarios. The Gaussian copula shows the highest energy score which is likely a result of the outliers observed in Figure 2 . Furthermore, the normalizing flow leads to the lowest spread of energy scores, indicating consistently good results. Meanwhile, the historical scenarios consistently result in significantly higher energy scores compared to the conditional day-ahead scenario generation methods. By design, the unconditional historical scenarios do not identify the daily trends and are not generated specifically for the respective days. Thus, the deviations from the realizations penalized by the energy score are significant for most days. . Historical-("Historical") and generated scenarios from normalizing flow ("Normalizing Flow"), Gaussian copula ("Copula"), and Wasserstein-GAN ("W-GAN"). Boxes indicate quartiles and diamonds indicate outliers (Waskom, 2021) . Note different y-scale for historical ES. In conclusion, we find that the conditional normalizing flow presented in Section 2 generates scenarios that match the true distribution of realizations closely, while also providing a diverse set of possible realizations. Furthermore, the normalizing flow outperforms the Gaussian copula and the W-GAN with respect to all important metrics. The historical scenarios describe the overall distribution well, but are not specific to the individual days and, hence, return poor results in day-ahead problem-specific metrics like the energy score. We apply the scenarios generated in the previous section in a wind producer bidding problem based on Garcia-Gonzalez et al. (2008) and Conejo et al. (2010) . We first state the problem formulation and then analyze the obtained profits based on the different scenario sets. Finally, we investigate the reliability of small scenario sets based on the criteria defined by Kaut and Wallace (2003) . We consider the deterministic equivalent formulation (Birge and Louveaux, 2011) of the stochastic wind producer problem from Garcia-Gonzalez et al. (2008) and Conejo et al. (2010) shown in Figure 5 that aims to find an optimal bidding schedule for the operator of a wind farm participating in the European Power Exchange (EPEX SPOT) market (European Power Exchange, 2021). First, the operator places bids at the day-ahead auction market one day prior to delivery and, thereby, commits to deliver a certain amount of electricity P D t during the given trading time interval t. The revenue made is given by λ D t P D t δ h , where λ D t is the day-ahead price and δ h = 1 h is the trading interval. As wind electricity generation is stochastic and non-dispatchable (Conejo et al., 2010) , the placed bids may not always be met by the actual production. To balance the difference between the placed bids and the actual production we allow for a small electricity storage that can store the electricity of up to 15 min of maximum production. For any remaining production imbalance, we enforce a penalty on the absolute value of the imbalance (Garcia-Gonzalez et al., 2008) . The full objective then reads (Garcia-Gonzalez et al., 2008) : Conejo et al. (2010) with generated electricity P s,t,q , (dis-) charging rates P in s,t,q and P out s,t,q , placed bids P D t , and day-ahead electricity prices λ D t . The indices s, t, and q indicate scenarios, hourly time intervals, and quaterhourly time intervals, respectively. Here, ∆ t,s is the imbalance at time point t and scenario s. The penalty term is based on the absolute values of the day-ahead price |λ D t | to compensate for possible negative electricity prices (Garcia-Gonzalez et al., 2008) . The penalty term in Equation (4) contains the absolute value operator | · |, leading to a nonlinear problem. However, any positive deviation can be avoided via curtailment of the plant and the imbalance will only take negative values in practice, which makes the absolute value operator obsolete. Thus, the absolute imbalance is substituted by its negative parts to obtain a linear problem (Conejo et al., 2010) . The complete linear formulation of the wind producer market participation problem including the electricity storage is shown in Problem (WP). max P D t ,P in s,t,q ,P out s.t. SOC s,t,q = SOC s,t,q−1 + ηδ q P in s,t,q − 1 η δ q P out s,t,q , ∀s ∈ S, ∀t ∈ T , ∀q ∈ Q SOC s,t=24,q=4 = SOC 0 , ∀s ∈ S ∆ − s,t ≤ δ h P D t − δ q q∈Q P s,t,q − P out s,t,q − P in s,t,q , ∀s ∈ S, ∀t ∈ T 0 ≤ P D t ≤ P D,max , ∀t ∈ T 0 ≤ ∆ − s,t , ∀s ∈ S, ∀t ∈ T 0 ≤ P in s,t,q ≤ P max , ∀s ∈ S, ∀t ∈ T , ∀q ∈ Q 0 ≤ P out s,t,q ≤ P max , ∀s ∈ S, ∀t ∈ T , ∀q ∈ Q 0 ≤ SOC s,t,q ≤ SOC max , ∀s ∈ S, ∀t ∈ T , ∀q ∈ Q Tables 1, 2, and 3 list the indices, parameters, and variables of Problem (WP), respectively. The problem is implemented in pyomo (Hart et al., 2017) , version 6.2, and solved using gurobi (Gurobi Optimization, LLC, 2021), version 9.5. Note that in Problem (WP), simultaneous charging and discharging of the storage is feasible, however, does not occur in the optimum due to the losses associated with using the storage. The problem operates on both the trading time scale with hourly intervals and the production time scales with 15 min intervals. 6 . Boxplot of profits obtained in 2019 in Problem (WP). Each problem uses 100 scenarios generated from the historical data ("Historical"), the normalizing flow ("Normalizing Flow"), the Gaussian copula ("Copula"), and the W-GAN ("W-GAN") or the realization for the perfect foresight ("Realization"), respectively. Boxes indicate quartiles and diamonds indicate outliers (Waskom, 2021) . Tab. 4. Average, average percentage, and maximum expected value of perfect information (EVPI) (Birge and Louveaux, 2011 ) of actual profits obtained over all days in 2019 with 100 scenarios each generated from normalizing flow, Gaussian copula, and W-GAN, respectively. Best results are marked in bold font. Solving Problem (WP) yields a fixed schedule of electricity delivery commitments for the day-ahead market P D t . By fixing P D t and solving Problem (WP) with the realized electricity production instead of the scenarios, we can compute the actual profits. To gain statistically relevant results, we solve Problem (WP) for each day in 2019, each time using 100 historical or generated scenarios. Figure 6 shows box plots of the distribution of profits obtained by using scenarios from the historical data and the three different generation methods in comparison to the perfect foresight problem (Realization). The profits obtained by using the normalizing flow scenarios are the highest, while the Gaussian copula scenarios yield profits between the normalizing flow and the W-GAN. The profits obtained by using the unconditional historical scenarios are significantly lower than those of all models generating day-ahead scenarios. Table 4 lists the average and maximum expected value of perfect information (EVPI) (Birge and Louveaux, 2011) , i.e., the difference between the scenario-based profit and the profit with perfect foresight, in 2019. The average EVPIs show that the normalizing flow scenarios yield solutions that are on average 6% and 12% points closer to the perfect foresight profit compared to the Gaussian copula and the W-GAN, respectively. Meanwhile, the historical scenario profits are over 80% lower on average than the perfect foresight solutions. The maximum EVPI, i.e., the worst performing days, also show that the normalizing flow scenarios give significantly more profitable results compared to the other generation methods. The higher profits obtained using the normalizing flow scenarios reflect the findings of Section 4, i.e., the normalizing flow identifies the correct trends and also reflects a diverse distribution. Meanwhile, the Gaussian copula shows outliers and does not match the distribution, and the W-GAN even struggles to identify the daily trends. The unconditional historical scenarios do not describe the daily trends and, thus, result in significantly lower profits in the day-ahead scheduling optimization. Hence, the results shown in Figure 6 and Table 4 confirm that the normalizing flow generates the best scenarios and yields the most profitable bids. Historical Normalizing Flow Copula W-GAN Realization Fig. 7 . Boxplot of the objective function, i.e., expected profit, distributions for 50 iterations with five scenarios from the historical data set ("Historical"), normalizing flow ("Normalizing Flow"), Gaussian copula ("Copula"), and W-GAN ("W-GAN"), respectively. Boxes indicate quartiles and diamonds indicate outliers (Waskom, 2021) . The perfect foresight objective ("Realization") is depicted as a black bar, i.e., shows zero variance, as there is only one realization. Next, we analyze the reliability of the scenarios generated using the three different methods based on the criteria defined by Kaut and Wallace (2003) , i.e., we consider reliability to indicate whether different small scenario sets consistently yield objectives close to the perfect foresight solution. In particular, we compare the spread of the objective values and the average EVPIs for small scenario sets. The reliability is particularly important, as larger and more complex problems often cannot be solved for a large number of scenarios due to the increasing computational complexity (Birge and Louveaux, 2011; Kaut and Wallace, 2003) . To analyze the reliability of the scenario generation methods, we solve Problem (WP) for each day of the year and each scenario generation method 50 times using small scenario sets of only five scenarios, each. Figure 7 shows box plots of the objectives using small scenario sets from the historical data set, normalizing flow, Gaussian copula, and W-GAN, as well as the perfect foresight objective (Realization) for ten randomly selected days from 2019. Note that Figure 7 shows the expected profit as opposed to the actual profits shown in the previous section. As an indicator of reliability, we look at the range of objectives, i.e., the height of the box plots, that results from the 50 different scenario sets. For the ten randomly selected days shown in Figure 7 , the normalizing flow scenarios result in the lowest spreads. The Gaussian copula scenarios show significantly larger spreads than both the normalizing flow and the W-GAN scenarios. Meanwhile, the randomly selected historical scenario sets lead to by far the largest spreads. It appears that the outlier scenarios of the Gaussian copula observed in Figure 2 are weighed more significant in the case of only a few scenarios. As the normalizing flow shows no extreme outliers and identifies the overall trends well, the distributions described by small scenario sets are closer to the true distribution. The normalizing flow shows some outliers in Figure 7 , however, for the ten presented days, these are few and typically smaller than the spread of the Gaussian copula and W-GAN objectives. Table 5 shows statistics derived over all days in 2019, namely, the average standard deviation, the max-min spread, i.e., the difference between the maximum and minimum objective value, and the average EVPI of the objective. The results in Table 5 confirm the observation from Figure 7 that normalizing flows yield the most reliable scenarios with the lowest standard deviation and lowest spread. The average EVPI shows that the normalizing flow is consistently closest to the perfect foresight objective. In conclusion, the normalizing flow yields the most reliable decisions among the three considered methods. Tab. 5. Average standard deviation (StD), average max-min spread (Spread), and average expected value of perfect information (EVPI) (Birge and Louveaux, 2011) The present work considers the scenario generation problem for a day-ahead bidding problem of a wind farm operator to participate in the EPEX spot markets. We utilize a data-driven multivariate scenario generation scheme based on conditional normalizing flows to model the distribution of wind capacity factor trajectories with wind speed predictions as conditional inputs. The generated scenarios are specifically tailored to stochastic optimization problems concerning the time frame between 00:00 am and 11:59 pm. We analyze the normalizing flow scenarios in comparison to randomly selected historical data and scenarios generated from other more established methods, namely, Gaussian copulas and Wasserstein generative adversarial networks (W-GANs), and compare them to the actually realized power generation in 2019. The historical scenarios reflect the overall distribution of realizations well, but fail to identify daily trends and show large spreads independent of the investigated day. Among the conditional scenario generation methods, the normalizing flow scenarios best reflect the realized power generation trends and their distributions while also displaying a diverse set of possible realizations. Meanwhile, the Gaussian copula results in uncharacteristic outliers and the W-GAN struggles to identify the main trends of the realizations. Furthermore, both Gaussian copula and W-GAN result in skewed distributions. To assess their value for stochastic programming, i.e., whether they lead to profitable and reliable decisions, the scenarios are applied in a stochastic programming case study that aims to set bids for electricity sales on the day-ahead market. The analysis of the results of all days of 2019 shows that there is a significant advantage of using day-ahead scenarios that are specifically tailored to the investigated day. Using randomly selected historical scenarios results in an average expected value of perfect information (EVPI) of over 80% while the conditional scenario generation methods return EVPIs between 10 to 23%. The bids placed using the normalizing flow scenarios obtain the highest profits and have the lowest EVPI, i.e., the solutions are closest to the perfect foresight profits. Furthermore, we showed how normalizing flows yield reliable scenarios that result in consistent solutions for small scenario sets. In particular, the normalizing flow scenarios result in the smallest standard deviation, the smallest spread, and the lowest EVPI in the objective values. In conclusion, utilizing conditional, day-specific scenarios in day-ahead scheduling problems leads to significantly more profitable and reliable decisions compared to relying on unconditional historical data. Furthermore, the conditional normalizing flow model yields high-quality scenarios that result in highly profitable and reliable solutions for stochastic programs, in particular, for small scenario sets. Therefore, we argue that normalizing flow scenarios have a high potential for scheduling problems that cannot be solved with a large scenario set. Wind speed [-] q0.0 q0.1 q0.2 q0.3 q0.4 q0.5 q0.6 q0.7 q0.8 q0.9 q1.0 Both normalizing flow and W-GAN use both Gaussian samples and conditional inputs as direct inputs to their respective ANN structures (Chen et al., 2018; . The Gaussian copula uses linear quantile-regression based on the conditional inputs to estimate the inverse cumulative distribution function (CDF), which is then used to transform the Gaussian samples . For more information on scenario generation using Gaussian copulas and W-GANs, we refer to and Chen et al. (2018) , respectively. Due to the significant dimensionality reduction from 96 to 18 dimensions, a small RealNVP model is sufficient. The employed model uses four affine coupling layers with fully connected conditioner models with 2 hidden layers with 9 neurons each. The Gaussian copula was implemented using the linear quantile regression in Skipper Seabold and Josef Perktold (2010) and the required inverse CDF is estimated using linear interpolation with 20 intervals. The model structures of the generator and * Manuel Dahmen, Forschungszentrum Jülich GmbH, Institute of Energy and Climate Research, Energy Systems Engineering (IEK-10), Jülich 52425, Germany E-mail: m.dahmen@fz-juelich.de Tab. 1. Layers of generator and critic for W-GAN . Attributes of layer types are: Linear (fully connected): Number of nodes, Conv (1D convolutional layer): (Number of filters, filter size, strides, padding), Conv-T (1D convolutional layer transpose): (Number of filters, filter size, strides, padding), and Reshape: (output dim 1, output dim 2, . . . ). The W-GAN is trained for 2000 epochs and the sampling dimensionality is 20. (Abadi and Agarwal, 2015) . The PCA is computed using the scikit-learn library (Pedregosa et al., 2011) . Towards principled methods for training generative adversarial networks Ancillary services in Great Britain during the COVID-19 lockdown: A glimpse of the carbon-free future Understanding and mitigating exploding inverses in invertible neural networks Introduction to stochastic programming Pros and cons of GAN evaluation measures Flows for simultaneous manifold learning and density estimation Scenario generation of aggregated wind, photovoltaics and small hydro production for power systems applications Graphical methods for data analysis Advances in clean and low-carbon power generation planning Model-free renewable scenario generation using generative adversarial networks Decision Making Under Uncertainty in Electricity Markets Validation methods for energy time series scenarios from deep generative models Principal component density estimation for scenario generation using normalizing flows Density estimation using Real NVP A probabilistic forecast-driven strategy for a risk-aware participation in the capacity firming market A deep generative model for probabilistic energy forecasting in power systems: Normalizing flows EPEX SPOT documentation Stochastic joint optimization of wind generation and pumped-storage units in an electricity market Modeling daily load profiles of distribution network for scenario generation using flow-based generative network MERRA-2 inst1_2d_lfo_Nx: 2d, 1-Hourly, Instantaneous, Single-Level, Assimilation, Land Surface Forcings Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds Generative adversarial nets Advanced Optimization for Process Systems Engineering. Cambridge Series in Chemical Engineering Optimum design of chemical plants with uncertain parameters Gurobi Optimizer Reference Manual Optimal process design under uncertainty Pyomo-optimization modeling in python Scenario generation for wind power using improved generative adversarial networks Evaluation of scenario-generation methods for stochastic programming Normalizing flows: An introduction and review of current methods Stochastic short-term integrated electricity procurement and production scheduling for a large consumer Bidding strategy for microgrid in day-ahead market based on hybrid stochastic/robust optimization Electricity markets around the world Changes in electricity demand pattern in europe due to COVID-19 shutdowns Challenges in process optimization for new feedstocks and energy sources Integrating Renewables in Electricity Markets: Operational Problems Open power systems data Normalizing flows for probabilistic modeling and inference On estimation of a probability density function and mode On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine Impacts of intermittent renewables on electricity generation system operation Evaluating the quality of scenarios of short-term wind power generation From probabilistic forecasts to statistical scenarios of short-term wind power production. Wind Energy: An International Journal for Progress and Applications in Wind Power Conversion Technology Novel approach for optimal process design under uncertainty Mathematical statistics: An introduction to likelihood based inference Optimization under uncertainty: state-of-the-art and opportunities Improved techniques for training GANs Waveletbased grid-adaptation for nonlinear scheduling subject to time-variable electricity prices Model-based bidding strategies on the primary balancing market for energy-intense processes Wind power scenario generation and reduction in stochastic programming framework Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators ANNbased scenario generation methodology for stochastic variables of electric power systems Residential short-term load forecasting using convolutional neural networks Seaborn: statistical data visualization Short-term optimal operation of hydro-wind-solar hybrid system with improved generative adversarial networks The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms Learning likelihoods with conditional normalizing flows Scenario forecasting of residential load profiles Risk-based integrated production scheduling and electricity procurement for continuous power-intensive processes Enterprise-wide optimization for industrial demand side management: Fundamentals, advances, and perspectives. Chemical Engineering Research and Design Optimized operation of cascade reservoirs considering complementary characteristics between wind and photovoltaic based on variational autoencoder Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks TensorFlow: Large-scale machine learning on heterogeneous systems Model-free renewable scenario generation using generative adversarial networks Density estimation using Real NVP Scikit-learn: Machine learning in Python From probabilistic forecasts to statistical scenarios of short-term wind power production. Wind Energy: An International Journal for Progress and Applications in Wind Power Conversion Technology Statsmodels: Econometric and Statistical Modeling with Python Scenario forecasting of residential load profiles We would like to thank Marcus Voß (Technical University of Berlin, Distributed Artificial Intelligence Laboratory) for his valuable input on Copula methods, scenario evaluation, and the supervision of L. Paeleke. This work was performed as part of the Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE) and received funding from the Helmholtz Association of German Research Centres.