Title: Artificial Intelligence and Design of Experiments for Assessing Security of Electricity Supply: A Review and Strategic Outlook
Authors: Priesmann, Jan; Munch, Justin; Ridha, Elias; Spiegel, Thomas; Reich, Marius; Adam, Mario; Nolting, Lars; Praktiknjo, Aaron
Date: 2021-12-07

Abstract: Assessing the effects of the energy transition and the liberalization of energy markets on resource adequacy is an increasingly important and demanding task. The rising complexity of energy systems requires adequate methods for energy system modeling and leads to increased computational requirements. Furthermore, as complexity rises, so does uncertainty, which calls for probabilistic assessments and scenario analyses. To address these requirements adequately and efficiently, new methods from the field of data science are needed to accelerate current approaches. With our systematic literature review, we aim to close the gap between the three disciplines (1) assessment of security of electricity supply, (2) artificial intelligence, and (3) design of experiments. To this end, we conduct a large-scale quantitative review of selected fields of application and methods and provide a synthesis that relates the different disciplines to each other. Among other findings, we identify metamodeling of complex security-of-electricity-supply models using AI methods, as well as AI-based forecasts of storage dispatch and (non-)availabilities, as promising fields of application that have not yet been sufficiently covered. We conclude by deriving a new methodological pipeline for adequately and efficiently addressing the present and upcoming challenges in the assessment of security of electricity supply.

Assessing the security of electricity supply is an increasingly important and demanding task. In particular, depicting the effects of the energy transition and the liberalization of energy markets on resource adequacy is becoming more relevant and challenging. This is mainly due to the rising complexity of energy systems [1], which calls for efficient and adequate methods for energy system modeling (see e.g. [2] for the case of optimization models). As a direct reaction, the European Agency for the Cooperation of Energy Regulators (ACER) has proposed a comprehensive set of requirements that assessments of resource adequacy in the European context should fulfill: the so-called Methodology for the European resource adequacy assessment (ERAA methodology [3]). While that document formulates high standards for assessments of security of electricity supply, their implementation in practice comes with several challenges. These include the following:
- conducting prognoses of electricity demand for all countries that are part of the European interconnected grid in future scenarios,
- forecasting unavailabilities of power plant units, accounting for common-mode events and temporal linkages,
- simulating international power flows,
- depicting storage dispatch,
- accounting for climate change in the weather models,
- simulating market mechanisms that create incentives for (dis)investments in electricity assets, and
- appropriately representing uncertainties in the aforementioned areas.
These challenges can be summarized in three key points: (1) improving the availability and quality of input data, (2) improving data forecasts, and (3) reducing the computational complexity of models so that the additional requirements listed above can be incorporated into the probabilistic assessment models. Probabilistic models for the assessment of supply security that tackle the aforementioned challenges turn out to be computationally complex (i.e., they require a large amount of computational resources such as core-hours or random access memory (RAM)). The detailed analysis of different future scenarios is therefore limited by the available hardware and computing time; hence, only a few scenarios can be evaluated adequately. To reduce model complexity, the reduction of input data [9], the reduction of depicted systemic complexity [10], and metamodeling approaches can be applied. For the case of metamodeling, Nolting et al. [11] showed that approaches from the fields of artificial intelligence (AI) and design of experiments (DOE) in particular seem promising for mapping the relationships between model input variables and model results without encountering limitations in terms of available computing resources. However, the overall potential of AI-based methods in the context of the assessment of security of electricity supply in systems with high shares of renewable energy sources (RES) has not yet been systematically evaluated. Hence, the goals of this review are (1) to identify relevant methods and algorithms from the fields of AI and DOE, (2) to associate potential fields of application, and (3) to synthesize the findings and provide a strategic outlook on how to beneficially embed AI-based methods within the assessment of security of electricity supply. By reviewing existing approaches and providing an outlook on their potential to enhance resource adequacy assessments, we substantially contribute to the existing body of literature. The process of our systematic review is illustrated in Figure 2. We start with the conceptualization of the review by defining and filling the two dimensions (1) fields of application and (2) AI-based methods. These two dimensions are based on previous works such as [11, 12]. We then conduct a first qualitative literature review to identify keywords and assign them to the categories defined in the first step. Using these keywords, we conduct a large-scale literature review using the Scopus database, its application programming interface (API), and the package pybliometrics [13]. After generating our article database, we conduct a second qualitative literature review and identify patterns in model specifications and trends in the use of AI-based methods. Finally, we summarize our findings across all fields of application and AI-based methods.
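As an illustration of the large-scale query step described above, the following minimal sketch shows how such a keyword search could be run against the Scopus database with pybliometrics, assuming a configured Scopus API key. The query string and field restrictions are placeholders, not the exact search strings used in this review.

```python
# Minimal sketch of a keyword-based Scopus query using pybliometrics.
# Assumes a configured Scopus API key; the query below is a placeholder,
# not the exact search string used in this review.
from pybliometrics.scopus import ScopusSearch

query = (
    'TITLE-ABS-KEY("load forecasting" AND "neural network") '
    'AND SUBJAREA(ENER) AND PUBYEAR > 2009'
)
search = ScopusSearch(query, subscriber=False)  # downloads and caches the results

print(f"Number of retrieved documents: {search.get_results_size()}")
for doc in (search.results or [])[:5]:
    print(doc.coverDate, doc.title)
```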
The remainder of this manuscript is structured as follows: In section 2, we provide a brief overview of methods to assess security of electricity supply. In section 3, we review existing AI methods. The metamodeling approach is introduced in section 4. Section 5 is dedicated to identifying fields of application for AI-based forecasts. In section 6, we demonstrate metamodeling to be another promising field of application and show its linkage to DOE. We conclude in section 7 by summarizing our findings and deriving their implications for the applicability of AI and DOE for assessments of security of electricity supply. While security of electricity supply covers multiple dimensions and there is a broad range of definitions (see [14]), we focus on resource adequacy in the sense of an ex ante evaluation of the energy system's ability to cover the electricity load. For this ex ante perspective, two general approaches can be distinguished: On the one hand, there are rather straightforward deterministic capacity balances between secured feed-in power and electricity load during the hour of peak load. On the other hand, complex probabilistic simulations in hourly resolution are used to determine key figures of supply security under consideration of stochastic influences on (1) the availability of fossil power plant blocks, (2) the fluctuating feed-in of renewables, and (3) the electricity load. Both approaches have been applied in various studies by consulting companies, research institutions, and transmission system operators (TSOs) in different contexts. Table 1 provides an overview of existing studies, the methods used, and the core results achieved. Figure 3 summarizes the essential characteristics and common implementations of the two model classes. It can be seen that deterministic capacity balances represent rather straightforward, top-down models used to derive non-probabilistic key figures such as capacity margins. They are usually conducted for one hour per year (i.e., the hour with the highest electricity load) and consider only one (often worst-case) weather situation. Probabilistic simulation models, in contrast, represent rather complex, bottom-up models that are used to calculate stochastic key figures such as the expected loss of load duration per year. They are commonly run in hourly resolution and reflect different weather situations (so-called historic weather years). From the sheer number of studies, the range of different and often opposing key findings, and the heterogeneity of the authors and principals, it can be concluded that there is a considerable need to provide a scientific basis for sound assessments of security of electricity supply. Further, the band of uncertainty in the results that comes with the different input data calls for assessing a larger variety of scenarios to depict possible future developments. We hence transfer methods from the fields of AI and DOE to contribute to more sound assessments of security of electricity supply. Although there is no commonly agreed-upon definition of the term "artificial intelligence", it is typically used to describe behavior exhibited by computers that was initially thought to require (human) intelligence [33]. There is, however, a consensus on the distinction between strong or general AI, which mirrors the capabilities of intelligence as a whole, on the one side, and weak or narrow AI, which is developed to solve specific problems, on the other side [34]. The methods explained in this paper are problem-specific approaches that fall under the term narrow AI. Moreover, they are also part of the realm of machine learning, a subfield of AI concerned with learning statistical relationships. Machine learning can thus also be regarded as a subfield of statistics.
Contrary to other statistical approaches, the exact nature of the statistical relationship (e.g., a functional relationship) between the input data and the output data is not explicitly defined but rather implicitly inferred by the machine learning model itself. Data is typically present as a number of samples or observations which have values in certain features [35]. If the data is imagined as a table, the columns typically denote the features, while the rows are the observations. Often, the term "feature space" is used when talking about data in a machine learning context. This stems from regarding the features as the axes of an n-dimensional space in which every observation is one point. Table 2 gives an example of features and observations for time series: The columns contain the electricity generated by various conversion technologies in Germany on 1 June 2021, with each column corresponding to one feature. The observations are denoted through the time axis. In supervised learning, the data itself contains a set of variables to be explained, the so-called labels, and data that is used as explanatory input, the so-called features. The input data and labels are fed into the model, which then learns the relationship between the two [37]. This is referred to as training. A part of the data is usually withheld from the model during training in order to test whether it generalizes well, i.e., whether it correctly predicts labels for data it has not been trained on. It is then possible to predict labels for input data for which no label is available (e.g., future values of a time series). Supervised learning can be further divided into regression, where the label is a metric variable, and classification, where the label is one of several discrete classes [36]. In unsupervised learning, no labels exist and, therefore, problem-specific measures are used to evaluate the model's quality. For example, clustering is the unsupervised counterpart to classification: In clustering, no classes are known a priori and the model creates its own classes, which are called clusters. Finally, reinforcement learning works with a cost function that defines rewards and penalties. This approach is particularly useful when it is impossible to cover all possible system states during training, such as when teaching machines to play games or in autonomous driving [37]. With regard to energy system modeling, AI methods are broadly used in three fashions: preprocessing of relevant input data, forecasting of time series, and metamodeling of energy system models. In the first case, an AI method (or model) prepares data for other models, while in the second case, an AI method is used to forecast relevant data that might be fed into further energy system models or analyzed directly. The third case refers to using a conventional energy system model to generate data for training the AI model. That is, the AI model is used to mimic the conventional model's behavior and to allow a broader scope of scenarios to be investigated (see [11]). The following section 3.1 provides an overview of AI-based methods used for forecasting, while section 3.2 gives an overview of a selection of AI-based methods for data reduction in the context of energy system modeling. Both subsections are not to be understood as comprehensive reviews of all available AI-based methods, but rather as guides to models and methods that are of particular interest to energy system modelers looking to integrate AI into their research.
In section 3.3, methods for evaluating the accuracy of AI-based models are presented. Detailed applications of forecasting methods and AI-based metamodels will be discussed in sections 4 and 5. The methods presented in the following belong to the field of supervised learning, i.e., they are concerned with learning the relationship between input features and a label to be predicted. The methods can be applied for data consolidation, forecasting, and metamodeling. Data consolidation comprises the handling of data gaps or of multiple data sources that need to be merged. Forecasting refers to extrapolating historical information (e.g., on electricity load or renewable feed-in) into the future. Metamodeling simulates the behavior of system models by learning the relationship between the model input and output data (for more information on metamodeling, see sections 4 and 6). Artificial neural networks (ANNs) are versatile and powerful tools for forecasting that are used for a multitude of applications, including image recognition, natural language processing, and time-series forecasting. The mathematical concept of ANNs is inspired by the human brain: Like a biological brain, a neural network consists of neurons that exchange information [38]. Like a biological neuron, an artificial neuron receives inputs from other neurons, which are weighted and summed. The resulting sum is then passed through an activation function (e.g., a sigmoid function). The output of the activation function constitutes the neuron's output, which is in turn part of the next neuron's inputs [39]. Figure 4 shows the structure of a basic neural network. An ANN consists of several layers, each layer in turn comprising one or multiple neurons [40]. All ANNs have an input layer, which receives the inputs fed into the model, and an output layer, which contains the model's output. In addition, more layers may be added between the input and the output layer. These layers are called hidden layers, and a model including hidden layers is called a deep learning model [39]. In a so-called densely connected ANN, all neurons within one layer are connected to all neurons in the preceding and the following layer. These networks constitute the basic form of neural networks and are called feed-forward neural networks (FFNNs) or multilayer perceptrons [38]. Other network architectures, which are particularly well-suited for specific purposes, have been developed and will be briefly described next. Convolutional neural networks: Convolutional neural networks (CNNs) are a type of neural network particularly suited for dealing with data that is inherently structured in a grid-like fashion [41]. In general applications, CNNs are best known for their successful use in image recognition and image classification [40]. In energy system modeling, they are used, for example, as tools for detecting and classifying power quality disturbances (e.g., [42,43]), for preparing data for other models (e.g., [44,45]), or for feed-in and load forecasting (e.g., [42,46]). CNNs make use of convolution, a mathematical operation in which one function is weighted and summed (averaged) using a second function. In machine learning, convolution is performed not on continuous functions but on discrete data, which allows it to be expressed as a matrix multiplication [38]. During convolution, a kernel is passed over the input data; the kernel can be understood as a set of weights with which every entry in the input data and its surrounding entries are weighted and then summed.
After the convolution, an activation function (as in fully connected, linear ANNs) is applied and the data is pooled. Pooling maps the input data to a reduced output. Support vector machines (SVMs) are a method mainly used for binary classification. The principle derives from the idea of creating a hyperplane that divides the feature space into two areas, each of which contains the observations belonging to one class [54]. The hyperplane in an n-dimensional space is always (n-1)-dimensional. For example, in a two-dimensional space, the hyperplane is a line dividing the space into two areas, with each class lying on one side of the line [55]. Support vector machines are an extension of the hyperplane approach that remedies some of its drawbacks, such as the inability of a linear hyperplane to correctly depict a non-linear class boundary. The Gaussian process is a stochastic process in which every collection of random variables is assumed to follow a multivariate normal distribution. As a multivariate normal distribution is defined by a mean vector and a covariance matrix, a Gaussian process is defined by a mean and a covariance function. Generally speaking, the covariance function describes the similarity between the random variables and defines the smoothness of the function [60]. When Gaussian process regression (GPR) is applied to supervised learning problems, it provides a distribution over functions, inherently incorporating uncertainty. GPR can be used for a variety of tasks in a supervised learning setting, for example, anomaly detection [61] or as a basis for model predictive control [62]. Due to computational limitations, GPRs are generally used on small to medium-sized data sets, but recent work is exploring ways of handling big data problems [63]. Another deep learning methodology is the so-called transformer (a.k.a. X-former), which was presented at the Neural Information Processing Systems conference in 2017 (Vaswani et al., 2017). Transformers are mainly used in natural language processing, computer vision, speech processing, and audio processing, and they rely on the mechanism of "attention". In these areas, for example in aspect-level sentiment classification, attention mechanisms have also been added prior to LSTMs (AT-LSTMs) to achieve better results (Wang et al., 2016). For various use cases, transformer-based pre-trained models (Qiu et al., 2020) can achieve state-of-the-art results, which is why they are preferred especially in the field of natural language processing (Lin et al., 2021). The transfer of the methodology to time series forecasting is discussed in [64]. Here, transformers are used to predict synthetic and real-world datasets (electricity and traffic). In this work, two weaknesses of transformers were uncovered when they are used to predict time series. First, direct modeling of long time series is not feasible because the space complexity of the canonical transformer grows quadratically with sequence length (leading to a memory bottleneck). Second, there is a susceptibility to anomalies in time series due to the insensitivity of the pointwise dot-product self-attention in the canonical transformer architecture to the local context. This local context provides information on whether the pattern of the time series is changing due to an event (e.g., a holiday), a change point, or an anomaly. The authors present two potential solutions to these problems. They propose that the susceptibility to anomalies in time series can be addressed using convolutional self-attention. To remove the memory bottleneck, LogSparse transformers are proposed, which reduce the number of dot products to be calculated. Based on their results, Li et al. (2019) conclude that transformers can capture long-term dependencies better than LSTMs.
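To make the preceding method descriptions concrete, the following minimal sketch fits three of the supervised regressors discussed above (an FFNN, an SVM, and a Gaussian process) to a synthetic hourly load series, using the preceding 24 hours as lag features. The data, feature choice, and hyperparameters are illustrative assumptions, not recommendations.

```python
# Illustrative comparison of supervised regressors on a synthetic hourly load series.
# Data, lag features, and hyperparameters are placeholders for demonstration only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
hours = np.arange(24 * 90)
load = 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

# Build lag features: predict the load at hour t from the loads at t-24, ..., t-1.
lags = 24
X = np.column_stack([load[i:-(lags - i)] for i in range(lags)])
y = load[lags:]

# Chronological train/test split (no shuffling for time series).
split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

models = {
    "FFNN": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0),
    "SVR": SVR(C=10.0),
    "GPR": GaussianProcessRegressor(normalize_y=True),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: test MAE = {mae:.2f}")
```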
Decision trees are a tool used in regression and classification that represents a sequential perspective on machine learning [65]. The learning process of a decision tree works by repeatedly dividing the data depending on the input features and assigning a label to the groups of observations created in this way. Starting from the whole feature space, one of the features is selected and a threshold value for this feature is defined, dividing the data into two groups. Each observation is then assigned to one group, depending on which side of the threshold it lies on [66]. In this way, the feature space is divided into two subspaces, each of which is then assigned a predicted label (e.g., the mean label of its observations in regression or the most frequent class of its observations in classification). The feature and the threshold that serve as the decision boundary are selected such that the prediction error is minimized [55]. This process of feature space division is repeated iteratively, leading to smaller and smaller subgroups with finer predictions [55]. This can be visualized in a hierarchical tree structure, giving decision trees their name. An exemplary visualization of a decision tree is given in Figure 5. Their capability of being easily visualized makes decision trees highly interpretable, which is one of their greatest strengths [66]. Like other supervised learning methods, they find use in various forecasting contexts in energy system analysis, e.g., in predicting buildings' energy consumption (e.g., [67]). Ensemble methods are hybrid solution strategies. The idea of hybridization is based on the no-free-lunch theorem [68]. The theorem states that there is no single optimal algorithm for every optimization problem. A hybridization strategy utilizes multiple algorithms and combinations of model results to improve optimization techniques [69], aiming for better overall model performance in terms of the speed-accuracy-complexity tradeoff [70]. The ensemble training process can be divided into bootstrap aggregation (bagging) and boosting strategies [69], which are described in the following. Bootstrapping describes a resampling method to train and validate models by using random subsets of the data set [71]. Bootstrap aggregation (bagging) is a method that (1) trains multiple models in parallel, i.e., independently based on different bootstrap samples (data subsets), and (2) creates an overall prediction by averaging all model prediction results [72]. Averaging the predictions of several bootstrap sample models reduces the variance component of the overall generalization error [66, 73]. The bagging method can be applied to various model architectures, e.g., SVMs (Kim et al., 2003; Drucker and Cortes, 1996), k-nearest neighbors [74], or random forests [75]. Random forest models have become the most popular approach for applying bagging to decision trees [55, 73]. This is based on extending the bagging process by resampling the data not only over its samples but also over its features. Boosting: Boosting is an ensemble method to train multiple models sequentially. Successive models attempt to optimize the overall model performance based on the knowledge of previous model errors [76-78]. This approach differs from bagging, where models are trained in parallel without knowledge of the performance of the other trained models.
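As a minimal illustration of the bagging and boosting strategies just described, the following sketch trains a bagged ensemble (random forest) and a boosted ensemble (gradient boosting) on the same synthetic regression task; the data and settings are placeholders.

```python
# Bagging vs. boosting on a synthetic regression task (illustrative settings only).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=2000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: many trees trained in parallel on bootstrap samples, predictions averaged.
bagged = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Boosting: shallow trees trained sequentially, each fitting the residual error of its predecessors.
boosted = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

for name, model in [("bagging (random forest)", bagged), ("boosting (gradient boosting)", boosted)]:
    print(name, mean_absolute_error(y_test, model.predict(X_test)))
```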
Preprocessing data has a wide variety of purposes, such as consolidating multiple data sources, filling data gaps, reducing computation time by aggregating the data fed into an energy system model, or boosting a subsequently used machine learning model's prediction quality by filtering and preparing relevant features. Energy system modeling usually involves handling time series data. Time series are in turn often the result of several effects superimposed on each other. Thus, decomposing a time series into its components (e.g., constant, cyclical, and trend components) can help interpret and model time series data [86]. Recently, research has focused on the combination of decomposition and feature selection as preparation for a forecasting model (see e.g. [87] and [88]). Feature selection refers to deciding which features to feed into the model [89]. This is particularly potent in combination with time series decomposition methods, as it allows keeping only those components of a time series that are both meaningful and predictable. This can increase the quality of the prediction while not adding unnecessary dimensions to the input data. Typically, feature selection approaches are classified as filters, wrappers, or embedded methods [89]. A filter is independent of the subsequent model the selected features are fed into, while a wrapper "wraps around" a predictive algorithm [90]. The inner algorithm's performance when being fed different selected features is used to decide which features are selected. Embedded methods are predictive methods that inherently include some sort of learning that can be used for feature selection [35]. Clustering is useful in energy system modeling as a tool for reducing the amount of data fed into the energy system model. Clustering unveils the structure inherent in the data by allocating the observations to distinct groups. Generally, the aim is to find clusters of observations that are as dissimilar to each other as possible, while the objects (or observations) within the clusters are as similar to each other as possible [91]. Usually, these similarities are measured by a (dis)similarity or distance measure [92]. The clusters are not known beforehand but rather formed by the algorithm during the clustering process. Thus, clustering is an unsupervised learning technique: there is no a priori known correct cluster allocation. Clustering data has a variety of uses: On the one hand, it can help identify patterns that are not easily discernible in the original dataset. On the other hand, clusters can reduce the amount of data by identifying representative cluster centers (typically expressed through the mean, centroid, or medoid of the observations' features). In energy system analysis, it often finds use as a data reduction method, particularly for time series data. Clustering time series (i.e., time series aggregation) allows representing the data through a considerably lower number of data points, which can reduce model runtimes while also impacting model accuracy. This principle is visualized in Figure 7. Figure 8 shows the number of articles on some of the most popular clustering algorithms in the field of energy system modeling since 2010. A variety of clustering algorithms exist, and choosing a well-suited algorithm is a highly data- and problem-specific task.
Therefore, we focus on a brief introduction of two of the most popular clustering approaches: k-means and hierarchical clustering. There is a considerable body of literature on clustering (e.g., [91], [92], or [97]), which the reader may consult for a more comprehensive overview. k-means: k-means is a widely used clustering approach that creates clusters by allocating observations to cluster centers in such a fashion that the overall sum of distances between the data points and their nearest cluster centers is minimized [36]. In a first step, cluster centers are initialized. Then, all observations are allocated to the cluster center nearest to them (measured in Euclidean distance), forming the initial clusters. Next, the algorithm recalculates the cluster centers' coordinates, i.e., the "column means" across all observations allocated to each cluster. This process of allocating observations to clusters and recalculating the cluster centers is repeated iteratively until a convergence criterion is met [91]. The advantage of k-means is its ease of use combined with its tendency to create evenly sized clusters. However, not all data is suited to being evaluated using Euclidean distance. In particular, non-metric data is difficult to cluster with k-means. In addition, the desired number of clusters needs to be specified in advance, which demands a priori knowledge of the dataset [91]. Some heuristics exist to determine suitable cluster numbers. k-medoids is a clustering approach similar to k-means. The main differences between the two approaches are: (1) in k-medoids, the representative cluster centers are not calculated as means but rather selected from the elements that form the cluster, and (2) k-medoids is compatible with similarity measures other than the Euclidean distance [98]. However, k-means is still the vastly more popular clustering algorithm in energy system analysis, as evidenced by Figure 8. Hierarchical clustering: Hierarchical clustering derives its name from the fact that it results in a hierarchy of clusters. There are two approaches towards this: agglomerative and divisive clustering. In agglomerative clustering, the algorithm starts by assigning each observation to its own cluster. Next, the similarities between all clusters are calculated, and the two clusters that are most similar to each other are merged. The merging of clusters is repeated until a desired number of clusters is reached. Similarly, in divisive clustering, all observations are initially part of one cluster, which is subsequently divided into subclusters [91]. There are several ways of determining which clusters are most similar [91, 92]. This makes hierarchical algorithms flexible but requires an in-depth understanding of the data. Therefore, a certain level of expertise and heuristics are needed to cluster data well with hierarchical algorithms. In energy system modeling, hierarchical clustering is often used for time series aggregation. Liu and Sioshansi [99], for example, employ hierarchical clustering approaches in order to find representative time periods for capacity modeling. Hoffman et al.'s [9] review also includes time series aggregation methods based on hierarchical clustering.
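The following sketch illustrates the time series aggregation use case described above: daily load profiles are clustered with k-means, and each day is represented by its cluster centroid, so that a year of data is summarized by a handful of typical days. The synthetic data and the choice of eight clusters are illustrative assumptions.

```python
# k-means-based time series aggregation: represent 365 daily profiles by k typical days.
# Synthetic data and k=8 are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hours = np.arange(24)
# 365 synthetic daily load profiles: a daily cycle plus a seasonal level shift and noise.
days = np.array([
    60 + 15 * np.sin(2 * np.pi * (hours - 6) / 24)
    + 10 * np.sin(2 * np.pi * d / 365)
    + rng.normal(0, 2, 24)
    for d in range(365)
])

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(days)

# Each day is now represented by its cluster centroid ("typical day").
typical_days = kmeans.cluster_centers_      # shape (8, 24)
weights = np.bincount(kmeans.labels_)       # how many real days each typical day represents
print(typical_days.shape, weights)
```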
While data in a two-dimensional space (i.e., with two features) can be easily visualized and understood, data with tens of dimensions or more is impossible to understand visually. In addition, high-dimensional data often poses problems (curse of dimensionality, see e.g. [100]), is difficult to handle mathematically, or carries a high computational cost [101,102]. For example, distance measures like those used for clustering lose meaning in high-dimensional spaces since the data becomes naturally sparse [103]. That is why finding a representation of high-dimensional data in a lower-dimensional space is often desirable. This is called reducing the dimensionality of the feature space. Thus, if the reduced-space representation is used for further algorithms and models (such as an ANN), the amount of data to handle is reduced, resulting in faster computation times. Figure 9 contains a visual representation of this principle. There are several tools available to achieve this, widespread examples of which are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Figure 10 shows the number of articles on some of the most popular methods for dimensionality reduction in the field of energy system analysis since 2010. Principal Component Analysis: PCA assumes that the data's variance determines its meaningfulness. Thus, the lower-dimensional representation of the data is found by maximizing the data's variance [101]. This is done by identifying those "directions" in the data that possess the most variance, the so-called principal components. The principal components are created as linear combinations of existing features [55]. Mathematically, this is done by performing an eigenvalue decomposition of the data's covariance matrix [101] or by applying singular value decomposition (SVD) to the data itself, which tends to yield more robust results [104]. The user can select the number of dimensions they wish to transform the data into. Autoencoders: An autoencoder is a type of deep neural network design that has existed for several decades [38,108-112]. The design has seen renewed interest in recent years due to its various application areas, e.g., dimensionality reduction, feature extraction, or anomaly detection, and due to its capability of preserving and representing non-linear relationships in the input data. The fundamental structure of an autoencoder is based on (1) an encoder that maps the input space with a linear or non-linear transformation to a lower-dimensional latent space and (2) a decoder that maps the salient input features represented by the latent space back to reconstruct the input space. Formally, an input example $x \in X$ with $X \subset \mathbb{R}^{d}$ is mapped to a hidden representation $h(x) \in \mathbb{R}^{m}$ with $m < d$ and $h(x) = \sigma(W_1 x + b_1)$, where $\sigma$ denotes a non-linear activation function (e.g., a sigmoid function), $W_1 \in \mathbb{R}^{m \times d}$ a weight matrix, and $b_1 \in \mathbb{R}^{m}$ a bias vector. The output layer decodes the latent space, i.e., the hidden representation of the input examples, aiming at a reconstruction $\tilde{x} \in \mathbb{R}^{d}$ with $\tilde{x} = \sigma(W_2 h(x) + b_2)$, where $W_2 \in \mathbb{R}^{d \times m}$ is a weight matrix and $b_2 \in \mathbb{R}^{d}$ a bias vector. The training procedure minimizes the reconstruction error by finding parameters $\theta = \{W_1, W_2, b_1, b_2\}$ that minimize $L(\theta) = \sum_{x \in X} \lVert x - \tilde{x} \rVert^2$.
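A minimal autoencoder following the structure just described (one encoder layer with a sigmoid activation, one decoder layer, trained on the squared reconstruction error) could look as follows; the layer sizes, optimizer settings, and synthetic data are illustrative assumptions.

```python
# Minimal autoencoder following the structure above: encoder h(x) = sigma(W1 x + b1),
# decoder x_rec = sigma(W2 h(x) + b2), trained on the squared reconstruction error.
# Layer sizes, optimizer settings, and the synthetic data are illustrative assumptions.
import torch
from torch import nn

d, m = 24, 4                      # input dimension and latent dimension (m < d)
encoder = nn.Sequential(nn.Linear(d, m), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(m, d), nn.Sigmoid())
model = nn.Sequential(encoder, decoder)

X = torch.rand(1000, d)           # synthetic data scaled to [0, 1] to match the sigmoid output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    reconstruction = model(X)
    loss = loss_fn(reconstruction, X)   # squared reconstruction error
    loss.backward()
    optimizer.step()

latent = encoder(X)               # lower-dimensional representation of the input data
print(latent.shape, float(loss))
```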
Having introduced a variety of machine learning methods, the question that remains to be answered is which method to apply under which circumstances. This also includes evaluating the model's quality or "goodness of fit". Unfortunately, there is no straightforward answer to these questions. Machine learning methods are highly problem-specific and, as has been mentioned already, deep knowledge of the data is often necessary to decide which method to apply. Thus, there are guidelines for choosing models (such as in [113]), but these are to be understood as recommendations rather than objective rules. Similarly, evaluating the quality of a particular model applied to data is also a problem-specific and even an algorithm-specific matter. For supervised learning methods, model quality is ensured by separating a part of the data from the rest of the dataset and not using it when training the model. After the model has been trained on the rest of the dataset (the training set), this so-called test set is used to compare the model's predictions against the correct labels (see Figure 11 for the calculation of the different error measures). This allows comparing the performance of different machine learning approaches for the same task. Ensuring the quality of an unsupervised learning approach, such as clustering or PCA, is less straightforward since these algorithms are designed for tasks where there is no correct label. For example, in clustering, a variety of generic (so-called external) evaluation metrics have been proposed, which allow comparing the results of different clustering algorithms. However, these metrics rely on external data (i.e., correct labels), which is often not available [117, 118]. So-called internal metrics rely only on the information inherent in the input data and are often based on the metric the algorithm tries to optimize [117], making them algorithm-specific. This makes different clustering results difficult to compare. Both supervised and unsupervised machine learning methods, however, share the necessity to perform so-called hyperparameter tuning. The various methods introduced in this paper typically require the user to set a number of parameters, such as the number of neurons and layers in an ANN or the number of clusters in k-means clustering [119]. The optimal set of hyperparameters often depends not only on the task at hand but also on the concrete dataset, and the choice of hyperparameters can influence model performance to a great extent [120]. Thus, a careful choice of hyperparameters is important in order to achieve satisfactory results. Typically, hyperparameter tuning is performed manually or through heuristics, although automated approaches for this procedure are being investigated [119].
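The following sketch illustrates the train/test evaluation and hyperparameter tuning workflow described above: the data is split into a training and a test set, a grid of candidate hyperparameters is cross-validated on the training set, and the selected model is evaluated on the held-out test set with common error measures. The estimator, grid, and data are placeholders.

```python
# Train/test split, grid-search hyperparameter tuning, and error measures (illustrative only).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

X, y = make_regression(n_samples=1500, n_features=8, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Candidate hyperparameters (layer sizes, regularization strength) evaluated by cross-validation.
grid = {"hidden_layer_sizes": [(16,), (32, 32)], "alpha": [1e-4, 1e-2]}
search = GridSearchCV(MLPRegressor(max_iter=1000, random_state=1), grid, cv=3)
search.fit(X_train, y_train)

# Final evaluation on the held-out test set.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
print(search.best_params_, f"MAE={mae:.1f}", f"RMSE={rmse:.1f}")
```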
Depending on the complexity of a simulation model and the available time span, the simulation duration can limit the scenario scope and thus the depth of analysis. A very effective method to increase the scenario scope of complex simulation models is so-called metamodeling. To create a metamodel, information about the system behavior of the given simulation model is needed. In this section, metamodeling (see section 4.1) and a very effective method for minimizing the simulation effort required to generate this information about the system behavior, the so-called design of experiments (see section 4.2), are presented in more detail. The direct use of complex simulation models for in-depth analyses is only possible to a limited extent due to long simulation durations. Metamodels, in contrast, are able to produce high-quality predictions within a few milliseconds. The term metamodel and the underlying concept go back to earlier works on simulation analysis; the method became more popular with the work of Kleijnen, who extended it with a number of statistical tools [124]. These metamodels represent the system behavior of the simulation model by mapping the relationship between the input and output variables. This is realized by a mathematical approximation. Metamodels are also called approximation models, surrogate models, or response surface models [125]. Figure 12 schematically shows the procedure of metamodeling for the case of security of electricity supply assessment. Metamodels are generated from real simulation data and are valid for a predefined design space. The design space represents a multidimensional structure spanned by the input data of the simulation model, and it comprises the complete range of all input data combinations. The boundaries of the design space are thus defined by the minima and maxima of the input data of the simulation model. A subset of this input data is selected as the feature set, and the corresponding output data (calculated via the simulation model) serves as the label set. Together, these selected features and calculated labels constitute the sample to which the metamodel is fitted. An illustration of a design space and selected features for a metamodel is given in Figure 13. Once the metamodel has been created, the label, i.e., the output data, of any feature combination of the input data within the design space can be predicted. When selecting the feature sample from the input data, care must be taken that the amount of information gained is sufficient to represent the system behavior. At the same time, the effort to generate this information must be minimized. The most effective way to achieve this is to use methods from design of experiments (see section 4.2). For metamodels, a variety of methods can be applied, which achieve varying degrees of accuracy depending on the problem at hand. Several classical statistical methods can be used, among them, for example, linear or polynomial regression. In many use cases, however, simulations have a more complex nature. This often leads to non-linear relations between the input and output data which cannot be sufficiently approximated by classical statistical methods. For these cases, machine learning methods can be applied (see section 3.1 for details on a selection of methods). The choice of the approximation method and of the experimental design depends on the problem at hand, and the optimal choice, in most cases, cannot be determined a priori (see section 6). For this reason, testing and validating different approaches is a very important part of metamodeling. To this end, many evaluation measures, such as the coefficient of determination R², are available from statistics. Decisive for the testing of different metamodels is the quality of the validation data. These samples must be independent of the training data that were used for the metamodeling itself. Furthermore, the test data must be evenly distributed over the design space, so that the evaluation of the predictive quality is representative of the entire design space [126]. Design of experiments (DOE) is a method for efficiently planning and designing experiments. Depending on their complexity, experiments can be costly and lengthy. As a consequence of a limited budget or a given time span, it is rarely feasible and probably never reasonable to carry out a series of experiments in an unplanned manner. This is particularly the case for models that are intended to depict complex relationships of the energy system. To get the most information out of the system under different limiting circumstances, DOE is applied. The most important part of DOE is the selection of an appropriate design. In this subsection, a short overview of a selection of designs is given.
For reasons of brevity, we focus on describing the basic features of the respective methodologies. In addition to these designs, there are several other important designs, among them screening designs, Box-Behnken designs, and the quasi-Monte Carlo method. Central composite designs: If a non-linear relationship is suspected, the above experimental designs are no longer sufficient. It becomes necessary to extend the ability of the metamodel to also consider the quadratic terms of the main effects. An experimental design often used successfully is the so-called central composite design (CCD). The CCD can be thought of as an extension of the formerly described designs: A (fractional) factorial experimental design is extended by so-called "star" points (see Figure 15) as well as center points. The additional points added to the experimental design allow evaluating potential quadratic effects. The star points extend the (fractional) factorial design space in most cases and are then called circumscribed. The distance of the star points from the (fractional) factorial design space is chosen according to a desired statistical property the final design should have. There is also a subtype of this design, the so-called face-centered central composite design. In this design, the factor combinations of the "star-shaped" design lie on the plane spanned by the corner points of the full or partial factorial experimental design. However, this worsens the representation of the non-linear behavior. Therefore, it is only used if a factor can only be varied as an integer. Another class of designs that is increasingly used in non-linear contexts, especially when dealing with computer-aided experiments (CAE), are the space-filling designs. One example of those designs is the so-called Latin hypercube design (LHD) [132, 133]. For the LHD, the factor combinations are determined using some sort of stochastic procedure, for example by using a randomly permutated array to construct the full design by a given logic [134]. This method resembles the Monte Carlo method, where the factor combinations are also determined randomly. In contrast to this, the Latin hypercube uses a methodology that aims at uniform coverage of the entire multidimensional design space (see the corresponding figure for an exemplary LHD). If an LHD is well constructed, the variance of the global mean will be significantly lower than when using a random Monte Carlo field with the same number of test points. Basically, in an LHD the design space is first divided into zones. From this zoned design space, a random factor combination is then determined in each zone. A uniform and correlation-free coverage of the factor space is not automatically ensured; for this, further methods such as orthogonal or space-filling design criteria would have to be applied [134]. There are a variety of possible methods for constructing an LHD. The selection is strongly dependent on the problem at hand. Recommended approaches are described, for example, by Moon.
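The following sketch shows how a Latin hypercube design could be generated and scaled to the boundaries of a design space using SciPy's quasi-Monte Carlo module; the three factors and their bounds are hypothetical placeholders.

```python
# Latin hypercube design for three hypothetical factors, scaled to the design space bounds.
from scipy.stats import qmc

# Hypothetical factors: installed wind capacity (GW), PV capacity (GW), annual demand (TWh).
lower_bounds = [40.0, 50.0, 450.0]
upper_bounds = [120.0, 200.0, 650.0]

sampler = qmc.LatinHypercube(d=3, seed=0)
unit_sample = sampler.random(n=50)                        # 50 points in the unit hypercube
design = qmc.scale(unit_sample, lower_bounds, upper_bounds)

print(design.shape)   # (50, 3): each row is one factor combination to be simulated
print(design[:3])
```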
DOE has been used successfully for a long time, and the individual methods have reached a high standard. A very large potential for improvement and further development lies in a subarea of machine learning for whose application design of experiments is essential: the so-called field of active learning, also known as query learning [138, 139]. In the statistical literature, this application area is also called optimal or adaptive DOE. Settles describes the basic idea of active learning as follows: "[…] that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns". The goal is to minimize the number of factor combinations while achieving a certain forecast quality in combination with metamodeling. Settles defines the term active learning in his work as follows: "Active learning systems attempt to overcome the labeling bottleneck by asking queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). In this way, the active learner aims to achieve high accuracy using as few labeled instances as possible, thereby minimizing the cost of obtaining labeled data". For the case of the assessment of security of electricity supply, this oracle is the probabilistic simulation model. Settles divides active learning into three main scenarios: membership query synthesis [140], stream-based selective sampling [141,142], and pool-based sampling [143]. Figure 17 shows an illustration of these concepts. According to Settles [138], there is a variety of different approaches to active learning. For example, one approach to metamodeling a simulation model using active learning is to iteratively add factor combinations to a baseline experimental design. In each iteration, metamodeling and validation are repeated until the desired prediction quality is achieved. The difficulty is to scan the experimental space as effectively as possible and to select the factor combination that provides the greatest possible additional information about the system behavior. For the selection of these factor combinations, there are different basic strategies on which more recent approaches are also based. Among them are uncertainty sampling [143], query by committee, also known as the ensemble-based strategy [144], and expected error reduction [145]. Current work focuses on the further development of these approaches for different application areas and on their optimization for different metamodeling methods. For example, neural networks with active learning still hold a lot of potential. In particular, with the increased use of deep learning methods in recent years, a large research field of deep active learning has emerged. The combination of active learning and deep learning poses some challenges. In contrast to statistical approximation methods, the strengths of deep learning methods do not lie in showing where the uncertainty in the prediction is large. Also, iterative approaches are very computationally intensive since the network must be retrained in each iteration. A first general overview of this very broad field is given by Ren et al. A possible methodical approach to metamodeling simulation models is demonstrated in the work of Reich et al. [148]. This approach is demonstrated using a model for the simulation of an energy supply system but can also be applied to other simulation models. The authors conclude that using an LHD to sample the information needed to train an artificial neural network is the best approach for approximating the response of the analyzed energy supply system. Furthermore, LHDs and ANNs can be used more flexibly and can thus be better adapted to the problem at hand. For the presented approach, six steps are defined by the authors: problem definition, defining the design space, developing experimental designs, developing approximation models, comparison and validation, as well as system analysis (see Figure 18).
In the following, the individual steps of the approach are presented only briefly and in generalized form. Problem definition: Depending on the problem, factors are determined which could have a decisive influence on the system behavior of the simulation model. In the second step, the design space is defined for which the later metamodel is valid. For this purpose, the boundaries of the factors selected in step one are set depending on the problem. To reduce the number of required factor combinations, the third step is the selection of the experimental design and its transfer to the created design space. Developing approximation models: Once the simulation model has been used to generate the system information for the factor combinations of the experimental design, approximation methods can be used to create a metamodel. To evaluate how well the metamodel can approximate the system's response, the prediction quality must be assessed. The meaningfulness of the prediction quality depends decisively on the quality of the test points. High-quality test points have two characteristics: firstly, they are distributed as uniformly as possible in the design space and, secondly, they were not used to create the metamodel. System analysis: With a metamodel capable of representing the system behavior, a variety of analysis options become possible. These include, among others, the evaluation of the effects of individual input variables on the output variables, sensitivity analyses, large-scale scenario analyses, and multi-criteria optimization. In this section, we introduce a selection of applications for machine learning in energy system analysis with a focus on the assessment of the security of electricity supply. We do not claim the list to be exhaustive but rather selected those applications that we regard as most relevant. For each field of application, we conducted a systematic literature search. A general distinction of the structure of forecasting models can be seen in Table 3. The information provided is a simplified representation of model structures and does not account for feedback or other more complex variants. Forecasting electricity load profiles can be categorized into the four different time horizons shown in Table 4. Forecasting data: Electricity load profiles are given as time series with varying temporal resolutions. Electricity loads can be provided as aggregated information, usually as minute to hourly averages [150]. Average values have the advantage that, aggregated, they represent the total annual consumption. However, due to the aggregation, peak loads are usually neglected, although they would be better represented by maximum instead of average values. In turn, using the maximum function for aggregating temporal data would overestimate the total annual consumption. Hybrid solutions using averaging as the aggregation method and adding peak load values are used to compensate for the individual disadvantages of the aggregation methods. Electricity loads can be subject to different aggregation levels [151]:
- Sectoral aggregation: total load profiles vs. sectoral load profiles
- Regional aggregation: no regional differentiation vs. including regional identifiers
- Temporal aggregation: time series data in sub-hourly, hourly, daily, or weekly resolution
In principle, the less aggregated the data, the more accurate the representation of the real system. However, this principle does not apply if the quality of the data sample is low.
The level of aggregation of the data should, therefore, be chosen so that the accuracy of the data at the chosen resolution meets the requirements. This means that data may need to be aggregated. The main features influencing electricity loads are calendrical information, meteorological (temperature and weather) data, and economic factors (such as future prices or prices of other energy carriers) [152]. Features, as well as the forecasting data, can vary in terms of sectoral, regional, and temporal aggregation. Table 5 shows a selection of studies on medium- and long-term forecasting and the features used in the forecasting models. Prediction models: Figure 19 shows the results of the systematic literature review on the development of electricity load forecasting models within the Scopus database.
Figure 19: Number of scientific articles found on load forecasting in energy-related journals in the Scopus database from 2010 to 2020, categorized by AI-based methodology.
Neural networks are the most widely used AI methodology for predicting electricity load time series. Studies using FFNNs are published at numbers between ~80 and ~200 per year. Studies using CNNs and RNNs have become increasingly popular since 2018, with RNNs being used even more frequently than FFNNs in 2020. In the field of RNNs, LSTM neural networks dominate and are used in the majority of studies. Support vector machines are used constantly in ~40 to ~100 studies per year, making them a popular alternative to neural network approaches. Finally, the number of studies based on decision trees and ensemble methods has increased significantly since 2018. In addition to the demand side, there is also uncertainty on the supply side, driven in particular by the expansion of renewable energies. This is the most critical scheduling input, as both situations of oversupply and undersupply are possible. In the context of assessing the security of electricity supply, undersupply is of particular relevance [12]. However, the inclusion of situations of oversupply in the analysis becomes more important the more storage facilities in the system can absorb this energy and make it available again at times of undersupply [158]. Similar to forecasting electricity load profiles, the time horizon of the forecast can be used as an initial distinction of the fields of application: For the field of application portrayed in this paper, long-term forecasting is again the relevant time horizon. Renewable feed-in profiles combine the availability of the power plant with a capacity credit that is based on weather forecasts [158]. We address the forecasting of non-availabilities in more detail in section 5.3. Forecasts of renewable feed-in profiles can thus focus either on the solely weather-dependent aspects or on a combination of the two. Forecasting data: Renewable feed-in profiles are given as time series with varying temporal resolutions. Similar to electricity load profiles, they are given as aggregated information providing, e.g., minute or hourly averages [161]. In addition to average values, and in contrast to electricity load profiles, it is not maximum but minimum values that are of interest for assessing the security of electricity supply, in order to perform a robust analysis that can also map extreme events. Renewable feed-in profiles can be subject to different aggregation levels:
- Technological aggregation: e.g., aggregating or disaggregating rooftop PV and ground-mounted PV
- Regional aggregation: no regional differentiation, including regional identifiers, or per unit
- Temporal aggregation: time series data in sub-hourly, hourly, daily, or weekly resolution
Table 7 shows a selection of studies on wind power feed-in forecasting and the features used in the forecasting models, such as air temperature, air density, air pressure, and relative humidity. Prediction models: Figure 20 shows the results of the systematic literature review on renewable feed-in forecasting. Similar to studies on predicting electricity load profiles, FFNNs are again the most widely used AI methodology for predicting renewable feed-in time series. CNNs and RNNs have drastically increased in popularity since 2018. While support vector machines have a constant number of publications of ~20 per year, other methods such as decision trees, Bayesian models, or ensemble methods are not yet used as frequently for predicting renewable feed-in time series. A major uncertainty in assessing security of electricity supply is the availability of generation capacities and network components. Components in the energy system can be non-available due to planned and unplanned outages [12]. Planned non-availabilities are known in advance and are usually due to scheduled (e.g., annual) maintenance. Unplanned non-availabilities are not known in advance, are subject to a much more random distribution than planned non-availabilities, and can be due to malfunctions or uncontrollable external factors such as extreme weather conditions. Both types of events have a major impact on supply security [169]. Due to the differences in their distributions and influencing factors, they are usually modeled and predicted separately [170]. The application field of power plant availability for machine learning-based methods is therefore twofold. For planned unavailability, scheduled maintenance cycles need to be predicted [12]. Insights from predictive maintenance can be transferred to improve predictions of maintenance schedules [171]. In contrast, in the case of unplanned unavailability, factors such as complex thermodynamics and the security of supply of fuels need to be modeled [172]. This imposes fundamentally different requirements on predictive models. Another challenge that stems from the systemic perspective in the assessment of security of electricity supply is that no further operational data from specific sites is available. This is due to the fact that the entirety of power plants must be depicted, with various operators and at various locations. Forecasting models, therefore, have to rely solely on external factors such as weather conditions that can be monitored and predicted. Finally, a distinction needs to be made between modeling the non-availability of components independently or (1) considering common-mode situations and (2) considering time-dependence [173-175]. Forecasting data: Component unavailability can be available as binary information, multiple discrete states, or continuous availability levels [170, 173]. Binary data indicate whether the component is available or not, while discrete and continuous data additionally show the share of non-available capacity.
Data on non-availability are subject to different aggregation levels:
 Technological aggregation: Total capacity, capacity per generation technology, or capacity per generation unit
 Regional aggregation: No regional differentiation or including regional identifiers
 Temporal aggregation: Annual data (availability factor), data for certain points in time (e.g. during annual peak load), or time series data (e.g. hourly resolved)
Feature selection: We assume the main features influencing the availability of generation capacities to be calendrical information, technology-specific data, weather and further environmental data, price data, and load data. Table 8 shows a selection of studies on (non-)availability forecasting for thermal power plants and the features used in the forecasting models. The studies listed do not exclusively apply machine learning-based methods, as the number of such studies found in the literature search was too low at the time the search was conducted. We assume that, independent of the methodological approach, the listed features can serve as a good starting point for constructing machine learning-based models. Interestingly, price data (forward or spot prices) are not found as explanatory variables in the literature.
Prediction models: Figure 21 shows the results of the systematic literature review on the non-availability of generation capacities. The field of prediction models for the (non-)availability of power plants using machine learning is not yet established on a large scale. Neural networks are the most relevant method applied, and publication numbers have been rising over the last years. Support vector machines and Bayesian models are used in a few works. Looking into the few available studies, the most cited works rely on sensor data [178-180]. As sensor data is usually used for forecasting individual plant outages, we conclude that on a system level, the application of machine learning-based methods for forecasting the (non-)availability of power plants is a major research gap.
Another relevant field of application is the forecasting of storage operation. Adequately representing storage dispatch within the assessment models is challenging; to overcome this problem, model coupling approaches can be applied. Submodels that take the perspective of the storage operators can then represent the storage operation, which is, for example, iteratively fed into a larger system model. Such submodels could again apply cost-minimization methods or depict storage operation by applying machine learning-based methods that learn such behavior from historic data.
Forecasting data: Storage operation will ultimately be needed in the same temporal resolution as the other temporal input data that is used for the assessment of the security of electricity supply. Data on storage operation is subject to different aggregation levels:
 Technological aggregation: Total storage dispatch and state of charge (SoC), per storage technology, or per storage unit
 Storage-specific information: Storage dispatch and/or SoC
 Regional aggregation: No regional differentiation or including regional identifiers
 Temporal aggregation: Time series data in sub-hourly, hourly, daily, or weekly resolution
Information on storage operation can be based on historic data or simulations. Scapino et al. [187] use a physics-based model to simulate a storage system and generate the forecasting data for the prediction model (this approach belongs to the field of metamodeling, see sections 4 and 6). The main features influencing storage operation are, besides technical characteristics, price and other market information, weather data, and load and generation data.
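As a minimal sketch of learning storage dispatch behavior from historic data using the features named above, the following Python example trains a gradient-boosted tree regressor on synthetic hourly price, load, and renewable generation features. The data-generating rule (charge at low prices, discharge at high prices) and all numbers are assumptions chosen only to keep the example self-contained; the studies cited in this section typically use LSTM or other neural network models instead.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 8760  # one year of hourly data

# Synthetic, purely illustrative features: price, load, and renewable generation.
price = 50 + 20 * np.sin(2 * np.pi * np.arange(n) / 24) + rng.normal(0, 5, n)
load = 60 + 15 * np.sin(2 * np.pi * (np.arange(n) - 6) / 24) + rng.normal(0, 3, n)
res_gen = np.clip(40 * np.sin(2 * np.pi * np.arange(n) / 8760) + rng.normal(0, 10, n), 0, None)

# Assumed "historic" dispatch rule: charge (negative) at low prices, discharge at high prices.
dispatch = np.clip((price - price.mean()) / price.std(), -1, 1) * 10 + rng.normal(0, 1, n)

X = np.column_stack([price, load, res_gen])
X_train, X_test, y_train, y_test = train_test_split(X, dispatch, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("R2 on held-out hours:", round(r2_score(y_test, model.predict(X_test)), 3))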
As with the forecasts of (non-)availabilities, the studies listed do not exclusively apply machine learning-based methods, as the number of such studies found in the literature search was too low at the time the search was conducted. We assume that, independent of the methodological approach, the listed features can serve as a good starting point for constructing machine learning-based models.
Prediction models: Figure 22 shows the results of the systematic literature review on predicting storage operation. The field of prediction models for storage operation using machine learning is not yet established on a large scale, though publication numbers made a significant jump in 2020. Neural networks are the most frequently applied method, but support vector machines, Gaussian process regression, and decision trees are also used. In the field of neural networks, a trend towards RNNs is emerging, even if this is not yet clearly visible in the publication counts. Wang et al. [192] develop a prediction model for distributed electric heating storage systems. They find that their correlation-based LSTM model outperforms support vector machines and regular RNN models. Xiao et al. [193] come to a similar conclusion when comparing multiple methods for behavior learning in microgrids. They find LSTM models most suitable for microgrids that include storage systems.
Basic experimental design methods have been used successfully in combination with metamodeling in many publications. In the following, a possible approach is presented, as well as the benefits of metamodeling based on exemplary publications. Further, modeling recommendations for the combination of metamodeling and DOE are provided. Unlike in real-world experiments, the input variables in computer-based experiments can be varied continuously with less effort. This allows the design space to be sampled at a higher resolution. However, in many use cases, simulations are rather complex and exhibit nonlinear behavior. In these cases, metamodeling based on a full factorial experimental design (full-FD) with linear or polynomial regression is not feasible; more complex experimental designs (see section 4.2) and more complex metamodeling methods, e.g. from the field of machine learning (see section 3.1), must be used.
In the past ten years, models based on FFNNs and GPR in particular have been used for metamodeling in publications in energy-related journals. Other deep learning methods such as CNNs and RNNs have only been used from 2017 onwards. In addition, the number of publications per year applying SVMs and Bayesian models for metamodeling is relatively constant. A trend can be seen for decision trees and ensemble methods, which have become more popular since 2017. Metamodeling and DOE have been combined consistently since 2010, with ~35 to ~60 publications per year. AI-based methods make up only a small share of these publications, and the numbers are increasing only slowly. In 2020, ~20% of all publications we found on metamodeling and DOE in energy-related journals included the application of AI-based methods.
Storti et al. [194] apply a metamodeling approach for the optimization of the shape of deflector plates for a drag-driven vertical-axis Savonius wind turbine. Here, a two-dimensional computational fluid dynamics (CFD) model is metamodeled using an ANN and an LHD. To reduce the reverse moment of the turbine, the size and shape of the deflector plates are optimized. A genetic optimizer was used, which requires model results for a wide variety of parameter combinations. This is only practical with a metamodel; otherwise, the results would have to be generated with many time-consuming simulations. Using the ANN-based approach, a coefficient of determination of R² > 0.97 on the training data and R² > 0.95 on the validation data was obtained.
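A minimal sketch of this combination of an LHD with an AI-based metamodel is shown below. The "expensive model" here is a cheap stand-in function; in practice it would be, e.g., the probabilistic adequacy model, evaluated only at the design points. The design size, kernel choice, and test function are illustrative assumptions.

import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import r2_score

def expensive_model(x):
    # Stand-in for a long-running simulation with two inputs scaled to [0, 1].
    return np.sin(6 * x[:, 0]) * np.exp(-x[:, 1]) + 0.5 * x[:, 1] ** 2

# Latin hypercube design: 40 simulation runs cover the two-dimensional design space.
X_train = qmc.LatinHypercube(d=2, seed=1).random(n=40)
y_train = expensive_model(X_train)

# Gaussian process regression as the metamodel.
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gpr.fit(X_train, y_train)

# Validate the metamodel on additional points not used for training.
X_test = qmc.LatinHypercube(d=2, seed=2).random(n=200)
print("R2 on validation points:", round(r2_score(expensive_model(X_test), gpr.predict(X_test)), 3))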
Nolting et al. [11] show that linear regression and full factorial experimental designs should not be excluded a priori when metamodeling computer simulations. In this comparative study, the prediction performance of linear regression (with a full-FD) and of artificial neural networks (with an LHD) was investigated for the approximation of a probabilistic simulation model for assessing the security of electricity supply in Germany. The investigation showed that, for this use case, linear regression achieved better prediction quality with fewer trial points and thus less simulation time. The results also show that, analogous to the selection of a suitable approximation method, a suitable experimental design must be selected on an application-specific basis.
A comprehensive comparison of metamodeling methods is presented by Østergård et al. [195]. In this study, the six most commonly used methods for metamodeling are applied to 13 application cases with different dimensionality and complexity. The authors conclude that the choice of method depends largely on the problem and that the literature cannot provide a general preference for one method. From their results, the authors draw the following general conclusions, among others:
 Standard settings generally provide poor or mediocre accuracies, so optimization of the hyperparameters is necessary,
 hyperparameters must be adapted to the respective problem,
 in general, the best results were achieved with GPR, followed by ANNs and multivariate adaptive regression splines (MARS),
 linear regression models achieved the worst accuracy due to the nonlinearity of the problems considered,
 for large datasets, ANNs performed most effectively, while GPR was slow and less robust, and
 dimensionality has only a small influence on accuracy.
The authors impressively demonstrate the achievable prediction quality of metamodeling methods. A coefficient of determination of up to R² > 0.99 was obtained for the eight mathematical benchmarks. For the building performance simulation problems, an R² > 0.90 for CO2 emissions and R² > 0.99 for the remaining output parameters could be achieved.
Having discussed a broad variety of methods from the field of artificial intelligence that can be applied to energy system modeling in general and to assessing security of electricity supply in particular, one core finding can be highlighted: as the necessity for and complexity of assessments of resource adequacy increase, combining AI-based methods with an adequate design of experiments offers the possibility of efficient metamodeling of complex energy system models. Hence, a broad variety of scenarios can be investigated, and prevailing limits regarding runtime and hardware requirements can be efficiently circumvented while maintaining high degrees of accuracy.
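The following sketch illustrates this point: once a metamodel has been fitted to a limited number of simulation runs, very large scenario sets can be evaluated almost instantly, e.g. to extract quantiles of an adequacy indicator. The stand-in function, the design size, and all numbers are purely illustrative assumptions.

import time
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_model(x):
    # Stand-in for a long-running adequacy simulation with two scaled inputs.
    return np.sin(6 * x[:, 0]) * np.exp(-x[:, 1])

# Fit the metamodel on a small Latin hypercube design of "simulation runs".
X_design = qmc.LatinHypercube(d=2, seed=0).random(n=50)
metamodel = GaussianProcessRegressor(normalize_y=True).fit(X_design, expensive_model(X_design))

# Evaluate a very large scenario set with the metamodel instead of the simulation.
scenarios = qmc.LatinHypercube(d=2, seed=1).random(n=100_000)
start = time.perf_counter()
predictions = metamodel.predict(scenarios)
print(f"evaluated {len(scenarios)} scenarios in {time.perf_counter() - start:.2f} s")
print("5 % quantile of the predicted indicator:", round(np.quantile(predictions, 0.05), 3))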
In addition to that, we identified several potential fields of application for the introduced AI-based methods within different steps of the model toolchain:
- Preprocessing of input data and data consolidation
- Forecasting of relevant input data such as electricity loads, feed-in from renewable energy sources, electricity prices, availabilities of power plants, and storage operations
While some of these fields have already received attention from the scientific community and there are many relevant publications, others have barely been investigated. In particular, forecasts regarding storage operation and the (non-)availability of individual power plants are rare, while there is a comprehensive body of literature on electricity load forecasts and the feed-in of renewables (see Figure 25). Overall, we conclude that there is a necessity for future research regarding (1) the efficient metamodeling of complex models to assess security of electricity supply using AI-based methods and (2) applications of AI-based methods for forecasts of storage dispatch and (non-)availabilities, as these are promising fields of application that have not yet been sufficiently covered. Our review contributes by providing a comprehensive overview of candidate methods and potential fields of application. Regarding the prevailing requirements for assessments of security of electricity supply [3], we find that the approach of AI-based metamodeling described in this review would be a beneficial supplement, as it can help to depict the influence of prevailing uncertainties regarding the future development of necessary input data while allowing for a high level of model detail.

Energy and complexity: New ways forward Formalizing best practice for energy system optimization modelling Methodology for the European resource adequacy assessment: in accordance with Article 23 of Regulation (EU) 2019/943 of the European Parliament Data science for building energy management: A review A review of energy models Applications of artificial neural-networks for energy systems How to model European electricity load profiles using artificial neural networks A review of deep learning for renewable energy forecasting Are complex energy system models more accurate? An intra-model comparison of power system optimization models Can energy system modeling benefit from artificial neural networks? Application of two-stage metamodels to reduce computation of security of supply assessments Can we phase-out all of them? Probabilistic assessments of security of electricity supply for the German case Scriptable bibliometrics using a Python interface to Scopus Conceptualizing energy security 16] frontier economics, FORMAET Services GmbH. Strommarkt in Deutschland -Gewährleistet das derzeitige Marktdesign Versorgungssicherheit?
(Electricity market in Germany -Does the current market design guarantee security of supply?): Bericht für das Bundesministerium für Wirtschaft und Energie (BMWi) Scenario Outlook and Adequacy Forecast Bedarf nach einer Kapazitätsreserve aus Kohlekraft im Demand for a reserve capacity from coal-fired power plants in the German market until 2023) Versorgungssicherheit in Deutschland und seinen Nachbarländern: länderübergreifendes Monitoring und Bewertung (Security of supply in Germany and its neighbouring countries: international monitoring and evaluation) Security of supply: a pan-European approach Szenarien der Versorgungssicherheit in Deutschland und Süddeutschland (cenarios of supply security in Germany and Southern Germany) Mid-Tern Adequacy Forecast Coal phase-out, electricity imports/exports and security of supply) Final report -system analysis Versorgungssicherheit in Süddeutschland bis 2025 -Sichere Nachfragedeckung auch in Extremsituationen? (Security of supply in Southern Germany until 2025 -Secure demand coverage even in extreme situations?) BUND shutdown plan for nuclear power plants and coal-fired power plants): Für einen schnelleren Automausstieg und die umgehende Stilllegung der klimaschädlichsten Kohlekraftwerke (For a faster phase-out and the immediate decommissioning of the most climate-damaging coal-fired power plants) Generation Adequacy of the European Electricity System Bericht der deutschen Übertragungsnetzbetreiber zur Leistungsbilanz Illustrated computational intelligence: Examples and applications An Introduction to Data: Everything You Need to Know About AI, Big Data and Data Science Recent Advances in Ensembles for Feature Selection Introduction to statistical machine learning An introduction to machine learning Deep Learning Efficient Processing of Deep Neural Networks Deep learning Competition and cooperation in neural nets: Proceedings of the U.S.-Japan joint seminar held at Kyoto A novel deep learning method for the classification of power quality disturbances using deep convolutional neural network Convolutional Neural Network Based Fault Location Detector for Power Grids Deterministic and probabilistic forecasting of photovoltaic power based on deep convolutional neural network Deep solar radiation forecasting with convolutional neural network and long short-term memory network algorithms Electricity Load Forecasting --An Evaluation of Simple 1D-CNN Network Structures Meteorological Data Forecast using RNN Accurate photovoltaic power forecasting models using deep LSTM-RNN Deep Learning for Household Load Forecasting-A Novel Pooling Deep RNN Multi-Sequence LSTM-RNN Deep Learning and Metaheuristics for Electric Load Forecasting A Deep Neural Network Model for Short-Term Load Forecast Based on Long Short-Term Memory Network and Convolutional Neural Network Learning representations by back-propagating errors Long short-term memory Support-vector networks An Introduction to Statistical Learning: With Applications in R Support vector regression Application of support vector machine models for forecasting solar and wind energy resources: A review Prediction of building energy consumption using an improved real coded genetic algorithm based least squares support vector machine approach Forecast of Power Grid Investment Scale Based on Support Vector Machine. 
E3S Web Conf Gaussian processes for machine learning Anomaly detection based on data stream monitoring and prediction with improved Gaussian process regression algorithm Predictive Control of District Heating System Using Multi-Stage Nonlinear Approximation with Selective Memory When Gaussian Process Meets Big Data: A Review of Scalable GPs Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting Classification and Regression Trees The elements of statistical learning: Data mining, inference, and prediction A review of data-driven approaches for prediction and classification of building energy consumption No Free Lunch Theorems for Search Dimensionality reduction with unsupervised nearest neighbors Machine learning: A probabilistic perspective An introduction to the bootstrap Heuristics of instability and stabilization in model selection Pattern classification using ensemble methods Exact Bagging with k-Nearest Neighbour Classifiers Random Forests The strength of weak learnability Boosting a Weak Learning Algorithm by Majority A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting Experiments with a New Boosting Algorithm Greedy function approximation: A gradient boosting machine Stochastic gradient boosting Boosting Decision Trees Advances in Neural Information Processing Systems Training methods for adaptive boosting of neural networks Convex Neural Networks: Advances in Neural Information Processing Systems 18 An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants Principles and Practice Short-term wind speed forecasting using empirical mode decomposition and feature selection A new wind power prediction method based on ridgelet transforms, hybrid feature selection and closed-loop forecasting Feature Selection for Predicting Building Energy Consumption Based on Statistical Learning Method Feature selection in machine learning prediction systems for renewable energy applications Data clustering: Theory, algorithms, and applications Cluster analysis Clustering methods to find representative periods for the optimization of energy systems: An initial framework and comparison Impact of different time series aggregation methods on optimal energy system design Clustering methods of wind turbines and its application in short-term wind power forecasts Using Smart Meter Data to Improve the Accuracy of Intraday Load Forecasting Considering Customer Behavior Similarities Eine anwendungsorientierte Einführung Analysis of K-Means and K-Medoids Algorithm For Big Data Hierarchical Clustering to Find Representative Operating Periods for Capacity-Expansion Modeling The Curse of Dimensionality in Data Mining and Time Series Prediction Mathematics for machine learning Enabling immersive engagement in energy system models with deep learning Clustering high dimensional data Water quality assessment using SVD-based principal component analysis of hydrological data Adaptive dimension reduction using discriminant analysis andK-means clustering Short-term wind power forecasting approach based on Seq2Seq model using NWP data Auto-association by multilayer perceptrons and singular value decomposition Autoencoders, Minimum Description Length and Helmholtz Free Energy Reducing the dimensionality of data with neural networks Deep learning in neural networks: an overview Modèles connexionnistes de l'apprentissage Choosing the right estimator -scikit-learn 0.24.2 documentation Considerations for artificial 
intelligence and machine learning: Approaches and use cases Encyclopedia of bioinformatics and computational biology Typical periods or typical time steps? A multi-model analysis to determine the optimal temporal aggregation for energy system models Evaluation Metrics for Unsupervised Learning Algorithms Experimental evaluation of cluster quality measures Hyperparameter Search in Machine Learning Hyper-Parameter Tuning of a Decision Tree Induction Algorithm The Sources and Uses of Sensitivity Information Response to Michel, Kleijnen and Permut The construction and implementation of metamodels A Comment on Blanning's "Metamodel for Sensitivity Analysis: The Regression Metamodel in Simulation Winter Simulation Conference (WSC) The Design of experiments Design and analysis of experiments Design and analysis of experiments The design and analysis of factorial experiments A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code Small sample sensitivity analysis techniques for computer models.with an application to risk assessment Orthogonal Column Latin Hypercubes and Their Application in Computer Experiments Algorithms for Generating Maximin Latin Hypercube and Orthogonal Designs On the construction of nested orthogonal Latin hypercube designs Space-filling designs Active Learning Literature Survey Active learning Training connectionist networks with queries and selective sampling Improving Generalization with Active Learning A Sequential Algorithm for Training Text Classifiers Query by committee Toward Optimal Active Learning through Sampling Estimation of Error Reduction A Survey on Active Deep Learning: From Model-driven to Data-driven Comparison of different Methods for approximating models of energy supply systems and polyoptimising the systemsstructure and componentsdimension Long Term Electricity Forecast: A Systematic Review Electrical load forecasting models: A critical systematic review Long-term Sector-wise Electrical Energy Forecasting Using Artificial Neural Network and Biogeography-based Optimization. 
Electric Power Components and Systems A Comprehensive Review of the Load Forecasting Techniques Using Single and Hybrid Predictive Models Long term load forecasting for Nigeria's electric power grid using ann and fuzzy logic models Long term forecasting using machine learning methods Long Term Load Forecasting using Grey Wolf Optimizer -Artificial Neural Network Forecasting Daily Electric Load by Applying Artificial Neural Network with Fourier Transformation and Principal Component Analysis Technique A hybrid load forecasting model based on support vector machine with intelligent methods for feature selection and parameter optimization The impact of flexible resources in distribution systems on the security of electricity supply: A literature review Innovation landscape brief: Advanced forecasting of variable renewable power generation Review on probabilistic forecasting of wind power generation A review of wind power forecasting & prediction Wind power forecasting of an offshore wind turbine based on highfrequency SCADA data and deep learning neural network SAUPEC/RobMech/PRASA: 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA) 28-30 Short-term prediction of wind power based on deep Long Short-Term Memory Wind turbine power curve modelling using artificial neural network Wind turbine power output prediction model design based on artificial neural networks and climatic spatiotemporal data A novel wavenets long short term memory paradigm for wind power prediction Wind power forecast using neural networks: Tuning with optimization techniques and error analysis Comprehensive resilience assessment of electricity supply security for 140 countries Future security of power supply in Germany-The role of stochastic power plant outages and intermittent generation Future Trends for Big Data Application in Power Systems. In: Big Data Application in Power Systems Comparative studies among machine learning models for performance estimation and health monitoring of thermal power plants Modeling Dependent Outages of Electric Power Plants A time-dependent model of generator failures and recoveries captures correlated events and quantifies temperature dependence Impact of the Combined Integration of Wind Generation and Small Hydropower Plants on the System Reliability Hydro-climatic conditions and thermoelectric electricity generation -Part I: Development of models Simulation of operational reliability of thermal power plants during a power crisis: Are we underestimating power shortage risk? 
Gas turbine sensor validation through classification with artificial neural networks Continuous machine learning for abnormality identification to aid condition-based maintenance in nuclear power plant Fault diagnosis for photovoltaic array based on convolutional neural network and electrical time series graph The development of stationary battery storage systems in Germany -A market review A review of energy storage types, applications and recent developments A Price-Maker/Price-Taker Model for the Operation of Battery Storage Systems in Electricity Markets Li-ion batteries for peak shaving, price arbitrage, and photovoltaic self-consumption in commercial buildings: A Monte Carlo Analysis Impact of Storage Dispatch Assumptions on Resource Adequacy Assessment: Preliminary Work Modeling the performance of a sorption thermal energy storage reactor using artificial neural networks Optimal bidding and offering strategies of merchant compressed air energy storage in deregulated electricity market using robust optimization approach Economic Dispatch of Energy Storage System in Micro-grid A Supervised Machine Learning Approach to Control Energy Storage Devices A real-time energy management strategy for pumped hydro storage systems in farmhouses Optimal dispatch based on prediction of distributed electric heating storages in combined electricity and heat networks A Comparative Study of Deep Neural Network and Meta-Model Techniques in Behavior Learning of Microgrids Improving the efficiency of a Savonius wind turbine by designing a set of deflector plates with a metamodel-based optimization approach A comparison of six metamodeling techniques applied to building performance simulations This research was funded by the German Federal Ministry for Economic Affairs and Energy (BMWi) within the project KIVi (grant ID:0 3EI1022A).