title: Massively Digitized Power Grid: Opportunities and Challenges of Use-inspired AI
authors: Xie, Le; Zheng, Xiangtian; Sun, Yannan; Huang, Tong; Bruton, Tony
date: 2022-05-10

This article presents a use-inspired perspective of the opportunities and challenges in a massively digitized power grid. It argues that the intricate interplay of data availability, computing capability, and artificial intelligence (AI) algorithm development are the three key factors driving the adoption of digitized solutions in the power grid. The impact of these three factors on critical functions of power system operation and planning practices is reviewed and illustrated with industrial practice case studies. Open challenges and research opportunities for data, computing, and AI algorithms are articulated within the context of the power industry's tremendous decarbonization efforts.

Digitization of the electric power grid, which broadly refers to the deployment of sensing, communication, and computational capabilities, has been an integral part of the electrification process over the past century and is a key enabling factor that drives power grid transformation by spreading its outreach vertically over plants, transmission grids, distribution grids, and end-use customers. As data availability and computing capacity continue to grow, large-scale power grids are built and operated with very high levels of reliability and efficiency, providing electricity services to billions of customers. The state of today's power grids in the United States (U.S.) can be summarized in three aspects: (i) for system reliability, the average duration of annual electric power interruptions in the U.S. varied from 3 to 8 hours in the period between 2013 and 2020 [1]; (ii) for cost of electricity, the average wholesale electricity price across the U.S. varied from $30 to $60 per MWh in the period between 2016 and 2021 [2]; and (iii) for carbon footprint, electricity generation in the U.S. produced an average of about 0.4 kilograms of carbon dioxide emissions per kWh in 2020 [3]. In response to climate change, which has emerged as a global concern, rapid decarbonization is imperative to reduce carbon emissions, a quarter of which are contributed by the electricity sector. It is foreseeable that numerous decarbonization measures will cause profound changes in the electricity sector in the next few decades [4]. Such changes have two major drivers: (i) the energy portfolio transition from high-carbon to low/zero-carbon generation sources, such as hydrogen, nuclear, wind and solar-based commercial generation units and distributed energy resources (DERs), and (ii) electrification in other sectors, including construction, transportation and other infrastructure systems. Deepening penetration of intermittent resources, such as wind farms and solar photovoltaic (PV), is introducing more variability and uncertainty. The proliferation of power electronics-based inverters is changing system dynamic characteristics. Increasing numbers of DERs at the grid edge are strengthening the interaction between transmission and distribution systems. Rapid expansion of electric vehicles (EVs) will lead to substantial changes in electricity demand patterns. Therefore, it is imperative for grid operators to adopt a more flexible and risk-aware approach.
Given the massive data availability and computing capacity provided by digitized power grids, data-driven artificial intelligence (AI) methods are feasible solutions for complementing traditional model-based approaches to address these complex emerging challenges. From a broader economic perspective, AI has transformed a variety of domains over the past decade [5], including language processing [6], speech recognition [7], facial recognition [8], real-time object detection [9], multiplayer games [10]–[12], recommendation systems [13], intelligent robotics [14]–[16], driving assistant systems [17], disease diagnosis [18], drug discovery [19], finance [20], and others. We attribute such unprecedented success of AI to an intricate interplay between three factors, namely, massive data acquisition, high computing performance, and advanced AI algorithms [21]–[23]. The availability of data from heterogeneous resources has been increasing at an unprecedented rate [24]–[26] and provides fuel for developing AI-based, data-driven applications for valuable knowledge extraction in a wide range of domains. In addition, remarkable improvements in computing performance have enabled a variety of practical large-scale AI models, credited to the collective advances in hardware, software, and computing architecture [27]. Alongside the rapidly growing AI infrastructure that provides massive data and computing capacity, numerous advanced AI algorithms have been developed in the past decade. State-of-the-art performance on benchmark datasets for tasks in multiple research fields has been improved by pre-trained models [28]–[31] and novel AI model architectures [32]–[35]. Given the widespread success of AI applications, the development and deployment of interpretable, robust, and scalable AI may help to accommodate the emerging changes brought by decarbonization, aiming to reduce carbon emissions while "keeping the lights on" in a reliable and economic way (Fig. 1). However, to facilitate the process towards decarbonization, many open questions persist in implementing practical AI approaches in digitized power grids, including domain-agnostic computing and AI advances, use-inspired AI algorithm development, and cyber-physical security and privacy in a massively digitized power grid. To this end, this paper aims to provide a comprehensive review of the state-of-the-art practice of power grid digitization, which focuses on three backbone factors: data, computing, and algorithms. Specifically, this paper provides a review of the recent progress in data acquisition, computing capability, and AI algorithms that are applicable to power systems. Successful industry use cases are introduced to illustrate applications of AI algorithms on large real-world data sets.

The rest of the paper is organized as follows. Section II provides an overview of power grid operation and planning practices, as well as the challenges posed by decarbonization. Sections III, IV and V provide a comprehensive review of data, computing, and algorithmic advances in power systems. Section VI provides an industry perspective on AI adoption. Finally, Section VII concludes the paper with remarks on future directions for power grid modernization.
Fig. 1. Tri-factors of digitization are enabling technologies that facilitate the process towards power grid decarbonization while simultaneously meeting requirements on reliability, cost of electricity, and carbon emissions; in turn, power grid decarbonization steers the use-inspired development of power grid digitization.

Modern power grids are being driven by the strong momentum of decarbonization [36], together with decentralization and transportation electrification. Fig. 2 shows a conceptual diagram of a modern power grid, which can be separated into transmission and distribution systems. Transmission systems refer to bulk systems that have voltages higher than 66 kV and consist of generation, substations and transmission lines, which are usually operated by state-wide or cross-state system operators. Distribution systems refer to close-to-user systems that have voltages lower than 33 kV and connect to residential, commercial and industrial loads, which are usually operated by local utility companies. Power grid decarbonization is changing the energy portfolio in terms of generation resources, such as increasing commercial-size solar PV and wind farms in transmission systems, and DERs such as rooftop solar PV in distribution systems. Power electronics-based inverters are thus being deployed to convert electricity generated by renewables from direct current (DC) to alternating current (AC). Transportation electrification introduces a rapidly expanding number of electric vehicles into distribution systems.

The modern power system operations in high-voltage transmission systems can be broken down into two categories [37]. The first category is physical operations, which are responsible for the grid's physical security and resource adequacy; the second concerns market operation. Both physical and market operations are summarized in Fig. 3. Power system operation and planning ensure the reliability of power systems via multiple functions including real-time monitoring, control, protection, and system reliability analysis. A system-wide monitoring system collects and processes measurements, and presents intuitive information to system operators via visualization and alarming. A control system performs control actions either manually or by automated procedures. A protection system executes prescribed corrective measures upon detection of anomalies within targeted system components, which is achieved mainly by local sensors and actuators. Reliability analysis informs decision making over multiple time horizons to keep the system within adequacy and security criteria. Load and renewable forecasting provides input for both system and market operation by estimating uncertain net load and renewable generation over various projection horizons. Load forecasting covers various prediction horizons spanning hours, days, weeks, months, and years ahead, whereas renewable forecasting provides only hours- and days-ahead predictions. In real-world power grids, short-term load forecasting typically has high accuracy, and renewable forecasting also has acceptable errors that can be mitigated by real-time operation of dispatchable resources.
Real-time monitoring and control are implemented mostly by energy management systems (EMS) in the control center, the primary functional modules of which include supervisory control and data acquisition (SCADA), state estimation (SE), and automatic generation control (AGC). The SCADA system fulfills measurement acquisition and control telemetry through communication channels between the control center and remote terminal units (RTUs) at the respective electrical stations or devices. Typically, the data acquisition function collects measurements every 2 to 10 seconds, and this data stream is a key enabling factor for realizing other functionalities such as state estimation, real-time control, unit commitment, and economic dispatch. For accurate situational awareness of the system's current operation, the SE function provides steady-state estimates of system variables that are not directly observed in streaming SCADA data. As the major real-time controls, primary and secondary generation control are implemented to (i) regulate load frequency, and (ii) balance power generation, load demand and cross-area interchange in real time. Droop-based generator governors that are responsible for primary control perform instantaneous power quality corrections before triggering protection relays. AGC, considered secondary control, mitigates unavoidable errors of primary control by sending commands from the control center to participating generation units every 2 to 4 seconds [38]. Real-time protection is mainly implemented by protective relays installed on critical assets, such as generation units and substations. In high-voltage transmission systems, protective relays should clear faults within several cycles (a cycle is 1/60 of a second in a 60-Hz system) to avoid further system deterioration. Similarly, a distribution management system (DMS) enables real-time monitoring in the distribution system, with a few functions similar to those of the EMS, such as SCADA and event analysis [39]. It is worth noting that most field devices in distribution systems are manually operated rather than remotely controlled, indicating a lower level of automation compared to the transmission system.

System reliability analysis entails adequacy, static security and dynamic security analysis [23], [40], [41]. Security analysis focuses on the process of system state transitions initiated by credible disturbances such as short circuits and loss of system components. Static security analysis (SSA) evaluates the viability of the post-event equilibrium by calculating power flow or optimal power flow to check whether a power or voltage violation occurs after an N − 1 contingency (i.e., the loss of any single system component). Dynamic security analysis (DSA) evaluates the ability of the system to transition from one equilibrium to another post-event equilibrium within security criteria [23] by simulating system dynamic models. Adequacy analysis quantifies the system's capacity for sustainable supply that accommodates load variation, renewable uncertainty and system component outages, using several manually defined indices. A typical method for adequacy and security analysis is numerical simulation. Because they are time-intensive, these reliability analysis methods tend to be impractical for real-time security control during contingencies. SSA and DSA are used in short-term scheduling, such as generation scheduling, which is performed daily or every few hours. Adequacy analysis and SSA are typically used for mid-term planning, such as facility maintenance, which is performed every several months to one year. Also, both adequacy and security analysis are used for long-term planning, which occurs annually or every few years.
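To give a concrete, toy-scale sense of the static security analysis described above, the following is a minimal sketch of DC power-flow-based N − 1 screening on a made-up 3-bus network; the line susceptances, flow limits, and injections are hypothetical, and practical SSA tools rely on full AC power flow or optimal power flow rather than this simplification.

```python
import numpy as np

# Hypothetical 3-bus network: lines as (from, to, susceptance [p.u.], flow limit [p.u.]).
lines = [(0, 1, 10.0, 1.0), (1, 2, 8.0, 1.0), (0, 2, 5.0, 1.2)]
injections = np.array([1.5, -0.7, -0.8])      # net injection per bus; bus 0 is the slack

def dc_flows(active_lines):
    """Solve the DC power flow and return the flow on each active line."""
    n = 3
    B = np.zeros((n, n))
    for f, t, b, _ in active_lines:
        B[f, f] += b; B[t, t] += b; B[f, t] -= b; B[t, f] -= b
    theta = np.zeros(n)
    theta[1:] = np.linalg.solve(B[1:, 1:], injections[1:])   # slack angle fixed at 0
    return [(f, t, b * (theta[f] - theta[t]), lim) for f, t, b, lim in active_lines]

# Base case plus N-1 screening: drop each line in turn and check the remaining flows.
for k in [None] + list(range(len(lines))):
    active = lines if k is None else lines[:k] + lines[k + 1:]
    label = "base case" if k is None else f"outage of line {lines[k][:2]}"
    overloads = [(f, t, flow) for f, t, flow, lim in dc_flows(active) if abs(flow) > lim]
    print(f"{label}: " + ("secure" if not overloads else f"overloaded lines {overloads}"))
```

On this toy network the base case is secure, while dropping a line can overload one of the remaining corridors, which is exactly the kind of violation SSA is meant to flag before it can occur in operation.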
Market operation in wholesale electricity markets aims to maximize social welfare while obeying physical constraints. Wholesale markets comprise day-ahead and real-time energy markets, capacity markets, financial transmission right (FTR) markets and ancillary service markets. Both day-ahead and real-time energy markets determine clearing prices based on bids from market participants, incorporating physical constraints and potential restrictions. Capacity markets ensure long-term system reliability. FTR markets entitle market participants to hedge against potential losses related to the price risk of delivering energy to the grid. Ancillary service markets provide regulation and reserve services. Unit commitment (UC) and economic dispatch (ED) are two major security-constrained, bid-based mechanisms to handle the scheduling of generation and the management of system congestion. Both UC and ED are typically formulated as large-scale nonlinear/linear programming problems, known as optimal power flow (OPF). Given forecasted load and renewable generation as input, the UC function determines when and which generation units start up and shut down in day-ahead markets. The ED function calculates the power output of each committed generation unit and the associated locational marginal prices (LMPs). ED is performed to meet the day-ahead hourly forecasted load in day-ahead energy markets, as well as to meet the minute-ahead forecasted load every 5 to 10 minutes in real-time energy markets [42]. In today's distribution grids, the retail market contains few centralized operation or scheduling functions such as UC and ED. Given the proliferation of DERs in distribution grids such as distributed generation, interruptible load, and electricity storage, the retail market will require system upgrades and reforms in the future to accommodate DER market participation, and to establish an appropriate mechanism of scheduling and compensation [43].

Renewable integration and transportation electrification at scale impose challenges on the paradigm of protection and control. The emergence of massive grid-following and grid-forming inverter-based resources (IBRs) may challenge the effectiveness and efficiency of the current central control framework due to the unknown impacts of electromagnetic dynamics and low inertia. DERs at the grid edge may create bi-directional power flows that potentially incur malfunctions of the protective relays in distribution grids. In addition, typical methods for adequacy and security analysis are numerical simulations that rely heavily on grid models at multiple time scales, including electromagnetic dynamics (very fast), electromechanical dynamics (fast), and steady state (slow). However, system characteristics are changing due to the proliferation of inverter-interfaced renewable resources and EVs in modern power grids, such as low inertia and deeper integration of transmission and distribution systems. These emerging system characteristics impose new requirements on the existing models used to determine whether the system stays within critical security criteria.
For example, there is an urgent need to study several topics in order to handle the growing system complexity, including (i) electromagnetic transient models to reveal the fast dynamics of power electronics-based system components, (ii) system-level joint simulation between transmission and distribution models to reveal the increasing cross-system interaction, and (iii) cross-domain electricity-transportation models to incorporate the impacts of transportation networks on EVs. Market operation also faces the challenge of managing potential market risks resulting from the variability and stochasticity of renewable generation [44]. Strong uncertainty is a key obstacle for economic dispatch, which must (i) maintain system stability as tertiary frequency control and (ii) avoid unexpected renewable curtailment to the greatest possible extent to achieve decarbonization. Current wholesale markets may not be sufficiently prepared to accommodate increasingly frequent extreme weather events, such as the 2021 Texas power outage event [45], in terms of preventing price spikes and mitigating energy scarcity. Specifically, strong uncertainty regarding system net load and intermittent renewable generation in future grids will raise severe challenges for the accuracy and robustness of short-term load and renewable prediction. Deepening transportation electrification may also undermine the existing end-use and econometric models for medium- and long-term load forecasting [46]. The distribution system also faces a growing number of facility challenges. Aging power lines may limit maximum use of renewable energy sources, such as wind farms and utility-scale solar, especially in less populated areas where large renewable energy installations are located. The utilization and availability of DERs installed in densely populated areas can be intermittently affected by frequent localized outages that may not be recognized by the control center. Given stronger integration and correlation between transmission and distribution grids, facility outages, such as transformer failures, may cause wider impacts. Furthermore, in establishing a competitive retail market in the distribution system, multiple critical problems remain unsolved, such as LMP calculation and demand response modeling; however, these are beyond the scope of this paper. Overall, the profound changes brought by decarbonization are posing and will continue to pose numerous challenges to all aspects of physical reliability and economics. Given massive data acquisition as the "fuel" and high computing power as the "engine," applying advanced data-driven AI-based approaches as an "autopilot" has the potential to steer the vehicle forward in a flexible and risk-aware manner.

In broad industry sectors, large-volume and heterogeneously structured data have been generated at an unprecedented rate by diverse resources since 2010 [24]–[26], such as Internet of Things (IoT) records, social media, smart devices, and healthcare systems. The availability of such tremendous volumes of data has facilitated numerous applications of valuable knowledge extraction in sectors [47] spanning manufacturing [48], healthcare [49], government [50], retail [51], and infrastructure [52]–[54]. In particular, numerous high-quality open-source training datasets [55] have been created to boost AI research in the aspects of model training, testing, calibration, and benchmarking.
Moving with the tide of digitizing power systems, the explosive growth of data resources has also created massive volumes of data in heterogeneous formats, including electrical measurements that span grids vertically, such as sensors installed on grid-level components, smart meters and smart appliances, as well as non-electrical measurements, such as weather, social media, traffic and geographic information [56]. These data have proven very valuable in many use cases such as asset assessment, operation planning, real-time monitoring, and protection [57]. It is worth noting that these basic functionalities have distinct requirements for data quality in terms of data accuracy, latency, and sampling rate [58]. This section will review data acquisition approaches for electrical measurements in the power grids.

A. Real-world Measurements in Power Systems

1) Sensors in Transmission Systems: SCADA systems, which have played an important role in transmission system operation, are capable of collecting facility information and sending control signals, functions implemented by critical components, i.e., RTUs. SCADA systems collect asynchronous data on bus voltage magnitude as well as active and reactive power flows; the typical reporting rate is merely 1 sample per 2 to 6 seconds. The wide-range acquisition of SCADA data has facilitated remote monitoring and system operation automation. For example, the EMS at the control center is capable of estimating physical state variables that are not directly observable based on SCADA data alone. However, due to increasing system complexity and uncertainty, even this successful SCADA-based application is becoming inadequate. Phasor measurement units (PMUs) have been deployed in the bulk transmission grid at an accelerated rate after the 2003 U.S. blackout [59]. PMUs are able to measure the voltage phasors at the installed bus (typically substations) and the current phasors of the connected lines, along with synchronized time stamps; the typical reporting rate is 30 or 60 samples per second. (A phasor contains the magnitude A and phase angle φ of a sinusoidal waveform A sin(ωt + φ), where ω is 2π × 60 rad/s in a 60-Hz system.) Compared to SCADA, PMUs' high accuracy of time stamps and sensing, low latency, and high sampling rate benefit basic functionalities to different degrees [60]: (i) more real-time control and protection applications become potentially implementable due to all of these advantages, such as remedial action schemes including grid islanding and short-term stability control; (ii) online system security analysis, such as disturbance detection and situational awareness, can be significantly improved due to low latency; (iii) system adequacy analysis for long-term planning, such as model calibration, can be improved due to high accuracy. However, it is worth noting that, due to several factors such as the high cost and time consumption of installation, only around 2,500 production-grade PMUs have been installed across the North American transmission grid [61], [62]. Digital fault recorders (DFRs) capture and store transient data and sequence of events (SOE) data that can be used for various purposes such as protection scheme monitoring and fault diagnosis, which tend to be implemented offline. DFRs have three typical recording mechanisms: steady-state, low-speed and high-speed disturbance recording modes. The disturbance recording modes are usually triggered by signals from protection relays.
The steady-state recording mode captures the min, max and mean values of phasors at a low sampling rate of 1 sample per 10 seconds to 1 hour. The low-speed disturbance mode aims to provide phasor-domain information on long-term and short-term disturbances at a sampling rate of 1 sample per 1 to 10 cycles. The high-speed disturbance mode aims to record instantaneous time-domain voltage and current measurements of transient faults at a sampling rate of hundreds of samples per cycle.

2) Sensors in Distribution Systems: The rapid expansion of advanced metering infrastructure (AMI) meters at the grid edge has created massive amounts of residential electricity consumption data, typically at a rate of 1 sample every 1 or 5 minutes. For example, the Pacific Gas and Electric Company collects more than 3 terabytes of power data from 9 million smart meters across its service territory, and the State Grid Corporation of China collects 200 terabytes of data per year [63]. SCADA in distribution systems has facilitated remote monitoring and automated operation in multiple aspects, such as substation, feeder, and end-user load control. In substation systems, SCADA gathers data including voltage magnitude, current magnitude and the binary status of facilities such as switches, breakers, and transformers. In typical feeder systems, SCADA facilitates the collection of historical status data from feeder devices such as controlled load break switches and reclosers. For end-user load, SCADA collects meter data from the end users. The frequency disturbance recorder (FDR), one representative PMU application in distribution systems, is a GPS-synchronized single-phase PMU installed at ordinary 120-volt wall outlets. FDRs have the advantages of low cost and high deployability; they can be deployed even at residential households and campuses [64]. Using hundreds of FDRs that have been strategically placed across the U.S., the frequency monitoring network FNET/GridEye [65] is able to provide visualized nation-wide frequency monitoring.

Artificially generated data are commonly used for power system research for two major reasons: (i) most real-world operational data are protected by policies such as Critical Energy/Electric Infrastructure Information (CEII) owing to confidentiality, and (ii) real-world measurement datasets of high-impact events are usually insufficient for data-driven model training, because the reliability of real-world power grids ensures that high-impact events are rare. Alternatively, artificial data generation methods facilitate the gathering of arbitrary numbers of data samples under varying scenarios and conditions, including voltage, current, frequency, and even machine internal state measurements across grid models.

1) Model-based Simulation: Model-based simulation is one of the most common data acquisition approaches for research and education purposes. Simulation models of transmission and distribution systems can be categorized into two major types: (i) small-scale standard systems and (ii) large-scale synthetic systems, which are available at [66]. IEEE standard test systems are typically used for investigations such as algorithm assessment and power system analysis. Researchers have recently contributed to the creation of large-scale synthetic grid models [67] that possess realistic system characteristics.
These large-scale synthetic grids have been used for analysis such as macroscopic energy portfolio transition [68], [69] and quantitative assessment of measures against extreme events [70]. For an intuitive impression, we show the "popularity" of simulation models in Fig. 4 by counting the number of corresponding IEEE transactions papers in which they are used for machine learning model training and testing. It is clear that the most commonly used models for AI algorithm training, testing and calibration are the IEEE 39-bus and 118-bus systems, whereas the large-scale models are rarely adopted. Please refer to Tables II-V for other simulation models that are not included in Fig. 4.

2) Hardware Test Bed: Hardware-in-the-loop (HIL) simulators have been developed to support various types of research, including event detection, situational awareness, wide area monitoring and control, and cyber security [71]. HIL leverages the interface between a real-time software simulator and a hardware system to enable closed-loop control [72]. HIL may play an important role in electromagnetic transient simulation of electronics-rich power grids because of its ability to represent realistic very-fast dynamics.

This section gives an overview of data acquisition approaches in today's electric power grids. The rapid expansion of advanced sensors across systems and the development of simulation have facilitated massive data acquisition spanning multiple spatial and temporal scales and have further accelerated practical data-driven applications. Efforts to explore data-driven innovation, such as big data hubs [73], [74], have also promoted data-intensive research in the power system industry as well as academia and education. Despite these advances, there are several key challenges regarding the data for AI algorithms. First, in contrast to the numerous datasets that have benefited broad AI communities, the lack of publicly accessible high-quality power datasets may be impeding the advancement of AI research in power systems. For example, insufficient data representativeness is one of the decisive factors for data-hungry AI methods. Real-world measurements cannot provide a sufficient volume of publicly available data due to confidentiality rules and strong grid reliability. Randomly sampled scenarios in simulation can generate massive amounts of data, but they do not necessarily guarantee representativeness; therefore, they may lead to unexpected training biases, as demonstrated by the example of ACOPF scenario generation [75]. Second, the feasibility of the proposed AI algorithms may be constrained by the current data acquisition system, as indicated by the data quality requirements of major power system applications [58]. For example, limited and inappropriate placement of high-sampling-rate sensors, which determine situational awareness for a specific task, may constrain advanced analysis and control, including but not limited to practical applications of AI methods. Third, although AI methods may offer unique creativity given cross-domain datasets, they require deep interdisciplinary knowledge and collaboration to identify useful combinations of heterogeneous datasets, as demonstrated by a few canonical AI-based studies, such as automatic classification of distribution grid phases using camera imaging [76] and comprehension of COVID-19 impacts on the power sector using mobile phone location data [77].
Given sufficient available data resources, the implementation of data-driven applications in modern power grids faces computational burdens derived from large-volume, heterogeneous data. Handling the associated challenges, which include data stream storage, querying and processing, is critical for such implementations. This section will give an overview of the state-of-the-art computing that has facilitated general AI, and will then introduce data stream management systems and data processing platforms [63], [78] in power systems.

The remarkable improvement of computing performance is the key factor in the proliferation of AI, which is attributable to advances in hardware, software, and generic algorithms [27]. Quantum leaps in computing performance have yielded a variety of practical large-scale AI models, among which the amount of computation for model training has been increasing exponentially with a 3.4-month doubling period [22], [79]. The rapid progress of hardware computing resources has been the main driver behind the development of AI models. Of particular note, general-purpose graphics processing units (GPUs) [80] and AI-accelerator application-specific integrated circuits (ASICs) [81]–[84] are capable of dramatically accelerating AI model training. In addition, AI-tailored software has been developed to exploit hardware computing resources [85]. For instance, basic linear algebra subroutine (BLAS) libraries, which were created decades ago [86]–[88], have been used to optimize common linear algebra operations that are repeatedly executed in deep neural networks [89]–[92]. In particular, Nvidia GPUs, which are widely supported by mainstream deep learning frameworks [93]–[95], have a highly optimized library, cuDNN [96], enabling high-performance GPU acceleration. The progress of generic algorithms has also improved computing performance, exhibiting enormous heterogeneity across problems of different types and sizes [97]. It is worth noting that some large-size problems benefit just as much or even more from algorithmic improvements than from Moore's law. For instance, the total speedup of solving mixed integer optimizations (MIO) was 2.2 trillion times during the 25 years between 1991 and 2016 [21], of which a factor of 1.6 million is due to hardware speedup from 59.7 GFlop/s in 1993 to 93.0 PFlop/s in 2016; another factor of 1.4 million is due to software and algorithmic speedup from CPLEX 1.2 in 1991 to Gurobi 6.5 in 2015.

Because power system security relies heavily on real-time system operation and control, it is challenging to store and process real-time data streams effectively and efficiently. Therefore, building real-time data streaming systems, which mainly influence data latency, is critical for subsequent online data-driven applications, including but not limited to AI-based methods. In contrast to traditional database management systems that use static data storage, data stream management systems usually store synopsis data (instead of the entire dataset) obtained via processing, in order to handle frequent queries and data updates.
We illustrate several of the most popular data stream management systems summarized in [63]: Aurora [98] has a good balance of accuracy, response time and resource utilization; TelegraphCQ [99] is mainly used for sensor networks and involves a front end, shared storage, and a back end; STREAM [100] has an advantage in resource-limited situations in that it can execute queries with high efficiency. In particular, big data management platforms are being developed to accommodate multi-modal data storage and the processing of unstructured heterogeneous data. Hadoop [101] and Spark [102] are two representative open-source designs for distributed data management. Hadoop is able to process massive heterogeneous data efficiently and economically by taking advantage of a programming model [103], a distributed file system [104], and a distributed data storage system [105]. Spark, on the other hand, leverages the technology of resilient distributed datasets [106], which is more suitable for the iterative computational operations in machine learning-based applications. In terms of data management platforms that are suitable for power systems, several solutions have proven successful in facilitating energy efficiency. For example, CenterPoint Energy handles streaming messages from intelligent grid devices and smart meters using an IBM-developed platform to improve system reliability [107]. For its part, Oncor Electric Delivery has developed AMI data-based predictive maintenance, enabled by data platforms, to reduce outages and guarantee sustainable supply [108].

Because of power grid digitization, computing tasks in today's power grids have been shifting and evolving toward centralized clouds. Advanced computing power, along with massive data acquisition, has enabled many time-sensitive operations, such as real-time monitoring and security analysis. However, with the increasing complexity of power grids, such a computing paradigm may face several challenges, such as privacy concerns and communication bandwidth limits. In contrast, edge computing, which leverages computing resources at the grid edge, has the potential to improve computation efficiency and protect data privacy by performing data analytics close to customers [109]. In particular, machine learning approaches that can preserve privacy, such as federated learning [110], have drawn increasing attention.

This section surveys recent AI solutions to the core decision making processes in power grid operations. We report 85 papers, most of which were published in IEEE transactions of the Power and Energy Society (e.g., IEEE Transactions on Power Systems and IEEE Transactions on Smart Grid) from 2019 to 2021. For earlier works about AI algorithms for grid operations, we refer readers to previous survey papers [23], [111]–[113]. Table I classifies the approaches used in these 85 papers according to the category to which they belong (i.e., supervised, unsupervised, and reinforcement learning). In addition, for each decision making process, we provide not only an overview of the state-of-the-art, AI-powered grid solutions, but also illustrative examples that give readers a sense of how specific AI techniques can be leveraged to solve grid challenges. We use an independent notation system in each subsection. Renewables and load introduce many uncertainties to the operation of low-carbon power grids.
One way to address such uncertainties in grid operation is to develop an accurate forecast algorithm for renewables and load. The topic areas in renewable/load modeling include renewable (e.g., wind and solar) generation forecasting, load forecasting, and load clustering. Table II lists the most recent works in these topic areas. Table II also summarizes the data source, AI method, and computation resource used in the references provided.

Table II (excerpt). Each entry lists the application [reference]; data source (publicly available: Y/N); computation resource; and AI method:
- Wind power prediction [127]; wind farm data from NREL (Y); i7-7700 CPU, 16 GB RAM; ELM
- PV power prediction [128]; 5-MW PV power plant (Y); i7-2600 CPU; deep belief network
- Cold load pick-up demand assessment [129]; real-world smart meter data (N); -; regression, Gaussian mixture model
- Dynamic load modeling [130]; CIGRE benchmark low-voltage network (N); -; decision trees, ant colony optimization
- PV power forecasting [131]; PV plant dataset (N); -; graph neural network
- Load forecasting [132]; real data set from residential Irish customers (N); -; random forest

Next, we provide an example to elaborate on how AI can be leveraged to solve PV forecasting tasks in the grid. The technical details are reported in [199]. Figure 5 shows the geographic locations of a target solar site C6 and its neighboring N solar sites. Let us suppose that we want to predict the solar irradiance of the target solar site C6 at time step (k + 1). Reference [199] formulates the forecasting problem as one of estimating the parameters of the following autoregressive with exogenous input (ARX) model [199]:

x[k + 1] = f( x[k], ..., x[k − n + 1], {w_i[k − d_i], ..., w_i[k − d_i − m_i + 1]}, i = 1, ..., N )   (1)

where x[k] is the solar irradiance at the target solar site at time step k; w_i is the solar irradiance at the neighboring solar site i; f(·) is an ARX-structured function; and the positive integers n, d_i, and m_i are user-defined parameters that can be determined at the training stage [199]. The intuition of formulation (1) is that the next-step solar irradiance x[k + 1] at the target solar site depends not only on the local solar irradiance, but also on the solar irradiance at its neighboring solar sites. The case studies based on real-world renewable data from California and Colorado suggest such an algorithm is suitable for 1-h and 2-h ahead PV forecasting [199]. However, the algorithm proposed in [199] does not provide a probabilistic description of the forecast quality. One potential avenue for future work is to investigate such a description [199].
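To make the structure of (1) concrete, the following is a minimal sketch that assumes a linear form of f(·) with a single neighboring site and fits the coefficients by ordinary least squares on synthetic irradiance series; the data, lag order n, delay d, and window m are illustrative placeholders rather than the settings used in [199].

```python
import numpy as np

# Hypothetical irradiance series (W/m^2): local target site and one neighboring site.
rng = np.random.default_rng(0)
x = 600 + 50 * np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 10, 500)      # target site
w = 600 + 50 * np.sin(np.linspace(0.3, 20.3, 500)) + rng.normal(0, 10, 500)  # neighbor site

n, d, m = 3, 2, 2          # placeholder lag order, delay, and exogenous window

# Assemble regression samples: x[k+1] ~ [x[k..k-n+1], w[k-d..k-d-m+1], 1]
rows, targets = [], []
for k in range(max(n - 1, d + m - 1), len(x) - 1):
    local_lags = x[k - n + 1:k + 1][::-1]          # x[k], ..., x[k-n+1]
    neigh_lags = w[k - d - m + 1:k - d + 1][::-1]  # w[k-d], ..., w[k-d-m+1]
    rows.append(np.concatenate([local_lags, neigh_lags, [1.0]]))
    targets.append(x[k + 1])
A, b = np.array(rows), np.array(targets)

theta, *_ = np.linalg.lstsq(A, b, rcond=None)      # linear ARX coefficients

# One-step-ahead forecast from the most recent measurements.
k = len(x) - 1
features = np.concatenate([x[k - n + 1:k + 1][::-1], w[k - d - m + 1:k - d + 1][::-1], [1.0]])
print("forecast x[k+1]:", features @ theta)
```

In practice the lag orders and delays would be selected at the training stage, e.g., by cross-validation, which is what the user-defined parameters n, d_i, and m_i in (1) allow.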
The large-scale deployment of renewables poses unprecedented challenges to electricity market operation. Conventional deterministic tools may not be able to support the market operation of an electricity infrastructure with a significant amount of uncertain renewables. Reference [200] proposes a scenario-based approach that unlocks the potential of data in order to incorporate renewables' uncertainties into the dispatch of grid resources. Let us suppose that there are N historical scenarios ∆_N = {δ_1, δ_2, ..., δ_n, ..., δ_N}, which form a subset of all possible scenarios ∆. In each historical scenario δ_n, the net-load forecasting errors at each bus are recorded. Reference [200] formulates the ED problem as follows [200]:

min_p  c^T p   (2a)
s.t.  g_1(p) ≤ 0   (2b)
      g_2(p, δ_n) ≤ 0,  n = 1, ..., N   (2c)

where vector p concerns the power generation of all generators in all intervals during a planning horizon; vector c collects the cost coefficients associated with the generators; (2b) represents the scenario-independent constraints [200], such as ramp and capacity constraints of generators; and (2c) represents the scenario-dependent constraints [200], such as generation-load balance constraints. Suppose that p*_N is the solution to the optimization (2) given N historical samples. Because ∆_N is a subset of all possible scenarios ∆, it is possible that there exists a scenario δ that causes the scenario-dependent constraints to be violated, i.e., g_2(p*_N, δ) > 0. The probability that such an event may occur is termed the "risk" in [200]:

v(p*_N) = Prob.( δ ∈ ∆ : g_2(p*_N, δ) > 0 )   (3)

where Prob.(·) denotes the probability that event "·" occurs. We expect that the probability that the risk v(p*_N) of solution p*_N exceeds a small number ε will itself be small, i.e.,

Prob.( v(p*_N) > ε ) ≤ γ,   (4)

where 0 < ε, γ ≪ 1. With the risk preference parameters ε and γ, a natural question is how to determine the size of ∆_N, i.e., N, needed to achieve the risk preference (4). Reference [200] provides a lower bound on N that depends solely on the look-ahead intervals and the risk preference parameters [200]. Such a lower bound can help system operators determine how many scenarios must be drawn from the historical observations based on their risk preference. For example, in an open-source, 2000-bus synthetic Texas grid, if we suppose that the risk preference parameters of the system operators are γ = 10^−6 and ε = 0.0083, then 2000 historical scenarios need to be embedded into the ED formulation (2) [200]. A rigorous investigation of the relationship between the quantity of support constraints and the design parameters (γ and ε) is still needed to further refine the algorithm in [200].
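To illustrate the risk notion in (3), the following minimal sketch empirically estimates the violation probability of a fixed dispatch against a set of held-out scenarios; the dispatch vector, the linear constraint standing in for g_2, and the scenario data are hypothetical placeholders, not the formulation or data of [200].

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed dispatch p* for 3 generators over one interval (MW).
p_star = np.array([120.0, 80.0, 55.0])
demand_forecast = 250.0                          # MW, forecasted system load

# Hypothetical scenario-dependent constraint g2(p, delta) <= 0:
# total generation must cover the forecasted load plus the net-load error delta.
def g2(p, delta):
    return (demand_forecast + delta) - p.sum()

# Held-out net-load error scenarios (MW), standing in for historical records.
test_scenarios = rng.normal(0.0, 4.0, size=100_000)

violations = np.array([g2(p_star, d) > 0 for d in test_scenarios])
print(f"empirical risk estimate: {violations.mean():.4f}")
```

A dispatch whose empirical violation rate exceeds the operator's tolerance ε would need to be recomputed with more scenarios embedded in (2c), which is exactly what the sample-size bound of [200] quantifies.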
Other recent AI solutions to the problems of UC, ED, and OPF are summarized in Table III. The AI methods, associated data sources and computation resources in the references are listed in Table III.

To decarbonize the power grids, fossil-fueled generators are being replaced by inverter-based resources, e.g., wind/solar farms and energy storage. To assess grid security and resource adequacy, it is necessary to develop new planning tools that explicitly consider these new elements. Grid security and resource adequacy analysis includes steady-state security, dynamic security, and reliability analyses. Table IV summarizes the state-of-the-art AI adoption in these analyses. Next, we will present a learning-based approach to networked microgrid security analysis [164], in order to show how an AI technique can be adopted in this specific topic area. Fig. 6 shows the physical architecture of n networked microgrids, where the n microgrids interact with one another via distribution lines. The dynamics of the networked microgrids can be described by ẋ = f(x), where the state vector x is related to the voltage magnitudes and phase angles at the points of common coupling (PCCs). In the networked microgrids, large disturbances may come from (i) a microgrid operating mode change, e.g., one microgrid entering an islanded mode; and (ii) the distribution network, e.g., distribution line tripping. The security analysis attempts to quantify the disturbance magnitude that the networked microgrids can tolerate [164]. The result of this analysis is critical for both distribution system planners and operators. In [164], Huang et al. formulate the security analysis problem as one of searching for a legitimate Lyapunov function, i.e., a system-behavior summary function for a dynamic system. A Lyapunov function V(x) satisfies two conditions: (i) V(x) is a positive-definite function in a region R around the system equilibrium point; and (ii) the time derivative V̇(x) is a negative-definite function in R. In [164], the Lyapunov function is assumed to possess a neural network (NN) structure with parameter vector θ. To make the NN-structured function satisfy the two conditions of a Lyapunov function, a cost function c(θ) is designed. The cost function incurs a positive penalty if the NN with θ violates one or both of the two Lyapunov function conditions. Vector θ is tuned by the following procedure: 1) create a sample pool by randomly drawing a large number of states x within the region R; 2) update θ n times based on the cost function c(θ) and the gradient descent algorithm [164]; 3) for the NN with the latest θ, search for samples that violate one or both of the two Lyapunov conditions via a satisfiability modulo theories (SMT) tool. If no sample is found, declare the NN a Lyapunov function; otherwise, add the samples to the sample pool in step 1) and repeat step 2).

Fig. 7 visualizes a Lyapunov function learned from the state space of a grid-tied microgrid [164]. The parameters of the system are reported in [164]. It takes 32.18 seconds to learn the Lyapunov function [164]. Having learned the Lyapunov function shown in Fig. 8, a security region can be estimated, which is visualized in Fig. 9. If a disturbance leads the state vector to deviate from the equilibrium (the origin of Fig. 9) while remaining within the solid red circle in Fig. 9, one can conclude immediately that the system trajectory will converge to the equilibrium without conducting any simulations. The region within the solid blue circle is the security region estimated by a conventional approach. It can be observed in Fig. 9 that the learning-based approach is much less conservative than the conventional approach, since the solid red circle is larger than the blue circle. Although the approach in [164] can address heterogeneous interface dynamics and can provide less conservative results than the conventional approach, it incurs large computational costs when analyzing large-scale systems.
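The following is a minimal sketch of the counterexample-guided training loop described above, assuming a toy two-state system, a small feedforward network for V(x), and a dense grid search over sampled states in place of a full SMT solver; the dynamics, network size, and penalty terms are illustrative placeholders rather than the settings of [164].

```python
import torch

# Toy two-state dynamics xdot = f(x) (a damped oscillator), standing in for the
# networked-microgrid PCC dynamics of [164], which are considerably more involved.
def f(x):
    x1, x2 = x[:, 0], x[:, 1]
    return torch.stack([x2, -x1 - 0.5 * x2], dim=1)

# Candidate Lyapunov function V(x) with a small NN structure (parameter vector theta).
V = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt = torch.optim.Adam(V.parameters(), lr=1e-2)

def violations(x):
    """Per-sample penalty c(theta): positive wherever V <= 0 or Vdot >= 0 away from the origin."""
    x = x.clone().detach().requires_grad_(True)
    v = V(x).squeeze(-1) - V(torch.zeros(1, 2)).squeeze(-1)      # shift so that V(0) = 0
    grad = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    vdot = (grad * f(x)).sum(dim=1)                              # Vdot = grad V . f(x)
    margin = 1e-3 * (x ** 2).sum(dim=1)
    return torch.relu(margin - v) + torch.relu(vdot + margin)

pool = (torch.rand(500, 2) - 0.5) * 4.0                          # step 1: samples in region R
for outer in range(20):
    for _ in range(200):                                         # step 2: gradient descent on c(theta)
        loss = violations(pool).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    # Step 3 surrogate: grid search for counterexamples instead of an SMT solver,
    # ignoring a small ball around the equilibrium.
    g = torch.cartesian_prod(torch.linspace(-2, 2, 81), torch.linspace(-2, 2, 81))
    bad = g[violations(g).detach() > 0]
    bad = bad[bad.norm(dim=1) > 0.2]
    if len(bad) == 0:
        print("candidate Lyapunov function found after", outer + 1, "rounds")
        break
    pool = torch.cat([pool, bad])                                # enlarge the pool and repeat
else:
    print("no certificate found within the iteration budget")
```

Replacing the grid search with an SMT tool, as in [164], turns the final "no counterexample found" outcome into a formal certificate rather than an empirical one.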
Deep penetration of clean energy resources is changing power grid behaviour (for example, clean-energy resources may lack physical inertia). As a result, power grids are becoming increasingly sensitive to disturbances, and impactful anomalies may be observed more frequently. Effectively monitoring and correcting these anomalies in real time is a key challenge facing system operators. A large body of literature in the last three years has argued in favor of leveraging streaming data to make operational decisions in real time. Table V summarizes these recent works from the perspectives of data sources, methods and computation resources.

Table V (excerpt). Each entry lists the application [reference]; test system (publicly available: Y/N); computation resource; and AI method:
- Actuator placement [194]; 118- and 123-bus IEEE (N); 2-core Xeon CPU, 32 GB RAM; K-means clustering
- Frequency prediction, assessment and control [166]; 140-bus NPCC (N); i5-5200U CPU, 8 GB RAM; ELM
- Dynamic security prediction [179]; 39-bus IEEE (N); -; CNN, LSTM network
- Distribution system state estimation [180]; 37-bus IEEE (N); -; NN
- Local control design for active distribution grids [181]; typical European radial LV grid (N); i7-2600 CPU, 16 GB RAM; regression, SVM
- Anomaly detection, localization and classification [195]; 14- and 39-bus IEEE (N); multiple CPUs; autoencoder
- Volt-VAR optimization [197]; 13- and 123-bus IEEE (N); i5 CPU, 8 GB RAM; reinforcement learning
- Cyber anomaly detection [182]; 39-bus IEEE (N); -; SVM, decision tree
- Distribution system topology identification [193]; 33-bus IEEE, 135-bus and 874-bus systems (N); -; split expectation maximization
- Grid restoration [192]; 70-bus 4-feeder system (N); i5 CPU, 8 GB RAM; regression

The following are two specific examples that address online operational challenges in the grid.

1) Forced oscillation localization based on robust principal component analysis (RPCA): Forced oscillations are one type of critical phenomenon that concerns system operators, because these oscillations may cause large-scale blackouts and decrease the lifespans of power grid components [201]. Fig. 10 illustrates the mechanism of forced oscillations. Let us consider a power grid as a black box with some inputs and outputs, as shown in Fig. 10. The inputs can be thought of as setpoints of generators, while the outputs are PMU measurements. If one of the inputs varies periodically, oscillations can be observed in the PMU measurements. These oscillations are termed "forced oscillations," and the periodic input is called the source of the forced oscillations. Different PMU measurements have different geographical distances from the oscillation source. The objective of forced oscillation localization is to pinpoint which PMU measurements are close to the oscillation source, based only on the PMU data, without information on the inputs or the power grid models. Locating the oscillation source is a challenging task, because the measurement closest to the source may not exhibit the largest oscillations. Fig. 11 shows such a counter-intuitive case in which the measurement (the red curve) closest to the oscillation source does not exhibit the largest oscillation magnitude. Reference [202] reports a real-world, counter-intuitive case in which the distance between the source and the measurement exhibiting large oscillations is more than 1100 miles [201]. In reference [201], Huang et al. formulate forced oscillation localization as the decomposition of the measurement matrix Y_t into a low-rank matrix L_t and a sparse matrix S_t, namely Y_t = L_t + S_t. This matrix decomposition problem can be solved by RPCA as shown in (5):

min_{L_t, S_t}  ||L_t||_* + λ ||S_t||_1   subject to   Y_t = L_t + S_t   (5)

where ||·||_* and ||·||_1 denote the nuclear norm and the l_1 norm, respectively, λ is a weighting parameter, Y_t represents the measurement matrix up to time t in which each row represents a time series from one PMU, and S_t is the corresponding approximate sparse matrix. Fig. 12 visualizes the matrices Y_t, L_t, and S_t, respectively. The computational complexity analysis of RPCA is reported in [203]. The measurement near the source can be located by identifying the largest absolute element in the sparse matrix. Reference [201] also provides a possible interpretation to justify the effectiveness of the RPCA-based source localization algorithm. In reference [201], the authors create 44 counter-intuitive cases in an open-source benchmark system. The RPCA-based algorithm can pinpoint the sources in 43 cases, and in the remaining case, the algorithm can narrow the searching scope [201]. However, when RPCA can exactly locate the true source remains an open question.
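As a rough illustration of the decomposition in (5), the sketch below applies a basic augmented-Lagrangian robust-PCA routine to a synthetic measurement matrix whose rows mimic PMU time series, one of which carries an injected localized oscillation; the data, solver, and parameter choices are illustrative and are not those of [201].

```python
import numpy as np

def rpca(Y, lam=None, mu=None, iters=500, tol=1e-7):
    """Decompose Y into a low-rank matrix L and a sparse matrix S (Y = L + S)
    using an augmented-Lagrangian scheme for the RPCA problem in (5)."""
    m, n = Y.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(Y).sum()
    L, S, M = np.zeros_like(Y), np.zeros_like(Y), np.zeros_like(Y)
    shrink = lambda X, tau: np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(Y - S + M / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt            # singular-value thresholding
        S = shrink(Y - L + M / mu, lam / mu)            # elementwise soft thresholding
        M = M + mu * (Y - L - S)
        if np.linalg.norm(Y - L - S) <= tol * np.linalg.norm(Y):
            break
    return L, S

# Synthetic "PMU" matrix: each row is a channel observing a shared ambient response,
# with a forced oscillation injected only into channel 7.
rng = np.random.default_rng(2)
t = np.arange(600) / 30.0                               # 30 samples/s, 20-second window
ambient = np.sin(2 * np.pi * 0.4 * t)
Y = np.outer(rng.uniform(0.5, 1.5, 20), ambient) + 0.01 * rng.normal(size=(20, 600))
Y[7] += 0.3 * np.sin(2 * np.pi * 1.3 * t)               # localized forced oscillation

L, S = rpca(Y)
print("channel with the largest |S| entry:", np.unravel_index(np.abs(S).argmax(), S.shape)[0])
```

The key design choice, following (5), is that the shared (low-rank) ambient behavior is absorbed into L, so the channel-specific periodic injection concentrates in S and can be located by its largest-magnitude entries.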
2) Reinforcement learning (RL)-based protection scheme for renewable-rich distribution systems: The conventional protection paradigm in distribution systems has been challenged by the increasing amount of DERs. Fig. 13 presents the overcurrent protection scheme that is widely deployed in power distribution systems. Such a protection scheme trips the line once the line current exceeds a threshold value, e.g., 5 times the current I_0 under normal conditions. However, if a DER is installed nearby, it may decrease the fault current by injecting reverse power flow. As a consequence, the current under the faulty condition might be much less than the relay threshold. In order to address the protection challenges in a renewable-rich distribution system, reference [204] places the protection problem into an RL framework (Fig. 14) in which the protection scheme is learned by interacting with a distribution system simulator. In the RL framework, the distribution system is modeled by a Markov decision process (MDP) described by states s ∈ S, actions a ∈ A, a reward function r(s, a), a transition probability P, and a user-defined discount factor β ∈ (0, 1]. The implications of the states, actions, and reward function in the protection problem are annotated in Fig. 13. In particular, the state s_{i,t} and action a_{i,t} of relay i at time t are defined by

s_{i,t} = (s^c_{i,t}, s^b_{i,t}, s^{cd}_{i,t}),   a_{i,t} ∈ {a^{set}_{i,t}, a^{d}_{i,t}, a^{reset}_{i,t}}   (6)

where s^c_{i,t} represents the local current measurements, s^b_{i,t} represents the status of the local breaker, s^{cd}_{i,t} represents the value of the countdown timer, a^{set}_{i,t} represents the action of triggering the countdown timer, a^{d}_{i,t} represents the action of decreasing the value of the counter by one, and a^{reset}_{i,t} represents the action of resetting the counter. The reward function gives deterministic positive rewards to the tripping action under fault conditions and to the stay-in-silence action under normal conditions, and it gives negative rewards to malfunctions. The transition probability is determined by the distribution system; in practice it is unknown. The optimal action a*(s) at state s is obtained by

a*(s) = argmax_a Q(s, a),   Q(s, a) = E[ r(s, a) + β max_{a′} Q(s′, a′) ]   (7)

where E(·) is the expectation operator, a′ is the possible next-step action, and s′ is the next-step state given the current state and action, which is determined by the distribution system. In [204], the Q function in (7) is approximated by an NN. The NN's parameters are learned from a sequence of {s, a, r, s′} observations generated in the framework shown in Fig. 14. The dataset reported in [205] can be used for training the algorithm. The simulation results in [204] suggest that the failure rate of the RL-based relay is only 0.32% in a distribution system with 30% DER penetration, whereas the conventional overcurrent relay has a much higher failure rate, i.e., 15.46%, under the same condition. One future direction of this work is to investigate a rigorous convergence guarantee for the sequential reinforcement learning algorithm [204].
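The sketch below shows tabular Q-learning on a drastically simplified relay environment with two discretized current levels, standing in for the NN-approximated Q function and the full distribution-system simulator of [204]; the state and action encoding and the reward values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simplified relay environment: state 0 = normal current level, 1 = fault-level current.
# Actions: 0 = stay silent, 1 = trip the breaker.
STAY, TRIP = 0, 1

def step(state, action):
    """Return (reward, next_state) for a toy protection MDP."""
    if state == 1:                       # fault present
        reward = 1.0 if action == TRIP else -1.0
    else:                                # normal operation
        reward = 1.0 if action == STAY else -1.0
    next_state = 1 if rng.random() < 0.05 else 0   # faults occur rarely and clear quickly
    return reward, next_state

Q = np.zeros((2, 2))                     # Q[state, action]
alpha, beta, eps = 0.1, 0.95, 0.1        # learning rate, discount factor, exploration rate

state = 0
for _ in range(20_000):
    action = rng.integers(2) if rng.random() < eps else int(Q[state].argmax())
    reward, nxt = step(state, action)
    # Q-learning update toward r + beta * max_a' Q(s', a'), cf. (7).
    Q[state, action] += alpha * (reward + beta * Q[nxt].max() - Q[state, action])
    state = nxt

print("greedy policy:", {"normal": int(Q[0].argmax()), "fault": int(Q[1].argmax())})
```

After training, the greedy policy should stay silent under normal current and trip under fault-level current; the scheme of [204] additionally learns the countdown-timer logic and generalizes via an NN over continuous current measurements.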
To summarize this section, we provide two-fold guidance on applying use-inspired AI methods in power systems. First, finding appropriate application scenarios is critical and takes precedence over proposing innovative methodology. With deep neural networks as representatives, current AI techniques, which are essentially model-agnostic function approximators, usually deliver superior performance in application scenarios where there is only heuristic experience and no clear first-principles physical model, such as load and renewable prediction. The illustrated neural network-based Lyapunov function [164] is another example: although a Lyapunov function itself has a rigorous definition, there is no traditional, cost-effective analytical or numerical way to construct such a function for a large-scale real-world dynamical system, and neural networks can provide an effective alternative solution. Second, it is desirable to intelligently and insightfully formulate critical challenges in traditional power systems into AI-friendly formats. Consider the illustrated forced oscillation source localization [201] as an example. Intuitively, it can be formulated as a typical classification problem by taking system global states as inputs and discrete location labels as outputs. However, formulated as a matrix decomposition problem, it can be solved by RPCA, which is commonly used for image processing, with both superior accuracy and explainability.

As more measurement data and data-driven algorithms become available, the power industry continues to adapt and improve operations by leveraging new technology and systems that enable it to meet and exceed customer expectations. This section presents some industry use cases to illustrate the continuing adoption of machine learning techniques by Oncor, a regulated utility that operates the largest distribution and transmission system in Texas. The following use cases were selected to show instances of AI adoption with relatively high maturity. In addition, we illustrate use cases (e.g., asset management) that are not typically considered in power systems research, but that are essential for business operations with physical devices spread over large distances. All use case development is based on business needs, and the value of the investment must be justified before a use case is developed, even if data are readily available. Moreover, the value-add of some high-performance algorithms in many cases may not offset the maintenance cost required to keep such models operating properly (e.g., due to model drift). Table VI provides a brief introduction to the industry use cases that will be described in detail. Because some use cases involve proprietary information, details about pre- and post-processing steps and model accuracy levels will not be disclosed. In many industry use cases, the methods currently used may appear simplistic compared to the latest research; however, these use cases are of high value, and large amounts of data are readily available.

Utilities usually have multiple databases for various systems, such as outage management, advanced metering, work orders, geographical and meteorological data, and financial information. An essential challenge for conducting any big data analysis is to unify these data and enforce consistent formats for each data type. At Oncor, a data lake was created to consolidate the data needed for analytics. The data lake replicates data from all of Oncor's operational databases. In addition to supporting uniformity, this approach also minimizes stress on the operational databases because they are accessed only during each scheduled copy rather than whenever an analyst makes a query. As the industry's adoption of machine learning continues and available platforms become more mature, advanced techniques will become feasible at lower cost; these will be necessary to address more complex problems in power systems. Most importantly, collaboration between practitioners and researchers must intensify to achieve efficient and continuous adoption.

For all utility companies, monitoring and maintaining their assets is critical to realizing system reliability and providing the highest quality service to their customers. Some assets, such as distribution-class transformers, can be monitored by utilizing AMI meter data, such as voltage and kWh readings. For assets where digital measurements are not available, health monitoring may be possible by analyzing asset images using advanced image processing techniques.
Several Oncor use cases are presented below to illustrate how asset health can be monitored by utilizing machine learning methods. As the largest utility company in the state of Texas, Oncor provides power to nearly 4 million customers through more than 1 million distribution class transformers, which can fail from damaged coils or overload degradation. Reactive replacement of a failed transformer can take more than 4 hours, but proactive replacements often take less than 1 hour. Thus, detecting failure precursors can significantly reduce both labor cost and outage time. Fig. 15 shows a plot of the voltage and load measurements from a single-phase 240 V AMI meter. Both the voltage "V1" (in Volts) and load "LOAD" (in kWh) time series, in red and grey, respectively, have a 15-minute resolution. The two horizontal lines are the upper and lower limits of the operating voltage ratings defined by the American National Standards Institute (ANSI C84.1-2020), which are ±5% of the nominal voltage. On June 24th, 2018, the voltage suddenly rose above the upper limit due to a damaged coil on the primary side of the transformer. The sudden drop in voltage on July 18th, 2018 denotes the time of the replacement. Typically, a transformer will not fail immediately after a coil is damaged. Therefore, proactive replacement is realistic and valuable if a change in voltage can be detected soon enough. Based on an examination of the pre-outage voltage profiles of all transformers replaced in Oncor's system during an 18-month period, a change-point detection algorithm was designed to detect over/under-voltage issues. Changes in the mean and/or variance of a meter's voltage were detected using a PySpark implementation of the functions provided in [206]. Several post-processing steps were implemented to remove change points due to outages or temporary voltage changes. The thresholds for these steps were selected from the ground truth data. Based on the number of issues seen on the same feeder, the detected issues were then categorized into various types, such as meter, transformer, or regulation issues, to enhance the troubleshooting process of the distribution operations organization. The algorithm and thresholds were tuned and improved using feedback received from the field. Currently, the voltage monitoring process runs every weekday on data from 3.7 million AMI meters. The weekly-average accuracy for June-November 2021 is 94%. Oncor began to monitor distribution transformer health in 2016. As of November 2021, 3834 issues have been resolved proactively using transformer health analysis. These issues include damaged transformers or meters, as well as installation, regulation, and secondary issues that affect voltage measurements. Proactive transformer maintenance has saved Oncor approximately $3.25 million in equipment, labor and expenses as well as 5.5 million customer interruption minutes.
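The core of this workflow, detecting a sustained shift in a meter's voltage statistics, can be illustrated with a short sketch. The detector below is not Oncor's production pipeline (which is a PySpark implementation of the change-point functions in [206] with proprietary post-processing); it is a toy mean-shift check on a synthetic 15-minute voltage trace against the ANSI C84.1 ±5% band, and every threshold and data value is an illustrative assumption.

```python
"""Toy mean-shift check on a synthetic AMI voltage trace (illustrative only)."""
import numpy as np

NOMINAL_V = 240.0
UPPER, LOWER = 1.05 * NOMINAL_V, 0.95 * NOMINAL_V   # ANSI C84.1 service band

def detect_mean_shift(voltage, window=96, z_thresh=6.0):
    """Flag indices where the mean of the trailing window differs from the mean
    of the leading window by more than z_thresh standard errors
    (window=96 is one day of 15-minute readings)."""
    flags = []
    for t in range(window, len(voltage) - window):
        before = voltage[t - window:t]
        after = voltage[t:t + window]
        pooled_se = np.sqrt(before.var() / window + after.var() / window) + 1e-9
        if abs(after.mean() - before.mean()) / pooled_se > z_thresh:
            flags.append(t)
    return flags

# Synthetic trace: a damaged primary coil pushes the secondary voltage above
# the ANSI upper limit after 20 days of normal operation.
rng = np.random.default_rng(1)
v = np.concatenate([rng.normal(243, 1.0, 96 * 20), rng.normal(256, 1.0, 96 * 10)])

shift_points = detect_mean_shift(v)
out_of_band = np.where((v > UPPER) | (v < LOWER))[0]
print("first detected mean shift near sample:", shift_points[0] if shift_points else None)
print("samples outside the ANSI band:", len(out_of_band))
```

In practice, such flags are only a starting point: as described above, they must be filtered against outage records and feeder-level context before issues are categorized and crews are dispatched.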
Another asset health use case is the detection of insulators that have become defective due to, for example, lightning strikes, forceful impacts, or aging. Defective insulators are hazardous to the operation of power lines and pose a risk to system reliability. Oncor has more than 18,000 circuit-miles of transmission lines with over 500,000 transmission insulators. Rapid identification of damaged insulators, especially after a storm, is therefore a critical task in asset management. Due to the scale of Oncor's transmission system, manual inspection is infeasible. An automated inspection method was therefore developed that uses aerial/drone images of transmission lines and convolutional neural networks. The insulator defect detection method employs YOLOv3 (You Only Look Once, Version 3 [9]), a real-time object detection model that uses Darknet-53 [207] as the backbone feature extractor in a deep convolutional neural network. The model was initialized with YOLO's pre-trained weights from the Microsoft COCO (Common Objects in Context) dataset [208], and insulator images provided by the Electric Power Research Institute (EPRI) were used for transfer learning and validation (confidential data). The defect detector successfully recognized the insulators in an image, pinpointed the issues on each damaged insulator, and classified the issues as either "broken" or "flashed." For the 50 testing images, each containing multiple flashed/broken locations, 100% of the broken points were detected correctly and 90% of the flashed points were detected. There were no misclassified issues.

The recent Texas House Bill 4150, also known as the "William Thomas Heath Power Line Safety Act," which was passed by the Legislature in May 2019, requires all utilities to make regular inspections of their power lines to ensure that they comply with state and federal safety regulations. Although Oncor completes routine inspections of all transmission power lines, detailed manual inspections of all structures are time consuming, costly, and impactful to landowners. In an effort to reduce resources such as right-of-way truck traffic, another deep learning model was trained and applied to aerial images of the power lines. This model is being developed in stages to ultimately identify reliability risks due to structures damaged by impacts or aging. The first stage of this model requires Oncor to verify all structure asset information in the Oncor Transmission Information System. Because many transmission lines are 40+ years old, information in historical records may be inaccurate for structures where components were replaced or added after the initial installation. Additional stages include identifying attributes that can indicate structural issues that may cause outages and affect reliability performance. These attributes include:
• Composition: wood, steel, concrete
• Design: H-frame, A-frame, lattice tower, multi-pole, single-pole
• Cross arm: beam, double-plank
• Brace: V, X, knee
The effort to classify transmission line attributes made use of YOLOv3; the initial results were promising, with accuracy rates of 89% for braces and 87% for cross arms. Fig. 16 and Fig. 17 show several examples of successful classification results. As more images are labeled to augment the training data, the model's performance is expected to improve; furthermore, by including images of defective structures, the system can be used to inventory components as well as their degradation levels.
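The transfer-learning pattern behind both image models can be sketched compactly. The snippet below is not the deployed YOLOv3 detection pipeline; it swaps in an ImageNet-pretrained ResNet-18 classifier with a new three-class head ("intact", "flashed", "broken") and random tensors in place of the confidential EPRI imagery, so the class names, hyperparameters, and data are all assumptions made only to keep the example self-contained (torchvision 0.13+ API, pretrained weights are downloaded on first use).

```python
"""Simplified transfer-learning sketch for image-based asset inspection."""
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3                                   # intact / flashed / broken (assumed labels)
model = models.resnet18(weights="IMAGENET1K_V1")  # ImageNet-pretrained backbone

# Freeze the pretrained feature extractor and train only a new head, mirroring
# how pretrained COCO weights were reused before fine-tuning on utility imagery.
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 8 RGB images at 224x224 with random labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
for step in range(3):                             # a few toy optimization steps
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {loss.item():.3f}")
```

The design choice illustrated here is the one emphasized in the text: with few labeled domain images, reusing a large pretrained feature extractor and training only a small task-specific head is usually far more data-efficient than training from scratch.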
Load forecasting is an essential building block in operating and planning tasks in both the power industry [209] and commercial building energy [210]. It is needed in many decision-making processes for electric energy generation, DER management, transmission, distribution, markets, and demand response. The pursuit of models that can achieve accurate load forecasts for short-, mid-, and/or long-term purposes is a long-standing research area with a large body of literature [211], [212]. For utility companies, short- and mid-term load forecasts are used to plan switching operations in control centers. Moreover, load forecasts contribute to network reconfiguration and infrastructure development/improvement decisions. For example, to better prepare for high power demand seasons, Oncor conducts load analyses to forecast summer and winter feeder load peaks. In some cases, a contingency plan will be made ahead of these peak seasons for feeders that are at risk of overload based on historical load data leveraged by analytics. These efforts have significantly improved Oncor's reliability performance; there has not been a feeder lockout event due to overload since 2018. Switching operations, however, are a major challenge for feeder load forecasting because a feeder's load can change significantly due to a load switching event (e.g., feeder reconfiguration due to an outage or planned maintenance). A robust model is needed to respond to these events quickly and adjust the forecasts correspondingly. Oncor is currently developing deep learning methods to surpass the performance of the current approach. Besides feeder load forecasts, load forecasting at any device is needed for making operational decisions in the control rooms. One approach is to forecast the load at each distribution transformer using AMI meter data and then aggregate at each device as needed. With a large quantity of distribution transformers (e.g., more than 1 million in Oncor's system), if computational power is limited, cluster analysis can be used to group transformers with similar load behaviors. Normalization (re-scaling each load profile to the range [0, 1]) is needed before clustering so that the clustering results are affected mainly by the shape of the load profiles. After the transformers have been assigned into clusters, load forecasts for each cluster center (the representative of all transformers in that cluster) can be obtained; they are then scaled back to each transformer's load level by undoing the normalization steps. If distributed computing platforms are available, transformer load forecasting can be conducted by directly training individual models for every transformer, which will introduce fewer errors. Oncor implemented a regression tree model [213] on Spark that serves both short- and mid-term needs. The load of a transformer is affected by both numerical and categorical factors. The most important numerical factors include temperature, wind speed, humidity, and solar radiation, whereas categorical factors include time of day, day of week, month, etc. To avoid over-fitting, the maximum numbers of layers and leaves were tuned based on model performance. Fig. 18 shows an example of the hourly load forecasting results for one distribution transformer over the course of 3 days. The blue and red curves on the top plot give the actual and predicted load based on the predicted temperatures in the bottom plot (blue curve), using a regression tree model trained for that particular transformer. There is a trade-off between model performance (error level) and computing time, which can be calibrated to suit shifting business needs at any given time. This approach is able to capture non-periodic activity that sometimes deviates from the temperature, as seen on Day 2 in Fig. 18.
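A minimal sketch of such a tree-based transformer-load forecaster follows. It is not Oncor's Spark production model: scikit-learn's DecisionTreeRegressor, the synthetic temperature-driven load series, the feature set, and the depth/leaf caps are all illustrative assumptions.

```python
"""Minimal single-transformer load-forecasting sketch (illustrative data)."""
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
hours = np.arange(24 * 365)
# Synthetic temperature: seasonal plus daily cycles plus noise.
temperature = (18 + 12 * np.sin(2 * np.pi * hours / (24 * 365))
               + 6 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1.5, hours.size))

hour_of_day = hours % 24
day_of_week = (hours // 24) % 7
# Toy transformer load: cooling load above 22 C plus an evening bump.
load_kw = (3 + 0.35 * np.clip(temperature - 22, 0, None)
           + 1.5 * ((hour_of_day >= 18) & (hour_of_day <= 22)) + rng.normal(0, 0.3, hours.size))

X = np.column_stack([temperature, hour_of_day, day_of_week])
train, test = slice(0, 24 * 300), slice(24 * 300, None)

# Depth and leaf-size caps play the role of the layer/leaf limits that are
# tuned against over-fitting in the production model.
model = DecisionTreeRegressor(max_depth=8, min_samples_leaf=24)
model.fit(X[train], load_kw[train])

pred = model.predict(X[test])
mape = np.mean(np.abs(pred - load_kw[test]) / load_kw[test]) * 100
print(f"hold-out MAPE on the synthetic transformer: {mape:.1f}%")
```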
The accuracy of load forecasts is highly dependent on the accuracy of weather forecasts, which utility companies usually obtain from vendors. The uncertainty in these exogenous factors must be accounted for in the final forecast, and because several of those factors are forecasts themselves, errors can be large. In this case, the model's performance is sufficient to add value to business operations at normal operating levels and in typical seasonal weather. The accuracy is reduced during times of extreme cold or heat due to the lack of historical meter data. A special case in load forecasting is cold load characterization. During steady state, the heating or cooling load on a feeder is typically only a fraction of the total connected heating or cooling load. This reduced load results from load diversity: HVAC (heating, ventilation, and air conditioning) units cycle between on and off, so only some of them are running at any given time. After an extended outage, the temperature in a residence will likely fall outside the setpoint range. Once power is restored to the feeder, the diversity of the heating or cooling load is lost because all of the units turn on at the same time. This increase in load is referred to as "cold load". After some time passes, the diversity is restored because the unit run times vary depending on factors such as HVAC rating, home size, and temperature setpoints. Cold load peak values are affected by pre-outage load behavior, season (winter/summer), time of day, ambient temperature, and load composition (customer types). Predicting these values at feeder breakers or other downstream protective devices enables optimal sequencing of operations to restore power quickly while minimizing the likelihood of damaging equipment. In addition, an EMS typically has a load-shed/restoration tool that can automatically conduct outage rotations among all feeders in the system during a supply shortage such as the recent Texas power crisis [70]. With predictions of each feeder's post-outage load peaks, the EMS can automatically and accurately follow the ISO's load-shed requirements to protect the entire power grid. Oncor is currently testing a linear regression model to predict the ratio of a feeder's post-outage cold-load peak to its pre-outage load. The data used are the outage duration, pre- and post-outage temperatures, and the fraction of residential customers on the feeder. The residential load fraction is a good proxy for feeder load diversity (i.e., the independently controlled cyclic loads, such as HVAC systems, that may be energized at any given time during normal operating conditions). Because feeder-breaker-level outages are relatively rare, feeders are grouped by their residential fractions and a model is learned for each feeder group. A total of 1127 breakers were evaluated and training data were collected for fitting the regression model. To accurately capture the cold load behavior, switch operation logs and fuse-level events were reviewed to ensure that the cold load peaks were neither overestimated due to switching operations nor underestimated due to fuse-level events behind the breakers. During an emergency, this model takes the pre-selected outage durations for feeder rotations and the post-outage temperatures as inputs, and it outputs a predicted load ratio for each phase of the feeder as well as the total power ratio, from which the cold-load peaks can be estimated. These four predictions are useful for unbalanced feeders; for balanced feeders, a single estimate of the power ratio is sufficient. Fig. 19 shows an example of the cold load peak prediction for one feeder using the trained regression model. The two highlighted points in the figure mark the pre-outage current and predicted post-outage current for one phase of the feeder. The predicted value is marked at the same location as the post-outage load peak only for better visualization and easier comparison.
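A hedged sketch of the cold-load ratio model is shown below. The production approach fits a separate linear regression per feeder group on historical outage records; here a single scikit-learn LinearRegression is fit to synthetic data whose coefficients were invented purely so the example runs end to end, while the feature names mirror the inputs described above (outage duration, pre- and post-outage temperatures, residential fraction).

```python
"""Hedged sketch of the cold-load pick-up ratio model (synthetic data)."""
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 500
outage_hours = rng.uniform(0.5, 8, n)
pre_temp_c = rng.uniform(-5, 40, n)
post_temp_c = pre_temp_c + rng.normal(0, 2, n)
residential_frac = rng.uniform(0.2, 1.0, n)

# Assumed ground-truth behavior: longer outages, more extreme post-outage
# temperatures, and higher residential fractions all raise the ratio of
# post-outage peak to pre-outage load.
ratio = (1.0 + 0.12 * outage_hours + 0.02 * np.abs(post_temp_c - 20)
         + 0.5 * residential_frac + rng.normal(0, 0.1, n))

X = np.column_stack([outage_hours, pre_temp_c, post_temp_c, residential_frac])
model = LinearRegression().fit(X, ratio)

# Planned 2-hour rotation on a cold day for a mostly residential feeder.
pre_outage_load_mw = 6.0
predicted_ratio = model.predict([[2.0, 2.0, 1.0, 0.85]])[0]
print(f"predicted cold-load peak: {pre_outage_load_mw * predicted_ratio:.1f} MW")
```

During an emergency, the predicted ratio simply scales the feeder's pre-outage load to an expected post-outage peak, which is the quantity the EMS needs to sequence restoration without overloading equipment.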
For many transmission and distribution planning models, RIC percentages at each substation transformer bank are used to allocate load in the base-case models. These percentages are also used to derive the number of various motor types for dynamics models and simulations. Likewise, distribution planners must sometimes perform weather corrections for load projections. In these cases, industrial and other non-weather-sensitive loads (such as water pumping and/or oil field pumping loads) are not weather-corrected, because these load types are rarely weather-sensitive or weather-dependent. Traditionally, the customer category of a premise is established at the creation of the premise and may not get updated when the customer type changes. For example, a commercial building can be leased to a new business that has a completely different load profile from that of the previous business, but the utility may not be aware of the change. Before system-wide installation of advanced meters, the RIC process used typical summer and winter hourly load profiles for each category when building distribution feeder models. With the availability of AMI interval data and distributed computing, the process can be improved by directly analyzing the load profile of each premise. Because residential meters can usually be identified using information provided by ISOs, it is more valuable to focus on the non-residential meters. As an initial approach, cluster analysis was conducted on data from approximately 490,000 non-residential meters. Domain experts selected 12 weeks (non-contiguous) over a 1-year period that adequately covered different seasonal and holiday effects (e.g., extended hours during holidays). The 15-minute interval load data were collected for each week, and the time series for each meter were stacked into 8064-dimensional vectors (12×7×24×4). K-means clustering was applied to the data with, initially, k = 100. The initial parameter values were chosen based on subject matter experts' estimates. Subsequently, large clusters were checked by comparing random samples within the cluster to the cluster center (i.e., comparing the average load profile with the other profiles within the cluster). If a large deviation was found, then the cluster was split. A less heuristic approach would be to use the V-measure or silhouette coefficient to determine an optimal number of clusters [214], [215]; however, cluster-splitting was found to be effective for this use case. This cluster analysis was conducted using Spark; 3-4 hours were needed for a cluster with 2 namenodes (dual Xeon-4208, 768 GB RAM per node) and 8 datanodes (dual Xeon-5218, 768 GB RAM per node), as shown in Table VI. The analysis will be repeated annually to capture any premises with changes in load type.
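The clustering step itself is conceptually simple, as the condensed sketch below shows. The production job runs K-means on Spark over roughly 490,000 meters with 8064-dimensional profile vectors; this toy version uses a handful of synthetic weekly profiles, scikit-learn's KMeans, and a simplified spread check in place of the expert-driven cluster-splitting review, so all shapes and thresholds are illustrative assumptions.

```python
"""Condensed sketch of non-residential load-profile clustering (toy data)."""
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
t = np.arange(7 * 96)                       # one week of 15-minute intervals

def office(n):
    """Weekday daytime load shape."""
    base = ((t % 96 > 32) & (t % 96 < 72) & (t // 96 < 5)).astype(float)
    return base + rng.normal(0, 0.05, (n, t.size))

def pumping(n):
    """Flat, weather-insensitive load shape."""
    return 0.8 + rng.normal(0, 0.05, (n, t.size))

profiles = np.vstack([office(60), pumping(40)])
# Normalize each profile to [0, 1] so clustering keys on shape, not size.
mins = profiles.min(axis=1, keepdims=True)
maxs = profiles.max(axis=1, keepdims=True)
shapes = (profiles - mins) / (maxs - mins + 1e-9)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(shapes)

# Simplified split check: flag clusters whose members sit far from the center.
for c in range(km.n_clusters):
    members = shapes[km.labels_ == c]
    spread = np.linalg.norm(members - km.cluster_centers_[c], axis=1).mean()
    print(f"cluster {c}: {len(members)} meters, mean distance to center {spread:.2f}")
```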
Many companies in the power industry have been developing data-driven methods for their business needs. Exelon Utility and ComEd applied classification methods to aerial/satellite images as well as light detection and ranging (LiDAR) data for vegetation management, to better understand the system's tree-trimming workload and to cut trimming costs while reducing the number of tree-related outages [216], [217]. ISO New England proposed a decision tree-based prediction method to inform interface limit values for different operating conditions [218]. Researchers at Hitachi proposed a three-layer wind power prediction model based on data from historical power measurements and numerical weather prediction tools [219]. In addition, Bhattarai et al. reviewed related literature on big data analytics from the perspectives of electric utilities and industry [56].

In this paper, we have briefly reviewed the structure of power system physical and market operation, today's AI infrastructure of data acquisition and computation in power systems, state-of-the-art AI-based approaches for multiple critical functions, and industrial use cases of AI methods. In the following, we propose several research directions from the aspects of data, computing, and AI algorithms.

Despite the advances in data acquisition, and in contrast to the numerous datasets that have benefited broad AI communities, the lack of publicly accessible high-quality power datasets may be impeding the advancement of AI research in power systems. There are several reasons for the limited public access to power datasets. First, most real-world operational data are protected by policies such as CEII in the interest of confidentiality. Second, because real-world power grids are highly reliable, opportunities to observe high-impact events are rare, so real-world measurement datasets may not capture such events sufficiently. Third, the value of creating comprehensive and trustworthy benchmark power datasets has been overlooked by the power system community. A few open-source datasets [220], [221] and online contests dedicated to topics such as forced oscillation localization [222] and power system operation [223], [224] exist. However, far more will be needed to build a standard library of open-source benchmark datasets along with critical tasks in clear mathematical formulation that can be used to train, calibrate, test, and benchmark data-driven models. One critical challenge is that commonly used random sampling and data generation methods do not guarantee representativeness [44] and may introduce unexpected biases into subsequent data-driven methods. Therefore, it is critical to investigate data generation methods that guarantee comprehensiveness and representativeness; these may be dataset-inspired or task-tailored. In the meantime, it is also necessary to propose algorithm-agnostic metrics to consistently assess the property of representativeness.

As mentioned in Section II, complex control algorithms are too time-consuming for real-time security control, especially in contingency scenarios. The rapid expansion of sensors has enabled massive data acquisition; however, although this data is necessary for realizing a digitized power grid, using all of it is beyond current computing capacity for centralized methods. Therefore, to explore and exploit advanced algorithms and massive streaming data, hybrid edge and cloud computing is necessary to dynamically balance the computational load and scale computing power as needed. For example, edge devices can compute partial results across several hundred sensors (e.g., half of a neural network's layers) and forward the results to the control center for final computations, effectively distributing the computational load. Furthermore, new ASIC devices dedicated to power system computations could be used in edge devices for real-time data processing and to accelerate simulations. In addition, communications between edge and cloud may contain sensitive information, requiring privacy-preserving methods such as federated learning [110].
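A toy illustration of this edge/cloud split is given below: the first layers of a small network are evaluated at the grid edge on raw sensor readings, and only a compact embedding is forwarded to the control center, which completes the computation. The layer sizes, random weights, and anomaly-score interpretation are assumptions made only for illustration.

```python
"""Illustrative split of a small neural network between edge and cloud."""
import numpy as np

rng = np.random.default_rng(5)
relu = lambda x: np.maximum(x, 0)

# Edge half: raw measurements from 300 sensors -> 32-dimensional embedding.
W1, b1 = rng.normal(0, 0.1, (300, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.1, (128, 32)), np.zeros(32)
# Cloud half: embedding -> a single anomaly score.
W3, b3 = rng.normal(0, 0.1, (32, 16)), np.zeros(16)
W4, b4 = rng.normal(0, 0.1, (16, 1)), np.zeros(1)

def edge_forward(x):
    """Runs on a substation gateway next to the sensors."""
    return relu(relu(x @ W1 + b1) @ W2 + b2)

def cloud_forward(embedding):
    """Runs in the control-center cloud on the forwarded embedding."""
    return (relu(embedding @ W3 + b3) @ W4 + b4).ravel()

measurements = rng.normal(0, 1, (1, 300))     # one snapshot of 300 sensors
embedding = edge_forward(measurements)         # only 32 values are sent upstream
score = cloud_forward(embedding)
print(f"payload shrinks from {measurements.size} to {embedding.size} values; score = {score[0]:.3f}")
```

The same split also localizes sensitive raw data at the edge; only derived features cross the network, which is the starting point for privacy-preserving schemes such as federated learning.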
Besides accelerating computation, platforms are needed to manage the complexity introduced by digitization. The software development industry uses a set of (automation) practices called "DevOps" to manage the development, integration, testing, deployment, and monitoring of distributed software systems. In sectors where data-driven and machine learning algorithms are used, another layer is added to DevOps [225], [226] that encompasses automated training, testing, deployment, and monitoring of models; this is called "MLOps" [227], [228]. Both DevOps and MLOps lower the maintenance cost of complex software systems through automation, but the initial investment is high. For efficient digitization of the power grid, both DevOps and MLOps will be necessary; however, there are unique aspects of power systems that require investigation. Because the grid is primarily hardware, it would be highly imprudent to blindly adopt methods developed for pure software environments. The instrumentation and sensors being deployed into modern grids also bring cyber-security challenges. If the data and controls are transmitted over the internet (e.g., cloud computing), the grid is vulnerable to the same cyber-attacks as a website, except the stakes are much higher: outages, energy theft, and loss of private data. Monitoring and detecting cyber threats to the grid is an important area for cross-disciplinary research combining power systems, cyber-security, and AI.

Because power grids are large-scale critical infrastructure systems for human society, future research efforts ought to focus on use-inspired AI algorithms that possess three key properties, namely interpretability, robustness, and scalability, in order to facilitate practical applications. First, AI algorithms ought to be explainable by first-principle-based physical models, because only interpretable algorithms are acceptable for participation in the human-in-the-loop decision-making process. In particular, interpretable AI approaches should provide clear causal inference for the purposes of real-time monitoring, control, and diagnosis, such as identifying the root cause of complex observations. Preliminary efforts have been devoted to physics-informed ML, as summarized in [229]. The principle is to steer the learning process towards identifying physically consistent solutions, with instructive guidance in three aspects: data processing, loss function modification, and model architecture design. For example, incorporating ordinary differential equation (ODE) formats into the loss function as regularization terms can improve the performance of system identification algorithms based on transient data or improve the fidelity of transient data generation methods (a toy sketch of this idea is given below). Second, AI algorithms must have performance guarantees extending beyond the basic, unperturbed scenarios. In particular, robustness to perturbation is critically important for reinforcement learning-based algorithms for decision making. Meta reinforcement learning [230], [231] and transfer learning can potentially bridge the gap between reality and the simulation environment, thereby rendering the decision making adaptive to varying conditions and scenarios.
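Returning to the first property, the sketch below shows how an ODE residual can act as a physics-informed regularizer: a one-state linear system dx/dt = a*x is identified from noisy samples by minimizing a data-fit term plus the ODE residual. The system, the penalty weight, and the grid search standing in for gradient-based training of a network are all illustrative assumptions rather than a method from the surveyed literature [229].

```python
"""Toy physics-informed regularization for system identification."""
import numpy as np

rng = np.random.default_rng(6)
a_true = -0.7
t = np.linspace(0, 5, 200)
x_noisy = np.exp(a_true * t) + rng.normal(0, 0.02, t.size)

dt = t[1] - t[0]
dx_dt = np.gradient(x_noisy, dt)          # numerical derivative of the trajectory

def loss(a_hat, lam=1.0):
    x_model = np.exp(a_hat * t)           # model-predicted trajectory
    data_term = np.mean((x_model - x_noisy) ** 2)
    ode_term = np.mean((dx_dt - a_hat * x_noisy) ** 2)   # residual of dx/dt = a*x
    return data_term + lam * ode_term      # lam weights the physics penalty

# Crude grid search stands in for gradient-based training of a network.
candidates = np.linspace(-2.0, 0.0, 401)
a_best = candidates[np.argmin([loss(a) for a in candidates])]
print(f"identified a = {a_best:.2f} (true value {a_true})")
```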
Third, another highly desirable feature of AI algorithms is scalability, which refers to adequate effectiveness and efficiency in large-scale real-world systems. The concern regarding scalability arises from the aforementioned observation that the performance of existing AI algorithms in the power system domain is mostly demonstrated on small-scale grids without validation in large-scale cases. Because high-dimensional measurements in power systems empirically exhibit properties such as approximate low-rankness and sparsity, it may be efficacious to discover intrinsic low-dimensional manifolds and linear coordinates in the data structure [232].

In summary, digitization of the power grid will play a major role in transforming the electricity sector into a decarbonized system while simultaneously improving grid reliability. The synergy of high-dimensional dynamic data, increased computing power, and use-inspired AI algorithms will enable improvements to the reliability and operational efficiency of the power grid at multiple scales. Challenges remain in the integration of heterogeneous data sets, cyber-physical security, and the development of robust, interpretable AI algorithms. Strong collaboration between industry and academia will be crucial for the successful adoption of use-inspired AI methods in a decarbonized power system.

Tony Bruton (Member, IEEE) received his B.S. degree in Electrical Engineering from Texas Tech University in 2000. He began working for Oncor as a substation design engineer. For several years, he managed Oncor's high voltage grid in east Texas and then managed the group that designed and built high voltage transmission lines. He also managed the routing and acquisition of right-of-way for new transmission lines in west Texas. Mr. Bruton's current role as Director of T&D Services involves ensuring an accurate system that monitors and controls Oncor's Transmission and Distribution systems, EMS and ADMS, as well as data analytics and operator training. U.S. electricity customers experienced eight hours of power interruptions in 2020 US vs UK wholesale electricity market outlook How much carbon dioxide is produced per kilowatthour of U.S. electricity generation?" 2021 Mitigation pathways compatible with 1.5 c in the context of sustainable development Gathering strength, gathering storms: The one hundred year study on artificial intelligence (AI100) 2021 study panel report Pre-trained models for natural language processing: A survey Speech recognition using deep neural networks: A systematic review What is facial recognition used for?
Yolov3: An incremental improvement Grandmaster level in StarCraft II using multi-agent reinforcement learning Mastering the game of go without human knowledge A general reinforcement learning algorithm that masters chess, shogi, and go through self-play Artificial intelligence in recommender systems Spot Atlas AutoPilot Chexnet: Radiologistlevel pneumonia detection on chest x-rays with deep learning Artificial intelligence in drug discovery and development Towards a new generation of artificial intelligence in china MIT plus Opening of OR62 Conference AI and compute Recent developments in machine learning for energy systems reliability management Big data technologies: A survey Big data monetization throughout big data value chain: A comprehensive review A review paper on big data: technologies, tools and trends There's plenty of room at the top: What will drive computer performance after Moore's law Xlnet: Generalized autoregressive pretraining for language understanding Bert: Pre-training of deep bidirectional transformers for language understanding Inception-v4, inception-resnet and the impact of residual connections on learning Improving language understanding by generative pre-training Imagenet classification with deep convolutional neural networks Very deep convolutional networks for large-scale image recognition Deep residual learning for image recognition Attention is all you need The Digitized Grid Power system control centers: Past, present, and future Automatic generation control and its implementation in real time Distribution management systems: Functions and payback Power system security assessment Reliability evaluation of power systems Power system economics: designing markets for electricity Toward a retail market for distribution grids Challenges for wholesale electricity markets with intermittent renewable generation at scale: The US experience The February 2021 cold weather outages in Texas and the south central United States Load forecasting Frontiers in massive data analysis Big data for development: A review of promises and challenges Medical internet of things and big data in healthcare Big data and ai-a transformational shift for government: So, what next for research Economics in the age of big data The promises of big data and small data for travel behavior (aka human mobility) analysis Big data driven smart energy management: From big data to big insights Energy-efficient dynamic traffic offloading and reconfiguration of networked data centers for big data stream mobile computing: Review, challenges, and a case study List of datasets for machine-learning research Big data analytics in smart grids: State-of-the-art, challenges, opportunities, and future directions The role of big data in improving power system operation and protection Data quality issues for synchrophasor applications part i: a review Power System Outage Task Force A comprehensive survey on phasor measurement unit applications in distribution systems Applications of synchrophasor technologies in power systems MASPI PMU map with and without data connections Complex power system status monitoring and evaluation using big data platform and machine learning algorithms: A review and a case study Frequency disturbance recorder design and developments FNET/GridEye web display Electric grid test case repository Grid structural characteristics as validation criteria for synthetic networks US test system with high spatial and temporal resolution for renewable integration studies A 2030 United States macro 
grid: Unlocking geographical diversity to accomplish clean energy goals An open-source extendable model and corrective measure assessment of the 2021 texas power outage Development of power system test bed for data mining of synchrophasors data, cyber-attack and relay testing in RTDS Advanced laboratory testing methods using real-time simulation and hardware-in-the-loop techniques: A survey of smart grid international research facility network activities Big data regional innovation hubs and spokes workshop The South Big Data Innovation Hub OPF-Learn: An opensource framework for creating representative AC optimal power flow datasets Computational imaging on the electric grid A cross-domain approach to analyzing the short-run impact of covid-19 on the US electricity sector A review of big data resource management: Using smart grid systems as a case study The computational limits of deep learning NVIDIA TESLA V100 GPU architecture Cloud TPU A million spiking-neuron integrated circuit with a scalable communication network and interface Intelligence processing unit Jetson Nano developer kit Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead Algorithm 539: Basic linear algebra subprograms for Fortran usage [f1 Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs Matrix computations AMD math library (LibM) Intel oneAPI math kernel library Basic linear algebra on NVIDIA GPUs OpenCL overview Tensorflow: Large-scale machine learning on heterogeneous distributed systems Caffe: Convolutional architecture for fast feature embedding Automatic differentiation in PyTorch NVIDIA cuDNN Point of view: How fast do algorithms improve Monitoring streams-a new class of data management applications Adaptive query processing: Technology in evolution Models and issues in data stream systems The Hadoop distributed file system Apache Spark: A unified engine for big data processing Mapreduce: simplified data processing on large clusters The Google file system Bigtable: A distributed storage system for structured data Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing Energy and utilities, sustainably fueling the future Oncur puts data to work to prevent energy outages Smart grid encounters edge computing: Opportunities and applications Communication-efficient learning of deep networks from decentralized data 1992, special Issue Expert System Applications in Power Systems Artificial intelligence in electric power systems: A survey of the Japanese industry Reinforcement learning for electric power system decision and control: Past considerations and perspectives A hybrid ensemble model for interval prediction of solar power output in ship onboard power systems Chance constrained extreme learning machine for nonparametric prediction intervals of wind power generation Operating reserve quantification using prediction intervals of wind power: An integrated probabilistic forecasting and decision methodology Deep learning-based socio-demographic information identification from smart meter data Exploring key weather factors from analytical modeling toward improved solar power forecasting A datadriven methodology for probabilistic wind power ramp forecasting Multimeteorological-factor-based graph modeling for photovoltaic power forecasting A combination interval prediction model based on biased convex cost function and auto-encoder in solar power 
prediction Improved deep belief network for short-term load forecasting considering demand-side management A data-driven customer segmentation strategy based on contribution to system peak demand A multi-timescale data-driven approach to enhance distribution system observability Multi-source and temporal attention network for probabilistic wind power prediction Estimating demand flexibility using Siamese LSTM neural networks An adaptive bilevel programming model for nonparametric prediction intervals of wind power generation Integrating gray data preprocessor and deep belief network for day-ahead pv power output forecast A data-driven framework for assessing cold load pick-up demand in service restoration An adaptive approach for dynamic load modeling in microgrids Spatiotemporal graph neural networks for multi-site pv power forecasting Aggregation of multi-scale experts for bottom-up load forecasting Reinforced deterministic and probabilistic load forecasting via q -learning dynamic model selection Learning the optimal strategy of power system operation with varying renewable generations Data-driven optimal power flow: A physics-informed machine learning approach Toward distributed energy services: Decentralizing optimal power flow with machine learning Statistical machine learning model for stochastic optimal planning of distribution networks considering a dynamic correlation and dimension reduction Spatial network decomposition for fast and scalable ac-opf learning Machine learning assisted stochastic unit commitment during hurricanes with predictable line outages Combining deep learning and optimization for preventive security-constrained dc optimal power flow DeepOPF: A deep neural network approach for security-constrained DC optimal power flow Maximizing the financial return of non-technical loss management in power distribution systems An agent-based hierarchical bargaining framework for power management of multiple cooperative microgrids Algorithmic bidding for virtual trading in electricity markets Data-driven screening of network constraints for unit commitment Data-driven regulation reserve capacity determination based on bayes theorem Multiclass learning-aided temporal decomposition and distributed optimization for power systems Machine learning-driven virtual bidding with electricity market efficiency analysis A class-driven approach based on long short-term memory networks for electricity price scenario generation and reduction Conditional density forecast of electricity price based on ensemble ELM and logistic EMOS A machine learning-based reliability evaluation model for integrated power-gas systems Reliability analysis of power systems integrated with high-penetration of power converters Fast yet accurate energy-lossassessment approach for analyzing/sizing PV in distribution systems using machine learning Machine learning-enabled distribution network phase identification Chance-constrained outage scheduling using a machine learning proxy Bayesian energy disaggregation at substations with uncertainty modeling Support matrix regression for learning power flow in distribution grid with unobservability Approximating trajectory constraints with machine learning -microgrid islanding with frequency constraints Solar panel identification via deep semi-supervised learning and deep one-class classification Improving supervised phase identification through the theory of information losses Probabilistic modeling for optimization of resource mix with variable generation and storage 
Data-driven classifier for extreme outage prediction based on bayes decision theory Automating the verification of the low voltage network cables and topologies A neural Lyapunov approach to transient stability assessment of power electronics-interfaced networked microgrids Flexible machine learning-based cyberattack detection using spatiotemporal patterns for distribution systems Integrating model-driven and data-driven methods for power system frequency stability assessment and control Deep feedback learning based predictive control for power system undervoltage load shedding Hierarchical deep learning machine for power system online transient stability prediction Datadriven power system operation: Exploring the balance between cost and risk A simulation-based classification approach for online prediction of generator dynamic behavior under multiple large disturbances An intelligent data-driven learning approach to enhance online probabilistic voltage stability margin prediction Semi-supervised ensemble learning framework for accelerating power system transient stability knowledge base generation Using vine copulas to generate representative system states for machine learning Sensorless maximum power extraction control of a hydrostatic tidal turbine based on adaptive extreme learning machine Anomaly detection and mitigation for wide-area damping control using machine learning Designing reactive power control rules for smart inverters using support vector machines Networked time series shapelet learning for power system transient stability assessment A learning-to-infer method for realtime power grid multi-line outage identification A unified online deep learning prediction model for small signal and transient stability Data-driven learningbased optimization for distribution system state estimation Data-driven local control design for active distribution grids using off-line optimal power flow and machine learning techniques Multi-agent based attack-resilient system integrity protection for smart grid Real-time faulted line localization and pmu placement in power systems through convolutional neural networks A deep learning-based feature extraction framework for system security assessment An adaptive pv frequency control strategy based on real-time inertia estimation Maximum power tracking for a wind energy conversion system using cascade-forward neural networks Residential household non-intrusive load monitoring via graph-based multi-label semi-supervised learning Deep learning-based real-time building occupancy detection using AMI data Real-time prediction of the duration of distribution system outages Distributed intelligence for online situational awareness in power grids Joint detection and localization of stealth false data injection attacks in smart grids using graph neural networks Real-time resilience optimization combining an ai agent with online hard optimization Topology identification of distribution networks using a split-em based data-driven approach Actuator placement for enhanced grid dynamic performance: A machine learning approach Anomaly detection, localization and classification using drifting synchrophasor data streams Adaptive power system emergency control using deep reinforcement learning Deep reinforcement learning based volt-var optimization in smart distribution systems On-line building energy optimization using deep reinforcement learning Multitime-scale data-driven spatiotemporal forecast of photovoltaic generation Scenario-based economic dispatch with 
tunable risk levels in high-renewable power systems A synchrophasor data-driven method for forced oscillation localization under resonance conditions Analysis of november 29, 2005 western american oscillation event Robust principal component analysis? Deep reinforcement learning-basedrobust protection in der-rich distribution grids Pyprod: A machine learning-friendly platform for protection analytics in distribution systems changepoint: An R package for changepoint analysis Darknet: Open source neural networks in c Microsoft COCO: Common objects in context Probabilistic electric load forecasting: A tutorial review Data-driven occupant-behavior analytics for residential buildings Short-term load forecasting Electric load forecasting using an artificial neural network Classification and regression trees V-measure: A conditional entropybased external cluster evaluation measure Silhouettes: A graphical aid to the interpretation and validation of cluster analysis Segmentation of LiDAR point clouds: LiDAR application in vegetation management Using data science to minimize power outage due to vegetation An enhanced transmission operating guide creation framework using machine learning techniques A three-layer hybrid model for wind power prediction A test cases library for methods locating the sources of sustained oscillations PSML: A multi-scale time-series dataset for machine learning in decarbonized energy grids 2021 IEEE-NASPI oscillation source location contest Learning to run a power network challenge for training topology controllers Learning to run a power network challenge: A retrospective analysis DevOps DevOps and its practices Hidden technical debt in machine learning systems Introducing MLOps Physics-informed machine learning Learning to adapt in dynamic, real-world environments through meta-reinforcement learning Meta reinforcement learning for sim-to-real domain adaptation Discovering governing equations from data by sparse identification of nonlinear dynamical systems His research interests include modeling and control of large-scale complex systems, smart grids application with renewable energy resources, and electricity markets. Xiangtian Zheng (Graduate Student Member, IEEE) received his B.E. degree from Tsinghua University She is currently a data scientist at Oncor Electric Delivery. Her expertise lies in data analytics and machine learning using power system data, which she has employed to develop many data-driven algorithms for load forecasting The authors sincerely thank Jimmy Liu, Steven Dennis, and Thomas Wilson for their help on the Oncor use cases presented in this paper.