key: cord-0221661-6melwer7 authors: Browne, Pierre; Lima, Aranildo; Arcucci, Rossella; Quilodrán-Casas, César title: Forecasting emissions through Kaya identity using Neural Ordinary Differential Equations date: 2022-01-07 sha: 6f97875d43661c7b6d354ef93aec2cfcc5f53cd0 doc_id: 221661 cord_uid: 6melwer7
Starting from the Kaya identity, we used a Neural ODE model to predict the evolution of several indicators related to carbon emissions at the country level: population, GDP per capita, energy intensity of GDP, and carbon intensity of energy. We compared the model with a baseline statistical model, Vector Autoregression (VAR), and obtained good performance. We conclude that this machine-learning approach can be used to produce a wide range of results and give relevant insight to policymakers.
Lowering human greenhouse gas emissions is a major goal of the efforts against climate change, and the focus and concern of international cooperation (Paris Agreement, 2015). Many indicators of human development (population, Gross Domestic Product (GDP), environmental footprint) have followed exponential curves over the past decades (Steffen et al., 2015); hence, drastic measures are needed if we are to switch from increasing to rapidly decreasing emissions, as expressed in the goals of global organisations (IPCC Fifth Assessment Report (2014)). Understanding and forecasting the evolution, at the country scale, of various indicators related to carbon emissions may help give a clear idea of the progress we are making, or not, towards lower emissions. The main indicators that we chose to study are the variables appearing in the Kaya identity (Kaya & Yokobori, 1997) at the country level: population, national GDP, energy supply and CO2 emissions. Our main objective is to develop a model that uses this data to make accurate forecasts over a medium- to long-term horizon.
Machine-learning models offer interesting advantages over the traditional methods used for this type of work, which are typically statistical models (Cerqueira et al., 2019). In particular, the recent development of Neural Ordinary Differential Equations offers a promising perspective in this case (Chen et al., 2019); we explain the reasons for this choice in Section 3.2. We adapted Neural ODEs to this problem and compared their performance with a baseline statistical model.
The Kaya identity expresses a very simple relation between carbon emissions F, energy supply E, Gross Domestic Product (GDP) G and population P:
$$F = P \cdot \frac{G}{P} \cdot \frac{E}{G} \cdot \frac{F}{E}$$
G/P is the GDP per capita, representing our average standard of living, which we generally want to increase; E/G is the energy intensity of GDP (the energy needed to create one unit of GDP) and quantifies the efficiency of energy use in our activities; F/E is the carbon intensity of energy (the CO2 emitted per unit of primary energy supplied) and indicates how reliant on carbon emissions our energy supply is.
Forecasting human development and carbon emissions is possible both with the set of raw variables {P, G, E, F} and with the set of indicators appearing in the Kaya identity {P, G/P, E/G, F/E}. However, the latter gives a clearer analysis from a macroscopic point of view (Ritchie & Roser, b). While the raw variables are very strongly correlated with one another and vary greatly between countries, the variables from the Kaya equation behave more like consistent indicators that can be acted upon with the right choice of policies (Hwang et al., 2020). Overall, these four indicators seem a good choice for assessing the efforts made by a country or region concerning carbon emissions (Štreimikienė & Balezentis, 2016).
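As an illustration of the decomposition above, the following Python sketch turns raw country-year values {P, G, E, F} into the four Kaya indicators and checks that their product gives back F. The function names and the toy numbers are hypothetical, chosen only for illustration; they are not taken from the datasets used in this study.

```python
import math


def kaya_indicators(population, gdp, energy, emissions):
    """Return the four Kaya factors {P, G/P, E/G, F/E} for one country-year.

    Illustrative helper (hypothetical name); units are whatever the inputs use.
    """
    return {
        "population": population,                # P
        "gdp_per_capita": gdp / population,      # G / P
        "energy_intensity": energy / gdp,        # E / G
        "carbon_intensity": emissions / energy,  # F / E
    }


def emissions_from_indicators(ind):
    """Recombine the factors: F = P * (G/P) * (E/G) * (F/E)."""
    return (ind["population"] * ind["gdp_per_capita"]
            * ind["energy_intensity"] * ind["carbon_intensity"])


# Toy values (not real data): P = 2, G = 10, E = 4, F = 1.
ind = kaya_indicators(population=2.0, gdp=10.0, energy=4.0, emissions=1.0)
assert math.isclose(emissions_from_indicators(ind), 1.0)
```

Because the identity is exact, forecasting the four indicators is enough to recover an emissions forecast by multiplying them back together.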
Four datasets were used for this study: the population (World Bank, 2020b) and GDP (World Bank, 2020a) datasets were collected from the World Bank Open Data platform; the total energy supply (IEA, 2020a) and the CO2 emissions from fuel combustion (IEA, 2020b) were extracted from public datasets of the International Energy Agency. Note that the latter is not the total emission of each country; however, greenhouse gases emitted by fuel combustion represent around 75% of all greenhouse gas emissions (Ritchie & Roser, a). Each variable is available yearly, from 1971 to 2019, for at least 54 countries.
In the domain of energy and emissions, forecasts often rely on expert knowledge and statistical models (York et al., 2003; Auffhammer & Steinhauser, 2008); an extensive set of Integrated Assessment Models is used to explore how the Earth system will evolve over the next decades under specific scenarios (IPCC Fifth Assessment Report (2014)), (Lucas et al., 2015). In other studies, grey models (Deng, 1989) have been used successfully to model future emissions, linking them with population, GDP and energy supply (Pao et al., 2012). In some cases, black-box, data-driven models perform well compared with traditional models (Rehman et al., 2017).
Here, the problem naturally appears as a multivariate time-series forecasting problem, for which machine learning already offers an extensive toolbox (Hochreiter & Schmidhuber, 1997; Chung et al., 2014). The variables we are trying to predict are physical quantities that may be independent, or may follow a simple or complex relationship (Hwang et al., 2020). Since we may lack understanding of the underlying system, a black-box model has the advantage of avoiding the risk of making wrong hypotheses. In addition, even though the individual time series are only about 50 years long, the dataset comprises a large set of countries on which to train a model.
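The four sources have to be aligned on country and year before the indicators can be computed. A minimal sketch, assuming each source has already been loaded as a hypothetical mapping `country -> {year: value}` (the actual World Bank and IEA file formats are not described here), keeps only the country-years where all four values are present:

```python
def align_datasets(population, gdp, energy, emissions, start=1971, end=2019):
    """Align four yearly sources on (country, year).

    Each argument is a hypothetical mapping country -> {year: value}.
    Returns country -> list of (year, P, G, E, F), keeping only years
    in [start, end] where all four values are available.
    """
    sources = (population, gdp, energy, emissions)
    # Only countries present in every source can be used.
    countries = set(population) & set(gdp) & set(energy) & set(emissions)
    aligned = {}
    for country in sorted(countries):
        rows = []
        for year in range(start, end + 1):
            values = [src[country].get(year) for src in sources]
            if all(v is not None for v in values):
                rows.append((year, *values))
        if rows:
            aligned[country] = rows
    return aligned
```

Dropping incomplete years keeps the sketch simple; a model able to handle sporadic observations would not strictly need it.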
Ideally, a suitable machine-learning model would benefit from the variety of countries and produce more satisfying results when trained on several countries (Section 5.1).
Neural Ordinary Differential Equations (Chen et al., 2019) are a class of models recently brought into the spotlight. The model learns the dynamics of a system with a neural network: if X(t) is a state vector obeying dynamics of the form $\frac{dX}{dt} = f(X, t)$, the network must approximate f. NODEs are coupled with a differential-equation solver, and were introduced together with a practical method to compute a loss through the solver and back-propagate its gradient. Since the architecture of the neural network is entirely free, a NODE model can be used for very different problems. In our case, NODEs offer a good extrapolation capacity, allowing us to make long-term forecasts. Since they naturally model a vector field representing the evolution of a physical system, we hope that such a model can capture the complex physical patterns linking the forecast indicators. In addition, training on several countries could help the model represent the dynamics more accurately. Finally, a NODE model can handle sporadic data (not regularly sampled), both in time and across dimensions (Brouwer et al., 2019); although this is not required with our original dataset, the property may become relevant for this problem (Section 5.2).
Figure 1. Boxplots of performance on 15 countries. V stands for the VAR model, N for the NODE model; the number corresponds to the forecast length, in years. We manually tuned some parameters. All models were trained for the same number of epochs, using the Adam optimiser.
We used a NODE model to forecast the evolution of 7 variables for each country.
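The structure described above (a network approximating f, coupled with an ODE solver) can be sketched in plain Python as follows. This is a forward pass only, under stated assumptions: the hidden size, tanh activation, random initial weights and the fixed-step fourth-order Runge-Kutta (RK4) solver are illustrative choices, not necessarily the authors' settings, and training (back-propagating the loss through the solver) is omitted.

```python
import math
import random


class TinyODEFunc:
    """One-hidden-layer network f(X, t) -> dX/dt (illustrative, untrained).

    Input: the state X plus time t; output: one derivative per state variable.
    Hidden size, tanh and random weights are assumptions for this sketch.
    """

    def __init__(self, state_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        n_in = state_dim + 1  # state variables plus time
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(state_dim)]
        self.b2 = [0.0] * state_dim

    def __call__(self, x, t):
        z = list(x) + [t]
        h = [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def rk4_solve(f, x0, t0, t1, steps):
    """Integrate dX/dt = f(X, t) from t0 to t1 with classic fixed-step RK4.

    Returns the list of states at every step, starting with x0.
    """
    h = (t1 - t0) / steps
    x, t = list(x0), t0
    trajectory = [list(x)]
    for _ in range(steps):
        k1 = f(x, t)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], t + 0.5 * h)
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], t + 0.5 * h)
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)], t + h)
        x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
        trajectory.append(list(x))
    return trajectory


# With 7 state variables, the network takes 8 inputs and returns 7 derivatives.
f = TinyODEFunc(state_dim=7)
forecast = rk4_solve(f, x0=[0.5] * 7, t0=0.0, t1=1.0, steps=12)
```

Forecasting amounts to integrating the learned vector field forward from the last observed state, which is why the horizon can be extended simply by integrating further in time.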
Apart from the 4 indicators from the Kaya identity (population, GDP per capita, energy intensity of GDP, carbon intensity of energy), we added 3 variables representing how electricity is produced: the proportion of electricity produced via fossil-fuel combustion, via nuclear power, or via renewable sources. These last 3 variables remain in [0, 1] at all times and sum to 1. Our model wraps a very simple neural network (one hidden layer), which takes 8 inputs (the 7 variables plus time) and outputs 7 values (the time derivatives of the 7 variables). We also tried deeper architectures; without clear evidence of better performance on this simple case, we kept the simplest model for performance evaluation. We used the Adam optimisation strategy (Kingma & Ba, 2017).
As a first step, we ran the model for each country individually, dividing the time range [1971, 2019] into a training set (earlier dates) and a validation set (latest dates). We compared the NODE model with a very simple statistical Vector Autoregression model (VAR model), which is typically used for multivariate time-series problems (Liu et al., 2018). To quantify performance, we computed the mean squared error on the validation set for a set of 15 European countries, for both models; Figure 1 reports these results as boxplots. This was done for several forecast lengths. In particular, the figure indicates that NODE models perform best over long horizons, while VAR performs better for very short forecasts; NODE models also give more stable performance (lower variance) as the forecast length increases.
Tackling Climate Change with Machine Learning workshop at ICML 2021
Figure 2. Example of forecasts obtained from a NODE model (red) versus a VAR model (green); the training set is France data between 1971 and 2008, the validation set is data between 2008 and 2020 (12-year forecast).
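The baseline and evaluation protocol above can be sketched as follows. The lag order of the VAR model is not stated in the text, so a VAR(1) with intercept, fitted by ordinary least squares, is assumed here; the helper names are hypothetical. Each series is split into earlier years for fitting and the last `horizon` years for validation, the fitted model is iterated forward, and the mean squared error is computed on the held-out years.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_var1(series):
    """Assumed VAR(1): x_{t+1,i} = c_i + sum_j a_ij x_{t,j}, fitted by least squares.

    series: list of state vectors, one per year.
    Returns one coefficient list [c_i, a_i1, ..., a_ik] per variable.
    """
    k = len(series[0])
    regressors = [[1.0] + list(x) for x in series[:-1]]
    # Normal equations (X^T X) beta = X^T y, shared matrix for all variables.
    xtx = [[sum(r[i] * r[j] for r in regressors) for j in range(k + 1)]
           for i in range(k + 1)]
    coefs = []
    for var in range(k):
        xty = [sum(r[i] * y[var] for r, y in zip(regressors, series[1:]))
               for i in range(k + 1)]
        coefs.append(solve(xtx, xty))
    return coefs


def forecast_var1(coefs, last, horizon):
    """Iterate the fitted VAR(1) `horizon` steps ahead from the state `last`."""
    out, x = [], list(last)
    for _ in range(horizon):
        x = [c[0] + sum(a * xj for a, xj in zip(c[1:], x)) for c in coefs]
        out.append(x)
    return out


def mse(pred, truth):
    """Mean squared error over all years and variables."""
    errors = [(p - t) ** 2 for ps, ts in zip(pred, truth) for p, t in zip(ps, ts)]
    return sum(errors) / len(errors)


def evaluate_split(series, horizon):
    """Fit on all but the last `horizon` points, then score the forecast on them."""
    train, valid = series[:-horizon], series[-horizon:]
    pred = forecast_var1(fit_var1(train), train[-1], horizon)
    return mse(pred, valid)
```

Calling `evaluate_split(series, h)` for several values of h, one country at a time, gives the kind of per-horizon scores summarised in Figure 1; the same split and metric would be applied to the NODE forecasts.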
In addition, Figure 2 presents a typical output of our NODE model compared with forecasts from a VAR model. According to our results, Neural ODEs perform well overall compared with VAR models, especially over long horizons. This supports our approach as a reasonable method for forecasting emissions in the long term. Relying only on a data-driven model, without expert knowledge, this process could learn from present and past patterns in the evolution of the Kaya indicators in order to give more accurate insight to policymakers.
One of the motivations for a machine-learning solution to this problem is the prospect of benefiting from training the same model on data from several countries. We started from the assumption that countries with similar profiles (e.g. same continent, same economic system) should have their forecast indicators obey the same hidden law; in other words, the system dynamics should be identical for these countries (Otto et al., 2001; Martin, 2015). If this hypothesis holds, pooling data from these countries allows the model to be trained on more samples and, hopefully, the trained model will be more robust and able to forecast a broader set of behaviours. In its present state, our approach needs to be revised in how the model distinguishes between countries. With the normalisation process used so far, the model was sometimes unable to tell that time series came from different sources, and its output was just an average of all the observed trajectories. One possible remedy for this type of issue is to add meta-information about the data source as additional inputs to the neural network (Kratzert et al., 2019). As a first attempt, we added a one-hot encoding of the country as an input. This led to better forecasts, but work remains to be done in this direction; we ran preliminary experiments on deeper networks, which might also help.
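A minimal sketch of the one-hot idea, assuming the network input is laid out as [state, time, country code] (the exact layout used by the authors is not specified, and the country list below is hypothetical). Because the code is constant along a trajectory, its contribution to the first layer acts like a country-specific bias, which gives the shared dynamics a way to differ between countries.

```python
def one_hot(country, countries):
    """Return a list with 1.0 at the position of `country` and 0.0 elsewhere."""
    if country not in countries:
        raise ValueError(f"unknown country: {country}")
    return [1.0 if c == country else 0.0 for c in countries]


def node_input(state, t, country, countries):
    """Assumed input layout: state variables, then time, then the country code."""
    return list(state) + [t] + one_hot(country, countries)


COUNTRIES = ["France", "Germany", "Spain"]  # hypothetical subset
# 7 state variables + 1 time input + 3 country flags = 11 network inputs.
z = node_input([0.2] * 7, 0.5, "Germany", COUNTRIES)
```

With this layout the first layer of the network would need 7 + 1 + (number of countries) inputs instead of 8, while the output stays at 7 derivatives.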
A major prospect and objective of our study is the modelling of particular scenarios with this machine-learning approach: if we believe that our model has correctly captured the interactions between the forecast indicators, it is desirable to be able to explore particular possible futures (Moss et al., 2008; O'Neill et al., 2014). This is a common idea when studying climate-change mitigation, and makes it possible to connect public policies, global effort, and actual effect on climate change (Moyer & Hedden, 2020). Here, interesting scenarios could model how emission forecasts change if electricity production shifts towards more nuclear or renewable energy; study trajectories that meet national or international emission-reduction goals; or model the impact of a crisis such as Covid-19 on long-term energy supply and emissions. We envisioned two ways to incorporate scenarios into our NODE model; in both cases, the model is trained beforehand. First, we could provide the model with a full, hypothetical trajectory for one indicator (the chosen scenario) and forecast all the other indicators. Hopefully, the model would take advantage of its training experience to output relevant correlations. Second, after training, we could add one or several "hypothetical observations" for future dates (the chosen scenario) to the training set; we would then train the model for a few more epochs on the augmented training set, expecting it to adapt its forecasts to the new observations while keeping the structure acquired during the first training. A simple way to verify this idea would be to provide the model with observations from the validation set, and to examine whether the performance on the validation set increases significantly after the additional training. Further work is required to assess whether this approach can lead to consistent results.
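The first scenario idea can be sketched as follows, assuming a trained dynamics function `f(x, t)` is available (here any Python callable stands in for it) and that the scenario is given as a function of time. The forced indicator is overwritten with its scenario value at every step, so the other indicators evolve in response. Explicit Euler integration is used only to keep the sketch short; the names are hypothetical.

```python
def forecast_with_scenario(f, x0, t0, t1, steps, index, scenario):
    """Integrate dx/dt = f(x, t) with explicit Euler while forcing one indicator.

    f:        stand-in for a trained dynamics function (hypothetical)
    index:    position of the indicator that follows the scenario
    scenario: function of time giving the imposed value of that indicator
    """
    h = (t1 - t0) / steps
    x = list(x0)
    x[index] = scenario(t0)  # impose the scenario from the start
    trajectory = [list(x)]
    for n in range(steps):
        t = t0 + n * h
        dx = f(x, t)
        x = [xi + h * di for xi, di in zip(x, dx)]
        x[index] = scenario(t0 + (n + 1) * h)  # overwrite with the scenario value
        trajectory.append(list(x))
    return trajectory


# Hypothetical 2-variable example: x[0] grows at a rate set by x[1],
# and x[1] is forced to follow the linear scenario s(t) = t.
traj = forecast_with_scenario(lambda x, t: [x[1], 0.0], [0.0, 0.0],
                              0.0, 1.0, 4, 1, lambda t: t)
```

If the forced indicator is one of the three electricity shares, the other two would also need adjusting to keep the shares summing to 1; this sketch does not handle that constraint.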
A Framework for Forecasting the Path of US CO2 Emissions Using State-Level Information
Continuous modeling of sporadically-observed time series. arXiv, abs/1905.12374
Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters. arXiv, abs/1909.13316
Neural Ordinary Differential Equations. arXiv
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv
Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change
Introduction to grey system theory
Long Short-Term Memory
Evaluating the causal relations between the Kaya identity index and ODIAC-based fossil fuel CO2 flux
data-and-statistics/data-product/co2-emissions-from-fuel-combustion-highlights
Environment, energy, and economy: strategies for sustainability
Adam: A method for stochastic optimization. arXiv
Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets
A vector autoregression weather model for electricity supply and demand modeling
Are Worldwide Economies Correlated?
Towards New Scenarios for Analysis of Emissions, Climate Change, Impacts, and Response Strategies
Are we on the right path to achieve the sustainable development goals? World Development
Understanding OECD Output Correlations. RBA Research Discussion Papers rdp2001-05
A New Scenario Framework for Climate Change Research: The Concept of Shared Socioeconomic Pathways
Forecasting of CO2 emissions, energy consumption and economic growth in China using an improved grey model
Forecasting CO2 Emissions from Energy, Manufacturing and Transport Sectors in Pakistan: Statistical Vs
Emissions by sector
Emissions drivers
The trajectory of the Anthropocene: The Great Acceleration
GDP (current US$) - data
Population, total - data
ImPACT: analytic tools for unpacking the driving forces of environmental impacts
Kaya identity for analysis of the main drivers of GHG emissions and feasibility to implement EU "20-20-20" targets in the Baltic States