title: Total Solar Irradiance Forecasting with Keras Recurrent Neural Networks
authors: Muralikrishna, Amita; Vieira, Luis E. A.; dos Santos, Rafael D. C.; Almeida, Adriano P.
date: 2020-08-26
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58814-4_18

Abstract: The prediction of solar irradiance at the top of the atmosphere is useful for research that analyzes the behavior and response of the different layers of the Earth's atmosphere to variations in solar activity. It would also be useful for reconstructing the measurement history (time series) of instruments that suffered from downtime and from discrepancies in scale caused by equipment calibration. In this work we compare three Keras recurrent neural network architectures for forecasting the total solar irradiance. The experiments are part of a larger proposal for modularization of the prediction workflow, which uses digital images of the Sun as input and aims to make the process modular, accessible and reproducible.

The energy the Earth receives from the Sun is what drives life and many of the processes on the planet. Many studies of this relationship and its consequences require constant monitoring of the Sun, in order to investigate its activity and to try to predict certain types of events on Earth. The main way to study this energetic influence is through the radiation received from the Sun, in the form of temperature and irradiance measurements over time. Recurrent neural networks (RNNs) have shown themselves to be a powerful tool for dealing with time series and other data of a sequential nature, and new architectures have emerged to further improve their prediction capabilities.
This work proposes to predict one of the measured quantities of solar data, the total solar irradiance (TSI), using as network input 40 parameters extracted from the measured areas of sunspots and active regions. Two types of solar images are used for that. A workflow, based on [21], describes the whole procedure, organizing the steps into modules. The purpose is to make it available for validating the results, for running more experiments with the same or with new data, and for letting other researchers replace modules with different techniques for tasks such as identification, classification and the prediction itself. The intention is, therefore, to make the workflow modular, accessible and reproducible. As options for experiments with RNNs, the simplest architecture - SimpleRNN (Simple Recurrent Neural Network) - is used, as well as the two created to mitigate the vanishing gradient problem: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). This work presents their performance for different network configurations. The Sun owes much of its activity to its rotation and to the resulting complexity of its magnetic field. The events caused by this activity can be observed in different types of records, at different layers of the star. The presence and evolution of active regions and sunspots are two important indicators of disturbances in the solar magnetic field, and when the effects of these disturbances are studied from the Earth, the TSI is also an important indicator [11]. Sunspots appear visually distinct on the surface of the Sun because they are darker and colder regions, emitting less energy than the rest of the surface [14]. They are formed by two regions: the umbra, a central, darker region, and the penumbra, which develops around the umbra in only about half of the spots and has a lighter, gray coloration [5].
Active regions are directly related to the presence of sunspots: they can be observed at moments that precede and accompany the birth of the spots, and may still be visible after their disappearance. Figure 1 shows a morning record from 2011 November 9th taken by the HMI (Helioseismic and Magnetic Imager) instrument of the SDO (Solar Dynamics Observatory), through two types of images collected at different wavelengths. Active regions are visible in the image on the left, where pixels of opposite colors represent opposite polarities of the solar magnetic field lines; their areas are more extensive than those of the corresponding sunspots, shown in the image on the right at the same instant of time. Sunspots occur where the magnetic field is most concentrated and intense, while active regions extend over the entire area where the field disturbance is visible. Data extracted from both types of images in Fig. 1 are combined and used at the beginning of the procedure developed in this work to forecast solar irradiance. Solar irradiance can be defined as an amount of solar radiation per unit area. When measured on the ground, it differs from when measured at the top (outside) of the Earth's atmosphere, since its emission at different wavelengths varies according to the physical and chemical properties of the atmosphere's layers [24]. The solar irradiance measured at the top of the Earth's atmosphere is considered an influential parameter in studies of the properties of the atmospheric layers and of the consequences of the disturbances they suffer under the influence of solar activity [3]. Other relevant studies about weather and climate on Earth have recognized the great influence of solar irradiance in the creation of climate models, for example [4, 6, 22].
For irradiance measured at the top of the atmosphere there are two quantities: the spectral solar irradiance (SSI), defined over the range of wavelengths covered by the solar radiation received by the Earth at a distance of 1 AU; and the TSI, the radiation emitted by the Sun in all spectral regions, equivalent to the integral of the SSI over all spectral bands, received per second over 1 m² at 1 AU. The TSI, the focus of this work, has a history of measurements performed by instruments on board different satellites. However, this collection of measurements has gaps and scale discrepancies in the time series (Fig. 2), due to problems such as calibration and degradation of the equipment, which prevents studies that require a long and continuous period of data. Since 2003, TSI measurements have been collected by the TIM (Total Irradiance Monitor) instrument on board the SORCE (Solar Radiation and Climate Experiment) mission; this data set is used in this work. To address this problem, many works have attempted to estimate the irradiance curve with different methods [2, 8, 17, 21, 23]. Most of them use physical models that allow the reconstruction of missing fractions of the TSI or SSI time series, rather than the prediction of the series itself. One of them [21], however, can predict continuous TSI values using a type of recurrent neural network available in proprietary software. Although it presents results with good accuracy, restrictions of that software, such as the need for a paid license and the incompatibility of toolboxes between versions, do not allow the procedure to be easily distributed and reproduced. This work moves in the direction of offering a procedure that performs the TSI estimation, allowing its reconstruction but also predicting its future variation six hours in advance, using neural networks implemented with free programming languages and libraries.
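The relationship between the two quantities described above can be stated compactly; this is the standard formulation, written here for reference rather than reproduced from the paper's own equations:

```latex
\mathrm{TSI}(t) \;=\; \int_{0}^{\infty} \mathrm{SSI}(\lambda, t)\,\mathrm{d}\lambda
\qquad \left[\mathrm{W\,m^{-2}}\ \text{at}\ 1\,\mathrm{AU}\right]
```

That is, the TSI at an instant t is the SSI integrated over all wavelengths, expressed as a power per unit area at 1 AU.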
The complete workflow for solar irradiance prediction (in this work, specifically TSI) begins with the processing of two types of solar images (as shown in Fig. 1, collected by NASA's SDO mission), both of which are records of the solar photosphere (essentially the solar "surface"):
- The continuum images: corresponding to what is visible to the human eye, which record the sunspots;
- The magnetograms: which allow the visualization of the active regions through the polarity of the solar magnetic field.
A set of both images is downloaded for the desired time period, and then a merged procedure of identification, classification and calculation of the disturbed areas is performed over both. Next, a data matrix is prepared with the calculated and classified areas, which is used as input to the neural network for supervised training, with the TSI values observed 6 h ahead presented as the desired output. The workflow tasks, separated into modules, where each black-outlined block represents a module, can be seen in Fig. 3. This figure also suggests, through the dash-outlined boxes, replacement options for some of the tasks. In parallel with the specific objective of comparing the performance of different recurrent neural network architectures in TSI forecasting, this work is part of a larger proposal that aims to share the complete procedure, with its partial and final results, in a modular way, in order to allow modifications of steps of the procedure and extensions that include other preprocessing and classification approaches. Research in areas such as space geophysics uses computational tools and techniques to process and analyze data, focused on generating quality final results. However, a detailed record of the entire analytical workflow adopted to arrive at such results, made available for reuse, is also a valuable tool and could bring a series of benefits to researchers.
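The data-preparation step described above (a per-instant feature matrix, supervised pairs with a 6-h-ahead target, and an 80/20 split) can be sketched as follows. All data here is synthetic and the function names are illustrative placeholders, not the authors' actual module names:

```python
import numpy as np

def build_input_matrix(feature_vectors):
    # Stack one 40-parameter area vector per 6-h instant into (n, 40).
    return np.vstack(feature_vectors)

def make_supervised_pairs(X, tsi):
    # The input at instant t is paired with the TSI observed one
    # step (6 h) ahead, so the network learns a short-term forecast.
    return X[:-1], tsi[1:]

# Dummy data standing in for the 588 instants of extracted areas.
n_instants = 588
X = build_input_matrix([np.random.rand(40) for _ in range(n_instants)])
tsi = 1361.0 + 0.5 * np.random.randn(n_instants)   # roughly TSI-like, W/m^2

X_in, y_out = make_supervised_pairs(X, tsi)
split = int(0.8 * len(X_in))                       # 80% train / 20% validation
X_train, X_val = X_in[:split], X_in[split:]
y_train, y_val = y_out[:split], y_out[split:]
```

This keeps each workflow task behind its own function, in the spirit of the modular structure the text proposes.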
This concept is already used in other scientific areas. Bioinformatics researchers, for example, can share robust and reliable workflows in the cloud, together with large volumes of data, to be reused by a research community that needs the same data and processing [7, 18, 19]. This work is the beginning of a proposal for the modularization of the solar image analysis process, with the final objective of making available a solar irradiance prediction workflow that is, mainly:
- Accessible: available through a free platform, language and libraries in the cloud, so that it can be used by any interested researcher, offering along with it all the necessary processing infrastructure, such as the operating system, software and libraries, in the originally used versions;
- Reproducible: with commented notebook code that can be re-executed to validate the developed procedures and the partial and final results obtained, or edited, for example, to allow the reuse of the workflow to process and analyze other input data;
- Modular: allowing parts (modules) of the workflow to be used separately, or edited to have part of the process replaced, enabling, for example, new tests with new techniques or tools that perform the same task.
This work focuses on the tests performed for TSI forecasting using three types of recurrent neural networks (RNNs). RNNs are a class of artificial neural networks (ANNs) with dynamic behavior, which consider the dependency between input instances. This is what differentiates them from traditional feed-forward networks, which treat the inputs as independent from each other. In order to create internal memory, RNNs also have connections between neurons of the same layer, unlike traditional feed-forward networks, in which connections exist only between different layers. Essentially, in this architecture the output of a neuron can be fed back into the neuron itself [9].
This flexibility allows RNNs to be used in language processing problems such as speech recognition, language modeling and machine translation [10], in price monitoring [20], and in time series in general: cases in which considering the historical context of the inputs can be an essential differential in forecast quality. Essentially, RNNs follow the basic functioning of other ANNs regarding the internal processing of neurons, which computes a weighted sum of the input to which an activation function is applied. However, besides the traditional weight matrix, the simple types of RNN have another weight matrix that is used exclusively to process the feedback coming from previous processing moments. Figure 4 shows the basic process that occurs in a recurrent neuron to calculate its state h at time instant t, given the input at the same time (x(t)) and its previous state (h(t−1)), generating its output (o(t)). On the left of the picture a compact representation is shown, and on the right the unfolding of the process for time instants from t − 1 to t + 1. The figure also distinguishes the three weight matrices: U, the input weight matrix; V, the output weight matrix; and W, the weight matrix applied to the feedback signals [9]. Equations 1a-1c formalize the operations indicated in Fig. 4, where the cell in state h(t−1), which receives and processes the input x(t−1), in addition to transmitting the result as output (o(t−1)) to the next layer, also uses it to update its own state to h(t). Therefore, when the same cell receives the next input x(t), it holds the memory of the previous input x(t−1). The process continues in this way, successively, until all the input instances are processed. The activation function commonly used in recurrent layers is the hyperbolic tangent (tanh), to deal with the nonlinearity of the data.
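The per-step computation that Eqs. 1a-1c describe can be sketched in a few lines of NumPy. The matrices U, W and V follow the roles given in Fig. 4; biases are omitted for brevity and the sizes are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 40, 8, 1
U = rng.normal(size=(n_hidden, n_in))      # input weights
W = rng.normal(size=(n_hidden, n_hidden))  # recurrent (feedback) weights
V = rng.normal(size=(n_out, n_hidden))     # output weights

def rnn_step(x_t, h_prev):
    a_t = U @ x_t + W @ h_prev  # weighted sum of input and previous state
    h_t = np.tanh(a_t)          # new internal state
    o_t = V @ h_t               # cell output passed to the next layer
    return h_t, o_t

# Process a short sequence: each step sees the current input plus the
# memory of all previous inputs carried in h.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, o = rnn_step(x_t, h)
```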
Equations 1b and 1c show, respectively, the weighted sum to which the tanh activation function is applied with the respective weight matrices, and the calculation of the output of each recurrent neuron. The training algorithm normally used in RNNs is Backpropagation Through Time, adapted from the traditional backpropagation used in multilayer perceptrons but using the time instants as the basis for the backpropagation. This makes the chain-rule calculations more complex, since the backpropagated error must reach not only the neurons of the previous layer but all those that influenced them through feedback over time. Thus, the large number of steps through which the error is backpropagated generates a drastic reduction in the value of the gradient, which approaches zero, resulting in the so-called vanishing gradient problem, in which the weights stop receiving significant updates and therefore stop contributing to the learning process [13]. The operation of simple recurrent architectures, like the Keras SimpleRNN tested in this work, is based on the functioning described previously and does not deal with the vanishing gradient problem, which limits the memory offered by this type of network to the short term. This problem motivated improvements to the basic RNN algorithm, creating variations, among which the main and most used today are LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), both of which are also used in this work. LSTM is a special type of RNN that is able to learn long-term dependencies, solving the vanishing gradient problem by adding more interactions to the processing of the recurrent neuron [12].
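A toy numeric illustration of the vanishing gradient: in backpropagation through time, the error signal is multiplied at every step by the local derivative of tanh (which is at most 1) and by the recurrent weight, so over many steps the product can decay toward zero. The weight value and pre-activations below are arbitrary, chosen only to show the effect:

```python
import numpy as np

w = 0.5       # a single recurrent weight, standing in for the matrix W
grad = 1.0    # error signal at the last time step

# Backpropagate through 50 time steps; each step applies the chain rule
# through the tanh nonlinearity: factor = w * tanh'(a) = w * (1 - tanh(a)^2).
for a in np.linspace(-1.0, 1.0, 50):
    grad *= w * (1.0 - np.tanh(a) ** 2)

# After 50 steps the surviving gradient is vanishingly small, so the
# earliest inputs no longer influence the weight updates.
```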
Instead of the single layer of simple RNNs, where the tanh function is applied to the weighted sum of the input and the previous state, LSTMs have more layers interacting in a very specific way, each one with its own weight matrices [10]:
- Forget gate f: defines how much of the previous state h(t−1) should be allowed to pass on; activates the signal with the sigmoid function (Eq. 2).
- Input gate i: defines how much of the newly computed state, based on the input x(t), will be kept for the next instants of time; activates the signal with the sigmoid function (Eq. 3).
- Output gate o: defines how much of the internal state is to be exposed to the next layer; activates the signal with the sigmoid function (Eq. 4).
- Internal hidden state g: computed from the current input x(t) and the previous state h(t−1) of the neuron, similarly to the elements of simple RNNs, using the tanh function (Eq. 5).
After obtaining f, i, o and g, the so-called cell state c(t) is calculated, given by Eq. 6, in which the long-term memories are combined with the most recent ones through the forget gate and the input gate. Their values are weighted so as to ignore the undesired memories (with value 0) or make them relevant (with value 1). The output h of the neuron is finally calculated as a function of the cell-state value, after applying the tanh function to it, and then determining, through the value of the output gate, how much of this output is relevant at this instant of time (Eq. 7). The GRU can be considered a variation of the LSTM, very similar to it but with the advantage of a considerably simpler gate structure. GRUs are composed of only two gates, update and reset, which are trained, respectively, to define how much of the older information should be kept, and to merge new inputs with the previous memories.
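The gate computations of Eqs. 2-7 can be sketched per time step as follows. The weight matrices and sizes are illustrative and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_h = 40, 8
# Each gate has its own pair of weight matrices (input and recurrent).
Wf, Uf = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
Wi, Ui = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
Wo, Uo = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
Wg, Ug = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    f = sigmoid(Wf @ x_t + Uf @ h_prev)  # forget gate (Eq. 2)
    i = sigmoid(Wi @ x_t + Ui @ h_prev)  # input gate (Eq. 3)
    o = sigmoid(Wo @ x_t + Uo @ h_prev)  # output gate (Eq. 4)
    g = np.tanh(Wg @ x_t + Ug @ h_prev)  # internal hidden state (Eq. 5)
    c_t = f * c_prev + i * g             # cell state (Eq. 6)
    h_t = o * np.tanh(c_t)               # neuron output (Eq. 7)
    return h_t, c_t

h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c)
```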
Equations 8a-8d summarize the computations that take place in GRU cells [10]. According to some works, LSTM and GRU have shown themselves to be very similar in terms of performance [15]. In this work both architectures, as well as the SimpleRNN, are tested to see which brings better results for the TSI forecast. Experiments were performed with the Keras library for Python, using its RNN architecture layers: SimpleRNN, LSTM and GRU. As training and validation data, the period from 2011 November 05 to 2012 March 30 was used, which presented records of both quiet and disturbed solar activity. The data have a 6-h temporal resolution, making a total of 588 time instants, of which 80% were used as the training set and 20% as the validation set. The networks' fixed configuration comprised: 40 input parameters, one recurrent layer and a linear output layer (a Keras Dense layer) with a single neuron, to predict a single TSI value. The network parameters varied in the experiments were: the number of recurrent hidden units, the batch size, the number of time steps, the use of Dropout (an algorithm to prevent over-fitting in the training of neural networks [1]) and the architecture itself. For all the architectures, experiments were made varying the number of hidden units in the recurrent layer, in two groups of experiments. Few units without a dropout layer: for the first batch of experiments, few units were used, from 1 to 8, following [21]. For each number of hidden units, five complete training processes were run, with fixed parameters: batch size of 5 and 1 time step. Figure 5 shows the lowest validation root mean square error (RMSE) obtained across the five training processes for each number of hidden units. Many units with a dropout layer: for the second batch of hidden-unit experiments, 10 to 80 units were used, with five full training processes for each configuration.
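The fixed configuration described above can be sketched with Keras as below. This assumes TensorFlow's bundled Keras; the hyperparameter values are illustrative defaults (20 units, one time step, a 0.2 dropout rate), and the authors' exact training code is not reproduced:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(recurrent_layer, n_units=20, time_steps=1, rate=0.2):
    model = keras.Sequential([
        keras.Input(shape=(time_steps, 40)),  # 40 extracted area parameters
        recurrent_layer(n_units),             # SimpleRNN, LSTM or GRU
        layers.Dropout(rate),                 # over-fitting control
        layers.Dense(1),                      # one linear neuron: the TSI value
    ])
    model.compile(optimizer="adam", loss="mse",
                  metrics=[keras.metrics.RootMeanSquaredError()])
    return model

# One model per architecture compared in the experiments.
models = {name: build_model(layer)
          for name, layer in [("SimpleRNN", layers.SimpleRNN),
                              ("LSTM", layers.LSTM),
                              ("GRU", layers.GRU)]}
```

Training would then follow the usual `model.fit(X_train, y_train, batch_size=..., validation_data=...)` call, with the varied parameters swept externally.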
This time, a Dropout layer was added to the network to try to prevent the over-fitting observed in the first group of hidden-unit tests. An initial dropout rate of 0.2 was assumed. Figure 5 shows the results of this experiment. The first group of tests showed that small numbers of hidden units, specifically one unit, gave better accuracy. However, the experiments with more units and a Dropout layer brought even better accuracy, besides converging faster, in 20% of the epochs needed in the first group of tests. The second group also presented less variability in RMSE within the same batch of tests. In most cases, in both groups of tests, and especially in the second, the GRU and LSTM networks performed very similarly, yielding a smaller error than the SimpleRNN. For the dropout-rate tests, rates from 0% to 50% were used, considering that 50% is the highest rate recommended in the Keras documentation. The results with GRU and LSTM, the architectures that presented the smallest RMSE, did not show considerable variance across the different dropout rates, but the highest rate showed a significantly lower RMSE. Therefore, since a large number of hidden units was used, a rate of 50% was chosen for the subsequent experiments. The batch-size variation was defined by calculating different percentages of the training data length, starting at 0.5% and increasing up to 50%. Tests were made both for few and for many hidden units; neither showed a considerable overall difference in performance, but both test groups showed slightly worse performance as the batch size increased. Therefore, even though bigger batches result in faster training, smaller batches were chosen for the subsequent tests. Another set of tests varied the number of time steps in the input data composition. All previous tests used only one time step, which means that only the input at time t was used to predict the output at time t + 1 (6 h ahead).
When more time steps are used, more previous time instants are considered: with 3 time steps, for example, the inputs at times t, t − 1 and t − 2 are used to predict the output at time t + 1. Figure 6 shows the results obtained for the three architectures (on the left) when varying the time steps from 1 to 5. The same figure (right side) shows the performance comparison between the two architectures with the best performance. The tests with 2 and 3 time steps gave lower RMSE values, possibly suggesting a time dependence between solar activity and the TSI; more experiments should be done to confirm that relationship. In an attempt to combine the best results obtained in the experiments described above, the last training runs were conducted using the parameters considered best. Figure 7 shows, for one of these runs, the relationship between the desired and the obtained TSI values for the training and validation data. Figure 8 shows the validation performance of each architecture for the same training parameters. In some experiments the GRU showed a slightly better performance, but in most of them LSTM and GRU performed almost identically. The SimpleRNN also showed good results in the first experiments, but in subsequent tests it presented considerably worse performance than the other two architectures. The time-step results suggest a deeper investigation of the time-dependency period between solar activity and its consequences at the top of the Earth's atmosphere. Further tests are considered necessary to arrive at a final configuration for the TSI forecast. For that, it is intended to train on a larger data set after finding a satisfactory way of linking long periods of data, considering the existing gaps between them. The use of new practices to better analyze network performance for different configurations is also considered for the continuation of this work.
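The time-step windowing described above (inputs at t, t − 1, ..., t − k + 1 predicting the target one 6-h step ahead) can be sketched as follows; the function name is illustrative, not the authors' code:

```python
import numpy as np

def make_windows(X, y, time_steps):
    # X: (n, 40) input matrix, y: (n,) TSI targets.
    # Returns Keras-shaped windows (samples, time_steps, features) and
    # the aligned targets one step (6 h) ahead of each window's end.
    Xw = np.stack([X[i - time_steps + 1:i + 1]
                   for i in range(time_steps - 1, len(X) - 1)])
    yw = y[time_steps:]
    return Xw, yw
```

For example, with 10 instants and 3 time steps, the first window covers instants 0-2 and is paired with the target at instant 3, yielding 7 training samples in total.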
There is the intention of continuing this work toward the proposal mentioned above, having as a future step the testing of cloud tools for making the workflow and its modular structure available. In parallel with this objective, the exchange of workflow modules is expected, such as replacing the input images with images in other frequency bands, in order to predict part of the SSI spectrum instead of the TSI. Afterwards, a change in the method of classifying disturbed regions is intended. The intention is to use machine learning techniques in all module exchanges.

References
[1] Understanding dropout
[2] A new SATIRE-S spectral solar irradiance reconstruction for solar cycles 21-23 and its implications for stratospheric ozone
[3] Influence of solar irradiance on polar ionospheric convection
[4] A solar irradiance climate data record
[5] O Numero de Manchas Solares
[6] Recent variability of the solar spectral irradiance and its impact on climate modelling
[7] A review of scalable bioinformatics pipelines
[8] Modelling short-term Solar Spectral Irradiance (SSI) using coronal electron density and temperature profiles based on solar magnetic field observations
[9] Deep Learning
[10] Deep Learning with Keras
[11] The solar cycle
[12] LSTM can solve hard long time lag problems
[13] Recurrent neural net learning and vanishing gradient. Fuzziness Knowl.-Based Syst.
[14] The Role of the Sun in Climate Change
[15] An empirical exploration of recurrent network architectures
[16] An assessment of the solar irradiance record for climate studies
[17] Reconstruction of solar spectral irradiance since the Maunder minimum
[18] A review of bioinformatic pipeline frameworks
[19] The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows
[20] Multivariate analysis and neural networks application to price forecasting in the Brazilian agricultural market
[21] Short-term forecast of the total and spectral solar irradiance
[22] Solar ultraviolet radiation in a changing climate
[23] UV solar irradiance in observations and the NRLSSI and SATIRE-S models
[24] A review of vertical coupling in the atmosphere-ionosphere system: effects of waves, sudden stratospheric warmings, space weather, and of solar activity

Acknowledgements. This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.