title: A Novel ABRM Model for Predicting Coal Moisture Content
authors: Zhang, Fan; Li, Hao; Xu, ZhiChao; Chen, Wei
date: 2022-02-03
journal: J Intell Robot Syst
DOI: 10.1007/s10846-021-01552-6

Coal moisture content monitoring plays an important role in carbon reduction and in clean-energy decisions for coal transportation and storage. Traditional coal moisture detection relies heavily on dedicated equipment, which can be expensive or difficult to deploy under field conditions. To achieve fast prediction of coal moisture content, a novel neural network model based on an attention mechanism and a bidirectional ResNet-LSTM structure (ABRM) is proposed in this paper. The prediction is achieved by training the model to learn the relationship between changes in coal moisture content and meteorological conditions. The experimental results show that the proposed method achieves superior moisture prediction accuracy compared with other state-of-the-art methods, and that the ABRM model appears to have the greatest potential for predicting shifts in coal moisture content driven by meteorological elements.

In September 2020, China set the path and timeline for peaking carbon dioxide emissions and achieving net-zero carbon emissions, namely "carbon neutrality". Currently, the energy industry is actively implementing an energy transition to move gradually towards the goal of carbon peaking. At the same time, the coal industry is promoting more environmentally friendly use of fossil energy, achieving cleaner utilization of coal and lower carbon emissions through technological innovation and intelligent technologies in coal mining, storage, and transportation [1]. Dynamic monitoring of coal moisture during storage and transportation will provide effective technical means for storage and transportation control, dust suppression in coal yards, and clean utilization. It is therefore important to study coal moisture content prediction and monitoring technology to improve coal transportation and storage safety, reduce environmental pollution in the yard, and reduce the carbon emission index of coal storage and transportation [2].

Online measurement methods for coal moisture can be categorized into two types: direct hardware sensing and indirect soft sensing. Cutmore et al. [3] proposed a contactless microwave gauge for the on-belt determination of total moisture content in coal preparation plants, achieving online detection of coal moisture. Zeng et al. [4] presented a soft-sensing model for coal moisture based on the energy and mass balance of material at the inlet and outlet of a positive-pressure, direct-combustion, MPS-type mill, which reduces the delay time of moisture measurement. Wang [5] modified the microwave-transmission coal moisture detection method using a least-squares support vector machine to further improve detection accuracy. Mao et al. [6] proposed rapid detection of the total moisture content of coal using low-field nuclear magnetic resonance (NMR), providing a new and effective tool for online detection of moisture in coal from coal processing plants.
The above online coal moisture detection methods have been applied in coal storage yards and have improved coal storage and transportation control decisions, but they still have shortcomings in detection speed, measurement accuracy, and stability: (1) They require a long moisture analysis time and have poor real-time performance, and thus struggle to provide timely and effective decision support; for example, the commonly used vacuum high-temperature drying method requires at least 2 h to determine the moisture content. (2) Water mist and dust are common environmental interferences during coal storage and transportation, under which the above methods suffer reduced detection accuracy. (3) They cannot be used to construct dynamic monitoring models, and the whole system is susceptible to external perturbations such as meteorological changes and operational activities.

In recent years, with extensive research on machine learning, deep learning has been widely applied in many fields [7-11]. For moisture dynamics prediction in particular, neural network models are often applied to soil, crops, wood, etc. [12-17]. There is a complex interaction between moisture changes and meteorological factors in these materials, and reliable prediction results can be obtained only by fully extracting the features of this complex relationship [18, 19]. According to related studies, moisture content evolves dynamically and continuously in time, i.e., the moisture contents at adjacent moments interact [20]. The mechanism of moisture change in coal is similar to that in soil and other substances: it is susceptible to meteorological factors such as temperature and humidity, and shows dynamic continuity in its trend [21]. Thus, by collecting a large amount of time-series data on moisture and meteorology, a data-driven prediction model for coal moisture content can be built. This process does not require complex statistical or physical equations for a single variable, but is based entirely on learning a set of predictors for the variables of interest. The technical approach to predicting the moisture content of coal stored and transported in open yards therefore combines the characteristics of coal moisture variability with in-depth extraction of the correlated features of moisture and meteorological factors [22].

In view of this, this paper proposes a machine learning-based online monitoring method for coal moisture content, which addresses the above problems through deep learning. Based on deep neural network theory and its prediction methods, a coal moisture prediction model combining residual networks (ResNets) and bidirectional long short-term memory (Bi-LSTM) is proposed. A one-dimensional convolution ensures that sequence data are interpreted correctly and that features are extracted effectively. A simplified ResNet prevents a deep CNN from overfitting on small-scale datasets while avoiding the negative impact of redundant convolutional layers [23]. Bi-LSTM can learn the effect of both preceding and subsequent sample features on the current time step [24]. The attention mechanism can further strengthen the key information in the time series by redistributing the weights of each time step's output [25].
The model combines a one-dimensional ResNet, Bi-LSTM, and an attention mechanism, incorporating the feature extraction capability of the CNN and the time-series memory capability of the Bi-LSTM, to predict the trend of the moisture content of coal in open dumps by training on multi-source data such as the meteorological conditions of a specific region.

The rest of this paper is organized as follows. The architecture of the proposed methodology is presented in Section 2. Section 3 describes the experimental setup and evaluation metrics, as well as the analysis and discussion of the experimental results. Finally, Section 4 summarizes our conclusions.

To enable the network to correctly interpret the serial data and extract its key features, this paper constructs time-series data consisting of two parts: meteorological features and coal moisture content features. Based on the feature extraction capability of the CNN and the contextual association capability of long short-term memory (LSTM), we propose an attention-based bidirectional ResNet-LSTM model (ABRM). The network structure of the ABRM is shown in Fig. 1. The model consists of an input layer, one-dimensional convolutional layers, a Bi-LSTM layer, an attention layer, and an output layer. The residual convolution layer is a ResNet built from one-dimensional convolutions and performs feature extraction; it is described in detail in Section 2.1. The recurrent and attention layers together perform the sequence prediction task, as described in Sections 2.2 and 2.3.

The CNN is one of the core technologies in computer vision (CV). It is a neural network structure composed of multiple convolutional layers, which have the characteristics of sparse interactions, parameter sharing, and equivariant representations. In image processing tasks, the input of a CNN is a matrix of image pixels, in which the pixels have essentially the same correlation with their neighbors in every direction. When time-series data are used as the input of a CNN, the sequence step length is usually taken as the width, the number of features as the height, and each feature value as a pixel value. Assuming a sequence step length of n and d features, the input to the CNN forms an n × d matrix. Clearly, such an input is not equally correlated in all directions. In this study, a 1D CNN is used to ensure that the model interprets the data correctly, with the width of the convolution kernel equal to the number of features of the sample points. As the kernel slides from top to bottom, the features of each time step and their interrelationships are extracted efficiently.

When the CNN is deep and the dataset is not large, overfitting is easily triggered. ResNet avoids this and has shown excellent performance [23]. Moreover, ResNet adds identity mappings between convolutional layers so that information can flow across layers, ensuring that the error does not increase as the network deepens. In other words, a deep network can autonomously degenerate into a shallower one, thereby avoiding the negative effects of redundant convolutional layers. Figure 2 shows the structure of the basic residual unit in ResNet. The calculation of the residual unit is

$$x_{l+1} = x_l + F(x_l, W_l) \tag{1}$$

where $x_l$ is the identity-mapping part, $F(x_l, W_l)$ is the residual-mapping part, composed of two convolutional layers (see Fig. 2), and $x_{l+1}$ is the output of the residual unit.
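As a concrete illustration, the following is a minimal PyTorch sketch of a one-dimensional residual unit of the kind just described. The channel count and kernel width are illustrative assumptions, not the authors' exact configuration, and the class name ResidualBlock1D is hypothetical.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """One-dimensional residual unit: out = ReLU(x + F(x, W)), where the
    residual mapping F is two stacked 1D convolutions (cf. Eq. 1 and Fig. 2)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2  # "same" padding keeps the sequence length
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time_steps)
        residual = self.conv2(self.relu(self.conv1(x)))  # F(x_l, W_l)
        return self.relu(x + residual)                   # identity + residual mapping

# Example: a batch of 8 sequences, 128 channels, 5 time steps.
block = ResidualBlock1D(channels=128)
print(block(torch.randn(8, 128, 5)).shape)  # torch.Size([8, 128, 5])
```

Because the identity path adds the input straight through, a redundant block can learn a residual close to zero, which is the cross-layer information flow described above.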
LSTM is an important technique in natural language processing (NLP) [26] and solves the vanishing- or exploding-gradient problems that traditional recurrent neural networks (RNNs) may encounter when processing longer sequences. The LSTM was designed with three gate structures: the forget gate, the update gate (input gate), and the output gate. By weighting the input vector through these three gates, the LSTM can selectively retain antecedent information and better understand and preserve global information, enabling more accurate prediction. LSTM has therefore been widely used in tasks such as speech recognition, sentiment analysis, and text analysis.

A bidirectional LSTM additionally reads the sequence data in the reverse direction for re-learning, and its final output is determined by both forward and reverse learning. This structure further ensures that the prediction exploits correlations in both directions, so in some tasks Bi-LSTM performs better than LSTM. Taking input time step t as an example, the computational flow of the Bi-LSTM is as follows.

(1) The output $h_{t-1}$ of the previous time step and the input $x_t$ of the current time step are combined and passed to the forget gate for selective forgetting:

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{2}$$

where $f_t$ is the output of the forget gate, $\sigma$ is the sigmoid activation function, $W_{xf}$ and $W_{hf}$ are the weights the forget gate assigns to $x_t$ and $h_{t-1}$, respectively, and $b_f$ is the bias of the forget gate.

(2) The concatenation of $x_t$ and $h_{t-1}$ is selectively memorized by the input gate, as shown in Eq. 3. Meanwhile, $x_t$ and $h_{t-1}$ are scaled by the tanh activation function (with analogous weights $W_{xc}$, $W_{hc}$ and bias $b_c$) to produce the candidate memory cell $\tilde{c}_t$ that stores the information of the current time step, as shown in Eq. 4:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{3}$$

$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{4}$$

Here, $i_t$ is the output of the input gate, $W_{xi}$ and $W_{hi}$ are the weights the input gate assigns to $x_t$ and $h_{t-1}$, respectively, and $b_i$ is the bias of the input gate.

(3) The contents of the antecedent memory cell $c_{t-1}$ are updated by deciding which new information from the candidate memory cell $\tilde{c}_t$ will be stored; the latest memory cell $c_t$ is obtained as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$

(4) The output gate determines what information should be included in the output state of the current stage:

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \tag{6}$$

where $o_t$ is the output of the output gate, $W_{xo}$ and $W_{ho}$ are the weights the output gate assigns to $x_t$ and $h_{t-1}$, respectively, and $b_o$ is the bias of the output gate.

(5) The output value $h_t$ of the LSTM is computed from the output $o_t$ of the output gate and the state $c_t$ of the current cell:

$$h_t = o_t \odot \tanh(c_t) \tag{7}$$

(6) Bi-LSTM adds a reverse-learning cell and combines the forward and backward outputs of the LSTM cells as the final output:

$$h_t = \vec{h}_t \oplus \overleftarrow{h}_t \tag{8}$$

where $\oplus$ is the combination (summation) operation, $\vec{h}_t$ is the output of the forward LSTM cell, $\overleftarrow{h}_t$ is the output of the corresponding backward cell, and $h_t$ is the final output.

The LSTM cell structure is shown in Fig. 3, and the Bi-LSTM model structure is shown in the orange part of the recurrent layer in Fig. 1.
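In practice the gate equations above are implemented inside library LSTM cells, so a minimal PyTorch sketch of the bidirectional layer only needs to declare the wrapper. The input and hidden sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A bidirectional LSTM over the per-time-step feature vectors produced by
# the convolutional layers. The sizes here are illustrative assumptions.
features, hidden = 128, 32
bilstm = nn.LSTM(input_size=features, hidden_size=hidden,
                 batch_first=True, bidirectional=True)

x = torch.randn(8, 5, features)  # (batch, time_steps, features)
h, _ = bilstm(x)                 # h: (batch, 5, 2 * hidden)

# PyTorch concatenates the two directions along the last dimension:
# h[:, t, :hidden] is the forward output for step t and h[:, t, hidden:]
# the backward one, realizing the combination of Eq. 8.
print(h.shape)                   # torch.Size([8, 5, 64])
```

Note that PyTorch combines the two directions by concatenation; whichever combination operator $\oplus$ the authors used, the per-step forward and backward outputs are obtained the same way.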
The attention mechanism was proposed in 2014 and has achieved great success in machine translation [25]. In recent years, attention mechanisms have been widely used in many types of deep learning (DL) tasks, such as NLP [27] and image recognition [28], and have become one of the most interesting and insightful approaches in the DL field. In a representative study in 2017, the Google Brain team abandoned the classic RNN/CNN structure and proposed the transformer, a model composed only of attention mechanisms [29]. The attention mechanism redistributes the input weights, enabling the selection of the information most critical to the current task from among many pieces of information. In the model of the present paper, the attention mechanism assigns different weights to the outputs $h_1, h_2, \cdots, h_t, \cdots, h_T$ of the LSTM units, enabling the model to attend differently to different time steps in the sequence. Taking output time step t as an example, the calculation proceeds as follows.

(1) When the Bi-LSTM produces a hidden output $h_t$ at time step t, it is transformed with a weight $W$ to obtain $u_t$:

$$u_t = \tanh(W h_t) \tag{9}$$

(2) The importance of the current time step is then calculated from the similarity between $u_t$ and the context vector $u$, and normalized to obtain the attention weight $\alpha_t$:

$$\alpha_t = \frac{\exp(u_t^{\top} u)}{\sum_{t} \exp(u_t^{\top} u)} \tag{10}$$

where $u$ denotes the context vector, which is randomly initialized and learned jointly in the training phase.

(3) Finally, a weighted sum with the weights $\alpha_t$ yields the summary vector $S$, which aggregates the information of all time steps in the sequence:

$$S = \sum_{t} \alpha_t h_t \tag{11}$$

The attention structure is shown in the attention layer of Fig. 1.
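The following minimal PyTorch sketch implements Eqs. 9-11 as a pooling module over the Bi-LSTM outputs. The class name AttentionPooling is hypothetical, and the bias-free projection matches the reconstruction of Eq. 9 above.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Additive attention over the Bi-LSTM outputs (cf. Eqs. 9-11):
    u_t = tanh(W h_t), alpha_t = softmax(u_t . u), S = sum_t alpha_t h_t."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size, bias=False)  # W in Eq. 9
        self.context = nn.Parameter(torch.randn(hidden_size))        # u, learned jointly

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, time_steps, hidden_size)
        u = torch.tanh(self.proj(h))                     # (batch, T, hidden)
        alpha = torch.softmax(u @ self.context, dim=1)   # attention weights, (batch, T)
        return (alpha.unsqueeze(-1) * h).sum(dim=1)      # summary vector S, (batch, hidden)

att = AttentionPooling(hidden_size=64)
print(att(torch.randn(8, 5, 64)).shape)  # torch.Size([8, 64])
```

The learned context vector plays the role of a query: time steps whose transformed outputs align with it receive larger weights in the summary vector S.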
In this section, we first introduce the experimental datasets and then discuss the parameter settings and evaluation metrics. Finally, we compare the proposed ABRM model with several other advanced algorithms. All experimental results are averages over 50 independent runs of the proposed algorithm.

In this experiment, the moisture content of the coal stack surface and the meteorological statistics from on-site weather stations were sampled hourly from June to August 2020, October to December 2020, and January to April 2021 at the Huanghua Port coal stockpile in Hebei Province, China (shown in Fig. 4; 117°48′46″ E, 38°18′52″ N). The coal moisture content was obtained by the drying method, maintaining a constant temperature of 80 °C in a vacuum for 120 min. Real-time meteorological data were obtained from the weather monitoring stations in the coal yard nearest to the stacks. The meteorological data contained four features: temperature, air humidity, air pressure, and wind speed.

We aligned the moisture content data and the meteorological data in time, eliminated anomalous records, and finally retained 4350 valid data records. The valid data were divided into sequences using the sliding-window method, generating a total of 2004 sequences of length 5. Each sample point contained five features: temperature, air humidity, air pressure, wind speed, and the moisture content at the previous moment; the moisture content at the current moment was used as the label. Compared with other deep learning application cases, the amount of data obtained in this experiment provides reliable support for training a prediction model for a single coal species [22]. Finally, the ratio of training set samples to test set samples was set to 5:1. Partial sample data are shown in Table 1.

Table 2 shows the minimum (Min), maximum (Max), mean (Mean), and standard deviation (SD) of each variable in the dataset, covering both the training and test sets. Because the data were collected in distinct seasons, the meteorological data vary widely, especially temperature and humidity. The observed coal moisture content values and their fluctuations are small, lying in the range [0.90%, 27.00%].

The parameter settings of the ABRM for this experiment are shown in Table 3. From these settings, the model operates as follows. The input training data form a three-dimensional tensor (batchsize, 5, 5), where the entries in parentheses are the batch size, the number of time steps, and the feature size, respectively. First, the data enter a 1D convolutional layer with a residual structure for feature extraction, producing a 3D output tensor (batchsize, 5, 128), where 128 is the number of output channels. The output then enters the Bi-LSTM layer and the attention layer for training, yielding an output vector (batchsize, 32), where 32 is the hidden size of the Bi-LSTM. Finally, the Bi-LSTM output is interpreted by a fully connected layer to obtain the predicted value. Dropout was added to the LSTM layer to prevent overfitting [30]. Training ran for 10,000 epochs, and the learning rate decayed to 1/10 of its previous value every 2,500 epochs.

The training environment was a graphics workstation configured with two Intel(R) Xeon(R) 4210R CPUs @ 3.20 GHz, an NVIDIA GeForce RTX 3080 GPU, and 64 GB of RAM. Anaconda was the basic platform for deep learning training, PyTorch (version 1.6.0) was the deep learning framework, and CUDA was used for parallel computing. The Python version was 3.7.
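Tying the pieces together, the sketch below follows the data flow described above, reusing the hypothetical ResidualBlock1D and AttentionPooling classes from the earlier sketches. The 1x1 stem convolution that lifts the 5 input features to 128 channels is an assumption, as the paper does not specify that layer; likewise, the paper reports a 32-dimensional vector after the Bi-LSTM and attention layers, whereas PyTorch's concatenated bidirectional convention gives 2 × 32 = 64, so the exact combination is treated here as an unspecified implementation detail.

```python
import torch
import torch.nn as nn

class ABRMSketch(nn.Module):
    """Illustrative assembly of the data flow described above:
    (batch, 5, 5) -> residual 1D conv (128 channels) -> Bi-LSTM ->
    attention -> fully connected scalar output. Reuses ResidualBlock1D
    and AttentionPooling from the earlier sketches."""
    def __init__(self, n_features=5, channels=128, hidden=32):
        super().__init__()
        # Assumed stem: a 1x1 conv lifting 5 input features to 128 channels.
        self.stem = nn.Conv1d(n_features, channels, kernel_size=1)
        self.res = ResidualBlock1D(channels)
        self.bilstm = nn.LSTM(channels, hidden, batch_first=True,
                              bidirectional=True)
        self.att = AttentionPooling(2 * hidden)  # 2x for the two directions
        self.fc = nn.Linear(2 * hidden, 1)       # scalar moisture prediction

    def forward(self, x):
        # x: (batch, time_steps=5, features=5)
        z = self.res(self.stem(x.transpose(1, 2)))  # (batch, 128, 5)
        h, _ = self.bilstm(z.transpose(1, 2))       # (batch, 5, 64)
        return self.fc(self.att(h)).squeeze(-1)     # (batch,)

model = ABRMSketch()
print(model(torch.randn(8, 5, 5)).shape)  # torch.Size([8])
```

This is a sketch under the stated assumptions, not the authors' exact implementation; training details such as dropout placement and the learning-rate schedule would be configured separately.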
To validate and evaluate the performance of the model, five evaluation metrics are used in this paper: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), the coefficient of determination (R²), and mean absolute percentage error (MAPE). These metrics are widely used in prediction tasks and are defined as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{Pred}} - y_i\right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{Pred}} - y_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i^{\mathrm{Pred}} - y_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - y_i^{\mathrm{Pred}}\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i^{\mathrm{Pred}} - y_i}{y_i}\right|$$

Here, N is the sample size of the test set, $y^{\mathrm{Pred}}$ is the predicted value, $y$ is the true value, and $\bar{y}$ is the mean of the true values. The closer R² is to 1, the better the model discriminates; the closer MSE, RMSE, MAE, and MAPE are to 0, the higher the prediction accuracy.

Six models, namely SVM, RNN, LSTM, CNN, A-LSTM (attention-based LSTM), and a simple CNN-LSTM (without the residual unit and attention mechanism), were selected for comparison with the proposed ABRM. They include traditional machine learning models, common recurrent/convolutional neural network models, and combined neural network models, all of which have been applied to various sequence prediction tasks with satisfactory results. We trained each model with the same training dataset and parameters; after training, the models were tested on the same test dataset. The prediction accuracy of each model for coal moisture content was evaluated by comparing the predicted values with the observed values for the next 1 h. Table 4 shows the results of the prediction accuracy comparison between the ABRM and the other models.

In this experiment, the R² and MAE scores are particularly important indicators of model performance. R² measures how well each prediction model fits the observed moisture content of the samples, while MAE reflects the average absolute deviation of the predicted values from the observed values and thus directly reflects the magnitude of the actual prediction error. The results in Table 4 show that the R² and MAE scores of the proposed ABRM are 0.9971 and 0.0812 (boldfaced in Table 4), respectively, which are significantly better than those of the other models. The MSE, RMSE, and MAPE of the ABRM also outperform those of the other models, reaching 0.0528, 0.2299, and 1.6117, respectively.

Figure 5 shows the predicted and observed coal moisture content for each model; for easier comparison, the R² score of each model is labeled in red. It is apparent from this figure that the fitting ability improves, to different degrees, as model complexity increases. The R² scores of both the simple CNN-LSTM and the ABRM exceed 0.99, with the proposed ABRM performing best at 0.9971 and achieving a satisfactory fit (Fig. 5(g)). The feature extraction capability of the CNN and the sequence prediction capability of the LSTM are thus fully exploited in the coal moisture prediction task. It is worth noting that in Fig. 5(g) there are two obvious anomalies, at roughly 4% and 14% moisture content, which also appear in panels (a) to (f). Querying the dataset showed that these anomalies were caused by sudden changes in meteorological parameters during short-term strong convective weather at the experimental site. This indicates that the proposed model is less capable of handling such abrupt changes; however, it captures smoother meteorological changes very well. In other words, in most cases the predicted moisture values output by the model can be trusted and used to guide industrial production.

In addition, we compared the absolute errors of the predictions of each model and plotted the error distributions in Fig. 6. Given that the range of the moisture content is only [0.9%, 27%] with an SD of only 4.57, absolute errors in the range [-2, 2] do not meet the requirements for long-term production guidance. By comparison, the ABRM keeps most errors within [-0.5, 0.5], giving superior prediction performance. Figure 7 further compares the predictions of the ABRM on the test set with the actual values; different test sequences and their corresponding prediction curves are plotted together. In Fig. 7, most of the ABRM prediction curves (green) closely overlap the real observations (red), showing the good generalization capability of the model.

In summary, the ABRM constructed in this paper has the best prediction performance among the compared models. Its structure ensures that the network correctly interprets the sequence data and extracts key features, while structurally suppressing overfitting to keep the network robust against perturbations.
In this paper, an ABRM model for predicting the moisture content of coal in the surface layer of coal stacks is proposed. The model combines a ResNet based on one-dimensional convolution with an attention-based Bi-LSTM, fusing the feature extraction ability of the ResNet with the time-series memory ability of the Bi-LSTM. The sample features extracted by the ResNet are used as the input sequence of the LSTM, which learns them to produce the final prediction. The experimental results show that the proposed ABRM outperforms common regression prediction models in prediction accuracy and convergence rate, scoring better than the other models on all evaluation metrics, and achieves excellent performance in predicting the moisture content of the surface layer of coal stacks. The high-precision moisture predictions provide an important reference for the storage and transportation management systems of coal storage bases, easing the management of dust-suppression sprinkling and spontaneous combustion prevention.

Future work can incorporate light intensity, coal type, and other variables into the feature set to improve the prediction stability of the model. It would also be worthwhile to develop a multi-task learning algorithm that predicts the change in coal moisture content several hours ahead by first forecasting the meteorological parameters over those hours. In addition, this method can be applied to most natural minerals whose moisture content is influenced by the environment, such as sand and soil; we hope interested scholars will verify this in future experiments.

The data used to support the findings of this study are available from the corresponding author upon request.
[1] How to handle the crisis of the coal industry in China under the vision of carbon neutrality
[2] Intelligent and ecological coal mining as well as clean utilization technology in China: review and prospects
[3] Microwave technique for the on-line determination of moisture in coal
[4] Soft sensing of coal moisture
[5] Mechanism and methods of coal moisture measurement based on the microwave transmission method
[6] Rapid detection of the total moisture content of coal fine by low-field nuclear magnetic resonance
[7] Intelligent intraoperative haptic-AR navigation for COVID-19 lung biopsy using deep hybrid model
[8] Deep air quality forecasting using hybrid deep learning framework
[9] Image-based 3D object reconstruction: state-of-the-art and trends in the deep learning era
[10] Multiple contextual cues integrated trajectory prediction for autonomous driving
[11] Daily traffic flow forecasting through a contextual convolutional recurrent neural network modeling inter- and intra-day traffic patterns
[12] Modeling and theoretical analysis of GNSS-R soil moisture retrieval based on the random forest and support vector machine learning approach
[13] New empirical equation to estimate the soil moisture content based on thermal properties using machine learning techniques
[14] Deep learning
[15] Artificial neural network modeling for predicting wood moisture content in high frequency vacuum drying process
[16] Combination of LF-NMR and BP-ANN to monitor water states of typical fruits and vegetables during microwave vacuum drying
[17] Soil moisture quantity prediction using optimized neural supported model for sustainable agricultural applications
[18] The value of SMAP for long-term soil moisture estimation with the help of deep learning
[19] Assessment of a spatiotemporal deep learning approach for soil moisture prediction and filling the gaps in between soil moisture observations
[20] A novel encoder-decoder model based on read-first LSTM for air pollutant prediction
[21] How moisture loss affects coal porosity and permeability during gas recovery in wet reservoirs
[22] A hybrid CNN-GRU model for predicting soil moisture in maize root zone
[23] Deep residual learning for image recognition
[24] Variational autoencoder bidirectional long and short-term memory neural network soft-sensor model based on batch training strategy
[25] Neural machine translation by jointly learning to align and translate
[26] Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach
[27] ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis
[28] Lightweight attention convolutional neural network for retinal vessel image segmentation
[29] Attention is all you need
[30] Dropout: a simple way to prevent neural networks from overfitting