key: cord-0631070-0a64e0b4
authors: Fki, Zeineb; Ammar, Boudour; Ayed, Mounir Ben
title: Towards automated optimisation of residual convolutional neural networks for electrocardiogram classification
date: 2021-12-11
journal: nan
DOI: nan
sha: fa7089d54154800626faa40c2294008297771b03
doc_id: 631070
cord_uid: 0a64e0b4

The interpretation of the electrocardiogram (ECG) gives clinical information and helps in assessing heart function. Distinct ECG patterns are associated with specific classes of arrhythmia. The convolutional neural network is currently one of the most commonly employed deep learning algorithms for ECG processing. However, deep learning models have many hyperparameters to tune, and selecting the best hyperparameters for a convolutional neural network is a highly challenging task. Often, we end up tuning the model manually over different ranges of values until a best-fit model is obtained. Automatic hyperparameter tuning using Bayesian optimisation (BO) and evolutionary algorithms can provide an effective alternative to current labour-intensive manual configuration approaches. In this paper, we propose to optimise the Residual One-Dimensional Convolutional Neural Network model (R-1D-CNN) at two levels. At the first level, a residual convolutional layer and one-dimensional convolutional layers are trained to learn patient-specific ECG features, over which multilayer perceptron layers learn to produce the final class vector of each input. This level is manual and aims to reduce the search space. The second level is automatic and based on our proposed BO-based algorithm. Our optimised R-1D-CNN architecture is evaluated on two publicly available ECG datasets. Comparative experimental results demonstrate that our BO-based algorithm achieves an accuracy of 99.95%, while the baseline model achieves 99.70%, on the MIT-BIH database.
Moreover, experiments demonstrate that the proposed architecture fine-tuned with BO achieves a higher accuracy than the other proposed architectures. Our optimised architecture achieves excellent results compared to previous works on benchmark datasets.

The ECG is a non-invasive electrical recording of the heart. The signal provides useful information about heart health and can reveal more about individuals, such as gender, age, biometric identity and emotional state. Researchers have explored this peripheral physiological signal to extract useful markers for future outcome research. Several studies have addressed ECG analysis, and providing an accurate classification of ECG beats remains a challenge.

In recent years, Deep Learning (DL), or the Deep Neural Network (DNN), has become an increasingly important research area. Typically, a DL model is a neural network that contains an input layer, successive intermediate layers called hidden layers, and an output layer. Given a large dataset, fast enough processors and a sophisticated algorithm, the structure of DL tries to emulate the structure of the human brain by constructing layers of artificial neurons that can receive and transmit information. DL offers more accurate results with more training data and is useful for unstructured data. Complex problems can be solved with a greater number of hidden layers. This structure makes it possible to continuously adjust the model and make inferences. DL outperforms traditional machine learning in several applications such as electrocardiogram (ECG) [1] and electroencephalography (EEG) [2] classification and, more recently, industry 4.0 [3] and COVID-19 detection [4] [5].

Recent research has produced many successful deep learning algorithms. The Convolutional Neural Network (CNN) is currently one of the most commonly employed deep learning algorithms for image recognition, including the detection of anomalies in ECG. Ebrahimi et al.
[6] revealed that the CNN is the dominant technique for feature extraction, observed in 52% of the studies about explainable deep learning methods for ECG arrhythmia classification. The idea of the CNN comes from the biological visual cortex. The cortex consists of small regions that are sensitive to specific areas of the visual field. Similarly, the CNN builds on small regions inside an object that perform specific tasks. The algorithm is a hierarchical neural network: it takes the input and processes it through a series of hidden layers. The One-Dimensional Convolutional Neural Network (1D-CNN) is a distinguished variant of the CNN. It is typically used for time series input with a single direction x that represents the time axis.

Although they have achieved excellent results on a variety of hard problems [7] [8], CNNs are often exposed to overfitting or underfitting. In those cases, the model fails to predict the output of unseen data or even the output of the training data. In fact, noise introduced into the input signal slows the learning process. Various types of artifacts can lead to noisy ECG signals, such as baseline wander, drift, powerline interference and muscle artifacts. Noisy signals produce high false alarm rates and thus lead to the misclassification of ECG beats and the misdiagnosis of cardiac arrhythmias. In addition, ECG signals are non-stationary. Furthermore, the high number of hyperparameters, especially in fully connected layers, makes the network prone to overfitting.

Several previous works have proposed methods to boost classification results based on CNN hyperparameters and regularization. Regularizing the network structure or designing specific training schemes for stable and robust prediction is considered among the hottest topics for efficient and robust pattern recognition in deep learning [9].
A complex model may achieve a high performance on training data because all the inherent relations in the seen data are memorized. However, such a model is usually unable to perform well on unseen data, including validation and test data. To solve this issue, different regularization methods have been applied in the literature. Xu et al. [10] proposed SparseConnect to alleviate overfitting by applying a sparsifying regularization to the dense layers of CNNs. However, this further raises the complexity of the model, which in turn makes the model harder to optimise. In our work, we choose to randomly drop a few nodes. Unlike conventional tuning methods based on manual trial and error to choose the best hyperparameter value, our work proposes to use BO to select an optimal configuration of the dropout rate and the number of convolutional layers.

In our proposal, a two-level process is established for building a robust Residual 1D-CNN (R-1D-CNN). Level one reduces the search space of hyperparameters. Level two tests selected configurations of the model. The innovative contributions associated with this work can be described as follows:
1. We build a novel R-1D-CNN architecture to detect ECG features automatically. The proposed architecture presents good performance.
2. To solve the overfitting issue and give robust classification results in real time through automatic hyperparameter tuning, we develop an algorithm based on BO.
3. We further explore two datasets for the experimental study. Our proposal outperforms another optimisation technique and all the previous works in ECG classification, which displays the performance of our proposed architecture.

The rest of this paper is organized as follows: In Section 2, we outline a short background of the CNN and the BO technique. In Section 3, the proposed architecture is detailed. The demonstration and performance of the proposed architecture are presented in Section 4.
Finally, Section 5 concludes the paper.

Much of the current research on deep learning has focused on improving and validating existing deep learning algorithms rather than developing new algorithms. The CNN is one of the most commonly employed deep learning algorithms. A CNN learns different levels of abstraction about an input. It performs well in image processing, including image recognition and image classification, thanks to its hierarchical layers. The hierarchical property allows the model to learn increasingly complex representations. For instance, the model first learns basic elements and later the parts composed of them. Another advantage of the CNN is that it extracts features automatically and requires minimal pre-processing [11].

The input of a CNN is an array of pixels in the format H × W × D, where H is the height, W is the width and D is the depth. H × W constitutes the feature map. A grey image of size 32 × 32 pixels is represented by an array of 32 × 32 × 1, while an RGB image of the same size is represented by an array of 32 × 32 × 3. The structure may include convolutional layers (hence its name), pooling layers, Rectified Linear Unit (ReLU) layers and fully connected layers.
• Convolutional layer: the main layer of the CNN. It consists of a set of filters that exploit the local spatial correlation, assuming that nearby pixels are more correlated than distant pixels. The size of the filter defines the size of each feature map and its depth defines the number of feature maps. All local regions share the same weights, a property called weight sharing. Mathematically speaking, a convolution acts as a mixer, mixing two functions to obtain a reduced data space while preserving the information. The model involves training a multilayer architecture without the explicit need for handcrafted input features and is able to automatically extract features such as edge, blur and sharpen. It also helps to remove noise.
• Pooling layer: the most common form is max-pooling, which implements a sliding window.
The max-pooling operation slides over the layer and takes the maximum of each region, moving by a step (the stride) vertically and horizontally.
• ReLU: a non-linear activation function that performs a threshold operation. The output takes the same value as the input for positive values and zero otherwise. The function is used by default in many deep learning algorithms since it performs well and avoids the vanishing gradient problem.
• Fully connected or dense layer: in a fully connected layer, every neuron is connected to every neuron in the next layer. A model may contain one or more fully connected layers. The dense layer can be the last layer, used for the classification.

Depending on the shape of the data, different convolutional dimensions can be used. The 1D-CNN is typically used for time series input with a single direction x that represents the time axis. Common uses of the 1D-CNN include ECG data classification and anomaly detection [12]. The output shape is one-dimensional. The 2D-CNN performs well for image recognition and classification, as the input is an image with two dimensions. The output shape is two-dimensional, and the convolution is calculated along two directions (x, y). With one more dimension, the 3D-CNN applies a three-dimensional filter that moves in three directions (x, y, z). This model is helpful in drug discovery [13].

The effective use of machine learning algorithms is associated with hyperparameter tuning. The hyperparameters adjust the model to a specific database and avoid ongoing training costs. To speed up hyperparameter tuning, BO can be used. The technique is based on Bayes' theorem [14] to select the best configuration of hyperparameter values. Bayes' theorem consists of calculating the conditional probability of an event. It uses prior probability distributions to produce posterior probabilities. The prior probability is the probability of an event before new knowledge is collected.
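In the usual textbook notation, the conditional-probability rule described above and the optimisation problem that BO addresses can be written as follows. This is a sketch under stated assumptions: the symbols x, \mathcal{X}, y and \varepsilon are not taken from the source, and the second and third lines are the standard BO formulation rather than a confirmed transcription of the authors' equations.

```latex
\begin{align}
  % Bayes' theorem for two events A and B with P(B) > 0
  P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)} \\
  % Search for the configuration x in the search space \mathcal{X}
  % that maximises the black-box score f (assumed notation)
  x^{*} &= \arg\max_{x \in \mathcal{X}} f(x) \\
  % Each evaluation of f is observed with noise \varepsilon (assumed notation)
  y &= f(x) + \varepsilon
\end{align}
```

Here P(A) plays the role of the prior and P(A | B) the posterior once B has been observed; in hyperparameter tuning, f(x) would typically be the validation score obtained with configuration x.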
The probability of A conditional on B is defined in Equation (1). Mathematically speaking, the algorithm is interested in solving Equation (2). This optimisation method takes into account the noise present in the evaluations of Equation (3), where f is a black box that is expensive to evaluate. Starting from default parameters, e.g. parameter ranges that are used in the literature, the performance is evaluated using a numeric score or cost such as the accuracy rate. The aim is to select the best configuration, the one that maximizes or minimizes the cost. The best result achieved by a pair of hyperparameters is used to construct the tuned model. Hence, the hyperparameters are assigned. For more details about hyperparameter optimisation for machine learning models based on BO, please see [15].

The proposed classifier is based on the CNN algorithm. The first setting of hyperparameters is done manually. The process is iterated until an acceptable accuracy rate is reached. We add layers and nodes to the model gradually. The increasing number of layers makes the manual optimisation harder. The resulting configuration is given to the optimisation algorithm as the default parameters and runs as the first iteration. By optimising the neural network loss, the smoothing parameters are optimised to perform the prediction task. A novel R-1D-CNN architecture is presented in our work. The optimisation method is described below.

Our proposed CNN at level one was created with 41 plain hidden layers. The first five layers consist of three convolutional layers, one max-pooling layer and a dropout layer. This block of five layers is repeated seven times with different filter sizes. The final block of layers consists of three convolutional layers connected to a max-pooling layer, which is followed by another three dense layers. As the network becomes deeper and deeper, a residual connection [16] is introduced. Hence, a new level of depth appears.
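The building blocks named above (1D convolution, ReLU, max pooling and a skip connection) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which uses Keras: the function names, filter widths, channel counts and random weights are all hypothetical, dropout is omitted, and the "convolution" is the sliding dot product commonly used in deep learning libraries.

```python
import numpy as np

def conv1d_same(x, kernels, bias):
    """Sliding dot product with zero padding so the output keeps the input length.

    x: (length, in_channels); kernels: (width, in_channels, out_channels).
    """
    width = kernels.shape[0]
    left = (width - 1) // 2
    padded = np.pad(x, ((left, width - 1 - left), (0, 0)))
    out = np.empty((x.shape[0], kernels.shape[2]))
    for t in range(x.shape[0]):
        # The same filters are applied to every window (weight sharing).
        window = padded[t:t + width]
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

def relu(x):
    """Keep positive values, set the rest to zero."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)

def residual_block(x, k1, b1, k2, b2):
    """Two convolutions whose output is added to the unchanged input (skip connection)."""
    y = relu(conv1d_same(x, k1, b1))
    y = conv1d_same(y, k2, b2)
    return relu(y + x)  # requires matching input and output channel counts

rng = np.random.default_rng(0)
beat = rng.standard_normal((32, 1))            # toy single-lead segment (hypothetical)
k_in = rng.standard_normal((3, 1, 4)) * 0.1    # 1 -> 4 channels
k1 = rng.standard_normal((3, 4, 4)) * 0.1
k2 = rng.standard_normal((3, 4, 4)) * 0.1
zeros = np.zeros(4)

hidden = relu(conv1d_same(beat, k_in, zeros))
features = max_pool1d(residual_block(hidden, k1, zeros, k2, zeros), size=2)
print(features.shape)  # (16, 4)
```

Because the block adds its input back to its output, setting all of its weights to zero turns it into an identity mapping on non-negative inputs, which is one way to see why skip connections ease the training of deep stacks.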
Deep residual learning comes with the benefit of addressing vanishing/exploding gradients as well as the degradation problem. A residual network achieves this by employing skip connections, or shortcuts, to leap across a number of layers. The number of residual blocks in our architecture is fixed to one. The skip connection is located in the position displayed as a residual block in Figure 1. This new architecture allows efficient training by including skip connections.

To enhance the architecture performance and avoid overfitting, we choose to use the Bayesian optimiser. The latter relies on Bayesian inference and a Gaussian process (GP). This approach is an appropriate algorithm for optimising the hyperparameters of a classifier. Once the variables to optimise and the ranges to search in are specified, the algorithm selects the optimal values. The GP is a well-known surrogate model for BO, employed for approximating the objective function. It performs well in low-dimensional spaces, specifically when the number of features is around five. Table 1 lists the selected hyperparameters, their types and their ranges. A deep learning model is constructed according to the first level, and the point most likely to maximize the acquisition function is identified. Some hyperparameters that are very responsive to changes, such as the learning rate, are chosen at the first level. BO enables fine-tuning of the model through the regularization of the penalty and by determining the optimal number of layers. We choose the dropout layer as the regularization technique.

We used Python and its data science libraries to implement our algorithms. We implemented the algorithms using Keras from the TensorFlow library, version 2.5, on a Tesla P100 GPU with 25 GB of memory, provided by Google Colaboratory Notebooks. The training set contains 70% of randomly selected beats, and the rest is divided into a test set and a validation set. Each of these two sets contains 50% of the remaining beats.
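The split described above (70% for training, the remainder halved into validation and test sets) can be sketched with NumPy index shuffling. This is a minimal illustration rather than the authors' code: the function name `split_indices`, the fixed seed and the order in which the validation and test sets are taken are assumptions, while the segment count is the MIT-BIH figure reported in this paper.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.70, seed=0):
    """Shuffle sample indices and split them into train, validation and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # random selection of beats
    n_train = int(round(train_frac * n_samples))
    rest = order[n_train:]                      # the remaining 30%
    n_val = len(rest) // 2                      # half of the remainder
    return order[:n_train], rest[:n_val], rest[n_val:]

# 82813 is the number of segments reported for the preprocessed MIT-BIH data.
train_idx, val_idx, test_idx = split_indices(82813)
print(len(train_idx), len(val_idx), len(test_idx))  # 57969 12422 12422
```

Splitting indices rather than the arrays themselves keeps the signals and their labels aligned: the same index array can be applied to both.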
Our proposed model is trained on two publicly available datasets:
(1) The MIT-BIH dataset [17], which includes 48 ECG recordings of 30 minutes duration from 47 subjects, with a sampling rate of 250 Hz. Each record is annotated by specialists and can be used as ground truth for training, validation and test. The collected data is preprocessed. We build a new dataset that consists of 82813 segments. For beat segmentation, we use a fixed window whose length is a multiple of the sampling frequency. The raw ECG signal does not require any pre-filtering technique or feature extraction step as used in traditional machine learning algorithms. The database is relatively noise free. Furthermore, the CNN is robust to noise, and features are automatically extracted during the learning process.
(2) The second dataset is the 10,000 ECG patients dataset [18]. This dataset consists of recordings from 10646 subjects, each of 10 seconds duration, with a 500 Hz sampling rate.

The performance of our proposed model was evaluated using seven experiments. The first experiment is used to build an architecture that fits the MIT-BIH dataset and achieves an accuracy of 99.70%. Figures 2 and 3 illustrate the accuracy and loss obtained during the training phase. The model runs for 100 epochs with early stopping enabled. Although the size of the input vector and the number of hidden layers are large, the model converges in a very short time (8 epochs). However, the gap between the validation and the training curves is significant.

At level two, we introduce the BO. Pseudocode is written to provide a precise description of what BO does; it is presented in Algorithm 1. The performance of the algorithm has increased: the BO produced an improvement right after the 13th iteration. The numerical experiments show that the accuracy obtained by the optimisation with a finite budget outperforms the accuracy of the baseline model. Finally, we build the optimised model for test.
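The kind of loop that BO runs at level two can be sketched as follows. This is a generic illustration under stated assumptions, not a reproduction of Algorithm 1: the surrogate is a small hand-written GP with a squared-exponential kernel, the acquisition function is expected improvement (the text does not name the one used), only one hyperparameter (the dropout rate) is searched over an assumed range of 0 to 0.5, and `toy_validation_accuracy` is a hypothetical stand-in for training the R-1D-CNN and measuring its validation accuracy.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance between two sets of 1D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_obs = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_new)
    mean = k_cross.T @ np.linalg.solve(k_obs, y_obs)
    var = 1.0 - np.sum(k_cross * np.linalg.solve(k_obs, k_cross), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the best score seen so far (maximisation)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_validation_accuracy(dropout_rate):
    """Hypothetical stand-in for training the network and scoring it."""
    return 0.99 - (dropout_rate - 0.2) ** 2

def bayes_opt(objective, low, high, n_init=3, n_iter=10, seed=0):
    """Evaluate n_init random points, then n_iter points chosen by the acquisition."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(low, high, size=n_init)            # initial configurations
    ys = np.array([objective(x) for x in xs])
    grid = np.linspace(low, high, 201)                  # candidate values
    for _ in range(n_iter):
        scaled = (ys - ys.mean()) / (ys.std() + 1e-12)  # standardise the scores
        mean, std = gp_posterior(xs, scaled, grid)
        ei = expected_improvement(mean, std, scaled.max())
        x_next = grid[np.argmax(ei)]                    # most promising candidate
        xs = np.append(xs, x_next)
        ys = np.append(ys, objective(x_next))           # the expensive evaluation
    return xs, ys

xs, ys = bayes_opt(toy_validation_accuracy, 0.0, 0.5)
best = int(np.argmax(ys))
print(f"best dropout rate: {xs[best]:.3f}, score: {ys[best]:.4f}")
```

Each iteration spends one expensive evaluation where the surrogate predicts the largest expected gain, which is what lets a finite budget of iterations improve on a manually tuned baseline. In practice a maintained library would normally replace the hand-written GP.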
Figures 4 and 5 illustrate the accuracy and loss obtained during the training phase. At the second level, the training accuracy and training loss are close to the validation accuracy and validation loss, respectively. The confusion matrix is displayed in Figure 6. The diagonal elements give the rate of items that are correctly predicted, while off-diagonal items are mislabelled. The proposed optimised R-1D-CNN correctly classified ECG signals of five distinct classes with a high accuracy of 99.95%. By reviewing the individual

Both of our architectures at level one and level two present novelties. We demonstrate that the proposed

In our experiments, we exploit the R-1D-CNN to classify the ECG signals of two databases.

In this paper, we address optimisation challenges for the R-1D-CNN model and propose a novel architecture for ECG analysis. In addition, we develop an algorithm based on BO to produce robust classification results in real time through automatic hyperparameter tuning. Comparative experimental results on two publicly available ECG datasets demonstrate that our BO-based algorithm can outperform state-of-the-art approaches. For instance, the BO achieves an accuracy of 99.95%, while the baseline model achieves 99.70%, on the MIT-BIH database. In the future, we plan to test the algorithms on other databases, especially for dialysis applications. We will also introduce other types of layers and classifiers to manage and optimise the complexity of the network.

• Funding: The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under the grant agreement number LR11ES48.
• Disclosure of potential conflicts of interest: The authors declare no conflict of interest.
• Ethics approval: This article does not contain any studies with human participants or animals performed by any of the authors.
• Informed consent: Not applicable
• Consent to participate: Not applicable
• Consent for publication: Not applicable
• Availability of data and materials: The ECG signals are obtained from the MIT-BIH arrhythmia database and the 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. All the databases are public and available online. Link to the MIT-BIH arrhythmia database: https://physionet.org/content/mitdb/1.0.0/ The original publication is referenced by [17]. Link to the 12-lead electrocardiogram database: https://doi.org/10.6084/m9.figshare.c.4560497.v2 The original publication is referenced by [18].
• Authors' contributions: Zeineb Fki: developed the method, collected the data, performed the experiments and drafted the manuscript. Boudour Ammar: revised the software, interpreted the results and supervised the project. Mounir Ben Ayed: conceived the study and supervised the project. All authors read and approved the final manuscript.

[1] Accurate detection of atrial fibrillation from 12-lead ECG using deep neural network
[2] Unsupervised learning in reservoir computing for EEG-based emotion recognition
[3] Deep learning-based visual control assistant for assembly in industry 4.0. Computers in Industry
[4] Role of deep learning in early detection of COVID-19: Scoping review
[5] A novel multi-stage residual feature fusion network for detection of COVID-19 in chest X-ray images
[6] A review on deep learning methods for ECG arrhythmia classification
[7] End-to-end sleep staging using convolutional neural network in raw single-channel EEG
[8] Classification of non-small cell lung cancer using one-dimensional convolutional neural network. Expert Systems with Applications
[9] Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments
[10] Overfitting remedy by sparsifying regularization on fully-connected layers of CNNs. Neurocomputing
[11] A new algorithm for SAR image target recognition based on an improved deep convolutional neural network
[12] One-dimensional convolutional neural network-based active feature extraction for fault detection and diagnosis of industrial processes and its understanding via visualization
[13] Deep learning in drug design: Protein-ligand binding affinity prediction
[14] Bayes' theorem and naive Bayes classifier
[15] Automatic tuning of hyperparameters using Bayesian optimization. Evolving Systems, 3, 2020
[16] Deep residual learning for image recognition
[17] The impact of the MIT-BIH arrhythmia database
[18] A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. Scientific Data, 7, 2020
[19] A new approach for arrhythmia classification using deep coded features and LSTM networks
[20] Muhammad Naufal Rachmatullah, Jannes Effendi, Firdaus Firdaus, and Bambang Tutuko. Electrocardiogram signal classification for automated delineation using bidirectional long short-term memory
[21] Heart disease detection using deep learning methods from imbalanced ECG samples
[22] Exploring deep features and ECG attributes to detect cardiac rhythm classes. Knowledge-Based Systems
[23] Inter-patient arrhythmia classification with improved deep residual convolutional neural network
[24] Application of convolutional neural networks featuring Bayesian optimization for landslide susceptibility assessment. Catena, 186, 2020
[25] Designing a lightweight 1D convolutional neural network with Bayesian optimization for wheel flat detection using carbody accelerations
[26] An ensemble one dimensional convolutional neural network with Bayesian optimization for environmental sound classification