key: cord-0043162-f16d1zov
authors: Qiu, Xi; Liang, Shen; Zhang, Yanchun
title: Simultaneous ECG Heartbeat Segmentation and Classification with Feature Fusion and Long Term Context Dependencies
date: 2020-04-17
journal: Advances in Knowledge Discovery and Data Mining
DOI: 10.1007/978-3-030-47436-2_28
sha: c1592129b251efc1e31bb644e826ef6a9dfc779f
doc_id: 43162
cord_uid: f16d1zov

Arrhythmia detection by classifying ECG heartbeats is an important research topic for healthcare. Recently, deep learning models have been increasingly applied to ECG classification. Among them, most methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification. However, this methodology has two drawbacks. First, explicit heartbeat segmentation can undermine model simplicity and compactness. Second, beat-wise classification risks losing inter-heartbeat context information that can be useful to achieving high classification performance. Addressing these drawbacks, we propose a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. Compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. Moreover, our model is more context-aware, for it takes into account the relationship between heartbeats. To achieve simultaneous segmentation and classification, we present a Faster R-CNN based model that has been customized to handle ECG data. To characterize inter-heartbeat context information, we exploit inverted residual blocks and a novel feature fusion subroutine that combines average pooling with max-pooling. Extensive experiments on the well-known MIT-BIH database indicate that our method can achieve competitive results for ECG segmentation and classification.

Arrhythmia occurs when the heart rhythm is irregular, which can lead to serious organ damage. Arrhythmias can be caused by high blood pressure, heart diseases, etc. [1]. The electrocardiogram (ECG) is one of the most popular tools for arrhythmia diagnosis. To manually handle long ECG recordings with thousands of heartbeats, clinicians have to determine the class of each heartbeat to detect arrhythmias, which is highly costly. Therefore, great efforts have been made to create computer-aided diagnosis tools that can detect irregular heartbeats automatically. In recent years, deep learning models have been increasingly applied to ECG classification. Among them, most methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification (see Sect. 2). The preprocessing step removes various kinds of noise from raw signals, the heartbeat segmentation step identifies individual heartbeats, and the beat-wise classification step classifies each heartbeat. This methodology has the following drawbacks. First, explicit heartbeat segmentation can undermine model simplicity and compactness. Traditional heartbeat segmentation methods explicitly extract ECG features for QRS detection. Since deep learning methods can produce feature maps from raw data, heartbeat segmentation can be conducted simultaneously with classification by a single neural network. Second, beat-wise classification uses isolated heartbeats, which risks losing inter-heartbeat context information that can be useful for boosting classification performance.

Addressing these drawbacks, we propose a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. Compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. The difference between our model and existing deep learning models is shown in Fig. 1. As is shown, our model takes in a 1-D ECG sequence and outputs both the segmented heartbeats and their corresponding labels. Besides, our model is more context-aware, for it takes into account the relationship between heartbeats.

To achieve simultaneous segmentation and classification, we present a Faster R-CNN [2] based model that has been customized to handle ECG sequences. To capture inter-heartbeat context information, we exploit inverted residual blocks [3] to produce multi-scale feature maps, which are then fused by a novel feature fusion mechanism to learn inter-heartbeat context information. Moreover, the semantic information and morphological information are explored from the fused features to improve performance.

Our main contributions are as follows:

- We propose a novel deep learning model for simultaneous heartbeat segmentation and classification.
- We present a novel Faster R-CNN based model that has been customized to handle ECG data.
- We use inverted residual blocks and a novel feature fusion subroutine to exploit long term inter-heartbeat dependencies for context awareness.
- We conduct extensive experiments on the well-known MIT-BIH database [4, 5] to demonstrate the effectiveness of our model.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 presents our model. Section 4 reports the experimental results. Section 5 concludes this paper.

Traditional arrhythmia detection methods extract handcrafted features from ECG data, such as R-R intervals [6, 7], ECG morphology [6], frequency [8], etc. Classifiers such as Linear Discriminant Analysis models [6], Support Vector Machines [7] and Random Forests [9] are then built upon these features.

In recent years, many researchers have turned to deep neural networks for heartbeat classification. The majority of deep learning models take raw signals as their input, omitting explicit feature extraction and selection steps. In [10], Kiranyaz et al. proposed a patient-specific 1-D CNN for ECG classification. In [11], Yildirim et al. designed a deep LSTM network with wavelet-based layers for heartbeat classification. Some methods [12, 13] combine LSTM with CNN.

The aforementioned deep learning methods work in three steps: preprocessing, heartbeat segmentation and beat-wise classification. They do not explicitly utilize context information among heartbeats. By contrast, Mousavi et al. [14] proposed a sequence-to-sequence LSTM based model which maps a sequence of heartbeats in time order to a sequence of labels, where context dependencies are captured by the cell state of the network. Hannun et al. [15] proposed a 33-layer neural network for arrhythmia detection which maps ECG data to a sequence of labels. However, the precise regions of arrhythmias cannot be obtained. Oh et al. [16] used a modified U-net to identify regions of heartbeats and the background from raw signals, yet this method needs extra steps to detect arrhythmias from the generated annotation. Several studies apply Faster R-CNN to ECG analysis. For example, Ji et al. [17] proposed a heartbeat classification framework based on Faster R-CNN, in which 1-D heartbeats extracted from the original signals are converted to images as the input of the model. Sophisticated preprocessing is required before classification. He et al. [18] and Yu et al. [19] use Faster R-CNN to perform heartbeat segmentation and QRS complex detection. In our method, we present a modified Faster R-CNN for arrhythmia detection which works in only two steps: preprocessing, and simultaneous heartbeat segmentation and classification.

The architecture of our model is shown in Fig. 2. It takes a 1-D ECG sequence as its input and conducts heartbeat segmentation and classification simultaneously. To achieve this, our model consists of 6 modules: a backbone network, a region proposal network (RPN), a region classification network (RCN), a filter block, a downsampling block and a region pooling block. The backbone network produces multi-scale feature maps from the ECG signal. The modules in the upper part of Fig. 2 perform heartbeat segmentation, while the ones in the lower part perform heartbeat classification. We now elaborate on the details of our method.

The preprocessing step removes noise from raw signals. Here we employ a third-order Butterworth bandpass filter with a frequency range of 0.27 Hz-45 Hz because this range contains the main components of ECG signals [20].

The backbone network generates multi-scale semantic and morphological feature maps from raw ECG signals. For efficiency, we choose the inverted residual block [3] as the building block. We customize it for ECG data in the following manner: (1) We increase the kernel size from 3 to 5 to enlarge the receptive field. (2) The activation function is replaced by ELU for less information loss. (3) A residual connection is added to the building block in the stride-2 case, so that the block has two branches in this case (Fig. 3). (4) The stride-2 convolution is replaced by max-pooling to make the model more lightweight. There are 6 layers in our backbone. Each layer is composed of several building blocks (Fig. 3). Besides, each layer downsamples the feature map by a factor of two. Unlike most deep learning methods, which compute feature maps for a single heartbeat, our backbone takes a long ECG sequence as its input. The produced feature maps encode not only morphological and semantic information of individual heartbeats, but also context information among multiple heartbeats. The bottom layers of the backbone generate feature maps with strong morphological information, while the top layers produce feature maps with strong semantic information [21]. Moreover, the receptive field increases from the bottom layers to the top layers, thus the feature maps encode inter-heartbeat context dependencies over varying time spans.

We fuse multi-scale feature maps to utilize both morphological and semantic information of the heartbeats in segmentation and classification. Besides, the fused feature maps can provide more context information. All feature maps except those in the first two layers are used for efficiency.

In the segmentation task, the feature maps need to be normalized to equal dimensions by the downsampling block before fusion. Feature maps are downsampled to a fixed length by a novel mechanism shown in Eq. 1, which is a trade-off between performance and complexity. Unlike convolution-based downsampling methods [21], our downsampling block is parameter-free. Average pooling is exploited for less information loss during downsampling, while max-pooling highlights the discriminative features in the feature maps.
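To make the idea of a parameter-free downsampling block concrete, the sketch below reduces a feature map of shape (channels, length) to a fixed length by pooling over equal-width bins. Summing the average-pooled and max-pooled outputs is only one plausible way to combine the two operations and is an assumption of this example; the exact combination is the one defined in Eq. 1. The function name `fused_downsample` is hypothetical.

```python
import numpy as np


def fused_downsample(feature_map, out_len):
    """Downsample a (channels, length) feature map to a fixed length.

    Assumption: average pooling and max-pooling are applied over the same
    equal-width bins and combined by an element-wise sum.
    """
    channels, length = feature_map.shape
    if length % out_len != 0:
        raise ValueError("length must be a multiple of out_len in this sketch")
    bins = feature_map.reshape(channels, out_len, length // out_len)
    avg = bins.mean(axis=2)  # keeps the overall level of each bin
    mx = bins.max(axis=2)    # keeps the most salient response of each bin
    return avg + mx


fm = np.arange(8, dtype=float).reshape(1, 8)
print(fused_downsample(fm, 2))  # [[ 4.5 12.5]]
```

Because neither pooling operation has weights, the block adds no trainable parameters regardless of the input length.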

The RPN fuses the feature maps from the downsampling block and performs heartbeat segmentation. Here we directly segment the heartbeats without QRS detection. As is shown in Fig. 4(a), the RPN has two branches performing regression (top) and classification (bottom). The classification branch produces a binary label for each region indicating whether it contains a single heartbeat. The regression branch produces the endpoints of each region which encloses a heartbeat. Intuitively, the regression task is far more difficult than binary classification for the RPN, thus we use multi-size convolutional layers (3, 5, 7 in this paper) to further extract features before regression. Following the practice of Faster R-CNN [2], at each position of a feature map, we pre-define three reference regions. These regions have different sizes (128, 192, 256 for ECG heartbeats). To predict a region, we use the center of one of the three regions as the reference point and report its offset to this reference.
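The reference-region scheme can be sketched as follows. The three sizes (128, 192, 256) come from the text above; placing each reference center at the middle of a feature-map position, the stride value in the usage line, and the Faster R-CNN style parameterization of the offsets (center shift relative to the reference width, log-scale width change) are assumptions of this example. The function names are hypothetical.

```python
import math


def reference_regions(num_positions, stride, sizes=(128, 192, 256)):
    """Return (start, end) reference regions, three per feature-map position."""
    regions = []
    for pos in range(num_positions):
        center = pos * stride + stride / 2  # assumed center of this position
        for size in sizes:
            regions.append((center - size / 2, center + size / 2))
    return regions


def decode_region(reference, center_offset, log_scale):
    """Turn predicted offsets into a region, relative to one reference region."""
    start, end = reference
    width = end - start
    center = (start + end) / 2 + center_offset * width
    new_width = width * math.exp(log_scale)
    return (center - new_width / 2, center + new_width / 2)


refs = reference_regions(num_positions=2, stride=64)
print(refs[0])                           # (-32.0, 96.0)
print(decode_region(refs[0], 0.0, 0.0))  # (-32.0, 96.0): zero offsets keep the reference
```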

However, the regions obtained by the RPN can overlap with nearby regions, undermining the efficiency of the model. In response, we use non-maximum suppression (NMS) to filter these regions in the filter block. In each iteration, NMS selects the region with the maximum confidence and then computes the overlap between each remaining region and the selected one. The regions whose overlap exceeds a pre-set threshold (30% in this paper) are discarded, as are the regions containing no heartbeats (confidence below 0.5).
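The filtering procedure described above can be sketched for 1-D regions as follows. The thresholds (30% overlap, 0.5 confidence) are those given in the text; measuring overlap as intersection over union and the function names `overlap_1d` and `nms_1d` are assumptions of this example.

```python
def overlap_1d(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(regions, scores, overlap_thresh=0.3, min_conf=0.5):
    """Return indices of the regions kept after confidence filtering and NMS."""
    # Drop regions that are unlikely to contain a heartbeat, then sort by confidence.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= min_conf),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    while order:
        best = order.pop(0)  # region with the highest remaining confidence
        kept.append(best)
        # Discard every remaining region that overlaps the selected one too much.
        order = [i for i in order
                 if overlap_1d(regions[best], regions[i]) <= overlap_thresh]
    return kept


regions = [(0, 100), (10, 110), (200, 300), (400, 500)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms_1d(regions, scores))  # [0, 2]
```

In the usage example, the second region is suppressed because it overlaps the first by about 82%, and the last one is dropped for low confidence.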

In the heartbeat classification task, the region pooling block generates heartbeat feature maps for the predicted heartbeat regions [2]. Feature maps in the last four layers (with strides 8, 16, 32, 64) are reused to extract heartbeat features. Because these feature maps have different sizes, the predicted regions are mapped as: Region = (start/stride, end/stride). Moreover, each region is divided into fixed-size sub-regions. Heartbeat feature maps are produced by average pooling on each sub-region. To keep sufficient morphological information, heartbeat feature maps in the bottom layers have larger sizes (8, 4, 2, 1 for strides 8, 16, 32, 64).
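A minimal NumPy sketch of this region pooling step is given below. The mapping Region = (start/stride, end/stride) and the average pooling over sub-regions follow the text; the rounding of region boundaries, the use of a fixed number of sub-regions per region, and the function name `region_pool` are assumptions of this example.

```python
import numpy as np


def region_pool(feature_map, start, end, stride, out_size):
    """Average-pool one predicted region of a (channels, length) feature map
    into a (channels, out_size) heartbeat feature map."""
    channels, length = feature_map.shape
    # Map signal coordinates to feature-map coordinates.
    lo = max(int(np.floor(start / stride)), 0)
    hi = min(max(int(np.ceil(end / stride)), lo + 1), length)
    lo = min(lo, hi - 1)
    # Split the mapped region into out_size sub-regions.
    edges = np.linspace(lo, hi, out_size + 1)
    pooled = np.empty((channels, out_size))
    for k in range(out_size):
        a = int(np.floor(edges[k]))
        b = max(int(np.ceil(edges[k + 1])), a + 1)  # at least one sample per bin
        pooled[:, k] = feature_map[:, a:b].mean(axis=1)
    return pooled


fm = np.arange(16, dtype=float).reshape(1, 16)  # one channel, stride-8 layer
print(region_pool(fm, start=16, end=80, stride=8, out_size=4))
# [[2.5 4.5 6.5 8.5]]
```

With the output sizes quoted above, the same region would be pooled with `out_size` 8, 4, 2 and 1 on the layers with strides 8, 16, 32 and 64, respectively.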

Heartbeat feature maps are then fed into the region classification network (RCN, Fig. 4(b)) to classify the heartbeats inside each region. The RCN performs heartbeat classification by fusing the feature maps from the region pooling block.

Note that we do not fine-tune the regions as in Faster R-CNN because it trades efficiency for only minor improvements in accuracy.

Following common practices in the detection task [2, 21], our backbone network is initialized with a pre-trained network. We extract heartbeats from the experimental database (to be discussed later) and pre-train the backbone network with extra layers on these heartbeats. Then, the last few layers are removed while the remaining ones are used as the backbone network.

We coarsely annotate the groundtruth heartbeat region for each heartbeat, which ranges from 0.25 s-0.83 s around the R peak so as to cover most of the heartbeat. Since our model can capture inter-heartbeat context information, finer annotation is not necessary. The offsets of reference regions to the groundtruth ones [2] are used to train the RPN regression branch. To train the classification branch, a positive label is assigned to a predicted region when the following criteria are met: (1) Its overlap with a groundtruth heartbeat region is over 0.7. (2) It has the highest overlap with a groundtruth heartbeat region. In RCN training, each predicted region is assigned the label of the heartbeat inside it. We use Jaccard distance as the metric for overlap computation.

Similar to [2], our entire training process has two steps: 1) training the RPN with a regression loss and a binary classification loss, and 2) training the RCN with a multi-class classification loss. For better performance, we choose Smooth L1 loss (Eq. 2) to train the regression branch and Focal loss (Eq. 3) to train the RCN and the RPN for classification.
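For reference, the commonly used forms of these two losses are written out below. We assume that they correspond to Eq. 2 and Eq. 3; the symbol x, denoting the difference between a predicted and a target offset, is introduced here for illustration.

```latex
\mathrm{SmoothL1}(x) =
\begin{cases}
0.5\,x^{2}, & |x| < 1 \\
|x| - 0.5, & \text{otherwise}
\end{cases}
\qquad (2)

\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t)
\qquad (3)
```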

where p_t is the estimated probability for the class t. We set γ to 2, and α_t to [0.25, 1] for binary classification and [1, 0.5, 1, 0.5, 1] for multi-class classification.

We implemented our model using PyTorch. Our source code is available for reproducibility. The experiments were run on a Linux server with an Intel Xeon W-2145 CPU @ 3.7 GHz, 64 GB memory and a single NVIDIA 1080Ti GPU. Adadelta was used as the optimizer with weight decay 1e-5. The learning rate was set to 0.15 for training, decaying every 10 epochs exponentially as follows: lr = 0.3 × 0.9^(epoch/50). The batch size was set to 240 for both training and testing. We used data from the well-known MIT-BIH database [4, 5], which contains 48 half-hour two-lead ECG recordings. We used the MLII lead in the experiments and excluded 4 recordings with paced beats, following the ANSI/AAMI EC57 standard [22]. Due to limited computational resources, we divided each recording into a series of long sequences with 3600 data points. The first and last 10 s of each recording were discarded. Note that our model can process much longer ECG recordings given abundant computational resources. We ran the experiments 5 times, randomly dividing the dataset into training, validation and test sets for each run. The training set contained 70% of all data. The validation and test sets included 10% and 20% of all data, respectively. The heartbeat labels were mapped into 5 groups by the ANSI/AAMI standard, namely N, S, V, F, Q (see Table 1). We did not take the Q class into consideration because of its scarcity. We applied the following metrics for evaluation: positive predictive value (PPV), sensitivity (SEN), specificity (SPE) and accuracy (ACC).
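The four evaluation metrics follow their standard definitions in terms of the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) of a class. The short sketch below computes them; the function name `beat_metrics` and the counts in the usage line are hypothetical and are not results from our experiments.

```python
def beat_metrics(tp, fp, tn, fn):
    """Compute PPV, SEN, SPE and ACC from the confusion counts of one class."""
    return {
        "PPV": tp / (tp + fp),                   # positive predictive value
        "SEN": tp / (tp + fn),                   # sensitivity (recall)
        "SPE": tn / (tn + fp),                   # specificity
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
    }


# Hypothetical counts, for illustration only.
m = beat_metrics(tp=90, fp=10, tn=880, fn=20)
print(m["PPV"], m["ACC"])  # 0.9 0.97
```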

To evaluate heartbeat segmentation performance, we define a true positive (TP) as a predicted region that 1) contains only one heartbeat and 2) has a non-overlapping area with the groundtruth of less than 150 ms. We define a false positive (FP) as a predicted region that 1) encloses more than two heartbeats or 2) has a non-overlapping area with the groundtruth that exceeds 150 ms. We define a false negative (FN) as a groundtruth heartbeat that is not enclosed by any predicted region.

For baselines, we used two QRS detection based heartbeat segmentation methods: Pan-Tompkins [23] and Wavedet [24]. The results are shown in Table 2. As is shown, our method is highly competitive against the baselines. It is worth noting that unlike the baselines, our model does not apply QRS detection for segmentation, thus there may be inconsistencies in the definitions of TP, FP and FN between our model and the baselines. However, it is safe to say that our model performs well enough to be applied in real-world scenarios.

We now evaluate the heartbeat classification performance. The baselines come from [13, 14, 25, 26]. Here we applied SMOTE [27] for data augmentation as it was also used in our baselines. Figure 5 shows the results. Our model achieves an accuracy of 99.6%, a sensitivity of 99.75% and a specificity of 99.6%. These results are similar to those obtained by [14], which used an LSTM-based sequence-to-sequence model to learn context information. The difference between our work and [14] is that we learn context information from raw signals while [14] did so using a sequence of individual heartbeats. Besides, the LSTM-based model has lower efficiency. Compared to the other baselines, which perform classification on individual heartbeats, our model has a simpler structure but achieves similar or better results on some metrics, highlighting the power of context-awareness.

We now investigate the impact of the key design features of our model. For better evaluation, we did not use SMOTE [27] here.

To better capture long term dependencies, we have enlarged the receptive field to retain context information. Also, our model captures inter-heartbeat dependencies by learning multi-scale feature maps with the RPN and the RCN. To demonstrate the effectiveness of these design choices, we conducted the following ablation tests: 1) setting the kernel size to 3 for all convolution filters, 2) using the feature maps in the top layers only, 3) using the feature maps in the bottom layers only, 4) equalizing the output sizes of multi-scale feature maps in the region pooling block. Figure 6 and Table 3 present the results in the segmentation and classification tasks. As is shown, while the effectiveness of these design features in the segmentation task is limited, they are indeed beneficial to the classification task.

Figure 7 and Table 4 show the results for the backbone and downsampling designs. By modifying the backbone, our model can learn stronger features to improve performance. Our downsampling method also outperforms using only max-pooling or only average pooling.

Our model has more parameters in the regression branch of the RPN, based on the intuition that regression is more difficult than binary classification in the RPN. To evaluate this design, we conducted the following ablation tests: 1) enlarging the classification branch, and 2) simplifying the regression branch. Table 5 presents the results. As is shown, simplifying the regression branch has a negative impact on performance, while enlarging the classification branch brings about no improvement.

In this paper, we have proposed a novel deep learning model that can simultaneously conduct heartbeat segmentation and classification. Compared to existing methods, our model is more compact as it does not require explicit heartbeat segmentation. Moreover, our model is context-aware by using feature fusion and long term context dependencies. In the future, we plan to extend our model to multi-lead ECG analysis tasks.

Cardiac Arrhythmia: Mechanisms, Diagnosis, and Management

Faster R-CNN: Towards real-time object detection with region proposal networks

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

The impact of the MIT-BIH arrhythmia database

Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals

Automatic classification of heartbeats using ECG morphology and heartbeat interval features

Classification of electrocardiogram signals with support vector machines and particle swarm optimization

ECG beat classification using PCA, LDA, ICA and discrete wavelet transform

Medical decision support system for diagnosis of heart arrhythmia using DWT and random forests classifier

Real-time patient-specific ECG classification by 1-D convolutional neural networks

A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification

Cardiac arrhythmia detection from ECG combining convolutional and long short-term memory networks

A LSTM and CNN based assemble neural network framework for arrhythmias classification

Inter-and intra-patient ECG heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

Automated beat-wise arrhythmia diagnosis using modified U-net on extended electrocardiographic recordings with heterogeneous arrhythmia types

Electrocardiogram classification based on faster regions with convolutional neural network

A deep learning method for heartbeat detection in ECG image

QRS detection and measurement method of ECG paper based on convolutional neural networks

Frequency content and characteristics of ventricular conduction

Feature pyramid networks for object detection

ANSI/AAMI EC57: 2012-Testing and Reporting Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms

A real-time QRS detection algorithm

A wavelet-based ECG delineator: evaluation on standard databases

Arrhythmias classification by integrating stacked bidirectional LSTM and two-dimensional CNN

A selective ensemble learning framework for ECG-based heartbeat classification with imbalanced data

SMOTE: synthetic minority over-sampling technique

Acknowledgement. This work is funded by NSFC Grant 61672161 and Dongguan Innovative Research Team Program 2018607201008. We sincerely thank Prof Chun Liang and Dr Zhiqing He from Department of Cardiology, Shanghai Changzheng hospital for their valuable advice.