key: cord-0453117-0nbrm03u
authors: Li, Yuenan; Tian, Xin; Zhu, Qiang; Wu, Min
title: A Lightweight Neural Network for Inferring ECG and Diagnosing Cardiovascular Diseases from PPG
date: 2020-12-09
journal: nan
DOI: nan
sha: 698d9ab950b4c251df6956a79c4f9a34459e1593
doc_id: 453117
cord_uid: 0nbrm03u

The prevalence of smart devices has extended cardiac monitoring beyond the hospital setting. It is now possible to obtain an instant electrocardiogram (ECG) test anywhere by touching the built-in bio-sensor of a smartwatch with a hand. However, such user participation is infeasible for long-term continuous cardiac monitoring, which is needed to capture the intermittent and asymptomatic abnormalities of the heart that short-term ECG tests often miss. In this paper, we present a computational solution for automated and continuous cardiac monitoring. A neural network is designed to jointly infer ECG and diagnose cardiovascular diseases (CVDs) from photoplethysmogram (PPG). PPG measures the variations of blood volume driven by heartbeats, and the signal can be sensed at the wrist or finger via an optical sensor. To minimize the memory consumption on mobile devices, we devise a model compression scheme for the proposed architecture. For higher trustworthiness and transparency, this study also addresses the problem of model interpretation. We analyze the latent connection between PPG and ECG as well as the CVD-related features of PPG learned by the neural network, aiming at obtaining clinical insights from data. A quantitative comparison with prior methods on a benchmark dataset shows that our algorithm infers ECG more accurately. It achieves an average $F_1$ score of 0.96 in diagnosing major CVDs.

Cardiovascular diseases (CVDs) are the most prevalent causes of mortality. According to the statistics in [1], one person dies from CVDs every 37 seconds in the United States. Early treatment can effectively reduce the risk of sudden cardiac death. However, some CVDs, such as heart muscle dysfunction, show no obvious symptoms in the early stage. The presence of symptoms usually indicates the onset of heart failure. A study conducted in the aged population shows that around one third to one half of heart attacks are clinically unrecognized [2]. Being unaware of the disease, some patients miss the opportunity to receive early medical intervention.

Electrocardiogram (ECG) is a non-invasive gold standard for diagnosing CVDs. Patients at higher risk, such as the aging population, can benefit from continuous ECG monitoring. Among the currently available options for continuous ECG monitoring, the Holter monitor is bulky to wear. Newer devices attached to the chest with adhesives, such as Zio Patch, are lightweight, but the prolonged use of adhesives in multi-day monitoring may increase the risk of skin irritation, especially for persons with sensitive skin. These patch-type sensors may also slide or fall off under excessive sweating. Recent technical advances have integrated bio-sensors into smart wearables designed for long-term use. For example, taking the crown and back crystal as electrodes, Apple Watch allows users to take ECG tests for up to 30s at a time from the wrist by touching the crown. With such short tests, asymptomatic and intermittent cardiac abnormalities could be missed, while continuous user participation by keeping a hand on the sensor is impractical. It is desirable that smart wearables continuously monitor cardiac conditions without any user participation.

Attempts have been made towards this goal by resorting to optical sensors and computational tools. The pilot study in [3] explored the possibility of inferring ECG from photoplethysmography (PPG). PPG manifests the oscillation of blood volume caused by the movements of heart muscle. The signal can be sensed by an optical sensor attached to the wrist or finger, without requiring the user's conscious participation. Since PPG carries useful vital signs, miniaturized PPG sensors have become an integral part of smart wearables. ECG monitoring based on PPG sensors eliminates the need for re-designing bio-sensors and for a user's continuous action to carry out the sensing, and the apps can be seamlessly integrated into existing devices.

As a low-cost alternative to ECG recorders, PPG-based inference of ECG can mitigate the shortage of medical devices during public health crises. A recent guidance of the European Society of Cardiology (ESC) recommends using mobile-device-enabled ECG recording to cope with the surge in demand for ECG recorders during the COVID-19 pandemic. Moreover, this initiative can also facilitate home-centered health management and reduce the unnecessary hospital visits of chronic cardiac patients, who are among the populations most vulnerable to the COVID-19 virus.

The heart pumps blood into the vessels through orderly contraction and relaxation, and the movements of heart muscle are driven by an electrical stimulus. As a result, the dynamics of blood flow is coupled with the transmission of electrical stimulus throughout the heart, so PPG and ECG represent the same physiological process in different signal spaces. Previous studies validate that the vital signs derived from PPG and ECG show strong agreement [4] . In this work, we leverage deep learning to simultaneously infer ECG and diagnose CVDs from PPG, aiming to achieve low-cost, user-friendly, and interpretable continuous cardiac monitoring. As a clinical application of deep learning, this work also addresses the issue of model interpretation. We analyze the input-output behaviors of neural network in both tasks. The contributions of this work are summarized as follows:

1) We propose a multi-task and multi-scale deep architecture for inferring ECG and diagnosing CVDs. To address the scarcity of synchronized PPG and ECG pairs, we formulate ECG inference as a semi-supervised domain translation problem and train neural network to learn the PPG-to-ECG mapping from partially paired data.

2) We study the interpretability of deep learning based cardiac monitoring. More specifically, we quantify the per-point contribution of PPG to the two tasks and explain how the morphology of PPG affects the network's outputs.

3) To facilitate mobile cardiac monitoring, we develop a lightweight variant of the proposed architecture. By pruning insignificant parameters and using recursive layers, the lightweight network achieves performance comparable to that of the full network while saving about 78% of the parameters.

The research on PPG based ECG inference is still in its infancy, and a few prior studies have been dedicated to this problem. The pilot study in [3] proves the feasibility of generating ECG waveforms from PPG sensor via computational approach, going beyond the previous capability of mainly estimating the parameters of ECG from PPG [5] . This pioneering work translates PPG to ECG in the Discrete Cosine Transform (DCT) domain using linear regression. A recent work of Tian et al. casts PPG-to-ECG mapping as a cross-domain sparse coding problem [6] . The algorithm simultaneously learns the dictionaries of PPG and ECG as well as a linear transform for domain translation. The encouraging performance highlights the potential of data-driven approaches in tackling this inverse problem. The dictionary learning algorithm in [6] handles input signals globally, so the learned atoms represent the holistic morphologies of PPG and ECG. Since each heartbeat is composed of a sequence of short-term actions, data-driven approaches are expected to be sensitive to the fine-granular characteristics of waveforms. This motivates us to leverage deep convolutional architecture to model the multi-scale correlation between ECG and PPG and discover the cues for diagnosing CVDs.

Deep learning has been successfully applied to cardiac signal processing and has demonstrated impressive performance in many tasks, such as automated PPG and ECG interpretation [7]-[10], artifact removal [11], waveform synthesis [12], and vital sign measurement [13], [14]. Hannun et al. trained a deep neural network to classify 12 kinds of arrhythmia from single-lead ECG and achieved cardiologist-level accuracy [7].

The work in [9] used deep learning to monitor the aggravation of heart diseases, where a neural network was trained to identify the pathological changes in ECG. To improve the accuracy of patient-specific CVDs diagnosis, Golany et al. developed a generative adversarial network (GAN) for synthesizing the ECG waveforms of a patient [12] . Deep learning also eases the measurement of vital signs. The study in [13] demonstrates that blood pressure can be inferred from PPG using a deep belief network, making it possible to monitor continuous blood pressure in a cuffless way.

ECG measures the electrical impulse generated by the depolarization and re-polarization of heart muscle cells, and these activities are triggered by an electrical stimulus. The stimulus originates from the sinoatrial node, which is known as the pacemaker of the heart, and it also coordinates the contraction and relaxation of heart muscle. The stimulus first triggers the depolarization of the two upper chambers (i.e., atria), resulting in the P-wave on ECG. Following the depolarization, the atrial muscle contracts and pumps blood into the two bottom chambers (i.e., ventricles). The electrical stimulus then transmits to the ventricles through the conducting pathway, and the depolarization of the ventricles generates the QRS complex on ECG. As the ventricles contract, blood is ejected out of the heart and flows to the vessels. The increase of blood volume in the vessels gives rise to an ascending slope on PPG. Then the ventricles start to relax, and the T-wave on ECG depicts this phase. At the final stage of a heartbeat, both the atria and ventricles relax, and the pressure within the heart drops rapidly. As a result, blood flows back from the vessels towards the atria, which is represented as a descending slope on PPG.

The ECG and PPG sequences are pre-processed using the procedures in [3]. We take the moment when the ventricles contract as the anchor point for PPG-ECG synchronization, where the onset points of PPG are aligned to the R-peaks of ECG. The detrending algorithm in [3] is then applied to the aligned sequences to eliminate the slow-varying trends introduced by breathing, motion, etc. The detrended sequences are partitioned into cycles. Each cycle starts at an onset point of PPG or an R-peak of ECG, as shown in Fig.1. The PPG and ECG cycles are then interpolated to length $L$ as $P \in \mathbb{R}^L$ and $E \in \mathbb{R}^L$, respectively.
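The cycle partitioning and length normalization described above can be sketched as follows. This is a minimal illustration, not the procedure of [3]: linear interpolation is an assumption (the interpolation method is not specified here), and the function names `resample_cycle` and `split_cycles` are hypothetical.

```python
def resample_cycle(cycle, length):
    """Linearly interpolate a 1D cycle to `length` samples, keeping both endpoints."""
    n = len(cycle)
    if n < 2 or length < 2:
        raise ValueError("need at least two samples in and out")
    out = []
    for k in range(length):
        # Position of the k-th output sample on the input index axis.
        pos = k * (n - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * cycle[lo] + frac * cycle[hi])
    return out


def split_cycles(signal, anchors, length):
    """Cut `signal` at anchor indices (PPG onsets or ECG R-peaks) and resample each cycle.

    Assumption: `anchors` is a sorted list of sample indices found by an
    external onset or R-peak detector, which is not shown here.
    """
    return [resample_cycle(signal[a:b], length) for a, b in zip(anchors, anchors[1:])]
```

Applying `split_cycles` with the same target length to the aligned PPG and ECG sequences yields cycle pairs of equal length $L$.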

The neural network follows an encoder-decoder architecture. The decoder has two branches, one for inferring ECG and the other for diagnosing CVDs. Since the cardiac events within a heartbeat are of different durations, to capture the correlation between the mechanical and electrical activities of these events, neural network needs to explore the signal spaces of PPG and ECG at diverse scales. We design a multi-scale feature extraction module (FEM) and take it as the encoder's backbone. The architecture of FEM is illustrated in Fig.1 . The FEMs are appended at the end of the first convolutional layer one after another. Without loss of generality, let us denote the input to an FEM by X, then the output is computed as:

$$Y = [C_1(X),\, C_2 \circ C_1(X)] \tag{1}$$

where $C_1(\cdot)$ and $C_2(\cdot)$ are the two 1D-convolutional layers, and $[\cdot]$ is the concatenation operation along the channel direction. $C_1(\cdot)$ first uses small-size kernels to analyze the short-time variation of $X$. We leverage the combination effect of $C_2 \circ C_1(\cdot)$ to expand the receptive fields of feature extraction. The concatenated feature map $Y$ encodes the temporal characteristics of PPG detected at two different scales. The cascade of multiple FEMs progressively increases the scale of feature extraction and forms a contracting (or down-sampling) pathway in the feature space.

The decoder forms an expanding (or up-sampling) pathway, where the bottleneck feature codes learned from PPG are gradually interpolated to ECG via feature transform modules (FTM). Similar to FEM, FTM adopts the same multi-scale fusion architecture, while it uses transposed-convolution to increase the resolution of the feature map (see Fig.1).

The feedforward path formed by the cascade of FEMs and FTMs is not sufficient to guarantee the quality of the output ECG. Although stacking FEMs helps to detect the abstract and high-level features of PPG, the down-sampling effect attenuates the fine details of the input, while PPG's short-term variation contains important cues for inferring ECG. To compensate for the loss of high-resolution features, we bridge the encoder and decoder by an attention gate. As Fig.1 shows, the feature map learned by the first convolutional layer, which has the highest resolution, is weighted by the attention gate before fusing with the feature map at the decoder. Taking the $i$-th channel for instance, feature fusion is conducted as:

where $F_1 \in \mathbb{R}^{C \times V}$ and $F_T \in \mathbb{R}^{C \times V}$ are the feature maps output by the first convolutional layer and the last FTM (see Fig.1), respectively. $C$ is the number of channels, and $V$ is the length of the feature vector in each channel. $F$ is used for inferring ECG and diagnosing CVDs, and $\{\alpha_{i,j} \mid i, j = 1, \cdots, C\}$ are the weights learned by the attention gate. The attention gate takes $F_1$ and $F_T$ as inputs. Two channels in $F_1$ and $F_T$ with strong correlation are probably associated with the same cardiac event, so channel correlation is a key factor for assigning weights. The attention gate first computes the channel-wise correlation coefficients between $F_1$ and $F_T$, giving rise to the matrix $G \in [0, 1]^{C \times C}$:

The weights for feature fusion are learned from G using a softmax layer:

where $\Phi = G \cdot \Theta$, and $\Theta \in \mathbb{R}^{C \times C}$ are learnable parameters. Finally, ECG is generated by computing the transposed-convolution between the channels of $F$ and kernels:

$$\hat{E} = \sum_{i=1}^{C} F[i, :] * K[i] \tag{5}$$

where $*$ represents the transposed-convolution operator, and $K[i]$ is the $i$-th 1D-kernel. Eq.(5) actually forms a $C$-channel representation of ECG. For better interpretability, it is desirable for the neural network to separately synthesize the P-wave, QRS complex, and T-wave of an ECG cycle from different channels of $F$. Since these channels are also used for diagnosing CVDs, a disentangled representation can reflect the connection between CVDs and ECG sub-waves, making it easier to understand the decision rules learned by the neural network. We encourage the network to form a localized and sparse representation of ECG. The feature map $F$ is divided into non-overlapping groups along the row and column directions, respectively, and we impose group sparsity (the $\ell_1/\ell_2$ norm) [15] on these groups, which constrains the number of active kernels involved in synthesizing each sub-wave. In this way, the convolutional kernels $K[i]$ $(i = 1, \cdots, C)$ are forced to represent the intrinsic structures of ECG sub-waves. Similarly, the group sparsity constraint is also imposed on the feature map of PPG learned by the first convolutional layer. In summary, the sparsity constraint can be expressed as:

As will be discussed later, the sparsity constraint also allows us to identify trivial kernels and compress the network. The diagnosis branch accepts the sparse feature map $F$ as input. Some abnormal patterns of ECG are strong indicators of CVDs. For example, the elevation of the ST segment indicates a high risk of myocardial ischemia. Since our training algorithm forces the channels of $F$ to separately depict the morphologies of different sub-waves, we incorporate a channel-wise attention gate into the diagnosis branch to emphasize the informative ones. Similar to [16], channel weights are computed from the statistics of each channel, including mean, variance, maximum, and minimum, using a three-layer fully-connected network. The attention gate outputs a weight vector $w \in [0, 1]^C$, and each channel of $F$ is scaled by the corresponding weight as $F[i, :]\,w[i]$ $(i = 1, \cdots, C)$. The re-calibrated feature map is fed to a classifier (composed of three convolutional layers and a fully-connected network with softmax output) to infer the probabilities of different kinds of CVDs.
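The group sparsity penalty that produces the sparse feature map $F$ can be illustrated with a short sketch. It assumes that the $\ell_1/\ell_2$ norm is the sum of the $\ell_2$ norms of the groups, and that the row-direction and column-direction groups are bands of consecutive channels and consecutive time steps, respectively. The group sizes `row_group` and `col_group` are hypothetical parameters, and the exact grouping used by the authors may differ.

```python
import math


def group_l1_l2(feature_map, row_group, col_group):
    """Sum of l2 norms over non-overlapping row bands plus column bands of a C x V map."""
    n_channels, n_steps = len(feature_map), len(feature_map[0])
    penalty = 0.0
    # Row direction: each group is a band of `row_group` consecutive channels.
    for r0 in range(0, n_channels, row_group):
        sq = sum(x * x for row in feature_map[r0:r0 + row_group] for x in row)
        penalty += math.sqrt(sq)
    # Column direction: each group is a band of `col_group` consecutive time steps.
    for c0 in range(0, n_steps, col_group):
        sq = sum(x * x for row in feature_map for x in row[c0:c0 + col_group])
        penalty += math.sqrt(sq)
    return penalty
```

Because the $\ell_2$ norm of a group is zero only when every entry in the group is zero, minimizing this sum tends to switch off whole groups, which is what makes inactive kernels easy to identify.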

Taking into account the quality of the inferred ECG, the accuracy of CVDs diagnosis, and the sparsity of the feature maps, the training loss can be formulated as:

where $\hat{E}$ and $E$ are the inferred and ground-truth ECG cycles, respectively, $p \in [0, 1]^N$ represents the estimated probabilities of $N$ kinds of CVDs, $l$ is the one-hot vector indicating the ground-truth disease label, and $\lambda_D$ and $\lambda_S$ are weights. We use the cross-entropy loss to measure the discrepancy between $p$ and $l$.
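A minimal sketch of how the three terms could be combined is given below. The mean squared error used for the reconstruction term is an assumption, since the distance between $\hat{E}$ and $E$ is not spelled out in this text. The default weights are the values of $\lambda_D$ and $\lambda_S$ reported in the appendix, and `sparsity` stands for a pre-computed group sparsity penalty.

```python
import math


def training_loss(ecg_pred, ecg_true, probs, label, sparsity, lam_d=0.1, lam_s=5e-6):
    """Reconstruction + lam_d * cross entropy + lam_s * sparsity for one example."""
    # Reconstruction term: mean squared error (an assumed choice of distance).
    rec = sum((a - b) ** 2 for a, b in zip(ecg_pred, ecg_true)) / len(ecg_true)
    # Diagnosis term: cross entropy between probabilities p and one-hot label l.
    eps = 1e-12  # guards against log(0)
    ce = -sum(l * math.log(p + eps) for p, l in zip(probs, label))
    return rec + lam_d * ce + lam_s * sparsity
```

The small value of $\lambda_S$ reflects the stated criterion of balancing the loss terms: the sparsity penalty sums over entire feature maps and is therefore much larger in magnitude than the other two terms.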

The training loss in (7) requires the supervision of ground-truth ECG. However, simultaneously recorded ECG and PPG sequences only account for a tiny amount of the available data. For instance, the long-term PPG recordings of a user can be read out from a smartwatch, while the reference ECG data may not be available. Likewise, a patient wearing a Holter monitor may not simultaneously record PPG data. When paired training examples are scarce, the neural network may be biased toward the few structural correspondences between ECG and PPG covered by the training set. It is natural to expect that the training algorithm can exploit the information in the plentiful unpaired ECG and PPG data. As highly structured signals, PPG and ECG approximately reside on two manifolds with lower dimensions than the signal spaces. The unpaired data carry rich information about the two manifolds, and making full use of it allows the neural network to capture the structural priors of PPG and ECG. In this section, we extend the above training method to a semi-supervised setting.

Given a set of paired examples, besides the PPG-to-ECG mapping $G_{P \to E}(\cdot)$, the aforementioned architecture can also be trained to map ECG to PPG [denoted by $G_{E \to P}(\cdot)$]. In the ideal case, $G_{E \to P}(\cdot)$ should be the inverse of $G_{P \to E}(\cdot)$, and vice versa. Similar to [17], we use the consistency loss to regularize the two mappings. For an unpaired PPG cycle $P$, sequentially applying $G_{P \to E}(\cdot)$ and $G_{E \to P}(\cdot)$ on $P$ should bring the signal back to its original state, giving rise to the following loss:

Similarly, given an unpaired ECG cycle E, we have:

We apply (8) and (9) on unpaired examples. Unlike CycleGAN [17], this work does not use discriminators to regularize $G_{P \to E}(\cdot)$ and $G_{E \to P}(\cdot)$. We find that adversarial training does not bring performance improvement in this problem but increases training complexity. PPG and ECG exhibit less variation than images, and the inferred waveforms are of high quality and seldom deviate far from the manifolds. Hence, the regularization effects of discriminators are not obvious.
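The two consistency terms can be sketched as follows. The mean absolute error is an assumption borrowed from CycleGAN [17], because the distance used in (8) and (9) is not reproduced in this text, and the two mappings are passed in as plain callables standing in for the trained networks.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_losses(g_p2e, g_e2p, unpaired_ppg, unpaired_ecg):
    """Return (PPG cycle loss, ECG cycle loss) for one unpaired PPG and one unpaired ECG cycle."""
    loss_p = l1_distance(g_e2p(g_p2e(unpaired_ppg)), unpaired_ppg)  # P -> E -> P
    loss_e = l1_distance(g_p2e(g_e2p(unpaired_ecg)), unpaired_ecg)  # E -> P -> E
    return loss_p, loss_e


# Toy stand-ins for the two networks: exact inverses give zero loss.
def double(x):
    return [2.0 * v for v in x]


def halve(x):
    return [v / 2.0 for v in x]


perfect = cycle_losses(double, halve, [1.0, 3.0], [2.0, 4.0])
```

Both terms vanish only when the two mappings invert each other on the unpaired data, which is the property the regularization is meant to encourage.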

Most continuous health monitoring applications are deployed on mobile devices. To accommodate the limited memory resource of mobile devices, we develop a lightweight variant of the multi-task architecture by leveraging parameter re-usage and pruning strategies.

We compress the neural network by removing its redundancies in both architecture and parameters. Architectural redundancy exists in the cascade of modules with the same architecture. For both FEM and FTM, if we require the input and output of an arbitrary module to have the same dimension, the feed-forward computation defined by $R$ cascaded modules can be simplified to the $R$-depth recursion of one module [18]:

$$Y = \underbrace{M \circ M \circ \cdots \circ M}_{R \text{ times}}(X) \tag{10}$$

where $M(\cdot)$ represents the module (either FEM or FTM). Taking FEM for example, (10) is equivalent to repeatedly applying a fixed feature extractor $M(\cdot)$ on the input $R$ times. In this case, the basic module is used to extract both low-level and high-level features from $X$, so the convolutional kernels need to cover the representative patterns of the input at different levels. Since the patterns of PPG and ECG are relatively monotonous, recursion does not noticeably degrade the expressive power of the network.

The two convolutional layers at the two ends of the ECG inference pipeline are compressed via parameter pruning. Like the atoms in sparse coding, the kernels are trained to extract PPG features and generate ECG, respectively. Due to the sparsity constraints, a few active kernels play dominant roles in each layer, so the norm of a channel in the feature map reflects the significance of the corresponding kernel. It is safe to remove the inactive kernels whose feature channels constantly show small norms on different inputs. The significance of a kernel can also be quantified by the attention weight assigned to the corresponding channel. As mentioned in Section III-C, each channel of $F_1$ receives a weight for feature fusion, and each channel of $F$ receives a weight for diagnosing CVDs. Hence, we take feature norm and attention weight as the criteria for kernel pruning. Taking the $i$-th kernel at the ECG generation layer for example, its significance score is computed as:

where E[·] represents the expectation operator, and λ w > 0 balances the two criteria. To identify the trivial kernels, we first pre-train the full network for several epochs and compute the significance score of each kernel. For both layers, only half of the kernels with the highest significance scores are preserved, and then the pruned network is fine-tuned on the same training set.
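The pruning rule can be illustrated with a sketch under stated assumptions: the score is taken to be the average $\ell_2$ norm of a channel plus $\lambda_w$ times its average attention weight, with the expectation replaced by a mean over a set of inputs. The additive form and the default `lam_w=1.0` are assumptions, since neither the equation nor the value of $\lambda_w$ is reproduced in this text.

```python
import math


def significance_scores(feature_maps, attention_weights, lam_w=1.0):
    """Per-kernel score: mean channel norm plus lam_w times mean attention weight.

    feature_maps: list over inputs of C x V maps.
    attention_weights: list over inputs of length-C weight vectors.
    """
    n, n_channels = len(feature_maps), len(feature_maps[0])
    scores = []
    for i in range(n_channels):
        mean_norm = sum(math.sqrt(sum(x * x for x in fm[i])) for fm in feature_maps) / n
        mean_w = sum(w[i] for w in attention_weights) / n
        scores.append(mean_norm + lam_w * mean_w)
    return scores


def kernels_to_keep(scores, keep_ratio=0.5):
    """Indices of the highest-scoring kernels, returned in ascending index order."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

With `keep_ratio=0.5`, the selection matches the rule of preserving the half of the kernels with the highest scores before fine-tuning.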

For fair comparison, experiments were conducted on the same training and testing sets as [3] and [6]. The data in Folder 35 of the Medical Information Mart for Intensive Care III (MIMIC-III) database [19] with both lead-II ECG and PPG waveforms were selected, and the signals were recorded at 125Hz. MIMIC-III was chosen as the benchmark dataset for its richness of waveforms and CVD types, detailed diagnostic results from patients, public availability, and real-world nature. A key challenge in ECG inference is to represent pathological patterns, such as inverted QRS. MIMIC-III has a full coverage of diverse pathological patterns related to major CVDs, and noisy data were intentionally preserved to reflect real-world healthcare settings. To the best of our knowledge, other publicly available datasets do not have comparable size and richness of ECG patterns and CVD types. The waveforms were screened using the signal quality assessment function in the PhysioNet Cardiovascular Signal Toolbox [20], and those labeled as "unacceptable (Q)" were discarded. The dataset contains 34,243 pairs of PPG and ECG cycles and covers the following CVDs: congestive heart failure (CHF), myocardial infarction (MI), including ST-elevated (ST-MI) and non-ST-elevated (NST-MI), hypotension (HYPO), and coronary artery disease (CAD). The detailed composition of the dataset is listed in Table I. The dataset was split into training (80%) and testing (20%) cohorts. The proposed algorithm was compared with two pieces of prior work on PPG-to-ECG mapping: the DCT and linear regression based method [3] and the cross-domain joint dictionary learning based method (XDJDL) [6]. Since there is no neural network based prior art, we implemented a one-dimensional U-Net [21] and took it as an additional baseline. Implementation details are presented in the appendix.

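Following [3] and [6], we use the Pearson correlation coefficient $\rho$ and the relative Root Mean Squared Error (rRMSE) to evaluate the fidelity of inferred ECG signals:

$$\rho = \frac{(E - \mu[E])^{T} (\hat{E} - \mu[\hat{E}])}{\|E - \mu[E]\|_2 \, \|\hat{E} - \mu[\hat{E}]\|_2}, \qquad \mathrm{rRMSE} = \frac{\|E - \hat{E}\|_2}{\|E\|_2}$$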

where $\mu[\cdot]$ represents the element-wise mean value of a vector. Table II compares the statistics of the quality scores measured from the tested algorithms. The quantitative comparison clearly demonstrates the superiority of data-driven methods. Compared with the generic orthogonal bases of DCT, the convolutional kernels (or sparse coding atoms) learned from data better suit the underlying structures of ECG. In particular, both metrics indicate that the ECG cycles inferred by the proposed algorithm have the highest fidelity. It can faithfully infer the fine detail and abnormal morphology of ECG, such as the elevated ST-segment in Fig.2(b) and the inverted QRS complex in Fig.2(c). The multi-scale architecture enables the neural network to be sensitive to the subtle differences among PPG waveforms. As can be seen from the figure, although the waveforms of PPG are quite similar, the network is able to represent the distinct morphological differences among ECG waveforms.

The diagnostic accuracy of the neural network was evaluated at the cycle level. It is worth mentioning that diagnosis can also be made at the sequence level using majority voting, which will lead to higher accuracy. As a proof-of-concept study, this work eliminates the influence of voting on PPG based CVDs diagnosis, so classifications were conducted at the cycle level. For each CVD, we computed the $F_1$ score by comparing the probability of this disease estimated by the neural network with a threshold sweeping from 0 to 1 with a step size of $5 \times 10^{-3}$. Table III shows the disease-specific and average accuracies. For all the diseases, the multi-task network achieves an $F_1$ score higher than 0.95. This result demonstrates the feasibility of automated CVDs diagnosis using easily available PPG data. Fig.3 displays the confusion matrix. The major confusion is between MI and CAD. This result is consistent with the pathological bases of the two diseases, since both of them reduce the supply of blood to the heart.
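The evaluation quantities discussed in this paragraph can be computed as in the sketch below. The rRMSE is taken here as the norm of the error divided by the norm of the ground truth, which is the usual definition and is assumed to match [3] and [6]. Reporting the best $F_1$ over the threshold sweep is also an assumption, since the text does not state how the sweep is summarized.

```python
import math


def pearson(e_true, e_pred):
    """Pearson correlation coefficient between two equal-length cycles."""
    mt = sum(e_true) / len(e_true)
    mp = sum(e_pred) / len(e_pred)
    num = sum((a - mt) * (b - mp) for a, b in zip(e_true, e_pred))
    den_t = math.sqrt(sum((a - mt) ** 2 for a in e_true))
    den_p = math.sqrt(sum((b - mp) ** 2 for b in e_pred))
    return num / (den_t * den_p)


def rrmse(e_true, e_pred):
    """Norm of the inference error relative to the norm of the ground truth."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(e_true, e_pred)))
    return err / math.sqrt(sum(a * a for a in e_true))


def best_f1(probs, labels, step=5e-3):
    """Sweep the decision threshold from 0 to 1 and return the highest F1 score."""
    best = 0.0
    for k in range(int(round(1 / step)) + 1):
        t = k * step
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best
```

Here `probs` holds the estimated probabilities of one disease over the test cycles and `labels` holds the corresponding binary ground truth, so the function is applied once per CVD.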

A benefit of joint ECG inference and CVDs diagnosis is that the inferred ECG cycles help cardiologists double-check the model's prediction, since the manual diagnoses of CVDs are mainly based on ECG. The CVDs detection task forces the neural network to be sensitive to the abnormal patterns related to CVDs, so this auxiliary task is beneficial to ECG inference. To examine the mutual influence between the two tasks, we conducted ablation experiments by dropping one task at a time and assessed the performance of the ablated networks. After dropping CVDs detection, the average rRMSE of ECG inference reaches 0.35. Dropping the ECG inference task also hurts the accuracy of CVDs detection, and the average $F_1$ score drops from 0.964 to 0.945.

In this subsection, we attempt to open the black box of the deep network by explaining the input-output correlation learned from data. We are curious about the following: 1) How does the neural network infer an ECG sub-wave from the input PPG? 2) Which parts of the input PPG are responsible for the diagnosis made by the neural network? The key to answering these questions is to quantify the per-point contribution of the input PPG to the network's outputs. We adopt the integrated gradient (IG) [22] method to accomplish this task. Let us denote by $G_j(\cdot): \mathbb{R}^L \to \mathbb{R}$ the mapping from the input PPG $P \in \mathbb{R}^L$ to the $j$-th dimension of the neural network's output (an ECG point or the probability of a disease). The IG value of $P[i]$ with respect to $G_j(P)$ is:

We used the Riemann sum to approximate the integral. It has been shown in [22] that $G_j(P) \approx \sum_{i=1}^{L} IG_{i,j}$, which is equivalent to breaking down $G_j(P)$ to each dimension of the input PPG. To investigate the correlation between ECG and PPG, we computed the IG values of each PPG point versus all ECG points. The IG values are plotted as a heatmap, where the $i$-th row visualizes the contributions of the PPG points in synthesizing the $i$-th ECG point (see the example in Fig.4). It is obvious from the figure that the PPG points do not contribute equally to this task, and those near the peak have the least contribution. Besides, the distribution of IG values varies across ECG points. For example, when inferring the front part of the ECG cycle, multiple bands in the ascending and descending slopes of PPG show significant contributions (see the red regions at the bottom of the heatmap), while for other parts, the PPG points with large contributions concentrate in one narrow band. Note that the aligned ECG and PPG cycles start at the moment when the heart begins to eject blood into the vessels. The front part of the ECG cycle depicts the contraction of the ventricles. The IG values imply that this event affects both the filling and emptying of blood in the vessels, which is due to the momentum of fluid. Accordingly, to faithfully synthesize ECG, a model needs to fuse the local features extracted from different parts of the input PPG. This can partially explain the superior performance of the neural network over DCT and XDJDL, which synthesize ECG using a linear combination of holistic bases (or atoms).

We interpret the diagnostic results by attributing the probability of CVD to each PPG point. In Fig.5, we show the PPG of a subject diagnosed with CAD. CAD is caused by the plaque deposited in the inner walls of the arteries. The PPG points receiving the top 20% most significant IG values are highlighted in red, and the unit normal vectors are plotted to show the local morphology of PPG. We find that the neural network diagnoses CAD mainly based on the following cues: 1) the changing rate of the blood volume at the moments when blood starts to flow out of and back into the heart (see the red segments in the ascending and descending slopes), and 2) the sudden slowing down of the changing rate (see the inflection point). The reduced supply of blood caused by CAD damages heart muscle, resulting in weak pumping power. The changing rate of blood volume can partially reflect the power of the heart. As mentioned above, several key bands in the two slopes of PPG carry information about the ventricular contraction. Also, the plaque obstructs the blood flowing to the heart, and the increased fluid resistance can change the morphology of PPG. This can explain why the inflection point, where the second derivative of PPG changes sign, exhibits high significance to the network's decision. We conjecture that this point marks the moment when blood reaches a plaque in the artery.
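The Riemann-sum approximation of integrated gradients used in this analysis can be sketched as follows. The all-zero baseline and the midpoint rule are assumptions (the baseline is not stated in this text), and `grad_f` is a hypothetical callable returning the gradient of one network output $G_j$ with respect to the input. In practice this gradient would come from automatic differentiation.

```python
def integrated_gradients(grad_f, x, baseline=None, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients for a scalar output.

    grad_f(point) must return the gradient of the output at `point`
    as a list with the same length as `x`.
    """
    if baseline is None:
        baseline = [0.0] * len(x)  # assumed all-zero reference input
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad_f(point))]
    # Scale the averaged gradient by the displacement from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy output f(p) = p[0]**2 + 3*p[1] with its analytic gradient.
def toy_grad(p):
    return [2.0 * p[0], 3.0]


attributions = integrated_gradients(toy_grad, [2.0, 1.0])
```

For the toy function, the attributions sum to the difference between the output at the input and at the baseline, which is the completeness property behind the decomposition of $G_j(P)$ over the PPG points.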

We also trained the network using the semi-supervised scheme. In the experiment, we only preserved 10% of the PPG-ECG pairs, and the remaining ones were all decoupled. As can be seen from Table IV, the semi-supervised training scheme is not sensitive to decoupling and can maintain the performance of ECG inference at a reasonable level. The network trained on the partially paired set shows performance comparable to that of the one trained on the fully paired set. We observe that the PPG inferred by the dual mapping $G_{E \to P}(\cdot)$ from unpaired ECG data shows strong agreement with the ground truth, and it can be viewed as a noisy observation of the real PPG. In this sense, $G_{P \to E}(\cdot)$ and $G_{E \to P}(\cdot)$ benefit each other by augmenting the training set. This is equivalent to sampling the manifolds of PPG and ECG more densely, which is helpful for modeling the structural variations of ECG and PPG.
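The construction of the partially paired training set can be sketched as below. The random selection of the retained pairs, the fixed seed, and the function name `decouple_pairs` are assumptions made for illustration. How the retained 10% was chosen is not described in this text.

```python
import random


def decouple_pairs(pairs, keep_fraction=0.1, seed=0):
    """Keep a random fraction of (ppg, ecg) pairs and split the rest into unpaired pools."""
    rng = random.Random(seed)
    order = list(range(len(pairs)))
    rng.shuffle(order)
    n_keep = int(round(keep_fraction * len(pairs)))
    paired = [pairs[i] for i in order[:n_keep]]
    unpaired_ppg = [pairs[i][0] for i in order[n_keep:]]
    unpaired_ecg = [pairs[i][1] for i in order[n_keep:]]
    # Shuffle the ECG pool on its own so list positions no longer link it to the PPG pool.
    rng.shuffle(unpaired_ecg)
    return paired, unpaired_ppg, unpaired_ecg
```

The paired subset would be used with the supervised loss, while the two unpaired pools would only contribute through the consistency losses.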

We also examined the efficacy of the network compression scheme. To compress the full network, the kernel pruning algorithm first discarded half of the kernels at the first convolutional layer and the ECG generation layer according to their significance scores. We then replaced the cascaded modules in the full network by 2-depth recursive FEM and FTM, as illustrated in Fig.6 in the appendix, and the pruned network was fine-tuned for 20 epochs. Table V compares the numbers of parameters of the full and the compressed networks and their quantitative performance in ECG inference. The compression method removes more than 78% of the parameters in the full network. The lightweight network takes up less than 170KB of memory, which eases the deployment on mobile devices, while the reduction of parameters does not incur remarkable performance degradation. For example, the loss in the average correlation score is less than 2%. For all kinds of CVDs, the loss in diagnostic accuracy is also quite minor (see the comparison in Table VI).
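The effect of recursion on the parameter budget can be illustrated with simple counting. The layer shape used below is hypothetical and is not taken from the tables of this paper, and 4 bytes per parameter (32-bit floats) is an assumption about the storage format.

```python
def conv1d_params(n_in, n_out, kernel, bias=True):
    """Number of weights in a 1D (or transposed 1D) convolutional layer."""
    return n_in * n_out * kernel + (n_out if bias else 0)


def model_kilobytes(layers, bytes_per_param=4):
    """Storage in KB (1 KB = 1024 bytes) for a list of (n_in, n_out, kernel) layers."""
    return sum(conv1d_params(*layer) for layer in layers) * bytes_per_param / 1024


# Hypothetical module with equal input and output width, as recursion requires.
module = [(32, 32, 9)]
cascaded_kb = model_kilobytes(module * 2)   # two modules with separate weights
recursive_kb = model_kilobytes(module)      # one module applied twice
```

A 2-depth recursion stores one set of weights instead of two, so the recursive part of the network needs half the memory of the cascaded version while performing the same number of layer evaluations.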

We have presented a deep learning based approach for user-friendly and continuous cardiac monitoring. The proposed network can capture the correlation between PPG and ECG and detect CVDs by learning from partially paired training examples. Its promising performance validates that the dynamics of blood flow provide essential information about the cardiovascular system. Our model interpretation results demonstrate that the influence of cardiac events on blood flow is highly uneven, and the changing rate of blood flow and its variation are of high diagnostic value. Our future work will focus on enhancing the robustness and generalization of PPG based cardiac monitoring.

The proposed neural network and the one-dimensional U-Net were implemented in PyTorch. The networks were trained using the Adam optimization algorithm [23] for 40 epochs with parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The initial learning rate was set to $5 \times 10^{-4}$ and then decreased to $10^{-4}$ after 20 epochs. The batch size was set to 10. The weights in the objective function are $\lambda_D = 0.1$ and $\lambda_S = 5 \times 10^{-6}$. The criterion for setting these weights is to balance the loss terms. Training the proposed network on a workstation with an Intel i7-6850K 3.60 GHz CPU, 32 GB of memory, and a 1080Ti GPU took 49 min. Table VII lists the detailed parameter settings of the proposed architecture. We use $(N_{in}, N_{out}, K, S)$ to represent the parameters of a convolutional layer or a transposed-convolutional layer, where $N_{in}$ and $N_{out}$ are the channel numbers of the input and output feature maps, respectively, $K$ is the length of the kernel, and $S$ is the stride. Layer normalization [24] is applied to all the convolutional and transposed-convolutional layers except the final ECG generation layer. The encoder and decoder contain two cascaded Feature Extraction Modules (FEM) and Feature Transform Modules (FTM), respectively, as illustrated in Fig. 6 (a).
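Two pieces of arithmetic in this paragraph can be made concrete with a short sketch: the two-stage learning rate schedule and the number of weights implied by a layer tuple $(N_{in}, N_{out}, K, S)$. The function names are hypothetical, epochs are assumed to be counted from zero, and whether the layers use bias terms is not stated in the text, so the bias is left as an option.

```python
def learning_rate(epoch, initial=5e-4, reduced=1e-4, switch_epoch=20):
    """Step schedule: `initial` for the first 20 epochs, `reduced` afterwards.

    Assumption: epochs are 0-indexed, so epochs 0..19 use the initial rate.
    """
    return initial if epoch < switch_epoch else reduced


def conv1d_param_count(n_in, n_out, k, bias=True):
    """Number of learnable parameters in a 1-D (transposed) convolution.

    The stride S does not affect the parameter count, so it is not an input.
    """
    weights = n_in * n_out * k
    return weights + (n_out if bias else 0)


# Learning rate at each of the 40 training epochs.
schedule = [learning_rate(e) for e in range(40)]

# A layer described by the tuple (1, 60, 30, 1): 1 input channel,
# 60 output channels, kernel length 30, stride 1.
n_weights = conv1d_param_count(1, 60, 30, bias=False)
n_with_bias = conv1d_param_count(1, 60, 30, bias=True)
```

Under these assumptions, a layer with the tuple (1, 60, 30, 1) holds 1800 kernel weights, or 1860 parameters if a bias per output channel is included.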

The lightweight variant of the neural network adopts recursive FEM and FTM, as Fig. 6 (b) shows. The parameters of the two convolutional (or transposed-convolutional) layers, $C_1(\cdot)$ and $C_2(\cdot)$, in a recursive module were set to ensure that the input and output have the same dimension. Table VIII shows the parameter settings of the recursive FEM and FTM. To match the cascaded modules in the full network, the recursive modules use 2-depth recursion. After pruning the kernels at the first convolutional layer and the ECG generation layer of the full network, we replaced the cascaded FEMs and FTMs with the recursive ones and then fine-tuned the network for 20 epochs. The architecture of the U-Net is plotted in Fig. 7. The encoder and decoder are composed of three convolutional and three transposed-convolutional layers, respectively. Each pair of mirrored layers in the encoder and decoder is connected by element-wise summation. The kernel sizes were set to match those of the proposed network, as shown in Table IX.
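The difference between cascaded and recursive modules can be illustrated with a toy sketch. The class `ToyModule` is a hypothetical stand-in for an FEM or FTM (a shape-preserving affine map with two parameters), not the actual module built from $C_1(\cdot)$ and $C_2(\cdot)$. The point is only that a 2-depth recursion reuses one set of weights where a cascade stores two.

```python
class ToyModule:
    """Hypothetical stand-in for an FEM/FTM: a shape-preserving affine map."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def __call__(self, x):
        # Output has the same length as the input, as recursion requires.
        return [self.weight * v + self.bias for v in x]

    def num_parameters(self):
        return 2  # one weight and one bias


def run_cascaded(modules, x):
    """Apply distinct modules one after another (full network)."""
    for module in modules:
        x = module(x)
    return x


def run_recursive(module, x, depth=2):
    """Apply the same module `depth` times (lightweight network)."""
    for _ in range(depth):
        x = module(x)
    return x


x = [2.0, 4.0]
cascade = [ToyModule(0.5, 1.0), ToyModule(0.5, 1.0)]
shared = ToyModule(0.5, 1.0)

cascaded_out = run_cascaded(cascade, x)
recursive_out = run_recursive(shared, x, depth=2)

cascaded_params = sum(m.num_parameters() for m in cascade)  # two weight sets
recursive_params = shared.num_parameters()                  # one shared set
```

Because the recursive module feeds its output back into itself, its input and output must have the same dimension, which is why the layer parameters in the recursive modules are constrained as described above.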

Heart disease facts

Layer 1 (1, 60, 30, 1)

Fig. 7. Architecture of the U-Net.

Incidence of recognized and unrecognized myocardial infarction in men and women aged 55 and older: The Rotterdam study

ECG reconstruction via PPG: A pilot study

Assessment of heart rate variability derived from finger-tip photoplethysmography as compared to electrocardiography

PhotoECG: Photoplethysmography to estimate ECG parameters

Cross-domain joint dictionary learning for ECG reconstruction from PPG

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

Localization of origins of premature ventricular contraction by means of convolutional neural network from 12-lead ECG

Serial electrocardiography to detect newly emerging or aggravating cardiac pathology: A deep-learning approach

Photoplethysmography and deep learning: Enhancing hypertension risk stratification

Deep learning models for denoising ECG signals

PGANs: Personalized generative adversarial networks for ECG synthesis to improve patient-specific deep ECG classification

Innovative continuous non-invasive cuffless blood pressure monitoring based on photoplethysmography technology

Deep PPG: Large-scale heart rate estimation with convolutional neural networks

Model selection and estimation in regression with grouped variables

Squeeze-and-excitation networks

Image-to-image translation with conditional adversarial networks

Deeply-recursive convolutional network for image super-resolution

MIMIC-III, a freely accessible critical care database

An open source benchmarked toolbox for cardiovascular waveform and interval analysis

U-net: Convolutional networks for biomedical image segmentation

Axiomatic attribution for deep networks

Adam: A method for stochastic optimization

Layer normalization