key: cord-0276323-mu013u2w
authors: Goto, S.; Werdich, A. A.; Homilius, M.; John, J. E.; Gan, L.-M.; MacRae, C. A.; DiCarli, M. F.; Deo, R. C.
title: Discovery of cardiac imaging biomarkers by training neural network models across diagnostic modalities
date: 2021-02-08
journal: nan
DOI: 10.1101/2021.02.07.21251025
sha: 7bd08fd8b9ff2909d1babef89b36eb2ad20be607
doc_id: 276323
cord_uid: mu013u2w

Machines can be readily trained to automate medical image interpretation, with the primary goal of replicating human capabilities. Here, we propose an alternative role: using machine learning to discover pragmatic imaging-based biomarkers by interpreting one complex imaging modality via a second, more ubiquitous, lower-cost modality. We applied this strategy to train convolutional neural network models to estimate positron emission tomography (PET)-derived myocardial blood flow (MBF) at rest and with hyperemic stress, and their ratio, coronary flow reserve (CFR), using contemporaneous two-dimensional echocardiography videos as inputs. The resulting parameters, echoAI-restMBF, echoAI-stressMBF, and echoAI-CFR modestly approximated the original values. However, using echocardiograms of 5,393 (derivation) and 5,289 (external validation) patients, we show they sharply stratify individuals according to disease comorbidities and combined with baseline demographics, are strong predictors for heart failure hospitalization (C-statistic derivation: 0.79, 95% confidence interval 0.77-0.81; validation: 0.81, 0.79-0.82) and acute coronary syndrome (C-statistic derivation: 0.77, 0.73-0.80; validation: 0.75, 0.73-0.78). Using echocardiograms of 3,926 genotyped individuals, we estimate narrow-sense heritability of 9.2%, 20.4% and 6.5%, respectively for echoAI-restMBF, echoAI-stressMBF, and echoAI-CFR. MBF indices show inverse genetic correlation with impedance-derived body mass indices, such as fat-free body mass (e.g., {rho}=-0.43, q=0.05 for echoAI-restMBF) and resolve conflicting historical data regarding body mass index and CFR. In terms of diseases, genetic association with ischemic heart disease is seen most prominently for echoAI-stressMBF ({rho}=-0.37, q=2.4x10-03). We hypothesize that interpreting one imaging modality through another represents a type of "information bottleneck", capturing latent features of the original physiologic measurements that have relevance across tissues. Thus, we propose a broader potential role for machine learning algorithms in developing scalable biomarkers that are anchored in known physiology, representative of latent biological factors, and are readily deployable in population health applications.

The automated interpretation of medical images, videos, and signals has been a popular application of machine learning algorithms for decades. Recent examples include detecting breast cancer in mammograms 1 , intracranial hemorrhage in computed tomography scans 2, 3 , and cardiac amyloidosis in electrocardiograms and echocardiograms [4] [5] [6] . These and other prevailing use cases primarily view medical data as a depiction of anatomical structures requiring segmentation or a collection of features enriched in specific disease states. However, with its high information content, medical imaging has the potential to provide a window into underlying latent disease processes, which may have prognostic utility or motivate more precise biologically grounded disease definitions.

For example, coronary flow reserve (CFR) is an intriguing physiological parameter defined as the proportional increase in myocardial blood flow during maximal vasodilatory stress [7] [8] [9] [10] . It is the ratio of the stress myocardial blood flow (stressMBF) to the rest myocardial blood flow (restMBF), and lower values reflect abnormalities in the coronary epicardial vessels and microvasculature. These values have prognostic utility in predicting cardiovascular outcomes [11] [12] [13] [14] and are abnormal in several cardiac and systemic disease states, including dilated cardiomyopathy 15 , heart failure with preserved ejection fraction 16 , osteoporosis 17 , systemic lupus erythematosus 18 , rheumatoid arthritis 18 , and chronic obstructive pulmonary disease 19 . Coronary blood flow indices are unfortunately challenging to measure without invasive coronary angiography or indirectly through magnetic resonance imaging or positron emission tomography (PET), all of which have high associated costs.

CFR has a clear physiological basis, and its association with non-cardiac diseases implies that it reflects more global underlying biologic processes that may be captured by other potentially lower cost modalities. Beyond the pragmatism of developing a more scalable version of these original measurements, building a representation of one modality with another may provide a cleaner readout of the underlying signal free of technical artifacts, reminiscent of the concept of an information bottleneck in machine learning [20] [21] [22] . In this study, we exploited this strategy for biomarker discovery by training convolutional neural networks to estimate rest and stress MBF and CFR using only resting echocardiogram videos as inputs. We deployed the resulting models on thousands of echocardiograms, enabling development and external validation of prognostic models, and investigating their genetic basis.

The echoAI-restMBF values similarly separated patients according to these baseline attributes but with less sharp differences in the prevalence of comorbidities and indices of diminished cardiac and renal performance ( Table 1) . We further validated these results using an external cohort consisting of 5,289 patients from Massachusetts General Hospital (MGH) who had at least one echocardiogram within Feb 2016-Nov 2016 and observed a similar stratification of comorbidities with echoAI measures (Tables S1-S3).

To assess the ability of echoAI measures to predict adverse cardiovascular outcomes, we performed a time-to-event analysis on the same 5,393 patients from BWH ( Figure S4 , Tables S4-S9) for the outcomes of heart failure (HF) hospitalization and acute coronary syndrome (ACS), adjudicated by a machine learning algorithm as described previously 19 . The median follow-up length for this cohort was 4.1 (interquartile range (IQR): 1.7-4.4) years for heart failure and 4.2 (2.1-4.4) Figure S5 and S6). Importantly, the predictive accuracy of echoAI-stressMBF and echoAI-CFR was higher than that of known HF predictors. For example, for HF hospitalization, the AUC for age alone was 0.67, and that of HF history alone was 0.66.

For ACS, these echoAI measures' predictive accuracy was similar to that of known strong predictors such as age and CAD history. We further performed a multivariable Cox proportional hazard model analysis with 9 parameters (the 3 echoAI parameters and 6 baseline characteristics including sex, age, ejection fraction, heart rate, history of CAD, and history of heart failure), which revealed that echoAI-stressMBF and echoAI-CFR were associated with HF hospitalization (hazard ratio per standard deviation change: echoAI-stressMBF=0.78, 95% CI 0.68-0.90, echoAI-CFR=0.71 95% CI 0.63-0.81) and echoAI-stressMBF was associated with ACS (HR: echoAI-stressMBF=0.73, 95% CI 0.56-0.95), independent of other covariates (Figure 3) . The ROC analysis of the output of this 9-parameter Cox proportional hazard model predicted HF hospitalization (C-statistic 0.79 95%CI 0.77-0.88) and ACS (C-statistic 0.77, 0.73-0.80) with high discriminative accuracy (Figure S5B and S6B) and excellent calibration (Figure S5C and S6C) .

We validated this result using the MGH external validation cohort ( Figure S7 , Tables S10-S15).

This cohort's median follow-up time was 3.3 (IQR: 1.4-3.6) years and 3.3 (0.9-3.6) years for heart failure hospitalization and ACS, respectively. As with the BWH cohort, echoAI-stressMBF and echoAI-CFR accurately stratified patients by risk of HF hospitalization and ACS in this cohort ( Figure S8 ). Multivariable Cox proportional hazard models, trained de novo on MGH data, yielded results consistent with findings from the BWH cohort: echoAI-stressMBF and echoAI-CFR both associated with HF hospitalization (HR per standard deviation change: 0.87 95%CI 0.77-0.98 and 0.76 95%CI 0.68-0.84, respectively) and echoAI-stressMBF associated with ACS (HR per standard deviation change: 0.82 95%CI 0.68-0.99) independent of other baseline characteristics (Figures S9) . We further tested the predictive accuracy of the individual echoAI measurements (along with the 6 baseline features) and the combined 9-parameter model trained on BWH data.

In the univariate analysis, predictive accuracy assessed by C-statistics was similar for all 9 parameters (Figures S10A and S11A) . The 9-parameter model performed similarly to the derivation BWH cohort for heart failure hospitalization (Figure S10B, C-statistic 0.81 95% CI 0.79-0.82) and ACS (Figure S11B, C-statistic 0.75 95% CI 0.73-0.78). Furthermore, the models had excellent calibration on the MGH cohort without any adjustment (Figures S10C and S11C)

Given that our echoAI models have a modest input requirement -an echocardiogram with an apical 4-chamber view -they can enable investigations that have been historically challenging because of limited data availability. For example, there have been no systematic large-scale studies to understand the genetic determinants of myocardial blood flow, primarily because the imaging modalities used are invasive or require specialized expertise to acquire. We deployed the echoAI-restMBF, echoAI-stressMBF, and echoAI-CFR on echocardiograms of 3, 926 participants from the Mass General Brigham Biobank, a repository of biomolecular samples combined with clinical data. We used the resulting model outputs as continuous phenotypes and performed a genomewide association study to identify genetic determinants. Although the top SNP for all traits fell below conventional genomewide significance thresholds, we used summary statistics to estimate narrow-sense trait heritability and explore genetic correlations with a diversity of phenotypes, some of which have previously been associated with MBF or CFR. We estimated heritability for echoAI-restMBF, echoAI-stressMBF and echoAI-CFR as 9.2%, 20.4%, and 6.5% respectively and found the MBF traits to be modestly correlated with one another (restMBF/stressMBF r=0.26).

We next focused on understanding whether the genetic determinants of myocardial blood flow indices are shared with any other trait, using summary statistics from the UK Biobank as a source for comparison. Genetic correlations are useful for parameters where phenotypic correlations with the underlying trait of interest may not be feasible, either because the trait has not been measured in the population or it has been modified by treatment. The strongest genetic correlations were seen for echoAI-stressMBF, which also had the highest estimated heritability. echoAI-stressMBF had an inverse genetic correlation (Figure 4) with several traits reflective of impedance-estimated body mass, including whole-body fat-free mass (r=−0.32, q=5.1x10 −05 ) and basal metabolic rate (r=−0.33, q=5.1 x10 −05 ). echoAI-stressMBF also had negative correlation with ischemic heart disease (r=−0.37, q=2.4 x10 −03 ) and C-reactive protein (r=−0.21, q=0.09). Finally, echoAI-stressMBF had a weak positive genetic correlation with heart rate (r=0. 16, q=0.10) .

We observed similar genetic correlations with echoAI-restMBF, including an even stronger correlation with whole body fat-free mass (r=−0.43, q=0.05) and heart rate (r=−0.38, q=0.06). As expected, echoAI-restMBF had a weaker correlation with ischemic heart disease (r=−0. 22, q=0.26) . echoAI-restMBF also had a surprising inverse genetic correlation with standing height (r=−0.35, q=0.04).

Compared with the MBF indices, echoAI-CFR had a weaker correlation with whole body fat-free mass (r=−0.15, q=0.27) but interestingly a stronger negative correlation with body mass index (r=−0.41, q=0.19) and waist circumference (r=−0.37, q=0.20) . echoAI-CFR had a negative but non-significant correlation with ischemic heart disease (r=−0.33, q=0.30), a negative correlation with neutrophil count (r=−0.31, q=0.14), a negative correlation with C-reactive protein (r=−0.37, q=0.19) , and positive correlation with forced vital capacity (r=0. 34, q=0.19) . Larger sample sizes will be needed to estimate these parameters more precisely.

We confirmed several anthropomorphic and physiologic associations at the phenotypic level, using both PET-derived and echoAI indices (Figure 4 ). For height, Spearman correlations were −0.30 −0.26 and 0.0 for PET-restMBF, PET-stressMBF, and PET-CFR and for weight they were −0.14, −0.08, and 0.04, respectively. BMI had no apparent correlation with any of the PET blood flow indices. Resting heart rate was significantly correlated only with and . For the echoAI measures, the Spearman correlations with height were −0.30, −0.25, and −0.13 for echoAI-restMBF, echoAI-stressMBF, and echoAI-CFR, respectively. For weight, Spearman correlations were −0. 24,  and for heart rate, −0.15 −0.22, and −0.08, respectively (Figure 4 ).

The primary, altogether unexpected result of this work is the fact that relatively modest estimators of complex physiologic parameters can be 1) strongly associated with specific comorbidities associated with abnormal coronary flow; 2) powerful predictors of cardiovascular outcomes independent of known risk markers; and 3) have a defined genetic basis that, through genetic correlation analyses, can shed light on latent biological processes that underlie these metrics.

By mapping PET-derived coronary blood flow indices to information derived from a single view of a 2D echocardiogram, we markedly increased the scale of downstream analyses, including longitudinal time-to-event analyses and genetic association analysis. From a workflow standpoint, this mapping of the output of a costly imaging modality to a more readily available one (i.e., echocardiography without Doppler) should also enable broader clinical deployment.

The rationale for this work is a belief in common biological factors or pathways underlying a broader set of disease manifestations. In the case of CFR, the strong association of a coronary blood flow parameter with a diverse group of measurements and disease states, including indices of myocardial function 23 , systemic inflammatory markers 19 , pulmonary disease 19, and bone disease 17 suggest the existence of common latent processes. A parsimonious explanation is that vascular beds across a wide range of tissues have a partially shared susceptibility to internal and external stress and that dysfunction in one bed is reflected in others 24, 25 .

In addition to the pragmatic utility of developing more readily deployable biomarkers, we propose that our strategy of representing one imaging modality's output via another may help estimate such a shared susceptibility. We describe our method as a type of "information bottleneck", consistent with information-theoretic work by Tishby and colleagues, formulated two decades ago 20 , but enjoying recent attention as a possible explanation for the success of deep learning methods 21, 22 . Their work interprets supervised learning as a deliberate process of mapping a complex input signal into a compressed representation but still preserving relevant information about a target. Our application here goes one step further: by placing two modalities (PET and echocardiography) in series, we are driving the underlying physiologic process (i.e., abnormality of coronary blood flow under vasodilatory stress) through an information bottleneck to extract a specific underlying factor that may be partially obscured in the original measurement.

Our genetic correlation work also sheds light on distinctions between stress myocardial blood flow and coronary flow reserve. Previous work has demonstrated independent contributions of these two parameters to cardiovascular outcomes 12 . We found that echoAI-stressMBF had the strongest genetic correlation with ischemic heart disease and a negative correlation with C-reactive protein, a marker of systemic inflammation, in support of prior phenotypic correlation 19, 26 . Among other correlations, the inverse association with rest and stress myocardial blood flow with impedanceestimated fat-free body mass sheds light on previous difficult to reconcile findings. Prior phenotypic association of body mass index with coronary flow reserve and hyperemic blood flow in small groups of patients has been interpreted as the deleterious effects of obesity on endothelial cell function 26 , although body mass index does not unambiguously distinguish between adiposity and muscle mass. Our observed genetic correlation of rest myocardial blood flow with fat-free body mass and standing height (which was further supported by phenotypic data) suggests a complementary hypothesis -there may be competition for blood flow between the heart and remainder of the body: those with greater amounts of metabolically active body mass may have reduced coronary blood flow, independent of any adverse effects of adipose tissue on coronary microcirculation. This observation is in keeping with the previously observed non-monotonic relationship between BMI and CFR 27 , where CFR increases with BMI in the non-obese range but decreases in the obese range. Our results would indicate potential utility in correcting MBF parameters for lean body mass and height to isolate adverse effects of adipose tissue.

The primary limitations of this study are that the PET data were all derived from a single center.

However, the utility of the model for predicting cardiovascular events is validated in an external cohort, which showed good discrimination and calibration, suggesting that the model is generalizable to a different institution. A second limitation is that the sample size for genetic association and correlation analysis, though large enough for multiple insights, is also relatively small to discover trait-associated loci.

In summary, we describe the development of scalable imaging-based biomarkers with surprisingly good discriminative properties for cardiovascular outcomes, and more broadly, a generalizable strategy for the discovery of measures of latent biological factors that may extend across multiple tissues.

Consecutive clinically indicated rest/stress myocardial perfusion PET studies performed at Brigham and Women's Hospital (Boston, MA) between January 1, 2006 and November 30, 2017, with a clinically indicated echocardiogram within 365 days were included. Patients who had undergone coronary artery bypass grafting (CABG) or cardiac transplantation prior to the PET or the echocardiogram study, were excluded. PET studies were performed with either 82 Rubidium or 13 N-ammonia, using computed tomography (CT) for attenuation correction as described previously 28 . Absolute global myocardial blood flow (MBF) was quantified in mL/min/g at rest and at peak hyperemia by a single operator blinded to patient data using a validated two-compartment tracer kinetic model as described previously 29 . Coronary flow reserve (CFR) was calculated as ratio between MBF at stress and MBF at rest. For contemporaneous echocardiograms, we selected all two-dimensional resting echocardiograms that occurred within 365 days of the corresponding PET study.

The video processing involved 3 steps: (1) frame scaling, (2) frame rate adjustment and (3) square padding. For each echocardiography video, frames were scaled in each dimension by the original pixel spacing factors (in mm/pixel) that were extracted from the DICOM headers to account for variations in resolution. All images were subsequently scaled by a universal scaling factor of 1.177 which was chosen so that the largest dimension of all videos used for training, testing and validation was smaller than the square model input size of 299 pixels. This universal scaling made sure that the model input size of 299 by 299 pixels could be achieved for all videos by simple zero padding while keeping the correct feature sizes. Given that dynamic information, such as velocity of the heart walls, are dependent on frame rate we then set the model input frame rate to a uniform value of 30 frames per second (fps) for all the videos. We excluded all the videos with framerates of that were smaller than 30 fps or had lengths of less than 1.33s (40 frames x 30 fps). All qualifying videos were subsampled by the formula: Pn=On•fps/30, where Pn is the n-th frame of the processed video and On is the n-th frame of the original video. When the n•fps/30 term was a noninteger value, it was rounded to the closest integer. Videos were saved together with the corresponding labels (CFR and MBF values) as 3-D arrays in the TensorFlow TFRecord serialized format to be used by input pipeline. During training, images and labels were re-shuffled before each epoch, distributed across all 8 GPUs and finally zero-padded to the required square model input size of 299 by 299 pixels without further scaling.

We used TensorFlow (version 2.2.0) for implementation and deployment of the neural network models for myocardial blood flow indices. Given that the echocardiograms were time-series of multiple frames, we constructed a 3D-CNN model treating the temporal axis as the 3rd dimension, to maximize the ability of the model to use dynamic features. It consisted of 3 layers of 3D-CNN followed by 12 layers with Multi-3D-CNN-modules, which was constructed by 3 parallel multilayer 3D-CNNs and a max pooling operation concatenated at the end of the module (schematic shown in Figure S12 ). The model was trained using data from the training cohort ( Figure S1 ). The echocardiography videos were labeled with the PET-measured CFR or MBF value on the study level and the model was trained to minimize the mean-squared-error between model prediction and the label using an RMSprop optimizer with an initial learning rate of 0.0001. The model was trained for 300 epochs with a batch size of 64 videos distributed across 8 NVIDIA Tesla V100 SXM2 32GB GPUs. At the end of each epoch, the Spearman correlation coefficient was calculated using the validation set. The final models were chosen as the models with the highest correlation coefficient on the validation cohort across all 300 epochs. The code to implement the models along with instructions for installation/usage are provided in a public repository https://github.com/obi-ml-public/echoAI-PET-measurements.

The Massachusetts General Brigham Electronic Data Warehouse (EDW), which includes notes from >20 provider locations across a large integrated delivery network, was the source of notes and structured clinical data used in this study. All information in the electronic health record in the EDW system was gathered including free text medical notes and discharge summaries, and outcome events were adjudicated using a machine learning based model using the discharge summaries as input as described previously 30 . Briefly, after manually labeling 1,372 training, 592 validation and 1,003 test set discharge summaries, we trained neural network models for event adjudication for heart failure hospitalization (test set AUROC 0.97) and ACS (test set AUROC 0.98). We selected classification thresholds to maximize sensitivity at specificity over 0.90 (HF: cutoff 0.12, sensitivity 0.94, specificity 0.91. ACS: cutoff 0.04, sensitivity 0.91, specificity 0.91).

The discharge summary classification models were deployed on 44,150 notes and a binary event status was determined by each model for each note.

To test if the echoAI measurements can predict clinical outcomes, survival analysis was performed using patients who had echocardiography within Jun 2015 and Nov 2015 at BWH. The first echocardiography study for each patient was identified and the echoAI models were ran on those videos to obtain the corresponding measurements. Follow up for each patient was . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 determined using hospital and clinic encounters within the system. Patients were excluded if they had (1) no notes in the system, (2) less than 30 days follow up or (3) had an event during the first 30 days. To avoid the possible bias caused by the influence of COVID-19, all the patients were censored at Feb 2019. An external validation cohort was selected with similar procedure from patients who had at least one echo cardiogram from MGH ( Figure S7 ). Since MGH started to enter notes from Feb 2016, the inclusion period was set to Feb 2016 to Nov 2016 for MGH.

Receiver operating characteristics (ROC) curves for the echoAI measurements along with ejection fractions, heart rate and 4 patient demographic variables (age, sex, heart failure history and CAD history) were calculated using timeROC package version 0.4. A multivariable Cox proportional hazard (PH) model including all 9 parameters were trained using survival package (version 3.2.3) and the ROC curve for the model output was calculated with the timeROC package.

The individual hazard ratios for the multivariable Cox PH model were analyzed to assess the adjusted association of each parameter to the outcome. For external validation, two separate Cox PH models were used. To assess the ability of the 9-parameter model to predict outcomes, the original Cox PH models trained on BWH data were used without any weight adjustment. However, to analyze the individual adjusted hazard ratios for the echoAI measures along with 6 baseline characteristics information, we trained separate Cox PH models with the same input parameters on MGH data.

Genomewide SNP array data is available for over 36,000 individuals enrolled in the Massachusetts General Brigham Biobank 31 , a collection of plasma, serum, and DNA samples of consented subjects with associated linkage to their electronic health records. We downloaded echocardiograms for 4,400 of these patients and were able to compute echoAI-CFR and echoAI-stressMBF on 4,145. To minimize confounding by population structure, we focused our genetic association analyses on 3,925 white study subjects. We performed a genomewide association study using age, sex, and the first 10 principal components as covariates using PLINK 2.0 32 . Summary statistics were used to estimate narrow-sense heritability with the help of the R HDL package 33 . Genetic correlation with a prioritized selection of traits from the UK Biobank was also performed using summary statistics provided by the Neale Lab (http://www.nealelab.is/uk-biobank/). Genetic correlation p-values for UK Biobank traits were adjusted for multiple hypothesis testing using the R qvalue package 34 . We set a false-discovery threshold for "significance" at 0.20, though any such threshold is arbitrary and should be driven by downstream decisions related to the corresponding finding.

Data were collected and stored using Numpy package version 1.19.2 with Python 3.7.3. The scatter plots and ROC curves are plotted using the ggplot2 35 package (R 3.6.1). Continuous values are presented as mean ± standard deviation (SD) and categorical values are presented as numbers and percentages if not otherwise specified.

This study complies with all ethical regulations and guidelines. The study protocol was approved by local institutional review board (IRB) of Mass General Brigham (2019P002651). This study had minimal patient risk: it collected data retrospectively, there was no direct contact with patients, and data were collected after medical care was completed. Thus, and to recruit an unbiased and representative cohort of patients, data were collected under a waiver of informed consent, which was approved by the IRB.

The data that support the findings of this study are available on request from the corresponding author R.C.D. upon approval of the data sharing committees. The data are not publicly available due to the presence of information that could compromise research participant privacy.

The code to run the models along with instructions for installation/usage are provided in a public repository https://github.com/obi-ml-public/echoAI-PET-measurements. The model weights may . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ;  contain patient personal information and thus, could not be shared. We provide a web-interface to run our model and generate predictions at http://onebraveideaml.org

This work was supported by One Brave Idea, co-founded by the American Heart Association and Verily with significant support from AstraZeneca and pillar support from Quest Diagnostics (to CAM and RCD), NIH/NHLBI HL140731 (to RCD) SENSHIN Medical Research Foundation (to SG), the Kanae foundation for the promotion of medical science (to SG), the Uehara Memorial Foundation (to SG), SG acknowledge support from Mower fellowship.

R.C.D is supported by grants from the National Institute of Health, the American Heart Association (One Brave Idea, Apple Heart and Movement Study) and GE Healthcare, has received consulting fees from Novartis and Pfizer, and is co-founder of Atman Health. C.A.M. is a consultant for Pfizer and co-founder of Atman Health. All other authors declare no competing interests.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021 

Myocardial blood flow (MBF), as measured by PET, is a powerful predictor of cardiovascular outcomes, but has restricted availability due to equipment cost, tracer availability, technical expertise required, and radiation risk. In contrast, two-dimensional B-mode echocardiography measured at rest is a widely available, scalable technology. We hypothesized that many biological factors that drive abnormalities in MBF (red dotted arrows) would also be reflected within echocardiograms whereas technical factors unique to PET (black dotted arrow) would not. We trained three-dimensional convolutional neural network-derived models to estimate PET-derived indices of MBF, including CFR, using contemporaneously measured echocardiogram videos as inputs. We deployed the resulting models on tens of thousands of echocardiogram videos and found that echoAI derived MBF parameters and CFR sharply stratify patients by comorbidities, predict cardiovascular outcomes, and have genetic correlation with a breadth of clinical traits, shedding light on underlying biological mechanisms. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ;  https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint

Kaplan-Meier plots for HF hospitalization and ACS free survival for echoAI measurement quartiles.

EchoAI-CFR EchoAI-stressMBF EchoAI-restMBF HF ACS . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ;

Forest plot showing the hazard ratio and 95% confidence intervals for each parameter in the Cox proportional hazard models for heart failure hospitalization and ACS. Hazard ratios are plotted as per standard deviation. Abbreviations, ACS: acute coronary syndrome, EF: ejection fraction, CAD: coronary artery disease, MBF: myocardial blood flow, CFR: coronary flow reserve, CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; 

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint Figure S4 . Consort diagram of patient selection for survival analysis. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint Figure S5 Predictive accuracy of echoAI measurements and various patient characteristics for heart failure hospitalization (A) Receiver operating characteristics curves for the echoAI parameters, ejection fraction, heart rate and other patients' demographics information to predict heart failure hospitalization. (B) Receiver operating characteristics curve and (C) calibration plot for the 9-parameter combined model to predict heart failure hospitalization. Abbreviations, EF: ejection fraction, CAD: coronary artery disease, MBF: myocardial blood flow, CFR: coronary flow reserve. 

Heart rate . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint Figure S7 . Consort diagram of patient selection for survival analysis in external validation cohort from MGH. 5 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint

Kaplan-Meier plots for HF hospitalization and ACS free survival for echoAI measurement quartiles. MGH: Massachusetts General Hospital.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint 

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101/2021.02.07.21251025 doi: medRxiv preprint

(A) Receiver operating characteristics curves for the echoAI parameters, ejection fraction, heart rate and other patients' demographics information to predict heart failure hospitalization. (B) Receiver operating characteristics curve and (C) calibration plot for the 9 parameters combined model to predict heart failure hospitalization. Abbreviations, EF: ejection fraction, CAD: coronary artery disease, MBF: myocardial blood flow, CFR: coronary flow reserve. 

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 

Heart rate . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Output . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted February 8, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Echo-AI CFR, A.U. ± SD 0 (0) 1.9 ± 0.3 1.9 ± 0.3 1.9 ± 0.3 1.9 ± 0.3 <0.001

International evaluation of an AI system for breast cancer screening

An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets

Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration

Automated and Interpretable Patient ECG Profiles for Disease Detection, Tracking, and Discovery

Artificial Intelligence-Enabled, Fully Automated Detection of Cardiac Amyloidosis Using Electrocardiograms and Echocardiograms

Fully Automated Echocardiogram Interpretation in Clinical Practice

Fundamentals in clinical coronary physiology: why coronary flow is more important than coronary pressure

Coronary Physiology Beyond Coronary Flow Reserve in Microvascular Angina JACC State-of-the-Art Review

Coronary Microvascular Disease Pathogenic Mechanisms and Therapeutic Options JACC State-of-the-Art Review

Coronary Flow Reserve from Mouse to Man-from Mechanistic Understanding to Future Interventions

Diagnostic and prognostic value of myocardial blood flow quantification as non-invasive indicator of cardiac allograft vasculopathy

Global coronary flow reserve is associated with adverse cardiovascular events independently of luminal angiographic severity and modifies the effect of early revascularization

Interaction of impaired coronary flow reserve and cardiomyocyte injury on adverse cardiovascular outcomes in patients without overt coronary artery disease

Association of Coronary Microvascular Dysfunction with Heart Failure Hospitalizations and Mortality in Heart Failure with Preserved Ejection Fraction -a follow-up in the PROMIS-HFpEF study

Regional myocardial blood flow reserve impairment and metabolic changes suggesting myocardial ischemia in patients with idiopathic dilated cardiomyopathy

Coronary microvascular dysfunction in patients with heart failure with preserved ejection fraction

Coronary microvascular endothelial dysfunction is an independent predictor of development of osteoporosis in postmenopausal women

Chronic inflammation and coronary microvascular dysfunction in patients without risk factors for coronary artery disease

Myocardial perfusion reserve is impaired in patients with chronic obstructive pulmonary disease: a comparison to current smokers

The information bottleneck method

Deep Learning and the Information Bottleneck Principle

Deep Variational Information Bottleneck

Prevalence and correlates of coronary microvascular dysfunction in heart failure with preserved ejection fraction: PROMIS-HFpEF

Systemic endothelial dysfunction is related to the extent and severity of coronary artery disease

Female (%)

Female (%)

Female (%)

SBP: systolic blood pressure, DBP: diastolic blood pressure, BMI: body mass index, eGFR: estimated glomerular filtration rate, EF: ejection fraction, LV: left ventricle, DM: diabetes mellitus, CAD: coronary heart disease, PAD: peripheral artery disease, COPD: chronic obstructive pulmonary disease, ASA: acetyl-salicylic acid, ACEi: angiotensin-converting enzyme inhibitor, ARB: angiotensin II receptor blocker, MRA: mineralocorticoid receptor antagonist White

HR: heart rate, SBP: systolic blood pressure, DBP: diastolic blood pressure, BMI: body mass index, eGFR: estimated glomerular filtration rate, EF: ejection fraction, LV: left ventricle, DM: diabetes mellitus, CAD: coronary heart disease, PAD: peripheral artery disease

HR: heart rate, SBP: systolic blood pressure, DBP: diastolic blood pressure, BMI: body mass index, eGFR: estimated glomerular filtration rate, EF: ejection fraction, LV: left ventricle, DM: diabetes mellitus, CAD: coronary heart disease, PAD: peripheral artery disease

HR: heart rate, SBP: systolic blood pressure, DBP: diastolic blood pressure, BMI: body mass index, eGFR: estimated glomerular filtration rate, EF: ejection fraction, LV: left ventricle, DM: diabetes mellitus, CAD: coronary heart disease, PAD: peripheral artery disease

Atrial fibrillation, n (%)

HR: heart rate, SBP: systolic blood pressure, DBP: diastolic blood pressure, BMI: body mass index, eGFR: estimated glomerular filtration rate, EF: ejection fraction, LV: left ventricle, DM: diabetes mellitus, CAD: coronary heart disease, PAD: peripheral artery disease, COPD: chronic obstructive pulmonary disease