key: cord-0019463-15i9mjbf authors: Cauwenberghs, Nicholas; Sabovčik, František; Magnus, Alessio; Haddad, Francois; Kuznetsova, Tatiana title: Proteomic profiling for detection of early‐stage heart failure in the community date: 2021-05-29 journal: ESC Heart Fail DOI: 10.1002/ehf2.13375 sha: ac2e34c12b58ed906ce91adb682f569ad53716d0 doc_id: 19463 cord_uid: 15i9mjbf AIMS: Biomarkers may provide insights into molecular mechanisms underlying heart remodelling and dysfunction. Using a targeted proteomic approach, we aimed to identify circulating biomarkers associated with early stages of heart failure. METHODS AND RESULTS: A total of 575 community‐based participants (mean age, 57 years; 51.7% women) underwent echocardiography and proteomic profiling (CVD II panel, Olink Proteomics). We applied partial least squares‐discriminant analysis (PLS‐DA) and a machine learning algorithm [eXtreme Gradient Boosting (XGBoost)] to identify key proteins associated with echocardiographic abnormalities. We used Gaussian mixture modelling for unbiased clustering to construct phenogroups based on influential proteins in PLS‐DA and XGBoost. Of 87 proteins, 13 were important in PLS‐DA and XGBoost modelling for detection of left ventricular remodelling, left ventricular diastolic dysfunction, and/or left atrial reservoir dysfunction: placental growth factor, kidney injury molecule‐1, prostasin, angiotensin‐converting enzyme‐2, galectin‐9, cathepsin L1, matrix metalloproteinase‐7, tumour necrosis factor receptor superfamily members 10A, 10B, and 11A, interleukins 6 and 16, and α1‐microglobulin/bikunin precursor. Based on these proteins, the clustering algorithm divided the cohort into two distinct phenogroups, with each cluster grouping individuals with a similar protein profile. Participants belonging to the second cluster (n = 118) were characterized by an unfavourable cardiovascular risk profile and adverse cardiac structure and function. 
The adjusted risk of presenting echocardiographic abnormalities was higher in this phenogroup than in the other (P < 0.0001). CONCLUSIONS: We identified proteins related to renal function, extracellular matrix remodelling, angiogenesis, and inflammation to be associated with echocardiographic signs of early‐stage heart failure. Proteomic phenomapping discriminated individuals at high risk for cardiac remodelling and dysfunction. In the presence of cardiovascular risk factors, the heart steadily remodels and its function progressively declines for years until symptoms of heart failure (HF) present. 1 Cardiac remodelling and dysfunction should be detected at a subclinical stage in order to initiate preventive measures in time. 2 Echocardiography enables non-invasive assessment of cardiac morphology and function. Over the years, echocardiographic features of subclinical heart remodelling and dysfunction have been validated as prognostic precursors of overt heart disease. [3] [4] [5] Exploring the network of molecular mechanisms behind cardiac remodelling and dysfunction could help identify novel targets for its detection, prevention, and management. Previously, remodelling and dysfunction of the heart have been linked to numerous molecular perturbations, including a network of interlinked metabolic and inflammatory derangements. 6, 7 The search for pathologically relevant proteins in HF is being facilitated by high-throughput proteomic profiling platforms 8 and by advanced analytical approaches such as machine learning capable of dealing with large protein networks. 9, 10 For instance, the HOMAGE case-control study identified, out of 252 proteins, 38 that were associated with incidence of symptomatic HF, that is, the long-term consequence of subclinical heart remodelling and dysfunction. 11 To date, however, these technological advancements have been underutilized for protein-based detection of early-stage heart disease. 
Indeed, when it comes to subclinical heart remodelling and dysfunction, population studies have so far considered only a limited selection of proteomic markers and neglected protein interconnectivity. There is a need for better proteomic characterization of the asymptomatic stages of cardiac remodelling and dysfunction as assessed by echocardiography. Clustering approaches that integrate key proteins of echocardiographic abnormalities could identify phenotypically distinct groups and provide a protein-based classification of cardiac health in the community. If successful, such an approach could lead to a targeted proteomic platform for identification of individuals at high risk for HF. In this population study, we therefore applied high-throughput biomarker profiling, feature selection techniques, and an unbiased clustering approach to (i) derive from a large panel of proteins related to cardiovascular disease those associated with echocardiographic signs of heart remodelling and dysfunction and (ii) integrate the most informative proteins in a protein-based classification system for assessment of cardiac health. This study is embedded in the Flemish Study on Environment, Genes and Health Outcomes (FLEMENGHO), which received ethical approval from the Ethics Committee of the University of Leuven (S64406). We recruited a random family-based population sample within north-eastern Belgium as described before. 12 From 2005 to 2015, we invited 1851 FLEMENGHO participants for an echocardiographic examination, of whom 1447 provided written informed consent (participation rate, 78.2%). We performed high-throughput proteomic profiling in 575 FLEMENGHO participants above 40 years old who were free from atrial fibrillation or a pacemaker at the time of examination and who had optimal echocardiographic image quality. Participants refrained from heavy exercise, smoking, and consuming alcohol or caffeinated beverages for 3 h before the examination. 
Supplemental Methods detail the echocardiographic protocol. Two echocardiographers obtained standardized images along the parasternal long and short axes and from the apical four-chamber, two-chamber, and long-axis views using a Vivid 7 Pro and Vivid E9 (GE Vingmed, Horten, Norway) interfaced with a 2.5 to 3.5 MHz phased-array probe. 12, 13 Using EchoPAC software (GE Vingmed), images were post-processed by one expert (T.K.) blinded to the participants' characteristics with good reproducibility. 5 Using clinically recommended criteria 13 and population-based thresholds predictive of cardiac events in the community, 4,5 we defined left ventricular (LV) remodelling as having LV concentric remodelling [relative wall thickness (RWT) > 0.42] and/or having LV hypertrophy (LV mass indexed to body height^2.7 > 50 g/m^2.7 for men and > 47 g/m^2.7 for women) 14 ; LV diastolic dysfunction as E/e' ratio > 8.5 [elevation of LV filling pressure, confirmed by differences in durations between mitral A flow and reverse pulmonary veins flow (Ad less than ARd + 10 ms), tricuspid regurgitation > 2.7 m/s, and/or elevation in left atrial (LA) maximal volume index (> 40 mL/m^2) as measured by the method of disks] 15 ; and LA reservoir dysfunction as an LA reservoir strain < 23%. 5 Besides LV remodelling, we defined the following LV remodelling profiles: normal geometry (RWT ≤ 0.42, no LV hypertrophy), concentric remodelling (RWT > 0.42, no LV hypertrophy), eccentric hypertrophy (RWT ≤ 0.42, LV hypertrophy), and concentric hypertrophy (RWT > 0.42 and LV hypertrophy). 16 We measured 92 proteins related to immune regulation, metabolic pathways, and cardiovascular disease using a Proseek® multiplex platform (CVD II panel, Olink Proteomics, Uppsala, Sweden). Supporting Information, Table S1 lists the 92 biomarkers included in the CVD II panel. Fasting serum samples were analysed by the ARCADIA unit at the University Medical Center in Utrecht, the Netherlands. 
The platform applies proximity extension assay technology, 8 where each protein gets linked to a unique pair of oligonucleotide-labelled antibodies. Next, hybridization, amplification, and subsequently quantification of the complementary oligonucleotide strands linked to the paired antibodies enable protein quantification by quantitative real-time PCR using a Fluidigm BioMark HD platform. Quantitation data were quality controlled and normalized using internal and external controls, providing Normalized Protein eXpression (NPX) values. NPX is an arbitrary unit on a log2 scale used to quantify relative changes in protein levels. Higher NPX corresponds to higher protein expression. Five proteins were excluded due to bimodal distribution, leaving 87 proteins for analysis. Proteomic profiling of heart remodelling and dysfunction Details on recording medical history, lifestyle, and blood pressure are in Supplemental Methods. For database management and analysis, we used SAS Version 9.4 and JMP Genomics 9.0 (SAS Institute, Cary, NC, USA). Means and proportions were compared by a large sample z-test and χ 2 test, respectively. Significance was P < 0.05 on a two-sided test. First, we applied feature selection techniques to select from the pool of 87 biomarkers those related to echocardiographic indexes (LV RWT and mass index, E/A ratio, E/e' ratio, and LA reservoir strain) and echocardiographic abnormalities (i.e. LV remodelling, LV diastolic dysfunction, and LA reservoir dysfunction as well as the LV remodelling profiles). For this, we used partial least squares-discriminant analysis (PLS-DA) 9 and eXtreme Gradient Boosting (XGBoost), 10 two dimension reduction techniques capable of dealing with large sets of interrelated biomarkers. PLS-DA constructs linear combinations that maximize the covariance between the biomarkers and the outcome (here: echocardiographic abnormalities). These latent factors then replace the original features (biomarkers) in outcome estimation. 
All 87 proteins were considered for construction of the latent factors. Per outcome, the software selected the PLS-DA model that predicted the outcome best at balanced risk for under-fitting and overfitting. In detail, the number of latent factors retained in the final PLS-DA model was the number with the lowest predicted residuals sum of squares (PRESS) explaining a substantial proportion of the variation in features and outcome (max. 15 latent factors). PRESS statistics provide a summary measure of the models' fit and were retrieved by leave-one-out cross-validation, in which each observation in turn was removed and models were refitted using the remaining observations. Per protein, we calculated the variable importance in projection (VIP) scores of Wold, reflecting the importance of each biomarker in the construction of the final PLS-DA model. Similar to PLS-DA, we performed PLS analyses for prediction of main echocardiographic indexes on a continuous scale (i.e. LV RWT and mass index, E/A ratio, E/e' ratio, and LA reservoir strain). In XGBoost, a final model is an additive combination of a number of trees, with each subsequent tree trained on a negative gradient of a loss function. This approach decreases both variance and bias and thus increases prediction performance. 10 XGBoost was optimized with Tree-structured Parzen Estimator Approach using hyperopt 0.2.5. To examine the internals of the trained XGBoost model, we applied Accumulated Local Effects (ALE) forked from PyALE 1.0. Feature importance was implemented as a mean increase in accuracy resulting from the tree splits with a given feature. Predictive performance of XGBoost was evaluated using 10-fold cross-validation. Overall performance of PLS-DA and XGBoost was assessed using the area under the receiver operating characteristic curve. 
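The area under the receiver operating characteristic curve used above to compare models can be read as the probability that a randomly chosen participant with the abnormality receives a higher predicted score than a randomly chosen participant without it. The following is a minimal sketch of that rank-based definition, not the authors' implementation; the function name `roc_auc` and the toy labels and scores are hypothetical and serve only as illustration.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a case > score of a control), ties count 0.5.

    labels: iterable of 0/1 outcomes (1 = echocardiographic abnormality present)
    scores: iterable of model outputs, higher meaning "more likely abnormal"
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    # Compare every case with every control (quadratic, fine for a sketch).
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): two controls followed by two cases.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In a cross-validated setting such as the 10-fold scheme described for XGBoost, the scores passed to such a function would be the out-of-fold predictions, so that no participant is scored by a model that was trained on that participant.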
We assessed associations between echo abnormalities and proteins selected in PLS-DA and XGBoost, while accounting for age, sex, body mass index (except for LV remodelling), heart rate, systolic blood pressure, total cholesterol, antihypertensive treatment, current smoking, and history of diabetes mellitus (with Holm-Bonferroni correction for multiple testing). Next, we performed a weighted network analysis on all 87 biomarkers using NetworkX 2.5. The Weighted Gene Co-expression Network Analysis (WGCNA) package 1.69 was used for the scale-free analysis. 17 We constructed a network from all proteins with edges weighted by the Pearson's correlation coefficients computed with Pandas 1.1.4. We power transformed the network, selecting the smallest power degree (β = 5) with scale-free fitting index ≥ 0.9. We used Louvain modularity (python-louvain 0.14) to identify distinct protein groups in the network. 18 Unsupervised clustering for protein-based phenomapping To identify protein-based phenogroups, we conducted model-based clustering on individuals using the set of biomarkers that were important in both PLS-DA (VIP > 1.3) and XGBoost (top 10 feature importance) for detecting echocardiographic abnormalities. Clustering methods were taken from the scikit-learn library (0.23) and used within a Python 3.8 environment. 19 We fitted a Gaussian mixture using an expectation maximization algorithm, 20 with each component having its own covariance matrix. A Gaussian mixture produces a statistical model of the resulting segmentation. 21 The optimal number of clusters was based on the Davies-Bouldin and Silhouette indexes. 22 We compared the clinical and echocardiographic characteristics of the protein-based phenogroups and their odds for presenting cardiac remodelling and dysfunction while adjusting for potential confounders listed before. Figure S1) and between 0.72 and 0.83 for the XGBoost models (Supporting Information, Figure S2). 
Overall, 13 proteins were found important in both PLS-DA and XGBoost modelling for at least one of the three echocardiographic profiles: placental growth factor (PGF), kidney injury molecule-1 (KIM-1), galectin-9, cathepsin L1 (CTSL1), prostasin (PRSS8), tumour necrosis factor (TNF)-related apoptosis-inducing ligand receptor 2 (TRAIL-R2), TNF receptor superfamily members 10A (TNFRSF10A) and 11A (TNFRSF11A), matrix metalloproteinase-7 (MMP-7), angiotensin-converting enzyme-2 (ACE2), interleukins 6 (IL-6) and 16 (IL-16), and protein α1-microglobulin/bikunin precursor (AMBP) (Figure 1). The Venn diagram in Figure 2 illustrates the overlap between the 13 selected biomarkers. PGF, CTSL1, KIM-1, and galectin-9 were consistently identified as important for detecting all three echo abnormalities. Summary data of the PLS-DA models for protein-based identification of echocardiographic abnormalities are presented in Table 2, and corresponding V-plots are shown in Supporting Information, Figure S3. ALE plots in Supporting Information, Figure S4 illustrate the separate probabilistic effect of the five most important biomarkers on the outcome prediction in the XGBoost models. Supporting Information, Figure S5 presents the multivariable-adjusted associations between the echocardiographic abnormalities and the 13 biomarkers selected by PLS-DA and XGBoost. Multiple logistic regression confirmed most biomarkers selected by PLS-DA and XGBoost for LV remodelling. Higher risk for LV diastolic dysfunction also remained independently associated with higher levels of PGF, KIM-1, CTSL1, PRSS8, TRAIL-R2, MMP-7, and TNFRSF11A (but not galectin-9) after correction for multiple testing. For LA reservoir dysfunction, only its association with galectin-9 and MMP-7 survived multiple testing correction (Supporting Information, Figure S5). 
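The Holm-Bonferroni correction referred to above can be illustrated with a short sketch. It assumes the standard step-down procedure at a family-wise level of 0.05; the function name `holm_bonferroni` and the example P values are hypothetical and are not taken from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans in the original order:
    True where the null hypothesis is rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The rank-th smallest P value (rank starts at 0) is compared
        # with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            # Once one test fails, all tests with larger P values fail too.
            break
    return rejected


# Hypothetical P values for four protein-outcome associations.
example = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))  # [True, False, False, True]
```

Compared with a plain Bonferroni threshold of alpha / m for every test, the step-down thresholds become less strict as smaller P values are rejected, which preserves control of the family-wise error rate while retaining more power.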
ACE2, angiotensin-converting enzyme-2; AMBP, α1-microglobulin/bikunin precursor; AUC, area under the receiver operating curve; CTSL1, cathepsin L1; IL-6, interleukin-6; IL-16, interleukin-16; KIM-1, kidney injury molecule-1; LA, left atrial; LV, left ventricular; MMP-7, matrix metalloproteinase-7; PGF, placental growth factor; PRSS8, prostasin; TNFRSF10A, tumour necrosis factor receptor superfamily member 10A; TNFRSF11A, tumour necrosis factor receptor superfamily member 11A; TRAIL-R2, tumour necrosis factor-related apoptosis-inducing ligand receptor 2; VIP, variable importance in projection. Supporting Information, Table S2 presents the summary data of the PLS models for protein-based prediction of echocardiographic indexes on a continuous scale. Of note, in the prediction of the continuous echocardiographic indexes, PLS selected 11 of the 13 proteins (all but MMP-7 and IL-16) that were identified previously as important in both PLS-DA and XGBoost modelling for detection of at least one of the three echocardiographic profiles (LV remodelling, LV diastolic dysfunction, and/or LA reservoir dysfunction). Supporting Information, Figure S6 outlines the biomarkers selected by PLS-DA and/or XGBoost for detection of the different LV remodelling profiles (summary data of the PLS-DA models are available in Supporting Information, Table S3 ). Supporting Information, Figure S7 summarizes the proteins that were important in both PLS-DA and XGBoost modelling for each of the three LV remodelling profiles. Notably, ACE2 was consistently identified as important for detecting all three LV remodelling profiles. Supporting Information, Figure S8 shows the weighted network of the 87 biomarkers, which formed three modules. Most of the 13 biomarkers that were important in PLS-DA and XGBoost for detection of echocardiographic abnormalities were located within one module. The complex protein interconnectivity illustrates why protein measurements should never be interpreted in isolation. 
Using unsupervised clustering, we constructed biomarker-based phenogroups to evaluate the potential value of the selected set of 13 biomarkers for targeted proteomic screening. Two clusters were constructed based on the lowest Davies-Bouldin index and the highest Silhouette index (indicating optimal clustering) (Supporting Information, Figure S9). As such, the study sample was divided into two distinct phenogroups, with each cluster grouping individuals with a similar biomarker profile. Supporting Information, Table S4 presents the clinical and echocardiographic characteristics of the two phenogroups. The prevalence of LV remodelling, LV diastolic dysfunction, and LA reservoir dysfunction was significantly higher in Cluster 2 (n = 118) as compared with Cluster 1 (n = 457) (P < 0.0001 for all; Figure 3A and 3B). Even after accounting for important confounders, individuals belonging to Cluster 2 remained at higher risk for presenting LV remodelling [odds ratio (OR) 2.44, 95% confidence interval (CI) 1.51-3.94], LV diastolic dysfunction (OR: 2.04, CI 1.12-3.73), and LA reservoir dysfunction (OR: 1.67, CI 1.03-2.70) than those located in Cluster 1 (Table 3). We investigated the usefulness of proteomic profiling for detection of echocardiographic signs of heart remodelling and dysfunction in the community. By combining high-throughput proteomic profiling with feature selection algorithms capable of handling large protein networks, we identified 13 pathologically relevant proteins associated with echocardiographic signs of early-stage HF. Next, unsupervised clustering on this focused set of proteins enabled protein-driven identification of individuals at high risk for heart remodelling and dysfunction. High-throughput proteomic profiling allows exploring the network of molecular mechanisms behind subclinical heart remodelling and dysfunction. Using supervised feature selection, we first identified 13 proteins associated with echocardiographic abnormalities. 
These proteins reflect pathological processes behind cardiac remodelling and dysfunction. [Table footnote: LV remodelling was defined as having LV concentric remodelling (relative wall thickness > 0.42) and/or LV hypertrophy (LV mass ≥ 50 g/m^2.7 in men and ≥ 47 g/m^2.7 in women).] Our analysis indicates early involvement of renal biomarkers related to tubular cell damage (KIM-1), and homeostasis of fluid and electrolytes (PRSS8 and ACE2) in cardiac remodelling and dysfunction, even in the absence of symptomatic cardiac or renal deterioration. Previous experimental and clinical studies identified KIM-1 as a promising biomarker for renal proximal tubule injury that is relevant in cardiovascular diseases. 23 Indeed, worsening of tubular damage biomarkers such as KIM-1 predicted adverse events including cardiac death and HF hospitalization in 263 patients with chronic HF. 24 In addition, PRSS8 and ACE2 play an important role in blood pressure homeostasis and electrolyte balance via regulation of the renin-angiotensin-aldosterone and kallikrein-kinin systems. 25, 26 In line with this, our study suggests that assessment of these markers may help to identify asymptomatic patients at risk for developing the cardiorenal syndrome. Other pathways associated with cardiac remodelling and dysfunction were angiogenesis and extracellular matrix remodelling. 
The observed increase in PGF might indicate its involvement already in early stages of cardiac remodelling and dysfunction in response to pressure overload and ischaemia. In an experimental setting, administration of exogenous PGF after acute myocardial infarction stimulates angiogenesis and improves ventricular remodelling and function. 27 Similarly, endogenous PGF was required for adaptive angiogenesis and HF prevention by inducing cardiac hypertrophy after pressure overload in mice. 28 Galectin-9 belongs to a family of carbohydrate-binding proteins and is produced by the extracellular matrix. Among the galectins, galectin-3 is the most studied so far with regard to its involvement in the progression of HF, whereas the role of galectin-9 in HF progression requires further investigation. Through its cytoplasmic control of AMPK, galectin-9 is important for efficient ubiquitination during lysosomal damage and may thus affect various health conditions impacted by AMPK, including obesity, diabetes, and immune responses. 29 Similarly, CTSL1, an important lysosomal protein-processing enzyme, may also regulate the lysosomal degradation response to stress (i.e. pressure overload) that may alter cardiac function. 30 Previous studies also reported CTSL1 activity in extracellular matrix degradation, another mechanism of cathepsin participation in the development of cardiovascular diseases. 31 Thus, lysosomal protease dysfunction may impair the autophagy-lysosomal pathway, adversely affecting protein degradation. 30 Matrix metalloproteinase-7 is one of the metalloproteases degrading a wide range of extracellular matrix proteins, such as collagen IV, fibronectin, and laminin. 32 MMP-7 can also cleave other MMPs, including MMP-1, MMP-2, and MMP-9, leading to their activation, implicating MMP-7 as a key regulator of extracellular matrix composition and, therefore, cardiac remodelling. 
Among markers reflecting inflammation and apoptosis, TNF family members and IL-6 and IL-16 were related to echocardiographic signs of cardiac remodelling and dysfunction in our cohort. TNFRSF10A and TRAIL-R2, receptors for TNFSF10/TRAIL, are members of the death receptor superfamily and modulate apoptosis. High levels of TRAIL-R2 were associated with incident diabetes, cardiovascular mortality, myocardial infarction, and ischaemic stroke in a large cohort of 4742 individuals recruited from the general population. 33 Another TNF superfamily member, TNFRSF11A, activates nuclear factor-κB and participates in a wide variety of processes controlling cell proliferation, apoptosis, and vascular calcification. Associated with LA reservoir dysfunction in our study, IL-6 significantly predicted atrial fibrillation incidence in 971 participants of the Heart and Soul Study. 34 Another biomarker highlighted in our analysis was AMBP, a precursor of α1-microglobulin, which is up-regulated during increased oxidative stress and can also be taken up intracellularly. 35 α1-Microglobulin can protect against excessive intracellular oxidative stress and localize to the mitochondria to protect mitochondrial function. Interestingly, most of the proteins highlighted in our study were identified in previous proteomic studies as predictors of symptomatic HF. Indeed, Ferreira et al. recently published a post hoc analysis on the association between targeted proteomics profiles (n = 252) with incident HF defined as the first hospitalization for HF using nested matched cases and controls selected from three different cohorts. 11 Of note, eight out of the 13 proteins identified in our study overlapped with the biomarkers reported by Ferreira et al. In contrast, our study identified proteomic signatures associated with the early changes in cardiac function and remodelling that precede HF symptoms by years to decades. 
Besides unravelling the molecular mechanisms behind cardiac remodelling and dysfunction, proteomic profiling may also aid the characterization of early-stage HF in the community. Here, we provided a pipeline to integrate the most pathologically relevant proteins extracted from high-throughput proteomic data into a protein-based classification system for assessment of cardiac health. Indeed, we applied unbiased clustering to integrate the key proteomic markers of subclinical echocardiographic abnormalities into protein-driven phenomaps. This approach distinguishes phenotypically distinct groups and may provide a protein-based classification system of cardiac health in the community. Indeed, we found that the phenogroups constructed from the 13 proteins provided a clinically meaningful classification for cardiac risk stratification in asymptomatic individuals (Supporting Information, Table S2 ). Participants belonging to the second cluster were characterized by an unfavourable cardiovascular risk profile (older individuals, high prevalence of obesity, hypertension, and diabetes mellitus) and higher risk of presenting cardiac remodelling and dysfunction than the other cluster, even after adjustment for important risk factors. Our findings may lead to better proteomic characterization of asymptomatic stages of cardiac remodelling and dysfunction. Of note, our study illustrates a pipeline to derive clinically meaningful classifiers of cardiac health from high-throughput proteomic data. As such, the proposed protein-based clustering may be a first step towards a protein-based screening platform integrated within the clinical decision-making process for identification of individuals at high risk for HF. Future studies should further validate the usefulness of integrative proteomic profiling to identify individuals at high risk for cardiac dysfunction. Future trials should also unravel the clinical relevance of the highlighted set of proteins in HF prophylaxis and therapy. 
Effective translation of our findings may thus facilitate the development of strategies for better diagnosis, prevention, and treatment of HF. Conjointly, protein-driven screening, preventive and reactive strategies may help tackle the ever-rising HF epidemic. Our study has strengths and limitations. First, although all echocardiographic measurements are prone to error, two experienced observers recorded the echocardiographic images using a standardized protocol and images were post-processed by a single observer with good reproducibility. Second, technical variability may have affected the proteomic measurements. However, the panel used in this study has been thoroughly validated regarding ranges, assay specificity and precision, repeatability and reproducibility, and endogenous interference (https://www.olink.com/resourcessupport/document-download-center/). Third, despite the relatively large population sample, our findings remain to be externally validated in a large-scale and racially diverse cohort. Accordingly, our study findings should be extrapolated with caution to ethnicities other than white Europeans. Fourth, one should not infer causality from our cross-sectional observations. In conclusion, we identified a set of proteins associated with subclinical echocardiographic abnormalities, which may represent key targets for the detection, prevention, and management of early-stage HF. Protein-based clustering of individuals provided a classification system of cardiac health that may facilitate early detection of cardiac remodelling and dysfunction in the community. Future studies should validate the usefulness of integrative proteomic profiling for the management of early-stage HF.

Figure S3. Biomarkers Associated with Cardiac Remodeling and Dysfunction in Partial Least Squares-Discriminant Analysis (PLS-DA). V-plots were generated from PLS-DA models for discrimination between normal and abnormal echocardiographic phenotypes. Markers with a VIP score above 1.3 were considered influential. Correlation coefficients were scaled and centered. LA, left atrial; LV, left ventricular; VIP, variable importance in projection.

Figure S4. Accumulated Local Effects for XGBoost with optimized hyperparameters, trained on all 87 biomarkers, shown for the 5 most important biomarkers consistently selected for all three echocardiographic abnormalities. Each row belongs to a model trained on the corresponding label. Each ALE plot shows the effect of a single variable on the given outcome, aligned to 0 and controlled for correlated variables and interaction effects. Only predictor values with data available (see the rug plot in each figure) should be considered. All variables exhibit a sharp non-linear effect at a corresponding threshold value. The absolute effect of a single variable is relatively low since the outcome prediction is based on adding the effects of many variables.

Figure S5. Multivariable-Adjusted Associations Between Echocardiographic Profiles of Cardiac Remodeling and Dysfunction and Proteins Selected in Feature Selection Modeling. We show per echocardiographic phenotype the proteins selected by both partial least squares-discriminant analysis and XGBoost modeling for discrimination of the particular phenotype. Odds ratios (95% CI) are expressed per doubling in protein level and were adjusted for age, sex, BMI (except for LV remodeling), heart rate, systolic and diastolic blood pressure and antihypertensive treatment. An asterisk (*) indicates that the P value remained <0.05 after Holm-Bonferroni correction for multiple testing.

Figure S6. Biomarkers of Echocardiography-Defined Profiles of Left Ventricular Remodelling. The heat map presents the biomarkers that were important in PLS-DA (VIP > 1.3) and XGBoost modelling for detecting the LV remodelling profiles. Participants with normal LV geometry were the reference group (i.e. relative wall thickness (RWT) ≤ 0.42 and no LV hypertrophy; n = 328). For PLS-DA, red dots are positive and blue are negative correlations. Larger dots reflect greater VIP score (for PLS-DA) or greater feature importance (for XGBoost). AUC, area under the receiver-operating curve; VIP, variable importance in projection.

Figure S7. Biomarkers of Echocardiography-Defined Profiles of Left Ventricular Remodelling. The Venn diagram presents the biomarkers that were important in both PLS-DA and XGBoost modelling for detecting at least one of the LV remodelling profiles (i.e. LV concentric remodelling without hypertrophy, LV eccentric hypertrophy and LV concentric hypertrophy).

Figure S8. Network of 87 Established and Potential Protein Markers of Cardiovascular Disease from Weighted Network Analysis. Louvain modularity was used to identify distinct groups of proteins in the network. The node size represents the weighted node connectivity. The 13 biomarkers in bold were found important in both PLS-DA and XGBoost analyses for one or more echocardiographic phenotypes. Supplemental Table S1 lists the full names and abbreviations of the 87 biomarkers.

Figure S9. Selection of the Optimal Number of Phenogroups in Unsupervised Clustering. Both Davies-Bouldin index (DBI; lower is better) and Silhouette index (SI; higher is better) indicated 2 as the optimal number of phenogroups ('clusters').
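How a Silhouette index ranks competing partitions can be shown with a small self-contained sketch. It covers the Silhouette index only, uses Euclidean distance, and takes cluster labels as given; in the study the labels came from a Gaussian mixture fitted with scikit-learn, which is not reproduced here. The function name `silhouette_index` and the toy points are hypothetical.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points (higher means better separation).

    points: list of equal-length coordinate tuples
    labels: list of cluster labels, one per point
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
two_groups = [0, 0, 1, 1]
three_groups = [0, 0, 1, 2]  # splits the second pair into singletons

# The two-group partition scores higher, so it would be preferred.
print(silhouette_index(toy_points, two_groups) > silhouette_index(toy_points, three_groups))
```

In practice the index would be evaluated for each candidate number of mixture components and the partition with the highest value retained, alongside the Davies-Bouldin index, for which lower values indicate better clustering.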