key: cord-0269089-b8vuy1kw authors: Vasquez-Rios, G.; Oh, W.; Lee, S.; Bhatraju, P.; Mansour, S. G.; Moledina, D. G.; Thiessen-Philbrook, H.; Siew, E.; Garg, A. X.; Chinchilli, V. M.; Kaufman, J. S.; Hsu, C.-y.; Liu, K. D.; Kimmel, P. L.; Go, A. S.; Wurfel, M. M.; Himmelfarb, J.; Parikh, C. R.; Coca, S. G.; Nadkarni, G. N. title: Molecular and clinical signatures in Acute Kidney Injury define distinct subphenotypes that associate with death, kidney, and cardiovascular events date: 2021-12-16 journal: nan DOI: 10.1101/2021.12.14.21267738 sha: 61d63afb8225fa1d4ad4559aa0531b95f4c15e42 doc_id: 269089 cord_uid: b8vuy1kw Introduction: AKI is a heterogeneous syndrome defined via serum creatinine and urine output criteria. However, these markers are insufficient to capture the biological complexity of AKI and not necessarily inform on future risk of kidney and clinical events. Methods: Data from ASSESS-AKI was obtained and analyzed to uncover different clinical and biological signatures within AKI. We utilized a set of unsupervised machine learning algorithms incorporating a comprehensive panel of systemic and organ-specific biomarkers of inflammation, injury, and repair/health integrated into electronic data. Furthermore, the association of these novel biomarker-enriched subphenotypes with kidney and cardiovascular events and death was determined. Clinical and biomarker concentration differences among subphenotypes were evaluated via classic statistics. Kaplan-Meier and cumulative incidence curves were obtained to evaluate longitudinal outcomes. Results: Among 1538 patients from ASSESS-AKI, we included 748 AKI patients in the analysis. The median follow-up time was 4.8 years. We discovered 4 subphenotypes via unsupervised learning. Patients with AKI subphenotype 1 (injury cluster) were older (mean age +/- SD): 71.2 +/- 9.4 (p<0.001), with high ICU admission rates (93.9%, p<0.001) and highly prevalent cardiovascular disease (71.8%, p<0.001). They were characterized by the highest levels of KIM-1, troponin T, and ST2 compared to other clusters (P<0.001). AKI subphenotype 2 (benign cluster) is comprised of relatively young individuals with the lowest prevalence of comorbidities and highest levels of systemic anti-inflammatory makers (IL-13). AKI Subphenotype 3 (chronic inflammation and low injury) comprised patients with markedly high pro-BNP, TNFR1, and TNFR2 concentrations while presenting low concentrations of KIM-1 and NGAL. Patients with AKI subphenotype 4 (inflammation-injury) were predominantly critically ill individuals with the highest prevalence of sepsis and stage 3 AKI. They had the highest systemic (IL-1B, CRP, IL-8) and kidney inflammatory biomarker activity (YKL-40, MCP-1) as well as high kidney injury levels (NGAL, KIM-1). AKI subphenotype 3 and 4 were independently associated with a higher risk of death compared to subphenotype 2. Moreover, subphenotype 3 was independently associated with CKD outcomes and CVD events. Conclusion: We discovered four clinically meaningful AKI subphenotypes with statistical differences in biomarker composites that associate with longitudinal risks of adverse clinical events. Our approach is a novel look at the potential mechanisms underlying AKI and the putative role of biomarkers investigation. Acute kidney injury (AKI) is a heterogeneous syndrome that may occur in the context of diverse clinical scenarios including infection, acute organ failure, or hemodynamic changes, among others. 1, 2 Despite the broad definition, clinicians intuitively have classified AKI into 'pre-renal', 'renal' (or intrinsic), and 'post-renal' causes. 1, 3 However, given the complexity of such disease processes, multiple insults may co-exist and induce different biological disturbances leading to AKI and adverse outcomes. Ascertainment of pathological pathways, duration of disease, and prediction of clinical events is difficult via traditional measures such as serum creatinine fluctuations, urine studies, and urine output changes. 4 Furthermore, kidney function following AKI in the short to medium term can display several non-linear trajectories over time (including ongoing recovery) that may not associate with meaningful laboratory abnormalities, nor the etiology of AKI adjudicated during the acute hospitalization. 1, [4] [5] [6] This represents a major limitation to providing individualized care during the acute setting and in the post-AKI stage. Biomarkers reflective of key biological pathways such as kidney injury, inflammation, and health have emerged over the past 10 years showing their potential to anticipate the diagnosis of acute kidney injury (AKI) and forecast adverse outcomes in this population including disease duration, dialysis needs, and mortality. 7,8 Furthermore, among CKD patients, some of these biomarkers have been demonstrated to strongly predict the advent of end-stage kidney disease (ESKD) in patients with and without diabetes mellitus. 9-15 However, their ability to outperform to less expensive and routinely collected laboratory tests such as creatinine and urine albumin-tocreatinine ratio (uACR) has been questioned. 16 While efforts have been made to integrate individual biomarkers (or panels) into clinical prediction models, standard regression models may be limited to evaluate markedly heterogeneous and high-dimensional data, thereby compromising the researcher's ability to find clinical significance from these. Importantly, the . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; importance of incorporating 'big data' to enrich our perspective on AKI has been recently acknowledged 15 th Acute Dialysis Quality Initiative (ADQI). 17 Unsupervised learning plays an important role in data summarization and preliminary structure identification of complex and highly variable data. 18, 19 Considering the complex nature of AKI, contributing clinical factors, and implicated mechanisms, unsupervised clustering has the potential to serve as an informative analytical tool to identify AKI subphenotypes from highdimensional data. 17, 20, 21 Previous studies suggested that AKI subphenotypes based upon tumor necrosis factor receptors (TNFR) and angiopoietin-2/angiopoietin-1 ratio could serve to identify patients' responsiveness to vasopressors and short-term adverse outcomes in a critically ill population via latent class analyses. 22 Furthermore, our group demonstrated that unsupervised learning (deep learning) integrating routine laboratory parameters and electronic health record (EHR) data could lead to subphenotypes associated with short-term kidney outcomes in a sepsis-AKI population. 23 The Assessment, Serial Evaluation, and Subsequent Sequelae in Acute Kidney Injury (ASSESS-AKI was a prospective cohort of AKI survivors whose clinical and biomarker data was rigorously collected on admission and in a protocolized fashion after discharge. 24 While data on an individual or small set of post-operative biomarkers and their associations with AKI and CKD are extensive in the literature, [25] [26] [27] adequate integration of multiple systemic and kidney-related biomarkers of inflammation, injury, and fibrosis along with clinical data to investigate distinct AKI . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The ASSESS-AKI study was a multicenter, prospective, matched parallel cohort of 1538 hospitalized adult participants who developed AKI and matched individuals who did not have AKI and who survived to complete an in-person visit at 3 months after discharge. 24 Patients aged 18 and over were enrolled from the medical and surgical floors and intensive care units (ICUs) in 4 US-based medical centers with matching criteria at each institution for baseline covariates, estimated glomerular filtration rate (eGFR), urine albumin-to-creatinine (uACR) levels and AKI stage. Patients had pre-admission serum creatinine measurements obtained within 1 year of the index hospitalization as outpatients. The etiology of AKI was not adjudicated during the hospitalization. Exclusion criteria for ASSESS-AKI were broad and included the presence of acute glomerulonephritis, hepatorenal syndrome, multiple myeloma, malignancy, urinary obstruction, severe heart failure, kidney replacement therapy (dialysis, transplant) prior to hospitalization, pregnancy, or predicted survival ≤ 12 months. For the present study, the focus of our analyses were those patients who had AKI as defined by the ASSESS-AKI protocol during the hospitalization who survived 3-months after discharge (N=769). AKI was defined as a relative increase of ≥ 50% or absolute increase of ≥ 0.3 mg/dL in peak inpatient serum creatinine concentration above the baseline creatinine. AKI stage was classified following the Acute Kidney Injury Network (AKIN) guidelines. CKD status and eGFR calculation were obtained using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. Participants had study visits at 3 and 12 months after discharge, and annually thereafter (interim phone contacts every 6 months) for assessment of biomarker data and adjudication of clinical events. Variables including demographic characteristics and . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; comorbidities were obtained from medical records. Serum creatinine was measured using an isotope dilution mass spectrometry-traceable enzymatic assay (Roche Di-agnostics, Indianapolis, IN), and a random spot urine protein-to-creatinine ratio using a turbidimetric method (Roche). Biomarker quartiles were defined within the AKI cohort via different diagnostic assays. Plasma and urine samples were collected within 96 hours of the diagnosis of AKI. Systemic biomarkers of inflammation included: IFN-γ, IL-4, IL-13, TNF-α, IL-1β, IL-2, IL-6, IL-8, IL-10, and IL-12. Kidney biomarkers of inflammation were: TNFR1, TNFR2, YLK-40, MCP-1, and IL-18, whereas biomarkers of kidney injury included: KIM-1, NGAL, uACR. UMOD was evaluated to inform on tubule health. Biomarkers of cardiac congestion, injury, and remodeling/fibrosis, respectively, included: pro-BNP, Troponin T, ST2 and Gal-3. Blood samples were collected in ethylenediamine tetra-acetic acid tubes and centrifuged to separate plasma. Sample underwent a single controlled thaw, were centrifuged at 5000xg for 10 minutes at 4*C, separated into 1 mL aliquots, and immediately stored at -80 o C until biomarkers were measured. Urine biomarkers were obtained in a protocolized fashion and measured via multiplex assay or by using Meso Scale Diagnostics when appropriate. Details on biomarker collection and processing have been described previously. 26 We evaluated three main outcomes: Composite kidney outcome (CKD incidence or progression), cardiovascular events, and death. Among patients without underlying CKD, CKD incidence was defined as >25% reduction in the eGFR from baseline; or reaching CKD 3 or worse during follow-up. Among patients with underlying CKD (eGFR <60 ml/min/1.73m 2 during . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; hospitalization), CKD progression was ascertained as >50% reduction in the baseline eGFR; or progression to CKD 5 or development of ESKD on follow-up (hemodialysis or peritoneal dialysis requirement at least once/week for >12 weeks, receiving a kidney transplant and/or death while on dialysis). Cardiovascular events were a composite of myocardial infarction, heart failure, cerebrovascular accidents, peripheral artery disease or coronary or cardiovascular intervention. Death was ascertained from proxy reports, medical records, and death certificate data. Patients' demographics, comorbidities and a comprehensive panel of systemic and organspecific biomarkers were included during data preparation. First, for numerical variables, outliers were adjusted using 95% winsorization. In our study, observations greater than 97.5% were set to 97.5%, and observations smaller than 2.5% were set to 2.5%. Second, variables showing leftskewed distribution were log-transformed and examined whether the mode leans to the median than the first quartile. Third, variables were normalized using a robust scaler, 28 spareness of variables, which is known as a curse of dimensionality, and multicollinearity. While . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; several methods exist for dimensionality reduction to address these challenges, such as matrix factorization 31,32 and deep autoencoder, 33,34 FAMD is a recently developed principal component analysis -family method that simultaneously explores multivariate dependencies between both categorical and numerical variables. 14 out of 53 principal components (eigenvalues >1) were selected based on the Kaiser-Guttman criterion. We conducted hierarchical agglomerative clustering in which each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy. We used ward linkage with Euclidean distance applied to 14 principal components. The number of subphenotypes was determined through the visual evaluation of the dendogram as well as results from 26 indexes available from the NbClust package in R. 35 Furthermore, the robustness of our clustering algorithms was evaluated through alternative techniques including K-means and consensus clustering. After obtention of clusters, we described the clinical and biomarker distribution per cluster group. Final Shapley additive explanations (SHAP) diagrams were obtained to assess the importance of intervening features on each subphenotype by comparing the XGBoost models with and without the presence of such specific feature. Descriptive statistics for continuous variables were reported as mean (standard deviation) or median (interquartile range) accordingly for each of the subphenotypes. Categorical variables were presented as frequencies and percentages. Clinical, demographic and biomarker concentration differences among subphenotypes were evaluated via the analysis of variance (ANOVA) or Chi-square test when appropriate. Cox proportional hazard models were used compare time to censored kidney events (CKD incidence or progression), CVD events and . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; death with death up to 6 years of follow-up. These models were fully adjusted for age, gender, diabetes mellitus (yes/no), BMI, baseline CKD (yes/no), and baseline UACR. Two-tailed P values of less than 0.05 were considered to indicate statistical significance. We performed all statistical analyses using R software version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) and Python 3.9.3. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; Among 1538 participants included in ASSESS-AKI, 748 AKI individuals had full biomarker and data of interest and were included in the analysis. The median follow-up time for clinical events was 4.8 years. Participants were predominantly men (68%), from Caucasian ethnicity (79%) with a mean (± SD) of 64 (13) . The prevalence of baseline for key covariates such as CKD (eGFR<60 ml/min/1.73 m 2 ), cardiovascular disease, congestive heart failure (CHF), hypertension and diabetes mellitus were: 39.7%, 48%, 26.3%, 78.6% and 50%, respectively. The overall mean (± SD) baseline eGFR was 66.8 (24.9) ml/min per 1.73m 2 and the overall mean (± SD) uACR was 2.2 mg/g in this population. A large proportion of the cases of AKI were stage 1 (72.6%), while AKI stage 2 (15%) and 3 (12.4%) were less frequent. Full baseline and AKI clinical characteristics are reflected in Table 1 . Using the 53 biomarker and clinical characteristics, the hierarchical clustering algorithm identified four clusters that best represented the data patterns of our AKI population. A diagrammatic representation (dendogram) used to ascertain the arrangement of the clusters produced by the corresponding analyses revealed four clusters (Figure 1 ). The Uniform Manifold Approximation and Projection (UMAP) was used for dimensionality reduction and cluster visualization revealing adequate independence (Figure 1) . Additionally, the robustness of these four clusters was established by comparing with another unsupervised algorithm (Kmeans clustering); visualized through the heatmap and UMAP in . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; The distributions of the 53 biomarker and clinical characteristics were statistically different between the four clusters ( Table 2) . Subphenotype 1 (N=181) was characterized by older individuals compared to subphenotype 2, 3 and 4, with age (mean ± SD): 71 ± 9 (p<0.001), with high ICU admission rates (94%, p<0.001), moderate rates of baseline CKD (38%, p<0.01) and the highest prevalence of baseline cardiovascular disease (72%, p<0.001) among all subphenotypes. Mean (SD) peak creatinine level at the time of diagnosis of AKI was 1.1 (0.3). Furthermore, individuals in subphenotype 1 had the highest rates of stage 1 AKI (91%). .001) and CHF (55.3%, p<0.001) compared to other subphenotypes. These patients presented the highest mean (SD) peak serum creatinine (1.9 ± 0.6) and albuminuria (5 ± 7.7) during the hospitalization. Also, the frequency of AKI stage 1, 2 and 3 were 74.8%, 9.4% and 15.7% respectively within this subphenotype. Patients in subphenotype 4 had markedly high ICU admission rates (73%, p<001) and the highest incidence of sepsis (51%, p<0.001) among all clusters. Furthermore, these individuals had comparatively lower rates of CKD (25.9%, p<0.001) and baseline cardiovascular disease (29.1%, p<0.001) than subphenotype 3. Notably, approximately 1 in every 3 patients within this subphenotype had stage 3 AKI, with mean (SD) peak serum creatinine of 3.2 (1.8) and albuminuria levels at 3.473 g/mg (5.7). The characteristics within each AKI subphenotype are described in Table 2 . . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; In terms of biomarker composites, individuals with AKI in subphenotype 1 were characterized by the highest mean troponin T levels recorded among all clusters. Additionally, biomarkers of cardiac congestion (pro-BNP) and remodeling (ST2) were also higher compared to subphenotype 2. Biomarkers of systemic (IL superfamily) and kidney inflammation such as TNFR1, TNFR2, were comparatively lower to subphenotypes 3 and 4. However, AKI patients in this subphenotype presented higher KIM-1 levels compared to subphenotype 2. AKI patients in They also expressed the lowest levels of cardiac injury activity. Full descriptions of biomarker composites per cluster are reflected in Table 2 and visualization of selected clinical and biomarker signatures are seen in Figure 4 . Table . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; 3. AKI subphenotype 3 and 4 were independently associated with an aHR of death at 2.9 (95% CI: 1.8 -4.6, p<0.001). In terms of CKD outcomes, subphenotype 3 was independently associated with an aHR of 2.6 (95% CI: 1.6 -4.2, P<0.001) in fully adjusted models (Table 3) . Furthermore, subphenotype 3 was independently associated with the risk of CVD events with an adjusted HR of 2.6 (95% CI: 1.6 -4.1, P<0.002). Figure 5 shows the forest plot of the subphenotypes and HRs for each outcome. Supplemental figures 6-7 show different K-M curves for each outcome and subphenotype. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; We uncovered four clinically and molecularly distinct AKI subphenotypes through a series of unsupervised machine learning algorithms. These clusters are characterized by unique biological signatures as evaluated by a comprehensive panel of systemic and organ-specific serum and urine biomarkers reflective of different axes of disease and health. Biomarker composites were statistically different from each other and informed on the complexity of AKI syndromes through an unbiased approach. Furthermore, AKI subphenotype 3 and 4 were independently associated with 1.6 -2.9-fold risk of death when compared with AKI subphenotype 2 (benign/reference cluster). Furthermore, AKI subphenotype 3 was independently associated with longitudinal CKD outcomes and CVD events in fully adjusted models among those patients who survived to hospitalization. AKI has been defined on the basis of the KDIGO criteria using serum creatinine and urine output changes. While this tool has increased our capacity to identify AKI, there is vast evidence suggesting that molecular changes are present before meaningful serum creatinine and UOP changes are noticed clinically. 8,36 Furthermore, detectable tubular-interstitial abnormalities have been demonstrated to associate with AKI duration, dialysis needs and protracted kidney outcomes in large prospective studies. 37-41 Moreover, the fine balance repair and maladaptive processes has been postulated to mediate the progression towards CKD. 42 However, conceptualizing the complex nature of AKI and the intervening biological processes leading to distinct clinical trajectories is challenging for the clinician. This is in part due to large amounts of patient data but also due to the linear regressions used to evaluate the associations between individual or small set of biomarkers and outcomes. We demonstrated that machine learning algorithms allow us to deal with such data heterogeneity in an unbiased manner and discover clusters with unique clinical and molecular characteristics. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; AKI subphenotype 1 (injury-predominant cluster) is characterized by senior patients with comparatively high levels of troponin T and KIM-1. Furthermore, these individuals presented higher levels of MCP-1 and YKL-40, which are markers of kidney and heart inflammation, compared to subphenotype 2 and 3. Yet, the majority of these patients had stage 1 AKI and had as low peak serum creatinine levels as compared to subphenotype 2 (benign AKI). These findings highlight the importance of incorporating additional tools for estimating kidney function in patients susceptible to muscle mass remodeling but also, to identify individuals at risk for clinical events. 43 There are several clinical scenarios that could be represented by these findings, including patients with myocardial infarction or pulmonary embolism who are susceptible to hemodynamic instability, or patients with active cardiovascular disease undergoing contrast-based studies. Patients in AKI subphenotype 3 had the highest prevalence of baseline cardiovascular disease, CKD, and heart failure. These patients presented the highest, pro-BNP, TNFR1 and TNFR2 levels and lowest baseline eGFR (38.9 ± 17 ml/min per 1.73 m 2 ). Although these patients had the highest peak serum creatinine (3.4 ± 1.5) levels during hospitalization, they had low concentrations of urine IL-18, NGAL and KIM-1. A potential explanation relies in the time of collection of NGAL, which tends to have a short half-life compared to other biomarkers. Alternatively, we hypothesize that this scenario is plausible in the context of cardiorenal AKI, where creatinine fluctuations do not necessarily follow from acute injury marker elevations. AKI subphenotype 3 was independently associated with the worst outcomes such as death, CKD, and CVD events. TNFR1 and TNFR2 are relatively more stable markers of inflammation compared to other members of the TNF-alpha superfamily and elevated levels can be encountered in acute and chronic inflammation. Elevated TNFR levels has been found to be prognostic of cardiovascular disease, kidney disease progression and death among patients in . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted December 16, 2021. ; the pre-and post-AKI stage as well as in patients with stable CKD with and without diabetes mellitus. As demonstrated in our study, patients these levels could be elevated and produced in the heart and vasculature while not necessarily driving worse AKI pathophysiology. Yet, given their capability to orchestrate diverse pathological cellular changes in the heart, kidney, vasculature, they have postulated as a redundant pathway leading to death. Patients in this cluster could resemble those with advanced heart and kidney disease, who present with cardiorenal syndrome. Conversely to prior data, it is possible that patients with creatinine fluctuations in the context of ischemic or hemodynamic changes (i.e. cardiorenal syndrome) without significant tubular injury activity could still present increased morbidity and risk of clinical events, mediated by these TNFR1 and TNFR2. Our findings could potentially serve to gain insight on the diagnostic role and clinical relevance of biomarkers when incorporated into multi-dimensional models. Biomarker composites were . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted December 16, 2021. ; statistically different within each subphenotype and denotes those different degrees of biological disturbances could identify the nature of AKI and inform on future clinical events. In the advent of new therapies that could potentially modulate inflammatory-predominant type kidney disease (i.e. SGLT2 inhibitors) or fibrosis (i.e. mineralocorticoid receptor antagonists), identifying these pathologic domains could be build up in process of exploring therapeutic targets in AKI or in the post-AKI stage; understanding that all AKI are not created equal. Our analyses were robust and validated through different unsupervised algorithms and our data builds upon previous analyses showing that AKI and CKD subphenotypes with can eventually lead to different outcomes. While these studies do not inform on underlying biological pathophysiology, our study is novel by incorporating biomarkers of key domains involved in kidney disease progression. Future studies may attempt to explore whether individuals within the same AKI subphenotype could share similar proteins and transcripts responsible of the biomarker composites and clinical outcomes evidenced in our cohort (multi-omic level). Using this clustering approach could assist with the challenges of stratifying AKI in different scenarios such as sepsis, acute cardiac injury, community-acquired AKI, and advanced organ disease. By using a different approach (latent class analysis and K-means clustering), previous data has demonstrated the presence of two subphenotypes; one driven by congestive heart failure and benign biomarker profile and the other characterized by prevalent CKD and high concentrations of unfavorable biomarker profile when integrating 29 variables. 44 The present study was characterized by more ample clinical and biomarker data integrating 53 variables, which could have resulted in different clustering representation. In detail, we applied several sophisticated methods to transform the raw data into different dimensionality, allowing us to use extensive mixed-type variables, and tackle two typical high dimensionality problems (sparseness and multiple chain imputations). These efforts turned into accessing granular subphenotypes where . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted December 16, 2021. ; https://doi.org/10.1101/2021.12.14.21267738 doi: medRxiv preprint each subphenotype was well separated from the other, as shown in Figure 1 (dendrogram). In this context, it is possible that our 4 AKI subphenotypes 'bisect' the aforementioned 2 subphenotypes (congestion cluster with benign biomarker profile and CKD driven with unfavorable biomarker profile); thereby collapsing subphenotype 1 and 2 into one major cluster and subphenotype 3 and 4 into another single one. The angle of this study is unique and complementary to previous data, since we aimed to identify potentially useful 'data patterns' over statistically and numerally significant features that emulate real world data abundance. However, we acknowledge the potential trade with power reduction at the time of time to event analyses by deriving 4 subphenotypes instead of 2. The heterogeneity and clinical relevance found in the present study should be confirmed in additional studies with larger sample sizes. In terms of limitations, due to the design of the ASSESS-AKI study, only patients who survived to the 3-month follow-up visit were included in the longitudinal data collection and analyses. Thus, there is some selection bias present. However, patients who die in-hospital or immediately following hospitalization are not in need of subphenotyping, and risk-stratifying for long-term outcomes. Our subphenotypes may not inform on those with the highest risk profile. Also, despite the heterogeneity of our AKI population (medical/surgical floors, ICU), a high proportion AKI cases were stage 1, which could carry a lower risk of CKD events compared to more severe stages. Most of our patients were white and we did not have a validation cohort, which limits the generalizability of our subphenotypes especially given the uniqueness of the biomarkers tested and the study design. Since this is data-driven approach, data pattern aggregation and cluster membership depend on the input of data. Therefore, if the input variables were to be modified, we could expect different AKI subphenotypes. Furthermore, . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted December 16, 2021. ; reproducing these 4 novel clusters within each participating center in ASSESS-AKI is challenging, and potentially explained by the participating population in each center. For example, it may be difficult to reproduce AKI subphenotype 1 (cardiac/kidney injury predominant) in the surgical floor or AKI subphenotype 4 (injury-inflammation predominantly in the context of sepsis) in a cardiac care unit. Also, while our clusters describe altered biological pathways, they do not inform on the pathophysiology of AKI and genomic/transcriptomic analyses are needed. Moreover, cluster analysis summarizes patients' heterogeneity of large data sets into discrete categories, which may result in loss of information in exchange for better clinical interpretation. Finally, subphenotype discovery depend largely on data input and feature characteristics, resulting in different feature aggregation and clinical/biomarker composites. Therefore, these subphenotypes may not necessarily produce in other settings (e.g. and further studies are needed. We discovered 4 novel and clinically meaningful and statistically different AKI subphenotypes that inform on potential biological pathway abnormalities that associate with different risks for adverse clinical events and death. To the best of our knowledge, our data are novel as it comprehensively assessed the biological and statistical role of a comprehensive panel of biomarkers when obtained during the diagnostic phase of AKI. Biomarker composites within each cluster informed on the complexity of AKI syndromes through an unbiased approach. This study provides a novel look to the role of biomarkers in the diagnosis of AKI and their association to clinical events via multidimensional analyses. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 16, 2021. ; he also serves as an associate editor for the Clinical Journal of the American Society of Nephrology. CRP reports personal income and equity and stock options from Renaltyix; he also reports personal income from Genfit Biopharmaceutical Company and Akebia Therapeutics. E.M., C.R. and M.J.K. are employees of Randox but do not own shares in the company. The remaining authors declare that they have no relevant financial interests. GNN is also supported by R01DK108803, U01HG007278, U01HG009610, and 1U01DK116100. Furthermore, GNN receives financial compensation as consultant and advisory board member for RenalytixAI, Inc, and owns equity in RenalytixAI. GNN is a scientific cofounder of RenalytixAI. Dr Nadkarni has received operational funding from Goldfinch Bio and consulting fees from BioVie Inc and GLG consulting in the past 3 years. WO is supported by Grant Number T32 DK007757 from NIH NIDDK. The remaining authors have no disclosures to report. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Acute kidney injury Acute kidney injury and chronic kidney disease as interconnected syndromes Sub-Phenotypes of Acute Kidney Injury: Do We Have Progress for Personalizing Care? Recommendations on Acute Kidney Injury Biomarkers From the Acute Disease Quality Initiative Consensus Conference: A Consensus Statement Timing of Recovery From Moderate to Severe AKI and the Risk for Future Loss of Kidney Function A prospective cohort study that examined 10 Circulating TNF receptors 1 and 2 predict ESRD in type 2 diabetes Circulating TNF receptors 1 and 2 predict progression of diabetic kidney disease: A meta-analysis Circulating TNF receptors 1 and 2 predict stage 3 CKD in type 1 diabetes Kidney injury molecule-1 and the loss of kidney function in diabetic nephropathy: a likely causal link in patients with type 1 diabetes Plasma Biomarkers and Kidney Function Decline in Early and Established Diabetic Kidney Disease Plasma kidney injury molecule-1 (p-KIM-1) levels and deterioration of kidney function over 16 years Urine biomarkers of tubular injury do not improve on the clinical model predicting chronic kidney disease progression Acute kidney injury in the 20 Big Data in Nephrology Acute Kidney Injury and Big Data Identification of Acute Kidney Injury Subphenotypes with Differing Molecular Signatures and Responses to Vasopressin Therapy Utilization of Deep Learning for Subphenotype Identification in Sepsis-Associated Acute Kidney Injury The assessment, serial evaluation, and subsequent sequelae of acute kidney injury (ASSESS-AKI) study: design and methods Plasma Soluble Tumor Necrosis Factor Authors' contributions: GVR* and WO* contributed equally to the elaboration and intellectual