key: cord-0745499-rokh0s2s
authors: Di Castelnuovo, A.; Gialluisi, A.; The COVID-19 RISK and Treatments Collaboration,; Iacoviello, L.
title: Disentangling the association of hydroxychloroquine treatment with mortality in Covid-19 hospitalized patients through Hierarchical Clustering
date: 2021-01-29
journal: nan
DOI: 10.1101/2021.01.27.21250238
sha: eab4f566b559f81f1c998a1377e53c8b4d1d16c3
doc_id: 745499
cord_uid: rokh0s2s

The efficacy of hydroxychloroquine (HCQ) in treating SARS-CoV-2 infection is harshly debated, with observational and intervention studies reporting contrasting results. To clarify the role of HCQ in Covid-19 patients, we carried out a retrospective observational study of 4,396 unselected patients hospitalized for Covid-19 in Italy (February-May 2020). Patients characteristics were collected at entry, including age, sex, obesity, smoking status, blood parameters, history of diabetes, cancer, cardiovascular and chronic pulmonary diseases and medications in use. These were used to identify subtypes of patients with similar characteristics through hierarchical clustering based on Gower distance. Using multivariable Cox regressions, these clusters were then tested for association with mortality and modification of effect by treatment with HCQ. We identified two clusters, one of 3,913 younger patients with lower circulating inflammation levels and better renal function, and one of 483 generally older and more comorbid subjects, more prevalently men and smokers. The latter group was at increased death risk adjusted by HCQ (HR[CI95%] = 3.80[3.08-4.67]), while HCQ showed an independent inverse association (0.51[0.43-0.61]), as well as a significant influence of cluster*HCQ interaction (p<0.001). This was driven by a differential association of HCQ with mortality between the high (0.89[0.65-1.22]) and the low risk cluster (0.46[0.39-0.54]). These effects survived adjustments for additional medications in use and were concordant with associations with disease severity and outcome. These findings suggest a particularly beneficial effect of HCQ within low risk Covid-19 patients and may contribute clarifying the current controversy on HCQ efficacy in Covid-19 treatment.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint Hydroxychloroquine (HCQ) is an antimalarial drug suggested to be effective in inhibiting Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-Cov-2) replication in vitro [1, 2] . Indeed, HCQ is characterized by anti-viral, anti-inflammatory, and anti-thrombotic actions, contrasting the main disruptive effects of SARS-CoV-2 infection on the organism [3] . For this reason, it has been heavily used in treating patients affected by SARS-CoV-2 infection related disease (commonly known as , especially in the first phases of the current pandemics, when Covid-19 was quite unknown [4] .

Despite these elements and initial suggestive evidence of efficacy based on daily clinical practice, in the last months the potential benefit of HCQ for Covid-19 patients has been harshly debated [3, 5] . In particular, evidence supporting protective effects from observational studies [6] [7] [8] [9] [10] [11] [12] [13] was in contrast with that suggesting no effect at all by recent randomized clinical trials (RCTs) [14] [15] [16] [17] [18] .

More recently, a meta-analysis combining both RCTs and observational studies over more than 44,000 patients supported a protective effect of HCQ, driven by the findings of observational studies [5] . A potential explanation for this discrepancy may be due to the usually high dosage administered in RCTs (800 mg/day), compared to lower dosages reported in observational studies supporting HCQ efficacy (≤400 mg/day), as hypothesized elsewhere [5] . As an alternative explanation, it is likely that the efficacy of HCQ treatment for Covid-19 may vary across patients and is influenced by subtypes of the disease, which in turn is largely dependent on patients' characteristics and their non-linear combinations [19] . In this "personalized medicine" view, response to HCQ may not be the same across all patients of the same age, or with similar circulating inflammation levels. In order to identify these combinations, the use of big data like Electronic Health Records (EHRs) and of machine learning (ML) algorithms to interpret hidden HCQ response patterns is of fundamental importance. Notwithstanding it, to our knowledge only one study attempted so far a similar approach through the application of a supervised ML technique (gradient boosting), to identify those patients with likely beneficial effects of HCQ treatment. Interestingly, authors reported a reduction of in-hospital mortality within patients treated with HCQ, which was . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint even more pronounced within those patients predicted to benefit most from the drug, in line with expectations [19] .

Here, we attempted a personalized Covid-19 patient characterization to better disentangle the beneficial effects of HCQ previously reported within the COVID-19 RISK and Treatments (CORIST) study, a large retrospective cohort of patients hospitalized for SARS-CoV-2 infection in Italy [13] . This approach consisted of i) identifying the existence of subtypes of Covid-19 patients through an unsupervised ML algorithm -hierarchical clustering -comparing their characteristics and their clinical risks, ii) testing the resulting patients' clusters for association with mortality and modification of effect by treatment with HCQ and iii) analyzing potential interactions between clusters and HCQ use. This approach represents a prominent example of how personalized medicine may support clinicians in Covid-19 treatment.

The COVID-19 RISK and Treatments (CORIST) study includes 4,396 patients hospitalized for SARS-Cov-2 infection in 35 hospitals across Italy, between February 2020 and May 2020.

Molecular diagnosis of SARS-CoV-2 infection was based on polymerase chain reaction (PCR) of viral DNA extracted and amplified from nasopharyngeal swabs. Within each participating hospital, clinical data were abstracted at one-time point from electronic medical records or charts, and collected using either a centrally-designed electronic worksheet or a centralized web-based database. Collected data included patients' demographics, laboratory tests, medications in use, history of disease and prescribed pharmacological therapy for Covid-19 treatment. For each participant, the study index date was defined as the date of hospital admission, while the study end point was death. Follow-up time was computed as the time between the index date and death, or alternatively between the index date and the date of discharge, applying right-censoring. Further details on the study are reported elsewhere [13, 20, 21] .

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint

All analyses were carried out in R 4.0.2 [22] . We applied a hierarchical clustering analysis on Covid-19 patients using their main clinical, lifestyles and socio-demographic characteristics, which were suggested as the most influential on mortality risk by previous studies in the field [21, 23, 24] .

These included age (years), sex, glomerular filtration rate (eGFR, mL/min/1.73 m 2 ) and highsensitivity plasma C-reactive protein levels (mg/L) at in-hospital admission, obesity (body mass index (BMI) ≥ 30 kg/m 2 ), hypertension (Yes/No), smoking status (never/previous/current smoker), as well as history of myocardial infarction, heart failure, chronic pulmonary disease, cancer and diabetes (Yes/No). Missing data were imputed through a k-Nearest Neighbor approach, implemented in the knn() function of the VIM package (with k=10) [25] . CRP was transformed on the natural logarithm scale to reduce skewness and all the continuous variables were normalized through the normalize() function in keras (see URLs).

Cluster analyses were then performed on the 4,396 patients, using the variables specified above, through the cluster package (see URLs). First, we computed a dissimilarity matrix based on Gower pairwise distance ( Figure S1 ), through the daisy() function. Gower distance is a parameter in the [0;1] range representing the average of partial dissimilarities across individuals (the higher the distance, the more dissimilarities for a given pair of subjects) [26] . Second, we performed hierarchical clustering through the hclust() function applied to the Gower distance matrix computed above, which separated subjects based on their degree of pairwise dissimilarity, both from lowest to highest (agglomerative clustering) and from highest to lowest (divisive clustering). Third, we determined the appropriate number of clusters for patient classification, based on the Average Silhouette method ( Figure S2 ). This computes the number of clusters which maximizes the average silhouette width, a measure of the quality of a clustering indicating how well each object lies within its cluster [27] . This method, applied through the fviz_nbclust() function of the factoextra package . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint (See URLs), computed k=2 as the optimal number of clusters. Finally, each patient was assigned to one of the two clusters determined above, through application of the cutree() function (cluster package) to the results of the cluster analysis previously carried out.

First, we compared the classifications made through agglomerative and hierarchical clustering, which revealed high consistency (Odds Ratio = 34.0 [26.7-43.7], Fisher Exact Test p-value < 10 -15 ).

In light of this homogeneity of classification and since divisive clustering has been reported to be more accurate and robust [28] , all the subsequent analyses were performed on cluster classification identified through the latter approach.

The two clusters of patients identified were then compared for all anamnestic variables mentioned above, through Fisher's Exact Test (for binary variables), Chi-squared test (for non-binary categorical variables), and through Student's t test or Wilcoxon Rank Sum tests (for continuous variables meeting and not meeting parametric assumptions, respectively). Similarly, we compared Covid-19 disease severity, classified by recruiting centers in asymptomatic/mild, non-severe pneumonia, severe pneumonia and acute respiratory distress syndrome (ARDS). Moreover, we compared the use of six common drugs for Covid-19 treatment between the two clusters, including hydroxychloroquine, antihypertensive drugs, anti-interleukin-6 antibody, antivirals (Remdesivir, Lopinavir, Darunavir) and corticosteroids. These were reported as binary variables (Yes/No) and

were therefore compared with clusters through Fisher Exact Tests.

Once the clusters were characterized, we modelled incident mortality risk as a function of patients clusters and use of HCQ through Cox Proportional Hazards (PH) models, using the cox.zph() function of the survival package (see URLs) [29] . Only patients with complete information needed in each model were included in the analysis (case-complete approach, see below). A preliminary . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint check of the basic Cox PH assumptions revealed no influential observations based on dfbeta residuals ( Figure S3a ), while Schoenfeld residuals tests revealed a statistical violation of the proportionality of hazards assumption, although these did not show any evident trend at a visual inspection ( Figure S3b ). For this reason, we carried out Cox PH models both with and without including an interaction term with time-to-event, a strategy commonly used to overcome this violation [30] . Incrementally adjusted models were analyzed: i) a crude model including only hazard ratios (HR) with 95% confidence intervals (95% CI) of dying, and HR with p-values below α = 0.05 were considered significant. To quantify the potential for unmeasured confounding effects, the E-value was calculated for all the HRs observed in Model 3, as described in [31] (see URLs).

This represents the minimum association required for a potential unmeasured confounder with both the exposure and the outcome to explain away the observed association between the latter. In other words, the higher the E-value, the harder it is to attribute an association to an unmeasured covariate [32] .

To better evaluate the associations of Covid-19 patients clusters and HCQ use with negative outcomes other than death, we built a composite endpoint based on the occurrence of at least one of the following outcomes: in-hospital death, access to intensive care unit during hospitalization or severe disease manifestation (either severe pneumonia or ARDS). In this case, the resulting binary variable was assigned a value of 1. Conversely, the variable got "0" value if one of the following alternative conditions applied: i) none of the above mentioned outcomes was verified; ii) a patient . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint survived without recurring to intensive cares or iii) without showing severe manifestations of the disease. Six patients with missing values on survival were removed. Then, we modelled the risk of manifesting a bad outcome through a logistic regression (glm() function in R), modelling both additive and interactive models of Covid-19 patients cluster and HCQ use, as above. This analysis was motivated by the fact that the curse of disease often differs across patients, e.g. with some subjects with less severe forms suddenly worsening their conditions until death and others having severe manifestations but still surviving, possibly thanks to intensive cares. Therefore, a composite outcome variable represented a robust way to measure potential risk/protective effects of patients' clusters and HCQ use.

While both agglomerative and divisive clustering approaches were developed, all statistical analyses presented below are based on clusters identified through the latter approach, since this showed a high homogeneity with the results of agglomerative clustering (see Methods section) and divisive clustering has been reported to be more accurate and robust [28, 30] .

We identified two clusters of Covid-19 patients (with N = 3,913 and 483, Figure 1 ). The larger cluster was younger (mean (SD) age: 65.2 (15.6) vs 77.9 (9.2) years; t-test = -25.9, p < 10 -15 ), with better kidney function (eGFR: 77.9 (26.9) vs 52.6 (25.6) mL/min/1.73 m 2 ; t-test = 20.3, p < 10 -15 ) and lower circulating inflammation levels (CRP: 34.6 (62.4) vs 36.5 (58.6) mg/L; Wilcoxon-test = 847,660, p = 2.5×10 -14 ) (Figure 2a-c) . Conversely, BMI did not show strong differences between the two clusters (28.0 (4.2) vs 27.5 (4.2) Kg/m 2 ; t-test = 2.3, p = 0.02; Figure 2d ). Moreover, patients belonging to the larger cluster were less frequently men (60% vs 75%) and smokers (11.5% vs 19.5%) and showed a lower prevalence of chronic health conditions like myocardial infarction, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint heart failure, diabetes, hypertension, cancer and lung disease (all p<0.0001), while no significant difference was observed in the prevalence of obesity (Table 1) .

Clusters were also associated with severe Covid-19 disease manifestations, with 65.8% of patients in the smaller cluster presenting with either severe pneumonia or ARDS, compared to 45.9% in the larger cluster (Chi-squared = 76.4, p < 10 -15 ; Table S1 ). For the characteristics mentioned above, the large and small clusters will be hereafter named as "low risk" and "high risk" cluster. When we compared the use of specific drugs, in the high risk cluster we observed a less frequent use of HCQ (p < 0.001) and of Lopinavir/Darunavir (p < 0.05) and a more frequent use of corticosteroids and antihypertensive medications (p < 0.001), compared to the low risk cluster (Table S2) .

In Cox PH regressions modelling mortality risk as a function of clusters (Model 1), we analyzed 4,319 patients with a case-complete approach, with a total of 799 deaths and a total follow-up 73,924 person-days (median 13 days). In this model, patients belonging to the high risk cluster showed a significant increase of incident mortality risk, compared to those of the low risk cluster is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint significant reduction in the high risk cluster (0.89 [0.65-1.22], p = 0.47). The above mentioned associations were quite robust against potential unmeasured cofounding effects, with E-values of 3.1, 2.8 and 2.5 for the associations of mortality risk with cluster 2, HCQ use and cluster*HCQ interaction in Model 3. Moreover, these associations remained substantially stable in Cox PH models including additional drugs in use ( Table 2 , Model 4), as well as in those including an interaction term with time (Table S3) .

When we modelled the risk of bad clinical outcomes of the disease -i.e. severe Covid-19 manifestations, access to intensive care unit or death -as a function of clusters and HCQ use, we observed results in line with survival analyses, with increased risk for cluster 2 and decreased risk for HCQ users, both in the additive and in the interactive model (Table 3) 

In the present work, we report a hierarchical clustering analysis applied to patients hospitalized for SARS-CoV-2 infection, which revealed the existence of two separate clusters of Covid-19 patients, based on their clinical and socio-demographic characteristics: one of younger patients with less comorbidities, lower circulating inflammation and better renal function ("low risk" cluster), and one of older and more comorbid patients, more prevalently men and smokers ("high risk" cluster). The former cluster showed a higher prevalence of severe manifestations of Covid-19, ranging from severe pneumonia to ARDS. Moreover, survival analyses showed an almost four-fold increase of incident in-hospital mortality for the high risk compared to the low risk cluster. Although a previous study attempted to identify subtypes of Covid-19 patients, associating them with disease severity . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint [33] , this represents the first attempt to use clustering in disentangling the effect of HCQ on different types of patients, by testing associations with incident in-hospital mortality risk.

Specifically, we tested and observed both additive and interactive associations of HCQ and Covid-19 subtypes. Indeed, the high risk cluster was consistently associated with increased mortality across all models, while treatment with HCQ was generally associated with a halving of death risk, in line with previous evidence from both observational [6] [7] [8] [9] [10] [11] [12] [13] and intervention studies [19] . While we already reported evidence suggesting a protective influence of HCQ against mortality in a largely overlapping sample [13] , here we have further deepened this relationship by testing and reporting a significant association between cluster-by-HCQ interaction and mortality, which was driven by a differential association within the two clusters. Indeed, the low risk cluster showed a significant "protective" influence of HCQ on in-hospital deaths, while the high risk cluster showed a concordant but non-significant association. This represents an element of novelty of the present study, since in our previous work we observed a "protective" association between HCQ and mortality within the totality of patients (about 75% of the current sample size), and when stratifying by age, sex and other characteristics [13] , but not within different subtypes of patients combining all these characteristics together in a non-linear setting, as can be built through unsupervised ML algorithms. Moreover, here our evidence is supported also by concordant associations with a composite and possibly more robust outcome of the disease, based on the occurrence of death, access to ICU and severity of manifestations.

Recently, an approach based on the definition of subtypes of Covid-19 patients has been already proven to be successful in identifying which patients benefit most from HCQ treatment, in a multicenter trial involving six US hospitals and 290 patients hospitalized for Covid-19, the IDENTIFY study [19] . HCQ treatment was associated with higher survival in the treated harm, and especially within those patients which were predicted to benefit most based on a supervised ML algorithm applied to their characteristics, which included blood pressure, heart rate, temperature, respiratory rate, oxygen saturation, white blood cell and platelet count, lactate, blood urea nitrogen, creatinine, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint and bilirubin levels [19] . Interestingly, lactate and creatinine levels where the most important features in this algorithm [19] , the latter representing an index of renal function, which was also a characteristic feature of the low risk cluster in the present study, where HCQ was more effective.

Moreover, patients eligible for HCQ treatment as derived by the algorithm of (Burdick et al., 2020) showed to be younger and less comorbid than the whole population studied, in line with the evidence reported in the present work, suggesting that HCQ treatment may be more effective for younger patients with better general health conditions.

Although to our knowledge this study represents the largest and broadest cluster analysis on Covid-19 patients and a novel approach in analyzing the influence of HCQ treatment on Covid-19 mortality and outcomes, it also presents few limitations.

First, the observational retrospective design does not allow us to completely control for confounders and randomization of treatments across individuals. The former aspect is quite unlikely since a potential residual confounder should be strongly associated with in-hospital mortality to take away observed associations in the interactive model, as suggested by the computed E-values [31, 32] . As for drug therapy, we cannot rule out that assignment to specific treatments was driven also by clinical conditions of the participants, as usually found in common clinical practice. For the same reason, the protective association observed for HCQ may be hypothesized to be driven by other coadministered medications. However, here HCQ and patients' cluster showed significant independent associations, which remained substantially stable across models and survived correction for other drugs in use for Covid-19 treatment. Last, our evidence is in contrast with RCTs published so far [14] [15] [16] [17] [18] , which are commonly conceived as the gold standard for establishing drug efficacy and safety. While we generally agree with this view, we would like to underline that these studies did not randomize patients to treatment arms based on combinations of their features, but rather based on single characteristics such as age and sex. This may be the reason for this . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint discrepancy, along with the hypothesis that the high dosage of HCQ administered in RCTs may be harmful for patients, compared to lower dosages reported in observational studies supporting HCQ efficacy [5] . Of interest, a recent critical review underlined the aspect of suboptimal randomization methods of RCTs, which often do not take into account the whole patient profile and disease severity and may lead to misleading conclusions [34] .

Overall, the evidence supported here and elsewhere [19] suggests that HCQ treatment may be more effective in specific subtypes of Covid-19 patients and indicates machine learning as a useful approach to identify the most "promising" patients in terms of success rate of this treatment.

Ideally, a trial administering low dosages of HCQ (≤400 mg/day) and randomizing subjects based on their Covid-19 subtype profile rather than on single characteristics may be warranted to clarify the effects of HCQ on mortality risk in SARS-CoV-2 infection, especially within those patients with a "low risk" profile.

This may help solving current controversies on the use of HCQ as a medication for Covid-19 and maximize the efficacy of treatment strategies for this yet largely unknown disease, especially in low-income and developing countries with poorer national health systems.

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint

In vitro antiviral activity and projection of optimized dosing design of hydroxychloroquine for the treatment of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

Design and synthesis of hydroxyferroquine derivatives with antimalarial and antiviral activities

Is hydroxychloroquine beneficial for COVID-19 patients?

Chloroquine/hydroxycloroquine and COVID-19: need to know more about

Low dose hydroxychloroquine is associated with lower mortality in COVID-19 a meta-analysis of 26 studies and 44521 patients

Treatment with hydroxychloroquine, azithromycin, and combination in patients hospitalized with COVID-19

COVID-19 in New York City

The Effect of Early Hydroxychloroquine-based Therapy in COVID-19 Patients in Ambulatory Care Settings: A Nationwide Prospective Cohort Study

Low dose of hydroxychloroquine reduces fatality of critically ill patients with COVID-19

Use of hydroxychloroquine in hospitalised COVID-19 patients is associated with reduced mortality: Findings from the observational multicentre Italian CORIST study

Effect of Hydroxychloroquine in Hospitalized Patients with Covid-19

Hydroxychloroquine with or without Azithromycin in Mild-to-Moderate Covid-19

Repurposed Antiviral Drugs for Covid-19 -Interim WHO Solidarity Trial Results

Hydroxychloroquine in Nonhospitalized Adults With Early COVID-19 : A Randomized Trial

Hydroxychloroquine for COVID-19: Balancing contrasting claims

Is Machine Learning a Better Way to Identify COVID-19 Patients Who Might Benefit from Hydroxychloroquine Treatment?-The IDENTIFY . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity

RAAS inhibitors are not associated with mortality in COVID-19 patients: Findings from an observational multicenter study in Italy and a meta-analysis of 19 studies

Common cardiovascular risk factors and in-hospital mortality in 3,894 patients with COVID-19: survival analysis and machine learning-based findings from the multicentre Italian CORIST Study

R: A Language and Environment for Statistical Computing

Prevalence and Associated Risk Factors of Mortality Among COVID-19 Patients: A Meta-Analysis

A clustering approach to classify italian regions and provinces based on prevalence and trend of sars-cov-2 cases

Imputation with the R package VIM

A General Coefficient of Similarity and Some of Its Properties

Silhouettes: A graphical aid to the interpretation and validation of cluster analysis

Divisive hierarchical maximum likelihood clustering

Modeling survival data : extending the Cox model. New . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity

A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms

Web Site and R Package for Computing E-values

Sensitivity analysis in observational research: Introducing the E-Value

Identification of COVID-19 Clinical Phenotypes by Principal Component Analysis-Based Cluster Analysis

Randomised controlled trials for COVID-19: evaluation of optimal randomisation methodologies-need for data validation of the completed trials and to improve ongoing and future randomised trial designs

Filippo Aucella27, Greta Barbieri28, Martina Barchitta29, Paolo Bonfanti30,31, Francesco Cacciatore32

Francesco Cipollone17, Claudia Colomba34, Crizia Colombo12, Annalisa Crisetti27, Francesca Crosta14, Gian Battista Danzi36, Damiano D'Ardes17

Jovana Milic10, Filippo Minutolo52, Roberta Mussinelli16, Cristina Mussini10, Maria Musso53, Anna Odone5, Marco Olivieri54, Antonella Palimodde50, Emanuela Pasi42, Raffaele Pesavento55, Francesco Petri30, Carlo A Pivato13

Ilaria Rossi17, Marianna Rossi30, Anna Sabena15, Francesco Salinaro15, Vincenzo Sangiovanni39, Carlo Sanrocco14, Nicola Schiano Moriello41, Laura Scorzolini58, Raffaella Sgariglia47, Paola Giustina Simeone14, Michele Spinicci49, Enrica Tamburrini8, Carlo Torti59, Enrico Maria Trecarichi59, Roberto Vettor55, Andrea Vianello43, Marco Vinceti4

Licia Iacoviello, MD, PhD; Department of Epidemiology and Prevention, IRCCS Neuromed, Via dell'Elettronica, 86077 Pozzilli (IS), Italy; Phone

E-mail: licia.iacoviello@moli-sani.org. Affiliations are reported in the Supplementary File

We thank the participating clinical centers included in this cohort. This Article is dedicated to all the patients who suffered or died, often in solitude, due to COVID-19; their tragic fate gave us moral strength to start and carry out this project. The Authors are responsible for the views expressed in this Article, which do not necessarily represent the views, decisions, or policies of the Institutions they are affiliated with.The COVID-19 RISK and Treatments (CORIST) Collaboration: Augusto Di Castelnuovo1^,

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint E-value calculator: https://www.evalue-calculator.com/

Raw data analyzed in the present work may be made available under approval by each local center involved in the study, in a way which does not affect patients' privacy.

ADiC, RDC and LI conceived the CORIST study. AG, ADiC and LI conceptualized the present work. AG performed statistical analyses and wrote the first draft of the manuscript, under the supervision of ADiC and LI. All the co-authors contributed to collection, curation and elaboration of the analyzed data, and/or to a critical review and editing of the manuscript.

The authors declare no competing financial interest. AG was supported by Fondazione Umberto Veronesi.Alessandro Gialluisi2^, Andrea Antinori3, Nausicaa Berselli4, Lorenzo Blandi5, Marialaura Bonaccio2, Raffaele Bruno6,7, Roberto Cauda8,9, Simona Costanzo2, Giovanni Guaraldi10,CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint Stefano Perlini15,16, Francesca Santilli17, Carlo Signorelli18, Giulio Stefanini13, Alessandra . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint Two main clusters of patients were identified, with N= 3,913 (green) and 483 (red) respectively.. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250238 doi: medRxiv preprint