key: cord-0930584-w9zsrrmv
authors: Calabrese, Fiorella; Pezzuto, Federica; Fortarezza, Francesco; Boscolo, Annalisa; Lunardi, Francesca; Giraudo, Chiara; Cattelan, Annamaria; Del Vecchio, Claudia; Lorenzoni, Giulia; Vedovelli, Luca; Sella, Nicolò; Rossato, Marco; Rea, Federico; Vettor, Roberto; Plebani, Mario; Cozzi, Emanuele; Crisanti, Andrea; Navalesi, Paolo; Gregori, Dario
title: Machine learning‐based analysis of alveolar and vascular injury in SARS‐CoV‐2 acute respiratory failure
date: 2021-03-30
journal: J Pathol
DOI: 10.1002/path.5653
sha: 4ff216bb8371a19f66f318ddc35294d33bca5515
doc_id: 930584
cord_uid: w9zsrrmv

Severe acute respiratory syndrome‐coronavirus‐2 (SARS‐CoV‐2) pneumopathy is characterized by a complex clinical picture and heterogeneous pathological lesions involving both alveolar and vascular components. The severity and distribution of morphological lesions associated with SARS‐CoV‐2, and how they relate to clinical, laboratory, and radiological data, have not yet been studied systematically. The main goals of the present study were to objectively identify pathological phenotypes and the factors that, in addition to SARS‐CoV‐2, may influence their occurrence. Lungs from 26 patients who died from SARS‐CoV‐2 acute respiratory failure were comprehensively analysed. Robust machine learning techniques were implemented to obtain a global pathological score distinguishing phenotypes with prevalent vascular or alveolar injury. The score was then analysed for possible correlations with clinical, laboratory, radiological, and tissue viral data. Furthermore, an exploratory random forest algorithm was developed to identify the most discriminative clinical characteristics at hospital admission that might predict pathological phenotypes of SARS‐CoV‐2. The vascular injury phenotype was observed in most cases, consistently present either in pure form or in combination with alveolar injury. Phenotypes with more severe alveolar injury showed significantly more frequent tracheal intubation; longer invasive mechanical ventilation, illness duration, and intensive care unit or hospital ward stay; and lower tissue viral quantity (p < 0.001). Furthermore, superimposed infections, tumours, and aspiration pneumonia were also more frequent in this phenotype (p < 0.001).
A random forest algorithm identified clinical features at admission (body mass index, white blood cells, D‐dimer, lymphocyte and platelet counts, fever, respiratory rate, and PaCO2) that stratify patients into different clinical clusters and potential pathological phenotypes (a web‐app for score assessment has also been developed; https://r-ubesp.dctv.unipd.it/shiny/AVI-Score/). In SARS‐CoV‐2‐positive patients, alveolar injury is often associated with other factors in addition to viral infection. Identifying phenotypical patterns at admission may enable better stratification of patients, ultimately favouring the most appropriate management. © 2021 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), primarily affects the lung [1], with a broad spectrum of clinical manifestations. The severity of symptoms is extremely variable, with the highest morbidity and mortality in elderly men and in patients with chronic comorbidities [2]. The incidence of acute respiratory distress syndrome (ARDS) among COVID-19 patients has been reported to range from 47% to 100% in intensive care unit (ICU) patients and from 1% to 68% in hospitalized patients overall [3] [4] [5]. Even though the most severe forms of acute respiratory failure were initially treated as ARDS, growing awareness of the disease has suggested that acute respiratory failure in these patients has distinctive clinical features. Indeed, compared with conventional ARDS, SARS-CoV-2 acute respiratory failure presents severe hypoxaemia and ventilation/perfusion mismatch, likely due to downregulation of angiotensin-converting enzyme-2 (ACE-2) secondary to viral endocytosis [6] [7] [8]. However, this aspect is still debated, as other relevant studies have reported increased ACE-2 expression after viral infection [9]. In addition, COVID-19 is often accompanied by a hypercoagulable state, with high levels of fibrinogen and D-dimers and an increased risk of thromboembolic complications. Lastly, an abnormal immune response (i.e. 'cytokine storm syndrome') occurs, with high levels of interleukin (IL)-2, -4, -6, and -10 and tumour necrosis factor (TNF)-α. These distinctive features are only some examples of the complex clinical scenario characterizing COVID-19 patients, whose management remains extremely problematic [10] [11] [12] [13].
Histological hallmarks of ARDS include a spectrum of lung injuries ranging from hyaline membranes, oedema, and haemorrhage (acute exudative phase) to type 2 pneumocyte hyperplasia, organizing pneumonia, and squamous metaplasia (organizing/proliferative phase) [14, 15]. All these lesions, grouped under the generic pattern called diffuse alveolar damage (DAD), have been described as the major histological findings in the lung parenchyma of deceased COVID-19 patients [16].

Consistent vascular lesions, mainly with the features of endothelialitis, thrombotic microangiopathy, and pulmonary intussusceptive angiogenesis, have recently been reported as distinctive pathological features of COVID-19 pneumopathy [17] [18] [19] [20] [21]. Thus, as with the clinical features, clearly heterogeneous pathological changes have also been reported.

Although all the previously mentioned studies represent an important step forward in our knowledge of SARS-CoV-2-related lesions, the aetiopathogenesis of most pathological lung lesions remains unclear. In particular, it is yet to be determined whether lung lesions are exclusively related to the viral infection or are instead more influenced by other factors. Indeed, to date, the distribution of morphological lesions associated with SARS-CoV-2 pneumonia and how they relate to clinical data have not been systematically studied.

In the present work, a large number of lung lesions from patients who died from SARS-CoV-2 acute respiratory failure were comprehensively analysed and quantified by pathologists. Statistical analyses and machine learning algorithms were then used to 'phenomap' patients into prevalent pathological subtypes. The pathological subtypes were analysed in association with clinical, laboratory, molecular, and radiological data with the principal aim of identifying which factors, in addition to SARS-CoV-2 infection, may have influenced the presence of the different pathological lesions. An additional exploratory aim was to identify, using the random forest algorithm (RFA), the most discriminative clinical characteristics at hospital admission that might predict SARS-CoV-2 pathological phenotypes.

The present work was a single-centre prospective study of 26 consecutive autopsies of laboratory-confirmed COVID-19 cases performed at the University Hospital of Padua from 23 March to 23 April 2020, according to national and international protocols, as previously described [22]. The Ethics Committee of our Centre was informed about the study (4853/A0/20); the study was approved by our local Clinical Institution Review Board and complied with the Declaration of Helsinki. COVID-19 was diagnosed according to the WHO interim guidance [23]. Specifically, nasopharyngeal swabs were tested by reverse transcription-polymerase chain reaction (RT-PCR) according to international standards. For each patient, the following demographic data and clinical characteristics were recorded: age, gender, body mass index (BMI), comorbidities, sequential organ failure assessment (SOFA) score at hospital admission, ongoing therapies, first positivity for SARS-CoV-2, other respiratory pathogens, length of intensive care unit (ICU) or internal medicine ward (IMW) stay, conventional laboratory data and inflammatory parameters, respiratory rate, blood gas analysis (arterial partial pressure of oxygen, PaO2, and of carbon dioxide, PaCO2; oxygen saturation), antiviral and anticoagulant therapies during the hospital stay, mode of respiratory support (i.e. conventional O2 therapy; high-flow nasal cannula, HFNC; continuous positive airway pressure, CPAP; or invasive mechanical ventilation, IMV), fraction of inspired oxygen (FiO2), and PaO2/FiO2 ratio.

To reduce the risk of infection spread, computed tomography (CT) scans were seldom performed in our tertiary centre at the beginning of the current pandemic, and COVID-19 patients were mainly assessed with chest X-rays at diagnosis and follow-up. Thus, chest X-ray was the only imaging available for all patients.

All radiographs were evaluated by a radiologist experienced in thoracic imaging (CG) using a previously published and validated composite COVID-19 chest X-ray score (CARE) [24]. In brief, the score was based on the subdivision of each lung into three areas (upper area, from the apices to the superior margin of the hilum; middle area, from the upper to the lower margin of the hilum; lower area, from the lower margin of the hilum to the costophrenic angle) and a three-grade score describing, separately, the extent of ground glass (i.e. hazy opacity not obliterating bronchi and vessels) and consolidation (i.e. an area of attenuation obscuring airways and vessels). Additional findings such as pleural effusion, pneumothorax, pneumomediastinum, and subcutaneous emphysema were also recorded.
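As a minimal sketch of how a CARE-style tally could be computed, the Python below assumes that each of the six areas (three per lung) receives a separate 0-3 grade for ground glass and for consolidation, and that the global score is the sum of both. This assumption is consistent with the maximum global score of 36 reported in the Results, but the exact grading rules are those of the original CARE publication [24], not this code; `ZoneGrades` and `care_scores` are hypothetical names.

```python
from dataclasses import dataclass

# Three areas per lung, as in the CARE subdivision described above.
ZONES = ("upper", "middle", "lower")


@dataclass
class ZoneGrades:
    """Assumed 0-3 grades for one lung area (0 = finding absent)."""
    ground_glass: int
    consolidation: int


def care_scores(right: dict, left: dict) -> dict:
    """Hypothetical CARE-style tally over the six lung areas.

    `right` and `left` map each name in ZONES to a ZoneGrades instance.
    Returns the ground-glass, consolidation, and global (sum) scores.
    """
    ground_glass = consolidation = 0
    for lung in (right, left):
        for zone in ZONES:
            grades = lung[zone]
            for grade in (grades.ground_glass, grades.consolidation):
                if not 0 <= grade <= 3:
                    raise ValueError(f"grade out of range 0-3: {grade}")
            ground_glass += grades.ground_glass
            consolidation += grades.consolidation
    return {
        "ground_glass": ground_glass,
        "consolidation": consolidation,
        "global": ground_glass + consolidation,
    }


# Usage: bilateral lower-area ground glass of grade 2 gives a global score of 4.
clear = {zone: ZoneGrades(0, 0) for zone in ZONES}
lower_gg = dict(clear, lower=ZoneGrades(2, 0))
print(care_scores(lower_gg, lower_gg))
```

Under this reading, the per-side sums also support left-versus-right comparisons such as the one reported for ground glass.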

Autopsies were carried out with a postmortem interval ranging from 24 hours to 6 days. Lungs were removed en bloc and fixed in 10% buffered formalin. Sixteen tissue blocks from the airways (2) alveolar spaces: oedema, blood extravasation, fibrin, hyaline membranes, organizing pneumonia, squamous metaplasia, pneumocyte type II hyperplasia, and inflammatory cells, distinguishing acute (neutrophils) and chronic (macrophages, monocytes, and lymphocytes); (3) alveolar wall: chronic and acute inflammation, interstitial fibroblasts, capillary inflammation, and microthrombi. We defined the degrees of capillary inflammation as (a) mild (only neutrophil margination above baseline), (b) moderate (neutrophil margination above baseline with at least two back-to-back neutrophils), or (c) severe (frank capillaritis with destruction of capillary walls, blood extravasation, and neutrophilic karyorrhexis) [25, 26]; (4) vessels: microthrombi and large thrombi; and (5) pleura: inflammatory infiltrates, fibrosis, and fibrin deposition. In each case, a total of 27 histological parameters were separately quantified in 16 slides, distinguishing each lobe of the right and left lung, so that 432 pathological features (27 × 16) were evaluated for each patient. The presence and extent of each parameter were scored semiquantitatively from 0 to 3 (0: absent; 1: present, focal, in <25% of the section; 2: present, ranging from 25% to 50%; 3: present, diffuse, in >50%). All other additional findings were also carefully reported. Based on the presence and severity of selected histological parameters, evaluated in all slides of each case, patients were categorized into a prevalent histological phenotype.
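The semiquantitative scale can be written as a small mapping from the estimated fraction of a section involved to a 0-3 grade. This sketch is an illustration only: pathologists assigned grades by visual estimation, so the numeric percentage input, the treatment of exactly 25% and 50% as grade 2, and the names `extent_grade` and `grade_patient` are assumptions. It also makes explicit that a 16-slide by 27-parameter grid yields 432 features per patient.

```python
N_SLIDES = 16       # slides per patient
N_PARAMETERS = 27   # histological parameters scored on each slide


def extent_grade(percent_involved: float) -> int:
    """Map the estimated % of a section showing a lesion to the 0-3 scale.

    0: absent; 1: focal, <25%; 2: 25-50% (boundaries assumed inclusive);
    3: diffuse, >50%.
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must lie in [0, 100]")
    if percent_involved == 0:
        return 0
    if percent_involved < 25:
        return 1
    if percent_involved <= 50:
        return 2
    return 3


def grade_patient(extents):
    """Convert a 16 x 27 grid of % extents (slide by parameter) into grades."""
    if len(extents) != N_SLIDES or any(len(row) != N_PARAMETERS for row in extents):
        raise ValueError("expected one row per slide and one value per parameter")
    return [[extent_grade(p) for p in row] for row in extents]


# Usage: 30% involvement everywhere gives grade 2 in every one of the cells.
grid = grade_patient([[30.0] * N_PARAMETERS for _ in range(N_SLIDES)])
print(sum(len(row) for row in grid))  # 432
```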
The prevalent alveolar injury (AI) phenotype was defined as median scores of hyaline membranes, organizing pneumonia, type 2 pneumocyte hyperplasia, and squamous metaplasia at least twice as high as the median scores of vascular lesions (microthrombi, large thrombi, vasculitis, and capillary inflammation). The prevalent vascular injury (VI) phenotype was defined as median scores of vascular lesions double those of AI lesions. A mixed phenotype was defined as AI and VI lesions being equally present (similar median scores for the two types of lesion).

The reaction conditions were 10 min at 50 °C, then 95 °C for 3 min, followed by 45 cycles of denaturation at 95 °C for 3 s, then annealing and extension at 55 °C for 30 s. To correct for sampling variability, human RNase P (PRORP) was used as a reference to normalize the viral load with the comparative cycle threshold (Ct) method (ΔCt), which transforms Ct values into relative loads (ratios of viral target to human target).
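The AI, VI, and mixed phenotype definitions given earlier in this section can be sketched as a rule on two medians. This Python example is a hedged reading of the text, not the authors' code: it assumes that each median is taken over the 0-3 grades of the listed lesion types pooled across all slides of a patient, and it treats any case meeting neither twofold criterion as mixed. The lesion keys and the function name are hypothetical.

```python
from statistics import median

# Hypothetical keys for the lesion types named in the phenotype definitions.
AI_LESIONS = ("hyaline_membranes", "organizing_pneumonia",
              "type2_pneumocyte_hyperplasia", "squamous_metaplasia")
VI_LESIONS = ("microthrombi", "large_thrombi", "vasculitis",
              "capillary_inflammation")


def prevalent_phenotype(slides):
    """Classify a patient from per-slide 0-3 grades.

    `slides` is a list of dicts, one per slide, mapping lesion keys to grades.
    """
    ai = median(slide[k] for slide in slides for k in AI_LESIONS)
    vi = median(slide[k] for slide in slides for k in VI_LESIONS)
    # Twofold dominance in either direction defines a prevalent phenotype;
    # the strict inequality keeps the all-zero case from counting as AI or VI.
    if ai > vi and ai >= 2 * vi:
        return "AI"
    if vi > ai and vi >= 2 * ai:
        return "VI"
    return "mixed"


# Usage: mild alveolar grades with marked vascular grades on every slide.
slide = {**{k: 1 for k in AI_LESIONS}, **{k: 3 for k in VI_LESIONS}}
print(prevalent_phenotype([slide] * 16))  # VI
```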

All clinical, radiological, and pathological data were recorded in an electronic database. Data are presented as medians (with first and third quartiles) for continuous variables and as percentages for categorical variables. Because of repeated measurements, comparisons among groups were based on generalized linear models (GLMs), with variance inflated using the Huber-White sandwich estimator [27]. The GLM family was Gamma with a logarithmic link for continuous response variables and quasi-binomial for binary responses. To account for multiple testing, P values were adjusted using the Benjamini-Hochberg procedure [27]. Agreement between pathologists was assessed using Cohen's kappa. The final dataset comprised more than 1.5 million data points from the 26 patients.
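For readers unfamiliar with the Benjamini-Hochberg adjustment, the following minimal Python sketch computes BH-adjusted p-values using the usual step-up definition (the same quantity R's `p.adjust` returns with `method = "BH"`). It illustrates the procedure and is not taken from the study's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value p_(i) is scaled by m / i, then a running minimum from
    the largest rank downwards keeps the adjusted values monotone; values are
    capped at 1 because the running minimum starts at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # rank m (largest p) down to 1
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```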

Data on pathological findings were summarized using robust principal component analysis (RPCA) based on projection pursuit [28]. Since the database had a large number of variables and a relatively small number of patients, we used more conservative and robust statistical methods than the more familiar multilinear regression analysis, which enabled us to separate important features from noisy data. Sparse robust principal components were derived using a grid search algorithm in the plane, with sparseness constraints determined by grid search and tradeoff product optimization [29]. The PCA loadings were used to characterize the AVI score. The entire analysis was blinded with respect to the operator-based classification of phenotypes into AI or VI subtypes. Subsequently, after unblinding, a cut-off for the score was determined using standard ROC analysis to discriminate between the operator-based phenotype classes. The distribution of AVI scores was estimated using a Gaussian kernel with optimal bandwidth selection [30].
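The text does not state which ROC criterion fixed the cut-off. As an illustration only, the sketch below picks the threshold that maximizes Youden's J (sensitivity + specificity - 1), treating prevalent AI as the positive class (positive AVI scores point to AI, as reported in the Results) and leaving mixed cases out. It also computes the AUC as the probability that a random AI case scores above a random VI case. The function names and toy scores are hypothetical.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each candidate cut-off.

    labels: 1 = prevalent AI (positive class), 0 = prevalent VI.
    A case is called AI when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / n_pos, 1.0 - fp / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage with toy scores (not study data); mixed cases would be excluded first.
toy_scores = [-2.0, -1.5, -0.3, 0.4, 1.2, 2.5]
toy_labels = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(toy_scores, toy_labels), auc(toy_scores, toy_labels))
```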

Descriptive analysis of the association between all clinical data of patients at presentation and pathological findings was based on GLMs, with link and variance functions tailored to the type of response (Gaussian functions for continuous variables and binomial functions for binary variables). Survival functions were estimated using the Kaplan-Meier estimator, and multivariable survival models were based on the Cox proportional hazards model. Then, to 'phenomap' the clinical variables most associated with pathological phenotypes [31], a random forest algorithm (RFA) was implemented [32]. The choice of an RFA model was motivated by the high k/n ratio, where k is the number of candidate variables and n is the number of observations in the sample. The RFA was tuned with 6600 trees, k/3 variables randomly selected as candidates for splitting each node, an average terminal-node size of five data points, and weighted mean-squared error as the splitting rule. Missing data were iteratively imputed using the Ishwaran algorithm [33]. The relative importance of each clinical variable for the score derived via RPCA was measured by its minimal depth: the smaller the minimal depth, the greater the impact of the variable on sorting observations, and therefore on the forest prediction. The mean of the minimal depth distribution was used as an analytic threshold for evidence of variable impact; variables with a minimal depth below this threshold were considered important in the forest prediction [34]. A representative tree was then extracted from the forest to clinically phenomap the pathological patterns. Extraction was based on the Euclidean distance d2, reflecting closeness to the prediction [35]. All analyses were performed using R software [36] with the pcaPP [29] and randomForestSRC [37] libraries. The machine learning analysis was reported according to the EQUATOR draft guidelines [38].
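The minimal-depth idea can be illustrated on hand-built trees. In this hedged Python sketch, a tree is a nested tuple `(variable, left, right)` with `None` for a terminal node, the root split has depth 0, and a variable never used in a tree is assigned that tree's depth (one more than its deepest split), a convention assumed here. As a further simplification, the selection threshold is the mean of the forest-averaged minimal depths across variables; randomForestSRC derives its threshold from the minimal depth distribution itself, so this is a stand-in, not the package's procedure.

```python
def tree_depth(node):
    """Number of split levels at and below `node` (a terminal node has 0)."""
    if node is None:
        return 0
    _, left, right = node
    return 1 + max(tree_depth(left), tree_depth(right))


def minimal_depths(node, depth=0, found=None):
    """Map each split variable to the shallowest depth where it splits (root = 0)."""
    if found is None:
        found = {}
    if node is None:
        return found
    variable, left, right = node
    found[variable] = min(depth, found.get(variable, depth))
    minimal_depths(left, depth + 1, found)
    minimal_depths(right, depth + 1, found)
    return found


def forest_minimal_depth(trees, variables):
    """Average minimal depth of each variable over the forest."""
    totals = dict.fromkeys(variables, 0.0)
    for tree in trees:
        depths = minimal_depths(tree)
        unused = tree_depth(tree)  # assumed penalty for variables never split on
        for v in variables:
            totals[v] += depths.get(v, unused)
    return {v: total / len(trees) for v, total in totals.items()}


def important_variables(avg_depth):
    """Variables whose average minimal depth is below the mean (simplified threshold)."""
    threshold = sum(avg_depth.values()) / len(avg_depth)
    return {v for v, d in avg_depth.items() if d < threshold}


# Usage with two toy trees over a few of the admission variables.
t1 = ("PaCO2", ("lymphocytes", None, None), ("BMI", None, None))
t2 = ("lymphocytes", ("PaCO2", None, None), None)
avg = forest_minimal_depth([t1, t2], ["PaCO2", "lymphocytes", "BMI", "age"])
print(avg, important_variables(avg))
```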

The median age of the 26 patients was 82 years (Q1-Q3: 76-88 years); 42% were women. Table 1 summarizes the main clinical and radiological data. The estimated disease duration from symptom onset and from hospitalization to death was 8.5 days (95% CI 7-13) and 5 days (95% CI 4-7), respectively (supplementary material, Figure S1).
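Because the Methods state that survival functions were estimated with the Kaplan-Meier estimator, median durations such as those above correspond to the first time at which estimated survival drops to 0.5 or below. The pure-Python sketch below shows that computation on made-up times (not study data); it does not reproduce the confidence intervals, which require an additional variance estimate.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    `events[i]` is 1 if the death at times[i] was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Usage with made-up durations in days (all deaths observed, as in an autopsy series).
print(median_survival(kaplan_meier([2, 4, 6, 8, 10], [1, 1, 1, 1, 1])))  # 6
```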

A total of 97 chest X-rays were assessed. During the entire hospitalization, all patients except one showed both ground glass and consolidation on at least one chest X-ray. The highest global score (i.e. 36) was found in four patients. The median scores were 2 (Q1-Q3: 1-2.75) and 0 (Q1-Q3: 0-5) for ground glass and consolidation, respectively. The median global score was 4 (Q1-Q3: 1-6) (Table 1). The ground glass score was significantly higher in the left lung than in the right (p < 0.001), whereas no statistically significant differences emerged for the consolidation and global scores (supplementary material, Table S1).

RNA was extracted from frozen lung tissue samples obtained from all patients included in the study population. The quantity and quality of RNA samples were adequate for the RT-qPCR analysis. In particular, mean (± SD) RNA concentration was 241 ± 193 ng/μl; mean (± SD) A260/280 and A260/230 ratios were 2.09 ± 0.03 and 1.52 ± 0.55, respectively. In all patients, positivity for SARS-CoV-2 was confirmed in lung tissues by RT-qPCR, showing median (Q1-Q3) values of N1 ΔCt of 1.2 (−3.4 to 6.8), ranging from −9.3 to 13.7 (supplementary material, Figure S2).
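The ΔCt normalization described in the Methods can be sketched as follows. The sign convention is an assumption, since the text does not state it: here ΔCt = Ct(N1) - Ct(RNase P), so a lower ΔCt means more viral RNA relative to the human reference, and the relative load 2^(-ΔCt) assumes perfect (100%) amplification efficiency. The function names are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt, assumed here to be Ct(viral N1 target) minus Ct(human RNase P)."""
    return ct_target - ct_reference


def relative_load(dct: float, amplification_factor: float = 2.0) -> float:
    """Viral-to-human target ratio, assuming the product doubles every cycle."""
    return amplification_factor ** (-dct)


# Usage: a sample whose N1 target crosses threshold 5 cycles after RNase P.
dct = delta_ct(31.0, 26.0)
print(dct, relative_load(dct))  # 5.0 0.03125
```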

Microscopic evaluation of the lungs revealed lesions in different anatomic areas with a heterogeneous distribution. The evaluation of all slides showed good inter-observer agreement (κ values between 0.6 and 1). A statistically significant difference in lobe involvement was observed for some parameters. In the lung parenchyma, the highest grades of the pathological parameters of AI were more frequently assigned to the lower lobes. The vascular bed showed various lesions, ranging from different grades of capillary inflammation, including the most severe form of neutrophilic capillaritis, which was mainly detected in the right lung (p = 0.001), to microthrombi and, less frequently, macrothrombi. There was no preferential distribution of microthrombi and macrothrombi in the parenchyma. Other types of lesions were also detected in 14 cases (54%), including infections (nine cases: seven bacterial and two fungal, morphologically compatible with Aspergillus spp.), neoplasms (three cases: two squamous cell carcinomas and one malignant solitary fibrous tumour), and aspiration pneumonia (two cases). Some of the infections were unknown before autopsy and were detected using special stains (e.g. Gram, Grocott, PAS). Large airway acute or chronic inflammation was observed in all cases. The pleura was frequently affected by inflammation and fibrous reaction, without a significant difference in lung/lobe distribution. Compared with cases showing associated signs of infectious disease (fungal or bacterial), pure COVID-19 cases more frequently showed pleural involvement (39% versus 26%, p = 0.02). Detailed descriptions of the airway, parenchymal, vascular bed, and pleural lesions are reported in supplementary material, Tables S3-S6, respectively.
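Inter-observer agreement was summarized with Cohen's kappa, which compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal frequencies: κ = (p_o - p_e) / (1 - p_e). A minimal unweighted implementation follows; the text does not say whether a weighted variant was used for the ordinal 0-3 grades, so the unweighted form is an assumption, and the ratings in the usage line are toy data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportion per category.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Usage: two pathologists grading six fields on the 0-3 scale (toy data).
print(cohens_kappa([0, 1, 2, 3, 1, 2], [0, 1, 2, 2, 1, 2]))  # approximately 0.76
```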
According to the categorization of prevalent histological features, the AI phenotype was present in four cases (16%), the VI phenotype in 11 cases (42%), and the mixed phenotype in the remaining 11 cases (42%). Based on these pathological findings, data were synthesized using RPCA. The first RPCA dimension explained 64.67% of the total variance, with the highest absolute loadings for hyaline membranes, intra-alveolar blood/fibrin and neutrophils, type 2 pneumocyte hyperplasia, intra-alveolar macrophages/lymphocytes, oedema, organizing pneumonia, intra-alveolar squamous metaplasia, capillary inflammation, pleural inflammation, and thrombi (supplementary material, Table S2). The first RPCA dimension was therefore used to derive a synthetic score characterizing each patient with respect to the most important variables. This score was tested against the blinded classification by the pathologists, who classified patients as 'prevalent AI', 'prevalent VI', or 'mixed' phenotype. Standard ROC analysis showed that the score, called the 'AVI score', discriminated well between prevalent AI and prevalent VI patients. Positive AVI-score values identified the prevalent AI phenotype, while negative values identified the prevalent VI phenotype. The mixed phenotype spanned AVI-score values of both phenotypes (Figure 1).
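To make the projection-pursuit idea behind the AVI score concrete, the sketch below estimates a first robust component by a simplified search: the data are centred on coordinatewise medians, each centred observation is tried as a candidate direction, and the direction whose projections have the largest median absolute deviation (MAD) is kept; patient scores are the projections onto it. This is a simplified illustration of projection-pursuit PCA, without the sparseness constraints or grid search used in the study. The `orient_index` argument, which fixes the arbitrary sign so that a chosen variable loads positively (mirroring positive scores for AI), is a hypothetical convenience.

```python
from statistics import median


def mad(values):
    """Median absolute deviation (unscaled)."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)


def first_pp_component(rows, orient_index=0):
    """Simplified projection-pursuit estimate of the first robust component.

    rows: one feature vector per patient. Returns (unit direction, scores).
    """
    p = len(rows[0])
    centre = [median(r[j] for r in rows) for j in range(p)]
    centred = [[r[j] - centre[j] for j in range(p)] for r in rows]
    best_dir, best_scale = None, -1.0
    for x in centred:                       # candidate directions = observations
        norm = sum(v * v for v in x) ** 0.5
        if norm == 0.0:
            continue
        direction = [v / norm for v in x]
        projections = [sum(a * b for a, b in zip(z, direction)) for z in centred]
        scale = mad(projections)            # robust spread instead of variance
        if scale > best_scale:
            best_dir, best_scale = direction, scale
    if best_dir is None:
        raise ValueError("all observations coincide with the centre")
    if best_dir[orient_index] < 0:          # principal-component signs are arbitrary
        best_dir = [-v for v in best_dir]
    scores = [sum(a * b for a, b in zip(z, best_dir)) for z in centred]
    return best_dir, scores


# Usage: toy data lying mostly along the diagonal, plus one off-diagonal point.
rows = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.3, -0.3]]
direction, scores = first_pp_component(rows)
print(direction, scores)
```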

A positive AVI score (consistent with a more prominent AI component) was directly correlated with endotracheal intubation, length of invasive mechanical ventilation, longer hospital and ICU stay, longer duration of illness from symptom onset, and lower tissue viral quantity (p < 0.001) (Figure 2 and supplementary material, Tables S7-S10). A positive AVI score was also associated with the presence of other pathological lesions, such as aspiration pneumonia, other infections, neoplasms, or all three (p < 0.001) (Figure 3 and supplementary material, Table S11). A negative AVI score (consistent with a more prominent VI component) was not related to any clinical data. Neither positive nor negative AVI scores were correlated with any radiological findings. More than 40 clinical covariates were analysed to characterize the AVI-score distribution among patients, deriving a synthetic 'phenomap' of the underlying pathological aspects. Supplementary material, Table S12 presents anonymized data for the 26 patients. The regression RFA showed good performance (98.7% of variance explained at 10-fold cross-validation) using variables recorded at hospital admission: BMI, white blood cells (WBC), D-dimer, lymphocyte count (both absolute and percentage), platelet count, fever, respiratory rate, and PaCO2 (Figure 4A). Radiological variables, in contrast, showed no significant correlation and were excluded from further analysis. The combination of these parameters at admission could identify clinical clusters associated with different pathological phenotypes and different AVI scores. For instance, patients with a low respiratory rate, unusual hypercapnia, and lymphocytopenia at admission are potentially at higher risk of developing a 'prevalent AI' phenotype (positive AVI score) (Figure 4B).
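The percentage of explained variance reported for the random forest is, under the usual definition, 100 × (1 - MSE / Var(y)), computed on predictions for observations not used in fitting. The sketch below shows that calculation with k-fold cross-validation around any fitting function. The trivial mean predictor in the usage example is a placeholder, not a random forest, and the contiguous, unshuffled folds are a simplification; none of this reproduces the study's 98.7% figure.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_predictions(X, y, k, fit):
    """Out-of-fold predictions; `fit(X_train, y_train)` must return a predict function."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(X[i])
    return preds


def percent_variance_explained(y_true, y_pred):
    """100 * (1 - MSE / Var(y)), using population (divide-by-n) moments."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean_y) ** 2 for t in y_true) / n
    return 100.0 * (1.0 - mse / var)


def fit_mean(X_train, y_train):
    """Placeholder model: always predicts the training-fold mean."""
    m = sum(y_train) / len(y_train)
    return lambda x: m


# Usage: 26 toy observations, 10 folds; the mean model has no predictive skill.
X = [[float(i)] for i in range(26)]
y = [0.1 * i for i in range(26)]
oof = cross_validated_predictions(X, y, 10, fit_mean)
print(percent_variance_explained(y, oof))  # negative
```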

To facilitate the interpretation and use of the phenotypic representation, a web-app has been developed and made available at https://r-ubesp.dctv.unipd.it/shiny/AVI-Score/.

In this study of 26 consecutive patients who died from SARS-CoV-2 acute respiratory failure, we identified three major phenotypes: (1) a phenotype with prevalent DAD (AI); (2) a phenotype with prevalent vascular lesions (VI); and (3) a phenotype with mixed DAD and VI (Figure 1). This evidence was obtained through a thorough analysis of multiple lung samples, to capture all lesions carefully and to avoid underestimating or missing important tissue alterations [39]. To our knowledge, this is the first study to use a combination of robust machine learning techniques to objectively identify pathological phenotypes and the factors that, in addition to SARS-CoV-2, may influence their occurrence. The AVI score arranged lung lesions along a spectrum, with negative values indicating vascular injury and positive values indicating alveolar damage. The association between AVI score and clinical data showed that a positive score was significantly related to a longer duration of disease (whether calculated from symptom onset or from hospital admission), longer ICU stay, tracheal intubation, and prolonged invasive mechanical ventilation, independently of other factors. Patients affected by SARS-CoV-2 acute respiratory failure are generally more hypoxic and usually require invasive mechanical ventilation and a longer hospital stay. Prolonged mechanical ventilation is an important risk factor for ventilator-associated pneumonia, aspiration, and superimposed infections, and all of these elements could have contributed to the development of prevalent AI, generating a vicious circle [40, 41].

We also detected more extensive AI in patients with other associated lesions (neoplasms, aspiration pneumonia, and other infections), the last two also found in patients not treated with prolonged mechanical ventilation. The risk of developing AI in patients with infectious pneumonia is higher in hospitalized patients, particularly in cases with viral, fungal, or mixed infections [42]. The pathogenetic mechanisms underlying AI in these lesions are complex and often related to a dysregulation of the innate or acquired immune response [43]. Thus, our observations suggest that the development of AI is not exclusively related to SARS-CoV-2 infection. In support of this, the lower tissue viral quantity detected in patients with more extensive AI is noteworthy. These findings agree with earlier observations from a multicentre study to which we contributed, which showed an association of AI with longer disease duration and less frequent tissue virus detection [44]. Based on this, a speculative timeline for the disease may involve viral-related lung damage and significant vascular injury in the early stages, whereas in advanced stages the damage may be mainly influenced by other factors (such as prolonged disease duration or mechanical ventilation, superinfections, etc.) that maintain the inflammatory state, with a prevalent pathological appearance of DAD.

In our study, a negative AVI score (i.e. more extensive VI) was not correlated with any clinical outcome data, including all coagulation parameters. This was not unexpected in our case series, considering that most patients had received anticoagulant therapy for comorbidities or as prophylaxis. The absence of any correlation with disease duration, type of clinical management, or other pathological lesions may suggest that VI is a peculiar feature of COVID-19 pneumopathy. Indeed, considerable evidence in the literature indicates an important role of vascular damage in the aetiopathogenesis of the disease [17] [18] [19] [20] [21] [45].

In COVID-19, the specificity of pathological lung lesions, particularly vascular injury, becomes even more interesting when the characteristics observed in our case series are compared with those of prior severe outbreaks: SARS, MERS, and H1N1 influenza. Patterns of lung injury, including DAD in exudative and/or organizing phases, were identified in MERS and SARS [46] [47] [48] [49] [50] [51]. However, the heterogeneous pattern of vascular lesions frequently detected in SARS-CoV-2 infection was never reported in any of these infections. In particular, Ackermann et al reported that several vascular lesions, including endothelialitis, thrombosis, and intussusceptive angiogenesis, were detected more frequently in COVID-19 patients than in patients with severe influenza virus infection [21].

No associations between AVI score and radiological findings were detected. Although radiological imaging plays a crucial role in diagnosing and monitoring patients with SARS-CoV-2 pneumopathy [52] [53] [54], our results showed that chest X-rays only marginally affect the AVI score. This unexpected finding might be due to the severity of disease in our population and the low specificity of the technique applied. Although chest X-ray has been shown to be a reliable tool for diagnosing and monitoring patients with COVID-19 [55] [56] [57], further studies using CT scans are highly recommended to obtain a more accurate characterization of the disease.

We also built an RFA-based model in an attempt to identify clinical, laboratory, or functional 'biomarkers' that, at hospital admission, could predict the development of distinct pathological lesions. Notably, WBC, lymphocyte and platelet counts, BMI, PaCO2, fever, respiratory rate, and D-dimer were the most important features for stratifying patients into different clinical clusters and 'potential' pathological phenotypes, with the aim of allowing more individualized management. For instance, in patients at risk of developing a 'prevalent' AI phenotype, more attention should be given to optimizing mechanical ventilation to ensure a lung-protective ventilatory strategy, avoiding 'high' tidal volumes or transpulmonary pressure and preventing patient self-inflicted lung injury, atelectasis, barotrauma, and pneumothorax during both invasive and noninvasive partial ventilatory support [58]. Proper antibiotic stewardship should also be established to minimize any potential risk factor for superimposed infections [59]. Conversely, in patients at risk of developing a potential VI phenotype, the primary aim is to optimize thrombotic prophylaxis; thromboprophylaxis should be considered for all hospitalized COVID-19 patients [60, 61]. If validated in a large multicentre case series, this model could be highly informative for more appropriate management of COVID-19 patients.

The present study has limitations. First, the sample size was relatively small and from a single centre. However, this is one of the largest single-centre case series in which all patients received protocolized medical treatment and respiratory support [62], with standardized lung sampling methodology and analysis. Furthermore, the statistical techniques used were among those best suited to a limited number of patients and a large number of covariates, including the different treatments. Nevertheless, instability in the estimates cannot be excluded; for example, the rate of 'pure' AI, which was observed in only a few cases, may have been underestimated. Still, both the clinical and the pathological observations derived from the statistical models agree well with current knowledge of COVID-19 pathology. Second, some clinical features could have been affected by the numerous therapeutic treatments administered to patients. Accordingly, all clinical information subsequent to the patients' admission was excluded from the phenomapping and used only for the exploratory analysis.

Finally, all pathological parameters were analysed only on haematoxylin and eosin-stained slides using a scoring system that is observer-dependent and subjective. However, all evaluations were performed blindly by two pathologists, showed good inter-observer agreement, and, overall, were confirmed by machine learning analysis. The present study could represent a good starting point to promote the use of deep learning approaches on digitized slides stained with both haematoxylin and eosin and immunohistochemistry, for a more precise definition of the phenotypical characteristics of the inflammatory infiltrate.

In conclusion, we found that vascular lesions are an important feature of COVID-19 pneumonia, since they are consistently present in most cases, while AI is related to several factors in addition to SARS-CoV-2 infection. The identification of phenotypical patterns associated with clinical characteristics could allow COVID-19 patients to be stratified into different risk clusters to optimize future management strategies.

Table S1. Radiological data obtained from the scoring system
Table S2. Loadings of the principal components analysis (PCA)
Table S3. Pathological findings of the airways
Table S4. Pathological findings of the lung parenchyma
Table S5. Pathological findings of the lung vascular bed
Table S6. Pathological findings of the pleura
Table S7. Distribution of morphological characteristics and AVI scores according to HFNC, CPAP-NIV, and IMV during ICU stay
Table S8. Distribution of morphological characteristics and AVI score according to duration of IMV
Table S9. Distribution of morphological characteristics and AVI scores according to duration of ICU stay
Table S10. Distribution of AVI score according to duration of disease since hospitalization and symptom onset
Table S11. Distribution of morphological characteristics, AVI-score distribution, and other lesions

We wish to thank Judith Wilson for English revision. This research was partially supported by a fellowship from the University of Padua/Intesa San Paolo Vita Bank (2020A08).

Author contributions statement

FC, FP, FF and FL performed the autopsies and histological evaluations. CG provided radiological data. AB, NS, PN, AC, MR, FR, CDV, ACr, RV and MP provided clinical data. DG, LV and GL carried out the statistical data analyses. FC, FP, FF, EC, AB, CG, PN and DG drafted the manuscript. All the authors contributed to study conception, design, and revision of the manuscript.