key: cord-0976952-yzodfm7h
authors: Homayounieh, Fatemeh; Bezerra Cavalcanti Rockenbach, Marcio Aloisio; Ebrahimian, Shadi; Doda Khera, Ruhani; Bizzo, Bernardo C.; Buch, Varun; Babaei, Rosa; Karimi Mobin, Hadi; Mohseni, Iman; Mitschke, Matthias; Zimmermann, Mathis; Durlak, Felix; Rauch, Franziska; Digumarthy, Subba R; Kalra, Mannudeep K.
title: Multicenter Assessment of CT Pneumonia Analysis Prototype for Predicting Disease Severity and Patient Outcome
date: 2021-02-25
journal: J Digit Imaging
DOI: 10.1007/s10278-021-00430-9
sha: 61099aa0076c12588afd0844f27a029fe854840e
doc_id: 976952
cord_uid: yzodfm7h

To perform a multicenter assessment of the CT Pneumonia Analysis prototype for predicting disease severity and patient outcome in COVID-19 pneumonia both without and with integration of clinical information. Our IRB-approved observational study included 241 consecutive adult patients (> 18 years; 105 females; 136 males) with RT-PCR-positive COVID-19 pneumonia who underwent non-contrast chest CT at one of the two tertiary care hospitals (site A: Massachusetts General Hospital, USA; site B: Firoozgar Hospital, Iran). We recorded patient age, gender, comorbid conditions, laboratory values, intensive care unit (ICU) admission, mechanical ventilation, and final outcome (recovery or death). Two thoracic radiologists reviewed all chest CTs to record the type and extent of pulmonary opacities based on the percentage of lobe involved, and the severity of respiratory motion artifacts. Thin-section CT images were processed with the prototype (Siemens Healthineers) to obtain quantitative features including lung volumes, volume and percentage of all-type and high-attenuation opacities (≥ −200 HU), and mean HU and standard deviation of opacities within a given lung region. These values are estimated for the total combined lung volume, and separately for each lung and each lung lobe. Multivariable analyses of variance (MANOVA) and multiple logistic regression were performed for data analyses.
About 26% of chest CTs (62/241) had moderate to severe motion artifacts. There were no significant differences in the AUCs of quantitative features for predicting disease severity with and without motion artifacts (AUC 0.94–0.97) as well as for predicting patient outcome (AUC 0.7–0.77) (p > 0.5). Combination of the volume of all-attenuation opacities and the percentage of high-attenuation opacities (AUC 0.76–0.82, 95% confidence interval (CI) 0.73–0.82) had higher AUC for predicting ICU admission than the subjective severity scores (AUC 0.69–0.77, 95% CI 0.69–0.81). Despite a high frequency of motion artifacts, quantitative features of pulmonary opacities from chest CT can help differentiate patients with favorable and adverse outcomes.

In a global health crisis precipitated by a high-prevalence infectious disease, it is critical to understand the associated morbidity and mortality as well as to anticipate and prepare the resources needed to mitigate the crisis [1] [2] [3]. Assessment of disease severity, regardless of its etiology, requires knowledge of at-risk patient demographics, their underlying comorbidities, symptoms, vital signs, and laboratory and imaging findings. Such clinical information, coupled with epidemiologic statistics and simulations, helps clarify the available and needed healthcare resources to mitigate and minimize the impact of healthcare crises. Such healthcare resources include clinical personnel as well as available hospital and ICU beds, and range from personal protective equipment such as masks, shields, and gowns to life support devices such as mechanical ventilators and dialysis units. Even the most advanced nations on our planet can become overwhelmed in a pandemic without such knowledge and careful planning [1, 3]. The ongoing pandemic from the novel coronavirus disease of 2019 (COVID-19) is a textbook example of a healthcare crisis that requires such planning and information [4].
The reverse transcriptase-polymerase chain reaction (RT-PCR) assay is the diagnostic mainstay; imaging use is variable, based on availability of RT-PCR assay, and extends from diagnosis to assessment of disease severity and complications [4] [5] [6] [7] [8] [9] [10] [11]. Within weeks of the outbreak, there were published data on features and severity assessment of the disease on chest radiography and CT, the most frequent imaging procedures in hospital-admitted patients [4] [5] [6] [7] [8] [9] [10] [11] [12]. In anticipation of the huge caseload and the need to diagnose and quantify disease burden, the deep learning (DL) community had an early start on the pandemic [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]. Several studies reported DL algorithms for diagnosis, differentiation from other pneumonia, severity assessment, and mortality prediction based on imaging features [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]. However, most studies lack assessment across vastly different scanner technologies and geographic regions, and do not address the effect of frequent respiratory motion artifacts on the relative performance of DL-generated features versus subjective severity assessment and clinical/laboratory data [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]. In this context, we assembled a database of clinical and imaging findings from two sites with dissimilar races, geography, and healthcare to assess a DL-based CT Pneumonia Analysis prototype (Siemens Healthineers, Erlangen, Germany) which was trained on a separate multicenter COVID-19 dataset. We performed a multicenter assessment of the CT Pneumonia Analysis prototype for predicting disease severity and patient outcome in COVID-19 pneumonia both without and with integration of clinical information.

Our retrospective study was performed following institutional review board (IRB) approvals with waiver of written informed consent at both participating sites.
De-identified clinical and imaging data were used in compliance with guidelines outlined in the Health Insurance Portability and Accountability Act (HIPAA). We did not receive any research grant or support pertaining to the prototype from the for-profit vendor (Siemens Healthineers) described in the manuscript. Our institution has received unrelated research grants from GE Healthcare, Lunit Inc., Riverain Tech and Siemens Healthineers. Four coauthors (MM, MZ, FD, and FR), employees of Siemens Healthineers, were included to ensure veracity of the technical description of the prototype; they did not participate in subject recruitment, data collection, or data analysis parts of the study.

Our study included 241 adult patients with RT-PCR-positive COVID-19 pneumonia from two tertiary care hospitals (site A: Massachusetts General Hospital, USA; and site B: Firoozgar Hospital, Iran). Site A contributed 124 de-identified patients (mean age (± standard deviation) 76 ± 10 years; 64 females and 60 males). Site B contributed data from 117 de-identified patients (mean age 61 ± 17 years; 41 females, 76 males). These represented consecutive patients who underwent non-contrast chest CT for clinically indicated reasons such as false-negative or pending RT-PCR assay for COVID-19 pneumonia, assessment of moderate or severe pneumonia, and suspected complications. Patients with post-contrast chest CT were excluded since the prototype was trained for evaluation of non-contrast chest CT and its use on post-contrast CT is not recommended. For each patient, study coinvestigators recorded the following information from their medical records: patient age, gender, past medical history (presence of hypertension, diabetes, cancer, immunosuppressive disease, asthma/chronic obstructive pulmonary disease, and ischemic heart disease), white blood cell counts, platelet counts, and lactate dehydrogenase (LDH).
In addition, we recorded whether patients required intensive care unit (ICU) admission (recorded at both study sites) and mechanical ventilation (only available for site A) during the course of their hospital admission.

Site A: Using standard-of-care department protocol, all non-contrast chest CT examinations were performed on one of the following scanners: 64-92-detector-row, dual-source CT (Siemens Definition or Force, Siemens Healthineers, Forchheim, Germany), 64-detector-row, single-source CT (Siemens Definition Edge), and GE Discovery 750 HD (GE Healthcare, Waukesha, Wisconsin, USA). The scan factors included 100-120 kV, automatic exposure control (CARE Dose 4D, Siemens: quality reference mAs of 100; Auto mA, GE: 25-35 noise index), 0.9-0.984:1 pitch, and 0.5-s gantry rotation time. Images were reconstructed with iterative reconstruction techniques (Admire, Siemens: iterative reconstruction strength of 2 for section thickness of 1 mm; ASIR, GE: 40% strength of iterative reconstruction technique for section thickness of 1.25 mm).

Site B: All non-contrast chest CT examinations were performed in accordance with the standard-of-care protocol with a 16-slice, multidetector-row CT scanner (Siemens SOMATOM Emotion 16, Siemens Healthineers, Forchheim, Germany). The scan factors included 110-130 kV, 30-50 mAs (with fixed tube current), 1.5:1 pitch, 16 × 1.2 mm detector configuration, and 1-s gantry rotation time. Filtered back projection reconstruction images with 2-mm section thickness and B20f (standard soft tissue) kernel were used for image analyses.

Two thoracic subspecialty radiologists (SD with 16-year experience, MK with 14-year experience) reviewed all 241 chest examinations in consensus (RadiAnt Dicom Viewer, Medixant, Poznan, Poland) in both lung (window level −600 HU, window width 1500 HU) and soft tissue (window level 50 HU, window width 350 HU) windows. Both radiologists were allowed to change or adjust the display windows according to their preference and anatomy of interest.
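As an illustration of what the quoted display windows mean in HU terms, the following minimal sketch converts a window level/width pair into the displayed HU range and a normalized display intensity. It assumes the simple "level ± width/2" convention with linear scaling; the function names are hypothetical and this is not the viewer's actual implementation.

```python
# Minimal sketch (assumed convention: displayed range = level +/- width / 2).
# Function and constant names are illustrative, not taken from any viewer.

LUNG_WINDOW = (-600, 1500)       # (level HU, width HU) as stated in the text
SOFT_TISSUE_WINDOW = (50, 350)   # (level HU, width HU) as stated in the text


def window_bounds(level_hu, width_hu):
    """Return the (lower, upper) HU limits shown by a level/width window."""
    half = width_hu / 2.0
    return level_hu - half, level_hu + half


def apply_window(hu, level_hu, width_hu):
    """Map one HU value to a 0-1 display intensity, clipping outside the window."""
    lower, upper = window_bounds(level_hu, width_hu)
    clipped = min(max(hu, lower), upper)
    return (clipped - lower) / (upper - lower)
```

Under this convention the lung window spans −1350 to 150 HU and the soft tissue window spans −125 to 225 HU, so air and aerated lung keep contrast in the former and are clipped to black in the latter.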
In each of the five lung lobes, the radiologists separately recorded presence, type (1 = ground-glass; 2 = mixed, defined as ground-glass with consolidation and/or interlobular septal thickening; 3 = consolidation), and extent of pulmonary opacities based on the percentage of lobe involved (0: no opacities; 1: less than 5% of lobe volume involved; 2: 5-25% of lobe involved; 3: 26-50% of lobe involved; 4: 51-75% of lobe involved; 5: greater than 75% lobar involvement). This scoring system for COVID-19 pneumonia was described in prior publications [7, 8]. The overall subjective severity score was obtained by adding the lobar involvement scores, and was then classified into two groups for statistical analyses (severe: > 15; non-severe: ≤ 15). Radiologists also recorded the presence and severity of respiratory motion artifacts within the lungs for all CTs on a 4-point scale (0: no motion artifacts; 1: minimal motion affecting less than 10% of the lungs; 2: moderate artifacts affecting 10-50% of the lungs without compromising assessment of pulmonary opacities; 3: severe artifacts affecting > 50% of the lungs and limiting evaluation of pulmonary findings).

De-identified DICOM images of patients were processed (FH with 2 years of post-doctoral research experience) with the deep learning-based prototype, which is an offline, standalone software. The research prototype is not approved by the United States Food and Drug Administration for clinical use. The prototype was trained and validated separately for detection (on 1371 chest CT exams with COVID-19, other viral pneumonia, and ground-glass and consolidative opacities of other etiologies) and quantification (on 1000 chest CT with COVID-19 pneumonia, 131 with interstitial lung diseases, 113 bacterial pneumonia, and 559 normal CT scans). The training and validation chest CT exams did not belong to either of the two sites included in our study. The details of the prototype are described in a previous publication [26].
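The subjective severity scoring described above (per-lobe extent scores of 0 to 5, summed over five lobes and dichotomized at 15) can be sketched as follows. This is only an illustration of the arithmetic: the radiologists assigned scores visually, so the percentage inputs, the lobe abbreviations, and the handling of non-integer percentages between the stated bins are assumptions made for the example.

```python
# Illustrative sketch of the subjective severity score arithmetic.
# Lobe labels and fractional-boundary handling are assumptions, not from the study.

LOBES = ("RUL", "RML", "RLL", "LUL", "LLL")  # hypothetical lobe abbreviations


def extent_score(percent_involved):
    """Map percentage of lobe involved to the 0-5 extent score."""
    if percent_involved <= 0:
        return 0          # no opacities
    if percent_involved < 5:
        return 1          # less than 5%
    if percent_involved <= 25:
        return 2          # 5-25%
    if percent_involved <= 50:
        return 3          # 26-50%
    if percent_involved <= 75:
        return 4          # 51-75%
    return 5              # greater than 75%


def overall_severity(lobe_percentages):
    """Sum the five lobar scores (range 0-25) and classify as in the text."""
    total = sum(extent_score(lobe_percentages[lobe]) for lobe in LOBES)
    return total, ("severe" if total > 15 else "non-severe")
```

For example, `overall_severity({"RUL": 3, "RML": 0, "RLL": 30, "LUL": 10, "LLL": 60})` sums the lobar scores 1 + 0 + 3 + 2 + 4 to 10, which falls in the non-severe group.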
For lung and lobe segmentation, the algorithm first applies multi-scale deep reinforcement learning to detect anatomic landmarks such as the carina and sternal tip. Then, the algorithm resamples the isolated lung region of interest to a 2-mm isotropic volume and processes it with a deep image-to-image network (DI2IN) to create the lung segmentation. Lastly, the segmented lung mask is resampled to the original resolution of the CT input data. For the COVID-19-related abnormality segmentation, a DenseUNet with anisotropic kernels was trained to convert the 3D CT image volume to a semantic segmentation mask. A single label is used to define all lung voxels with ground-glass or consolidative opacities as positive voxels. The remaining regions are defined as negative within the network, which is trained as an end-to-end segmentation system. The algorithm filters the output 3D segmentation by the lung segmentation. All automatically segmented volume masks were reviewed to verify their accuracy. Manual editing was required for only 2/241 chest CT examinations included in our study. In these two exams, the generated contours included subcutaneous emphysema and pneumothorax as lung parenchyma in one patient and stomach air as part of the left lower lung in another patient. Upon confirmation of the segmented contours, the prototype estimates several quantitative features related to the presence of pulmonary opacities (binary score based on presence of opacity), opacity scores based on percentage of lobe involved (score 0: 0%, 1: 1-25%, 2: 26-50%, 3: 51-75%, 4: > 75% of lobe involved), lung volume (in ml), volume and percentage of all-attenuation opacities within a given lung region (as absolute volume and relative percentage of opacities), volume and percentage of high-attenuation opacities (as absolute volume and percentage of pulmonary opacities with attenuation ≥ −200 HU), and mean HU and standard deviations for lung parenchyma as well as pulmonary opacities within given lung regions.
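To make the quantitative features listed above concrete, the sketch below derives them from a lung mask, an opacity mask, and per-voxel HU values for one region. It is a minimal illustration and not the prototype's code: the flat-list data layout, the function name, the use of lung volume as the denominator for both percentages, and the population standard deviation are all assumptions.

```python
# Hedged sketch of region-level features from segmentation masks.
# Assumptions: flat voxel lists, percentages relative to lung volume,
# population standard deviation. None of this is the vendor implementation.
from statistics import mean, pstdev

HIGH_ATTENUATION_HU = -200  # threshold stated in the text (>= -200 HU)


def region_features(hu_values, lung_mask, opacity_mask, voxel_volume_ml):
    """Compute volume, percentage, and HU statistics for one lung region."""
    lung_hu = [hu for hu, in_lung in zip(hu_values, lung_mask) if in_lung]
    # Opacity voxels are kept only inside the lung mask, mirroring the
    # filtering of the opacity segmentation by the lung segmentation.
    opacity_hu = [
        hu
        for hu, in_lung, is_opacity in zip(hu_values, lung_mask, opacity_mask)
        if in_lung and is_opacity
    ]
    high_hu = [hu for hu in opacity_hu if hu >= HIGH_ATTENUATION_HU]
    n_lung = len(lung_hu)
    return {
        "opacity_present": len(opacity_hu) > 0,
        "lung_volume_ml": n_lung * voxel_volume_ml,
        "opacity_volume_ml": len(opacity_hu) * voxel_volume_ml,
        "opacity_percent": 100.0 * len(opacity_hu) / n_lung if n_lung else 0.0,
        "high_opacity_volume_ml": len(high_hu) * voxel_volume_ml,
        "high_opacity_percent": 100.0 * len(high_hu) / n_lung if n_lung else 0.0,
        "lung_mean_hu": mean(lung_hu) if lung_hu else None,
        "opacity_mean_hu": mean(opacity_hu) if opacity_hu else None,
        "opacity_sd_hu": pstdev(opacity_hu) if opacity_hu else None,
    }
```

Calling the same function with masks restricted to both lungs, a single lung, or a single lobe would yield the per-region values the text describes.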
The given lung regions imply separate values for both lungs combined as well as for each lung and each lung lobe.

Data were recorded and analyzed for descriptive statistics with Microsoft EXCEL (Microsoft Inc., Redmond, Washington, USA). We calculated linear correlation coefficients between the radiologists' severity score and the quantitative features with Microsoft EXCEL. Multivariable analysis of variance (MANOVA) was performed to determine differences in quantitative features and type of pulmonary opacities as recorded from the radiologists' assessment. Multiple logistic regression analyses were performed with R Statistical Computing software (https://www.R-project.org, R Foundation for Statistical Computing, Vienna, Austria, accessed on 6.20.2020) to assess if severity scores determined by radiologists and the quantitative features could predict patient outcome (death versus recovery) and need for ICU admission. Areas under the curve (AUC with 95% confidence interval) were set as the output information for the regression analyses. A p-value of less than 0.05 was considered statistically significant.

More than a quarter of chest CT examinations (62/241, 26%) had moderate to severe motion artifacts. There was a moderate to strong direct linear correlation between the subjective severity scores and the quantitative features for chest CT data from both sites. Table 1 summarizes the correlation coefficients for the entire lung volumes and separately for the right and left lungs and the lung lobes. The opacity scores, the mean HU of the lungs, and the mean HU of pulmonary opacities were significantly different for ground-glass, mixed, and consolidative opacities on chest CT examinations from both sites (p < 0.0001) (Table 2). The prototype-estimated mean HU of pulmonary opacities for ground-glass, mixed, and consolidative opacities were −555 HU, −457 HU, and −399 HU, respectively (p < 0.0001). The averages and standard deviations of quantitative features for different patient outcomes are summarized in Table 3.
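Because AUC is the main performance measure reported, a short sketch of what it summarizes may help. The study fitted multiple logistic regression models in R; the snippet below is not that analysis. It only shows, under the assumption of a single numeric predictor and a binary outcome coded 0/1, how an AUC can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as one half). The function name is hypothetical.

```python
# Hedged sketch: AUC as the Mann-Whitney pairwise-comparison statistic.
# Not the study's R code; confidence intervals are not computed here.

def auc(scores, labels):
    """AUC for numeric scores and binary labels (1 = event, 0 = no event)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

With this definition, an AUC of 0.5 corresponds to a predictor that ranks cases no better than chance, and 1.0 to one that ranks every event above every non-event.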
[Figure caption fragment: Transverse chest CT image (d) demonstrates diffuse ground-glass opacities with scattered areas of consolidation, which are displayed in extensive red color on the volume-rendered 3D image (e) and quantified in the table from the prototype (f).]

Combination of the volume of all-attenuation opacities and the percentage of high-attenuation opacities (AUC 0.76-0.82, 95% CI 0.73-0.82) had higher AUC for predicting ICU admission than the subjective severity scores (AUC 0.69-0.77, 95% CI 0.69-0.81). Among clinical and laboratory variables, white blood cell count (AUC 0.64, 95% CI 0.64-0.69) was the best predictor for ICU admission in site A; patient age (AUC 0.68, 95% CI 0.68) was the best predictor for ICU admission in site B. Addition of clinical/laboratory data did not result in a significant change in the AUCs of either the subjective severity scores or the quantitative features (p > 0.05) (Table 2). For site A, the percentage of all-attenuation opacities in the entire lungs (both lungs combined) was the best feature for predicting the need for mechanical ventilation (AUC 0.83, 95% CI 0.83-0.85). The subjective severity score had similar performance (AUC 0.82, 95% CI 0.82-0.83) for differentiating those with and without mechanical ventilation. Clinical features (best feature being LDH) had significantly lower AUC (0.71, 95% CI 0.70-0.76) for such differentiation. As noted above, data on mechanical ventilation were not available for site B.

There were significant differences in the distribution of patients who recovered (site A: 59%, 71/121; site B: 82%, 85/104) versus those who died from COVID-19 pneumonia (site A: 41%, 50/121; site B: 18%, 19/104) (p < 0.01) (Figs. 1 and 2). While the volume of all-attenuation opacities was the best predictor of the final outcome for site A (AUC 0.72, 95% CI 0.70-0.72), the combination of percentage of high-attenuation opacities and volume of all-attenuation opacities of the left lower lobe was the best subset (AUC 0.77, 95% CI 0.70-0.84) for site B.
The quantitative features for both sites were better than both the subjective severity scores and type of pulmonary opacities from radiologists' assessment (best AUC 0.68, 95% CI 0.67-0.68) for predicting the final outcome. At site A, LDH (AUC 0.69, 95% CI 0.69) could predict patient outcome; no demographic, clinical, or laboratory variable could predict patient outcome for site B.

Quantitative features obtained from the DL-based prototype were superior to both the subjective severity scores from radiologists' assessment and the clinical and laboratory data for prediction of patient outcomes (death, ICU admission, and mechanical ventilation) in patients with COVID-19 pneumonia. Jiang et al. reported 70-80% accuracy for an AI framework based on patient symptoms and laboratory values for assessing disease severity and outcome in COVID-19 [21]. The lower performance of clinical and laboratory data in our study (maximum AUC of 0.69) might have been related to differences in patient demographics, disease severity and/or management strategies. As reported in prior studies, the percentage and volume of pulmonary opacities were the best features for predicting patient outcomes at both sites included in our study [22] [23] [24]. Lanza et al. reported that the percentage of compromised lung volume (between −50 and 100 HU) was the most accurate outcome predictor for risk of oxygen support, intubation and in-hospital death in a single-center study of 222 patients [22]. In a study with 176 patients, the volume and ratio of regions with ground-glass opacities (between −700 and −300 HU) obtained from a DL algorithm were the best of the 30 quantitative features, with the best AUC of 0.91 for classifying patients into severe and non-severe COVID-19 pneumonia [23]. Matos et al.
reported an AUC of up to 0.92 for their DL models using clinical/laboratory data and a CT feature (volume of disease) to predict patient outcome (need for mechanical ventilation or death) [24]. There could be several reasons for the differential performance of our DL algorithm (maximum AUC of 0.82), including the higher frequency of motion artifacts in our study and differences in patient population, disease severity, comorbidities, and treatment strategies. The differences in performance may also be related to significant variations between the training and test datasets used in our study, as opposed to the other publications where testing and training datasets originated from the same or similar sources [21] [22] [23] [24]. The strong predictive value of quantitative features such as opacity scores and the volume and percentage of pulmonary opacities for the radiologists' assessed severity and type of pulmonary opacities emphasizes the robust performance of the prototype. This finding is consistent with another study of 126 patients with COVID-19, which reported significant differences (p < 0.01) in CT lung opacification percentage between patients with different severities of COVID-19 pneumonia [25].

The primary implication of our study is the ability to classify and quantify the type and severity of pulmonary opacities on chest CT with single-click processing on the prototype used in our study. The predictive information from the prototype on patient outcome (death versus recovery), ICU admission, and mechanical ventilation can help in patient management and resource planning in case of a high-prevalence pandemic. Although radiologists describe the type and distribution of pulmonary opacities, their semantic interpretation does not include qualitative or quantitative assessment of diffuse or multifocal processes such as COVID-19 pneumonia.
The subjective severity scores in our study have been reported previously [7] [8] [9] but are inefficient, prone to subjective variations, and not used in routine clinical interpretation because of the time and difficulty involved in grading findings based on the percentage of involved lobes. In such context, the addition of quantitative features derived from the prototype, following regulatory approvals, can provide useful information with no or minimal additional work or time. Both the user and the developer communities must train and test their DL models on data from different imaging sites as well as with different CT technical factors, including exams with and without motion artifacts, to understand how such differences can affect their models.

There are a few limitations in our study. First, we did not perform a power analysis to determine the sample size to test the prototype. However, our data from both sites included all consecutive subjects with RT-PCR-positive COVID-19 pneumonia who underwent non-contrast, thin-section chest CT and had a known outcome in terms of death or recovery, and hospital admission. Second, there were variations in data variables available from the two participating sites. We did not have information on the use of mechanical ventilation in patients from site B. Third, at the height of the pandemic in site A, a few medical floor beds were converted into ICU functionality to accommodate ICU patient overflow. Although this could have affected the classification of patients with and without ICU admission, we classified patients on medical floor beds converted to ICU functionality as those with ICU admission. Fourth, the difference in mortality between patients from the two participating sites was likely related to infrequent use of chest CT at site A relative to site B, where all hospital-admitted patients with suspected or known COVID-19 pneumonia, regardless of the disease severity and presence of complications, underwent chest CT.
However, these variations in practice and mortality did not affect the performance of the prototype at either site. Fifth, neither the prototype nor the radiologists evaluated the chest CT for vascular complications of COVID-19 (such as pulmonary thromboembolism) or the presence of coronary calcification, pleural effusions, or mediastinal or hilar lymphadenopathy, which could have provided additional information on classification of different patient outcomes. Sixth, some differences in performance of the prototype at the two sites can be attributed to the differences in scanner technologies. Site A used more advanced and newer CT scanners as compared with site B. Indeed, respiratory motion artifacts were more frequent in site B than in site A. However, there was no difference in the performance of the radiologists or the prototype between chest CT with and without motion artifacts. The differences in CT scanner technologies and vendors also helped us assess generalizability of the prototype.

In conclusion, the deep learning-based CT Pneumonia Analysis prototype enables single-click lung segmentation and determination of patient outcome and need for ICU admission in patients with COVID-19 pneumonia. These findings were generalizable at the two high-prevalence sites from Iran and the Northeast United States. A strong correlation between the quantitative information from the prototype and the radiologists' qualitative assessment of disease severity suggests that the prototype can provide additional quantitative information to current radiology reports, which do not contain information on distribution and extent of pulmonary opacities.
1. Estimation of COVID-19-induced depletion of hospital resources in Ontario
2. Projecting hospital utilization during the COVID-19 outbreaks in the United States
3. Development of a Clinical Decision Support System for Severity Risk Prediction and Triage of COVID-19 Patients at Hospital Admission: an International Multicenter Study
4. Chest CT practice and protocols for COVID-19 from radiation dose management perspective
5. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases
6. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
7. Chest CT findings in coronavirus disease-19 (COVID-19): Relationship to duration of infection
8. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia
9. Chest CT severity score: An imaging tool for assessing severe COVID-19
10. Relation Between Chest CT Findings and Clinical Conditions of Coronavirus Disease (COVID-19) Pneumonia: A Multicenter Study
11. CT features of SARS-CoV-2 pneumonia according to clinical presentation: a retrospective analysis of 120 consecutive patients from Wuhan city
12. Imaging profile of the COVID-19 infection: radiologic findings and literature review
13. Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT
14. Deep learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: A multicentre study
15. CovidCTNet: An Open-Source Deep Learning Approach to Identify Covid-19 Using CT Image
16. A predictive model and scoring system combining clinical and CT characteristics for the diagnosis of COVID-19
17. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks
18. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image
19. Deep learning for detecting corona virus disease 2019 (COVID-19) on high-resolution computed tomography: a pilot study
20. Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks
21. Towards an Artificial Intelligence Framework for Data-Driven Prediction of Coronavirus Clinical Severity
22. Quantitative chest CT analysis in COVID-19 to predict the need for oxygenation support and intubation
23. Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images
24. Evaluation of novel coronavirus disease (COVID-19) using quantitative lung CT and clinical data: prediction of short-term outcome
25. Serial Quantitative Chest CT Assessment of COVID-19: Deep-Learning Approach
26. Quantification of tomographic patterns associated with COVID-19 from chest CT