key: cord-0758992-v1277h9f
authors: Anastasopoulos, Constantinos; Weikert, Thomas; Yang, Shan; Abdulkadir, Ahmed; Schmuelling, Lena; Buehler, Claudia; Paciolla, Fabiano; Sexauer, Raphael; Cyriac, Joshy; Nesic, Ivan; Twerenbold, Raphael; Bremerich, Jens; Stieltjes, Bram; Sauter, Alexander W.; Sommer, Gregor
title: Development and clinical implementation of tailored image analysis tools for COVID-19 in the midst of the pandemic: The synergetic effect of an open, clinically embedded software development platform and machine learning
date: 2020-08-28
journal: Eur J Radiol
DOI: 10.1016/j.ejrad.2020.109233
sha: c3cf3df9b76eb978f3645d4faa050a0b1ad3ce69
doc_id: 758992
cord_uid: v1277h9f

PURPOSE: During the emerging COVID-19 pandemic, radiology departments faced a substantial increase in chest CT admissions coupled with the novel demand for quantification of pulmonary opacities. This article describes how our clinic implemented an automated software solution for this purpose into an established software platform within 10 days. The underlying hypothesis was that modern academic centers in radiology are capable of developing and implementing such tools by their own efforts and fast enough to meet the rapidly increasing clinical needs in the wake of a pandemic.

METHOD: Deep convolutional neural network algorithms for lung segmentation and opacity quantification on chest CTs were trained using semi-automatically and manually created ground truth (N_total = 172). The performance of the in-house method was compared to an externally developed algorithm on a separate test subset (N = 66).

RESULTS: The final algorithm was available at day 10 and achieved human-like performance (Dice coefficient = 0.97). For opacity quantification, a slight underestimation was seen both for the in-house (1.8 %) and for the external algorithm (0.9 %). In contrast to the external reference, the underestimation for the in-house algorithm showed no dependency on total opacity load, making it more suitable for follow-up.

CONCLUSIONS: The combination of machine learning and a clinically embedded software development platform enabled time-efficient development, instant deployment, and rapid adoption in clinical routine. The algorithm for fully automated lung segmentation and opacity quantification that we developed in the midst of the COVID-19 pandemic was ready for clinical use within just 10 days and achieved human-level performance even in complex cases.

Despite knowledge of the spread of the disease in Asia, Europe was overwhelmed by the dynamics of the new coronavirus disease (COVID-19) outbreak in spring 2020. Initial attempts to prevent its spread by geographically confined lockdowns failed, and western countries became alerted by the exponential increase of new infections in Italy, which by the end of March had exceeded those reported by China. In mid-March, the European countries gradually entered a systemic lockdown and urgently prepared their healthcare systems for the challenges to come. Driven by initial reports from China that indicated a higher sensitivity of chest computed tomography (CT) compared to polymerase chain reaction (PCR) in epidemic areas [1, 2], imaging was recognized as an important additional diagnostic tool in the wake of the pandemic [3, 4]. As a consequence, not only emergency and intensive care units but also radiology departments in Europe had to quickly adapt to the new reality, following recommendations from their colleagues in Asia [5].
In our department, a substantial increase in chest CT admissions for COVID-19 was seen soon after the initial cases were diagnosed in our hospital by the end of February. At first, CT was used for the differential diagnosis of flu-like symptoms, an approach that had been advocated by early reports from China [6, 7]. Soon, however, when the number of patients on the dedicated medical wards increased, our department received inquiries for a standardized method allowing quantification and follow-up of disease burden to support both triage towards intensive care and therapy decisions. At the time of the initiation of this study, little evidence was available on the evolution of lung tissue alterations in the course of the disease [8, 9], and methodological proposals for the quantification of these changes were at a very early stage [10, 11]. Meanwhile, the steadily growing literature on this topic has been complemented by several other publications, including recent reports on visual scoring systems [12, 13] and first quantitative and deep convolutional neural network (DCNN) approaches [14-16]; in addition, a multicenter initiative for automated diagnosis and quantitative analysis of COVID-19 on imaging has been set up (https://imagingcovid19ai.eu).

This article describes the process from the prototypical development of automated, AI-based software for lung segmentation and quantification of lung opacities in CTs of COVID-19 patients in a research and development environment to its clinical implementation within ten days. We discuss major strengths and weaknesses of our approach and place our results in the context of the current literature. The underlying hypothesis was that modern academic centers in radiology are capable of developing and implementing clinically useful AI-based software for the quantification of pulmonary opacities in COVID-19 by their own efforts and with sufficient speed to meet the rapidly increasing clinical needs in the wake of a pandemic.

Our hospital is an academic hospital providing maximum care to a metropolitan area of 600,000 inhabitants. For this study, the Department of Radiology was supported by the Department of Research and Analysis, a team of 15 researchers with skills in image processing, data pipelines, deep learning applications and statistics. In the framework of this project, 3 members of the latter department joined the newly created study team of 8 physicians, comprising 6 residents and 2 staff physicians specialized in cardio-thoracic imaging. The total workforce dedicated to the project during the time frame of 10 days added up to approximately 2 full-time equivalents (FTE) for the scientists and 4 FTE for the physicians.

The prospective collection and evaluation of data from subjects with COVID-19 for this project was approved by the local ethics committee (approval number 2020-00566) as part of a study registered on ClinicalTrials.gov on 04/29/2020 (Identifier: NCT04366765). Data from patients actively denying consent for further research use were excluded. In total, 152 datasets belonging to 146 COVID-19 patients were included, 23 performed with and 123 without iodine contrast administration. CT scans were performed in our institution on six scanners of four different types (Somatom Force, Edge, Definition Flash, and Definition AS+; all Siemens, Forchheim, Germany). Iterative image reconstruction (ADMIRE/SAFIRE, Siemens Healthineers, Erlangen, Germany) with a soft tissue kernel (I26f) was used.
The processing of these training and validation subsets was also approved by the local ethics committee (approval number 2020-00595). Patient characteristics for the included datasets are given in Table S1 of Appendix E1. The technical development, refinement and testing of methods followed a stepwise approach as listed below and visualized in Figure 1. Based on previous reports on density distribution in acute respiratory distress syndrome [17], the percentage of voxels with HU values between -600 and 0 relative to the entire lung was calculated and considered during radiological reporting. The CT scans from this initial evaluation step that had been processed until 03/26/2020 (N=45) are referred to in the following as Subset 1.

Step 1 - Training of baseline in-house segmentation method
The 3D segmentations of Subset 1 were exported as binary masks without further manual intervention. For the training of deep learning algorithm 1 (A1), a framework for DCNN semantic segmentation with a U-Net architecture was used [18]. The data was processed with two convolutions in three spatial dimensions (3x3x3 convolution kernels) per resolution level. The principle of the network was based on the 3D U-Net without batch normalization [19], but we implemented it with only three resolution levels formed by two pooling and two upsampling layers to reduce the model complexity and facilitate training. The number of channels per layer was the same as in the first three resolution levels in [19]. The training was performed on TensorFlow with NiftyNet (https://niftynet.io) on a consumer-grade graphics processing unit.

Step 2 - Refinement of model by training with manual reference segmentations
The manually refined 3D segmentations from this step (n=238) are referred to in total as Subset 2, which consists of the previously described Subsets 2C (N=86), 2NC (N=86), and 2T (N=66). From Subset 2T, 10 cases were randomly selected for inter-rater comparisons of independent manual segmentations by all three raters, from which a single estimated ground-truth segmentation was computed, as in [16]. The same network architecture as for A1 (Step 1) was used for the training of algorithm A2, which was trained with Subset 2C and Subset 2NC (total N=172).

Step 3 - Implementation of third-party lung segmentation algorithm
On 04/04/2020, an independent team released "COVID-19 Web" on the GitHub platform, which to our knowledge was the first open-source lung segmentation algorithm specifically trained with COVID-19 chest CT datasets (https://github.com/JoHof/lungmask) [15]. Similar to A1 and A2, it is based on the U-Net architecture and had been trained on 40 and 238 datasets from patients with and without COVID-19, respectively (a 1:6 ratio). We implemented version 0.2.2 (downloaded on April 7th; in the following referred to as A3) in Step 3 as an external reference for our own algorithms A1 and A2.

The inference of lung borders was followed by a simple postprocessing step for all three algorithms, during which spurious remote segmentations were excluded while keeping the largest connected components. Algorithms A1-3 were tested and compared on Subset 2T. We estimated the percentual opacity load (POL) in both lungs by thresholding between -600 and 0 HU:

$$\mathrm{POL}_{-600} = \frac{\sum_{i} n_i^{[-600,\,0]}}{\sum_{i} n_i} \times 100\,\%$$

where $n_i^{[-600,\,0]}$ and $n_i$ denote the voxel count of the lung mask with $-600 \le \mathrm{HU} \le 0$ and the total voxel count of the lung mask in slice $i$, respectively. The POL_-600 values derived from each of the algorithms were separately compared to the manual POL_-600 in Subset 2T with Bland-Altman analyses in R (v 3.6.3) [20].
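For illustration, the following is a minimal sketch of a three-level 3D U-Net of the kind described in Step 1 (two 3x3x3 convolutions per resolution level, no batch normalization, two pooling and two upsampling steps). It is not the code used in this study, which was trained with NiftyNet on TensorFlow; the Keras layer choices and the channel counts (taken from the first three resolution levels of the cited 3D U-Net) are assumptions made for the sketch.

```python
# Hypothetical Keras sketch of a three-level 3D U-Net similar to the
# architecture described in Step 1: two 3x3x3 convolutions per resolution
# level, no batch normalization, two pooling and two upsampling steps.
# Channel counts are assumptions; the actual model was trained with NiftyNet.
from tensorflow.keras import layers, Model

def conv_block(x, filters_a, filters_b):
    # Two consecutive 3x3x3 convolutions with ReLU activations.
    x = layers.Conv3D(filters_a, 3, padding="same", activation="relu")(x)
    x = layers.Conv3D(filters_b, 3, padding="same", activation="relu")(x)
    return x

def build_unet3d(input_shape=(None, None, None, 1)):
    # Spatial dimensions of the input volume should be divisible by 4
    # (two pooling steps of factor 2).
    inputs = layers.Input(shape=input_shape)

    # Contracting path.
    c1 = conv_block(inputs, 32, 64)
    p1 = layers.MaxPooling3D(2)(c1)
    c2 = conv_block(p1, 64, 128)
    p2 = layers.MaxPooling3D(2)(c2)

    # Lowest resolution level.
    c3 = conv_block(p2, 128, 256)

    # Expanding path with skip connections.
    u2 = layers.concatenate([layers.UpSampling3D(2)(c3), c2])
    c4 = conv_block(u2, 128, 128)
    u1 = layers.concatenate([layers.UpSampling3D(2)(c4), c1])
    c5 = conv_block(u1, 64, 64)

    # Per-voxel foreground probability for the binary lung mask.
    outputs = layers.Conv3D(1, 1, activation="sigmoid")(c5)
    return Model(inputs, outputs)

model = build_unet3d()
model.compile(optimizer="adam", loss="binary_crossentropy")
```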
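The postprocessing and quantification steps can likewise be illustrated with a short sketch. The following is a minimal example, not the study's implementation, with hypothetical function and variable names: it keeps the largest connected components of a predicted lung mask to remove spurious remote segmentations and then computes POL_-600 by HU-thresholding within that mask, directly following the formula above.

```python
# Minimal sketch (names are illustrative) of the two steps that follow
# network inference: keeping the largest connected components of the
# predicted lung mask, and computing the percentual opacity load POL_-600
# by thresholding the CT volume between -600 and 0 HU within the lung mask.
import numpy as np
from scipy import ndimage

def keep_largest_components(mask, n_components=2):
    """Keep only the n largest connected components (e.g. the two lungs)."""
    labeled, n_labels = ndimage.label(mask)
    if n_labels <= n_components:
        return mask.astype(bool)
    sizes = ndimage.sum(mask, labeled, index=range(1, n_labels + 1))
    keep_labels = np.argsort(sizes)[::-1][:n_components] + 1
    return np.isin(labeled, keep_labels)

def percentual_opacity_load(ct_hu, lung_mask, lower=-600.0, upper=0.0):
    """Voxels with lower <= HU <= upper inside the lung mask,
    as a percentage of all lung-mask voxels."""
    lung = lung_mask.astype(bool)
    opaque = lung & (ct_hu >= lower) & (ct_hu <= upper)
    return 100.0 * opaque.sum() / lung.sum()

# Toy example with a synthetic volume; in practice ct_hu and the predicted
# mask would come from the CT reconstruction and the segmentation network.
ct_hu = np.random.uniform(-1000, 100, size=(64, 64, 64))
pred_mask = np.zeros((64, 64, 64), dtype=bool)
pred_mask[16:48, 16:48, 16:48] = True
lung_mask = keep_largest_components(pred_mask)
print(f"POL_-600 = {percentual_opacity_load(ct_hu, lung_mask):.1f} %")
```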
Additionally, quantification was computed in 55 non-COVID-19 cases in varying acquisition phases after administration of intravenous iodine contrast (Appendix E1, Data Supplement S4). After the pathway development, a data pipeline was set up to navigate acquired images from the scanner to Nora, where the proposed algorithm (A2) was implemented. This locally hosted software is available to the radiologist during reporting through a web browser, and processing is triggered automatically upon arrival of an image dataset.

Segmentation performance for the whole lung
The performance metrics are given in Figure 3 and Table 1. The precision of the deep learning lung tissue segmentation in Subset 2T was excellent for A2 and A3, with mean Dice coefficients of 0.97, while A1 showed a slightly lower mean Dice coefficient of 0.95. The maximal Hausdorff distance showed a mean of 25, 17 and 28 mm for A1, A2 and A3, respectively. Isolated outliers were observed above the upper quartile, mainly for the preliminary (A1) and the third-party (A3) algorithm, corresponding to the unexpected inclusion of pneumothorax or pleural effusion in the lung segmentation (for an example see Figure 2). Inter-rater segmentation comparison on the 10 cases was excellent, with mean Dice coefficients of 0.99 for all comparisons (Table 1).

The results of the threshold-based quantification analysis are displayed in Figure 4 and Table 2. Lung opacities were diagnosed in forty-four out of sixty-six scans (66 %) from the test dataset 2T. POL_-600 of the manual lung segmentations ranged from 5 % to 55 %. The algorithms showed mean underestimations of 3.0 %, 1.8 % and 0.9 % for A1, A2 and A3 in Subset 2T, respectively. For A1 and A3, there was a proportional bias in POL_-600 towards higher opacity loads (0.35 % and -0.4 % for each 10 % increase in POL_-600, respectively), while for A2 the slope of the bias was almost zero (Figure 4).

Implementation was accomplished during the pandemic, in April 2020. In the first 6 weeks after implementation, almost 500 chest CTs admitted to our department were processed automatically. The first deep learning algorithm, A1, was finalized as early as 4 days after project initiation. Forty-five chest CTs from patients with COVID-19 available at that time were utilized for its training. Even without prior manual refinement of the segmentations, A1 provided satisfactory overall segmentation performance. Nevertheless, considerable progress from the preliminary results of A1 to the second algorithm A2 was seen. This latter deep learning algorithm, trained with four times the number of chest CT datasets and after manual refinement of the reference standard, showed a marked improvement in segmentation performance. This might in part be attributed to the fact that the reference standards of the training and test subsets were created by the same human raters [22], but it is also reflected by higher segmentation accuracy in specific cases with coexisting pneumothorax or massive pleural effusion. The deep learning segmentation was interlinked with a subsequent opacity quantification step based on an HU-thresholding method that had been established at our site for COVID-19-related opacities before the deep learning segmentations were introduced.
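For readers who wish to reproduce the reported metrics on their own data, the following is an illustrative sketch of the Dice coefficient, the maximum Hausdorff distance, and the Bland-Altman statistics (mean bias, limits of agreement, proportional-bias slope). It is not the analysis code of this study, whose Bland-Altman analyses were performed in R; the distance-transform-based Hausdorff computation and the helper names are assumptions.

```python
# Illustrative evaluation sketch (assumed helper names, not the study code):
# Dice coefficient and maximum Hausdorff distance between an automated and a
# manual lung mask, plus Bland-Altman statistics comparing automated and
# manual POL_-600 values.
import numpy as np
from scipy import ndimage

def dice_coefficient(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def max_hausdorff_mm(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric maximum Hausdorff distance (in mm) between mask surfaces,
    computed via Euclidean distance transforms."""
    def surface(mask):
        mask = mask.astype(bool)
        return mask & ~ndimage.binary_erosion(mask)
    sa, sb = surface(a), surface(b)
    dist_to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    dist_to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    return max(dist_to_b[sa].max(), dist_to_a[sb].max())

def bland_altman(auto_pol, manual_pol):
    """Mean bias, 95% limits of agreement and proportional-bias slope of
    automated vs. manual percentual opacity load."""
    auto_pol = np.asarray(auto_pol, dtype=float)
    manual_pol = np.asarray(manual_pol, dtype=float)
    diff = auto_pol - manual_pol          # negative values = underestimation
    mean = (auto_pol + manual_pol) / 2.0
    bias, sd = diff.mean(), diff.std(ddof=1)
    slope = np.polyfit(mean, diff, 1)[0]  # bias change per 1 % increase in POL
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd), slope
```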
Quantification based on thresholding has previously been used for the differentiation of normal lung tissue from opacities such as ground-glass or consolidations [17], and the lower cut-off of -600 HU implemented here reflects the counterpart of the "well-aerated" lung, which has been correlated with the severity of the disease and clinical outcome in patients with COVID-19 [14]. The average underestimation of automated quantification was minor for the proposed A2 and for the third-party algorithm, whereas the latter showed a negative bias slope towards higher opacity loads. In contrast, the quantification bias of A2 did not show a dependency on opacity load, thus making the estimation of error from automated quantification more predictable when comparing baseline and follow-up scans. The approach proposed in this article quantifies but does not classify lung opacities, unlike a recently demonstrated automated differentiation of lung opacities in chest CTs caused by COVID-19 and community-acquired pneumonia [16]. Direct segmentation of affected lung areas has also been proposed as an alternative to approaches using thresholding after segmentation, although the voxel misclassifications reported there might eventually result in a similar degree of opacity underestimation [23]. Factors influencing the distribution of HU values in lung tissue, such as inflation depth and prior contrast administration, were identified in a small sample with no lung opacities (Appendix E1 only). The role of these and other influencing factors remains to be investigated.

The presented pragmatic approach also harbors some limitations. The first is the selection and curation of datasets, which was strongly dominated by the availability of pre-processed data from the provisional, semi-automated pipeline in the wake of the pandemic. The data used for this project therefore neither represents a complete, consecutively acquired sample of the COVID-19 cohort at our hospital nor constitutes a fully random sample. In addition, the reference standard subset of COVID-19 chest CTs for the training of A2 was extended by an equally sized subset of relatively homogeneous scans performed for the exclusion of pulmonary infection, with and without intravenous contrast. Taken together, these inconsistencies in data selection may limit the performance of the algorithm in cases of advanced COVID-19, although in the training subset of the third-party algorithm A3 the proportion of chest CTs performed for reasons other than COVID-19 was even higher. On the other hand, this augmentation of the training subset might reduce selection bias and provides a more diverse sample [24]. An additional limitation is the use of empirical HU-thresholds for disease quantification.

The manuscript is not for publication elsewhere. Contact details for all authors and their contributions are provided at the end of the document. The authors have no conflicts of interest. Publication is approved by all authors and by the responsible authorities at our institution.

Table 1. Descriptive statistics for the performance metrics Dice coefficient and maximum Hausdorff distance, on the left for comparisons between each algorithm and the human reference standards and on the right for the inter-rater comparisons.

Table 2. Bland-Altman analyses of opacity quantification (in %) between the manual reference standard and each of the 3 algorithms in the test subset.
References

Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases
COVID-19 patients and the radiology department - advice from the European Society of Radiology (ESR) and the European Society of Thoracic Imaging (ESTI)
The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society
Interpretation of CT signs of 2019 novel coronavirus (COVID-19) pneumonia
CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV)
Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT
Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection
Temporal Changes of CT Findings in 90 Patients with COVID-19 Pneumonia: A Longitudinal Study
Severe COVID-19 Pneumonia: Assessing Inflammation Burden with Volume-rendered Chest CT
Longitudinal Assessment of COVID-19 Using a Deep Learning-based Quantitative CT Pipeline: Illustration of Two Cases
CO-RADS - A categorical CT assessment scheme for patients with suspected COVID-19: definition and evaluation
Chest CT Severity Score: An Imaging Tool for Assessing Severe COVID-19
Well-aerated Lung on Admitting Chest CT to Predict Adverse Outcome in COVID-19 Pneumonia
Automatic lung segmentation in routine imaging is a data diversity problem, not a methodology problem
What has computed tomography taught us about the acute respiratory distress syndrome?
Convolutional Networks for Biomedical Image Segmentation, Proceedings of MICCAI 2015
Learning Dense Volumetric Segmentation from Sparse Annotation
Deepankardatta/Blandr: Version 0.5.1 - Cran Submission Release: Zenodo
Data Lifecycle Challenges in Production Machine Learning
Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers
Deep-Learning Approach
Integrating artificial intelligence into the clinical practice of radiology: challenges and recommendations