key: cord-159554-50077dgk authors: Shan, Fei; Gao, Yaozong; Wang, Jun; Shi, Weiya; Shi, Nannan; Han, Miaofei; Xue, Zhong; Shen, Dinggang; Shi, Yuxin title: Lung Infection Quantification of COVID-19 in CT Images with Deep Learning date: 2020-03-10 journal: nan DOI: nan sha: doc_id: 159554 cord_uid: 50077dgk
CT imaging is crucial for the diagnosis, assessment, and staging of COVID-19 infection. Follow-up scans every 3-5 days are often recommended to monitor disease progression. It has been reported that bilateral and peripheral ground glass opacification (GGO), with or without consolidation, is the predominant CT finding in COVID-19 patients. However, due to the lack of computerized quantification tools, only qualitative impressions and rough descriptions of infected areas are currently used in radiological reports. In this paper, a deep learning (DL)-based segmentation system is developed to automatically quantify infection regions of interest (ROIs) and their volumetric ratios with respect to the lung. The performance of the system was evaluated by comparing the automatically segmented infection regions with the manually delineated ones on 300 chest CT scans of 300 COVID-19 patients. For fast manual delineation of training samples and possible manual intervention in the automatic results, a human-in-the-loop (HITL) strategy was adopted to assist radiologists in infection region segmentation, which dramatically reduced the total segmentation time to 4 minutes after 3 iterations of model updating. The average Dice similarity coefficient showed 91.6% agreement between automatic and manual infection segmentations, and the mean estimation error of the percentage of infection (POI) was 0.3% for the whole lung. Finally, possible applications are discussed, including but not limited to the analysis of follow-up CT scans and of infection distributions in the lobes and segments correlated with clinical findings.
The outbreak of the 2019 novel coronavirus in Wuhan, China, has rapidly spread to other countries since Dec 2019 [1] [2] [3] [4] [5] [6] [7]. The infectious disease caused by this virus was named COVID-19 by the World Health Organization (WHO) on Feb 11, 2020 [8]. To date (Mar 5, 2020), there have been 80,565 confirmed cases in China and 95,333 confirmed cases worldwide [9]. Each suspected case needs to be confirmed by a real-time polymerase chain reaction (RT-PCR) assay of the sputum [10]. Although RT-PCR is the gold standard for diagnosis, confirming COVID-19 patients with it is time-consuming and has been reported to suffer from high false negative rates. On the other hand, because chest CT scans of COVID-19 patients frequently show bilateral patchy shadows or ground glass opacity (GGO) in the lung [11, 12], chest CT has been used as an important complementary indicator in COVID-19 screening owing to its high sensitivity. Chest CT examination has also shown its effectiveness in the follow-up assessment of hospitalized COVID-19 patients [13]. Because the disease progresses quickly, subsequent CT scans every 3-5 days are recommended to evaluate therapeutic responses.
Although CT provides rich pathological information, radiological reports have offered only qualitative evaluation, owing to the lack of computerized tools that accurately quantify the infection regions and their longitudinal changes. Thus, subtle changes across follow-up CT scans are often overlooked. Moreover, quantitative assessment requires contouring the infection regions in the chest CT; however, manual contouring of lung lesions is tedious and time-consuming, and inconsistent delineation can lead to discrepancies in subsequent assessment. A fast auto-contouring tool for COVID-19 infection is therefore urgently needed for on-site quantitative disease assessment.
We developed a deep learning (DL)-based segmentation system for quantitative infection assessment. The system not only performs auto-contouring of infection regions, but also accurately estimates their shapes, volumes, and percentage of infection (POI) in CT scans of COVID-19 patients. Because delineating hundreds of training COVID-19 CT images is tedious and time-consuming, we proposed a human-in-the-loop (HITL) strategy to generate the training samples iteratively. In this method, radiologists efficiently correct the DL segmentation results, and the corrected results are iteratively added as training samples to update the model, which greatly accelerates the algorithm development cycle. To the best of our knowledge, no published work has reported the use of an HITL strategy for identifying COVID-19 infection in CT scans.
The protocol of this retrospective study was approved by the Ethics Committee of Shanghai Public Health Clinical Center. Informed consent was waived because of the retrospective nature of the study, and all private patient information was anonymized by the investigators after data collection. In total, 300 CT images from 300 COVID-19 patients (from Shanghai) were collected for validation, and 249 CT images of 249 COVID-19 patients were collected from other centers (outside Shanghai) for training. The inclusion criteria were as follows: (a) patients with a positive new coronavirus nucleic acid antibody, confirmed by the CDC; (b) patients who underwent thin-section CT; (c) age >= 18; (d) presence of lung infection in CT images. Patients whose CT scans showed large motion artifacts or pre-existing lung cancer were excluded from this study. 51 of the 300 patients have been previously reported [23].
The prior article investigated the clinical, laboratory, and imaging findings of COVID-19 pneumonia in humans, whereas in this manuscript we develop a deep learning system to quantify COVID-19 infection in CT scans. The patient data were used to validate system performance.
All COVID-19 patients underwent thin-section CT (SCENARIA 64 CT, Hitachi Medical, Japan). The median duration from illness onset to CT scan was 4 days, ranging from 1 to 14 days. The CT protocol was as follows: 120 kV; automatic tube current (180 mA-400 mA); iterative reconstruction; 64 mm detector; rotation time, 0.35 sec; slice thickness, 5 mm; collimation, 0.625 mm; pitch, 1.5; matrix, 512×512; and breath hold at full inspiration. The reconstruction kernel was set as "lung smooth with a thickness of 1 mm and an interval of 0.8 mm". During reading, the mediastinal window (window width 350 HU, window level 40 HU) and the lung window (window width 1200 HU, window level −600 HU) were used.
Because of the low contrast of infection regions in CT images and the large variation in both shape and position across patients, delineating the infection regions in chest CT scans is very challenging. We developed a DL-based network called VB-Net for this purpose. It is a modified 3-D convolutional neural network that combines V-Net [14] with the bottle-neck structure [15]. VB-Net consists of two paths (Figure 1). The first is a contracting path, comprising down-sampling and convolution operations, that extracts global image features. The second is an expansive path, comprising up-sampling and convolution operations, that integrates fine-grained image features. Compared with V-Net [14], VB-Net is much faster because it integrates the bottle-neck structure, as detailed in Figure 1 [16, 17]. The bottle-neck design is a stacked 3-layer structure.
The three layers use 1×1×1, 3×3×3, and 1×1×1 convolution kernels: the first 1×1×1 layer reduces the number of channels and feeds the data to a regular 3×3×3 layer, and a second 1×1×1 layer then restores the number of feature-map channels. Reducing and recombining feature-map channels in this way not only greatly reduces the model size and inference time, but also fuses cross-channel features effectively via convolution, which makes VB-Net better suited than the traditional V-Net to large 3D volumetric data.
The proposed VB-Net requires training samples with detailed delineation of each infection region. However, annotating hundreds of COVID-19 CT scans is labor-intensive for radiologists. We therefore adopted the human-in-the-loop (HITL) strategy to update the DL model iteratively. Specifically, the training data were divided into several batches. First, the CT data in the smallest batch were manually contoured by radiologists. Then, the segmentation network was trained on this batch to obtain an initial model. This initial model was applied to segment infection regions in the next batch, and radiologists manually corrected the segmentation results it produced. These corrected segmentation results were then added as new training data, and the model was updated with the enlarged training dataset. In this way, we iteratively increased the training dataset and built the final VB-Net. In the testing stage, the trained segmentation network segments the infection area on a new CT scan via a forward pass of the neural network, and the HITL interaction also allows radiologists to intervene in clinical application. In our experience, this HITL training strategy converged after 3-4 iterations. Figure 2 illustrates the process of the proposed HITL training strategy.
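The HITL procedure described above can be sketched as a simple loop. The following Python sketch is illustrative only and is not the authors' implementation: `manual_contour`, `train_model`, and `radiologist_correct` are hypothetical placeholders for manual delineation, VB-Net training, and manual correction, and the batch sizes (36, 78, 135) are an assumption chosen so that the cumulative training-set sizes match the 36, 114, and 249 annotated scans reported for the three model stages.

```python
from typing import Callable, List, Sequence, Tuple

Scan = str   # placeholder type for a CT volume
Mask = str   # placeholder type for an infection mask
Model = Callable[[Scan], Mask]


def hitl_training(
    batches: Sequence[Sequence[Scan]],
    manual_contour: Callable[[Scan], Mask],
    train_model: Callable[[List[Tuple[Scan, Mask]]], Model],
    radiologist_correct: Callable[[Scan, Mask], Mask],
) -> Tuple[Model, List[int]]:
    """Grow the training set batch by batch; return the final model and set sizes."""
    # Step 1: the first (smallest) batch is contoured entirely by hand.
    training_set = [(scan, manual_contour(scan)) for scan in batches[0]]
    model = train_model(training_set)
    sizes = [len(training_set)]

    # Step 2: for each later batch, the current model proposes a mask,
    # a radiologist corrects it, and the model is retrained on all data so far.
    for batch in batches[1:]:
        for scan in batch:
            proposal = model(scan)
            training_set.append((scan, radiologist_correct(scan, proposal)))
        model = train_model(training_set)
        sizes.append(len(training_set))
    return model, sizes


# Toy stand-ins so the sketch runs end to end (hypothetical, not VB-Net).
def toy_manual_contour(scan: Scan) -> Mask:
    return f"manual({scan})"


def toy_train(training_set: List[Tuple[Scan, Mask]]) -> Model:
    n = len(training_set)
    return lambda scan: f"auto{n}({scan})"


def toy_correct(scan: Scan, proposal: Mask) -> Mask:
    return f"corrected({proposal})"


batch_sizes = [36, 78, 135]  # assumed split; cumulative sizes 36, 114, 249
scans = (f"scan{i:03d}" for i in range(sum(batch_sizes)))
batches = [[next(scans) for _ in range(n)] for n in batch_sizes]

final_model, sizes = hitl_training(batches, toy_manual_contour, toy_train, toy_correct)
print(sizes)  # [36, 114, 249]
```

The loop retrains on the full accumulated set at every iteration, which is one reading of "the model was updated with the enlarged training dataset"; fine-tuning from the previous weights would be an equally plausible variant.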
After segmentation, various metrics were computed to quantify the COVID-19 infection, including the volumes of infection in the whole lung, in each lobe, and in each bronchopulmonary segment. In addition, the POIs in the whole lung, each lobe, and each bronchopulmonary segment were computed to measure the severity of COVID-19 and the distribution of infection within the lung. The Hounsfield unit (HU) histogram within the infection region can also be visualized to evaluate the GGO and consolidation components inside the infection region.
Figure 3 shows the entire pipeline for quantitative COVID-19 assessment. A chest CT scan is first fed to the DL-based segmentation system, which generates the infection areas, the whole lung, the lung lobes, and all the bronchopulmonary segments. Then, the aforementioned quantitative metrics are calculated to quantify the infection regions of the patient. The quantification provides the basis for measuring the severity of COVID-19 from the CT perspective and for tracking longitudinal changes during the treatment course.
Statistical analysis was performed with R version 3.6.1 (R Project for Statistical Computing, Vienna, Austria). Because a majority of the continuous data did not follow a normal distribution, they were expressed as the median and interquartile range (IQR, 25th and 75th percentiles). The Dice similarity coefficient (DSC) was used to evaluate the overlap ratio between an automatically segmented infection region ($A$) and the corresponding reference region ($G$) provided by radiologist(s). It is calculated as follows:
$$\mathrm{DSC}(A, G) = \frac{2\,|A \cap G|}{|A| + |G|}$$
where $|\cdot|$ is the operator that calculates the number of voxels in the given region, and $\cap$ is the intersection operator. The Pearson correlation coefficient [18] was used to evaluate the correlation of two variables:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
where $N$ is the total number of observations, $x_i$ and $y_i$, $i = 1, \cdots, N$, are the observations of the two variables, and $\bar{x}$ and $\bar{y}$ are their sample means.
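The metrics defined above can be computed directly from binary masks. The sketch below is a minimal illustration in plain Python and is not the study's code (the statistics in the study were computed in R). It assumes that masks are flat sequences of 0/1 voxel labels of equal length, whereas real masks would be 3-D arrays, and that the POI is the number of infected voxels inside a region divided by the number of voxels in that region, in percent. The function names and toy values are hypothetical.

```python
import math
from typing import Sequence


def dice(auto_mask: Sequence[int], ref_mask: Sequence[int]) -> float:
    """Dice similarity coefficient: 2|A ∩ G| / (|A| + |G|)."""
    overlap = sum(1 for a, g in zip(auto_mask, ref_mask) if a and g)
    total = sum(1 for a in auto_mask if a) + sum(1 for g in ref_mask if g)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * overlap / total


def percentage_of_infection(infection: Sequence[int], region: Sequence[int]) -> float:
    """Infected voxels inside a region (lung, lobe, or segment) over region voxels, in %."""
    region_voxels = sum(1 for r in region if r)
    if region_voxels == 0:
        raise ValueError("region mask is empty")
    infected = sum(1 for i, r in zip(infection, region) if i and r)
    return 100.0 * infected / region_voxels


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy "volumes" of 10 voxels each.
lung = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
auto = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
ref = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(dice(auto, ref))                      # 2*2 / (3+3), about 0.667
print(percentage_of_infection(auto, lung))  # 3 of 8 lung voxels: 37.5
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear relation, r close to 1
```

Passing a lobe or segment mask as `region` gives the per-lobe or per-segment POI in the same way as the whole-lung POI.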
To demonstrate the effectiveness of the system, Figure 4 shows typical cases of COVID-19 infection in three different stages: early, progressive, and severe. Coronal images without and with the overlaid segmentation are presented side by side for comparison. In addition, a 3D rendering of each case is provided to give a more vivid picture of the COVID-19 infection within the lung. All three cases show that the contours delineated by the deep learning system match well with the visible lesion boundaries in the CT images.
To quantitatively evaluate the accuracy of segmentation and measurement, infection regions on 300 CT scans of 300 COVID-19 patients were manually contoured by two radiologists (W.S. and F.S., with 12 and 19 years of experience in chest radiology, respectively) to serve as the reference standard. Each case was manually contoured by one radiologist and reviewed by the other. In case of disagreement, the final results were determined by consensus between the two radiologists. The automatically segmented infection regions were compared with the reference standard in terms of overlap ratio (measured by the Dice similarity coefficient), volume, the POI in the whole lung, the POI in each lung lobe, and the POI in each bronchopulmonary segment.
Inter-rater variability was assessed by randomly sampling 10 CT scans of COVID-19 patients from the entire validation set. The two radiologists first independently contoured the infection regions in these CT scans. Their manual segmentations were then compared using the same metrics as mentioned above. The mean POI estimation difference is 0.2% for the whole lung, 0.3% for lung lobes, and 0.4% for bronchopulmonary segments. 91.4% of lung-lobe POIs and 85.9% of bronchopulmonary-segment POIs are consistently estimated with a difference of 1% or less.
Comparing Table 1 and Table 2 shows that the segmentation and measurement errors of the deep learning system are close to the inter-rater variability. This demonstrates the effectiveness of using deep learning to quantify COVID-19 infection in CT images.
Two metrics were used to evaluate the HITL strategy. First, the time of fully manual contouring was recorded and compared with the labeling time of a CT scan assisted by the deep learning model. Second, the segmentation accuracy of the deep learning models at different stages was assessed to see whether accuracy improves with more annotated training data. Table 3 shows the labeling time and segmentation accuracy at different stages. Without any assistance from deep learning, it takes 211.3±52.6 minutes to contour COVID-19 infection regions on one CT scan. The contouring time drops dramatically to 31.1±8.1 minutes with the assistance of the first deep learning model, trained with 36 annotated CT scans. It further drops to 12.0±2.9 minutes with 114 annotated scans, and to 4.7±1.1 minutes with 249 annotated scans. Meanwhile, the segmentation accuracy of the deep learning models was evaluated using the Dice similarity coefficient on the entire validation set of 300 scans. It improves from 85.1±11.4% to 91.0±9.6%, and then to 91.6±10.0%, as more training data are added. The improved segmentation accuracy greatly reduces human intervention and thus significantly reduces the time needed for annotation and labeling.
CT imaging has become an efficient tool for screening COVID-19 patients and for assessing the severity of COVID-19. However, radiologists lack a computerized tool to accurately quantify the severity of COVID-19, e.g., the percentage of infection in the whole lung. In the literature, deep learning has become a popular method in medical image analysis and has been used to analyze diffuse lung diseases on CT [19, 20]. In this work, we explored deep learning to segment COVID-19 infection regions within the lung fields on CT images.
The accurate segmentation provides the quantitative information needed to track disease progression and analyze longitudinal changes of COVID-19 over the entire treatment period. We believe that this deep learning system for COVID-19 quantification will open up many new research directions of interest to this community.
The first potential application of this system is to quantify longitudinal changes in the follow-up CT scans of COVID-19 patients. Hospitalized patients with confirmed COVID-19 typically undergo a CT examination every 3-5 days. As there is currently no effective medicine that targets COVID-19, most patients recover with varying degrees of supportive medical intervention. Given the large number of such patients, it is of interest to see how the disease progresses under different clinical management. Figure 5 gives a case with three follow-up CT scans. With the infection region segmented, changes in infection volume, as well as in consolidation and ground-glass opacities, can be easily visualized using surface rendering.
The POI estimated by our system can be used to indicate the severity of COVID-19 from the radiology perspective. It is of great interest to find out how this POI correlates with clinical pneumonia assessment. The pneumonia severity index (PSI) is a clinical prediction rule that is often used to calculate the probability of morbidity and mortality among patients with community-acquired pneumonia [21, 22]. It is calculated from demographics, the coexistence of comorbid illnesses, and physical and laboratory examinations. In our study, COVID-19 patients were classified into non-severe (PSI level ≤2) and severe (PSI level ≥3) groups. The POIs in the whole lung were calculated from their CT scans by the system. Based on 196 patients with both PSI and POI available, the Pearson correlation coefficient between these two variables is 0.5, which indicates a moderate correlation between the two scores.
This result indicates that the POI estimated from CT scans is clinically relevant to the severity of pneumonia. Research is ongoing to study whether the POI or its derived coefficients are helpful in predicting COVID-19 disease progression.
Another application of our system is to explore the quantitative lesion distribution specifically related to COVID-19. According to recent literature [23, 24], COVID-19 infection occurs more frequently in the lower lobes of the lung. However, no study so far has quantitatively reported the severity of COVID-19 infection in each lung lobe and bronchopulmonary segment. With this deep learning system, the POIs of lung lobes and bronchopulmonary segments can be calculated automatically. Thus, statistics of the infection distribution can be summarized over a large-scale dataset, e.g., the 300 CT scans in our study. Figure 6 shows boxplots of these POIs calculated from 300 CT scans of COVID-19 patients in the Shanghai district. Figure 6 (a) shows that the mean POIs of the left and right lower lobes are higher than those of the other lobes, which agrees with the findings reported in [23, 24]. Moreover, the infection distribution can be analyzed further, down to the bronchopulmonary segment level, as shown in Figure 6 (b). To the best of our knowledge, this is the first work that reveals the COVID-19 distribution in bronchopulmonary segments based on large-scale patient CT data. Our results show that the following segments are often infected by COVID-19 (listed in order of decreasing mean POI): right lower lobe - outer basal, right lower lobe - dorsal, right lower lobe - posterior basal, left lower lobe - outer basal, left lower lobe - dorsal, left lower lobe - posterior basal, and right upper lobe - back.
Using the HITL strategy to train the segmentation network is a novel feature of our system.
Existing AI-based systems for automatic quantitative assessment always require a large amount of annotated CT data, whereas collecting such annotated data is very expensive or even difficult. Moreover, these AI systems are always trained as a black box to users, who nevertheless want to know what happens inside the model. Our experimental results indicate that the HITL strategy makes the manual annotation process faster with the assistance of deep learning models. The HITL strategy also makes the system more comprehensible: through manual intervention in HITL, the radiologists are aware of how well the system performs during training. In addition, the HITL strategy helps radiologists become accustomed to the AI system because they are involved in the training process. It integrates the professional knowledge of radiologists in an interactive way.
Our work has several limitations. First, the validation CT datasets were collected in one center, which may not be representative of all COVID-19 patients in other geographic areas. The generalization of the deep learning system needs to be further validated on multi-center datasets. Second, the system is developed to quantify infections only, and it may not be applicable to quantifying other types of pneumonia, e.g., bacterial pneumonia. In our future work, we will extend the system to quantify the severity of other types of pneumonia using transfer learning.
With this automatic DL-based segmentation, many studies that quantify imaging metrics and correlate them with syndromes, epidemiology, and treatment responses could reveal further insights about imaging markers and findings, toward improved diagnosis and treatment of COVID-19.
Tables
Table 1. Quantitative evaluation of the deep learning segmentation system on the validation dataset. The Dice coefficients, volume estimation error, and POI estimation error in the whole lung, lung lobes, and bronchopulmonary segments were calculated to assess the automatic segmentation accuracy.
Table 2. Inter-rater variability analysis between two radiologists on 10 randomly sampled CT cases. The Dice coefficients, volume estimation difference, and POI difference in the whole lung, lung lobes, and bronchopulmonary segments were estimated to serve as the reference for assessing the automatic segmentation accuracy.
A novel coronavirus from patients with pneumonia in China
A novel coronavirus genome identified in a cluster of pneumonia cases - Wuhan
Importation and human-to-human transmission of a novel coronavirus in Vietnam
First case of 2019 novel coronavirus in the United States
A novel coronavirus outbreak of global health concern
The response of Milan's Emergency Medical System to the COVID-19 outbreak in Italy. The Lancet
A new coronavirus associated with human respiratory disease in China
Severe acute respiratory syndrome-related coronavirus - The species and its viruses, a statement of the Coronavirus Study Group
Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
Imaging profile of the COVID-19 infection: radiologic findings and literature review
Fully convolutional neural networks for volumetric medical image segmentation
Deep residual learning for image recognition
Image-Guided Procedures, Robotic Interventions, and Modeling; 2019: International Society for Optics and Photonics
Automatic MR kidney segmentation for autosomal dominant polycystic kidney disease
Applied multiple regression/correlation analysis for the behavioral sciences
Automatic Lung Segmentation Based on Texture and Deep Features of HRCT Images with Interstitial Lung Disease
Lung Segmentation on HRCT and Volumetric CT for Diffuse Interstitial Lung Disease Using Deep Convolutional Neural Networks
A prediction rule to identify low-risk patients with community-acquired pneumonia
Validity of pneumonia severity index and CURB-65 severity scoring systems in community acquired pneumonia in an Indian setting
Emerging coronavirus 2019-nCoV pneumonia
Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection