key: cord-1049851-dvnxr0tb
authors: Huang, Shan; Wang, Yuancheng; Zhou, Zhen; Yu, Qian; Yu, Yizhou; Yang, Yi; Ju, Shenghong
title: Distribution Atlas of COVID-19 Pneumonia on Computed Tomography: A Deep Learning Based Description
date: 2021-05-11
journal: Phenomics
DOI: 10.1007/s43657-021-00011-4
sha: 70d031b51c5a085acdde2b645973a19ee0c6b78e
doc_id: 1049851
cord_uid: dvnxr0tb

OBJECTIVES: To construct a distribution atlas of coronavirus disease 2019 (COVID-19) pneumonia on computed tomography (CT) and further explore the difference in distribution by location and disease severity through a retrospective study of 484 cases in Jiangsu, China.
METHODS: All patients diagnosed with COVID-19 from January 10 to February 18 in Jiangsu Province, China, were enrolled in our study. The patients were further divided into asymptomatic/mild, moderate, and severe/critically ill groups. A deep learning algorithm was applied to the anatomic pulmonary segmentation and pneumonia lesion extraction. The frequency of opacity on CT was calculated, and a color-coded distribution atlas was built. Further comparisons were made between the upper and lower lungs, between bilateral lungs, and between severity groups. An additional lesion-based radiomics analysis was performed to identify the features associated with disease severity.
RESULTS: A total of 484 laboratory-confirmed patients with 954 repeated CT scans were included. Pulmonary opacity was mainly distributed in the subpleural and peripheral areas. The distances from the opacity to the nearest parietal/visceral pleura were shortest in the asymptomatic/mild group. More diffuse lesions were found in the severe/critically ill group. The frequency of opacity increased with severity and peaked at about the 3–4 or 7–8 o’clock positions in the upper lungs, as opposed to the 5 or 6 o’clock positions in the lower lungs.
Lesions with greater energy, a more circle-like shape, and a greater surface area were more likely to be found in severe/critically ill cases than in the other groups.
CONCLUSION: This study constructed a detailed distribution atlas of COVID-19 pneumonia and compared specific patterns in different parts of the lungs at various severities. The radiomics features most associated with severity were also identified. These results may be valuable in determining the COVID-19 sub-phenotype.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s43657-021-00011-4.

As of November 22, 2020, more than 57.8 million people had been infected with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and over 1.3 million deaths had been reported worldwide (World Health Organisation 2020). Clinical characteristics of COVID-19, such as typical symptoms, main laboratory findings, and epidemiology, have been reported and updated serially (Huang et al. 2020a; Guan et al. 2020; Special Expert Group for Control of the Epidemic of Novel Coronavirus Pneumonia of the Chinese Preventive Medicine Association 2020). Since the outbreak, chest computed tomography (CT) has demonstrated particular value in detecting suspected cases and evaluating the illness, especially in the epidemic focus (Ai et al. 2020). The imaging findings of coronavirus disease 2019 pneumonia are characterized by bilateral ground-glass opacification and consolidation (Salehi et al. 2020). The pulmonary lesions are distributed as subpleural and peripheral lesions with a predominance in the lower lobe (Salehi et al. 2020; Chung et al. 2020; Xu et al. 2020). Moreover, it was reported that the imaging manifestations differ with the severity of the illness and can reflect progression or regression throughout the disease course (Feng et al. 2020; Zhang 2020). However, little is known about the precise, detailed distribution pattern of the opacity.
Deep learning has, in recent years, become important in medical image processing (Sahiner et al. 2019; Lakhani et al. 2018). Radiomics, which characterizes images quantitatively, demonstrates state-of-the-art performance in radiological image analysis (Lambin et al. 2012). These approaches have performed well in several areas of medical research, such as differential diagnosis, treatment response, and outcome prediction (She et al. 2018; Coudray et al. 2018). In this study, we aimed to perform anatomic pulmonary segmentation and pneumonia lesion extraction with a well-trained artificial intelligence (AI) algorithm and to build a detailed distribution atlas of COVID-19 pneumonia on chest CT images. We also intended to identify radiomics features that are associated with disease severity.

This retrospective study was approved by the Ethics Committee of Zhongda Hospital (2020ZDSYLL013-P01 and 2020ZDSYLL019-P01), and the requirement for informed consent was waived. The clinical and radiological data of the cases were collected retrospectively. All suspected COVID-19 patients admitted to 24 designated hospitals in Jiangsu Province from January 10 to February 18 were tracked initially and later checked against diagnostic laboratory results. Patients without crucial clinical information or with poor-quality CT images or incomplete image data were subsequently excluded.

The selected patients were divided into three groups (asymptomatic/mild, moderate, and severe/critically ill) based on clinical evaluation, in accordance with the criteria of the "Diagnosis and Treatment Program for New Coronavirus Infection (Trial Version 5)" published by the National Health Commission of the People's Republic of China (National Health Commission of the People's Republic of China 2020). The asymptomatic/mild group included patients who had no or mild symptoms as well as no abnormal initial radiological findings.
The moderate group covered patients with typical symptoms, including fever and cough, and radiological findings of pneumonia. Patients with the following conditions were assigned to the severe/critically ill group: respiratory distress (respiratory rate ≥ 30 breaths/min), mean oxygen saturation (resting state) ≤ 93%, arterial blood oxygen partial pressure/oxygen concentration ≤ 300 mmHg, respiratory failure requiring mechanical ventilation, shock, and intensive care unit (ICU) admission.

All participants underwent initial and follow-up non-contrast high-resolution chest CT examinations in the supine position. The patients were asked to hold their breath, and scanning was conducted at the end of inspiration. Thin-section images were collected preferentially and stored in Digital Imaging and Communications in Medicine (DICOM) format.

The study pipeline is shown in Fig. 1. The pulmonary lobes and lesions were first segmented by an AI system based on a deep learning algorithm (Beijing Deepwise & League of PhD Technology Co. Ltd., China). The results were then manually checked by a radiologist (W.Y.C., with more than 10 years of chest imaging experience) and modified if any incorrect segmentation was found. An example of accurate segmentation is depicted in Fig. 2. The development of the AI system is described in detail in the Supplementary information.

The image registration method included the following steps: lung mask generation, surface point cloud generation, point cloud registration, and finally, pneumonia position projection. The presumed standard lung (template) was selected from a middle-aged male with a healthy, well-shaped lung.
1. The lung mask was obtained through the segmentation process described above.
2. Points on the surface of the lung mask were sampled to form a point cloud of pseudo-landmarks.
3. Coherent Point Drift (CPD) was used to register the point cloud of a given lung to that of the template lung, from which a transformation was learned.
4. A regression model (support vector regression with an RBF kernel) was trained to generate the projected location of each voxel occupied by pneumonia.

All CT images of patients in our study were projected onto the template lung, and the heatmap was then generated. That is, the voxel value of the projected standard lung represented the frequency of opacity occurring at that location. By connecting the points with the same frequency on each slice, the contour line map was drawn. Compared to the heat map, the

Fig. 1 The pipeline of this study. Modality X (X = 1, 2, 3) refers to CT images with different window widths and levels (lung, mediastinal, and bone windows). The Shared Convolution Backbone is a series of stacked blocks (convolution, ReLU, pooling), the parameters of which are shared among the three streams. The Attention Fusion Model refers to attention across channels by element-wise addition of feature maps from the three streams. Pulmonary opacity detection and segmentation are two individual models and are trained separately; the input of the segmentation model is derived from the output of the detection model

The median distances from each voxel in the opacity to the nearest parietal and visceral pleura were calculated in the whole lung and per lobe, respectively. Each volume of interest (VOI) contained more than 400 voxels and was drawn separately at the level of the right pulmonary artery in the upper lung and of the left inferior pulmonary vein in the lower lung. Each VOI covered three consecutive slices. For each side, nine VOIs were placed in subpleural areas, of which seven were placed peripherally, about five millimeters from the parietal pleura, in a clockwise direction from 12:00 to 6:00 on the right or from 0:00 to 6:00 on the left.
The two others were placed in the anterior and posterior medial subpleural areas, respectively. A schematic is displayed in Fig. 3. A visual evaluation of lesion location was also conducted. The axial lung field was divided into three equally spaced areas (outer, middle, and inner zones), and the occurrence of opacity in each zone was recorded for each patient.

The imaging process above yielded the segmentation of pulmonary lesions. With the original images and the segmented masks, we extracted a series of radiomic features using PyRadiomics (https://pyradiomics.readthedocs.io/en/latest/index.html). A total of 100 features were extracted, including shape features, first-order statistics, Gray Level Co-occurrence Matrix, Gray Level Run Length Matrix, Gray Level Size Zone Matrix, and Gray Level Dependence Matrix features. After standardization and normalization of the feature matrix, principal component analysis (PCA) was performed for dimensionality reduction. With severity as the class label (severe/critically ill as the positive label; asymptomatic/mild and moderate grouped together as the negative label), analysis of variance (ANOVA) was adopted to select significant features. Logistic regression was used as the classifier, and the model was evaluated by receiver operating characteristic (ROC) analysis. The study cohort was divided into a training set and a validation set at a ratio of 7:3, and 5-fold cross-validation was then employed to test the performance of the classifier. This part of the statistical analysis was conducted with FeAture Explorer (FAE, v0.2.5, https://github.com/salan668/FAE) on Python (3.6.8, https://www.python.org/).

The normality test was performed for the data.
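As a rough illustration of the evaluation protocol described above (a 7:3 split, 5-fold cross-validation, and ROC analysis), the following standard-library Python sketch may help. It is not the FAE implementation used in the study: the function names `stratified_split`, `k_folds`, and `roc_auc`, the stratification by label, and the fixed random seeds are assumptions introduced here for illustration, and the labels are toy values rather than study data.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx), keeping the class ratio in both parts.

    Stratification is an assumption; the source only states a 7:3 ratio.
    """
    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        val.extend(idx[cut:])
    return sorted(train), sorted(val)


def k_folds(indices: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels only: 1 = severe/critically ill, 0 = all other groups.
    y = [1] * 10 + [0] * 90
    train_idx, val_idx = stratified_split(y)
    folds = k_folds(train_idx)
    print(len(train_idx), len(val_idx), [len(f) for f in folds])
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In such a setup, each fold would serve once as the held-out part while a classifier is fitted on the remaining four, and `roc_auc` would be applied to the predicted probabilities on the held-out fold and on the validation set.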
Demographic information was presented as medians with interquartile ranges (IQRs) (continuous variables with non-normal distribution), means ± standard deviations (SDs) (continuous variables with normal distribution), or frequencies and percentages (categorical variables). The t-test or analysis of variance, the Kruskal-Wallis or Mann-Whitney test, and the Chi-square test were applied to normally distributed, non-normally distributed, and categorical data, respectively. All statistical analyses were carried out using R ver. 3.0.3.

The flow diagram of the study is presented in Fig. 4. A total of 626 laboratory-confirmed COVID-19 patients were first selected from 712 highly suspected patients from January 10 to February 18 in Jiangsu Province. Further exclusions were then made: no medical record (n = 6); chest CT imaging not available (n = 125); incomplete or poor-quality images (n = 11). Finally, 484 cases with 954 CT scans were included in this radiological study. Most scans (87.9%) were thin-slice, with a slice thickness ≤ 3 mm.

Table 1 summarizes the basic demographics and the clinical and radiological characteristics of the 484 patients (asymptomatic/mild group: 63 patients with 122 CT scans; moderate group: 378 patients with 747 CT scans; severe/critically ill group: 43 patients with 85 CT scans). The median age increased with the severity of the illness, and the gender distribution was similar. The histories of diabetes, hypertension, and cardiovascular disease differed significantly among the three groups (p = 0.019, 0.001, and 0.032, respectively). Most patients in the moderate and severe/critically ill groups had the typical initial symptoms of fever and cough (59.3-83.7%), a similar incubation period (3-10 days, 3-8 days), and a high exposure-history rate (86.5%, 83.7%).
The pulmonary opacity on CT was found to involve 2.5 lobes (IQR: 1-4) in the asymptomatic/mild group, 5 lobes (IQR: 3-5) in the moderate group, and all 5 lobes (IQR: 5-5) in the severe/critically ill group. By visual evaluation, the frequency of pulmonary opacity declined from the outer zone to the inner zone and increased from asymptomatic/mild illness to severe/critical illness. With two-tailed p < 0.05 as the threshold for statistical significance, the median distance of opacity to the nearest pleura varied with severity when calculated via a voxel-based approach. The Kruskal-Wallis tests of all median distances were significant except for those in the right lower lobe (p = 0.062, p = 0.072, respectively). The asymptomatic/mild group showed the shortest median distances compared with the other two groups. No marked difference was observed between the moderate and the severe/critically ill groups. The distances from pulmonary opacity to the nearest parietal/visceral pleura are shown in Table 2.

A heat map superimposed with the contour map was generated by projecting the pulmonary opacity of 954 CT scans onto a standard chest. Figure 5 shows the frequency of pulmonary opacity from the apex to the diaphragm. The lesions in the upper lungs are located more laterally than those in the lower lungs, and lesions are predominant in the lower lungs, especially the right lower lung. Different patterns of lesion distribution according to disease severity are further illustrated in Fig. 6. Fewer lesions were found in the asymptomatic/mild group and a diffuse distribution pattern was observed in the severe/critically ill group. The opacity loading in the moderate group was between that of the two other groups. Within each group, lesions were predominantly present in the subpleural area of the lower lungs.

Fig. 4 The flow diagram of the study

In the specific lesion location analysis shown in Fig. 7, the median frequency of opacity differs greatly among the three groups, especially in the moderate and the severe/critically ill groups. The severe/critically ill group presents the highest pulmonary opacity frequency and the moderate group the second-highest. Within each group, the median frequency of opacity in the left upper lung, right upper lung, left lower lung, and right lower lung has an overall hump-shaped configuration. Still, the peaks vary, and the change in the asymptomatic/mild group is relatively smaller than in the other groups. In the upper lungs, the moderate and severe/critically ill groups peaked at the 3:00 (left) and 8:00 (right) positions and at the 4:00 (left) and 7:00 (right) positions, respectively, whereas in the lower lungs, both reached the highest point at 5:00 on the left and 6:00 on the right. The anterior and posterior medial aspects both show low opacity frequency in all patients.

For the radiomics analysis, a total of 3,720 lesions were extracted (severe/critically ill lesions: 293). As shown in Fig. S1, the AUC value rose as the number of features increased and reached nearly its highest point (0.790 in the training set, 0.761 in the validation set; Fig. S2) when the feature number was 20. Detailed information on the selected features is displayed in Table S1. Among the 20 features, energy, elongation, and surface area contributed most to the model, which means that these three features were most associated with severe/critical illness. The CT images of a typical patient in our study are presented in Fig. S3.

Variables were presented as median and interquartile range or frequency and percentage. When calculating the incidence of initial symptoms in the asymptomatic/mild group, we excluded 33 patients with asymptomatic illness.
a Exposure history means that the patient has a history of Wuhan exposure, contact with COVID-19 patients, or contact with people from Wuhan.
b Days interval means the days between CT examination and symptom onset; in asymptomatic cases, the days between CT examination and the outpatient visit are used instead.

To the best of our knowledge, this study provides the most detailed description of the patterns of lesion location on chest CT images, which may help improve the specificity of differential diagnosis and surveillance. Further lesion-based radiomics analysis will help to quantify the lesion phenotypes. Additionally, because the study incorporated almost all confirmed cases in Jiangsu Province with initial and repeated CT examinations, its results are reliable and representative.

The overall demographic characteristics of enrolled patients in our study are similar to those of patients in other studies. Consistent with previous work, we also observed a crucial impact of age on worse outcomes, with higher susceptibility and a clear trend: in our study, the median age increases by 13 years with each additional grade of illness. In some reports, males account for a larger proportion of patients with COVID-19; the same was seen in the moderate and severe/critically ill groups in our study, but without statistical significance.

Since the first description of a COVID-19 cluster, the radiographic manifestations have been widely described in different cohorts with different sample sizes.
Radiology has published a series of case reports and key points on the radiological findings of COVID-19 early in the outbreak (Chung et al. 2020; Kanne 2020; Shi et al. 2020a; Lei et al. 2020). The most frequent findings were bilateral ground-glass opacity (GGO) early on and consolidation later, distributed in the subpleural zone. Later, complete radiological analyses of 63, 50, and 81 laboratory-confirmed patients were reported (Pan et al. 2020; Shi et al. 2020b). The lesion distribution per lobe and the other CT findings, including fibrous stripes, air bronchogram, and interlobular/intralobular septal thickening, were illustrated in these studies. In our cohort, the manifestations of COVID-19 reported above were also observed.

The pneumonia lesions commonly involved more than two lobes, or even 4-5 lobes in severe/critically ill patients, while involvement of a single lobe, usually the right lower lobe, was seen in a few cases and generally at an early phase (Caruso et al. 2020). The right lower lobe was the most vulnerable while the right middle lobe was the least. The predilection for the right lower lobe was further reported in a study at the segmental level, with a median of 10.5 involved segments (Shi et al. 2020b). Similar results were found in our cohort. The right lower lung was significantly more often involved, which could be intuitively visualized through the heatmap and the contour map. This was further supported by the study by Luo and Yu et al., in which a diffuse congestive appearance with a predominance of the right lower lobe was observed on gross examination in a critically ill patient with COVID-19. The underlying mechanisms remain unknown. Shi et al. maintain that the anatomical structure of the trachea and bronchi, wherein the right bronchus is shorter and straighter, partially contributes to this finding (Shi et al. 2020b).

Fig. 6 The heat maps of lesion distribution by disease severity. Among the three groups, all lesions predominate in the subpleural area of the bilateral lower lungs. Fewer lesions are observed in the asymptomatic/mild group, while more diffuse lesions are found in severe/critically ill cases

Moreover, it was also reported that opacity was distributed mainly in the middle and outer zones of the lungs (Pan et al. 2020) and extended towards the pulmonary hilum as the disease progressed. Similarly, in our cohort, most lesions were located in the outer zone and the frequency of opacity dropped significantly from the outer to the inner zone. Notably, in the severe/critically ill group, the frequencies of opacity in the middle and inner zones were significantly higher than those in the other two groups. Further, the total opacity loading increased sharply from the moderate group to the severe/critically ill group. Moreover, among the three groups, an increase in median distance was observed as the disease worsened. We hypothesise that the diffuse distribution pattern in severe cases increases the proportion of lesions in the central area, thus leading to a longer distance between the lesion and the pleura.

Furthermore, more interesting findings were revealed by the opacity location analysis. The median frequency of opacity first increased and later decreased in the clockwise direction on each side of the lung, peaking at similar positions in all groups. There was a subtle variance between the upper and lower lungs. In the upper lungs, lesions were situated more laterally, at about the 3-4 or 7-8 o'clock positions, while in the lower lungs, the lesions had a predilection for the dorsal area at the 5 or 6 o'clock positions. We suppose these results may suggest some endogenous characteristics of COVID-19 pneumonia, but more evidence should be provided by pathology. Moreover, the median frequency of opacity increased sharply from the mild group to the severe/critically ill group. In the asymptomatic/mild group, the median frequencies were all below 5%, while in the moderate group, frequencies were higher than those in the asymptomatic/mild group but below 20%, and in the severe/critically ill cases, frequencies were between 20% and 40%. In addition, the median frequencies of opacity in the anterior and posterior medial regions were the lowest in all three groups, in both the upper and lower lungs. These results were consistent with the visual evaluation of the lesion distribution, as presented in Table 2. All these findings may contribute to the sub-phenotyping of patients with COVID-19 and may be relevant to their outcome and prognosis.

Fig. 7 Frequency line chart of pulmonary opacity. The chart demonstrates an overall increase of opacity from the asymptomatic/mild to the severe/critically ill group, with a similar hump-shaped configuration within each line. The median frequency of opacity in the upper lungs peaked at 3:00 (left) and 8:00 (right) in the moderate group and at 4:00 (left) and 7:00 (right) in the severe/critically ill group, but reached the highest point at 6:00 on the right and 5:00 on the left in the lower lungs. Blue line: severe/critically ill; green line: moderate; red line: mild/asymptomatic

Additionally, we found a series of radiomic features associated with the severity of the illness. The three most important features were energy, elongation, and surface area, and their coefficients in the model were all positive (14.516, 12.117, and 4.837, respectively). Energy is a measure of the magnitude of voxel values, to which the lowest gray values contribute the least. Elongation is a morphology-based feature whose value measures how circle-like a shape is. Surface area, as the name implies, is the surface area of the lesion.
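For reference, the three features named above can be written out as follows. This is a sketch that assumes the default PyRadiomics definitions were used; the source does not state the extraction settings, and the symbols $N_p$, $c$, $\lambda$, $N_f$, and $u_i, v_i, w_i$ are notation introduced here rather than taken from the paper.

```latex
% Energy over the N_p voxels X(i) of the lesion; c is an optional intensity shift.
\[ \text{Energy} = \sum_{i=1}^{N_p} \bigl( X(i) + c \bigr)^2 \]

% Elongation from the two largest principal-axis eigenvalues of the lesion shape.
\[ \text{Elongation} = \sqrt{\frac{\lambda_{\text{minor}}}{\lambda_{\text{major}}}},
   \qquad 0 \le \text{Elongation} \le 1 \]

% Surface area summed over the N_f triangles (vertices u_i, v_i, w_i) of the lesion mesh.
\[ A = \sum_{i=1}^{N_f} \frac{1}{2}
   \left| \overrightarrow{u_i v_i} \times \overrightarrow{u_i w_i} \right| \]
```

Under these definitions, an elongation of 1 corresponds to a non-elongated, circle-like cross-section and values near 0 to a strongly elongated one, so a positive coefficient is in line with the reading that more circle-like lesions are associated with greater severity.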
These results indicate that a lesion with higher gray values, a more circle-like shape, and a greater surface area is more likely to be found in severe/critically ill cases.

Since the outbreak, deep learning approaches have been used to detect pneumonia lesions and assess the opacity quantitatively (Amyar et al. 2020; Huang et al. 2020b). In this article, a deep learning algorithm based on 3D U-Net was also applied. As a novel step, we projected the pneumonia lesions onto a standard lung and constructed a distribution atlas with the contour map and the heatmap. In this way, the detailed distribution characteristics of COVID-19 pneumonia are displayed quantitatively and visually. This description and analysis of the varying distribution patterns in different regions of the lung and in different groups may provide new insights into COVID-19 pneumonia and help to determine the disease sub-phenotype. Moreover, we performed a lesion-based radiomics analysis, which provides more characteristics of the opacity and can contribute to the disease phenotype.

There are some limitations to this study. First, the retrospective nature of this study and the variation among multicenter CT protocols are unavoidable. Second, although we have a relatively large sample size of 484 patients in total, the imbalance between severity groups is evident. More samples should be included in the mild/asymptomatic and severe/critically ill groups. Third, this study is a description of the whole cohort and severity subgroups, and further analysis of other factors and temporal changes is needed. These will be addressed in future studies.

In summary, we constructed a distribution atlas that clearly shows the frequency of pulmonary opacity in different lung zones on CT, and identified the most important radiomics features related to severity.
The pulmonary lesions were mainly distributed in the subpleural and peripheral areas; in addition, the detailed patterns varied between bilateral lungs, between the upper and lower lungs, and among severity groups. For each lesion, higher gray values, a more circle-like shape, and a greater surface area were associated with greater severity of the illness. These results may provide insights into the nature of the disease and are potentially valuable for disease sub-phenotyping.

The online version contains supplementary material available at https://doi.org/10.1007/s43657-021-00011-4.

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: classification and segmentation
Chest CT features of COVID-19 in
CT imaging features of 2019 novel coronavirus (2019-nCoV)
Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning
Early prediction of disease progression in COVID-19 pneumonia patients with chest CT and clinical characteristics
Clinical characteristics of coronavirus disease 2019 in China
Serial quantitative chest CT assessment of COVID-19: a deep learning approach. Radiol Cardiothorac Imaging 2:e200075
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Chest CT findings in 2019 novel coronavirus (2019-nCoV) infections from Wuhan, China: key points for the radiologist
Hello world deep learning in medical imaging
Radiomics: extracting more information from medical images using advanced feature analysis
CT Imaging of the 2019 Novel Coronavirus (2019-nCoV) Pneumonia
Age-dependent risks of incidence and mortality of COVID-19 in Hubei province and other parts of China
National Health Commission of the People's Republic of China, Diagnosis and Treatment Program for New Coronavirus Infection
Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy
Initial CT findings and temporal changes in patients with the novel coronavirus pneumonia (2019-nCoV): a study of 63 patients in Wuhan, China
Deep learning in medical imaging and radiation therapy
Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients
The predictive value of CT-based radiomics in differentiating indolent from invasive lung adenocarcinoma in patients with pulmonary nodules
Evolution of CT Manifestations in a Patient Recovered from 2019 Novel Coronavirus (2019-nCoV) Pneumonia in Wuhan
Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
Special Expert Group for Control of the Epidemic of Novel Coronavirus Pneumonia of the Chinese Preventive Medicine Association (2020) An update on the epidemiological characteristics of novel coronavirus pneumonia (COVID-19)
A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study
Situation reports
Clinical and computed tomographic imaging features of novel coronavirus pneumonia caused by SARS-CoV-2
Imaging changes in severe COVID-19 pneumonia

The authors have no relevant financial or non-financial interests to disclose.