key: cord-0877278-hrspptuv authors: Ter-Sarkisov, A. title: Detection and Segmentation of Lesion Areas in Chest CT Scans For The Prediction of COVID-19 date: 2020-10-27 journal: nan DOI: 10.1101/2020.10.23.20218461 sha: 10222b2e159a24a1c33b1cf720da006cd0aae465 doc_id: 877278 cord_uid: hrspptuv
In this paper we compare models for the detection and segmentation of Ground Glass Opacity (GGO) and Consolidation (C) in chest CT scans. These lesion areas are often associated both with common pneumonia and with COVID-19. We train a Mask R-CNN model to segment these areas with high accuracy using three approaches: merging the masks for these lesions into one, deleting the mask for Consolidation, and using both masks separately. The best model achieves a mean average precision of 44.68% using the MS COCO criterion for instance segmentation across all accuracy thresholds. The classification model, COVID-CT-Mask-Net, which learns to predict the presence of COVID-19 vs common pneumonia vs control, achieves 93.88% COVID-19 sensitivity, 95.64% overall accuracy, 95.06% common pneumonia sensitivity and a 96.91% true negative rate on the COVIDx-CT test split (21192 CT scans) using a small fraction of the training data. We also analyze the effect of Non-Maximum Suppression of overlapping object predictions on both the segmentation and the classification accuracy. The full source code, models and pretrained weights are available on https://github.com/AlexTS1980/COVID-CT-Mask-Net.
Figure 1: Segmentation masks for the same CT scan slice. (a) Input raw image. (b) 2-class problem; red: GGO masks, blue: C masks. (c) 1-class problem (only GGO). (d) 1-class problem (merged masks for GGO and C). White masks are the lung areas. Best viewed in color.
masks are often used in combination with the extracted features to predict the class of the image [ZLS+20, WGM+20], improving the final prediction over the baseline feature extractor.
Models like Mask R-CNN [HGDG17] solve the combined problem of object detection (localization with bounding boxes) and prediction of each object's mask, known as instance segmentation. In this paper we compare three ways to predict instances of lesion masks. First, we use only the masks for GGO areas, merging C with the background. Second, we merge the GGO and C masks into a single 'lesion' mask. Finally, we keep separate masks for GGO and C instances (this approach was first presented in [TS20]). The first two are 1+1 class problems (1 object class + background); the last is a 2+1 class problem (2 object classes + background). Our choices are motivated by the observation that, in COVID-19 patients, areas with GGO are larger and observed more frequently than areas with C, hence the GGO class alone may be sufficient for COVID-19 prediction. We show that merging the GGO and C masks into one class ('lesion') improves both the segmentation precision and the accuracy of the classification model built on top of the segmentation model, compared to using only the GGO mask. We measure the model's accuracy using the MS COCO convention of Intersect over Union (IoU).
The segmentation problems solved in the paper are shown in Figure 1. The 2-class problem, Figure 1b, was first solved in [TS20]. We compare this problem to two 1-class problems. For the first one, Figure 1c, we consider only GGO as the positive class and train the model to detect its instances (predict the bounding box coordinates and segment the positive area within each box). Consolidation (C) masks are discarded (merged with the background). For the second problem, Figure 1d, we merge the masks for GGO and C into one class ('lesion'), thus increasing the prevalence of the positive class in the error space compared to GGO only. We use the same dataset split of 500 training + 150 validation images, with varying representation of either class in each image, as in [TS20]. Many images are purely negative (only background mask).
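The three labelling schemes described above can be sketched as a simple remapping of a pixel-wise mask. This is a minimal illustration, not the author's preprocessing code: the raw label values (0 background, 1 lung, 2 GGO, 3 C) and the names `REMAPS` and `remap_mask` are assumptions made for the example.

```python
# Hypothetical raw label encoding (an assumption, not taken from the paper).
BACKGROUND, LUNG, GGO, CONSOLIDATION = 0, 1, 2, 3

# Target labels for each problem; 0 is always the background.
# The lung mask is merged with the background in all three problems.
REMAPS = {
    "ggo_only": {BACKGROUND: 0, LUNG: 0, GGO: 1, CONSOLIDATION: 0},  # 1+1 classes
    "merged":   {BACKGROUND: 0, LUNG: 0, GGO: 1, CONSOLIDATION: 1},  # 1+1 classes
    "separate": {BACKGROUND: 0, LUNG: 0, GGO: 1, CONSOLIDATION: 2},  # 2+1 classes
}


def remap_mask(mask, mode):
    """Return a copy of a 2D mask (list of rows) relabelled for the given problem."""
    table = REMAPS[mode]
    return [[table[value] for value in row] for row in mask]


if __name__ == "__main__":
    raw = [[0, 1, 2],
           [3, 2, 1]]
    for mode in REMAPS:
        print(mode, remap_mask(raw, mode))
```

In the GGO-only scheme, Consolidation pixels end up with the same label as the background, which is the effect the text describes as discarding the C mask.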
To train the Mask R-CNN model to solve these problems, we extract the bounding box coordinates of each lesion object from the masks, and use either 3 labels (2 positive + 1 background) or 2 labels (1 positive + 1 background) for objects. We define each object as an area isolated from other areas of the same class either by the background or by an area of a different class. The lung mask is merged with the background for all problems. Apart from the usual normalization using the global mean and standard deviation, no data augmentation or balancing (resampling, class balancing, image rotations, etc.) was applied to the data at any stage, unlike in many other solutions, e.g. [GWW20].
Table 1: Key hyperparameters of the segmentation models. RPN output is the number of predictions after the NMS step, RoI output is the maximum number of predictions at the test stage after the NMS step, RPN score θ is the threshold for positive predictions at train time, RoI score θ is the threshold for object confidence at test time. In RoI, the NMS threshold is used only in testing.
The copyright holder for this preprint (this version posted October 27, 2020) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
For the classification problem we re-use the train/validation/test splits in [TS20, GWW20]. We sample 3000 images from the COVIDx-CT [GWW20] train split (1000 images/class), and use its full validation (21036 CT scans) and test (21191 CT scans) splits. As a result, we use less than 5% of the COVIDx-CT training split, and 3% of the total CNCB CT scans data [ZLS+20]. Each image is the same size as in the segmentation data, 512 × 512 × 3 pixels, with all alpha channels removed. The training split used in this paper is the same as in [TS20], to allow a fair comparison.
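The object definition given for the segmentation data (an area separated from other areas of the same class by background or by a different class) corresponds to connected-component labelling. The sketch below extracts one bounding box per component with a breadth-first flood fill. It is an illustration under assumptions: the paper does not state the connectivity used, so 4-connectivity is assumed here, and the function name `extract_objects` is hypothetical.

```python
from collections import deque


def extract_objects(mask):
    """Return (label, x_min, y_min, x_max, y_max) for every connected region.

    `mask` is a 2D list of integer labels with 0 as background.
    Regions are 4-connected (an assumption) and never mix labels.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            label = mask[y][x]
            if label == 0 or seen[y][x]:
                continue
            # Flood-fill this region and track its extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append((label, x_min, y_min, x_max, y_max))
    return objects


if __name__ == "__main__":
    mask = [[1, 1, 0, 2],
            [1, 0, 0, 2],
            [0, 0, 1, 2]]
    print(extract_objects(mask))
```

In the example mask, the two regions with label 1 are separated by background and by a region with label 2, so they are returned as two distinct objects.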
As with the segmentation problem, no data normalization techniques were used apart from global image normalization. We study in depth the effect of the non-maximum suppression (NMS) threshold, a criterion for discarding overlapping bounding box predictions, in the Region Proposal Network (RPN) at the train and test stages and in the Region of Interest (RoI) stage at test time. A high threshold value means that a larger number of overlapping predictions is kept in the model. At the training stage of the segmentation model, a low NMS threshold in the RPN implies that fewer high-scoring predictions are passed to RoI, and therefore fewer high-scoring predictions are processed by RoI, at both the train and test stages. This matters because RoI, after passing the region of interest through the classification 'head' (two fully connected layers and a class + bounding box layer), can still classify this region as background, even if at the RPN stage the prediction was derived from a 'positive' anchor [HGDG17]. The hyperparameters of the segmentation model are listed in Table 1. The model computes 4 loss functions: two by RPN (objectness and bounding box coordinates) and two by RoI (class and bounding box coordinates). For training and evaluation we use torchvision v0.3.0. In COVID-CT-Mask-Net (see Figure 2), the Mask R-CNN layers, including RPN and RoI, are set to test mode: they do not compute any losses. Therefore, RoI uses the NMS threshold to filter predictions. A larger number of overlapping positive predictions can prompt the model to learn to associate them with a particular class, e.g. if they are more prevalent in COVID-19 than in common pneumonia. If the NMS threshold is low, the model has to learn to associate a small number of distant predictions with a particular condition, which is likely to be a harder problem because of the similarities between COVID-19 and common pneumonia.
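The effect of the NMS threshold discussed above can be seen in a small greedy NMS sketch. This is a plain-Python illustration of the standard algorithm, not the torchvision implementation used in the paper; the function names `box_iou` and `nms` and the example boxes are assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is discarded if its IoU with an already kept box exceeds
    `iou_threshold`, so a higher threshold keeps more overlapping boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print("NMS@0.25:", nms(boxes, scores, 0.25))
    print("NMS@0.75:", nms(boxes, scores, 0.75))
```

The first two boxes overlap with an IoU of 0.5, so the second one is suppressed at a threshold of 0.25 but survives at 0.75, which mirrors the two settings compared in the paper.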
RoI score θ is set to −0.01 to accept all predictions regardless of their confidence score, which keeps the input to the classification module S at a fixed size. The details of the architecture of the classification model (including the de-batchification of the RoI predictions) are presented in Figure 2 and [TS20], and its hyperparameters in Table 2.
Table 4: Average precision of segmentation models.
[TS20] 0.5020 0.4198 0.3871
Separate masks + NMS@0.75 [TS20] 0.4741 0.3895 0.3641
Each segmentation model was trained using the Adam optimizer with the same learning rate of 1e-5 and weight regularization coefficient of 1e-3 for 100 epochs; the best models for each configuration are reported in Table 4. Training of each model took about 3 hours on a GPU with 8 GB of VRAM. All classifiers were trained with the same configuration for 50 epochs, which took about 8 hours on the same GPU. The sizes of the models are presented in Table 3; the difference in size between the segmentation models presented here is minuscule (< 1000 parameters). The architecture and the training of the models with separate masks are exactly the same as in [TS20]. The only difference, which explains the better results in Tables 4-6, is the removal of very small objects (less than 10 × 10 pixels) and the reduction of unnecessarily large sample sizes during the training of RPN and RoI, from 1024/1024 in [TS20] to 256/256 in this paper.
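The removal of very small objects mentioned above could look like the following filter over extracted bounding boxes. This is a sketch under assumptions: the paper only states "less than 10 × 10 pixels", which is read here as a box area below 100 pixels with inclusive pixel coordinates; the name `drop_small_boxes` is hypothetical.

```python
MIN_SIDE = 10  # pixels, from the 10 x 10 size quoted in the text


def drop_small_boxes(boxes, min_side=MIN_SIDE):
    """Keep boxes (x_min, y_min, x_max, y_max) covering at least min_side**2 pixels.

    Coordinates are treated as inclusive pixel indices (an assumption),
    so a box from 0 to 9 is 10 pixels wide.
    """
    kept = []
    for x_min, y_min, x_max, y_max in boxes:
        width = x_max - x_min + 1
        height = y_max - y_min + 1
        if width * height >= min_side * min_side:
            kept.append((x_min, y_min, x_max, y_max))
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 9, 9), (0, 0, 8, 9), (5, 5, 40, 30)]
    print(drop_small_boxes(boxes))
```

A stricter reading (both sides at least 10 pixels) would only change the comparison inside the loop.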
To measure the accuracy of the segmentation models, we use average precision (AP), a benchmark metric for datasets labelled at the instance level, such as MS COCO [LMB+14] and Pascal VOC [EVGW+10]. We adopt the MS COCO convention and report three values: AP@0.5, AP@0.75 and AP (the primary challenge metric). The first two use Intersect over Union (IoU) between predicted and ground truth bounding boxes with thresholds of 0.5 and 0.75. The last averages over thresholds between 0.5 and 0.95 with a step of 0.05 (10 thresholds in total). For details see [LMB+14]. We adapt the implementation of the average precision computation from https://github.com/matterport/Mask_RCNN. The confidence threshold for considering an object (the RoI score θ hyperparameter) is 0.75 across all models. Only predictions with confidence scores > RoI score θ are considered when computing (m)AP; the rest are discarded. The RoI NMS threshold is always the same as the RPN NMS threshold.
Figure 3 illustrates the difference between the two NMS thresholds across all mask types. Each column corresponds to a particular CT scan slice. The bottom row shows the ground truth masks with both segmented lesion regions. Rows 1, 3, 5 are models that use an NMS threshold of 0.25; rows 2, 4, 6 use an NMS threshold of 0.75. Rows 1, 2 are models that were trained only with the GGO mask. Models in rows 3, 4 were trained with merged masks. Models in rows 5, 6 were trained using both masks. Models with a higher NMS threshold output a larger number of predictions overall (except, for example, in Figure 3e, the models with the merged GGO and C masks, row 3 with low NMS and row 4 with high NMS), many of them overlapping. This is a consequence of the fact that a particular predicted region can have a confidence score in RPN high enough to be passed on to RoI, but the RoI classification 'head' then outputs a confidence score lower than RoI score θ, so that region is classified as background. With a low NMS threshold, an overlapping prediction with a slightly lower score would be discarded at the RPN stage. With a high NMS threshold, it would be added to the pool of predictions, and RoI could extract a confidence score exceeding RoI score θ from this second prediction. Therefore, models with high NMS produce more predictions overall, both true and false positives.
Figure 3: Predicted masks for a number of CT scans. Row 7: ground truth masks, red: GGO, blue: C. Rows 1, 3, 5: models with NMS=0.25. Rows 2, 4, 6: models with NMS=0.75. Rows 1, 2: models trained only with the GGO mask. Rows 3, 4: models trained with the merged GGO and C masks. Rows 5, 6: models trained with separate masks for both classes. All mask predictions are overlaid with bounding boxes and RoI confidence scores. Best viewed in color.
Evaluation results of the segmentation model are summarized in Table 4. The model that learns from merged GGO and C masks with high NMS clearly outperforms the GGO-only model at every IoU threshold. Apart from the NMS effect described above, GGO and C areas in CT scans have many commonalities, so if the model learns to segment GGO only, Consolidation and background share the same label. As a result, the model associates some important patterns with the background rather than with the object class. Results for separate GGO and C masks are mostly better than for GGO only, but worse than for the merged masks.
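To make the three reported values concrete, the sketch below computes AP at a single IoU threshold and averages it over the ten MS COCO thresholds. It is a simplified single-image illustration, not the matterport code the paper adapts: it takes a precomputed IoU matrix as input, and the names `average_precision` and `coco_map` are assumptions made for the example.

```python
def average_precision(iou_matrix, num_gt, iou_threshold):
    """AP at one IoU threshold.

    iou_matrix[i][j] is the IoU of prediction i with ground truth j;
    predictions must already be sorted by descending confidence.
    """
    if num_gt == 0:
        return 0.0
    matched = set()
    hits = []
    for row in iou_matrix:
        # Match to the best ground truth that is still unmatched.
        best_j, best_iou = -1, 0.0
        for j, iou in enumerate(row):
            if j not in matched and iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each prediction, padded at both ends.
    precisions, recalls, tp = [0.0], [0.0], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    precisions.append(0.0)
    recalls.append(1.0)
    # Make precision non-increasing, then integrate over recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    return sum((recalls[k] - recalls[k - 1]) * precisions[k]
               for k in range(1, len(recalls)))


def coco_map(iou_matrix, num_gt):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
    return sum(average_precision(iou_matrix, num_gt, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    ious = [[0.8, 0.0], [0.0, 0.6], [0.1, 0.0]]  # 3 predictions, 2 ground truths
    print(average_precision(ious, 2, 0.5), average_precision(ious, 2, 0.75), coco_map(ious, 2))
```

In the toy example both objects are found at IoU 0.5, only one at IoU 0.75, and none above 0.8, which is why the averaged value is lower than AP@0.5.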
We explain the weaker results for separate masks by the fact that C is not well represented in the dataset overall (see [TS20] for details of the data analysis); the model therefore often confuses it with GGO features, or fails to learn certain important features because of their under-representation in the data. Results of the COVID-CT-Mask-Net evaluation are presented in Table 5, and the comparison of the best models we trained (highest COVID-19 sensitivity and highest overall accuracy) in Table 6. All results are a significant improvement over the baseline COVID-CT-Mask-Net model in [TS20], which we beat by 3.08% (COVID-19 sensitivity) and 5.10% (overall accuracy). Comparing the segmentation and classification results, though, the advantage of the segmentation models that learn from merged masks does not immediately translate into an advantage in solving the classification problem. Overall, the classifiers derived from these models are slightly better than the classifiers derived from the segmentation models for two classes, and noticeably better than those derived from GGO-only models. This advantage, though, is much smaller than the gap in the AP and mAP metrics for the corresponding segmentation problems. Compared to benchmark models, we beat COVIDNet-CT [GWW20] by 1.07% in COVID-19 sensitivity.
In this paper we compared a number of Mask R-CNN models that detect and segment instances of two types of lesions in chest CT scans. We established that merging the lesion masks for Ground Glass Opacity and Consolidation into a single lesion mask greatly improves the predictive power and the precision of the instance segmentation model compared to other approaches. We extended these models to predict COVID-19, common pneumonia and control classes using the COVID-CT-Mask-Net architecture.
On a large COVIDx-CT dataset (21192 chest CT scan slices), the classification model derived from the best segmentation model achieved a COVID-19 sensitivity of 92.68% and an overall accuracy of 96.33%, and the model derived from the segmentation model using both masks achieved a COVID-19 sensitivity of 93.88% and an overall accuracy of 95.64%. The source code and the pretrained models are available on https://github.com/AlexTS1980/COVID-CT-Mask-Net.
Deep learning system to screen coronavirus disease 2019 pneumonia
The Pascal visual object classes (VOC) challenge
COVIDNet-CT: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images
Mask R-CNN
Comparison of chest CT findings between COVID-19 pneumonia and other types of viral pneumonia: a two-center retrospective study
Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT
Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
COVID-CT-Mask-Net: Prediction of COVID-19 from CT scans using regional features. medRxiv
JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation
Automatic distinction between COVID-19 and common pneumonia using multi-scale convolutional neural network on chest CT scans
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
A comparative study on the clinical features of COVID-19 pneumonia to other pneumonias
CT scans of patients with 2019 novel coronavirus (COVID-19) pneumonia