key: cord-355441-0b266hwn authors: Misztal, Krzysztof; Pocha, Agnieszka; Durak-Kozica, Martyna; Wątor, Michał; Kubica-Misztal, Aleksandra; Hartel, Marcin title: The importance of standardisation – COVID-19 CT&Radiograph Image Data Stock for deep learning purpose date: 2020-10-28 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2020.104092 sha: doc_id: 355441 cord_uid: 0b266hwn
With the number of affected individuals still growing world-wide, the research on COVID-19 is continuously expanding. The deep learning community concentrates its efforts on exploring whether neural networks can support the diagnosis using CT and radiograph images of patients’ lungs. The two most popular publicly available datasets for COVID-19 classification are COVID-CT and COVID-19 Image Data Collection. In this work, we propose a new dataset which we call COVID-19 CT&Radiograph Image Data Stock. It contains both CT and radiograph samples of COVID-19 lung findings and combines them with additional data to ensure a sufficient number of diverse COVID-19-negative samples. Moreover, it is supplemented with a carefully defined split. The aim of COVID-19 CT&Radiograph Image Data Stock is to create a public pool of CT and radiograph images of lungs to increase the efficiency of distinguishing COVID-19 disease from other types of pneumonia and from a healthy chest. We hope that the creation of this dataset will allow standardisation of the approach taken for training deep neural networks for COVID-19 classification and eventually lead to more reliable models.
At the end of 2019, a new coronavirus, SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), appeared in Wuhan and subsequently triggered a global pandemic. SARS-CoV-2-induced pneumonia has been termed COVID-19 (Coronavirus Disease 2019). The main symptoms of COVID-19 are high fever, dry cough, shortness of breath, muscle pain (myalgia), diarrhea, nasal obstruction and runny nose [1].
As of 15 July 2020, a total of 13,690,108 confirmed cases of COVID-19 pneumonia had been reported globally, including 586,265 deaths (4.28%). The current diagnostic method for COVID-19 is real-time reverse transcription-polymerase chain reaction (RT-PCR) [2]. The main limitation of this method is the insufficient amount and quality of the clinical material from which the nucleic acids are isolated [3], which can lead to false negative results. Lung CT and radiograph scans are gradually being recognised as an alternative for COVID-19 diagnosis. The lungs of people infected with COVID-19 are characterised by consolidation, ground-glass opacification, bilateral involvement, and peripheral and diffuse distribution. Lung CT scans can be used to diagnose COVID-19 in patients in the acute and convalescent periods of the disease [1]. Only patients with severe or permanent lung damage will show changes in CT after recovery, which makes it impossible to determine the percentage of the population that has had the disease based on lung scans [4]. The favorable aspects of CT scanners are their availability in many hospitals and the short time required to obtain the results, estimated at around 15 minutes. The use of CT for initial diagnostics might significantly increase testing capabilities. On the other hand, the imaging costs are relatively high, which may limit the use of CT for COVID-19 diagnostics. Moreover, the use of CT for COVID-19 diagnostics requires thorough cleaning of the equipment between examinations, and the large surface of contact increases the risk of infection compared to the RT-PCR method performed in sterile conditions [5].
Despite the large number of publications indicating high sensitivity and specificity of CT, the radiologists'
The advantages of radiograph scans for COVID-19 diagnostics include greater availability of radiographs, lower radiation doses to which the patient is subjected and a short scanning time. Recently, both CT and radiograph scans have been shown to enable training models which achieve promising results in the COVID-19 classification task [7, 8]. Considering the advantages and disadvantages of both methods, we decided to create a database containing both CT and radiograph images.
Currently, the most popular datasets for COVID-19 classification are COVID-CT [9] and COVID-19 Image Data Collection [10]. These datasets contain images of CT and radiograph chest scans of individuals affected with COVID-19 as well as of patients not affected with COVID-19.
scans. The number of CT scans is insufficient for training deep models. The number of radiograph images is higher but there are not enough negative samples. Moreover, this dataset does not define a data split. COVID-CT concentrates on CT scans and defines a data split. However, it provides only a rough categorisation of samples into COVID-19-positive and negative cases, where negative cases can be images of healthy individuals or patients with a different disease. Training neural networks on these datasets requires including samples from additional data sources such as common bacterial pneumonia [11] or lung nodule analysis [12, 13].
Apostolopoulos and Mpesiana [14] used a MobileNet v2 [15] pre-trained on ImageNet [16] for fine-tuning on two datasets which were created using samples from COVID-19 Image Data Collection [10], the COVID-19 X-ray collection available on Kaggle [17], and a dataset containing radiograph scans of common bacterial pneumonia [11].
They achieved a sensitivity of 98% and a specificity of 96% on the dataset which included both common bacterial pneumonia and viral pneumonia cases as distractors for the COVID-19 class, and a sensitivity of 99% and a specificity of 97% on the dataset which included only common bacterial pneumonia cases. Zhao et al. [9] pre-trained a DenseNet [18] on ChestX-ray14 [19] and fine-tuned it on COVID-CT. They achieved an AUC of 0.82. He et al. [7] used models pre-trained on ImageNet, which were further pre-trained using contrastive self-learning [20], first on the LUNA dataset [12, 13] and then on COVID-CT, followed by fine-tuning on COVID-CT. This methodology allowed them to achieve an AUC of 0.94 with DenseNet-169.
The huge variety of scenarios in which the models are evaluated prevents any comparison between them. As a result, it is difficult to tell which design choices contribute to the improved performance of some models and to use this knowledge to build incrementally more reliable solutions.
In this work, we propose COVID-19 CT&Radiograph Image Data Stock, which combines data from multiple sources into a single dataset. The advantages of COVID-19 CT&Radiograph Image Data Stock include:
• a large number of both CT and radiograph scans of the COVID-19 class,
• a large number of negative samples in both modalities,
• the exact class of the negative samples is known,
• the source of each sample is known,
• a data split is defined.
Using COVID-19 CT&Radiograph Image Data Stock does not require employing any additional data sources. We hope that this dataset will allow for a better understanding of the influence of individual choices on the final performance of COVID-19 classification models.
4. we show that using precise class information helps to improve the model's ability to distinguish between COVID-19-positive and negative samples.
The rest of this work is organised as follows: in section 2 we briefly characterise COVID-CT and COVID-19 Image Data Collection, and describe in detail how COVID-19 CT&Radiograph Image Data Stock was created. In section 3, we describe the evaluation of models trained on each of the datasets and in section 4, we present the results. In section 5, we conclude the paper.
In this section, we briefly characterise COVID-CT [9] and COVID-19 Image Data Collection [10], discussing their strong and weak points, and describe in detail how the proposed COVID-19 CT&Radiograph Image Data Stock was
and are in png format. The task is to classify images as belonging to the COVID-19-positive or negative class. This dataset has a defined split which allows for
Radiograph Image Data Stock is to create a public pool of CT and radiograph images of lungs to increase the efficiency of distinguishing COVID-19 from other types of pneumonia and from healthy lungs. We hope this can help to prepare a "ground" for distinguishing between newly discovered and already known virus and bacteria strains causing pneumonia in order to improve diagnostics in the
Images were searched by entering the following phrases: COVID-19 lung CT / radiograph images, normal chest CT / radiograph/X-RAY, bacterial pneumonia CT / radiograph images, viral pneumonia CT / radiograph and fungal CT / radiograph images. To qualify for the database, an image had to carry an accurate annotation of the disease (the type of pneumonia). Therefore, images from COVID-CT [9] were rejected for the lack of this information. Most of the images from online publications were downloaded as high quality pictures directly from the web pages. Importantly, each image in the database is accompanied with a precise
The number of COVID-19-negative samples is insufficient for training neural networks.
Therefore, following [14], we decided to enrich the COVID-19-negative class with radiograph images from the dataset of common bacterial pneumonia [11]. The COVID-19-positive samples were split into train, validation and test sets following a 75-15-10 ratio.
We trained the following models: ResNet-18, ResNet-50 [55], WideResNet-50 [56], and DenseNet-169 [18]. For these models we used PyTorch implementations and ImageNet initialisation. Additionally, we used DenseNet-121 from [57], which is pre-trained on seven datasets: ChestX-ray14 [19], PadChest [58], Chexpert [59], MIMIC-CXR [60], Indiana chest X-ray collection [61], the dataset from [62], and the RSNA Pneumonia Detection Challenge dataset, which is a subset of ChestX-ray8 [19]. We refer to this model as DenseNet-121+ to highlight that it has been pre-trained on medical datasets instead of ImageNet.
All of the models were trained with the Adam optimiser and used a step scheduler with step size 7 and gamma equal to 0.1. ResNet-18 was trained with batch size equal to 32, ResNet-50 with batch size equal to 16, and the remaining networks with batch size equal to 8. The loss function for all the models was cross entropy. The networks were trained for 100 epochs with early stopping with patience equal to 10. The other training parameters were chosen using random search, which we describe next.
In the case of binary classification we report the following metrics: precision, recall, F1 score, accuracy and area under the ROC curve (AUC). We assume that the positive class is COVID-19. In the case of multiclass classification we report: precision, recall, F1 score, accuracy, binary AUC calculated by merging COVID-19-negative classes together, and multiclass AUC obtained by calculating binary AUC scores for each class in a one-versus-rest regime and taking an unweighted mean of these scores. Again, we assume COVID-19 to be the positive class and the remaining classes to be negative.
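The two AUC variants described above can be illustrated with a minimal, library-free sketch. It is not the authors' evaluation code: the function names, the toy labels and probabilities, and the choice of the predicted COVID-19 probability as the score for the merged binary AUC are all assumptions made for illustration. With scikit-learn, `roc_auc_score(..., multi_class="ovr", average="macro")` should give a comparable unweighted one-versus-rest mean.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def multiclass_auc(labels: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Unweighted mean of one-versus-rest AUCs, so frequent classes do not dominate."""
    per_class: List[float] = []
    for c in range(n_classes):
        ovr_labels = [1 if y == c else 0 for y in labels]
        ovr_scores = [p[c] for p in probs]
        per_class.append(binary_auc(ovr_labels, ovr_scores))
    return sum(per_class) / n_classes


def merged_binary_auc(labels: Sequence[int], probs: Sequence[Sequence[float]], covid_class: int) -> float:
    """Binary AUC after merging every non-COVID-19 class into one negative class.

    Assumption: the predicted COVID-19 probability is used as the score.
    """
    merged = [1 if y == covid_class else 0 for y in labels]
    scores = [p[covid_class] for p in probs]
    return binary_auc(merged, scores)


# Toy data (made up): class 0 = COVID-19, 1 = bacterial pneumonia, 2 = healthy.
toy_labels = [0, 0, 1, 1, 2, 2]
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
]
toy_binary = merged_binary_auc(toy_labels, toy_probs, covid_class=0)
toy_multiclass = multiclass_auc(toy_labels, toy_probs, n_classes=3)
```

Under this scoring assumption the merged binary AUC coincides with the one-versus-rest AUC of the COVID-19 class, while the multiclass AUC additionally reflects how well the negative classes are separated from each other.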
For all these metrics the higher the score, the better the performance.
In this section we present the results of binary classifiers trained on COVID-
The results for binary classifiers trained on images of CT scans are shown in
resulting mean is not over-influenced by the most common class in the dataset. The confusion matrices for the best performing models are presented in figure 5 and the confusion matrices of the remaining models are shown in Appendix B.
The multiclass AUC of the presented models ranges between 0.74 for ResNet-
CT&Radiograph Image Data Stock we constructed two additional test sets. We
In the case of CT data we have not found any images that were present in the COVID-CT train or validation sets, which is not surprising as this dataset was not used as a source of images for COVID-19 CT&Radiograph Image Data Stock. Since COVID-CT does not provide an exact diagnosis for COVID-19-negative samples we could not ensure the data distribution in the reduced test set to
In
To conclude, these results suggest that precise label information can improve the performance of neural networks on the task of binary COVID-19 classification.
In this work, we proposed a new self-contained dataset for COVID-19 classification which includes a significant number of both CT and radiograph images from a diverse set of classes. The dataset was constructed ensuring a high quality of samples, with each image carefully annotated with its precise label and source. A train-validation-test split is defined. The dataset is publicly available and will be updated monthly until the COVID-19 pandemic is contained. We fine-tuned several neural network architectures pre-trained on ImageNet or medical datasets to provide benchmark results for the proposed dataset. We
Neural networks for CT were trained on each slice separately, which corresponds to the radiograph dimension. In the future we would like to investigate the whole series for a patient at once.
Also, we would like to address several remaining questions, which include using additional information about the patient to build better-performing models and analysing how the models trained on images of CT and radiograph scans will perform when used directly on CT or radiograph scans.
Here, we present the confusion matrices of multiclass classifiers trained on COVID-19 CT&Radiograph Image Data Stock. In figure 6 the results of multiclass classifiers trained on CT data are shown. A careful analysis shows that these models struggle to separate healthy and COVID-19-positive cases: most of the COVID-19-positive samples are well classified but when mistakes occur they are more likely to be categorised as
shown. In this case, the COVID-19-positive samples are always correctly classified and hardly any samples of bacterial pneumonia or healthy chest are categorised as COVID-19-positive samples. The class which is most commonly confused with COVID-19 is viral pneumonia, but the number of samples misclassified as COVID-19 never exceeds 10% of the samples in this class. Viral and bacterial pneumonia classes are not well separated, with bacterial pneumonia cases categorised as viral pneumonia more often than the other way around. The performance metrics for these models are shown in table 14.
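Statements such as "never exceeds 10% of the samples in this class" refer to a confusion matrix normalised per true class. The following is a minimal sketch of that computation using only the Python standard library. The helper names and the toy labels are hypothetical and do not reproduce any result from this paper.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def row_fractions(matrix: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide every row by its total so each row sums to 1 (empty rows stay at 0)."""
    fractions = []
    for row in matrix:
        total = sum(row)
        fractions.append([count / total if total else 0.0 for count in row])
    return fractions


# Toy predictions (made up for illustration only).
classes = ["COVID-19", "viral", "bacterial"]
y_true = ["COVID-19"] * 4 + ["viral"] * 10 + ["bacterial"] * 5
y_pred = (
    ["COVID-19"] * 4                                   # all COVID-19 samples correct
    + ["COVID-19"] + ["viral"] * 8 + ["bacterial"]     # viral samples
    + ["viral"] * 2 + ["bacterial"] * 3                # bacterial samples
)

matrix = confusion_matrix(y_true, y_pred, classes)
fractions = row_fractions(matrix)

# Share of each true class that was predicted as COVID-19 (first column).
covid_col = classes.index("COVID-19")
predicted_as_covid = {name: fractions[i][covid_col] for i, name in enumerate(classes)}
```

Reading the COVID-19 column of the normalised matrix row by row gives, for each true class, the share of its samples that the model labelled as COVID-19.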
Chest CT findings of COVID-19 pneumonia by duration of symptoms
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing
Imaging changes of severe COVID-19 pneumonia in advanced stage
Breaking the Testing Logjam: CT scan diagnosis
ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection
Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans
Automated detection of COVID-19 cases using deep neural networks with X-ray images
CT scan dataset about COVID-19
COVID-19 image data collection, arXiv 2003.11597 URL
Identifying medical diagnoses and treatable diseases by image-based deep learning
LUNA16 Part 1/2
LUNA16 Part 2/2
Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
Mobilenets: Efficient convolutional neural networks for mobile vision applications
Imagenet large scale visual recognition challenge
COVID-19 X rays
Proceedings of the IEEE conference on computer vision and pattern recognition
Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
A simple framework for contrastive learning of visual representations
Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients
CT Findings of Coronavirus Disease (COVID-19) Severe Pneumonia
First imported case of 2019 novel coronavirus in Canada, presenting as mild pneumonia
First case of Coronavirus Disease 2019 (COVID-19) pneumonia in Taiwan
A case of COVID-19 and pneumonia returning from Macau in Taiwan: Clinical course and anti-SARS-CoV IgG dynamic
Featuring COVID-19 cases via screening symptomatic patients with epidemiologic link during flu season in a medical center of central Taiwan
Chest CT findings of early and progressive phase COVID-19 infection from a US patient
Breadth of concomitant immune responses prior to patient recovery: a case report of non-severe COVID-19
Clinical characteristics of 140 patients infected with SARS-CoV-2 in
Importation and human-to-human transmission of a novel coronavirus in Vietnam
Clinical Characteristics of Imported Cases of Coronavirus Disease
Radiological Findings in Patients with COVID-19
The first Vietnamese case of COVID-19 acquired from China
Case of the index patient who caused tertiary transmission of COVID-19 infection in Korea: the application of lopinavir/ritonavir for the treatment of COVID-19 infected pneumonia monitored by quantitative RT-PCR
Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): analysis of nine patients treated in Korea
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Evolution of CT manifestations in a patient recovered from 2019 novel coronavirus (2019-nCoV) pneumonia in Wuhan, China
A locally transmitted case of SARS-CoV-2 infection in Taiwan
Emerging 2019 novel coronavirus (2019-nCoV) pneumonia
Frequency and distribution of chest radiographic findings in COVID-19 positive patients
Severe acute respiratory disease in a Huanan seafood market worker: Images of an early casualty
Chest imaging appearance of COVID-19 infection
Imaging profile of the COVID-19 infection: radiologic findings and literature review
Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review
Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network
Radiological findings for diagnosis of SARS-CoV-2 pneumonia (COVID-19)
Septic pulmonary embolism requiring critical care: clinicoradiological spectrum, causative pathogens and outcomes
Imaging of pulmonary infections
Acute Klebsiella pneumoniae pneumonia alone and with concurrent infection: comparison of clinical and thin-section CT findings
Spectrum of imaging findings in pulmonary infections. Part 1: Bacterial and viral
Radiographic and CT features of viral pneumonia
Imaging of pulmonary viral pneumonia
Radiologic pattern of disease in patients with severe acute respiratory syndrome: the Toronto experience
Severe acute respiratory syndrome: radiographic appearances and pattern of progression in 138 patients
Deep residual learning for image recognition
Wide residual networks
On the limits of cross-domain generalization in automated X-ray prediction
Padchest: A large chest x-ray image dataset with multi-label annotated reports
Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison
MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs
Preparing a collection of radiology examinations for distribution and retrieval
Chest radiograph interpretation with deep learning models: assessment with radiologist-adjudicated reference standards and population-adjusted evaluation
The aim of COVID-19 CT&X-ray Image Data Stock is to create a public pool of CT and X-ray images of lungs to increase the efficiency of distinguishing COVID-19 disease from other types of pneumonia and from healthy chest.
We compared binary and multiclass classifiers trained on COVID-19 CT&X-ray Image Data Stock, revealing that training neural networks with precise label information can improve the performance on the binary COVID-19 classification task.