key: cord-0984511-zm3yhzo4
authors: Arias-Garzón, Daniel; Alzate-Grisales, Jesús Alejandro; Orozco-Arias, Simon; Arteaga-Arteaga, Harold Brayan; Bravo-Ortiz, Mario Alejandro; Mora-Rubio, Alejandro; Saborit-Torres, Jose Manuel; Serrano, Joaquim Ángel Montell; Tabares-Soto, Reinel; Vayá, Maria de la Iglesia
title: COVID-19 detection in X-ray images using convolutional neural networks
date: 2021-08-20
journal: Machine learning with applications
DOI: 10.1016/j.mlwa.2021.100138
sha: 0911f539be61d977981dd1580ae5eca69fa722e3
doc_id: 984511
cord_uid: zm3yhzo4

The COVID-19 global pandemic affects health care and lifestyles worldwide, and its early detection is critical to controlling the spread of cases and mortality. The current leading diagnostic test is the reverse transcription polymerase chain reaction (RT-PCR); its turnaround times and costs are high, so other fast and accessible diagnostic tools are needed. Inspired by recent research that correlates the presence of COVID-19 with findings in chest X-ray images, this paper's approach uses existing deep learning models (VGG19 and U-Net) to process these images and classify them as positive or negative for COVID-19. The proposed system involves a preprocessing stage with lung segmentation, removing the surroundings, which do not offer relevant information for the task and may produce biased results; after this initial stage comes the classification model, trained under the transfer learning scheme; and finally, results analysis and interpretation via heat map visualization. The best models achieved a COVID-19 detection accuracy of around 97%.

Keywords: COVID-19, Deep learning, Transfer learning, X-ray, Segmentation

Coronavirus illness is a disease related to Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS). COVID-19 is the infection caused by the novel coronavirus SARS-CoV-2 (Zhang, 2020). In December 2019, the first COVID-19 cases were reported in Wuhan city, Hubei province, China (Xu et al., 2020). In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic. These diseases cause respiratory problems that can usually be treated without specialized medicine or equipment; still, underlying medical conditions such as diabetes, cancer, and cardiovascular and respiratory illnesses can make the sickness worse (World Health Organization, 2020). Reverse transcription polymerase chain reaction (RT-PCR) and gene sequencing of respiratory or blood samples are currently the main methods for COVID-19 detection (Wang et al., 2020).
Other studies show that COVID-19 presents pathologies similar to those of pneumonia, leaving chest pathologies visible in medical images. Research shows the correlation of RT-PCR with chest CT (Ai et al., 2020), while other work studies its correlation with chest X-ray images (Kanne et al., 2020). Opacities or attenuations are the most common findings in these images, with ground-glass opacity in around 57% of cases (Kong & Agarwal, 2020). Even though expert radiologists can identify the visual patterns found in these images, considering the limited monetary resources of low-level medical institutions and the ongoing increase in cases, this diagnostic process is quite impractical. Recent research in Artificial Intelligence (AI), especially deep learning approaches, shows that these techniques perform well when applied to medical images. There are only a few large open-access datasets of COVID-19 X-ray images; most published studies build on the COVID-19 Image Data Collection (Cohen et al., 2020), which was constructed with images from COVID-19 reports or articles, in collaboration with a radiologist who confirmed the pathologies in the pictures. Past approaches use different strategies to deal with small datasets, such as transfer learning, data augmentation, or combining different datasets, with good results reported by Civit-Masot et al. (2020), who used a VGG16 with 86% accuracy; by Ozturk et al. (2020); and by Khan et al. (2020), who reached 99% accuracy with CoroNet, a model based on Xception.

This paper presents a new approach using existing deep learning models. It focuses on enhancing the preprocessing stage to obtain accurate and reliable results when classifying COVID-19 from chest X-ray images. The preprocessing step involves a network to filter the images based on their projection (lateral or frontal); common operations such as normalization, standardization, and resizing, which reduce the data variability that may hurt the performance of the classification models; and a segmentation model (U-Net) to extract the lung region, which contains the relevant information, and discard the surroundings, which can produce misleading results (Instituto Tecnológico de Informática, 2020). Following the preprocessing stage comes the classification model (VGG16/VGG19), trained under the transfer learning scheme, which takes advantage of weights pre-trained on a much bigger dataset, such as ImageNet, and helps the training process of the network both in performance and in time to convergence. It is worth noting that the dataset used for this research is at least ten times bigger than the ones used in previous works. Finally, the visualization of heat maps for different images provides helpful information about the regions of the images that contribute to the network's prediction, which in ideal conditions should focus on the appearance of the lungs, backing the importance of lung segmentation in the preprocessing stage.
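A common way to produce such heat maps is gradient-weighted class activation mapping (Grad-CAM); the paper does not spell out its exact visualization method at this point, so the following is only a minimal sketch of that idea, assuming a Keras VGG19-based classifier with a sigmoid output and a last convolutional layer named block5_conv4 (the function name make_gradcam is hypothetical):

    import tensorflow as tf

    def make_gradcam(model, image, last_conv_layer="block5_conv4"):
        """Sketch of Grad-CAM: heatmap of regions driving the prediction."""
        # Model mapping the input to (last conv activations, final prediction).
        grad_model = tf.keras.models.Model(
            model.inputs,
            [model.get_layer(last_conv_layer).output, model.output],
        )
        with tf.GradientTape() as tape:
            conv_out, preds = grad_model(image[None, ...])  # add batch axis
            class_score = preds[:, 0]                       # binary COVID-19 score
        grads = tape.gradient(class_score, conv_out)
        # Channel importance: global-average-pooled gradients.
        weights = tf.reduce_mean(grads, axis=(0, 1, 2))
        heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
        heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
        return heatmap.numpy()

For a 224 x 224 input, the resulting 14 x 14 map can be upsampled to the input resolution and overlaid on the X-ray; with lung segmentation in place, the highlighted regions should concentrate inside the lungs.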
After this section, the paper is organized as follows: first, the methodology applied for these approaches, followed by the experiments and the results obtained, a discussion of the findings, and lastly the conclusions.

Our methodology consists of three main experiments to evaluate the performance of the models and assess the influence of the different stages of the process. Each experiment follows the workflow shown in Fig. 1.

Figure 1: Experiment diagram: (a) the projection classification task, (b) the lung segmentation task, (c) COVID-19 prediction with standard images, (d) COVID-19 prediction with only the lung region in the images, and (e) COVID-19 prediction with the lungs removed from the images.

A total of nine chest X-ray image datasets were used in different stages. The following datasets were used to train the classification models: BIMCV-COVID19+, BIMCV-COVID- (Medical Imaging Databank of the Valencia region BIMCV, 2020), and the Spain Pre-COVID era dataset, all provided by the Medical Imaging Databank of the Valencia Region (BIMCV). To compare these processes with previous works, we also used another two databases: for positive cases, the COVID-19 Image Data Collection by Cohen (Cohen et al., 2020), and for negative cases, the Normal, Viral Pneumonia, and Bacterial Pneumonia database by Kermany (Kermany et al., 2018); these last databases can be found on Kaggle (COVID-19 X rays - Kaggle).

The images from the COVID-19 datasets have a label corresponding to the image projection: frontal (posteroanterior and anteroposterior) or lateral. Upon manual inspection, several mismatched labels were found, which affects model performance, given the difference in the information available from the two views and the fact that not every patient had both views available. To automate the process of filtering the images according to their projection, a classification model was trained on a subset of the BIMCV-Padchest dataset, with 2,481 frontal images and 815 lateral images. This model allowed us to filter the COVID-19 datasets efficiently and keep the frontal-projection images, which offer more information than the lateral ones. After this separation, the positive dataset (BIMCV-COVID19+) used to train the COVID-19 classification models has 12,802 frontal images. In Experiment 1, images from the BIMCV-COVID- dataset were used as negative cases, with 4,610 frontal images. BIMCV-COVID- was not curated; moreover, some of the patients in this dataset were confirmed as COVID-19 positive in a posterior evaluation. Therefore, models trained on this data could show biased or unfavorable performance due to the dataset size and to cases later confirmed as positive by radiologists.

Three datasets were used to train the U-Net models for segmentation: the Montgomery dataset (Jaeger et al., 2020) with 138 images, JSRT (Shiraishi et al., 2020) with 240, and NIH with 100. Despite the apparently small amount of data, the quantity and variability of the images were enough to achieve a useful segmentation model.

For the classification tasks, the data were divided into train (60%), validation (20%), and test (20%) partitions, following the clinical information to avoid images from the same subject falling into two different partitions, which could generate bias and overfitting in the models (see the sketch after this list). Accordingly, the data distribution was as follows:

• For the classification model that filters images based on the projection, the frontal images were split into 1,150, 723, and 608 images for the train, test, and validation partitions; in the same partitions, the lateral images were split into 375, 236, and 204 images.

• For the COVID-19 classification model, the positive-cases dataset has 6,475 images for train, 3,454 for test, and 2,873 for validation. For the negative-cases datasets, the BIMCV-COVID- dataset is divided into 2,342, 1,228, and 1,040 images for train, test, and validation. After the BIMCV-COVID- dataset was curated, there were 1,645, 895, and 700 images for the train, test, and validation sets. Finally, the Pre-COVID era dataset was divided into 2,803, 1,401, and 1,265 images for the train, test, and validation sets.

The image quantity was considerably smaller for the segmentation task, so no test set was created, leaving 80% of the data (382 images) for the train set and 20% (96 images) for validation.
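As a minimal sketch of this subject-wise partitioning, assuming a hypothetical metadata table with path, label, and subject_id columns (the actual BIMCV metadata layout differs), scikit-learn's GroupShuffleSplit keeps every patient's images inside a single partition:

    import pandas as pd
    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical metadata: one row per image, with the patient it belongs to.
    df = pd.read_csv("metadata.csv")  # columns assumed: path, label, subject_id

    # First carve out 60% for train; then split the remainder 50/50 into val/test.
    gss = GroupShuffleSplit(n_splits=1, train_size=0.6, random_state=42)
    train_idx, rest_idx = next(gss.split(df, groups=df["subject_id"]))
    rest = df.iloc[rest_idx]

    gss2 = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=42)
    val_idx, test_idx = next(gss2.split(rest, groups=rest["subject_id"]))

    train, val, test = df.iloc[train_idx], rest.iloc[val_idx], rest.iloc[test_idx]
    # No subject appears in more than one partition, avoiding patient-level leakage.
    assert set(train["subject_id"]).isdisjoint(val["subject_id"])

Splitting on subject_id rather than on individual images is what prevents the leakage described above.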
As the images come from several datasets with different image sizes and acquisition conditions, a preprocessing step is applied to reduce or remove effects of data variability on the performance of the models. For instance, the BIMCV-Padchest dataset was collected entirely at the same hospital, while the COVID-19 datasets have images mainly from the Valencian region in Spain, other parts of Spain, and other European countries. On the other hand, the Montgomery and NIH segmentation datasets come from US images, while JSRT is a Japanese dataset. In general, this implies that many types of X-ray devices, with different technologies and resolutions, were used to take the images.

The preprocessing layer, shown in orange in Fig. 1, consists of three steps. The first step resizes all images to 224 x 224 pixels in one channel (grayscale). The second step normalizes the datasets according to Eq. (1), where x represents the original image and N the normalized image:

    N = (x - min(x)) / (max(x) - min(x))    (1)

Finally, we standardize the datasets according to Eq. (2), where Z is the standardized image and N the normalized image:

    Z = (N - mean(N)) / std(N)    (2)

When applying standardization to the validation and test sets, the mean and standard deviation (std) of the training set were used to unify the data distribution.

There are multiple ways to perform image segmentation; this paper uses a deep learning model based on the U-Net architecture (Ronneberger et al., 2020). Previous articles show that the U-Net architecture is accurate for this segmentation task; consequently, in a production setting, the model input is a chest X-ray image and the output is the predicted lung mask. Fig. 2 shows the structure of the U-Net. For experimental purposes, we tested three different numbers of filters in the convolutional layers to find the optimal one for this task. The number of filters in the contraction blocks is computed according to Eq. (3), where F0 is the number of initial filters and i is the index of the contraction block:

    Fi = F0 x 2^i    (3)

Eq. (4) gives the number of filters for each expansion block, where Ff is the number of filters at the last contraction block and i is the index of the corresponding expansion block:

    Fi = Ff / 2^i    (4)

In the expansion blocks, the transposed convolution layer uses the same number of filters as the convolutional layers. The values used for F0 were 16, 112, and 64; the resulting models are identified as U-Net 1, 2, and 3, respectively. The kernel size in the convolutional layers is 3 x 3, with he_normal kernel initialization and "same" padding. In the max-pooling layers, the pool size is 2 x 2. The dropout rate in the first two expansion and contraction blocks is 0.1, in blocks three and four it is 0.2, and in contraction block five it is 0.3. The transposed convolutional layers use a kernel size of 2 x 2, strides of 2 x 2, and "same" padding. Finally, the last convolutional layer uses one filter and a kernel size of 1 x 1.
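The filter scheme of Eqs. (3) and (4) and the block hyperparameters above can be made concrete with a short sketch. This is a minimal illustration, assuming five contraction blocks and the standard two-convolutions-per-block U-Net layout; the helper names are hypothetical, not the authors' code:

    from tensorflow.keras import layers

    def contraction_filters(f0, num_blocks=5):
        """Eq. (3): the i-th contraction block uses F0 * 2**i filters."""
        return [f0 * 2**i for i in range(num_blocks)]

    def expansion_filters(ff, num_blocks=4):
        """Eq. (4): the i-th expansion block uses Ff / 2**i filters."""
        return [ff // 2**i for i in range(1, num_blocks + 1)]

    def contraction_block(x, filters, dropout_rate):
        """One contraction block: two 3x3 convs (he_normal, same padding),
        dropout, then 2x2 max pooling, as described above."""
        c = layers.Conv2D(filters, 3, activation="relu", padding="same",
                          kernel_initializer="he_normal")(x)
        c = layers.Dropout(dropout_rate)(c)
        c = layers.Conv2D(filters, 3, activation="relu", padding="same",
                          kernel_initializer="he_normal")(c)
        # c feeds the skip connection; the pooled tensor feeds the next block.
        return c, layers.MaxPooling2D(pool_size=(2, 2))(c)

    down = contraction_filters(16)    # U-Net 1: [16, 32, 64, 128, 256]
    up = expansion_filters(down[-1])  # [128, 64, 32, 16]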
There are two classification tasks in this research: the first separates frontal and lateral chest X-ray images, and the second distinguishes COVID-19 positive from negative cases. Both use models with pre-trained weights from the ImageNet dataset (Deng et al., 2020), which were trained on millions of images to predict more than 1,000 classes. The use of pre-trained models takes advantage of features learned on a larger dataset, so that a new model converges faster and performs better on a smaller dataset (Aggarwal, 2020). The pre-trained models come from the TensorFlow/Keras library; their weights were trained on three-channel images, while the X-ray data comes in one channel. The following weights were used to convert the RGB values from three channels to one: red 0.2989, green 0.5870, and blue 0.1140.

Regarding Fig. 1, in part a the dataset is filtered by a VGG19 model that determines whether a chest X-ray image is lateral or frontal. This network will be referred to as VGG19 FL to distinguish it from the other classification model. To filter frontal and lateral images, a subset of samples from the BIMCV-Padchest and BIMCV-COVID- datasets was labeled manually; experiments were performed using the VGG16 and VGG19 models with pre-trained weights from the ImageNet dataset. Table 1 shows the accuracy of these experiments, with the best results for VGG19, making it the model used in the subsequent parts of the experiment diagram. Each model was trained for 30 epochs with a batch size of 64.

Lung segmentation was performed with a U-Net model using a combination of three datasets. Three different models were applied, changing the number of filters in the convolutional layers of each U-Net as shown in Section 2.4 (Segmentation). For each of the three variations, COVID-19 case prediction was implemented by selecting the best model between VGG16 and VGG19. In all cases, the model was trained for 30 epochs with a batch size of 64.

For Experiment 1, Table 3 shows the results of part c, in which no segmentation is applied to the data; Table 4 shows the results of part d, with lung segmentation applied; and Table 5 shows the results of part e, in which the segmentation masks were inverted before being applied to the images. For all of these tables, the models used were VGG16 and VGG19. Table 6 shows the accuracy, sensitivity, specificity, and F1 score on the COVID-19 label for parts c, d, and e with a threshold of 0.5. For Experiment 2, Table 7 shows the results of part c, Table 8 the results of part d, and Table 9 the results of part e; for all of them, the models used were VGG16 and VGG19. For Experiment 3, Tables 11, 12, and 13 show the results of parts c, d, and e, respectively, for the VGG16 and VGG19 models, and Table 14 presents the accuracy, sensitivity, specificity, and F1 score on the COVID-19 positive label for parts c, d, and e with a threshold of 0.5.

To develop this project, we used Python 3.8.1. All models were designed with TensorFlow 2.2.0 using the Keras library. We used Google Colaboratory for most of the experiments, with a Tensor Processing Unit (TPU) when possible and otherwise a Graphics Processing Unit (GPU), depending on the Colaboratory assignment. The RAM available in all instances was 12.72 GB. When Colaboratory was insufficient, we used a machine with Ubuntu 20.04 LTS as the operating system, a GeForce RTX 2080 Ti GPU (11 GB of memory, 250 W), CUDA version 11.0, an AMD Ryzen 9 3950X 16-core processor, and 128 GB of RAM (4 modules of 32 GB at 2666 MHz).
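One plausible reading of the channel conversion above is to collapse the first convolutional kernel of the ImageNet-pretrained VGG19 from three input channels to one using the stated luminance weights. The sketch below illustrates this, with an assumed binary sigmoid head; the exact head architecture and training details beyond the reported 30 epochs and batch size of 64 are not specified in the text:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    # RGB VGG19 with ImageNet weights, no classifier head.
    rgb_base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                           input_shape=(224, 224, 3))
    # Same architecture with a one-channel input, randomly initialized.
    gray_base = tf.keras.applications.VGG19(weights=None, include_top=False,
                                            input_shape=(224, 224, 1))

    # Copy weights layer by layer; collapse the first conv kernel to one channel
    # using the luminance weights R=0.2989, G=0.5870, B=0.1140.
    lum = np.array([0.2989, 0.5870, 0.1140])
    for rgb_layer, gray_layer in zip(rgb_base.layers, gray_base.layers):
        w = rgb_layer.get_weights()
        if w and w[0].ndim == 4 and w[0].shape[2] == 3:  # first conv kernel (3,3,3,64)
            kernel = np.tensordot(w[0], lum, axes=([2], [0]))[:, :, None, :]
            gray_layer.set_weights([kernel, w[1]])
        elif w:
            gray_layer.set_weights(w)

    # Hypothetical binary head: COVID-19 positive vs. negative.
    model = models.Sequential([
        gray_base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=30, batch_size=64)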
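And tying the stages of Fig. 1 together, a hypothetical end-to-end inference pass over one image could look as follows. All names (vgg19_fl, unet, vgg19_covid), the 0.5 thresholds, and the convention that the projection model scores frontal images high are assumptions for illustration:

    import cv2  # assumed here for image I/O and resizing

    def preprocess(path, train_mean, train_std):
        """Resize to 224x224 grayscale, then apply Eq. (1) and Eq. (2)."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype("float32")
        img = cv2.resize(img, (224, 224))
        img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # Eq. (1)
        return (img - train_mean) / train_std                     # Eq. (2)

    def predict_covid(path, vgg19_fl, unet, vgg19_covid, train_mean, train_std):
        x = preprocess(path, train_mean, train_std)[None, ..., None]  # (1,224,224,1)
        # Part a: keep only frontal projections (assumed: score >= 0.5 means frontal).
        if vgg19_fl.predict(x)[0, 0] < 0.5:
            return None
        # Part b: segment the lungs and mask out the surroundings.
        mask = (unet.predict(x)[0, :, :, 0] > 0.5).astype("float32")
        lungs_only = x * mask[None, :, :, None]
        # Part d: classify the lungs-only image as COVID-19 positive or negative.
        return float(vgg19_covid.predict(lungs_only)[0, 0])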
For the classification tasks proposed in this research, the best results were achieved using the VGG19 model. The first classification task was needed to filter the data, as mislabeled projections were a real problem within the datasets, and as their size increases over time, manual filtering becomes unmanageable. More than that, it is a powerful tool to prevent lateral chest X-ray images from being fed into the model's training process. It is appropriate to say that this classifier does not guard against glitches from images other than frontal or lateral chest X-rays.

Following the experiment order, for Experiment 1 the test accuracy in Table 4 is better than in Table 3, meaning that segmentation works; however, Table 5 also shows high accuracy. In that case, the lungs are removed from the images, meaning the models use image characteristics other than lung pathologies for classification. As shown in Table 6, the COVID-19 positive label has higher accuracy in all parts. In general, these models misclassify negative cases more often than positive ones; the ROC and precision-recall curves complement the information in Tables 3, 4, and 5. The good performance without lungs can be explained by the noise surrounding the lungs: cables, capture devices, and cues related to the patient's age or gender give the images without lungs more details to classify on in these cases. Consequently, any future application using models trained without lungs would have a high chance of mislabeling images because of this noise bias. Further investigation is required to segment the pathologies identified by expert radiologists to ensure that noise is not a factor of bias.

For more accurate results, we identified two main future work opportunities. First, the results presented here do not necessarily imply the same performance on all datasets: the primary datasets come from European patients, and patients from other parts of the world may show minor changes in data capture or pathologies, so a better classification would require worldwide datasets. Second, separating the datasets by gender would provide more information on the model's scope, as the soft tissue of the breast may hide parts of the lungs, and it is unknown whether this constitutes a bias in the model's predictions. Finally, Table 16 shows the comparison of our development with previous works.

The authors affirm that they have no competing financial interests or personal relationships that could have influenced the work reported in this paper.

References

COVID-19 X rays - Kaggle.
Neural Networks and Deep Learning.
Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases.
Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
Cervical cancer classification using convolutional neural networks, transfer learning and data augmentation.
PadChest: A large chest x-ray image dataset with multilabel annotated reports.
Deep learning system for COVID-19 diagnosis aid using X-ray pulmonary images.
COVID-19 Image Data Collection. arXiv.
Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 172.
ImageNet: A Large-Scale Hierarchical Image Database. CVPR09.
The WHO Just Declared Coronavirus COVID-19 a Pandemic. Time.
Early detection in chest images. Report: "In search for bias within the dataset". Instituto Tecnológico de Informática.
Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery.
A deep learning approach to detect Covid-19 coronavirus with X-ray images.
Essentials for Radiologists on COVID-19: An Update-Radiology Scientific Expert Panel. RSNA.
CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images.
Chest Imaging Appearance of COVID-19 Infection.
Medical Imaging Databank of the Valencia region BIMCV (2020).
Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning.
Automated detection of COVID-19 cases using deep neural networks with X-ray images.
Application of deep learning for fast detection of COVID-19 in X-rays using nCOVnet.
U-Net: Convolutional networks for biomedical image segmentation.
Detection of coronavirus disease (COVID-19) based on deep features and support vector machine.
Development of a digital image database for chest radiographs with and without a lung nodule: Receiver operating characteristic analysis of radiologists' detection of pulmonary nodules.
Very deep convolutional networks for large-scale image recognition.
XLSor: A robust and accurate lung segmentor on chest X-rays using criss-cross attention and customized radiorealistic abnormalities generation.
BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv.
Clinical Characteristics of 138 Hospitalized Patients with 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China.
Coronavirus Update (Live): 55,912,871 Cases and 1,342,598 Deaths from COVID-19 Virus Pandemic. Worldometer.
Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding.
Deep Learning-Based Decision-Tree Classifier for COVID-19 Diagnosis From Chest X-ray Imaging.
Imaging changes of severe COVID-19 pneumonia in advanced stage.

Acknowledgments

The authors would like to thank Universidad Autónoma de Manizales for its support of this project, and the Medical Imaging Databank of the Valencia Region for providing some of the datasets, with support from the Regional Ministry of Innovation, Universities, Science and Digital Society through the grant awarded by decree 51/2020 of the Valencian Innovation Agency (Spain) and the Regional Ministry of Health of the Valencia Region, and for always being an integral part of the project. Finally, the authors thank the other dataset authors for making well-structured, freely accessible datasets that help solve world-changing medical problems.