key: cord-0496143-64gkopsz
authors: Haghanifar, Arman; Majdabadi, Mahdiyar Molahasani; Ko, Seokbum
title: COVID-CXNet: Detecting COVID-19 in Frontal Chest X-ray Images using Deep Learning
date: 2020-06-16
journal: nan
DOI: nan
sha: 18a690f7622f41638b42d4f07dabbd0fd45b784a
doc_id: 496143
cord_uid: 64gkopsz

One of the primary clinical procedures for screening patients infected with the novel coronavirus is capturing a chest X-ray (CXR) image. In most patients, the CXR contains abnormalities, such as consolidation, that result from COVID-19 viral pneumonia. This study investigates how to efficiently detect the imaging features of this type of pneumonia using deep convolutional neural networks trained on a large dataset. It is demonstrated that simple models, alongside the majority of pretrained networks in the literature, focus on irrelevant features for decision-making. In this paper, numerous CXR images from various sources are collected, and the largest publicly accessible dataset of COVID-19 CXRs is prepared. Finally, using the transfer learning paradigm, the well-known CheXNet model is used to develop COVID-CXNet. This model can detect novel coronavirus pneumonia based on relevant and meaningful features with precise localization. COVID-CXNet is a step towards a fully automated and robust COVID-19 detection system.

In this study, we first collect a dataset of CXRs captured from RT-PCR-confirmed COVID-19 patients from multiple publicly accessible sources. Our collected dataset is the largest publicly available source of COVID-19 CXRs, containing 738 images from various public datasets. Then, we investigate whether the disease can be detected by an individual Convolutional Neural Network (CNN) trained on different numbers of input images. Next, we investigate the performance of prominent pretrained CNN models fine-tuned on the dataset.
Afterwards, CheXNet, a model pretrained on the same type of medical images, is introduced and its efficiency is discussed. Finally, we develop our model based on CheXNet and design a lung segmentation module to improve the model's localization of lung abnormalities. Learning curves and confusion matrices are plotted at each step to facilitate model interpretation. Class activation maps (CAMs) are our main visualization tool for comparing the aforementioned models throughout this study. Our main contributions can be summarized as:
• Collecting the largest public dataset of COVID-19 CXR images from different sources

Identification and screening of COVID-19 pneumonia using different types of medical data are fast-growing topics of interest. A large number of studies have investigated machine learning (ML) and deep learning (DL) methods for COVID-19 detection, mostly published as preprints on arXiv and medRxiv. While some studies focus on non-image-based diagnosis, such as virus genomes [15], clinical features [16], or blood exams [17], the majority of available articles use images for COVID-19 pneumonia detection. ML-based methods with manual feature extraction are used in a few articles to diagnose the disease [18, 19, 20, 21, 22]. However, most studies use DL-based techniques. In an early effort, Li et al. implemented COVNet to distinguish COVID-19 from community-acquired pneumonia (CAP), which includes pneumonia caused by other pathogens, in CT scans [23]. Other researchers have also tackled the same problem using CT images, reaching high scores and precise abnormality localization [24, 25]. By contrast, although many studies claim excellent classification accuracy using CXRs [26, 27, 28, 29, 30, 31, 32], none of them report visualization results for their model decisions.
Because pneumonia diagnosis is more challenging in CXRs than in CT scans, and the available COVID-19 pneumonia CXR datasets are small, we focus on studies that provide visual interpretability, as it can be considered a stronger performance indicator. Zhang et al. used a dataset including 100 CXRs from COVID-19 positive cases and developed a CNN model with a ResNet backbone pretrained on ImageNet [33]. Their best model achieved an f-score of ≈ 0.72 in distinguishing COVID-19 pneumonia from CAP. Li et al. applied their multi-player model, COVID-MobileXpert, to a dataset of 537 images equally divided into normal CXRs, CAP cases, and COVID-19 pneumonia samples [34]. Their main goal was to achieve acceptable accuracy with lightweight networks, such as SqueezeNet, that can be deployed for pneumonia detection on mobile devices capturing noisy snapshots. Rajaraman et al. collected a larger dataset containing 313 COVID-19 pneumonia CXRs from two different sources [35]. Lung segmentation was then applied to the images using a U-Net-based model. Finally, an ensemble of different fine-tuned models was implemented and iteratively pruned to reduce the number of model parameters. Their best single pruned architecture was Inception-V3, and their best ensemble used a weighted averaging strategy. They achieved f-scores of 0.9841 and 0.99 in detecting COVID-19 pneumonia against CAP and normal CXR samples, respectively. However, their final visualization maps are not discussed in detail, and their model suffers from implementation drawbacks due to its large number of parameters. In a more advanced effort, COVID-Net was introduced by Wang and Wong [36]. It was trained on COVIDx, a dataset with 358 CXR images from 266 COVID-19 patient cases. Their architecture was first pretrained on ImageNet and achieved a best f-score of 0.9480 in three-class classification.
Nevertheless, their model visualization is not properly presented. The most recent similar study is CovidAID by Mangal et al. [37]. CovidAID is a DenseNet model built upon CheXNet weights. The authors compared their results with COVID-Net on the same test set, and their findings suggest that CovidAID surpassed COVID-Net by a notable margin, with an f1-score of 0.9230 compared with 0.3591. CovidAID's visualizations are also more precise than those of previous studies. Overall, existing models lack robustness in identifying COVID-19 pneumonia, mainly because of the insufficient number of CXR images.

CXR is the most common imaging technique used as the first clinical step for chest-related diseases [38]; hence, more CXRs than CT images can be collected from public sources. A batch of randomly selected frontal-view samples from the dataset, also known as anteroposterior (AP) or posteroanterior (PA) view, is shown in Fig.1.
Figure 1: Randomly selected frontal CXR images from different sources
X-ray imaging produces grey-scale images in which the intensity of each region corresponds to the amount of X-ray radiation absorbed by the tissue in that region. Another CXR view is the lateral (L) view, which is an adjunct to the main frontal view. Lateral CXR is performed when a frontal CXR leaves diagnostic uncertainty [39]; it is therefore less common than frontal CXR and, because of its different angle, it is excluded from our data. A third view, AP supine or AP erect, is an alternative to the PA view and is usually taken from patients who are too unwell to leave the bed to sit or stand. The erect view has lower quality than PA, and the supine view is of lower quality than both PA and erect, but AP views can still help diagnose acute and chronic chest conditions [40] and are included in the dataset.
Since COVID-19 is a novel disease, the number of publicly available X-ray images of infected patients is relatively small. Our dataset is constructed from several online image databases that are updated regularly:
1. Radiopaedia [41]: an open-edit radiology resource where radiologists submit their daily cases.
2. SIRM [42]: the website of the Italian Society of Medical and Interventional Radiology, which has a dedicated database of COVID-19 patients, including both CXR and CT images.
3. EuroRad [43]: a peer-reviewed image resource of radiological case reports.
4. An image-based social forum [44] that has a dedicated COVID-19 clinical cases section.
5. COVID-19 image data collection [45]: a GitHub repository by Dr. Cohen et al., which combines some of the aforementioned resources with other images.
6. Twitter COVID-19 CXR dataset [46]: a Twitter thread by a cardiothoracic radiologist from Spain who has shared high-quality images of positive cases.
7. Peer-reviewed papers: papers that have shared their clinical images, such as [47] and [48].
8. Hannover Medical School dataset [74]: a GitHub repository containing images from the Institute for Diagnostic and Interventional Radiology in Hannover, Germany.
9. Social media: images collected from Instagram pages [75] and [76].
For CXRs of normal cases, four resources were used:
1. Pediatric CXR dataset [77]: AP-view CXRs of children collected by the authors at Guangzhou Medical Center, including normal, bacterial pneumonia, and viral pneumonia cases.
2. NIH CXR-14 dataset [78]: curated by the National Institutes of Health (NIH), this large dataset has more than 100,000 images of normal chests and various abnormal lungs, including pneumonia and consolidation.
3. Radiopaedia [41]: besides infected cases, this resource also contains some healthy CXRs taken for medical check-ups. Some samples belong to COVID-19 patients before their infection. These images are labeled as "Clear lungs".
4. Tuberculosis Chest X-ray Image Datasets [79]: provided by the U.S. National Library of Medicine, comprising two datasets of PA CXRs that contain 406 normal X-rays.
Currently, 738 images of COVID-19 pneumonia patients have been collected in different sizes and formats, such as JPG and WebP. All collected images are publicly accessible in the dedicated repository. The dataset also includes 5000 normal CXRs as well as 4600 images of patients with CAP collected from the NIH CXR-14 dataset. A sample frontal CXR of a COVID-19 positive case from the dataset is shown in Fig.2. The CXR was taken in the AP erect view from a 65-year-old male patient admitted with shortness of breath and myalgia. The radiologist's findings are bilateral ill-defined peripheral airspace opacification in both lungs, normal heart size, and no pleural effusion. Due to the small number of images in the positive class, image augmentation is used to prevent overfitting: the images are randomly rotated, and their brightness and scale are randomly altered. Images are normalized and downsized to (320, 320) to prevent resource exhaustion and reduce RAM usage. There are various image enhancement methods based on histogram equalization. These algorithms increase image contrast to make subtle intensity differences more distinguishable, and they are therefore widely applied in medical image processing. Radiologists also use manual contrast optimization to better diagnose masses and nodules in different types of X-ray radiographs. An example of enhancement algorithms applied to an annotated CXR is shown in Fig.3. As expected, Contrast Limited Adaptive Histogram Equalization (CLAHE) revealed the nodular-shaped opacity of a COVID-19 positive case better than the other histogram equalization methods. CLAHE is one of the most popular enhancement methods across image types [80].
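The normalization, resizing and enhancement steps above can be illustrated with plain global histogram equalization, the simplest member of the family discussed here. The sketch below is an assumption-laden illustration rather than the pipeline used in this work: it min-max normalizes a grey-scale image and resizes it to 320×320 with nearest-neighbour sampling (standing in for a library resize), and it equalizes an 8-bit image through its cumulative histogram. CLAHE applies the same cumulative-histogram mapping per tile with a clip limit on each tile's histogram; in practice it is usually taken from a library such as OpenCV (`cv2.createCLAHE(clipLimit=..., tileGridSize=...)`), and the parameter values used in this work are not stated. The function names are hypothetical.

```python
import numpy as np

def normalize_and_resize(img, size=(320, 320)):
    """Min-max normalize a grey-scale image to [0, 1] and resize it with
    nearest-neighbour sampling (a simple stand-in for a library resize)."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    # Map every output row/column to the nearest source row/column.
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[np.ix_(rows, cols)]

def equalize_hist(img_u8):
    """Global histogram equalization of an 8-bit grey-scale image."""
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value of the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to equalize
        return img_u8.copy()
    # Look-up table that spreads the occupied levels over the full 0..255 range.
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img_u8]
```

For example, equalizing the 2×2 image `[[10, 10], [20, 30]]` maps its three grey levels to 0, 128 and 255, so the darkest and brightest pixels end up at the extremes of the intensity range.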
Another histogram equalization-based algorithm is Bi-histogram Equalization with Adaptive Sigmoid Function (BEASF) [81]. This algorithm has shown good results on grey-scale medical images such as dental X-ray radiographs [82]. BEASF improves image contrast adaptively based on the global mean pixel value. Its hyperparameter γ defines the slope of the sigmoid function. This image enhancement method is implemented in Python and is available in the repository. Fig.4 depicts the output of BEASF with different γ values. Although BEASF did not improve opacity visibility in all images, it can be a good complement to the CLAHE method. Thus, a BEASF-enhanced image with γ = 1.5 is concatenated with the CLAHE-enhanced image and the original image to form the model input, in order to increase accuracy and classification confidence.

In this section, model development is explained in several steps. First, a base convolutional model is designed and trained on different portions of the dataset. Then, models pretrained on the ImageNet dataset are discussed. Finally, a model pretrained on a similar image type is examined. An individual CNN is designed and trained using 300 images, half of them from COVID-19 viral pneumonia cases and half from normal cases. The model consists of 5 convolutional layers, followed by a flatten layer and three fully connected layers. No batch normalization or pooling layers are used at this stage. Fig.5 illustrates the base model architecture. Each convolutional layer has 32 filters with a kernel size of 3x3. The activation function is the rectified linear unit (ReLU), which adds non-linearity to the network and helps the model make better decisions. The fully connected layers have 10 neurons, except the last layer, which has a single neuron whose output is the predicted probability of pneumonia: values near 0.0 indicate the healthy class and values near 1.0 the pneumonia class.
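To see how much capacity this base model has, its shapes and parameter counts can be worked out directly. The sketch below rests on assumptions that the text does not state: a 320×320 input with three channels (the original, CLAHE and BEASF images stacked), stride-1 convolutions with 'valid' padding, and fully connected layers of 10, 10 and 1 units. `base_cnn_summary` is a hypothetical helper, not part of any library.

```python
def base_cnn_summary(h=320, w=320, c_in=3, n_conv=5, filters=32, k=3,
                     dense_units=(10, 10, 1)):
    """Return (flatten_size, conv_params, dense_params) for a stack of
    stride-1, 'valid'-padded convolutions followed by fully connected layers."""
    conv_params = 0
    c = c_in
    for _ in range(n_conv):
        conv_params += (k * k * c + 1) * filters  # kernel weights + one bias per filter
        c = filters
        h, w = h - (k - 1), w - (k - 1)           # 'valid' padding trims k-1 pixels per axis
    flatten = h * w * c                           # no pooling, so the map stays large
    dense_params = 0
    fan_in = flatten
    for units in dense_units:
        dense_params += (fan_in + 1) * units      # weights + biases of each FC layer
        fan_in = units
    return flatten, conv_params, dense_params

print(base_cnn_summary())  # (3075200, 37888, 30752131)
```

Under these assumptions the convolutional layers hold fewer than 40k parameters, while the first fully connected layer alone holds about 30.8 million weights, because without pooling the flattened feature map still has 310×310×32 entries. A model with this much capacity can easily fit a few hundred training images, which is consistent with the need to inspect what it actually looks at.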
Transfer learning reuses a pretrained model for a new classification task. Some pretrained models have been trained on millions of images for many epochs and achieve high accuracy on a general task. These models can be applied to new tasks: the last fully connected (FC) layer is replaced with another FC layer whose number of nodes equals the number of classes. The backbone can easily be swapped for other pretrained networks. Note that the pretrained models are fine-tuned, i.e., trained on the target dataset for a small number of epochs, rather than retrained for many epochs. CheXNet is trained on CXR-14, the largest publicly available CXR dataset of adult cases, with 14 different diseases such as pneumonia and hernia [83]. CheXNet claims radiologist-level diagnostic accuracy, performs better than previous related studies [78], [84], and has a simpler architecture than later approaches [85]. CheXNet is based on the DenseNet-121 architecture and has been trained on frontal CXRs, which makes it a better candidate for the final model backbone. Unlike the aforementioned ImageNet-pretrained models, CheXNet is trained on a similar dataset of CXRs, classifying images into 14 general pulmonary diseases including consolidation, pneumonia and infiltration, which are among the common CXR findings of COVID-19. According to [78], pneumonia is correlated with other thoracic findings, as shown in Fig.7.
Figure 7: Co-occurrence of different CXR findings as a circular diagram by [78]
COVID-CXNet is a CheXNet-based model, fine-tuned on the COVID-19 CXR dataset, with 431 layers and ≈ 7M parameters. The architecture of COVID-CXNet is presented in Fig.8. COVID-CXNet has an FC layer of 10 nodes followed by a dropout layer with a 0.2 drop rate to prevent overfitting. The activation function of the last layer is changed from SoftMax in CheXNet to sigmoid. The diagram of the ROI extraction block is shown in Fig.9.
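The classification head described above can be sketched in NumPy on top of pooled backbone features. This is an illustrative assumption rather than the released model: it takes 1024-dimensional vectors (the channel count of DenseNet-121's final feature maps after global average pooling), applies the 10-node FC layer with a ReLU activation (the activation is not stated in the text), dropout at rate 0.2 during training only, and a single sigmoid output. `init_dense` and `head_forward` are hypothetical helper names.

```python
import numpy as np

def init_dense(fan_in, fan_out, rng):
    """Glorot-uniform-style weights and zero biases for one FC layer."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, (fan_in, fan_out)), np.zeros(fan_out)

def head_forward(features, params, drop_rate=0.2, training=False, rng=None):
    """FC(10) -> dropout(0.2) -> FC(1) with sigmoid, applied to pooled features.
    `rng` is required when training=True (it draws the dropout mask)."""
    (w1, b1), (w2, b2) = params
    h = np.maximum(features @ w1 + b1, 0.0)      # ReLU here is an assumption
    if training:
        keep = rng.random(h.shape) >= drop_rate
        h = h * keep / (1.0 - drop_rate)         # inverted dropout keeps the expected activation
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid: P(COVID-19 pneumonia)

rng = np.random.default_rng(0)
params = [init_dense(1024, 10, rng), init_dense(10, 1, rng)]
features = rng.standard_normal((4, 1024))  # stand-in for pooled DenseNet-121 features
print(head_forward(features, params).shape)  # (4, 1)
```

Dropout is active only when `training=True`, so inference is deterministic; in a deep learning framework the same head would sit on top of the frozen or fine-tuned backbone.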
Figure 9: The segmentation approach based on the U-Net (segmentation procedure)
A U-Net-based semantic segmentation model [86] is used to separate lung pixels from the body and the background. A collection of CXRs with manually segmented lung masks from the Shenzhen Hospital Dataset [87] and the Montgomery County Dataset [88] is used for training. Using model checkpoints, the best weights are used to generate the final masks. Afterwards, edges are preserved by applying dilation and adding margins to the segmented lung ROIs. The lung-segmented CXR is then used as the model input.

To evaluate and compare the performance of the models, several metrics are considered. Accuracy is the basic metric for statistical classification models; it is necessary but insufficient here, because we are more interested in correctly classifying positive-class samples. Thus, the f1-score for the positive class is also measured. The main performance criterion here is the visualization results, because the small number of COVID-19 positive samples makes the model prone to overfitting by looking at wrong regions of interest. To interpret visualization results precisely, the exact image findings related to COVID-19 pneumonia must be investigated. COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [89]. It attacks the respiratory system and causes pneumonia, which is mostly observed as pulmonary consolidation and ground-glass opacification (GGO) in chest images. Pneumonia can also be caused by other pathogens, such as other viruses or bacteria [90]. The most common manifestations and patterns of lung abnormalities caused by COVID-19 are:
• 1. GGOs: the first signs, which are extremely hard to observe in CXRs (observation is more straightforward in CT images).
pneumonia is different from CAP, which tends to be unilateral and to involve a single lobe.
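The post-processing of the U-Net masks can be sketched in NumPy. This is a hypothetical reading of "dilation plus margins": the predicted binary lung mask is dilated with a 3×3 structuring element, pixels outside the dilated mask are zeroed, and the image is cropped to the mask's bounding box enlarged by a fixed margin. The iteration count and margin are placeholders, since the values used in this work are not given; a library routine such as `scipy.ndimage.binary_dilation` would normally replace the hand-written `dilate`.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square structuring element (pure NumPy)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel is set if any pixel in its 3x3 neighbourhood is set.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out

def crop_to_lungs(image, mask, iterations=2, margin=10):
    """Dilate the lung mask, zero out non-lung pixels and crop the mask's
    bounding box enlarged by `margin` pixels (both values hypothetical)."""
    grown = dilate(mask, iterations)
    ys, xs = np.nonzero(grown)
    if ys.size == 0:                      # empty mask: fall back to the full image
        return image
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, image.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, image.shape[1])
    masked = np.where(grown, image, 0)
    return masked[y0:y1, x0:x1]
```

Dilation widens the mask slightly so that opacities touching the lung border are not clipped away, while the margin keeps some context around the ROI after cropping.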
To train the base model, the Adam optimizer is used with the optimal learning rate found by exponentially increasing the learning rate [91], as illustrated in Fig.10. Training the base model on 300 samples resulted in a test accuracy of 96.72% within 100 epochs; the learning curves are plotted in Fig.11.
Figure 11: Learning curves of the model loss, trained on 300 images
As the validation loss is higher than the training loss in most of the epochs, there is no overfitting. The high accuracy also indicates that there is no underfitting. Nevertheless, the validation accuracy fluctuates considerably. As the dataset was extended, the fluctuations in the curves gradually damped. With a training set of 480 images, 240 from COVID-19 cases and 240 from normal chests, the model reaches a reasonable accuracy on a test set of 120 images. Model loss curves on the training set and the validation set (which is the test set here) for different dataset sizes are plotted in Fig.12. The base model reached an accuracy of 96.10%, which is relatively high given the number of training images and the complexity of pneumonia identification in CXRs. To validate model performance and robustness, the CNN's decision process is examined. A common way to evaluate a network is to plot a heatmap for each class showing which parts of the input image contribute most to the network's decision. One widely used technique for CNN visual explanation is Gradient-weighted Class Activation Mapping (Grad-CAM). Grad-CAM uses the gradients of any target concept flowing into the last convolutional layer to produce a coarse localization map that highlights the regions of the image that are important for predicting the class [92].
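Following the Grad-CAM definition [92], the weight of each feature map A^k for class c is the spatial average of the class-score gradients, α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A^k_ij, and the heatmap is ReLU(Σ_k α_k^c A^k). The sketch below assumes the activations and gradients of the last convolutional layer have already been obtained from an autodiff framework (for example with `tf.GradientTape` in TensorFlow); upsampling the map to the input resolution and overlaying it as a colour map are omitted. `grad_cam` is a hypothetical helper name.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image.

    activations: (H, W, K) feature maps A^k of the last conv layer.
    gradients:   (H, W, K) gradients d(y^c)/d(A^k) of the target class score.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(0, 1))                         # alpha_k^c: pooled gradients
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)   # ReLU(sum_k alpha_k A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The ReLU keeps only regions that push the class score up; feature maps whose gradients are negative on average are subtracted and end up suppressed in the heatmap.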
Another visualization method is Local Interpretable Model-Agnostic Explanations (LIME), which provides local interpretability for input images by training local surrogate models on image components to find the regions with the highest impact on the model's decision [93]. Grad-CAM of the base model for images of both normal and infected classes is illustrated in Fig.13. Although classification is achieved with a high accuracy score, the imaging features used by the base model are wrong in more than half of the test images. One possible reason is that the normal CXRs are mostly pediatric. To investigate further and verify whether the problem stems from the normal CXR dataset, the model was evaluated on a small external dataset of 60 images. The confusion matrix in Table 1 shows that the model is inconsistent on normal cases. According to these results, collecting CXRs of adult lungs is essential. The largest dataset containing normal cases is the NIH CXR-14 dataset, with almost 17000 images, mostly of adults. We then increased the number of normal CXRs in the dataset to prevent overfitting. The model is trained on a dataset of 3400 images, 3000 from the normal class and 400 from the COVID-19 pneumonia class. Note that the classes are weighted in the loss function to deal with class imbalance. The results are presented in Fig.14 and Table 2. The base model achieved a high area under the curve (AUC) of 0.9984 and an accuracy of 98.68%, with a reasonable f-score of 0.94. Model visualization shows better performance; however, there are still various wrong regions in both the Grad-CAM and LIME explanations illustrated in Fig.15. In the LIME explanation, green super-pixels contribute most to the current predicted class and red super-pixels contribute most to the other class [93].
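Class weighting in the loss can be sketched as a weighted binary cross-entropy. The text does not give the weighting formula, so the common "balanced" rule w_c = N / (K · N_c) is assumed here; for 3000 normal and 400 COVID-19 images it gives weights of about 0.57 and 4.25. The optional `smoothing` argument anticipates the label smoothing later applied to COVID-CXNet and uses the binary form y(1 − s) + s/2, which is how Keras' `BinaryCrossentropy(label_smoothing=...)` smooths its targets. Both helper names are hypothetical.

```python
import numpy as np

def class_weights(counts):
    """'Balanced' weights n_total / (n_classes * n_c) for each class label."""
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def weighted_bce(y_true, p_pred, weights, smoothing=0.0, eps=1e-7):
    """Class-weighted binary cross-entropy with optional label smoothing,
    which pulls the targets towards 0.5: y * (1 - s) + 0.5 * s."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    w = np.where(y_true == 1, weights[1], weights[0])           # weight from the hard label
    y = y_true * (1 - smoothing) + 0.5 * smoothing
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.mean(w * loss))

w = class_weights({0: 3000, 1: 400})  # normal: ~0.567, COVID-19: 4.25
```

With these weights a misclassified COVID-19 image costs roughly 7.5 times as much as a misclassified normal image, which counteracts the 3000:400 imbalance; with smoothing, even a perfectly confident correct prediction keeps a small non-zero loss.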
In Grad-CAM visualization, region importance decreases from red to blue areas. While LIME looks inside the lung area to decide on the positive class, it also bases its decision on wrong zones in the left upper lung region. A pretrained DenseNet-121 model is fine-tuned for 20 epochs on the training set containing pediatric normal cases. The learning curve is plotted in Fig.16. The results are poor: the model is unable to learn when only the last FC layer is fine-tuned with the convolutional feature extractor frozen, and it overfits the data when all layers are retrained for several epochs. As expected, ImageNet categories consist of everyday objects whose features are quite dissimilar to pneumonia imaging patterns in CXRs. Hence, although transfer learning from ImageNet-pretrained models has remarkably improved segmentation accuracy thanks to its ability to handle complex conditions, applying it to classification is still challenging due to the limited size of annotated data and a high risk of overfitting [94]. ResNet-50 produced almost the same results. In Fig.7, the classes most correlated with pneumonia are infiltration, atelectasis, etc. We first probe CheXNet to see whether it can correctly distinguish COVID-19 pneumonia images from normal cases without any modification. Fig.17 shows results for two sample CXRs, one from each class. The extracted heatmaps reveal that CheXNet properly marks chest lobes to determine each class probability, and for most classes the output is slightly higher for the positive case. Its drawbacks include extremely high infiltration predictions for most dataset images, getting stuck in regions outside the lung boundaries (predominantly in the corners), and missing some opacities, particularly in the lower lobes.
To overcome these issues and to force the model's attention onto the correct regions of interest (ROI), we introduce COVID-CXNet. Our model is initialized with the pretrained CheXNet weights and retrained for 10 epochs. A dataset of 3628 images, 3200 normal CXRs and 428 COVID-19 CXRs, is divided into an 80% training set and a 20% test set. The batch size is set to 16, rather than 32 as in the previous models, due to memory constraints. Grad-CAMs of COVID-CXNet for random images are plotted in Fig.18. Table 3 is the confusion matrix of the proposed model. The proposed CheXNet-based model classifies images correctly, and in many cases it localizes pneumonia findings more precisely than CheXNet. An example is illustrated in Fig.19, which shows a CXR with an infiltrate in the upper lobe of the left hemithorax [95]; while CheXNet missed the region of pneumonia, the proposed model correctly uncovered the infiltration area. One concern about the COVID-CXNet results, visible in Fig.19, is that the model also points to irrelevant image parts, even outside the lungs. The same problem occurs when frequently appearing text and markers, such as dates, are present in the image. Fig.20 shows how text removal can improve model efficiency. While text removal methods could be used to avoid this overfitting, we can simply force the model to look only inside the lungs, addressing both problems at once. By segmenting the lungs from CXRs, our solution is protected against irrelevant features that can seriously affect decision-making. To accomplish this, the U-Net-based segmentation illustrated in Fig.9 is applied to the input images before image enhancement. Visualization results for COVID-CXNet with the ROI-segmentation block are shown in Fig.21.
Figure 21: Grad-CAM visualization of the proposed model, trained with lung-segmented CXRs, over sample cases. A figure with more Grad-CAMs of the model is attached in Appendix B.
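The 80/20 division can be sketched as a per-class (stratified) split. Whether the original split was stratified is not stated, so this is an assumption; `stratified_split` is a hypothetical helper, and in practice a library function such as scikit-learn's `train_test_split(..., stratify=labels)` would typically be used.

```python
import numpy as np

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same
    train/test ratio. Returns (train_indices, test_indices)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle within the class
        n_test = int(round(test_frac * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

labels = np.array([0] * 3200 + [1] * 428)  # 0 = normal, 1 = COVID-19
train, test = stratified_split(labels)
```

With 3200 normal and 428 COVID-19 labels this yields 2902 training and 726 test images, 86 of them COVID-19 positive, so the minority class keeps the same proportion (about 11.8%) in both subsets.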
Fig.21 shows that COVID-CXNet fed with ROI-segmented images delivers superior localization of pneumonia X-ray features. It is worth mentioning that image augmentation is expanded by adding zoom-in, zoom-out, and brightness adjustment. Besides, since mislabeling is probable in the dataset, label smoothing is also applied to the loss function. Label smoothing is a regularization method that prevents the model from predicting labels with extreme confidence by softening the one-hot encoded labels [96]. The proposed method shows a negligible drop in metric scores: accuracy decreases by 0.42% and the f-score by 0.02. We attribute this decrease to training with a larger dataset and accurately segmented ROIs, which makes the model more robust against unseen samples. There is a trade-off between capturing good features and obtaining higher metric scores; while better features result in a more generalizable model, high metric scores may indicate overfitting. Moreover, in some cases the designed model missed lung abnormalities. Hence, while it could be used as a promising decision support tool to help radiologists detect pneumonia, COVID-CXNet cannot be relied on solely for medical diagnosis. As an extra step, we extended COVID-CXNet to multiclass classification among normal, COVID-19 pneumonia (CP), and non-COVID pneumonia samples to examine its ability to differentiate between the two types of pneumonia. CP often appears with bilateral findings, whereas non-COVID pneumonia, or CAP as defined in section 5, mostly has unilateral consolidations. Since most normal-class and CAP-class images are collected from the NIH CXR-14 dataset, histogram matching is applied to all images to adjust their histograms to a base image. The output layer of COVID-CXNet is changed to three neurons with a SoftMax activation function.
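Histogram matching to a base image can be sketched with quantile mapping, the same idea behind scikit-image's `skimage.exposure.match_histograms`: each source grey level is mapped to the reference grey level that has the same cumulative frequency. Which base image was chosen in this work is not stated, and `match_histogram` is a hypothetical helper name.

```python
import numpy as np

def match_histogram(source, reference):
    """Map the grey levels of `source` so that their cumulative distribution
    follows that of `reference` (quantile-to-quantile mapping)."""
    src_vals, src_idx, src_counts = np.unique(source.ravel(), return_inverse=True,
                                              return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size      # quantile of each source level
    ref_cdf = np.cumsum(ref_counts) / reference.size   # quantile of each reference level
    matched = np.interp(src_cdf, ref_cdf, ref_vals)    # reference level at the same quantile
    return matched[src_idx].reshape(source.shape)
```

Because every image is mapped onto the same reference distribution, global brightness and contrast differences between data sources shrink, so the classifier is less able to separate classes simply by recognizing which dataset an image came from.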
Because two of the classes overlap considerably, fine-tuning runs for more epochs (30 vs 10). The confusion matrix of the multiclass model is shown in Table 4. The accuracy is 94.20%, with f-scores of 0.86 and 0.83 for the CP and CAP classes, respectively. In a number of cases, especially in the early stages of virus progression, CP shows unilateral findings, and CAP may also cause bilateral consolidations. Therefore, some cases with uncommon findings are expected to be misclassified between CP and CAP by the model. The confusion matrix also shows that a relatively high number of images are misclassified between CAP and normal. One potential reason is wrong labeling by the dataset provider. Another is that some CAP CXRs come from patients at an early stage of disease development. To confirm the model performance, Grad-CAMs for several images are plotted in Fig.22.

Throughout this study, several model architectures are introduced and applied to different numbers of images. Bias towards pediatric CXRs and the lung segmentation module are also addressed in different models. A comparison between these models is shown in Table 5. Accuracy ranges are obtained by running each model 10 times. As the dataset expands, not only do the confidence intervals shrink, but the metric scores also slightly decline while the localization of pneumonia symptoms improves, as can be seen in the Appendices. Furthermore, the proposed model is compared with the other studies discussed in section 2 with respect to several criteria, such as dataset size and f-score. The comparison is presented in Table 6.

Pediatric Bias: The shortage of COVID-19 CXRs could be alleviated by collecting a large number of images from other classes. However, the images of other classes must come from different sources to prevent overfitting to the images of a single dataset.
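The accuracy and per-class f-scores reported above follow directly from the confusion matrix. The sketch below computes accuracy and per-class precision, recall and F1, with rows as true classes and columns as predicted classes; the 3×3 matrix in the test is made up for illustration and is not Table 4. `per_class_metrics` is a hypothetical helper name.

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy plus per-class precision, recall and F1 from a confusion
    matrix whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how many samples truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1
```

For the illustrative matrix [[8, 2, 0], [1, 9, 0], [0, 0, 10]] the accuracy is 27/30 = 0.90 and the F1-score of the first class is 16/19 ≈ 0.84. The binary case used earlier is simply the 2×2 version, where the positive-class f1-score is the entry for the COVID-19 row and column.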
Moreover, many primary research papers have used databases built upon the dataset proposed by [77]. Children's chests have a different anatomy from those of adults. Hence, models developed on normal pediatric cases and adult pneumonia cases are highly vulnerable to the "right decision for the wrong reason" problem. Besides, previous studies have shown that using datasets containing images from different hospitals improves pneumonia detection results [98]. To prevent pediatric bias, we not only collected normal CXRs from different sources but also carefully filtered the images of [77] to exclude cases with smaller lungs. Furthermore, COVID-19 pneumonia CXR images were collected from 9 different sources to improve the model's cross-dataset robustness.

In the future, other information about patient status could be used alongside the X-ray scan. Clinical symptoms can greatly help radiologists in the differential diagnosis of COVID-19 in admitted patients. Since the diagnosis is not made from CXR alone, symptoms could be attached to the images as metadata. The metadata could be concatenated with the input CXR and fed into the model to increase its decision certainty. Given such metadata, the model could also make more detailed predictions, e.g., the patient's chance of survival, based on clinical symptoms and the severity of the pneumonia features present in the CXR.

Metrics: In section 4.1 we introduced a simple base model consisting of multiple 2D convolution blocks, and in section 5.1 we reported its metric scores and Grad-CAM visualization results for different volumes of input data. The purpose of this model was to show how a simple model can achieve very high accuracy scores, perfect learning curves, and high AUCs. Examining the model's explainability revealed that wrong features were responsible for its excellent metric scores.
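The metadata fusion proposed above is future work, and no architecture is specified for it. One simple option is late fusion: concatenating the pooled CNN feature vector with an encoded symptom vector before the output layer. The symptom vocabulary, the age scaling and all helper names below are hypothetical, chosen only to illustrate the data flow.

```python
import numpy as np

SYMPTOMS = ("fever", "cough", "dyspnea", "myalgia")  # hypothetical vocabulary

def encode_metadata(age, symptoms):
    """Encode age (scaled by 1/100) and binary symptom flags as one vector."""
    return np.array([age / 100.0] + [float(s in symptoms) for s in SYMPTOMS])

def fuse_features(image_features, metadata, w, b):
    """Late fusion: concatenate CNN image features with metadata vectors and
    apply a single sigmoid output unit with weights `w` and bias `b`."""
    z = np.concatenate([image_features, metadata], axis=-1)
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

meta = encode_metadata(65, {"dyspnea", "myalgia"})  # -> [0.65, 0, 0, 1, 1]
```

Fusing after the convolutional backbone leaves the image pathway and its Grad-CAM maps unchanged, while the output layer can learn how much the clinical context should shift the prediction.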
Therefore, high accuracy scores obtained by sophisticated models learning from small CXR datasets, which have high texture complexity, can be misleading. Model performance judged by confusion matrices and accuracy scores usually cannot be validated without demonstrating appropriate localization of imaging features.

Transfer Learning: Using pretrained models with ImageNet weights, some studies such as [35] showed acceptable heatmaps, but only for a few images. While these pretrained models may help, our small dataset size led us to fine-tune models previously trained on similar data. The promising results of the CheXNet-based COVID-CXNet indicate better performance than ImageNet-pretrained models, without problems such as overparameterization. Besides, lung segmentation was performed by a U-Net-based architecture previously trained on similar AP/PA-view CXRs. Among the reviewed studies, only one used CheXNet as its backbone [37], and it applied the model to fewer images and without lung segmentation in its preprocessing.

CheXNet: CheXNet is trained on a very large CXR dataset and has been used for transfer learning in other thoracic disease identification studies [99, 100]. However, it has its own deficiencies, such as individual sample variability caused by changes in data ordering [101] and vulnerability to adversarial attacks [102]. Enhancing thoracic abnormality detection in CXRs using CheXNet requires the development of ensemble models, which are currently prone to overfitting given the small number of images from COVID-19 positive patients. In future studies, ensemble learning can be considered as an efficient way to benefit from multiple models for classification.
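As a simpler stand-in for the feature-concatenation ensembles discussed here, soft voting averages the class probabilities of several models, optionally with per-model weights (the weighted-averaging strategy mentioned for [35]). This is an illustrative sketch, not a method evaluated in this work; `soft_vote` is a hypothetical helper name.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Weighted average of per-model class probabilities.

    prob_list: list of (n_samples, n_classes) arrays, one per model.
    Returns (averaged probabilities, predicted class labels)."""
    probs = np.stack(prob_list)                         # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float) / np.sum(weights)  # normalize to sum to 1
    avg = np.tensordot(weights, probs, axes=1)          # (n_samples, n_classes)
    return avg, avg.argmax(axis=-1)
```

Averaging probabilities tends to reduce the variance of individual models, which is the motivation given above for ensembles once larger COVID-19 datasets are available.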
One of the most popular and efficient ways to develop CNN-based networks is to concatenate features extracted by different networks. A set of voting layers is then applied to find the best configuration of the features extracted by the backbone networks and reach high accuracy scores. Once larger datasets of COVID-19-positive CXRs are available, ensemble models could be implemented to improve feature extraction and reduce model variability.

In this paper, we first collected a dataset of CXR images from normal lungs and COVID-19-infected patients. The constructed dataset combines images from different datasets originating from multiple hospitals and radiologists. To the best of our knowledge, it is the largest public dataset of its kind. Next, we designed and trained an individual CNN and also investigated the results of ImageNet-pretrained models. Then, a DenseNet-based model was designed and fine-tuned with weights initialized from the CheXNet model. We compared model visualizations over a batch of samples, as well as accuracy scores. This comparison highlighted the significance of Grad-CAM heatmaps and why they should be prioritized as the primary model validation criterion. Finally, we discussed several points, such as data shortage and the importance of transfer learning for tackling similar tasks. Final CP-class F-scores of 0.94 for binary classification and 0.85 for three-class classification were achieved. The proposed model development procedure is visualization-oriented, because visualization is the best method to confirm the model's generalization as a medical decision support system.
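The per-class F-scores quoted above are derived from confusion matrices. For class k, F1 = 2TP / (2TP + FP + FN), which is the harmonic mean of precision and recall. The sketch below shows that computation, assuming rows index the true class and columns the predicted class. The counts in the example are illustrative and are not the study's results. The function names are hypothetical.

```python
from typing import List, Sequence


def per_class_f1(cm: Sequence[Sequence[int]]) -> List[float]:
    """Per-class F1 from a square confusion matrix.

    Assumed convention: cm[i][j] counts samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    if any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be square")
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0.0 when class k never occurs at all
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Illustrative counts only (not this study's results).
binary_cm = [[45, 5],   # true class 0
             [5, 45]]   # true class 1
three_class_cm = [[8, 1, 1],
                  [0, 9, 1],
                  [2, 0, 8]]
print(per_class_f1(binary_cm))       # -> [0.9, 0.9]
print(per_class_f1(three_class_cm))  # -> [0.8, 0.9, 0.8]
```

Reporting the F-score of the class of interest, rather than overall accuracy, keeps a large, easy class from masking poor performance on the positive cases.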
[1] Coronavirus disease 2019 (covid-19): situation report, 51
[2] Clinical characteristics of coronavirus disease 2019 in china
[3] The neuroinvasive potential of sars-cov2 may be at least partially responsible for the respiratory failure of covid-19 patients
[4] Essentials for radiologists on covid-19: an update-radiology scientific expert panel
[5] Reverse transcription-polymerase chain reaction
[6] Performance of radiologists in differentiating covid-19 from viral pneumonia on chest ct
[7] There is only one way to know if you have the coronavirus, and it involves machines full of spit and mucus
[8] A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-ncov) infected pneumonia (standard version)
[9] Correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in china: a report of 1014 cases
[10] The role of chest imaging in patient management during the covid-19 pandemic: a multinational consensus statement from the fleischner society
[11] False negative chest x-rays in patients affected by covid-19 pneumonia and corresponding chest ct findings
[12] Radiation dose in computed tomography
[13] Overview of deep learning in medical imaging
[14] Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks
[15] Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: Covid-19 case study
[16] Covid-19 diagnosis prediction by symptoms of tested individuals: a machine learning approach
[17] Detection of covid-19 infection from routine blood exams with machine learning: a feasibility study
[18] Coronavirus (covid-19) classification using ct images by machine learning methods
[19] Automatic x-ray covid-19 lung image classification system based on multi-level thresholding and support vector machine
[20] Social-group-optimization assisted kapur's entropy and morphological segmentation for automated detection of covid-19 infection from computed tomography images
[21] Ai based chest x-ray (cxr) scan texture analysis algorithm for digital test of covid-19 patients
[22] Ikonos: An intelligent tool to support diagnosis of covid-19 by texture analysis of x-ray images
[23] Artificial intelligence distinguishes covid-19 from community acquired pneumonia on chest ct
[24] Lung infection quantification of covid-19 in ct images with deep learning
[25] Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis
[26] A web-based diagnostic tool for covid-19 using machine learning on chest radiographs (cxr)
[27] Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
[28] A new modified deep convolutional neural network for detecting covid-19 from x-ray images
[29] Convolutional sparse support estimator based covid-19 recognition from x-ray images
[30] Deep-covid: Predicting covid-19 from chest x-ray images using deep transfer learning
[31] Prognet: Covid-19 prognosis using recurrent and convolutional neural networks, medRxiv
[32] Robust screening of covid-19 from chest x-ray via discriminative cost-sensitive learning
[33] Covid-19 screening on chest x-ray images using deep learning based anomaly detection
[34] Covid-mobilexpert: On-device covid-19 screening using snapshots of chest x-ray
[35] Iteratively pruned deep learning ensembles for covid-19 detection in chest x-rays
[36] Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images
[37] Covidaid: Covid-19 detection using chest x-ray
[38] Overview of current lung imaging in acute respiratory distress syndrome
[39] The forgotten view: Chest x-ray-lateral view
[40] Diagnostic Imaging: Emergency, ser. Diagnostic Imaging. Amirsys
[41] Radiopaedia open-edit radiology resource
[42] Sirm covid-19 database
[43] Available: https://eurorad.org
[44] Image-based social forum covid-19 clinical cases
[45] Covid-19 image data collection
[46] Twitter covid-19 cxr dataset
[47] Portable chest x-ray in coronavirus disease-19 (covid-19): A pictorial review
[48] Severe acute respiratory disease in a huanan seafood market worker: Images of an early casualty
[49] A case of covid-19 and pneumonia returning from macau in taiwan: Clinical course and anti-sars-cov-2 igg dynamic
[50] Atypical presentation of covid-19 in a frail older person
[51] Contribution of interventional radiology to the management of covid-19 patient
[52] Covid-19 tsunami: the first case of a spinal cord injury patient in italy
[53] Chest x-rays findings in covid 19 patients at a university teaching hospital-a descriptive study
[54] Early experience of covid-19 in two heart transplant recipients: Case reports and review of treatment options
[55] Complicated myocardial infarction in a 99-year-old lady in the era of covid-19 pandemic: from the need to rule out coronavirus infection to emergency percutaneous coronary angioplasty
[56] Novel coronavirus (covid-19) infection: What a doctor on the frontline needs to know
[57] Radiological findings in patients with covid-19
[58] Diagnostic performance of chest x-ray for covid-19 pneumonia during the sars-cov-2 pandemic in lombardy, italy
[59] The lungs before and after covid-19 pneumonia
[60] Chest x-ray of covid-19 infection in nakornping hospital
[61] Temporary conciliation methodical recommendations of the russian society of radiologists and radiologists (popp) and the russian association of ultrasound diagnostics in medicine (rasudm) "methods of radiation diagnosis of pneumonia in the new coronavirus infection covid-19"
[62] Acute myocardial infarction due to coronary stent thrombosis in a symptomatic covid-19 patient
[63] Progression of cxr features on a covid-19 survivor
[64] Fatal sars-cov-2 coinfection in course of ebv-associated lymphoproliferative disease
[65] Subacute perimyocarditis in a young patient with covid-19 infection
[66] Covid-19 in critically ill patients in the seattle region-case series
[67] Clinical features, laboratory characteristics, and outcomes of patients hospitalized with coronavirus disease 2019 (covid-19): Early report from the united states
[68] Covid-19 pneumonia: phenotype assessment requires bedside tools
[69] Radiographic findings in 240 patients with covid-19 pneumonia: time-dependence after the onset of symptoms
[70] A coronavirus disease 2019 (covid-19) patient with multifocal pneumonia treated with hydroxychloroquine
[71] Coronavirus disease 2019 (covid-19) complicated by acute respiratory distress syndrome: An internist's perspective
[72] Chest x-ray findings and visual quantitative assessment of covid-19 pneumonia
[73] Clinico-radiological evaluation of covid-19 pneumonia and its correlation with usg chest: Single centre study at sms hospital, jaipur
[74] Covid-19 image repository
[75] The radiologist page on instagram
[76] The radiology case reports on instagram
[77] Identifying medical diagnoses and treatable diseases by image-based deep learning
[78] Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
[79] Two public chest x-ray datasets for computer-aided screening of pulmonary diseases
[80] Adaptive histogram equalization and its variations
[81] Image enhancement using bi-histogram equalization with adaptive sigmoid functions
[82] Beasf-based image enhancement for caries detection using multidimensional projection and neural network
[83] Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
[84] Learning to diagnose from scratch by exploiting dependencies among labels
[85] Jointly learning convolutional representations to compress radiological images and classify thoracic diseases in the compressed domain
[86] U-net: Convolutional networks for biomedical image segmentation
[87] Chest x-ray analysis of tuberculosis by deep learning with segmentation and augmentation
[88] Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration
[89] Naming the coronavirus disease (covid-19) and the virus that causes it
[90] Community-acquired pneumonia
[91] Cyclical learning rates for training neural networks
[92] Grad-cam: Visual explanations from deep networks via gradient-based localization
[93] "Why should I trust you?" Explaining the predictions of any classifier
[94] Transfer learning in medical image classification: Challenges and opportunities
[95] Importation and human-to-human transmission of a novel coronavirus in vietnam
[96] Rethinking the inception architecture for computer vision
[97] Detection of coronavirus (covid-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest x-ray dataset
[98] Pneumonia detection using deep learning approaches
[99] Classification of abnormality in chest x-ray images by transfer learning of chexnet
[100] Tchexnet: Detecting pneumothorax on chest x-ray images using deep transfer learning
[101] Individual predictions matter: Assessing the effect of data ordering in training fine-tuned cnns for medical imaging
[102] Adversarial attacks against medical deep learning systems