key: cord-0638578-oyzuuad2 authors: Sayyed, A.Q.M. Sazzad; Saha, Dipayan; Hossain, Abdul Rakib title: CovMUNET: A Multiple Loss Approach towards Detection of COVID-19 from Chest X-ray date: 2020-07-28 journal: nan DOI: nan sha: 9cfe8e1b8709affe8eddd5f41d2a6ad765f8a55a doc_id: 638578 cord_uid: oyzuuad2 The recent outbreak of COVID-19 has halted the whole world, bringing a devastating effect on public health, global economy, and educational systems. As the vaccine of the virus is still not available, the most effective way to combat the virus is testing and social distancing. Among all other detection techniques, the Chest X-ray (CXR) based method can be a good solution for its simplicity, rapidity, cost, efficiency, and accessibility. In this paper, we propose CovMUNET, which is a multiple loss deep neural network approach to detect COVID-19 cases from CXR images. Extensive experiments are performed to ensure the robustness of the proposed algorithm and the performance is evaluated in terms of precision, recall, accuracy, and F1-score. The proposed method outperforms the state-of-the-art approaches with an accuracy of 96.97% for 3-class classification (COVID-19 vs normal vs pneumonia) and 99.41% for 2-class classification (COVID vs non-COVID). The proposed neural architecture also successfully detects the abnormality in CXR images. COVID-19 (Coronavirus Disease 2019), an infectious disease caused by a new strain of the coronavirus family reported first in Wuhan city of Hubei province of China on December 31, 2019, is the biggest challenge the world has faced in the 21st century [1] . After only three months of the first reported case, WHO declared COVID-19 as a global pandemic on March 11, 2020 [2] . Like other diseases caused by the coronavirus family such as SARS and MERS, COVID-19 affects the respiratory system of the human body. Infected patients get symptoms like fever, sore throat, coughing, loss of smell and taste, tiredness, etc. 
In severe cases, patients experience breathing difficulties, chest pain, and kidney failure, which can result in death [3]. As of July 13, 2020, 13,876,441 COVID-19 cases had been officially reported throughout the world, with 593,087 deaths [4]. The rate of new infections has not decreased even 6 months into the pandemic. As no vaccine for the disease is available yet, large-scale rapid testing and isolation of infected people from the uninfected are the major measures recommended by experts. Therefore, detecting infected people has become the priority in combating the COVID-19 pandemic. Currently, most countries use real-time reverse transcription polymerase chain reaction (RT-PCR) as the testing technique for COVID-19 diagnosis, which is regarded as the gold standard. However, RT-PCR has shortcomings, such as a potentially high false-negative rate, which is alarming in the fight against COVID-19 [5]. Moreover, the procedure requires specialized laboratory facilities in compliance with biosafety level 2 or above and highly skilled medical personnel to handle the samples carefully, as results vary with the quality of samples [6]. In addition, the whole process is time-consuming and not easily available in low- and middle-income countries. Some rapid test kits have also been proposed by researchers, but they were not recommended by experts due to low sensitivity and accuracy [7]. Among other detection techniques, Chest X-ray (CXR) and Computed Tomography (CT) are showing good prospects in recent research [8, 9]. Both CT scans and CXR show identifiable abnormalities in COVID-19 cases. As biomarkers for COVID-19 in CT scans, Bernheim et al. [10] reported bilateral and peripheral ground-glass opacities (GGO) and consolidative pulmonary opacities, at times along with a rounded morphology and a peripheral lung distribution. 
GGO and consolidation with or without vascular enlargement, air bronchogram sign, and interlobular septal thickening were found to be common observations in COVID-19 CT scans by Li et al. [11]. As additional patterns observed in some CT scans, Ye et al. [12] included the reticular pattern and the crazy paving pattern. That is why CT scans were widely used in China and Turkey when there was a shortage of test kits. However, CT-based detection also comes with some disadvantages. High cost, high exposure to radiation, high health risk for health technicians, and unavailability in remote areas are some notable drawbacks of CT scan-based detection methods. [1] Studies have shown that Chest X-rays show signs of atypical pneumonia characteristic of COVID-19 infection [1]. Figure 1 exhibits how such patterns appear in the CXR of a COVID-19 infected patient. CXR-based detection has advantages such as being faster and cheaper and exposing patients to less radiation. However, it has been reported that the sensitivity of the CXR-based method is lower than that of the CT-based method. Improving the sensitivity of the CXR-based method through a robust algorithm can make it a viable, more acceptable, and more accessible COVID-19 detection method than the CT scan-based technique. Deep learning, a subdomain of machine learning, has been successfully used in biomedical image processing. Different deep learning architectures have already been used in brain disease classification [13], lung cancer diagnosis [14, 15], arrhythmia detection [16, 17], skin cancer classification [18], pneumonia detection [19], breast cancer classification [20], gastrointestinal disease detection [21], Parkinson's disease detection [22], and the detection of many other diseases. Such success of deep learning algorithms in disease detection suggests that they have the potential to identify biomarkers in medical images. 
As COVID-19 manifests itself in Chest X-rays through patterns that work as a unique biomarker, integrating deep learning into computer-aided detection of COVID-19 from CXR images can provide a good performance boost. U-Net, a deep learning architecture proposed by Ronneberger et al. [23], has been widely used in biomedical image segmentation tasks such as liver and tumor segmentation [24, 25], spine disease diagnosis [26], retinal vessel segmentation [27], and skin lesion segmentation [28]. Ronneberger et al. [23] suggested that U-Net outperforms the state of the art in image segmentation when only a small number of images are available in the dataset. Since COVID-19 is a recent phenomenon, only small datasets of Chest X-ray images are publicly available, so U-Net can be a suitable architecture to extract features from images in an end-to-end manner. In this work, a modified U-Net based architecture with multiple loss optimization is proposed to detect COVID-19 from Chest X-ray images. The specific contributions of this work are three-fold:
• Development of a multiple loss based modified U-Net architecture to detect COVID-19 cases from Chest X-ray images.
• Extensive experimental analysis through the implementation of patient-wise data separation to classify Chest X-ray images into three classes (COVID-19 vs normal vs pneumonia cases) and two classes (COVID vs non-COVID cases).
• Abnormality detection in Chest X-rays through two-class (normal and abnormal cases) classification.
The remainder of the manuscript is organized as follows. Section 2 reviews the literature. Section 3 presents the proposed method in detail. Section 4 provides the details of the dataset, experimental setup, evaluation criteria, and results with their interpretation. Finally, Section 5 concludes the article. 
Although in recent times researchers have been working relentlessly to find novel methods to detect COVID-19 cases using radiological images such as CXR and CT, the idea of using image processing based abnormality detection techniques is not neoteric. We investigated a plethora of published articles exploring such ideas. In this section, the existing literature on the detection of COVID-19 from CXR images is described first in Section 2.1. Later, Section 2.2 explores the studies on lung abnormality detection from CXR images. Due to the lack of information about the novel Coronavirus, effective treatment is still elusive to the experts. Prevention of the spread of the virus through social distancing, fast detection, and quarantining has been the only way to fight against the virus until now. As a result, researchers are exploring the field of rapid and safe detection of infection from the virus where chest X-ray has proved to be a promising option. A good number of research works have been published utilizing this modality of detection. Vaid et al. [29] proposed a Convolutional Neural Network (CNN) based model with transfer learning approach to classify COVID-19 and normal cases from CXR images. A trainable multilayer perceptron was stacked on top of a modified version of the VGG-19 model initially trained on the ImageNet dataset. They experimented on 181 COVID-19 cases and 364 normal cases coming from datasets developed by JP Cohen [30] and Wang et al. [31] , respectively. This model obtained 96.3% two-class classification accuracy. Das et al. [32] proposed Truncated InceptionNet which is the modified version of InceptionNetV3. They adopted a transfer learning approach by initializing the network with weights pretrained on ImageNet. The model contains 2.1 million parameters in total. Dataset of JP Cohen was used along with the 5863 CXRs (1583 healthy and 4280 viral and bacterial pneumonia cases) from the CXR collection [33] . 
In the 2-class classification problem (COVID vs non-COVID cases), the average accuracy and F1 score achieved by the model are 98.77% and 97%, respectively. Apostolopoulos et al. [34] compared the performance of five different CNN models (VGG19, MobileNetV2, InceptionNet, XceptionNet, and Inception-ResNet-V2) using a transfer learning approach. It was reported that VGG-19 outperformed the other models. In total, 224 images with confirmed COVID-19, 700 images of bacterial pneumonia, and 504 images of the normal condition were used. VGG-19 obtained 98.75% and 93.48% accuracy in 2-class (COVID vs non-COVID cases) and 3-class (COVID vs pneumonia vs normal) classifications, respectively. Oh et al. [35] proposed a different, patch-based detection method using FC-DenseNet103 and ResNet18. As COVID-19 makes substantial changes in the lung and heart portion of the CXR, FC-DenseNet103 was used to extract that portion. ResNet18, pretrained on ImageNet, was used as the classifier network and was fed with patched images provided by the earlier segmentation network. L1 regularization and weight decay were applied to counter overfitting. The proposed method provided 88.9% accuracy and 83.4% precision for 4-class classification (normal, bacteria, tuberculosis, virus/COVID-19). Mahmud et al. [36] proposed CovXNet, which is built using depthwise convolutions with varying dilation rates. A residual unit and a shifter unit were proposed as the building blocks of CovXNet. The architecture utilized images of different resolutions to train separate models, the predictions of which were later optimized using a stacking algorithm with a meta-learner. A collection of 5856 images in total, consisting of 1583 normal X-rays, 1493 non-COVID viral pneumonia cases, and 2780 bacterial pneumonia cases from Guangzhou Medical Center, China, was used [33]. Another database from Sylhet Medical College was used, which contained 305 X-rays of different COVID-19 patients. 
Finally, a small database of 305 images from each class was created, which was used for the reporting of metrics. The rest of the images were used for the pretraining phase. The method showed an accuracy of 97.4% and 89.6% in 2-class (COVID vs normal cases) and 3-class (COVID vs viral vs bacterial pneumonia cases) classifications, respectively. Khan et al. [37] proposed CoroNet, which used the Xception CNN as the base model. CoroNet used weights pretrained on the ImageNet dataset and had 33969964 parameters in total, among which 33915436 were trainable and the rest were non-trainable. They used undersampling to approximately balance the classes. The model showed 89.6%, 95%, and 99% accuracy in 4-class (COVID vs normal vs viral vs bacterial pneumonia cases), 3-class (COVID vs normal vs pneumonia) and 2-class (COVID vs normal cases) classifications, respectively. Ozturk et al. [38] proposed DarkCOVIDNet, which was inspired by the DarkNet-19 model. The proposed model contained 1164434 parameters. They combined 127 COVID-positive cases obtained from the dataset of JP Cohen [30] with 500 no-finding and 500 pneumonia cases randomly chosen from the database of Wang et al. [31]. The model achieved an accuracy of 98.08% for the binary classification task (COVID vs no-findings cases) and 87.02% for 3-class classification (COVID vs no-findings vs pneumonia cases). Wang et al. [39] proposed a deep CNN-based model named COVID-NET, which was pretrained on ImageNet before training on the Chest X-ray dataset. The data was augmented before training. The dataset is composed of 13,975 CXR images collected from 13870 patients. Their experiment involved 358, 8066, and 5538 CXR images belonging to COVID-19 patients, normal cases, and non-COVID pneumonia patients, respectively. For 3-class classification, COVID-NET showed an accuracy of 93.3%. 
Furthermore, the positive predictive value (PPV) was higher for COVID-19 cases (98.9%), which indicated that there would be very few false-positive predictions. Hemdan et al. [40] proposed the COVIDX-NET framework, comparing the performance of the VGG19, DenseNet201, InceptionV3, ResNetV2, InceptionResNetV2, Xception, and MobileNetV2 deep learning architectures. Using a deep learning classifier, images were categorized into two cases: normal and COVID-19. The dataset developed by JP Cohen was used. It consisted of 50 X-ray images divided into 2 classes: 25 normal X-rays and 25 COVID-19 positive X-rays. Among all the architectures, VGG19 and DenseNet201 gave the highest accuracy of 90%. Hall et al. [41] used VGG16 and ResNet50 with weights pretrained on the ImageNet dataset to predict COVID-19. The dataset used in the paper consisted of CXR images from 135 COVID-19 cases and 320 pneumonia cases. A balanced training dataset was formed for experimentation by selecting 102 COVID-19 cases and 102 randomly selected pneumonia cases. 10-fold cross-validation was performed, which gave an overall accuracy of 89.2% and an AUC of 0.95. Afshar et al. [42] proposed the lightweight COVID-CAPS, which contained 3 capsule layers and 4 convolutional layers. The authors adopted end-to-end training without data augmentation. The model had 295,488 trainable parameters. It achieved a 2-class classification accuracy of 95.7% without pretrained weights and 98.3% with pretrained weights on a dataset developed by JP Cohen [30] and P. Mooney [43]. Minaee et al. [44] trained four deep convolutional networks: ResNet18, ResNet50, SqueezeNet, and DenseNet-161, all pretrained on ImageNet. A dataset of 5071 chest X-ray images in total was used for training and validation (2031 for training and 3040 for testing), selectively combining the dataset of JP Cohen and the ChexPert [45] dataset. COVID-19 infected X-rays were augmented to increase the number of images. 
The best performance was shown by SqueezeNet, with sensitivity 97.5% ± 4.8% and specificity 97.8% ± 0.5% for 2-class classification (COVID vs non-COVID cases). Yao et al. [46] proposed GeminiNet, which was based on a region-based fully convolutional network (R-FCN). In GeminiNet, feature extraction was first done by the DetNet, RFBNet, and RPN networks. Secondly, PSRoI pooling and RoI Align were used to map the extracted features according to the size of the feature graph. A softmax layer was added at the end as the activation function. The dataset from RSNA was used here, which contained 15000 confirmed cases of pneumonia, 7500 pathologies different from pneumonia, and 7500 normal cases. Data augmentation was done to increase the amount of data. The model provided an average precision (AP) of 68.32% at an IoU threshold of 0.4 during segmentation of the abnormal portion of the lungs. Chouhan et al. [47] proposed a novel ensemble approach with transfer learning using five different neural networks: AlexNet, DenseNet121, InceptionV3, ResNet18 and GoogLeNet. Image preprocessing and augmentation were done before training the neural networks. The networks were pretrained on ImageNet. The dataset contained a total of 5232 images, among which 1346 were from normal cases and the rest were from bacterial and viral pneumonia patients. A total of 1248 images were used for testing while the others were used for training. The ensemble approach achieved 96.39% accuracy and 93.28% precision in 2-class classification (normal vs pneumonia). Hashmi et al. [48] proposed a weighted classifier-based method. The data were preprocessed and augmented before entering the fine-tuning block, which contained ResNet18, DenseNet121, InceptionV3, Xception, and MobileNetV2. After that, weights were applied to every model and optimized for 1000 iterations to minimize the classification error. 5836 images were used, among which 5136 and 700 images were used for training and testing, respectively. 
The weighted classifier showed 98.43% accuracy and 98.26% precision in 2-class classification (normal vs pneumonia). The CheXNet model, a 121-layer dense convolutional network, was proposed by Rajpurkar et al. [49]. This model was pretrained on the ImageNet dataset before training on the ChestX-ray14 dataset proposed by Wang et al. [31]. The model detected all 14 diseases available in the dataset with an F1-score of 43.5%, which was higher than that of the average radiologist. In this study, a multiple loss based deep neural network named CovMUNET is introduced as the proposed model. In this section, CovMUNET is first described in detail in Section 3.1. Later, Section 3.2 explains the loss function. Afterwards, the optimization technique and parameter settings are described in Section 3.4 and Section 3.5, respectively. The proposed CovMUNET architecture has two data branches. The long branch, named the 'Reconstruction Branch', is inspired by the U-Net architecture and attempts to reconstruct the input image through an encoder and a decoder. The short branch, named the 'Classification Branch', is an encoder with a classifier stacked on top of it. The two branches calculate two different losses and the model optimizes the combined loss. The idea behind such a design is to learn better feature maps by reconstructing the input image and, with the help of such learning, to classify the CXR images more precisely. The motivation is to use the autoencoder network as an auxiliary to the classification branch to help it learn better features faster. By optimizing the two losses together, the classification loss keeps the learned features useful for classification while the autoencoder loss helps the model learn faster. In total, there are 12 stages in the CovMUNET architecture. The 'Reconstruction Branch' is defined by the first 10 stages of the proposed network, where the initial four stages of this branch are identical. 
Each of these four stages starts with a block operation that consists of a depthwise separable convolution followed by a ReLU activation function and batch normalization. Later, a max-pooling operation is performed to reduce the dimensions of the feature maps by selecting dominant features, which helps reduce computational cost. As the name implies, the kernel used for the depthwise separable convolution can be separated into two different kernels: one is used for depthwise convolution and the other for pointwise convolution. The major advantage that motivates the use of such a convolution instead of the traditional one is its ability to run the network faster with less computational cost and complexity.

Let $f^{(1)} : X^{(1)} \to U^{(1)}$ be the block operation at the 1st stage, where $X^{(1)} \in \mathbb{R}^{d_1^{(1)} \times d_2^{(1)} \times d_3^{(1)}}$. Assume that $X_i^{(1)}$ represents the $i$th feature map of the input to the network at the 1st stage, where $i \in \{1, \dots, d_3^{(1)}\}$. The output of the depthwise convolution, $D_i^{(1)}$, is given by

$$D_i^{(1)} = X_i^{(1)} * W_i^{(1)} + b_i^{(1)} \tag{1}$$

where $*$, $W_i^{(1)}$ and $b_i^{(1)}$ refer to the convolution operation, weight matrix and bias terms, respectively. The subscript and superscript denote the feature map number and stage number, respectively. In pointwise convolution, $k_3^{(1)}$ kernels of size 1×1 iterate through every single point of $D^{(1)}$ with stride $g_1^{(1)} = 1$. The $(p, q)$ point of the $i$th feature map of the pointwise convolution, $P_i^{(1)}[p, q]$, is given by

$$P_i^{(1)}[p, q] = \sum_{c=1}^{d_3^{(1)}} V_{i,c}^{(1)} \, D_c^{(1)}[p, q] \tag{2}$$

where $V^{(1)}$ refers to the weight matrix for pointwise convolution. Later, the ReLU activation function adds nonlinearity to the model, and batch normalization speeds up the training process and helps to avoid overfitting. The output $U^{(1)}$ of the batch normalization layer is given by

$$U^{(1)} = \mathrm{BN}\left(\max\left(P^{(1)}, 0\right)\right) \tag{3}$$

where $\max(\cdot, 0)$ and $\mathrm{BN}(\cdot)$ represent the ReLU activation function and the batch normalization operation, respectively. Finally, in the 1st stage, the max-pooling layer reduces the dimensionality of the feature maps. 
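The block operation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero-padding choice, the odd kernel size, and the omission of batch normalization are assumptions made for brevity, and the "convolution" is the cross-correlation commonly used in deep learning libraries. The second helper compares weight counts (biases ignored) to show why the separable form is cheaper.

```python
import numpy as np


def depthwise_separable_block(x, dw_kernels, dw_bias, pw_weights):
    """Depthwise separable convolution followed by ReLU (batch norm omitted).

    x          : (H, W, C_in) input feature maps
    dw_kernels : (k, k, C_in) one k x k kernel per input channel, k odd
    dw_bias    : (C_in,) bias added after the depthwise step
    pw_weights : (C_in, C_out) weights of the 1 x 1 pointwise kernels
    """
    h, w, _ = x.shape
    k = dw_kernels.shape[0]
    pad = k // 2
    # Zero-padding keeps the spatial size unchanged for stride 1.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    depthwise = np.empty(x.shape, dtype=float)
    for p in range(h):
        for q in range(w):
            patch = xp[p:p + k, q:q + k, :]  # (k, k, C_in)
            # Each channel is filtered only by its own kernel.
            depthwise[p, q, :] = (patch * dw_kernels).sum(axis=(0, 1)) + dw_bias
    # Pointwise step: a 1 x 1 convolution mixes channels at every location.
    pointwise = depthwise @ pw_weights
    return np.maximum(pointwise, 0.0)  # ReLU


def parameter_counts(k, c_in, c_out):
    """Weights of a standard k x k convolution vs. its separable counterpart."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For example, with a 3×3 kernel, 64 input channels and 128 output channels (illustrative values, not taken from Table 1), `parameter_counts(3, 64, 128)` gives 73,728 weights for the standard convolution against 8,768 for the separable one.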
The output of the max-pooling layer, $M^{(1)}$, is given by

$$M_i^{(1)}[p, q] = \max_{a_1 \in \{0, \dots, m_1^{(1)} - 1\},\; b_1 \in \{0, \dots, m_2^{(1)} - 1\}} U_i^{(1)}[p + a_1, q + b_1] \tag{4}$$

where $U_i^{(1)}$ is the $i$th feature map of the block output at the 1st stage and $m_1^{(1)} \times m_2^{(1)}$ represents the kernel size of the max-pooling layer. The operations at the next three stages of the network are similar to the operation at the 1st stage in terms of functionality. In each of these stages, the output of the max-pooling operation of the previous stage is fed as the input to the block operation of the current stage. The outputs at the 2nd, 3rd, and 4th stages can be expressed by (5)-(6),

$$U^{(j)} = f^{(j)}\left(M^{(j-1)}\right) \tag{5}$$

$$M_i^{(j)}[p, q] = \max_{a_1 \in \{0, \dots, m_1^{(j)} - 1\},\; b_1 \in \{0, \dots, m_2^{(j)} - 1\}} U_i^{(j)}[p + a_1, q + b_1] \tag{6}$$

where $f^{(j)}$ is the block operation at stage $j$, $j$ indicates the stage number and $j \in \{2, 3, 4\}$. In the 5th stage, three operations are performed sequentially: block operation, transposed convolution, and concatenation. The output of the block operation in the 5th stage can be calculated by (7).

$$f^{(5)} : M^{(4)} \to U^{(5)} \tag{7}$$

To explain transposed convolution, let $\mathbf{i}$ and $\mathbf{o}$ be the flattened versions of the input matrix $I \in \mathbb{R}^{d_1 \times d_2}$ and the output matrix $O \in \mathbb{R}^{d_1' \times d_2'}$ of a normal convolution operation, where $d_1 > d_1'$ and $d_2 > d_2'$. Such a convolution operation can be written as

$$\mathbf{o} = W \mathbf{i} \tag{8}$$

where $W$ is the learnable weight matrix. Based on this matrix representation, the transposed convolution can be written as

$$\mathbf{i}' = W^{\top} \mathbf{o}' \tag{9}$$

where $\mathbf{i}' \in \mathbb{R}^{d_1 d_2 \times 1}$, $\mathbf{o}' \in \mathbb{R}^{d_1' d_2' \times 1}$ and $W^{\top}$ is the transpose of $W$. A transposed convolution operation with $k_3^{(5)}$ kernels of size $k_1^{(5)} \times k_2^{(5)}$ is performed on $U^{(5)}$, producing $T^{(5)}$. A concatenation layer then merges the feature maps of $T^{(5)}$ and $U^{(4)}$ along their depth. The outputs of these layers are given by (10)-(11),

$$T'^{(5)} = \left(W_T^{(5)}\right)^{\top} U'^{(5)} \tag{10}$$

$$C^{(5)} = T^{(5)} \cup U^{(4)} \tag{11}$$

where $\cup$ and $C^{(5)}$ denote the concatenation operation and the output of the concatenation layer at the 5th stage, respectively. $T'^{(5)}$ and $U'^{(5)}$ represent the flattened versions of $T^{(5)}$ and $U^{(5)}$, respectively. Similarly, in the next three stages, block operation, transposed convolution, and concatenation are performed sequentially. In the concatenation layer of stage $s$, the features of $T^{(s)}$ and $U^{(9-s)}$ are merged. 
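The matrix view of convolution and transposed convolution used above can be checked numerically. The sketch below is an illustration under simplifying assumptions (single channel, stride 1, no padding, cross-correlation rather than flipped-kernel convolution); the network itself uses stride-2 transposed convolutions, and the function names are hypothetical.

```python
import numpy as np


def conv_matrix(kernel, in_shape):
    """Build W such that W @ I.ravel() equals the 'valid' convolution of I."""
    kh, kw = kernel.shape
    h, w = in_shape
    oh, ow = h - kh + 1, w - kw + 1
    W = np.zeros((oh * ow, h * w))
    for p in range(oh):
        for q in range(ow):
            for a in range(kh):
                for b in range(kw):
                    # Output pixel (p, q) reads input pixel (p + a, q + b).
                    W[p * ow + q, (p + a) * w + (q + b)] = kernel[a, b]
    return W


def conv2d(image, kernel):
    """Normal convolution written as o = W i."""
    W = conv_matrix(kernel, image.shape)
    oh = image.shape[0] - kernel.shape[0] + 1
    ow = image.shape[1] - kernel.shape[1] + 1
    return (W @ image.ravel()).reshape(oh, ow)


def transposed_conv2d(small, kernel, out_shape):
    """Transposed convolution written as i' = W^T o'; it restores the larger shape."""
    W = conv_matrix(kernel, out_shape)
    return (W.T @ small.ravel()).reshape(out_shape)
```

Applying `conv2d` to a 4×4 input with a 3×3 kernel yields a 2×2 map, and `transposed_conv2d` maps a 2×2 input back to 4×4. Note that the transposed operation restores the shape, not the original values.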
The outputs of the 6th, 7th and 8th stages can be expressed by

$$U^{(s)} = f^{(s)}\left(C^{(s-1)}\right) \tag{12}$$

$$T'^{(s)} = \left(W_T^{(s)}\right)^{\top} U'^{(s)} \tag{13}$$

$$C^{(s)} = T^{(s)} \cup U^{(9-s)} \tag{14}$$

where $s$ indicates the stage number and $s \in \{6, 7, 8\}$; $U^{(s)}$, $T^{(s)}$ and $C^{(s)}$ denote the outputs of the block operation, the transposed convolution and the concatenation layer at stage $s$, and primes denote flattened versions. In the 9th stage, there is only a block operation, which converts $C^{(8)}$ to $U^{(9)}$. Later, in the 10th stage, a traditional convolution with a single kernel of size 1 × 1 is performed. Afterwards, a sigmoid activation function outputs a feature map $Y^{(10)}$. These operations are given by

$$f^{(9)} : C^{(8)} \to U^{(9)} \tag{15}$$

$$Y^{(10)} = \sigma\left(U^{(9)} * W^{(10)} + b^{(10)}\right) \tag{16}$$

where $\sigma(\cdot)$ indicates the sigmoid activation function. $Y^{(10)}$, the output of the reconstruction branch, is the reconstructed CXR image of the proposed network architecture.

The 'Classification Branch' starts from the 11th stage. At this stage, a global average pooling layer averages the features of each feature map of $U^{(5)}$. A fully connected layer with $n_1^{(11)}$ neurons and ReLU activation is connected next to the global average pooling layer. The outputs of these layers are defined by

$$G_j^{(11)} = \frac{1}{\left|U_j^{(5)}\right|} \sum_{p, q} U_j^{(5)}[p, q] \tag{17}$$

$$F^{(11)} = \max\left(W^{(11)} G'^{(11)} + b^{(11)}, 0\right) \tag{18}$$

where $\left|U_j^{(5)}\right|$ indicates the area of the $j$th feature map of $U^{(5)}$ and $G'^{(11)}$ is the flattened version of $G^{(11)}$. In the final stage, a fully connected layer with $N$ neurons and a softmax activation function produces the class prediction probability $\hat{y}$, given by

$$\hat{y}_c = \frac{e^{z_c^{(12)}}}{\sum_{n=1}^{N} e^{z_n^{(12)}}}$$

where $N$ is the total number of classes and $z_c^{(12)}$ is the value of the neuron representing class $c$. Figure 2 provides a stick diagram of the proposed network architecture, where the block operation, max-pooling layer, transposed convolution, concatenation, convolution, global average pooling layer, fully connected layer and softmax operation are represented by a rectangle, converging trapezoid, shaded rectangle, emerald shape, straight baguette shape, diamond-enclosed rectangle, striped rectangle, and solid rectangle, respectively. As the proposed model has a reconstruction branch and a classification branch, we use two separate loss functions for the two purposes. 
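The classification branch described above (global average pooling, a ReLU fully connected layer, and a softmax output layer) can be sketched in NumPy as follows. The weight names and shapes are placeholders chosen for illustration, not the trained parameters of CovMUNET.

```python
import numpy as np


def global_average_pool(u):
    """Average each feature map: (H, W, C) -> (C,)."""
    return u.mean(axis=(0, 1))


def softmax(z):
    """Numerically stable softmax over a 1-D vector of neuron values."""
    shifted = z - np.max(z)  # subtracting the max does not change the result
    e = np.exp(shifted)
    return e / e.sum()


def classification_head(u5, w11, b11, w12, b12):
    """Stages 11-12: pooling, ReLU dense layer, softmax class probabilities.

    u5  : (H, W, C) bottleneck feature maps
    w11 : (n_hidden, C),  b11 : (n_hidden,)
    w12 : (n_classes, n_hidden),  b12 : (n_classes,)
    """
    g = global_average_pool(u5)
    f = np.maximum(w11 @ g + b11, 0.0)  # fully connected layer with ReLU
    z = w12 @ f + b12                   # one neuron per class
    return softmax(z)
```

With the settings reported in the paper, the pooled vector and the hidden layer would both have 512 entries and the output layer 2 or 3 neurons, depending on the classification task.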
To ensure that the reconstructed image matches the input image as closely as possible, we use the mean square error (MSE) as the reconstruction loss, which is denoted by $L_r$ in equation (22). For faster convergence, the input images were normalized to have values between 0 and 1, and a sigmoid activation was used to constrain the values of the reconstructed image between 0 and 1. For the classification branch, the categorical cross-entropy loss ($L_c$) is used. The total loss ($L$) is obtained by linearly combining the two losses through an "amalgam coefficient" ($\lambda$), as shown in equation (21). The amalgam coefficient represents the contribution of the reconstruction loss to the total loss. The reconstruction and classification losses are given by

$$L_r = \frac{1}{d_1^{(1)} d_2^{(1)}} \sum_{k=1}^{d_1^{(1)} d_2^{(1)}} \left(X'^{(1)}_k - Y'^{(10)}_k\right)^2 \tag{22}$$

$$L_c = -\sum_{c=1}^{N} y_c \log \hat{y}_c \tag{23}$$

where $X'^{(1)}$ and $Y'^{(10)}$ are the flattened input and output images, and $d_1^{(1)}$ and $d_2^{(1)}$ denote the dimensions of the input image. The symbols $y$ and $\hat{y}$ denote the ground truth and the predicted class probability of an image, and $N$ is the number of classes.

To find the optimal weights, the Adam stochastic optimization algorithm [50] is applied. The update rule utilizes the first and second moments ($m$ and $v$, respectively) of the gradient of the loss function ($L$) with respect to the weights ($w$). At any given iteration $t$, the equations describing the moments and the update rule are

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla_w L(w_{t-1}) \tag{24}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) \left(\nabla_w L(w_{t-1})\right)^2 \tag{25}$$

$$w_t = w_{t-1} - \alpha \, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon} \tag{26}$$

Here, the parameters $\alpha$, $\beta_1$, $\beta_2$ denote the learning rate, the decay value for the first moment and the decay value for the second moment, respectively. The values of the parameters are constrained as $\alpha > 0$, $0 < \beta_1 < 1$, and $0 < \beta_2 < 1$. $\epsilon$ is a small value used for numerical stability.

The parameters of the model were selected according to standard practice and later tuned to optimize the performance. The input to the network has a size of 128 × 128 × 1. The strides ($g$) of all convolutional layers and max-pooling layers were set to 1. The strides of the transposed convolution layers were set to 2. All convolution layers used zero-padding to keep the dimensions of the feature maps unchanged. 
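The two losses and one Adam update can be sketched in NumPy as below. This is a minimal illustration, not the authors' code. In particular, the exact form of the combined loss is an assumption here: the sketch adds the reconstruction loss, weighted by the amalgam coefficient (called `lam`), to the cross-entropy, which is one linear combination consistent with the description that a coefficient of 0 leaves only the classification loss. The value of `eps` is also an assumed default.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Mean square error between the flattened input and reconstructed images."""
    diff = np.ravel(x) - np.ravel(x_hat)
    return float(np.mean(diff ** 2))


def classification_loss(y_true, y_prob, eps=1e-12):
    """Categorical cross-entropy for a one-hot label and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.sum(np.asarray(y_true) * np.log(y_prob)))


def total_loss(x, x_hat, y_true, y_prob, lam=0.3):
    # ASSUMPTION: L = L_c + lam * L_r; the paper's exact combination may differ.
    return classification_loss(y_true, y_prob) + lam * reconstruction_loss(x, x_hat)


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update at iteration t (t starts at 1); returns new w, m, v."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The defaults `lr=0.001`, `beta1=0.9` and `beta2=0.999` mirror the training settings reported in the paper, and `lam=0.3` is the value the experiments identify as best for COVID-19 detection.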
This is required to match the shape of the feature maps for the concatenation operation. The output of the global average pooling layer has a dimension of 512. Moreover, 512 neurons were used in the fully connected layer before the final classification layer. Table 1 shows the detailed parameter settings at each stage. Additionally, the sizes of the input and output feature maps are given in the table. In total, the model has 1,126,732 parameters. Extensive experiments are carried out in this study to ascertain the efficacy of the proposed algorithm. The experiments are conducted on a combined dataset containing CXR images belonging to three different classes: COVID-19, pneumonia and normal. Though COVID-19 detection is the prime focus of this work, the dataset also offers CXR images of pneumonia cases, so the classification of normal and abnormal (COVID-19 and pneumonia) cases is performed as well. In brief, the experiments performed in this work fall into two broad categories: (1) COVID-19 detection and (2) abnormality detection. To reduce the effect of randomness on the results, 5-fold cross-validation is applied. Separating the COVID-19 data by patient avoids the bias that arises when images of the same patient appear in both the train and test sets, which, to the best of our knowledge, was considered in none of the previous works on COVID-19 detection. In this section, the dataset and data separation are described first in Section 4.1 and Section 4.2, respectively. Later, Section 4.3 and Section 4.4 describe the training scheme and evaluation metrics, respectively. Finally, Section 4.5 presents the results of the experiments along with discussion. To the best of our knowledge, there is no single publicly available dataset which contains images of COVID-19 infected, pneumonia infected, and normal patients. Therefore, we combined the dataset of JP Cohen [30] and the CXR dataset by Mooney et al. [43] from Kaggle. As Cohen's dataset is updated on a regular basis, the number of CXR images of COVID-19 cases varies over time. 
We accessed this dataset on 26 June, 2020 and got in total 738 images belonging to various classes: COVID-19, SARS, MERS etc. Among those images, 417 images of AP (Anterior-posterior) and PA (Posterior-Anterior) CXR images of COVID-19 are used for experimentation. On the other hand, Mooney's dataset provides 5856 CXR images in total, among which 1583 are from normal cases and the rest belong to 4273 pneumonia cases. Figure 3 shows the distribution of COVID-19 cases in Cohen's CXR dataset. From the figure, it can be seen that many COVID-19 cases have appeared multiple times in the dataset. For example, there are two CXR images from each of the 60 COVID19 patients in Cohen's dataset. To the best of our knowledge, none of the previous works have addressed this multiple appearance while splitting data into train and test sets. However, random separation of data without addressing such multiple appearance of images from the same patients creates bias in the result since COVID-19 CXR images of similar pattern may exist both in train and test sets. In this work, we address this problem and separate the COVID-19 data based on the patients. To explain, we put all CXR images coming from a single COVID-19 patient either in the train set or test set. Such data separation guarantees that no CXR image from the same patient has been included both in the train and test set. Moreover, to avoid the randomness in the result, we adopt 5-fold cross-validation in this work. To implement 5-fold cross-validation, we divide the 255 COVID-19 cases of Cohen's dataset into 5 folds. Since Mooney's dataset does not give any patient information of the CXR images, we randomly split these data into 5 sections. In each iteration, we use 4 folds from each of both datasets as a train set and the rest are used as a test set. 10% of the training data are used for validation. 
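The patient-wise separation described above can be sketched in plain Python. The function and variable names are hypothetical, and the round-robin assignment of shuffled patients to folds is an assumption; the paper does not specify how patients were distributed across the five folds.

```python
import random
from collections import defaultdict


def patient_wise_folds(image_patient_pairs, n_folds=5, seed=0):
    """Split images into folds so that each patient appears in exactly one fold.

    image_patient_pairs : iterable of (image_id, patient_id) tuples
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_patient_pairs:
        by_patient[patient_id].append(image_id)

    patients = sorted(by_patient)            # deterministic starting order
    random.Random(seed).shuffle(patients)    # reproducible shuffle

    folds = [[] for _ in range(n_folds)]
    for index, patient_id in enumerate(patients):
        # All images of one patient go to the same fold.
        folds[index % n_folds].extend(by_patient[patient_id])
    return folds


def train_test_for_fold(folds, test_index):
    """Use one fold as the test set and the remaining folds as the train set."""
    test = list(folds[test_index])
    train = [img for i, fold in enumerate(folds) if i != test_index for img in fold]
    return train, test
```

Images without patient information, such as those from Mooney's dataset, can simply be shuffled and split at the image level, as done in the paper. scikit-learn's `GroupKFold` provides comparable group-aware splitting.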
The trainable parameters are optimized with the Adam stochastic optimization algorithm using mini-batches of size 32. The learning rate and the decay values for the first and second moments in (24)-(26) are set to 0.001, 0.9, and 0.999, respectively. To help the optimizer get out of plateaus, a learning rate decay routine is used: when the validation loss does not change for two consecutive epochs, the learning rate is halved. A Tesla K80 GPU with 12 GB of RAM has been used for training the model. Due to the problem of class imbalance, we have used both accuracy and F1-score to estimate and compare the performance of the proposed method with relevant works. We have adopted the micro-average method to get the final evaluation metric, as this reflects the effect of class imbalance on the result. Accuracy denotes the percentage of samples correctly classified and thus emphasizes the capability of the model to be correct. F1-score, on the other hand, puts emphasis on the number of false positives and false negatives. Using both metrics gives a fuller picture of the model's performance. The evaluation metrics are defined as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where $TP$, $TN$, $FP$ and $FN$ denote the numbers of true positives, true negatives, false positives and false negatives, respectively. As previously mentioned, two types of experiments are performed in this study. The results of the proposed method for COVID-19 detection and abnormality detection are described in Section 4.5.1 and Section 4.5.2, respectively. Since the dataset contains CXR images of three classes, two different classification scenarios are attempted to detect COVID-19 cases from CXR images: (a) COVID-19 vs pneumonia vs normal classification and (b) COVID-19 vs non-COVID-19 classification. As described in Section 3.2, the proposed method uses a loss function combining two different losses. The loss function defined in equation (21) contains an amalgam coefficient ($\lambda$) which is not learnable during training. 
To inspect how the performance of the proposed model varies with the amalgam coefficient, experiments are carried out with different values of the coefficient. Table 2 shows the accuracy of the proposed algorithm in scenario (a) when the coefficient is varied from 0 to 1. When the coefficient is 0, the loss function has no reconstruction loss, which corresponds to a single-loss algorithm. Table 2 shows that the highest overall accuracy of 96.97%, with a standard deviation of 0.39%, is obtained when the coefficient is 0.3. Such a low standard deviation indicates the robustness of the proposed method. Careful inspection of Table 2 also indicates that the use of multiple losses increases the accuracy of the model by a significant margin. Hence, 0.3 is taken as the best amalgam coefficient for COVID-19 detection.

Table 3 shows the detailed fold-wise results obtained by the proposed method in 3-class classification when the coefficient is 0.3. Class-wise precision, recall, and F1-score are listed in Table 3 to show the individual class performance, while macro and micro F1-scores are shown to evaluate the overall performance of the proposed model. The table shows that no individual class falls below 90% in any of the evaluation metrics, which indicates how good the individual class performance is. In fold 4, the precision for COVID-19 detection even reaches 100%, which means that every image predicted as COVID-19 is indeed a COVID-19 case. Furthermore, the recall for COVID-19 is higher than that of the other two classes. This good individual class performance is also reflected in the macro and micro F1-scores, which are 96.65% ± 0.52% and 96.97% ± 0.39%, respectively. For better interpretation of the performance of the proposed method in 3-class classification, confusion matrices for the five folds are shown in Figure 4. The confusion matrices make clear that the proposed method successfully predicts the COVID-19 cases.
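The class-wise and averaged metrics discussed above can all be derived from a confusion matrix, as the following sketch illustrates. The function names and the example matrix are hypothetical and are not taken from Table 3 or Figure 4; rows are assumed to index the true class and columns the predicted class.

```python
def per_class_scores(cm):
    """Return (precision, recall, f1) per class; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but another true class
        fn = sum(cm[k]) - tp                       # true class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_scores(cm)
    return sum(f1 for _, _, f1 in scores) / len(scores)


def micro_f1(cm):
    """F1-score computed from TP, FP, and FN pooled over all classes."""
    n = len(cm)
    tp = sum(cm[k][k] for k in range(n))
    total = sum(sum(row) for row in cm)
    fp = total - tp  # each error is a false positive for the predicted class
    fn = total - tp  # and a false negative for the true class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)


# Hypothetical confusion matrix (class order: COVID-19, normal, pneumonia).
cm = [
    [50, 0, 1],
    [0, 300, 16],
    [1, 12, 840],
]
scores = per_class_scores(cm)
```

In single-label multi-class classification every misclassified sample counts once as a false positive and once as a false negative, so the micro-averaged F1-score coincides with accuracy. This is consistent with the micro F1-score and the overall accuracy both being reported as 96.97% ± 0.39%.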
In the confusion matrices, only a few images are misclassified between the pneumonia and normal classes. Table 4 compares different methods on 3-class classification in terms of accuracy. The network architectures and dataset sizes of all works are also listed in the table. The table shows that the proposed CovMUNET achieves the best performance among all the works, with 2.27% higher accuracy than the nearest competitor. This work also performs the experiment on the highest number of COVID-19 cases. Another important observation concerns patient-wise data separation: among the works listed in Table 4, this study is the only one that avoids bias in the result by separating the data patient-wise.

To perform COVID-19 vs non-COVID-19 classification, the normal and pneumonia cases of the dataset are combined into a single class. Table 5 shows the accuracy of the proposed network in this classification. Each fold exhibits high accuracy. The overall accuracy is 99.41% with a standard deviation of only 0.048%, which indicates the statistical robustness of the model in detecting COVID-19 cases.

To detect abnormality in CXR images, COVID-19 and pneumonia cases are combined into a single class named 'abnormal cases'. The results of these experiments are reported in Table 7. The table shows that the proposed network architecture performs almost equally well in all folds. The overall accuracy is 96.24% with a very low standard deviation of 0.165%. Such performance shows that the proposed method can be adopted not only for COVID-19 detection but also for identifying abnormal cases from chest X-rays. We compare the performance of our model in detecting abnormality in CXR images with two other recent methods in Table 8. The proposed model and Chouhan et al. [47] show almost the same accuracy.
On the other hand, Hashmi et al. [48] achieve more than 2% higher accuracy. However, it should be noted that both [47] and [48] classify only normal and pneumonia cases, whereas in this work COVID-19 cases are considered abnormal along with the pneumonia cases. Including COVID-19 data increases the variation within the abnormal class, which may affect the performance of the model. Moreover, the table shows that this study also deals with the highest number of CXR images among the methods.

The worldwide pandemic has brought humanity to a halt. To mobilize the world again, it is imperative that everyone strictly follows the health and safety rules and that all infected persons are promptly identified and isolated. With that in mind, computer-aided radiological image analysis, a proven method of effective assistance to medical practitioners, can be of aid in this pandemic too. The affordability and accessibility of X-ray make it an enticing modality for fast detection of coronavirus-infected patients. The reported accuracy of 96.97% for 3-class (COVID-19 vs normal vs pneumonia) and 99.41% for 2-class (COVID-19 vs non-COVID-19) classification shows that CXR is a good contender as a detection method for patients infected with SARS-CoV-2. The multiple-loss approach adopted in this paper proves to be more effective and robust than the other reported techniques. The proposed CovMUNET performs well even in a data-imbalanced setting. In future, the effectiveness and robustness of the proposed model can be further validated on a bigger COVID-19 dataset when available. Moreover, the model architecture can be modified to include various established architectures in the encoder part to possibly utilize the power of transfer learning. We intend to explore the effect of such model modifications in our future work.
The CXR datasets used in this paper are available from references [30] and [43]. The code for the model is available in the GitHub repository: https://github.com/SazzadSayyed/CovMUNET

[1] First case of 2019 novel coronavirus in the United States
[2] World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19)
[3] A review of coronavirus disease-2019 (COVID-19)
[4] Coronavirus disease (COVID-19) Situation Report-180
[5] Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19
[6] Real-time RT-PCR in COVID-19 detection: issues affecting the results
[7] Performance of VivaDiag COVID-19 IgM/IgG rapid test is inadequate for diagnosis of COVID-19 in acute patients referring to emergency room department
[8] COVID-19 outbreak in Italy: experimental chest x-ray scoring system for quantifying and monitoring disease progression
[9] Sensitivity of chest CT for COVID-19: comparison to RT-PCR
[10] Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection
[11] Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management
[12] Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review
[13] Deep ensemble learning of sparse regression models for brain disease diagnosis
[14] Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning
[15] Using deep learning for classification of lung nodules computed tomography images
[16] DeepArrNet: An efficient deep CNN architecture for automatic arrhythmia detection and classification from denoised ECG beats
[17] Arrhythmia detection using deep convolutional neural network with long duration ECG signals
[18] Dermatologist-level classification of skin cancer with deep neural networks
[19] An efficient deep learning approach to pneumonia classification in healthcare
[20] Breast cancer detection using deep convolutional neural networks and support vector machines
[21] Detecting and locating gastrointestinal anomalies using deep learning and iterative cluster unification
[22] A deep learning approach for Parkinson's disease diagnosis from EEG signals
[23] U-net: Convolutional networks for biomedical image segmentation
[24] H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes
[25] S3D-UNet: separable 3D U-Net for brain tumor segmentation
[26] IVD-Net: Intervertebral disc localization and segmentation in MRI with a multi-modal UNet
[27] Weighted Res-UNet for high-quality retina vessel segmentation
[28] Efficient skin lesion segmentation using separable-Unet with stochastic weight averaging
[29] Deep learning COVID-19 detection bias: accuracy through artificial intelligence
[30] COVID-19 Image Data Collection
[31] ChestX-Ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
[32] Truncated inception net: COVID-19 outbreak screening using chest X-rays
[33] Labeled optical coherence tomography (OCT) and Chest X-Ray images for classification
[34] COVID-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
[35] Deep learning COVID-19 features on CXR using limited training data sets
[36] CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest x-ray images with transferable multireceptive feature optimization
[37] Coronet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images
[38] Automated detection of COVID-19 cases using deep neural networks with x-ray images
[39] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images
[40] COVIDx-net: A framework of deep learning classifiers to diagnose COVID-19 in x-ray images
[41] Finding COVID-19 from chest x-rays using deep learning on a small dataset
[42] COVID-caps: A capsule network-based framework for identification of COVID-19 cases from x-ray images
[43] Kaggle chest x-ray images (Pneumonia) Dataset
[44] Deep-COVID: Predicting COVID-19 from chest x-ray images using deep transfer learning
[45] Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison
[46] GeminiNet: Combine Fully Convolution Network With Structure of Receptive Fields for Object Detection
[47] A novel transfer learning based approach for pneumonia detection in chest X-ray images
[48] Efficient pneumonia detection in chest x-ray images using deep transfer learning
[49] Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
[50] Adam: A method for stochastic optimization

The authors declare no conflict of interest regarding the paper.