key: cord-0520226-tpng2uij authors: Pawan, S J; Sankar, Rahul; Prabhudev, Amithash M; Mahesh, P A; Prakashini, K; Das, Sudha Kiran; Rajan, Jeny title: MobileCaps: A Lightweight Model for Screening and Severity Analysis of COVID-19 Chest X-Ray Images date: 2021-08-19 journal: nan DOI: nan sha: a2660823edca6637536b0a30d7adb54114f633ed doc_id: 520226 cord_uid: tpng2uij The world is going through a challenging phase due to the disastrous effect of the COVID-19 pandemic on the healthcare system and the economy. The rate of spread, post-COVID-19 symptoms, and the emergence of new strains of COVID-19 have disrupted healthcare systems across the globe. As a result, the task of accurately screening COVID-19 cases has become of utmost priority. Since the virus infects the respiratory system, the Chest X-Ray (CXR) is an imaging modality adopted extensively for initial screening. We have performed a comprehensive study that uses CXR images to identify COVID-19 cases and realized the necessity of having a more generalizable model. We utilize the MobileNetV2 architecture as the feature extractor and integrate it into Capsule Networks to construct a fully automated and lightweight model termed MobileCaps. MobileCaps is trained and evaluated on a publicly available dataset with model ensembling and Bayesian optimization strategies to efficiently classify CXR images of patients with COVID-19 from non-COVID-19 pneumonia and healthy cases. The proposed model is further evaluated on two additional RT-PCR-confirmed datasets to demonstrate its generalizability. We also introduce MobileCaps-S and leverage it for the severity assessment of COVID-19 CXR images based on the Radiographic Assessment of Lung Edema (RALE) scoring technique. Our classification model achieved an overall recall of 91.60, 94.60, and 92.20, and a precision of 98.50, 88.21, and 92.62 for COVID-19, non-COVID-19 pneumonia, and healthy cases, respectively.
Further, the severity assessment model attained an R$^2$ coefficient of 70.51. Since the proposed models have fewer trainable parameters than the state-of-the-art models reported in the literature, we believe they will go a long way in aiding healthcare systems in the battle against the pandemic. The world is undergoing one of the greatest medical emergencies of all time. The outbreak of COVID-19 has dealt a massive blow to the health and economic conditions of millions across the globe. Viruses are said to be on the fence between living and non-living organisms. By themselves, they are not capable of doing anything, but once they enter a living cell of a host, they can take control of the cell and replicate themselves, damaging the host cell. The coronavirus is an enveloped RNA virus that is highly contagious [1]. The name corona originates from Latin, meaning crown [2]. The virus possesses crown-shaped spikes, giving it the appearance of the solar corona. These viruses primarily infect birds and mammals, but their zoonotic nature can result in cross-species infection, leading to the formation of a dangerous variant infecting humans. Previous coronaviruses involved in human infection include the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) of 2003 and the Middle East Respiratory Syndrome coronavirus (MERS-CoV) of 2012 [3]. As per World Health Organization (WHO) guidelines, the terminology n-COVID-19 represents the novel Corona Virus Disease. Considering the outbreak affecting multiple countries across continents, the WHO declared n-COVID-19 a pandemic on 11 March 2020. The predominant transmission mode of n-COVID-19 is through direct, indirect, or close contact with infected people, whose respiratory secretions are expelled as droplets when they cough, sneeze, sing, or talk.
Airborne transmission caused by the dissemination of droplet nuclei (aerosols) is evident in medical settings during aerosol-generating procedures and is only hypothesized in indoor settings with poor ventilation. Despite consistent evidence of the survival of SARS-CoV-2 on surfaces and of surface contamination, the possibility of indirect transmission through fomites lacks specific reports. Other modes of transmission (faecal-oral, urine, serum, blood) have not been proven yet [4]. The time from exposure to onset of symptoms may range from two to fourteen days. Common symptoms include cough, fever, shortness of breath, sore throat, fatigue, diarrhea, muscle pain, and abdominal pain. Most cases result in mild symptoms (above 80%), while some progress to viral pneumonia and multi-organ failure. Elderly people and people with pre-existing medical conditions (such as asthma, diabetes, and heart disease) are found to be more vulnerable to becoming severely ill [5]. The high mutation rate and the asymptomatic nature of those infected inflate the complexity of drug/vaccine development. Initially, drugs such as chloroquine, hydroxychloroquine, remdesivir, and dexamethasone were found effective in controlling the damage caused when the patient's immune system is affected, particularly in those who need oxygen or ventilation support [6]. In early 2021, there was a significant surge in research and development related to vaccine production to build antibodies against COVID-19. AstraZeneca, Pfizer, Sputnik, ZyCoV-D, Covishield, and Covaxin are a few with high efficacy. However, as COVID-19 keeps mutating into various other forms, it becomes more challenging for the vaccines to neutralize them. The definitive method of diagnosing COVID-19 involves taking a swab from the patient's nose and throat and performing a real-time Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test to detect the viral RNA strand [7].
In addition, the FDA has permitted many laboratory-developed tests for emergency usage. Though these methods demonstrate high analytic sensitivity and specificity in ideal settings, their clinical performance depends on the type and quality of the specimen and the duration of illness at the time of testing. These methods are subject to high false-negative rates ranging from 5-40%. Moreover, the turnaround time varies from 15 minutes to 8 hours, depending on the test and the laboratory workflow in a busy setting [8, 9]. Given the shortcomings of RT-PCR and laboratory testing and the highly contagious nature of COVID-19, there is a need for a test that can give a faster diagnosis and enable early isolation. Though less sensitive than CT, on account of the public health concern, the chest X-Ray (CXR), when abnormal, can be an essential tool in suspecting a case of COVID-19. It also helps in triaging, risk stratification, early management, and monitoring progress in moderate to severe cases while RT-PCR reports are awaited. Though the findings of CXR in COVID-19 are not specific, a few classical findings such as bilateral, basal, and peripheral ground-glass attenuation with occasional consolidation help differentiate it from non-COVID-19 pneumonia and healthy cases. Fig. 1 depicts sample Chest X-Ray images with varying severity levels. Despite its limited sensitivity in detecting early disease, nearly 70% of those requiring hospitalization have abnormal radiographs that progress to a peak about 10-12 days after symptom onset. Chest radiographic abnormalities are typically air-space opacities (either ground-glass opacities or consolidation), which are most often bilateral and peripheral with lower-lobe predominance. Thus, the analysis of the CXR as an alternative tool for diagnosing and managing COVID-19 has gained interest.
The accuracy of CXR diagnosis relies heavily on radiological expertise because of the complex patterns involved, which is a huge disadvantage given the limited number of expert radiologists, particularly in developing countries. Automating the process can reduce the workload of radiologists/pulmonologists and can provide fast and appropriate treatment to patients. Apart from detecting abnormality in the CXR, assessing the severity of infection based on severity scoring metrics such as BRIXIA and RALE has also gained attention in the research community. Fig. 2 depicts the workflow of the proposed severity assessment model using COVID-19 CXR. Several automated methods have been proposed in the literature for detecting CXR with COVID-19 abnormality, owing to the availability of datasets [11] [12] [13] [14] [15] [16]. However, few works have appeared in the literature discussing severity assessment. The main contributions of this paper are summarized below: • We introduce MobileCaps, a novel hybrid architecture that integrates MobileNetV2 and Capsule Networks for efficient classification of COVID-19 CXR images. The proposed architecture significantly reduces the number of trainable parameters while outperforming state-of-the-art models. • We demonstrate that the proposed model is highly generalizable compared to other studies in the literature by extensively evaluating it on the diverse COVIDx test set, which is an amalgamation of five publicly available datasets. We also evaluated the proposed model on two other RT-PCR-confirmed datasets to demonstrate its generalizability and superiority over the state-of-the-art. The rest of the paper is organized as follows: Section 2 discusses the automated methods proposed in the literature for classifying CXR images of COVID-19 from non-COVID-19 Pneumonia and normal cases. The proposed method is outlined in Section 3.
Experimental results and discussions are presented in Section 4, and finally, we conclude the paper in Section 5. Automated screening of COVID-19 using the Chest X-Ray is gaining a lot of significance in the battle against the pandemic. Several studies have been proposed in the literature to develop a fully automated AI-based solution for screening COVID-19 CXR images. Automated screening techniques can be categorized into two categories based on the task: 1) classification and 2) severity score prediction. Classification tasks can be further divided into two-class classification (COVID-19 and non-COVID-19), three-class classification (COVID-19, non-COVID-19 Pneumonia, and normal cases), and four-class classification (COVID-19, Bacterial Pneumonia, non-COVID-19 Pneumonia, and normal cases). Two-class classification involves distinguishing COVID-19 from non-COVID-19 cases, including CXR with other infections such as SARS, MERS, and ARDS, along with normal cases. Afshar et al. [17] introduced COVID-CAPS, a Capsule Network-based framework for the classification of COVID-19 CXR. Sethy et al. [18] presented a CNN model coupled with a Support Vector Machine (SVM) classifier for the defined objective. The intent behind both of the above works is to differentiate COVID-19 cases from non-COVID-19 cases. Abraham and Nair [19] introduced a multi-CNN approach, which involves a series of pre-trained CNN models for accurately screening COVID-19 CXR. This approach aggregates the features extracted by the multiple CNN models with a Bayesnet classifier and a correlation-based feature selection (CFS) technique to achieve the objective. The authors evaluated the performance of the proposed method on two publicly available datasets and achieved reasonably good performance. Horry et al. [20] used a transfer learning approach with the VGG-19 architecture (initialized with ImageNet weights) to effectively detect COVID-19 radiographs.
In that study, the authors experimented with three modalities, including CT, Ultrasound, and CXR, and claimed that ultrasound images delivered a good detection rate. However, to develop a reliable system, the model should be capable of handling COVID-19 cases in a multi-class scenario, which turns out to be the major drawback of this approach. The majority of the works proposed in the literature focus on three-class classification involving the identification of COVID-19 cases from non-COVID-19 Pneumonia and normal cases. A semi-supervised hybrid model was proposed by Khobahi et al. [21] using a task-based feature extraction network (TFEN) followed by a COVID Identification Network (CIN). However, the study was conducted with minimal samples of COVID-19 cases (89 and 10 samples in training and testing, respectively), which turned out to be the major pitfall of this approach. COVIDiagnosis-Net was introduced by Ucar et al. [22]. Wang et al. [29] formulated a public dataset called COVIDx [10], which is a combination of many publicly available datasets [11] [12] [13] [14] [15], and also proposed an architecture called COVID-Net, which makes extensive use of Projection-Expansion-Projection (PEP) blocks for efficient classification of COVID-19 CXR images from normal and non-COVID-19 Pneumonia cases. The model achieved an overall accuracy of 93.3 and a precision and recall of 98.9 and 91.0, respectively, for COVID-19 cases. Oh et al. [30] introduced a CNN with a patch-wise approach based on the statistical interpretation of CXR data. The authors showed interesting results in comparison with [29], achieving an overall accuracy of 91.9 with a recall of 100 and a low precision of 76.9 for COVID-19 cases. A four-class classification approach was followed by Farooq et al. [31], involving the detection of COVID-19 cases from Normal, non-COVID-19 Pneumonia, and Bacterial Pneumonia cases using residual blocks.
This study was carried out with limited samples of COVID-19 cases (68 samples), due to which it lacks generalizability. Aslan et al. [32] proposed a transfer learning-based hybrid architecture for accurately classifying COVID-19 radiographs from normal and pneumonia cases. This method combines the AlexNet architecture with a BiLSTM to incorporate temporal information and achieve the defined objective. This model recorded an overall accuracy of 98.70. Severity assessment or severity score prediction involves assessing the progression of COVID-19 infection by observing the lung involvement through standard scoring techniques such as the BRIXIA [33] or Radiographic Assessment of Lung Edema (RALE) [34] scoring systems. The BRIXIA score was an experimental scoring system developed by researchers from Italy during the early COVID-19 pandemic. In the BRIXIA scoring system, each lung is divided into three different zones, and each zone is assigned one of four severity levels based on lung opacities. These opacities, though not specific, are characteristic of the viral Pneumonia of COVID-19. In the literature, we can observe a few methods based on the BRIXIA scoring system to predict the severity of COVID-19 infection [35] [36] [37]. The RALE scoring system was initially developed as a non-invasive measure to assess the severity and outcome profile of patients with ARDS [34]. As ARDS is one of the features of severe COVID-19 pneumonia, the modified RALE score, especially the geographic assessment, was found useful for assessing the severity of COVID-19 Pneumonia in many studies. The RALE score ranges from 1 to 8 (1-2: Mild, 3-5: Moderate, 6-8: Severe). Radiologists can use this score to measure severity and treat patients accordingly. The authors of [29] introduced COVIDNet-S, a deep neural network model for predicting the geographic extent and opacity extent of COVID-19 CXR.
COVIDNet-S achieved an R$^2$ score of 66.4 and 63.5 for geographic extent and opacity extent, respectively. Tabik et al. [38] introduced the COVID Smart Data-based Network (COVID-SDNet) for classifying the infection progression into mild, moderate, and severe based on the RALE scoring technique. Apart from BRIXIA and RALE scoring, some other techniques have been adopted in the literature for severity scoring of COVID-19 CXR infection [39] [40] [41] [42]. Convolutional Neural Networks (CNNs) have revolutionized the computer vision domain with their ability to learn the intrinsic features of an image using a gradient-descent-based learning algorithm, at times even surpassing human-level performance on many public datasets. Despite their massive success, certain conditions need to be met for a CNN to perform well. Firstly, a CNN requires the availability of huge amounts of data, which is hard to come by in medical imaging. Secondly, a typical CNN incurs a very high computational cost due to its large number of trainable parameters. Moreover, a CNN does not take into account spatial hierarchies among features [43], which are paramount in high-level image encoding/understanding. On the other hand, Capsule Networks follow a different approach and address the above flaws with the help of a powerful neuronal representation called capsules and the dynamic routing algorithm. The first stage of Capsule Networks generally consists of a series of simple convolutional layers that form primary capsules for initial-level feature encoding. We demonstrate that using an efficient and powerful organization of convolutional layers instead of a series of simple convolutional layers leads to more powerful primary capsules. To this end, we propose a lightweight hybrid model called MobileCaps that builds a rich feature space while requiring a fraction of the computational cost of other CNN models.
We use MobileNetV2 for the initial stage of MobileCaps due to its high efficacy with a smaller parameter space, followed by capsule layers. In the following subsections, we briefly elaborate on Capsule Networks and MobileNetV2, followed by the proposed model. Sabour et al. [43] introduced Capsule Networks with the intent to address the limitations of CNNs, namely the large data requirement and the inability to learn hierarchical spatial relationships. Capsule Networks comprise multiple capsules that each hold a vector value instead of a neuron that holds a scalar. Each capsule encodes the instantiation parameters of a feature of an image, and the length of a capsule represents the probability that the feature exists in the image. Capsules communicate with each other using the dynamic routing algorithm. A capsule $u_i$ in layer $l$ predicts the output $\hat{u}_{j|i}$ of a capsule $u_j$ in layer $l+1$ using a trainable weight matrix $W_{ij}$, as given in Eq 1 [43]: $\hat{u}_{j|i} = W_{ij} u_i$ (1). Each capsule $u_i$ has a routing coefficient $c_{ij}$ associated with it that corresponds to how much it agrees with the output of the capsule $u_j$. The value $c_{ij}$ is determined by the dynamic routing algorithm. The actual output $s_j$ of capsule $u_j$ is calculated as shown in Eq 2 [43]: $s_j = \sum_i c_{ij} \hat{u}_{j|i}$ (2). The output vector $s_j$ is then passed through a non-linear squash activation, as can be seen in Eq 3. This is done to preserve its direction while restricting its length to the range [0, 1] to obtain the final output $v_j$: $v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2} \frac{s_j}{\|s_j\|}$ (3). The similarity between the actual output $v_j$ and the predicted output $\hat{u}_{j|i}$ is calculated by taking their dot product, i.e., a measure of how much the predicted output agrees with the actual output is computed, and the routing logits $b_{ij}$ and coefficients $c_{ij}$ are updated for each capsule $u_i$ as per Eqs 4 and 5 [43]: $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$ (4), $c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$ (5). This is the essence of the dynamic routing algorithm. Initially, all capsules $u_i$ are given equal weightage.
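The squash non-linearity and the routing update (Eqs 2-5) can be sketched in NumPy as follows. This is an illustrative re-implementation of the algorithm from [43], not the authors' released code, and the array shapes are chosen only for demonstration:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash non-linearity (Eq. 3): preserves direction, maps length into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, r=3):
    """Dynamic routing (Eqs. 2-5).

    u_hat: prediction vectors of shape (num_in, num_out, dim_out), one
           prediction per (input capsule, output capsule) pair (Eq. 1).
    r:     number of routing iterations.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # equal initial weightage for all capsules
    for _ in range(r):
        # Eq. 5: softmax over the output capsules gives routing coefficients
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Eq. 2: weighted sum of predictions
        s = (c[..., None] * u_hat).sum(axis=0)
        # Eq. 3: squash to obtain the output capsules
        v = squash(s)
        # Eq. 4: agreement (dot product) updates the routing logits
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v
```

Because each output length is $\|s\|^2 / (1 + \|s\|^2) < 1$, the capsule lengths can be read directly as probabilities.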
Eqs 2-5 can be repeated $r$ times to obtain better values of the routing coefficients, where $r$ indicates the number of routing iterations. MobileNetV2 [44] is a family of lightweight CNN models suitable for mobile and embedded computer vision applications. MobileNetV2 aims at replacing expensive convolution operations with less expensive, efficient ones without compromising performance. The factorization of a standard convolution into a depth-wise convolution followed by a 1 × 1 convolution (depth-wise separable), together with inverted residuals and linear bottlenecks, forms the crux of the architecture. It majorly consists of three blocks: the expansion block, the depth-wise convolution block, and the projection block. The expansion block is a 1 × 1 convolution block that accepts the input tensor and expands the number of channels by the expansion factor, a hyperparameter (default value 6). For example, if the expansion block accepts an input tensor with 32 channels, it will output a new tensor with 192 (32 × 6) channels. The resultant output is then filtered using a depth-wise convolution block and fed into the projection or bottleneck layer, which is also a 1 × 1 convolution layer that shrinks or projects the input back to a smaller number of channels (say, 192 to 32). MobileNetV2 also makes use of residual connections to boost the gradient flow throughout the network. The proposed model is shown in Fig. 3. It is an amalgamation of MobileNetV2 and a Capsule Network, resulting in a lightweight, efficient, hybrid architecture wherein MobileNetV2 extracts features from the CXR image, which are subsequently passed to the Capsule Network. Generally, the first stage of a Capsule Network is a series of convolutional layers meant for extracting and encoding simple features like edges, horizontal lines, vertical lines, etc., in the form of a vector.
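The savings from this factorization are easy to quantify. The sketch below is illustrative arithmetic (not from the paper) comparing the parameter count of a standard convolution with its depth-wise separable counterpart for the 32-to-192-channel example above:

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depth-wise k x k convolution (one filter per input channel) followed
    by a 1 x 1 point-wise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 32, 192)                  # 3*3*32*192 = 55296
separable = depthwise_separable_params(3, 32, 192)  # 288 + 6144 = 6432
```

For this configuration, the factorized form uses roughly 8.6 times fewer parameters, which is the source of MobileNetV2's efficiency.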
The output of the final convolution operation is reshaped to form primary capsules that communicate with higher-level capsules with the help of the dynamic routing algorithm (Eqs 1-5). We observed that making the initial set of convolutional layers deeper and more efficient helps build better primary capsules, which gives an excellent start to the capsule network, thereby improving the model's overall performance. Our observations are corroborated by the ablation study, which we discuss in the following subsection. MobileCaps accepts Chest X-Ray (CXR) images of shape 224 × 224 × 3, which are fed into the first block, i.e., a MobileNetV2 initialized with ImageNet weights [45]. The MobileNetV2 acts as a feature extractor that constructs a rich feature space of dimension 7 × 7 × 1024. This feature map is then passed through a dropout layer to enforce regularization and prevent possible overfitting. The output is then reshaped into the primary capsule layer of dimension 392 × 128, i.e., 392 capsules of dimension 128. This layer is followed by two more capsule layers of dimension 32 × 16 that use the dynamic routing algorithm (Eqs 2-5) to pass information from one layer to the next. The final capsule layer contains three 16-dimensional capsules, one for each class. The prediction vector is calculated by computing the length of these capsules, which indicates the probability of the presence of each class. We further propose MobileCaps-S for predicting the severity score of COVID-19 CXR based on the RALE score. We modified the bottom layers of the proposed MobileCaps architecture for this objective. The model accepts a Chest X-Ray image of shape 224 × 224 × 3 and predicts a probability output in the range of 0 to 1, indicating the severity, which is then mapped back to the RALE severity score (1-8).
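A shape-level sketch of this data flow, reconstructed from the description above (illustrative only; the real model has trainable capsule layers with dynamic routing between these reshapes):

```python
import numpy as np

# MobileNetV2 backbone output for one 224 x 224 x 3 CXR image
features = np.zeros((1, 7, 7, 1024))

# Primary capsules: 7 * 7 * 1024 = 50176 values = 392 capsules of dimension 128
primary_caps = features.reshape(1, 392, 128)

# After the intermediate 32 x 16 capsule layers, the final layer holds one
# 16-dimensional capsule per class; the length of each capsule is read as the
# class probability (placeholder values shown here).
final_caps = np.random.RandomState(0).randn(1, 3, 16) * 0.1
probs = np.linalg.norm(final_caps, axis=-1)  # shape (1, 3)
```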
The number of capsule layers and the number of capsules in each capsule layer were fine-tuned to obtain optimal performance with a minimal parameter space. In the next subsection, we provide observations from the ablation study conducted and drive home the intuition behind using MobileNetV2 as a feature extractor for the Capsule Network. As part of this study, we performed an ablation analysis supporting the design of the proposed architecture. We compared the performance of the proposed model against a Capsule Network and a MobileNetV2 trained independently. As per our hypothesis, the performance of a Capsule Network that uses simple convolutions to form primary capsules is subpar. Moreover, since MobileNetV2 misses out on the advantage of capsules and the dynamic routing algorithm, it performs poorly compared to the proposed architecture. Table 1 shows the quantitative results depicting the superiority of the proposed architecture. The Capsule Network and MobileNetV2 achieved F1 scores of 74.03 and 82.90, respectively, whereas the proposed method achieved 94.91, a remarkable improvement. The following section presents the experimental results and a discussion of our proposed model. This section summarizes the datasets used in the study, followed by the training strategy, loss function, and hyperparameters adopted in the proposed method. Finally, we evaluate and compare the quantitative performance of the proposed model with various benchmark models. In this subsection, we briefly elaborate on the different datasets used in the study. We consider the recent version of the COVIDx dataset (accessed 13 June 2020) [10] introduced by Wang et al. [29], the Twitter COVID-19 CXR data [16], the JSS COVID-19 data, and the COVIDx RALE severity data for evaluating the performance of the proposed model. • COVIDx data [10]: COVIDx data is formed by merging the following publicly available datasets.
1) COVID-19 image data collection [11] 2) COVID-19 chest x-ray data initiative [12] 3) Actualmed COVID-19 chest x-ray data initiative [13] 4) COVID-19 radiography database [14] 5) RSNA Pneumonia Detection Challenge dataset [15]. COVIDx [10] is widely considered the benchmark dataset for developing automated algorithms for accurate classification of COVID-19 CXR from non-COVID-19 CXR and healthy cases. The data distribution of the actual and the official versions of the COVIDx data adopted by the authors of COVIDx [10] is shown in the first and second rows of Table 2. Further, the COVIDx [10] data suffers from class imbalance, as can be observed in Table 2: the number of samples in the COVID-19 class is small compared to the Normal and non-COVID-19 Pneumonia classes. To mitigate this imbalance, we over-sampled the COVID-19 class by applying data augmentation techniques (horizontal flip, zoom, rotation, width shift, height shift, and shear). We increased the size of the COVID-19 class to 1500 samples while randomly under-sampling the other two classes to 1500 samples each, and thus we prepared and adopted a balanced version of the COVIDx data in this study. • Twitter COVID-19 CXR data [16]: This dataset consists of 134 Chest X-Rays manifesting COVID-19 viral pneumonia, made available on Twitter for research purposes by a radiologist from Spain. All 134 cases are confirmed SARS-CoV-2 PCR-positive. • JSS COVID-19 CXR data: This data was obtained from Jagadguru Sri Shivarathreeshwara (JSS) Medical College, Mysore, India. It consists of 76 RT-PCR-confirmed COVID-19 cases from 76 patients. • COVIDx RALE severity data: COVID-19 CXR from the COVIDx data [10] were scored with the RALE severity scoring method [34] with the help of a senior radiologist and a pulmonologist using the Labelbox [46] annotation tool.
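The balancing step described for the COVIDx data can be sketched as follows. This is a simplified stand-in for the augmentation pipeline above: only horizontal flips and pixel shifts are shown (the zoom, rotation, and shear transforms are omitted), and `augment` and `balance_classes` are hypothetical helper names:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Random horizontal flip plus a small width/height shift (an illustrative
    subset of the augmentations named in the text)."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                  # horizontal flip
    dy, dx = rng.integers(-10, 11, size=2)  # height/width shift in pixels
    return np.roll(img, (dy, dx), axis=(0, 1))

def balance_classes(class_images, target=1500):
    """Over-sample minority classes with augmentation; randomly under-sample
    majority classes, so every class ends up with `target` samples."""
    balanced = {}
    for label, imgs in class_images.items():
        if len(imgs) >= target:
            idx = rng.choice(len(imgs), target, replace=False)
            balanced[label] = [imgs[i] for i in idx]
        else:
            out = list(imgs)
            while len(out) < target:
                out.append(augment(imgs[rng.integers(len(imgs))]))
            balanced[label] = out
    return balanced
```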
There were a total of 573 samples, but due to data imbalance (a large number of mild and moderate cases compared to the number of severe cases), we considered a total of 388 CXR with varying severity levels. We train the proposed model on the preprocessed COVIDx [10] dataset with a cyclic cosine annealing policy [47] for the learning rate, along with a snapshot ensembling strategy [48]. The policy starts with a large learning rate that is aggressively decreased to a minimum value before increasing again. The learning rate is changed according to Eq 6 [47]: $\alpha(t) = \frac{\alpha_0}{2}\left(\cos\left(\frac{\pi \, \mathrm{mod}(t-1, \lceil T/C \rceil)}{\lceil T/C \rceil}\right) + 1\right)$ (6), where $\alpha_0$ denotes the maximum learning rate, $t$ is the current epoch number, $T$ is the total number of epochs, and $C$ is the total number of cycles. Fig. 4 shows the learning curve of the proposed method. The decaying learning rate during a cycle allows the model to converge to a local minimum. At the end of the cycle, the high value of the restarted learning rate is enough to perturb the model from its local minimum, following which the decaying learning rate allows the model to converge to another local minimum. We take snapshots of the model at each such local minimum, as there is significant diversity in the local minima visited by the model. With the snapshot ensembling method, we can obtain M models while incurring the computational cost of training only one model. During testing, we combine the results of the M snapshots by averaging the softmax probabilities of each snapshot. The initial learning rate and the length of one cycle are hyperparameters, the optimal values of which are determined by the Bayesian optimization technique [47]. Bayesian optimization makes informed decisions about the next set of hyperparameters to evaluate based on previous results. It takes fewer iterations for the algorithm to find the optimal combination of hyperparameters, as it disregards values that will not improve the validation score.
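The annealing schedule of Eq 6 can be implemented directly. The sketch below (with illustrative values for $\alpha_0$, $T$, and $C$, not the tuned ones found by Bayesian optimization) shows the restart behaviour at cycle boundaries:

```python
import math

def cyclic_cosine_lr(t, alpha0, T, C):
    """Cyclic cosine annealing (Eq. 6).

    t:      current epoch number (1-indexed)
    alpha0: maximum (restart) learning rate
    T:      total number of epochs
    C:      number of cycles
    """
    cycle_len = math.ceil(T / C)
    return (alpha0 / 2.0) * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1.0)

# With T=500 and C=10, each cycle is 50 epochs: the rate restarts at alpha0
# at epochs 1, 51, 101, ... and decays toward ~0 by the end of each cycle.
lrs = [cyclic_cosine_lr(t, alpha0=0.01, T=500, C=10) for t in range(1, 501)]
```

A snapshot of the model weights would be saved at the end of each cycle, just before the restart, and the M snapshots' softmax outputs averaged at test time.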
It achieves this by finding an approximation to the objective function, called the surrogate function, that can be sampled effectively. We use this technique to find optimal values for the learning rate and the cycle length. For each estimate of the Bayesian optimization process, we train our proposed model for one cycle to obtain a reasonable estimate of the validation accuracy. We select the model with the best validation accuracy and train it for 10 cycles, taking a snapshot at the end of each cycle, which is then used to make the final prediction. We train the proposed model till convergence, which took approximately 500 epochs after selecting the best hyperparameters with Bayesian optimization. Nadam optimization [49] with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-7}$ was used, along with snapshot ensembling with snapshots taken every 50 epochs. MobileCaps was trained with the margin loss function described in Eq 7 [43], and MobileCaps-S was trained with the log-cosh loss function described in Eq 8. $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$ (7), where $v_k$ denotes the final set of capsules, and $T_k = 1$ only if the $k$-th class is present in the image; $m^+$, $m^-$, and $\lambda$ are hyperparameters whose values are 0.9, 0.1, and 0.5, respectively. $L(y, y^p) = \sum_{i=1}^{n} \log(\cosh(y^p_i - y_i))$ (8). In this section, we quantitatively analyze the performance of the proposed MobileCaps method on the balanced COVIDx data [10], the Twitter COVID-19 CXR data [16], and the JSS COVID-19 CXR data, and of MobileCaps-S on the COVIDx RALE severity data. The classification task is evaluated with precision, recall, and F1-score, and the severity prediction uses the R$^2$ coefficient as the evaluation metric. The proposed architecture is implemented in Keras [50] with TensorFlow [51] as the backend. The experiments were conducted and evaluated on a 64-bit workstation with the CentOS Linux 7 operating system, a hard disk drive, an NVIDIA Tesla P100 GPU with 16 GB dedicated memory, and an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20 GHz processor.
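The two loss functions (Eqs 7 and 8) can be sketched in NumPy as follows; this is an illustrative restatement of the formulas, not the authors' Keras implementation:

```python
import numpy as np

def margin_loss(v_lengths, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss (Eq. 7).

    v_lengths: lengths of the final capsules, shape (batch, classes)
    T:         one-hot targets of the same shape (T_k = 1 iff class k present)
    """
    pos = T * np.maximum(0.0, m_pos - v_lengths) ** 2
    neg = lam * (1.0 - T) * np.maximum(0.0, v_lengths - m_neg) ** 2
    return np.sum(pos + neg, axis=-1).mean()

def log_cosh_loss(y_true, y_pred):
    """Log-cosh loss (Eq. 8), used for the severity regression of MobileCaps-S."""
    return np.sum(np.log(np.cosh(y_pred - y_true)))
```

The margin loss is zero when the correct capsule's length exceeds $m^+ = 0.9$ and every other capsule's length is below $m^- = 0.1$; the log-cosh loss behaves like a squared error near zero but grows only linearly for large residuals.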
Datasets and source code are available in the following GitHub repository: https://github.com/ActiveNeuron/MobileCaps We evaluated the proposed MobileCaps architecture with 5-fold patient-wise cross-validation. Train, validation, and test data of each fold are prepared with the holdout approach. Table 3 shows the results of the 5-fold cross-validation, and Table 4 shows the performance comparison of the proposed MobileCaps with the benchmark models (all the methods in Table 4 are subjected to 5-fold cross-validation). It is evident from Table 4 that the proposed model outperforms all the benchmarks by a sufficiently large margin in both recall and precision. Fig. 5 depicts the confusion matrix and the ROC curve obtained for fold 1. The proposed model achieved a precision of 98.50, indicating that only 1.50% of the samples predicted as COVID-19 belonged to the non-COVID-19 Pneumonia or Normal classes (the lowest false-positive rate), which is quite significant, along with a recall of 91.60 and an F1-score of 94.91 for COVID-19 cases. The proposed model is able to classify COVID-19 CXR images regardless of the severity level. Moreover, we compare and evaluate the performance of the proposed model on two more RT-PCR-confirmed datasets, namely the Twitter COVID-19 CXR data [16] and the JSS COVID-19 CXR data. The results are depicted in Table 5. Another notable aspect of the proposed architecture is the number of trainable parameters. As can be seen in Table 6, the proposed model uses fewer trainable parameters than the state-of-the-art models, which helps reduce the computational cost as well as the hardware overhead. Though Afshar et al. [17] have fewer parameters, their performance on the recent version of the COVIDx data was found to be subpar. Further, in Fig.
6 , we have visualized Probability-guided Activation maps (ProbAM) [52] of COVIDx data [10] ,Twitter COVID-19 CXR data [16] and JSS COVID-19 CXR data as an evidence for the areas focused or learned by the proposed method. ProbAM results clearly show the appropriate targeted ROI region, which the proposed model has learned from the input Chest X-Ray. Due to the unavailability of the source code and insufficient information; we were not able to perform a one-to-one comparison with some of the methods, but based on the number of samples on which the proposed model is evaluated, the number of trainable parameters, and the generalizing ability we strongly argue that our model outperforms the recent approaches. In [30] the authors followed a patch-wise approach based on the statistical interpretation of the data. Authors compared the performance of their model with COVID-Net [29] , and as per their reported results, their approach managed to reach a sensitivity of 100% and a precision of 91.90% on COVID-19 cases. One noteworthy fact is that the authors evaluated the performance of their model on test data of 300 samples with just 10 samples of COVID-19 cases. COVID-Net [29] is considered as the state-of-the-art for detecting COVID-19 CXR. Authors of COVID-Net [29] achieved a precision and recall of 98.9% and 91.0% respectively for Rahimzadeh et al. [24] 48M Karim et al. [27] 21M Eduardo et al. [23] 11.60M Oh et al. [30] 11.60M Wang et al. [29] 11.75M Khobahi et al. [21] 11.80M Ucar et al. [22] 724K Afshar et al. [17] 295K Proposed Method 2.2M : ProbAM [52] visualization of COVIDx data [10] , Twitter COVID-19 CXR data [16] and JSS COVID-19 CXR data. COVID-19 cases; whereas, the proposed method achieved a precision of 98.50, which is comparable with 98.9 of [29] and a recall of 91.60 after performing 5-fold cross-validation. Moreover, the proposed model is proven to be highly generalizable from Table 5 . Further, our model uses 81.27% fewer parameters than COVID-Net [29] . 
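The parameter gap in Table 6 largely reflects the depthwise separable convolutions used by the MobileNetV2 feature extractor [44], which replace one dense $k \times k$ convolution with a $k \times k$ depthwise step plus a $1 \times 1$ pointwise step. A back-of-envelope check, using illustrative channel sizes rather than the actual layer dimensions of our model:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution:
    k x k depthwise (one filter per input channel) + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3x3 kernel, 128 input channels, 256 output channels.
standard = conv_params(3, 128, 256)       # 294,912 weights
separable = separable_params(3, 128, 256)  # 33,920 weights, roughly 8.7x fewer
```

Stacked over dozens of layers, this per-layer saving is what brings the backbone down to the 2.2M-parameter range reported above.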
The proposed MobileCaps-S is likewise evaluated with the 5-fold cross-validation technique and achieved a mean R$^2$ score of 70.51 ± 2.3504 in predicting the RALE severity of COVID-19 CXR images. The model outputs a severity score in the range 1-8, which can be used to determine the level of infection.

In this study, we presented MobileCaps, a fully automated, hybrid, and lightweight model for the accurate classification of COVID-19 CXR images, as well as MobileCaps-S for predicting severity scores based on the RALE scoring technique. The proposed models demonstrate an effective way to use a Convolutional Neural Network to form powerful primary capsules in a Capsule Network, constructing a rich feature space with a large reduction in the number of trainable parameters. The MobileCaps model was trained and extensively evaluated on the COVIDx dataset, and its generalizing ability was further evaluated on two additional datasets. Saliency maps visualized with the ProbAM technique depicted the appropriate ROI learned by the proposed model, which further supports its generalizability. Additionally, the COVID-19 CXR images from the COVIDx dataset were annotated using the RALE scoring technique and used to train the MobileCaps-S model for severity assessment of COVID-19. The superior performance of the proposed models, as corroborated by the experimental results, coupled with their lightweight nature, makes them ideal candidates for easing the burden on healthcare systems compared to other state-of-the-art models in the literature.

References

[1] COVID-19: Infection prevention and control.
[2] Epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (COVID-19) during the early outbreak period: a scoping review.
[3] World Health Organization et al. Critical preparedness, readiness and response actions for COVID-19: interim guidance.
[4] World Health Organization et al. Transmission of SARS-CoV-2: implications for infection prevention precautions: scientific brief.
[5] World Health Organization et al. Medical certification, ICD mortality coding, and reporting mortality associated with COVID-19.
[6] Discovering drugs to treat coronavirus disease 2019 (COVID-19).
[7] Laboratory diagnosis of COVID-19: current issues and challenges.
[8] Diagnostic testing for severe acute respiratory syndrome-related coronavirus-2: a narrative review.
[9] COVID-19 diagnostics in context.
[10] COVIDx data by COVID-Net team.
[11] COVID-19 image data collection: prospective predictions are the future.
[12] Figure 1 COVID-19 chest X-ray data initiative.
[13] Actualmed COVID-19 chest X-ray data initiative.
[14] COVID-19 radiography database.
[15] COVID-19 radiography database.
[16] Twitter COVID-19 CXR dataset.
[17] COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[18] Detection of coronavirus disease (COVID-19) based on deep features.
[19] Computer-aided detection of COVID-19 from X-ray images using multi-CNN and BayesNet classifier.
[20] Anwaar Ulhaq, Biswajeet Pradhan, Manas Saha, and Nagesh Shukla. COVID-19 detection through transfer learning using multimodal imaging data.
[21] CoroNet: A deep network architecture for semi-supervised task-based identification of COVID-19 from chest X-ray images. medRxiv.
[22] COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnostic of the coronavirus disease 2019 (COVID-19) from X-ray images.
[23] Towards an efficient deep learning model for COVID-19 patterns detection in X-ray images.
[24] A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2.
[25] COVID-MobileXpert: On-device COVID-19 screening using snapshots of chest X-ray.
[26] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[27] Explainable COVID-19 predictions based on chest X-ray images.
[28] A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization.
[29] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[30] Deep learning COVID-19 features on CXR using limited training data sets.
[31] COVID-ResNet: A deep learning framework for screening of COVID19 from radiographs.
[32] CNN-based transfer learning-BiLSTM network: A novel approach for COVID-19 infection detection.
[33] COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression. La Radiologia Medica.
[34] Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS.
[35] End-to-end learning for semiquantitative rating of COVID-19 severity on chest X-rays.
[36] Chest X-ray severity index as a predictor of in-hospital mortality in coronavirus disease 2019: a study of 302 patients from Italy.
[37] Modified chest X-ray scoring system in evaluating severity of COVID-19 patient in Dr. Soetomo General Hospital Surabaya, Indonesia.
[38] COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images.
[39] Severity of lung involvement on chest X-rays in SARS-coronavirus-2 infected patients as a possible tool to predict clinical progression: an observational retrospective analysis of the relationship between radiological, clinical, and laboratory data.
[40] Predicting COVID-19 pneumonia severity on chest X-ray with deep learning.
[41] Performance of a severity score on admission chest radiograph in predicting clinical outcomes in hospitalized patients with coronavirus disease (COVID-19).
[42] COVID-19 disease severity assessment using CNN model.
[43] Dynamic routing between capsules.
[44] MobileNetV2: Inverted residuals and linear bottlenecks.
[45] ImageNet: A large-scale hierarchical image database.
[46] SGDR: Stochastic gradient descent with warm restarts.
[47] Snapshot ensembles: Train 1, get M for free.
[49] Incorporating Nesterov momentum into Adam. ICLR.
[51] TensorFlow: A system for large-scale machine learning.
[52] Evaluating generalization ability of convolutional neural networks and capsule networks for image classification via top-2 classification.