key: cord-0636185-9vezr83c
authors: Abbasi, Wajid Arshad; Abbas, Syed Ali; Andleeb, Saiqa
title: COVIDX: Computer-aided diagnosis of Covid-19 and its severity prediction with raw digital chest X-ray images
date: 2020-12-25
journal: nan
DOI: nan
sha: b964a2992fb695397c22d1dfcbcbe369ff96ac2f
doc_id: 636185
cord_uid: 9vezr83c

Coronavirus disease (COVID-19) is a contagious infection caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), and it has infected and killed millions of people across the globe. In the absence of specific drugs or vaccines for the treatment of COVID-19, and given the limitations of prevailing diagnostic techniques, there is a need for alternative automatic screening systems that physicians can use to quickly identify and isolate infected patients. A chest X-ray (CXR) image can be used as an alternative modality to detect and diagnose COVID-19. In this study, we present an automatic COVID-19 diagnostic and severity prediction (COVIDX) system that uses deep feature maps from CXR images to diagnose COVID-19 and predict its severity. The proposed system uses a three-phase classification approach (healthy vs unhealthy, COVID-19 vs pneumonia, and COVID-19 severity) with different shallow supervised classification algorithms. We evaluated COVIDX not only through 10-fold cross-validation and an external validation dataset but also in a real setting involving an experienced radiologist. In all the evaluation settings, COVIDX outperforms the existing state-of-the-art methods designed for this purpose. We have made COVIDX easily accessible through a cloud-based webserver and Python code, available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/covidx, respectively.

Keywords: Coronavirus, COVID-19, diagnosis, SARS-CoV-2, Chest X-Ray, Contagious infection, Pandemic

Coronavirus disease (COVID-19) is a contagious infection caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), and it can also be transmitted by asymptomatic patients during incubation (Kooraki et al., 2020). This pandemic has infected and killed millions of people across the globe ("COVID-19 Map," n.d.). The World Health Organization (WHO) has already declared this pandemic a global health calamity ("Coronavirus disease (COVID-19) - World Health Organization," n.d.). According to medical experts, the disease mainly infects the human respiratory system, causing severe pneumonia with symptoms such as dry cough, breathing problems, fever, fatigue, and lung failure (Huang et al., 2020). Right now, the world is anxiously waiting for a specific vaccine or medication to prevent this lethal infection. In the absence of specific drugs or vaccines for the treatment of COVID-19, early diagnosis of the disease is crucial to avoid further spread by advising quarantine or isolation. Currently, there are two types of tests to detect COVID-19: diagnostic tests and antibody tests (Commissioner, 2020). A diagnostic test such as reverse transcription-polymerase chain reaction (RT-PCR) can be used to detect an active coronavirus infection, whereas an antibody test looks for antibodies produced by the body in response to the disease. RT-PCR is the most common diagnostic practice to detect viral infection (Sheikhzadeh et al., 2020).
However, the standard confirmatory clinical RT-PCR test to detect COVID-19 is manual, complex, laborious, and costly (Chowdhury et al., 2020). Moreover, the limited availability of test kits and domain experts further hampers the situation. Meanwhile, antibody tests cannot be used to diagnose an active COVID-19 infection, because antibodies can take several days or weeks to develop after infection and may remain in the blood for some time after recovery (Commissioner, 2020). Given the limitations of prevailing diagnostic techniques and the rapid surge of infected patients, there is a need for alternative automatic screening systems that physicians can use to quickly identify and isolate infected patients.

X-ray imaging is one of the most common noninvasive diagnostic techniques and helps medical practitioners diagnose and treat many diseases. A chest X-ray (CXR) is normally taken to assess the condition of the lungs, heart, and chest wall. The CXR has also played a critical role in the preliminary investigation of various respiratory abnormalities (Chandra et al., 2021; Chandra and Verma, 2020a). In this context, a CXR image can also be used as an alternative modality to detect and diagnose COVID-19. CXR images are normally interpreted by expert radiologists. However, some studies show that CXR images of COVID-19 infected patients exhibit diverse features (Chowdhury et al., 2020; Zhang et al., 2020). Manual interpretation of CXR images with such subtle variations is therefore quite challenging. Moreover, the current enormous upsurge of infected patients makes it difficult for domain experts to complete a timely diagnosis (Asnaoui et al., 2020; Chandra et al., 2021). Computer-Aided Diagnosis (CAD) systems are needed to cope with this situation.
In Computer-Aided Diagnosis (CAD) using X-ray images, machine learning has already been applied successfully in many clinical and radiological studies to describe various characteristics of radio-imaging (Jaiswal et al., 2019; Pesce et al., 2019; Xue et al., 2018). To diagnose COVID-19 from X-ray images, a plethora of machine learning-based methods with varying sources and amounts of training data have also been proposed in the literature (Abbas et al., 2020; Ardakani et al., 2020; Asnaoui et al., 2020; Chowdhury et al., 2020; Jain et al., 2020; Minaee et al., 2020; Ozturk et al., 2020; Panwar et al., 2020; Toğaçar et al., 2020). Almost all of these methods are based on the Convolutional Neural Network (CNN) deep learning approach (LeCun et al., 1999). However, deep learning approaches normally require an enormous amount of data to generalize well, and such data are not readily available at present. Although some studies use transfer learning, fine-tuning the model still requires a substantial amount of data, given the dimensionality of the features produced by pre-trained deep learning models. Moreover, the above studies address only the diagnosis of COVID-19; the detection of its severity is still an open research question. To overcome these shortcomings, in this study we propose a COVID-19 diagnosis and severity predictor called COVIDX (COVId-19 Detection using X-ray images) that employs a blend of deep and shallow learning.

In this section, we discuss the details of our experimental strategy for designing and evaluating COVIDX.

The datasets used in this study have been collected from different publicly available online COVID-19 chest X-ray image repositories (Cohen et al., 2020; Wang et al., 2020). We used two different datasets: i) a dataset for COVID-19 diagnosis and ii) a dataset for COVID-19 severity prediction.
For COVID-19 diagnosis, we have a dataset of 576, 1583, and 4273 digital X-ray images (jpg format) of COVID-19 patients, healthy persons, and pneumonia patients, respectively. For COVID-19 severity prediction, we have a dataset of 114 and 164 X-ray images of highly severe and less severe COVID-19 patients, respectively. All the images in both datasets have been preprocessed by resizing (313 × 313 pixels), de-noising, and contrast stretching (Chandra and Verma, 2020b).

We propose a machine learning approach for the identification and severity prediction of COVID-19 infection from raw digital chest X-ray images. The methodology adopted in this study is depicted in Fig. 1.

To extract a useful feature space from the digital chest X-ray images in our datasets, we have used different off-the-shelf CNN-based models pre-trained on ImageNet. These pre-trained models are ResNet50 (He et al., 2015), InceptionV3 (Szegedy et al., 2015), Xception (Chollet, 2017), VGG16 (Simonyan and Zisserman, 2015), NASNetLarge (Zoph et al., 2018), and DenseNet121 (Huang et al., 2018). These models were selected for their reported accuracy. The preprocessing and resizing required by each pre-trained model have been applied before extracting the feature maps.

As discussed earlier, the novelty of the proposed approach is that it uses a blend of pre-trained CNN-based models to extract features and shallow learning algorithms to perform the classification. The proposed scheme is based on the paradigm of transfer learning. In this work, our dataset consists of examples of the form $(x, y)$, where $x$ is a chest X-ray image and $y \in \{+1, -1\}$ is its associated label. For COVID-19 identification, $y$ indicates whether the image shows COVID-19 (+1) or not (-1), and for COVID-19 severity prediction, $y$ indicates high (+1) or low (-1) severity.
For a given chest X-ray image $x$, we extract deep feature maps from it, which are denoted by $\phi(x)$. Our objective is to learn three separate functions, one for each step of COVID-19 identification and severity prediction. For this purpose, we have used three different classifiers: the classical Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machine (XGBoost) (Breiman, 2001; Cortes and Vapnik, 1995; Friedman, 2001).

We used SVM for the diagnosis of COVID-19 and its severity prediction by learning a function $f(x) = \langle w, \phi(x) \rangle$, with $w$ as the parameters to be learned from the training data $\{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$. The optimal value of $w$ is obtained in SVM by solving the standard soft-margin optimization problem (Cortes and Vapnik, 1995):

$$\min_{w,\,\xi} \; \frac{\lambda}{2} \lVert w \rVert^2 + \sum_{i=1}^{N} \xi_i \quad \text{such that} \quad y_i f(x_i) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, N \tag{1}$$

The objective function in Eq. (1) maximizes the margin while minimizing margin violations (or slacks $\xi$) (Cortes and Vapnik, 1995). The hyperparameter $C = 1/\lambda$ controls the tradeoff between margin maximization and margin violation. We used both linear and radial basis function (RBF) kernels and coarsely optimized the values of $\lambda$ and $\gamma$ using grid search with scikit-learn (version 0.23) (Pedregosa et al., 2011).

Random Forest Classification (RFC) uses an ensemble learning technique based on bagging. A random forest operates by constructing several decision trees in parallel during training and outputs the class chosen by the majority of the trees as its prediction (Breiman, 2001). It usually performs better on problems whose features have non-linear relationships. Each classification tree in the RF is constructed on randomly sampled subsets of the input features. In this study, we have optimized RF for the number of decision trees in the forest, the maximum number of features considered for splitting a node, the maximum number of levels in each decision tree, and the minimum number of samples required to split a node.
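As a concrete illustration of the soft-margin SVM objective discussed above, the following minimal sketch evaluates it for a fixed weight vector on toy data. It assumes the standard hinge-loss form, in which the objective is (λ/2)·||w||² plus the sum of the slacks max(0, 1 − y·f(x)); the function names, the toy feature vectors, and the weight vector are hypothetical and are not taken from the COVIDX code.

```python
from typing import List, Sequence, Tuple


def decision_value(w: Sequence[float], features: Sequence[float]) -> float:
    """Linear decision function f(x) = <w, features>."""
    return sum(wi * xi for wi, xi in zip(w, features))


def soft_margin_objective(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
) -> Tuple[float, List[float]]:
    """Return (objective, slacks) for the assumed lambda-form of the SVM primal.

    objective = (lam / 2) * ||w||^2 + sum_i slack_i
    slack_i   = max(0, 1 - y_i * f(x_i))   (margin violation of example i)
    """
    margin_term = 0.5 * lam * sum(wi * wi for wi in w)
    slacks = [max(0.0, 1.0 - yi * decision_value(w, xi)) for xi, yi in zip(X, y)]
    return margin_term + sum(slacks), slacks


# Toy feature vectors (stand-ins for deep feature maps) with labels in {+1, -1}.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [+1, +1, -1]
w = [1.0, 0.0]  # hypothetical weight vector, not a trained model

objective, slacks = soft_margin_objective(w, X, y, lam=1.0)
print(slacks)     # only the second example lies inside the margin
print(objective)
```

Dividing this objective by λ gives the equivalent form (1/2)·||w||² + C·Σ slack with C = 1/λ, so a smaller λ (larger C) penalizes margin violations more heavily relative to the margin term.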
Random forests have also been used effectively in many other studies (Abbasi et al., 2017; Ballester and Mitchell, 2010; Li et al., 2014; Moal et al., 2011).

XGBoost is also an ensemble learning technique; it is based on boosting, which iteratively combines weak learners into a strong learner (Chen and Guestrin, 2016; Friedman, 2001). We have used decision trees, the default base learners, in the XGBoost ensembles. In this study, we have optimized XGBoost in terms of the learning rate, maximum depth, number of boosting iterations, booster, and subsample ratio using a grid search with the Python package xgboost (version 0.7) (Chen and Guestrin, 2016).

We have divided the preprocessed images into two subsets, a train-test set (80%) and a held-out validation set (20%), and report performance metrics on both. For the train-test set, we have used 10-fold cross-validation (CV). In 10-fold CV, we shuffled the images in our datasets and then split them into 10 groups. Ten models were trained and evaluated, with each group held out once as the test set (Abbasi and Minhas, 2016). For the held-out validation set, we trained the classification models on the whole train-test set and tested them on the validation set. As performance measures for model evaluation, we have used the area under the ROC curve (ROC), the area under the precision-recall curve (PR), and the F-measure (Abbasi and Minhas, 2016; Davis and Goadrich, 2006; Tharwat, 2020). We used grid search over the training data to find the optimal hyperparameter values of the different classification models.

We have developed and deployed a webserver for COVIDX, which uses the optimal machine learning model for COVID-19 diagnosis and severity prediction. The webserver takes a chest X-ray image as input and returns a COVID-19 diagnosis and severity prediction for it.
After the successful submission of an X-ray image, users are redirected to a page showing the COVIDX predictions for COVID-19 diagnosis and severity. This process is broken into three steps: 1) whether the uploaded image belongs to a healthy or an unhealthy person (healthy vs unhealthy), 2) if the image belongs to an unhealthy person, whether the infection is COVID-19 or not, and 3) if it is a COVID-19 infection, how severe it is. The webserver is available at https://sites.google.com/view/wajidarshad/software.

In this work, we have proposed a machine learning-based system for computer-aided COVID-19 diagnosis and severity prediction. We have divided the COVID-19 diagnostic task into three sub-tasks based on the available data: 1) classification of healthy vs unhealthy, 2) classification of COVID-19 vs pneumonia, and 3) prediction of COVID-19 severity.

Fig. 2: Prediction performance of our proposed method over cross-validation and an external test dataset.

We have trained various shallow machine learning models for the classification of healthy versus unhealthy X-ray images with a range of deep learning-based feature maps and evaluated them using 10-fold cross-validation (Table 1). On 10-fold CV, these results show that the performance of our proposed method is comparable to that of the state-of-the-art method proposed by Chandra et al. (Chandra et al., 2021). Meanwhile, when evaluating the models trained on the training set against an external dataset, we observed a similar trend, with an F1-score of 0.98, an area under the ROC curve of 0.99, and an area under the PR curve of 0.99 for the Support Vector Classifier with the DenseNet121 feature map (Table 2, Fig. 2A, Fig. 2B). These results show a better performance of our proposed method in comparison to the state-of-the-art method (Chandra et al., 2021).
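The three measures reported above (F1-score, area under the ROC curve, and area under the PR curve) can be computed from true labels and classifier scores as in the following minimal sketch. It is written in plain Python and is not the evaluation code used in the study; the function names, the toy labels and scores, the 0.5 decision threshold, and the use of average precision as the estimate of the area under the PR curve are all assumptions made for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (+1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the PR curve (no tie handling)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(1 for t in y_true if t == 1)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy example: labels in {+1, -1} and hypothetical classifier scores.
y_true = [1, 1, 1, -1, -1, -1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else -1 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc_roc = roc_auc(y_true, scores)
auc_pr = average_precision(y_true, scores)
print(round(f1, 3), round(auc_roc, 3), round(auc_pr, 3))
```

The F1-score depends on the chosen decision threshold, whereas the two area measures summarize the ranking induced by the scores over all thresholds, which is why the three values can differ noticeably for the same classifier.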
We have also trained various shallow machine learning models for the classification of COVID-19 versus pneumonia X-ray images with a range of deep learning-based feature maps and evaluated them using both 10-fold cross-validation (CV) and an external validation dataset. The results of our evaluation in both settings are shown in Tables 3 and 4 (Fig. 2C, Fig. 2D). These results show a better performance of our proposed method in comparison to the state-of-the-art method proposed by Chandra et al., which reported an F1-score of 0.91 and an area under the ROC curve of 0.91 even after using majority voting (Chandra et al., 2021). The significantly better performance of SVM in comparison to Random Forest (RF) and Extreme Gradient Boosting (XGB) is attributed to its ability to deal well with high-dimensional features and few examples, as in our case. Moreover, features extracted through pre-trained deep learning models perform better than the handcrafted ones used in the study by Chandra et al. (Chandra et al., 2021).

Along with COVID-19 identification, we have also tried to predict the severity of COVID-19 infection. For this purpose, we have trained several shallow machine learning models for severity prediction.

The performance of COVIDX has also been evaluated using its webserver in a real setting under the supervision of an experienced radiologist at Abbas Institute of Medical Sciences (AIMS) hospital, located in Muzaffarabad, Azad Jammu & Kashmir, Pakistan. For this purpose, 30 anonymized X-ray images belonging to different classes (10 COVID-19 infected, 10 pneumonia infected, and 10 healthy persons) have been used. The results of this evaluation are shown as a confusion matrix in Fig. 3. Our proposed system (COVIDX) correctly classified 9 out of 10 X-ray images of COVID-19 infected patients and labeled 1 as pneumonia (Fig. 3).
Similarly, for the X-ray images of pneumonia infected patients, our system correctly classified 8 out of 10 images and labeled 1 as COVID-19 and 1 as healthy (Fig. 3). For the healthy X-ray images, our system correctly classified 9 out of 10 images as healthy and labeled 1 as pneumonia. Overall, 26 of the 30 images (about 87%) were classified correctly. These results support the use of our proposed system in real settings.

Fig. 3: Confusion matrix showing COVIDX performance in a real setting used by an experienced radiologist.

In the present study, we have designed and developed a system called COVIDX for the preliminary diagnosis of COVID-19 infection and its severity prediction from a raw X-ray image. The performance evaluation through 10-fold CV, on an external validation dataset, and in a real setting shows that our proposed system can be used efficiently to diagnose COVID-19 infected patients and to suggest precautionary measures (such as quarantine and an RT-PCR test) that help avoid a further surge of the infection. The key findings of the study are as follows:

• Our proposed system performed significantly better than the existing state-of-the-art systems under the rigorous evaluation criteria adopted, even in the presence of substantial variations in the input CXR images.
• Our proposed system not only diagnoses COVID-19 but also predicts its severity.
• We have made our proposed system accessible through a publicly available cloud-based webserver and open-source code.

All data generated or analyzed during this study are included in this paper or available in online repositories. A Python implementation of the proposed method, together with a webserver, is available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/covidx.

WAA conceived the idea, developed the scientific workflow, performed the experiments, analyzed and interpreted the results, and was a major contributor in manuscript writing. SAA contributed to the analysis of the results and writing of the manuscript.
SA helped in results interpretation and validation, formal analysis, and manuscript writing. All authors have read and approved the final manuscript.

Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network
ISLAND: In-Silico Prediction of Proteins Binding Affinity Using Sequence Descriptors
Issues in performance evaluation for host-pathogen protein interaction prediction
Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks
Automated Methods for Detection and Classification Pneumonia based on X-Ray Images Using Deep Learning
A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking
Random Forests
Pneumonia Detection on Chest X-Ray Using Machine Learning Paradigm
Analysis of quantum noise-reducing filters on chest X-ray images: A review
Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble
XGBoost: A Scalable Tree Boosting System
First case of Coronavirus Disease 2019 (COVID-19) pneumonia in Taiwan
Deep Learning with Depthwise Separable Convolutions
Can AI help in screening Viral and COVID-19 pneumonia?
COVID-19 Image Data Collection: Prospective Predictions Are the Future
Coronavirus Disease 2019 Testing Basics. FDA.
Coronavirus disease (COVID-19) - World Health Organization
Support-Vector Networks
Johns Hopkins Coronavirus Resource Center
The Relationship Between Precision-Recall and ROC Curves
Greedy function approximation: A gradient boosting machine
Deep Residual Learning for Image Recognition
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Densely Connected Convolutional Networks
Deep learning based detection and analysis of COVID-19 on chest X-ray images
Identifying pneumonia in chest X-rays: A deep learning approach
Coronavirus (COVID-19) Outbreak: What the Department of Radiology Should Know
Object Recognition with Gradient-Based Learning
Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study
Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning
Protein-protein binding affinity prediction on a diverse set of structures
Automated detection of COVID-19 cases using deep neural networks with X-ray images
Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet
Scikit-learn: Machine Learning in Python
Learning to detect chest radiographs containing pulmonary lesions using visual attention networks
Diagnostic techniques for COVID-19 and new developments
Very Deep Convolutional Networks for Large-Scale Image Recognition
Rethinking the Inception Architecture for Computer Vision
Classification assessment methods. Applied Computing and Informatics ahead-of-print
COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches
COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
Localizing tuberculosis in chest radiographs with deep learning
Recent advances in the detection of respiratory virus infection in humans
Learning Transferable Architectures for Scalable Image Recognition

The authors would like to acknowledge the services of all those who have provided compiled and annotated X-ray image datasets as open access repositories.

The authors declare that they have no competing interests.