key: cord-0985875-mpnvjipy
authors: Abbasi, Wajid Arshad; Abbas, Syed Ali; Andleeb, Saiqa; ul Islam, Ghafoor; Ajaz, Syeda Adin; Arshad, Kinza; Khalil, Sadia; Anjam, Asma; Ilyas, Kashif; Saleem, Mohsib; Chughtai, Jawad; Abbas, Ayesha
title: COVIDC: An Expert System to Diagnose COVID-19 and Predict its Severity using Chest CT Scans: Application in Radiology
date: 2021-02-23
journal: Inform Med Unlocked
DOI: 10.1016/j.imu.2021.100540
sha: 518c030a0c90505774582f826a9b1a732a09bbc1
doc_id: 985875
cord_uid: mpnvjipy

Early diagnosis of Coronavirus disease 2019 (COVID-19) is highly important, especially when a specific vaccine is absent or inadequately supplied, because advising quarantine can stop the surge of this lethal infection. This diagnosis is challenging, as most patients with COVID-19 infection stay asymptomatic, while those showing symptoms are hard to distinguish from patients with other respiratory infections such as severe flu and pneumonia. Because wet-lab diagnostic tests for COVID-19 are costly and time-consuming, there is an urgent need for an alternative, non-invasive, rapid, and low-cost automatic screening system. A chest CT scan can effectively be used as an alternative modality to detect and diagnose COVID-19 infection. In this study, we present an automatic COVID-19 diagnostic and severity prediction system called COVIDC (COVID-19 detection using CT scans) that uses deep feature maps from chest CT scans for this purpose. Our newly proposed system not only detects COVID-19 but also predicts its severity by using a two-phase classification approach (COVID vs non-COVID, and COVID-19 severity) with deep feature maps and different shallow supervised classification algorithms, such as SVMs and random forest, to handle data scarcity.
We evaluated COVIDC stringently, not only through 10-fold cross-validation and on an external validation dataset but also in a real setting under the supervision of an experienced radiologist. In all the evaluation settings, COVIDC outperformed the existing state-of-the-art methods designed to detect COVID-19, with an F1 score of 0.94 on the validation dataset, and it correctly classified 9 out of 10 COVID-19 CT scans in the real setting, which supports its use for diagnosing COVID-19. COVIDC is openly accessible through a cloud-based webserver, and the Python code is available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/covidc.

Coronavirus disease 2019 (COVID-19) is a contagious infection caused by a virus of the family Coronaviridae. Since its identification at the end of 2019 in Wuhan, a city in the Hubei Province of China, this viral disease has spread rapidly around the globe [1]. The World Health Organization (WHO) has declared this pandemic a global health calamity, and it has infected and killed millions of people worldwide [2]. Physicians say this is a viral infection that mainly infects the human respiratory system and that antibiotics do not cure it [3, 4]. Meanwhile, there is currently an inadequate supply of effective vaccines or medication to prevent or cure this lethal infection. Under these conditions, early diagnosis of the disease is fundamental to avoid its further surge by advising or adopting quarantine or isolation. Most patients with COVID-19 infection stay asymptomatic, while others show mild to moderate symptoms such as fever, cough, or shortness of breath, and signs such as abnormal oxygen saturation or lung auscultation findings [5, 6]. Signs and symptoms are of the utmost importance for identifying any infection and for deciding on further diagnostic tests.
However, even symptomatic COVID-19 infection does not show any particular symptoms that distinguish it from other respiratory infections such as severe flu and pneumonia [6]. In this situation, all suspected patients require screening and have to undergo the adopted diagnostic tests. Currently, there are two types of tests to detect COVID-19: diagnostic tests and antibody tests [7]. A diagnostic test such as reverse transcription-polymerase chain reaction (RT-PCR) can be used to detect an active coronavirus infection, whereas an antibody test looks for antibodies produced by the body in response to the disease. RT-PCR is currently the gold standard diagnostic practice to detect viral infection [8]. However, the standard confirmatory clinical RT-PCR test to detect COVID-19 is manual, complex, laborious, costly, and ineffective [9, 10]. Moreover, the limited availability of test kits and domain experts further hampers the situation. Meanwhile, antibody tests cannot be used to diagnose COVID-19 because antibodies can take several days or weeks to develop after infection and may stay in the blood for some time after recovery [7]. Given the limitations of the prevailing diagnostic techniques and the rapid surge of infected patients, there is an urgent need for alternative automatic screening systems that physicians can use to swiftly identify and isolate COVID-19 infected patients.

X-ray and Computed Tomography (CT) imaging modalities are the most common non-invasive diagnostic techniques that help medical practitioners to diagnose and treat many diseases. In the current context, chest X-ray (CXR) and chest Computed Tomography (CCT) can also be used effectively to diagnose COVID-19 [11, 12]. However, CCT, owing to 3D imaging and contrast dyes, shows significantly better performance in COVID-19 diagnosis than simple 2D CXR [9].
Meanwhile, CT scans of COVID-19 infected patients show diverse features, and manual interpretation of these scans with their subtle variations is quite challenging [13]. Moreover, the current enormous upsurge of infected patients makes it difficult for domain experts to complete a timely diagnosis [14, 15]. Therefore, Computer-Aided Diagnostic (CAD) systems are required to better process and understand CCT images.

In CAD system design using CT scans, machine learning in the form of deep learning has been applied successfully to diagnosing lung diseases [16-19]. To diagnose COVID-19 using CCT images, several machine learning-based methods with varying sources and amounts of training data have also been proposed in the literature (for details, please see "Related Work" in the next section). Most of these previously proposed methods are based on the Convolutional Neural Network (CNN) deep learning approach [20]. However, for deep learning approaches to perform effectively, a healthy amount of data is usually required for parameter tuning, and such data is not readily available so far. Moreover, all the published studies focused only on diagnosing COVID-19, whereas its severity prediction still requires an efficient method. Therefore, to address the issues of data scarcity in deep learning and the absence of an efficient method for COVID-19 severity prediction, in this article we propose a method called COVIDC (COVID-19 detection using CT scans) to diagnose COVID-19 infection and predict its severity from CCT images. Our proposed method differs from previously proposed methods for COVID-19 diagnosis and severity prediction in that it uses a combination of transfer and shallow learning.
Using this method, we can extract an efficient feature representation of CT scans through transfer learning with off-the-shelf models pre-trained on ImageNet and still avoid the curse of dimensionality by engaging shallow learning algorithms such as Support Vector Machines (SVMs) [21]. This has led to improved accuracy in comparison to methods that rely only on the deep learning paradigm.

Since the identification of COVID-19 at the end of 2019, several methods have been proposed in the literature to diagnose COVID-19 with chest computed tomography (CT) by using artificial intelligence (AI) techniques. We searched for existing techniques in the online literature using different keywords such as "Machine learning and COVID-19", "COVID-19 diagnosis with CT scans", and "COVID-19 diagnosis with CT scans and machine learning". While going through the existing peer-reviewed literature published in reputed journals, we found a plethora of machine learning techniques to diagnose COVID-19 using chest CT scans with varying sources and amounts of training data [22-37]. All these previously published techniques can be categorized into three main classes: deep learning-based methods, transfer learning with fine-tuning of a customized fully connected layer, and shallow learning with handcrafted texture features.

Previously published studies employing deep learning-based techniques to detect COVID-19 mostly used Convolutional Neural Network (CNN) [20] based architectures in their proposed design [23-25, 29, 31-37]. However, deep learning approaches normally require an enormous amount of data to generalize well, and such data is not readily available right now [22]. Because of data scarcity issues in CNN-based deep learning, some studies using transfer learning have also been proposed in the literature [22, 28, 30].
Ahuja

These previously published methods to detect COVID-19 using transfer learning still have room for improvement because, in transfer learning, fine-tuning the customized fully connected layer requires a healthy amount of data owing to the high-dimensional feature space generated by the off-the-shelf pre-trained models. To solve data scarcity issues in deep learning, Kang et al. proposed a shallow learning technique for COVID-19 classification using different types of handcrafted features extracted from CT images [27]. They used 2522 CT images (1495 from COVID-19 patients and 1027 from community-acquired pneumonia) for classification. The proposed method achieved an overall accuracy of 95.5%. However, even after learning multiple representations of CT scans, the method proposed by Kang et al. still has room for performance improvement because it is often hard to represent subtle variations in CT scans using handcrafted texture descriptors [39]. Moreover, almost all the above studies were proposed only to diagnose COVID-19; efficient severity detection is still an open research question. Furthermore, to the best of our knowledge, all of the above-discussed studies only discussed the technical details of their proposed methodology without providing any easily accessible interface to end-users.

To overcome these shortcomings and to deal with the issues of data scarcity in deep/transfer learning and of handcrafted texture features, in this study we have proposed a novel COVID-19 diagnosis and severity predictor called COVIDC (COVID-19 detection using CT scans) that employs a combination of transfer and shallow learning. In the following sections, we give the details of the experimental procedure used to design COVIDC and to test its generalization performance.
In this study, we have collected three different datasets of Chest Computed Tomography (CCT) images from publicly open online repositories [40-42]. For COVID-19 diagnosis, we have used two datasets: i) a training set (1433 COVID and 1229 non-COVID CT images) [41], and ii) an external validation set (349 COVID and 397 non-COVID CT images) [42]. For COVID-19 severity prediction, we have used a dataset of 141 CT images of severe COVID-19 patients and 135 CT images of less-severe patients [40]. All the images in these datasets have gone through the essential preprocessing required for feature extraction, including image resizing (313 × 313 pixels), denoising, and contrast stretching [43].

The methodology we have adopted for this study is shown in Fig. 1. We give the details of our proposed methodology to design and develop a CT image-based COVID-19 diagnosis and severity predictor using machine learning as follows.

Deep learning, in which an efficient feature representation of an image is learned automatically, is widely used and has been applied successfully in different image classification and analysis tasks. However, to apply deep learning efficiently, we normally require an extensive amount of data to better learn the actual distribution of the image samples. This data hunger makes it challenging to apply deep learning efficiently in our case, diagnosing COVID-19 and predicting its severity, where we face data scarcity. Therefore, to overcome this challenge, we have used the transfer learning strategy for feature extraction. Feature extraction from the available CT images in our datasets has been performed using different off-the-shelf CNN-based models pre-trained on ImageNet, namely ResNet50 [44], InceptionV3 [45], Xception [46], VGG16 [47], NASNetLarge [48], and DenseNet121 [49].
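The feature-extraction step described here can be sketched in Keras as follows. This is a minimal illustration, not the authors' released code: the choice of ResNet50, the 224 × 224 input size, and global average pooling (`pooling="avg"`) are assumptions made for the example, and the helper names `build_extractor` and `extract_features` are hypothetical. TensorFlow is imported lazily inside `build_extractor`, so `extract_features` works with any object that exposes a `predict` method.

```python
def build_extractor(input_size=224):
    """Return (model, preprocess_fn) for an ImageNet-pretrained ResNet50 backbone.

    Assumption: ResNet50 with global average pooling stands in for any of the
    backbones named in the text; the pooling choice is not stated in the paper.
    """
    # Imported lazily so the rest of this sketch does not require TensorFlow.
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.applications.resnet50 import preprocess_input

    model = ResNet50(
        weights="imagenet",   # off-the-shelf ImageNet weights
        include_top=False,    # drop the fully connected classifier part
        pooling="avg",        # assumption: one 2048-dimensional vector per image
        input_shape=(input_size, input_size, 3),
    )
    return model, preprocess_input


def extract_features(model, preprocess, images):
    """Run a preprocessed image batch through the frozen backbone.

    `images` is a batch shaped (n, height, width, 3). The return value is
    whatever `model.predict` yields, an (n, 2048) array for the model above.
    """
    return model.predict(preprocess(images))
```

With TensorFlow installed, `model, prep = build_extractor()` followed by `extract_features(model, prep, batch)` would give one fixed-length vector per CT image, which can then be passed to a shallow classifier.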
We have used these models for feature extraction by simply loading a pre-trained model without its classifier part, that is, by setting the "include_top" argument to "False". These pre-trained CNN-based models were selected on the basis of their reported accuracy. The preprocessing and resizing required by each pre-trained model have been applied before extracting the feature maps.

Most of the machine learning studies on COVID-19 diagnosis have used pure deep learning strategies or transfer learning with a customized fully connected layer to be fine-tuned. However, training deep neural networks from scratch, or fine-tuning the parameters of customized fully connected layers on top of a pre-trained model, needs an extensive amount of data and high-performance computational resources, which is mostly impractical [50]. Therefore, we have proposed a different machine learning-based approach for the preliminary diagnosis of COVID-19 and its severity prediction using digital chest CT images. The novelty of the proposed approach is that it uses a blend of pre-trained CNN-based models to extract features and shallow learning algorithms such as SVMs for classification. The proposed scheme is loosely based on the paradigm of transfer learning. The shallow learning algorithms used are Random Forest (RF), Support Vector Machines (SVMs), and Gradient Boosting Machine (XGBoost) [51-53]. The detail of these shallow machine learning algorithms is as follows.

We used SVM for the diagnosis of COVID-19 and its severity prediction by learning a function $f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle$ with $\mathbf{w}$ as parameters to be learned from the training data $\{(\mathbf{x}_i, y_i) \mid i = 1, 2, \ldots, N\}$. The optimal value of $\mathbf{w}$ is obtained in SVM by solving the following optimization problem [52], written here in its standard soft-margin form:

$$\min_{\mathbf{w},\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, N.$$

Here $\xi_i$ are slack variables that allow margin violations, and $C$ is a hyperparameter that controls the trade-off between margin maximization and margin violations.

Random Forest Classification (RFC) uses an ensemble learning technique based on bagging. A random forest operates by constructing several decision trees in parallel during training and outputs the majority vote (mode) of the classes predicted by the individual trees [51].
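To make the bagging idea behind random forests concrete, the sketch below draws a bootstrap sample and aggregates per-tree class predictions; for classification, this aggregation is typically a majority vote. It is a toy illustration using only the standard library; the function names and the example votes are hypothetical and are not taken from the COVIDC code.

```python
import random
from collections import Counter


def bootstrap_sample(items, rng):
    """Sample len(items) elements with replacement, as bagging does per tree."""
    return [rng.choice(items) for _ in items]


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties: first seen wins)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed for repeatability
training_ids = list(range(10))    # stand-in for indices of training images
sample = bootstrap_sample(training_ids, rng)

# Hypothetical outputs of five trees for one CT image.
votes = ["COVID", "non-COVID", "COVID", "COVID", "non-COVID"]
prediction = majority_vote(votes)  # three of the five votes are "COVID"
```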
It usually performs better on problems having features with non-linear relationships. Each classification tree in the RF is constructed on randomly sampled subsets of input features. In this study, we have optimized RF for the number of decision trees in the forest, the maximum number of features considered for splitting a node, the maximum number of levels in each decision tree, and the minimum number of samples required to split a node. This classification technique has been applied effectively in many other studies [55-58].

XGBoost is also an ensemble learning technique; it is based on boosting, which combines weak learners into a strong learner in an iterative fashion [53, 59]. We have used trees as the default base learners in XGBoost ensembles. In this study, we have optimized XGBoost in terms of the learning rate, maximum depth, number of boosting iterations, booster, and subsample ratio with a grid search, using the Python-based package xgboost 0.7 [59].

We have developed and deployed a webserver of COVIDC that uses the optimal machine learning model for COVID-19 diagnosis and severity prediction. This webserver takes a chest CT image and performs COVID-19 diagnosis and severity prediction for it. After the successful submission of a CT image, users get the COVIDC predictions in the form of COVID-19 identification and its severity prediction. This process is broken into two steps: 1) whether the uploaded image represents a COVID-19 patient or not (COVID vs non-COVID), and 2) if it is a COVID-19 infection, how severe it is. The webserver is available at https://sites.google.com/view/wajidarshad/software.

We have used two different datasets for COVID-19 diagnosis: a train-test set (1433 COVID and 1229 non-COVID CT images) and an external validation set (349 COVID and 397 non-COVID CT images). The performance of COVIDC through 10-fold cross-validation and also on the external validation dataset is given as follows.
We have trained various shallow machine learning models for the classification of COVID vs non-COVID CT images. Along with COVID-19 diagnosis using CT images, we have also tried to predict the severity of COVID-19.

The real generalization performance of our optimal trained models (COVIDC) has been evaluated under the supervision of an experienced radiologist at Abbas Institute of Medical Sciences (AIMS) hospital located in Muzaffarabad, Azad Jammu & Kashmir, Pakistan. For this purpose, 20 CT scans for COVID-19 diagnosis (10 COVID and 10 non-COVID) and 20 CT scans for severity (10 severe and 10 mild) have been used. These scans were not included in the training set and were given to COVIDC as novel inputs in raw digital form. Results obtained through this evaluation are shown as confusion matrices in Fig. 3.

For COVID vs non-COVID, our proposed system (COVIDC) correctly classified 9 out of 10 provided COVID CT images as COVID and misclassified 1 as non-COVID (Fig. 3(A)). Similarly, for the provided non-COVID CT images, our system correctly classified 8 out of 10 images as non-COVID and misclassified 2 as COVID (Fig. 3(A)). This performance is reasonably good because the misclassification rate for COVID scans is low, which is what is needed to isolate infected patients. These results suggest that this system can reliably be used to advise isolation or quarantine when COVID-19 infection is suspected.

For the severe vs mild COVID-19 classification task, our proposed system (COVIDC) correctly classified 10 out of 10 provided severe COVID CT images as severe (Fig. 3(B)). Similarly, for the provided mild COVID CT images, our system correctly classified 9 out of 10 images as mild and misclassified 1 as severe (Fig. 3(B)). This performance in severity prediction shows that the system can reliably be used to advise intensive care during COVID-19 infection. Overall, these results justify the use of our proposed system in real settings.
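The counts reported above can be turned into standard summary metrics. The sketch below does this for the two confusion matrices described in the text (diagnosis: 9 true positives, 1 false negative, 8 true negatives, 2 false positives; severity, with "severe" as the positive class: 10, 0, 9, 1). The function name is illustrative, and the derived values are simple arithmetic on those counts rather than figures reported by the authors.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute common metrics from the four cells of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),             # recall on the positive class
        "specificity": tn / (tn + fp),             # recall on the negative class
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# COVID vs non-COVID: 9/10 COVID scans and 8/10 non-COVID scans correct.
diagnosis = binary_metrics(tp=9, fn=1, tn=8, fp=2)

# Severe vs mild: 10/10 severe scans and 9/10 mild scans correct.
severity = binary_metrics(tp=10, fn=0, tn=9, fp=1)
```

On these 20 scans per task, the sketch gives a sensitivity of 0.90 and a specificity of 0.80 for diagnosis, and a sensitivity of 1.00 and a specificity of 0.90 for severity.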
In the absence or inadequate provision of specific drugs or vaccines for the treatment or prevention of COVID-19, its quick and automatic diagnosis and severity prediction are crucial. After [22, 27] and in a real setting suggests that the proposed method can effectively be used to diagnose COVID-19 and to predict its severity.

In this study, we have proposed a system called COVIDC for the preliminary diagnosis and severity prediction of COVID-19 infection using chest CT scans. The stringent performance evaluation through 10-fold CV, on an external validation dataset, and in a real setting shows that our proposed system can effectively be used not only to diagnose COVID-19 infection but also to predict its severity. The use of our proposed system can help to reduce the surge of COVID-19 by advising timely isolation and further diagnostic tests such as RT-PCR for suspected patients. This system can also aid in avoiding massive casualties by supporting decisions on intensive care in cases of severe COVID-19 infection. The key findings of the study are listed as follows:
• Our proposed system performed significantly better than the existing state-of-the-art systems under the rigorous evaluation criteria adopted, even in the presence of substantial variations in the CT images used.
• Our proposed system not only diagnoses COVID-19 but also predicts its severity.
• We have made our proposed system accessible through a publicly open cloud-based webserver and open-source code.

diagnosis and its severity prediction system (COVIDC) using machine learning and chest CT images. This system has been trained using shallow learning algorithms such as SVMs with chest CT scans by extracting feature maps involving pre-trained off-the-shelf models. COVIDC can be used to predict whether a novel test CT image has COVID-19 or not.
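The two-phase decision flow summarized above (COVID vs non-COVID first, severity only for scans flagged as COVID) can be sketched as a small cascade. The classifier arguments are placeholders for any trained models, and the threshold-based stand-ins at the bottom are hypothetical, included only so that the example runs on its own; they say nothing about how COVIDC actually scores a scan.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], bool]


def two_phase_predict(
    features: Features,
    is_covid: Classifier,
    is_severe: Classifier,
) -> Tuple[str, Optional[str]]:
    """Phase 1 decides COVID vs non-COVID; phase 2 runs only on COVID cases."""
    if not is_covid(features):
        return "non-COVID", None  # severity is not assessed for non-COVID scans
    severity = "severe" if is_severe(features) else "mild"
    return "COVID", severity


# Hypothetical stand-in classifiers operating on a two-number feature vector.
def toy_is_covid(features: Features) -> bool:
    return features[0] > 0.5


def toy_is_severe(features: Features) -> bool:
    return features[1] > 0.5


scans = [(0.2, 0.9), (0.8, 0.1), (0.9, 0.7)]
results = [two_phase_predict(f, toy_is_covid, toy_is_severe) for f in scans]
```

Keeping the two classifiers separate means each phase can be trained on its own dataset, which matches the use of different datasets for diagnosis and for severity in this study.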
[1] General's remarks at the media briefing on COVID-19 2020
[2] An interactive web-based dashboard to track COVID-19 in real time
[3] First case of Coronavirus Disease 2019 (COVID-19) pneumonia in Taiwan
[4] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[5] COVID-19 Outbreak: Application of Multi-gene Genetic Programming to Country-based Prediction Models
[6] Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19 disease
[7] Coronavirus Disease 2019 Testing Basics
[8] Diagnostic techniques for COVID-19 and new developments
[9] Diagnostic accuracy of X-ray versus CT in COVID-19: a propensity-matched database study
[10] Can AI help in screening Viral and COVID-19 pneumonia?
[11] Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases
[12] Chest CT for Typical Coronavirus Disease 2019 (COVID-19) Pneumonia: Relationship to Negative RT-PCR Testing
[13] Chest CT in COVID-19: What the Radiologist Needs to Know
[14] Automated Methods for Detection and Classification Pneumonia based on X-Ray Images Using Deep Learning
[15] Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble
[16] Unravelling Machine Learning - Insights in Respiratory Medicine
[17] Imaging research in fibrotic lung disease; applying deep learning to unsolved problems
[18] Deep learning for classifying fibrotic lung disease on high-resolution computed tomography: a case-cohort study
[19] Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation
[20] Object Recognition with Gradient-Based Learning
[21] Implications of the curse of dimensionality for supervised learning classifier systems: theoretical and empirical analyses
[22] Deep transfer learning-based automated detection of COVID-19 from lung CT scan slices
[23] Artificial Intelligence Augmentation of Radiologist Performance in Distinguishing COVID-19 from Pneumonia of Other Origin at Chest CT
[24] Accurate Screening of COVID-19 Using Attention-Based Deep 3D Multiple Instance Learning
[25] Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets
[26] Deep learning analysis provides accurate COVID-19 diagnosis on chest computed tomography
[27] Diagnosis of Coronavirus Disease 2019 (COVID-19) With Structured Latent Multi-View Representation Learning
[28] COVID-19 Pneumonia Diagnosis Using a Simple 2D Deep Learning Framework With a Single Chest CT Image: Model Development and Validation
[29] Identifying COVID19 from Chest CT Images: A Deep Convolutional Neural Networks Based Approach
[30] Deep Transfer Learning Based Classification Model for COVID-19 Disease
[31] Lung-Sys: A Deep Learning System for Multi-Class Lung Pneumonia Screening From CT Imaging
[32] Diagnosis of COVID-19 using CT scan images and deep learning techniques
[33] COVID-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis
[34] Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks
[35] A Fully Automatic Deep Learning System for COVID-19 Diagnostic and Prognostic Analysis
[36] A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization From Chest CT
[37] A Deep Learning System to Screen Novel Coronavirus Disease
[38] Uzun Ozsahin D. Review on Diagnosis of COVID-19 from Chest CT Images Using Artificial Intelligence
[39] Comparison of Handcrafted Features and Deep Learning in Classification of Medical X-ray Images
[40] COVID-19 Image Data Collection: Prospective Predictions Are the Future
[41] SARS-CoV-2 CT-scan dataset: A large dataset of real patients CT scans for SARS-CoV-2 identification
[42] A CT Scan Dataset about COVID-19
[43] Analysis of quantum noise-reducing filters on chest X-ray images: A review
[44] Deep Residual Learning for Image Recognition
[45] Rethinking the Inception Architecture for Computer Vision
[46] Deep Learning with Depthwise Separable Convolutions
[47] Very Deep Convolutional Networks for Large-Scale Image Recognition
[48] Learning Transferable Architectures for Scalable Image Recognition
[49] Densely Connected Convolutional Networks. arXiv:1608.06993 [cs]
[50] An efficient mixture of deep and machine learning models for COVID-19 diagnosis in chest X-ray images
[51] Random Forests
[52] Support-Vector Networks
[53] Greedy function approximation: A gradient boosting machine
[54] Scikit-learn: Machine Learning in Python
[55] ISLAND: in-silico proteins binding affinity prediction using sequence information
[56] A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking
[57] Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study
[58] Protein-protein binding affinity prediction on a diverse set of structures

The authors would like to acknowledge the services of all those who have provided CT scans data as open access repositories.
We also thank the reviewers and the editor for their valuable feedback and suggestions to improve the presentation of this work.

All data generated or analyzed during this study are included in this paper or available at online repositories. A Python implementation of the proposed method together with a webserver is available at https://sites.google.com/view/wajidarshad/software and https://github.com/wajidarshad/covidc.

WAA conceived the idea, developed the scientific workflow, performed the experiments, analyzed and interpreted the results, and was a major contributor in manuscript writing. SAA

The authors declare that they have no competing interests. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.