key: cord-0965880-1xq2lk41
authors: Zargari Khuzani, Abolfazl; Heidari, Morteza; Shariati, S. Ali
title: COVID-Classifier: an automated machine learning model to assist in the diagnosis of COVID-19 infection in chest X-ray images
date: 2021-05-10
journal: Sci Rep
DOI: 10.1038/s41598-021-88807-2
sha: d3567188bce7ff7bd60c82036aac2faf80ebef77
doc_id: 965880
cord_uid: 1xq2lk41

Chest X-ray (CXR) radiography can be used as a first-line triage process for non-COVID-19 patients with pneumonia. However, the similarity between features of CXR images of COVID-19 and pneumonia caused by other infections makes the differential diagnosis by radiologists challenging. We hypothesized that machine learning-based classifiers can reliably distinguish the CXR images of COVID-19 patients from other forms of pneumonia. We used a dimensionality reduction method to generate a set of optimal features of CXR images to build an efficient machine learning classifier that can distinguish COVID-19 cases from non-COVID-19 cases with high accuracy and sensitivity. By using global features of the whole CXR images, we successfully implemented our classifier using a relatively small dataset of CXR images. We propose that our COVID-Classifier can be used in conjunction with other tests for optimal allocation of hospital resources by rapid triage of non-COVID-19 cases.

www.nature.com/scientificreports/ non-COVID-19 cases. A distinct feature of our model is the identification and extraction of features from the whole CXR image without any segmentation process on chest lesions. This new quantitative marker not only enables us to avoid segmentation errors but also reduces the computational cost of our final model. Our study provides strong proof of concept that simple ML-based classification can be efficiently implemented as an adjunct to other tests to facilitate differential diagnosis of CXR images of COVID-19 patients. More broadly, we think that our approach can be easily implemented in any future viral outbreak for the rapid classification of CXR images.

Generation of synthetic features. Identification of optimal features of the CXR images can decrease the feature space of ML models by generating key correlated synthetic features and removing less important features. These synthetic features perform more reliably in classification tasks while reducing the size of the ML models. Importantly, a more robust ML classifier can be generated by decreasing the ratio between the number of image features and the number of training data cases per class. We initially extracted 252 features from the whole CXR image without involving lesion segmentation (Fig. 1A and Supplementary Figure 1) to generate a feature pool from 420 CXR images (Fig. 1B). We hypothesized that a feature analysis scheme could find an optimal number of features and reduce the size of the feature space. Figure 1C shows the pairwise feature association as a matrix of Pearson correlation coefficients obtained from the 252 features. An analysis of the histogram of the initial feature pool reveals that more than 73% of features have correlation coefficients of less than 0.4 (Fig. 1D), confirming a comprehensive view of the cases with relatively small redundancy. We used the Kernel Principal Component Analysis (Kernel PCA) method to decrease the size of the feature space to an optimal number of synthetic features composed of correlated features. By employing Kernel PCA, we converted the original pool of 252 features to 64 new synthetic features, resulting in a ~4× smaller feature space. We used this 64-element feature vector in the final classification process.
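The feature analysis described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the RBF kernel, the default `gamma`, the use of absolute correlation values, the function names, and the random stand-in feature pool are all assumptions made for illustration, since the text does not specify the kernel or its settings.

```python
import numpy as np


def fraction_weakly_correlated(features, threshold=0.4):
    """Share of distinct feature pairs whose |Pearson r| is below `threshold`."""
    corr = np.corrcoef(features, rowvar=False)   # (n_features, n_features)
    upper = np.triu_indices_from(corr, k=1)      # count each pair once
    return float(np.mean(np.abs(corr[upper]) < threshold))


def kernel_pca(features, n_components=64, gamma=None):
    """Project the rows of `features` onto the leading kernel principal components.

    Assumption: an RBF kernel with gamma = 1 / n_features on standardized features.
    """
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
    if gamma is None:
        gamma = 1.0 / x.shape[1]
    # Pairwise squared Euclidean distances between cases.
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    kernel = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix in feature space.
    n = kernel.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    centered = centering @ kernel @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of the training cases: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Random stand-in for the 420 x 252 feature pool (not real CXR features).
rng = np.random.default_rng(0)
pool = rng.normal(size=(420, 252))
weak_fraction = fraction_weakly_correlated(pool)
synthetic = kernel_pca(pool, n_components=64)
```

With these shapes, `synthetic` has one 64-element row per image, which is the form of input the classifier in the next section would consume.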

Classification performance. To design our classifier, we grouped our CXR images into three target classes, each containing 140 images: normal, COVID-19, and non-COVID-19 pneumonia (Supplementary Figure 2). We trained a multi-layer neural network, including one output classifier layer and two hidden layers, to classify CXR images into the three target groups (Fig. 2). After 33 epochs of the training process, both training and validation loss scores reached ~0.22, corresponding to an accuracy of 94% (Fig. 3A). The loss graph showed a good fit between validation and training curves, confirming that our model is not suffering from overfitting or underfitting. We would like to note that our model has ~10,000 parameters, considerably fewer than typical image classification models such as AlexNet with 60 million parameters 16, VGG-16 with 138 million 17, GoogleNet-V1 with 5 million 18, and ResNet-50 with 25 million parameters 19. Next, we generated a receiver operating characteristic (ROC) curve and computed the area under the ROC (AUC) to further evaluate the performance of our model (Fig. 3B). A comparison of CXR images of COVID-19 cases with non-COVID-19 cases showed that our model has 100% sensitivity and 96% precision when evaluated on a test set of 84 CXR images (Fig. 3C and Table 1). Moreover, our synthetic feature classifier outperforms any single feature classifier as measured by AUC (Fig. 3D). It is noteworthy that single synthetic features, used as a primary fast and low computational cost classifier, can reach an accuracy of up to ~90% (Supplementary Figure 3).
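To make the model size and the reported metrics concrete, the following sketch counts the parameters of a small fully connected network and computes sensitivity and precision from confusion counts. The hidden-layer widths (128 and 16) and the confusion counts (28 COVID-19 test images, one false positive) are hypothetical values chosen only to be consistent with the figures quoted above; the text does not state them.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


def sensitivity_and_precision(true_positive, false_negative, false_positive):
    """Sensitivity (recall) and precision for one positive class."""
    sensitivity = true_positive / (true_positive + false_negative)
    precision = true_positive / (true_positive + false_positive)
    return sensitivity, precision


# Hypothetical widths: 64 synthetic features -> 128 -> 16 -> 3 classes.
total_parameters = dense_parameter_count([64, 128, 16, 3])

# Hypothetical counts for the COVID-19 class in an 84-image test set,
# assuming the three classes are equally represented (28 images each).
sensitivity, precision = sensitivity_and_precision(
    true_positive=28, false_negative=0, false_positive=1
)
```

Under these assumed widths the network has on the order of ten thousand trainable parameters, several orders of magnitude fewer than the convolutional models listed above.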

In this study, we proposed an efficient machine-learning classifier that accurately distinguished COVID-19 CXR images from normal cases and pneumonia caused by other viruses. Among different imaging modalities [20] [21] [22], X-ray is still the fastest and most prevalent screening tool for detecting lung diseases and infections. However, there are some suspicious lung infection masses in X-ray images, which may result in misdiagnosis. Thus, a new approach to assist in automated lung screening analysis and facilitate the classification of different types of lung diseases is crucial. Our work shows that this is possible with relatively straightforward machine learning classifiers. Our proposed machine learning approach has the following distinctive characteristics: First, by deriving the global image features from the entire chest area, we avoided the lesion segmentation complexities and errors. In addition, we confirmed that the diagnostic information can be distributed on the entire chest area of the X-ray image, not only in the lesion area.

Second, in the feature extraction scheme, we focused on features obtained from both the spatial domain (Texture, GLDM, GLCM) and the frequency domain (Wavelet and FFT), unlike many previous machine learning models that analyze only texture-based features in the spatial domain. In addition, using the two-class classification results shown in Supplementary Figure 3 (second row), we showed that when the aim is to distinguish COVID-19 cases from the other categories, features obtained from the frequency domain (FFT group) have greater discrimination power than features extracted from the spatial domain. The average AUC of the FFT group is around 0.71, showing the significance of acquiring such frequency domain features compared to other groups with an average AUC value of less than 0.63. Furthermore, the examination of every single feature in this experiment revealed that all top seven features belonged to the FFT category with an AUC value higher than or equal to 0.77, which may indicate that those frequency domain features were more relevant to the detection of COVID-19 cases.
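The single-feature comparison described above can be sketched by treating each feature value directly as a classification score and computing its AUC through the Mann-Whitney statistic. The feature names and toy values below are hypothetical, and taking max(AUC, 1 − AUC) so that a feature counts regardless of its direction is an assumption rather than something the text states.

```python
def single_feature_auc(values, labels):
    """AUC of one feature used as a score (labels: 1 = positive, 0 = negative)."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


def rank_features(feature_columns, labels):
    """Sort features by direction-agnostic AUC, highest first."""
    scored = []
    for name, column in feature_columns.items():
        auc = single_feature_auc(column, labels)
        scored.append((name, max(auc, 1.0 - auc)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: 1 marks the positive class (for example COVID-19 versus the rest).
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "fft_energy": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],      # hypothetical feature
    "glcm_contrast": [0.3, 0.6, 0.5, 0.4, 0.7, 0.2],   # hypothetical feature
}
ranking = rank_features(columns, labels)
```

Averaging these per-feature values within each feature group would give a group-level comparison of the kind reported for the FFT group.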

Third, we investigated the influences of applying a dimensionality reduction method to obtain optimal and more correlated features. Interestingly, the results demonstrated that our dimensionality reduction method, in addition to reducing the dimension of feature space, is able to identify the new smaller feature fusion with more correlated information and a lower amount of redundancy. Besides, decreasing the ratio of the number of features to the number of cases per class will improve the reliability and robustness of the ML classifier while decreasing the risk of overfitting. Therefore, we could successfully classify CXR images using a relatively small image dataset of 420 cases. Typically, this is not possible with conventional deep learning models as they need a large dataset.

Although we obtained promising results, there are a few limitations in this study. First, our CXR dataset has a relatively small size. A larger dataset consisting of cases from different institutions would be useful to further verify our proposed model's robustness and reliability. Also, in our future work, we will investigate different feature selection and feature reduction methods such as DNE 23, Relief 24, LPP 5, Fast-ICA 25, recursive feature elimination 26, and variable ranking techniques 27, or merging them with our feature reduction approach. Besides, although the neural network-based classifier utilized in this investigation can solve our complicated problem efficiently, it might be useful to explore other efficient and prevalent classifiers such as SVM 28, GLM 29, and Random Forest 30.

Dataset and code (GitHub page). Our Python scripts and dataset are available for download on our GitHub page https://github.com/abzargar/COVID-Classifier.git.

This resource is fully open-source, providing users with the Python code used in preparing image datasets, feature extraction, feature evaluation, training the ML model, and evaluation of the trained ML model. We used a dataset collected from two sources 31,32. Our collected dataset included 420 2-D X-ray images in the posteroanterior (P.A.) chest view, classified by valid tests into three predefined categories: Normal (140 images), pneumonia (140 images), and COVID-19 (140 images). We set all image sizes to 512 × 512 pixels. Supplementary Figure 2 shows three example images.

In the scheme that we employed in the feature extraction part (Fig. 1A and Supplementary Figure 1), a total of 252 spatial and frequency-domain features were computed and categorized into five groups: (1) Texture 33, (2) Gray Level Difference Method (GLDM) 11, (3) Gray-Level Co-Occurrence Matrix (GLCM) 34, (4) Fast Fourier Transform (FFT) 35, and (5) Wavelet Transforms (WT) 36. Wavelet transforms were decomposed into eight sub-bands. GLDM and GLCM coefficients were also computed in four directions. As illustrated in Supplementary Figure 1 Figure 3A shows the AUC values of single features based on their AUC values in sorted order (highest to lowest) and considering three positive class labels. We used the AUC value as an index to compare the classification power of every single feature. As seen in all three AUC graphs, most of the features reported AUC values of higher than 0.6, where Figure 3B also compares the performance of five groups of features based on their average AUC values showing there is no significant difference between them, particularly where the positive label is pneumonia. Given COVID is the target class, the FFT group recorded the best performance, while the best group for the Normal class is GLDM.
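As an illustration of one of the five feature groups, the sketch below computes a gray-level co-occurrence matrix in four directions over a whole image and derives a few common statistics from it. The quantization to 8 gray levels, the one-pixel distance, the symmetric normalization, and the choice of contrast, energy, and homogeneity are assumptions; the text does not list which GLCM statistics were used, and the random image is only a stand-in for a 512 × 512 CXR.

```python
import numpy as np

# (row, column) pixel offsets for the 0, 45, 90 and 135 degree directions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, offset, levels=8):
    """Normalized symmetric co-occurrence matrix of an 8-bit grayscale image."""
    quantized = (image.astype(np.float64) / 256.0 * levels).astype(np.intp)
    quantized = np.clip(quantized, 0, levels - 1)
    d_row, d_col = offset
    rows, cols = quantized.shape
    # Restrict to reference pixels whose neighbor lies inside the image.
    r0, r1 = max(0, -d_row), min(rows, rows - d_row)
    c0, c1 = max(0, -d_col), min(cols, cols - d_col)
    reference = quantized[r0:r1, c0:c1]
    neighbor = quantized[r0 + d_row:r1 + d_row, c0 + d_col:c1 + d_col]
    matrix = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(matrix, (reference.ravel(), neighbor.ravel()), 1.0)
    matrix += matrix.T                       # count each pair in both orders
    return matrix / matrix.sum()


def glcm_features(image, levels=8):
    """Contrast, energy and homogeneity for each of the four directions."""
    i, j = np.indices((levels, levels))
    features = {}
    for angle, offset in OFFSETS.items():
        p = glcm(image, offset, levels)
        features[f"contrast_{angle}"] = float(np.sum(p * (i - j) ** 2))
        features[f"energy_{angle}"] = float(np.sum(p ** 2))
        features[f"homogeneity_{angle}"] = float(np.sum(p / (1.0 + (i - j) ** 2)))
    return features


# Random stand-in for a 512 x 512 grayscale image (not a real CXR).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)
features = glcm_features(image)
```

Because the statistics are computed over the entire image, no lesion mask is needed, which mirrors the segmentation-free design described in the text.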

Model training and test process. A schematic diagram of our model training and test processes is shown in Supplementary Figure 4. We randomly split the original image dataset into a training set (80%) and a test set (20%). The train-test split is a technique used to evaluate supervised machine learning algorithms' performance where we have the inputs and desired output labels. The machine-learning algorithm uses the training set to make the model learn the patterns in the input by minimizing the error between predictions and target outputs. The test set is then used to evaluate the trained model's performance. Without providing a large enough training dataset, the model cannot generalize the knowledge from the training set to the test set, leading to low predictive accuracy in the test phase for unseen cases, as shown in Supplementary Figure 5.
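The random 80/20 split can be sketched with the standard library alone. The function name and the fixed seed are illustrative choices, not taken from the authors' scripts.

```python
import random


def train_test_split_indices(n_cases, test_ratio=0.2, seed=0):
    """Shuffle case indices and return (train_indices, test_indices)."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)     # seeded for reproducibility
    n_test = round(n_cases * test_ratio)
    return indices[n_test:], indices[:n_test]


# 420 images split into a training part and a held-out test part.
train_idx, test_idx = train_test_split_indices(420)
```

For 420 cases this leaves 336 images for training and 84 for testing, which matches the test-set size quoted in the results.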

We chose the Adam optimizer to optimize model weights and minimize the categorical cross-entropy loss function. The learning algorithm hyperparameters were set as follows: MaxEpochs = 100, BatchSize = 2, LearningRate = 0.001, ValidationRatio = 0.2, TestRatio = 0.2, TrainRatio = 0.6, and DropoutValue = 0.2. We also used the Early Stopping technique to stop training when the validation score stops improving, to keep the learning algorithm from overfitting. The run-time of different parts of our proposed machine learning scheme, listed in Table 2, indicates that our model needed a short time of 15.4 s to learn the training set and 2.03 s to predict one test sample.

Table 2. Run-time analysis on the local system with an Intel Core i7-8750H 2.2 GHz CPU and an RTX2080 Max-Q GPU.

One single predict phase Feature extraction (Fig. 1A) Feature reduction (Fig. 1B) Classifier (Fig. 2 
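The hyperparameters listed in this section and the early-stopping rule can be summarized in a short, framework-independent sketch. The dictionary keys, the `patience` value, and the example loss curve are hypothetical; the text states only that training stops once the validation score stops improving.

```python
# Hyperparameter values as listed in the text; the key names are illustrative.
HYPERPARAMETERS = {
    "max_epochs": 100,
    "batch_size": 2,
    "learning_rate": 0.001,
    "validation_ratio": 0.2,
    "test_ratio": 0.2,
    "train_ratio": 0.6,
    "dropout": 0.2,
}


def early_stopping_epoch(validation_losses, patience=3, max_epochs=100):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without a new best
    validation loss, or when the losses or `max_epochs` run out.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(validation_losses), max_epochs)


# Hypothetical validation-loss curve that flattens out after epoch 6.
losses = [1.0, 0.6, 0.4, 0.3, 0.25, 0.22, 0.23, 0.24, 0.22, 0.25, 0.26, 0.3]
stop_epoch = early_stopping_epoch(
    losses, patience=3, max_epochs=HYPERPARAMETERS["max_epochs"]
)
```

In a deep learning framework the same rule is usually provided as a built-in callback, so a hand-written loop like this one is mainly useful for seeing how the stopping epoch is decided.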

Deep learning COVID-19 features on CXR using limited training data sets

Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound

A Fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis

World Health Organization. Chest Radiography in Tuberculosis Detection. (World Health Organization

CT imaging and differential diagnosis of COVID-19

Frequency and distribution of chest radiographic findings in COVID-19 positive patients

Classification of tumor epithelium and stroma by exploiting image features learned by deep convolutional neural networks

Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

Development and assessment of a new global mammographic image feature analysis scheme to predict likelihood of malignant cases

Transfer learning improves supervised image segmentation across imaging protocols

Prediction of chemotherapy response in ovarian cancer patients using a new clustered quantitative image marker

Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database (Oxford) 2020, baaa010

Artificial intelligence and machine learning in clinical development: A translational perspective

High-order feature learning for multi-atlas based label fusion: Application to brain segmentation with MRI

Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography

ImageNet classification with deep convolutional neural networks

Very Deep Convolutional Networks for Large-Scale Image Recognition

Going Deeper with Convolutions

Deep Residual Learning for Image Recognition

Molecular imaging of pulmonary diseases

X-Ray chest image classification by a small-sized convolutional neural network

Diagnosis of pulmonary embolism with various imaging modalities

Discriminant neighborhood embedding for classification. Pattern Recogn

Relief-based feature selection: Introduction and review

An approach for data mining of power quality indices based on fast-ICA algorithm

Enhanced recursive feature elimination

Combining multiple feature-ranking techniques and clustering of variables for feature selection

Effective sequential classifier training for SVM-based multitemporal remote sensing image classification

Comparison of logistic regression and linear regression in modeling percentage data

Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping

Labeled optical coherence tomography (OCT) and chest X-ray images for classification. Mendeley Data, v2

COVID-19 Image Data Collection

Applying quantitative CT image feature analysis to predict response of ovarian cancer patients to chemotherapy

Novel application of the gray-level co-occurrence matrix analysis in the parvalbumin stained hippocampal gyrus dentatus in distinct rat models of Parkinson's disease

Improvement in computation of Δ V10 Flicker severity index using intelligent methods

This work was supported by the NIGMS/NIH through a Pathway to Independence Award K99GM126027 (S.A.S.) and start-up package of the University of California, Santa Cruz.

A.Z.K. and S.A.S. designed the project and wrote the manuscript. A.Z.K. wrote the classifier and implemented the machine learning code. M.H. collected the dataset and wrote the image preprocessing code.

The authors declare no competing interests.

The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-88807-2. Correspondence and requests for materials should be addressed to S.A.S.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.