title: Classification of COVID-19 X-ray Images Using a Combination of Deep and Handcrafted Features
authors: Zhang, Weihan; Pogorelsky, Bryan; Loveland, Mark; Wolf, Trevor
date: 2021-01-19

Coronavirus Disease 2019 (COVID-19) demonstrated the need for accurate and fast diagnosis methods for emergent viral diseases. Soon after the emergence of COVID-19, medical practitioners used X-ray and computed tomography (CT) images of patients' lungs to detect COVID-19. Machine learning methods are capable of improving the identification accuracy of COVID-19 in X-ray and CT images, delivering near real-time results, while alleviating the burden on medical practitioners. In this work, we demonstrate the efficacy of a support vector machine (SVM) classifier trained with a combination of deep convolutional and handcrafted features extracted from X-ray chest scans. We use this combination of features to discriminate between healthy, common pneumonia, and COVID-19 patients. The performance of the combined feature approach is compared with a standard convolutional neural network (CNN) and with the SVM trained on handcrafted features alone. We find that combining the features in our novel framework improves the performance of the classification task compared to the independent application of convolutional and handcrafted features. Specifically, we achieve an accuracy of 0.988 with our combined approach, compared to 0.963 and 0.983 for the handcrafted-feature SVM and the CNN, respectively.

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Since its emergence in Wuhan, China, in December 2019, it has spread worldwide and caused a severe pandemic. COVID-19 infection causes mild symptoms in its initial stage, but may lead to severe acute complications such as multi-organ failure and systemic inflammatory response syndrome [1, 2]. As of December 2020, there have been more than 1.8 million COVID-19 related deaths around the world, and daily new cases of the disease are still rising. Currently, the reverse transcription polymerase chain reaction (RT-PCR) test is the most accurate diagnostic test. However, it requires specialized materials, equipment, and personnel, and takes at least 24 hours to obtain a result; it may also require a second RT-PCR or a different test to confirm the diagnosis. Therefore, radiological imaging techniques such as X-ray and CT scans can serve as a complement to improve diagnostic accuracy [3].

In recent years, machine learning has been used extensively for automatic disease diagnosis in the healthcare sector [4, 5]. Various standard supervised learning algorithms such as logistic regression, random forests, and support vector machines (SVM) have been applied to detecting COVID-19 in X-ray and CT images of patients' lungs [6, 7, 8, 9]. The convolutional neural network (CNN) is a deep learning algorithm that extracts features from images through a combination of convolutional, pooling, and fully connected layers. It has been used extensively for image recognition, classification, and object detection, and recent works [10, 11, 12, 13, 14] show that it can also provide accurate results in detecting COVID-19 in images.
However, the lack of publicly available image databases and the limited amount of patient data are inevitable challenges for training a CNN. In this study, we propose a fusion model that classifies X-ray images using a combination of handcrafted features and CNN deep features. The model is trained and tested on a large dataset with 1,143 COVID-19 cases, 2,000 normal cases, and 2,000 other pneumonia cases collected from [15, 16]. Feature-fusion classifiers have been shown to be an effective way of boosting the performance of CNN models in face recognition [17] and biomedical image classification [18, 19]. Handcrafted and deep features extract different information from the same input image, so fusing the two has the potential to outperform the standard approaches [20]. Our key interest is whether a fusion model can also surpass the standard CNN and SVM for COVID-19 detection. The paper is organized as follows: the methodology and feature extraction techniques are presented in Section 2, the comparative classification performances are given in Section 3, and conclusions are drawn in Section 4.

The proposed COVID-19 classifier is trained and tested on a collective dataset of 5,143 X-ray images categorized into three classes: COVID-19, Normal, and Pneumonia. All images are resized to 224 × 224 pixels, and local contrast is enhanced by an adaptive histogram equalization algorithm during the preprocessing stage. Several preprocessed example images are shown in Figure 1. Both handcrafted features and VGG16/ResNet50 deep features are extracted from the dataset, then combined and fed into an SVM classifier. The entire process is shown in Figure 2.

The handcrafted features are computed by applying the same 14 statistical measures to the outputs of the aforementioned six transformations: area, mean, standard deviation, skewness, kurtosis, energy, entropy, maximum, mean absolute deviation, median, minimum, range, root mean square, and uniformity, as used in a COVID-19 image classifier based on handcrafted features only [21]. Of these 14 measures, the following 10 are calculated using their standard definitions: mean, standard deviation, maximum, minimum, median, range, root mean square, skewness, mean absolute deviation, and kurtosis. Energy is calculated as the sum of squared components, E = Σ_i p_i^2, where p_i is the i-th value of the output vector of a transformation. Area is defined as the sum of all components of the output vector. Entropy is calculated by first taking the frequency of each unique intensity via the numpy function unique(), normalizing the resulting counts into a probability vector, and then taking the negative of the elementwise sum of that vector times its base-2 logarithm. Uniformity is calculated from the same normalized vector as the sum of its squared components. For clarity, the pseudo-code is reproduced below:

    Require: p (vector of output from a transformation)
    values, counts = unique(p, return_counts=True)
    probs = counts / sum(counts)
    entropy = -sum(probs * log2(probs))
    uniformity = sum(probs ** 2)

• GLCM: The GLCM transform characterizes an image by creating a histogram of co-occurring grayscale values at a given offset and direction over an image [22]. In this implementation, features are determined by applying the greycomatrix() function from the skimage library directly to each image with an offset of 1 and four different directions (0, π/4, π/2, 3π/4). The function returns a 4-D array containing one co-occurrence matrix per direction. Each direction is evaluated with the 14 statistical measures as before, resulting in a total of 56 features.
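For illustration, a minimal Python sketch of the GLCM step is given below. It assumes scikit-image and NumPy, and a hypothetical list stats holding the 14 statistic functions defined above; note that recent scikit-image releases name the function graycomatrix rather than greycomatrix. This is a sketch under those assumptions, not the authors' code.

    import numpy as np
    from skimage.feature import graycomatrix  # named greycomatrix in older scikit-image releases

    def glcm_features(image, stats):
        # image: 2-D uint8 array; stats: list of the 14 statistic functions (assumed)
        angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
        # shape (256, 256, n_distances, n_angles): one co-occurrence matrix per direction
        glcm = graycomatrix(image, distances=[1], angles=angles, levels=256)
        features = []
        for k in range(len(angles)):
            p = glcm[:, :, 0, k].ravel()  # flatten the matrix for one direction
            features.extend(stat(p) for stat in stats)
        return np.asarray(features)  # 4 directions x 14 measures = 56 features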
• GLDM: GLDM characterizes an image by creating a distribution of the absolute differences between each pixel's intensity and the intensities of surrounding pixels at a given distance and direction [23]. In this implementation, GLDM is computed in four directions (0, π/2, π, 3π/2) with a distance of 10 pixels.

• LBP: LBP examines the points surrounding each pixel within a given distance and tests whether each point is greater or less than the central pixel, producing a binary output [25]. In this implementation, scikit-image's local binary pattern function is used to compute the LBP outputs with distances of 2, 3, 5, and 7. The resulting four LBPs are then evaluated with the 14 statistical measures, yielding 56 features.

Deep features are extracted from two CNN models, VGG16 [26] and ResNet50 [27]. More specifically, only the feature-extraction layers of each model are used, i.e. the layers positioned before the dense layers intended for the classification task. The model weights are pre-trained on the ImageNet dataset [28], which contains millions of images belonging to 1,000 classes. Importantly, no fine-tuning is performed: the model weights are fixed and no further training is done.

The VGG16 architecture contains 16 layers with trainable weights (5 of which are dense layers not used for feature extraction), arranged in 5 blocks of convolutional and pooling layers, as shown in Figure 3. The model accepts RGB images of size 224 × 224 × 3 pixels. To maintain compatibility with the model, the grayscale X-ray images are converted to three-channel images by duplicating the pixel values so that each color channel is identical. Additionally, each image is zero-centered with respect to the ImageNet dataset, without scaling. For each X-ray image, the resulting feature output has dimension 7 × 7 × 512; flattening produces a vector of 25,088 features.

In contrast to CNN architectures such as VGG, ResNets can be made deeper, with increasing accuracy, while having lower overall complexity. This is achieved with shortcut connections that perform identity mapping and allow the residual mapping to skip one or more layers, which alleviates the problem of vanishing gradients. A residual block of this type is shown in Figure 4. The ResNet50 model contains 50 layers with trainable weights (of which a single dense layer is not used for feature extraction). As with VGG16, the model accepts RGB images of size 224 × 224 × 3 pixels, and the grayscale X-ray images are converted to duplicated three-channel RGB before being zero-centered with respect to the ImageNet dataset without scaling. Each X-ray image yields a feature output of dimension 7 × 7 × 2048, which is flattened into a vector of 100,352 features.

After the features are extracted from the models, kernel principal component analysis (PCA) is applied to reduce the dimensionality of the deep features. The number of components after the transformation is set to 1,000, which is of the same order of magnitude as the number of extracted handcrafted features. A linear SVM with a one-vs-all scheme is applied to classify the combined features.
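A minimal sketch of the deep feature extraction is given below, assuming TensorFlow/Keras with the stock VGG16 application; the function name and intermediate arrays are illustrative. Swapping in ResNet50 and its own preprocess_input yields 7 × 7 × 2048 maps and 100,352 features per image instead.

    import numpy as np
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input

    # Feature-extraction layers only: include_top=False drops the dense layers,
    # and the ImageNet weights stay fixed (no fine-tuning).
    extractor = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

    def deep_features(gray_images):
        # gray_images: array of shape (n, 224, 224) of grayscale X-rays
        rgb = np.repeat(gray_images[..., np.newaxis], 3, axis=-1)  # duplicate channels
        rgb = preprocess_input(rgb.astype("float32"))  # zero-center w.r.t. ImageNet, no scaling
        maps = extractor.predict(rgb)        # shape (n, 7, 7, 512)
        return maps.reshape(len(maps), -1)   # 25,088 features per image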
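The fusion step can then be sketched as follows with scikit-learn. Here deep_feats, handcrafted_feats, and labels are assumed arrays from the steps above, and the kernel for kernel PCA is not specified in the text, so scikit-learn's default (linear) is used.

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.svm import LinearSVC

    # Reduce the deep features to 1,000 components, the same order of
    # magnitude as the ~308 handcrafted features.
    kpca = KernelPCA(n_components=1000)  # kernel unspecified in the text; linear by default
    deep_reduced = kpca.fit_transform(deep_feats)

    # Fuse the two feature sets and classify with a linear one-vs-all SVM.
    fused = np.hstack([deep_reduced, handcrafted_feats])
    clf = LinearSVC()  # LinearSVC uses a one-vs-rest scheme by default
    clf.fit(fused, labels)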
Although most deep learning models employ the softmax activation function for classification, SVMs have been shown to perform better on several standard datasets such as MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's facial expression recognition challenge [29].

To evaluate the method outlined above, the SVM classifier trained on combined deep and handcrafted features is compared against baseline individual CNNs and against an SVM trained solely on the handcrafted features. Both the VGG and ResNet CNNs were evaluated with the feature-extraction layers frozen with pre-trained ImageNet weights. Two layers were added to each model: a 1,000-neuron dense layer with a rectified linear activation function and a three-neuron output layer with a softmax activation function. These added layers make three-class classification possible and provide trainable parameters, since the feature-extraction layers are frozen. During training, both models minimize categorical cross-entropy using the Adam optimizer [30] with a learning rate of 0.005 (a minimal sketch of this baseline configuration is given at the end of this section).

A parametric study was performed on the handcrafted features to determine which configuration produced the most accurate inputs to the SVM. The results in Table 1 show that, on their own, the wavelet features gave the highest classification accuracy, followed by GLDM and GLCM. The lowest-performing feature group was the texture features, with an accuracy of 0.762. Inputting all 308 features into the SVM resulted in the highest accuracy and F1 score. A 95% confidence interval is given for all values in Table 1.

For each classification model outlined above, the dataset of 5,143 images was divided into the same train and test subsets with an 80/20 split, giving 4,114 training images and 1,029 test images. The results of each classification model are shown in Table 2. All metrics listed in the table are unweighted averages of the per-class statistics, with 95% confidence intervals. From these results it is clear that all models incorporating deep features performed better than the SVM using only handcrafted features. The two models combining deep and handcrafted features with an SVM classifier slightly outperform the conventional VGG16 and ResNet50 CNNs. The confusion matrices of the combined-feature SVM models are shown in Figure 5; both combined-feature models achieve the same low false negative and false positive rates of 0.41% and 0.13%, respectively.
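For reference, a minimal sketch of the baseline CNN configuration described above is shown here, assuming TensorFlow/Keras; the Flatten layer between the frozen backbone and the dense head is an assumption, as the text does not state how the feature maps are reshaped.

    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # feature-extraction layers stay frozen

    model = models.Sequential([
        base,
        layers.Flatten(),                       # assumed reshaping step
        layers.Dense(1000, activation="relu"),  # added 1,000-neuron dense layer
        layers.Dense(3, activation="softmax"),  # COVID-19 / Normal / Pneumonia
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.005),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])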
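The evaluation protocol can likewise be sketched with scikit-learn; features, labels, and clf are assumed from the earlier sketches, and the stratification and random seed are illustrative choices the text does not specify. The macro-average rows of the report correspond to the unweighted per-class averages reported in Table 2.

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report, confusion_matrix

    # 80/20 split: 4,114 training and 1,029 test images out of 5,143
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=0)

    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred, digits=3))  # macro avg = unweighted mean
    print(confusion_matrix(y_test, y_pred))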
References

[1] Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72,314 cases from the Chinese Center for Disease Control and Prevention.
[2] Clinical characteristics of COVID-19 in New York City.
[3] Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
[4] Machine learning for detection and diagnosis of disease.
[5] Deep learning for healthcare: review, opportunities and challenges.
[6] The clinical and chest CT features associated with severe and critical COVID-19 pneumonia.
[7] Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images.
[8] Coronavirus (COVID-19) classification using CT images by machine learning methods.
[9] Detection of coronavirus disease (COVID-19) based on deep features.
[10] COVID-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
[11] Automated methods for detection and classification of pneumonia based on X-ray images using deep learning.
[12] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[13] CovXNet: a multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization.
[14] Finding COVID-19 from chest X-rays using deep learning on a small dataset.
[15] Labeled optical coherence tomography (OCT) and chest X-ray images for classification.
[17] Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors.
[18] Bioimage classification with handcrafted and learned features.
[19] Classification of CT scan images of lungs using deep convolutional neural network with external shape-based features.
[20] Handcrafted vs. non-handcrafted features for computer vision classification.
[21] COVID-Classifier: an automated machine learning model to assist in the diagnosis of COVID-19 infection in chest X-ray images.
[22] Textural features for image classification.
[23] A theoretical comparison of texture algorithms.
[24] PyWavelets: a Python package for wavelet analysis.
[25] A comparative study of texture measures with classification based on featured distributions.
[26] Very deep convolutional networks for large-scale image recognition.
[27] Deep residual learning for image recognition.
[28] ImageNet large scale visual recognition challenge.
[29] Deep learning using linear support vector machines.
[30] Adam: a method for stochastic optimization.