key: cord-0825210-ihcl41ls authors: Perumal, Varalakshmi; Narayanan, Vasumathi; Rajasekar, Sakthi Jaya Sundar title: Detection of COVID-19 using CXR and CT images using Transfer Learning and Haralick features date: 2020-08-12 journal: Appl Intell (Dordr) DOI: 10.1007/s10489-020-01831-z sha: f99872424232be738d1f3978fccd27f21f30b97b doc_id: 825210 cord_uid: ihcl41ls
Recognition of COVID-19 is a challenging task which requires careful examination of clinical images of patients. In this paper, the transfer learning technique has been applied to clinical images of different types of pulmonary diseases, including COVID-19. It is found that COVID-19 is highly similar to pneumonia. Further analysis is carried out to identify which type of pneumonia is closest to COVID-19, and transfer learning shows that viral pneumonia resembles COVID-19 the most. This indicates that the knowledge gained by a model trained to detect viral pneumonia can be transferred to identify COVID-19. Transfer learning shows a significant difference in results when compared with conventional classification: there is no need to create a separate model for classifying COVID-19, as is done in conventional classification, and an existing model can be reused to determine COVID-19, which makes an otherwise herculean task easier. In addition, it is difficult to detect abnormal features in the images due to noise from lesions and tissues. For this reason, texture feature extraction is accomplished using Haralick features, which focus only on the area of interest, to detect COVID-19 using statistical analyses. There is a need for a model that predicts COVID-19 cases as early as possible to control the spread of the disease. We propose a transfer learning model to quicken the prediction process and assist medical professionals. The proposed model outperforms the other existing models, making a time-consuming process easier and faster for radiologists, which in turn reduces the spread of the virus and saves lives.
The Corona Virus Disease (COVID-19) is a pulmonary disease brought about by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This pandemic has caused tremendous loss of life so far. The World Health Organization (WHO) is continuously monitoring and publishing reports of this disease outbreak in various countries. COVID-19 is a respiratory illness spread to a great extent through droplets in the air. The infection is transmitted predominantly via close contact and by means of respiratory droplets produced when an individual coughs or sneezes. The symptoms of this virus are coughing, difficulty in breathing and fever. Difficulty in breathing is an indication of possible pneumonia and requires prompt clinical attention. No vaccine or specific treatment for COVID-19 infection is available. Hospitals provide isolation wards for infected individuals. Transmission is most common when individuals are symptomatic, yet spread may be possible before symptoms appear. The virus can survive on surfaces for 72 hours. Symptoms of COVID-19 start to appear within 2 to 14 days, with a mean of 7 days. The standard technique for diagnosis is real-time Reverse Transcription Polymerase Chain Reaction (RT-PCR) performed on a nasopharyngeal swab sample. The disease can also be diagnosed from a combination of symptoms, risk factors and a chest CT showing features of pneumonia.
Many countries are unable to limit the spread of COVID-19 quickly due to insufficient medical kits. Much research is being carried out across the globe to handle the pandemic. Many deep learning models have been proposed to predict COVID-19 symptoms at the earliest stage to control the spread. We propose a transfer learning model over the deep learning model to further quicken the prediction process. The literature survey of the proposed work is explained in Section 2. The proposed model is explored in Section 3. Experiments and results are discussed in Section 4. Discussion and conclusion are presented in Section 5 and Section 6 respectively. To limit the transmission of COVID-19 [11], screening of a large number of suspicious cases is needed, followed by proper medication and quarantine. RT-PCR testing is considered the gold standard, yet it yields significant false negative outcomes. Efficient and rapid analytical techniques are earnestly anticipated to fight the disease. In view of the radiographic differences of COVID-19 in CT images, we propose to develop a deep learning model that mines the unique features of COVID-19 to give a clinical determination ahead of the pathological test, thereby saving crucial time for disease control. Understanding the basic nature of COVID-19 and its subtypes and variants is a continual challenge, and such knowledge should be developed and shared all over the world. Baidu Research [14] released its LinearFold algorithm and services, which can be used for whole-genome secondary structure prediction on COVID-19 and is reportedly several times faster than other algorithms. Pathological findings of COVID-19 associated with acute respiratory distress syndrome are presented in [4]. In [29], investigations included: 1) summary of patient characteristics; 2) assessment of age distributions; 3) computation of case fatality and mortality rates; 4) geo-temporal analysis of viral spread; 5) epidemiological curve construction; and 6) subgroup analysis. Another study [6] was developed to examine the extent of COVID-19 infection in a selected population, as determined through positive antibody tests in the general population. Chen H [5] noted that limited information is available for pregnant women with the COVID-19 virus; that study aimed to assess the clinical characteristics of COVID-19 in women during pregnancy. Shan F [19] suggested that CT screening is vital for the diagnosis, assessment and staging of COVID-19, with follow-up scans every 3 to 5 days recommended to track disease progression. It is concluded that peripheral and bilateral Ground Glass Opacification (GGO) with consolidation is more prevalent in COVID-19 infected patients. Gozes [8] created AI-based automated CT image analysis tools for the detection, quantification and tracking of coronavirus. Wang S [23] proposed a deep learning system that uses CT scan images to detect COVID-19. Wang Y [27] identified tachypnea (abnormally rapid respiration) as a critical symptom of COVID-19; the approach can be used to recognize different respiratory conditions and the device is portable. Rajpurkar P [17] presented CheXNet, an algorithm that detects pneumonia with high accuracy. Xu X [28] showed that the traditional method for identifying COVID-19 has a low positive rate during the initial stages and developed a model for early screening of CT images.
Yoon SH [31] found that the COVID-19 pneumonia affecting people in Korea exhibits similar characteristics to that affecting people in China. Singh R [22] evaluated the impact of social distancing on this pandemic, taking age structure into account. These papers provide solutions for the pandemic caused by the novel coronavirus. Much research has also been carried out using image processing techniques on COVID-19. Recently, Subhankar Roy [2] proposed a deep learning technique for classification of Lung Ultrasonography (LUS) images. Classification of COVID-19 was carried out on chest CT images with the help of an evolution-based CNN network in [18]. In study [3], Harrison X. Bai proposed that the main features for discriminating COVID-19 from viral pneumonia are peripheral distribution, ground-glass opacity and vascular thickening; this study can distinguish COVID-19 from viral pneumonia with high specificity using CT scan images and CT scan features. The same study [3] also suggested that both COVID-19 patients and viral pneumonia patients develop central plus peripheral distribution, air bronchogram, pleural thickening, pleural effusion and lymphadenopathy, with no significant differences. General similarities between the two viruses are that both cause respiratory disease, which can be asymptomatic or mild but can also cause severe disease and death, and both are transmitted by contact or droplets. These predominant similarities with pneumonia viruses (influenza and SARS-CoV-2) urged us to proceed with the proposed work. The primary limitations analyzed so far are as follows.
1. Use of CT and CXR images: The COVID-19 virus attacks the cells in the respiratory tract and predominantly the lung tissues, so images of the thorax can be used to detect the virus without any test kit. It is inferred that chest X-rays are of limited value in the initial stages, whereas CT scans of the chest are valuable even before symptoms appear.
2. Long testing time: The test used to identify COVID-19 is not fast enough, and especially during the initial stages of the virus development it is very hard to assess patients. Manual analysis of CXR and CT scans of many patients by radiologists requires tremendous time, so an automated system is needed that can save radiologists' valuable time.
3. Reusing an existing model: In this paper, an existing system is reused for identifying COVID-19 using CT scan and chest X-ray images; this can precisely detect the abnormal features identified in the images.
To resolve the limitations stated above, a transfer learning model is proposed. In this section, the proposed transfer learning model along with the Haralick features used in the model are discussed. In this proposed work, a novel system has been presented to identify COVID-19 infection using deep learning techniques. First, transfer learning has been applied to images from the NIH [21, 26, 30] ChestX-ray14 dataset to identify which disease has the greatest similarity to COVID-19; it is found that pneumonia is the most similar to COVID-19. Fig. 1 represents chest X-ray images for various types of lung conditions. Transfer learning [9, 24] is a method in which the knowledge gained by a model trained for a given task (the viral pneumonia detection model) is transferred to evaluate another problem of similar nature (the COVID-19 detection model), as sketched below.
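To make this reuse of knowledge concrete, the following is a minimal sketch of the transfer step, assuming a Keras/TensorFlow environment and a hypothetical saved viral-pneumonia model file (viral_pneumonia_model.h5); it is illustrative rather than the authors' exact code. The learned convolutional features are frozen and only a new sigmoid head is trained for the COVID-19 task.

```python
# Hedged sketch of the transfer-learning idea (not the authors' exact code).
# Assumption: a model previously trained for viral-pneumonia detection has
# been saved as "viral_pneumonia_model.h5" (hypothetical file name).
import tensorflow as tf

# 1. Load the model whose knowledge we want to transfer.
source = tf.keras.models.load_model("viral_pneumonia_model.h5")

# 2. Freeze the learned feature-extraction layers.
for layer in source.layers:
    layer.trainable = False

# 3. Take the penultimate layer's output as a feature vector, bypassing the
#    old pneumonia head, and attach a fresh binary COVID-19 / not-COVID-19 head.
features = source.layers[-2].output
out = tf.keras.layers.Dense(1, activation="sigmoid", name="covid_head")(features)
covid_model = tf.keras.Model(inputs=source.input, outputs=out)

covid_model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["accuracy"])
# covid_model.fit(covid_images, covid_labels, ...)  # only the new head is trained
```

Freezing the feature extractor is what allows the pneumonia-derived representations to be reused with comparatively little COVID-19 data and training time.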
In transfer learning, initial training is carried out on a large dataset to perform classification. The architecture of the proposed transfer learning model is delineated in Fig. 2. The CXR and CT images of various lung diseases, including COVID-19, are fed to the model. First, the images are preprocessed to obtain quality images: histogram equalization and Wiener filtering are applied as image enhancement techniques to increase the contrast and remove noise respectively. Histogram equalization enhances image quality without loss of information. The Wiener filter estimates the target signal by filtering the noisy regions of an image; deconvolution is performed on blurred images by minimizing the Mean Square Error (MSE). The area of interest is chosen using the ITK-SNAP software. Image resizing is performed automatically using Python PIL. The images obtained after applying these image enhancement techniques are presented in Fig. 3. Haralick texture features are obtained from the enhanced images, and these modified images are then fed into various pre-defined CNN models; the Haralick features are discussed in the following section. In the pre-defined CNN models such as ResNet50, VGG16 and InceptionV3, convolutional layers are used to extract image features and max pooling layers are used to down-sample the images for dimensionality reduction; intermediate results are shown in Fig. 4. Regularization is done by dropout layers, which speed up execution by dropping neurons whose contribution to the output is low. The values of the weights and biases are initialized randomly. The image size is chosen as 226x226. The Adam optimizer takes care of updating the weights and biases. The sample images are trained in batches of size 250. Early stopping is employed to avoid over-fitting. Five further CNN models are also built with different configurations to analyze the results, as shown in Fig. 5. The stride and dilation are chosen to be 1, the default value. Since these models perform one-class classification (i.e., a sample either belongs to that class or not), this is equivalent to binary classification, so the sigmoid function, sigmoid(x) = 1/(1 + e^(-x)), is used as the activation function in the fully connected layer, as mentioned in Eq (1). For the convolution and max-pooling layers, the ReLU function, ReLU(x) = max(0, x), is used to activate the neurons; it is defined in Eq (2). Each model is trained with a dropout of 0.2 or 0.3. The transfer learning model is applied to predict the COVID-19 images instead of developing a new deep learning model from scratch, since that would require more training time. The different pre-trained models and differently configured CNN models are trained and tested with images of different lung diseases; the one giving the lowest misclassification rate on viral pneumonia, VGG16, is taken for prediction of COVID-19 cases by the proposed transfer learning model. The Haralick features [16] are extracted from images that are resized as mentioned in Fig. 2. Haralick features describe very well the relation between the intensities of adjacent pixels. Haralick presented fourteen metrics for textural features, which are obtained from the co-occurrence matrix; this matrix provides information about how the intensity at a particular pixel position is related to its neighbouring pixels.
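The enhancement steps described above can be sketched as follows; the specific libraries (Pillow, SciPy, scikit-image) and the Wiener window size are assumptions, since the paper names only histogram equalization, the Wiener filter and Python PIL.

```python
# Hedged sketch of the described image-enhancement pipeline: histogram
# equalization for contrast, Wiener filtering for noise, PIL for resizing.
import numpy as np
from PIL import Image
from scipy.signal import wiener
from skimage import exposure

def enhance(path, size=(226, 226)):
    # Load as a grayscale float image.
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    img = exposure.equalize_hist(img)   # contrast enhancement
    img = wiener(img, mysize=5)         # MSE-optimal noise suppression (window size assumed)
    img = np.clip(img, 0.0, 1.0)
    # Resize with Python PIL, as in the paper, to the 226x226 input size.
    pil = Image.fromarray((img * 255).astype(np.uint8))
    return np.asarray(pil.resize(size))

# enhanced = enhance("sample_cxr.png")  # hypothetical file name
```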
The Gray-Level Co-occurrence Matrix (GLCM) is constructed for the images with N dimensions, where N refers to the number of intensity levels in the image. For each image the GLCM is constructed to evaluate fourteen features. The calculation of those 14 features leads to the identification of new inter-relationships between biological features in images. The relationships between intensities of adjacent pixels can be identified using these features; they contain information related to the spatial distribution of tone and texture variations in an image. The homogeneity of the image (F1), which is the similarity between pixels, is given by Eq (3), where p(k,l) is the element of the co-occurrence matrix at position (k,l). The measure of the difference between the maximum and minimum pixel values, the contrast (F2), is given by Eq (4), where m is |k - l|. The dependency between the gray levels of adjacent pixels (F3), also called correlation, is given by Eq (5), where μk, μl, σk, σl are the means and standard deviations of the marginal probability density functions. The squared differences from the mean of an image (F4) are averaged in Eq (6), which is called variance or sum of squares, where μ is the mean value. The local homogeneity of an image (F5) is given by Eq (7) and is also called the Inverse Difference Moment (IDM). The mean values in a given image (F6) are summed to get the sum average, as given in Eq (8), where a and b are row and column positions in the co-occurrence matrix summing to a+b. The variance values of an image (F7) are summed to get the sum variance, as exhibited in Eq (9). The total amount of information that must be coded for an image (F8) is given by Eq (10), which is called sum entropy. The amount of information that must be coded for an image (F9) is given by Eq (11) and is called entropy. The variance of an image (F10) is differenced to get the difference variance presented in Eq (12). The entropy values of an image (F11) are differenced to get the difference entropy, as delineated in Eq (13). Eq (14) shows the information measure of correlation 1 (F12) and Eq (15) shows the information measure of correlation 2 (F13), where HX and HY are the entropies of Px and Py; the terms HXY, HXY1 and HXY2 used in these equations are defined in Eq (16), Eq (17) and Eq (18). The measure of the linear interdependence of pixels in an image (F14), where p(k,i) and p(l,i) are co-occurrence matrix entries, is given in Eq (19).
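A short sketch of how such a co-occurrence matrix and a few of the descriptors above (contrast, correlation, homogeneity/IDM, energy) can be computed is given below, using scikit-image (version 0.19 or later for the graycomatrix spelling); this covers only a subset of the fourteen Haralick features, and a library such as mahotas could be used for a fuller set.

```python
# Hedged sketch of GLCM-based texture features with scikit-image; only a
# subset of the 14 Haralick descriptors discussed above is computed here.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img, levels=256):
    # Co-occurrence matrix for neighbouring pixels at distance 1 in four directions.
    glcm = graycomatrix(gray_img.astype(np.uint8),
                        distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "correlation", "homogeneity", "energy"]
    # Average each property over the four directions.
    return {p: float(graycoprops(glcm, p).mean()) for p in props}

# features = glcm_features(enhanced_image)  # e.g. the enhanced CXR from above
```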
In this section, the dataset used for carrying out the experiments is discussed, and the results and statistical analyses are presented. The data for COVID-19 are assimilated from various resources: the GitHub open repository, RSNA and Google Images. The data collected from these resources are presented in Table 1. The data [21, 26, 30] for the chest X-ray pulmonary diseases are obtained from NIH, with a total of 81,176 labelled observations from 30,805 unique patients, as shown in Table 2; the images are of size 1024x1024. The data [12] for the viral pneumonia, bacterial pneumonia and normal images are obtained from Mendeley, with a total of 5,232 images, as shown in Table 3. The misclassification rate is calculated for all the pre-trained models: VGG16, ResNet50 and InceptionV3. The misclassification rate in Eq (20), (Fn + Fp)/N, is used to find the models that are most similar to COVID-19, where N is the total number of images, Fn is the number of samples that are actually COVID-19 but wrongly classified as not COVID-19, and Fp is the number of samples that are not COVID-19 but wrongly classified as COVID-19. From Table 4, we can see that architecture 1 gives a better result, with a lower misclassification rate, than the other architectures; these architectures are shown in Fig. 5. From Table 5, we can identify that the COVID-19 data is most similar to pneumonia, consolidation and effusion; it is evident that COVID-19 data tested on the pneumonia-trained model produces a lower misclassification rate. In Table 6, we can see that transfer learning has produced better accuracy (ACCURACY1) compared with traditional learning (ACCURACY2). This is because the COVID-19 data is similar to pneumonia, so when the model is trained for pneumonia and tested with COVID-19 data, the accuracy is better. The time taken by VGG16 is lower because it is only 16 layers deep, while ResNet50 and InceptionV3 are 50 and 48 layers deep respectively, and VGG16 still gives better accuracy than the other models. The models are trained using NVIDIA TESLA P100 GPUs provided by Kaggle. Analyzing the pneumonia images further, we can perform transfer learning for the two types of pneumonia; it is found that COVID-19 is most similar to viral pneumonia. The VGG16 model correctly identifies the COVID-19 data with a 0.012 misclassification rate, as shown in Table 7. From Table 8 we find that, out of 407 COVID-19 and normal images, 385 COVID-19 images are correctly classified as COVID-19 and 22 images are falsely classified under the non-viral-pneumonia class; this shows that COVID-19 is very similar to viral pneumonia. 28 images are misclassified in total, which gives the reported misclassification rate of 0.012 for viral pneumonia. From Fig. 6 we can see that the pre-trained VGG16 model has correctly classified the chest CT scan images. From Fig. 7, we can see that the VGG16 model has classified the chest X-ray images correctly: the images on the right side of Fig. 7 show increased patchy opacity in the right lower lobe, while the images on the left appear clearer; the left-side pictures are normal lung images which are correctly classified by VGG16. A CT scan carried out for the same person has peculiar features because the patient does not have any nodules or consolidation-like reticular opacities as in the previous images; the image has small patchy ground-glass opacity in the centre of the lungs, developing from the periphery. These images are also precisely classified, with a high similarity percentage, so the model has found a similar image from the training set. This shows that the model has been consistent with all the peculiar cases, such as the one shown in Fig. 8, which is why the model is trained using both chest CXR and CT scan images. The loss and accuracy graphs are shown in Fig. 9 and Fig. 10 respectively; we can see a steady increase in accuracy and a steady decrease in loss while training and testing, which shows that the model is effective and efficient. After finding the similar models, the final classification is carried out to calculate confusion matrices for model evaluation. Tables 9, 10 and 11 show the confusion matrices for the conventional and transfer learning models when COVID-19 data is tested against the viral pneumonia models.
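All of the model comparisons above rank the models by the misclassification rate of Eq (20); a minimal sketch of that computation is shown below, assuming binary labels and a 0.5 threshold on the sigmoid output (the threshold is an assumption, not stated in the paper).

```python
# Minimal sketch of the misclassification rate of Eq (20): (Fn + Fp) / N,
# where Fn are COVID-19 images predicted as non-COVID and Fp the reverse.
import numpy as np

def misclassification_rate(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    y_true = np.asarray(y_true).astype(int)
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # missed COVID-19 cases
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # false COVID-19 alarms
    return (fn + fp) / len(y_true)

# e.g. misclassification_rate([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]) -> 0.5
```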
Tables 12, 13 and 14 show the classification reports for all three models, with precision, recall and F1-scores to analyze the performance, where C1 denotes the normal class, C2 the bacterial pneumonia class, C3 the viral pneumonia class and C4 the COVID-19 class. The precision and recall for all the classes are found to be promising. The F1-score shown in Eq (21), F1 = 2 * (precision * recall) / (precision + recall), is calculated using the precision of Eq (22), precision = Tp / (Tp + Fp), and the recall of Eq (23), recall = Tp / (Tp + Fn), where Tp is the number of samples that are COVID-19 and are correctly classified as COVID-19. This shows that the model is skilled and classifies the images precisely. Transfer learning gives better outcomes when compared with conventional classification. Samples of the 14 Haralick features for 10 sample images are given in Table 15, Table 16 and Table 17 for normal, viral pneumonia and COVID-19 images respectively. The efficacy of the proposed model is compared with other recent studies on conventional COVID-19 classification in Table 21; from this performance analysis, the proposed transfer learning model outperforms the other existing models. The infected regions of the lung images are identified using Grad-CAM, and the images in Fig. 11 show heatmap visualizations based on the predictions made by the transfer learning model.

Table 9 Confusion matrix for the VGG16 model for conventional classification and transfer learning
Conventional classification:
Category   C1    C2    C3    C4    Total
C1         182   18    0     0     200
C2         19    181   0     0     200
C3         0     4     182   14    200
C4         0     0     15    185   200
Total      201   203   197   199   800
Transfer learning:
Category   C1    C2    C3    C4    Total
C1         184   16    0     0     200
C2         15    185   0     0     200
C3         0     2     187   11    200
C4         0     0     7     193   200
Total      199   203   194   204   800

Table 10 Confusion matrix for the ResNet50 model for conventional classification and transfer learning
Conventional classification:
Category   C1    C2    C3    C4    Total
C1         175   25    0     0     200
C2         6     174   20    0     200
C3         0     5     176   19    200
C4         0     0     22    178   200
Total      181   204   218   197   800
Transfer learning:
Category   C1    C2    C3    C4    Total
C1         176   24    0     0     200
C2         20    176   4     0     200
C3         0     0     178   22    200
C4         0     0     17    183   200
Total      196   200   199   205   800

Table 11 Confusion matrix for the InceptionV3 model for conventional classification and transfer learning
Conventional classification:
Category   C1    C2    C3    C4    Total
C1         154   46    0     0     200
C2         45    155   0     0     200
C3         0     0     156   44    200
C4         0     0     40    160   200
Total      199   201   196   204   800
Transfer learning:
Category   C1    C2    C3    C4    Total
C1         162   38    0     0     200
C2         36    164   0     0     200
C3         0     3     165   32    200
C4         0     2     30    168   200
Total      198   207   195   200   800

Table 21 Comparison with other recent studies on COVID-19 classification
Study                              Images          Accuracy   Method
-                                  X-Ray images    89%        COVIDX-Net
Jinyu Zhao [32]                    CT images       83%        Pre-trained model
Feng Shi [20]                      CT images       87.9%      Random Forest method
Ioannis D. Apostolopoulos [1]      X-Ray images    88.8%      CNN
Khalid El Asnaoui [7]              X-Ray images    84%        Pre-trained models
Shuo Wang [25]                     CT images       88%        Pre-trained models
Yujin Oh [15]                      X-Ray images    88.9%      Pre-trained model
Asif Iqbal                         X-Ray images    89.5%      Deep Neural Network
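As an illustration, the per-class precision, recall and F1-score of Eqs (21)-(23) can be computed directly from confusion matrices like those in Tables 9-11; the sketch below uses the transfer-learning VGG16 matrix from Table 9 (rows are true classes, columns are predicted classes).

```python
# Hedged sketch: per-class precision, recall and F1 (Eqs 21-23) from a
# 4x4 confusion matrix laid out as in Tables 9-11 (rows = true, cols = predicted).
import numpy as np

def per_class_report(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)          # Eq (22): Tp / (Tp + Fp)
    recall = tp / cm.sum(axis=1)             # Eq (23): Tp / (Tp + Fn)
    f1 = 2 * precision * recall / (precision + recall)   # Eq (21)
    return precision, recall, f1

# Transfer-learning VGG16 confusion matrix from Table 9.
cm_vgg16_tl = [[184, 16, 0, 0],
               [15, 185, 0, 0],
               [0, 2, 187, 11],
               [0, 0, 7, 193]]
p, r, f = per_class_report(cm_vgg16_tl)
print(dict(zip(["C1", "C2", "C3", "C4"], np.round(f, 3))))
```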
The World Health Organization (WHO) has recommended RT-PCR testing for suspicious cases, but this has not been followed in many countries due to a shortage of testing kits. Here the transfer learning technique can provide a quick alternative to aid the diagnosis process and thereby limit the spread. The primary purpose of this work is to provide radiologists with a less complex model which can aid in the early diagnosis of COVID-19. The proposed model produces a precision of 91%, recall of 90% and accuracy of 93% with VGG-16 using transfer learning, which outperforms other existing models for this pandemic period. This COVID-19 detection model has been developed keeping in mind the challenges prevailing in the field of COVID-19 detection, using data assimilated from multiple sources. Analysis of unusual features in the images is required for the detection of this viral infection, and the earlier the infection is detected, the more lives can be saved. This paper takes a holistic approach, taking into account the critical issues in the domain. The results are fairly consistent for all peculiar cases. We hope the outcomes discussed in this paper serve as a small step towards constructing a refined COVID-19 detection model using CXR and CT images. In future work, more data can be assimilated for better results, which would further strengthen the proposed model.

References
Covid-19: Automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
Diagnosis of pneumonia from chest x-ray images using deep learning
Performance of radiologists in differentiating covid-19 from viral pneumonia on chest ct
Clinical characteristics and intrauterine vertical transmission potential of covid-19 infection in nine pregnant women: a retrospective review of medical records
Automated methods for detection and classification pneumonia based on x-ray images using deep learning
Rapid ai development cycle for the coronavirus (covid-19) pandemic: Initial results for automated detection & patient monitoring using deep learning ct image analysis
Lung nodule texture detection and classification using 3d cnn
Covidx-net: A framework of deep learning classifiers to diagnose covid-19 in x-ray images
Serial quantitative chest ct assessment of covid-19: Deep-learning approach
Labeled optical coherence tomography (oct) and chest x-ray images for classification
Coronet: A deep neural network for detection and diagnosis of covid-19 from chest x-ray images
Covid-19 and artificial intelligence: protecting health-care workers and curbing the spread
Deep learning covid-19 features on cxr using limited training data sets
Haralick feature extraction from lbp images for color texture classification
Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning
Deep learning for classification and localization of covid-19 markers in point
Lung infection quantification of covid-19 in ct images with deep learning
Large-scale screening of covid-19 from community acquired pneumonia using infection size-aware classification
Classification of covid-19 patients from chest ct images using multi-objective differential evolution-based convolutional neural networks
Age-structured impact of social distancing on the covid-19 epidemic in india
A deep learning algorithm using ct images to screen for corona virus disease
A deep learning algorithm using ct images to screen for corona virus disease
A fully automatic deep learning system for covid-19 diagnostic and prognostic analysis
Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
Abnormal respiratory patterns classifier may contribute to largescale screening of people infected with covid-19 in an accurate and unobtrusive manner
Deep learning system to screen coronavirus disease
Pathological findings of covid-19 associated with acute respiratory distress syndrome
Chest x-ray image view classification
Chest radiographic and ct findings of the 2019 novel coronavirus disease

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.