key: cord-0176023-med7p73w
authors: Kadry, Seifedine; Rajinikanth, Venkatesan; Rho, Seungmin; Raja, Nadaradjane Sri Madhava; Rao, Vaddi Seshagiri; Thanaraj, Krishnan Palani
title: Development of a Machine-Learning System to Classify Lung CT Scan Images into Normal/COVID-19 Class
date: 2020-04-24
journal: nan
DOI: nan
sha: 2e31de7d7a946c9fda6a08e9a28c05ffcd725245
doc_id: 176023
cord_uid: med7p73w

Recently, the lung infection due to Coronavirus Disease (COVID-19) has affected a large human group worldwide, and the assessment of the infection rate in the lung is essential for treatment planning. This research aims to propose a Machine-Learning-System (MLS) to detect the COVID-19 infection using CT scan Slices (CTS). This MLS implements a sequence of methods, such as multi-thresholding, image separation using a threshold filter, feature-extraction, feature-selection, feature-fusion and classification. The initial part implements the Chaotic-Bat-Algorithm and Kapur's Entropy (CBA+KE) thresholding to enhance the CTS. The threshold filter separates the image into two segments based on a chosen threshold 'Th'. The texture features of these images are extracted, refined and selected using the chosen procedures. Finally, a two-class classifier system is implemented to categorize the chosen CTS (n=500 with a pixel dimension of 512x512x1) into the normal/COVID-19 group. In this work, the classifiers, such as Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF) and Support Vector Machine with linear kernel (SVM) are implemented and the classification task is performed using various feature vectors. The experimental outcome of the SVM with the Fused-Feature-Vector (FFV) helped to attain a detection accuracy of 89.80%.

Abnormalities/infections in the internal body organs are more acute compared to other diseases. Further, the diseases in internal organs are commonly prescreened using non-invasive methods, such as bio-signals and bio-images [1] [2] [3].
Lung infections due to climatic conditions and microorganisms are very common in humans, and such infections may cause various symptoms ranging from cough, cold and fever to mild/severe pneumonia [4] [5] [6]. The respiratory tract infection due to the Coronavirus Disease (COVID-19) has emerged as one of the major threats globally due to its acuteness and infection rate. It is one of the major communicable infectious diseases, caused by Severe Acute Respiratory Syndrome-Corona Virus-2 (SARS-CoV-2), and according to a recent report [7, 8], it has affected a large human community, irrespective of race and gender. The infection caused by COVID-19 severely affects the respiratory system by causing severe pneumonia. Due to its harshness and spreading rate, the World Health Organization (WHO) recently announced it as a pandemic [9]. Even though various controlling and treatment procedures have been implemented from December 2019 to date, the mortality due to COVID-19 infection is rapidly increasing. COVID-19 is a recently emerged infectious disease discovered initially in China (Wuhan) in December 2019 [9, 14]. The drug discovery for this disease is still in the research phase and no approved drug is available for COVID-19. Due to these reasons, the mortality rate is rising globally [7]. In recent days, a number of image assisted detection procedures for COVID-19 have been discussed by the researchers, and Table 1 presents a summary of a few recent techniques.

Table 1. The summary of image based COVID-19 detection procedures
Reference: Implemented investigative procedure
Rajinikanth et al. [26]: Harmony-Search and Otsu's based image thresholding and watershed-segmentation are implemented to extract the COVID-19 infection from CT. Disease severity prediction based on the size of the infection with respect to the lung is discussed.
Rajinikanth et al. [27]: Firefly and Shannon's entropy based image thresholding and Markov-random-field segmentation are implemented to extract the COVID-19 infection.
Zhou et al. [32]: U-Net with attention mechanism is implemented to segment the COVID-19 infection. This work provided the following measures: Dice Score=69.1%, Sensitivity=81.1% and Specificity=97.2%.

Table 1 presents recently implemented methodologies to segment and detect the COVID-19 infection using CT and chest X-ray images. Further, a detailed review of the image assisted procedures existing in the literature can be found in the recent work of Shi et al. [33]. From these earlier works, it can be noted that an image assisted COVID-19 detection system is essential to support the doctor during the disease diagnosis task. Hence, in this research work, a MLS is proposed to detect the disease using the CTS.

This part of the work presents the methodology implemented in this system, which can work on CTS of the axial, coronal and sagittal views. For experimental demonstration, only the axial view of the CTS is considered. The various stages employed in the proposed work are depicted in Figure 1. The infected patient is primarily assessed with a radiology assisted imaging procedure (CT scan), which provides a reconstructed three-dimensional (3D) image of the respiratory tract. Assessment of the 3D image requires complex computations and hence, the 3D images are separated into 2D slices during the examination. In this work, the axial CTS of the normal/COVID-19 class are considered to test the performance of the proposed MLS. Initially, the visibility of the infected section is enhanced using a tri-level thresholding implemented using the Chaotic-Bat-Algorithm and Kapur's entropy (CBA+KE). After the thresholding, a bi-level threshold filter discussed in [26, 34] is implemented to separate the image into Region-Of-Interest (ROI) and artifact.
A feature extraction procedure is then implemented to extract the image features from the original, threshold and ROI images. After extracting the features, the dominant features from each image category are selected using a statistical test, and the chosen features are then considered to train, test and validate the classifier system implemented in this work. Further, a feature fusion technique is also employed to increase the classification accuracy.

Image thresholding is one of the widely adopted enhancement techniques considered to improve the visibility of grayscale/RGB images. In this work, the Kapur's entropy thresholding discussed in [35] [36] [37] is implemented to enhance the CTS for further assessment. The mathematical description of the Kapur's entropy is discussed below.

Let us consider a chosen dimension of the grayscale image with $L$ gray-levels ($0$ to $L-1$) with a total pixel value of $G$. If $F(i)$ denotes the frequency of the $i$-th intensity-level, then the pixel distribution of the image will be;

$G = F(0) + F(1) + \ldots + F(L-1)$

Then the probability of the $i$-th intensity-level is represented by;

$P(i) = F(i) / G$

If there are $T$ thresholds as $(t_1, t_2, \ldots, t_T)$, then during the thresholding operation the image pixels are separated into $T+1$ groups based on the assigned threshold values. After separating the image as per the chosen thresholds, the entropy of each group is computed separately and combined to get the final entropy. For a tri-level threshold problem, the computed entropy will be the sum of the group entropies;

$F_{Kapur}(T) = \sum_{j} H_j, \quad H_j = -\sum_{i \in C_j} \frac{P(i)}{\omega_j} \ln \frac{P(i)}{\omega_j}, \quad \omega_j = \sum_{i \in C_j} P(i)$

where $C_j$ is the $j$-th pixel group, $P$ = probability distribution, and $\omega$ = probability occurrence. During this operation, the objective is to find;

$\max F_{Kapur}(T)$

In this research, identification of $\max F_{Kapur}(T)$ is achieved by the CBA. In the literature, a number of procedures are implemented to enhance the optimization performance of the bat-algorithm (BA), and in the proposed work, the search operator in the traditional BA is improved using the Lorenz-Attractor discussed in [38, 39].
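The Kapur objective described above can be illustrated with a short sketch. It is a minimal Python illustration, not the authors' MATLAB implementation: the function names `kapur_entropy` and `best_thresholds` are hypothetical, the convention that a threshold `t` starts a new group at gray level `t` is an assumption, and an exhaustive search stands in for the CBA purely to keep the example deterministic.

```python
from itertools import combinations
from math import log


def kapur_entropy(hist, thresholds):
    """Sum of group entropies for a histogram split at the given thresholds.

    hist[i] is the pixel count F(i) of gray level i. A threshold t starts a
    new group at level t (this boundary convention is an assumption).
    """
    total = sum(hist)                              # G, the total pixel count
    probs = [count / total for count in hist]      # P(i) = F(i) / G
    bounds = [0, *sorted(thresholds), len(hist)]
    entropy = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        omega = sum(probs[lo:hi])                  # probability occurrence of the group
        if omega == 0:
            continue                               # an empty group contributes nothing
        for p in probs[lo:hi]:
            if p > 0:
                entropy -= (p / omega) * log(p / omega)
    return entropy


def best_thresholds(hist, n_thresholds):
    """Exhaustive stand-in for the CBA: return the thresholds maximising the entropy."""
    candidates = combinations(range(1, len(hist)), n_thresholds)
    return max(candidates, key=lambda t: kapur_entropy(hist, t))
```

For a 256-level CTS histogram the exhaustive search grows combinatorially with the number of thresholds, which is one reason a heuristic optimizer such as the CBA is attractive for this task.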
The BA has the following representations [40] [41] [42]:

Velocity update = $V_i^{t+1} = V_i^t + (X_i^t - G_{best}) F_i$ (6)
Location update = $X_i^{t+1} = X_i^t + V_i^{t+1}$ (7)
Frequency = $F_i = F_{min} + (F_{max} - F_{min}) \beta$ (8)

where $\beta$ is a random value of range [0,1]. Eqn. (8) drives Eqn. (6) and Eqn. (7) and hence, the choice of the frequency value should be appropriate. The updated value for every bat is produced based on;

$X_{new} = X_{old} + \varepsilon A^t$

where $\varepsilon$ is a Lorenz-Attractor and $A$ = loudness constraint. The expression of the loudness variation can be represented as;

$A_i^{t+1} = \alpha A_i^t$

where $\alpha$ is a variable with a value $0 < \alpha < 1$. The typical search of a bat in the three-dimension search space is depicted in Figure 2. Every bat is responsible to find the thresholds that maximize $F_{Kapur}(T)$.

Step3: Find the best $G$ attained by a bat and update the velocity and position using Eqn. (6) and Eqn. (7).
Step4: When the search iteration rises, update the position of every bat using the $X_{new}$ expression.
Step5: If the maximal iteration is reached (or) all the agents attained $\max F_{Kapur}(T)$, stop the search and declare the thresholds. Else, repeat steps 2 to 4 till the maximal iteration is attained.

The accuracy of the disease detection using bio-images depends mainly on the quality of the image considered. The lung CTS normally contains the lung section to be examined along with other unwanted sections, such as the bone segment and other body parts. In order to have a better diagnosis using computer assisted procedures, it is necessary to consider only the Region-Of-Interest (ROI) from the medical image. In this work, the threshold filter implemented in [34] is considered to separate the threshold image into ROI and artifact. As discussed by Rajinikanth et al. [26], the threshold level (Th) of the filter is initially identified manually and this threshold is then considered for all other images. The extracted ROI contains the pneumonia infection section due to COVID-19, and this section is then considered for further assessment. From the ROI, the pneumonia infection is then segmented using the watershed-segmentation discussed in [26].
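The threshold filter described above can be sketched in a few lines. This is a hedged illustration only: the function name `threshold_filter` is hypothetical, the image is represented as a plain list of pixel rows, and the choice that pixels at or below Th form the ROI while brighter pixels form the artifact is an assumption, since the text does not state which side of Th is kept.

```python
def threshold_filter(image, th):
    """Split a grayscale image (list of pixel rows) into (roi, artifact).

    Assumption: pixels with intensity <= th belong to the ROI and brighter
    pixels belong to the artifact. Pixels removed from a segment are set to 0.
    """
    roi = [[px if px <= th else 0 for px in row] for row in image]
    artifact = [[px if px > th else 0 for px in row] for row in image]
    return roi, artifact


# Tiny 2x2 example with a hypothetical threshold value.
roi, artifact = threshold_filter([[10, 200], [179, 180]], th=179)
```

Because Th is identified once and then reused for every slice, the same `th` value would be passed for the whole dataset in this sketch.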
The segmentation result confirms that the proposed methodology helped to extract the pneumonia infected region from the axial, coronal and sagittal views of the CTS.

All the images (original, threshold, and ROI) considered in this work are in 2D form and hence, 2D image feature extraction procedures, such as Discrete Wavelet Transform (DWT), Gray-Level Co-Occurrence Matrix (GLCM) and Hu Moments (HuM) are implemented. Further, the entropy features, such as Kapur, max [43] [44] [45], Renyi [46, 47], Tsallis [48], Shannon [49], Vajda, and Yager [50, 51] are also extracted and considered as the prime features.

DWT: It evaluates the non-stationary details in the image, and the arithmetical expression of the DWT is indicated as follows. When a wavelet has the function $\psi(t) \in W^2(r)$, then its DWT will be;

$DWT(a,b) = \frac{1}{\sqrt{2^a}} \int_{-\infty}^{\infty} x(t)\, \psi^*\left(\frac{t - b\,2^a}{2^a}\right) dt$

where $\psi(t)$ is the principal wavelet, the symbol '*' denotes the complex conjugate, $x(t)$ is the analysed signal, and $a$ and $b$ are the scaling parameters for image dilation and translation, correspondingly. The proposed work extracts 40 features using the DWT [43]. After extracting these features from the normal/COVID-19 class images, a Student's t-test based statistical evaluation is executed; the features are ranked based on the attained t-value, and the DWT features whose p-value is >0.05 are discarded. This feature selection procedure helped to attain 13 one-dimensional features, and these are considered as the dominant DWT features, which form a 13x1 feature vector.

Entropy features: Entropy is the measure of the abnormality existing in the image, and this feature provides the essential information on the lung abnormality in the CTS. In this work, 7 entropy features are considered, and the essential information on these features can be found in [43] [44] [45] [46] [47]. These features form the entropy feature vector.
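The t-test based ranking and selection of features described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the names `student_t` and `select_features` are hypothetical, the pooled-variance form of Student's t statistic is assumed, and the cut-off |t| > 1.96 is used as a large-sample approximation of the two-sided p < 0.05 rule instead of computing exact p-values.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t statistic for two independent samples."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


def select_features(normal, covid, t_crit=1.96):
    """Rank features by |t| (largest first) and keep those with |t| > t_crit.

    normal / covid map a feature name to its values in each class.
    t_crit=1.96 approximates p < 0.05 (two-sided) for large samples only.
    """
    scores = {name: abs(student_t(normal[name], covid[name])) for name in normal}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > t_crit]
```

With 250 images per class the large-sample cut-off is a reasonable approximation; for small samples an exact p-value from the t distribution would be the safer choice.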
Feature extraction is separately implemented on the three image cases, and the attained Feature-Vectors (FV), each with a size of 47 features, are arranged as follows: original image = $FV_1$, threshold image = $FV_2$ and ROI = $FV_3$.

Feature fusion is widely adopted in Machine-Learning (ML) and Deep-Learning (DL) systems to enhance the classification accuracy. This practice increases the size of the 1D FV to enhance the detection accuracy. In this work, the number of features existing in the considered FVs is small (i.e. 47x3=141). Hence, a serial fusion technique is employed to fuse the FVs. The fused feature vectors considered in the proposed MLS are $FFV_1$, obtained by fusing two of the FVs, and $FFV_2$, obtained by fusing all three;

$FFV_2 = FV_1 + FV_2 + FV_3$ (17)

in which $FFV_1$ = 94x1 features and $FFV_2$ = 141x1 features.

In ML and DL techniques, classifiers are implemented to separate the given dataset into two or multiple classes with the help of the feature-vector. Further, the choice of an appropriate classifier is essential to maintain the detection accuracy during medical data assessment. In the proposed work, a two-class classification problem is considered, and the implemented classifier is utilized to classify the CTS image dataset into the normal/COVID-19 class. The most commonly implemented classifiers, such as Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM) with linear kernel are employed to classify the considered images using the feature vectors.

The eminence of ML and DL based data analysis is generally authenticated by calculating the important performance values. In the proposed MLS, these performance values are computed to validate the eminence of the implemented classifier system.

The clinical level analysis of the pneumonia infection due to COVID-19 is generally performed using CTS. This work considered 500 grayscale lung CTS (250 normal and 250 COVID-19 class) for the assessment.
The normal CTS are collected from the LIDC-IDRI [54] [55] [56] and the RIDER-TCIA [57, 58], and the COVID-19 class images are collected from the Radiopaedia database [59] [60] [61] [62] [63] [64] [65] [66] [67] and the benchmark test images available at [68]. All these images are resized into 512x512x1 pixels and the resized images are considered for the experimental investigation. The sample test images considered in the proposed study are depicted in Figure 3.

In this section, the investigational outcomes achieved are presented and discussed. This MLS is implemented using a workstation with the configuration Intel i5 2.GHz processor with 8GB RAM and 2GB VRAM, equipped with MATLAB®. The experimental outcome of this MLS shows that it needs a mean time of 183±17 sec to process the considered CTS dataset. The benefit of this MLS is that it is an automated technique and does not involve operator assistance during the CTS classification. The performance of the implemented thresholding and segmentation technique is initially evaluated using the clinical grade CTS provided in the Radiopaedia case-study [69], and the attained results are presented in Figure 3. In this work, the axial, coronal and sagittal slices of the case-study are assessed to confirm the performance of the proposed system, and finally the infection due to COVID-19 is extracted using the watershed segmentation recently discussed in [26]. The results confirm that the proposed work offered better segmentation on the considered CTS irrespective of its orientation. After confirming the segmentation performance on the considered case-study, the proposed MLS is then considered to classify the CTS database into the normal/COVID-19 class using a chosen procedure. As discussed in section 3, all the images of the considered CTS database are initially enhanced using the CBA+KE threshold, and from the enhanced image the ROI is extracted by implementing the filter with a chosen threshold of Th=179±4.
Later, the essential image features from the original image, the threshold image and the ROI are extracted using DWT, GLCM, HuM and entropies, as discussed in section 3.3. The feature selection is then implemented for the DWT features, which helped to reach a final feature vector of dimension 47x1 for each image case. Initially, $FV_1$ is considered to evaluate the performance of the classifiers on the considered data, and the attained results are depicted in Figure 5. The performance of each classifier is verified using a five-fold cross validation, and the best value among the five trials is chosen for the assessment. The results attained with the fused feature vectors are presented in Table 2 and Table 3. The results of these tables confirm that the classifier accuracy achieved with the fused-feature-vector is better, which indicates that, for the considered dataset, increasing the number of features increases the classification accuracy. The classification accuracy attained with $FFV_2$ is better compared to the accuracy attained with $FFV_1$ as well as $FV_1$.

Along with the accuracy, it is necessary to compute the overall performance of the classifier, to confirm its clinical significance. To get the information about the overall performance of the classifier, the Glyph plot is considered in this work. The Glyph plot provides a graphical representation based on the amplitudes of the performance measures. Usually, a Glyph plot with larger dimension represents better overall performance, and the Glyph plots achieved for the various classifiers using $FV_1$, $FFV_1$, and $FFV_2$ are depicted in Figure 6. From Fig 6(a) it can be noted that the overall performance attained with the SVM is superior. Fig 6(b) shows the better performance by the RF and Fig 6(c) confirms the performance of the SVM. In the considered system, for all the feature cases, the overall performance attained with the classifiers NB, KNN and DT is lower compared to the RF and the SVM.
The results presented in Figure 7 confirm that, when the number of features is increased, the classification accuracy also increases. Further, the accuracy attained using $FFV_2$ is superior to the accuracy attained with the other feature vectors, irrespective of the classifier unit. In the proposed research, the MLS is implemented to examine the CTS dataset of the normal/COVID-19 class and attained an accuracy of >89% with the SVM classifier. In future, a suitable DL procedure can be implemented to improve the classification accuracy.

The aim of this research is to propose a computerized system to distinguish the normal and COVID-19 CTS images from a considered image database. This work proposes a MLS using a sequence of procedures ranging from image pre-processing to classification, to implement a scheme with better detection accuracy. The proposed MLS initially implements an image thresholding process with CBA+KE to enhance the test image and then implements a threshold filter to separate the ROI and artifact. Later, essential procedures, such as feature-extraction, feature-selection, feature-fusion, and classification are employed in the proposed MLS. In this work, the classifier units NB, KNN, DT, RF and SVM are considered and their performances are individually tested with the chosen features, such as $FV_1$, $FFV_1$ and $FFV_2$. The investigation of this study confirms that the classification accuracy of the SVM is 89.80% when $FFV_2$ is considered to train, test and validate the classifier. This confirms that, when the proposed MLS is equipped with the SVM classifier, a better classification is attained with the considered CTS database.
Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques A unified patch based method for brain tumor detection using features fusion Chest CT Severity Score: An Imaging Tool for Assessing Severe COVID-19 Finding an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV Novel Coronavirus Outbreak Composite Monte Carlo Decision Making under High Uncertainty of Novel Coronavirus Epidemic Using Hybridized Deep Learning and Fuzzy Rule Induction Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection Tools for Coronavirus Outbreak: Need of Active Learning and Cross-Population Train/Test Models on Multitudinal/Multimodal Data The role of CT in case ascertainment and management of COVID-19 pneumonia in the UK: insights from high-incidence regions Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia CT manifestations of coronavirus disease-2019: A retrospective analysis of 73 cases by disease severity Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases Sensitivity of chest CT for COVID-19: comparison to RT-PCR Coronavirus disease 2019: initial chest CT findings Chest Radiographic and CT Findings of the 2019 Novel Coronavirus Disease (COVID-19): Analysis of Nine Patients Treated in Korea CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19) Emerging Coronavirus 2019-nCoV Pneumonia CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV) Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT Temporal Changes of CT Findings in 90 Patients with COVID-19 Pneumonia: A Longitudinal Study. 
Thoracic Imaging Harmony-Search and Otsu based System for Coronavirus Disease (COVID-19) Detection using Lung CT Scan Images Firefly-Algorithm Supported Scheme to Detect COVID-19 Lesion in Lung CT Scan Images using Shannon Entropy and Markov-Random-Field JCS: An Explainable CoroNet: A Deep Neural Network for Detection and Diagnosis of Covid-19 from Chest X-ray Images A New Modified Deep Convolutional Neural Network for Detecting COVID-19 from X-ray Images Coronavirus (COVID-19) Classification using Deep Features Fusion and Ranking Technique, 2020 An automatic COVID-19 CT segmentation based on U-Net with attention mechanism Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19 Deep-learning framework to detect lung abnormality-A study with chest X-Ray and lung CT scan images Kapur's entropy based optimal multilevel image segmentation using Crow Search Algorithm Segmentation of breast thermal images using Kapur's entropy and hidden Markov random field A new method for gray-level picture thresholding using the entropy of the histogram The Lorenz Attractor Exists Chaotic based differential evolution algorithm for optimization of baker's yeast drying process Nature-Inspired Metaheuristic Algorithms Bat algorithm: literature review and applications Dey, N: Multi-level image thresholding using Otsu and chaotic bat algorithm Automated detection of Alzheimer's disease using brain MRI images-a study with various feature extraction techniques Study of normal ocular thermogram using textural parameters A mathematical theory of communication Entropy expressions for multivariate continuous distributions Application of nonlinear methods to discriminate fractionated electrograms in paroxysmal versus persistent atrial fibrillation Possible generalization of Boltzmann-Gibbs statistics On Shannon's entropy, directed divergence and inaccuracy Automated diagnosis of celiac disease using DWT and nonlinear features with video 
capsule endoscopy images Automated detection and classification of liver fibrosis stages using contourlet transform and nonlinear features Texture Analysis. In: The Handbook of Pattern Recognition and Computer Vision Social-Group-Optimization based tumor evaluation tool for clinical brain MRI of Flair/diffusion-weighted modality The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans Data From LIDC-IDRI. The Cancer Imaging Archive Data From RIDER_Lung CT. The Cancer Imaging Archive Evaluating Variability in Tumor Measurements from Same-day Repeat CT Scans of Patients with Non-Small Cell Lung Cancer 1. Radiology Case courtesy of Dr Domenico Nicoletti, Radiopaedia.org, rID: 74724 (Last accessed date Case courtesy of Dr Fabio Macori, Radiopaedia.org, rID: 74867 (Last accessed date Case courtesy of Dr Fateme Hosseinabadi , Radiopaedia.org, rID: 74868 (Last accessed date Case courtesy of Dr Derek Smith, Radiopaedia.org, rID: 75249 (Last accessed date Case courtesy of Dr Mohammad Taghi Niknejad, Radiopaedia.org, rID: 75605 (Last accessed date Case courtesy of Dr Ammar Haouimi, Radiopaedia.org, rID: 75665 (Last accessed date Multivariate Data Glyphs: Principles and Practice. Handbook of Data Visualization