key: cord-0607211-73b30uvn
authors: Ozkaya, Umut; Ozturk, Şaban; Budak, Serkan; Melgani, Farid; Polat, Kemal
title: Classification of COVID-19 in Chest CT Images using Convolutional Support Vector Machines
date: 2020-11-11
journal: nan
DOI: nan
sha: ab53fc0ff7fa7031286c0bfe626931efcb27f1e0
doc_id: 607211
cord_uid: 73b30uvn

Purpose: Coronavirus disease 2019 (COVID-19), which emerged in Wuhan, China and has affected the whole world, has cost the lives of thousands of people. Because the virus spreads rapidly, manual diagnosis is inefficient. For this reason, automatic COVID-19 detection studies are carried out with the support of artificial intelligence algorithms. Methods: In this study, a deep learning model that detects COVID-19 cases with high performance is presented. The proposed method, called the Convolutional Support Vector Machine (CSVM), automatically classifies Computed Tomography (CT) images. Unlike pre-trained Convolutional Neural Networks (CNN) that are adapted with transfer learning, the CSVM model is trained from scratch. To evaluate the performance of the CSVM method, the dataset is divided into training (75%) and testing (25%) parts. The CSVM model consists of blocks containing three different numbers of SVM kernels. Results: When the performance of the pre-trained CNN networks and the CSVM models is assessed, the CSVM (7x7, 3x3, 1x1) model shows the highest performance, with 94.03% ACC, 96.09% SEN, 92.01% SPE, 92.19% PRE, 94.10% F1-Score, 88.15% MCC and 88.07% Kappa. Conclusion: In the experiments performed, the proposed method is more effective than the other methods tested. It can serve as an inspiration for combating COVID-19 and for future studies.

A group of patients was infected with a novel coronavirus in Wuhan, China, in late December 2019. The virus was named "severe acute respiratory syndrome coronavirus 2" (SARS-CoV-2), and the World Health Organization (WHO) named the disease it causes COVID-19 (Khan et al. 2020).
After infection, patients develop various symptoms such as fever, cough, weakness, and respiratory problems. Depending on the state of the immune system and various personal factors, some cases progress to pneumonia, multi-organ failure, and death (Mahase 2020). The virus, which causes such severe symptoms, spreads so quickly that it took only 30 days to spread from Wuhan to the rest of China (Wu and McGoogan 2020). In the United States, the total number of cases has exceeded 5.6 million since the first case was seen on January 20, 2020. The rate of spread is also high in other countries. As the number of patients increases, patients with severe symptoms need to stay in hospitals. Some studies show that the average hospital stay is 22 days. Considering the condition and number of hospitals, it is very difficult to overcome this burden (Liew et al. 2020). For this reason, various drug and vaccine studies are still ongoing to find a definitive treatment (Rismanbaf 2020). Although there is no definitive treatment yet, pharmacological treatments are strongly needed in cases of severe disease (Scavone et al. 2020).

The Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test is accepted as the standard for the definitive detection of this disease. However, in the early stages of the disease, this test fails in many cases (Shi et al. 2020a). For this reason, X-ray and Computed Tomography (CT) devices, which are readily available medical imaging devices, are used in clinical practice (Ozkaya et al. 2020). These imaging tools play a crucial role in the early diagnosis of COVID-19. Compared to test kits in particular, they can produce faster results, analyze multiple cases at the same time, and give clear information about the progress of the case (Khan et al. 2020). By examining CT and X-ray images, patients can be diagnosed, treatment can be started, and many lives can be saved.
However, considering the large number of patients and the rate at which the disease spreads, it is very difficult for experts to keep up with the demand. As seen in previous examples, manual examination by professionals in hospitals is very slow (Tanne et al. 2020). Intelligent technologies can therefore make very important contributions to the diagnosis process. For this purpose, many researchers focus their attention on artificial intelligence (AI) studies that can accurately and quickly diagnose COVID-19 from images.

AI methods contribute significantly to almost every field, from the manufacturing industry to the healthcare industry (Jaakkola et al. 2019). AI systems are able to accomplish a given task as if they had the knowledge of an expert. These tasks can be visual perception, speech recognition, scene interpretation, object detection, decision-making, or translation. To perform any of these tasks, problem-specific features must be obtained and processed. This requirement is the one aspect of AI that has not changed over time. In the past, automatic decision-making was carried out with simpler algorithms, using features such as edge information, frequency changes, and plane differences (Crane 1979). In the following years, as the computing power of computers increased, more sophisticated algorithms began to be used for feature extraction and classification. In the works of this period, certain features are usually selected based on the problem, and the selected features are then classified with a classifier. These methods, which require a lot of experience, were later named hand-crafted feature extraction (Majtner et al. 2016) ... (Öztürk 2020). All these positive developments in the field of AI might help researchers today to fight COVID-19.
Considering the breakthrough results of CNN methods in the field of image processing, CT and X-ray images of COVID-19 are expected to be diagnosed with very high performance. However, since CNN architectures have a very large number of trainable parameters, they need a correspondingly large number of training samples to fit these parameters. For this reason, CNN is not the most suitable solution for datasets that do not contain enough samples. Data augmentation methods are preferred to solve this problem (Shorten and Khoshgoftaar 2019). The early datasets created for the analysis of CT and X-ray images related to COVID-19 do not contain a sufficient number of training samples. When samples from other classes are added to enlarge these datasets, the number of COVID-19 samples remains quite low. For this reason, it is hard to find a successful CNN architecture in early COVID-19 classification and segmentation studies.

Pereira et al. (Pereira et al. 2020) classify chest X-ray (CXR) images for COVID-19 diagnosis using texture features such as local binary patterns (LBP), elongated quinary patterns (EQP), local directional number (LDN), and binarized statistical image features (BSIF). Another study used handcrafted image features on a chest CT dataset consisting of 78 patients in total. In this early study, the authors argued that some chest CT images could lead to misdiagnosis when used alone. Barstugan et al. (Barstugan et al. 2020) ... Their findings show that deep feature fusion can produce highly effective results. Wang et al. (Wang et al. 2020) have proposed a weakly-supervised framework for the rapid detection of COVID-19, which requires more samples for training. Apostolopoulos and Mpesiana (Apostolopoulos and Mpesiana 2020) carried out a study evaluating the transfer learning performance of CNN methods using a dataset consisting of a total of 1427 X-ray images, including 224 COVID-19 cases. Afshar et al. (Afshar et al.
2020) proposed a deep learning approach called COVID-CAPS for COVID-19 detection from X-ray images. The capsule network method they recommend has produced remarkable results because it can work with small datasets. Jaiswal et al. (Jaiswal et al. 2020) used a pre-trained DenseNet201 architecture to avoid the adverse effects of small datasets on CNN training. Brunese et al. (Brunese et al. 2020) propose and evaluate an approach based on transfer learning by exploiting the VGG-16 model.

When the CNN methods in the literature are examined, it is possible to find highly competent and high-performance methods. However, it is not efficient to use these methods in their original form with the current COVID-19 datasets, because the datasets contain an insufficient number of samples and are unevenly distributed. Considering the workload of radiology experts, it is not possible for now to create a sufficiently large labeled dataset. For this reason, many researchers avoided these powerful methods, while many others conducted studies on weakly-supervised methods or data augmentation. Although the results of the studies examined above are very inspiring, almost no study achieves satisfactory accuracy within a satisfactory time. For this reason, the main purpose of this study is to propose a powerful deep learning method for COVID-19 datasets containing an insufficient number of samples, without encountering overfitting problems. Unlike many methods in the literature, we do not change the layer sequence or the loss function of existing CNN structures, because such architectures still have a very large number of trainable parameters and therefore still carry the overfitting problem. In this study, we use a convolutional support vector machine (CSVM) network (Bazi and Melgani 2018). To the best of our knowledge, this is the first study that uses CSVM to classify CT images for COVID-19 detection.
The CSVM network consists of several convolution and pooling layers, ending with an SVM layer. SVMs play the decisive role in producing the filter banks. The learning process takes place as forward supervised learning, in contrast to the standard backpropagation approach. In this way, very high learning performance is obtained despite limited training data. The main contributions of this study are:

• Since the proposed method is designed for COVID-19 datasets containing a small number of samples, it has higher classification performance than other state-of-the-art methods in the literature.
• Compared to the familiar CSVM architecture, the number of parameters is reduced, and it can be trained using fewer training samples.
• Test time is faster than other deep architectures in the literature.

The rest of this paper is organized as follows. Section 2 provides the methodological background and the details of the proposed method. The dataset and experimental details are presented in Section 3, together with comparison tables and results. Section 4 presents the discussion and Section 5 the conclusion.

This section presents the methodological background of the proposed architecture, the proposed technique and its application details, the model architecture, and information about the parameters. For an AI algorithm to produce decisions similar to those of an expert, it must perform a similar evaluation process. For image processing problems, the decision-making cycle of an expert usually consists of evaluating, with the brain, the input taken in through the eyes. Visually obtained information consists of many parts, from low-level to high-level. While image processing algorithms used only low-level features in the early period, the situation is changing today.
CNN architectures, which are very effective for image processing problems, can automatically learn low-level, mid-level, and high-level features (Öztürk and Özkaya 2020). These learned features are kept in convolution kernels. Convolution kernels, in other words convolution layers, are among the most important layers of a CNN architecture. The classical convolution process is applied by sliding each convolution kernel of the convolution layer over the image. Because each kernel is shared across the entire image, the number of parameters drops sharply. The parameters in the convolution kernels are updated with the backpropagation process and learn the features of the problem. Although parameter sharing in the convolution layer reduces the number of trainable parameters compared to a fully connected layer, the number of parameters in a CNN architecture is still quite high. (Fig. 2.1) shows how the process occurs in convolutional neural networks. The size of the feature map obtained as output is calculated as in (Eq. 2.1):

$$N_{out} = \left\lfloor \frac{N_{in} + 2P - K}{S} \right\rfloor + 1 \qquad (2.1)$$

where $N_{in}$ is the size of the input image, $P$ is the number of zero layers added around the image, $K$ is the size of the filter, $S$ is the number of steps the filter moves each time, and $N_{out}$ is the output size. The stride parameter $S$ expresses how many units the convolution filter shifts at each stage, and its value determines the size of the feature map. As can be seen from (Fig. 2.2), after the convolution process, the size of the feature map is smaller than the size of the original image. Zero padding is added around the original image where a size reduction is not desired, as seen in (...). ... parameters are eliminated, overcomes this problem.

In the pooling process, the square window size is determined first. This window is slid over the image like a convolution kernel, and the pooling operation is applied to the part of the image under the pooling window.
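As a quick numerical check of the feature-map size discussed above, the following minimal sketch assumes the standard relation N_out = floor((N_in + 2P - K) / S) + 1, which is what the symbol definitions suggest. The function name `conv_output_size` and the input values are illustrative assumptions and do not come from the paper. The same arithmetic applies to a pooling window that slides over the image.

```python
def conv_output_size(n_in: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial size of a feature map after a sliding-window operation.

    Assumes N_out = floor((N_in + 2P - K) / S) + 1, with P zero layers
    added on every side of the input.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    span = n_in + 2 * padding - kernel
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


# Illustrative sizes only (hypothetical, not taken from the paper).
no_padding = conv_output_size(224, kernel=7)               # map shrinks
same_size = conv_output_size(224, kernel=7, padding=3)     # zero padding keeps the size
pooled = conv_output_size(224, kernel=2, stride=2)         # 2x2 window, stride 2

print(no_padding, same_size, pooled)
```

With a 7x7 kernel, three zero layers on each side are enough to keep the output the same size as the input, which is the situation zero padding is meant to handle.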
The size of the resulting encoding depends on the size chosen for the pooling window. In the literature, max-pooling, sum-pooling, and average-pooling are generally used. In max-pooling, the maximum value among the pixels under the pooling window is selected. In average-pooling, the arithmetic average of the pixels under the pooling window is used. In sum-pooling, the total value of the pixels under the pooling window is calculated and passed on. A 2x2 pooling process is shown in (Fig. 2.4).

The sigmoid, tanh, and ReLU activation functions are defined as

$$s(z) = \frac{1}{1 + e^{-z}}, \qquad t(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad r(z) = \max(0, z)$$

where 'z' represents the input value and 's', 't', and 'r' are the output values of the sigmoid, tanh, and ReLU activation functions, respectively. The ReLU activation function is shown in (Fig. 2.5) (Agarap 2018). The operation of a basic CNN block can be written as

$$o = f(I) = \beta_{n}\left(\sigma\left(w \times I + b\right)\right)$$

where I represents the input image, o represents the output of the CNN, f is the CNN operation, β represents the max-pooling layer, n represents the kernel size of the pooling layer, σ represents the ReLU operation, w is the weights of the CNN layer, b is the bias value, and × represents the convolution operation.

Features obtained automatically with the help of these basic layers are classified with a fully connected layer (FCL). The FCL is a multi-layer perceptron (MLP) structure. At the FCL output, a probability distribution is produced with a softmax layer. In addition to these basic layers, various other layers, such as the concatenate layer and the normalization layer, continue to be proposed to increase performance. In addition to linear CNN architectures, new architectures such as residual and parallel architectures have entered the literature. One of the main reasons underlying CNN's remarkable performance is the availability of huge datasets. If CNN architectures are not trained with datasets containing a sufficient number of labeled samples, either they cannot learn the problem or overfitting may occur. This data dependence is one of the most serious problems of deep learning.
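The pooling variants, activation functions, and basic CNN block described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`pool2d`, `conv2d_valid`, `cnn_block`) are hypothetical, pooling windows are taken as non-overlapping, the sliding-window product is computed without flipping the kernel (as most CNN libraries do), and the block composes convolution, ReLU, and max-pooling in that order, which is our reading of the symbol list above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    return np.tanh(z)


def relu(z):
    return np.maximum(z, 0.0)


def pool2d(x, n=2, mode="max"):
    """Non-overlapping n x n pooling over a 2-D map (window stride equal to n)."""
    h, w = x.shape
    h, w = h - h % n, w - w % n            # drop rows/columns that do not fill a window
    windows = x[:h, :w].reshape(h // n, n, w // n, n)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    if mode == "sum":
        return windows.sum(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")


def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2-D image without padding (stride 1)."""
    k = kernel.shape[0]
    out_h, out_w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out


def cnn_block(image, w, b, n=2):
    """Convolution, then ReLU, then n x n max-pooling."""
    return pool2d(relu(conv2d_valid(image, w) + b), n, "max")


x = np.arange(16.0).reshape(4, 4)          # toy 4x4 "image"
print(pool2d(x, 2, "max"))
print(pool2d(x, 2, "avg"))
print(pool2d(x, 2, "sum"))
print(cnn_block(x, np.ones((3, 3)), b=-50.0))
```

On the toy input, the three pooling modes reduce the same 2x2 windows to their maximum, mean, and total, which makes the difference between them easy to compare side by side.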
The data labeling process is very laborious and lengthy. In addition, it is impossible to find enough labeled data for some problems, such as medical data and sudden emergencies. The transfer learning approach is generally a very effective tool for solving this problem. Briefly, transfer learning works as follows: a CNN architecture is trained with a huge labeled dataset, so that it learns low-, mid-, and high-level features. The information obtained is then transferred from the source domain to the target domain. At this stage, the assumption that training data and test data must be independent and identically distributed is relaxed (Tan et al. 2018). Let a domain be represented by $D = \{x, P(x)\}$, where $x$ represents the feature space and $P(x)$ represents a probability distribution. Let a task be defined by $T = \{y, f(x)\}$, where $y$ represents the label space and $f(x)$ is the target prediction function. Given a learning task $T_t$ based on $D_t$, the source domain $D_s$ and its learning task $T_s$ can be used to help with this task. The transfer learning approach aims to increase the predictive power of the function $f_t(\cdot)$ and to transfer the information in the best way.

CNN architectures classify data using the MLP structure in their last layer. SVM has come to the fore lately because MLP needs more training samples than SVM and is more vulnerable to some problems. SVM is less sensitive to the overfitting problem due to margin maximization. The convolutional SVM layer is the main structure of the CSVM network. This layer uses linear SVM weights as convolutional filters to generate feature maps. These filters learn their weights with a feed-forward learning method, unlike the weights of a traditional CNN, which are learned by backpropagation. In the dataset, positive images contain the object of interest, while negative images represent the background. First, a set of three-channel image patches of size h×h×3 is extracted from each image $I_i$.
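The idea of using linear SVM weights as a convolution filter can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the toy striped images from `make_image` are hypothetical stand-ins for CT slices, plain gradient descent stands in for whatever SVM solver was actually used, no bias term is fitted, and the objective is an L2-regularised squared hinge loss with penalty `C`, the cost function this study reports using. All function names and hyperparameter values are illustrative.

```python
import numpy as np


def extract_patches(image, h, stride):
    """Return every h x h x 3 patch of an H x W x 3 image as a flat row vector."""
    H, W, _ = image.shape
    rows = [image[i:i + h, j:j + h, :].ravel()
            for i in range(0, H - h + 1, stride)
            for j in range(0, W - h + 1, stride)]
    return np.asarray(rows, dtype=float)


def train_svm_filter(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C*sum(max(0, 1 - y_i w.x_i)^2) by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margin = 1.0 - y * (X @ w)
        active = margin > 0                      # only margin violators contribute
        grad = w - 2.0 * C * (X[active].T @ (y[active] * margin[active]))
        w -= lr * grad / len(y)                  # step scaled by the sample count
    return w


def make_image(positive, rng, size=8):
    """Hypothetical toy image: column stripes whose phase depends on the class."""
    cols = (np.arange(size) % 4 < 2) == positive
    img = np.repeat(cols[None, :, None], size, axis=0).repeat(3, axis=2).astype(float)
    return img + rng.normal(0.0, 0.05, img.shape)


rng = np.random.default_rng(0)
h = 4
X_parts, y_parts = [], []
for label in (+1, -1):                           # +1: positive images, -1: negative images
    for _ in range(2):
        patches = extract_patches(make_image(label > 0, rng), h, stride=h)
        X_parts.append(patches)
        y_parts.append(np.full(len(patches), label, dtype=float))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

w = train_svm_filter(X, y)
svm_filter = w.reshape(h, h, 3)                  # same layout as the flattened patches
train_accuracy = np.mean(np.sign(X @ w) == y)

# Feature map: response of the learned filter at every location of a new image.
test_image = make_image(True, rng)
responses = extract_patches(test_image, h, stride=1) @ w
feature_map = responses.reshape(8 - h + 1, 8 - h + 1)
print(train_accuracy, feature_map.shape)
```

Because the filter is obtained in a single forward training step on patches, no gradients need to be propagated back through earlier layers, which is the property the text contrasts with backpropagation.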
After all images are processed, the data set defined as Tr(1) ... C is assigned as the penalty variable. The squared hinge cost function $\max(1 - y_i(w^T x_i + b), 0)^2$ was used with reference to L1-SVM and L2-SVM. To put it more simply, the bias values have been omitted from the formula, and the SVM filters can be trained with this form of the formula. Subsequently, all the weights of the convolution layer can be grouped into a four-dimensional filter stack. To create the convolution feature map, each training image is convolved with the SVM filters. The convolution process on the image is shown mathematically in (Eq. ...).

In this study, the SARS-CoV-2 CT scan dataset (www.kaggle.com/plameneduardo/sarscov2-ctscan-dataset), which poses a binary classification problem, is used. The data set includes 2492 CT scan images in total: 1262 images are positive (COVID-19) and 1230 images are negative (non-COVID). The images in the data set are divided into 75% for training and 25% for testing, for both the pre-trained CNN structures and the CSVM models. Data augmentation methods are not applied to the data set. Some positive (COVID-19) and negative (non-COVID) image samples are shown in (Fig. 3.1).

The experiments in the study are carried out on an Intel Core i7-7700HQ CPU at 2.8 GHz with 16 GB RAM and an NVIDIA GTX 1080 GPU. MATLAB 2020a is used as the simulation environment. The pre-trained CNN networks have been trained for up to 30 epochs. The mini-batch size is set to 10, and each epoch is completed in 186 iterations. The initial learning rate is 0.1, and a drop factor of 0.1 is applied every ten epochs. Stochastic gradient descent with momentum has been chosen as the optimization method. The ResNet-50 model has achieved the highest accuracy among the pre-trained CNN networks. The training and test graphs of this model are given together in (Fig. 3.2). In the accuracy graph, the blue curves show the training data accuracy, while the black curves represent the test data accuracy.
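The training settings quoted above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below uses only the numbers stated in the text (2492 images, a 75% training share, mini-batches of 10, 30 epochs, initial learning rate 0.1, drop factor 0.1 every ten epochs). Two details are assumptions on our part: that an incomplete final mini-batch is discarded in each epoch, and that the learning rate is multiplied by the drop factor at the start of epochs 11 and 21.

```python
TOTAL_IMAGES = 2492
TRAIN_FRACTION = 0.75
MINI_BATCH = 10
EPOCHS = 30
INITIAL_LR = 0.1
DROP_FACTOR = 0.1
DROP_EVERY = 10                      # epochs between learning-rate drops

n_train = int(TOTAL_IMAGES * TRAIN_FRACTION)
n_test = TOTAL_IMAGES - n_train
# Assumption: the incomplete last mini-batch of each epoch is discarded.
iters_per_epoch = n_train // MINI_BATCH
total_iterations = EPOCHS * iters_per_epoch


def learning_rate(epoch: int) -> float:
    """Piecewise-constant schedule for a 1-based epoch index (assumed drop timing)."""
    return INITIAL_LR * DROP_FACTOR ** ((epoch - 1) // DROP_EVERY)


print(n_train, n_test, iters_per_epoch, total_iterations)
for epoch in (1, 10, 11, 20, 21, 30):
    print(epoch, learning_rate(epoch))
```

Under these assumptions, the 186 iterations per epoch reported in the text follow directly from the size of the training split and the mini-batch size.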
In the loss graph, the red curves represent the training loss, and the black curves represent the loss for the test data.

The performance metrics are computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$F1\text{-}\mathrm{Score} = \frac{2\,TP}{2\,TP + FN + FP}$$
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

A confusion matrix is needed to measure classification performance. In (Fig. 3.3) ... Confusion matrices of the proposed CSVM models for the test data are shown in (Fig. 3.4). In addition to the pre-trained CNN models trained using transfer learning, the confusion matrices of the proposed CSVM models are also calculated. When Fig. 5 is examined, it is seen that the accuracy rate obtained for the COVID class is not sufficient. This problem has been largely overcome with the proposed CSVM models. In the confusion matrix for the ...

The impact of COVID-19, which has become a global epidemic, is expected to continue in the future. Besides, the days when the effect known as the second wave will intensify are approaching. In this study, a COVID-19 diagnosis system has been proposed to carry out clinical studies quickly and automatically. The proposed model is based on the CSVM architecture and shows higher performance than the pre-trained CNN networks. At the same time, its training time is very low compared to the CNN models. The CSVM models were trained from scratch and contain fewer parameters than the pre-trained CNN models. The CSVM algorithm, which provides automatic feature extraction, can analyze CT images with high precision. The CSVM model should be trained with larger datasets to reach the level of success needed to assist physicians in the diagnosis of COVID-19. A training process with large data sets is very important in determining the validity and reliability of the system. The developed system can be used to address the shortage of radiologists caused by the increasing number of cases in countries affected by COVID-19.
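The metrics used in this study can all be derived from the four confusion-matrix counts, as the sketch below shows. The function name `binary_metrics` is illustrative, and the Cohen's kappa expression is the standard two-class formula, which the text does not spell out. The counts in the usage example are hypothetical: they are not read from the paper's confusion matrices, but were chosen by us because they give percentages that agree, to within 0.01 percentage points, with the values reported for the CSVM (7x7, 3x3, 1x1) model.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "ACC": accuracy, "SEN": sensitivity, "SPE": specificity,
        "PRE": precision, "F1": f1, "MCC": mcc, "Kappa": kappa,
    }


# Hypothetical counts (an assumption for illustration, not data from the paper).
metrics = binary_metrics(tp=295, fn=12, tn=288, fp=25)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.4f}%")
```

Reporting MCC and kappa alongside accuracy is useful here because both account for all four cells of the confusion matrix, so they are less flattering than accuracy when one class is recognised much better than the other.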
Also, such models can be used to diagnose other chest-related diseases. We plan to make our model more accurate and robust by obtaining more images of these types from our hospitals.

Conflict of interest: The authors declare that they have no conflicts of interest.

Human and animal rights: The paper does not contain any studies with human participants or animals performed by any of the authors.

• The proposed method offers a very fast and high-performance framework to actively combat COVID-19. In this way, it reduces the workload of medical experts.
• To the best of our knowledge, the CSVM method is actively used for the first time in this study to detect COVID-19.
• The proposed framework produces better performance metrics than other state-of-the-art deep learning techniques.
• For datasets containing a small number of samples and a small variety of samples (e.g., nearly all COVID-19 datasets), the recommended method is ideal.

Mohammadi A (2020) COVID-CAPS: A Capsule Network-based Framework for Identification of COVID-19 cases from X-ray Images
Deep learning using rectified linear units (relu)
Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique
Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
Ozturk S (2020) Coronavirus (COVID-19) Classification using CT Images by Machine Learning Methods
Convolutional SVM networks for object detection in UAV imagery
Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays
Automatic Cell Detection and Tracking
New machine learning method for image-based diagnosis of COVID-19
A Novel Approach of CT Images Feature Analysis and Prediction to Screen for Corona Virus Disease (COVID-19)
Artificial Intelligence Yesterday, Today and Tomorrow. Paper presented at the 2019 42nd International Convention on Information and Communication Technology
Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning
CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images
CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19)
Safe patient transport for COVID-19
Risk factors associated with disease severity and length of hospital stay in COVID-19 patients
Coronavirus: covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate
Combining deep learning and hand-crafted features for skin lesion classification
A Novel Medical Diagnosis model for COVID-19 infection detection based on Deep Features and Bayesian Optimization
Coronavirus (COVID-19) Classification using Deep Features Fusion and Ranking Technique
Stacked auto-encoder based tagging with deep features for content-based medical image retrieval
Gastrointestinal tract classification using improved LSTM based CNN. Multimedia Tools and Applications
COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios
Potential Treatments for COVID-19; a Narrative Literature Review
Current pharmacological treatments for COVID-19: What's next?
Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19
Shen D (2020b) Large-Scale Screening of COVID-19 from Community Acquired Pneumonia using Infection Size-Aware Classification
A survey on Image Data Augmentation for Deep Learning
Artificial Neural Networks and Machine Learning - ICANN 2018
Covid-19: how doctors and healthcare systems are tackling coronavirus worldwide