key: cord-1028007-wevekoq8
authors: Gündoğar, Zeynep; Eren, Furkan
title: An adaptive feature extraction method for classification of Covid-19 X-ray images
date: 2022-03-20
journal: Signal Image Video Process
DOI: 10.1007/s11760-021-02130-x
sha: 74209c6bd79af8c50a17b2994db1de2feb8edd4e
doc_id: 1028007
cord_uid: wevekoq8

This study aims to detect Covid-19 disease from X-ray images as quickly and accurately as possible by developing a new feature extraction method and new deep learning models. The Partitioned Tridiagonal Matrix Enhanced Multivariance Products Representation (PTMEMPR) method is proposed as a new feature extraction method; it applies matrix partitioning to the TMEMPR method, which is known in the literature as a matrix decomposition method. The proposed method, which provides 99.9% data reduction, is used as a preprocessing step in the Covid-19 diagnosis scheme. To evaluate its performance, it is compared with the state-of-the-art feature extraction methods Singular Value Decomposition (SVD), Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT). New deep learning models, called FSMCov, FSMCov-N and FSMCov-L, are also developed in this study. The experimental results indicate that the combination of the newly proposed feature extraction method and deep learning models yields an overall accuracy of 99.8%.

Medical images are crucial sources of information for diseases that cannot be easily diagnosed. Studies in the literature have used medical images to address many health problems that are difficult to diagnose [1]. The main purpose of these studies is to diagnose health problems through medical images. A common problem with undiagnosed diseases is that most symptoms are shared by different health problems. Therefore, classification of diseases is a vital procedure [2]. Classification of medical signals and disease diagnosis require high accuracy and fast results [3].
To classify diseases with high accuracy, it is necessary to obtain the most distinctive features, those that allow the classes to be distinguished from each other. How quickly a classification result is obtained depends directly on the amount of data used. Feature extraction involves reducing the amount of resources required to describe a large data set. One of the main problems that arises when analyzing complex data is the growth in the number of variables. Analyses with too many variables often require a large amount of memory and a powerful processor. Feature extraction is a generic term for methods that overcome these problems by creating combinations of variables while ensuring that the data can still be described with sufficient accuracy. Choosing clear, distinctive and independent features is a critical step for effective classification and regression algorithms [4].

Author information: Zeynep Gündoğar (zgundogar@fsm.edu.tr), Furkan Eren (furkan.eren@stu.fsm.edu.tr). Computer Engineering Department, Fatih Sultan Mehmet Vakıf University, Istanbul, Turkey.

In this study, lung X-rays of patients with and without symptoms of the Covid-19 virus were used as the data set. Although chest CT is an effective imaging technique for diagnosing lung-related diseases, X-ray films are much less costly because of their imaging time and widespread use. For this reason, X-ray images were preferred in this study. Quick diagnosis of Covid-19 is an important step in promptly quarantining people who have come into contact with the virus and in reducing the risk of transmission. Machine learning algorithms, deep learning algorithms or hybrid models are used for the diagnosis of Covid-19 [5]. Muhammed and his group used a CNN-based transfer learning-BiLSTM network for Covid-19 diagnosis from lung CT images and performed image processing-based lung segmentation to increase the success rate [6]. They applied feature selection together with this segmentation process. Likewise, Morteze et al.
applied an image processing-based preprocessing step for Covid-19 diagnosis from X-ray images [7]. In that study, in which a CNN was used as the classification algorithm, the diaphragm regions in the lung X-rays were removed as a preprocessing step, and feature extraction and selection were applied to increase the classification success of the CNN model. Feature selection and extraction are crucial techniques for increasing classification success in Covid-19 diagnosis.

In this study, different artificial neural networks were trained with different feature extraction methods. The feature extraction methods used were Singular Value Decomposition (SVD), Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), Tridiagonal Matrix Enhanced Multivariance Products Representation (TMEMPR) and the newly developed, TMEMPR-based PTMEMPR method. The PTMEMPR method presented in this study was compared with the four well-known feature extraction methods. Six different neural network models were trained to compare the methods. Three of them are known convolutional neural network (CNN) models: "Alex-net", "VGG-16" and "Lenet-5". To compare the feature extraction methods on models with different characteristics, 3 further neural network models were developed within the scope of this study. Among these, "FSMCov" and "FSMCov-L" are proposed as convolutional neural network models, and "FSMCov-N" is proposed as an artificial neural network (ANN) model.

X-ray lung images of patients infected and not infected with the Covid-19 virus were used as the data set. In addition, X-ray images obtained from two different sources were used for the diagnosis of Covid-19 [8, 9]. The lung X-ray image database was developed by Cohen JP using images from various open access sources [10]. The size of each image in this data set is 1024×1024. There are 454 X-ray images in total.
154 of these images are of patients with Covid-19 symptoms, while 300 are of asymptomatic patients.

The Tridiagonal Matrix Enhanced Multivariance Products Representation (TMEMPR) method is a matrix decomposition method obtained by applying the EMPR method recursively to a matrix [11]. For an m × n matrix, this representation is given as an expansion in [12]. The columns of the U and V matrices in that expansion are support vectors, and the remaining factor is a tridiagonal matrix consisting of the contributions (α, β, γ) that come from the outer products at each recursion step.

Although TMEMPR is a powerful matrix decomposition method, it has inherent disadvantages. One is that matrix multiplication operations are required to determine the α, β, γ parameters. Another is that the method is recursive: the features obtained in the previous step are used to calculate the new features at each recursion step. These disadvantages increase the computational cost of the TMEMPR method. Previous studies have shown that the computational complexity of the TMEMPR method is O(n^3) [13]. TMEMPR is successful as a feature extraction method, but its computational cost is high. Reducing this cost is important for making the method more widely preferred. The main purpose of the new method presented in this study is to increase classification accuracy by increasing the distinctiveness of the features obtained by TMEMPR, and also to reduce the computational cost [14].

The most important factor that increases the computational cost of the TMEMPR method is the size of the matrix to be decomposed: as the matrix size increases, so does the cost. The other factor is the number of recursions. The number of recursions in the TMEMPR method is proportional to the number of parameters to be obtained.
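The expansion referred to above is not reproduced in this text. A schematic form consistent with the description (support vectors in the columns of U and V, and a tridiagonal factor built from α, β and γ) might look as follows. The symbol T for the tridiagonal factor, the truncation index k, and the placement of β and γ on the upper and lower diagonals are assumptions made here for illustration; the exact statement is in [12].

```latex
% Schematic only: T, k and the diagonal placement of beta/gamma are assumed.
\[
  A \;\approx\; U\,T\,V^{\mathsf{T}},
  \qquad A \in \mathbb{R}^{m \times n},\;
  U \in \mathbb{R}^{m \times k},\;
  V \in \mathbb{R}^{n \times k},
\]
\[
  T \;=\; \operatorname{tridiag}\bigl(\gamma_i \,;\, \alpha_i \,;\, \beta_i\bigr)
  \;\in\; \mathbb{R}^{k \times k},
  \qquad k \le \min(m, n).
\]
```

In this reading, each recursion step adds one more support vector to U and V and one more (α, β, γ) group to T, which is why the recursion number controls both the number of parameters and the cost.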
As the number of recursions increases, the number of parameters obtained increases, while the dominance of the parameters decreases. The first parameters obtained are the dominant ones, and they hold the most important and distinctive information about the data. Consequently, as the number of recursions increases, the dominance of the newly obtained parameters decreases while the computational cost increases linearly.

The new method developed to reduce the computational cost is shown in Figure 1. The recursive nature of the TMEMPR method makes it difficult to apply parallel programming. For this reason, in the proposed method the matrix to be decomposed is partitioned. Partitioning reduces the matrix size, which is one of the sources of computational cost. In addition, the TMEMPR method is applied to each part in parallel, which further reduces the computational cost. Deductive logic is taken as the basis for increasing the number and distinctiveness of the features obtained: TMEMPR is first applied to the whole matrix, then the matrix is divided into 4 sub-matrices, and the same process is applied consecutively to the resulting sub-matrices.

In the PTMEMPR method, the number of recursions for each partitioned matrix is kept low. A low recursion number greatly reduces the computational cost, especially for large matrices. In TMEMPR, the parameters obtained in the first recursion steps are the dominant values and hold important information about the overall structure of the matrix. Because PTMEMPR keeps the number of recursions low, it obtains the dominant parameters of each sub-matrix. This parameter setting gives more distinctive information about the matrix as a whole.
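The partitioning scheme described above can be sketched in a few lines. This is only an illustration of the level-by-level quadrant split, not the authors' implementation: the per-block extractor `block_features` is a placeholder that returns leading singular values instead of running TMEMPR, and the names `quadrants`, `partitioned_features` and `expected_count` are hypothetical.

```python
import numpy as np


def block_features(block, rn):
    """Placeholder extractor (NOT TMEMPR): the `rn` largest singular values.

    Returns fewer than `rn` values if the block's smaller side is below `rn`.
    """
    return np.linalg.svd(block, compute_uv=False)[:rn]


def quadrants(m):
    """Split a matrix into its four quadrants."""
    h2, w2 = m.shape[0] // 2, m.shape[1] // 2
    return [m[:h2, :w2], m[:h2, w2:], m[h2:, :w2], m[h2:, w2:]]


def partitioned_features(matrix, rn_per_level, extractor=block_features):
    """Level 0 is the whole matrix; every further level splits each block in 4.

    rn_per_level[l] is the (assumed) recursion number used at level l.
    """
    blocks = [np.asarray(matrix, dtype=float)]
    feats = []
    for rn in rn_per_level:
        # Blocks on one level are independent, so this loop could run in parallel.
        for b in blocks:
            feats.append(extractor(b, rn))
        blocks = [q for b in blocks for q in quadrants(b)]
    return np.concatenate(feats)


def expected_count(rn_per_level, values_per_recursion):
    """Level l has 4**l blocks, each giving values_per_recursion * RN_l values."""
    return sum(4 ** l * values_per_recursion * rn
               for l, rn in enumerate(rn_per_level))


demo = np.random.default_rng(0).random((8, 8))
features = partitioned_features(demo, rn_per_level=[2, 1, 1])
```

With the placeholder extractor each recursion contributes one value, so `features` has 2 + 4 + 16 = 22 entries. If each recursion contributed an (α, β, γ) triple, as in TMEMPR, the same schedule would give `expected_count([2, 1, 1], 3)` values under this counting assumption.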
The number of parameters (N_feature) obtained with the PTMEMPR method is determined by the partition level and the recursion number (RN) at each level. In the TMEMPR method, the number of recursions is finite and is at most the smallest dimension of the matrix. TMEMPR decomposition yields three parameters (α, β, γ) in each recursion. When the TMEMPR method is applied to an image of size 50x50, 150 parameters are obtained. In the PTMEMPR method, the amount of partitioning depends on the sub-parts of the target matrix. Partitioning can continue as long as the sub-matrix size is 2x2 or more.

Compared with the TMEMPR method, PTMEMPR saves a considerable amount of time, as Figure 2(a) shows. As the matrix size increases, the gain provided by PTMEMPR decomposition increases significantly. The goal of reducing the computational cost has therefore been achieved. Another important goal of the method is the contribution of the obtained features to the classification rate; the outcomes for this goal are discussed in Section 3. The graph in Figure 2(b) shows the memory usage of the PTMEMPR and TMEMPR methods. According to the graph, the two methods are similar in memory usage, and in this respect the new method gives better results than the classical method.

The FSMCov CNN model was developed for binary classification on the Covid-19 data set. It was designed specifically for use with different feature extraction methods. CNN models perform feature extraction within their own structure: convolutional layers extract the properties of the incoming data. Raw data can therefore be fed to CNN models without any pre-processing. With this model, the aim was to measure the effect on the classification results of using the properties obtained in a preprocessing step instead of the raw data.
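To make concrete what it means for a convolutional layer to extract properties of the incoming data, the sketch below slides a single small filter over a toy image. It is a generic illustration, not part of FSMCov: the function name `conv2d_valid`, the filter `edge_kernel` and the toy image are all made up here, and the operation is the unflipped sliding product (cross-correlation) that deep learning libraries conventionally call convolution.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarises one local patch of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical filter: difference between horizontally adjacent pixels.
edge_kernel = np.array([[1.0, -1.0]])
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1]])
response = conv2d_valid(toy, edge_kernel)
```

The response is non-zero only where the toy image changes from 0 to 1, so the filter acts as a simple vertical-edge feature. In a trained CNN the filter values are learned rather than fixed by hand.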
This model was developed to examine the outputs of different feature extraction methods on a model with a simple convolutional layer and a simple classification layer. The convolutional part of the model is kept small compared to other models, because pre-processed data are used as its input. The developed model has a very simple architecture. The proposed architecture, given in Figure 3, takes Lenet-5, the first successful CNN architecture, as a reference [15]. The biggest difference is the pooling layer: instead of the average-pooling layer used in Lenet-5, this model uses max-pooling. In addition, the sigmoid activation function was preferred over the softmax activation function used in that model because of its success in binary classification. Since the model is arranged for binary classification, binary cross-entropy is used as the loss function.

FSMCov-L is also proposed as a CNN model and is an extended version of FSMCov. As in FSMCov, the convolutional part is kept small compared to known models. The most important difference is the expansion of the learning layer, that is, the neural network. The convolutional part is simple because the input data are pre-processed; for this reason, feature extraction in the convolutional layer is reduced. The main purpose of this model is to test and compare the success of a model with a complex, wide neural network on data obtained by feature extraction methods. Expanding the neural network portion significantly increases the computational cost. Max-pooling is used in the pooling layer. The model is optimized for binary classification, since the problem it addresses has two classes.
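The design choices named above (max-pooling instead of average-pooling, a sigmoid output, and binary cross-entropy as the loss) can be illustrated numerically. The following is a minimal NumPy sketch of those three operations on toy values, not the FSMCov code; the function names and the example arrays are illustrative.

```python
import numpy as np


def pool2x2(x, mode="max"):
    """Non-overlapping 2x2 pooling; an odd trailing row or column is dropped."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))


def sigmoid(z):
    """Map a real-valued score to a probability for the positive class."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [0, 0, 1, 1],
                        [0, 4, 1, 1]])
max_pooled = pool2x2(feature_map, "max")    # keeps the strongest response per patch
avg_pooled = pool2x2(feature_map, "mean")   # averages each patch, as in Lenet-5
loss = binary_cross_entropy([1, 0], sigmoid([2.0, -2.0]))
```

In the lower-left patch, max-pooling keeps the single strong response (4) while average-pooling dilutes it to 1. With a sigmoid output the network needs only one output unit for the two classes, and binary cross-entropy is the matching loss.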
In the last layer, sigmoid is used as the activation function. Binary cross-entropy is used as the loss function, and the Adam optimization algorithm is used as the optimizer. The architecture of the model is shown in Figure 4.

FSMCov-N, given in Figure 5, is a multilayer artificial neural network (MANN) model. It was developed for comparison with the CNN models, in order to evaluate the effect of the convolution layer. Artificial neural networks basically consist of three kinds of layer: input, intermediate and output. Information enters the network through the input layer, and the data processed in the intermediate layers produce a result in the output layer. The developed model has two intermediate layers with a small number of neurons, so it has a simple structure compared to the CNN models. ANN models do not have a convolutional layer and therefore do not extract features from the data themselves, whereas in CNN models feature extraction arises from the internal dynamics of the model. Feeding this model with data obtained by feature extraction methods therefore plays a role similar to that of the convolutional layer in CNN models.

SVD, DWT, DCT, TMEMPR and the proposed PTMEMPR method are applied as a pre-processing step before classification. In the Covid-19 data set used, each image is a 1024x1024 matrix, as shown in Figure 6. Data sizes are reduced by 99.9% using the 5 different feature extraction methods. After each feature extraction, the original data were converted to a 32x32 matrix. The neural networks were trained using only 0.1% of the original data. In this study, 6 different neural network models were trained and tested.
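The 99.9% figure follows directly from the matrix sizes quoted above, and it can be checked with a few lines of arithmetic. The sketch also shows one plausible way a decomposition could turn a square image into a small square feature matrix. The text does not say which SVD outputs were kept, so using the singular values is an assumption, as are the function names.

```python
import math

import numpy as np


def reduction_percent(original_shape, reduced_shape):
    """Percentage of values removed when going from one shape to another."""
    return 100.0 * (1.0 - math.prod(reduced_shape) / math.prod(original_shape))


def singular_value_features(image):
    """Assumed scheme: keep the singular values and fold them into a square.

    An n x n image has n singular values, so n must be a perfect square
    (for example 1024 values fold into 32 x 32).
    """
    s = np.linalg.svd(np.asarray(image, dtype=float), compute_uv=False)
    side = math.isqrt(s.size)
    if side * side != s.size:
        raise ValueError("number of singular values is not a perfect square")
    return s.reshape(side, side)


reduction = reduction_percent((1024, 1024), (32, 32))
small_demo = singular_value_features(np.eye(16))  # 16 values fold into 4 x 4
```

A 32x32 input holds 1024 of the original 1,048,576 values, that is about 0.098% of the data, which rounds to the 0.1% (and 99.9% reduction) stated above.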
3 of the models used are known, and the rest are neural network models developed specifically for this application. The developed models and their features are explained in detail in Section 2. For the 3 well-known CNN architectures, the only layer modified for training is the input layer. Different feature extraction methods were used to train the models. The vectors of size 1024 obtained by feature extraction were converted to 32x32 matrices for model training. A comparison of the neural network models for 32x32 inputs is shown in Table 1. Raw data were also used for model training in order to evaluate the feature extraction methods. The raw data are 1024x1024 matrices. A comparison of the 6 neural network models for inputs of this size is shown in Table 2. The computational cost of training and testing with raw data is discussed in the results section. For the VGG-16 model, a 768x768 matrix size was used instead of 1024x1024. The large number of convolution layers in VGG-16 significantly increases its computational cost for high-dimensional data, and on the hardware used in this study the VGG-16 model could not be created for a 1024x1024 input. The data for training VGG-16 were therefore reduced in size by linear interpolation.

Two different hardware setups were used in the study. A machine with 16 GB RAM, an i7 9700 HQ processor and a GTX-850M graphics card was used for preprocessing. Model training and testing were performed on a Tesla K80 GPU with "Google Colaboratory" because the hardware of personal computers was insufficient.

In this study, 4 known feature extraction methods and the proposed PTMEMPR feature extraction method were used. To compare these methods, 3 well-known neural network models and 3 models developed for Covid-19 were used. According to Table 3, all feature extraction methods affect training time, as they significantly reduce the data size.
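The two input-preparation steps mentioned above, folding a 1024-element feature vector into a 32x32 matrix and shrinking a raw image by linear interpolation, can be sketched as follows. The text does not state which interpolation routine was used; the separable row-then-column scheme built on `numpy.interp` below is one simple assumption, and the function names are hypothetical.

```python
import numpy as np


def vector_to_square(features):
    """Fold a flat feature vector into a square matrix (1024 -> 32 x 32)."""
    features = np.asarray(features)
    side = int(round(features.size ** 0.5))
    if side * side != features.size:
        raise ValueError("feature count is not a perfect square")
    return features.reshape(side, side)


def resize_linear(image, new_h, new_w):
    """Resize by linear interpolation along rows, then along columns."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    x_old, x_new = np.linspace(0.0, 1.0, w), np.linspace(0.0, 1.0, new_w)
    y_old, y_new = np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, new_h)
    # Interpolate every row to the new width: shape (h, new_w).
    rows = np.stack([np.interp(x_new, x_old, r) for r in image])
    # Interpolate every column to the new height: shape (new_h, new_w).
    return np.stack([np.interp(y_new, y_old, c) for c in rows.T], axis=1)


square = vector_to_square(np.arange(1024))
ramp = np.arange(16, dtype=float).reshape(4, 4)
shrunk = resize_linear(ramp, 3, 3)
```

Calling `resize_linear(raw, 768, 768)` on a 1024x1024 array would give the input size quoted for VGG-16. The corner pixels are preserved by this scheme because both grids span the same normalised interval.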
With feature extraction, the 99.9% data reduction shortened the training time by a factor of approximately 230. Training times with full-resolution images are high for every model. As the number of parameters of the trained model increases, time efficiency decreases. For complex models with a large number of parameters, reducing the data size through feature extraction accelerates training and testing.

Obtaining distinctive information while reducing the data size through feature extraction and selection affects classification success. According to the graph in Figure 7, for the Lenet-5 CNN model the feature extraction method we developed reached the best result, with an accuracy of 99.8%. Lenet-5 is the CNN model with a simple architecture that gave the first successful results. In the tests with this model, both the high-dimensional raw data and the PTMEMPR method gave 99.8% accuracy. The main difference between these two cases is the amount of data used: a 32x32 matrix was used as input with the PTMEMPR method, whereas 1024x1024 data were used for the raw images. A significant data reduction thus does not reduce the success rate. The method we have developed is advantageous in terms of time efficiency and success rate compared to training with raw data. In the tests performed with the Lenet-5 model, the success achieved with the plain TMEMPR method was lower than with SVD. SVD is a state-of-the-art method among matrix decomposition methods. The PTMEMPR method we developed yielded successful results compared to the SVD method. When the test results for the FSMCov and FSMCov-L models in Figure 7 are examined, the highest accuracy was obtained with the PTMEMPR method. The success achieved by training with high-dimensional raw data remained low compared to the PTMEMPR method.
In line with these results, the developed method increases the success rate while decreasing the data size. Compared to the other methods, our method is stronger at obtaining distinctive features. The highest success rate for the VGG-16 model was achieved with DCT. Training with SVD, PTMEMPR and raw data gave equal accuracy. VGG-16 has the largest number of convolution layers among the models, and its numbers of layers and trainable parameters are higher than those of the other models. In the model comparison in Table 2, VGG-16 is a complex model that takes up a lot of memory.

FSMCov-N is the only ANN model used in this study. The other models have a CNN architecture, whereas FSMCov-N has no convolutional layer and consists only of neural network layers. When the outputs of this model in Figure 7 are analyzed, the success achieved with raw data remained low. The convolution layer in CNN models is where feature extraction from the data takes place, and its absence in FSMCov-N negatively affected the test results with raw data. The SVD, TMEMPR and PTMEMPR methods provide high accuracy for this model. The SVD method and the proposed method achieved an accuracy of 99.8%. According to the model comparison in Table 1, FSMCov-N is very small compared to the other models. Its very small size and number of parameters do not decrease the success of the model. With the FSMCov-N model and the PTMEMPR method developed in this study, Covid-19 was diagnosed with an accuracy of 99.8%.

Table note: bold values indicate the architecture with the lowest size (MB) and the smallest number of parameters.

In addition to these results, Table 4 compares our proposed methods with 11 state-of-the-art models recently reported in the literature for detecting and classifying Covid-19 cases.
Our methods presented in Table 4 are PTMEMPR+FSMCov, PTMEMPR+FSMCov-L and PTMEMPR+FSMCov-N. Table 4 also lists the type and amount of data used in the other studies, the number of cases in each data set, and the methods used for classification. Although the results cannot be fully compared because different data sets and data types were used, the table shows the position of our methods within the literature alongside the results of similar studies.

In this study, feature extraction and classification with artificial neural networks were discussed for the diagnosis of Covid-19, which negatively affects public health and the world economy and trade, and has caused the suspension of social life. With the PTMEMPR method proposed in this article, the original data were reduced by 99.9%. Classification with the features obtained by the developed method achieved 99.8% accuracy. This high success rate was achieved with less running time than with known methods. These results show that the developed method is a successful feature extraction method in terms of data reduction, classification accuracy and low runtime. It appears to be an important method for future medical applications that require high accuracy and fast results. With the developed method, medical data can be encrypted, compressed and used in classification problems such as disease diagnosis.
[1] Effect of photic stimulation for migraine detection using random forest and discrete wavelet transform
[2] Optimal feature selection-based medical image classification using deep learning model in internet of medical things
[3] Fast automated detection of COVID-19 from medical images using convolutional neural networks
[4] Segmentation and feature extraction in medical imaging: A systematic review
[5] Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets
[6] CNN-based transfer learning-BiLSTM network: A novel approach for COVID-19 infection detection
[7] Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms
[8] Imaging profile of the covid-19 infection: Radiologic findings and literature review
[9] Clinical features of patients infected with 2019 novel coronavirus in wuhan, china
[10] COVID-19 image data collection
[11] The Influence of the Support Functions on the Quality of Enhanced Multivariate Product Representation
[12] Block tridiagonal matrix enhanced multivariance products representation (BTMEMPR)
[13] Iterative Enhanced Multivariance Products Representation for Signal, Image and Video Processing Effective Compression of Hyperspectral Images
[14] Classification of covid-19 x-ray images using tridiagonal matrix enhanced multivariance products representation (tmempr)
[15] Gradient-based learning applied to document recognition
[16] Automatic detection of coronavirus disease (COVID-19) using x-ray images and deep convolutional neural networks
[17] Detection of Coronavirus Disease (COVID-19) Based on Deep Features and Support Vector Machine
[18] Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
[19] A deep learning algorithm using CT images to screen for corona virus disease (COVID-19)
[20] Automated detection of COVID-19 cases using deep neural networks with X-ray images
[21] Coronet: a deep neural network for detection and diagnosis of COVID-19 from chest x-ray images
[22] A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2
[23] COVID-net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-Ray images
[24] Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with ct images
[25] Covidx-net: a framework of deep learning classifiers to diagnose COVID-19 in x-ray images
[26] Improving the performance of CNN to predict the likelihood of COVID-19 using chest x-ray images with preprocessing algorithms