key: cord-1024092-y1nxvr4f
authors: Chen, Chao; Mao, Jinhong; Liu, Xinzhi; Tan, Yi; Abaido, Ghada M; Alsayed, Hamdy
title: Compressed Feature Vector-based Effective Object Recognition Model in Detection of COVID-19
date: 2021-12-25
journal: Pattern Recognit Lett
DOI: 10.1016/j.patrec.2021.12.016
sha: e9f1a765c9d475010d37b2fc85d9608555be9306
doc_id: 1024092
cord_uid: y1nxvr4f

To better understand the structure of COVID-19 and to improve recognition speed, an effective recognition model based on compressed feature vectors is proposed. Object recognition plays an important role in the computer vision area. To improve recognition accuracy, most recent approaches adopt a set of complicated hand-crafted feature vectors and build complex classifiers. Although such approaches achieve favourable recognition accuracy, they are inefficient. To raise the recognition speed without sacrificing accuracy, this paper proposes an efficient recognition model trained with a kind of compressed feature vector. Firstly, we propose a compressed feature vector based on the theory of compressive sensing. A sparse matrix is adopted to compress the feature vector from very high dimensions to very low dimensions, which reduces the computational complexity while preserving enough information for model training and prediction. Moreover, to improve the inference efficiency during the classification stage, an efficient recognition model is built by a novel optimization approach, which reduces the number of support vectors of the kernel support vector machine (kernel SVM). The SVM model is established with whether the subject is infected with COVID-19 as the dependent variable, and age, gender, nationality, and other factors as independent variables. The proposed approach iteratively builds a compact set of support vectors from the original kernel SVM, and the newly generated model achieves recognition accuracy close to that of the original kernel SVM. Additionally, with the reduction of support vectors, the recognition time of the newly generated model is greatly reduced. Finally, COVID-19 patients have specific epidemiological characteristics, and the SVM recognition model has strong fitting ability.
Extensive experiments conducted on two datasets show that the proposed object recognition model achieves favourable performance not only in recognition accuracy but also in recognition speed.

Since the outbreak of COVID-19, the number of infections worldwide has been increasing every day. The main symptoms include cough, fever, shortness of breath, and difficulty breathing. Severe cases develop pneumonia, acute respiratory syndrome, and kidney failure, and may even die. There is no specific treatment for it. The manifestations of COVID-19 in patients are still not fully understood, and the virus is mutating, making research on COVID-19 urgent.

Object recognition plays an important role in the field of computer vision and in various multimedia applications. The task of object recognition is to determine whether a certain type of object exists in a given image. Due to blur, deformation, partial occlusion, illumination change, cluttered backgrounds, and similar problems, object recognition is a challenging task. To address such problems, high-dimensional hand-crafted features and classification models with high computational complexity are adopted to ensure recognition accuracy. However, high-dimensional hand-crafted features and complicated classification models make prediction inefficient. In practical use, we demand not only recognition accuracy but also recognition speed, because fast and accurate object recognition can provide effective support for other computer vision tasks, such as object detection and tracking, and provides a powerful basis for the system's decision-making. Moreover, with the growth of image data and rising demands on system intelligence, developing high-precision, real-time and well-adapted approaches has become a trend.

Traditional virus classification tests take into account the morphology, serology, host range, protein, and physical and chemical characteristics, such as sensitivity to organic solvents, cell culture, structure, and molecular weight. With the development of computer algorithms, intelligent systems are widely used in the identification of chemical composition and complex biological macromolecules because of their sensitivity and accuracy. Some researchers have used relevant algorithms to conduct systematic cluster analysis to identify viruses. The clustering of similar strains has important implications for identifying virus strains. This paper deals with the practical problems of object recognition based on feature compression and classification model optimization. The model assists in the identification and classification of COVID-19. Firstly, from the aspect of feature compression, traditional approaches such as Principal Component Analysis (PCA) [1] and Singular Value Decomposition (SVD) [2] involve a large number of matrix decomposition operations, which are inefficient. To improve the compression efficiency, a feature compression algorithm based on compressive sensing is proposed. Through this algorithm, a sparse random matrix is adopted to map the high-dimensional features into a low-dimensional feature space, and the mapping involves only matrix multiplication. Secondly, although the commonly used classification model, kernel SVM [3], has good generalization capability, its prediction cost increases with the number of support vectors. Therefore, a support vector reduction algorithm is proposed to optimize the classification model, which reconstructs a simplified subset of the original support vectors through cyclic iterations. The optimized classification model achieves generalization capability similar to that of the original model but spends less time on prediction.
Figure 1 shows the workflow of the proposed method, which is divided into the training stage and the testing stage. In the training stage, high-dimensional feature vectors are first extracted from the COVID-19 image, and then a very sparse matrix is constructed to map the high-dimensional feature vectors into the low-dimensional domain. Lastly, a kernel SVM classifier is trained on the compressed features, and the kernel SVM is optimized by the support vector reduction method. In the testing stage, we adopt the same method to compress the features and feed the low-dimensional features into the optimized kernel SVM for prediction. The rest of this paper is organized as follows: In Section 2, we review the related work. In Section 3, the basic theories and analysis regarding feature compression and model optimization are introduced. In Section 4, experimental results and analysis are presented. In Section 5, we conclude this paper and propose future work.

After the virus infects the body, the body recognizes the virus, initiates an innate immune response, produces an antiviral response, secretes a large number of inflammatory cytokines, and mediates the occurrence of inflammation.

The body resists viral infection by clearing the infecting virus; for example, type I interferon can induce the expression of the "Mx" gene, which hinders the initial transcription process of the PB2 polymerase. Joint feature and compressed dictionary learning has had few applications in the detection of viruses. Some scholars have applied this learning algorithm to classify cars and found that it can improve the accuracy of car model recognition.

In this section, we briefly review the recent studies [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] in the literature on object recognition.

Recent object recognition approaches focus on extracting discriminative features and can be divided into two kinds of frameworks: deep learning [4] and bag-of-words [5]. Although deep learning approaches have achieved excellent performance in object recognition, they require more computational resources and larger training sets. Classical deep models such as AlexNet, GoogleNet and ResNet were designed for large-scale image classification and are trained and tested on ImageNet, a very large image dataset containing about 14 million images. However, deep models are not suitable for datasets with few samples. Moreover, deep models are usually trained on multiple GPUs, which makes them time consuming on CPU-only platforms. Therefore, when samples and computing resources are limited, the bag-of-words framework is well suited.

In the bag-of-words framework, image features are usually extracted with a dense sampling scheme, which uses a fixed size and scale to extract a large number of local descriptors from the image. The extracted local descriptors are highly redundant; they are encoded into a feature vector, which is then fed into a classifier. For example, Xu et al. [8] employed the HOG (Histogram of oriented gradient) feature and trained a linear SVM, which achieved great success in vehicle recognition. Ma et al. [9] adopted GMM (Gaussian mixture model) and LCC (Local coordinate coding) to reduce the redundant information of the densely sampled SIFT (Scale-invariant feature transform) descriptors from the image and trained a linear classifier for object recognition.

Zhuang et al. [10] fused multiple kinds of local features with vector quantization coding and adopted spatial pyramid matching to generate high-dimensional feature vectors, which were then used to train a kernel SVM for recognition. Owing to the rich and hierarchical information of the images, this approach performed well on object recognition. Clearly, the richer the information in the feature vector, the better the classifier performs. Therefore, in order to improve the recognition performance, multiple kinds of features were fused, but this increases the computation cost.

Dr. Lowery from the University of Ars in Northern Ireland developed a "DNA fingerprinting" system. It can detect a variety of viruses including smallpox within 15 years.

Jorge et al. [11] proposed an approach that adopted the Fisher vector to encode the feature vector and then compressed the high-dimensional feature vector with a product quantization algorithm. With the compressed feature vectors, the trained classifier performed well on object recognition. Feature compression aims to preserve as much information of the original feature vectors as possible while reducing the model training time and the storage space. PCA and SVD are the two main feature compression methods in the field of object recognition. PCA maps the feature vector from the high-dimensional space to the low-dimensional space by orthogonal linear projection, while SVD adopts the singular value decomposition of a matrix to extract the key information of the original feature vectors. However, matrix decomposition usually involves high computational complexity. Additionally, the classification model is another important component of the object recognition framework. The SVM is a classical classification model and has been adopted in many works, such as the studies [8, 9, 10]. It shows good generalization capability on small-sample, non-linear and high-dimensional problems. However, the SVM still has some limitations; for example, the prediction speed decreases as the number of support vectors increases.
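To make the dependence of prediction cost on the number of support vectors concrete, the following is a minimal Python sketch of a kernel SVM decision function. The Gaussian kernel, the `gamma` value, the toy support vectors and all function names are illustrative assumptions and are not taken from the paper; the only point is that one kernel evaluation is needed per support vector.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def decision_value(x, support_vectors, coefs, bias=0.0, gamma=0.5):
    """Return (f(x), number of kernel evaluations) for
    f(x) = sum_i coefs[i] * k(support_vectors[i], x) + bias."""
    total, evaluations = bias, 0
    for sv, c in zip(support_vectors, coefs):
        total += c * rbf_kernel(sv, x, gamma)
        evaluations += 1  # one kernel evaluation per support vector
    return total, evaluations

# Hypothetical toy model with three support vectors.
svs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coefs = [1.0, -0.5, -0.5]
value, cost = decision_value((0.0, 0.0), svs, coefs)
print(value, cost)
```

Under these assumptions, removing support vectors shortens the loop proportionally, which is why a reduced support vector set speeds up prediction.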

Koibayashi et al. [13] suggested that the support vector set can be reduced by finding a simplified support vector set in the feature space. Geebelen et al. [14] proposed a method that selects a typical sample set from the original training set and uses the reduced sample set to train the prediction model. Therefore, this work deals with the object recognition problem from two aspects: 1) designing an efficient, compressed feature to reduce the computational complexity and speed up prediction, and 2) developing a model optimization method to improve the recognition speed without accuracy loss.

Coronaviruses have no cell structure and can colonize living cells. They rely on living cells to synthesize proteins.

The main genetic material of coronaviruses is single-stranded, positive-sense RNA. Its basic unit is the ribonucleotide, which can be stained with Piro Red dye. The viruses bind to ACE2 on the cell membrane to invade the cell, indicating that the cell membrane is responsible for information exchange. The offspring of the virus are discharged from the cell through vesicles, indicating that the biofilm has a certain fluidity. Figure 2 shows the structure of the coronavirus. S is the spike protein, E is the envelope protein, M is the membrane protein, and N is the ribonucleoprotein.

The SIFT descriptor and the dense sampling scheme are adopted in our work. Firstly, the COVID-19 image is segmented into local patches of 16x16 pixels; each local patch is divided into 16 grids with a size of 4x4 pixels.

Secondly, the gradient magnitude g(x, y) and orientation θ(x, y) at each position (x, y) in the grid are calculated by equation (1); the orientation at each position (x, y) is then assigned to a specific bin to form a histogram (which ranges from 0 to 360° and is divided into 8 bins, each covering 45°). Thirdly, the orientation histograms of all grids in each local patch (4x4 grids) are aggregated to form a 128D (D is short for dimension) SIFT descriptor; the details are shown in Figure 2. The SIFT descriptor of the next local patch is calculated with a spacing of 8 pixels until the whole image is traversed.
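The three steps above can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' Matlab implementation: the central-difference gradient, the border clamping, the magnitude-weighted voting, the final L2 normalization and all function names are assumptions of this sketch, and the Gaussian weighting and interpolation of full SIFT are omitted.

```python
import math

def gradient(patch, x, y):
    """Central-difference gradient magnitude and orientation (degrees in [0, 360))."""
    h, w = len(patch), len(patch[0])
    # Clamp neighbours at the patch border (an assumption of this sketch).
    dx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
    dy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
    magnitude = math.hypot(dx, dy)
    orientation = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, orientation

def patch_descriptor(patch, cell=4, bins=8):
    """128-D descriptor of a 16x16 patch: 4x4 cells, 8 orientation bins each."""
    size = len(patch)
    descriptor = []
    for cy in range(0, size, cell):
        for cx in range(0, size, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    mag, ori = gradient(patch, x, y)
                    # Each bin covers 360 / 8 = 45 degrees; votes are weighted by magnitude.
                    hist[int(ori // (360.0 / bins)) % bins] += mag
            descriptor.extend(hist)
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor

def dense_descriptors(image, patch=16, step=8):
    """Slide a 16x16 window over the image with a spacing of 8 pixels."""
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - patch + 1, step):
        for left in range(0, w - patch + 1, step):
            window = [row[left:left + patch] for row in image[top:top + patch]]
            out.append(patch_descriptor(window))
    return out

# Hypothetical 32x32 test image: a horizontal intensity ramp.
image = [[float(x) for x in range(32)] for _ in range(32)]
descs = dense_descriptors(image)
print(len(descs), len(descs[0]))
```

On the ramp image every gradient points along the x axis, so all votes fall into the first orientation bin of each cell, which gives a quick sanity check of the binning.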

The extracted 128D local features contain a lot of redundancy and make it very difficult to model semantic features directly. Therefore, through the bag-of-words framework, the quantized local feature descriptors can be seen as "visual words", and the image content is expressed by the distribution of the "visual words" in the image. In our approach, we employed 200 images and extracted their local feature descriptors according to the method in section 3.1; the K-Means [15] algorithm was then adopted to calculate the cluster centers. We set the hyperparameter K to 1024, so 1024 "visual words" were selected to build a "visual dictionary". Each image can then be encoded by the "visual dictionary" as a 1024D vector. This procedure is described in Figure 3.

Fig. 5 The framework of bag-of-words
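As a rough illustration of this encoding step, the sketch below builds a tiny "visual dictionary" with plain Lloyd-style K-Means and encodes an image as a normalized histogram of visual-word occurrences. The toy 2-D descriptors, K = 2 (instead of 1024), the histogram normalization and all function names are assumptions made for brevity and are not the authors' implementation.

```python
import random

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(centers, point):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(centers)), key=lambda k: squared_distance(centers[k], point))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers (the visual dictionary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(group) for c in zip(*group)]
    return centers

def encode(descriptors, centers):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy example: 2-D "descriptors" from two well-separated groups, K = 2.
training = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
dictionary = kmeans(training, k=2)
code = encode([(0.05, 0.05), (4.9, 5.0), (5.2, 5.1)], dictionary)
print(code)
```

With K = 1024 and 128D SIFT descriptors, the same `encode` step would yield the 1024D image vector described in the text.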

From compressive sensing theory [16, 17], it is known that if the dimensionality of the feature space is extremely high, the features can be randomly projected to a low-dimensional feature space that retains enough information. Given a vector $v \in \mathbb{R}^n$ in the low-dimensional space and a vector $u \in \mathbb{R}^m$ in the high-dimensional feature space, the mapping by a random matrix $P \in \mathbb{R}^{n \times m}$ is defined as equation (2):

$$v = Pu \quad (2)$$

where $n \ll m$.

The projection v is similar to a compressive measurement in the encoding stage of compressive sensing. According to the JL (Johnson-Lindenstrauss) lemma, if a signal is a linear combination of only K basis vectors [18], it can be reconstructed from a small number of random measurements.
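For reference, the guarantee usually associated with the JL lemma is a distance-preservation statement, given below in its standard form using the symbols u, P, n and m from the text. The point set $S$, the second point $w$, the tolerance $\varepsilon$ and the constant $C$ are notation introduced here for illustration only.

```latex
% Johnson-Lindenstrauss lemma (standard statement)
\text{For any } 0 < \varepsilon < 1 \text{ and any set } S \subset \mathbb{R}^m \text{ of } N \text{ points, if } n \ge C\,\varepsilon^{-2}\log N,
\text{ there exists a linear map } P : \mathbb{R}^m \to \mathbb{R}^n \text{ such that}

(1-\varepsilon)\,\lVert u - w \rVert^2 \;\le\; \lVert Pu - Pw \rVert^2 \;\le\; (1+\varepsilon)\,\lVert u - w \rVert^2
\quad \text{for all } u, w \in S.
```

The required dimension n depends only on the number of points and the tolerance, not on the original dimension m, which is what makes projection from very high dimensions practical.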

Liu et al. [19] adopted a random Gaussian matrix and showed that a sparse random measurement matrix obtained favourable results in texture classification. In fact, a very sparse measurement matrix can be defined as equation (7):

$$p_{ij} = \sqrt{x} \times \begin{cases} 1, & \text{with probability } 1/(2x) \\ 0, & \text{with probability } 1 - 1/x \\ -1, & \text{with probability } 1/(2x) \end{cases} \quad (7)$$
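A very sparse measurement matrix of this kind can be generated and applied as in the following sketch. It assumes the usual form of very sparse random projections, in which entries take the values +1, 0 and -1 with probabilities 1/(2x), 1 - 1/x and 1/(2x) and are scaled by the square root of x. The choice x = 32 (roughly the square root of 1024), the seed and the function names are illustrative assumptions, not values reported in the paper.

```python
import math
import random

def sparse_projection_matrix(n, m, x=3, seed=0):
    """n-by-m matrix with entries sqrt(x) * {+1, 0, -1} drawn with
    probabilities 1/(2x), 1 - 1/x, 1/(2x)."""
    rng = random.Random(seed)
    scale = math.sqrt(x)
    matrix = []
    for _ in range(n):
        row = []
        for _ in range(m):
            r = rng.random()
            if r < 1.0 / (2 * x):
                row.append(scale)
            elif r < 1.0 / x:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix

def project(matrix, u):
    """Compute v = P u, skipping the zero entries of P."""
    return [sum(p * ui for p, ui in zip(row, u) if p != 0.0) for row in matrix]

# Compress a hypothetical 1024-D feature vector to 100-D.
P = sparse_projection_matrix(n=100, m=1024, x=32)
u = [1.0] * 1024
v = project(P, u)
print(len(v))
```

Because only about a fraction 1/x of the entries is non-zero, the projection needs far fewer multiplications than a dense Gaussian matrix, and no matrix decomposition is involved.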

Chen et al. [20] 

When the optimal classification hyperplane of the kernel SVM is determined, the normal vector $\Psi$ of the hyperplane in the feature space is given by equation (8).

Taking the derivative with respect to $\beta$ in equation (10) gives the results shown in equations (11) and (12). Taking the derivative with respect to $s$ in equation (13) gives the result in equation (14). In our approach, we use the definition in equation (15). According to equations (12) and (14), we can iteratively solve for $\beta_m$ and $s_m$, as shown in equation (16).

The iteration terminates when $\lVert \Psi_{m+1} \rVert$ is less than a threshold or when $m$ reaches the specified number of support vectors.
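Since the update equations are not legible in this copy of the text, the sketch below illustrates the general idea with the classical greedy reduced-set construction for a Gaussian kernel, which may differ in detail from the authors' equations. Each new vector s_m is found by a fixed-point iteration, its coefficient β_m is the projection of the current residual onto it, and the residual norm is updated in closed form. The kernel choice, `gamma`, the initialization rule, the toy data and all names are assumptions of this sketch.

```python
import math

def kernel(a, b, gamma=0.5):
    """Gaussian kernel, so that k(z, z) = 1 (assumed kernel)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def projection(points, coefs, z, gamma):
    """Inner product <residual, Phi(z)> = sum_i coefs[i] * k(points[i], z)."""
    return sum(c * kernel(p, z, gamma) for p, c in zip(points, coefs))

def reduce_support_vectors(X, alpha, n_reduced, gamma=0.5, n_iter=100, tol=1e-10):
    """Greedy reduced-set construction; returns (Z, beta, squared residual norm)."""
    points, coefs = [tuple(x) for x in X], list(alpha)  # expansion of the residual
    residual = sum(ci * cj * kernel(pi, pj, gamma)
                   for pi, ci in zip(points, coefs)
                   for pj, cj in zip(points, coefs))
    Z, beta = [], []
    for _ in range(n_reduced):
        # Start from the original support vector most aligned with the residual.
        z = max(X, key=lambda x: abs(projection(points, coefs, x, gamma)))
        for _ in range(n_iter):
            weights = [c * kernel(p, z, gamma) for p, c in zip(points, coefs)]
            denom = sum(weights)
            if abs(denom) < 1e-12:
                break
            # Fixed-point update: z is a kernel-weighted mean of the expansion points.
            z_new = tuple(sum(w * p[d] for w, p in zip(weights, points)) / denom
                          for d in range(len(z)))
            shift = math.dist(z, z_new)
            z = z_new
            if shift < tol:
                break
        b = projection(points, coefs, z, gamma)  # optimal coefficient since k(z, z) = 1
        Z.append(z)
        beta.append(b)
        residual -= b * b   # squared norm of the new residual
        points.append(z)    # subtract b * Phi(z) from the residual expansion
        coefs.append(-b)
    return Z, beta, residual

# Hypothetical toy expansion: six support vectors in two tight clusters.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
alpha = [1.0, 0.8, 0.6, 0.9, 0.7, 0.5]
Z, beta, residual = reduce_support_vectors(X, alpha, n_reduced=2)
print(len(Z), residual)
```

In this toy case two reduced vectors replace six support vectors with only a small residual; a stopping rule based on the residual norm or on a target number of vectors, as described above, would wrap the outer loop.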

In order to validate the performance of our approach, we tested it on the Caltech 101 dataset. In the experiment, all approaches were programmed in Matlab 2014a and run on a PC with an Intel Core i5 CPU (2.5 GHz) and 12 GB of memory.

The Caltech 101 dataset is a popular dataset in the field of computer vision. Each category includes from 31 to 1100 images. These images are of medium resolution, with a size of 500×800. In our experiments, about 775 images from 8 sub-categories were selected for training and testing. Figure 6 and

(1) Recognition accuracy

In the training stage, firstly, we randomly selected 10 images from each sub-category and extracted the SIFT descriptors with the method mentioned in section 3.1, and then the K-Means algorithm (K was set to 1024) was adopted to build a "visual dictionary" with the size of 1024 (described in section 3.2). The remaining images were randomly divided 1:1 for model training and testing. Secondly, the feature vectors were compressed to 100D by a very sparse matrix of size 100×1024 (which was built based on the compressive sensing method).

In the testing stage, we firstly extracted the feature vector from each testing image; secondly, the feature vectors were compressed with the same sparse matrix as in the training stage. Thirdly, the compressed features were fed into the optimized model for prediction. We compared the proposed method with two other methods, which employed the uncompressed feature vectors and an unoptimized classification model:

1) extracting SIFT descriptors and building the 1024D feature vector with the same "visual dictionary" as our method, then adopting a linear SVM for prediction;

2) extracting SIFT descriptors and building the 1024D feature vector with the same "visual dictionary" as our method, then adopting a kernel SVM for prediction.

The tests were conducted 5 times and the comparison results are shown in Table 2. From the results in Table 2, the kernel SVM achieved the best performance but consumed the most time. Our proposed method obtained recognition performance close to that of the kernel SVM, but its recognition speed is much faster. The linear SVM achieved the worst recognition performance and a lower recognition speed than our method. Therefore, the proposed method achieves a balance of accuracy and speed.

(2) Feature compression

We compared the performance of the proposed feature compression against the classical PCA and SVD. Apart from the feature compression method, we adopted the same settings for the training and testing stages. The original 1024D feature vectors were compressed to different dimensions (D = 20, 40, 60, 80, 100, 120, 140, 160, 180, 200). The results are shown in Figure 7. Additionally, we validated the performance under different reduction rates (the reduction rate is the proportion of removed support vectors to the total number of support vectors). Figure 8 illustrates the results. In terms of accuracy, the optimized model keeps the performance approximately unchanged for reduction rates from 10% to 50%; when the reduction rate is raised to 60% or higher, the performance begins to decrease. In terms of recognition speed, as the reduction rate increases, the recognition speed increases linearly.

Relationship between reduction rate and accuracy

Relationship between reduction rate and speed 

In this paper, we focus on a practical problem in the task of object recognition. In order to improve the efficiency of the object recognition workflow, we proposed methods to improve the feature compression and classification stages. Traditionally, high-dimensional features are compressed, and their main information extracted, by PCA or SVD. To improve the compression efficiency, we adopted a very sparse matrix to map the high-dimensional features to a low-dimensional space, based on the theory of compressive sensing. Moreover, to further improve the recognition speed, we proposed a model optimization method, which accelerates classification by reducing the number of support vectors of the kernel SVM. In the experiments, our proposed model achieved favourable recognition accuracy and speed.

Our future work will focus on developing more powerful features. In this work, although the local features yield favourable results, we think there is room for improvement. Moreover, applying this method to other computer vision tasks (such as object detection and object tracking) is also of great interest.

Structured Sparse Principal Components Analysis with the TV-Elastic Net penalty

The singular value decomposition: Its computation and some applications

Accelerated max-margin multiple kernel learning

An Analysis Of Convolutional Neural Networks For Image Classification

Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice

Analysis of ResNet and GoogleNet models for malware detection

Deep Coupled ResNet for Low-Resolution Face Recognition

A hybrid vehicle detection method based on viola-jones and HOG + SVM from UAV images

Generalized Pooling for Robust Object Tracking

Binary feature from intensity quantization and weakly spatial contextual coding for image search

Exponential family Fisher vector for image classification

Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification

Efficient reduction of support vectors in kernel-based methods

Reducing the number of support vectors of SVM classifiers using the smoothed separable case approximation

A comprehensive of transforms, Gabor filter and k-means clustering for text detection in images and video

A performance comparison of measurement matrices in compressive sensing

Compressive Sensing Image Restoration Using Adaptive Curvelet Thresholding and Nonlocal Sparse Regularization

An elementary proof of a theorem of johnson and lindenstrauss

Texture Classification from Random Features

Efficient extreme learning machine via very sparse random projection

The authors declared that they have no conflicts of interest to this work.

Thanks are due to Dawei Liu for assistance with the experiments and to Lin Fu for valuable discussion.