key: cord-155804-ft2pbgsl
authors: Yamac, Mehmet; Ahishali, Mete; Degerli, Aysen; Kiranyaz, Serkan; Chowdhury, Muhammad E. H.; Gabbouj, Moncef
title: Convolutional Sparse Support Estimator Based Covid-19 Recognition from X-ray Images
date: 2020-05-08
doc_id: 155804
cord_uid: ft2pbgsl

Coronavirus disease (Covid-19) has been the main agenda of the whole world since it appeared in December 2019. It has already caused thousands of casualties and infected several million people worldwide. Any technological tool that can be provided to healthcare practitioners to save time, effort, and possibly lives is of crucial importance. The main tools practitioners currently use to diagnose Covid-19 are Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and Computed Tomography (CT), which require significant time, resources and acknowledged experts. X-ray imaging is a common and easily accessible tool that has great potential for Covid-19 diagnosis. In this study, we propose a novel approach for Covid-19 recognition from chest X-ray images. Despite the importance of the problem, recent studies in this domain have produced unsatisfactory results due to the limited datasets available for training. Although Deep Learning techniques can generally provide state-of-the-art performance in many classification tasks when trained properly over large datasets, such data scarcity can be a crucial obstacle when using them for Covid-19 detection. Alternative approaches such as representation-based classification (collaborative or sparse representation) might provide satisfactory performance with limited-size datasets, but they generally fall short in performance or speed compared to Machine Learning methods.
To address this deficiency, the Convolutional Support Estimation Network (CSEN) has recently been proposed as a bridge between model-based and Deep Learning approaches: it provides a non-iterative, real-time mapping from a query sample to the support of its ideally sparse representation coefficients, which is the critical information for the class decision in representation-based techniques.

Coronavirus disease 2019 (Covid-19) was declared a pandemic by the World Health Organization (WHO) two months after its first appearance in December 2019 in Wuhan, China. It has infected more than 3 million people, caused thousands of casualties and has so far paralyzed mobility all around the world. The spreading rate of Covid-19 is so high that the number of cases is expected to double every three days if social distancing is not strictly observed to slow this growth [1]. Roughly half of Covid-19 positive patients also exhibit a comorbidity [2], which makes it difficult to differentiate Covid-19 from other lung diseases. Automated and accurate Covid-19 diagnosis is critical both for saving lives and for preventing its rapid spread in the community. Currently, RT-PCR (reverse transcription polymerase chain reaction) and CT (computed tomography) are the common diagnostic techniques. RT-PCR results are available after 24 hours at the earliest for critical cases, and generally take several days to reach a decision [3]. CT may be an alternative at initial presentation; however, it is expensive and not easily accessible [4]. The most common tool that medical experts use both for diagnosing and for monitoring the course of the disease is X-ray imaging. Compared to an RT-PCR or CT test, acquiring an X-ray image is an extremely low-cost and fast process, usually taking only a few seconds.
Recently, WHO reported that even RT-PCR may give false results in Covid-19 cases for several reasons, such as a poor-quality specimen from the patient, inappropriate processing of the specimen, or taking the specimen at an early or late stage of the disease [5]. For this reason, X-ray imaging has great potential as an alternative technological tool to be used along with the other tests for an accurate diagnosis. Accordingly, several recent works [6], [7], [8], [9] have been proposed for Covid-19 detection/classification from X-ray images. However, they use rather small datasets (the largest containing only a few hundred X-ray images) with only a few Covid-19 samples, which makes it difficult to generalize their results in practice. To address this deficiency and provide reliable results, in this study the researchers of Qatar University and Tampere University have compiled the largest Covid-19 dataset, called QaTa-Cov19. Compared to earlier benchmark datasets in this domain, such as the COVID Chestxray Dataset [10] or the Covid-19 DATASET [11], QaTa-Cov19 has the following unique benchmarking properties. First, it is the largest dataset, not only in terms of the number of images (more than 6200) but also in its versatility: QaTa-Cov19 contains additional major pneumonia categories, viral and bacterial, along with the control (normal) class. Moreover, it is the most diverse dataset, encapsulating X-ray images from several countries (e.g., Italy, Spain, China) produced by different X-ray machines. Finally, the images vary in quality, resolution and SNR level, as shown in Fig. 1.

Fig. 1: Sample Covid-19 X-ray images from QaTa-Cov19.

QaTa-Cov19 contains many X-ray images from Covid-19 patients in the early stages of the disease; their X-ray images therefore show mild or no signs of Covid-19 infection to the naked eye. Some sample images are shown in Fig. 2-(b).
Another factor that makes the diagnosis far more challenging is that inter-class similarity can be very high for many X-ray images, as shown by some samples in Fig. 2-(a). Against such high inter-class similarities and intra-class variations, in this study we aim for a high level of robustness. Our primary objective is to achieve the highest possible sensitivity in the diagnosis of Covid-19-induced pneumonia with an acceptable false-alarm rate (e.g., specificity > 95%). In particular, the misdiagnosis of a Covid-19 X-ray image as a normal case (a false negative) should be minimized, whilst a small number of false positives is tolerable. In numerous classification tasks, Deep Learning techniques have been shown to achieve state-of-the-art performance, both in terms of recognition accuracy and thanks to their parallelizable computing structures, which play an important role especially in real-time applications. Despite these advantages, achieving a desired performance level with a deep model usually requires proper training over a massive dataset. Unfortunately, this is not yet an option for this problem, since the available data is still rather limited. An alternative supervised approach, which requires only a limited number of training samples to achieve satisfactory classification accuracy, is representation-based classification [12], [13], [14]. In representation-based classification systems, a dictionary is pre-defined whose columns are the training samples, stacked so that each class occupies its own subset of columns. A test sample is expected to be well approximated by a linear combination of the training samples from its own class. Therefore, given a predefined dictionary matrix $D$ and a test sample $y$, we expect the solution $\hat{x}$ of $y = Dx$ to carry enough information about the class of $y$.
The two well-known representation-based classification methodologies are sparse representation-based classification (SRC) [13] and collaborative representation-based classification (CRC) [12]. Of these two, SRC provides slightly improved accuracy by solving a sparse representation problem, i.e., producing a sparse solution $\hat{x}$ of $y = Dx$. The locations of the non-zero elements of $\hat{x}$, also known as the support set, then provide the class of the query $y$. Despite the improved recognition accuracy, SRC solutions are iterative and can be computationally demanding compared to CRC. In a recent work [15], a compact neural network design that can be considered a bridge between learning-based and representation-based methodologies was proposed. The so-called Convolutional Support Estimation Network (CSEN) uses a pre-defined dictionary and, from a moderate/low-size training set, learns a direct mapping from query samples $y$ to the support set of the representation coefficients $x$ (which should be purely sparse in the ideal case). In this study, we propose a CSEN-based approach to address the aforementioned limitations of Covid-19 diagnosis from X-ray images. Since the largest set of Covid-19 X-ray images ever compiled is used in this study, the proposed approach can be evaluated rigorously against a high level of diversity to obtain a reliable analysis. The general pipeline of the proposed CSEN-based recognition scheme is illustrated in Fig. 3. In order to obtain highly discriminative features, we use the recently proposed CheXNet [16], which is a version of the 121-layer Dense Convolutional Network (DenseNet-121) [17] fine-tuned on over 100,000 frontal-view X-ray images from 14 classes.
Having the pre-trained CheXNet for feature extraction, we develop two different strategies to obtain the classes of query X-ray images: (i) collaborative representation-based classification with proper preprocessing; and (ii) slightly modified versions of our recently proposed Convolutional Support Estimation Network (CSEN) models. The proposed CSEN scheme outperforms the competing methods and achieves over 98% sensitivity and over 95% specificity on this challenging dataset. The rest of the paper is organized as follows. In Section II, notations and mathematical preliminaries are given, with emphasis on sparse representation and sparse support estimation. Section III presents a literature review on deep learning models over X-ray images and on representation-based classification. The proposed CSEN-based Covid-19 recognition system is introduced in Section IV, along with two recent alternative approaches that are used as competing methods; the data collection is also explained in this section. The experimental setup and the main results are provided in Section V. Finally, Section VI concludes the paper and suggests topics for future research.

In this study, the $\ell_p$-norm of a vector $x \in \mathbb{R}^n$ is defined as $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ for $p \geq 1$, while the $\ell_0$-norm $\|x\|_0 = \#\{i : x_i \neq 0\}$ counts the non-zero elements of $x$. On the other hand, the sparse representation (SR) of a signal $s \in \mathbb{R}^d$ in a predefined set of waveforms $\Phi \in \mathbb{R}^{d \times n}$ can be defined as representing $s$ as a linear combination of only a small subset of the atoms in the dictionary $\Phi$, i.e., $s = \Phi x$. Defining these sets, which dates back to Fourier's pioneering work [18], has been extensively studied in the literature. In early approaches, these sets of waveforms were selected as collections of linearly independent and generally orthogonal waveforms (called a complete dictionary or basis, i.e., $d = n$), such as the Fourier Transform, DCT and Wavelet Transform, until the pioneering work of Mallat [19] on overcomplete dictionaries ($n \gg d$).
In the last decade, interest in SR research has increased tremendously, and its wide range of applications includes denoising [20], classification [21], anomaly detection [22], [23], Deep Learning [24] and Compressive Sensing (CS) [25], [26]. With a possible dimensionality reduction performed via a compression matrix $A \in \mathbb{R}^{m \times d}$ ($m \ll d$), a sample $y$ can be obtained from $s$ as

$$y = As = A\Phi x = Dx, \qquad (1)$$

where $D = A\Phi \in \mathbb{R}^{m \times n}$ can be called the equivalent dictionary. Because Eq. (1) describes an under-determined system of linear equations, finding the representation coefficient vector $x$ requires at least one more constraint to obtain a unique solution. Using the prior information about sparsity, the following $\ell_0$-minimization problem has a unique solution provided that $D$ satisfies some required properties [27]:

$$\min_x \|x\|_0 \quad \text{subject to} \quad Dx = y. \qquad (2)$$

However, the optimization problem in Eq. (2) is NP-hard. Fortunately, the following relaxation produces exactly the same solution as Eq. (2), provided that $D$ obeys some criteria [28] and $m > k \log(n/k)$, where $k$ is the number of non-zero elements of $x$:

$$\min_x \|x\|_1 \quad \text{subject to} \quad Dx = y. \qquad (3)$$

In addition, real-world signals are generally not exactly but approximately sparse. Furthermore, the query sample $y$ can be corrupted by additive noise. In this case, the equality constraint in Eq. (3) can be further relaxed, as in Basis Pursuit Denoising (BPDN) [29]:

$$\min_x \|x\|_1 \quad \text{subject to} \quad \|y - Dx\|_2 \leq \epsilon, \qquad (4)$$

where $\epsilon$ is a small constant that depends on the noise level. We refer to the Sparse Support Estimation (SE) problem as finding the set $\Lambda$ of indices of the non-zero elements of $x$ [30], [31]. Indeed, in many applications SE can be more important than sparse Signal Recovery (SSR), which aims to find the magnitudes and signs of $x$ as well as $\Lambda$ via a recovery technique such as Eq. (3).
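To make the SSR/SE distinction concrete, the sketch below first recovers $\hat{x}$ (SSR) and then keeps only its support by thresholding (SE). It is a minimal illustration under stated assumptions: the recovery step uses ISTA (iterative soft-thresholding) on the Lagrangian form $\tfrac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1$, a common stand-in for the constrained BPDN of Eq. (4) and not necessarily the solver used in the cited works; the function names, the toy problem and the values of `lam`, `n_iter` and the threshold are illustrative choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(D, y, lam=0.01, n_iter=3000):
    """Approximately minimize 0.5 * ||y - D x||_2^2 + lam * ||x||_1 (LASSO form of BPDN)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def estimate_support(x_hat, tau):
    """Support estimate by simple thresholding: indices i with |x_hat_i| > tau."""
    return np.flatnonzero(np.abs(x_hat) > tau)


# Hypothetical toy problem: a k-sparse x observed through a random Gaussian D.
rng = np.random.default_rng(0)
m, n, k = 60, 120, 4
D = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support_true = rng.choice(n, size=k, replace=False)
x_true[support_true] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = D @ x_true

x_hat = ista(D, y)                          # SSR: magnitudes, signs and support
support_hat = estimate_support(x_hat, 0.5)  # SE: only the support is kept
```

The thresholding step is exactly the "recover first, then threshold" strategy that many SE works adopt; its cost is dominated by the iterative recovery loop.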
For example, in a sparse representation-based classification system, a query sample $y$ can be represented by a sparse coefficient vector $x$ in the dictionary $D$, in such a way that when we recover this representation from $y = Dx$, the solution vector $\hat{x}$ is expected to have a significant number of its non-zero coefficients at the particular locations corresponding to the class of $y$. Readers are referred to [15] for a more detailed literature review on SE and its applications. In the sequel, we briefly summarize the building blocks of the proposed approach. In the proposed approach, we first use the pre-trained deep network CheXNet to extract discriminative features from raw X-ray images. CheXNet was developed for pneumonia detection from chest X-ray images [16]. In [16], it was claimed that CheXNet can perform even better than expert radiologists in the pneumonia detection problem. This deep neural network design is based on the previously proposed 121-layer DenseNet [17]. It was first pre-trained on the ImageNet dataset [32] and then adapted via transfer learning on 112,120 frontal-view chest X-ray images of the ChestX-ray14 dataset [33]. Let $y$ be a test sample, which represents either the extracted features $s$ or their dimensionally reduced version, i.e., $y = As$. In developing the dictionary, training samples are stacked in $D$ at particular locations, in such a way that the optimal support for a given query $y$ should be the set of all points coming from the same class as $y$. Therefore, a solution vector $\hat{x}$ of $y = Dx$ is supposed to carry enough information, i.e., its sparse support should be the set of location indices of the training samples from the same class as $y$. This strategy is generally known as representation-based classification. However, a typical solution $\hat{x}$ of $y = Dx$ is not necessarily sparse, especially when its size grows with more training samples, which results in a highly underdetermined system of linear equations.
Fortunately, if one estimates the representation coefficient vector with a sparse recovery design such as the $\ell_1$-minimization in Eq. (3), we can expect the important non-zero entries of the solution $\hat{x}$ to be grouped at the particular locations that correspond to the training samples from the same class as $y$. This is a typical example of a scenario where support estimation can be more valuable than magnitude and sign recovery, as explained in Section II-B. For instance, [14] proposed a systematic way of determining the identity of face images using $\ell_1$-minimization. The authors developed a three-step classification technique that includes: (i) normalizing all the atoms in $D$ and $y$ to have unit $\ell_2$-norm; (ii) estimating the representation coefficient vector via sparse recovery, i.e., $\hat{x} = \arg\min_x \|x\|_1 \ \text{s.t.}\ \|y - Dx\|_2 \leq \epsilon$; and (iii) computing the residual of each class via $e_i = \|y - D_i \hat{x}_i\|_2$, where $D_i$ and $\hat{x}_i$ are the atoms and the estimated coefficients that correspond to class $i$, and assigning $y$ to the class with the smallest residual. This technique, known as Sparse Representation based Classification (SRC), and its variants have been applied to a wide range of applications in the literature [34], [35], e.g., human action recognition [36] and hyperspectral image classification [37], to name a few. Despite the good recognition accuracy of SRC systems, their main drawback is that their sparse recovery algorithms (e.g., $\ell_1$-minimization) are iterative and computationally costly, rendering them infeasible for real-time applications. Later, the authors of [12] introduced Collaborative Representation based Classification (CRC), which is similar to SRC except that the second step uses regularized $\ell_2$-minimization: $\hat{x} = \arg\min_x \|y - Dx\|_2^2 + \lambda \|x\|_2^2$.
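The three SRC steps and the CRC variant share everything except the coding step, which the minimal sketch below makes explicit. Assumptions: step (ii) of SRC is approximated by ISTA on the LASSO form rather than the constrained problem; CRC uses the closed-form ridge solution; the function names and the toy data (three classes, each spanning its own random 3-D subspace) are hypothetical. The demo uses $\lambda = 10^{-3}$ for CRC rather than the $2 \times 10^{-12}$ reported for the experiments, because the toy Gram matrix is rank-deficient.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_code(D, y, lam=0.01, n_iter=3000):
    """SRC step (ii), approximated by ISTA on 0.5*||y - Dx||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (D.T @ (D @ x - y)), step * lam)
    return x


def l2_code(D, y, lam):
    """CRC step (ii): closed-form minimizer of ||y - Dx||^2 + lam*||x||^2."""
    n = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)


def class_residuals(D, labels, y, x_hat):
    """Step (iii): e_i = ||y - D_i x_hat_i||_2 for every class i."""
    classes = np.unique(labels)
    res = np.array([np.linalg.norm(y - D[:, labels == c] @ x_hat[labels == c])
                    for c in classes])
    return classes, res


def classify(D, labels, y, method="src", lam=0.01):
    # Step (i): unit l2-norm atoms and query.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    # Step (ii): sparse (SRC) or collaborative (CRC) coding.
    x_hat = l1_code(D, y, lam) if method == "src" else l2_code(D, y, lam)
    # Step (iii): the class with the smallest residual wins.
    classes, res = class_residuals(D, labels, y, x_hat)
    return classes[np.argmin(res)]


# Hypothetical toy data: each class spans its own random 3-D subspace of R^30.
rng = np.random.default_rng(1)
bases = [rng.standard_normal((30, 3)) for _ in range(3)]
D = np.hstack([B @ rng.standard_normal((3, 8)) for B in bases])  # 8 atoms per class
labels = np.repeat(np.arange(3), 8)
y = bases[1] @ rng.standard_normal(3)                             # query from class 1

pred_src = classify(D, labels, y, "src")
pred_crc = classify(D, labels, y, "crc", lam=1e-3)
```

The only structural difference is the coding call: `l1_code` iterates, whereas `l2_code` is a single linear solve, which is where CRC's speed advantage comes from.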
Thus, CRC does not require an iterative solution to obtain the representation coefficients, because this regularized $\ell_2$-minimization has the closed-form solution $\hat{x} = (D^T D + \lambda I)^{-1} D^T y$. Although sparsity in $\hat{x}$ cannot be guaranteed, CRC has often been reported to achieve comparable classification performance, especially with small training datasets.

Covid-19 chest X-ray images were gathered from different publicly available but scattered image sources. The major sources of Covid-19 images are the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [11], Radiopaedia [38], Chest Imaging (Spain) at thread reader [39], and online articles and news portals. The authors carried out the task of collecting and indexing the X-ray images of Covid-19 positive cases reported in published and preprint articles from China, South Korea, USA, Taiwan, Spain, and Italy, as well as in online news portals (up to 20 April 2020). Therefore, these X-ray images represent different age groups, genders, ethnicities and countries. The Covid-19 negative cases (normal, viral and bacterial pneumonia chest X-ray images) were collected from the Kaggle chest X-ray database, which contains 5863 chest X-ray images of normal, viral and bacterial pneumonia with varying resolutions [40]. Of these 5863 images, 1583 are normal and the remaining are bacterial and viral pneumonia images. Sample X-ray images from the QaTa-Cov19 dataset are shown in Fig. 6.

With their outstanding performance in image classification along with other inference tasks, deep neural networks have become a dominant paradigm. However, these techniques usually necessitate a large number of training samples (e.g., several hundred thousand to millions, depending on the network size) to achieve an adequate generalization capability. That is to say, the aforementioned data scarcity of Covid-19 cases prevents us from training a deep network from scratch.
Nevertheless, we can still leverage their power by finding models pre-trained on similar problems. To this end, we use a state-of-the-art pneumonia detection network, CheXNet, whose details are summarized in Section III-A. With the pre-trained model, we extract 1024-long feature vectors right after the last average pooling layer. After data normalization (zero mean and unit variance), we obtain a feature vector $s \in \mathbb{R}^{d}$ with $d = 1024$. PCA dimensionality reduction is then applied to $s$ to obtain the query sample $y = As \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m \times d}$ is the PCA matrix ($m < d$). Considering the limited amount of training data in our Covid-19 dataset, representation-based classification can then be applied to obtain the class of $y$ using the dictionary $\Phi$ (in the form of $D = A\Phi$), whose columns are the stacked training samples with class-specific locations. As discussed earlier, sparse representation-based classification is a support estimation problem, which is expected to be an easier task than sparse signal recovery. Even if exact signal recovery is not possible in noisy cases, or in cases where $\hat{x}$ is not exactly but approximately sparse (which is almost always the case in dictionary-based classification problems), it is still possible to recover the support set exactly [9], [30], [41], [42] or partially [42], [43], [44]. However, many works in the literature dealing with SE problems tend to first apply a sparse recovery technique on $y$ to get $\hat{x}$, and then use simple thresholding over $\hat{x}$ to obtain a sparse support estimate $\hat{\Lambda}$. Nevertheless, SSR techniques such as $\ell_1$-minimization are rather slow, and their performance varies from one SSR tool to another [15]. In our previous work [15], we proposed an alternative to this handcrafted sparse recovery approach, which aims to learn a direct map from the test sample $y$ to the support set $\hat{\Lambda}$.
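The feature pipeline (z-score normalization of the 1024-long CheXNet vectors, then PCA to obtain $y = As$) can be sketched in a few lines. This is a minimal NumPy illustration, assuming PCA is fitted on the training features only and computed from an SVD of the standardized data; the function names and the random stand-in features are hypothetical, and the original experiments may have computed $A$ differently.

```python
import numpy as np


def fit_zscore_pca(S_train, m):
    """Fit z-score statistics and an m x d PCA matrix A on training features (one row per image)."""
    mu = S_train.mean(axis=0)
    sigma = S_train.std(axis=0) + 1e-8   # guard against constant features
    Z = (S_train - mu) / sigma           # zero mean, unit variance per feature
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    A = Vt[:m]                           # rows = leading principal directions
    return mu, sigma, A


def project(S, mu, sigma, A):
    """Map raw feature rows s to queries y = A s (after the same z-scoring)."""
    return ((S - mu) / sigma) @ A.T


# Hypothetical stand-in for CheXNet features; the text uses d = 1024 and m = d / 2 = 512.
rng = np.random.default_rng(0)
S_train = rng.standard_normal((600, 1024))
mu, sigma, A = fit_zscore_pca(S_train, m=512)
Y_train = project(S_train, mu, sigma, A)  # shape (600, 512)
```

Because the rows of `A` are orthonormal, $D = A\Phi$ is simply the dictionary of training features expressed in the reduced PCA coordinates.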
Along with its speed and stability compared to conventional SSR-based techniques and to recent deep-learning-based solutions to the SSR problem, CSEN has the crucial advantage of a compact design that can achieve a good performance level even over scarce training data. Mathematically speaking, an ideal CSEN is supposed to yield a binary mask $v \in \{0, 1\}^n$,

$$v_i = \begin{cases} 1 & \text{if } i \in \Lambda, \\ 0 & \text{otherwise,} \end{cases}$$

which indicates the true support, i.e., $\Lambda = \{i \in \{1, 2, \ldots, n\} : v_i = 1\}$. In order to approximate this ideal case, a CSEN network $P(y, D)$ produces a probability vector $p$ with $p_i \in [0, 1]$, which measures the probability of each index being in $\Lambda$. Given the estimated probability map, the support can easily be estimated as $\hat{\Lambda} = \{i \in \{1, 2, \ldots, n\} : p_i > \tau\}$ by thresholding $p$ with a fixed threshold $\tau$. A CSEN is composed of fully convolutional layers, and as input it takes a proxy $\tilde{x}$ of the sparse coefficient vector, which is a coarse estimate of $x$, i.e., $\tilde{x} = (D^T D + \lambda I)^{-1} D^T y$ or simply $\tilde{x} = D^T y$. Using such a proxy of $x$, instead of making inference directly on $y$, has also been studied in a few recent works. For instance, in [45], [46], the authors proposed reconstruction-free image classification from compressively sensed images. The input vector $\tilde{x}$ is reshaped into a 2-D plane in order to use it with 2-D convolutional layers. This transformation is performed by re-ordering the indices of the atoms in such a way that the non-zero elements of the representation vector $x$ for a specific class come together in the 2-D plane. A representative illustration of the proposed dictionary design compared to the traditional one is shown in Fig. 7. The proxy, reshaped into the 2-D plane $\tilde{X}$, is then convolved with the weight kernels $w^1_k$, $k = 1, \ldots, N$, connecting the input to the next layer with $N$ filters; together with the biases $b^1_k$, this yields the inputs of the next layer:

$$a^1_k = S\left(\mathrm{ReLu}\left(b^1_k + \mathrm{conv2D}(w^1_k, \tilde{X})\right)\right), \qquad (5)$$

where $b^1_k$ is the bias, $S(\cdot)$ is the down- or up-sampling operation and $\mathrm{ReLu}(x) = \max(0, x)$.
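The input stage of a CSEN, i.e., the denoiser $B = (D^T D + \lambda I)^{-1} D^T$, the proxy $\tilde{x} = By$ and its reshaping into a class-grouped 2-D plane, can be sketched as follows, together with the support estimate $\hat{\Lambda} = \{i : p_i > \tau\}$. Several details are assumptions: $B$ is computed through the equivalent form $D^T (D D^T + \lambda I)^{-1}$ (the push-through identity), which only needs an $m \times m$ solve; $\lambda = 10^{-3}$ is illustrative; and the layout in which each class's 625 coefficients fill one 25 × 25 tile of the 50 × 50 plane is a hypothetical reading of Fig. 7, not the authors' exact ordering.

```python
import numpy as np


def denoiser(D, lam):
    """B = (D^T D + lam I)^{-1} D^T, computed as D^T (D D^T + lam I)^{-1},
    which only requires an m x m solve (cheaper when n > m)."""
    m = D.shape[0]
    return np.linalg.solve(D @ D.T + lam * np.eye(m), D).T


def to_class_blocks(x_tilde, n_classes=4, block=25):
    """Hypothetical 2-D layout: the block*block coefficients of class c fill one
    square tile; tiles are ordered row by row (class 0 top-left, class 1 top-right, ...)."""
    per_row = int(round(np.sqrt(n_classes)))
    tiles = x_tilde.reshape(per_row, per_row, block, block)
    return tiles.transpose(0, 2, 1, 3).reshape(per_row * block, per_row * block)


def estimate_support(p, tau=0.5):
    """Support estimate from a probability map: {i : p_i > tau}."""
    return np.flatnonzero(p.ravel() > tau)


# Hypothetical sizes matching the text: m = 512 rows, 4 classes x 625 atoms = 2500 columns.
rng = np.random.default_rng(0)
D = rng.standard_normal((512, 2500))
B = denoiser(D, lam=1e-3)          # 2500 x 512, precomputed once per dictionary
y = rng.standard_normal(512)
X_tilde = to_class_blocks(B @ y)   # 50 x 50 input plane for the convolutional layers
```

Since $B$ is fixed once the dictionary is built, producing $\tilde{x}$ for a new query costs a single matrix-vector product.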
In more general form, the $k$-th feature map of layer $l$ is defined as

$$a^l_k = S\left(\mathrm{ReLu}\left(b^l_k + \sum_{i=1}^{N_{l-1}} \mathrm{conv2D}(w^l_{ik}, a^{l-1}_i)\right)\right). \qquad (6)$$

Therefore, the trainable parameters of CSEN are the kernels and biases of all layers, $\Theta = \{w^l_{ik}, b^l_k\}_{l,i,k}$. In developing the dictionary used in sparse representation-based classification, the training samples are stacked in groups according to their classes. Thus, instead of the traditional $\ell_1$-minimization formulation in Eq. (3), the following group $\ell_1$-minimization formulation may result in increased classification accuracy:

$$\min_x \sum_{i} \|x_{G_i}\|_2 \quad \text{subject to} \quad \|y - Dx\|_2 \leq \epsilon, \qquad (7)$$

where $x_{G_i}$ is the group of coefficients from the $i$-th class. In this manner, one possible cost function for an SE network would be

$$E(x) = \sum_{p} \left(P_\Theta(\tilde{x})_p - v_p\right)^2, \qquad (8)$$

where $P_\Theta(\tilde{x})_p$ is the network output at location $p$ and $v_p$ is the ground-truth binary mask of the sparse code $x$. Due to its high computational complexity, we approximate the cost function in Eq. (8) with a simpler average pooling layer after the convolutional layers, which directly produces the estimated class in our CSEN design. An illustration of the proposed CSEN-based Covid-19 recognition is shown in Fig. 3.

This section summarizes the competing methods, which were selected among numerous alternatives due to the superior performance levels they obtained in similar problems. For a fair comparative evaluation, all classification methods are fed the same input feature vectors as the proposed CSENs.

1) Collaborative representation-based classification: As a competing technique to the proposed hybrid CSEN-based technique, CRC [12] is a direct, representation-based classification method. It is a non-iterative support estimation technique that is faster than SRC with comparable classification performance, while being more stable than existing iterative sparse recovery tools, as shown in [15]. In the first step of CRC, the trade-off parameter of the regularized least-squares solution is set as $\lambda = 2 \times 10^{-12}$.
2) Multi-layer Perceptron (MLP) classification: As one of the most common classifiers, a 4-hidden-layer MLP is used for this problem. For training, we used Back-Propagation (BP) with the Adam optimization technique [47]. The network and training hyper-parameters are as follows: learning rate $\alpha = 10^{-4}$, moment updates $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and 50 epochs. Fig. 8 illustrates the network configuration in detail. This configuration achieved the best performance among the configurations tried (deeper and shallower): deeper configurations suffered from over-fitting, while shallower ones exhibited inferior learning performance.

3) Support Vector Machines (SVMs): For a multi-class problem, the first objective is to select the SVM topology for ensemble learning: one-vs-one or one-vs-all. In order to find the optimal topology and the hyper-parameters (e.g., kernel type and its parameters), we first performed a grid-search with

4) k-Nearest-Neighbor (k-NN): Finally, a traditional k-Nearest Neighbor (k-NN) classifier is used with PCA dimensionality reduction. In a similar fashion, the distance metric and the k-value are optimized by a prior grid-search. The following distance metrics are evaluated: Cityblock, Chebyshev, correlation, cosine, Euclidean, Hamming, Jaccard, Mahalanobis, Minkowski, standardized Euclidean, and Spearman. The k-value is varied within the range [1, 4416] on a log scale.

We performed our experiments over the QaTa-Cov19 dataset, which consists of normal and three pneumonia classes: bacterial, viral, and Covid-19. The proposed approach is evaluated using a stratified 5-fold cross-validation (CV) scheme with a ratio of 80% for training and 20% for the test (unseen folds) splits. Table II shows the number of X-ray images per class in the QaTa-Cov19 dataset. Since the dataset is imbalanced, we applied data augmentation to the training set in order to balance the size of each class.
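A minimal sketch of the evaluation protocol: stratified 5-fold splitting, followed by the number of augmented images each class needs in a training split to match the largest class. It is an illustrative NumPy implementation under assumptions, not the tooling used in the experiments; the function names are hypothetical and the class sizes in the demo are made up.

```python
import numpy as np


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k test folds, dividing every class as evenly as possible."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    return [np.sort(np.array(f, dtype=int)) for f in folds]


def augmentation_targets(train_labels):
    """Extra (augmented) images needed per class to match the largest training class."""
    classes, counts = np.unique(train_labels, return_counts=True)
    return {int(c): int(counts.max() - n) for c, n in zip(classes, counts)}


# Made-up class sizes, for illustration only.
labels = np.repeat(np.arange(4), [50, 80, 40, 23])
folds = stratified_kfold(labels, k=5)
test_idx = folds[0]                                     # about 20% of every class
train_idx = np.setdiff1d(np.arange(labels.size), test_idx)
extra = augmentation_targets(labels[train_idx])         # per-class augmentation budget
```

Augmentation is applied only to the training indices, so the unseen test fold keeps its original class proportions.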
To balance the classes, the X-ray images of the viral pneumonia, Covid-19 pneumonia and normal classes are augmented up to the same number as the bacterial pneumonia class in the training set. We use the Keras ImageDataGenerator to perform data augmentation by applying ZCA whitening with an epsilon of $10^{-6}$, randomly rotating the X-ray images within a range of 10 degrees, and randomly shifting images both horizontally and vertically within the interval $[-0.1, +0.1]$. In each CV fold, we use a total of 8832 and 1257 images in the training and test (unseen in the fold) sets, respectively. The experimental evaluations of SVM, k-NN and CRC are performed using MATLAB version 2019a, running on a PC with an Intel® i7-8650U CPU and 32 GB of system memory. The MLP and CSEN methods are implemented using the TensorFlow library [48] with Python on an NVidia® TITAN-X GPU card. For CSEN training, the ADAM optimizer [47] is used with its default learning parameters: learning rate $\alpha = 10^{-3}$ and moment updates $\beta_1 = 0.9$, $\beta_2 = 0.999$, with only 15 Back-Propagation epochs. Neither grid-search nor any other parameter or configuration optimization was performed for CSEN; the same network configurations are used as in [15]. Accordingly, we use two compact CSEN designs, CSEN1 and CSEN2. CSEN1 consists of only two hidden convolutional layers, the first with 48 neurons and the second with 24. The ReLu activation function is used in the hidden layers, and the filter size is 3×3. CSEN2, on the other hand, uses max-pooling and has one additional hidden layer with 24 neurons to perform transposed convolution. CSEN1 and CSEN2 are compared against the 6 competing methods under the same experimental setup. For the dictionary $\Phi$ of each CSEN design, 625 images from each class (taken from the augmented training samples of each fold) are stacked in such a way that the representation coefficients in the 2-D plane, $X$, have size 50 × 50, as shown in Fig. 7.
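To show how a CSEN1-like network turns the 50 × 50 proxy plane into class scores, the NumPy sketch below runs a forward pass with random, untrained weights: 3×3 convolutions with 48 and 24 ReLu filters, followed by an output map and average pooling. Several parts are assumptions: the output stage (a single-channel 3×3 convolution with a sigmoid, giving $p_i \in [0, 1]$), the 25 × 25 pooling tiles (which follow the hypothetical one-tile-per-class layout), and all function names. It is not the authors' TensorFlow implementation, CSEN2's max-pooling/transposed-convolution variant is not shown, and the parameter count of this assumed stack (11,089) is not meant to reproduce the figures reported for the actual designs. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, w, b):
    """'Same'-padded 2-D cross-correlation. x: (H, W, Cin); w: (kh, kw, Cin, Cout); b: (Cout,)."""
    kh, kw = w.shape[:2]
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    windows = sliding_window_view(xp, (kh, kw), axis=(0, 1))  # (H, W, Cin, kh, kw)
    return np.einsum("hwcij,ijco->hwo", windows, w) + b


def init_csen1_like(rng, widths=(1, 48, 24, 1), k=3):
    """Random weights for a hypothetical CSEN1-like stack: 48 and 24 hidden 3x3 filters
    plus an assumed 1-channel output map."""
    return [(rng.standard_normal((k, k, cin, cout)) * np.sqrt(2.0 / (k * k * cin)),
             np.zeros(cout))
            for cin, cout in zip(widths[:-1], widths[1:])]


def csen_forward(X, params, block=25):
    """ReLu hidden layers, sigmoid output map p, then block x block average pooling."""
    a = X[:, :, None]
    for i, (w, b) in enumerate(params):
        z = conv2d_same(a, w, b)
        if i < len(params) - 1:
            a = np.maximum(z, 0.0)           # hidden layer: ReLu
        else:
            a = 0.5 * (1.0 + np.tanh(z / 2))  # output layer: sigmoid, overflow-free
    p = a[:, :, 0]                            # probability map over the 2-D plane
    H, W = p.shape
    # One average-pooled score per block x block tile, i.e., per class tile.
    scores = p.reshape(H // block, block, W // block, block).mean(axis=(1, 3)).ravel()
    return p, scores


def n_params(params):
    return sum(w.size + b.size for w, b in params)


rng = np.random.default_rng(0)
params = init_csen1_like(rng)
p, scores = csen_forward(rng.standard_normal((50, 50)), params)
predicted_class = int(np.argmax(scores))
```

Pooling the output map tile by tile is one simple way to realize the "average pooling after the convolutional layers" that replaces the pixel-wise cost of Eq. (8) with a direct class decision.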
The rest of the images in the training set, i.e., 1583 samples from each class, are used to train each CSEN. We use the PCA dimensionality reduction matrix $A$ with compression ratio $CR = m/d = 0.5$. Therefore, we have a 512 × 2500 equivalent dictionary $D$ and a 2500 × 512 denoiser $B = (D^T D + \lambda I)^{-1} D^T$ to obtain a coarse estimate $\tilde{x} = By \in \mathbb{R}^{n}$, $n = 2500$, of the (ideally sparse) representation coefficients. The CSEN networks are then trained to predict the class of $y$ from the input $\tilde{x}$, as illustrated in Fig. 3. Due to the lack of other learning-based SE studies in the literature, we chose a network deeper than the CSEN designs to investigate the role of network depth in this problem. ReconNet [49] was proposed as a non-iterative deep learning solution to the compressive sensing problem, i.e., $\hat{s} \leftarrow P(y)$, and it is one of the state-of-the-art methods in compressively sensed image recognition tasks. It consists of 6 fully convolutional layers and one dense layer in front of the convolutional ones, which acts as a learned denoiser for the mapping from $y \in \mathbb{R}^m$ to $\tilde{s} \in \mathbb{R}^d$. The convolutional layers are then responsible for producing the reconstructed signal $\hat{s}$ from $\tilde{s}$. Therefore, by replacing this dense layer with the denoiser matrix $B$, this network can be used as a competing method. Both CSEN and the modified ReconNet use $\tilde{x}$ as input, which is produced using the equivalent dictionary $D$ and its regularized pseudo-inverse $B$. In designing the dictionary of the CRC system, all training samples are stacked in the dictionary $\Phi$, i.e., 2208 samples from each class. The same PCA matrix $A$ used in CSEN-based recognition is applied to the features $s \in \mathbb{R}^{d}$, $d = 1024$. Therefore, a dictionary $D$ of size 512 × 8832 and the corresponding denoiser matrix $B$ of size 8832 × 512 are used in the CRC framework. The classification performance of the proposed CSEN-based approach and the competing methods is presented in Table I.
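The matrix sizes above all follow from a handful of stated quantities (8832 training images per fold, 4 classes, 625 atoms per class, $d = 1024$, $CR = 0.5$). The short check below derives them; the variable names are illustrative. The last two lines count the dictionary coefficients that have to be stored, which is the basis of the memory comparison between the CSEN dictionary (about 1280k coefficients) and the CRC dictionary (about 4521k).

```python
# Per-fold bookkeeping, using only sizes stated in the text.
n_classes = 4
d = 1024                                  # CheXNet feature length
m = int(0.5 * d)                          # PCA output size for CR = m / d = 0.5
train_total = 8832                        # augmented training images per fold
per_class = train_total // n_classes      # images per class after balancing

atoms_per_class = 625                     # CSEN dictionary atoms per class
n_csen = n_classes * atoms_per_class      # dictionary columns, i.e. a 50 x 50 plane
csen_train_per_class = per_class - atoms_per_class  # images left to train each CSEN

D_csen_shape = (m, n_csen)                # equivalent dictionary D for CSEN
B_csen_shape = (n_csen, m)                # denoiser B for CSEN
D_crc_shape = (m, train_total)            # CRC stacks every training image

stored_csen = D_csen_shape[0] * D_csen_shape[1]
stored_crc = D_crc_shape[0] * D_crc_shape[1]
```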
As Table I shows, the proposed approaches surpass all competing methods in Covid-19 recognition performance, achieving 98.5% sensitivity and over 95% specificity. As shown in Table III, compared to MLP and ReconNet, the proposed CSEN designs are very compact and computationally efficient. This is also evident in Table IV, where the computational complexity (measured as the total computation time over the 1257 test images) is reported. Compared to CRC in particular, CSEN-based classification has two advantages: computational efficiency and superior Covid-19 recognition performance. The computational efficiency comes from the fact that CRC uses a larger dictionary matrix (of size 512 × 8832), which requires more computation in terms of matrix-vector multiplications. Furthermore, storing the trainable parameters (∼16k) and the light dictionary matrix coefficients (∼1280k) on the test device is more memory-efficient than storing the coefficients (∼4521k) of the larger dictionary used in CRC. For further analysis, we also tested the CRC framework using the light dictionary (of size 512 × 2500) used in CSEN-based recognition. We call this variant CRC (light); as Table V shows, the performance of CRC degraded further, with no significant improvement in computational cost. As for deeper convolutional networks used instead of the CSEN designs, such as the modified ReconNet, the results in Table I show that compact CSEN structures are indeed preferable, achieving superior classification performance compared to deeper networks. Finally, Table VI presents the overall (cumulative) confusion matrix of the proposed CSEN-based Covid-19 recognition approach over the new QaTa-Cov19 dataset. The most critical misclassifications are the false negatives, that is, the misclassified Covid-19 X-ray images.
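Sensitivity and specificity for one class are read from a multi-class confusion matrix by treating that class against the rest. The minimal sketch below assumes rows index the true class and columns the predicted class (the orientation of Table VI is not visible here); the function name is hypothetical. It also checks the reported Covid-19 sensitivity against the counts given in the discussion of Table VI, where 7 of the 462 Covid-19 images are misclassified.

```python
import numpy as np


def one_vs_rest_rates(cm, k):
    """Sensitivity and specificity of class k; cm[i, j] counts true class i predicted as j (assumed)."""
    cm = np.asarray(cm, dtype=float)
    tp = cm[k, k]
    fn = cm[k].sum() - tp      # class-k images predicted as another class
    fp = cm[:, k].sum() - tp   # other images predicted as class k
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Covid-19 sensitivity from the reported counts: 7 of the 462 Covid-19 images were missed.
covid_sensitivity = (462 - 7) / 462
print(f"{100 * covid_sensitivity:.1f}%")  # 98.5%
```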
The confusion matrix shows that the proposed approach misclassified 7 Covid-19 images (out of 462). Three of these 7 misclassifications fall into the "Viral Pneumonia" category, which is an expected confusion given the viral nature of Covid-19. However, the other four cases are misclassified as "Normal", which is indeed a severe clinical misdiagnosis. A close look at these false negatives in Fig. 9 reveals that they are indeed very similar to normal images, where typical Covid-19 patterns are hardly visible even to an expert's naked eye. It is possible that these images come from patients who were in the very early stages of Covid-19.

The commonly used methods in Covid-19 diagnosis, namely Reverse Transcription-Polymerase Chain Reaction and Computed Tomography, have certain limitations and drawbacks, such as long processing times and unacceptably high misdiagnosis rates. These drawbacks are also shared by most recent deep-learning-based works in the literature, due to the scarcity of data from Covid-19 cases. Although Deep Learning-based recognition techniques are dominant in Computer Vision, where they achieve state-of-the-art performance, their performance degrades quickly under data scarcity, which is the reality for the problem at hand. This study aims to address such limitations by proposing a robust and highly accurate Covid-19 recognition approach directly from raw X-ray images, without any pre- or post-processing. The proposed approach is based on CSEN, which can be seen as a bridge between Deep Learning models and representation-based methods. CSEN uses both a dictionary and a set of training samples to learn a direct map from the query samples to the sparse support set of the representation coefficients. With this unique ability and the advantage of a compact network, the proposed CSEN-based Covid-19 recognition systems surpass the competing methods and achieve over 98% sensitivity and over 95% specificity.
Furthermore, they yield the most computationally efficient scheme in terms of both speed and memory. Finally, QaTa-Cov19, the largest dataset of X-ray images, will be released along with this study as a benchmark dataset in this domain. This is expected to accelerate research efforts globally and support the fight against Covid-19 worldwide.

Challenges in control of Covid-19: short doubling time and long delay to effect of interventions
Clinical course and risk factors for mortality of adult inpatients with Covid-19 in Wuhan, China: a retrospective cohort study
Sensitivity of chest CT for Covid-19: comparison to RT-PCR
Advanced but expensive technology. Balancing affordability with access in rural areas
Laboratory testing for coronavirus disease 2019 (Covid-19) in suspected human cases: interim guidance
Can AI help in screening viral and Covid-19 pneumonia?
Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
Finding Covid-19 from chest X-rays using deep learning on a small dataset
Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting
Covid-19 image data collection
Covid-19 database
Sparse representation or collaborative representation: Which helps face recognition?
Robust face recognition via sparse representation
Sparse representation for computer vision and pattern recognition
Convolutional sparse support estimator network (CSEN) from energy efficient support estimation to learning-aided compressive sensing
CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning
Densely connected convolutional networks
Mémoire sur la propagation de la chaleur dans les corps solides
Matching pursuits with time-frequency dictionaries
The curvelet transform for image denoising
Linear spatial pyramid matching using sparse coding for image classification
Sparse coding with anomaly detection
Detecting anomalous structures by convolutional sparse models
Learning structured sparsity in deep neural networks
Compressed sensing
Compressive sampling
Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization
The restricted isometry property and its implications for compressed sensing
Atomic decomposition by basis pursuit
Information-theoretic limits on sparse support recovery: Dense versus sparse measurements
Robust support recovery using sparse compressive sensing matrices
ImageNet: A large-scale hierarchical image database
ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
Joint sparse representation for robust multimodal biometrics recognition
Robust visual tracking and vehicle classification via sparse representation
Learning sparse representations for human action recognition
A survey on representation-based classification and detection in hyperspectral remote sensing imagery
Chest X-ray images (pneumonia), Kaggle, March
Nearly sharp sufficient conditions on exact sparsity pattern recovery
Limits on support recovery with probabilistic models: An information-theoretic framework
Sampling bounds for sparse support recovery in the presence of noise
Approximate sparsity pattern recovery: Information-theoretic lower bounds
Compressively sensed image recognition
Direct inference on compressive measurements using convolutional neural networks
Adam: A method for stochastic optimization
TensorFlow: Large-scale machine learning on heterogeneous distributed systems
ReconNet: Non-iterative reconstruction of images from compressively sensed measurements