key: cord-1009022-qlyuvts7 authors: Ben Atitallah, Safa; Driss, Maha; Boulila, Wadii; Ben Ghézala, Henda title: Randomly initialized convolutional neural network for the recognition of COVID‐19 using X‐ray images date: 2021-09-19 journal: Int J Imaging Syst Technol DOI: 10.1002/ima.22654 sha: 29d84b0346a7aca41fa822030661f773efd34aab doc_id: 1009022 cord_uid: qlyuvts7 By the start of 2020, the novel coronavirus (COVID‐19) had been declared a worldwide pandemic, and because of its infectiousness and severity, several strands of research have focused on combatting its ongoing spread. One potential solution to detecting COVID‐19 rapidly and effectively is by analyzing chest X‐ray images using Deep Learning (DL) models. Convolutional Neural Networks (CNNs) have been presented as particularly efficient techniques for early diagnosis, but most still include limitations. In this study, we propose a novel randomly initialized CNN (RND‐CNN) architecture for the recognition of COVID‐19. This network consists of a set of differently‐sized hidden layers all created from scratch. The performance of this RND‐CNN is evaluated using two public datasets: the COVIDx and the enhanced COVID‐19 datasets. Each of these datasets consists of medical images (X‐rays) in one of three different classes: chests with COVID‐19, with pneumonia, or in a normal state. The proposed RND‐CNN model yields encouraging results, achieving 94% accuracy on the COVIDx dataset and 99% accuracy on the enhanced COVID‐19 dataset. The novel coronavirus (COVID-19) appeared in Wuhan, China at the end of 2019, and by March 11, 2020, the World Health Organization (WHO) had categorized this virus as a pandemic. Since then, COVID-19 has spread rapidly across the world, and to date has resulted in almost three million deaths. As demonstrated in Figure 1, the number of confirmed cases increases day by day, reaching about 725 275 new confirmed cases worldwide on May 13, 2021.
In the United States alone, the number of deaths caused by COVID-19 surpassed 575 000 by May 4, 2021. Figure 2 illustrates the total number of COVID-19 deaths in the most impacted countries worldwide. Patients infected with COVID-19 exhibit some symptoms in common with those infected with the general flu. These include fever, cough, headaches, and loss of taste or smell. However, COVID-19 often also comes with more serious symptoms, including difficulty or shortness of breath, pain in the chest, and the inability to speak or move normally. Although COVID-19 causes only mild illness for most, it can be fatal for others, especially for older people and those with preexisting medical conditions, such as heart problems and high blood pressure. About 15% of COVID-19 cases progress to severe disease, and 5% become critical cases. Given the infectiousness of this disease and its rapid transmission, the early diagnosis of COVID-positive patients is crucial for avoiding further spread and minimizing the cases of critical illness. To this end, many types of interdisciplinary research have focused on finding ways of combatting COVID-19, often through its prevention or treatment. In addition, detecting those who have been infected quickly and automatically could be an effective solution, particularly in the process of isolating and treating patients as a means of preventing further spread. In this context, there has been an increased interest in developing computer-aided diagnosis systems using artificial intelligence (AI) techniques. For instance, well-trained deep learning (DL) models can focus on specific details that are not noticeable to the human eye and can also utilize existing forms of medical data. 3, 4 More specifically, X-ray images constitute a promising avenue of exploration, and they already play a vital role in the early diagnosis and treatment of COVID-19.
Likewise, these particular medical images tend to be readily accessible for disease diagnosis and are already widely used in health centers worldwide, making them an even more desirable avenue for DL models. The use of X-rays can also help circumvent existing difficulties. For instance, due to the limited number of radiologists present in many hospitals, fast and simple AI models can present an effective solution for the early diagnosis of COVID-19 by eliminating multiple problems already observed with RT-PCR test kits, such as the cost and wait times associated with getting back test results. This is not an entirely novel idea, either. Innovations in computer-aided vision, including DL techniques, have made many improvements in the larger domain of healthcare. 5 In particular, CNNs have demonstrated high performance in medical imaging, offering high accuracy of diagnosis through their ability to extract multiple levels of features from medical images. To date, CNNs have been employed in the recognition of several different diseases and medical conditions, such as screening mammographic images for signs of cancer and other issues. 6 Likewise, radiology images are already being used to detect COVID-19 in rapid and automatic ways. 7

FIGURE 1 Worldwide confirmed cases of COVID-19 from January 23, 2020 to May 13, 2021 1

FIGURE 2 Number of COVID-19 deaths among the most impacted countries worldwide as of May 4, 2021 2

To date, several studies have been conducted on COVID-19 recognition from chest X-ray images that employed different architectures of CNN with transfer learning. [8] [9] [10] For instance, Wang et al. 11 have proposed a COVID-Net that can detect relevant abnormalities in chest X-ray images from a sample population of patients with COVID-19, those with pneumonia infections, and those with no illnesses or health conditions.
Their COVID-Net achieved an overall accuracy of 92.4% on the COVIDx dataset, but it was pre-trained with the ImageNet dataset and initialized with the obtained weights. 12 These researchers are also the ones who created and curated the COVIDx dataset used both in the current study and in many others that have worked on designing DL models for the automatic detection of COVID-19. For example, Karim et al. 13 designed a deep neural network (DNN) called DeepCovidExplainer for the detection of COVID-19 using chest X-ray images. These researchers used transfer learning based on a combination of DenseNet, ResNet, and VGG architectures to create their model snapshots. The results they achieved demonstrated that their approach could automatically detect COVID-19 positive patients with 94.6% precision, 94.3% recall, and a 94.6% F1-score. Elsewhere, Luz et al. 14 designed a low computational model for the automatic detection of COVID-19 patients using chest X-ray images. These researchers used the EfficientNet family of DNNs together with a hierarchical classifier. To test and evaluate this model, version 1 of the COVIDx dataset and its 13 569 X-ray images were used. The experimental results that Luz et al. achieved demonstrate that this model also performed effectively, accomplishing 93.9% accuracy and 90% sensitivity. As these examples demonstrate, though, previous work regarding COVID-19 detection using DL models has focused mainly on pre-trained networks and transfer learning. Few researchers have addressed the issue of designing a CNN from scratch and tuning the right parameters to improve its performance. One of the few exceptions to this is Irmak, 15 who proposed two novel CNN architectures for COVID-19 detection whose parameters are automatically tuned by the Grid Search algorithm.
The first CNN is developed for binary classification: that is, to determine whether or not a patient is infected with COVID-19 using chest X-ray images. The second CNN then classifies the chest X-ray images into one of three classes: "COVID-19," "pneumonia," and "normal" images. Although these models' performance is interesting, Irmak does not consider the data augmentation techniques that may help to improve overall performance. Likewise, in Ref. [16] researchers proposed a new CNN-based approach for the classification of COVID-19 severity. This CNN divides and categorizes COVID-19 patients into one of four severity groups: mild, moderate, severe, and critical. This study also employs grid search optimization to select the CNN parameters. The experimental results demonstrate the effectiveness of the proposed CNN model, which achieves an accuracy of 95.5%. In this study, we propose a randomly initialized convolutional neural network (RND-CNN) as a means of classifying chest X-ray images and identifying the patient's condition as one of three classes: pneumonia, COVID-19, or normal state. Randomized Neural Networks (RNNs) are defined as neural networks with a multilayered architecture, where the connections between these layers are untrained before the initialization. 17 Recent works show that RNNs tend to demonstrate acceptable performance for feature extraction and classification, comparable to the results of pre-trained models. 18, 19 The following objectives outline our specific goals in this work: 1. Propose a classification CNN model with randomly initialized connections between layers, for the automatic recognition of COVID-19; 2. Study the impact of data preprocessing and balancing as means of enhancing the performance of this proposed model; 3. Apply different techniques of data augmentation on the dataset samples, such as rotation, flips, and scaling; 4.
Test the proposed method using different datasets, including the COVIDx dataset (15 000+ chest X-ray images) and the enhanced COVID-19 dataset (1000+ enhanced images); 5. Compare the performance of the proposed model with other models that use different weight initialization techniques as well as with methods proposed in previous works. To achieve these objectives, this article is structured as follows. Section 2 presents an overview of the basic concepts of CNNs and initialization methods, while Section 3 provides details about the architecture of our proposed RND-CNN model. In Section 4, we describe the datasets utilized and how we have prepared the data from each one to be used in training our own model. Section 5 discusses the experiments carried out and the results obtained. Section 6 features comparisons made between our proposed model and other existing models using different weight initialization techniques. Finally, Section 7 offers our concluding remarks and thoughts on directions for future work. In this section, we present foundational knowledge related to CNNs and weight initialization methods. A wide range of healthcare services have been improved by the rapid development of DL techniques. 20 Different applications have been proposed and developed for various healthcare services, including elder care and disease prediction. 21-23 The CNN, for instance, is a deep neural network (DNN) that consists of a sequence of layers, wherein different filter operations are performed. 24, 25 This type of DL model is suitable for processing images and/or videos. In addition, because a CNN employs supervised learning, it can be classified as a discriminative DL architecture. 26 This network utilizes a set of layers that begins with an input layer, then includes a set of hidden layers, and finally ends with an output layer. It consists of a sequence of convolutional and pooling layers, followed by a fully connected layer.
In each of these convolutional layers, several filters are embedded in order to perform different analyses on the input data, which in turn creates feature maps. Thus, three essential layers build up the CNN: the convolutional layer, the pooling layer, and the fully connected layer. 27 A brief definition of these layers is as follows: • Convolutional layer: The CNN model begins with this layer. Here, a filter is applied over the input image, which is represented as a matrix. A feature map containing the learned features is the output of this convolutional layer. Briefly, a filter is a small matrix with a predefined size, smaller than the input data, whose values are initially chosen at random. This filter slides through the input data from left to right, and then moves down with a step size specified by the stride until it completely covers the input matrix. Equation (1) represents the convolutional layer process: y_j = f(Σ_i K_ij * x_i + b_j), (1) where x is the input image, y_j is the jth convolutional layer output, f is the activation function, K_ij is the convolutional kernel convolved with the ith input x_i, and b_j denotes the bias. We illustrate convolutional operations in Figure 3. • Pooling layer: This layer is applied to lower the complexity of the overall CNN model. The pooling layer captures the most relevant parts of the feature maps by applying an average or max-pooling operation. The pooling operation applies a kernel, or a predefined window, to the feature map. This kernel gathers the average or maximum value of the matrix elements according to the method being used, sliding across the whole feature map with a predefined stride. Figure 4 depicts the pooling layer as it uses the max-pooling operation. In the illustrated example, four slide positions are performed on the feature map, as presented in Figure 4 with four different colors.
The resultant pooling values demonstrate how the complexity of the model computations is reduced. • Fully connected layer: This layer performs the last step in this network, reconnecting the processed portions in order to obtain the full image. The two-dimensional array is converted into a single list. Then, using different activation functions (Sigmoid, Tanh, and ReLu), this layer is converted to probability values, which indicate the probability that the processed image belongs to a specific class. Based on the highest probability value, the output layer assigns the image to its output class. Generally, the convolutional and pooling layers are stacked together at the head of the architecture. The fully connected layers, however, are stacked with one another at the end of the network architecture. CNNs have demonstrated effective performance in many different areas, including image classification and recognition. One of the greatest advantages of CNNs is their ability to extract and learn hidden features from big datasets and raw data. To achieve this, a CNN proceeds through a set of steps, which can be explained as follows. 26, 27 Once the training phase starts, the input layer assigns the weights to the input data to be passed to the next layer. According to the type of initializer used, the weights are set to either constant values or random ones. The following layers receive the input weights, perform the filter operations, and determine the output that is passed on to the next layer as an input. Then in the last layer, the final output is defined. Within the training process, a loss function is used to examine the performance of this prediction. This function calculates the error rate by comparing predicted and actual results. Various loss functions are designed for different purposes.
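As a concrete illustration of the convolutional and pooling operations described above, the following plain-Python sketch performs a single-channel "valid" convolution (strictly, a cross-correlation, as in most DL frameworks) followed by a (2 × 2) max pooling. The tiny image and kernel are purely illustrative and are not taken from the paper.

```python
def conv2d(image, kernel, bias=0.0):
    """'Valid' 2-D convolution: slide the kernel over the image, sum the
    element-wise products, and add the bias (Equation (1) with an
    identity activation function)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw)) + bias
             for j in range(out_w)] for i in range(out_h)]

def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep only the largest value inside each window."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, stride)]
            for i in range(0, len(fmap) - size + 1, stride)]

image = [[1, 2, 0, 1],
         [3, 1, 1, 0],
         [0, 2, 2, 3],
         [1, 0, 1, 2]]
kernel = [[1, 0], [0, -1]]   # a tiny illustrative 2x2 filter
fmap = conv2d(image, kernel)  # 3x3 feature map
pooled = max_pool(fmap)       # 1x1 after 2x2 max pooling
```

Note how the 4 × 4 input shrinks to a 3 × 3 feature map and then to a single pooled value, illustrating the complexity reduction described above.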
For example, the binary cross-entropy function is used to deal with classification problems that distinguish between only two classes, while categorical cross-entropy is used for classification problems with more than two classes and mean-squared error is utilized for regression problems. Different optimization algorithms, such as Adam and Stochastic Gradient Descent, can be utilized in order to update the neurons' weights. These algorithms examine the gradient of the loss function and then attempt to change and update the network weights or learning rates to minimize these losses. This set of steps is repeated throughout the training phase until the weights of each layer's neurons become balanced and the error rate value falls. Once this training has finished, the model is ready for use. Tuning the right parameters of a CNN model helps improve its performance. These parameters include the internal values of the model configuration, which are estimated from data, such as the weights between neural network layers. In order to understand the importance of weight initialization, it is first crucial to understand the neurons; that is, the units that make up each layer of the CNN. These neurons take the input data and perform calculations upon it in order to achieve a weighted summation, and then produce an output through an activation function. 28 Every neuron consists of weights and a bias. In the first layer, the weights are initialized and assigned according to the input size, while the bias is optimized throughout the training process. The structure of a neuron is depicted in Figure 5. Weight initializers are meant to regulate the initial weight values of the various neural network layers. 29 The process of weight initialization is intended to keep the layer activation outputs from falling into the common problem of vanishing and exploding gradients. In particular, vanishing gradients occur due to back-propagation during the training phase. 30 Propagating a feedback signal from the output loss to the earlier layers may weaken it to the point of the signal getting lost, which in turn makes the network untrainable. Therefore, a careful weight setting is required in order to achieve better results and higher performance. There are three categories of initialization methods. 31 The first category includes constant methods, which employ the same set of weights for the initialization of the network connections, such as the zero and one initializers. However, when using these initialization methods, the equations of the learning algorithm become incapable of changing or updating the network weights, which leads to the model becoming locked: across all iterations, all layers have the same weights and perform the same calculations. The second category of initialization methods presents the distribution options, wherein either the Gaussian or the uniform distribution is used and the input matrix numbers are assigned random values. However, the parameters of the distribution, including both its mean and its standard deviation, may be chosen incorrectly, which will affect the performance of the model during training and may also lead to the problem of vanishing gradients. Finally, the third category includes random initialization based on previous knowledge. To initialize layer weights here, heuristics are used in addition to the nonlinear activation functions. "Heuristics" here describes the approach of solving a problem without using a method that ensures an optimal solution. Using this type of randomization, the variance of the normal distribution is assigned based on the number of inputs. Heuristics largely mitigate the issue of exploding or vanishing gradients, but they cannot prevent this issue entirely.
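The "locking" effect of constant initializers described above is easy to demonstrate: with all-zero weights and biases, every neuron in a layer computes the identical output, so every neuron also receives the identical gradient update and the layer can never differentiate its units. A minimal sketch (the layer sizes here are arbitrary):

```python
def layer_forward(x, W, b):
    """Weighted summation for each neuron: y_j = sum_i(w_ji * x_i) + b_j."""
    return [sum(wi * xi for wi, xi in zip(w_row, x)) + bj
            for w_row, bj in zip(W, b)]

x = [0.5, -1.0, 2.0]                    # arbitrary 3-feature input
W_zero = [[0.0] * 3 for _ in range(4)]  # 4 neurons, all-zero weights
b_zero = [0.0] * 4
out = layer_forward(x, W_zero, b_zero)
# All four neurons produce the same value, so their gradient updates are
# identical on every iteration and the layer stays in lockstep.
```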
Table 1 compares the different types of neural network initialization. Randomized initialization methods with previous knowledge serve as a good starting point for weight initialization. The main advantages of this approach include its abilities in the following areas: • initializing the layer weights randomly but more intelligently; • reducing the chances of gradients vanishing and/or exploding; • helping avoid slow speeds of convergence; • mitigating oscillation around minima. We now turn to detailing the architectural design of our proposed RND-CNN, as well as the techniques used for data preprocessing. Our motivation here is to develop a deep CNN capable of automatically learning the features of COVID-19 and recognizing instances of it across two different datasets. The proposed RND-CNN consists of an input layer and four hidden blocks for feature learning and extraction, followed by two fully connected layers and a SoftMax layer for case classification (classes: COVID-19/Pneumonia/Normal). Figure 6 presents the proposed RND-CNN architecture. The Xavier initializer is a form of knowledge-based random initialization. To solve the problem of selecting correct parameters, this initializer is used to automatically determine the scale of initialization according to the number of input and output neurons. 32 This method helps keep the signal within a reasonable range of values between layers. Let y denote the output of a layer; then, its value is computed according to Equation (2): y = w * i + b, (2) where i is the input image matrix, w is the weight, and b is a bias. Using the Xavier initializer, the weights are initialized in such a way that the variance of the input and output remains the same. The values of these weights are calculated using Equation (3): Var(w) = 1/n, (3) where n signifies the number of the layer's neurons.
Thus, the network weights are initialized in such a way that the neuron activations are neither too small nor too large, staying within a reasonable range. The values of the weights that connect two successive layers j and j+1 are usually initialized within the following range: [-sqrt(6)/sqrt(n_j + n_(j+1)), +sqrt(6)/sqrt(n_j + n_(j+1))], (4) where n_j and n_(j+1) denote the numbers of neurons in the two layers. 3.2 | Network architecture In this section, we introduce the design of the proposed RND-CNN. As previously mentioned, this network consists of an input layer, four hidden blocks, and a classification layer. The input images of this network are sized as (150, 150, 3), while the output can be one of three different classes: COVID-19, pneumonia, or normal. This network model consists of four different-sized hidden blocks, each of which consists of a set of convolutional and pooling layers. As we go deeper within the model architecture, the number of convolutional layers also increases. Consequently, this difference in layer blocks gives the model considerable ability to cover different features through its set of convolutions. The model also comes with a significant reduction in the number of parameters and is capable of achieving high performance within a reasonable execution time, particularly as compared with other existing CNNs. Before the model training, we had to assign its initial parameters. We adopted the Xavier initializer to define the weights of the network. This method was proposed by Glorot and is based on the assumption that the activation function is linear. The model proposed here consists of 10 convolutional layers and 4 pooling layers. For each convolutional layer, filters of (3 × 3) size are applied with padding, and every pooling layer implements a max-pooling window of (2 × 2) size. In the following, "Conv2D," "MaxPool2D," and "FC" refer to the convolution, the pooling, and the fully connected layers, respectively. Table 2 illustrates the architecture of our proposed CNN and defines the learning parameters used.
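The Glorot/Xavier uniform scheme described above can be sketched in a few lines; the fan-in/fan-out values below are illustrative only and are not the actual layer sizes of the RND-CNN.

```python
import math
import random

def xavier_uniform(n_in, n_out, seed=0):
    """Draw an (n_in x n_out) weight matrix from
    U[-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))], which keeps the
    variance of activations roughly constant from layer to layer."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    rnd = random.Random(seed)
    return [[rnd.uniform(-limit, limit) for _ in range(n_out)]
            for _ in range(n_in)]

W = xavier_uniform(128, 64)
flat = [w for row in W for w in row]
limit = math.sqrt(6.0 / (128 + 64))
# Every weight falls inside the Xavier range, and the empirical variance
# is close to the theoretical value limit**2 / 3 = 2 / (n_in + n_out).
```

In Keras, the equivalent behavior is typically obtained with the built-in `glorot_uniform` initializer, which is the framework's default for dense and convolutional layers.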
In order to validate the approach we propose here, we used two different datasets to evaluate the performance of this model. A description of these datasets and how we preprocessed them is provided in the following subsections. In this work, we used the COVIDx dataset recently created and published by the COVID-Net researchers. 11 COVIDx is an open-access benchmark dataset that is continuously updated and enriched with the addition of more images from different sources. 33 The version of the dataset used in this work consists of two image folders: one for training and one for testing. The distribution of these images is depicted in Table 3. Figure 7 presents some samples of patient chest X-ray images from each of the three classes: COVID-19, normal, and pneumonia, respectively. The COVIDx dataset takes its data from five different repositories. As a result, the images in this dataset are of all different sizes and shapes. These differences affect the effectiveness of any classification attempted, so in order to enhance the performance of our classification approach, image preprocessing is first applied to all images across the dataset. First, all input images for the proposed method are resized to a standard size of 150 (height) × 150 (width) pixels. It is also worth noting that the COVIDx dataset is imbalanced. In other words, the number of COVID-19 X-ray images is much smaller than the number of images available for the other two classes (pneumonia and normal). However, it is important to work with balanced data in order to achieve better results. 34 Imbalanced datasets return inaccurate results because they bias the model toward the predictions of the majority class. Different techniques have been proposed to handle this problem, including random oversampling, random undersampling, the Synthetic Minority Oversampling Technique (SMOTE), and reweighing of classes. 35 Each method has its own benefits and disadvantages.
For instance, undersampling consists of randomly sampling from the majority class and reducing its number of instances to be balanced with the other classes, while oversampling treats the minority class by replicating its samples to be balanced with the other classes. SMOTE is a type of oversampling that generates new instances from the samples of the minority class. However, SMOTE is not effective for high-dimensional data and can actually lead to the model being overfitted. Here, the class reweight method is used in order to account for the asymmetry of cost errors directly within the model training. Due to the severe imbalance between the three different classes in this COVIDx dataset, resampling methods are not suitable for our problem. As previously stated, our goal is to detect COVID-19 from X-ray images, but as mentioned above, we did not have as many samples of those images to work with. Because of the high dimensionality of the dataset used, we chose to apply the class reweight approach as a balancing method, which penalizes the model if a positive sample is misclassified. 36 To achieve this, we calculated the weight for each class and assigned these weights to the classifier model. The heaviest weight is applied to the COVID-19 class, which allows the model to pay more attention to the COVID-19 samples. The weight of each class is computed using Equation (5): W_i = n / (k * n_i), (5) where W_i represents the weight for the ith class, n is the total number of samples, k is the number of classes, and n_i is the number of samples in class i. The second dataset † used to test our model was merged and enhanced by Canayas. 37 The enhanced COVID-19 dataset consists of more than 1000 images and includes three balanced classes: COVID-19, pneumonia, and normal chest X-ray images.
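The class reweighting of Equation (5) takes only a few lines of Python; the label counts below are made up for illustration and are not the actual COVIDx class sizes.

```python
from collections import Counter

def class_weights(labels):
    """Equation (5): W_i = n / (k * n_i), where n is the total number of
    samples, k the number of classes, and n_i the size of class i."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * n_cls) for cls, n_cls in counts.items()}

# Hypothetical, strongly imbalanced label distribution.
labels = ["COVID-19"] * 100 + ["pneumonia"] * 500 + ["normal"] * 400
weights = class_weights(labels)
# The minority COVID-19 class receives the heaviest weight, as intended.
```

In Keras, a dictionary of this form (keyed by class index) is typically passed as the `class_weight` argument of `model.fit`, so that misclassified minority-class samples contribute more to the loss.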
The images of the dataset are gathered from two different sources: the first part contains 145 labeled COVID-19 X-ray images available on GitHub, ‡ while the second was collected by Chowdhury et al., 38 is publicly available on Kaggle, and contains 219 X-ray images of chests with COVID-19 infections. The distribution of chest X-ray images in the enhanced COVID-19 dataset is depicted in Table 4. In Ref. [37], the author made changes to this dataset by applying a contrast enhancement to each image from the original dataset. Using the Image Contrast Enhancement Algorithm (ICEA), 39 the best contrast was applied to the dataset images and the noise was eliminated. Figure 8 plots select samples of X-ray images from this enhanced dataset. While CNNs offer several benefits and can make great strides in solving important problems, these networks rely heavily on big data in order to learn properly. Unfortunately, many different use cases, especially in healthcare, do not have the types of big data needed for these purposes. Data augmentation is the approach of modifying some of the data provided to the learning model and thus creating more training data in order to avoid overfitting. 40 Overfitting occurs when a model learns a function with high variance, which means that it models the training data well but does not perform properly on new data, leading to poor generalization. 41 With data augmentation, more images are generated through different random transformations applied to the existing dataset images. 42 As the size of the input data increases, this helps to improve the training model's generalization abilities. In this work, we chose to apply six strategies for data augmentation and transformation, including scaling, horizontal flips, random rotation (10 degrees), zoom, intensity shift, and lighting conditions.
The images of the datasets are only flipped horizontally; we do not apply vertical flips, since these do not reflect the images in their normal form. Thus, data augmentation was employed to enlarge the training dataset, while the validation and test data were not augmented. Figure 9 illustrates some samples of X-ray images resulting from the data augmentation process. In this section, we present the experimental setup, workflow, and parameters. Then, we describe the metrics used for the evaluation of the model's performance. After that, we discuss the results obtained and examine the impact of data augmentation and balancing on the model's performance. In this study, a Jupyter notebook was used to implement the whole process in Python 3.8. The Keras library 43 and the TensorFlow backend 44 were used to program the neural networks. Keras is a high-level library that works atop the TensorFlow and Theano frameworks and is suitable for convolutional networks, as it delivers high performance when conducting multiple experiments. TensorFlow is a flexible DL framework developed using C++, and it helps to run experiments with low latency and high performance. In addition to Keras and TensorFlow, OpenCV was used for data loading and preprocessing, while Scikit-Learn was used to generate the classification reports. For faster computation, we also used the Nvidia GeForce MX 250 GPU with CUDA and the cuDNN library. cuDNN is a GPU-accelerated library designed to optimize different DL frameworks. All experiments were conducted on a PC equipped with this GPU. The experimentation workflow consisted of a set of steps that included: (a) data preprocessing, (b) data augmentation, (c) model training, (d) model evaluation using the validation data, (e) final evaluation of the model with the best weights using the test data, and (f) calculation of the performance metrics. These steps are detailed and illustrated in Figure 10.
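The horizontal-flip constraint described above can be sketched with plain Python lists standing in for images: flips are applied at random during augmentation, but vertical flips are never produced, since an upside-down chest X-ray does not correspond to an image the model would encounter in practice.

```python
import random

def augment(image, rng):
    """Randomly flip an image left-to-right with probability 0.5.
    Vertical flips are deliberately excluded."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]  # horizontal flip
    return image

image = [[1, 2], [3, 4]]
rng = random.Random(0)
samples = [augment(image, rng) for _ in range(100)]
# Every augmented sample is either the original or its horizontal flip;
# the vertically flipped variant [[3, 4], [1, 2]] never appears.
```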
The model configuration was set up as follows: 1. Initializer: before starting the training process, we initialize the layers with proper weights in order to ensure accurate functioning. In this context, we have used the Xavier initializer with a learning rate value of 1e-4. This initializer is capable of determining the scale of initialization randomly according to the number of input and output nodes. 2. Optimizer: the Adam optimizer, 45 which was proposed recently as an extension to stochastic gradient descent, 46 is used for its ability to achieve high performance in a short time. 3. Loss function: the categorical cross-entropy function was used to measure the network's performance on the training data. 4. Activator: the rectified linear unit (ReLu) 47 has been used as an activation function. ReLu is known to train faster than other options, such as Sigmoid, Tanh, and so on. Precision, accuracy, sensitivity, specificity, loss, and F1-score measures were used to evaluate the results achieved by the model in this work. These terms are defined as follows: • Precision: assesses the proportion of samples classified as positive that truly belong to that class, and thus describes the model's ability to not classify a negative sample as positive. Therefore, a high precision indicates that the errors in classification are low. • Accuracy: is the ratio of the number of correct predictions to the total number of input samples. High accuracy requires high precision. • Sensitivity: assesses the proportion of actual positive samples that are correctly predicted. This factor describes the model's ability to classify samples correctly. • Specificity: is used to measure the model's effectiveness in the recognition of negative samples.
• Loss: the error value, indicating how well the model fits the data; a lower loss means the model is making fewer errors. • F1-score: the harmonic mean of precision and recall (sensitivity), providing a balanced average of the two. These measures are computed according to the following Equations 6 to 10:

Precision = TP / (TP + FP) (6)
Accuracy = (TP + TN) / (TP + TN + FP + FN) (7)
Sensitivity = TP / (TP + FN) (8)
Specificity = TN / (TN + FP) (9)
F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity) (10)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. In this section, we present the experimental results obtained by applying our proposed RND-CNN to the COVIDx and enhanced COVID-19 datasets. We also demonstrate the effectiveness of the proposed model by showing the impact of data augmentation and balancing through several performance measures.

[Figure 10: The experimentation workflow]

In this subsection, we present the details and results obtained by applying the proposed approach described in Section 3 to the COVIDx dataset. For model building, we need three sets of data: training, validation, and testing. The training data are used to train the model, while the validation data are used to evaluate it during the training process. Once the model completes training, the test data are employed to assess its performance. Following this distribution, we randomly split the training folder of the COVIDx dataset into 80% for training and 20% for validation. Figure 11 shows the distribution of images of each class across our training, validation, and test sets. The proposed RND-CNN was trained on the COVIDx dataset over 100 epochs. Figure 12 visualizes the accuracy and loss achieved by the RND-CNN during the training and validation phases. During training, the model achieved 95% accuracy, while the loss continued to decrease until it reached its minimum by the end of the training phase. After that, the overall performance of the proposed RND-CNN was tested using over 1500 new chest X-ray images. According to Table 5, our proposed model achieved an accuracy of 95% in training, 92% in validation, and 94% in testing.
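The measures defined above can be checked with a short self-contained function (the confusion-matrix counts below are invented purely for illustration, not taken from the paper's experiments):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures used in this work from the
    entries of a binary (one-vs-rest) confusion matrix."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, accuracy, sensitivity, specificity, f1

# Hypothetical counts for one class (e.g. COVID-19 vs. the rest)
p, a, se, sp, f1 = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(p, 3), round(a, 3), round(se, 3), round(sp, 3), round(f1, 3))
```

For a three-class problem such as this one, each class's precision, sensitivity, and specificity are computed one-vs-rest and then averaged.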
Figure 13 shows that we obtained a high area under the ROC curve for the COVID-19, pneumonia, and normal classes alike, which demonstrates that our approach achieved high performance. Examples of features extracted from chest X-ray images across the first, second, and last convolution layers are presented in Figure 14. Further interesting observations emerge from the feature maps as we move deeper into the network: in the first convolutional layer, the edges of each image are detected and most of its information is scanned, while deeper into the CNN the filters focus on increasingly specific features. To further examine the effectiveness of our proposed RND-CNN, we implemented it using another dataset collected and enhanced for the purpose of COVID-19 detection. 37 Unlike the COVIDx dataset, this enhanced COVID-19 dataset is balanced, consisting of the same number of contrast-enhanced images in each class. Figure 15 illustrates the distribution of the enhanced dataset images of each class across the training, validation, and test sets. Our RND-CNN was trained on the enhanced COVID-19 dataset over 100 epochs. Figure 16 visualizes the accuracy and loss achieved by the RND-CNN during the training and validation phases. During training, the model achieved an accuracy of 98% and a very small loss of only 0.0822. Using the test data, we evaluated the overall performance of the newly trained model. According to Table 6, our proposed model achieved an accuracy of 99% in training, 98% in validation, and 99% in testing. Figure 17 plots the ROC curves for the COVID-19, pneumonia, and normal classes. High AUC results are achieved: 99% for COVID-19, 100% for normal, and 98% for pneumonia. Our results reveal that applying different data augmentation techniques to the dataset images has a significant impact on the efficiency of the model, especially for imbalanced datasets.
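The random 80/20 training/validation split used for both datasets can be sketched as follows (pure NumPy with a fixed seed; the sample count and seed are illustrative, not the paper's actual values):

```python
import numpy as np

def split_indices(n_samples, train_frac=0.8, seed=42):
    """Shuffle sample indices and split them into training and
    validation subsets, keeping the two sets disjoint."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]

# Hypothetical training folder of 1000 images
train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))  # 800 200
```

The held-out test folder is never touched by this split; it is used only once, after training completes.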
To examine the results obtained, we trained the same proposed CNN architecture without augmenting the dataset images, and the results showed a significant drop in accuracy.

[Figure 13: ROC curves obtained for the COVID-19, normal, and pneumonia classes using the COVIDx dataset]

[Figure 14: Learned features from the first, second, and last convolution layers]

[Figure 15: Distribution of the enhanced COVID-19 dataset images of each class across the training, validation, and test sets]

The results depicted in Table 7 demonstrate the critical role of data augmentation in improving the model's performance by increasing its accuracy and reducing its loss. To deal with the imbalanced distribution of data across the COVIDx dataset, we employed the class re-weight method to re-balance the whole dataset. We then checked the model's performance both before and after making this change. Balancing the dataset was particularly important for ensuring better results in COVID-19 recognition. As depicted in Table 8, the accuracy of the model with data balancing is higher than its accuracy without it. It is also noticeable that the performance achieved with the second dataset (i.e., the enhanced COVID-19 dataset) was much better than that achieved with the first (i.e., the COVIDx dataset), owing to its balanced number of images in each class. From the results obtained, we conclude that correcting a dataset's imbalance is a very important step to consider before starting the model's training. In order to validate the performance of our proposed RND-CNN model, we compared the results obtained on the COVIDx dataset with those obtained by employing other models and different types of weight initialization. As explained previously, a zero initializer results in poor model performance. Both to demonstrate this and for the sake of comparison, we trained the same CNN architecture with all weights initialized to zero; we refer to this architecture as CNN-0.
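The class re-weighting described above can be sketched with the common inverse-frequency rule, which is the same heuristic scikit-learn's `compute_class_weight('balanced', ...)` applies (the class counts below are illustrative, not the actual COVIDx counts):

```python
from collections import Counter

def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so an under-represented class (e.g. COVID-19) contributes more
    to the training loss per sample."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical imbalanced label list: few COVID-19, many pneumonia/normal
labels = ["covid"] * 100 + ["pneumonia"] * 500 + ["normal"] * 400
w = class_weights(labels)
print(w["covid"] > w["normal"] > w["pneumonia"])  # True
```

In Keras, a dictionary like this would typically be passed to `model.fit` via its `class_weight` argument so that the loss is scaled per class during training.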
The performance metrics kept the same values at every training epoch, ending with an accuracy of 56%. Typically, with the backpropagation algorithm, the weights of the layers are updated at each iteration. However, when the initial weight values of the first layer are 0, multiplying them by any value in the backpropagated delta leaves the weights unchanged. As a result, the weights of each layer retain the same value at every iteration and are never optimized: all neurons in every layer perform the same calculation and give the same output. We also employed the random uniform initializer to train the proposed CNN architecture, creating an architecture we refer to as RU-CNN. We achieved 90% accuracy with the RU-CNN network, which is acceptable but still lower than the accuracy of the proposed RND-CNN. Accordingly, we conclude that choosing the Xavier initializer helps provide better results. Besides changing the network's weight initialization, we also examined the results of two other DL models that use transfer learning. Transfer learning is the approach of building on previously acquired knowledge and transferring it to address new problems. 27 The networks developed using this approach are based on the VGG16 48 and Xception 49 pre-trained models. VGG16 is a deep CNN proposed by the Visual Geometry Group at Oxford University; it consists of 16 layers and has demonstrated strong generalization on many large benchmark datasets for different tasks. Xception, meanwhile, is a CNN composed of 71 layers. We loaded versions of these two models pre-trained on the ImageNet dataset, which includes more than one million images. For the sake of comparison, the same parameter values were used across the developed models: Adam optimizer, learning rate of 1e-4, and categorical cross-entropy loss.
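The symmetry problem behind CNN-0's flat accuracy, and the fan-based scaling of the Xavier initializer, can both be illustrated with a tiny NumPy sketch (toy layer sizes, not the paper's actual network):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])       # a single toy input sample

# --- CNN-0: all weights initialized to zero ---
W_zero = np.zeros((3, 4))           # zero-initialized layer
h_zero = np.maximum(x @ W_zero, 0)  # ReLU activations: all zero
# Every neuron computes the identical output, and the gradient reaching
# this layer is multiplied by zero weights, so no update can ever break
# the symmetry: the metrics stay frozen, epoch after epoch.

# --- Xavier (Glorot) uniform initialization ---
fan_in, fan_out = 3, 4
limit = np.sqrt(6.0 / (fan_in + fan_out))
rng = np.random.default_rng(0)
W_xavier = rng.uniform(-limit, limit, size=(fan_in, fan_out))
# Weights are distinct and drawn from a range scaled by the layer's
# input/output node counts, so neurons can specialize during training.

print(np.all(h_zero == 0))              # True
print(np.abs(W_xavier).max() <= limit)  # True
```

The random uniform initializer behind RU-CNN differs from the Xavier variant only in that its sampling range is fixed rather than scaled by the fan-in and fan-out, which is one plausible reason for its lower accuracy.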
Table 9 shows that our proposed model outperforms all the other DL models in terms of accuracy, precision, sensitivity, specificity, and F1-score. It also achieves the lowest loss among the DL models considered. These results indicate that the architecture of a DL network and the choice of its parameters have a direct impact on its performance. In addition, choosing the right method of initialization helps obtain better results for classification and recognition tasks. As Table 9 illustrates, the randomized method of initialization provides better results across all performance metrics than both the constant and distributed methods. Moreover, compared with the high-performing VGG16 and Xception models, the obtained results demonstrate the excellent effectiveness of our proposed RND-CNN, indicating that the proposed architecture, with its different sets of layers, can extract numerous features even with random weights. Significant research has been conducted to find ways of combatting the novel coronavirus, COVID-19. In the field of machine learning and deep learning, however, most of this work has so far been based on transfer learning approaches. In our work, instead of using pre-trained weights, we created a DL model from scratch for the detection of COVID-19 cases using chest X-ray images. With the proposed model, we obtained excellent results that demonstrate its effectiveness. Table 10 shows the accuracy and F1-score results of different existing models currently used for the recognition of COVID-19. This comparison demonstrates that our proposed approach produces excellent results for both the COVIDx and enhanced COVID-19 datasets. It is also worth noting that we obtained results similar to those of the model presented in Ref. [37], but the latter was applied only to one small dataset, the enhanced COVID-19 dataset. Also, Ref.
[13] provides a higher F1-score on the COVIDx dataset than our model, but more exhaustive experiments would be needed to measure additional performance metrics for it, such as accuracy, precision, sensitivity, and specificity. Given these considerations, we are confident that our model offers some of the highest performance currently available. In this study, we developed a novel CNN model to classify chest X-ray images as a means of detecting COVID-19 cases. The model was tested using two different datasets: a large dataset with a high class imbalance (the COVIDx dataset) and a small dataset with balanced classes and enhanced images (the enhanced COVID-19 dataset). Following several experiments, the results achieved demonstrate the excellent performance of the proposed model on both datasets, with better results reached on the enhanced COVID-19 dataset. We also observe that enhancing the contrast of the chest X-ray images helped the model learn more features, thereby increasing its accuracy and ability to detect the presence of COVID-19. In addition, using a dataset with balanced classes helps achieve better outcomes than an unbalanced dataset, even when the imbalance is corrected. Furthermore, we demonstrate that applying different data augmentation techniques to the training images helps enhance the final model's predictions: the experiments that did not apply data augmentation achieved significantly lower classification accuracy than those that did.

[Table 9: Comparison of performance results between RND-CNN and other DL models using the COVIDx dataset]

The results show that a randomly initialized CNN (RND-CNN) can be used for analyzing chest X-ray images and can reach higher accuracy rates than pre-trained networks. In this article, an efficient, low-computation approach is proposed to detect COVID-19 patients from chest X-ray images.
This approach is based on a novel randomly initialized CNN architecture, the RND-CNN. The proposed architecture classifies images into one of three classes: normal, pneumonia, and COVID-19. We used two datasets to evaluate this model: a large dataset with a high class imbalance (the COVIDx dataset) and a small dataset with balanced classes and enhanced images (the enhanced COVID-19 dataset). We analyzed the performance of our model through six metrics: precision, accuracy, sensitivity, specificity, loss, and F1-score. The conducted experiments recorded insightful results for both the COVIDx and enhanced COVID-19 datasets. Based on the results obtained, we demonstrated the high recognition rates achieved by our RND-CNN model compared with other models and other types of weight initialization. Possible extensions of our work include applying the RND-CNN model to analyze different types of images, such as CT and MRI images. Future work could also expand its ability to classify images according to additional labels, such as pneumothorax, emphysema, and fibrosis, among others.
COVID-19 New Cases Worldwide by Day. Statista.
COVID-19 Cases and Deaths Statistics by Country.
Big data and IoT-based applications in smart environments: a systematic review.
Deep learning-based rumor detection on microblogging platforms: a systematic review.
Deep learning for healthcare applications based on physiological signals: a review.
Detecting cardiovascular disease from mammograms with deep learning.
Future forecasting of COVID-19: a supervised learning approach.
COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios.
Automated detection of COVID-19 cases using deep neural networks with X-ray images.
Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
COVID-net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
ImageNet: a large-scale hierarchical image database.
Explainable COVID-19 predictions based on chest X-ray images.
Towards an effective and efficient deep learning model for COVID-19 patterns detection in X-ray images.
Implementation of convolutional neural network approach for COVID-19 disease detection.
COVID-19 disease severity assessment using CNN model. IET Image Processing.
Deep randomized neural networks.
Deep image prior.
Randomly weighted CNNs for (music) audio classification.
Deep learning for health informatics.
Cognitive healthcare system and its application in pill-rolling assessment.
Privacy-preserving wandering behavior sensing in dementia patients using modified logistic and dynamic Newton Leipnik maps.
Privacy-preserving non-wearable occupancy monitoring system exploiting Wi-Fi imaging for next-generation body centric communication.
RS-DCNN: a novel distributed convolutional-neural-networks-based approach for big remote-sensing image classification.
A novel CNN-LSTM-based approach to predict urban expansion.
Recent advances in convolutional neural networks.
Leveraging deep learning and IoT big data analytics to support the smart cities development: review and future directions.
Deep learning for IoT big data and streaming analytics: a survey.
A comparison of weight initializers in deep learning-based side-channel analysis.
The vanishing gradient problem during learning recurrent neural nets and problem solutions.
Evolving deep convolutional neural networks for image classification.
Understanding the difficulty of training deep feedforward neural networks.
Survey on deep learning with class imbalance.
A survey on addressing high-class imbalance in big data.
Classification on imbalanced data.
Diagnosis of COVID-19 using deep neural networks and meta-heuristic-based feature selection on X-ray images.
Can AI help in screening viral and COVID-19 pneumonia?
A new image contrast enhancement algorithm using exposure fusion framework.
Data augmentation for improving deep learning in image classification problem.
The problem of overfitting.
A survey on image data augmentation for deep learning.
Semisupervised learning with deep generative models.
Stochastic gradient descent tricks.
Improving deep neural networks for LVCSR using rectified linear units and dropout.
Very deep convolutional networks for large-scale image recognition.
Xception: deep learning with depthwise separable convolutions.
Randomly initialized convolutional neural network for the recognition of COVID-19 using X-ray images.

The authors would like to thank Prince Sultan University for their support. The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
Data will be available upon request to the corresponding author.