key: cord-0953262-8tnma9sm authors: Karacı, Abdulkadir title: VGGCOV19-NET: automatic detection of COVID-19 cases from X-ray images using modified VGG19 CNN architecture and YOLO algorithm date: 2022-01-24 journal: Neural Comput Appl DOI: 10.1007/s00521-022-06918-x sha: 14733cb405ce51c1e08c7c8d5edfc8fbed431364 doc_id: 953262 cord_uid: 8tnma9sm X-ray images are an easily accessible, fast, and inexpensive method of diagnosing COVID-19, widely used in health centers around the world. In places where there is a shortage of specialist doctors and radiologists, there is need for a system that can direct patients to advanced health centers by pre-diagnosing COVID-19 from X-ray images. Also, smart computer-aided systems that automatically detect COVID-19 positive cases will support daily clinical applications. The study aimed to classify COVID-19 via X-ray images in high precision ratios with pre-trained VGG19 deep CNN architecture and the YOLOv3 detection algorithm. For this purpose, VGG19, VGGCOV19-NET models, and the original Cascade models were created by feeding these models with the YOLOv3 algorithm. Cascade models are the original models fed with the lung zone X-ray images detected with the YOLOv3 algorithm. Model performances were evaluated using fivefold cross-validation according to recall, specificity, precision, f1-score, confusion matrix, and ROC analysis performance metrics. While the accuracy of the Cascade VGGCOV19-NET model was 99.84% for the binary class (COVID vs. no-findings) data set, it was 97.16% for the three-class (COVID vs. no-findings vs. pneumonia) data set. The Cascade VGGCOV19-NET model has a higher classification performance than VGG19, Cascade VGG19, VGGCOV19-NET and previous studies. Feeding the CNN models with the YOLOv3 detection algorithm decreases the training test time while increasing the classification performance. The results indicate that the proposed Cascade VGGCOV19-NET architecture was highly successful in detecting COVID-19. 
Therefore, this study contributes to the literature in terms of both YOLO-aided deep architecture and classification success. A new pneumonia epidemic called acute respiratory syndrome coronavirus 2 (COVID- 19) started in Wuhan in the Hubei province of China in December 2019 [1] [2] [3] . COVID-19 rapidly expanded from a single city to the whole country within 30 days. This sudden increase in cases caused the collapse of the health system in China [4, 5] . COVID-19 is transmitted from person to person through direct contact and by small droplets emitted by infected people. Similar to the flu, a COVID-19 patient can develop a variety of symptoms and signs of infection, such as fever, cough, and respiratory illness. In more severe cases, the infection can cause breathing difficulties, multiple organ failure, and a rapidly progressive fatal pneumonia in 2-8% of those infected. Pneumonia can be detected from chest X-ray images [6] [7] [8] . Chest radiological imaging such as X-ray and computerized tomography (CT) plays a vital role in the early diagnosis and treatment of this disease [1, 9] . X-rays can detect several characteristic signs associated with COVID-19 in the lung [9] . Early results show that patients suggestive of COVID-19 have abnormalities on chest X-rays [10] . The real-time reverse-transcription polymerase chain reaction (RT-PCR) method is also frequently used to diagnose COVID-19. However, the sensitivity of this method is around 60-70%. Therefore, it is important to diagnose patients by examining X-ray images [1, 9] . In addition, a detection system based on X-ray images has many advantages. It is fast, can analyze multiple cases at the same time, and has more usability. In addition, X-ray equipment is available in every hospital in the modern healthcare system. This makes the radiography-based approach convenient and easily accessible [11] . 
CT scans and X-rays are widely used for the detection of COVID-19 in countries in which there are few test kits. Also, researchers specify that the combination of clinical image properties with laboratory results may be helpful in the early diagnosis of COVID-19 [9, 12, 13] . There are also studies reporting that some changes have been encountered in the chest X-rays and CT images of asymptomatic children and the elderly in the early periods of the disease [14] . These changes can be detected by deep convolutional neural networks (CNN). Deep CNN outperformed all other known methods in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) 2012 [15] . Deep CNNs are among the strong deep learning architectures that are widely and instinctively applied in many practical applications such as pattern recognition and image classification [16] . The greatest advantage of CNN for image classification is the end-to-end training of the whole system from raw pixels to the final categories. This advantage decreases the need to design a manually convenient feature extractor [15] . Researchers are focusing on deep learning techniques to determine specific properties of COVID-19 patients from the chest radiography images [17, 18] . Recently, deep learning has been highly successful in various visual tasks including medical image analysis. Deep learning has revolutionized the automatic diagnosis and management of the disease by correctly analyzing, defining and classifying the patterns in medical images [11] . In addition, transfer learning is widely used in CNN architectures. Transfer learning is used to improve the learner by transferring the information in a related area. It aims to apply the data extracted from one or more tasks to the target task. Together with the growth of deep learning, transfer learning has become an inseparable part of many applications especially in medical imaging [19, 20] . 
There are many pre-trained deep CNN architectures (VGG16, VGG19, MobileNet, Inception, Xception, ResNet, DarkNet, AlexNet, etc.) to which the transfer learning approach can be applied. The Visual Geometry Group Network (VGG) was developed based on the convolutional neural network architecture at the Oxford Robotics Institute [16] and was introduced by Simonyan and Zisserman [21]. VGGNet has shown particularly good performance on the ImageNet data set. This network has been trained on more than 1 million images in 1000 classes with more than 370,000 iterations to calibrate 138 million weight parameters. In particular, VGG19 won first place in a classification and localization competition in the Large-Scale Visual Recognition Competition, a global image recognition competition, in 2014 [21, 22]. VGG is a family of networks used for transfer learning that includes VGG11, VGG13, VGG16, and VGG19. The common feature of these network structures is that several convolution layer modules are connected to three fully connected layers [23]. VGG19 consists of five building blocks. The first and second building blocks consist of two convolutional layers and one pooling layer. The third and fourth blocks have four convolutional layers and one pooling layer. The final block consists of four convolutional layers. Small 3 × 3 filters are used [24]. Detecting the zone that should be the area of focus and supplying it to these CNN architectures as input instead of raw images may increase the classification performance. Detection algorithms are used for this process. One of these algorithms is YOLO (You Only Look Once). The YOLO architecture consists of 27 CNN layers with 24 convolution layers, followed by two fully connected layers. YOLOv3 is a deep single-stage CNN developed from YOLO and YOLOv2 for object detection. The YOLOv3 algorithm is one of the best object detection methods and it provides considerable improvements in terms of speed.
YOLOv3 has high detection accuracy and speed. It also performs well in detecting small targets. This algorithm reframes object detection as a single regression problem, from image pixels to bounding box coordinates and class probabilities. It works significantly faster than other detection methods. It uses a different architecture called DarkNet-53 for feature extraction. DarkNet-53 uses 3 × 3 and 1 × 1 convolutional layers and has 53 layers [25] [26] [27] [28].
This study aims to develop a new state-of-the-art VGG19 (Visual Geometry Group Network) architecture-based improved model to diagnose COVID-19 automatically from chest X-rays, to reveal a new cascade model by combining this developed model with the YOLOv3 [25] detection algorithm, and to evaluate the efficiency of these models. For this purpose, the fully connected (FC) layer of the pre-trained VGG19 CNN architecture was re-arranged and attempts were made to diagnose COVID-19 from binary (COVID-19 vs. normal) and three-class (COVID-19 vs. normal vs. pneumonia) data sets. The modified VGG19 model was also combined with the YOLOv3 algorithm to detect the lung zone and the classification process was conducted with this newly formed cascade model. The performances of these models were compared to each other and to models in previous studies. The main contributions of this study can be summarized as follows.
• We propose the VGG19-based improved VGGCOV19-NET (Visual Geometry Group Network COVID19-
The manuscript is organized as follows: Previous studies are summarized in Part 2. The structures, parameters, and training of the Cascade VGGCOV19-NET and VGGCOV19-NET models and the data sets and metrics used in the study are explained in Part 3. Results and classification performances are presented in Part 4. The classification performances of the models are compared to those of previous studies and the results are discussed in Part 5. Finally, in Part 6, the conclusion is given.
There are some studies in the literature that use radiology images and machine learning to diagnose COVID-19. Hemdan et al. [16] tried different CNN classifiers to diagnose COVID-19 from X-ray images. They attained the highest classification performance from the VGG19 and DenseNet201 classifiers with 90% accuracy. Wang et al. [29] classified the normal, pneumonia, and COVID-19 classes with 93.3% accuracy with a CNN model they called COVID-Net. Aslan et al. [30] classified the normal, pneumonia, and COVID-19 classes with 98.70% accuracy using a hybrid CNN model. Hira et al. [31] classified the normal, pneumonia and COVID-19 classes with 97.55% accuracy using ResNeXt-50. Khan et al. [11] classified three classes (COVID-19, pneumonia, and normal) with 95% accuracy and the binary class (COVID-19 and normal) with 99% accuracy using a CNN. Medhi et al. [32] diagnosed COVID-19 with 93% accuracy with their CNN model. Ozturk et al. [9] classified three classes (COVID-19, pneumonia, and normal) with 87.02% accuracy and the binary class with 98.08% accuracy with the CNN called DarkCovidNet. Harit et al. [33] classified COVID-19 with 94% accuracy using ResNet. Ahammed et al. [34] classified three classes (COVID-19, pneumonia, and normal) with 18 different classifiers. They attained the highest accuracy value (94%) with their CNN model. Apostolopoulos and Mpesiana [35] classified three classes and binary classes using the transfer learning approach with five different state-of-the-art pre-trained CNNs. They attained the highest accuracy with the VGG19 model. These accuracy values are 98.75% and 93.48% for the binary and three-class classification, respectively. Similarly, Narin et al. [36] classified the binary class using the transfer learning approach with three different state-of-the-art pre-trained CNNs. They attained the highest accuracy value (98%) with the ResNet model. Benbrahim et al.
[37] classified COVID-19 with 99% accuracy with the InceptionV3 model and the transfer learning approach. Butt et al. [38] classified three classes with 86.7% accuracy on chest CT images with the ResNet model. Ying et al. [39] classified COVID-19 with 94% accuracy with the DRE-Net CNN model from chest CT images. Zheng et al. [40] diagnosed COVID-19 with 90.1% accuracy from chest CT images with a 3D deep CNN model. Harmon et al. [7] detected COVID-19 with 90.8% accuracy using the DenseNet-121 CNN architecture on a total of 2724 CT images, 1029 of which were COVID-19 positive. Waheed et al. [10] stated that it is difficult to collect X-ray images because the epidemic is new, and they proposed a method to produce synthetic X-ray images. They suggested that when synthetic X-ray images produced by this method are added to the data set, the model performance increases.
In this part, the structure of the VGGCOV19-NET, Cascade VGGCOV19-NET, VGG19, and Cascade VGG19 models, the data set used in the formation of the models, and the performance metrics used in the evaluation of the models are explained.
The data set used in this study was obtained from the X-ray images used by Ozturk et al. [9] in their study. They have published this data set as an open resource at https://github.com/muhammedtalo/COVID-19. The data set consists of 125 COVID-19, 500 pneumonia, and 500 no-findings X-ray images. They used two different resources in the formation of the data set: COVID-19 X-ray images from the database collected from various open access resources by Cohen [41] and the ChestX-ray8 database provided by Wang et al. [42] for normal and pneumonia X-ray images. Forty-three of the patients in the COVID-19 data set are women and 82 are men. Metadata information for this data set is not provided for all patients. The average age of the 26 COVID-19 positive patients is approximately 55 years. The data set is suitable for both binary and multi-class classification.
Figure 1 shows the X-ray images of some cases with COVID-19, pneumonia, and no-findings taken from the data set.
Deep learning or deep structured learning represents a class of improved machine learning techniques [37]. The emergence of deep learning technology has been a revolution in artificial intelligence. The word "deep" expresses the increase in the size of the network along with the number of layers [9]. CNN is the deep learning approach commonly used for feature extraction and data classification in various areas. A CNN has many layers: the input layer, the convolutional layer, the pooling layer, the fully connected layer, and the output layer [43, 44]. The most efficient way to use deep CNNs on small data sets is transfer learning [45]. With transfer learning, a method commonly used in deep learning, the knowledge gained by a model pre-trained on a large data set is transferred to the model to be trained. The greatest advantage of the transfer learning method is that it allows training with less data and incurs lower computational cost [36]. The Visual Geometry Group Network (VGG) was developed based on the convolutional neural network architecture at the Oxford Robotics Institute [16]. VGGNet has shown particularly good performance on the ImageNet data set. VGG19 consists of five building blocks. The first and second building blocks consist of two convolutional layers and one pooling layer. The third and fourth blocks have four convolutional layers and one pooling layer. The final block consists of four convolutional layers. Small 3 × 3 filters are used [24, 46].
In this study, the suggested VGGCOV19-NET model is based on a modified VGG19 [21] to diagnose COVID-19. The Cascade VGGCOV19-NET model is a hybrid model in which the VGGCOV19-NET model and the YOLOv3 detection algorithm are used together. The architectures of the VGGCOV19-NET and Cascade VGGCOV19-NET models are shown in Fig. 2.
In VGGCOV19-NET, the five convolutional blocks of VGG19 (everything except the top layer) have been transferred to the new model with their weights. The top layer has been made compatible with data of dimension 224 × 224 × 3 and the FC layer has been reformed. In the original VGG19 model, the FC layer has one flatten layer and three dense layers. The first two dense layers consist of 4096 neurons. The final layer is the output layer that consists of 1000 neurons. The modified FC layer of the VGGCOV19-NET model has one flatten layer and four dense layers. The numbers of neurons in the first three dense layers are 128, 256 and 512, respectively. The ReLU activation function was used in these layers. The final dense layer is the output layer, whose activation function is Softmax. The number of neurons in this layer is two (COVID-19 and no-findings) for binary classification and three (COVID-19, no-findings, and pneumonia) for multi-class classification. The FC layer is shown in more detail in Fig. 3. As seen in this figure, the output of the flatten layer at the front of the FC layer consists of 25,088 features. As a result, the input layer of the FC layer has 25,088 neurons. Each X-ray image contains 150,528 values (224 × 224 × 3). These values are filtered, and 25,088 features are extracted, in the convolution and pooling layers that precede the flatten layer. The dense layer number, neuron
The YOLOv3 detection algorithm is placed at the front of the VGGCOV19-NET and VGG19 models in the Cascade VGGCOV19-NET and Cascade VGG19 models. The YOLOv3 algorithm detects the chest zones in X-ray images and crops them so that a new X-ray data set focusing on the chest zone is obtained. This data set is given to the VGGCOV19-NET and VGG19 models as input, and the training and test processes of the Cascade models are conducted. CNNs are designed using multiple main building blocks such as convolution, pooling, and FC layers [47].
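The feature counts quoted for the modified FC layer can be checked with a few lines of arithmetic. The sketch below assumes the standard VGG19 configuration (five 2 × 2 max-pooling stages and 512 channels after the last block), which this section does not state explicitly; the helper names `dense_params` and `head_params` are hypothetical, and the resulting parameter totals are illustrative rather than the values reported in Table 1.

```python
# Assumption: standard VGG19 layout with five 2x2 max-pooling stages
# and 512 channels at the output of the last convolutional block.
INPUT_SHAPE = (224, 224, 3)
NUM_POOLINGS = 5
LAST_BLOCK_CHANNELS = 512


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected (dense) layer."""
    return n_in * n_out + n_out


# Number of raw values in one 224 x 224 x 3 image.
values_per_image = INPUT_SHAPE[0] * INPUT_SHAPE[1] * INPUT_SHAPE[2]

# Each pooling stage halves the spatial size: 224 -> 7 after five stages.
side = INPUT_SHAPE[0] // (2 ** NUM_POOLINGS)
flatten_features = side * side * LAST_BLOCK_CHANNELS


def head_params(num_classes):
    """Parameters of the head: flatten -> 128 -> 256 -> 512 -> num_classes."""
    sizes = [flatten_features, 128, 256, 512, num_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


print(values_per_image)   # 150528
print(flatten_features)   # 25088
print(head_params(3))     # three-class head
print(head_params(2))     # binary head
```

Under these assumptions almost all of the head's parameters sit in the first dense layer (25,088 inputs to 128 neurons), which is why shrinking that layer from 4096 to 128 neurons reduces the trainable parameter count so strongly.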
Here, CL refers to the convolutional layer and PL refers to the pooling layer using the maximum pooling method. In the VGGCOV19-NET model, the convolution layer is very important as it performs the feature extraction process. This layer uses the convolution operation (represented by *) instead of general matrix multiplication and is the core building block of a CNN. The parameters of the convolution layer are made up of a set of learnable filters, also known as kernels. The main task of the convolutional layer is to identify the features in the local regions of the input image and to create a feature map [11]. This process is based on sliding a selected filter over the input image. The dimension of the filter can be 3 × 3, 5 × 5 or 7 × 7 pixels. The input of the next layer ($m_2 \times m_3$) is formed with the filter applied to the image. Activation maps occur as a result of this convolution process. Activation maps have local distinctive features. Each convolution layer has a filter ($m_1$). In the expression for the layer output, one term represents the bias matrix, and $K^{(l)}_{i,j}$ represents the filter dimension [48] [49] [50].
Pooling layers gradually decrease the image dimension, which reduces the number of parameters and the computational complexity of the model [51]. Pooling layer $l$ has two parameters: the filter dimension $F^{(l)}$ and the stride $S^{(l)}$. This layer takes an input volume of dimension $m_1^{(l-1)} \times m_2^{(l-1)} \times m_3^{(l-1)}$ and provides an output volume of dimension $m_1^{(l)} \times m_2^{(l)} \times m_3^{(l)}$. The operation of the pooling layer is summarized in Eqs. 2, 3 and 4 [49]. In the pooling process, a vector $v$ is reduced to a single scalar $f(v)$ by the pooling function $f$. There are basically two pooling types: average pooling, $f_a(v) = \frac{1}{N}\sum_{i=1}^{N} v_i$, and max pooling, $f_m(v) = \max_i v_i$ [52]. Max pooling is used in the VGGCOV19-NET model.
FC layers (1) turn the feature maps output by the final convolution or pooling layer into a one-dimensional vector, (2) connect it to one or more dense layers, (3) update the weights, and (4) give the final classification estimate [47].
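The pooling operation described above can be illustrated with a small pure-Python sketch. It assumes square, non-overlapping or strided windows without padding, so the output length along one axis is (m - F) / S + 1; this is the usual convention and is an assumption here, since Eqs. 2 to 4 are not reproduced in the text. The function names and the 4 × 4 feature map are made up for illustration.

```python
def pooled_size(m, f, s):
    """Output length along one axis for window size f and stride s (no padding)."""
    return (m - f) // s + 1


def pool2d(feature_map, f, s, reduce_fn):
    """Apply reduce_fn to every f x f window of a 2D list, moving with stride s."""
    rows = pooled_size(len(feature_map), f, s)
    cols = pooled_size(len(feature_map[0]), f, s)
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                feature_map[r * s + i][c * s + j]
                for i in range(f)
                for j in range(f)
            ]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def average(v):
    """Average pooling function: reduces a vector v to its mean."""
    return sum(v) / len(v)


# Hypothetical 4 x 4 feature map, used only to show the mechanics.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 4],
]

max_pooled = pool2d(feature_map, 2, 2, max)      # max pooling
avg_pooled = pool2d(feature_map, 2, 2, average)  # average pooling

print(max_pooled)  # [[6, 2], [2, 8]]
print(avg_pooled)  # [[3.5, 1.0], [1.0, 6.0]]
```

With a 2 × 2 window and stride 2, `pooled_size(224, 2, 2)` gives 112, so each such stage halves the height and width of its input.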
FC layer mapping is a multilayer perceptron. When layer $(l-1)$ is also an FC layer, the process stages of the FC layer can be shown as in Eq. 5 [49]. In addition, an activation function is used at the output of each layer of the VGGCOV19-NET model. This activation function is Softmax for the output layer of the model, and the rectified linear unit (ReLU) for the other layers. The Softmax and ReLU activation functions are mathematically defined as in Eqs. 6 and 7, respectively. Softmax is used to normalize the probability vector x for any input [9, 53].
YOLOv3 is a deep CNN developed for object detection. The chest region on each chest X-ray image must be tagged before the YOLOv3 model is trained. The labelImg (https://github.com/tzutalin/labelImg) graphical image annotation tool was used for this. A text file was created for each image using this tool, and the coordinates of the chest region in the image were saved in this file. The text file contains the height (cH) and width (cW) of the tagged image and the points ($X_0$, $Y_0$, $X_1$, $Y_1$) giving the exact location of the chest region. A normalization process is applied to make the resulting data set suitable for the YOLOv3 architecture. With this process, the center point coordinates (X, Y), height (H), and width (W) of the chest region tagged in the image are obtained as in Eqs. 8 and 9 [54]. These data are shown in Fig. 4. After the preliminary processes were completed, training and testing were conducted by randomly splitting the data into 70% training and 30% test data. Model performance was evaluated according to the Intersection over Union (IoU) and mean Average Precision (mAP) metrics. IoU is an important metric for revealing the success of a model in object detection. IoU measures the similarity between the bounding box of the target and the estimated output [55].
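The box normalization and the IoU metric described above can be sketched as follows. The normalization uses the common YOLO label convention (box center, width and height divided by the image width cW and height cH); this is an assumption about what Eqs. 8 and 9 express, since the equations are not reproduced in the text. The function names and the example coordinates are hypothetical.

```python
def to_yolo(x0, y0, x1, y1, c_w, c_h):
    """Convert a corner box in pixels to normalized (X, Y, W, H).

    Assumes (x0, y0) is the top-left corner, (x1, y1) the bottom-right
    corner, and c_w, c_h the width and height of the whole image.
    """
    x = (x0 + x1) / 2 / c_w   # normalized center, horizontal
    y = (y0 + y1) / 2 / c_h   # normalized center, vertical
    w = (x1 - x0) / c_w       # normalized width
    h = (y1 - y0) / c_h       # normalized height
    return x, y, w, h


def iou(box_a, box_b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical annotation: chest box in a 400 x 400 image.
print(to_yolo(100, 50, 300, 250, 400, 400))  # (0.5, 0.375, 0.5, 0.5)
# Two 2 x 2 boxes overlapping in a 1 x 1 square: IoU = 1 / 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection is typically counted as correct when its IoU with the tagged box exceeds a chosen threshold, such as the 0.50 value used in this study.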
The ratio of the region shared by the estimated box and the reference box to the whole region covered by both is calculated as in Eq. 10 [54, 55].
$$\mathrm{IoU} = \frac{\text{Overlap area between bounding box of target and predicted}}{\text{Combined area between bounding box of target and predicted}} \quad (10)$$
mAP allows values such as precision, sensitivity, F1-score, and IoU to be evaluated from a single point. It is calculated as in Eq. 11 [54]. When calculating the mAP value to determine the model performance, only bounding boxes above a certain IoU threshold value were considered. When the threshold value is taken as 0.50, the mAP value for the test data was calculated as 99.80%, and the mean IoU value was calculated as 85.37. These values show that the chest region was detected with high precision. The weights obtained for these performance metrics were saved. Then, all the X-ray images in the data set were given to the trained YOLOv3 algorithm as input and the chest region was automatically detected. The detected chest region was cropped with OpenCV and a new data set containing the chest region was formed. The X-ray images whose chest regions were detected and cropped are shown in Fig. 5.
The VGGCOV19-NET and Cascade VGGCOV19-NET models were built using TensorFlow and the Keras library in the Python programming language. Some preliminary processes were conducted on the data sets before the training. Firstly, the images were resized to 224 × 224 × 3. Then, each input value was divided by 255 to complete the normalization process. One-hot encoding was applied to the output class labels. Different optimizer algorithms (Adam, Adadelta, SGD, RMSprop, Adamax, Nadam), loss functions (binary cross-entropy, categorical cross-entropy), and learning rates were tried to train the models. The best classification performance was obtained with the Adam optimizer, the categorical cross-entropy loss function and a learning rate of 0.001.
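The preprocessing and loss described above (scaling by 255, one-hot encoding, categorical cross-entropy) can be sketched without any deep learning library. This is only an illustration of the arithmetic, not the Keras pipeline used in the study; image resizing is omitted because it needs an image library, and the class label strings and function names are hypothetical.

```python
import math

# Hypothetical label names; the order fixes the position of each class.
CLASSES = ["covid", "no_findings", "pneumonia"]


def scale_pixels(pixels):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return [p / 255 for p in pixels]


def one_hot(label, classes=CLASSES):
    """Encode a class label as a vector with a single 1.0 entry."""
    return [1.0 if c == label else 0.0 for c in classes]


def categorical_cross_entropy(target, predicted, eps=1e-12):
    """Loss for one sample: -sum(t_i * log(p_i)), clipped to avoid log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


pixels = scale_pixels([0, 51, 255])
target = one_hot("pneumonia")
# Hypothetical softmax output of a model for this sample.
loss = categorical_cross_entropy(target, [0.1, 0.2, 0.7])

print(pixels)   # values between 0 and 1
print(target)   # [0.0, 0.0, 1.0]
print(loss)     # equals -log(0.7)
```

With one-hot targets the loss reduces to the negative log of the probability assigned to the true class, so the same loss function covers both the two-class and the three-class setting when the output layer has two or three Softmax neurons.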
Pre-trained weights of the VGG19 model were used in convolution layers and the trainable properties of these layers were passivized so decreasing calculation time and cost. The fivefold cross-validation method was used in the training and testing of the models. Models were trained by 100 epochs for each fold, and a high level of classification performance was achieved. The formation and training of CNN models without using GPU is very hard. For this reason, the training process of models was conducted using T4 and P100 GPUs on Google Colab (a product of Google Research). In addition, the original VGG19 model was also trained under similar conditions, and classification performance was attained. Thus, the difference in the classification performance of the VGGCOV19-NET model was better determined. The pseudocode carrying out the processes of forming the model explained in Sect. 3.2 and training the model explained in this section is given below. This code clearly reveals the formation and training of VGGCOV19-NET and Cascade VGGCOV19-NET models. Two different scenarios were tested to diagnose COVID-19 using X-ray images. While the VGGCOV19-NET, VGG19, and Cascade models had initially been trained to classify three categories (COVID-19 vs. nofindings vs. pneumonia), they were trained to classify two categories (COVID-19 vs. no-findings) in the second scenario. In other words, multi-class classification was conducted in scenario-1 and binary class classification in scenario-2. Fivefold cross-validation was used in the training and test processes of the models. Training accuracy and training loss values of multi-class and binary classifier models are shown in Figs. 6 and 7 for the fold-2 step. It was observed that the Cascade VGGCOV19-NET model showed a faster training process than the other models for multi-class. 
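The fivefold cross-validation used for training and testing can be sketched as an index split. The sketch assumes a shuffled, non-stratified split with a fixed seed; the text does not say whether the folds were stratified by class, so this is an assumption, and the function name `k_fold_indices` is hypothetical. The sample count of 1125 corresponds to the three-class data set (125 + 500 + 500 images).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: plain shuffled split, not stratified by class label.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1125, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    # A model would be trained on `train` and evaluated on `test` here.
    print(fold_number, len(train), len(test))
```

Every image appears in exactly one test fold, so each fold trains on 900 images and tests on 225, and the per-fold results can later be combined over the whole data set.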
When loss values were examined for the multi-class classification, the Cascade VGGCOV19-NET model decreased the loss value very rapidly and approached zero. However, the VGGCOV19-NET and VGG19 models reduced the loss slowly and inconsistently, and were unable to approach zero. It was also noted that the VGG19 model had many fluctuations in both the loss and accuracy graphs. This is an indication that the VGG19 model is unstable in the learning process and does not learn well. In contrast to the VGG19 model, the Cascade VGG19 model learned stably in the multi-class classification. All models learned stably in the binary classification, and the differences between their learning speeds were somewhat smaller than, but similar to, those in the multi-class classification. The training times, testing times, and total trainable parameters of the models are shown in Table 1. The lowest time belongs to the binary Cascade VGGCOV19-NET model. The highest time belongs to the multi-class VGG19 model. In addition, the training and testing times of the Cascade models are significantly lower than those of the other models. This decrease was approximately 70% in the binary classification and approximately 60% in the multi-class classification. The reason for this decrease in training and testing time is that Cascade models focus only on the chest region and so decrease the number of pixels to be processed. Also, the training and testing times of the VGGCOV19-NET models were lower than those of the VGG19 models. Similarly, the trainable parameters of the VGGCOV19-NET model are fewer because the number of neurons in the FC layer is low. Models were evaluated according to various performance metrics as follows: precision (P), recall (R), specificity (S), F1-score (F1), area under the receiver operating characteristics (ROC) curve (AUC), and accuracy (acc).
• Precision This parameter measures the proportion of anticipated positives that are true positives. Therefore, it is dependent on true positive (TP) and false positive (FP) values [56] . • Recall Recall is the ratio of the true positives classified correctly by the model [57] . TP and FN values are used to calculate Recall [56] . • Specificity Specificity is the ratio of the true negatives (those not from pathology) classified correctly by the model [57] . TN and FP values are used to calculate specificity [36] . • F1-Score F1-score is an overall measure of the model's accuracy that combines precision and recall. It is the double of the ratio of the multiplication of F1-score precision and recall metrics to their totals [16] . • AUC ROC curves are widely used to reveal the results for binary classification problems in machine learning [58] . The ROC curve verifies the classification performance by showing the false positive rate (FPR) against the true positive rate (TPR). AUC is an important parameter for assessing the discriminative ability of prediction models [16, 59] . In this part, the results obtained from VGGCOV19-NET, Cascade VGGCOV19-NET, VGG19 and Cascade VGG19 models are presented in detail. As specified above, the VGGCOV19-NET is a model developed by the reformation of the full connected layer of VGG19 CNN architecture. Cascade models are the ones formed by feeding the input of VGGCOV19-NET and VGG19 models with the output of the YOLOv3 algorithm. R, P, F1, and acc metrics whose mathematical formulations were explained in the methods part were used to evaluate the classification performance of the models. The acc metric determines at what ratio a model anticipates the values. The P metric determines the repeatability of the measurement or how many anticipations are true. R represents the number of correct results found. The F1 score uses precision and recall metrics to calculate an average result [38] . 
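The metrics defined above can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions of precision, recall, specificity, F1-score and accuracy; the function names are hypothetical, and the example counts are made up for illustration and are not results from this study.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts for a COVID-19 vs. no-findings split of 125 and 500 images.
metrics = binary_metrics(tp=120, fp=5, tn=495, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class confusion matrix the same function can be applied per class by treating that class as positive and all other classes as negative.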
Both multi-class and binary classification performances of the models were evaluated according to these metrics for each fold. Also, an overlapped confusion matrix (CM) was formed for the purpose of conducting a general evaluation of the models and the performance metrics generally representing the model were calculated using this matrix. To create the overlapped CM, the CMs obtained from all the folds are summed [9] . For multi-class classification, the classification performances of VGGCOV19-NET, Cascade VGGCOV19-NET, VGG19 and Cascade VGG19 models are given in Table 2 , and overlapped CM is given in Fig. 8 . As seen in this table, the weighted average values of R, P, and F1 metrics are given for each fold. However, the metric values presented under the line ''overlapped'' have been calculated separately for each category from the overlapped CM, and these values are not weighted averages. Consequently, the use of these values will provide more explanatory data in the revelation of the general performances of the models for each category. When an evaluation is conducted according to the acc metric, the classification performance of the Cascade VGGCOV19-NET model is seen to be much higher than that of the VGGCOV19-NET, VGG19 and Cascade VGG19 models. The highest mean accuracy value (acc = 97.16%) belongs to the Cascade VGGCOV19-NET model. The Cascade version of VGGCOV19-NET model increased the classification performance. A similar situation also exists between the VGG19 model (acc = 84.17%) and the Cascade VGG19 model (acc = 94.31%). However, the increase in the classification performance is far higher here. Also, according to the R metric, the Cascade VGGCOV19-NET model correctly classified the COVID-19, pneumonia, and no-findings patients to an extremely high extent. The Cascade VGGCOV19-NET model cor- The second scenario used to diagnose COVID-19 is the binary classification. 
Sensitivity, specificity, precision, F1-score, AUC, accuracy, and confusion matrix results of the models for binary classification are given in Table 3.
Another metric used in the verification of the classification performance of the models is the ROC curve. ROC curves in each fold step and AUC values are shown in Fig. 9 for binary classification. The closeness of the AUC value to one shows that the classification performance of the model is high. The fact that the classification
Two hundred and nineteen different COVID-19 X-ray images [60] were used to verify the performance of the Cascade VGGCOV19-NET model, which yielded the best performance in the study. No different data sets were used for no-findings and pneumonia. The Cascade VGGCOV19-NET model displayed better performance for this new data set in both binary and three-class classification. The accuracy value was 99.18% for three-class classification and 99.58% for binary classification. These values are
These results show that the YOLO-aided Cascade architecture increases the classification performance. The accuracy values of the models are shown comparatively in Fig. 10 for binary and multi-class classifications. As can be seen in this figure, the classification performance of Cascade VGGCOV19-NET is higher than that of the other models for both multi-class and binary classifications. Moreover, the Cascade VGGCOV19-NET model generally predicts COVID-19 at high ratios. The high accuracy values of the Cascade VGGCOV19-NET model in binary and multi-class classification are an indicator that the model has been structured well. Furthermore, the Cascade VGGCOV19-NET model classifies COVID-19 correctly at a rate of 98.40% (R = 0.9840) in the multi-class classification in which there are three categories. This ratio is lower in the VGGCOV19-NET model (R = 0.9280) than in the Cascade VGGCOV19-NET model.
The high recall of the Cascade VGGCOV19-NET model shows that it can correctly distinguish COVID-19 patients from pneumonia patients and no-findings patients at a high rate. This study also shows that making some changes to the original VGG CNN architecture increases model performance. Similarly, it reveals that, instead of using pre-trained CNN models alone, forming a Cascade structure by feeding them with a detection algorithm such as YOLO increases the classification performance.

A comparison of the model performance attained in this study with previous studies is also important. These results are given in Table 4. According to this table, the classification performance of this study is higher than that of most other studies in the literature for both binary and three-class classification. DarkCovidNet, developed by Ozturk et al. [9], classified the binary-class data with an accuracy of 98.08% and the three-class data with an accuracy of 87.02%. DarkCovidNet is based on the DarkNet architecture, with fewer layers and filters. The data set used in that study is the same as the one used in our study. The classification performance of Cascade VGGCOV19-NET and Cascade VGG19 developed in our study is considerably higher than that of DarkCovidNet. This performance difference is approximately 9.5% for the three-class classification and 1.28% for the binary classification. There are two studies [11, 37] showing the closest performance to Cascade VGGCOV19-NET and Cascade VGG19 among other studies using different data sets for binary classification. The accuracy values of both studies are 99%.
While the first of these studies [11] performed the classification with a new CNN model based on the Xception architecture on 284 COVID-19 positive and 310 normal X-ray images, the second [37] performed the classification using the pre-trained InceptionV3 CNN architecture and the transfer learning approach on 160 COVID-19 (+) and 160 normal X-ray images. There are also studies in the literature classifying three classes. Khan et al. [11] classified three classes with a new CNN model based on the Xception architecture with an accuracy of 95%, Ahammed et al. [34] classified three classes with the deep CNN model they created with an accuracy of 94%, Apostolopoulos and Mpesiana [35] classified three classes with the pre-trained VGG19 architecture with an accuracy of 93.48%, Aslan et al. [30] classified three classes with a hybrid CNN architecture with an accuracy of 98.70%, Hira et al. [31] classified three classes with a ResNeXt-50-based model with an accuracy of 97.55%, and Wang et al. [29] classified three classes with the deep CNN model they called COVID-Net with an accuracy of 93.3%. The Cascade VGGCOV19-NET model proposed in our study classifies three classes with higher accuracy than these studies, except for two [30, 31].

Furthermore, there are two studies in the literature using the YOLO architecture. Nigam et al. [62] tried to detect COVID-19 from X-ray images. They obtained the highest three-class classification performance with the EfficientNet model, at 93.48%. As in our study, X-ray images are first given to the YOLO algorithm to determine the lung zone. Then, these images are classified with pre-trained CNN models. The data set used in that study is large; however, the classification accuracy is lower than in our study. Furthermore, because that study was not conducted on raw images, it could not demonstrate whether the YOLO-based model affected the performance. Al-antari et al.
[61] tried to detect nine lung diseases, including COVID-19 and pneumonia. Unlike our study, they used only the YOLO detection algorithm on X-ray images whose ground truth was labeled. The average detection accuracy for all classes was 90.67%.

There are also studies [38-40, 63, 64] in the literature diagnosing COVID-19 using chest CT. One of these studies [64] states that mobile devices have limited storage and computing capacity and suggests an architecture including robot, edge, and cloud layers so that deep learning applications can be run on mobile devices. They detected COVID-19 from CT images using DenseNet169 on this architecture. The accuracy values in these studies are relatively lower than those of studies conducted using X-rays. As specified by Ozturk et al. [9], CT is very costly and not easily accessible because scanners are only available in major health centers. Compared with X-rays, CT exposes patients to higher radiation doses. Consequently, we recommend using a deep learning model with X-ray images because they are more accessible than CT and involve lower radiation doses.

Because X-ray radiographs are easily accessible, they are commonly used for the diagnosis of COVID-19 in health centers worldwide. The Cascade VGGCOV19-NET model can be used as an auxiliary tool in the diagnosis of COVID-19 in a much shorter period than X-ray radiographies. The main advantages of this model are as follows:
• It classifies chest X-ray images, which can be obtained very easily, rapidly, and smoothly from any hospital, without needing extra feature extraction methods.
• It generally has higher classification performance than the other studies in the literature.
• As in some studies in the literature, the original state-of-the-art CNN architecture was not used as is: the VGG19 CNN architecture was modified and its classification performance increased.
• The new Cascade VGGCOV19-NET model, based on the YOLOv3 detection algorithm and the pre-trained VGG19 CNN architecture, has increased the classification performance and can guide future studies.
• It is able to correctly distinguish COVID-19 at higher ratios than pneumonia and no-findings.

Cascade VGGCOV19-NET detects COVID-19 in minutes and at a high accuracy rate. For this reason, it can be used in areas where there is a shortage of specialist doctors and radiologists. In addition, even in small healthcare facilities where only an X-ray device is available, patients can be diagnosed using the model, and those who are positive for COVID-19 can be referred to more advanced health centers for immediate treatment. Another practical advantage of the model is that it spares patients who are classified as negative from PCR testing. Thus, unnecessary crowding in health centers can be avoided. The trained Cascade VGGCOV19-NET model is saved on the computer. Once a simple interface is designed for the model, the X-ray image can be easily loaded into the system and results obtained very quickly. Also, X-ray images can be rapidly transferred to the physician's computer without the need for additional procedures. All the physician needs to do is upload the X-ray image to the system with a few mouse clicks to see the result. From this point of view, Cascade VGGCOV19-NET can easily be used for pre-diagnosis.

In this study, VGGCOV19-NET, a deep CNN model based on a modified VGG19 architecture, was presented to diagnose COVID-19 cases from chest X-ray images. In addition, we created the Cascade VGGCOV19-NET model by feeding this model with the YOLOv3 algorithm. The VGGCOV19-NET model performs the binary and multi-class classification tasks with accuracies of 98.72% and 88.89%, respectively. These accuracy values are 99.84% and 97.16%, respectively, in the Cascade VGGCOV19-NET model. Feeding VGGCOV19-NET with the YOLOv3 algorithm increases the classification performance.
Moreover, it decreases the training and test time by approximately 70% in the binary classification and by 60% in the multi-class classification. The Cascade VGGCOV19-NET is an important and original model that detects COVID-19 with high accuracy and can guide future studies. This model automatically detects COVID-19 with high accuracy from X-ray images, which are widely used and easily accessible for the diagnosis of COVID-19 in health centers worldwide. Conventional pneumonia has great similarities with COVID-19, and there is a severe shortage of specialists. For this reason, Cascade VGGCOV19-NET, an artificial intelligence-supported automatic detection system, is especially important in detecting COVID-19. The Cascade VGGCOV19-NET model is a tool that can assist doctors by diagnosing COVID-19 rapidly and easily at low cost. In addition, the YOLO-supported COVID-19 detection method proposed in this study can be applied to various fields, such as tumor detection from brain images, skin cancer detection, and implant-type detection from implant images.

One limitation of the study is the use of a limited number of COVID-19 X-ray images. As medical images for COVID-19 are obtained from local hospitals, it is very difficult to obtain varied data sets for such CNN-based studies. The number of data sets in the literature was not sufficient when we initiated this study. Furthermore, we could not use data augmentation, as it is not regarded as good practice in medical imaging. Another limitation of the study is that the proposed cascade method was used only on X-ray images. However, we expect that the proposed cascade method could also classify CT images successfully. In a future study, we will try to detect COVID-19 through different CNN models on a very large mixed data set by adding other open-source X-ray and computed tomography images in the literature to the X-ray image data set used in this study.
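The cascade idea summarized in this section, in which a detector first localizes the lung zone and a classifier then sees only the cropped region, can be sketched in a few lines. The Python code below is a hedged illustration and not the study's implementation: `detector` and `classifier` are hypothetical callables standing in for YOLOv3 and VGGCOV19-NET, the `(x_min, y_min, x_max, y_max)` box format is an assumption, and falling back to the full image when no box is returned is a design choice made only for this example.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # hypothetical format: (x_min, y_min, x_max, y_max)
Label = TypeVar("Label")


def crop_region(image: Image, box: Box) -> Image:
    """Cut the rectangular region given by `box` out of `image`."""
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    # Clamp the box to the image borders.
    x_min, y_min = max(0, x_min), max(0, y_min)
    x_max, y_max = min(width, x_max), min(height, y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("box does not overlap the image")
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def cascade_predict(
    image: Image,
    detector: Callable[[Image], Optional[Box]],
    classifier: Callable[[Image], Label],
) -> Label:
    """Stage 1: locate the lung zone. Stage 2: classify the cropped zone only."""
    box = detector(image)
    # Assumption of this sketch: use the whole image if nothing is detected.
    region = crop_region(image, box) if box is not None else image
    return classifier(region)
```

Because the classifier in such a pipeline receives only the detected region rather than the whole radiograph, it processes fewer irrelevant pixels, which is one plausible reading of the shorter training and test times reported above.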
Another limitation of the study is that the chest zone was determined only as a rectangular bounding box with the YOLO algorithm. In particular, the three-class classification performance may increase if the right and left lung zones are segmented using image processing techniques. This segmentation will be carried out in a future study, and the results will be compared with the YOLO-based model. A further limitation of the study is that only the VGG19 SoTA (state-of-the-art) model was used. Therefore, SoTA models such as EfficientNet, ResNet, MobileNet, DenseNet169, and InceptionV3 fed with YOLO can be used to improve the performance of the COVID-19 diagnosis system.

Funding No funds, grants, or other support was received.

Conflicts of interest The author declares that there is no conflict of interest.

[1] Coronavirus disease 2019 (COVID-19): a perspective from China
[2] A new coronavirus associated with human respiratory disease in China
[3] Discovering genomic patterns in SARS-CoV-2 variants
[4] Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China
[5] Detection of COVID-19 from CT scan images: a spiking neural network-based approach
[6] Impact of COVID-19 and other viruses on reproductive health
[7] Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets
[8] Coronavirus: covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate
[9] Automated detection of COVID-19 cases using deep neural networks with X-ray images
[10] CovidGAN: data augmentation using auxiliary classifier GAN for improved Covid-19 detection
[11] CoroNet: a deep neural network for detection and diagnosis of COVID-19 from chest X-ray images
[12] A review of coronavirus disease-2019 (COVID-19)
[13] COVID-19 pneumonia: What has CT taught us?
[14] A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
[15] A new image classification method using CNN transfer learning and web data augmentation
[16] COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images
[17] CVDNet: a novel deep learning architecture for detection of coronavirus (Covid-19) from chest X-ray images
[18] COVID-CXNet: detecting COVID-19 in frontal chest X-ray images using deep learning
[19] A survey on transfer learning
[20] A survey of transfer learning
[21] Very deep convolutional networks for large-scale image recognition
[22] Deep neural networks for dental implant system classification
[23] An improved VGG19 transfer learning strip steel surface defect recognition deep neural network based on few samples and imbalanced datasets
[24] A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition
[25] YOLOv3: an incremental improvement
[26] Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and deepsort techniques
[27] An accurate and robust monitoring method of full-bridge traffic load distribution based on YOLO-v3 machine vision
[28] You only look once: unified, real-time object detection
[29] COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
[30] CNN-based transfer learning-BiLSTM network: a novel approach for COVID-19 infection detection
[31] An automatic approach based on CNN architecture to detect Covid-19 disease from chest X-ray images
[32] Automatic detection of COVID-19 Infection from chest X-ray using deep learning
[33] Performance result for detection of COVID-19 using deep learning
[34] Early detection of coronavirus cases using chest X-ray images employing machine learning and deep learning approaches
[35] Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
[36] Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks
[37] Deep transfer learning with apache spark to detect COVID-19 in chest X-ray images
[38] RETRACTED ARTICLE: deep learning system to screen coronavirus disease 2019 pneumonia
[39] Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
[40] Deep learning-based detection for COVID-19 from chest CT using weak label
[41] COVID-19 image data collection
[42] ChestX-Ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases
[43] Convolutional neural network based emotion classification using electrodermal activity signals and time-frequency features
[44] CNNpred: CNN-based stock market prediction using a diverse set of variables
[45] A new image classification method using CNN transfer learning and web data augmentation
[46] Hierarchical convolutional neural networks for fashion image classification
[47] Convolutional neural networks: an overview and application in radiology
[48] Evaluation of pooling operations in convolutional architectures for object recognition
[49] A deep feature learning model for pneumonia detection applying a combination of mRMR feature selection and machine learning models
[50] Understanding convolutional neural networks with a mathematical model
[51] An introduction to convolutional neural networks
[52] A theoretical analysis of feature pooling in visual recognition
[53] CovXNet: a multidilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization
[54] Tactile paving surface detection with deep learning methods
[55] Fighting against COVID-19: a novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection
[56] A machine learning model to identify early stage symptoms of SARS-Cov-2 infected patients
[57] Statistics notes: diagnostic tests 1: sensitivity and specificity
[58] The relationship between precision-recall and ROC curves
[59] Reflection on modern methods: revisiting the area under the ROC Curve
[60] Can AI help in screening Viral and COVID-19 pneumonia?
[61] Fast deep learning computer-aided diagnosis against the novel COVID-19 pandemic from digital chest X-ray images
[62] COVID-19: automatic detection from X-ray images by utilizing deep learning methods
[63] A Deep learning system to screen novel coronavirus disease
[64] The role of edge robotics as-a-service in monitoring COVID-19 infection