title: Detection of COVID-19 from Chest X-rays using Deep Learning: Comparing COGNEX VisionPro Deep Learning 1.0 Software with Open Source Convolutional Neural Networks
authors: Sarkar, Arjun; Vandenhirtz, Joerg; Nagy, Jozsef; Bacsa, David; Riley, Mitchell
date: 2020-08-03

The COVID-19 pandemic has had a severe and catastrophic effect on humankind and is considered the most crucial health calamity of the century. One of the best methods of detecting COVID-19 is from radiological images, namely X-rays and Computed Tomography (CT) scans. Many companies and educational organizations have come together during this crisis to create Deep Learning models for the effective diagnosis of COVID-19 from chest radiography images. For example, the University of Waterloo, together with Darwin AI, designed the Deep Learning model COVID-Net and created a dataset called COVIDx, consisting of 13,975 images. In this study, COGNEX's Deep Learning software, VisionPro Deep Learning, is used to classify these chest X-rays from the COVIDx dataset, and the results are compared with those of COVID-Net and various other state-of-the-art Deep Learning models from the open-source community. Deep Learning tools are often referred to as black boxes because humans cannot interpret how or why a model classifies an image into a particular class. This problem is addressed by testing VisionPro Deep Learning in two settings: first by selecting the entire image as the Region of Interest (ROI), and second by segmenting the lungs in a first step and then classifying only the segmented lungs instead of the entire image. With the entire image as the ROI, VisionPro Deep Learning achieves an overall F-score of 94.0 percent, and on the segmented lungs it achieves 95.3 percent, on par with or better than COVID-Net and other state-of-the-art open-source Deep Learning models.

The novel coronavirus disease, named COVID-19 by the World Health Organization, is caused by a new class of coronavirus known as SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). It is a single-stranded RNA (ribonucleic acid) virus that causes severe respiratory infections. The first COVID-19 cases were reported in December 2019 in Wuhan, Hubei province, China [1]. Since the virus has spread worldwide, the World Health Organization has declared it a pandemic. As of 30th July 2020, 12:00 GMT, 17.2 million people had been infected and 670 thousand people had died due to COVID-19 [2]. No vaccines have been available so far for treating COVID-19, so one of the best solutions has been to detect the virus in its early stages and then isolate infected people by quarantining them, preventing healthy people from getting infected. In many cases, Real-time Reverse Transcriptase-Polymerase Chain Reaction (RRT-PCR) of nasopharyngeal swabs has been used for diagnosis [3]: throat swabs are collected from patients with COVID-19, and the RNA is then extracted. This process takes over two hours to complete and has a long turnaround time with limited sensitivity. The best alternative is to detect COVID-19 from radiology images [4, 5, 6], namely chest X-ray images and chest Computed Tomography (CT) images.
The advantages of using chest X-rays over CT images are as follows: X-ray imaging systems are much more widely available than CT imaging systems, they are cost-effective, and digital X-ray images can be analyzed at the point of acquisition, making the diagnosis process extremely quick [7]. X-ray images are grayscale images: in medical imaging terms, images with values ranging from 0 to 255, where 0 corresponds to completely dark pixels and 255 to completely white pixels. Different values in an X-ray image correspond to different tissue densities: Dark: locations in the body filled with air appear black; Dark grey: subcutaneous tissues or fat; Light grey: soft tissues such as the heart and blood vessels; Off-white: bones such as the ribs; Bright white: metallic objects such as pacemakers or defibrillators. Physicians interpret an image by looking at the borders between the different densities. The ribs appear off-white because they are dense tissue, while the air-filled lungs appear dark. Similarly, below the lung is the hemidiaphragm, a soft tissue that appears light grey. This helps in finding the location and extent of the lungs. If two objects with different densities lie close to each other, they can be demarcated in an X-ray image. If something happens in the lungs, such as pneumonia, the air-dense lungs become water-dense, and the demarcation lines fade as the pixel values draw closer together on the grayscale bar [8]. About 20% of patients infected with COVID-19 develop pulmonary infiltrates, and some develop very serious abnormalities [9]. The virus reaches the lungs' gas exchange units and infects alveolar type 2 cells [10][11]. The most frequent CT abnormalities observed are ground-glass opacity, consolidation, and interlobular septal thickening in both lungs [12]. But due to infection control issues related to transporting patients to CT rooms, problems encountered in CT room decontamination, and the limited availability of CT scanners in many parts of the world, portable chest X-rays are likely to be one of the most common modalities for identification and follow-up of COVID-19 lung abnormalities [13]. Hence a significant number of expert radiologists who can interpret these radiology images are needed, and with the ever-increasing number of COVID-19 infections it is getting harder for radiologists to keep up with this demand. In this scenario, Deep Learning techniques prove beneficial both in classifying abnormalities in lung X-ray images and in helping radiologists predict COVID-19 cases accurately in a reduced time frame. While many studies have demonstrated success in detecting COVID-19 using Deep Learning on both CT scans and X-rays, most of the Deep Learning architectures require extensive programming. Moreover, most of the architectures fail to show whether the model is being triggered by abnormalities in the lungs or by artifacts unrelated to COVID-19. Because most of these Deep Learning models lack a GUI (Graphical User Interface), it is difficult for radiologists without Deep Learning or programming knowledge to use these models, let alone train them.
Therefore, we showcase an existing Deep Learning software package with a very intuitive GUI, which can be used as pretrained software or trained on new data from particular hospitals or research centers. COGNEX VisionPro Deep Learning is a Deep Learning vision software from COGNEX Corporation (headquarters: Natick, MA, United States). It is a field-tested, optimized, and reliable software solution based on a state-of-the-art set of machine learning algorithms, combining a comprehensive machine vision tool library with advanced Deep Learning tools. In this study, we used the latest version, VisionPro Deep Learning 1.0, to classify images as Normal, Non-COVID-19 (Pneumonia), or COVID-19 chest X-rays, and compared the results with various state-of-the-art open-source neural networks. The VisionPro Deep Learning GUI, also called COGNEX Deep Learning Studio, has three tools, for image classification, segmentation, and location, with various Deep Learning architectures built into the GUI to carry out specific tasks:

1) Green Tool - the Classify tool. It is used to classify objects or complete scenes, for example defects, cell types, images with different labels, or different types of test tubes used in laboratories. The Green tool learns from a collection of labeled images of different classes and can then classify images it has not seen previously. This tool is similar to classification neural networks such as VGG [14], ResNet [15], or DenseNet [16].

2) Red Tool - the Analyze tool. It is used for segmentation and defect/anomaly detection, for example to detect anomalies in blood samples (clots), incomplete or improper centrifugation, or for sample quality management. The Red tool is also used to segment specific regions, such as defects or areas of interest, and offers either Supervised or Unsupervised Learning for segmentation and detection. It is similar to segmentation neural networks such as U-Net [17].

3) Blue Tool - a) the Feature Localization and Identification tool. The Blue tool finds complex features and objects by learning from labeled images. Its self-learning algorithms can locate, classify, and count objects in an image, and it can be used for locating organs in X-ray images or cells on a microscope slide. b) The Blue tool also has a Read feature: a pretrained model that deciphers severely deformed and poorly etched words and codes using optical character recognition (OCR). This is the only pretrained tool; all the other tools must first be trained on images.

For the classification of COVID-19 images, two settings are used: 1) the Green tool for classification of the entire chest X-ray images, and 2) the Red tool for segmentation of the lungs, followed by a Green tool classifier running only on the segmented lungs, to make sure the Deep Learning software bases its predictions on the lungs alone.

The open-access benchmark dataset COVIDx is used for training the various models [18]. The dataset contains a total of 13,975 chest X-ray images from 13,870 patients and is a combination of five publicly available datasets. According to the authors of COVID-Net [18], COVIDx is one of the largest open-source benchmark datasets in terms of the number of COVID-19-positive patient cases.
These five datasets were used by the authors of COVID-Net to generate the final COVIDx dataset: a) non-COVID-19 pneumonia patient cases and COVID-19 cases from the COVID-19 Image Data Collection [19], b) COVID-19 patient cases from the Figure 1 COVID-19 Chest X-ray Dataset [20], c) COVID-19 patient cases from the ActualMed COVID-19 Chest X-ray Dataset [21], d) patient cases with no pneumonia (that is, normal) and non-COVID-19 pneumonia patient cases from the RSNA Pneumonia Detection Challenge dataset [22], and e) COVID-19 patient cases from the COVID-19 Radiography Dataset [23]. The idea behind using these five datasets is that they are all open-source COVID-19/pneumonia chest X-ray datasets, accessible to everyone in the research community and the general public, and together they add variety to the dataset. However, the scarcity of COVID-19 chest X-ray images made the dataset highly imbalanced. Of the total 13,975 images, 13,675 were used for training and the remaining 300 as test images. The data is divided across three classes: 1. Normal (X-rays containing neither pneumonia nor COVID-19), 2. Non-COVID-19/Pneumonia (X-rays with some form of bacterial or viral pneumonia, but not COVID-19), and 3. COVID-19 (X-rays that were COVID-19 positive). The training set of 13,675 images contains 7,966 Normal images, 5,451 Non-COVID-19/Pneumonia images, and only 258 COVID-19 images. The test set is balanced, with 100 images in each of the three classes [18]. The authors of COVID-Net have shared the dataset-generating scripts for constructing the COVIDx dataset at https://github.com/lindawangg/COVID-Net [18]. The python notebook 'create_COVIDx_v3.ipynb' generates the dataset, and the text files 'train_COVIDx3.txt' and 'test_COVIDx3.txt' contain the file names used in the training and test sets, respectively. The dataset is then used with VisionPro Deep Learning, and the results are compared with the COVID-Net results and with open-source Convolutional Neural Network (CNN) architectures such as VGG [14] and DenseNet [16]. The TensorFlow [34] library (developed by the Google Brain team [35]) is used to build and train the open-source CNN architectures. The COVIDx scripts merge the five datasets and separate the images into train and test folders; along with the images, they also generate two text files containing the names of the images in the train and test folders and their class labels [18]. To simplify classification, a python script converts the '.txt' files into 'pandas' data frames and finally into '.csv' files for easier inspection. Another python script then renames all the X-ray images of the train and test folders according to their class labels and stores them in new train and test directories. Since the goal is classification of the X-ray images, renaming them makes it possible to read the class directly from the file name rather than consulting a '.csv' file every time. Finally, all 13,975 images sit in train and test folders with their file names containing the class labels; a minimal sketch of this bookkeeping step is shown below.
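The internals of the COVID-Net scripts are not described in the text, so the following is only a hypothetical sketch of the conversion-and-renaming step: the split-file column layout and the helper name prepare_split are assumptions, not part of the published scripts.

```python
# Hypothetical sketch: load a COVIDx split file into pandas and rename the
# images by class label. The column layout is an assumption; adjust the
# column names to the actual file format.
import os
import shutil
import pandas as pd

def prepare_split(txt_path, src_dir, dst_dir):
    # Assumed layout: patient_id, filename, class, source (whitespace-separated)
    df = pd.read_csv(txt_path, sep=r"\s+", header=None,
                     names=["patient_id", "filename", "class", "source"])
    df.to_csv(txt_path.replace(".txt", ".csv"), index=False)  # easier to inspect
    os.makedirs(dst_dir, exist_ok=True)
    for i, row in df.iterrows():
        # Prefix each file with its label, e.g. 'COVID-19_00012_<original name>'
        new_name = f"{row['class']}_{i:05d}_{row['filename']}"
        shutil.copy(os.path.join(src_dir, row["filename"]),
                    os.path.join(dst_dir, new_name))
    return df

# Example: train_df = prepare_split("train_COVIDx3.txt", "train", "train_renamed")
```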
Unlike most other Deep Learning architectures, VisionPro Deep Learning does not need any preprocessing of the images. The images can be fed directly into the GUI, and the software preprocesses them automatically before training starts. Since the COVIDx dataset is a combination of several datasets, the images have different color depths, and the VisionPro Deep Learning GUI found 326 anomalous images. Training could have proceeded with the anomalous images in the dataset, but that might have reduced the overall F-score of the model. Therefore, we normalize the color depth of all COVIDx images to 24-bit using an external open-source tool, IrfanView (irfanview.com), and then add the images to the VisionPro Deep Learning GUI. No other preprocessing steps are necessary with VisionPro Deep Learning; image augmentation, class weights, and oversampling of the imbalanced classes, all of which are required for training the open-source CNN models, can be skipped. Once the images are loaded into the VisionPro Deep Learning GUI, training can begin.

Before training the CNN models such as VGG [14] or DenseNet [16], several preprocessing steps are necessary: resizing, artificial oversampling of the classes with fewer images, image standardization, and data augmentation. First, the images are resized to 256x256 pixels; the entire training is done on an Nvidia 2080 GPU, and this is the image size found to avoid GPU memory errors. Once the images are resized and loaded together with their labels, the classes with fewer images, Non-COVID-19 and COVID-19, are oversampled. For oversampling, random artificial augmentations are applied: rotation (-20 to +20 degrees), translation, horizontal flips, Gaussian blur, and added noise, all chosen randomly using the 'random' library in python. All X-ray images are then standardized to zero mean and unit standard deviation, since standardization helps a Deep Learning network learn much faster. Finally, data augmentation is applied to all classes, irrespective of the number of images they contain: rescaling, height and width shifts, rotation, shearing, and zooming. After these preprocessing steps the images are ready to be fed into the deep neural networks; a minimal sketch of the oversampling, standardization, and augmentation steps follows below.
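A hedged sketch of the preprocessing just described; the translation and noise magnitudes and the ImageDataGenerator parameter values are not stated in the text, so the numbers below are illustrative, and the Gaussian blur step is omitted for brevity.

```python
# Hedged sketch of the oversampling augmentations and standardization
# described above; magnitudes are illustrative, blur omitted for brevity.
import random
import numpy as np
from tensorflow.keras.preprocessing.image import (ImageDataGenerator,
                                                  apply_affine_transform)

def random_oversample_augment(image):
    """image: float32 numpy array of shape (H, W, C)."""
    if random.random() < 0.5:                                 # horizontal flip
        image = np.ascontiguousarray(image[:, ::-1, :])
    image = apply_affine_transform(image,
                                   theta=random.uniform(-20, 20),  # rotation
                                   tx=random.uniform(-10, 10),     # translation
                                   ty=random.uniform(-10, 10))
    return image + np.random.normal(0.0, 5.0, image.shape)         # added noise

def standardize(image):
    """Zero mean, unit standard deviation, as described in the text."""
    image = np.asarray(image, dtype=np.float32)
    return (image - image.mean()) / (image.std() + 1e-7)

# Augmentation applied to all classes during training (values illustrative)
train_datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=20,
                                   width_shift_range=0.1, height_shift_range=0.1,
                                   shear_range=0.1, zoom_range=0.1)
```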
The goal of the study is the classification of Normal, Non-COVID-19 (Pneumonia), and COVID-19 X-ray images. For classification, VisionPro Deep Learning uses the Green tool. Once the images are loaded and labeled, they are ready for training. In VisionPro Deep Learning, a Region of Interest (ROI) can be selected in each image, which makes it possible to shrink the edges by 10-20% to remove artifacts such as letters or borders, which usually sit at the edges of the images. In this case, the entire images are used without cropping, because many images have the lungs extending toward the edges, and we did not want to remove essential information. Images do not need to be resized for VisionPro Deep Learning; images of any resolution and aspect ratio can be fed into the GUI, which preprocesses them automatically before training. The Green tool has two subcategories, High-detail and Focused; under High-detail, several model sizes (Small, Normal, Large, and Extra-Large) can be selected for training. We train the network using the High-detail subcategory with the 'Normal' model size. Of the 13,675 training images, 80% are used for training; VisionPro Deep Learning automatically selects the remaining 20% for validation, with both sets chosen at random, so the user only needs to specify the train-validation split. The maximum epoch count is set to 100; options for a minimum epoch count and patience exist but are left unset. Training is then started by clicking the 'brain' icon on the Green tool, as seen in Figure 3.

Figure 3: The VisionPro Deep Learning GUI loaded with the X-ray images from the COVIDx dataset [18]. On the left of the GUI are the parameters for training the model: model type, model size, epoch count, minimum epochs and patience, train-validation split, class weights, threshold, heat map, and the data augmentation options (flip, rotation, contrast, zoom, brightness, sharpen, blur, distortion, and noise). The selected image is shown in the middle, thumbnails of all training and test images on the right, and the tool selection at the top; here the Green tool is selected for classification. Clicking the 'brain' icon in the Green tool starts training.

The Green tool classifies entire X-ray images, but for the detection of COVID-19 the Deep Learning model needs to focus on the lungs, not on the peripheral bones, organs, and soft tissues. The model must make its predictions exclusively from the lungs and not from differences in the spinous processes, clavicles, soft tissues, ornaments worn around the patient's neck, or even the background; only then can we be sure the model is classifying based entirely on normal versus infected lungs. Therefore, segmenting the lungs in each image ensures the model trains only on the segmented lungs rather than the entire image. To implement this, the VisionPro Deep Learning Red tool is used to segment the images so that only the lungs are visible to the Deep Learning model during training. To achieve this, 100 images from the training set are manually masked using the 'Region selection' option in the Red tool; although the training set contains 13,675 images, manually masking 100 of them is enough to train the Red tool. Once trained, it masks all training and test images so that only the lungs are visible; anything outside the lungs is treated as outside the ROI and is not used in classification. The Red tool is added in the same environment as the previous Green tool, so there is no need to create a new instance for segmentation. Once all the images are segmented, a Green classification tool is placed after the Red tool and used for classification (similar to Step 3 of Methods), this time exclusively on the segmented lungs rather than on the entire images. An open-source analogue of this two-stage pipeline is sketched below.
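VisionPro Deep Learning's Red and Green tools are proprietary, so the sketch below is only an open-source analogue of the same idea: a segmentation model produces a binary lung mask that zeroes out everything outside the lungs before classification. Both lung_segmenter and classifier are placeholder Keras models, not part of VisionPro.

```python
# Hypothetical open-source analogue of the Red (segment) -> Green (classify)
# pipeline; 'lung_segmenter' and 'classifier' are placeholder trained models.
import numpy as np

def classify_on_segmented_lungs(image, lung_segmenter, classifier):
    """image: float32 array of shape (H, W, 1), scaled to [0, 1]."""
    batch = image[np.newaxis, ...]
    # Stage 1 ("Red tool" analogue): per-pixel lung probability map
    mask = lung_segmenter.predict(batch)[0]        # shape (H, W, 1)
    mask = (mask > 0.5).astype(np.float32)         # binarize the mask
    # Everything outside the lungs is zeroed out, i.e. outside the ROI
    lungs_only = batch * mask[np.newaxis, ...]
    # Stage 2 ("Green tool" analogue): classify using the lungs alone
    probs = classifier.predict(lungs_only)[0]      # [Normal, Pneumonia, COVID-19]
    return probs
```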
The VGG [14] network is a deep neural network and still one of the state-of-the-art Deep Learning models for image classification. We use the 19-layer VGG19 model, trained with transfer learning on the COVIDx dataset. VGG takes an input image of 224x224 pixels; preprocessing is done automatically by calling 'preprocess_input' from the VGG19 module in TensorFlow and passing it to the 'ImageDataGenerator' in TensorFlow (Keras). ImageNet weights are used for training. The COVIDx dataset is also resampled as stated in part 2(b) of the Methods section, so that all classes contain a similar number of images and the model does not favor any particular class during training. The VGG19 architecture uses 3x3 convolutional filters, which perform much better than the older AlexNet [24] models, and all hidden layers use the ReLU (Rectified Linear Unit) activation function [25]. After the VGG backbone we add four fully connected layers with 1024 nodes each, all using ReLU activations and L2 regularization [26][27]; to provide better regularization, each of these layers is followed by Dropout. The final layer is a fully connected layer with 3 nodes, one per class, using the 'SoftMax' activation function [28]. In the preprocessing steps the labels are kept as three distinct integers rather than one-hot encoded, so 'sparse categorical cross entropy' is used as the loss function instead of the 'categorical cross entropy' [29] commonly used with one-hot labels. The 'Adam' [30] optimizer is used with learning-rate scheduling, so that the learning rate decreases every thirty epochs. During training, several callbacks are set: the model is saved each time the validation loss decreases, and early stopping halts training when the validation loss no longer improves after several epochs. The epoch count is set to 100, and the model is fed batches of 32 images. Once these hyperparameters are set, training is started; after completion, the program plots the confusion matrix and reports the evaluation metrics on which the models are compared. A hedged sketch of this setup is given below.
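A minimal sketch of the transfer-learning setup just described, assuming the renamed images are organized into class subfolders; the dropout rate, initial learning rate, and directory names are assumptions, since the text does not state them.

```python
# Hedged sketch of the VGG19 transfer-learning setup described in the text.
# Directory layout, dropout rate, and initial learning rate are assumptions.
import tensorflow as tf
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras import layers, regularizers

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

x = layers.Flatten()(base.output)
for _ in range(4):  # four fully connected layers of 1024 nodes, L2 + Dropout
    x = layers.Dense(1024, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.Dropout(0.5)(x)
out = layers.Dense(3, activation="softmax")(x)  # Normal / Pneumonia / COVID-19

model = tf.keras.Model(base.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Learning-rate drop every 30 epochs, plus the callbacks mentioned in the text
def schedule(epoch, lr):
    return lr * 0.1 if epoch > 0 and epoch % 30 == 0 else lr

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(schedule),
    tf.keras.callbacks.ModelCheckpoint("best.h5", monitor="val_loss",
                                       save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocess_input, validation_split=0.2)
train_gen = datagen.flow_from_directory("train", target_size=(224, 224),
                                        class_mode="sparse", batch_size=32,
                                        subset="training")
val_gen = datagen.flow_from_directory("train", target_size=(224, 224),
                                      class_mode="sparse", batch_size=32,
                                      subset="validation")
# model.fit(train_gen, validation_data=val_gen, epochs=100, callbacks=callbacks)
```

Swapping the VGG19 base (and its matching preprocess_input) for ResNet50V2, DenseNet121, or InceptionV3 yields the other baselines discussed below, since all remaining hyperparameters are kept the same.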
Figure 6: Residual learning: a building block. Image from the original ResNet paper [15].

One of the bottlenecks of the VGG network is that it cannot go very deep, because it starts losing generalization capability as depth increases. To overcome this problem, ResNet, or Residual Network [15], is chosen. The ResNet architecture consists of several residual blocks, each containing several convolutional operations. The skip connections shown in Figure 6 are what make ResNet better than VGG: they add the outputs of earlier layers to the outputs of the stacked layers, which allows deeper networks to be trained and addresses the vanishing gradient problem [31]. For the COVIDx dataset we use the 50-layer ResNet50V2 (version 2) architecture, trained with transfer learning, followed by eight fully connected layers with L2 regularization and Dropouts for better regularization. All other settings and hyperparameters are kept the same as for the VGG19 network (Methods, part 5).

Figure 7: A 5-layer dense block; each layer takes all preceding feature maps as input. Image from the original DenseNet paper [16].

DenseNet (Dense Convolutional Network) [16] is an architecture that makes Deep Learning networks go even deeper while remaining efficient to train, by using shorter connections between the layers. DenseNet is a convolutional neural network in which each layer is connected to every deeper layer: the first layer is connected to the 2nd, 3rd, 4th, and so on, and the second layer to the 3rd, 4th, 5th, and so on. Unlike ResNet [15], it does not combine features by summation but by concatenation, so the i-th layer receives as input the feature maps of all preceding convolutional blocks. DenseNet hence requires fewer parameters than traditional convolutional neural networks. For the COVIDx dataset we use the 121-layer DenseNet121 architecture, trained with transfer learning, followed by eight fully connected layers with L2 regularization and Dropouts for better regularization. All other settings and hyperparameters are kept the same as for the VGG19 network (Methods, part 5).

The Inception network [32] was developed with the idea of going even deeper with convolutional blocks. Very deep networks are prone to overfitting, it is hard to pass gradient updates through the entire network, and because relevant structures can vary greatly in size across images, choosing the right kernel size for the convolution layers is difficult. The Inception network addresses these problems: version 1 places filters of multiple sizes at the same level, with parallel 1x1, 3x3, and 5x5 filters plus max pooling inside a single Inception module, whose outputs are concatenated and passed to the next module. For the COVIDx dataset we use the 48-layer InceptionV3 [33] architecture, which adds 7x7 convolutions, Batch Normalization, and label smoothing to the version 1 modules. We again train with transfer learning, add eight fully connected layers with L2 regularization followed by Dropouts, and keep all other settings and hyperparameters the same as for the VGG19 network (Methods, part 5).

In medical imaging, decisions are of high impact, so it is very important to understand exactly which evaluation metrics decide whether a model is suitable for patients. Accuracy alone is not the best such metric; it is important to also look at other evaluation metrics such as sensitivity, predictive values, and overall F-scores, all of which can be read directly off a confusion matrix, as sketched below.
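A minimal sketch of these per-class metrics computed from a confusion matrix; the choice of scikit-learn here is ours, as the paper does not name its evaluation code.

```python
# Hedged sketch: per-class sensitivity (recall), PPV (precision), and F-score
# from a confusion matrix; scikit-learn is an assumed choice of library.
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred,
                      classes=("Normal", "Non-COVID-19", "COVID-19")):
    cm = confusion_matrix(y_true, y_pred, labels=range(len(classes)))
    for i, name in enumerate(classes):
        tp = cm[i, i]
        sensitivity = tp / cm[i, :].sum()   # TP / (TP + FN), row = true class
        ppv = tp / cm[:, i].sum()           # TP / (TP + FP), column = predicted
        f_score = 2 * sensitivity * ppv / (sensitivity + ppv)
        print(f"{name}: sensitivity={sensitivity:.3f}, "
              f"PPV={ppv:.3f}, F-score={f_score:.3f}")
    return cm
```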
First, the confusion matrix is plotted on the 300 test images for every model in the comparison. A report can also be generated on all the test images, as seen in Figure 10, which shows a small snippet of 6 COVID-19-positive images from the test set. The report contains details of the 300 test images, including the file name, the image, the original label ('Labeled'), and the prediction made by VisionPro Deep Learning ('Marked'), together with the confidence of the prediction for each class; any prediction that differs from the label is marked in red.

Misclassification results: of the 300 test images, VisionPro Deep Learning classifies 18 images incorrectly with the entire image as the ROI and 16 images incorrectly with the segmented lungs. COVID-Net classified 20 images incorrectly [18], while the VGG19 [14], ResNet50V2 [15], DenseNet121 [16], and InceptionV3 [33] networks make 47, 37, 41, and 26 misclassifications, respectively. VisionPro Deep Learning thus has fewer misclassifications than all the open-source models in both settings; compared with COVID-Net [18], its performance is similar with the entire image as the ROI and better with the segmented lungs.

Heatmaps are a great way to visualize the predictions of a Deep Learning algorithm: they point out exactly which parts of the image trigger the model's predictions. Figure 12 shows the heatmaps generated by VisionPro Deep Learning on four COVID-19 images; the arrows on the X-rays indicate the most infected parts of the lungs. The heatmaps clearly show that VisionPro Deep Learning generates its results from exactly these positions in the lungs, indicating that the predictions are based on the actual abnormalities rather than on artifacts. Among the open-source architectures, InceptionV3 gives the best results, followed by ResNet50V2 [15]. The confusion matrices for the open-source architectures and for COVID-Net [18] are shown in Figures 13 and 14, respectively.

Figure 13: Confusion matrices for a: VGG19 [14], b: ResNet50V2 [15], c: DenseNet121 [16], d: InceptionV3 [33]. InceptionV3 has the best results, with the fewest false predictions, followed by ResNet50V2, DenseNet121, and VGG19.

Figure 14: Confusion matrix on the 300 test images for COVID-Net; image from the original COVID-Net paper [18]. COVID-Net's results are better than those of all the open-source models we trained.

A confidence interval is a range of values within which we are fairly sure the true value lies. Because the test set contains only 100 images per class, we see wide confidence intervals in most cases, both for the open-source models and for VisionPro Deep Learning. The best way to narrow the confidence interval is to increase the number of test images into the thousands rather than the hundreds; however, since COVID-19 images are scarce and we want a one-to-one comparison with the COVID-Net results [18], we use the same test set as provided with the COVIDx dataset. We compute a 95% confidence interval on the predicted sensitivity and positive predictive values to estimate the range by which the true results may vary on the given test data. The confidence interval of an accuracy rate is calculated as

r = z * sqrt(accuracy * (1 - accuracy) / N),

where z is the significance level of the confidence interval (the number of standard deviations of the Gaussian distribution), accuracy is the estimated quantity (in our case sensitivity, positive predictive value, or F-score), and N denotes the number of test samples for that class (100 per class). We use a 95% confidence interval, for which the corresponding value of z is 1.96 [35]. A quick numeric check of this formula is sketched below.
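A quick numeric check of the formula (the ci_half_width helper is ours, not from the paper):

```python
# 95% confidence interval half-width for a proportion (z = 1.96)
import math

def ci_half_width(accuracy, n, z=1.96):
    return z * math.sqrt(accuracy * (1 - accuracy) / n)

print(ci_half_width(0.98, 100))  # ~0.027, i.e. roughly +/- 2.7 points
```

The result, roughly ±2.7 percentage points, matches the interval reported below for the 98.0% Normal-class sensitivity.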
Sensitivity, or Recall, measures the true positive rate: the proportion of true positives detected by a model out of the total number of positives. The better the sensitivity, the better the model is at correctly identifying the infection. DenseNet121 [16] has the best sensitivity for Normal images, VisionPro Deep Learning for COVID-19 images, and COVID-Net [18] for Non-COVID-19 images. Though not the best everywhere, VisionPro Deep Learning still has a very high sensitivity of 98.0% for Normal images in both settings, and a good sensitivity of 91.0% for Non-COVID-19 images with the lungs segmented. The confidence intervals are less impressive: a decent ±2.7% for Normal images, but a wide ±5.6% for Non-COVID-19 images.

Positive Predictive Value (PPV), or Precision, shows what percentage of the model's positive predictions are relevant. For the Normal and COVID-19 classes, VisionPro Deep Learning significantly outperforms all other models, as seen in Table 2a; for the Non-COVID-19 class, COVID-Net [18] has the best results. VisionPro Deep Learning has a very good PPV for COVID-19 images: 96.9% with the entire image as the ROI and 97.0% with the lungs segmented. COVID-Net has a particularly tight confidence interval for COVID-19 images.

The F-score takes both Sensitivity and PPV into consideration, F = 2 * (PPV * Sensitivity) / (PPV + Sensitivity), and can be considered an overall score of a model's performance. VisionPro Deep Learning has the best F-scores on COVID-19 images in both settings: 96.0% with the entire image as the ROI and 97.0% with the segmented lungs. Overall, across all three classes, VisionPro Deep Learning achieves an F-score of 94.0% on the entire image as the ROI and 95.3% on the segmented lungs. The similarity of the results in the two settings, together with the heatmaps, shows that even without the lungs being segmented, VisionPro Deep Learning predicts its classes from the actual abnormalities. As expected, when comparing the confidence intervals, none of the models performs well, owing to the small number of test images in each class.

On a larger test set (Figure 15), the significantly higher number of images in the Normal and Non-COVID-19 classes improves the confidence interval from the previous 3%-5% range to just 1.0%-2.4%. These results clearly indicate that as the number of test images increases, the confidence interval improves significantly. Since this dataset has only 91 images in the COVID-19 class, its confidence interval remains similar to the previous results. Table 4 also shows that even when the number of test images is significantly increased, the performance of VisionPro Deep Learning does not fall; it still produces sensitivity, PPV, and F-scores above 90% in all classes. Comparing Table 4 with the previous results shows that VisionPro Deep Learning is very consistent even when the number of images in the train and test sets changes. The sensitivity, PPV, and F-scores are also very similar with the entire image as the ROI and with the segmented lungs, further indicating that the predictions are based on the lungs and not on the surrounding artifacts.

In this study we use COGNEX's Deep Learning software, VisionPro Deep Learning (version 1.0), and compare its performance with other state-of-the-art Deep Learning architectures. VisionPro Deep Learning has an intuitive GUI that makes the software very easy to use; building applications requires no coding skills in any programming language. Little to no preprocessing is required, which further reduces development time, and imbalanced data is automatically balanced within the software.
Once the images are loaded into VisionPro Deep Learning and the right tool is selected, training can start. After training completes, the software outputs a confusion matrix along with the important metrics: precision, recall, and F-score. Additionally, a report can be generated that identifies all misclassified images. This makes it particularly suitable for radiologists, hospitals, and researchers who want to harness the power of Deep Learning without advanced coding knowledge. Moreover, as the results of this study indicate, the Deep Learning algorithms in VisionPro Deep Learning are robust and comparable to, or better than, the various state-of-the-art algorithms available today. The problem of Deep Learning algorithms being a "black box" can be mitigated by stacking tools sequentially in a pipeline: first segment the lungs, then classify only the segmented lungs, similar to combining a U-Net [17] and an Inception [32] model. This ensures the algorithm does not focus on artifacts when generating its classification results, and a heatmap can be generated to show exactly where the model focuses while making its predictions. In both settings, using the entire image as the Region of Interest and classifying on the segmented lungs, VisionPro Deep Learning achieves the highest overall F-scores, surpassing the results of the various open-source architectures. In the future, more testing will be done to understand how changing the number of training images or using augmentations in the training set affects the performance of VisionPro Deep Learning compared with the other open-source models. This software is by no means a stand-alone solution for detecting COVID-19 from chest X-ray images, but it can aid radiologists and clinicians in reaching a faster and more interpretable diagnosis, using the full potential of Deep Learning without the prerequisite of coding in any programming language.

We would like to thank COGNEX for providing their latest Deep Learning software for testing, and the University of Waterloo, along with Darwin AI, for collecting and merging the X-ray images from various sources and for providing the python scripts to generate the COVIDx dataset.

Author contributions: Arjun Sarkar wrote the manuscript. Arjun Sarkar, Joerg Vandenhirtz, Jozsef Nagy, and David Bacsa conducted the experiment. Arjun Sarkar, Joerg Vandenhirtz, Jozsef Nagy, David Bacsa, and Mitchell Riley analyzed the results. All authors reviewed the manuscript. All authors are affiliated with COGNEX.
References:
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
Imaging profile of the COVID-19 infection: Radiologic findings and literature review
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Clinical characteristics of coronavirus disease 2019 in China
COVIDAID: COVID-19 Detection Using Chest X-Ray
Squire's Fundamentals of Radiology
Pathogenesis of COVID-19 from a cell biologic perspective
SARS-CoV replicates in primary human alveolar type II cell cultures but not in type I-like cells
Influenza A viruses target type II pneumocytes in the human lung
Chest CT Findings in Patients with Corona Virus Disease 2019 and its Relationship with Clinical Features
Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review
Very Deep Convolutional Networks for Large-Scale Image Recognition
Deep Residual Learning for Image Recognition
Densely Connected Convolutional Networks
U-Net: Convolutional Networks for Biomedical Image Segmentation
COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images
COVID-19 Image Data Collection
Figure 1 COVID-19 Chest X-ray Data Initiative
ActualMed COVID-19 Chest X-ray Data Initiative
RSNA Pneumonia Detection Challenge
Can AI help in screening Viral and COVID-19 pneumonia?
ImageNet Classification with Deep Convolutional Neural Networks
Deep Learning using Rectified Linear Units (ReLU)
L2 Regularization for Learning Kernels
L2 Regularization versus Batch and Weight Normalization
Deep Learning
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
Residual Networks Behave Like Ensembles of Relatively Shallow Networks
Going Deeper with Convolutions
Rethinking the Inception Architecture for Computer Vision
TensorFlow: Large-scale machine learning on heterogeneous systems
Deep-COVID: Predicting COVID-19 From Chest X-Ray Images Using Deep Transfer Learning