key: cord-0802605-qrivd52o
authors: Zokaeinikoo, M.; Mitra, P.; Kumara, S.; Kazemian, P.
title: AIDCOV: An Interpretable Artificial Intelligence Model for Detection of COVID-19 from Chest Radiography Images
date: 2020-05-25
journal: nan
DOI: 10.1101/2020.05.24.20111922
sha: 46164196fd5b762921e338351d7ad283aa811d8c
doc_id: 802605
cord_uid: qrivd52o
* Correspondence to pooyan.kazemian@mgh.harvard.edu or pooyan.kazemian@gmail.com

As the Coronavirus Disease 2019 (COVID-19) pandemic continues to grow globally, testing to detect COVID-19 and isolating individuals who test positive remains the primary strategy for preventing community spread of the disease. The current gold standard method of testing for COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) test. The RT-PCR test, however, has imperfect sensitivity (around 70%), is time-consuming and labor-intensive, and is in short supply, particularly in resource-limited countries. Therefore, automatic and accurate detection of COVID-19 using medical imaging modalities such as chest X-ray and Computed Tomography, which are more widely available and accessible, can be beneficial. We develop a novel hierarchical attention neural network model to classify chest radiography images as belonging to a person with either COVID-19, another infection, or no pneumonia (i.e., normal). We refer to this model as Artificial Intelligence for Detection of COVID-19 (AIDCOV). The hierarchical structure in AIDCOV captures the dependency of features and improves model performance, while the attention mechanism makes the model interpretable and transparent. Using a publicly available dataset of 5801 chest images, we demonstrate that our model achieves a mean cross-validation accuracy of 97.8%. AIDCOV has a sensitivity of 99.3%, a specificity of 99.98%, and a positive predictive value of 99.6% in detecting COVID-19 from chest radiography images. AIDCOV can be used in conjunction with or instead of RT-PCR testing (where RT-PCR testing is unavailable) to detect and isolate individuals with COVID-19 and prevent onward transmission to the general population and healthcare workers.

The outbreak of the novel coronavirus known as Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) started in Wuhan, China, in December 2019 and rapidly spread worldwide. As the number of people with Coronavirus Disease 2019 escalates in the United States and around the world, reducing the number of transmissions from infected individuals to the general population and healthcare workers becomes increasingly important and challenging. Although about 8 in 10 people who contract the SARS-CoV-2 virus remain asymptomatic or develop only mild to moderate symptoms (Mizumoto et al. 2020; Arons et al. 2020; Wu and McGoogan 2020; Yang et al. 2020), others may develop life-threatening conditions, such as dyspnea, pneumonia, or severe acute respiratory syndrome, which require hospital or ICU care with supplemental oxygen or mechanical ventilation (Mahase 2020; WHO). The rapid spread of COVID-19 is, in part, due to the lack of sufficient testing and isolation of positive cases, which subsequently leads to community transmission from undiagnosed cases. This rapid spread may overwhelm and collapse healthcare systems, even in developed countries, due to the surge in demand for hospital and ICU care.
A critical step in controlling transmission and flattening the curve is to use a widely available, fast, and accurate COVID-19 detection method, and to immediately isolate diagnosed cases until they are no longer infectious. The current gold standard screening method for COVID-19 is the direct detection of SARS-CoV-2 RNA by the reverse transcription polymerase chain reaction (RT-PCR) test (Patel, Jernigan, and others 2020). Several RT-PCR assays are used in the U.S. and around the world. Each has different performance characteristics and turnaround time (ranging from minutes to several hours) and requires different specimen types (Organization and others 2020). The sensitivity of RT-PCR testing is widely variable; depending on the assay, the type and quality of the specimen obtained, the stage of the disease, and the duration of infection, it can range from 32% to 73% (Wang et al. 2020a; Kucirka et al. 2020; Guo et al. 2020). Therefore, there is an immediate need for accessible, rapid, and accurate testing tools to help combat the spread of the SARS-CoV-2 virus.

Medical imaging modalities such as Chest X-Ray (CXR) and Computed Tomography (CT) can be used as an alternative to RT-PCR testing to detect characteristic signs of COVID-19 in patients' chest images (Ng et al. 2020). Detecting COVID-19 from chest radiography images has shown promising results and higher sensitivity compared to RT-PCR testing. Moreover, CT images of patients with COVID-19 may show abnormalities before the patient develops symptoms and before the detection of viral RNA from upper respiratory specimens (Sutton et al. 2020; Zhao et al. 2020). However, visual evaluation of radiography images to find subtle signs of COVID-19 is both fallible and time-consuming. In this context, artificial intelligence (AI) methods can be leveraged to automatically analyze medical images for subtle signs of SARS-CoV-2 infection and to detect COVID-19 rapidly and accurately.

In this study, we develop a new deep learning model to automatically detect COVID-19 using CXR and CT scan images. We refer to this model as Artificial Intelligence for Detection of COVID-19 (AIDCOV). AIDCOV employs a novel two-level hierarchical attention structure, which can show clinicians the specific locations of the lung affected by SARS-CoV-2 infection rapidly and with high sensitivity and specificity, and which classifies chest radiography images into one of three classes: COVID-19 viral infection, other viral/bacterial infection (i.e., non-COVID-19 infection), or normal (i.e., no infection). This hierarchical structure enables the model to capture the dependency of features extracted from chest images via a pre-trained network (e.g., VGG-16) in both the horizontal and vertical directions and helps improve model performance. The attention mechanism makes the black-box deep neural network model interpretable such that the model can designate the specific locations of patients' lungs that manifest subtle signs of infection. AIDCOV is an end-to-end deep neural network model, which does not require any feature engineering.

Deep neural network models often include hundreds of thousands of parameters, and thus they need to be trained on very large datasets. Transfer learning is a technique that allows training deep neural network models on small datasets by taking a pre-trained deep neural network model and repurposing it for a different task.
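To make this transfer-learning recipe concrete, the following is a minimal sketch, assuming TensorFlow/Keras (the paper does not name its deep learning framework, and the global-pooling classification head below is a placeholder of our own, not the AIDCOV head):

```python
# Transfer learning sketch (assumed framework: TensorFlow/Keras).
# The pre-trained convolutional base is frozen; only a small
# task-specific head would be trained on the chest-image dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(
    weights="imagenet",       # reuse features learned on ImageNet
    include_top=False,        # drop the 3 fully connected layers
    input_shape=(160, 160, 3)
)
base.trainable = False        # freeze the pre-trained convolutional weights

# Placeholder head: global pooling + softmax over the 3 classes.
# AIDCOV replaces this head with the hierarchical attention structure
# described in the next section.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(3, activation="softmax"),
])
```

With a 160 × 160 × 3 input, the frozen convolutional base emits a 5 × 5 × 512 feature grid; AIDCOV replaces the placeholder head above with the hierarchical attention structure described next.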
We leverage the powerful idea of transfer learning by employing VGG-16 (Simonyan and Zisserman 2014), a convolutional neural network pre-trained on a dataset of more than 15 million images (Krizhevsky, Sutskever, and Hinton 2012). VGG-16 has shown promising performance for medical image analysis (Yadav and Jadhav 2019; Shen et al. 2019; Guan et al. 2019). VGG-16 has 13 convolutional layers, 5 pooling layers, and 3 fully connected layers. We removed the 3 fully connected layers and replaced them with our novel hierarchical attention structure. While the early layers of VGG-16 learn low-level features of the image, our hierarchical attention model learns subtle signs of COVID-19 and other viral/bacterial infections and determines the final classification.

In this section, we describe our novel hierarchical attention structure for image classification, which considers the dependencies of feature components in both the horizontal (width) and vertical (height) directions.

Step 1: Resizing the image. First, we resize the input chest radiography images to a format compatible with the pre-trained VGG-16 model. We resized all input images to 160 × 160 × 3.

Step 2: Feature extraction. We use VGG-16 to obtain a low-dimensional feature representation for each resized image (Figure 1). In general, if the input image is of size (A, B, 3), the output of the VGG-16 model is a tensor of size (A/32, B/32, 512). In our case, since the input image is of size (160, 160, 3), the output of the model is of size (5, 5, 512). We denote this output by X, which is obtained as

$X = f_{\text{VGG}}(C; W_{\text{VGG}}, b_{\text{VGG}}),$

where C is the resized radiography image, and $W_{\text{VGG}}$ and $b_{\text{VGG}}$ are the trained weight and bias matrices obtained from the pre-trained VGG-16 model. As mentioned above, X is of size (5, 5, 512). We refer to each (1, 1, 512) block of X as $x_{ij}$, where $i \in [1, 5]$ and $j \in [1, 5]$ (Figure 2).

Step 3: Horizontal feature encoding. Next, for each row $i \in [1, 5]$, we encode the output blocks along the x axis (i.e., horizontally). To do so, we apply a GRU-BRNN on $x_{i1}, \ldots, x_{i5}$ for each row i to incorporate horizontal dependencies (in both the forward and backward directions) within each row of the image:

$\overrightarrow{h}_{ij} = \overrightarrow{\text{GRU}}(x_{ij}), \quad \overleftarrow{h}_{ij} = \overleftarrow{\text{GRU}}(x_{ij}), \quad h_{ij} = [\overrightarrow{h}_{ij}; \overleftarrow{h}_{ij}], \quad u_{ij} = \tanh(W_i h_{ij} + b_i),$

where $h_{ij}$ is the output of the GRU-BRNN and $u_{ij}$ is its hidden representation; $W_i$ and $b_i$ are the weight matrix and bias vector for each row of $x_{ij}$ blocks, learned through training. Since we aim to determine the contribution of each $x_{ij}$ block within each row (horizontal level) to the overall prediction, we apply an attention layer on top of the hidden representations $u_{ij}$ to obtain the attention scores $\alpha_{ij}$ by learning the row context vector $u_i$:

$\alpha_{ij} = \frac{\exp(u_{ij}^\top u_i)}{\sum_{j'=1}^{5} \exp(u_{ij'}^\top u_i)}.$

Finally, we encode each $\hat{y}_i$ block as the weighted sum of the $h_{ij}$ and the attention scores $\alpha_{ij}$ (Figure 3):

$\hat{y}_i = \sum_{j=1}^{5} \alpha_{ij} h_{ij}.$

Step 4: Vertical feature encoding. In this step, we encode the representations $\hat{y}_i$ computed in the previous step along the y axis (i.e., vertically), and we determine an attention score $\alpha_i$ for each $\hat{y}_i$ block as follows:

$h_i = [\overrightarrow{\text{GRU}}(\hat{y}_i); \overleftarrow{\text{GRU}}(\hat{y}_i)], \quad u_i = \tanh(W h_i + b), \quad \alpha_i = \frac{\exp(u_i^\top u)}{\sum_{i'=1}^{5} \exp(u_{i'}^\top u)},$

where $h_i$ captures the dependencies of the $\hat{y}_i$ blocks using a GRU-BRNN and $u_i$ is its hidden representation, obtained through training of the W and b parameters. The attention weights $\alpha_i$ for $\hat{y}_i$ are computed using $u_i$ and the trained context vector u. The image encoding I is the weighted sum of the $h_i$ encodings and the attention scores $\alpha_i$:

$I = \sum_{i=1}^{5} \alpha_i h_i.$

Finally, we use the image encoding I to build a multi-class classifier,

$p = \text{softmax}(W_c I + b_c),$

where $W_c$ and $b_c$ are the classifier parameters. Figure 4 shows the hierarchical encoding network of Steps 3 and 4 described above.
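The following is a minimal sketch of how Steps 3 and 4 could sit on top of the frozen VGG-16 base, again assuming TensorFlow/Keras; the class and variable names (e.g., ContextAttention) are ours for illustration and not from the original code. The GRU dimension (50 per direction) and the loss and optimizer in the compile call follow the experimental setup reported below.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ContextAttention(layers.Layer):
    """Additive attention with a learned context vector:
    u_t = tanh(W h_t + b), alpha_t = softmax(u_t . c), out = sum_t alpha_t h_t.
    The alpha_t values are the attention scores used for interpretability."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.proj = layers.Dense(units, activation="tanh")  # u_t = tanh(W h_t + b)
        self.score = layers.Dense(1, use_bias=False)        # dot with context vector
    def call(self, h):                                      # h: (batch, steps, dim)
        alpha = tf.nn.softmax(self.score(self.proj(h)), axis=1)
        return tf.reduce_sum(alpha * h, axis=1)             # (batch, dim)

x_in = layers.Input(shape=(5, 5, 512))  # feature grid X from the frozen VGG-16 base

# Step 3 (horizontal): bidirectional GRU over the 5 blocks of each row,
# then attention pooling, applied row by row via TimeDistributed.
h_rows = layers.TimeDistributed(
    layers.Bidirectional(layers.GRU(50, return_sequences=True)))(x_in)  # (b,5,5,100)
y_hat = layers.TimeDistributed(ContextAttention(100))(h_rows)           # (b,5,100)

# Step 4 (vertical): the same construction over the 5 row encodings y_hat_i.
h_cols = layers.Bidirectional(layers.GRU(50, return_sequences=True))(y_hat)
image_encoding = ContextAttention(100)(h_cols)                          # (b,100)

# Multi-class classifier over the image encoding I.
out = layers.Dense(3, activation="softmax")(image_encoding)  # COVID-19/other/normal
head = tf.keras.Model(x_in, out)

# Loss and optimizer as reported in the experimental setup below.
head.compile(optimizer="adagrad", loss="categorical_crossentropy",
             metrics=["accuracy"])
```

In a full model, this head would be stacked on the frozen VGG-16 base from the previous sketch; the softmax weights computed inside each ContextAttention layer are the per-block and per-row attention scores that make the model's predictions interpretable.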
We leverage a labeled dataset of 5801 chest X-ray images and CT scans to train and test the AIDCOV model. This dataset consists of 269 images from patients with COVID-19 obtained in early April 2020 from an open-source GitHub repository (Cohen, Morrison, and Dao 2020), which can be accessed on the web: https://github.com/ieee8023/covid-chestxray-dataset. Moreover, our dataset includes 3949 images from patients with other viral/bacterial infections as well as 1583 images from individuals with no pneumonia (i.e., normal), both obtained from the "Chest X-Ray Images" Kaggle repository (Kaggle 2013). Therefore, each image in our dataset is labeled as either COVID-19, other infection, or normal. This study was exempt from institutional review board (IRB) review since it used publicly available, deidentified data.

We set the GRU dimension to 50 for both the horizontal (Step 3) and vertical (Step 4) encoding levels; the bidirectional GRU thus has 100 dimensions. We used a mini-batch size of 20 images and trained for 10 epochs. We used categorical cross-entropy as our loss function to classify chest radiography images into one of the three classes (COVID-19, other infection, or normal), and we employed Adaptive Subgradient (Adagrad) as the optimizer. To evaluate the model's performance, we conducted 10-fold stratified cross-validation such that in each fold, we trained our model on 4698 samples, validated on 523 samples, and tested it on 580 samples. Stratification ensures that each fold contains the same ratio of subjects from each class.

To better evaluate the value of our novel hierarchical attention structure, we developed two related but simpler deep learning models. In one model, we removed the attention mechanism from our base model, i.e., we fed the feature representations obtained from VGG-16 to the hierarchical structure of Figure 4 without the attention layers. In another model, we replaced the whole hierarchical attention network with a fully connected network.

AIDCOV achieved a mean cross-validation accuracy of 97.8% across the 10 folds (Table 1). The model demonstrated excellent performance in detecting COVID-19. The hierarchical attention model had a sensitivity (true positive rate) of 99.3%, a specificity (true negative rate) of 99.98%, and a positive predictive value (PPV) of 99.6% for detecting COVID-19 from chest radiography images (Figure 2). AIDCOV also demonstrated promising performance for correctly classifying non-COVID-19 chest images. The model had a sensitivity (specificity) of 98.7% (96.0%) for detecting other viral/bacterial infections and a sensitivity (specificity) of 95.4% (98.8%) for normal chest radiography images. The PPV for other infections and normal images were 98.1% and 96.8%, respectively (Figure 2). These results suggest that AIDCOV performs well in detecting COVID-19, other viral/bacterial infections, and normal cases based on chest radiography images.

The Value of Hierarchical Attention Structure

The hierarchical model (without the attention layers) and the fully connected structure had lower accuracy than the hierarchical attention model. The hierarchical (no attention) model had an overall cross-validation accuracy of 97.5%, slightly lower than the hierarchical attention model. Its sensitivity in detecting COVID-19 from chest radiography images was 99.3%, similar to the hierarchical attention model.
The model with a fully connected structure had the poorest performance among the three models and resulted in an overall cross-validation accuracy of 96.0% when tested on our dataset. This model had a sensitivity of 93.3% in detecting COVID-19. These results highlight the value of our novel hierarchical structure, which can capture the dependency of all feature representation blocks (obtained from the VGG-16) in both horizontal and vertical directions and improve model performance. The attention mechanism in our hierarchical attention model also helps with the interpretability and transparency of the model predictions.

The strengths of AIDCOV are not limited to its superior sensitivity, specificity, and PPV in detecting COVID-19 and other infections. To gain deeper insights into how the model makes its predictions and to identify the areas of the lung affected by the infection, we extracted attention scores for each image. Figure 5 illustrates the areas of the chest radiography images for each type (COVID-19, other infections, and normal) that the model paid more attention to, based on the final attention scores averaged over all images of that type. It can be seen that the model identified signs of SARS-CoV-2 infection mostly in the lower zone of the lung and other infections around the middle zone. Since the model determines the attention scores relatively, the normal radiography images had the highest attention score on the very top level, corresponding to the upper zone of the lung. In other words, since the lower and middle zones of the lung contain signs of COVID-19 and other infections and therefore dominate the attention scores for these zones, the normal images receive lower attention scores for the middle and lower zones and higher attention scores for the upper zone of the lung.

Moreover, AIDCOV can identify the specific blocks within each individual's chest image that may include subtle signs of infection via the attention scores of both encoding levels (i.e., horizontal and vertical). To better illustrate the model's interpretability, we included the chest images of three patients with COVID-19 in Figure 6 and provided attention scores for each zone of the lung and each block of the image. Given that there are five zones in each image, we highlighted the zones that received an attention score of 0.2 or higher. Then, within those zones, we highlighted the blocks that received an attention score greater than or equal to 0.2.

In Figure 6, the first patient (Panels A and B) was admitted to the hospital with fever, shortness of breath (dyspnea), and low oxygen saturation. Radiological worsening with changes within the lower lobes was noted on her chest X-ray. Our model correctly gave a higher attention score to the lower zone of the lung and identified the specific areas of the lower zone that demonstrated signs of SARS-CoV-2 infection. The second patient (Panels C and D) in Figure 6 came to the hospital with suspected pneumonia. The radiographic investigation indicated abnormalities in the middle part of the right lung. As seen in Figures 6.C and 6.D, our model correctly identified these abnormalities in the middle zone of the right lung.
(Note that the right lung appears on the left side of the radiography image.) The chest image of the third patient (Panels E and F) in Figure 6 indicated a small consolidation in the right upper lobe and ground-glass opacities in both lower lobes. Our deep learning model correctly identified the middle and lower zones as the areas with abnormalities (Figure 6.E). It further pointed to the right lung (which appears on the left side of Figure 6.F) and the lower zone of both lungs.

In this study, we introduced AIDCOV, an artificial intelligence model for detection of COVID-19 from chest radiography images. The model performs multi-class prediction, i.e., it labels each image as belonging to a person with either COVID-19, another infection, or no pneumonia (i.e., normal). AIDCOV leverages VGG-16 to obtain a low-dimensional feature representation for each image. It then encodes the features along the horizontal (width) and vertical (height) directions using a novel two-level hierarchical attention structure. This allows the model to capture the horizontal and vertical dependencies of the features, which are ignored in a fully connected network. The attention mechanism further helps make the model interpretable and gives transparency to model predictions.

We trained and tested AIDCOV on publicly available datasets comprising 5801 chest radiography images. Of these, 269 samples had COVID-19, 3949 samples had other viral/bacterial infections, and the remaining 1583 samples were normal. We demonstrated that the model has an overall accuracy of 97.8% across the ten folds of cross-validation. AIDCOV showed excellent sensitivity (99.3%), specificity (99.98%), and positive predictive value (99.6%) in detecting COVID-19 from chest radiography images.

The high sensitivity and specificity of our model are critical in practice, since a false-negative result can lead to not isolating an individual with COVID-19, which can subsequently result in many transmissions from that person to others, including healthcare workers. Transmission from patients to healthcare workers undermines healthcare capacity; in China about 5%, and in Italy about 10%, of infections were among healthcare workers (Wang, Zhou, and Liu 2020). Thus, leveraging chest radiographs in hospitals to identify patients with COVID-19 and using appropriate personal protective equipment (PPE) when providing care to these patients can be beneficial. The high positive predictive value of AIDCOV implies a very low rate of false positives. This is also crucial because false-positive results can put an additional burden on the healthcare system through unnecessary use of scarce resources such as hospital isolation rooms (e.g., negative pressure rooms) and PPE, which could otherwise be used for actual COVID-19 patients. These results indicate that the AIDCOV model can be a reliable tool for detecting COVID-19 from chest radiography images.

To better assess the value of our hierarchical attention structure, we developed two simpler models. In one model, we kept the hierarchical structure but removed the attention mechanism from it. In another model, we replaced the whole hierarchical attention structure with a fully connected network. The model with a fully connected network had the poorest performance among the three models and achieved an accuracy of 96.0%. The hierarchical model without attention had an accuracy of 97.5%, which is only slightly below our hierarchical attention model's accuracy.
However, the main advantage of including an attention mechanism is making the model interpretable and providing transparency to model predictions. Through an analysis of the attention scores that AIDCOV gave to COVID-19, other-infection, and normal samples, we demonstrated that the model identified signs of SARS-CoV-2 infection, on average, most frequently in the lower and middle zones of the lung. This is consistent with findings from radiology reports, which indicate that abnormalities due to SARS-CoV-2 infection are more commonly found in the inferior and middle lobes of the lung, corresponding to the lower and middle zones of chest radiography images (Pan et al. 2020; Zhou et al. 2020; Bernheim et al. 2020; Zu et al. 2020; Wong et al. 2020). Furthermore, our hierarchical attention model divides each image into 25 blocks (5 × 5) and provides the attention score for each block. This can shed light on particular regions within each radiography image that may contain subtle signs of infection and makes the model more transparent and trustworthy.

Currently, the primary screening tool to detect COVID-19 is reverse transcription polymerase chain reaction (RT-PCR), a laboratory test that detects viral nucleic acid (Patel, Jernigan, and others 2020). However, not only is the capacity for RT-PCR testing limited (both in the U.S. and in many other countries), but the result return time can also range from several minutes to hours (Organization and others 2020). More importantly, the RT-PCR test has a sensitivity of around 70%, which can be even lower depending on the assay, the type and quality of the specimen, and the disease stage (Ai et al. 2020; Wang et al. 2020a; Kucirka et al. 2020; Guo et al. 2020). This means that about three in ten individuals with COVID-19 receive a false-negative test result. In this context, AIDCOV provides an accurate and fast alternative to RT-PCR testing that can quickly detect COVID-19 from chest radiography images. Moreover, given that RT-PCR testing kits are in short supply in many resource-limited settings, using chest X-ray and CT scan images, which are generally more available around the world, as a screening tool for COVID-19 may be worth considering. We demonstrated that our model is a viable alternative to RT-PCR testing, which can be used to detect and quarantine infected individuals and stop the spread of the SARS-CoV-2 virus.

A number of other artificial intelligence models to detect COVID-19 from chest radiography images have also been developed recently. Li et al. developed COVNet, a neural network model to detect COVID-19 and community-acquired pneumonia from CT images. COVNet demonstrated a sensitivity of 89.8% and a specificity of 95.8% in detecting COVID-19. It also achieved a sensitivity and specificity of 86.9% and 92.3%, respectively, in detecting community-acquired pneumonia from CT images. Wang et al. developed COVID-Net, a deep convolutional neural network structure designed for detecting COVID-19 and other infections from chest X-ray images (Wang, Lin, and Wong 2020). COVID-Net reports a sensitivity of 91.0% and a positive predictive value (PPV) of 98.9% in detecting COVID-19. The sensitivity and PPV of COVID-Net in detecting other infections were 94.0% and 91.3%, respectively. Zhang et al. developed a deep learning model to detect COVID-19 from chest X-ray images. Their model had a sensitivity (specificity) of 72.0% (98.0%) with a threshold of 0.5 and 96.0% (70.7%) with a threshold of 0.15.
Other studies evaluated a number of convolutional neural network structures on very small datasets of chest radiography images and achieved various accuracy levels (Hemdan, Shouman, and Karar 2020; Narin, Kaya, and Pamuk 2020). AIDCOV outperforms the models described above in terms of sensitivity, specificity, and PPV in detecting COVID-19 and other infections from chest radiography images.

Our study has a number of limitations and, therefore, our results should be interpreted with caution. First, as was the case with the other related studies, our dataset was limited in size and had only 269 cases with COVID-19. Further validation on datasets with a larger number of chest radiography images from patients with COVID-19 would be valuable. Second, chest X-ray images may not show signs of SARS-CoV-2 infection in the early stages of illness; abnormalities are more likely to develop over the course of the disease (Wang et al. 2020b; Simpson et al. 2020). However, some preliminary data suggest that abnormalities may appear in CT images in the presymptomatic stage and prior to the detection of viral RNA from upper respiratory specimens (Sutton et al. 2020; Zhao et al. 2020). Our dataset does not include information on time since symptom onset or the disease stage at the time the image was taken; thus, we could not assess our model's accuracy based on these factors.

In conclusion, AIDCOV demonstrated high sensitivity, specificity, and positive predictive value in detecting COVID-19 from chest X-ray and CT images. Given that radiography is widely available in many countries around the world, AIDCOV can be used in conjunction with or instead of RT-PCR testing (e.g., where RT-PCR testing is unavailable) to find individuals infected with the SARS-CoV-2 virus, isolate them, and prevent the spread of COVID-19.
References

Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases.
Presymptomatic SARS-CoV-2 infections and transmission in a skilled nursing facility.
Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection.
COVID-19 image data collection.
Sensitivity of chest CT for COVID-19: comparison to RT-PCR.
Deep convolutional neural network VGG-16 model for differential diagnosing of papillary thyroid carcinomas in cytological images: a pilot study.
Profiling early humoral response to diagnose novel coronavirus disease (COVID-19).
COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images.
Use of chest CT in combination with negative RT-PCR assay for the 2019 novel coronavirus but high clinical suspicion.
ImageNet classification with deep convolutional neural networks.
Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure.
Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
Coronavirus: COVID-19 has killed more people than SARS and MERS combined, despite lower case fatality rate.
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship.
Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks.
Imaging profile of the COVID-19 infection: radiologic findings and literature review.
Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases, interim guidance.
Time course of lung changes on chest CT during recovery from 2019 novel coronavirus.
Initial public health response and interim clinical guidance for the 2019 novel coronavirus outbreak - United States.
Deep learning to improve breast cancer detection on screening mammography.
Very deep convolutional networks for large-scale image recognition.
Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA.
Universal screening for SARS-CoV-2 in women admitted for delivery.
Detection of SARS-CoV-2 in different types of clinical specimens.
COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest radiography images.
Reasons for healthcare workers becoming infected with novel coronavirus disease 2019 (COVID-19) in China.
Frequency and distribution of chest radiographic findings in COVID-19 positive patients.
Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention.
Deep learning system to screen coronavirus disease 2019 pneumonia.
Deep convolutional neural network based medical image classification for disease diagnosis.
Epidemiological and clinical features of COVID-19 patients with and without pneumonia in Beijing.
Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study.
CT features of coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China.
Coronavirus disease 2019 (COVID-19): a perspective from China.