title: COVID-Net CXR-S: Deep Convolutional Neural Network for Severity Assessment of COVID-19 Cases from Chest X-ray Images
authors: Aboutalebi, Hossein; Pavlova, Maya; Shafiee, Mohammad Javad; Sabri, Ali; Alaref, Amer; Wong, Alexander
date: 2021-05-01

The world is still struggling to control and contain the spread of the COVID-19 pandemic caused by the SARS-CoV-2 virus. The medical conditions associated with SARS-CoV-2 infections have resulted in a surge in the number of patients at clinics and hospitals, leading to a significantly increased strain on healthcare resources. As such, an important part of managing and handling patients with SARS-CoV-2 infections within the clinical workflow is severity assessment, which is often conducted with the use of chest x-ray (CXR) images. In this work, we introduce COVID-Net CXR-S, a convolutional neural network for predicting the airspace severity of a SARS-CoV-2 positive patient based on a CXR image of the patient's chest. More specifically, we leveraged transfer learning to transfer representational knowledge gained from over 16,000 CXR images from a multinational cohort of over 15,000 patient cases into a custom network architecture for severity assessment. Experimental results with a multi-national patient cohort curated by the Radiological Society of North America (RSNA) RICORD initiative showed that the proposed COVID-Net CXR-S has the potential to be a powerful tool for computer-aided severity assessment of CXR images of COVID-19 positive patients. Furthermore, radiologist validation on select cases by two board-certified radiologists with over 10 and 19 years of experience, respectively, showed consistency between radiologist interpretation and critical factors leveraged by COVID-Net CXR-S for severity assessment.
While not a production-ready solution, the ultimate goal for the open source release of COVID-Net CXR-S is to act as a catalyst for clinical scientists, machine learning researchers, as well as citizen scientists to develop innovative new clinical decision support solutions for helping clinicians around the world manage the continuing pandemic.

The impact of the coronavirus disease 2019 (COVID-19) pandemic on health and the economy has been unprecedented. While more than one year has passed since the declaration of the global pandemic by the World Health Organization 1 , countries are still struggling to control the spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus causing the pandemic. The global healthcare system has suffered a devastating impact from this pandemic, with hospitals and clinics overwhelmed by the surge of patients, such that not all patients can have access to intensive care units for further treatment and care 2 . Furthermore, there have been shortages in personal protective equipment (PPE), ventilators, and other medical supplies due to the increasing demand on healthcare resources 2, 3 . This significant strain on healthcare resources, from both a personnel and a supplies perspective, necessitates improved clinical decision support tools for aiding clinicians and front-line healthcare workers with more efficient and effective clinical resource allocation. A critical part of clinical resource allocation during this pandemic has been severity assessment [4] [5] [6] , which is often conducted with the assistance of chest x-ray (CXR) images [4] [5] [6] [7] [8] [9] [10] [11] [12] . More specifically, radiology indicators within the patient's lungs, such as ground-glass opacities, can provide critical information for determining whether the condition of a SARS-CoV-2 positive patient warrants advanced care such as ICU admission and ventilator administration.
For example, Wong et al. 4 proposed a scoring strategy for SARS-CoV-2 severity assessment based on the Radiographic Assessment of Lung Edema (RALE) score introduced by Warren et al. 5 . Toussie et al. 6 proposed a scoring strategy where each lung was divided into three zones (for a total of six zones), with each zone assigned a binary score based on opacity and the final severity score computed as the aggregate of the scores from the different zones. Borghesi and Maroldi 13 proposed a scoring strategy where each lung was divided into three zones, as with that of Toussie et al. 6 , but each zone was assigned a score from 0 to 3 based on interstitial and alveolar infiltrates. Tsai et al. 14 , as part of the Radiological Society of North America (RSNA) RICORD initiative, proposed an airspace disease grading strategy where each lung is split into three zones (for a total of six zones), as with that of Toussie et al. 6 and Borghesi and Maroldi 13 , but the level of severity is instead quantified based on the number of lung zones with opacities. Given that severity assessment using CXR images can be quite challenging for front-line healthcare workers without expertise in radiology, providing computer-aided clinical support for this task can greatly help hospitals determine patients' conditions and respond more quickly to those who may require more advanced treatments or intensive care.

Figure 1. COVID-Net CXR-S network design. The COVID-Net backbone design exhibits high architectural diversity and sparse long-range connectivity, with macroarchitecture and microarchitecture designs tailored specifically for the detection of COVID-19 from chest X-ray images. The network design leverages light-weight design patterns in the form of projection-expansion-projection-expansion (PEPE) patterns to provide enhanced representational capabilities while maintaining low architectural and computational complexities.
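The zone-based scoring strategies above can be sketched as follows. This is a minimal illustration under assumed data representations (six per-zone indicators or grades per patient); the function names and input formats are assumptions, not the cited authors' actual implementations:

```python
def toussie_score(zone_opacity):
    """Toussie et al.-style score: each lung is split into three zones
    (six zones total), each zone gets a binary score based on opacity,
    and the final severity score aggregates the per-zone scores."""
    if len(zone_opacity) != 6:
        raise ValueError("expected six lung zones (three per lung)")
    return sum(int(bool(z)) for z in zone_opacity)


def borghesi_score(zone_grades):
    """Borghesi and Maroldi-style score: each of the six zones is
    assigned a grade from 0 to 3 based on interstitial and alveolar
    infiltrates, and the grades are summed."""
    if len(zone_grades) != 6 or any(not 0 <= g <= 3 for g in zone_grades):
        raise ValueError("expected six zone grades in the range 0-3")
    return sum(zone_grades)
```

Note that the RICORD airspace disease grading quantifies severity by the number of lung zones with opacities, which coincides with `toussie_score` applied to binary opacity indicators.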
While much of the research literature has focused on computer-aided COVID-19 detection from CXR images 15, 16 and computed tomography (CT) scans [17] [18] [19] [20] [21] , the area of computer-aided severity assessment is significantly less well explored. Notable works in this area include COVID-Net S, a tailored deep convolutional neural network proposed by Wong et al. 22 to predict extent scores from CXR images, as well as the study by Cohen et al. 23 , where the same extent scores are predicted. Motivated to extend this area in the direction of airspace disease grading, we introduce COVID-Net CXR-S, a convolutional neural network for predicting the airspace severity of a SARS-CoV-2 positive patient based on a CXR image of the patient's chest, as part of the COVID-Net open source initiative 15, 18, 19, 22, 24, 25 . The paper is organized as follows. Section 2 describes the methodology behind the design and construction of the proposed COVID-Net CXR-S, a demographic and protocol analysis of the multi-national patient cohort used, as well as radiologist validation. Section 3 presents and discusses the quantitative and qualitative results obtained from the experiments evaluating the efficacy of the proposed COVID-Net CXR-S.

In this work, we introduce COVID-Net CXR-S, a convolutional neural network tailored for the prediction of airspace severity of a SARS-CoV-2 positive patient based on chest X-ray images. To train COVID-Net CXR-S, we transferred representational knowledge from CXR images of a large multi-national patient cohort, and then leveraged CXR data grouped based on airspace severity levels. We further validated the behaviour of COVID-Net CXR-S in a transparent and responsible manner via explainability-driven performance validation, as well as radiologist validation on select cases by two expert board-certified radiologists.
The details of the network design, data preparation, explainability-driven performance validation, and radiologist validation are described below.

Figure 2. Example chest X-ray images from the multi-national patient cohort curated by the RSNA RICORD initiative: (1) Level 1 airspace severity: opacities in 1-2 lung zones and (2) Level 2 airspace severity: opacities in 3 or more lung zones.

The proposed COVID-Net CXR-S architecture is depicted in Figure 1 . More specifically, we leveraged a machine-driven design exploration strategy 26 to construct a backbone architecture tailored for a strong balance between accuracy and efficiency 27 . The constructed backbone architecture exhibits light-weight macro-architecture and micro-architecture designs comprised primarily of depthwise and pointwise convolutions with selective long-range connectivity. We then leveraged transfer learning to transfer representational knowledge gained from over 16,000 CXR images from a large multi-national cohort of over 15,000 patient cases 14, 28-32 via the constructed backbone architecture into a custom network architecture for severity assessment, where a combination of a dense layer and a severity prediction layer is used to predict between two levels of airspace severity. All of the model development was conducted using Python and the Keras deep learning library with a TensorFlow backend. The COVID-Net CXR-S network and associated scripts are available in an open source manner at http://www.covid-net.ml. In this study, we leveraged the multi-national patient cohort curated by the RSNA RICORD initiative 33 for severity scoring to train COVID-Net CXR-S after transfer learning. The multi-national patient cohort was curated by the Radiological Society of North America (RSNA) as part of a global initiative to assemble an international task force of scientists and radiologists to create a multi-institutional, multinational, expert-annotated COVID-19 imaging data collection.
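The knowledge-transfer step above, attaching a dense layer and a severity prediction layer to a pretrained backbone, can be sketched in Keras as follows. This is a hypothetical minimal sketch: the pooling choice, dense-layer width, and function names are assumptions, not the released COVID-Net CXR-S code (which is available at http://www.covid-net.ml):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def attach_severity_head(backbone: Model, num_levels: int = 2) -> Model:
    """Attach a dense layer followed by a severity prediction layer to a
    pretrained backbone to predict between airspace severity levels."""
    x = layers.GlobalAveragePooling2D()(backbone.output)  # pooled backbone features (assumed)
    x = layers.Dense(256, activation="relu")(x)           # dense layer; width 256 is an assumption
    out = layers.Dense(num_levels, activation="softmax", name="severity")(x)
    return Model(backbone.input, out)


# Tiny stand-in backbone for illustration only; the actual backbone is the
# machine-designed COVID-Net architecture pretrained on >16,000 CXR images.
demo_backbone = tf.keras.Sequential(
    [layers.Input(shape=(480, 480, 3)), layers.Conv2D(8, 3, strides=2)]
)
model = attach_severity_head(demo_backbone)
```

The two-unit softmax output corresponds to the two airspace severity levels described in the data preparation below.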
More specifically, we leveraged the airspace disease grading provided by RSNA RICORD, where each lung is split into three separate zones (for a total of six zones) and opacity is studied for each zone. We grouped the patient cases into two airspace severity level groups: 1) Level 1: opacities in 1-2 lung zones, and 2) Level 2: opacities in 3 or more lung zones. Example CXR images for the different airspace severity level groups from the multi-national patient cohort are shown in Figure 2 . This severity level designation was chosen given the clinical similarities between patient cases within each airspace severity level group in terms of treatment regimen, and thus facilitates clearer guidelines for the course of action. Given this airspace severity level grouping scheme, the multi-national patient cohort used in this study consists of 909 CXR images from 258 patients, with 227 images from 129 patients in Level 1 and 682 images from 184 patients in Level 2. We used 150 randomly selected CXR images for the test set, ensuring no patient overlap between the test and training data. Table 1 summarizes the demographic variables and imaging protocol variables of the CXR data in the multi-national patient cohort used in this study. It can be observed that the patient cases in the cohort are distributed across the different age groups, with a mean age of 59.11 and the largest number of patients between the ages of 50 and 69. The data preparation scripts are available in an open source manner at http://www.covid-net.ml. Training of the COVID-Net CXR-S architecture after knowledge transfer is conducted on the training portion of the aforementioned patient cohort using the Adam optimizer with a learning rate of 0.0001 for 137 epochs with a batch size of 50.
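The two-level airspace severity grouping described above can be expressed directly; a small sketch (the function name is assumed, not from the released scripts):

```python
def airspace_severity_level(num_opacity_zones: int) -> int:
    """Map the number of lung zones with opacities (out of six, three per
    lung) to the two-level grouping used to train COVID-Net CXR-S:
    Level 1 for opacities in 1-2 zones, Level 2 for 3 or more zones."""
    if not 1 <= num_opacity_zones <= 6:
        raise ValueError("expected between 1 and 6 zones with opacities")
    return 1 if num_opacity_zones <= 2 else 2
```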
During training, we account for the imbalance in the number of patient cases between the airspace severity levels by performing batch balancing, where at each epoch we randomly sampled an equal number of CXR images from each severity level for each batch of data. As a pre-processing step, the CXR images were cropped (removing the top 8% of the image) prior to training to mitigate the influence of commonly-found embedded textual information, and resampled to 480×480 for training purposes. In addition, we leveraged data augmentation during the training process with the following augmentation types: translation (±10% in the x and y directions), rotation (±10°), horizontal flip, zoom (±15%), and intensity shift (±10%). The scripts for the aforementioned process are available in an open source manner at http://www.covid-net.ml.

To study the decision-making behaviour of the proposed COVID-Net CXR-S network in a transparent and responsible manner, we conducted explainability-driven performance validation using GSInquire 34 , which has been shown to provide state-of-the-art explanations. More specifically, GSInquire leverages the concept of generative synthesis 26 from the machine-driven design exploration process via an inquisitor I within a generator-inquisitor pair {G, I} to generate quantitative interpretations of the decision-making process of COVID-Net CXR-S on a given CXR image. In this case, the generator G in the generator-inquisitor pair is the optimal generator that was leveraged to construct the backbone architecture during the network design process. The details pertaining to GSInquire for explaining the decision-making behaviour of deep neural networks on CXR images can be found in Wang et al. 15 .
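The cropping and batch balancing steps described above can be sketched as follows. This is a minimal illustration with assumed data structures (image arrays and per-level index lists); the released preprocessing scripts at http://www.covid-net.ml are the authoritative implementation:

```python
import random

import numpy as np


def crop_top(image: np.ndarray, fraction: float = 0.08) -> np.ndarray:
    """Crop away the top `fraction` of the image (the paper removes the
    top 8%) to mitigate commonly embedded textual information, before
    resampling to 480x480 for training."""
    rows = int(round(image.shape[0] * fraction))
    return image[rows:, ...]


def balanced_batch(level1_ids, level2_ids, batch_size=50, rng=random):
    """Batch balancing: sample an equal number of CXR images from each
    airspace severity level for every batch of data."""
    half = batch_size // 2
    return rng.sample(level1_ids, half) + rng.sample(level2_ids, half)
```

The augmentation types listed above (translation, rotation, horizontal flip, zoom, intensity shift) can then be applied on top of the balanced batches with any standard image augmentation pipeline.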
An interesting property of GSInquire that also makes it well-suited for explainability-driven performance validation is that it produces explanations identifying specific critical factors within an image that quantitatively impact the decisions made by a deep neural network, thus making it more readily interpretable and more quantitative for validation purposes than the relative importance variations visualized by other methods. This explainability-driven performance validation process enables the identification of anomalies in decision-making behaviour or potential erroneous indicators leading to invalid decisions or biases, as well as the validation of whether clinically relevant indicators are leveraged. The results obtained for COVID-Net CXR-S during the explainability-driven performance validation process for selected patient cases were further reviewed and reported on by two board-certified radiologists (A.S. and A.A.). The first radiologist (A.S.) has over 10 years of radiology experience, and the second radiologist (A.A.) has over 19 years of radiology experience.

To evaluate the efficacy of the proposed COVID-Net CXR-S for the purpose of severity assessment of COVID-19 cases from CXR images, we conducted both quantitative performance evaluation and qualitative explainability-driven performance validation. The quantitative and qualitative results are presented and discussed below. The overall quantitative performance assessment results of the proposed COVID-Net CXR-S network on the multi-national patient cohort from the RSNA RICORD initiative can be seen in Table 3 . For comparison purposes, quantitative performance assessment was also conducted on the ResNet-50 36 network architecture as well as on CheXNet 35 , a state-of-the-art deep neural network architecture that has been shown to outperform other network architectures for CXR image analysis tasks.
The architectural and computational complexity of COVID-Net CXR-S in comparison to CheXNet 35 and ResNet-50 36 is also shown in Table 2 . A number of observations can be made from the quantitative results. More specifically, the COVID-Net CXR-S network achieves a high accuracy of 92.66%, which is 9.33% and 5.33% higher than that achieved by the CheXNet and ResNet-50 network architectures, respectively. The COVID-Net CXR-S network achieved this higher accuracy at a significantly lower computational complexity (~11.1G FLOPs, which is ~57% and ~69% lower than CheXNet and ResNet-50, respectively), as well as a ~63% lower architectural complexity (~8.8M parameters). It can also be observed that the COVID-Net CXR-S network achieved a sensitivity of 92.3% on the Level 2 cases and a sensitivity of 92.85% on the Level 1 cases. Furthermore, the proposed COVID-Net CXR-S network achieves a high positive predictive value (PPV) of 95.78% and 87.27% for Level 2 and Level 1 cases, respectively. The high PPV for Level 2 cases ensures that fewer false positives for more severe cases are reported by COVID-Net CXR-S, which is important since patients with severe conditions require advanced treatment and management, and thus a high false positive rate can place a significant burden on a healthcare system where resources are limited. Finally, Table 4 provides a more detailed picture of the performance of COVID-Net CXR-S via the confusion matrix. Results from the conducted explainability-driven performance validation 34 (see Figure 3) show that clinically relevant visual indicators in the lungs were leveraged in the decision-making process.
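The reported per-level sensitivity, PPV, and accuracy follow the standard definitions, which can be computed from a confusion matrix as below. The example matrix here is hypothetical, for illustration only; it is not the paper's Table 4:

```python
def per_class_metrics(cm):
    """Sensitivity (recall) and positive predictive value (precision)
    per class, plus overall accuracy, from a confusion matrix indexed
    as cm[actual][predicted]."""
    n = len(cm)
    sensitivity = [cm[i][i] / sum(cm[i]) for i in range(n)]
    ppv = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    accuracy = sum(cm[i][i] for i in range(n)) / sum(map(sum, cm))
    return sensitivity, ppv, accuracy


# Hypothetical 2x2 matrix over the two severity levels (rows: actual
# Level 1, Level 2; columns: predicted) -- NOT the matrix in Table 4.
example_cm = [[45, 5],
              [6, 94]]
sens, ppv, acc = per_class_metrics(example_cm)
```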
This validation is very important for auditing COVID-Net CXR-S in a transparent and responsible manner, to ensure not only that it is leveraging the right indicators for driving the severity assessment process, but also that it is not primarily leveraging erroneous visual indicators (e.g., embedded markers, motion artifacts, imaging artifacts, etc.) to make 'right decisions for the wrong reasons'. Furthermore, this validation has the potential to help discover additional visual indicators to assist clinicians in their severity assessments, as well as to improve the trust clinicians may have in the system during operational use. The expert radiologist findings and observations for select patient cases with respect to the critical factors identified during explainability-driven performance validation, as shown in Figure 3 , are as follows. In all three cases, COVID-Net CXR-S correctly identified the patients as having Level 2 airspace severity, which was clinically confirmed. Case 1. Both radiologists observed patchy airspace opacity in the lower left lung lobe, which is consistent with one of the identified critical factors leveraged by COVID-Net CXR-S. Case 2. Both radiologists observed patchy airspace opacities in the left and right midlung regions that coincide with the identified critical factors leveraged by COVID-Net CXR-S in that region. One of the radiologists further observed additional lower lobe opacities in both lungs. Case 3. One of the radiologists observed hilar opacities in the right lung that coincide with the identified critical factors leveraged by COVID-Net CXR-S. It was also observed that there are opacities in the left lower lobe, with the superior aspect of the opacities being leveraged by COVID-Net CXR-S.
The second radiologist observed patchy airspace opacities in the right lung that overlap with the critical factors leveraged by COVID-Net CXR-S. Based on the radiologist findings and observations on the three patient cases, it was shown that although some opacities were not identified by GSInquire as critical factors driving the decision-making behaviour of COVID-Net CXR-S, several other abnormalities identified as critical factors were consistent with radiologist interpretations. Therefore, based on the critical factors identified by GSInquire as driving the decision-making behaviour of COVID-Net CXR-S, the network was able to differentiate between the airspace severity levels but did not necessarily leverage all regions of concern in making its severity assessment decisions.

In this study, we introduced COVID-Net CXR-S, a convolutional neural network for the prediction of airspace severity of a SARS-CoV-2 positive patient based on a CXR image of the patient's chest. Leveraging transfer learning, we transferred representational knowledge gained from over 16,000 CXR images from a multinational cohort of over 15,000 patient cases into a custom network architecture for severity assessment. The promising quantitative and qualitative results obtained from the conducted experiments demonstrate that the proposed COVID-Net CXR-S, while not a production-ready solution, can potentially become a powerful tool for aiding clinicians and front-line healthcare workers via computer-aided severity assessment of CXR images of COVID-19 positive patients. The ultimate goal for the open source release of COVID-Net CXR-S (http://www.covid-net.ml) is to act as a catalyst for clinical scientists, machine learning researchers, as well as citizen scientists to develop innovative new clinical decision support solutions for helping clinicians around the world manage the continuing pandemic.
References

- Economic consequences of the COVID-19 outbreak: the need for epidemic preparedness
- Admission decisions to intensive care units in the context of the major COVID-19 outbreak: local guidance from the COVID-19 Paris-region area
- Managing intensive care admissions when there are not enough beds during the COVID-19 pandemic: a systematic review
- Frequency and distribution of chest radiographic findings in COVID-19 positive patients
- Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS
- The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society
- Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review
- Clinical features of patients infected with 2019 novel coronavirus in Wuhan
- Clinical characteristics of coronavirus disease 2019 in China
- Assessing risk factors for SARS-CoV-2 infection in patients presenting with symptoms in Shanghai, China: a multicentre observational cohort study
- Mobile X-rays are highly valuable for critically ill COVID patients
- COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression
- The RSNA international COVID-19 open annotated radiology database (RICORD)
- COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
- Deep learning based detection and analysis of COVID-19 on chest X-ray images
- COVID-19 image classification using deep features and fractional-order marine predators algorithm
- COVID-Net CT-2: enhanced deep neural networks for detection of COVID-19 from chest CT images through bigger, more diverse learning
- COVIDNet-CT: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest CT images
- Sample-efficient deep learning for COVID-19 diagnosis based on CT scans
- Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
- COVID-Net S: towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity
- Predicting COVID-19 pneumonia severity on chest X-ray with deep learning
- COVIDx-US: an open-access benchmark dataset of ultrasound imaging data for AI-driven COVID-19 analytics
- TB-Net: a tailored, self-attention deep convolutional neural network design for detection of tuberculosis cases from chest X-ray images
- Learning generative machines to generate efficient neural networks via generative synthesis
- Towards universal metrics for large-scale performance analysis of deep neural networks for practical on-device edge usage
- RSNA pneumonia detection challenge
- COVID-19 radiography database
- COVID-19 chest x-ray data initiative
- Actualmed COVID-19 chest x-ray data initiative
- COVID-19 image data collection
- The RSNA international COVID-19 open annotated radiology database (RICORD)
- Do explanations reflect decisions? A machine-centric strategy to quantify the performance of explainability algorithms
- CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning
- Deep residual learning for image recognition

We thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program, the Canadian Institute for Advanced Research (CIFAR), DarwinAI Corp., and the organizations and initiatives from around the world collecting valuable COVID-19 data to advance science and knowledge. The study has received ethics clearance from the University of Waterloo (42235). All experimental protocols were approved by the University of Waterloo. All methods were carried out in accordance with University of Waterloo ethics guidelines and regulations.
Informed consent was obtained from all participants.