key: cord-0172901-7zgpwl8z authors: Wong, Alexander; Lin, Zhong Qiu; Wang, Linda; Chung, Audrey G.; Shen, Beiyi; Abbasi, Almas; Hoshmand-Kochi, Mahsa; Duong, Timothy Q. title: Towards computer-aided severity assessment: training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity date: 2020-05-26 sha: e5dcc8ad1847493b7f86e0067dc46d069c2013b6 doc_id: 172901 cord_uid: 7zgpwl8z

Background: A critical step in effective care and treatment planning for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the assessment of the severity of disease progression. Chest x-rays (CXRs) are often used to assess SARS-CoV-2 severity, with two important assessment metrics being the extent of lung involvement and the degree of opacity. In this proof-of-concept study, we assess the feasibility of computer-aided scoring of CXRs for SARS-CoV-2 lung disease severity using a deep learning system. Materials and Methods: Data consisted of 130 CXRs from SARS-CoV-2 positive patient cases from the Cohen study. Geographic extent and opacity extent were scored by two board-certified expert chest radiologists (with 20+ years of experience) and a 2nd-year radiology resident. The deep neural networks used in this study are based on the COVID-Net network architecture. One hundred versions of the network were independently learned (50 to perform geographic extent scoring and 50 to perform opacity extent scoring) using random subsets of CXRs from the Cohen study, and the networks were evaluated using stratified Monte Carlo cross-validation experiments. Findings: The deep neural networks yielded R$^2$ of 0.673 $\pm$ 0.004 and 0.636 $\pm$ 0.002 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, in stratified Monte Carlo cross-validation experiments.
The best performing networks achieved R$^2$ of 0.865 and 0.746 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively. Interpretation: The results are promising and suggest that the use of deep neural networks on CXRs could be an effective tool for computer-aided assessment of SARS-CoV-2 lung disease severity, although additional studies are needed before adoption for routine clinical use.

As the COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), continues around the world, radiology has grown in importance for providing clinical insights that aid the diagnosis, treatment, and management of the disease. Much of the early literature has focused on imaging features in computed tomography (CT) scans of SARS-CoV-2 positive patients, given the use of CT in China during the earlier stages of the global pandemic [2] [3] [4] [5] [6] ; however, the low availability of CT scanners in many parts of the world due to their high cost, the high risk of SARS-CoV-2 transmission during patient transport to and from CT imaging suites, and long decontamination times between scans have limited the use of CT for SARS-CoV-2 diagnosis and treatment planning. A number of recent studies have illustrated the growing interest in and use of chest x-ray (CXR) imaging around the world [7] [8] [9] [10] [11] [12] [13] , with some studies foreseeing a greater reliance on portable CXR 8 and others noting the high value of portable CXR for critically ill patients 14 . Compared to CT scanners, CXR imaging systems are widely available around the world due to their relatively low cost, and they have comparatively faster decontamination times; in addition, portable CXR units allow imaging to take place within an isolation room, greatly reducing transmission risk 7, 8, 15 .
Furthermore, CXR imaging is frequently performed for patients with respiratory complaints as part of standard procedure 16 , and has been shown to give valuable insights into disease progression 9 . In the context of detecting SARS-CoV-2, CXR imaging can also be useful when patients whose initial reverse transcription-polymerase chain reaction (RT-PCR) results were negative revisit the emergency department with worsening symptoms 15 ; RT-PCR is the current gold standard for viral testing. Several studies have investigated imaging features in CXR images of SARS-CoV-2 positive patients 12, 13, 17 , with commonly found features being bilateral abnormalities, ground-glass opacity, and interstitial abnormalities. Given these imaging features, and the ability to observe their progression and extent over the course of the disease, an important role of CXR assessment in disease treatment and management is determining the severity of a patient's condition. As such, a number of recent studies have focused on severity scoring [9] [10] [11] , where the goal is to quantify SARS-CoV-2 lung disease severity. Disease severity scoring can help determine the best course of treatment and management for a SARS-CoV-2 case (e.g., at-home quarantine, oxygen therapy, ventilation, etc.), allowing for individualized treatment of each patient. We hypothesise that deep learning could be a valuable tool for enabling computer-aided scoring of SARS-CoV-2 lung disease severity using CXRs of SARS-CoV-2 positive patients. Using CXR training data acquired from a global pool of SARS-CoV-2 positive patients, deep neural networks can learn to identify the important imaging features within a CXR image that are indicative of SARS-CoV-2, and output scores that quantify the severity of a patient's disease progression.
In this study, we assess the feasibility of computer-aided scoring of SARS-CoV-2 lung disease severity using deep learning by developing, training, and validating 100 versions of a deep neural network (50 for geographic extent scoring and 50 for opacity extent scoring) using stratified Monte Carlo cross-validation experiments on data consisting of 130 CXRs from SARS-CoV-2 positive patient cases. The scores predicted by the deep neural networks are compared against scores assigned by two board-certified chest radiologists and a radiology resident. The primary goal of this study is to assess the feasibility of computer-aided severity scoring of SARS-CoV-2 using deep learning. To this end, we develop and evaluate deep neural networks that score CXRs of patients with SARS-CoV-2. We collected CXR data pertaining to SARS-CoV-2 positive cases from the Cohen study 1 , for which ethics approval for data collection was granted by the University of Montreal's Ethics Committee. The 130 CXRs from the Cohen study used here represent 85 patients from around the world, between 12 and 87 years old. The CXR data were acquired using a range of X-ray imaging equipment types and acquisition protocols representative of routine imaging practice (including supine and upright positioning, and posteroanterior and anteroposterior views). Radiological scoring was performed by two board-certified chest radiologists with 20+ years of experience (A.A. and M.H.) and a 2nd-year radiology resident (B.S.) to stage SARS-CoV-2 disease severity using a scoring system adapted from Wong et al. 9 . The two assessment metrics scored are geographic extent and opacity extent. More specifically, for geographic extent, the extent of lung involvement by ground glass opacity or consolidation is scored for each lung separately (right and left) as: 0 = no involvement; 1 = <25%; 2 = 25-50%; 3 = 50-75%; 4 = >75% involvement.
The two per-lung scores are then summed, giving a total geographic extent score from 0 to 8 (right + left lung). For opacity extent, the degree of opacity was similarly scored for the right and left lung separately as: 0 = no opacity; 1 = ground glass opacity; 2 = consolidation; 3 = white-out. The two per-lung scores are likewise summed, giving a total opacity extent score from 0 to 6 (right + left lung). Fleiss' kappa 18 for inter-rater agreement was 0.45 for opacity extent and 0.71 for geographic extent. The scores were then averaged across the three readers. After radiological scoring, all CXR data used in this study underwent processing to facilitate the training of deep neural networks. To discourage the deep neural networks from learning irrelevant visual cues when making severity predictions, the CXRs were cropped to remove boundary artifacts and embedded metadata outside the patient region of interest. All CXRs were then resized to the same dimensions to enable training of the deep neural networks. Finally, the geographic extent scores (range 0 to 8) and opacity extent scores (range 0 to 6) were re-mapped to a unified range from 0 to 1. The design of the deep neural network architecture for computer-aided severity scoring is important because it dictates the sequence of mathematical operations that maps the input CXR data to the predicted severity scores (i.e., the geographic extent score and the opacity extent score). In particular, the architecture affects how efficiently and effectively the network can learn the parameters and operations of this complex, hierarchical mapping.
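The scoring, reader-averaging, and re-mapping steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: all helper names (`geographic_category`, `total_score`, `mean_over_readers`, `normalize`, `denormalize`, `fleiss_kappa`) are hypothetical, the handling of values exactly at 25/50/75% is an assumption, and because the text does not say whether Fleiss' kappa was computed on per-lung or total scores, the kappa function simply treats each score value as a nominal category.

```python
from collections import Counter

# Per-lung maxima of the two scales described above.
GEO_MAX_PER_LUNG = 4      # 0 = none, 1 = <25%, 2 = 25-50%, 3 = 50-75%, 4 = >75%
OPACITY_MAX_PER_LUNG = 3  # 0 = none, 1 = ground glass, 2 = consolidation, 3 = white-out
GEO_MAX_TOTAL = 2 * GEO_MAX_PER_LUNG          # 8 (right + left)
OPACITY_MAX_TOTAL = 2 * OPACITY_MAX_PER_LUNG  # 6 (right + left)


def geographic_category(fraction_involved):
    """Map the involved fraction of one lung (0..1) to the 0-4 category.

    Assumption: values exactly at 25%, 50% or 75% fall into the lower category.
    """
    if fraction_involved <= 0.0:
        return 0
    if fraction_involved <= 0.25:
        return 1
    if fraction_involved <= 0.50:
        return 2
    if fraction_involved <= 0.75:
        return 3
    return 4


def total_score(right, left, max_per_lung):
    """Sum the right- and left-lung scores after range-checking each one."""
    for s in (right, left):
        if not 0 <= s <= max_per_lung:
            raise ValueError(f"per-lung score {s} outside 0..{max_per_lung}")
    return right + left


def mean_over_readers(totals):
    """Average one CXR's total score across readers."""
    return sum(totals) / len(totals)


def normalize(total, max_total):
    """Re-map a total score to the unified 0..1 training-target range."""
    return total / max_total


def denormalize(prediction, max_total):
    """Map a 0..1 network prediction back to the original score range."""
    return prediction * max_total


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for subjects each rated by the same number of raters.

    ratings: one list of category labels per subject (one label per rater);
    every label must appear in `categories`. Undefined (division by zero)
    if all ratings fall into a single category.
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need the same number (>= 2) of raters per subject")
    categories = list(categories)
    # n_ij: number of raters who assigned subject i to category j
    counts = [[tally[c] for c in categories] for tally in map(Counter, ratings)]
    # P_i: observed pairwise agreement for each subject
    p_i = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / len(ratings)
    # p_j: overall proportion of all ratings that fall in category j
    total_ratings = len(ratings) * n_raters
    p_j = [sum(row[j] for row in counts) / total_ratings for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1.0 - p_e)


# Example: one CXR's geographic extent scored by three readers.
geo_totals = [total_score(3, 2, GEO_MAX_PER_LUNG),
              total_score(3, 3, GEO_MAX_PER_LUNG),
              total_score(2, 2, GEO_MAX_PER_LUNG)]
target = normalize(mean_over_readers(geo_totals), GEO_MAX_TOTAL)
```

Here the three reader totals are 5, 6, and 4, whose mean of 5.0 becomes the training target 5/8 = 0.625; a network output of 0.625 maps back to 5.0 on the original 0 to 8 scale. Because the re-mapping is a single division, it is trivially invertible, which is what allows predictions to be reported on the radiologists' original scales.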
In this study, the architecture of the deep neural networks used to evaluate the feasibility of computer-aided scoring of SARS-CoV-2 lung disease severity is based on the COVID-Net deep neural network architecture 19 , which was found to achieve state-of-the-art performance in SARS-CoV-2 detection. The last layers of the COVID-Net architecture are replaced with a set of new layers that predict severity scores within the range 0 to 1; these predictions can be mapped back to the original ranges of the geographic extent and opacity extent scores used during radiological scoring. Figure 1 presents an overview of this network architecture. The network architecture uses projection-expansion-projection design patterns that provide high representational capacity while maintaining computational efficiency, selective long-range connectivity to improve learning efficiency, and high architectural diversity. To improve the performance of the deep neural networks, transfer learning 20 is used to initialize their parameters from deep neural networks trained on COVIDx, a dataset introduced in the Wang study 19 containing 13,975 CXR images across 13,870 patient cases, comprising healthy patients and patients with different forms of pneumonia (e.g., viral, bacterial, etc.). Statistical distribution details of COVIDx can be found in the Wang study 19 . We also leverage data augmentation 21 to improve the performance of the deep neural networks: new training samples are synthesized by applying randomly generated translations, rotations, horizontal flips, zooms, intensity shifts, cutout, and Gaussian noise to the CXR data in the training set, which increases data diversity and helps the networks learn improved robustness.
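A few of the augmentations listed above can be sketched with NumPy alone. This is a hypothetical illustration, not the study's pipeline (which used Python, OpenCV, and Keras): the function names, the parameter defaults, and the 224 x 224 stand-in image size are assumptions, and rotation and zoom are omitted because they need an interpolating resampler rather than plain array indexing.

```python
import numpy as np


def shift_image(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling vacated pixels with 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted entirely out of frame
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment(img, rng, max_shift=0.1, max_intensity_shift=0.1,
            cutout_frac=0.15, noise_std=0.02):
    """Return a randomly augmented copy of a 2-D float image with values in [0, 1].

    All defaults are illustrative guesses, not values reported by the study.
    """
    h, w = img.shape
    out = img.astype(np.float64)  # astype copies, so the input is left untouched
    # Random translation of up to max_shift of each image dimension.
    sy, sx = int(max_shift * h), int(max_shift * w)
    out = shift_image(out, int(rng.integers(-sy, sy + 1)), int(rng.integers(-sx, sx + 1)))
    # Horizontal flip with probability 0.5. This swaps the two lungs, which
    # leaves the summed right + left severity score unchanged.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random global intensity shift (produces a new array).
    out = out + rng.uniform(-max_intensity_shift, max_intensity_shift)
    # Cutout: blank a random square patch.
    size = max(1, int(cutout_frac * min(h, w)))
    y0 = int(rng.integers(0, h - size + 1))
    x0 = int(rng.integers(0, w - size + 1))
    out[y0:y0 + size, x0:x0 + size] = 0.0
    # Additive Gaussian noise, then clip back to the valid intensity range.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)


# Usage: four augmented variants of one stand-in "CXR" (random pixels).
rng = np.random.default_rng(0)
cxr = rng.random((224, 224))
batch = np.stack([augment(cxr, rng) for _ in range(4)])
```

One design point worth noting: horizontal flips are label-preserving here only because the training targets are right + left totals; a model predicting per-lung scores would need its left and right labels swapped along with the image.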
All model development was conducted using Python, OpenCV, and the Keras deep learning library with a TensorFlow backend. To evaluate the efficacy of the deep neural networks developed for computer-aided scoring of SARS-CoV-2 lung disease severity, stratified Monte Carlo cross-validation 22 was conducted. A total of 100 deep neural networks (50 for geographic extent scoring and 50 for opacity extent scoring) were learned, each using a different random subset of CXR data from the Cohen study, and each network was then tested on the corresponding subset of CXR data held out from its learning process. For each trial, a random subset consisting of 80% of the CXR data was used to train a deep neural network, with the remaining 20% held out for testing. To quantify the performance of the deep neural networks, we compute the coefficient of determination, R$^2$, between the scores predicted by the deep neural networks and the expert radiologist scores, for both geographic extent and opacity extent, on the test subset of each trial. To summarize the cross-validation results, R$^2$ was averaged over the trials for geographic extent and opacity extent independently, yielding a mean and standard deviation for each. Table 1 summarizes the demographic and imaging protocol variables of the CXR data from the Cohen study used in this study. Note that the majority of the patient cases are from Europe and Asia, reflecting the earlier rise of the COVID-19 pandemic on those two continents. In addition, the majority of the cases are above the age of 50, with a mean age of 56.6, which is consistent with the greater effect of SARS-CoV-2 on the older population.
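The evaluation protocol described above (repeated stratified random 80/20 splits, test-set R$^2$ per trial, then mean and standard deviation) can be sketched with NumPy. This is a minimal sketch under assumptions, not the study's code: the text does not say which variable the splits were stratified on, so quantile bins of the target score are assumed; R$^2$ is taken as the standard $1 - SS_\mathrm{res}/SS_\mathrm{tot}$; and `least_squares_fit_predict` is a hypothetical stand-in for training a deep neural network in each trial.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (undefined if y_true is constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def stratified_split(strata, test_frac, rng):
    """Random train/test split keeping about test_frac of every stratum in the test set."""
    train, test = [], []
    for s in np.unique(strata):
        idx = rng.permutation(np.flatnonzero(strata == s))
        n_test = int(round(test_frac * len(idx)))
        test.extend(idx[:n_test].tolist())
        train.extend(idx[n_test:].tolist())
    return np.array(train, dtype=int), np.array(test, dtype=int)


def monte_carlo_cv(X, y, fit_predict, n_trials=50, test_frac=0.2, n_bins=4, seed=0):
    """Repeat stratified random splits; return mean and std of the test-set R^2."""
    rng = np.random.default_rng(seed)
    # Assumption: strata are quantile bins of the target score.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    strata = np.digitize(y, edges)
    scores = np.empty(n_trials)
    for t in range(n_trials):
        train_idx, test_idx = stratified_split(strata, test_frac, rng)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores[t] = r_squared(y[test_idx], y_pred)
    return scores.mean(), scores.std()  # population std (NumPy default)


def least_squares_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in for training a network: linear least squares with a bias."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


# Usage on synthetic data sized like the 130-CXR cohort (features are made up).
rng = np.random.default_rng(1)
X = rng.normal(size=(130, 5))
y = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + rng.normal(0.0, 0.1, size=130)
mean_r2, std_r2 = monte_carlo_cv(X, y, least_squares_fit_predict)
print(f"R^2 = {mean_r2:.3f} +/- {std_r2:.3f}")
```

Stratifying each split keeps the distribution of severity levels in every 20% test subset close to that of the full cohort, so that per-trial R$^2$ values are not dominated by an unlucky split containing only mild or only severe cases; with only 130 CXRs this matters more than it would for a large dataset.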
Examining the R$^2$ between predicted scores from the deep neural networks and the radiologist scores for the 100 experiments (50 deep neural networks for geographic extent scoring and 50 deep neural networks for opacity extent scoring) led to a number of observations. First, the deep neural networks yielded R$^2$ of 0.673 ± 0.004 and 0.636 ± 0.002 for geographic extent and opacity extent, respectively, in the stratified Monte Carlo cross-validation experiments (see Table 2 ). Second, the best performing networks achieved R$^2$ of 0.865 and 0.746 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively (see Figure 2 for scatter plots of predicted scores vs. radiologist scores for these networks). Third, the mean R$^2$ between predicted scores and radiologist scores is higher for geographic extent than for opacity extent. In this study, we hypothesised that computer-aided deep learning algorithms can accurately predict lung disease severity on CXRs associated with SARS-CoV-2 infection, as measured against expert chest radiologist scores, and the experimental results of this study support this hypothesis. Results from the stratified Monte Carlo cross-validation experiments showed that the learned deep neural networks achieved a mean R$^2$ greater than 0.5 between predicted scores and radiologist scores for both geographic extent and opacity extent when evaluated on 100 different subsets of CXR data from the Cohen study (50 for geographic extent scoring and 50 for opacity extent scoring). Severity scoring for SARS-CoV-2 has gained attention due to the rise and continued prevalence of the COVID-19 pandemic across the globe, as assessing the severity of a SARS-CoV-2 positive patient's condition is crucial for determining the best course of treatment and care. Several severity scoring mechanisms have recently been proposed for the severity assessment of SARS-CoV-2. Wong et al. 9 introduced a scoring scheme for quantifying SARS-CoV-2 severity by adapting and simplifying the Radiographic Assessment of Lung Edema (RALE) score introduced by Warren et al. 10 . Toussie et al. 11 introduced a scoring scheme in which each lung was divided into three zones (six zones in total), each zone was assigned a binary score based on opacity, and the final severity score was the sum of the zone scores. Borghesi and Maroldi 23 introduced a scoring scheme in which, as in Toussie et al., each lung was divided into three zones, but each zone was instead assigned a score from 0 to 3 based on interstitial and alveolar infiltrates. Given the large number of patients being screened during the COVID-19 pandemic and the need for expert radiologists to assess the severity of each patient, computer-aided severity scoring using artificial intelligence has strong potential to improve clinical workflow efficiency. This study has a few limitations. First, the data were obtained from various sources and could exhibit bias. Second, disease severity is based on radiologist scores, and functional outcomes such as measures of lung function or mortality were not available. Third, the image quality of the CXRs varies; although some CXRs have lower resolution, they were considered to be of acceptable diagnostic quality. Fourth, longitudinal changes in disease severity were not examined and should be investigated in future studies. In conclusion, our results support the hypothesis that the use of deep neural networks on CXRs can be an effective tool for computer-aided assessment of lung disease severity, although additional studies are needed before adoption for routine clinical use. This tool may be helpful in emergency room (ER) and intensive care unit (ICU) settings for triaging patients into general admission or the ICU, as well as for determining when to put SARS-CoV-2 patients on a mechanical ventilator and when to extubate.
1. COVID-19 image data collection
2. CT features of coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China
3. CT imaging features of 2019 novel coronavirus (2019-nCoV)
4. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
5. Sensitivity of chest CT for COVID-19: comparison to RT-PCR
6. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study
7. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society
8. Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review
9. Frequency and distribution of chest radiographic findings in COVID-19 positive patients
10. Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS
11. Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS
12. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
13. Clinical characteristics of coronavirus disease 2019 in China
14. Mobile X-rays are highly valuable for critically ill COVID patients
15. The Canadian Society of Thoracic Radiology (CSTR) and Canadian Association of Radiologists (CAR) consensus statement regarding chest imaging in suspected and confirmed COVID-19
16. A British Society of Thoracic Imaging statement: considerations in designing local imaging diagnostic algorithms for the COVID-19 pandemic
17. Chest imaging appearance of COVID-19 infection
18. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability
19. A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images
20. A survey on transfer learning
21. The effectiveness of data augmentation in image classification using deep learning
22. Monte Carlo cross validation
23. COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression

We would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chairs program, DarwinAI Corp., Nvidia Corp., and Hewlett Packard Enterprise Co.