key: cord-0961314-u2uswgy3 authors: Wang, Shuai; Kang, Bo; Ma, Jinlu; Zeng, Xianjun; Xiao, Mingming; Guo, Jia; Cai, Mengjiao; Yang, Jingyi; Li, Yaodong; Meng, Xiangfei; Xu, Bo title: A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19) date: 2020-02-17 journal: nan DOI: 10.1101/2020.02.14.20023028 sha: 2988cf2f259a4093dbad920054e97418192d0d42 doc_id: 961314 cord_uid: u2uswgy3

Background: The outbreak of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-COV-2) has caused more than 2.5 million cases of Corona Virus Disease (COVID-19) worldwide so far, and that number continues to grow. To control the spread of the disease, screening large numbers of suspected cases for appropriate quarantine and treatment is a priority. Pathogenic laboratory testing is the gold standard, but it is time-consuming and yields a significant number of false negative results. Therefore, alternative diagnostic methods are urgently needed to combat the disease. Based on COVID-19 radiographical changes in CT images, we hypothesized that deep learning methods might be able to extract COVID-19's specific graphical features and provide a clinical diagnosis ahead of the pathogenic test, thus saving critical time for disease control.

Methods and Findings: We collected 1,065 CT images of pathogen-confirmed COVID-19 cases (325 images) along with those previously diagnosed with typical viral pneumonia (740 images). We modified the Inception transfer-learning model to establish the algorithm, followed by internal and external validation. The internal validation achieved a total accuracy of 89.5% with specificity of 0.88 and sensitivity of 0.87. The external testing dataset showed a total accuracy of 79.3% with specificity of 0.83 and sensitivity of 0.67. In addition, of 54 COVID-19 images for which the first two nucleic acid test results were negative, 46 were predicted as COVID-19 positive by the algorithm, an accuracy of 85.2%.
Conclusion: These results demonstrate the proof-of-principle for using artificial intelligence to extract radiological features for timely and accurate COVID-19 diagnosis.

medRxiv preprint, this version posted February 17, 2020. https://doi.org/10.1101/2020.02.14.20023028. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

The outbreak of atypical, person-to-person transmissible pneumonia caused by the severe acute respiratory syndrome coronavirus 2 (SARS-COV-2, also known as 2019-nCoV) has caused global alarm. As of February 14, 2020, there have been nearly 64,000 confirmed cases of the Corona Virus Disease in China, in addition to more than 14,000 suspected cases. According to the WHO, 16-21% of people with the virus in China have become severely ill, with a 2-3% mortality rate. The most recent estimate of the viral reproduction number (R0), the average number of other people to whom an infected individual will transmit the virus in a completely non-immune population, stands at about 3.77 [1], indicating that a rapid spread of the disease is imminent. Therefore, it is crucial to identify infected individuals as early as possible for quarantine and treatment procedures. The diagnosis of COVID-19 relies on the following criteria: clinical symptoms, epidemiological history and positive CT images, as well as positive pathogenic testing. The clinical characteristics of COVID-19 include respiratory symptoms, fever, cough, dyspnea, and viral pneumonia [2] [3] [4] [5]. However, these symptoms are nonspecific, and there are isolated cases in which, for example, a chest CT scan revealed pneumonia in an asymptomatic infected family and the pathogenic test for the virus came back positive. Once someone is identified as a PUI (person under investigation), lower
respiratory specimens, such as bronchoalveolar lavage, tracheal aspirate or sputum, will be collected for pathogenic testing. This laboratory technology is based on real-time RT-PCR and sequencing of nucleic acid from the virus [6, 7]. Since the beginning of the outbreak, the efficiency of nucleic acid testing has depended on several rate-limiting factors, including the availability and quantity of testing kits in the affected area. More importantly, the quality, stability and reproducibility of the detection kits are questionable. Methodology, disease development stage, specimen collection method, nucleic acid extraction method and the amplification system all determine the accuracy of test results. Conservative estimates put the detection rate of nucleic acid testing at only 30-50%, and in many cases tests need to be repeated several times before a diagnosis can be confirmed.

Radiological imaging is also a major diagnostic tool for COVID-19. The majority of COVID-19 cases have similar features on CT images, including ground-glass opacities in the early stage and pulmonary consolidation in the late stage. A rounded morphology and a peripheral lung distribution are also sometimes seen [5, 8]. Although typical CT images may help early screening of suspected cases, the images of various viral pneumonias are similar and overlap with those of other infectious and inflammatory lung diseases. Therefore, it is difficult for radiologists to distinguish COVID-19 from other viral pneumonias.

Artificial Intelligence involving medical imaging deep-learning systems has been developed for image feature extraction, including shape and spatial
relation features. Specifically, the Convolutional Neural Network (CNN) has proven effective in feature extraction and learning. For example, a CNN was used to enhance low-light images from high-speed video endoscopy with limited training data of just 55 videos [9]. CNNs have also been applied to identify the nature of pulmonary nodules in CT images, to diagnose pediatric pneumonia from chest X-ray images, to automatically detect and label polyps in colonoscopy videos, and to recognize and extract cystoscopic images from videos [10] [11] [12] [13]. A number of features for identifying viral pathogens on the basis of imaging patterns are associated with their specific pathogenesis [14]. The hallmarks of COVID-19 are bilateral distribution of patchy shadows and ground glass opacity [2]. Based on this, we believed that a CNN might help us identify unique features that are difficult to recognize visually. To test this notion, we retrospectively enrolled 453 CT images of pathogen-confirmed COVID-19 cases along with previously diagnosed typical viral pneumonia. We used 217 images to train the Inception transfer-learning model in order to establish the algorithm. We achieved a total accuracy of 83% with specificity of 80.5% and sensitivity of 84% for validation. The external testing showed a total accuracy of 73% with specificity of 67% and sensitivity of 74%. These observations demonstrate the proof-of-principle of using the deep learning method to extract radiological graphical features for COVID-19 diagnosis.
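For reference, the accuracy, sensitivity and specificity quoted above can be read against the usual confusion-matrix definitions. The symbols TP, TN, FP and FN (true and false positives and negatives, with COVID-19 as the positive class) are notation assumed here for illustration and do not appear in the original text.

```latex
\begin{align}
\mathrm{Sen} &= \frac{TP}{TP + FN}, &
\mathrm{Spe} &= \frac{TN}{TN + FP}, &
\mathrm{Acc} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```

Under these definitions, sensitivity is the fraction of COVID-19 images that are flagged, and specificity is the fraction of other viral pneumonia images that are correctly left unflagged.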
Retrospective collection of datasets. We retrospectively collected CT images from 99 patients; the cohort includes 55 cases of typical viral pneumonia and the other 44 cases, from three different hospitals, with SARS-COV-2 confirmed by nucleic acid testing. The hospitals providing the images were Xi'an Jiaotong University First Affiliated Hospital, Nanchang University First Hospital and Xi'An No.8 Hospital of Xi'An Medical College. All CT images were de-identified before being sent for analysis. This study complies with the requirements of the Institutional Review Board of each participating institute. Informed consent was exempted by the IRB because of the retrospective nature of this study.

Our systematic pipeline for the prediction architecture is depicted in Figure 1. The architecture consists of three main processes: 1) random selection of ROIs; 2) training of the CNN model to extract features; 3) training of the fully connected classification network and prediction by multiple classifiers.

Deep learning algorithm framework. For each patient's computed tomography scan, we randomly selected ROIs, used the Inception network to extract features, and then made a prediction.

We modified the typical Inception network, and fine-tuned the modified Inception (M-Inception) model with pre-trained weights. During the training phase, the original Inception part was not trained, and we only trained the
After generating the features, the final step is to classify the pneumonia based on those features. An ensemble of classifiers was used to improve the classification accuracy. In this study, we combined Decision tree and Adaboost to produce the final classification. We compared the classification performance using metrics including Accuracy and Sensitivity. Additionally, performance was evaluated with F-measure (F1) to compare the similarity and diversity of performance:

$$F_1 = 2 \cdot \frac{\mathrm{Pre} \cdot \mathrm{Sen}}{\mathrm{Pre} + \mathrm{Sen}}$$

where Pre is the precision and Sen is the sensitivity.

In order to develop a deep learning algorithm for the identification of viral pneumonia images, we retrospectively enrolled 99 patients; the cohort includes 55 cases of typical viral pneumonia that had been diagnosed before the COVID-19 outbreak. These patients are termed

A. The training loss curve of the models on internal and external.
The loss curve tends to be stable after descending, indicating that the training process converges.

The DL algorithm yielded an AUC of 0.90 (95% CI, 0.86 to 0.94) on the internal validation and 0.78 (95% CI, 0.71 to 0.84) on the external validation. The AUC is shown in Figure 3. Using the maximized Youden index threshold probability, the sensitivity was 80.5% and 67.1%, the specificity was 84.2% and 76.4%, the accuracy was 82.9% and 73.1%, the negative predictive value was 0.88 and 0.81, the Youden index was 0.69 and 0.44, and the F1 score was 0.77 and

algorithm was executed at a rate of xxxx seconds per case on the graphics processing unit.

Timely diagnosis and triaging of PUIs are crucial for the control of emerging infectious diseases such as the current COVID-19. Due to the limitations of nucleic acid-based laboratory testing, there is an urgent need for fast alternative methods that front-line health care personnel can use to diagnose the disease quickly and accurately.
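The threshold selection reported in the results above (choosing the probability cut-off that maximizes the Youden index, then reading off sensitivity, specificity, accuracy, negative predictive value and F1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy labels and the toy scores are all hypothetical, and ties between thresholds are broken by keeping the lowest one.

```python
from typing import Dict, List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator: float, denominator: float) -> float:
    """Safe division: an empty denominator yields 0.0."""
    return numerator / denominator if denominator else 0.0


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the metrics named in the text from confusion counts."""
    sen = ratio(tp, tp + fn)   # sensitivity (recall)
    spe = ratio(tn, tn + fp)   # specificity
    pre = ratio(tp, tp + fp)   # precision
    return {
        "sensitivity": sen,
        "specificity": spe,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * pre * sen, pre + sen),
        "youden": sen + spe - 1.0,
    }


def best_youden_threshold(labels: List[int],
                          scores: List[float]) -> Tuple[float, Dict[str, float]]:
    """Scan every observed score as a cut-off; keep the first Youden maximum."""
    if not scores:
        raise ValueError("scores must not be empty")
    candidates = sorted(set(scores))
    best_t = candidates[0]
    best_m = metrics(*confusion_counts(labels, scores, best_t))
    for t in candidates[1:]:
        m = metrics(*confusion_counts(labels, scores, t))
        if m["youden"] > best_m["youden"]:
            best_t, best_m = t, m
    return best_t, best_m


# Hypothetical toy data: 1 = COVID-19, 0 = other viral pneumonia.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1, 0.7, 0.6]
threshold, result = best_youden_threshold(toy_labels, toy_scores)
print(threshold, result)
```

As a check on the definition, the reported external sensitivity of 67.1% and specificity of 76.4% give 0.671 + 0.764 - 1 = 0.435, in line with the reported external Youden index of 0.44.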
In the present study, we have developed an AI program by analyzing representative CT images using a deep learning method. This is a retrospective, multicohort, diagnostic study. We constructed an Inception transfer-learning neural network that achieved 82.9% accuracy. Moreover, the deep learning model developed in this study was tested on external samples, where it achieved 73% accuracy. These findings demonstrate the proof of principle that deep learning can extract CT image features of COVID-19 for diagnostic purposes. Further development of this system could significantly shorten the diagnosis time for disease control. In addition, it could reduce the diagnostic workload of physicians in the field. Our study is the first to apply artificial intelligence technologies to CT images for effectively screening for COVID-19.

The gold standard for COVID-19 diagnosis has been nucleic acid-based detection of specific sequences of the SARS-COV-2 gene. While we still value the importance of nucleic acid detection in the diagnosis of the viral infection, we must also note that the high number of false negatives, due to factors such as methodological disadvantages, disease stages, and methods for specimen collection, might delay diagnosis and disease control. Recent data have suggested that the accuracy of nucleic acid testing is only about 30-50%. Using CT imaging feature extraction, we are able to achieve above 83% accuracy, substantially outperforming nucleic acid testing. In addition, this method is non-invasive with minimal cost.
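To give a sense of why a 30-50% per-test detection rate forces repeated testing, the arithmetic can be sketched as below. This is an illustration only, under the strong and hypothetical assumption that repeated tests on the same infected patient are independent with the same detection rate; real false negatives are likely correlated through disease stage and sampling method, so the figures are optimistic. The function name is introduced here and is not from the original.

```python
def cumulative_detection(per_test_rate: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests is positive.

    Assumes (hypothetically) independent tests with identical detection rate.
    """
    if not 0.0 <= per_test_rate <= 1.0:
        raise ValueError("per_test_rate must lie in [0, 1]")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - per_test_rate) ** n_tests


# Detection rates at the two ends of the range quoted in the text.
for rate in (0.3, 0.5):
    for n in (1, 2, 3):
        print(f"rate={rate:.1f} tests={n} "
              f"cumulative={cumulative_detection(rate, n):.3f}")
```

Even under this favorable assumption, two tests at a 30% detection rate would still miss roughly half of infected patients, which is the gap a faster imaging-based screen is meant to narrow.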
Although we are satisfied with the initial results, we believe that including more CT images in the training will yield higher accuracy. Therefore, further optimization and testing of this system is warranted. To this end, we have created a webpage where licensed healthcare personnel can upload CT images for testing. The webpage is available at: https://ai.nscc-tj.cn/thai/deploy/public/pneumonia_ct.

There are some limitations to our study. CT images present a difficult classification task due to the relatively large number of variable objects, specifically the imaged areas outside the lungs that are irrelevant to the diagnosis of pneumonia [11]. In our study, only one radiologist was involved in outlining the ROI area. In addition, the training data set is relatively small. The performance of this system is expected to improve as the training volume increases. It should also be noted that the features of the CT images we analyzed are from patients with severe lung lesions at later stages of disease development. A study associating these features with the progression and all pathologic stages of COVID-19 is necessary to optimize the diagnostic system.

In future, we intend to link hierarchical features of CT images to features of other factors, such as genetic, epidemiological and clinical information, for multi-omics and multi-modeling analysis for enhanced disease diagnosis. The artificial intelligence system developed in our study could significantly contribute to COVID-19 disease control by reducing the number of PUIs for timely quarantine and treatment.
Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China
Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Clinical features of
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Euro surveillance : bulletin Europeen sur les maladies transmissibles
Molecular Diagnosis of a Novel Coronavirus (2019-nCoV) Causing an Outbreak of Pneumonia
CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV)
Deep Learning-based Image Conversion of CT Reconstruction Kernels Improves Radiomics Reproducibility for Pulmonary Nodules or Masses
Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning
Application of artificial neural networks for automated analysis of cystoscopic images: a review of the current status and future prospects. World J Urol 2020.
[13] Wang P, Xiao X, Glissen Brown JR, Berzin TM, Tu M, Xiong F, et al.