key: cord-0042496-tovaxufu
authors: Butt, Charmaine; Gill, Jagpal; Chun, David; Babu, Benson A.
title: Deep learning system to screen coronavirus disease 2019 pneumonia
date: 2020-04-22
journal: Appl Intell
DOI: 10.1007/s10489-020-01714-3
sha: 9439a5ed7c53f89eb7dd70fd9d6cf26d80c24347
doc_id: 42496
cord_uid: tovaxufu

Radiographic patterns on CT chest scans have shown higher sensitivity and specificity than RT-PCR detection of COVID-19, which, according to the WHO, has a relatively low positive detection rate in the early stages. We technically review a study that compared multiple convolutional neural network (CNN) models to classify CT samples as COVID-19, Influenza viral pneumonia, or no infection. We compare this study with another that built on existing 2D and 3D deep-learning models, combined them with the latest clinical understanding, and achieved an AUC of 0.996 (95% CI: 0.989-1.00) for classifying thoracic CT studies as coronavirus vs non-coronavirus cases, with a sensitivity of 98.2% and a specificity of 92.2%.

In December 2019, the first recorded transmission to humans of the zoonotic virus SARS-CoV-2, originating from the Rhinolophus bat, took place. The Huanan Seafood Wholesale Market in Wuhan City, Hubei Province, China, was the epicenter of the outbreak of coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, which rapidly spread worldwide and was declared a pandemic by the WHO on 11 March 2020 [1]. COVID-19 has led to complications such as acute respiratory disorder, heart problems, and secondary infections in a relatively high proportion of patients, and thus to significant mortality. Early detection and commencement of treatment in severe cases are key to reducing mortality [1].

Radiographic patterns on CT chest scans have shown higher sensitivity and specificity than RT-PCR detection of COVID-19, which, according to the WHO, has a relatively low positive detection rate in the early stages. One report of 1014 cases from China found that the sensitivity of chest CT for detecting COVID-19 was 97% (95% CI 95-98%, 580/601 patients), using positive RT-PCR results as the reference [2]. CT chest images of COVID-19 positive cases show a distinct radiographic pattern: ground-glass opacities, multifocal patchy consolidation, and/or interstitial changes with a predominantly peripheral distribution [2, 3]. A study of 21 patients with the 2019 novel coronavirus found that 15 (71%) had involvement of more than two lobes at chest CT, 12 (57%) had ground-glass opacities, seven (33%) had opacities with rounded morphology, seven (33%) had a peripheral distribution of disease, six (29%) had consolidation with ground-glass opacities, and four (19%) had a crazy-paving pattern. No lung cavitation, discrete pulmonary nodules, pleural effusions, or lymphadenopathy were seen. Three of the 21 patients (14%) presented with a normal CT scan [3]. CT scans from the current outbreak show the evolution of lung opacities over time in a COVID-19 positive patient, from symptom onset to 41 days after onset (Fig. 1) [4].
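The proportions quoted in this paragraph can be checked directly. The sketch below recomputes the CT sensitivity from 580/601 and attaches a Wilson score interval; the choice of the Wilson method is our assumption, since the interval method used in [2] is not stated here. It gives roughly 94.7% to 97.7%, consistent with the reported 95-98%. It also recomputes the percentages for the 21-patient study [3].

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> 95%).

    Assumption: Wilson is used here for illustration; [2] may have used another method.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Chest CT sensitivity with RT-PCR-positive patients as the reference group [2].
sensitivity = 580 / 601
low, high = wilson_interval(580, 601)
print(f"sensitivity {sensitivity:.1%} (95% CI {low:.1%} to {high:.1%})")

# Share of the 21 patients in [3] showing each CT finding.
findings = {
    "more than two lobes": 15,
    "ground-glass opacities": 12,
    "rounded opacities": 7,
    "peripheral distribution": 7,
    "consolidation with ground-glass opacities": 6,
    "crazy-paving pattern": 4,
    "normal CT": 3,
}
for name, count in findings.items():
    print(f"{name}: {count}/21 = {count / 21:.0%}")
```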

The characteristic manifestations of COVID-19 on CT can make it distinguishable from other types of viral pneumonia, although human observation alone is not reliable because certain manifestations overlap with those of other viral pneumonias [4] (see Table 1). However, AI and deep learning technologies make it possible to differentiate accurately between the types of viral pneumonia, making them an effective screening tool. Here we review one such study, which compared multiple convolutional neural network (CNN) models to classify CT samples as COVID-19, Influenza viral pneumonia, or no infection [7].

Given the large number of COVID-19 cases in China, the country has become an area of experimental development for emerging deep learning technology companies. The pandemic may provide large CT datasets on which to train models for accurate COVID-19 recognition. Worldwide, shared national CT chest databases are needed to train AI models that are accurate and generalizable.

Figure 2 shows the process of COVID-19 diagnostic report generation in the reviewed study. First, the CT images were preprocessed to extract the effective pulmonary regions. Second, a 3D CNN model was used to segment multiple candidate image cubes, and the center image of each cube, together with its two neighbors, was collected for further steps. Third, an image classification model categorized all image patches into one of three types: COVID-19, Influenza-A-viral-pneumonia, and irrelevant-to-infection. Image patches from the same cube voted for the type and confidence score of the candidate as a whole. Finally, the overall analysis report for one CT sample was calculated using the Noisy-OR Bayesian function.
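The cube-level voting and Noisy-OR aggregation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reviewed study's implementation: a cube's confidence is taken as the mean confidence of the patches that voted for the winning type, ties go to the type seen first, and the names `vote_cube`, `noisy_or` and `ct_report` are hypothetical. The Noisy-OR combines the per-cube probabilities p_i for one infection type as 1 - Π(1 - p_i), the probability that at least one candidate region is a true finding when regions are treated as independent.

```python
from collections import Counter
from math import prod

# Infection types in the report; "irrelevant-to-infection" cubes are treated as no finding.
INFECTION_TYPES = ("COVID-19", "Influenza-A-viral-pneumonia")

def vote_cube(patch_preds):
    """Majority vote over the patches (center slice plus two neighbors) of one cube.

    patch_preds: list of (type_name, confidence) pairs.
    Assumption: the cube's confidence is the mean confidence of the patches that
    voted for the winning type; ties go to the type encountered first.
    """
    counts = Counter(label for label, _ in patch_preds)
    winner, _ = counts.most_common(1)[0]
    confs = [conf for label, conf in patch_preds if label == winner]
    return winner, sum(confs) / len(confs)

def noisy_or(probs):
    """Probability that at least one independent region is positive: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probs)

def ct_report(cubes):
    """Hypothetical report: one Noisy-OR probability per infection type for a CT sample."""
    per_type = {t: [] for t in INFECTION_TYPES}
    for patch_preds in cubes:
        label, conf = vote_cube(patch_preds)
        if label in per_type:  # cubes voted irrelevant-to-infection are ignored
            per_type[label].append(conf)
    return {t: noisy_or(ps) for t, ps in per_type.items()}

# Toy example with three candidate cubes (made-up confidences).
cubes = [
    [("COVID-19", 0.9), ("COVID-19", 0.8), ("irrelevant-to-infection", 0.6)],
    [("COVID-19", 0.5), ("COVID-19", 0.7), ("COVID-19", 0.6)],
    [("Influenza-A-viral-pneumonia", 0.4)] * 3,
]
report = ct_report(cubes)
print(report)
```

Here the two COVID-19 cubes (confidences 0.85 and 0.6) combine to 1 - 0.15 × 0.4 = 0.94, so several moderately confident regions can jointly yield a high sample-level probability.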

The image classification model was designed to distinguish the appearance and structure of different infections. In addition, the relative distance-from-edge of each patch was used as an extra weight so that the model could learn the relative location of the patch within the pulmonary image: infection foci located close to the pleura were more likely to be recognized as COVID-19.

The relative distance-from-edge of each patch was calculated as follows:

1) Measure the minimum distance from the edge of the pulmonary mask to the center of the patch.
2) Obtain the diagonal of the minimum circumscribed rectangle of the pulmonary image.
3) Divide the distance obtained in step 1 by the diagonal from step 2; the result is the relative distance-from-edge.
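The three steps above can be sketched with NumPy. Assumptions not taken from the source: the mask is a 2D boolean array for one slice, "distance from the mask" is the Euclidean distance from the patch center to the nearest pixel outside the lung mask, and the minimum circumscribed rectangle is approximated by the axis-aligned bounding box of the mask. `relative_distance_from_edge` is a hypothetical name.

```python
import numpy as np

def relative_distance_from_edge(mask, center):
    """Relative distance-from-edge of a patch (hypothetical helper).

    mask:   2D boolean array, True inside the lung (one slice).
    center: (row, col) of the patch center.
    Assumptions: Euclidean distance to the nearest pixel outside the mask;
    the circumscribed rectangle is the axis-aligned bounding box of the mask.
    """
    # Step 1: pad with background so the image border also counts as "outside".
    padded = np.pad(mask, 1, constant_values=False)
    outside = np.argwhere(~padded) - 1  # shift back to original coordinates
    d_min = np.sqrt(((outside - np.asarray(center)) ** 2).sum(axis=1)).min()
    # Step 2: diagonal of the bounding rectangle of the lung mask.
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    diagonal = np.hypot(height, width)
    # Step 3: normalize the distance by the diagonal.
    return float(d_min / diagonal)

mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 2:8] = True  # a toy 6 x 6 "lung"
print(relative_distance_from_edge(mask, (4, 4)))  # patch near the middle
print(relative_distance_from_edge(mask, (2, 2)))  # patch at the lung border
```

The brute-force search is enough for a sketch; for whole images, `scipy.ndimage.distance_transform_edt` applied to the padded mask yields the same pixel-to-nearest-background distances in a single pass.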

Two three-dimensional CNN classification models were evaluated in this study. One was a relatively traditional ResNet-based network; the other was built on the first network's structure by concatenating a location-attention mechanism in the fully connected layer, to improve the overall accuracy. The classical ResNet-18 network structure was used for image feature extraction (see Fig. 3). Pooling operations were also used to reduce the dimensionality of the data and prevent overfitting. For the location-attention network, the relative distance-from-edge value was first normalized to the same order of magnitude as the image features and then concatenated into the fully connected network structure. This was followed by three fully connected layers that output the final classification result together with a confidence score.
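A NumPy forward pass illustrates how the location-attention variant attaches the scalar distance feature to the ResNet features before the three fully connected layers. This is a sketch under assumptions, not the reviewed network: a 512-dimensional pooled feature vector (the size of the standard ResNet-18 penultimate layer), hidden widths of 256 and 64, ReLU activations, random untrained weights, and a hypothetical rescaling rule for the distance value. The study's exact normalization and layer sizes are not given here.

```python
import numpy as np

def init_params(rng, in_dim, hidden, n_classes):
    """Random weights for three fully connected layers (hypothetical sizes)."""
    dims = [in_dim, *hidden, n_classes]
    return [(rng.normal(0.0, 0.05, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def location_attention_head(features, dist_from_edge, params):
    """features: (batch, 512) pooled ResNet-18 features (assumed size).
    dist_from_edge: (batch,) relative distance-from-edge values."""
    d = dist_from_edge[:, None]
    # Hypothetical normalization: rescale the distance so its mean magnitude
    # matches that of the image features before concatenation.
    d = d * np.abs(features).mean() / (np.abs(d).mean() + 1e-8)
    x = np.concatenate([features, d], axis=1)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the hidden layers only
    probs = softmax(x)
    return probs.argmax(axis=1), probs.max(axis=1)  # class index, confidence

rng = np.random.default_rng(0)
params = init_params(rng, 512 + 1, (256, 64), 3)  # 3 classes of patch
features = rng.normal(size=(4, 512))               # stand-in for ResNet-18 output
labels, conf = location_attention_head(features, np.array([0.05, 0.1, 0.3, 0.5]), params)
print(labels, conf)
```

Without the rescaling step, a value in [0, 1] concatenated to hundreds of larger-magnitude features could be swamped, which is one plausible reading of why the distance was normalized first.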

Cross-entropy, one of the most classical loss functions, was used in this study. When training exceeded 1000 epochs, the loss value neither decreased nor increased noticeably, suggesting that the models had converged to a relatively optimal state without distinct overfitting. The training curves of the loss value and the accuracy for the two classification models are shown in Fig. 4. The network with the location-attention mechanism achieved better performance on the training dataset than the original ResNet.
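For reference, the categorical cross-entropy over N patches with true classes y_n and predicted class probabilities p(n, c) is L = -(1/N) Σ_n log p(n, y_n). A numerically stable NumPy sketch computed from raw logits follows; the function name is illustrative, and the three-class setting mirrors the patch types above.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean categorical cross-entropy from raw logits (illustrative helper).

    logits:  (N, C) array of unnormalized scores, e.g. C = 3 patch types.
    targets: (N,) array of true class indices.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Uninformative logits give log(3); a confident correct prediction gives a loss near 0.
print(cross_entropy(np.zeros((2, 3)), np.array([0, 2])))
print(cross_entropy(np.array([[10.0, 0.0, 0.0]]), np.array([0])))
```

A loss that flattens out, as described after about 1000 epochs, means the average log-probability assigned to the true class has stopped changing appreciably.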

Accuracy is the proportion of all predictions that are correct. Precision is the proportion of positive predictions that are actually positive, i.e., how many of the predictions are correct. Recall (sensitivity) is the proportion of actual positives that are detected, i.e., how many of the correct results are discovered. The F1-score is the harmonic mean of precision and recall, giving a balanced average of the two.

Classification of single image patches: a total of 1710 image patches were acquired from 90 CT samples, including 357 COVID-19, 390 Influenza-A-viral-pneumonia, and 963 irrelevant-to-infection patches (ground truth). To determine the optimal approach, each design was assessed using a confusion matrix. Two network structures were evaluated, with and without the location-attention mechanism (Table 2).
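The metric definitions above can be computed per class (one-vs-rest) from a multi-class confusion matrix. The matrix below is a made-up toy example, not the Table 2 results, and `per_class_metrics` is a hypothetical helper.

```python
import numpy as np

def per_class_metrics(cm):
    """Accuracy plus one-vs-rest precision, recall and F1 from a confusion matrix.

    cm[i, j] = number of patches whose true class is i and predicted class is j.
    A class that is never predicted gives a 0/0 precision (NaN).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums = predicted counts per class
    recall = tp / cm.sum(axis=1)     # row sums = true counts per class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Made-up 3-class matrix (COVID-19, Influenza-A, irrelevant), not Table 2.
toy_cm = [[8, 1, 1],
          [2, 6, 2],
          [0, 1, 9]]
acc, prec, rec, f1 = per_class_metrics(toy_cm)
print(acc, prec, rec, f1)
```

Because F1 is a harmonic mean, a class with high precision but low recall (or vice versa) is pulled toward the weaker of the two values.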

To summarize, using ResNet to extract features from CT images together with the location-attention mechanism distinguished COVID-19 cases from others more accurately than the model without location attention, with an overall accuracy rate of 86.7%. Remarkably, processing a CT set (70 slices) took on average less than 30 s from data preprocessing to the output of the report.

The RT-PCR test for 2019-nCoV RNA can definitively distinguish COVID-19 from Influenza-A viral pneumonia. However, nucleic acid testing has drawbacks such as time lag, a relatively low detection rate, and supply shortages. In the early stages of COVID-19, some patients may already have positive pulmonary imaging findings but no sputum, and hence test negative by RT-PCR on nasopharyngeal swabs. These patients are not diagnosed as suspected or confirmed cases, are neither isolated nor treated, and become potential sources of infection.

The movement restrictions and closure of nonessential manufacturing in China have had a huge impact on its economy, with supply-chain repercussions all over the world. Now, with COVID-19 spreading worldwide at an alarming rate, other countries are also declaring emergencies and taking measures such as closing schools, restricting movement, and urging remote working (except for essential workers, e.g. nurses, doctors, and care workers). It has become increasingly clear, even for Fortune 1000 companies, that it will not be business as usual for the next couple of months. Economists have warned that this pandemic could cost $1.1 trillion in lost income globally (https://www.dentons.com/en/insights/alerts/2020/march/11/covid-19-and-its-impact-on-the-global-economy).

Whilst the WHO strongly recommends testing as many suspected cases as possible, many countries have not followed this advice, seemingly owing to a lack of resources and personnel as well as a shortage of RT-PCR tests. Here, CT imaging combined with AI could aid the detection of COVID-19 by providing a fast alternative, and hence help limit the spread.

The deep learning model reviewed in this paper does allow very fast and reliable detection of COVID-19 from chest CT datasets (as shown in 3.2) and has a higher detection rate than RT-PCR testing; however, a few other recent studies of AI models have obtained more promising detection rates. For example, one study built on existing 2D and 3D deep-learning models, combined them with the latest clinical understanding, and achieved an AUC of 0.996 (95% CI: 0.989-1.00) for classifying thoracic CT studies as coronavirus vs non-coronavirus cases, with a sensitivity of 98.2% and a specificity of 92.2% [8].

This system comprised several components and analyzed each CT case at two distinct levels. Subsystem A performed a 3D analysis of the case volume for nodules and focal opacities, using commercial software (3D lung volume, from RADLogics Inc.). Subsystem B was a newly developed 2D analysis of each slice of the case, which detected and localized larger diffuse opacities, including ground-glass infiltrates. Furthermore, the authors proposed a Corona score, a volumetric measurement of the lung opacities, as an effective way to monitor patient progress over time. This study performed much better than the one reviewed in this paper in terms of detection rate, sensitivity, and specificity, mainly because its system was specifically developed to detect COVID-19 manifestations.

Fast and accurate analyses of CT datasets of this kind, which only optimized AI models can provide, are crucial for front-line clinicians screening patients during this pandemic. Infervision launched a coronavirus artificial intelligence solution in China this past month, tailored for front-line use to help clinicians detect and monitor the disease more effectively.
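The AUC, sensitivity, and specificity figures used to compare the two systems can be illustrated on toy scores. AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney formulation, with ties counted as one half), whereas sensitivity and specificity depend on the chosen operating threshold. The scores and the 0.5 threshold below are hypothetical and unrelated to the data in [8].

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)

# Hypothetical model scores for coronavirus (positive) and non-coronavirus CT studies.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.1, 0.3, 0.5, 0.2]
print(auc_mann_whitney(pos, neg))
print(sens_spec(pos, neg, threshold=0.5))
```

A single AUC summarizes all thresholds at once, which is why a system can report a near-perfect AUC alongside a specificity that is noticeably lower than its sensitivity at the chosen operating point.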

GIS technology has become an important tool for stopping the spread of the coronavirus, with Johns Hopkins University leading the way in this area (https://towardsdatascience.com/how-to-fight-the-coronavirus-with-ai-and-data-science-b3b701f8a08a, https://medium.com/@ngocson2vn/build-an-artificial-neural-network-from-scratch-to-predict-coronavirus-infection-8948c64cbc32). Data mining is critical for this use of GIS technology, because it is used to detect areas where people are talking about the disease. Social media sites are good information sources for GIS, as the technology maps the areas in which people are talking about the coronavirus. Prevention measures can then be implemented, since these heatmaps can better track both the location and the spread of the disease. Ten years ago, it was practically impossible to track diseases in this way; today, with AI, machine learning, and GIS, data mining and extracting insights are both easier and more powerful at locating viruses.
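A minimal sketch of the heatmap idea: count geotagged posts that mention the disease in each cell of a latitude/longitude grid. The keyword list, the 1-degree cell size, and the `(lat, lon, text)` post format are hypothetical; a real system would also need geocoding, deduplication, and normalization by population or posting volume.

```python
import math
from collections import Counter

KEYWORDS = ("coronavirus", "covid")  # hypothetical keyword list

def mention_heatmap(posts, cell_deg=1.0):
    """Count keyword-matching posts per grid cell.

    posts: iterable of (lat, lon, text) tuples (hypothetical format).
    Returns a Counter mapping (lat_index, lon_index) -> number of matching posts.
    """
    counts = Counter()
    for lat, lon, text in posts:
        if any(k in text.lower() for k in KEYWORDS):
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    return counts

# Made-up posts; the third does not mention the disease and is skipped.
posts = [
    (30.6, 114.3, "Coronavirus cases rising"),
    (30.2, 114.9, "covid test today"),
    (40.7, -74.0, "nice weather"),
    (40.1, -73.5, "COVID update"),
]
print(mention_heatmap(posts))
```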

The bottom line: prevention response time is quicker today with the use of AI.

[1] Coronavirus disease 2019 (COVID-19) Situation Report 51

[2] Correlation of chest CT and RT-PCR testing in Coronavirus disease 2019 (COVID-19

[3] CT imaging features of 2019 Novel Coronavirus (2019-nCoV). Radiology, p. 200230

[4] Evolution of CT manifestations in a patient re

[5] Radiographic and CT features of viral pneumonia

[6] Incubation periods of acute respiratory viral infections: a systematic review

[7] Deep learning system to screen coronavirus disease

[8] Rapid AI development cycle for the Coronavirus (COVID-19) Pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis

Conflict of interest: None.