key: cord-0954544-h82upvg9
authors: Nivet, Hubert; Crombé, Amandine; Schuster, Paul; Ayoub, Thomas; Pourriol, Laurent; Favard, Nicolas; Chazot, Alban; Alonzo-Lacroix, Florian; Youssof, Emile; Ben Cheikh, Alexandre; Balique, Julien; Porta, Basile; Petitpierre, François; Bouquet, Grégoire; Mastier, Charles; Bratan, Flavie; Bergerot, Jean-François; Thomson, Vivien; Banaste, Nathan; Gorincour, Guillaume
title: The accuracy of teleradiologists in diagnosing COVID-19 based on a French multicentric emergency cohort
date: 2020-10-29
journal: Eur Radiol
DOI: 10.1007/s00330-020-07345-z
sha: c77b78ff80e92367d372f206979f80aa47ede41b
doc_id: 954544
cord_uid: h82upvg9

OBJECTIVES: To evaluate the diagnostic accuracy of chest CT for COVID-19 and the inter-observer agreement between on-call teleradiologists and senior radiologists in patients with suspected COVID-19. MATERIALS AND METHODS: From March 13, 2020, to April 14, 2020, consecutive adult patients with suspected COVID-19 who underwent both an RT-PCR test and chest CT in 15 hospitals were included in this prospective study. Chest CTs were immediately interpreted by the on-call teleradiologist and were systematically reviewed by a senior radiologist blinded to the first reading. Readings were categorised using a five-point scale: (1) normal; (2) non-infectious findings; (3) infectious findings but not consistent with COVID-19 infection; (4) consistent with COVID-19 infection; and (5) typical appearance of COVID-19 infection. The diagnostic accuracy of chest CT and inter-observer agreement (kappa coefficient) were evaluated over the study period. RESULTS: In total, 513 patients were enrolled, of whom 244/513 (47.6%) had a positive RT-PCR test. First readings were scored 4 or 5 in 225/244 (92%) RT-PCR+ patients, and between 1 and 3 in 201/269 (74.7%) RT-PCR− patients. Inter-observer agreement was high (weighted kappa = 0.87), and scores correlated with RT-PCR status (p < 0.001, AUC(1st reading) = 0.89, AUC(2nd reading) = 0.93). The negative predictive value (NPV) for scores of 4 or 5 was 0.91–0.92, and the positive predictive value (PPV) for a score of 5 was 0.89–0.96 at the first and second readings, respectively. Diagnostic accuracy remained consistent over the study period despite variations in prevalence. CONCLUSION: Chest CT demonstrated high diagnostic accuracy, with strong inter-observer agreement between on-call teleradiologists with varying degrees of experience and senior radiologists over the study period.
KEY POINTS: • Using a “typical appearance” (score 5) as predictive of RT-PCR+, first readings by on-call teleradiologists and second readings by senior radiologists had a sensitivity of 0.75–0.79, specificity of 0.92–0.97, NPV of 0.80–0.83, and PPV of 0.89–0.96. • Inter-observer agreement between the first reading in the emergency setting and the second reading by the senior emergency teleradiologist was excellent (weighted kappa = 0.87). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s00330-020-07345-z) contains supplementary material, which is available to authorized users.

Coronavirus disease 2019 (COVID-19) is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was identified in Wuhan, China, in late December 2019 [1]. It spread rapidly worldwide and by early 2020 had reached most Western countries. The World Health Organization declared it a pandemic on 11 March 2020.

Real-time reverse transcription polymerase chain reaction (RT-PCR) has emerged as the gold standard for the qualitative detection of SARS-CoV-2 nucleic acid in upper and lower respiratory specimens, with nasopharyngeal swabs being the preferred sampling method. Recent studies have reported inaccurate RT-PCR results (false negatives and false positives) for the detection of COVID-19, with sensitivity ranging from 60 to 71% and specificity of around 96% [2–6]. These inaccuracies may be due to inadequate sampling technique, variations in viral load, and test kit sensitivity.

Chest CT has been suggested as a complementary approach to RT-PCR in the current outbreak. Although the CT findings of COVID-19 may overlap with those of other diseases, such as other viral infections, in the context of this outbreak they show a typical pattern of ground-glass opacities (GGO), with or without consolidation, in lung regions close to the visceral pleural surfaces [7]. Chest CT may be performed before RT-PCR testing, with a reported sensitivity of 60 to 98% [2, 4, 8, 9]. Furthermore, chest CT can rapidly provide crucial information on the extent and complications of COVID-19, or on its differential diagnoses, without waiting for RT-PCR results.

Most radiology societies have recommended that chest CT not be used for first-line screening [3, 10] and that it be reserved for patients with suspected or confirmed COVID-19 who have moderate to severe disease requiring hospitalisation, or who have underlying comorbidities when access to RT-PCR testing is restricted [11].

The accuracy of CT-based COVID-19 diagnosis by radiologists has been investigated previously, but largely in retrospective studies and in settings with a high prevalence of COVID-19. In a previous study [12], structured CT reports from radiologists in several hospitals were evaluated on homogeneous multicentric data using logistic regression modelling. However, analyses of diagnostic accuracy may be hampered by the heterogeneity of the radiologists involved, who have varying degrees of experience.

Outsourced teleradiology is now widely implemented, but opinions on this practice differ. In the 2016 European Society of Radiology (ESR) survey, 70.8% of National Member countries practised outsourcing [13]. Although teleradiological outsourcing makes it possible to provide radiology services in a variety of local settings, studies often raise concerns about report quality and insufficient communication with clinicians. A clear need to improve confidence in the accuracy of outsourced reports and to ensure timely responses has been highlighted [14, 15].

Our aim was to prospectively assess the diagnostic accuracy of chest CT for COVID-19 and to determine the inter-observer agreement between radiologists in patients with suspected COVID-19 from 15 French emergency departments that used an outsourced emergency teleradiology system during the COVID-19 outbreak in France.

The local institutional ethics review board approved this multicentric observational prospective study (N°CRM-2005-088).

At our emergency teleradiology center, all consecutive patients with suspected COVID-19, as judged by a board-certified emergency physician, who underwent both chest CT and RT-PCR in one of 15 French emergency departments from March 13, 2020, to April 14, 2020, were included (Fig. 1). The examinations took place during on-call periods, between 6 pm and 8.30 am on weekdays and 24 h a day at weekends. A dedicated COVID-19 workflow was implemented, consisting of a standardised COVID-19 CT request form for the requesting physician, structured and standardised radiological reports, and a systematic review by a senior radiologist.

Chest CT examinations were performed on 16-, 64-, or 80-detector-row CT scanners using a standardised non-contrast COVID-19 chest CT protocol in all hospitals. If pulmonary embolism was suspected, a CT pulmonary angiography protocol with bolus-tracking intravenous iodinated contrast administration at 3-4 mL/s was used instead. The on-site requesting physician supervised contrast administration when needed. Details of the CT scanners and the number of examinations per hospital are given in Supplementary material 1. The geographic distribution of the included patients is presented in Fig. 2.

The teleradiology interpretation protocol met the current French recommendations for teleradiology practice [16] . Reports and requests with clinical data for the interpretation of COVID-19 chest CT images were received from partner hospitals at our teleradiology center, using teleradiology software (ITIS; Deeplink Medical). The images were securely transferred over a virtual private network (VPN) to a local picture archiving and communication system for interpretation (PACS; Carestream Health 12). Images were interpreted by a teleradiologist (TR) in two dedicated emergency reading rooms during the study period. The panel of TRs consisted of 106 senior radiologists with at least 5 years of emergency imaging experience (mean length of practice: 7 years) and 45 junior radiologists (i.e., residents) with between 3 and 5 years of emergency imaging experience (mean length of practice: 4 years). TRs operated an on-call rota in groups of at least five TRs per night, and the report turn-around times were recorded.

CT examinations were systematically reviewed within a week after each on-call period by a senior radiologist (15 senior radiologists; mean length of practice: 12.1 years) who had not been involved in the on-call duty period and who was blinded to the RT-PCR results and to the first reader's report, but had access to the patient's medical history.

Clinical information was prospectively provided by emergency physicians at presentation and collected through a dedicated COVID-19 CT request form in the teleradiology software (ITIS; Deeplink Medical). It included age, gender, active smoking, significant medical history, recent use of anti-inflammatory drugs, time since symptom onset (categorised as < 1 week, 1-2 weeks, or ≥ 2 weeks), oxygen saturation (categorised as ≥ 95%, 90-95%, or < 90%), dyspnoea, fever (≥ 38 °C), cough, asthenia, headache, and ear, nose, and throat symptoms. The RT-PCR results were retrospectively collected from the patients' electronic medical records by each partner hospital. The initial RT-PCR result was considered the reference standard.

Discrepancies between the RT-PCR results and the score from the second reading were reviewed by contacting hospitals and investigating patients' outcome, in order to determine whether a second chest CT and/or a second RT-PCR test had been performed.

Six common radiological features were extracted from the structured COVID-19 chest CT reports of the first and second independent readers. These features included the presence of GGO, consolidation, fibrosis (with traction bronchiectasis and architectural distortion), intralobular reticulations, and extent of abnormalities (categorised as low [< 25%], moderate [25-50%], or high [> 50%]). Additionally, the second reading included an assessment of image quality (categorised as good, moderate, or poor) and the following radiological features: (a) underlying pulmonary disease (categorised as emphysema, lung cancer, interstitial lung disease, pleural lesions, or bronchiectasis); (b) GGO pattern (categorised as rounded or non-rounded GGO); (c) consolidation pattern (categorised as rounded or non-rounded consolidations, or subpleural bands); (d) predominant pattern (categorised as GGO or consolidation); (e) distribution pattern of lesions (categorised as peripheral predominant, central predominant, or mixed); (f) bilateral lesions; (g) diffuse lesions (i.e., all five lobes involved); (h) basal-predominant lesions; (i) pleural effusion (categorised as unilateral or bilateral); (j) adenomegaly (defined as a lymph node with a short axis > 10 mm); (k) bronchial wall thickening (further categorised as lobar/segmental or diffuse); (l) airway secretions; (m) tree-in-bud centrilobular micronodules; and (n) pulmonary embolism.

Each reading was categorised using a five-point score adapted from the recommendations of the Société Française de Radiologie (SFR) [17]: (1) normal; (2) non-infectious findings; (3) infectious findings but not consistent with COVID-19 infection; (4) consistent with COVID-19 infection; and (5) typical appearance of COVID-19 infection (Fig. 3).
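The analyses dichotomise this ordinal score at two cut-offs: scores 4 or 5, and a score of 5 alone. A minimal Python sketch of the score and both cut-offs follows; the class and function names are illustrative assumptions, not part of the study's reporting software.

```python
from enum import IntEnum


class CovidCtScore(IntEnum):
    """Hypothetical encoding of the five-point SFR-adapted score (labels from the text)."""
    NORMAL = 1
    NON_INFECTIOUS = 2
    INFECTIOUS_NOT_COVID = 3
    CONSISTENT_WITH_COVID = 4
    TYPICAL_COVID = 5


def positive_at_4_or_5(score: CovidCtScore) -> bool:
    """Cut-off 1: a score of 4 or 5 counts as a positive CT reading."""
    return score >= CovidCtScore.CONSISTENT_WITH_COVID


def positive_at_5(score: CovidCtScore) -> bool:
    """Cut-off 2: only a 'typical appearance' (score 5) counts as positive."""
    return score == CovidCtScore.TYPICAL_COVID


print([positive_at_4_or_5(CovidCtScore(s)) for s in range(1, 6)])
print([positive_at_5(CovidCtScore(s)) for s in range(1, 6)])
```

Using `IntEnum` keeps the ordering of the levels, so the looser cut-off is a simple comparison.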

To support the TRs' interpretations, all radiologists completed a 2-h e-learning session based on chest CT findings associated with COVID-19 reported in the literature; the session was made publicly available on the 7th of April [18]. A private medical discussion group (PandaLab) was set up at the onset of the outbreak so that on-call TRs could discuss and share images from their cases with all TRs who were not on call before completing their reports.

Statistical analyses were performed using R (version 3.5.3; R Foundation for Statistical Computing). A p value of less than 0.05 was considered significant.

Inter-observer agreement for the six radiological features common to both readings and for the scores was assessed using Cohen's kappa (for dichotomous variables) and weighted kappa (for ordinal variables).
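For readers less familiar with weighted kappa, the sketch below computes Cohen's kappa for two readers, with optional linear or quadratic disagreement weights for ordinal variables such as the five-point score. The text does not state which weighting scheme was used, so both are offered as assumptions; the helper name `cohen_kappa` is hypothetical, and the study itself used R.

```python
from typing import Sequence


def cohen_kappa(r1: Sequence[int], r2: Sequence[int], categories: Sequence[int],
                weights: str = "none") -> float:
    """Cohen's kappa between two readers (hypothetical helper).

    weights="none" gives the classic kappa used for dichotomised features;
    "linear" or "quadratic" give a weighted kappa for ordinal variables (assumed schemes).
    """
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions: rows = first reading, columns = second reading.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal; grows with the distance between categories.
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        if weights == "linear":
            return d
        if weights == "quadratic":
            return d * d
        raise ValueError(f"unknown weights: {weights!r}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * obs[i][j] for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    # kappa = 1 - observed disagreement / disagreement expected by chance
    return 1.0 - observed / expected
```

With only two categories, the linear and quadratic weights coincide with the unweighted kappa; the weights matter only for variables with three or more ordered levels, where a 4-versus-5 disagreement is penalised less than a 1-versus-5 one.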

The frequencies of all clinical and radiological categorical variables from the second reading were compared between patients with positive RT-PCR (RT-PCR+) and those with negative RT-PCR (RT-PCR−) using Pearson χ² or Fisher exact tests; age was compared between the two groups using the Student t test. The odds ratio (OR), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with 95% confidence intervals (95% CI), were calculated as measures of accuracy for each categorical variable independently, and for a score of 4 or 5 and a score of 5 at the first and second readings. The probability of RT-PCR+ based on the scores was estimated using univariate binary logistic regression.
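The accuracy measures can be derived from a 2 × 2 table of CT result versus RT-PCR status. The sketch below assumes Wilson score intervals for the proportions and a Woolf (log-scale) interval for the odds ratio; the text does not specify which interval methods were used, so CIs from this sketch may differ slightly from those reported. `TwoByTwo` and `wilson_ci` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def wilson_ci(successes: int, total: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (interval method is an assumption)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


@dataclass
class TwoByTwo:
    """Hypothetical container for a CT-versus-RT-PCR contingency table."""
    tp: int  # CT positive, RT-PCR+
    fp: int  # CT positive, RT-PCR-
    fn: int  # CT negative, RT-PCR+
    tn: int  # CT negative, RT-PCR-

    def measures(self) -> dict[str, tuple[float, float, float]]:
        """Return (estimate, lower 95% CI, upper 95% CI) for each accuracy measure."""
        def prop(num: int, den: int) -> tuple[float, float, float]:
            lo, hi = wilson_ci(num, den)
            return num / den, lo, hi

        out = {
            "sensitivity": prop(self.tp, self.tp + self.fn),
            "specificity": prop(self.tn, self.tn + self.fp),
            "ppv": prop(self.tp, self.tp + self.fp),
            "npv": prop(self.tn, self.tn + self.fn),
        }
        # Odds ratio with a Woolf (log-scale) interval; assumes no empty cells.
        log_or = math.log((self.tp * self.tn) / (self.fp * self.fn))
        se = math.sqrt(1 / self.tp + 1 / self.fp + 1 / self.fn + 1 / self.tn)
        out["odds_ratio"] = (
            math.exp(log_or),
            math.exp(log_or - Z95 * se),
            math.exp(log_or + Z95 * se),
        )
        return out
```

With the first-reading counts for scores 4 or 5 given in the abstract (225/244 RT-PCR+ and 201/269 RT-PCR− patients correctly classified), `TwoByTwo(tp=225, fp=68, fn=19, tn=201).measures()` gives an NPV of 201/220 ≈ 0.91, in line with the reported value.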

Changes over time in accuracy measures (including the percentage of correctly predicted observations), inter-observer agreement, and disease prevalence were estimated by dividing the study period into eight consecutive 4-day periods, as a compromise between an acceptable number of patients per period (> 20) and preserving the shape of the epidemic curve.
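A minimal sketch of the period assignment, assuming periods start on March 13, 2020. Note that March 13 to April 14 inclusive spans 33 days, one more than eight 4-day periods; how the study handled the extra day is not stated, so folding it into the last period is an assumption here, as are the names `period_index` and `prevalence_by_period`.

```python
from collections import defaultdict
from datetime import date

STUDY_START = date(2020, 3, 13)  # first inclusion day, from the text
PERIOD_DAYS = 4
N_PERIODS = 8


def period_index(exam_date: date) -> int:
    """0-based 4-day period; any day past the eighth window is folded into it (assumption)."""
    offset = (exam_date - STUDY_START).days
    if offset < 0:
        raise ValueError("date precedes the study period")
    return min(offset // PERIOD_DAYS, N_PERIODS - 1)


def prevalence_by_period(records):
    """records: iterable of (exam_date, rtpcr_positive) -> {period index: RT-PCR+ prevalence}."""
    counts = defaultdict(lambda: [0, 0])  # period -> [positives, total]
    for exam_date, positive in records:
        c = counts[period_index(exam_date)]
        c[0] += int(positive)
        c[1] += 1
    return {p: pos / tot for p, (pos, tot) in sorted(counts.items())}
```

The same grouping can feed any per-period statistic, for example the accuracy measures or the weighted kappa computed on the patients of each period.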

Overall, 513 patients were included (Fig. 1). The median age was 68.4 years (range: 18-100), and 241/513 (47%) were female. The prevalence of RT-PCR+ was 244/513 (47.6%). The distribution of RT-PCR+ cases and chest CT examinations over the study period is shown in Fig. 4. Table 1 presents the descriptive features of the study population. The mean dose-length product for chest CT was 251 ± 130 mGy·cm.

The 513 chest CT scans were interpreted at the first reading by 101 of the 151 teleradiologists available during on-call periods: 69/101 (68.3%) senior and 32/101 (31.7%) junior radiologists. The mean report turn-around time was 15.7 ± 9.1 min for a chest CT scan alone and 22.7 ± 12.5 min when chest CT was combined with imaging of another body region. Table 2 compares the diagnostic accuracy of the two readings. Scores from both readings were significantly associated with RT-PCR status (p < 0.001). Inter-observer agreement for scores was excellent (weighted kappa = 0.87, 95% CI = 0.84-0.90, p < 0.001). Kappa values for the other features are shown in Table 3 and ranged from 0.41 (fibrosis) to 0.79 (GGO).

Regarding the second reading, the accuracy measurements were as follows: sensitivity of 0.92 (95% CI: 0.88-0.95), specificity of 0.84 (95% CI: 0.79-0.88), PPV of 0.84 (95% CI: 0.80-0.87), and NPV of 0.92 (95% CI: 0.86-0.95). The mean number of CT reviews per senior radiologist was 34.

Univariate binary logistic regression was used to estimate the probability of RT-PCR+ from the scores. For a score of 1, 2, or 3, the probability of RT-PCR+ was 0.09 and 0.08 for the first and second readings, respectively. For scores of 4 or 5 at the first reading, accuracy over the eight periods ranged from 0.67 (after the first period; 95% CI: 0.45-0.84, RT-PCR+ prevalence: 0.25) to 0.90 (after the seventh period; 95% CI: 0.67-0.99, RT-PCR+ prevalence: 0.29). For a score of 5, accuracy at the first reading ranged from 0.80 (after the fourth period; 95% CI: 0.73-0.86, RT-PCR+ prevalence: 0.58) to 1 (after the last period; 95% CI: 0.86-1, RT-PCR+ prevalence: 0.04). Changes in the other measures of diagnostic accuracy of on-call radiologists and in weighted kappas over the study periods are shown in Fig. 4.
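With a single binary predictor, a univariate logistic regression reproduces the observed proportion of RT-PCR+ patients in each group. The sketch below assumes the score enters the model dichotomised (1–3 versus 4–5), which the text does not confirm, and fits the model by Newton-Raphson on grouped counts. With the first-reading counts from the abstract (19 RT-PCR+ among the 220 patients scored 1–3), the fitted probability is 19/220 ≈ 0.086, consistent with the reported 0.09. The function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary_logistic(n0: int, y0: int, n1: int, y1: int,
                        max_iter: int = 50, tol: float = 1e-12):
    """Fit logit(p) = b0 + b1*x for a binary x by Newton-Raphson on grouped counts.

    x = 0: y0 RT-PCR+ among n0 patients (e.g. score 1-3, an assumed grouping);
    x = 1: y1 RT-PCR+ among n1 patients (e.g. score 4-5).
    Requires 0 < y0 < n0 and 0 < y1 < n1, otherwise the MLE does not exist.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p0, p1 = sigmoid(b0), sigmoid(b0 + b1)
        # Score vector (gradient of the log-likelihood).
        g0 = (y0 - n0 * p0) + (y1 - n1 * p1)
        g1 = y1 - n1 * p1
        # Fisher information [[a, c], [c, c]] built from binomial weights.
        w0, w1 = n0 * p0 * (1 - p0), n1 * p1 * (1 - p1)
        a, c = w0 + w1, w1
        det = a * c - c * c
        # Newton step: solve information @ step = score.
        s0 = (c * g0 - c * g1) / det
        s1 = (-c * g0 + a * g1) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


b0, b1 = fit_binary_logistic(n0=220, y0=19, n1=293, y1=225)
print(round(sigmoid(b0), 3), round(sigmoid(b0 + b1), 3))
```

Because this model is saturated (two groups, two parameters), its fitted probabilities are simply the group proportions; a model with the score entered as five levels would give one probability per level instead.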

In total, discrepancies were found in 62 patients. Forty-three of these 62 (69.4%) patients had a negative RT-PCR result but a strongly suggestive chest CT. Of these 43 patients, 14 (32.6%) had a second RT-PCR test, which was positive in one patient 3 days later and remained negative in the other 13. The final diagnoses according to the emergency medical records were confirmed to be COVID-19 (…). Nineteen of the 62 patients (30.6%) had a positive RT-PCR result but a negative chest CT. The median time from symptom onset to CT was 3 days (range: 0-16). Of these 19 patients, two (10.5%) underwent repeat chest CT: one at 7 days, which was strongly suggestive of COVID-19, and the other at 15 days, which was still classified as normal.

Clinical and radiological findings associated with RT-PCR status

Table 4 shows the overall and subgroup frequencies for all variables as well as measurements of their accuracy. The following clinical variables were associated with RT-PCR+: time since onset of symptoms (≥ 1 week, p = 0.04), oxygen saturation (< 95% [p = 0.03] and < 90% [p = 0.005]), presence of fever (p < 0.001), cough (p = 0.02), asthenia (p = 0.001), and myalgia (p = 0.008).

The following radiological variables were positively associated with RT-PCR+: presence of GGO (p < 0.001), non-rounded GGO (p < 0.001), rounded GGO (p < 0.001), presence of consolidation (p < 0.001), non-rounded consolidation (p = 0.01), subpleural bands (p < 0.001), intralobular reticulations (p < 0.001), fibrosis (p < 0.001), GGO-predominant pattern (p < 0.001), peripheral-predominant location (p < 0.001), bilateral lesions (p < 0.001), and diffuse lesions (…). The following radiological variables were negatively associated with RT-PCR+: consolidation-predominant pattern (p = 0.02), central-predominant location (p = 0.001), mixed location (p = 0.002), airway secretions (p < 0.001), bronchial wall thickening (p < 0.001), whether lobar/segmental (p < 0.001) or diffuse (p < 0.001), and tree-in-bud micronodules (p < 0.001).

Ninety-seven of 513 (18.9%) patients underwent a CT pulmonary angiographic protocol for suspected pulmonary embolism. Six of them (6/97, 6.2%) had a pulmonary embolism, with a similar frequency in each RT-PCR group (3/52 [5.8%] in the RT-PCR− group and 3/45 [6.7%] in the RT-PCR+ group). For these six cases, chest CT categorisations at the first and second readings were consistent with the RT-PCR status.

Our study found excellent inter-observer agreement between on-call radiologists with varying degrees of experience and senior radiologists. Categorisation of chest CT scans from various hospitals was highly reproducible between the emergency setting and the later second reading. This agreement was higher than that reported in a recent investigation [7]. However, Prokop et al [7] assessed multiple raters in a smaller population using Fleiss kappa, a different statistical approach from that used in our study. Moreover, the COVID-19 Reporting and Data System (CO-RADS) classification is not entirely equivalent to our adapted SFR classification. For instance, CO-RADS scores of 4 and 5 are equivalent to a score of 5 in our SFR system. Hence, highly suspected cases of COVID-19 fall into a single category in the SFR system rather than the two categories of CO-RADS, which may limit inter-observer discrepancies. The PPV for a score of 5 was 0.89 at the first reading and 0.96 at the second reading, highlighting the reliability with which radiologists diagnosed COVID-19. The diagnostic sensitivity of radiologists (0.92) was lower than that reported by Ai et al [2], but in that study the prevalence of the disease was close to 100% and no threshold for determining positive scans was specified.

Based on this multicentric population with suspected COVID-19 in an emergency setting, radiologist scoring correlated strongly with RT-PCR status. Therefore, initial CT screening while awaiting RT-PCR results, before hospitalisation, may have been useful. The clinical and radiological variables that correlated strongly with RT-PCR status are consistent with those reported in the literature, namely: fever, asthenia, oxygen saturation, presence of GGO (with a GGO-predominant pattern), consolidation (especially non-rounded), subpleural bands, and intralobular reticulations, with bilateral, diffuse, basal-predominant, and peripheral distributions [19, 20].

In our study, the AUCs were 0.89-0.93, which is good overall and slightly higher than those previously reported under on-site radiology conditions [7, 21]. Accuracy and inter-observer agreement remained almost stable across the eight periods, although the numbers of RT-PCR+ cases and chest CT examinations followed a bell-shaped curve. There may be several reasons for this stability: knowledge of COVID-19 radiological findings increased considerably after the initial papers were published in January and February 2020; open-source publications issued by radiological societies worldwide were immediately relayed to our facility and implemented in our workflow; we provided e-learning courses to junior and senior TRs; during on-call duty, TRs could ask for collegial help through a private discussion forum and from other on-call TRs working together on-site in two dedicated emergency reading rooms; and, beyond reducing reporting variability, the structured standardised reports and the scoring system may have provided confidence and guidance to TRs [3].
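Assuming the usual non-parametric (Mann-Whitney) estimate, the AUC of an ordinal score is the probability that a randomly chosen RT-PCR+ patient receives a higher score than a randomly chosen RT-PCR− patient, with ties counted as one half. The sketch below computes it from per-level counts; the study's per-level counts are not given in this section, and `auc_from_score_counts` is a hypothetical name.

```python
from typing import Sequence


def auc_from_score_counts(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Non-parametric AUC of an ordinal score: P(pos > neg) + 0.5 * P(tie).

    pos_counts[i] and neg_counts[i] are the numbers of RT-PCR+ and RT-PCR- patients
    at score level i, with levels ordered from least to most suspicious.
    """
    if len(pos_counts) != len(neg_counts):
        raise ValueError("count vectors must have the same number of levels")
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    wins = 0.0
    neg_below = 0  # RT-PCR- patients scored strictly below the current level
    for p, n in zip(pos_counts, neg_counts):
        # Each RT-PCR+ patient at this level beats every lower-scored RT-PCR- patient
        # and ties with the RT-PCR- patients at the same level.
        wins += p * (neg_below + 0.5 * n)
        neg_below += n
    return wins / (n_pos * n_neg)
```

Collapsed to two levels with the first-reading counts from the abstract, `auc_from_score_counts([19, 225], [201, 68])` equals (sensitivity + specificity) / 2 ≈ 0.83, below the 0.89 reported for the full five-level first reading; dichotomising turns many ordered pairs into ties.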

Table 4. Correlation between clinical and radiological features and RT-PCR status or corresponding diagnostic accuracy measurements

It should be noted that the PPV for the combined scores of 4 and 5 declined when prevalence was at its lowest (25% to 4%), whereas the PPV for a score of 5 remained stable. These findings support the impression that, at low prevalence, radiologists should classify a patient as suspected of having COVID-19 only when confronted with consistent evidence of COVID-19 CT features. It should also be noted that 19 RT-PCR+ patients had a chest CT that was not categorised as compatible with COVID-19 infection. In this subgroup, the median time between symptom onset and chest CT was only 3 days. Previously, normal chest CT findings were reported in 56%, 9%, and 4% of patients at 0-2, 3-5, and 6-12 days after symptom onset, respectively [22]. However, on further investigation, only two of these 19 patients had undergone a second chest CT: one was normal 2 weeks later, and the other was positive 1 week later. These observations highlight that a normal chest CT (i.e., a score of 1) does not exclude COVID-19 within the first 3 days of symptoms, as stated in the SFR recommendations. Conversely, 9/201 (4.5%) patients with typical CT findings at the second reading had a negative RT-PCR result. We investigated these discrepancies and found that 14 of the 49 (28.6%) patients were treated as COVID-19 patients, although only one of them was RT-PCR+ when tested again. This illustrates that chest CT could help reclassify false-negative RT-PCR cases in patients with a high clinical suspicion of COVID-19.
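The prevalence dependence of the PPV follows from Bayes' theorem: for fixed sensitivity and specificity, true positives shrink relative to false positives as prevalence falls. The idealised sketch below assumes that the first-reading sensitivity and specificity for scores 4 or 5 (225/244 and 201/269, from the abstract) stay constant across periods, which the study does not claim; it is not a reanalysis of the per-period data, and the names are hypothetical.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    return tp / (tp + fp)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tn / (tn + fn)


# Assumed constant operating point: first reading, "score 4 or 5", abstract counts.
SENS_4_5 = 225 / 244
SPEC_4_5 = 201 / 269

for prev in (244 / 513, 0.25, 0.04):
    print(f"prevalence {prev:.2f}: PPV {ppv(SENS_4_5, SPEC_4_5, prev):.2f}, "
          f"NPV {npv(SENS_4_5, SPEC_4_5, prev):.2f}")
```

At the study-wide prevalence (244/513), the formula returns exactly 225/293 ≈ 0.77; at a prevalence of 4%, the PPV falls to roughly 0.13. A stricter cut-off with a higher positive likelihood ratio, such as a score of 5, keeps the PPV higher at any given prevalence.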

There are, however, limitations to this study. First, we were not able to collect exhaustive clinical outcome data prospectively, because teleradiology covers only part of the patient care pathway. Second, in our multicentric study, local prevalence varied between hospitals across the six French regions. Moreover, emergency physicians may have had different clinical screening practices, leading to heterogeneity within the study cohort. Third, we could not compare our performance with that of on-site radiologists in the same departments; however, in our teleradiology workflow, on-site radiologists usually do not interpret or re-interpret the examinations we report. Finally, the RT-PCR results were collected retrospectively, which led to the exclusion of several patients and thereby introduced a risk of sampling bias.

To conclude, we present data from our French multicentric emergency cohort based on prospective radiological reports and secondary expert review. Our results show the high accuracy of TRs with various degrees of experience in diagnosing COVID-19 on chest CT, in settings with different levels of prevalence, as well as excellent inter-observer agreement. Thus, this kind of structured outsourced teleradiology model could provide high-quality, structured and standardised reports, with report turn-around times meeting the requirements of emergency medicine during the pandemic.

• The support teams at Imadis Teleradiology

A novel coronavirus from patients with pneumonia in China

Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases

Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA

Clinical and CT features of early stage patients with COVID-19: a retrospective analysis of imported cases in Shanghai, China

Essentials for radiologists on COVID-19: an update-radiology scientific expert panel

Foundation for Innovative New Diagnostics (2020) SARS-CoV-2 molecular assay evaluation: results

CO-RADS: a categorical CT assessment scheme for patients suspected of having COVID-19-definition and evaluation

Sensitivity of chest CT for COVID-19: comparison to RT-PCR

Interim guidelines for collecting, handling, and testing clinical specimens from persons under investigation (PUIs) for coronavirus disease

A British Society of Thoracic Imaging statement: considerations in designing local imaging diagnostic algorithms for the COVID-19 pandemic

Epidémie de Covid-19 : POINT SUR L'IMAGERIE

Whole-body CT in patients with multiple traumas: factors leading to missed injury

ESR teleradiology survey: results. Insights Imaging

Teleradiological outsourcing: compromises and hidden costs

Clinicians' perceptions of the quality of outsourced radiology and actions taken around perceived imaging errors in practice

Qualité et sécurité des actes de téléimagerie - Guide de bonnes pratiques

La société d'Imagerie Thoracique propose un compte-rendu structuré de scanner thoracique pour les patients suspects de COVID-19

Lyon F (2020) Web-based e-learning

Detection of SARS-CoV-2 in different types of clinical specimens

Clinical features of patients infected with 2019 novel coronavirus in Wuhan

Performance of radiologists in differentiating COVID-19 from non-COVID-19 viral pneumonia at chest CT

Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations

Acknowledgements We would like to sincerely thank:• 

Funding The authors state that this work has not received any funding.

Guarantor The scientific guarantor of this publication is Guillaume Gorincour, MD, PhD.

Thomas Ayoub is employed by Deeplink Medical for activities not related to the present article. The other authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

One of the authors has significant statistical expertise.

Informed consent Written informed consent was obtained from all subjects (patients) in this study.

Ethical approval Institutional Review Board approval was obtained.

• prospective • observational • multicenter study