key: cord-0768096-m2ca3tmd
authors: Debray, Marie-Pierre; Tarabay, Helena; Males, Lisa; Chalhoub, Nisrine; Mahdjoub, Elyas; Pavlovsky, Thomas; Visseaux, Benoît; Bouzid, Donia; Borie, Raphael; Wackenheim, Catherine; Crestani, Bruno; Rioux, Christophe; Saker, Loukbi; Choquet, Christophe; Mullaert, Jimmy; Khalil, Antoine
title: Observer agreement and clinical significance of chest CT reporting in patients suspected of COVID-19
date: 2020-08-29
journal: Eur Radiol
DOI: 10.1007/s00330-020-07126-8
sha: 3367b9052c259285d4818733627f59604b0e0909
doc_id: 768096
cord_uid: m2ca3tmd

OBJECTIVES: To assess interobserver agreement and clinical significance of chest CT reporting in patients suspected of COVID-19. METHODS: From 16 to 24 March 2020, 241 consecutive patients referred to hospital for suspected COVID-19 had both chest CT and SARS-CoV-2 RT-PCR. Eight observers (2 thoracic and 2 general senior radiologists, 2 junior radiologists, and 2 emergency physicians) retrospectively assigned each CT to one of 4 categories (evocative, compatible with COVID-19 pneumonia, not evocative, and normal). Observer agreement for categorization between all readers and between pairs of readers with similar experience was evaluated with the kappa coefficient. The results of a consensus categorization were correlated to RT-PCR. RESULTS: Observer agreement across the 4 categories was good between all readers (κ 0.61, 95% CI 0.60–0.63) and moderate to good between pairs of readers (0.54–0.75). It was very good (κ 0.81, 95% CI 0.79–0.83), fair (κ 0.32, 95% CI 0.29–0.34), moderate (κ 0.56, 95% CI 0.54–0.58), and moderate (κ 0.58, 95% CI 0.56–0.61) for the categories evocative, compatible, not evocative, and normal, respectively. RT-PCR was positive in 97%, 50%, 31%, and 11% of cases in the respective categories. In the presence of an underlying pulmonary disease, observer agreement was lower (p < 0.001) and RT-PCR positive cases were less frequently categorized evocative (p < 0.001). CONCLUSION: Interobserver agreement for chest CT reporting using categorization of findings is good in patients suspected of COVID-19. Among patients considered for hospitalization in an epidemic context, CT categorized evocative is highly predictive of COVID-19, whereas the predictive value of CT decreases between the categories compatible and not evocative.
KEY POINTS: • In patients suspected of COVID-19, interobserver agreement for chest CT reporting into categories is good, and very good to categorize CT “evocative.” • Chest CT can participate in estimating the likelihood of COVID-19 in patients presenting to hospital during the outbreak, CT categorized “evocative” being highly predictive of the disease whereas almost a third of patients with CT “not evocative” had a positive RT-PCR in our study. • Observer agreement is lower and CTs of positive RT-PCR cases less frequently “evocative” in presence of an underlying pulmonary disease.

Since December 2019, a new respiratory disease related to a new coronavirus, SARS-CoV-2, developed in China and rapidly spread to other countries, reaching a pandemic stage in March 2020 [1, 2]. Even if the disease follows a benign course in many cases, some patients develop respiratory difficulties requiring hospitalization, leading to a large number of patients with clinical suspicion of coronavirus disease 2019 (COVID-19) presenting to the emergency departments [3]. Accurate identification of COVID-19 patients is crucial to isolate them from non-infected patients and to limit the spread of the outbreak. The reference standard is the positivity of the real-time reverse transcription-polymerase chain reaction (RT-PCR); nevertheless, the sensitivity of this test remains unclear, having been reported between 42 and 71% in some early series [4–6], because of suboptimal sampling technique, limitations in assay performance, or low viral load in the nasopharyngeal area. Chest CT shows abnormalities in a large majority of cases, with some signs described as typical or very evocative of the disease in the current outbreak context [7–11]. Sensitivity of chest CT has been reported to be as high as 97% as compared to RT-PCR [4] and CT abnormalities could precede RT-PCR positivity [12]. Because it is readily available, chest CT may assist first-line triage of patients presenting to hospital [3]. Several radiology societies [6, 13–15] have proposed structured reporting of CT into categories, defined according to the typical or less typical appearance of lung involvement, to facilitate communication with physicians. In routine practice, categorization is based on each reader's individual impression, supported by the numerous papers that have described imaging signs of COVID-19 pneumonia [7–11]. Such categorization may directly impact clinical decision-making.
However, the reproducibility of the categorization is unknown and the clinical significance of the different categories is unclear. Thus, the objectives of our study were to assess interobserver agreement in categorizing CT findings as well as the performance of chest CT across the different categories in patients suspected of COVID-19 presenting to hospital.

This is a monocentric retrospective study conducted in a University Hospital (Bichat Claude-Bernard Hospital, Paris, France) between March 16, 2020, and March 24, 2020. Institutional review board approval was obtained and written informed consent was waived. During this period of COVID-19 outbreak, patients presenting at our hospital for COVID-19 suspicion and for whom hospitalization was considered had both chest CT scan and SARS-CoV-2 RT-PCR. Diagnosis of COVID-19 relied on the positivity of the RT-PCR, and CT could assist early triage in patients who were critically ill or had clinically overt pneumonia. Patients with a negative RT-PCR result could have a subsequent RT-PCR test and/or another chest CT during the next few days, depending on the physician's judgment.

Consecutive adult patients attending the emergency room or the infectious diseases department of our hospital with clinical suspicion of COVID-19 and having both chest CT and SARS-CoV-2 RT-PCR were included.

Demographic, clinical and laboratory data at presentation, and follow-up data when available were extracted from electronic medical records. Clinical data included symptoms, any need for oxygen supply, time from symptom onset to CT, comorbidities, and pre-existing pulmonary diseases. RT-PCR was performed on nasopharyngeal swabs or aspiration, using RealStar® SARS-CoV-2 RT-PCR Kit (Altona Diagnostics) or Cobas® SARS-CoV-2 Test (Roche).

Chest CT scans were acquired on a multidetector-row CT (Aquilion One Genesis or Prime, Canon Medical Systems Corporation) without contrast medium injection. They were performed in the supine position at full inspiration. The scanning parameters were as follows: 120 kVp, automatic exposure control for tube current (SD:15), exposure time 0.27-0.35 s per rotation depending on the CT unit, collimation 40 mm. Images were reconstructed with 1-mm slice thickness and 0.8-mm inter-slice gap, using a high-frequency reconstruction algorithm.

All CT scans were analyzed by 8 readers, including 2 senior emergency physicians (T.P., D.B., with 10 years of experience each), 2 radiology residents (H.T., N.C., 4 and 5 years of experience), 2 senior general radiologists (L.M., E.M., 6 and 9 years of experience), and 2 senior thoracic radiologists (M.P.D., A.K., 23 and 25 years of experience), all blinded to RT-PCR results and final diagnosis. All readers classified each examination into one of 4 categories, following the recommendations of the French Society of Radiology [14]: evocative of COVID-19, compatible, not evocative, and normal. The global impression of each reader was supported by the typical and less typical signs previously reported in the literature. A guide recalling these signs was provided. The "evocative" category included multifocal ground-glass opacities (GGO), nodular or not, or crazy-paving with or without consolidations, with a bilateral, peripheral, or mixed distribution and involvement of the posterior zones. The intermediate category "compatible" corresponded to cases showing abnormalities already reported in COVID-19 but that may be encountered in other diseases or that were very limited in extent. It included GGO and/or consolidations with very few lesions and unilateral distribution, exclusive central distribution or absence of posterior lung area involvement, halo sign as main abnormality, and association of typical opacities with atypical signs or other lesions. The category "not evocative" corresponded to cases showing abnormalities very rarely reported in COVID-19 or typical of another diagnosis, such as isolated systematized consolidation, discrete centrilobular nodules with tree-in-bud appearance, or lung cavitation in favor of another lung infection; centrally distributed GGO with septal lines and pleural effusion in favor of cardiogenic pulmonary edema; and peripheral reticulations with or without honeycombing, traction bronchiectasis, and GGO in favor of interstitial lung disease.
This category also included non-specific abnormalities such as sub-segmental atelectasis or opacities considered to be sequelae. Because the distinction of such minor non-specific or typically sequelar abnormalities from normal parenchyma may have little clinical relevance in the present study, findings were analyzed according both to the 4 categories and to the 3 categories obtained by merging "not evocative" and "normal." Any disagreement between the 4 senior radiologists was resolved by consensus among these 4 readers, giving a final consensus categorization for all cases.

Finally, all chest CTs were described by one thoracic radiologist (M.P.D.) for presence and distribution of various elementary signs, as well as signs of any underlying pulmonary disease (significant pulmonary emphysema, interstitial lung disease, bronchiectasis, parenchymal sequelae, bronchial carcinoma).

Categorical variables were described by numbers and percentage for each category. Agreement between two readers was evaluated with Cohen's kappa coefficient and agreement between more than two readers with Fleiss' kappa, each with its 95% confidence interval; kappa measures the excess proportion of agreement after taking chance into account. Interobserver agreement was considered poor for a kappa value < 0.20, fair for 0.21-0.40, moderate for 0.41-0.60, good for 0.61-0.80, and very good for 0.81-1.00. Comparisons between dependent kappas (e.g., for different pairs of readers for the same images) were performed with bootstrapping (N = 10000 samples), and the p value corresponds to the proportion of bootstrap samples that yield a pair of kappa values in a different order than the observed one. Comparisons between independent kappas (e.g., for different levels of a categorical variable) were performed according to the method proposed in [16].
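As an illustration of the two agreement statistics named above, the following is a minimal Python sketch and not the authors' code (the study's analyses were done in R). The function names and the toy ratings are hypothetical, and the handling of values falling between the published bands (e.g., between 0.20 and 0.21) is an assumption.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same cases."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: product of the two readers' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    # Undefined (division by zero) if both readers use a single category.
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per case,
    with the same number of readers for every case."""
    n_cases = len(ratings)
    n_readers = len(ratings[0])
    per_case = [Counter(case) for case in ratings]
    # Mean proportion of agreeing reader pairs within each case.
    p_bar = sum(
        (sum(k * k for k in counts.values()) - n_readers)
        / (n_readers * (n_readers - 1))
        for counts in per_case
    ) / n_cases
    # Chance agreement from the pooled category proportions.
    totals = Counter()
    for counts in per_case:
        totals.update(counts)
    p_e = sum((t / (n_cases * n_readers)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map kappa to the bands used in the text (band edges are an assumption)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical toy data: 2 readers, 4 cases.
print(cohen_kappa(list("AABB"), list("ABBB")))
```

Confidence intervals and the bootstrap comparison of dependent kappas are not covered by this sketch.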

Comparison of the frequency of radiologic signs between categories was performed with the Fisher exact test. All analyses were performed using R v4.0.2.

In total, 241 patients were included. Their demographics, signs at presentation, and comorbidities are shown in Table 1. COVID-19 was confirmed in 158 patients by RT-PCR positivity. COVID-19 was deemed likely despite two negative RT-PCR tests in 2 cases with clinical and CT follow-up strongly supportive of this diagnosis. Fifteen patients were considered non-COVID-19 because of at least 2 consecutive negative RT-PCR tests and absence of clinical and radiological signs favoring COVID-19 during follow-up. Sixty-six patients were considered non-COVID-19 on the basis of a single negative RT-PCR, including 38 with CT and/or clinical follow-up (Fig. 1).

The kappa coefficient between all readers across the 4 CT categories was good (0.61, 95% CI 0.60-0.63). It was moderate to good for each pair of readers, and significantly better between any pair of radiologists than between the emergency physicians (p < 0.001) (Table 2). The kappa value between all readers was lower when abnormalities were unilateral as compared to bilateral (p = 0.002) and in the presence of underlying pulmonary lesions (p < 0.001). It was lower for patients older than 70 years than for younger patients (p ≤ 0.001), and when time from symptom onset to CT was shorter than 5 days (p ≤ 0.001). Observer agreement was very good between all readers for the category "evocative" (0.81, 95% CI 0.79-0.83), fair for the category "compatible" (0.32, 95% CI 0.29-0.34), moderate for each of the categories "not evocative" and "normal" (0.56, 95% CI 0.54-0.58; and 0.58, 95% CI 0.56-0.61, respectively), and good (0.74, 95% CI 0.71-0.76) for chest CTs classified either not evocative or normal (Table 3).

The RT-PCR positivity rate differed significantly among the 4 categories (p < 0.0001): 119 out of 123 (97%), 15 out of 30 (50%), 22 out of 70 (31%), and 2 out of 18 (11%) cases considered evocative, compatible, not evocative, and normal by the consensus reading had a positive RT-PCR, respectively (Table 3). The rate of RT-PCR positivity was 27% (24 out of 88) among CTs categorized either not evocative or normal.

With RT-PCR as reference, chest CT classified evocative had 75% sensitivity (95% CI 68-81%) and 95% specificity (95% CI 87-98%), whereas chest CT classified evocative or compatible had 85% sensitivity (95% CI 78-90%) and 77% specificity (95% CI 66-85%) for COVID-19. Sensitivity for evocative CT was significantly lower for patients with a time from symptom onset shorter than 5 days (68%, 95% CI 60-78%, vs 83%, 95% CI 73-91%; p = 0.038) and for patients with an underlying pulmonary disease (36%, 95% CI 21-54%, vs 87%, 95% CI 80-92%; p < 0.001), but the difference was not significant between younger and older patients (age > 70, p = 0.23).
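The sensitivity and specificity figures can be re-derived from the consensus counts per category reported above. The following Python sketch shows the arithmetic; the function and variable names are hypothetical, and the Wilson score interval is an assumption, since the text does not state which confidence interval method was used.

```python
from math import sqrt

# Consensus CT category -> (RT-PCR positive, total), as reported in the text.
COUNTS = {
    "evocative": (119, 123),
    "compatible": (15, 30),
    "not evocative": (22, 70),
    "normal": (2, 18),
}


def sens_spec(positive_categories, counts=COUNTS):
    """Sensitivity and specificity when the listed CT categories count as
    a positive CT and RT-PCR is the reference."""
    total_pos = sum(pos for pos, _ in counts.values())
    total_neg = sum(n - pos for pos, n in counts.values())
    true_pos = sum(counts[c][0] for c in positive_categories)
    false_pos = sum(counts[c][1] - counts[c][0] for c in positive_categories)
    return true_pos / total_pos, (total_neg - false_pos) / total_neg


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (assumed method, 95% by default)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


for label, cats in [("evocative", ["evocative"]),
                    ("evocative or compatible", ["evocative", "compatible"])]:
    sens, spec = sens_spec(cats)
    print(f"{label}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Widening the definition of a positive CT from "evocative" to "evocative or compatible" trades specificity for sensitivity, which is the pattern reported in the paragraph above.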

Among 90 patients with a first negative RT-PCR, 22 were re-tested, including 5 out of 6 patients (83%) with CT considered evocative, 5 out of 17 patients (29%) with CT considered compatible, and 12 out of 67 patients (18%) with CT considered not evocative or normal. Of these subsequent RT-PCR, 2 out of 5 were positive for each category "evocative" and "compatible" and 3 out of 12 for the third category.

Clinical and chest CT findings in the different categories

CT features of the whole population, of RT-PCR positive cases, and of the different CT categories are in Table 4. The most frequent pattern of CT considered evocative was mixed with predominant GGO. Typical bilateral and peripheral distribution with posterior involvement was almost constant. Some centrally distributed lesions were associated with peripheral lesions in 72% of cases (Fig. 2). The 30 chest CTs considered compatible more frequently showed pure GGO as compared to evocative cases (p = 0.0012). Among these 30 cases, 12 and 6 showed features of an underlying pulmonary disease and of an associated pulmonary edema, respectively (Figs. 3, 4). As compared with cases classified "not evocative" (Figs. 5, 6), cases classified compatible more frequently showed a typical distribution, and atypical signs were absent among those with positive RT-PCR. Time from symptom onset to CT was longer, patients were older, and need for oxygen supply was more frequent in patients whose CT was categorized evocative as compared to other patients.

Table footnotes: Data are numbers with percentages, or 95% confidence interval for the κ value, in brackets. a Cohen's kappa value for agreement between 2 readers, and Fleiss' kappa value for agreement between all readers. b Kappa value calculated for the 4 following categories: evocative, compatible, not evocative, and normal. c Kappa value calculated for the 3 following categories: evocative, compatible, and not evocative or normal. *Observer agreement significantly better between any pair of radiologists as compared to agreement between emergency physicians (p < 0.001) and not significantly different between the 3 pairs of radiologists. **Observer agreement significantly better between resident radiologists as compared to agreement between thoracic senior radiologists (p < 0.001).

The current study describes observer agreement for chest CT reporting in a large series of consecutive patients suspected of COVID-19. We found that categorization of CT reports was reproducible and meaningful in patients considered for hospitalization. With SARS-CoV-2 RT-PCR as reference, chest CT reported "evocative of COVID-19 pneumonia" was highly predictive of the disease in this population during the outbreak, and agreement for this category between observers of various experiences and sub-specialties was very good. The positivity rate of RT-PCR differed significantly among the categories, suggesting that CT may help in disease likelihood stratification. It should be emphasized that almost a third of patients with chest CT classified "not evocative" had a positive RT-PCR, highlighting that no CT pattern can rule out COVID-19 in an epidemic context. As previously reported [4, 17], RT-PCR may also be positive in patients without lung abnormalities on CT. We observed only fair observer agreement for the category "compatible." This may be explained by the method used to classify patients.
Categorization of chest CT was mainly based on the global impression of each reader, according to the routine practice in our hospital, where more than a thousand chest CTs have been performed to date in COVID-19 patients. The guide we provided to all observers before the readings to help case classification partly relied on interpretation of findings, which may vary according to each reader's experience. Cases classified "compatible" accounted for only 12% of all cases, i.e., 30 cases in total. More than half of them showed features of an underlying pulmonary disease or of a mixed pattern with some opacities that could be attributable to pulmonary edema. Such mixed features complicate interpretation and categorization of findings and lower the reader's confidence. Indeed, we observed significantly lower observer agreement in cases showing an underlying pulmonary disease. These mixed and complex cases are part of routine practice.

The recent Radiological Society of North America proposal for CT findings related to COVID-19 includes 4 categories: typical, indeterminate, atypical appearance, or CT negative for pneumonia [6], the "indeterminate" category appearing similar to the "compatible" category we used in the present study. Terms and categories proposed by other radiology societies vary according to whether or not they individualize a normal category, and to whether the intermediate category is named "compatible" or "indeterminate" [13–15]. We herein retained 4 categories, including a normal category, following the recommendations of the French Society of Radiology, but also performed an analysis on 3 categories, resulting from merging a normal appearance of the lung parenchyma with non-specific lung abnormalities or features suggesting an alternative diagnosis, in accordance with the guidelines of the European Society of Radiology. The surprisingly only moderate interobserver agreement we observed for the category "normal" may be explained by disagreements in distinguishing minor non-specific abnormalities, such as plate-like atelectasis or even dependent opacities, from strictly normal findings. By showing significant differences in the RT-PCR positivity rate among CT categories, our study supports that chest CT can participate in estimating the likelihood of COVID-19, in association with contact history, clinical presentation, and prevalence of the disease in the population [18]. The role of chest CT for patients suspected of COVID-19 is not completely established. Despite limitations in sensitivity and result delays, RT-PCR remains the diagnostic reference and chest CT is not recommended for screening by most radiology societies [6, 15, 19, 20].
According to a recent consensus statement from the Fleischner Society [20], imaging may be indicated for diagnosis when RT-PCR is negative or unavailable in patients having risk factors for worsening or moderate-to-severe respiratory signs. In our study, most patients who had chest CT at the emergency room had indeed either moderate or severe clinical features or comorbidities. Chest CT helped to direct or transfer some patients showing a chest CT evocative of COVID-19 to the proper COVID-19 hospitalization area, especially those needing an urgent decision, before the RT-PCR result was available. However, none of the other categories could rule out the disease, even the "normal" category, whose RT-PCR positivity rate was 11%. Chest CT could favor re-testing in cases with negative RT-PCR [12]. Patients with a first negative RT-PCR and a chest CT considered "compatible" tended to be re-tested more frequently and to have a positive subsequent RT-PCR more frequently, as compared to patients with a first negative RT-PCR and a chest CT considered "not evocative."

To date, the performance of chest CT has been analyzed according to a binary consideration, i.e., CT positive or negative for COVID-19 pneumonia, using RT-PCR as reference. Most studies have reported high sensitivity, up to 97% [4, 5, 21], and specificity between 25 and 56%, with pooled sensitivity and specificity of 94% and 37%, respectively, according to a recent meta-analysis [22]. Whether positive CTs in these studies showed typical imaging features is unclear. Our results differ, chest CT classified evocative having 75% sensitivity and 95% specificity and chest CT classified evocative or compatible having 85% sensitivity and 77% specificity. These differences may be attributable to differences in CT features between an "evocative" CT and a "positive" CT as well as differences in characteristics of the population having chest CT. The prevalence of the disease in the population, severity and type of clinical presentation, time from symptom onset to CT, age of patients, and any underlying pulmonary pathology may modify the performance of chest CT for diagnosing COVID-19 pneumonia [20, 23]. Indeed, we observed that the presence of an underlying pulmonary disease lowered the sensitivity for an evocative CT. This concerned a quarter of the whole population in our study and almost a quarter of the patients with positive RT-PCR. CT reporting in several categories seems better suited to routine practice than a binary conclusion when CT abnormalities mix different types of lesions or are very limited in extent. It allows identification of a category with typical features, whose high specificity can be useful in an epidemic context, making it possible to rely on chest CT for diagnosis in some cases.

Our study has some limitations. Firstly, because it is monocentric and because presentation and prevalence of the disease vary around the world, caution should be taken in extrapolating CT predictive values, which vary according to the disease prevalence, to other populations and periods [24, 25]. We may assume that the high positive predictive value we report for the category "evocative" would be lower if the prevalence of COVID-19 decreased and that of other viral pneumonias or of some interstitial lung diseases, such as drug-related or connective tissue disease-related ones, increased. Conversely, in a very low prevalence context, we may expect that the negative predictive value of chest CT classified "not evocative" or "normal" would be very high and that CT would be useful for triage of negative cases. Secondly, the clinical significance of CT reporting should take into account each patient's risk level, which we did not precisely assess, even if the study took place during the outbreak. Thirdly, the performance of chest CT has been evaluated in comparison to the results of the RT-PCR, whose sensitivity is imperfect. Two patients with repeated negative tests but typical clinical and CT features, leading to a likely though uncertain diagnosis of COVID-19, were merged in the analysis with other unequivocal negative RT-PCR cases.
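The dependence of predictive values on prevalence noted in the first limitation follows from Bayes' rule. The Python sketch below illustrates it with the sensitivity and specificity derived from the study counts; the 5% prevalence is a hypothetical value chosen only for illustration, the function names are not from the study, and the calculation assumes that sensitivity and specificity stay constant when prevalence changes, which may not hold in practice.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Values derived from the counts reported in the Results.
STUDY_PREVALENCE = 158 / 241            # RT-PCR positive / all patients
SENS_EVOC, SPEC_EVOC = 119 / 158, 79 / 83        # positive CT = "evocative"
SENS_EVOC_COMP, SPEC_EVOC_COMP = 134 / 158, 64 / 83  # "evocative" or "compatible"

LOW_PREVALENCE = 0.05  # hypothetical low-prevalence setting (assumption)

for prev in (STUDY_PREVALENCE, LOW_PREVALENCE):
    print(
        f"prevalence {prev:.1%}: "
        f"PPV of evocative CT {ppv(SENS_EVOC, SPEC_EVOC, prev):.1%}, "
        f"NPV of not evocative/normal CT "
        f"{npv(SENS_EVOC_COMP, SPEC_EVOC_COMP, prev):.1%}"
    )
```

At the study prevalence the functions reproduce the observed proportions (119 of 123 evocative CTs with positive RT-PCR, 64 of 88 not evocative or normal CTs with negative RT-PCR); at the hypothetical lower prevalence the positive predictive value falls and the negative predictive value rises, in line with the reasoning above.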

In conclusion, interobserver agreement in reporting chest CT findings into categories for clinical suspicion of COVID-19 is good among readers of various experience levels and subspecialties. Chest CT can participate in estimating the likelihood of COVID-19 in patients presenting to hospital during the outbreak. CT categorized evocative of COVID-19 pneumonia was highly predictive of the disease, whereas the predictive value of CT decreased across the categories "compatible," "not evocative," and "normal," from 50 to 11%. Category reports need to be integrated with the clinical presentation and risk level for COVID-19.

Funding The authors state that this work has not received any funding. 

Guarantor The scientific guarantor of this publication is Antoine Khalil.

Conflict of interest The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry One of the authors, Jimmy Mullaert, has significant statistical expertise.

Informed consent Written informed consent was not required for this observational non interventional study. It was waived by the Institutional Review Board.

Ethical approval Institutional Review Board approval was obtained.

Methodology • retrospective • diagnostic • performed at one institution

1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China

2. World Health Organization (2020) Director General's speeches

3. How imaging should properly be used in COVID-19 outbreak: an Italian experience

4. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases

5. Sensitivity of chest CT for COVID-19: comparison to RT-PCR

6. Radiological Society of North America Expert Consensus Statement on Reporting Chest CT Findings Related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA. Radiology: Cardiothoracic Imaging

7. Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review

8. Coronavirus disease 2019 (COVID-19): a perspective from China

9. Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients

10. Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study

11. Coronavirus disease 2019: initial chest CT findings

12. Chest CT for typical 2019-nCoV pneumonia: relationship to negative RT-PCR testing

13. Reporting templates COVID-19

14. French Society of Radiology (2020) COVID-19 Compte rendu TDM thoracique

15. COVID-19 patients and the radiology department - advice from the European Society of Radiology (ESR) and the European Society of Thoracic Imaging (ESTI)

16. Statistical methods for rates and proportions

17. Chest CT findings in coronavirus disease-19 (COVID-19): relationship to duration of infection

18. Primary stratification and identification of suspected corona virus disease 2019 (COVID-19) from clinical perspective by a simple scoring proposal

19. A British Society of Thoracic Imaging statement: considerations in designing local imaging diagnostic algorithms for the COVID-19 pandemic

20. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society

21. Chest CT features of COVID-19 in

22. Diagnostic performance of CT and reverse transcriptase-polymerase chain reaction for coronavirus disease 2019: a meta-analysis

23. Differential diagnosis for coronavirus disease (COVID-19): beyond radiologic features

24. A call for caution in extrapolating chest CT sensitivity for COVID-19 derived from hospital data to patients among general population

25. Chest CT and coronavirus disease (COVID-19): a critical review of the literature to date

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations