key: cord-0745377-n65ur3do
authors: Mruk, Bartosz; Walecki, Jerzy; Wasilewski, Piotr Gustaw; Paluch, Łukasz; Sklinda, Katarzyna
title: Interobserver Agreement in Semi-Quantitative Scale-Based Interpretation of Chest Radiographs in COVID-19 Patients
date: 2021-07-18
journal: Med Sci Monit
DOI: 10.12659/msm.931277
sha: c418b4f5a0886b07b25a0e6601cd66dd1aa13e1d
doc_id: 745377
cord_uid: n65ur3do

BACKGROUND: The chest X-ray is the most available imaging modality enabling semi-quantitative evaluation of pulmonary involvement. Parametric evaluation of chest radiographs in patients with SARS-CoV-2 infection is crucial for triage and therapeutic management. The CXR Score (Brixia Score), SARI CXR Severity Scoring System, and Radiographic Assessment of Lung Edema (RALE), proposed to evaluate SARS-CoV-2 infiltration of the lungs, were analyzed for interobserver agreement. MATERIAL/METHODS: This study analyzed 200 chest X-rays from 200 consecutive patients with confirmed SARS-CoV-2 infection, hospitalized at the Central Clinical Hospital of the Ministry of the Interior and Administration in Warsaw. Radiographs were evaluated by 2 radiologists according to 3 scales: SARI, RALE, and CXR Score. RESULTS: The overall interobserver agreement for SARI ratings was good (κ=0.755; 95% CI, 0.817–0.694), for RALE scale assessments it was very good (κ=0.818; 95% CI, 0.844–0.793), and for CXR scale assessments it was very good (κ=0.844; 95% CI, 0.846–0.841). A moderate correlation was found between the radiological image assessed using each of the scales and the clinical condition of the patient in MEWS (Modified Early Warning Score) (r=0.425–0.591). CONCLUSIONS: The analyzed scales are characterized by good or very good interobserver agreement of assessments of the extent of pulmonary infiltration. Since the CXR Score showed the strongest correlation with the clinical condition of the patient as expressed using the MEWS scale, it is the preferred scale for chest radiograph assessment of patients with COVID-19 in the light of data provided.

Besides computed tomography scans, chest radiographs (CXR) are the primary method for assessing the extent of pulmonary lesions in the course of SARS-CoV-2 infection [1-10]. Despite its lower sensitivity in the detection of pulmonary lesions compared to chest CT, radiography is the preferred diagnostic modality in multiple sites owing to its availability [3, 10, 11]. Toussie et al demonstrated the usefulness of chest radiographs acquired at a hospital emergency department as predictors of hospitalization and intubation of patients with COVID-19 [1]. Previous work involving patients examined during the severe acute respiratory syndrome (SARS) coronavirus outbreak in 2003 as well as patients with other pneumonias confirmed the relationship between the extent of pulmonary infiltrates and prognosis [12-14].

To determine the appropriate clinical management and respiratory support for COVID-19 patients, it is essential to quantitatively assess the extent of pulmonary infiltrates. There is no standardized and acknowledged scale that would be considered a criterion standard for reporting and interpretation of chest X-ray results in COVID-19 patients. At least 3 different scales have been described in the literature to evaluate chest radiographs of patients with COVID-19. The SARI CXR Severity Scoring System and the RALE classification were proposed before the outbreak of COVID-19, whereas the CXR Score was designed specifically for evaluation of patients with confirmed SARS-CoV-2 infection [11, 15, 16].

The SARI CXR Severity Scoring System was proposed in the pre-COVID era, with an aim to simplify the clinical grading of CXR reports from inpatients with confirmed acute respiratory infection into 5 severity categories [15]. The CXR findings were categorized as: 1 - normal; 2 - hyperinflation and/or patchy atelectasis and/or bronchial wall thickening; 3 - focal consolidation; 4 - multifocal consolidation; and 5 - diffuse alveolar changes (Figure 1). Soon Ho Yoon et al used this scoring system to quantify the pulmonary involvement in patients with COVID-19 [4].

The Radiographic Assessment of Lung Edema (RALE) score as proposed by Warren et al was simplified by Wong et al and used in the assessment of COVID-19 patients [10, 16]. This scale assesses each lung individually. A score of 0 to 4 points is assigned based on the extent of involvement, ie, ground-glass opacity or consolidation (0 - no involvement; 1 - less than 25%; 2 - 25% to 50%; 3 - 50% to 75%; 4 - more than 75% involvement), with the overall score being the total of points from both lungs (Figure 2).
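The simplified RALE arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function names `rale_lung_score` and `rale_total` are hypothetical, and the handling of values that fall exactly on the 50% and 75% boundaries is an assumption, because the published categories share those boundary values.

```python
def rale_lung_score(percent_involved: float) -> int:
    """Map the involved share of one lung (0-100 %) to 0-4 points."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved == 0:
        return 0  # no involvement
    if percent_involved < 25:
        return 1  # less than 25 %
    if percent_involved <= 50:
        return 2  # 25 % to 50 % (assumption: exactly 50 % scores 2)
    if percent_involved <= 75:
        return 3  # 50 % to 75 % (assumption: exactly 75 % scores 3)
    return 4      # more than 75 %


def rale_total(right_percent: float, left_percent: float) -> int:
    """Overall score: sum of the points assigned to the two lungs."""
    return rale_lung_score(right_percent) + rale_lung_score(left_percent)


# A right lung in the 25-50 % band and a left lung in the 50-75 % band.
print(rale_total(40, 60))
```

In practice the radiologist estimates the band visually rather than entering a percentage; the numeric input here only makes the mapping explicit.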

To date, the CXR Score (Brixia Score) is the only available method for CXR assessment that has been designed specifically for patients with confirmed COVID-19 [11]. This CXR scoring system, as proposed by Andrea Borghesi and Roberto Maroldi, consists of 2 steps of imaging analysis [11]. The first step is to divide each lung as seen in the frontal chest projection (posteroanterior [PA] or anteroposterior [AP] view) into 3 zones designated with letters A, B, and C for the right lung and D, E, and F for the left lung. The letters divide the lungs into 3 levels: the upper level (A and D) above the inferior wall of the aortic arch, the middle level (B and E) below the inferior wall of the aortic arch and above the inferior wall of the right inferior pulmonary vein (the hilar structures), and the lower level (C and F) below the inferior wall of the right inferior pulmonary vein (the lung bases) (Figure 3).

The purpose of this study was to analyze the interobserver agreement of chest radiograph assessments in patients with COVID-19, made by the same 2 independent radiologists using the 3 scales described above. A further aim was to establish correlations between the radiological image and the clinical condition of the patient as expressed using the Modified Early Warning Score (MEWS), which includes measurements of systolic blood pressure, heart rate, respiratory rate, body temperature, and level of consciousness (Table 1) [17].

A total of 200 chest X-ray examinations collected from 200 consecutive patients hospitalized due to SARS-CoV-2 infection at the Central Clinical Hospital of the Ministry of the Interior and Administration in Warsaw were analyzed retrospectively in the study. Each patient admitted to the hospital had to have a positive PCR test result confirmed twice. All patient data were fully anonymized before they were accessed. The analyzed group included 109 men and 91 women. The mean age was 62.6 years (range, 19-90 years).

The study was approved by the Bioethics Committee of the Central Clinical Hospital of the Ministry of the Interior and Administration in Warsaw.

Radiographs were acquired using 2 Siemens Multix Pro stationary units and 1 Shimadzu Mobile Dart Evolution MX8 portable device, using a standardized technique (80 kV, 10 mAs, 180-cm film-focus distance for posteroanterior; 80 kV, 10 mAs, 100-cm film-focus distance for anteroposterior). There were 128 posteroanterior and 72 anteroposterior radiographs.

Figure 1. Chest X-ray images assessed with the SARI scoring system. In the left picture, both lungs present no radiological signs of parenchymal involvement and the image was assessed as 1; in the middle picture, multifocal consolidations can be seen and the image was assessed as 4; in the right picture, nearly the entire parenchyma of both lungs presents diffuse alveolar changes and the image was assessed as 5.

Figure 2. Chest X-ray images of 3 COVID-19-positive patients with different intensity of lung involvement assessed with the RALE classification. In the left picture, both lungs present no involvement and the overall score was 0; in the middle picture, the right lung involvement was assessed as 25-50% and the left lung as 50-75%, giving an overall RALE score of 5; in the right picture, nearly 100% of both lungs is involved and the overall RALE score was 8.

Radiographs were evaluated according to 3 scales: SARI in the range of 1-5 points; RALE in the range of 0-4 points for each of the 2 lungs (range 0-8 for both lungs); and CXR Score in the range of 0-3 points for each of the 6 anatomical regions of the lungs (range 0-18 for both lungs).

All patients whose images were included in the analysis had their clinical condition assessed using the MEWS scale on the day of the CXR. For the purposes of statistical analyses, patients were divided into 3 groups: Group A (MEWS score 0-1; 96 patients), Group B (MEWS score 2-3; 53 patients), and Group C (MEWS score ≥4; 51 patients).
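The grouping rule above is simple enough to state as code. The sketch below is illustrative only; the function name `mews_group` is hypothetical, and the input is assumed to be a non-negative integer MEWS total.

```python
def mews_group(mews: int) -> str:
    """Assign a patient to Group A, B, or C from the MEWS total."""
    if mews < 0:
        raise ValueError("MEWS cannot be negative")
    if mews <= 1:
        return "A"  # MEWS 0-1
    if mews <= 3:
        return "B"  # MEWS 2-3
    return "C"      # MEWS >= 4


# Hypothetical MEWS totals, used only to show the mapping.
print([mews_group(score) for score in (0, 1, 2, 3, 4, 7)])
```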

To assess the interobserver agreement of CXR interpretation between the 2 radiologists, Cohen's κ was calculated. Since the results were presented on ordinal scales, weighted Cohen's κ was used for the interobserver agreement analysis. The weights were selected using the Fleiss-Cohen method [18]. The intraclass correlation coefficient (ICC) was also calculated for the CXR scale. The weighted κ values were interpreted according to McHugh, while ICCs were interpreted according to Koo and Li [19, 20]. Agreement was defined as moderate (κ >0.4-0.6), good (κ >0.6-0.8), and very good (κ >0.8-1.0). Spearman's rank correlation coefficient was used to analyze the correlation between the extent of inflammatory lesions and the clinical condition of the patient. The correlation coefficient was defined as low (r=0-0.
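As a rough illustration of the agreement statistic named above, the following Python sketch computes a weighted Cohen's kappa with quadratic agreement weights, which is the form commonly referred to as Fleiss-Cohen weighting. It is not the authors' code: the function names are hypothetical, categories are assumed to be equally spaced, and the label returned for kappa values of 0.4 or less is an assumption, because the text defines only the moderate, good, and very good bands.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories):
    """Weighted Cohen's kappa with quadratic (Fleiss-Cohen) weights."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    k = len(categories)
    if k < 2:
        raise ValueError("at least 2 categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Joint and marginal counts over category positions.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def weight(i, j):
        # Agreement weight: 1 on the diagonal, 0 at maximal disagreement.
        return 1.0 - (i - j) ** 2 / (k - 1) ** 2

    observed = sum(weight(i, j) * c for (i, j), c in joint.items()) / n
    expected = sum(
        weight(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Agreement bands as defined in the text above."""
    if kappa > 0.8:
        return "very good"
    if kappa > 0.6:
        return "good"
    if kappa > 0.4:
        return "moderate"
    return "below moderate"  # assumption: this band is not defined in the text


# Hypothetical SARI ratings (1-5) from 2 readers, for illustration only.
reader_1 = [1, 2, 4, 5, 3, 4, 1, 2]
reader_2 = [1, 3, 4, 5, 3, 5, 1, 2]
kappa = weighted_kappa(reader_1, reader_2, categories=[1, 2, 3, 4, 5])
print(round(kappa, 3), interpret_kappa(kappa))
```

Quadratic weights penalize a disagreement by the squared distance between the two ratings, so a 1-point difference on a 5-point scale costs far less than a 4-point difference.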

The overall interobserver agreement of SARI ratings was good (κ=0.755; 95% CI, 0.817-0.694). With regard to the group-by-group analyses carried out in patients with different MEWS scores, the highest interobserver agreement was observed in patients with mild disease (MEWS 0-1 points): κ=0.791; 95% CI, 0.835-0.746. The lowest interobserver agreement was observed in the group of patients with MEWS in the range of 2-3 points (κ=0.574; 95% CI, 0.849-0.349). In the group of patients with the most severe clinical course (MEWS ≥4), the κ value was 0.681 (95% CI, 0.828-0.533). Significant differences were noted in the interobserver agreement of the radiographic assessments depending on the type of examination. The interobserver agreement of the assessments of AP radiographs was lower (κ=0.624; 95% CI, 0.874-0.475) than that of PA examinations (κ=0.819; 95% CI, 0.892-0.789) (Table 2).

The overall interobserver agreement of RALE scale assessments was very good (κ=0.818; 95% CI, 0.844-0.793). With regard to the group-by-group analyses carried out in patients with different MEWS ratings, the highest interobserver agreement was (Tables 3, 4).

The overall interobserver agreement of CXR scale assessments was very good (κ=0.844; 95% CI, 0.846-0.841). With regard to the group-by-group analyses carried out in patients with different MEWS ratings, the highest interobserver agreement was observed in patients with mild disease (MEWS 0-1 points).

Table 4. Analysis of the interobserver agreement of RALE assessments of radiographs for the right and the left lung.

The interobserver agreement of the assessments of AP radiographs was lower (κ=0.796; 95% CI, 0.817-0.775) than the agreement of the assessments of PA examinations (κ=0.846; 95% CI, 0.849-0.844) (Tables 5, 6).

There was a moderate correlation between the clinical condition of the patient as expressed using MEWS and the radiological image as assessed using each of the scales (r=0.425-0.591) (Table 7). According to both radiologists, the strongest correlation was observed for the CXR scale (r=0.577 and 0.591) and the weakest correlation was observed for the RALE scale (r=0.425 and 0.462).
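For readers who want to reproduce this kind of analysis, the sketch below computes Spearman's rank correlation between a radiographic score and MEWS in plain Python. It is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, tied values receive the average of their ranks, and the sample data are invented.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical CXR Score totals and MEWS values for 8 patients.
cxr_scores = [0, 2, 5, 7, 9, 12, 14, 18]
mews_values = [0, 1, 1, 3, 2, 4, 6, 5]
print(round(spearman_rho(cxr_scores, mews_values), 3))
```

Ranking both variables first suits ordinal scores such as these, because the result depends only on the ordering of patients and not on the spacing between score values.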

The analysis confirmed good or very good interobserver agreement of assessments for CXRs evaluated using each of the 3 scales. The agreement obtained for the CXR Score is comparable to that reported by Borghesi et al (κ=0.82; 95% CI, 0.79-0.86) [11].

Although the SARI and RALE scales have not been validated in a COVID-19 patient group, the interobserver agreement reported for these scales in pulmonary infiltrates of other etiology is in the range of κ=0.75-0.83 for SARI and ICC=0.93 for RALE [15, 16].

Lower interobserver agreement was observed for AP radiographs as compared to PA radiographs for each scale, suggesting a relationship between the reported results and the quality of the image.

In the anatomical context, somewhat lower interobserver agreement was observed for SARI scale assessments of the left lung as compared to the right lung. Similarly, in the case of the CXR Score scale, the lowest interobserver agreement was observed for the lower left lung field.

These findings suggest that evaluation of regions where other structures, such as the heart, overlie the lung parenchyma may be more subjective. This can affect the overall score assigned by the assessing radiologist and may bias the evaluation.

In each of the analyzed scales, the best interobserver agreement was observed in patients in mild clinical condition (MEWS of 0-1). Lower agreement was observed both in patients with moderate severity of symptoms (MEWS of 2-3) and in patients in severe condition (MEWS ≥4). A moderate correlation (r=0.425-0.591) was identified in the study between the score obtained in each of the analyzed scales and the clinical condition of the patient as expressed using MEWS.

The strongest correlation with the patient's clinical condition was shown for the 18-point CXR Score scale (r=0.577 and 0.591).

The present study is limited by a relatively small number of patients (200 cases) and radiologists assessing the scans. However, kappa values comparable to those presented in other studies on patients with COVID-19 suggest that these factors had no effect on the obtained results.

In our opinion, parametric evaluation of chest radiographs in patients with SARS-CoV-2 infection is crucial for patient triage and therapeutic decision making.

Table 7. Analysis of the correlation between the radiological image as assessed in individual scales and clinical condition as expressed using the MEWS scale.

Further validation is required with regard to quantitative analysis of chest radiographs and their predictive value in the context of the clinical course of the disease.

Parameterization of radiological images can also provide a useful tool for the development of computer-aided diagnosis and artificial intelligence (AI) systems.

The analyzed scales are characterized by good or very good interobserver agreement of assessments of the extent of pulmonary lesions being made by independent, experienced radiologists.

The lowest interobserver agreement was observed for the SARI scale, while the results for the RALE and the CXR Score scales were similar, with overlapping CIs. Since the CXR Score showed the strongest correlation with the clinical condition of the patient as expressed using the MEWS scale, it is the preferred scale for chest radiograph assessment of patients with COVID-19 in the light of data provided.

1. Clinical and chest radiography features determine patient outcomes in young and middle age adults with COVID-19

2. Chest CT findings in coronavirus disease-19 (COVID-19): Relationship to duration of infection

3. Coronavirus disease 2019 (COVID-19): A perspective from China

4. Chest radiographic and CT findings of the 2019 novel coronavirus disease (COVID-19): Analysis of nine patients treated in Korea

5. Coronavirus disease 2019 (COVID-19): Role of chest CT in diagnosis and management

6. Sensitivity of chest CT for COVID-19: Comparison to RT-PCR

7. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: A descriptive study

8. Time course of lung changes at chest CT during recovery from coronavirus disease 2019 (COVID-19)

9. Diagnostic role of chest computed tomography in coronavirus disease 2019

10. Frequency and distribution of chest radiographic findings in COVID-19 positive patients

11. COVID-19 outbreak in Italy: Experimental chest X-ray scoring system for quantifying and monitoring disease progression

12. Value of initial chest radiographs for predicting clinical outcomes in patients with severe acute respiratory syndrome

13. Severe acute respiratory syndrome: Correlation between clinical outcome and radiologic features

14. Chest radiograph scores as potential prognostic indicators in severe acute respiratory syndrome (SARS)

15. A chest radiograph scoring system in patients with severe acute respiratory infection: A validation study

16. Severity scoring of lung oedema on the chest radiograph is associated with clinical outcomes in ARDS

17. Validation of a modified Early Warning Score in medical admissions

18. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability

19. Interrater reliability: The kappa statistic

20. A guideline of selecting and reporting intraclass correlation coefficients for reliability research

All figures submitted have been created by the authors, who confirm that the images are original with no duplication and have not been previously published in whole or in part.