key: cord-1055440-19p20wfm
authors: Shiri, I.; Mostafaei, S.; Haddadi Avval, A.; Salimi, Y.; Sanaat, A.; Akhavanallaf, A.; Arabi, H.; Rahmim, A.; Zaidi, H.
title: High-Dimensional Multinomial Multiclass Severity Scoring of COVID-19 Pneumonia Using CT Radiomics Features and Machine Learning Algorithms
date: 2022-04-28
journal: nan
DOI: 10.1101/2022.04.27.22274369
sha: 1cfcfbf35c8e47694e99d55cd3c0b02bd2a547b2
doc_id: 1055440
cord_uid: 19p20wfm

We aimed to construct a prediction model based on computed tomography (CT) radiomics features to classify COVID-19 patients into severe-, moderate-, mild-, and non-pneumonic. A total of 1110 patients were studied from a publicly available dataset with 4-class severity scoring performed by a radiologist (based on CT images and clinical features). CT scans were preprocessed with bin discretization and resized, followed by segmentation of the entire lung and extraction of radiomics features. We utilized two feature selection algorithms, namely Bagging Random Forest (BRF) and Multivariate Adaptive Regression Splines (MARS), each coupled to a classifier, namely multinomial logistic regression (MLR), to construct multiclass classification models. Subsequently, 10-fold cross-validation with bootstrapping (n=1000) was performed to validate the classification results. The performance of multi-class models was assessed using precision, recall, F1-score, and accuracy based on the 4 by 4 confusion matrices. In addition, the areas under the receiver operating characteristic (ROC) curve (AUCs) for multi-class classifications were calculated and compared for both models using multiROC and pROC R packages. Using BRF, 19 radiomics features were selected, 9 from first-order, 6 from GLCM, 1 from GLDM, 1 from shape, 1 from NGTDM, and 1 from GLSZM radiomics features. Ten features were selected using the MARS algorithm, namely 2 from first-order, 1 from GLDM, 2 from GLRLM, 2 from GLSZM, and 3 from GLCM features. The Mean Absolute Deviation and Median from first-order, Small Area Emphasis from GLSZM, and Correlation from GLCM features were selected by both BRF and MARS algorithms. Except for the Inverse Variance feature from GLCM, all selected features by BRF or MARS were significantly associated with four-class outcomes as assessed within MLR (All p-values<0.05). BRF+MLR and MARS+MLR resulted in pseudo-R2 prediction performances of 0.295 and 0.256, respectively. 
Meanwhile, there were no significant differences between the feature selection models when using a likelihood ratio test (p-value = 0.319). Based on confusion matrices for BRF+MLR and MARS+MLR algorithms, the precision was 0.861 and 0.825, the recall was 0.844 and 0.793, and the accuracy was 0.933 and 0.922, respectively. AUCs (95% CI) for multi-class classification were 0.823 (0.795-0.852) and 0.816 (0.788-0.844) for BRF+MLR and MARS+MLR algorithms, respectively. Our models based on the utilization of radiomics features, coupled with machine learning, were able to accurately classify patients according to the severity of pneumonia, thus highlighting the potential of this emerging paradigm in the prognostication and management of COVID-19 patients.

The highly contagious SARS-CoV-2 virus has led to significant morbidity and mortality worldwide [1]. Pneumonia is regarded as one of the main complications of COVID-19 disease, which can lead to lethal conditions while escalating the cost of healthcare [2]. The most popular diagnostic test considered as the gold standard for coronavirus disease is the reverse transcription polymerase chain reaction (RT-PCR) assay [3]. While highly specific, RT-PCR has shown low sensitivity, as studies have reported significant false-negatives in patients who had abnormalities in their chest CT images confirmed with secondary follow-up RT-PCR to be positive for COVID-19 [4].

CT aids in the diagnosis and management of COVID-19 patients and could potentially be used as an outcome/survival prediction tool, towards enhanced treatment planning [5-7]. CT scanning has been utilized as a highly sensitive tool for COVID-19 diagnosis [8] since it is fast and generates quantifiable features (e.g., the extent to which lung lobes are involved) and non-quantifiable features (e.g., ground-glass opacities and their laterality) to assess COVID-19 pneumonia, besides the enhanced sensitivity compared to RT-PCR [9].

Severity can be defined as an index that depicts the effects of a disease on mortality, morbidity, and comorbidities [10] and has the potential to help physicians manage patients more appropriately, whether in patients with cancer or with non-cancer diseases [11, 12]. A number of severity scoring systems have been proposed to quantify disease advancement in patients, including general assessments (e.g., APACHE score) and disease-specific ones (e.g., Pugh score) [13]. Several conventional scoring systems have been proposed for COVID-19 severity assessment [14]. These include the usage of patient clinical, comorbidity, and laboratory data, which are all helpful in constructing predictive models for severity assessment in COVID-19 [15].

There has also been a growing interest in using imaging data of patients, such as thoracic CT images. For example, a study by Sanders et al. [16] computed the score of CT images in patients with cystic fibrosis and evaluated the prognostic ability. A promising line of research that emerged recently reported on the CT severity index and its correlation with acute pancreatitis severity [17-19]. The COVID-19 Reporting and Data System (CO-RADS) was suggested for standardized visual assessment of COVID-19 pneumonia to enhance agreement between radiologists [20]. This system includes features for the diagnosis of COVID-19 and consists of a 5-point scale for categorizing patient CT images. In addition, other guidelines aiming to reach consensus when interpreting COVID-19 suspected chest CT images were proposed [21]. These guidelines are mostly based on visual assessment of images, e.g., the extent to which lung lobes are involved, the volume that is infected, and anatomical assessments.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2022.04.27.22274369; this version posted April 28, 2022.

Francone et al. [22] reported a study on the correlation between CT score and the severity of coronavirus disease. Zhao et al. [23] also conducted research on the measurement of the extent to which lung lobes are infected and its evaluation in COVID-19 patients' prognosis. Li et al. [24] also confirmed the association between chest CT score and COVID-19 pneumonia severity.

At the same time, most scoring systems involve visual assessment and hence are time-consuming [23, 24]. In this regard, medical image analysis using machine learning and radiomics has been applied to quantify features to tackle these main challenges [25-35].

The field of radiomics opens pathways for the study of normal tissues, cancer, and many other diseases, including potentially the newly emerging COVID-19 disease [6, 7, 29, 36-40]. Specifically, Xie et al. [41] evaluated the potential of a radiomics framework to diagnose COVID-19 from CT images. Di et al. [42] also studied whether radiomics features can help to distinguish between pneumonia of COVID-19 and that of other viral/bacterial causes. A number of studies reported on the application of radiomics analysis to CT images towards COVID-19 classification and prognostication [43]. Homayounieh et al. [44] assessed the prognostic power of CT-based radiomics features to determine severe and non-severe cases. In another study, Li et al. [45] proposed a radiomics model based on CT images and classified patients based on the criticality of their disease. A recent study by Yip et al. [46] applied a robust radiomics model to CT images to predict the severity of COVID-19 disease in patients. All of the above models addressed binary tasks, reducing multiclass classification to two-class problems. However, in real clinical triage, scoring systems comprise multiple classes. In the present study, involving a large cohort of patients, we aimed to construct a CT radiomics-based multi-class classification model to predict the severity of COVID-19 pneumonia.

Data Description

Figure 1 presents the different steps performed in this study. All experiments were performed in accordance with relevant guidelines and regulations.

This study is based on the MosMed Dataset (mosmed.ai) consisting of 1110 patient CT scans, also utilized in other efforts [46, 47]. Ethics approval and consent to participate were not needed since the study was performed on an open-access online dataset. The patients were referred to the Municipal Hospital in Moscow, Russia, and were classified based on clinical and visual CT findings as follows.

In the zero class, the patient has neither clinical symptoms (e.g., fever) nor CT findings in favor of any kind of pneumonia (Class 0, non-pneumonic). The first class contains patients who have a low-grade fever (t < 38 °C) in addition to a mild increase in respiratory rate (RR < 20) while showing none or < 25% ground-glass opacity (GGO) involvement (Class 1, [...] namely 0, 1, 2, and 3, included 254, 684, 125, and 47 patients, respectively. The median age was 47 (ranging from 18 to 97), and 42% of patients were female. Figure 2 shows representative CT images for each class.
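As a quick check on the class distribution reported above, the short sketch below recomputes the class proportions from the four reported counts. The variable names are illustrative and do not come from the study; the majority-class baseline is included only as context for reading multiclass accuracy on an imbalanced dataset.

```python
# Class counts as reported in the text (classes 0 to 3).
class_counts = {0: 254, 1: 684, 2: 125, 3: 47}

total = sum(class_counts.values())
proportions = {c: n / total for c, n in class_counts.items()}

# Accuracy of a trivial classifier that always predicts the largest class.
majority_class = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_class] / total

for c, p in proportions.items():
    print(f"class {c}: n={class_counts[c]}, proportion={p:.3f}")
print(f"total={total}, majority class={majority_class}, "
      f"baseline accuracy={majority_baseline:.3f}")
```

The counts sum to the 1110 patients stated for the dataset, and class 1 accounts for more than half of all cases.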

All CT images were automatically segmented using a deep learning-based algorithm for whole lung segmentation [48, 49]. After whole-lung 3D segmentation, all images were reviewed and modified to ensure correct 3D-volume lung segmentation.

All images were resized to an isotropic voxel size of 1×1×1 mm³ and image intensity was discretized by 64-gray level binning, followed by feature extraction. The extracted features from the whole-lung segmented regions, totalling 110, included shape (n=16), intensity (n=19), and texture features, namely second-order texture of gray-level co-occurrence matrix (GLCM, n=24), and high-order features, namely gray-level size-zone matrix (GLSZM, n=16), neighbouring gray tone difference matrix (NGTDM, n=5), gray-level run-length matrix (GLRLM, n=16) and gray-level dependence matrix (GLDM, n=14). Radiomics feature extraction was performed using the Pyradiomics Python library [50], which is compliant with the Image Biomarker Standardization Initiative (IBSI) [51].
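To illustrate what discretizing intensities into 64 gray levels can look like, here is a minimal pure-Python sketch of fixed-bin-count discretization. It assumes equal-width bins spanning the minimum and maximum intensity of the region; the text does not state the exact binning settings, and this is not the Pyradiomics implementation. The function name and the toy intensity values are hypothetical.

```python
def discretize_fixed_bin_count(values, n_bins=64):
    """Map intensities to integer gray levels 1..n_bins using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant region has a single gray level; assign every voxel to bin 1.
        return [1] * len(values)
    levels = []
    for v in values:
        level = int(n_bins * (v - lo) / (hi - lo)) + 1
        levels.append(min(level, n_bins))  # the maximum value falls into the top bin
    return levels


# Toy region of CT numbers in Hounsfield units (hypothetical values).
roi = [-950.0, -900.0, -700.0, -500.0, -200.0, 50.0]
gray_levels = discretize_fixed_bin_count(roi)
print(gray_levels)
```

Texture matrices such as GLCM or GLSZM are then built from these integer gray levels rather than from the raw intensities, which is why the number of bins affects the resulting feature values.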

In this study, we used two different feature selection algorithms, namely Bagging Random Forests (BRF) [52] and Multivariate Adaptive Regression Splines (MARS) [53]. BRF and MARS algorithms were implemented using the "VSURF" and "earth" R packages, respectively. For multiclass classification, we implemented multinomial logistic regression using the "mnlogit" R package. The MLR model fitness indices included the p-value of the Wald test (corrected for false-discovery rate via the Benjamini and Hochberg method), pseudo-R2 (a goodness-of-fit criterion in a logistic regression model), as well as the coefficient and standard error (SE). In the MLR model, class 0 served as the reference class, whereas statistical comparison between the two models (the two feature selectors) was performed by the Likelihood Ratio Test. Ten-fold cross-validation with bootstrapping (n=1000) was used to validate model performance. We report precision, recall, F1-score, and accuracy for each class and each model. In addition, the areas under the receiver operating characteristic (ROC) curve (AUCs) for multi-class classification models were calculated and compared for both models using the "multiROC" and "pROC" R packages, respectively.
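The per-class metrics named above can be derived from a 4 by 4 confusion matrix by treating each class in turn as "positive" and the rest as "negative". The sketch below shows one such one-vs-rest computation in plain Python. It is an illustration only: the text does not specify how per-class accuracy was defined, the function name is hypothetical, and the confusion matrix is made up and is not the study's result.

```python
def per_class_metrics(cm):
    """One-vs-rest precision, recall, F1 and accuracy for each class.

    cm[i][j] is the number of cases with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp     # predicted k, true class differs
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total
        results.append({"precision": precision, "recall": recall,
                        "f1": f1, "accuracy": accuracy})
    return results


# Hypothetical confusion matrix for classes 0..3 (rows: true, columns: predicted).
cm = [
    [40, 8, 2, 0],
    [5, 120, 10, 1],
    [1, 9, 20, 3],
    [0, 1, 2, 8],
]
metrics = per_class_metrics(cm)
for k, m in enumerate(metrics):
    print(k, {name: round(value, 3) for name, value in m.items()})
```

Under this one-vs-rest definition, accuracy counts true negatives as correct, so it can be noticeably higher than recall for the same class.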

RESULTS

Table 1 summarizes the selected features and their importance value (IV) by BRF and MARS for multiclass classification. Nineteen radiomics features were selected by BRF, including 9 from first-order, 6 from GLCM, one from GLDM, one from shape, one from NGTDM, and [...] 0.295 and 0.256, respectively. However, there were no significant differences between the two models when using a likelihood ratio test (p-value = 0.319).

Table 3 [...]
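For readers unfamiliar with the two quantities reported here, the block below writes out one common definition of each. Which pseudo-R2 variant was computed is not stated in the text; McFadden's form is assumed here. The symbols (fitted model M, intercept-only model M_0, and the two compared models M_A and M_B) are notation introduced for this sketch.

```latex
% Assumed definition (McFadden); the text does not state which pseudo-R^2 was used.
% \hat{L}(M) is the maximized likelihood of the fitted model,
% \hat{L}(M_0) that of the intercept-only model.
R^2_{\mathrm{McF}} = 1 - \frac{\ln \hat{L}(M)}{\ln \hat{L}(M_0)}

% Likelihood ratio statistic comparing two fitted models M_A and M_B.
\Lambda = -2\left[\ln \hat{L}(M_A) - \ln \hat{L}(M_B)\right]
```

In the textbook setting where M_A is nested in M_B, the statistic is referred to a chi-squared distribution whose degrees of freedom equal the difference in the number of estimated parameters.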

DISCUSSION

In the current study, we constructed a CT radiomics-based model to predict the severity of COVID-19 patients in a large cohort of patients. To this end, we extracted radiomics features from whole lung segmentations and selected high-importance features utilizing two different algorithms, namely BRF and MARS. The selected features were then fed to a multinomial logistic regression classifier for multiclass severity scoring. We achieved AUCs of 0.823 (95% CI: 0.795-0.852) and 0.816 (95% CI: 0.788-0.844), and accuracies of 0.933 and 0.922, with BRF- and MARS-selected features, respectively.
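To make the classifier step concrete, the sketch below shows how a baseline-category multinomial logistic regression turns a feature vector into four class probabilities, with class 0 as the reference class. This is a generic illustration, not the fitted model from the study: the function name, the two features, and all coefficient values are hypothetical.

```python
import math


def mlr_probabilities(x, coefs, intercepts):
    """Class probabilities for baseline-category multinomial logistic regression.

    Class 0 is the reference (its linear predictor is fixed at 0);
    coefs[k - 1] and intercepts[k - 1] belong to class k = 1..K-1.
    """
    etas = [0.0]
    for w, b in zip(coefs, intercepts):
        etas.append(b + sum(wi * xi for wi, xi in zip(w, x)))
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical coefficients for two standardized features and four classes.
coefs = [[0.8, -0.2], [1.5, 0.4], [2.1, 0.9]]
intercepts = [1.0, -0.5, -1.5]

probs = mlr_probabilities([0.3, -1.2], coefs, intercepts)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], predicted_class)
```

Each non-reference coefficient is interpreted as a change in the log-odds of that class relative to class 0, which is why one class has no coefficients of its own.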

We used an automatic model [48] to segment chest CT images for two reasons. First, most CT scans performed in the COVID-19 pandemic era are low-dose. In addition, these scans are acquired with a high pitch. Hence, it is difficult for radiologists to find and follow lung fissures to manually detect or segment the anatomical lobes. As such, we used our previously constructed deep learning model to fully segment the entire lung of each patient.

Yip et al. [46] [...] information about the intensity and heterogeneity of the lung in COVID-19 patients.

A noticeable advantage of the study by Yip et al. [46] was the use of a second radiologist observer who classified patients' images into mild, moderate, and severe classes without paying attention to the default classification of the dataset provider. This method helped to observe the prediction power of the models in both "provider" and "radiologist" datasets. In addition, they split the dataset into training and test sets. In contrast, we applied the bootstrapping technique to estimate and ensure the reproducibility of our results. In addition, the study by Yip et al. [46] may have reduced generalizability as it only predicts mild versus severe, and moderate versus severe disease, having reduced multiclass classification into two-class approaches. In the real clinical triage situation, the radiologist may benefit from a multiclass classification scheme for enhanced patient management, as provided by our study.

[...] high pitch chest CT scans. In the current and previous studies [44, 46, 55], radiomics features, as extracted from the entire lung (a less challenging segmentation task for deep learning algorithms), were evaluated to provide fast and robust severity scoring in COVID-19 patients.

In this work, chest CT was used for assessment. At the same time, there are few studies on other modalities such as chest X-ray radiography in prognostication and outcome prediction of COVID-19 patients. For example, Bae and colleagues [57] utilized radiomics features extracted from chest X-rays of 514 patients and found that their radiomics- and deep learning-based model can accurately predict mortality and the need for mechanical ventilation in patients (AUCs = 0.93 and 0.90, respectively). Providing a severity score using chest X-rays is a valuable avenue to explore. Yet, such work requires extensive comparisons with CT-based frameworks to assess the relative value of each modality for different tasks.

This study suffered from a few limitations, including the fact that our model was trained on single-center data. At the same time, we evaluated our models using a 10-fold cross-validation and bootstrapping technique to evaluate the repeatability and robustness of our results. In any case, further research should be conducted on multicentric data and patient images with multiple observers for improved training of the models and enhanced generalizability.
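As an illustration of how bootstrapping yields an uncertainty estimate for a performance metric, the sketch below computes a percentile bootstrap confidence interval for accuracy with 1000 resamples. How bootstrapping was combined with the 10-fold cross-validation is not detailed in the text, so this sketch simply resamples a set of held-out predictions; the function names, labels, and predictions are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical held-out labels and predictions for classes 0..3.
y_true = [0, 1, 1, 2, 1, 0, 3, 1, 2, 1, 0, 1, 1, 3, 2, 1, 0, 1, 1, 2]
y_pred = [0, 1, 1, 2, 1, 1, 3, 1, 1, 1, 0, 1, 2, 3, 2, 1, 0, 1, 1, 2]

point = accuracy(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, accuracy, n_boot=1000)
print(f"accuracy={point:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A fixed seed makes the interval reproducible; the width of the interval shrinks as the number of evaluated cases grows, not as the number of resamples grows.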

The authors declare that they have no conflict of interest.


REFERENCES

1. A systematic review and meta-analysis of published research data on COVID-19 infection fatality rates
2. Evaluation, and Treatment of Coronavirus. in StatPearls (StatPearls Publishing Copyright © 2020
3. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Euro surveillance: bulletin Europeen sur les maladies transmissibles
4. Testing for SARS-CoV-2 (COVID-19): a systematic review and clinical guide to molecular and serological in-vitro diagnostic assays
5. CT scans: balancing health risks and medical benefits
6. COVID-19 prognostic modeling using CT radiomic features and machine learning algorithms: Analysis of a multi-institutional dataset of 14,339 patients
7. Diagnosis of COVID-19 Using CT image Radiomics Features: A Comprehensive Machine Learning Study Involving 26
8. COVID-19): Role of Chest CT in Diagnosis and Management
9. Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?
10. Encyclopedia of Behavioral Medicine
11. Symptom severity of patients with advanced cancer in palliative care unit: longitudinal assessments of symptoms improvement
12. Sivin, I. & Cullins, V. Severity of infection following the introduction of new infection control measures for medical abortion
13. Severity scoring systems in the critically ill
14. Determinants of COVID-19 disease severity in patients with cancer
15. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan
16. Chest computed tomography scores of severity are associated with future lung disease progression in children with cystic fibrosis
17. Severity assessment of acute pancreatitis using CT severity index and modified CT severity index: Correlation with clinical outcomes and severity grading as per the Revised Atlanta Classification
18. CT Evaluation of Acute Pancreatitis and its Prognostic Correlation with CT Severity Index
19. Acute pancreatitis: value and impact of CT severity index
20. CO-RADS: A Categorical CT Assessment Scheme for Patients Suspected of Having COVID-19-Definition and Evaluation
21. Structured reporting of chest CT in COVID-19 pneumonia: a consensus proposal
22. Chest CT score in COVID-19 patients: correlation with disease severity and short-term prognosis
23. Relation Between Chest CT Findings and Clinical Conditions of Coronavirus Disease (COVID-19) Pneumonia: A Multicenter Study
24. The Clinical and Chest CT Features Associated With Severe and Critical COVID-19 Pneumonia
25. Applications and limitations of radiomics
26. Lung texture in serial thoracic computed tomography scans: correlation of radiomics-based features with radiation therapy dose and radiation pneumonitis development. International journal of radiation oncology
27. Radiomics-based machine learning model to predict risk of death within 5-years in clear cell renal cell carcinoma patients
28. CT imaging markers to improve radiation toxicity prediction in prostate cancer radiotherapy by stacking regression algorithm
29. Machine learning-based prognostic modeling using clinical data and quantitative radiomic features from chest CT images in COVID-19 patients
30. Treatment response prediction using MRI-based pre-, post-, and delta-radiomic features and machine learning algorithms in colorectal cancer
31. Multi-level multi-modality (PET and CT) fusion radiomics: prognostic modeling for non-small cell lung carcinoma
32. Overall Survival Prediction in Renal Cell Carcinoma Patients Using Computed Tomography Radiomic and Clinical Information
33. Non-small cell lung carcinoma histopathological subtype phenotyping using high-dimensional multinomial multiclass CT radiomics signature
34. Impact of feature harmonization on radiogenomics analysis: Prediction of EGFR and KRAS mutations from non-small cell lung cancer PET/CT images
35. Tensor Radiomics: Paradigm for Systematic Incorporation of Multi-Flavoured Radiomics Features
36. The Applications of Radiomics in Precision Diagnosis and Treatment of Oncology: Opportunities and Challenges
37. Cardiac SPECT radiomic features repeatability and reproducibility: A multi-scanner phantom study
38. Medical Imaging Technologists in Radiomics Era: An Alice in Wonderland Problem
39. Overall Survival Prognostic Modelling of Non-small Cell Lung Cancer Patients Using Positron Emission Tomography/Computed Tomography Harmonised Radiomics Features: The Quest for the Optimal Machine Learning Algorithm
40. Non-contrast Cine Cardiac Magnetic Resonance image radiomics features and machine learning algorithms for myocardial infarction detection
41. Discrimination of pulmonary ground-glass opacity changes in COVID-19 and non-COVID-19 patients using CT radiomics analysis
42. Hypergraph learning for identification of COVID-19 with CT imaging
43. Artificial intelligence-driven assessment of radiological images for COVID-19
44. Computed Tomography Radiomics Can Predict Disease Severity and Outcome in Coronavirus Disease 2019 Pneumonia
45. Temporal Changes of CT Findings in 90 Patients with COVID-19 Pneumonia: A Longitudinal Study
46. Performance and Robustness of Machine Learning-based Radiomic COVID-19 Severity Prediction. medRxiv: the preprint server for health sciences
47. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis
48. COLI-NET: Fully Automated COVID-19 Lung and Infection Pneumonia Lesion Detection and Segmentation from Chest CT Images. medRxiv
49. COLI-Net: Deep learning-assisted fully automated COVID-19 lung and infection pneumonia lesion detection and segmentation from chest computed tomography images
50. Computational Radiomics System to Decode the Radiographic Phenotype
51. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping
52. VSURF: an R package for variable selection using random forests
53. Assessment of pile drivability using random forest regression and multivariate adaptive regression splines. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards
54. CT Radiomics, Radiologists, and Clinical Information in Predicting Outcome of Patients with COVID-19 Pneumonia
55. Identification of common and severe COVID-19: the value of CT texture analysis and correlation with clinical characteristics
56. Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT
57. Predicting Mechanical Ventilation Requirement and Mortality in COVID-19 using Radiomics and Deep Learning on Chest Radiographs: A Multi-Institutional Study