key: cord-0777024-tj4z1sfl
authors: Meng, Zirui; Wang, Minjin; Zhao, Zhenzhen; Zhou, Yongzhao; Wu, Ying; Guo, Shuo; Li, Mengjiao; Zhou, Yanbing; Yang, Shuyu; Li, Weimin; Ying, Binwu
title: Development and Validation of a Predictive Model for Severe COVID-19: A Case-Control Study in China
date: 2021-05-25
journal: Front Med (Lausanne)
DOI: 10.3389/fmed.2021.663145
sha: 156e903da18621db3d113ddbf14dc4d16cd98d6c
doc_id: 777024
cord_uid: tj4z1sfl

Background: Predicting the risk of progression to severe coronavirus disease 2019 (COVID-19) could facilitate personalized diagnosis and treatment, thus optimizing the use of medical resources. Methods: In this prospective study, 206 patients with COVID-19 were enrolled from regional medical institutions between December 20, 2019, and April 10, 2020. We collated a range of data, including demographics, clinical characteristics, laboratory findings, and cytokine levels, to derive and validate a predictive model for COVID-19 progression. Variation analysis, along with the least absolute shrinkage and selection operator (LASSO) and Boruta algorithms, was used for modeling. The performance of the derived models was evaluated by specificity, sensitivity, area under the receiver operating characteristic (ROC) curve (AUC), Akaike information criterion (AIC), calibration plots, decision curve analysis (DCA), and the Hosmer–Lemeshow test. Results: We used the LASSO algorithm and logistic regression to develop a model that accurately predicts the risk of progression to severe COVID-19. The model incorporated alanine aminotransferase (ALT), interleukin (IL)-6, expectoration, fatigue, lymphocyte ratio (LYMR), aspartate transaminase (AST), and creatinine (CREA). The model yielded satisfactory predictive performance, with AUCs of 0.9104 and 0.8792 in the derivation and validation cohorts, respectively. The final model was then used to create a nomogram, which was packaged into an open-source predictive calculator for clinical use. The model is freely available online at https://severeconid-19predction.shinyapps.io/SHINY/. Conclusion: In this study, we developed a free, open-source predictive calculator for COVID-19 progression based on ALT, IL-6, expectoration, fatigue, LYMR, AST, and CREA. The validated model can effectively predict progression to severe COVID-19, thus providing an efficient option for early, personalized management and the appropriate allocation of medical resources.

The current outbreak of coronavirus disease 2019 (COVID-19) has spread rapidly and widely across the world, causing panic and major public health challenges for the international community (1). COVID-19 presents with a wide spectrum of clinical manifestations, ranging from asymptomatic infection and mild upper respiratory tract illness to severe viral pneumonia with respiratory failure. Only a small proportion of cases (∼15-20%) progress to a severe condition; however, ∼40% of patients with severe disease die (2-5). Although some research has shown that initial therapy with remdesivir or non-invasive positive pressure ventilation (NIPPV) is effective in severe cases, there are currently no accepted recommendations for the individualized treatment of severe patients (6-8). The rapid deterioration of patients with severe COVID-19 therefore deserves special attention. There is an urgent need to develop options for the personalized diagnosis and treatment of such patients, particularly given the relative shortage of medical resources.

Fever, cough, and fatigue are common in patients with mild COVID-19 (9, 10). As the disease progresses, patients may also develop respiratory failure, acute respiratory distress syndrome, heart failure, metabolic acidosis, and septic shock (11). Beyond the well-defined clinical characteristics of COVID-19, previous studies have shown that abnormal laboratory findings and cytokine levels are often associated with disease progression, including coagulation-related markers such as D-dimer and fibrinogen (FIB), neutrophil count, lymphocyte count, and high-sensitivity C-reactive protein (HsCRP) (5, 12-15). In addition, research has suggested that a cytokine storm could be the primary driver of severe progression in COVID-19 patients (16, 17). However, the use of these individual indicators is limited by many factors, including insufficient information, individual differences, the experience of the attending physician, and the complexity of the disease. Thus, there is an urgent need for advanced multivariable prediction models (18, 19). Although several studies have attempted to develop such models, most were developed in a single center using retrospective data; in some cases, only partial datasets were used, and validation was clearly lacking. These factors may lead to the omission of key variables and a risk of over-fitting, thus limiting the clinical application of such models. Therefore, there is a critical need to develop more effective prediction models (14, 15, 20, 21).

Here, we prospectively and consecutively enrolled a cohort of COVID-19 patients with complete demographic data, clinical characteristics, laboratory findings, and cytokine information, and then constructed a multiparameter prediction model for the early identification of severe COVID-19. Our model could help guide patient monitoring and precision medicine.

COVID-19 patients were prospectively and consecutively enrolled from regional medical institutions by the West China Medical Center between December 20, 2019, and April 10, 2020. The patients were divided into severe and non-severe groups according to the China National Health Commission Guidelines for Diagnosis and Treatment of COVID-19 (Versions 5 and 7). Serum samples were collected within 3 days of confirmed infection and stored at −80 °C for subsequent measurement of cytokine levels. Demographic data, clinical characteristics, and laboratory findings were obtained from electronic medical records (Figure 1). Two independent researchers reviewed the data collection forms.

Patients with pneumonia, typical findings on chest computed tomography (CT), and positive severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nucleic acid results, as determined by real-time fluorescent reverse transcription-polymerase chain reaction of bronchoalveolar lavage (BAL) fluid or sputum, were considered COVID-19 "cases" according to the diagnosis and treatment guidelines released by the China National Health Commission (22). Patients who met at least one of the following criteria during hospitalization were allocated to the severe group: (1) respiratory distress, with a respiratory rate ≥30 breaths/min; (2) oxygen saturation ≤93% at rest; or (3) arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤300 mmHg. All patients had been discharged or had died by the time the model was developed.
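The allocation rule above can be expressed as a short Python sketch. It assumes a hypothetical per-assessment record with three fields (respiratory rate, resting oxygen saturation, and PaO2/FiO2 ratio); the respiratory-distress component of criterion (1) is reduced to the respiratory-rate threshold, and missing measurements are treated as not meeting a criterion. These are illustrative assumptions, not the authors' data-processing code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Observation:
    """One bedside assessment during hospitalization (hypothetical record layout)."""
    respiratory_rate: Optional[float] = None  # breaths/min
    spo2_at_rest: Optional[float] = None      # oxygen saturation at rest, %
    pao2_fio2: Optional[float] = None         # PaO2/FiO2 ratio, mmHg


def meets_severe_criteria(obs: Observation) -> bool:
    """True if at least one of the three severity criteria is met.

    Assumption: criterion (1) is checked via respiratory rate only, and a
    missing value never satisfies a criterion.
    """
    checks = [
        obs.respiratory_rate is not None and obs.respiratory_rate >= 30,
        obs.spo2_at_rest is not None and obs.spo2_at_rest <= 93,
        obs.pao2_fio2 is not None and obs.pao2_fio2 <= 300,
    ]
    return any(checks)


def is_severe(observations: Iterable[Observation]) -> bool:
    """A patient is allocated to the severe group if any assessment meets a criterion."""
    return any(meets_severe_criteria(o) for o in observations)
```

Because the rule uses "at least one" criterion at any time during hospitalization, a single qualifying assessment is enough to place a patient in the severe group.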

Circulating levels of interferon (IFN)-α2, IFN-β, IFN-γ, tumor necrosis factor (TNF)-α, interleukin (IL)-1α, IL-1β, IL-2, IL-4, IL-6, IL-8, IL-10, IL-17A, IL-17E, IL-17F, IL-22, and IL-33 in serum samples were measured by a multiplexed flow cytometric assay using Human Cytokine Kits on a Luminex® system (MAGPIX® with xPONENT) according to the manufacturer's instructions (MILLIPLEX® Analyst 5.1). All samples were measured in duplicate. The coefficient of variation (CV), calculated on the basis of the standard curves, did not exceed 20%.
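As an illustration of this quality-control step, the following Python sketch computes the percent CV (100 × SD / mean) for replicate measurements and flags samples above the 20% limit. It assumes that the CV is computed per sample from the duplicate readings; the text does not specify the exact calculation, and the sample identifiers are hypothetical.

```python
import statistics


def duplicate_cv(values):
    """Percent coefficient of variation of replicate readings: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


def flag_high_cv(samples, limit=20.0):
    """Return the IDs of samples whose replicate CV exceeds `limit` percent.

    `samples` maps a (hypothetical) sample ID to its list of replicate readings.
    """
    return [sid for sid, reps in samples.items() if duplicate_cv(reps) > limit]
```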

Patients from the Chengdu region formed the derivation cohort, which was divided into a training set for modeling and a testing set for internal validation. Variables were selected using three approaches: stepwise selection based on p-values, least absolute shrinkage and selection operator (LASSO) regression, and the Boruta algorithm (23, 24).
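A stratified split keeps the proportion of severe cases similar in the training and testing sets. The Python sketch below assumes a hypothetical 70/30 split ratio and a fixed random seed; the text does not report the actual ratio or how the split was performed.

```python
import random


def stratified_split(patient_ids, labels, test_fraction=0.3, seed=2020):
    """Split patient IDs into training and testing sets, preserving the class ratio.

    `labels` holds the outcome per patient (e.g. 1 = severe, 0 = non-severe).
    The 0.3 test fraction and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        # shuffle each outcome class separately, then take the same fraction from each
        members = [pid for pid, y in zip(patient_ids, labels) if y == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```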

Stepwise selection based on p-values is a classic regression-based method: variables with p < 0.05 were regarded as significant and retained. This approach generally performs well in smaller datasets and has been used extensively in previous research. LASSO regression shrinks the feature coefficients through an L1 penalty, setting the coefficients of less informative features exactly to zero, to obtain an optimally constrained model; it has been used effectively to avoid the over-fitting and collinearity problems of classical methods based on significance testing, and it also improves the generalizability of a model. The Boruta algorithm is a wrapper method built around random forest classification. It iteratively removes features that prove less relevant than random probes (shadow features) and thus aims to retain all variables relevant to the response variable. Both LASSO and Boruta are particularly suitable for datasets with a small sample size but a large number of variables. Using these three variable selection methods, we obtained three candidate predictor panels, from which we constructed separate binary logistic regression models; these were then verified internally by 10-fold cross-validation. The optimal model was selected by comparing the area under the curve (AUC) and the Akaike information criterion (AIC) and was used to generate a nomogram, which was encapsulated as an open-source online predictive calculator.
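To make the LASSO step concrete, the following minimal Python sketch fits an L1-penalized logistic regression by proximal gradient descent (soft-thresholding). It is an illustrative implementation, not the software used in the study; it assumes standardized features, an unpenalized intercept, and a hand-chosen penalty `lam` (in practice the penalty is usually tuned by cross-validation). Features whose coefficients remain non-zero form the candidate panel.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, exactly zero inside [-t, t]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, step=0.1, n_iter=5000):
    """Minimize (1/n) * negative log-likelihood + lam * ||b||_1 by proximal gradient.

    X: list of rows of standardized features; y: list of 0/1 labels.
    The fixed step of 0.1 assumes only a few standardized features;
    larger panels need a smaller step for the iteration to converge.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # residuals: predicted probability minus observed label
        r = [_sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
             for row, yi in zip(X, y)]
        grad0 = sum(r) / n
        grad = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(p)]
        b0 -= step * grad0  # intercept: plain gradient step, no penalty
        b = [_soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, grad)]
    return b0, b


def selected_features(names, coefs):
    """Features with a non-zero LASSO coefficient form the candidate predictor panel."""
    return [name for name, c in zip(names, coefs) if c != 0.0]
```

A larger `lam` removes more features; at a sufficiently large penalty every coefficient is shrunk to zero, which is why the penalty is normally chosen by cross-validation rather than fixed by hand.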

The independent validation cohort consisted of patients from outside Chengdu and was used for external validation. To assess the generalizability of the model, we compared its predictions with follow-up outcomes and calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). AUCs and decision curve analysis (DCA) were used to evaluate the model's discrimination and net clinical benefit comprehensively (25).
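The external-validation metrics and the DCA net benefit can be computed from a 2×2 table of predicted versus observed outcomes. The Python sketch below uses the standard definitions; the net-benefit formula, TP/n − (FP/n) × p_t/(1 − p_t), is the usual decision-curve quantity evaluated at a threshold probability p_t. Any counts passed to these functions are hypothetical inputs, not the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # severe cases correctly flagged
        "specificity": tn / (tn + fp),  # non-severe cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who became severe
        "npv": tn / (tn + fn),          # cleared patients who stayed non-severe
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at threshold probability p_t:
    TP/n - (FP/n) * p_t / (1 - p_t)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds, and comparing it with "treat all" and "treat none" strategies, produces a decision curve.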

Continuous variables are presented as medians (interquartile ranges) and categorical variables as frequencies. Between-group differences were tested with the chi-squared test for categorical variables and the Student t-test or Mann-Whitney U-test for continuous variables. Pearson correlation was used to assess the linear correlation between pairs of variables. The diagnostic performance of the models was assessed by AIC and receiver operating characteristic (ROC) curves and quantified by AUCs. An open-source online predictive calculator was created using the Shiny package (version 1.2.0) in R. All statistical analyses were performed in R version 3.5.0. All statistical tests were two-tailed, and p ≤ 0.05 was considered statistically significant.
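Two of the quantities named above have compact definitions. The AUC equals the Mann-Whitney probability that a randomly chosen severe patient receives a higher predicted score than a randomly chosen non-severe patient (ties counted as one half), and the AIC of a fitted logistic model is 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The Python sketch below implements both directly; it is an illustration, not the R code used in the study.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def logistic_aic(probs, labels, n_params):
    """AIC = 2k - 2 ln L for a fitted binary logistic model.

    `probs` are fitted probabilities (strictly between 0 and 1) and
    `n_params` (k) counts all estimated coefficients, including the intercept.
    """
    log_lik = sum(math.log(p) if y == 1 else math.log(1.0 - p)
                  for p, y in zip(probs, labels))
    return 2 * n_params - 2 * log_lik
```

Lower AIC favors a model that fits well with fewer parameters, which complements AUC when comparing candidate panels of different sizes.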

The protocol for this study was approved by the West China Hospital, Sichuan University Medical Ethics Committee (reference no. 193, 2020), and conformed to the principles of the Declaration of Helsinki. Written informed consent was obtained from all participants.

We recruited 206 patients with a confirmed diagnosis of COVID-19; of these, 44 progressed to severe COVID-19 and 162 were classified as having non-severe COVID-19. Patients in the severe group were significantly older (50 vs. 46 years, p = 0.005) and had a significantly higher frequency of underlying diseases (diabetes and hypertension) than those in the non-severe group (p < 0.001 and p = 0.013, respectively). There was no difference between the two groups in sex distribution (male: 54.94% vs. 56.81%, p = 0.400). With regard to epidemiological exposure, most patients (79%) in the severe group had been overseas or had visited Wuhan or surrounding regions within 14 days of disease onset, whereas patients who had been overseas accounted for 50% of the non-severe group. As of April 28, 2020, the median time to a negative nucleic acid test result was 11 days in the non-severe group and 18 days in the severe group, excluding three patients who died from multiple organ failure (MOF).

Demographic data, clinical characteristics, laboratory findings, and cytokine levels are shown in Table 1 and Supplementary Figures 1, 2. Several cytokines were significantly elevated in the severe COVID-19 group (p ≤ 0.010). The predictive value of each individual cytokine, and of combined cytokine panels, was evaluated by ROC curve analysis and quantified by AUC (Supplementary Figure 3). The AUCs for IL-10, IL-6, IL-1α, IL-1β, IL-17A, IL-4, TNF-α, and IL-2 alone were 0.830, 0.796, 0.729, 0.707, 0.694, 0.667, 0.656, and 0.653, respectively, and binary logistic models combining cytokines had similar AUCs (0.796-0.848). These data indicated that IL-10 and IL-6 may be potential biomarkers for severe COVID-19. We found significant differences between the severe and non-severe groups in a range of clinical characteristics, including respiratory rate, cough, expectoration, dyspnea, asthma, and debilitation. Significant differences were also identified in several laboratory findings: lymphocyte ratio (LYMR), eosinophil ratio (EOSR), monocyte ratio (MONOR), total bilirubin (TBIL), total protein (TP), albumin (ALB), calcium (Ca), and uric acid (URIC) were all significantly lower in the severe group, whereas neutrophil ratio (NEUTR), FIB, aspartate transaminase (AST), glucose (GlU), and HsCRP were all significantly higher. However, the AUCs of these indicators for predicting severe COVID-19 were all <0.690. Simple logistic analysis was unsuitable for predicting severe COVID-19 because of the large number of candidate indicators requiring feature selection. We identified significant pairwise correlations among all cytokines except IL-33 and IFN-β. In addition, IL-6, IL-10, and IFN-β were closely associated with certain laboratory indicators of hepatobiliary function.
Similarly, hematocrit (HCT), TBIL, direct bilirubin (DBIL), indirect bilirubin (IBIL), TP, creatine kinase (CK), and myoglobin (Myo) were significantly associated with most cytokines except IL-33, which was not correlated with any of the indices.

The three variable selection methods (stepwise selection, LASSO, and Boruta) yielded three candidate predictor panels and corresponding predictive models (predictive models A, B, and C, respectively) (Table 2, Figure 2). Predictive model B outperformed the other two models in terms of sensitivity, specificity, discrimination, calibration, and net clinical benefit. In addition, the predictors included in this model are objective and widely available. The optimal model, with seven features, namely alanine aminotransferase (ALT), IL-6, expectoration, fatigue, LYMR, AST, and serum creatinine (CREA), was used to generate a nomogram (Figure 3), which was encapsulated as an open-source online predictive calculator with R/Shiny (https://severeconid-19predction.shinyapps.io/SHINY/).
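The calculator's core computation can be sketched as follows. A logistic model converts a patient's predictor values into a risk p = 1/(1 + exp(−(β0 + Σ βj xj))), and one common nomogram construction assigns 100 points to the predictor with the widest range of effect and scales the others proportionally. The coefficients and predictor ranges are not reported in this text, so any values supplied to these hypothetical functions are placeholders; this is not the code behind the online calculator.

```python
import math


def predicted_risk(intercept, coefficients, patient):
    """Logistic-model risk p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    `coefficients` and `patient` are dicts keyed by predictor name (hypothetical).
    """
    missing = set(coefficients) - set(patient)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    lp = intercept + sum(coefficients[k] * patient[k] for k in coefficients)
    # numerically stable evaluation of the logistic function
    if lp >= 0:
        return 1.0 / (1.0 + math.exp(-lp))
    e = math.exp(lp)
    return e / (1.0 + e)


def nomogram_points(coefficients, ranges, patient, max_points=100.0):
    """Points per predictor; the predictor with the widest effect range spans max_points.

    `ranges` maps each predictor to its (low, high) plotting range.
    """
    spans = {k: abs(coefficients[k]) * (hi - lo) for k, (lo, hi) in ranges.items()}
    scale = max_points / max(spans.values())
    points = {}
    for k, (lo, hi) in ranges.items():
        # measure from the end of the range that contributes least to risk
        ref = lo if coefficients[k] >= 0 else hi
        points[k] = scale * coefficients[k] * (patient[k] - ref)
    return points
```

Summing the points gives a total that maps monotonically onto the linear predictor, which is why reading total points off a nomogram and computing `predicted_risk` directly give the same probability.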

Finally, we used our model to predict disease progression in the 108 patients of the validation cohort. The model predicted that 18 patients would progress to severe COVID-19 and that the remaining 90 would not. These predictions were compared with the follow-up results (91 patients with non-severe COVID-19 and 17 patients with severe COVID-19) (Figures 4, 5).

The accurate and individualized assessment of patients who may progress to severe COVID-19 will improve the efficiency of clinical intervention and the rational use of medical resources. In the present study, we recruited 206 patients (162 with non-severe COVID-19 and 44 with severe COVID-19). We analyzed a range of indicators associated with severe COVID-19 and developed a novel predictive model that included ALT, IL-6, expectoration, fatigue, LYMR, AST, and CREA. This model demonstrated an excellent ability to predict the progression of COVID-19 during hospitalization in both the derivation and validation cohorts.

Our final model was visualized as a nomogram and packaged into a free, open-source predictive calculator (https://severeconid-19predction.shinyapps.io/SHINY/). The model is a powerful tool to aid decision-making and guide treatment strategies for patients at high risk of severe progression, and it could also facilitate personalized management.

Previous research reported wide differences in the levels of many cytokines between patients with non-severe and severe COVID-19 (26-28). Our results showed marked elevations of various cytokines in patients with severe COVID-19, including IL-1α, IL-1β, IFN-γ, TNF-α, IL-2, IL-4, IL-6, IL-10, and IL-17A. Of these, IL-6 and IL-10 showed the highest fold-changes, indicating a strong inflammatory reaction that may be sufficient to trigger a cytokine storm. Univariate logistic analysis showed that a number of cytokines could serve as predictors of severe illness, although their predictive efficacy varied considerably and none could be used as a standalone predictor. We also found that underlying diseases (diabetes and hypertension), initial clinical characteristics (cough, expectoration, dyspnea, asthma, and debilitation), and laboratory findings [LYMR, ALT, AST, CK, GlU, and procalcitonin (PCT)] were significantly associated with disease progression, although these associations were nonspecific. The extensive correlation between cytokines and the spectrum of clinical responses may be explained by multiple organ damage caused by the over-exuberant inflammatory response in severe COVID-19 (12, 29).

Univariate logistic analysis indicated that no single evaluation index provided sufficient evidence to predict progression, and that modeling by data mining may be a more efficient and viable way to compensate for the limited information available from any single source (30). We used the LASSO algorithm and logistic regression and compared different modeling approaches. Finally, we selected a predictive model that included ALT, IL-6, expectoration, fatigue, LYMR, AST, and CREA. Our model achieved satisfactory predictive performance, with AUCs of 0.910 and 0.879 in the derivation and validation cohorts, respectively. We also packaged this model into an open-source online format for clinical use. Although several predictive models have been published previously, these studies had obvious limitations: they were retrospective reviews, had suboptimal predictive ability, or were not validated externally (31-33). With these limitations in mind, our study has several strengths. First, we considered a wide range of potential predictors of severe COVID-19 and collected a comprehensive dataset prospectively. Second, our shrinkage-based model, featuring a small set of representative key variables, may perform better than a more complex model. This is supported by the fact that our predictive model was chosen by comparing several different methods; the optimal method had a significantly higher AUC than the other models, and this finding was reconfirmed in the validation cohort. Third, the predictive model was used to create a nomogram, which was then turned into an open-source online calculator with visualization and interactive functions.

Our study also has some limitations. For example, we mainly focused on changes in symptoms and the levels of key indicators in patients after SARS-CoV-2 infection and did not consider the influence of individual differences on disease progression. More in-depth investigations and longitudinal dynamic monitoring studies are needed to explain the specific characteristics of the potential predictors. Furthermore, the predictive model needs to be validated in a larger patient cohort and in populations outside China.

In this study, we developed and validated an online predictive calculator that provides a personalized probability of disease progression based on seven commonly used variables. The model will be vital for early personalized management, for promoting the appropriate allocation of medical resources, and for ensuring that patients who may develop severe COVID-19 receive appropriate treatment as soon as possible.

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

The studies involving human participants were reviewed and approved by West China Hospital, Sichuan University Medical Ethics Committee. The patients/participants provided their written informed consent to participate in this study.

ZM and MW designed the research and wrote the manuscript. ZZ and YoZ were responsible for the recruitment of COVID-19 patients and clinical treatment. YW and SG were responsible for the detection of candidate biomarkers. ML, SY, and YaZ were responsible for collecting and organizing data. All authors contributed to the article and approved the submitted version.

Coronavirus disease 2019 (COVID-19): prevention and control in gynecological outpatient clinic

Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study

Combination of four clinical indicators predicts the severe/critical symptom of patients infected COVID

Feasibility of coronavirus disease 2019 eradication

Risk factors on the progression to clinical outcomes of COVID-19 patients in South Korea: using national data

Surviving sepsis campaign: guidelines on the management of critically ill adults with coronavirus disease 2019 (COVID-19)

Novel therapeutic approaches for treatment of COVID-19

Mesenchymal stem cell immunomodulation and regeneration therapeutics as an ameliorative approach for COVID-19 pandemics

Performance of two risk-stratification models in hospitalized patients with coronavirus disease

An updated insight into the molecular pathogenesis, secondary complications and potential therapeutics of COVID-19 pandemic

Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review

Clinical characteristics and risk factors associated with COVID-19 disease severity in patients with cancer in Wuhan, China: a multicentre, retrospective, cohort study

Prediction model based on the combination of cytokines and lymphocyte subsets for prognosis of SARS-CoV-2 infection

A tool for early prediction of severe coronavirus disease 2019 (COVID-19): a multicenter study using the risk Nomogram in Wuhan and Guangdong, China

Early prediction of mortality risk among patients with severe COVID-19, using machine learning

Intravenous methylprednisolone pulse as a treatment for hospitalised severe COVID-19 patients: results from a randomised controlled clinical trial

Decreased serum level of sphingosine-1-phosphate: a novel predictor of clinical severity in COVID-19

Occupational COVID-19 prevention among Congolese healthcare workers: knowledge, practices, PPE compliance, and safety imperatives

Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal

A simple algorithm helps early identification of SARS-CoV-2 infection patients with severe progression tendency

A novel simple scoring model for predicting severity of patients with SARS-CoV-2 infection

Proteomic and metabolomic characterization of COVID-19 patient sera

Signaling mode of the broad-spectrum conserved CO2 receptor is one of the important determinants of odor valence in Drosophila

Randomized lasso links microbial taxa with aquatic functional groups inferred from flow cytometry

Validation of the atherosclerotic cardiovascular disease pooled cohort risk equations

Circulating levels of interleukin-6 and interleukin-10, but not tumor necrosis factor-alpha, as potential biomarkers of severity and mortality for COVID-19: systematic review with meta-analysis

COVID-19: the emerging immunopathological determinants for recovery or death

Severe acute respiratory syndrome coronavirus-2 induces cytokine storm and inflammation during coronavirus disease 19: perspectives and possible therapeutic approaches

Edaravone: a potential treatment for the COVID-19-induced inflammatory syndrome?

TIMP2 is a poor prognostic factor and predicts metastatic biological behavior in gastric cancer

Validation and repurposing of the MSL-COVID-19 score for prediction of severe COVID-19 using simple clinical predictors in a triage setting: The Nutri-CoV score

Development and validation of a deep learning-based model using computed tomography imaging for predicting disease severity of coronavirus disease 2019

Development and validation of the HNC-LL score for predicting the severity of coronavirus disease 2019

AIC, Akaike information criterion

AUC, area under the ROC curve

DCA, decision curve analysis

DM, diabetes

HsCRP, high-sensitivity C reactive protein

IFN, interferon; IL, interleukin; INR, International Normalized Ratio; LDL-C, low-density lipoprotein cholesterol

WBC, white blood cell

The authors would like to express their gratitude to EditSprings (https://www.editsprings.com/) for the expert linguistic services provided.

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.