key: cord-0764246-51b7hss1
authors: Gong, Jiao; Ou, Jingyi; Qiu, Xueping; Jie, Yusheng; Chen, Yaqiong; Yuan, Lianxiong; Cao, Jing; Tan, Mingkai; Xu, Wenxiong; Zheng, Fang; Shi, Yaling; Hu, Bo
title: A Tool to Early Predict Severe 2019-Novel Coronavirus Pneumonia (COVID-19): A Multicenter Study using the Risk Nomogram in Wuhan and Guangdong, China
date: 2020-03-20
journal: nan
DOI: 10.1101/2020.03.17.20037515
sha: d5f13718a22acb1f72d7baad5d96db8c673f087c
doc_id: 764246
cord_uid: 51b7hss1

Background Severe cases of coronavirus disease 2019 (COVID-19) rapidly develop acute respiratory distress leading to respiratory failure, with high short-term mortality rates. At present, there is no reliable risk stratification tool for non-severe COVID-19 patients at admission. We aimed to construct an effective model for the early identification of cases at high risk of progression to severe COVID-19.

Methods Patients infected with SARS-CoV-2 from one center in Wuhan and two centers in Guangzhou, China, were included retrospectively. All patients with non-severe COVID-19 during hospitalization were followed for more than 15 days after admission. Patients who deteriorated to severe or critical COVID-19 and patients who remained non-severe were assigned to the severe and non-severe groups, respectively. We compared the demographic, clinical, and laboratory data between the severe and non-severe groups. Based on baseline data, the least absolute shrinkage and selection operator (LASSO) algorithm and a logistic regression model were used to construct a nomogram for risk prediction in the training cohort. The predictive accuracy and discriminative ability of the nomogram were evaluated by the area under the curve (AUC) and a calibration curve. Decision curve analysis (DCA) and clinical impact curve analysis (CICA) were conducted to evaluate the clinical applicability of our nomogram.
Findings The training cohort consisted of 189 patients, while the two independent validation cohorts consisted of 165 and 18 patients. Among all cases, 72 (19.35%) patients developed severe COVID-19 and 107 (28.76%) patients had one of the following underlying diseases: hypertension, diabetes, coronary heart disease, chronic respiratory disease, or tuberculosis. We found that one demographic and six serological indicators (age, serum lactate dehydrogenase, C-reactive protein, the coefficient of variation of red blood cell distribution width (RDW), blood urea nitrogen, albumin, and direct bilirubin) were associated with severe COVID-19. Based on these features, we generated a nomogram with high accuracy in distinguishing patients who progressed to severe COVID-19 from those who remained non-severe: the AUC was 0.912 (95% CI 0.846-0.978) in the training cohort, with a sensitivity of 85.71% and a specificity of 87.58%, and 0.853 (95% CI 0.790-0.916) in the validation cohort, with a sensitivity of 77.5% and a specificity of 78.4%. The calibration curve for the probability of severe COVID-19 showed optimal agreement between the nomogram's prediction and actual observation. DCA and CICA further indicated that our nomogram conferred a high clinical net benefit.

Interpretation Our nomogram could help clinicians identify early those patients who will progress to severe COVID-19. This risk stratification tool will enable better centralized management and early treatment of severe patients, as well as optimal use of medical resources via patient prioritization, and will thus significantly reduce mortality rates. RDW plays an important role in predicting severe COVID-19, implying that the role of red blood cells (RBC) in severe disease is underestimated.
medRxiv preprint doi: https://doi.org/10.1101/2020.03.17.20037515; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Since the outbreak of novel coronavirus pneumonia in December 2019, the number of reported cases has surpassed 120,000, with over 4600 deaths worldwide, as of March 12, 2020. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a member of the coronavirus family, which is known to cause common colds and severe illnesses, is the cause of COVID-19 [1]. Although the overall case-fatality rates (CFR) of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) are much higher, COVID-19 is responsible for more total deaths because of its faster transmission and the growing number of cases [2]. The World Health Organization (WHO) has raised the global COVID-19 outbreak risk to "Very High", and SARS-CoV-2 infection has become a serious threat to public health. According to a report recently released by the Chinese Center for Disease Control and Prevention (CDC) that included approximately 44,500 confirmed cases of SARS-CoV-2 infection, up to 15.8% were severe or critical. Most COVID-19 patients have a mild disease course, while some experience rapid deterioration (particularly within 7-14 days) from the onset of symptoms into severe COVID-19, with or without acute respiratory distress syndrome (ARDS) [3]. Current epidemiological data suggest that the mortality rate of severe COVID-19 patients is about 20 times higher than that of non-severe COVID-19 patients [4, 5]. This situation highlights the need to identify COVID-19 patients at risk of progressing to severe COVID-19. Severely ill patients often require intensive medical resources. Therefore, early identification of patients at high risk of progression to severe COVID-19 will facilitate appropriate supportive care and, through patient prioritization, reduce both the mortality rate and unnecessary or inappropriate healthcare utilization.

At present, early warning models for identifying COVID-19 patients at risk of developing this costly condition are scarce [3, 6]. So far, studies of prognostic factors for COVID-19 have mainly focused on immune cells. In our study, we found that older age, higher lactate dehydrogenase (LDH), C-reactive protein (CRP), RDW (the coefficient of variation of red blood cell distribution width), direct bilirubin (DBIL), and blood urea nitrogen (BUN), and lower albumin (ALB) on admission correlated with higher odds of severe COVID-19. Based on these indexes, we developed and validated an effective prognostic nomogram with high sensitivity and specificity for accurate individualized assessment of the incidence of severe COVID-19. Among these indexes, the prognostic role of RDW, which is associated with increased erythrocyte turnover, is underestimated in COVID-19. Our results hint that RBC turnover might be involved in severe illness.

Data on COVID-19 inpatients between January 20, 2020 and March 2, 2020 were retrospectively collected from three clinical centers: Guangzhou Eighth People's Hospital, Zhongnan Hospital of Wuhan University, and the Third Affiliated Hospital of Sun Yat-sen University. A total of 372 patients with COVID-19 were enrolled; 9 patients younger than 15 years of age were excluded from the study. Clinical laboratory test results, including SARS-CoV-2 RNA detection results, biochemical indices, and routine blood test results, were collected from routine clinical practice.
Categorical variables were expressed as frequencies and percentages, and Fisher's exact test was used to assess significance. Continuous variables were expressed as mean (standard deviation [SD]) or median (interquartile range [IQR]), as appropriate. The t test and the Mann-Whitney U test were used for continuous variables with and without a normal distribution, respectively. A value of p < 0.05 was considered statistically significant. Except for the imputation of missing values, all statistical analyses were performed using R (version 3.6.2) with default parameters. Of all potential predictors in the dataset, 0.09% of the fields had missing values. Predictors were excluded only if their missing rate exceeded 7%, to minimize bias.

To identify the relative importance of each feature, feature selection was performed using the least absolute shrinkage and selection operator (LASSO) regression method. Prediction models were built with logistic regression, decision tree, random forest (RF), and support vector machine (SVM) using the R package caret, with 300 repeats of random sub-sampling validation across different parameter settings.
As described previously, nomograms were established with the rms package in R, and the performance of the nomogram was evaluated by discrimination (Harrell's concordance index) and calibration (calibration plots and the Hosmer-Lemeshow calibration test). During external validation, the total points for each patient in the validation cohort were calculated from the established nomogram.

The selection of the study population is illustrated in Figure 1. A total of 372 COVID-19 patients were enrolled after admission from three centers in Guangzhou and Wuhan (Figure 1). All patients with non-severe COVID-19 during hospitalization were followed for more than 15 days after admission. Patients who deteriorated to severe or critical COVID-19 and patients who remained non-severe were assigned to the severe and non-severe groups, respectively. There were no [...] Table 2. [...] and support vector machine (SVM), and evaluated their performance by the receiver operating characteristic (ROC) curve and the precision-recall curve (appendix p1). The performance of these models did not differ substantially, except for the decision tree. Therefore, the logistic regression model was used for further analysis owing to its high predictive power and interpretability. The predictive nomogram that integrated the 7 selected features for the incidence of severe COVID-19 in the training cohort is shown in Figure 2C.

To evaluate the clinical applicability of our risk prediction nomogram, decision curve analysis (DCA) and clinical impact curve analysis (CICA) were performed. The DCA and CICA visually showed that the nomogram had a superior overall net benefit across a wide and practical range of threshold probabilities and impacted patient outcomes (Figure 2D and 2E). In Figure 3A and 3B, the calibration plots for the probability of severe illness showed good agreement between the nomogram's prediction and actual observation in the training cohort and validation cohort 1, respectively. In the training cohort, the nomogram had a high AUC of 0.912 (95% CI 0.846-0.978) for discriminating individuals with severe COVID-19 from those with non-severe COVID-19, with a sensitivity of 85.71% and a specificity of 87.58% (Figure 3C, Table 2). The Cutpoint R package was used to calculate the optimal cutpoint, with bootstrapping to assess its variability; the optimal cutpoint was 188.6358 for our nomogram (corresponding to a threshold probability of 0.190). Patients in the validation cohorts were then divided into a low group (score ≤ 188.6358) and a high group (score > 188.6358) for further analysis. Consistent with the training cohort, the AUC in validation cohort 1 was 0.853 for severe versus non-severe COVID-19, with a sensitivity of 77.5% and a specificity of 78.4% (Figure 3D, Table 3). In validation cohort 2, the sensitivity and specificity of the nomogram were 75% and 100%, respectively.

Early identification of patients progressing to severe COVID-19 will lead to better management and optimal use of medical resources. In this research, we found that older age, higher LDH, CRP, DBIL, RDW, and BUN, and lower ALB on admission correlated with higher odds of severe COVID-19.
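The cutoff-based grouping and the performance measures reported in the results can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the cutoff (188.6358) and threshold probability (0.190) are taken from the text, while the function names, the patient scores, and the outcomes are hypothetical. The net benefit uses the standard decision-curve definition, TP/n - (FP/n) * pt / (1 - pt), and the AUC is computed as the proportion of severe/non-severe pairs in which the severe patient has the higher score, which for a binary outcome coincides with Harrell's concordance index.

```python
CUTOFF_SCORE = 188.6358   # optimal nomogram cutpoint reported in the text
THRESHOLD_PROB = 0.190    # corresponding threshold probability reported in the text


def risk_group(score, cutoff=CUTOFF_SCORE):
    """Return 'low' for score <= cutoff and 'high' for score > cutoff."""
    return "high" if score > cutoff else "low"


def auc(scores, outcomes):
    """Share of (severe, non-severe) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one severe and one non-severe patient")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(scores, outcomes, cutoff=CUTOFF_SCORE, pt=THRESHOLD_PROB):
    """Sensitivity, specificity and net benefit of the high/low grouping."""
    tp = fp = tn = fn = 0
    for score, severe in zip(scores, outcomes):
        high = risk_group(score, cutoff) == "high"
        if high and severe:
            tp += 1
        elif high:
            fp += 1
        elif severe:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one severe and one non-severe patient")
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Decision-curve net benefit at threshold probability pt
        "net_benefit": tp / n - (fp / n) * pt / (1 - pt),
    }


# Hypothetical nomogram total points and outcomes (True = progressed to
# severe COVID-19). These values are illustrative, not study data.
scores = [120.0, 150.5, 188.6358, 190.0, 210.3, 250.0, 175.0, 200.0]
outcomes = [False, False, False, False, True, True, True, True]

result = evaluate(scores, outcomes)
print(result, "AUC:", auc(scores, outcomes))
```

A score exactly equal to the cutoff falls in the low group, matching the "score ≤ 188.6358" definition above.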
Furthermore, we developed an effective prognostic nomogram composed of 7 features, which had high sensitivity and specificity for distinguishing individuals with severe COVID-19 from those with non-severe COVID-19. DCA and CICA further indicated that our nomogram conferred a high clinical net benefit.

Furthermore, only seven easily accessible features were included in our nomogram: age, LDH, CRP, DBIL, RDW, BUN, and ALB. Age, NLR, and LDH have been reported to be risk factors for severe disease in patients with SARS-CoV-2 infection [3, 6, 7]. NLR, a widely used marker of systemic inflammation, was not identified by LASSO as an important feature; LDH and CRP, which are also associated with the systemic inflammatory response [8], were selected instead. In addition, LDH can serve as an auxiliary marker that predicts the severity of tissue damage in the early stage of disease [9]. These might be the reasons why the LASSO model did not identify NLR as a more important feature. Consistent with other reports, our results indicate that patients with higher levels of inflammation at admission might also be at higher risk of severe COVID-19.

Interestingly, we found that RDW was also an important prognostic predictor of severe COVID-19. RDW, one of the routine blood cell indices, reflects the variation in the size of red blood cells (RBC). It has been closely correlated with critical illness [10-12] but has been neglected in COVID-19.
RDW is a robust predictor of the risk of all-cause mortality and bloodstream infection in the critically ill [11-14], including patients with acute exacerbation of interstitial pneumonia and ARDS [10, 15]. RDW can also predict the prognosis of sepsis, which has been tied to poor COVID-19 outcomes, including death [16]. The increased RDW in COVID-19 patients may be due to increased erythrocyte turnover: 1) Pro-inflammatory states may be responsible for insufficient erythropoiesis, with structural and functional alterations of RBC, such as decreased deformability, leading to more rapid clearance of RBC. 2) Plasma cytokines such as interleukin 1 (IL-1) and tumor necrosis factor-α (TNF-α) could not only attenuate renal erythropoietin (EPO) production but also blunt the erythroid progenitor response to EPO. In addition, IFN-γ contributes to apoptosis of erythroid progenitors and decreases EPO receptor expression [17]. 3) RBC are dynamic reservoirs of cytokines [18]. Decreased deformability of RBC in severe illness leads to RBC lysis and the release of intracellular contents, including some inflammatory cytokines, into the circulation [19]. This positive feedback could further shorten RBC survival, ultimately producing more morphological variation in cell size (i.e., elevated RDW), an increased inflammatory response, and severe illness. RDW can be regarded as an index of enhanced patient fragility and higher vulnerability to adverse outcomes [20]. The elevated RDW may explain fatigue experienced by [...]

Our study has several strengths. First, we provide a practical quantitative prediction tool based on only 7 features, which are relatively inexpensive and easy to obtain directly from routine blood tests.
Second, to guarantee the robustness of the conclusion, we included data from three centers with a large sample size and validated the nomogram in independent cohorts. The performance of our nomogram was sufficient for clinical practice.

There were some limitations in the study. First, this is a retrospective study, including 372 patients with non-severe COVID-19 on admission. Second, some patients are still in hospital and their condition may change with follow-up. More comprehensive investigations need to be conducted to explain the characteristics of the 7 features.

In summary, our data suggest that our nomogram could identify severe COVID-19 patients early, and that RDW was valuable for the prediction of severe disease. Our nomogram is especially valuable for risk stratification management, which will help alleviate the shortage of medical resources and reduce mortality.

This work is funded by the Science and Technology Program of Guangzhou, China (201804010474).

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare that there is no conflict of interest regarding the publication of this paper.

Figure 2 (caption fragment): [...] risk threshold. RDW_CV, red blood cell distribution width-coefficient variation; BUN, blood urea nitrogen; DBIL, direct bilirubin; CRP, C-reactive protein; LDH, lactate dehydrogenase; ALB, albumin.

Figure 3: The calibration curve and ROC curve for distinguishing individuals with severe COVID-19 from those with non-severe COVID-19 in the training cohort (A, C) and validation cohort 1 (B, D), respectively.

References
Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention
Early Prediction of Disease Progression Novel Coronavirus Pneumonia Patients Outside Wuhan with CT and Clinical Characteristics
Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Neutrophil-to-Lymphocyte Ratio Predicts Severe Illness Patients with 2019 Novel Coronavirus in the Early Stage
Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan
The prognostic value of systemic inflammatory factors in patient with metastatic gastric cancer
Prognostic biomarkers for immunotherapy with ipilimumab in metastatic melanoma
Relation between Red Cell Distribution Width and Mortality in Critically Ill Patients with Acute Respiratory Distress Syndrome
Red cell distribution width predicts out of hospital outcomes in critically ill emergency general surgery patients. Trauma surgery & acute care open
Red cell distribution width (RDW) as a biomarker for respiratory failure in a pediatric ICU
RBC Distribution Width: Biomarker for Red Cell Dysfunction and Critical Illness Outcome? Pediatr Crit Care Med
RDW at Hospital Admission May Predict Prognosis of the Patient with Acute Exacerbation of Interstitial Pneumonia
Broadening of the red blood cell distribution width is associated with increased severity of illness in patients with sepsis
Anemia in Chronic obstructive pulmonary disease: Prevalence, pathogenesis, and potential impact
Red blood cells are dynamic reservoirs of cytokines
The role of red blood cells and cell-free hemoglobin in the pathogenesis of ARDS
The role of red blood cell distribution width (RDW) in cardiovascular risk assessment: useful or hype? Annals of translational medicine

Abbreviations: BMI, body mass index; WBC, white blood cell; RBC, red blood cell; NLR, neutrophil-to-lymphocyte ratio; PLR, platelet-to-lymphocyte ratio; SII, systemic immune-inflammation index; RDW-SD, red blood cell distribution width-standard deviation; RDW, red blood cell distribution width-coefficient variation; ALB, albumin; BUN, blood urea nitrogen; TBA, total bile acids; CRP, C-reactive protein; INR, international normalized ratio.