key: cord-0696276-hu6qzibv authors: Zhang, H.; Shi, T.; Wu, X.; Zhang, X.; Wang, K.; Bean, D.; Dobson, R.; Teo, J. T.; Sun, J.; Zhao, P.; Li, C.; Dhaliwal, K.; Wu, H.; Li, Q.; Guthrie, B. title: Risk prediction for poor outcome and death in hospital in-patients with COVID-19: derivation in Wuhan, China and external validation in London, UK date: 2020-05-03 journal: nan DOI: 10.1101/2020.04.28.20082222 sha: bb9376dc4caca4dbb56d0fb77512aa92366418b4 doc_id: 696276 cord_uid: hu6qzibv Background Accurate risk prediction of clinical outcome would usefully inform clinical decisions and intervention targeting in COVID-19. The aim of this study was to derive and validate risk prediction models for poor outcome and death in adult inpatients with COVID-19. Methods Model derivation using data from Wuhan, China used logistic regression with death and poor outcome (death or severe disease) as outcomes. Predictors were demographic, comorbidity, symptom and laboratory test variables. The best performing models were externally validated in data from London, UK. Findings 4.3% of the derivation cohort (n=775) died and 9.7% had a poor outcome, compared to 34.1% and 42.9% of the validation cohort (n=226). In derivation, prediction models based on age, sex, neutrophil count, lymphocyte count, platelet count, C-reactive protein and creatinine had excellent discrimination (death c-index=0.91, poor outcome c-index=0.88), with good-to-excellent calibration. Using two cut-offs to define low, high and very-high risk groups, derivation patients were stratified in groups with observed death rates of 0.34%, 15.0% and 28.3% and poor outcome rates 0.63%, 8.9% and 58.5%. External validation discrimination was good (c-index death=0.74, poor outcome=0.72) as was calibration. However, observed rates of death were 16.5%, 42.9% and 58.4% and poor outcome 26.3%, 28.4% and 64.8% in predicted low, high and very-high risk groups. Interpretation Our prediction model using demography and routinely-available laboratory tests performed very well in internal validation in the lower-risk derivation population, but less well in the much higher-risk external validation population. Further external validation is needed. Collaboration to create larger derivation datasets, and to rapidly externally validate all proposed prediction models in a range of populations is needed, before routine implementation of any risk prediction tool in clinical care. Evidence before this study Several prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay in COVID-19 have been published. 1 Commonly reported predictors of severe prognosis in patients with COVID-19 include age, sex, computed tomography scan features, C-reactive protein (CRP), lactic dehydrogenase, and lymphocyte count. Symptoms (notably dyspnoea) and comorbidities (e.g. chronic lung disease, cardiovascular disease and hypertension) are also reported to have associations with poor prognosis. 2 However, most studies have not described the study population or intended use of prediction models, and external validation is rare and to date done using datasets originating from different Wuhan hospitals. 3 Given different patterns of testing and organisation of healthcare pathways, external validation in datasets from other countries is required. This study used data from Wuhan, China to derive and internally validate multivariable models to predict poor outcome and death in COVID-19 patients after hospital admission, with external validation using data from King's College Hospital, London, UK. Mortality and poor outcome occurred in 4.3% and 9.7% of patients in Wuhan, compared to 34.1% and 42.9% of patients in London. Models based on age, sex and simple routinely available laboratory tests (lymphocyte count, neutrophil count, platelet count, CRP and creatinine) had good discrimination and calibration in internal validation, but performed only moderately well in external validation. Models based on age, sex, symptoms and comorbidity were adequate in internal validation for poor outcome (ICU admission or death) but had poor performance for death alone. This study and others find that relatively simple risk prediction models using demographic, clinical and laboratory data perform well in internal validation but at best moderately in external validation, either because derivation and external validation populations are small (Xie et al 3 ) and/or because they vary greatly in casemix and severity (our study). There are three decision points where risk prediction may be most useful: (1) deciding who to test; (2) deciding which patients in the community are at high-risk of poor outcomes; and (3) identifying patients at high-risk at the point of hospital admission. Larger studies focusing on particular decision points, with rapid external validation in multiple datasets are needed. A key gap is risk prediction tools for use in community triage (decisions to admit, or to keep at home with varying intensities of follow-up including telemonitoring) or in low income settings where laboratory tests may not be routinely available at the point of decision-making. This requires systematic data collection in community and low-income settings to derive and evaluate appropriate models. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint Abstract Background Accurate risk prediction of clinical outcome would usefully inform clinical decisions and intervention targeting in COVID-19. The aim of this study was to derive and validate risk prediction models for poor outcome and death in adult inpatients with COVID-19. Model derivation using data from Wuhan, China used logistic regression with death and poor outcome (death or severe disease) as outcomes. Predictors were demographic, comorbidity, symptom and laboratory test variables. The best performing models were externally validated in data from London, UK. Findings 4.3% of the derivation cohort (n=775) died and 9.7% had a poor outcome, compared to 34.1% and 42.9% of the validation cohort (n=226). In derivation, prediction models based on age, sex, neutrophil count, lymphocyte count, platelet count, C-reactive protein and creatinine had excellent discrimination (death c-index=0.91, poor outcome c-index=0.88), with good-to-excellent calibration. Using two cut-offs to define low, high and very-high risk groups, derivation patients were stratified in groups with observed death rates of 0.34%, 15.0% and 28.3% and poor outcome rates 0.63%, 8.9% and 58.5%. External validation discrimination was good (c-index death=0.74, poor outcome=0.72) as was calibration. However, observed rates of death were 16.5%, 42.9% and 58.4% and poor outcome 26.3%, 28.4% and 64.8% in predicted low, high and very-high risk groups. Our prediction model using demography and routinely-available laboratory tests performed very well in internal validation in the lower-risk derivation population, but less well in the much higher-risk external validation population. Further external validation is needed. Collaboration to create larger derivation datasets, and to rapidly externally validate all proposed prediction models in a range of populations is needed, before routine implementation of any risk prediction tool in clinical care. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is the novel coronavirus that causes coronavirus disease 2019 (COVID-19). A local outbreak of COVID-19 was first detected in Wuhan, China and rapidly spread to cause a pandemic. 4 A key feature of the pandemic in all areas with large local epidemics is that healthcare systems are often rapidly overwhelmed. Effective and safe triage to ensure that those at highest risk of severe disease are appropriately escalated, and those at lower risk safely monitored, is critical to maintain healthcare system capacity for as long as possible. The same risk stratification is also needed to support targeted recruitment to both mechanistic and interventional research studies, since developing effective treatments of early disease is a key objective alongside vaccine development. Although risk stratification is important in all health systems, exactly how risk stratification is done will vary by country and by healthcare system. Across all systems though, important care pathway decision points include (1) who to test (particularly important when tests are in short supply); (2) who to hospitalize, and who to care for in the community (with or without telemonitoring or other proactive follow-up), and; (3) who to direct down different care pathways after admission. Since the severity of disease and risk of poor outcome varies by point on the pathway, there is a need for a range of risk stratification tools based on the data available at different stages of the care pathway (Figure 1 ), and/or in different healthcare systems. A number of studies have identified patient characteristics associated with poor outcomes in COVID-19 using univariate analysis. Older age, being male, dyspnea and comorbidities (e.g. chronic obstructive airways disease [COPD], cardiovascular disease, hypertension) are associated with intensive care unit (ICU) admission, 2,5-11 and comorbidities are also reported to be associated with an increased risk of death. 12, 13 However, many studies do not report adjusted associations and few examine laboratory findings, although there are reported univariate associations between the development of acute respiratory distress syndrome (ARDS) and a range of predictors including high fever (≥39°C), older age, and laboratory measures including neutrophilia, d-dimer and other evidence of coagulation and organ dysfunction. 14 However, univariate analysis is usually a poor guide to risk because of confounding, most obviously between age and physical morbidities. 15 Two studies have published multivariable/adjusted findings. Zhou et al examined risk factors for inhospital death in 191 patients, finding significant adjusted associations with older age, higher Sequential Organ Failure Assessment (SOFA) score, lower lymphocyte count and increased d-dimer. 16 Xie et al is the only published study explicitly aiming to create a risk prediction model for inpatient mortality for clinical use, with derivation in 299 patients admitted to one Wuhan hospital and external validation in 145 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint patients admitted to another Wuhan hospital. 3 Mortality was very high in both derivation (51.8%) and validation (47.6%) cohorts. Factors associated with inpatient mortality were age, lower lymphocyte count, higher lactate dehydrogenase and lower peripheral capillary oxygen saturation (SpO2), and external validation showed reasonable performance (calibration was excellent but only after model recalibration to the validation dataset which will rarely be possible in clinical use). 3 Neither study reported clinically relevant performance measures such as sensitivity, specificity and positive/negative predictive value. The aim of this study was to derive and internally validate risk prediction models for poor outcome and death in a cohort of 775 inpatients with COVID-19 in Wuhan, China, with external validation in a cohort of 226 inpatients with COVID-19 in London, UK. Methods are briefly described here and in detail in the supplementary file. The multivariable risk prediction model was derived in a cohort of 775 adults with COVID-19 confirmed Models are referred to as predictor-outcome (e.g. DCS-Death). For each model, predictors with coefficients significantly different to zero in cross-validation were selected. Optimal L1-regularization . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint strength for preferred models was chosen by best C-index in cross-validation. All models were internally validated with cross-validation to select variables based on model fit and not just individual predictor statistical significance, and model performance metrics in the held-out partition were reported. To explore prediction cut-offs to use to inform clinical decision-making, we grouped patients into low-, high-and very high-risk groups based on predicted probabilities obtained from the final models. Cut-offs to define the low and high-risk groups were arbitrarily assigned to be approximately 10-fold lower and 8-fold higher than the outcome percentage in the whole dataset. Models using DL predictors were externally validated in the KCH cohort, using two outcomes: death and poor outcome (defined as ICU admission or death). The derivation cohort consisted of 775 adult in-patients with COVID-19 in Wuhan (Table 1 ). The median age was 61 years (IQR 50-68, range 18-96), 48.9% were men, and 345 (44.5%) had comorbidity (most commonly hypertension (30.8%), diabetes mellitus (13.7%) and heart disease (11.0%)). The most common symptoms on admission were fever (68.6%) and cough (68.1%), followed by fatigue (54.3%) and dyspnoea (45.8%). Median lymphocyte count was 1.4 (IQR 1.1-1.9) 10^9/L, neutrophil count 3.5 Median lymphocyte count and platelet count was lower, and median age, proportion male, neutrophil count, CRP and creatinine higher in the KCH validation compared to the Wuhan derivation cohort. In the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint derivation cohort, 33 (4.3%) died and 75 (9.7%) had pre-defined poor outcome (ARDS, intubation, ECMO, ICU admission or death) (two patients had missing data for death and three patients had missing data for poor outcome). Both outcomes were much more common in the validation cohort where 77 (34.1%) died, and 97 (42.9%) of patients experienced a poor outcome (ICU admission or death). Univariate associations of patient characteristics and outcomes are summarised in Table S1 , with findings broadly consistent with previously reported univariate associations. 2, 12, 13 The strongest observed univariate associations were increased risk of each outcome with increasing age, the presence of prior comorbidity (particularly chronic lung disease, heart disease, immunocompromise, chronic renal disease and hypertension), the presence of dyspnoea, and laboratory values (increased risk with higher neutrophil count, CRP and creatinine, and with lower lymphocyte count and platelet count). Table 2 summarises multivariable odds ratios of models for both outcomes (Table S2 reports the predictor coefficients with more precision), and Table 3 details model performance in internal and external validation. In adjusted analysis, underlying conditions and symptoms were no longer associated with either outcome with the exception of dyspnoea which was associate with poor outcome, although confidence intervals are wide meaning that a type 2 error cannot be ruled out. (Table 3 and Figure S1A ). Calibration evaluated using calibration-in-the-large and calibration slope was poor for DCS-Death and DCSL-Death models, but good for DL-Death In the DL-Death model, increased blood neutrophil counts (OR 1.25 [all OR for continuous variables are per one unit increase in predictor values], p=0.002) and CRP levels (OR 1.01, p=0.008) were significantly associated with death. Other variables were not statistically significantly associated with death but included because they improved model fit. Using a probability of 0.5 as the prediction cut-off for the DL-Death model, sensitivity was 0.33, positive predictive value 0.52, specificity 0.99, and negative predictive value 0.98, which is not adequate for clinical use. Using the two alternative probability cut-offs to define low, high, and very high risk groups, the percentage who died in the low, high, and very high risk groups were 0.34% (n=2/580), 15.0% (n=3/20) and 28.3% (n=15/53), respectively (Figure 2A -B, Table S3 ). Notably, the model defined a lowrisk group comprising 88.8% of patients with very high negative predictive value (0.997) ( Table 3) as well as a very-high risk group comprising 8.1% of patients in whom 75% of observed deaths occurred. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. Figure S1B ). Calibration was better than for models predicting death, and was very good for the DL-Poor model (calibration-in-the-large 0.01, calibration slope 1.04). In the DL-Poor model, risk of poor outcome of COVID-19 was associated with increased blood neutrophil counts (OR 1.29 [all OR for continuous variables are per one unit increase in predictor value], p=<0.001), increased creatinine levels (OR 1.01, p=0.002) and decreased blood lymphocyte counts (OR 0.31, p=0.004). Age (OR 1.03, p=0.053) and CRP (OR 1.01, p=0.060) were strongly associated but marginally not statistically significant in adjusted models. Other variables were not statistically significantly associated with death but included because they improved model fit. Using a probability of 0.5 as the prediction cut-off for the DL-Poor model, sensitivity was 0.36, positive predictive value 0.72, specificity 0.99, and negative predictive value 0.94, which is not ideal for clinical use. Using the two alternative probability cut-offs, the percentage of patients with poor outcome in the low, high, and very high-risk groups were 0.63% (n=2/320), 8.9% (n=25/280) and 58.5% (n=31/53), respectively ( Figure 2E-F) . Notably, the model defined a low-risk group comprising 49.0% of patients with very high negative predictive value (0.99) ( Table 3 ) as well as a very-high risk group comprising 8.1% of patients with high positive predictive value (0.60) in which 53% of observed poor outcomes occurred. Table 3 shows the performance (discrimination, calibration and clinical usefulness) of the DL-Death and DL-Poor models without any re-calibration. Both models showed good discrimination measured by the C-index (death: 0.74; poor outcome: 0.72), but calibration was only fair with calibration-in-the-large 0.25 for death and 0.28 for poor outcome, and calibration slope 0.54 for death and 0.54 for poor outcome. Calibration plots are shown in Figure S2 is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. We derived and internally validated models to predict the risk of death and poor outcome for COVID-19 patients using a cohort of inpatients in Wuhan, China with a relatively low risk of death (4.3%) and poor outcome (9.7%). Four groups of predictors were examined: demographic information, premorbid conditions, symptoms and laboratory results. Models including demographic and laboratory data (DL models) had the best performance, with older age, higher neutrophil count, lower lymphocyte count, lower platelet count, higher CRP and higher creatinine being predictive of death and poor outcome. In internal validation, DL model performance was good, and it was possible to define low, high and veryhigh risk groups for both outcomes. DL models were externally validated in a cohort of inpatients in London, UK with eight-fold higher risk of death (34.1%) and four-fold higher risk of poor outcome (42.9%) reflecting different care pathways in the two countries ( Figure 1 ) and therefore different casemix of those admitted (Table 1 ). In this very different cohort, DL model performance was only fair to moderate in the external validation dataset, and further external validation is required in datasets with different casemix and severity. Our study has several strengths. First, model derivation used a relatively large cohort of inpatients with COVID-19 from Wuhan, China with 99.6% ascertainment of endpoints, and predictors measured on admission. Derivation and internal validation of prediction models was robust. Second, we externally validated parsimonious risk prediction models using a dataset from a different country with a very different pattern of COVID-19 admission (London, UK). Third, the study is reported in accordance with the TRIPOD guidelines 17 including detailed methods reporting in the supplementary file to facilitate reproducibility and further external validation. 1 The study also has a number of limitations. First, the clinical datasets were collected when healthcare services were under severe strain. Data extraction sought to ensure consistency and accuracy, but was not blind to outcome, and there is missing data in both datasets. Second, the datasets used are smaller than ideal (although as large as or larger than previous studies), and there are relatively few deaths in particular. Our analytical approach aimed to minimise overfitting, but further research using larger, federated datasets is clearly required. Third, clinical assessments at admission such as SpO2 are likely to be important predictors of short-term outcome, 3 but were not available in either dataset. Fourth, our . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint external validation dataset has very different case-mix where spectrum effects are likely to contribute to lower prediction model performance, and only has follow-up to a fixed date (period range: days, although this is a reasonable time-horizon to inform clinical decision-making at hospital admission). Finally, all data available is for people with PCR-diagnosed COVID-19 who are admitted to hospital (decision 3 in Figure 1 ). Although the Wuhan cohort includes many people with less severe disease, in the validation cohort most admitted patients are likely to have severe disease. The findings therefore cannot be assumed to be applicable to decisions made earlier in the course of disease (decision 2 in Figure 1 ). Our univariate findings are similar to other studies reporting univariate associations. Older patients and those with dyspnoea or premorbid conditions (e.g. chronic lung disease, hypertension, cardiovascular disease) are consistently reported to be more susceptible to in-hospital death or poor outcome of COVID-19. 2, 6, 8, 9, 12, 13, 18 Lymphocytopenia, neutrophilia, and markers of inflammation (e.g. CRP), cell death (e.g. lactate dehydrogenase (LDH)) and abnormal coagulation (e.g. d-dimer) have all also been reported as associated with poor outcome. 14 However, the majority of these studies are small and only reported univariate associations. Our findings are similar to the few studies reporting multivariable associations, although the predictors included are not identical. 3, 16 Age is a very strong predictor of outcome, and accounting for age reduces associations with comorbidity and symptoms (although no study including our own is large enough to rule out significant type 2 error i.e. that these are truly not independently associated with outcomes). Lymphocyte count is included in all three models, but Zhou et al also included d-dimer and SOFA score, whereas Xie et al included LDH and SpO2, compared to our inclusion of neutrophil count and CRP. The three models are therefore based on demography, (different) laboratory measures, and (different) clinical measures. The choice of predictors in all three studies is driven by data availability, and by initial univariate analysis identifying different features to model (SOFA score for example is a good predictor in Zhou et al, 16 but not in Xie et al's study. 3 There is a need for larger studies to derive and validate more optimal models suitable for use at different points in the natural history of COVID-19, since for example, comorbidity may (or may not) be an important predictor of admission but less important than laboratory markers for predicting prognosis in those admitted. Xie et al's prediction model was externally validated in a similar dataset from Wuhan with very high mortality in both datasets (derivation cohort 51.8% died, validation cohort 47.6%). Discrimination in external validation was excellent, but calibration was poor until the model intercept and slope were recalibrated to the external validation dataset. We chose not to recalibrate our model to the external dataset, since this will hardly ever be feasible in a tool intended for clinical use for an epidemic disease . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. The findings of this and previous studies are consistent with demography and routinely available laboratory measurements being useful markers of prognosis at the time of hospital admission, indicating that simple risk prediction tools may usefully inform clinical decision-making at this point in the care pathway. However, COVID-19 risk prediction models need external validation before widespread clinical use, because severity and prognosis depends on the setting and system of care. Our Wuhan derivation cohort is closer to a general population cohort (mortality 4.3%) than either our validation cohort in London (mortality 34.1%) or the Wuhan cohorts in other studies (mortality 28.3%, 16 and 51.8% derivation and 47.6% validation 3 ). It may therefore be better suited to decisions about who to admit and who can safely be sent home to self-isolate (decision 2 in Figure 1 ) than to identifying those at highest risk of poor outcome and death in people with more severe disease, for example at the point of potential ICU admission. All proposed prediction models therefore require repeated external validation in multiple datasets to identify the contexts in which they can be most effectively deployed. The DCS and DL models are available as prediction tools in a web portal but we emphasise that they should be used with caution, particularly in settings with different rates of poor outcomes, and are not a replacement for clinical judgement (https://covid.datahelps.life/prediction/). Although very challenging under epidemic conditions, systematic data collection during clinical care and rapid data analysis is therefore essential to understand the natural history of COVID-19 to inform clinical decision-making along the entire care pathway (Figure 1 ). Of note is that our models based solely on demography, symptoms and premorbid conditions did not perform optimally in internal validation, which may indicate that community assessment would optimally include simple blood tests. Further research is needed to examine this, and the relative value of laboratory tests compared to clinical examination (e.g. pulse, respiratory rate, SpO2). Key gaps are in data collection and analysis in early disease and in the community (since minimising admission when it is safe to do so is critical for keeping hospital services functioning) and at the point of ICU admission (since there may or may not be populations where mechanical ventilation is predictably futile). The observed associations between raised neutrophil 19 and low lymphocyte counts 20 with outcomes highlights that understanding and delineating the innate and adaptive immune cascade may inform potential interventional strategies. These include targeting viral replication alongside directed antiinflammatory agents to mitigate the inflammatory cascade. 21 The timing of such interventions is likely to . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint be crucial, and risk prediction models therefore have the additional potential to support targeted and timely treatment initiation. Risk prediction tools have considerable potential to inform clinical decision-making at different points in the COVID-19 care pathway. Existing models, including our own, perform well in derivation datasets but robust and repeated external validation is required before widespread clinical use. Collaborative research to derive and validate tools using larger datasets from all points on the clinical pathway is urgently needed. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint Author contribution QL, HW, BG, HZ, TS, and XW conceived the study. All data from the Wuhan cohort were extracted by XZ, JS, PZ, and CL, double-checked by KW and XW, with disagreements resolved by involving a third independent reviewer (QL). Data from the KCH cohort were extracted by DB and reviewed by JTT. HZ, TS, BG, KD and HW analysed the data, with HZ, TS, BG, KD, HW, XW, RD and JTT leading interpretation of findings. All authors participated in the preparation and approval of the Article. We declare no competing interests. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 3, 2020. Chronic lung disease 3·00 (1·07-8·37) b. Calibration was examined using quintiles of predicted risk. Good calibration is demonstrated by calibration-in-the-large close to zero, and calibration slope close to one. c. Metrics reported on default probability cut-off of 0·5. d. Metrics reported using selected probability cut offs for low risk vs high or very high (0·039 for death, 0·030 for poor outcome), and for low or high risk vs very high (0·069 for death, 0·265 for poor outcome). See Table S3 for details. Figure 1 : Comparison of COVID-19 patients care pathway in China and UK . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint Figure 2 : Risk stratification of COVID-19 patients using predicted probability of DL-Death (top two panels) and DL-Poor models (bottom two panels) . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 3, 2020. . https://doi.org/10.1101/2020.04.28.20082222 doi: medRxiv preprint Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal Systematic review and meta-analysis of predictive symptoms and comorbidities for severe COVID-19 infection Development and external validation of a prognostic multivariable model on admission for hospitalized patients with Johns Hopkins Coronavirus Resource Center The Clinical and Chest CT Features Associated with Severe and Critical COVID-19 Pneumonia Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China Clinical Characteristics of Coronavirus Disease 2019 in China Characteristics of COVID-19 infection in Beijing Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China Clinical and computed tomographic imaging features of novel coronavirus pneumonia caused by SARS-CoV-2 Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan Clinical characteristics of 113 deceased patients with coronavirus disease 2019: retrospective study Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement