key: cord-0914984-7np944lw
authors: Knight, Stephen R; Ho, Antonia; Pius, Riinu; Buchan, Iain; Carson, Gail; Drake, Thomas M; Dunning, Jake; Fairfield, Cameron J; Gamble, Carrol; Green, Christopher A; Gupta, Rishi; Halpin, Sophie; Hardwick, Hayley E; Holden, Karl A; Horby, Peter W; Jackson, Clare; Mclean, Kenneth A; Merson, Laura; Nguyen-Van-Tam, Jonathan S; Norman, Lisa; Noursadeghi, Mahdad; Olliaro, Piero L; Pritchard, Mark G; Russell, Clark D; Shaw, Catherine A; Sheikh, Aziz; Solomon, Tom; Sudlow, Cathie; Swann, Olivia V; Turtle, Lance CW; Openshaw, Peter JM; Baillie, J Kenneth; Semple, Malcolm G; Docherty, Annemarie B; Harrison, Ewen M
title: Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score
date: 2020-09-09
journal: BMJ
DOI: 10.1136/bmj.m3339
sha: a4c0a1383a60ce98b8b75c5b1aeec68e1eb3c054
doc_id: 914984
cord_uid: 7np944lw

OBJECTIVE: To develop and validate a pragmatic risk score to predict mortality in patients admitted to hospital with coronavirus disease 2019 (covid-19). DESIGN: Prospective observational cohort study. SETTING: International Severe Acute Respiratory and emerging Infections Consortium (ISARIC) World Health Organization (WHO) Clinical Characterisation Protocol UK (CCP-UK) study (performed by the ISARIC Coronavirus Clinical Characterisation Consortium—ISARIC-4C) in 260 hospitals across England, Scotland, and Wales. Model training was performed on a cohort of patients recruited between 6 February and 20 May 2020, with validation conducted on a second cohort of patients recruited after model development between 21 May and 29 June 2020. PARTICIPANTS: Adults (age ≥18 years) admitted to hospital with covid-19 at least four weeks before final data extraction. MAIN OUTCOME MEASURE: In-hospital mortality. RESULTS: 35 463 patients were included in the derivation dataset (mortality rate 32.2%) and 22 361 in the validation dataset (mortality rate 30.1%). The final 4C Mortality Score included eight variables readily available at initial hospital assessment: age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation, level of consciousness, urea level, and C reactive protein (score range 0-21 points). The 4C Score showed high discrimination for mortality (derivation cohort: area under the receiver operating characteristic curve 0.79, 95% confidence interval 0.78 to 0.79; validation cohort: 0.77, 0.76 to 0.77) with excellent calibration (validation: calibration-in-the-large=0, slope=1.0). Patients with a score of at least 15 (n=4158, 19%) had a 62% mortality (positive predictive value 62%) compared with 1% mortality for those with a score of 3 or less (n=1650, 7%; negative predictive value 99%). 
Discriminatory performance was higher than 15 pre-existing risk stratification scores (area under the receiver operating characteristic curve range 0.61-0.76), with scores developed in other covid-19 cohorts often performing poorly (range 0.63-0.73). CONCLUSIONS: An easy-to-use risk stratification score has been developed and validated based on commonly available parameters at hospital presentation. The 4C Mortality Score outperformed existing scores, showed utility to directly inform clinical decision making, and can be used to stratify patients admitted to hospital with covid-19 into different management groups. The score should be further validated to determine its applicability in other populations. STUDY REGISTRATION: ISRCTN66726260

Disease resulting from infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has a high mortality rate with deaths predominantly caused by respiratory failure. 1 As of 1 September 2020, over 25 million people had confirmed coronavirus disease 2019 (covid-19) worldwide and at least 850 000 people had died from the disease. 2 3 As hospitals around the world are faced with an influx of patients with covid-19, there is an urgent need for a pragmatic risk stratification tool that will allow the early identification of patients infected with SARS-CoV-2 who are at the highest risk of death to guide management and optimise resource allocation.

Prognostic scores attempt to transform complex clinical pictures into tangible numerical values. Prognostication is more difficult when dealing with a severe pandemic illness such as covid-19 because strain on healthcare resources and rapidly evolving treatments alter the risk of death over time. Early information has suggested that the clinical course of a patient with covid-19 is different from that of pneumonia, seasonal influenza, or sepsis. 4 Most patients with severe covid-19 have developed a clinical picture characterised by pneumonitis, profound hypoxia, and systemic inflammation affecting multiple organs. 1 A recent review identified many prognostic scores used for covid-19, 5 which varied in their setting, predicted outcome measure, and the clinical parameters included. The large number of risk stratification tools reflects difficulties in their application, with most scores showing moderate performance at best and no benefit to clinical decision making. 6 7 Many novel covid-19 prognostic scores have been found to have a high risk of bias, which could reflect development in small cohorts, and many have been published without clear details of model derivation and testing. 5 Therefore, a risk stratification tool within a large national cohort of patients admitted to hospital with covid-19 is needed with clear development and validation details.

Our aim was to develop and validate a pragmatic, clinically relevant risk stratification score that uses routinely available clinical information at hospital presentation to predict in-hospital mortality in patients admitted to hospital with covid-19. We then aimed to compare this score with existing prognostic models.

The International Severe Acute Respiratory and emerging Infections Consortium (ISARIC) World Health Organization (WHO) Clinical Characterisation Protocol UK (CCP-UK) study is an ongoing prospective cohort study. The study is being performed by the ISARIC Coronavirus Clinical Characterisation Consortium (ISARIC-4C) in 260 hospitals across England, Scotland, and Wales (National Institute for Health Research Clinical Research Network Central Portfolio Management System ID 14152). The protocol and further study details are available online. 8 Model development and reporting followed the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) guidelines. 9 The study is being conducted according to a predefined protocol (appendix 1).

The study recruited consecutive patients aged 18 years and older with a completed index admission to one of 260 hospitals in England, Scotland, or Wales. 8 Reverse transcriptase polymerase chain reaction was the only mode of testing available during the period of study. The decision to test was at the discretion of the clinician attending the patient, and not defined by protocol. The enrolment criterion "high likelihood of infection" reflected that a preparedness protocol cannot assume a diagnostic test will be available for an emergent pathogen. In this activation, site training emphasised the importance of only recruiting proven cases.

Demographic, clinical, and outcome data were collected by using a prespecified case report form. Comorbidities were defined according to a modified Charlson comorbidity index. 10 Comorbidities collected were chronic cardiac disease, chronic respiratory disease (excluding asthma), chronic renal disease (estimated glomerular filtration rate ≤30), mild to severe liver disease, dementia, chronic neurological conditions, connective tissue disease, diabetes mellitus (diet, tablet, or insulin controlled), HIV or AIDS, and malignancy. These conditions were selected a priori by a global consortium to provide rapid, coordinated clinical investigation of patients presenting with any severe or potentially severe acute infection of public interest and enabled standardisation.

Clinician defined obesity was also included as a comorbidity owing to its probable association with adverse outcomes in patients with covid-19. 11 12 The clinical information used to calculate prognostic scores was taken from the day of admission to hospital. 13 A practical approach was taken to sample size requirements. 14 We used all available data to maximise the power and generalisability of our results. Model reliability was assessed by using a temporally distinct validation cohort with geographical subsetting, together with sensitivity analyses.

The primary outcome was in-hospital mortality. This outcome was selected because of the importance of the early identification of patients likely to develop severe illness from SARS-CoV-2 infection (a rule in test). We chose to restrict analysis of outcomes to patients who were admitted more than four weeks before final data extraction (29 June 2020) to enable most patients to complete their hospital admission.

Independent predictor variables

A reduced set of potential predictor variables was selected a priori, including patient demographic information, common clinical investigations, and parameters consistently identified as clinically important in covid-19 cohorts following the methods described by Wynants and colleagues (appendix 2). 5 Candidate predictor variables were selected based on three common criteria 15: patient and clinical variables known to influence outcome in pneumonia and flulike illness; clinical biomarkers previously identified within the literature as potential predictors in patients with covid-19; values available for at least two thirds of patients within the derivation cohort. Because our overall aim was to develop an easy-to-use risk stratification score, we made the decision to include an overall comorbidity count for each patient within model development giving each comorbidity equal weight, rather than individual comorbidities. Recent evidence suggests an additive effect of comorbidity in patients with covid-19, with increasing number of comorbidities associated with poorer outcomes. 16

Missing values for potential candidate variables were handled by using multiple imputation with chained equations, under the missing at random assumption (appendix 6). Ten sets, each with 10 iterations, were imputed using available explanatory variables for both cohorts (derivation and validation). The outcome variable was included as a predictor in the derivation dataset but not the validation dataset. All model derivation and validation was performed in imputed datasets, with Rubin's rules 17 used to combine results. Models were trained by using all available data up to 20 May 2020. The primary intention was to create a pragmatic model for bedside use not requiring complex equations, online calculators, or mobile applications. An a priori decision was therefore made to categorise continuous variables in the final prognostic score.
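The pooling step can be illustrated with a minimal Python sketch of Rubin's rules for a single parameter. This is not the authors' R pipeline (which used the mice package); the function name `pool_rubin` and the three example estimates are hypothetical and chosen only to show the arithmetic.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical log-odds estimates from three imputed datasets
# (the study itself imputed ten sets).
pooled, pooled_se = pool_rubin([0.50, 0.54, 0.52], [0.010, 0.012, 0.011])
```

The between-imputation term is what makes the pooled standard error larger than a naive average, reflecting the extra uncertainty introduced by the missing data.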

We used a three stage model building process (fig 1). Firstly, generalised additive models were built incorporating continuous smoothed predictors (penalised thin plate splines) in combination with categorical predictors as linear components. A criterion based approach to variable selection was taken based on the deviance explained, the unbiased risk estimator, and the area under the receiver operating characteristic curve. Secondly, we visually inspected plots of component smoothed continuous predictors for linearity, and selected optimal cut-off values by using the methods of Barrio and colleagues. 18 Lastly, final models using categorised variables were specified with least absolute shrinkage and selection operator logistic regression. L1 penalised coefficients were derived using 10-fold cross validation to select the value of lambda (minimised cross validated sum of squared residuals). We converted shrunk coefficients to a prognostic index with appropriate scaling to create the pragmatic 4C Mortality Score (where 4C stands for Coronavirus Clinical Characterisation Consortium).
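The final step, turning shrunk coefficients into integer points, could look something like the Python sketch below. The text does not state the scaling rule, so dividing by a fixed unit and rounding half up is an assumption; the coefficient values and category names are hypothetical and are not the published model coefficients.

```python
import math

# Hypothetical shrunk log-odds coefficients for illustration only;
# these are NOT the published 4C model coefficients.
EXAMPLE_COEFS = {
    "resp_rate_20_29": 0.30,
    "resp_rate_ge_30": 0.62,
    "male_sex": 0.28,
}

def to_points(coefs, unit):
    """Scale each coefficient by `unit` and round half up to an integer."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    return {name: math.floor(beta / unit + 0.5) for name, beta in coefs.items()}

def total_score(points, present):
    """Sum the points for the categories that apply to one patient."""
    return sum(points[name] for name in present)

# Assumed scaling unit: one point per 0.30 on the log-odds scale.
POINTS = to_points(EXAMPLE_COEFS, unit=0.30)
patient_score = total_score(POINTS, ["resp_rate_ge_30", "male_sex"])
```

Rounding to integers trades a small amount of discrimination for a score that can be added up at the bedside, which matches the stated aim of avoiding calculators.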

We used machine learning approaches in parallel for comparison of predictive performance. Given issues with interpretability, this was intended to provide a best-in-class comparison of predictive performance when accounting for any complex underlying interactions. Gradient boosting decision trees were used (XGBoost). All candidate predictor variables identified were included within the model, except for those with high missing values (>33%). We retained individual major comorbidity variables within the model to determine whether inclusion improved predictive performance. An 80%/20% random split of the derivation dataset was used to define train and test sets. The validation datasets were held back and not used in the training process. We used a mortality label and design matrix of centred or standardised continuous and categorical variables including all candidate variables to train gradient boosted trees minimising the binary classification error rate (defined as number of wrong cases divided by number of all cases). Hyperparameters were tuned, including the learning rate and maximum tree depth, to maximise the area under the receiver operating characteristic curve in the test set. This approach affords flexibility in the handling of missing data; therefore, two models were trained and optimised, one using imputed data and the other modelling missingness in complete case data.

We assessed discrimination for all models by using the area under the receiver operating characteristic curve in the derivation cohort, with 95% confidence intervals calculated by bootstrapped resampling (2000 samples). A value of 0.5 indicates no predictive ability, 0.8 is considered good, and 1.0 is perfect. 19 We assessed overall goodness of fit with the Brier score, 20 a measure to quantify how close predictions are to the truth. The score ranges between 0 and 1, where smaller values indicate superior model performance. We plotted model calibration curves to examine agreement between predicted and observed risk across deciles of mortality risk to determine the presence of over or under prediction. Risk cut-off values were defined by the total point score for an individual, which represented low (<2% mortality rate), intermediate (2-14.9%), or high risk (≥15%) groups, similar to commonly used pneumonia risk stratification scores. 21 22 We performed sensitivity analyses by using complete case data. Model discrimination was also checked in ethnic groups and by sex using imputed datasets.
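As a rough illustration of these metrics, the Python sketch below computes the area under the receiver operating characteristic curve, the Brier score, and a percentile bootstrap confidence interval using only the standard library. The study used the pROC package in R; the function names, the percentile method, and the toy data here are assumptions made for the example.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a random death outranks a random survivor.

    Ties count as half. Quadratic in sample size, so suited to small examples.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling patients)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one outcome class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = died in hospital, 0 = survived.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
example_auc = auc(y, p)      # 15 of 16 death-survivor pairs ordered correctly
example_brier = brier(y, p)
```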

Patients entered into the ISARIC WHO CCP-UK study after 20 May 2020 were included in a separate validation cohort (fig 1). We determined discrimination, calibration, and performance across a range of clinically relevant metrics. To avoid bias in the assessment of outcomes, patients who were admitted within four weeks of data extraction on 29 June 2020 were excluded. We included patients without an outcome after four weeks and considered them to have had no event.
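The follow-up rule can be expressed as a short Python sketch. The extraction date comes from the text; the function names are hypothetical, and treating an admission exactly four weeks before extraction as eligible is an assumption, because the text does not say whether the boundary is inclusive.

```python
from datetime import date, timedelta

EXTRACTION_DATE = date(2020, 6, 29)  # final data extraction, from the text
MIN_FOLLOW_UP = timedelta(weeks=4)

def eligible(admission_date):
    """True if the patient was admitted at least four weeks before extraction.

    Inclusive boundary is an assumption.
    """
    return admission_date <= EXTRACTION_DATE - MIN_FOLLOW_UP

def outcome_event(died):
    """Map a recorded outcome to the analysis label.

    died is True, False, or None (no outcome recorded); patients with no
    recorded outcome are treated as having had no event.
    """
    return 1 if died is True else 0
```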

A sensitivity analysis was also performed, with stratification of the validation cohort by geographical location. We selected this geographical categorisation based on well described economic and health inequalities between the north and south of the United Kingdom. 23 24 Recent analysis has shown the impact of deprivation on risk of dying with covid-19. 25 As a result, population differences between regions could change the discriminatory performance of risk stratification scores. Two geographical cohorts were created, based on north-south geographical locations across the UK as defined by Hacking and colleagues. 23 We performed a further sensitivity analysis to determine model performance in ethnic minority groups given the reported differences in covid-19

All tests were two tailed and P values less than 0.05 were considered statistically significant. We used R (version 3.6.3) with the finalfit, mice, glmnet, pROC, recipes, xgboost, rmda, and tidyverse packages for all statistical analysis.

Comparison with existing risk stratification scores

All derived models in the derivation dataset were compared within the validation cohort with existing scores. We assessed model performance by using the area under the receiver operating characteristic curve statistic, sensitivity, specificity, positive predictive value, and negative predictive value. Existing risk stratification scores were identified through a systematic literature search of Embase, WHO Medicus, and Google Scholar databases. We used the search terms "pneumonia," "sepsis," "influenza," "COVID-19," "SARS-CoV-2," "coronavirus" combined with "score" and "prognosis." We applied no language or date restrictions. The last search was performed on 1 July 2020. Risk stratification tools were included whose variables were available within the database and had accessible methods for calculation.

We calculated performance characteristics according to original publications, and selected score cutoff values for adverse outcomes based on the most commonly used criteria identified within the literature. Cut-off values were the score value for which the patient was considered at low or high risk of adverse outcome, as defined by the study authors. Patients with one or more missing input variables were omitted for that particular score.
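A minimal Python sketch of how these performance characteristics follow from a single cut-off is given below. Treating a score at or above the cut-off as high risk is an assumption (each published score defines its own rule), and the function name and toy data are hypothetical.

```python
def metrics_at_cutoff(scores, died, cutoff):
    """Sensitivity, specificity, PPV, and NPV for 'score >= cutoff' as high risk.

    scores: integer risk scores; died: 1 if the patient died, else 0.
    A metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, died):
        high_risk = score >= cutoff
        if high_risk and outcome == 1:
            tp += 1
        elif high_risk:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Toy data only; not drawn from the study.
toy_scores = [2, 5, 9, 12, 15, 16, 18, 3]
toy_died = [0, 0, 0, 1, 1, 0, 1, 0]
toy_metrics = metrics_at_cutoff(toy_scores, toy_died, cutoff=15)
```

In this toy example three patients score at least 15, of whom two died, so the positive predictive value is 2/3.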

We also performed a decision curve analysis. 27 Briefly, assessment of the adequacy of clinical prediction models can be extended by determining clinical utility. By using decision curve analysis, we can make a clinical judgment about the relative value of benefits (treating a true positive) and harms (treating a false positive) associated with a clinical prediction tool. The standardised net benefit was plotted against the threshold probability for considering a patient high risk for age alone and for the best discriminating models applicable to more than 50% of patients in the validation cohort.
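For readers unfamiliar with the quantity being plotted, the Python sketch below computes net benefit and standardised net benefit at one threshold probability, using the standard definitions (true positives per patient minus false positives per patient weighted by the odds of the threshold, then divided by prevalence). The study used the rmda package in R; the function names, the "predicted risk at or above threshold" rule, and the toy data are assumptions.

```python
def net_benefit(probs, died, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    tp = sum(1 for p, y in zip(probs, died) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, died) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)

def standardised_net_benefit(probs, died, threshold):
    """Net benefit divided by outcome prevalence (maximum possible value is 1)."""
    prevalence = sum(died) / len(died)
    return net_benefit(probs, died, threshold) / prevalence

# Toy data only; not drawn from the study.
toy_probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
toy_died = [0, 0, 0, 1, 0, 1, 1, 1]
model_snb = standardised_net_benefit(toy_probs, toy_died, threshold=0.25)
# "Treat all" reference strategy: every patient is flagged as high risk.
treat_all_snb = standardised_net_benefit([1.0] * 8, toy_died, threshold=0.25)
```

A decision curve repeats this calculation over a range of thresholds and compares each model with the treat-all and treat-none (net benefit zero) strategies.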

This was an urgent public health research study in response to a Public Health Emergency of International Concern. Patients or the public were not involved in the design, conduct, or reporting of this rapid response research.

We collected data from 35 463 patients between 6 February 2020 and 20 May 2020 in the derivation cohort; 1275 (3.6%) patients had no outcome recorded and were considered alive. The overall mortality rate was 32.2% (11 426 patients). The median age of patients in the cohort was 73 years (interquartile range 59-83); 41.7% (14 741) were female and 76.0% (26 966) had at least one comorbidity. Table 1 shows demographic and clinical characteristics for the derivation and validation datasets.

We identified 41 candidate predictor variables measured at hospital admission for model creation (fig 1, appendix 2). After the creation of a composite variable containing all seven individual comorbidities and the exclusion of 13 variables owing to high levels of missing values, 21 variables remained.

We identified eight important predictors of mortality by using generalised additive modelling with multiply imputed datasets: age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation, Glasgow coma scale, urea level, and C reactive protein (for variable selection process, see appendix 3). Given the need for a pragmatic score for use at the bedside, continuous variables were converted to factors with cut-off values chosen by using component smoothed functions (on linear predictor scale) from generalised additive modelling (appendix 4).

On entering variables into a penalised logistic regression model (least absolute shrinkage and selection operator), all variables were retained within the final model (appendix 5). We converted penalised regression coefficients into a prognostic index by using appropriate scaling (4C Mortality Score range 0-21 points; table 2).

The 4C Mortality Score showed good discrimination for death in hospital within the derivation cohort (table 3), with performance approaching that of the XGBoost model. The 4C Mortality Score showed good calibration (calibration intercept=0, slope=1, Brier score 0.170) across the range of risk and no adjustment to the model was required (appendix 11).

The validation cohort included data from 22 361 patients collected between 21 May 2020 and 29 June 2020 who had at least four weeks of follow-up; 743 (3.3%) patients had no outcome recorded and were considered alive. The overall mortality rate was 30.1% (6729 patients). The median age of patients in the cohort was 76 (interquartile range 60-85) years; 10 178 (45.6%) were female and 17 263 (77%) had at least one comorbidity (table 1).

Discrimination of the 4C Mortality Score in the validation cohort was similar to that of the XGBoost model (table 3). Calibration was also found to be excellent in the validation cohort: overall observed (30.1%) versus predicted (30.1%) mortality was equal (calibration-in-the-large=0) and calibration was excellent over the range of risk (slope=1, Brier score 0.171; fig 2). The 4C Mortality Score showed good performance in clinically relevant metrics across a range of cut-off values (table 4).

Four risk groups were defined with corresponding mortality rates determined (

Comparison with existing tools

We performed a systematic literature search and identified 15 risk stratification scores that could be applied to these data. 6 22 28-40 The 4C Mortality Score compared well against these existing risk stratification scores in predicting in-hospital mortality (table 6,

The number of patients in whom risk stratification scores could be applied differed owing to certain variables not being available, either because of missingness or because they were not tested for or recorded in clinical practice. Seven scores could be applied to fewer than 2000 patients (<10%) in the validation cohort owing to the requirement for biomarkers or physiological parameters that were not routinely captured (eg, lactate dehydrogenase). Decision curve analysis showed that the 4C Mortality Score had better clinical utility across a wide range of threshold risks compared with the best performing existing scores applicable to more than 50% of the validation cohort (A-DROP and CURB65; fig 3,

Sensitivity analysis

Sensitivity analyses that used complete case data showed similar discrimination (appendix 7) and performance metrics (appendices 8 and 9) to analyses that used the imputed dataset. After stratification of the validation cohort into two geographical cohorts (validation north and south; appendix 14), discrimination remained similar for the 4C Mortality Score in the north subset (area under the receiver operating characteristic curve 0.77, 95% confidence interval 0.76 to 0.78) and south subset (0.76, 0.75 to 0.77; appendix 6).

Finally, we checked discrimination of the 4C Mortality Score by sex and ethnic group (appendix 10). Discrimination was the same in men (area under the receiver operating characteristic curve 0.77, 95% confidence interval 0.76 to 0. 

We have developed and validated the eight variable 4C Mortality Score in a UK prospective cohort study of 57 824 patients admitted to hospital with covid-19. The 4C Mortality Score uses patient demographics, clinical observations, and blood parameters that are commonly available at the time of hospital admission and can accurately characterise the population of patients at high risk of death in hospital. The score compared favourably with other models, including best-in-class machine learning techniques, and showed consistent performance across the validation cohorts, including good clinical utility in a decision curve analysis.

Model performance compared well against other generated models, with minimal loss in discrimination despite its pragmatic nature. A machine learning approach showed a marginal improvement in discrimination, but at the cost of interpretability, the requirement for many more input variables, and the need for an app or website calculator that might limit use at the bedside given personal protective equipment requirements. The 4C Mortality Score showed good applicability within the validation cohort and consistency across all performance measures.

Comparison with other studies

The 4C Mortality Score contains parameters reflecting patient demographics, comorbidity, physiology, and inflammation at hospital admission; it shares characteristics with existing prognostic scores for while raised urea is also a common component. 21 22 28 Increasing age is a strong predictor of in-hospital mortality in our cohort of patients admitted with covid-19 and is commonly included in other existing covid-19 scores, 37 41 42 together with comorbidity 37 41 42 and raised C reactive protein. 40 43 Discriminatory performance of existing covid-19 scores applied to our cohort was lower than reported in derivation cohorts (DL score 0.74, COVID-GRAM 0.88, Xie score 0.98). 37 38 40 The use of small inpatient cohorts from Wuhan, China for model development might have resulted in overfitting, limiting generalisability in other cohorts. 38 40 The Xie score demonstrated the highest discriminatory power (0.73), and included age, lymphocyte count, lactate dehydrogenase, and peripheral oxygen saturations. However, we were only able to apply this score for less than 10% of the validation cohort because lactate dehydrogenase is not routinely measured on hospital admission in the UK.

Owing to challenges of clinical data collection during an epidemic, missing data are common, with choice of predictors influenced by data availability. 40 Complete case analysis often leads to exclusion of a substantial proportion of the original sample, subsequently leading to a loss of precision and power. 44 However, the assessment of missing data on model performance in novel covid-19 risk stratification scores has been limited 37 or unexplored, 38 40 potentially introducing bias and further limiting generalisability to other cohorts. We found discriminatory performance in both derivation and validation cohorts remained similar after the imputation of a wide range of variables, 41 further supporting the validity of our findings.

The presence of comorbidities is handled differently in covid-19 prognostic scores; comorbidities might be included individually, 40 42 given equal weight, 37 or found to have no predictive effect. 38 16 In our cohort, the inclusion of individual comorbidities within the machine learning model conferred minimal additional discriminatory performance, supporting the inclusion of an overall comorbidity count.

The ISARIC WHO CCP-UK study represents a large prospectively collected cohort admitted to hospital with covid-19 and reflects the clinical data available in most economically developed healthcare settings. We derived a clinically applicable prediction score with clear methods and tested it against existing risk stratification scores in a large patient cohort admitted to hospital. The score compared favourably with other prognostic tools, with good to excellent discrimination, calibration, and performance characteristics. The 4C Mortality Score has several methodological advantages over current covid-19 prognostic scores. The use of penalised regression methods and an event-to-variable ratio greater than 100 reduce the risk of overfitting. 45 46 The use of parameters commonly available at first assessment increases its clinical applicability, avoiding the requirement for markers often only available after a patient has been admitted to a critical care facility. 4 47 Of course a model developed in a specific dataset should describe that dataset best. However, by including comparisons with pre-existing models, reassurance is provided that equivalent performance cannot be delivered with a simple tool already in use.

Additionally, in a pandemic, baseline infection rates and patient characteristics might change by time and geography. This motivated the temporal and geographical validation, which is crucial to the reporting of this study. These sensitivity analyses showed that score performance continued to be robust over time and geography.

Our study has limitations. Firstly, we were unable to evaluate the predictive performance of several existing scores that require a large number of parameters (for example, APACHE II 48 ), as well as several other covid-19 prognostic scores that use computed tomography findings or uncommonly measured biomarkers. 5 Additionally, several potentially relevant comorbidities, such as hypertension, previous myocardial infarction, and stroke, 16 were not included in data collection. The inclusion of these comorbidities might have impacted upon or improved the performance and generalisability of the 4C Mortality Score.

Secondly, a proportion of recruited patients (3.3%) had incomplete episodes. Selection bias is possible if patients with incomplete episodes, such as those with prolonged hospital admission, had a differential mortality risk to those with completed episodes. Nevertheless, the size of our patient cohort compares favourably to other datasets for model creation. The patient cohort on which the 4C Mortality Score was derived comprised patients admitted to hospital who were seriously ill (mortality rate of 32.2%) and were of advanced age (median age 73 years). This model is not for use in the community and could perform differently in populations at lower risk of death. Further external validation is required to determine whether the 4C Mortality Score is generalisable among younger patients and in settings outside the UK.

We have derived and validated an easy-to-use eight variable score that enables accurate stratification of patients with covid-19 admitted to hospital by mortality risk at hospital presentation. Application within the validation cohorts showed this tool could guide clinician decisions, including treatment escalation.

A key aim of risk stratification is to support clinical management decisions. Four risk classes were identified and showed similar adverse outcome rates across the validation cohort. Patients with a 4C Mortality Score falling within the low risk groups (mortality rate 1%) might be suitable for management in the community, while those within the intermediate risk group were at lower risk of mortality (mortality rate 10%; 22% of the cohort) and might be suitable for ward level monitoring. Similar mortality rates have been identified as an appropriate cut-off value in pneumonia risk stratification scores (CURB-65 and

The lead author (the manuscript's guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: ISARIC-4C has a public facing website isaric4c.net and twitter account (@CCPUKstudy). We are engaging with print and internet press, television, radio, news, and documentary programme makers. We will explore distribution of findings with The Asthma UK and British Lung Foundation Partnership and take advice from NIHR Involve and GenerationR Alliance Young People's Advisory Groups.

Provenance and peer review statement: Not commissioned; externally peer reviewed. 

