key: cord-0448765-uzgp7u1n
authors: Dodd, Lori E; Follmann, Dean; Wang, Jing; Koenig, Franz; Korn, Lisa L; Schoergenhofer, Christian; Proschan, Michael; Hunsberger, Sally; Bonnett, Tyler; Makowski, Mat; Belhadi, Drifa; Wang, Yeming; Cao, Bin; Mentre, France; Jaki, Thomas
title: Endpoints for randomized controlled clinical trials for COVID-19 treatments
date: 2020-06-09
journal: nan
DOI: nan
sha: 73eeae7a6d4b86c87623113ccadf6c018a6231de
doc_id: 448765
cord_uid: uzgp7u1n

Introduction: Endpoint choice for randomized controlled trials of treatments for COVID-19 is complex. A new disease brings many uncertainties, but trials must start rapidly. COVID-19 is heterogeneous, ranging from mild disease that improves within days to critical disease that can last weeks and can end in death. While improvement in mortality would provide unquestionable evidence about clinical significance of a treatment, sample sizes for a study evaluating mortality are large and may be impractical. Furthermore, patient states in between "cure" and "death" represent meaningful distinctions. Clinical severity scores have been proposed as an alternative. However, the appropriate summary measure for severity scores has been the subject of debate, particularly in relation to the uncertainty about the time course of COVID-19. Outcomes measured at fixed time points may risk missing the time of clinical benefit. An endpoint such as time to improvement (or recovery) avoids the timing problem. However, some have argued that power losses will result from reducing the ordinal scale to a binary state of "recovered" vs "not recovered." Methods: We evaluate statistical power for possible trial endpoints for COVID-19 treatment trials using simulation models and data from two recent COVID-19 treatment trials. Results: Power for fixed time-point methods depends heavily on the time selected for evaluation. Time-to-improvement (or recovery) analyses do not specify a time point. Time-to-event approaches have reasonable statistical power, even when compared with a fixed time-point method evaluated at the optimal time. Discussion: Time-to-event analysis methods have advantages in the COVID-19 setting, unless the optimal time for evaluating the treatment effect is known in advance. Even when the optimal time is known, a time-to-event approach may increase power for interim analyses.

Endpoint choice for randomized controlled trials of treatments for novel coronavirus-induced disease is complex. A new disease brings many uncertainties, but trials must start rapidly to identify treatments that can be used as part of the outbreak response. COVID-19 presentation is heterogeneous, ranging from mild disease that improves within days to critical disease that can last from weeks to over a month and can end in death. While improvement in mortality would provide unquestionable evidence about clinical significance of a treatment, sample sizes for a study evaluating mortality are large and may be impractical, particularly given the multitude of putative therapies to evaluate. Furthermore, patient states in between "cure" and "death" represent meaningful distinctions. Clinical severity scores have been proposed as an alternative. However, the appropriate summary measure for severity scores has been the subject of debate, particularly given the considerable uncertainty about the time course of COVID-19. Outcomes measured at fixed time points, such as a test comparing severity scores between treatment and control at day 14, may risk missing the time of clinical benefit. An endpoint such as time to improvement (or recovery) avoids the timing problem. However, some have argued that power losses will result from reducing the ordinal scale to a binary state of "recovered" vs "not recovered."



Designing clinical trials of treatments for a novel infectious disease brings many challenges, especially during a rapidly evolving pandemic. A new disease brings uncertainties arising from an imperfect understanding of the illness, little information about putative treatments, and complexities in measuring relevant patient outcomes. A pandemic adds an overloaded medical system with limited resources for research, heightened pressure to find effective treatments quickly, and unpredictability about potential case numbers. Studies need to start quickly so that enrollment can track the epidemic curve.

However, early on, information about endpoints may be lacking. Trial designs should therefore be flexible enough to respond to new information, without compromising scientific rigor.

COVID-19 has a heterogeneous presentation and clinical course, ranging from asymptomatic to critical disease (Table 1). 1 While most infected patients have asymptomatic or mild disease, some develop severe or critical illness that can result in acute respiratory distress syndrome and death.

The most common symptoms are fever, dry cough, dyspnea, chest pain, fatigue and myalgia; less common symptoms are headache, dizziness, abdominal pain, diarrhea, nausea and vomiting. Most patients present with signs of bilateral pneumonia. 2 Neurologic symptoms, including taste and smell disorders, have been reported, with rare case reports of severe central nervous system involvement. 3 Thrombotic complications have also been observed in critically ill patients. 4 Importantly, some COVID-19 patients recover quickly with limited (or no) complications, while patients with severe disease may take 6-8 weeks or longer to recover fully. 5 This broad range of disease severity makes finding a common endpoint for all COVID-19 trials impractical. Endpoints for a study population representing a broad spectrum of disease may differ from those for a study with a narrow spectrum of disease.

We describe key considerations for selecting endpoints for COVID-19 treatment trials. We evaluate endpoints according to clinical relevance, ease and reliability of measurement, interpretability of the associated statistical analysis, and statistical efficiency. We discuss differences between fixed time-point endpoints and those that naturally incorporate changes over time. We evaluate the statistical efficiency of multiple approaches with simulation models, as well as with data from two published COVID-19 randomized trials. 6, 7

Treatments for COVID-19 are intended to be curative, with the goal that the patient will survive and ultimately return to normal function. This contrasts with a disease such as stroke, in which the goal of a treatment may be to reduce stroke-induced impairments that occur across a spectrum. 8 A benefit on mortality would be strong evidence of an effect, but deaths are relatively rare, so a study powered for mortality benefit would require a large sample size. For example, a two-arm study needs around 2,000 patients to detect a hazard ratio (for death) of 0.65 with 85% power and a type I error rate of 5%, assuming a 10% mortality rate. Lower mortality rates require even larger studies. In a setting with multiple putative therapies, studies powered for mortality will restrict the number of therapies evaluated, which may slow provision of effective treatments to support the outbreak response.
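The figure of around 2,000 patients can be roughly reproduced with Schoenfeld's approximation for the number of deaths a log-rank test needs. The sketch below is only a cross-check under stated assumptions: a two-sided 5% test, 1:1 allocation, and a 10% death proportion across the whole trial population. The text does not say which calculation the authors used, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, power=0.85, alpha=0.05):
    """Deaths needed for a two-sided log-rank test with 1:1 allocation
    (Schoenfeld's approximation; an assumed method, not taken from the text)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

def total_sample_size(hazard_ratio, event_rate, power=0.85, alpha=0.05):
    """Total enrollment (both arms), assuming `event_rate` of all patients die."""
    events = math.ceil(schoenfeld_events(hazard_ratio, power, alpha))
    return math.ceil(events / event_rate)

print(schoenfeld_events(0.65))          # roughly 193.5 deaths
print(total_sample_size(0.65, 0.10))    # roughly 1,940 patients
print(total_sample_size(0.65, 0.05))    # about twice as many
```

Under these assumptions about 194 deaths are needed, so a 10% mortality rate implies roughly 1,940 patients, and halving the mortality rate roughly doubles the required enrollment.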

Furthermore, the many clinical states between "death" and "cure" represent meaningful distinctions. The World Health Organization (WHO) proposed an ordinal scale ranging from death to full health, with states in between corresponding to the need for hospitalization, oxygen support (including the type of support needed), and additional medical support (Table 1). 9 These states are important markers of how a patient feels and of disease progression (or improvement). Mechanical ventilation (intubation) marks a considerable worsening, as intubated patients often require sedatives and even paralytics to address patient discomfort and maximize therapy. Intubation is also associated with a host of complications leading to additional mortality and morbidity, such as ventilator-associated pneumonia, 10 gastrointestinal bleeding 11 and severe physical deconditioning. In a case series of 5,700 COVID-19 patients in New York, a considerable number of patients remained intubated throughout the study period. 12 Shortening the time spent in a state such as intubation, or avoiding intubation altogether, is of direct clinical benefit.

Timing of endpoint evaluation is another important consideration. A treatment effect that occurs early but dissipates over time may not be clinically meaningful. A treatment effect may be missed if evaluation is too early, before an intervention has had time for an effect. Timing of measurement is therefore crucial and can be particularly challenging in a novel disease with substantial heterogeneity.

Time-to-event endpoints require specifying only the observation interval, not a fixed evaluation time, and are therefore more robust in this regard. Longitudinal models of other endpoints are also possible, such as a mixed-effects proportional odds model, 13 but these are not commonly used. Table 2 describes multiple endpoints considered for COVID-19, largely from the perspective of a definitive (phase 3) trial. Endpoints for earlier-phase studies may focus on evaluating mechanism (e.g., targeting a specific pathway) or activity, so that "go/no-go" decisions about further evaluation in larger trials can be made. Endpoints are evaluated according to ease of measurement, reproducibility, whether they are clinically meaningful, and their ability to capture multiple clinical states and the time course of disease.

Meaningfulness and reproducibility can be distorted when states are influenced by external factors, as may happen when patient numbers exceed hospital capacity. For example, ordinal categories become less meaningful when mechanical ventilators are unavailable and patients who would normally be in this category are shifted to others (or when guidelines recommending early intubation are followed more rigorously in some centers than in others). Further, non-invasive ventilators or high-flow oxygen devices may not be used in settings where personal protective equipment is limited (or where negative pressure rooms are unavailable) because of concerns about health-care worker infection from viral aerosolization. Similarly, hospitals exceeding capacity may discharge patients early because of demand for beds. Additional concerns have been raised that one-unit changes on the ordinal scale are not equally important. For example, extubation may represent a more meaningful improvement than moving from high-flow oxygen to standard, low-flow oxygen. Both improvements have implications for health system resources; however, from the patient's view, they may not be equal.

Endpoints used in other diseases have been considered. For example, the National Early Warning Score (NEWS2) 14 captures clinical deterioration in patients, but it is not specific to COVID-19 and might not be sensitive enough for this disease. Other measures, such as SOFA, 15 are well validated but are specific to ICU patients. Patients who require intensive care have a high mortality of approximately 30 to 60%. 16, 17, 18 Multiple laboratory parameters are associated with deterioration of clinical status, including surrogates for organ injury and markers of systemic inflammation, e.g., markers of cardiac injury (troponin T), elevated liver transaminases, creatinine levels, procalcitonin levels, D-dimer concentrations, fibrinogen, 19 lactate dehydrogenase 20 and lymphopenia. 21 Elevations in C-reactive protein (CRP) and ferritin, further reflecting high levels of systemic inflammation, are also associated with severe disease, consistent with the hyperinflammatory syndrome that appears to occur in a subset of patients. 21 While tracking these parameters is important for better understanding COVID-19, they do not directly measure how a patient functions or feels and may not correlate with clinical outcome. In supplementary Table S1, we provide examples of endpoint choices for several COVID-19 clinical trials.

To evaluate statistical considerations in more depth, we focus on four outcomes: time to death, time to recovery/improvement, ordinal scale at a fixed time point, and ordinal scale averaged across time points. With time-to-improvement/recovery models, the competing event of death requires special handling. Patients who die during follow-up should not be censored at the time of death, as that assumes their recovery time would be like that of all who remain alive and unrecovered at that time. To state the obvious, once dead, a patient cannot recover. A death must therefore be assigned an infinite recovery time, so that at the end of follow-up the patient is counted as "not recovered." We achieve the same objective by censoring deaths at the last observation day. Patients censored on the last observation day therefore reflect two different states: death and failure to recover by day 28. Standard survival analysis methods can then be applied, but the "hazard" ratio refers to the instantaneous rate of a good outcome. Hence, we use the term "recovery rate ratio" (or "improvement rate ratio"). In the absence of administrative censoring from staggered entry before day 28, this approach corresponds to the Fine-Gray approach to competing risks. 22 With staggered entry, Fine-Gray censors deaths at the time they would have been censored had they not died (i.e., the time of administrative censoring).
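The death-handling rule can be illustrated with a small Kaplan-Meier calculation. This minimal sketch assumes a hypothetical four-patient data set, 28 days of follow-up and no staggered entry, and uses illustrative function names. It shows why censoring a death at the day of death overstates the cumulative probability of recovery, whereas censoring it at the last observation day keeps the patient in the risk set and counts them as not recovered.

```python
from collections import Counter

LAST_DAY = 28  # end of the observation window

def recovery_data(outcome, day):
    """Return (time, recovered) for a time-to-recovery analysis.

    outcome: "recovered", "died", or "ongoing" (alive, not recovered at `day`).
    Deaths are censored at LAST_DAY rather than at the day of death, so the patient
    stays in the risk set and is counted as "not recovered" at the end of follow-up."""
    if outcome == "recovered":
        return day, 1
    if outcome == "died":
        return LAST_DAY, 0
    return day, 0

def cumulative_recovery(data):
    """One minus the Kaplan-Meier 'survival' curve for recovery, after the last recovery."""
    recoveries = Counter(t for t, recovered in data if recovered)
    not_recovered = 1.0
    for t in sorted(recoveries):
        at_risk = sum(1 for time, _ in data if time >= t)
        not_recovered *= 1 - recoveries[t] / at_risk
    return 1 - not_recovered

# Hypothetical patients: (outcome, day of recovery, death, or last follow-up)
patients = [("recovered", 5), ("died", 3), ("recovered", 10), ("ongoing", 28)]
proper = [recovery_data(o, d) for o, d in patients]
naive = [(d, int(o == "recovered")) for o, d in patients]  # death censored at day 3

print(cumulative_recovery(proper))  # about 0.5: 2 of 4 patients recovered
print(cumulative_recovery(naive))   # about 0.67: recovery is overstated
```

With deaths kept in the risk set, the estimate matches the observed proportion recovered; a full analysis would use a survival package rather than this hand-rolled estimator.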

Discretizing a continuous variable is commonly thought to result in a loss of efficiency. 23, 24 Similarly, efficiency may be reduced when an ordinal scale is collapsed into a binary endpoint, and others have emphasized the power advantages of a proportional odds model. 25, 26 Graubard and Korn note that, when the marginal sums are not nearly uniform, rank-based methods (such as the proportional odds model) may have lower power than methods that use pre-assigned numeric values (scores) for the categories of the ordinal scale. 27 Nonetheless, collapsing information can sometimes increase power. For example, if the distribution of a continuous endpoint is skewed or heavy-tailed, rank-based methods, or even dichotomizing and using a test of proportions, can be more powerful than a t-test. Relatedly, if assignment to some ordinal categories is haphazard, methods that collapse categories can provide more power. Dichotomizing can also be useful when there is a clear cut-point beyond which negative sequelae of a disease manifest, such as with hemoglobin A1c or fasting glucose in diabetes. Table S2 describes many statistical analysis options.
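The claim that rank-based methods can outperform a t-test under heavy tails is easy to check by simulation. The sketch below assumes Cauchy-distributed outcomes with a location shift of 1 and 50 patients per arm (all hypothetical choices) and, to stay within the Python standard library, uses normal approximations for both the Wilcoxon rank-sum test and the two-sample t-test. It is an illustration of the general point, not the authors' simulation.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()

def two_sided_p(z):
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def mean_difference_test(x, y):
    """Large-sample (normal-approximation) version of the two-sample t-test."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return two_sided_p((mean(y) - mean(x)) / se)

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test with normal approximation; assumes no ties."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 1)
    n1, n2 = len(x), len(y)
    expected = n2 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return two_sided_p((w - expected) / sd)

def cauchy(rng, location):
    """Standard Cauchy draw (very heavy tails) shifted by `location`."""
    return location + math.tan(math.pi * (rng.random() - 0.5))

def empirical_power(test, n=50, shift=1.0, reps=400, alpha=0.05, seed=1):
    """Share of simulated trials in which `test` rejects at level alpha."""
    rng = random.Random(seed)  # same seed, so both tests see the same data
    hits = 0
    for _ in range(reps):
        x = [cauchy(rng, 0.0) for _ in range(n)]
        y = [cauchy(rng, shift) for _ in range(n)]
        hits += test(x, y) < alpha
    return hits / reps

print(empirical_power(mean_difference_test), empirical_power(rank_sum_test))
```

On data like these the rank-sum test rejects in most replicates while the mean-based test rarely does, because a few extreme values inflate the sample variance.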

The endpoints considered are difficult to compare theoretically with respect to power. For example, time to recovery dichotomizes an ordinal scale into "recovered" and "not recovered," so one might expect a loss of power with this approach. However, time to recovery incorporates health states on multiple days instead of just one, which can increase power. For instance, if the proportional odds model is evaluated so early that no one has recovered (or so late that everyone has recovered), its power on that day will be very low. An analysis of the average ordinal score over multiple days solves that problem, but its power gain is not as great as one might imagine, because measurements on the same individual on different days are likely to be highly correlated. A between-arm difference in an average score may also be more difficult to interpret: for example, what does an average improvement of 0.4 units on an ordinal scale mean?
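A back-of-the-envelope formula shows why averaging daily scores gains less than one might hope. Assuming the daily scores have equal variance, a common pairwise correlation $\rho$ (compound symmetry) and the same between-arm difference every day, the variance of a 28-day average relative to a single day is $(1 + 27\rho)/28$, and the required sample size scales by the same factor. These assumptions are simplifications for illustration, not part of the authors' analysis.

```python
def variance_ratio_of_average(k_days, rho):
    """Var(mean of k daily scores) / Var(single-day score), assuming equal daily
    variances and a common pairwise correlation rho (compound symmetry)."""
    return (1 + (k_days - 1) * rho) / k_days

for rho in (0.0, 0.5, 0.8, 0.9):
    print(rho, round(variance_ratio_of_average(28, rho), 3))
```

With $\rho = 0.8$, for example, the ratio is about 0.81, so averaging saves roughly a fifth of the sample size rather than the 96% that independent daily measurements would suggest.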

Time-to-event analysis is also advantageous for interim analyses, as data from all patients with any amount of follow-up time are included. This contrasts with a fixed time-point analysis, which includes only observations from patients who have reached the prescribed follow-up milestone (e.g., day 14). In rapidly enrolling trials, time-to-event analysis may improve power to evaluate early efficacy (or harm) of treatments, and hence increase the speed at which treatment recommendations can be made.
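The difference in usable data at an interim look can be illustrated with hypothetical numbers. The sketch assumes steady enrollment of 10 patients per day for 30 days and an interim analysis on day 30: a day-14 fixed time-point analysis can use only patients enrolled at least 14 days earlier, whereas a time-to-event analysis uses everyone with some follow-up. The function name and enrollment pattern are illustrative.

```python
def interim_counts(enroll_days, freeze_day, milestone=14):
    """Count patients usable at an interim analysis held on `freeze_day`.

    Fixed time-point analysis: only patients followed for at least `milestone` days.
    Time-to-event analysis: every enrolled patient with some follow-up (> 0 days)."""
    follow_up = [freeze_day - e for e in enroll_days if e < freeze_day]
    fixed_point = sum(1 for f in follow_up if f >= milestone)
    time_to_event = sum(1 for f in follow_up if f > 0)
    return fixed_point, time_to_event

# Hypothetical trial enrolling 10 patients per day on days 0-29, interim on day 30.
enroll = [day for day in range(30) for _ in range(10)]
print(interim_counts(enroll, freeze_day=30))  # (170, 300)
```

Here the fixed time-point analysis uses 170 of 300 enrolled patients, while the time-to-event analysis uses all 300.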

Power is compared using two simulation methods and applications to two published studies of COVID-19 treatments. For the simulation studies, ordinal trajectories were generated according to a random line, $\theta_{0i} + \theta_{1i}\log(d)$ for person $i$, where $d$ is the day since randomization. For day $d$, the ordinal score was given as $\lfloor \theta_{0i} + \theta_{1i}\log(d) \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$. Death (score = 7) and recovery (score = 1) were considered absorbing states (i.e., values above 7 or below 1 were set to 7 and 1, respectively). One can visualize the trajectory as a subject deterministically sliding up or down their own "line of destiny" over 28 days and reporting their integer value each day. Loosely, 10% (5%) of placebo (active) patients were destined to die (having a large value of $\theta_{1i}$) within the 28-day observation period. The remaining subjects were destined to recover (with a negative value of $\theta_{1i}$).
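The daily-score rule can be sketched as follows. This minimal sketch follows the deterministic description above (it omits the daily noise term of the appendix model), the function name is illustrative, and the $\theta$ values in the usage lines are hypothetical rather than the values used in the simulations.

```python
import math

DEATH, RECOVERED = 7, 1

def ordinal_trajectory(theta0, theta1, days=28):
    """Daily ordinal scores from the line theta0 + theta1 * log(d), d = 1..days.

    Each day's score is the integer part (floor) of the line, truncated to 1..7;
    death (7) and recovery (1) are absorbing, so a subject never leaves them."""
    scores = []
    for d in range(1, days + 1):
        if scores and scores[-1] in (DEATH, RECOVERED):
            scores.append(scores[-1])  # absorbing state: carry it forward
            continue
        value = math.floor(theta0 + theta1 * math.log(d))
        scores.append(min(DEATH, max(RECOVERED, value)))
    return scores

recovering = ordinal_trajectory(4.5, -1.0)  # hypothetical negative slope: drifts to 1
dying = ordinal_trajectory(4.5, 2.0)        # hypothetical large slope: climbs to 7
print(recovering)
print(dying)
```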

Multiple parameter values for generating $\theta_{0i}$ and $\theta_{1i}$ were considered until the trajectories roughly reflected our understanding of COVID-19 disease progression. Figure 1 depicts results for the reference scenario.

Each setting was simulated 1,000 times, with 800 subjects in total, equal randomization to the two arms, and 28 days of follow-up. We evaluated the proportional odds model at different days, a Wilcoxon rank-sum test on the mean ordinal score (1-7) up to day 28, a test of proportions on day 28 mortality, and Cox models for time to (a) recovery, (b) a 2-point improvement and (c) death. One possible criticism of these simulations is that the proportional odds assumption may not hold; a second set of simulations therefore compared methods under the proportional odds assumption. Technical details and results are given in the appendix.
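When reading empirical power estimates, it helps to know their Monte Carlo precision. Assuming each replicate is an independent reject/do-not-reject trial, the standard error of an estimated power $p$ from $R$ replicates is $\sqrt{p(1-p)/R}$. This is the standard binomial formula, not something stated in the text; the helper name is illustrative.

```python
import math

def monte_carlo_se(power, n_sims):
    """Binomial standard error of an empirical rejection rate from n_sims replicates."""
    return math.sqrt(power * (1 - power) / n_sims)

print(round(monte_carlo_se(0.5, 1_000), 4))    # 1,000 simulated trials per setting
print(round(monte_carlo_se(0.5, 100_000), 4))  # 100,000 resampled trials
```

At worst ($p = 0.5$), 1,000 replicates give each estimate a precision of roughly ±3 percentage points (two standard errors), and 100,000 replicates about ±0.3 points.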

Patient-level data from two published studies were obtained to compare methods. The Adaptive COVID-19 Treatment Trial stage 1 (ACTT-1) randomized 1,062 patients to remdesivir or placebo and followed patients for 28 days. 6 The primary outcome was time to recovery, although ordinal scales were also assessed. Because of a surge in enrollment, the study exceeded its target of 400 recoveries, reaching 482 by the time of the planned DSMB interim analysis. Data were taken from a preliminary report based on an April 28, 2020 data freeze (before results were made public and before actively enrolled patients were offered cross-over treatment). Data cleaning for this snapshot is ongoing, and the results presented here are intended to inform trial design. We compare empirical power for various methods using repeated random sampling of 50, 150 and 300 patients per arm; for each sample size, random sampling was replicated 100,000 times. Additionally, we present multiple analyses applied to the LOTUS study of lopinavir/ritonavir by Cao et al. 7 This study was stopped before reaching its pre-planned sample size. We present analyses with the original data (199 patients) as well as with hypothetical augmented data corresponding to 398 patients.
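The resampling approach to empirical power can be sketched as below. The text does not say whether sampling was with or without replacement; this sketch assumes sampling without replacement, uses a hypothetical binary "recovered by day 28" outcome for each arm, and applies a simple two-proportion z-test in place of the full set of methods compared in the paper. All names and data are illustrative.

```python
import math
import random
from statistics import NormalDist

def two_proportion_p(successes_a, successes_b, n):
    """Two-sided pooled z-test p-value for equal proportions, n patients per arm."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # both arms all 0 or all 1: no evidence of a difference
    z = (successes_a - successes_b) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def resampled_power(arm_a, arm_b, n_per_arm, reps=500, alpha=0.05, seed=7):
    """Share of resampled trials (n_per_arm drawn without replacement from each
    arm's patient-level data) in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = rng.sample(arm_a, n_per_arm)
        b = rng.sample(arm_b, n_per_arm)
        rejections += two_proportion_p(sum(a), sum(b), n_per_arm) < alpha
    return rejections / reps

# Hypothetical arms: 1 = recovered by day 28, 0 = not recovered.
treated = [1] * 600 + [0] * 400
control = [1] * 400 + [0] * 600
for n in (50, 150, 300):
    print(n, resampled_power(treated, control, n))
```

As expected, the estimated power rises with the per-arm sample size; with real trial data, each analysis method would be applied to the same resampled data sets.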

Power comparisons for simulations are shown in Table 3. For the reference scenario, the proportional odds model has increasingly better power at later days, with the highest power at day 28. Empirical power for both time to (2-point) improvement and time to recovery is somewhat lower than that for the proportional odds model at the optimal time. Empirical power for mortality is notably lower than for other methods, which is unsurprising given the low event rate and modest effect. We explored four perturbations of this reference scenario to assess performance more fully: 1) lagged treatment effect, 2) faster recovery, 3) faster mortality and 4) effect solely on mortality (Table S4). Under the lagged-effect scenario, power for the proportional odds model decreases at days 7 and 14 but is similar at day 28 (compared with the reference scenario). This underscores the fragility of choosing the right day with the proportional odds model. The faster recovery scenario shows similar relative behavior to the reference scenario, though power is uniformly higher. The faster mortality scenario has power similar to the reference scenario. These two perturbations show some robustness of the conclusions from the reference scenario. The last row of Table 3 shows the scenario in which the arms differ only in mortality. Here, mortality has the highest power, as expected. More deaths on placebo necessarily imply more recoveries on treatment, which is why power for both time to improvement and time to recovery is around 30%.

Simulation studies under models that enforce the proportional odds assumption are provided in Table S3 and Figure S1. Results from these simulations are similar: when the fixed time point is chosen well, the proportional odds model performs well, but it loses power if the time point is chosen poorly.

Table 4 shows estimates and p-values from various models applied to the ACTT-1 study data. At the time of the data snapshot, the following proportions of subjects had ordinal score data available: 91% at day 7, 89% at day 14, 74% at day 21 and 70% at day 28. On the observed data, the proportional odds model … Table 4 also shows empirical power (the proportion of p-values below 0.05 out of the 100,000 simulations) for sample sizes of 50, 150 and 300 per group. Power is greatest at day 7 using the proportional odds model, with rejection rates of 24%, 62% and 97% for sample sizes of 50, 150 and 300 per group. Results for the t-test were similar, with rejection rates of 22%, 59% and 95% for the three sample sizes. Rejection rates for the proportional odds model and t-test evaluated at day 14 were lower for all sample sizes (day 14 proportional odds rejection rates: 16%, 41% and 79%; day 14 t-test rejection rates: 17%, 46% and 85%, respectively, for sample sizes of 50, 150 and 300 per group). By day 28, empirical power was lower still, although the t-test rejection rates were higher than those for the proportional odds model (proportional odds rejection rates: 7%, 13% and 19%; t-test rejection rates: 9%, 20% …). Rejection rates for the recovery rate ratio were 18%, 48% and 87% for sample sizes of 50, 150 and 300 per group, respectively. Rejection rates for time to improvement were 19%, 51% and 90% (one-point improvement) and 17%, 44% and 84% (two-point improvement), respectively. Rejection rates for the mortality hazard ratio were 7%, 12% and 18% for the three sample sizes, consistent with the low power for mortality in this setting.

Table S3 in the appendix shows results from the LOTUS study of lopinavir/ritonavir. In both the observed and augmented data analyses, the proportional odds model was not statistically significant on any of the days evaluated, whereas with the augmented data, the time to a two-point improvement indicated a 31% faster rate of improvement (p<0.05).

One important challenge with COVID-19 is disease heterogeneity. An endpoint of cure or death would provide the strongest clinical evidence of a treatment effect, but trials using these endpoints may take an infeasibly long time and preclude evaluation of other candidate treatments. The WHO ordinal scale reflects meaningful patient states. However, distinctions between categories may depend on limited resources (such as ventilators or high-flow oxygen devices). Further, local differences in standard of care (including different guidelines recommending early intubation and/or limiting non-invasive oxygen treatments) may affect results in multicenter trials. Ideally, such guidelines would be unified within clinical trials, but dogmatic restrictions could limit enrollment. A placebo-controlled trial will reduce the potential for subjectivity to influence changes made to a patient's status.

Studies need to be launched quickly to inform the response, at a time when little information about the disease may be available. Planning for additional trial flexibility, without compromising scientific rigor, is important. 28 Changes made to endpoints based on results external to the trial (and prior to reviewing trial data) are acceptable. 29 In the ACTT-1 trial, the initial primary endpoint was the proportional odds model at day 14, based on early WHO guidance that recommended an analysis of the ordinal scale at a fixed time point. At the time, many thought the clinical course resembled that of influenza, with recovery occurring within two weeks. However, in late February, it became apparent that the course of illness was more prolonged than previously thought. Consequently, follow-up was extended to 28 days, and simulation results (presented in this manuscript) revealed the fragility of a fixed time-point analysis and highlighted the advantages of a time-to-recovery endpoint.

While both the simulations and our examples show that power is comparable between a fixed time-point analysis and a time-to-event analysis if the timing of the former is chosen well, marked power losses are apparent when it is not. Additionally, we believe that time-to-improvement/recovery analysis is easier to interpret. A shorter time to improvement/recovery is relevant to the patient, as an indicator of faster improvement in clinical status, and also to a health system at maximum capacity. While a mortality improvement would have provided stronger evidence of treatment efficacy, initial estimates indicated that a sample size of about 2,000 would be needed. This was deemed impractical given the goal of evaluating multiple therapeutic candidates.

The time-to-event analysis offers other advantages: for interim analyses, all data collected up to the data freeze are included, which can be important in an outbreak setting with rapid enrollment. The PALM Ebola virus disease treatment trial provides one example. 30 In PALM, the primary endpoint was 28-day mortality. Because of rapid enrollment, there was a striking discrepancy between the number enrolled and the number with 28 days of follow-up. At the August 9, 2019 Data and Safety Monitoring Board meeting, 673 patients (of the 725 target) had been enrolled but only 376 had 28-day follow-up; the study had enrolled 93% of its target sample size, but information (for the mortality proportion at day 28) was only 52%. A time-to-event analysis would have included data on all participants (for their observed time), and information would have been 65-70% at this analysis.
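The PALM percentages follow directly from the counts quoted above. This short check assumes that information for a day-28 mortality proportion is proportional to the number of patients with complete 28-day follow-up; the 65-70% figure for a time-to-event analysis depends on event times not given here and is not recomputed.

```python
enrolled, target, with_day28 = 673, 725, 376

print(f"enrolled: {enrolled / target:.0%} of target")        # 93%
print(f"information at day 28: {with_day28 / target:.0%}")   # 52%
```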

In our evaluations of the ACTT-1 data, day 7 had the highest power. However, evidence of an effect this early would likely not have been convincing for a definitive trial; a day 7 evaluation may be more appropriate for phase 2 trials. An alternative to the time-to-event approach would have been to specify multiple outcomes (e.g., the ordinal scale at days 7, 14, 21 and 28) with multiplicity adjustments. This was considered, but concerns were raised about interpretation and the need to focus on an important measure of clinical benefit.
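The text does not say which multiplicity adjustment was considered for multiple day-specific outcomes. As one standard option, Holm's step-down procedure could be applied to the four day-specific p-values; the function name and the p-values below are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject flag per hypothesis (input order)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for ordinal-scale comparisons at days 7, 14, 21 and 28.
print(holm_reject([0.005, 0.015, 0.03, 0.04]))  # [True, True, False, False]
```

With these values Holm rejects the day 7 and day 14 comparisons, whereas a plain Bonferroni threshold of 0.05/4 = 0.0125 would reject only day 7.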

Regardless of the primary endpoint chosen, collection of core outcome measures will ensure comparability across studies and will be important for subsequent efforts to synthesize data from different trials. 31

Table 2 notes: deaths need special consideration. "+" indicates good performance and "−" indicates poor performance on this characteristic; neutral is denoted by "○".

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article:

For each day of follow-up, the integer part of the line at that day was given as the ordinal score. Except for the lagged-effect scenario, the model is given by

$$Y_{id} = B_0 + B_1\log(d) + B_2 Z_i\log(d) + b_{0i} + b_{1i}\log(d) + W e_{id} \qquad (1)$$

with $b_{0i} \sim N(0, 1.5^2)$ and $b_{1i} \sim I_i\,N(-4, 0.3^2) + (1 - I_i)\,N(7, s^2)$, where $I_i \sim \text{Bernoulli}(0.10)$ for placebo and $I_i \sim \text{Bernoulli}(0.05)$ for treatment, $e_{id} \sim N(0, 0.25^2)$, and $Z_i$ is the indicator of the treatment group. Note that there is a treatment effect both on the speed of recovery (as $B_2 < 0$) and on mortality, as $I_i$ has a different Bernoulli probability in the two groups.

For the lagged-effect scenario, the treatment effect begins at day 8:

$$Y_{id} = B_0 + B_1\log(d) + B_2 Z_i\,\mathbb{1}(d>7)\log(d-7) + b_{0i} + b_{1i} Z_i\,\mathbb{1}(d>7)\log(d-7) + W e_{id} \qquad (2)$$

with the random variables defined as for equation (1). Table S4 provides the parameter values used for the different scenarios.

For simulation method 2, random multinomial data were generated corresponding to baseline ordinal scores. A trajectory of ordinal scores was then applied as in method 1 above, except that the trajectories were generated with the same distribution for the treatment and control arms. Treatment-arm proportions at observation days were then rescaled to satisfy a proportional odds assumption according to a common odds ratio for the specified treatment effect on each day (as specified in Table S5).

Additional simulation studies (not shown) demonstrated that blinded (pooled) pilot studies are not very informative for guiding the determination of the optimal time. Blinded (pooled) data provide information about the overall proportions in each category, but simple rules, such as selecting the time when a certain proportion of outcomes are good or when the distribution is most variable, do not seem to improve identification of the optimal time for evaluation. Note the one peculiarity of how these models are set up.

Figure S1. Stacked bar plots for ordinal scores and Kaplan-Meier curves for time to recovery for three scenarios for simulation method 2.
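The rescaling step of simulation method 2 can be sketched with cumulative odds. This sketch assumes a cumulative-logit parameterization with category 1 as the best outcome and an odds ratio above 1 favoring treatment; the day-specific odds ratios actually used (Table S5) are not reproduced, so the function name and inputs are hypothetical.

```python
def apply_common_odds_ratio(control_probs, odds_ratio):
    """Shift category probabilities under a proportional odds model.

    control_probs: probabilities for ordered categories 1..K (1 = best outcome).
    Each cumulative odds P(Y <= k) / P(Y > k), k < K, is multiplied by odds_ratio,
    so odds_ratio > 1 moves the treatment arm toward better (lower) categories."""
    treated_cum = []
    cum = 0.0
    for p in control_probs[:-1]:
        cum += p
        if cum >= 1.0:  # remaining categories are empty; cumulative probability stays 1
            treated_cum.append(1.0)
            continue
        odds = odds_ratio * cum / (1.0 - cum)
        treated_cum.append(odds / (1.0 + odds))
    treated_cum.append(1.0)  # P(Y <= K) = 1
    # Differences of successive cumulative probabilities give category probabilities.
    previous = [0.0] + treated_cum[:-1]
    return [c - prev for prev, c in zip(previous, treated_cum)]

print(apply_common_odds_ratio([0.2, 0.3, 0.5], 2.0))  # approximately [1/3, 1/3, 1/3]
```

An odds ratio of 1 returns the control distribution unchanged, which is a useful sanity check when generating scenarios.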

Coronavirus Disease 2019 (COVID-19) Treatment Guidelines

Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study

The need for neurologists in the care of COVID-19 patients

Incidence of thrombotic complications in critically ill ICU patients with COVID-19. Thrombosis Research

Remdesivir in Adults with Severe COVID-19: Results of a Randomized, Double-blind, Placebo-controlled, Multicenter Trial. The Lancet

Remdesivir for the treatment of Covid-19-a preliminary report

A trial of lopinavir-ritonavir in adults hospitalized with severe Covid-19

Novel end point analytic techniques and interpreting shifts across the entire range of outcome scales in acute stroke trials. Stroke

WHO R&D Blueprint Novel Coronavirus (COVID-19) Therapeutic Trial Synopsis

Multicenter prospective study of ventilator-associated pneumonia during acute respiratory distress syndrome

Risk factors for gastrointestinal bleeding in critically ill patients

Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area

A mixed-effects multinomial logistic regression model. Statistics in Medicine

National Early Warning Score (NEWS) 2 standardising the assessment of acute-illness severity in the NHS

The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure

Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region

COVID-19 in critically ill patients in the Seattle region-case series

Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine

ISTH interim guidance on recognition and management of coagulopathy in COVID-19. Journal of Thrombosis and Haemostasis

The laboratory tests and host immunity of COVID-19 patients with different severity of illness

COVID-19: consider cytokine storm syndromes and immunosuppression. The Lancet

A proportional hazards model for the subdistribution of a competing risk

Measurement in clinical trials: a neglected issue for statisticians? Statistics in Medicine

The cost of dichotomising continuous variables

Comparison of an ordinal endpoint to time-to-event, longitudinal, and binary endpoints for use in evaluating treatments for severe influenza requiring hospitalization

An analysis of ordinal endpoint for use in evaluating treatments for severe influenza requiring hospitalization

Choice of column score for testing independence in ordered 2 × K contingency tables

Efficient adaptive designs for clinical trials of interventions for COVID-19

A randomized, controlled trial of Ebola virus disease therapeutics

Core outcome set developer's response to

The authors wish to acknowledge the ACTT-1 and LOTUS study teams for use of data from their study.