title: Randomised clinical trials in critical care: past, present and future
authors: Granholm, Anders; Alhazzani, Waleed; Derde, Lennie P. G.; Angus, Derek C.; Zampieri, Fernando G.; Hammond, Naomi E.; Sweeney, Rob Mac; Myatra, Sheila N.; Azoulay, Elie; Rowan, Kathryn; Young, Paul J.; Perner, Anders; Møller, Morten Hylander
journal: Intensive Care Med
date: 2021-12-02
DOI: 10.1007/s00134-021-06587-9

Randomised clinical trials (RCTs) are the gold standard for providing unbiased evidence of intervention effects. Here, we provide an overview of the history of RCTs and discuss the major challenges and limitations of current critical care RCTs, including overly optimistic effect sizes; unnuanced conclusions based on dichotomisation of results; limited focus on patient-centred outcomes other than mortality; lack of flexibility and ability to adapt, increasing the risk of inconclusive results and limiting knowledge gains before trial completion; and inefficiency due to lack of re-use of trial infrastructure. We discuss recent developments in critical care RCTs and novel methods that may provide solutions to some of these challenges, including a research programme approach (consecutive, complementary studies of multiple types rather than individual, independent studies) and novel design and analysis methods. These include standardisation of trial protocols; alternative outcome choices and use of core outcome sets; increased acceptance of uncertainty, probabilistic interpretations and use of Bayesian statistics; novel approaches to assessing heterogeneity of treatment effects; adaptation and platform trials; and increased integration between clinical trials and clinical practice. We outline the advantages and discuss the potential methodological and practical disadvantages of these approaches. With this review, we aim to inform clinicians and researchers about conventional and novel RCTs, including the rationale for choosing one or the other methodological approach based on a thorough discussion of pros and cons. Importantly, the most central feature remains randomisation, which, by reducing confounding to chance, provides unparalleled protection against confounding compared with non-randomised designs.

Randomised clinical trials (RCTs) fundamentally changed the practice of medicine, and randomisation is the gold standard for providing unbiased estimates of intervention effects [1]. Clinical trials have evolved substantially, from the first described systematic comparison of dietary regimens 2500 years ago in Babylon, through the 1747 scurvy trial and the first double-blinded trial (of patulin for the common cold) conducted in the 1940s, to the establishment of modern ethical standards and regulatory frameworks following World War II (Fig. 1) [2, 3]. While the fundamental concept of RCTs has remained relatively unchanged since then, the degree of collaboration has increased, and the largest trials have become larger [4]. Additionally, smaller RCTs assessing efficacy in narrow populations in highly controlled settings have been complemented with larger, more pragmatic RCTs in broader populations with less protocolisation of concomitant interventions, more closely resembling clinical practice [5].
Similarly, per-protocol analyses assessing efficacy (i.e., the effects of an intervention under ideal circumstances in patients with complete protocol adherence) have been complemented with intention-to-treat analyses assessing effectiveness under pragmatic circumstances in all randomised patients, regardless of protocol adherence (which may be affected by the intervention itself). This provides a better estimate of the actual effects of choosing one intervention over another in clinical practice [5]. A discussion and historical timeline of key critical care studies and RCTs is available elsewhere [6]. In this review, we outline the characteristics and common challenges of conventional RCTs in critical care and discuss potential improvements and novel design features, followed by a discussion of their potential limitations.

RCTs are not without limitations, some related to the conventional design (i.e., a parallel, two-group, fixed-allocation-ratio RCT analysed with frequentist methods) and several to how many RCTs are designed and conducted. First, most critical care RCTs compare two interventions; while appropriate if only two interventions are truly of interest, oversimplifications may occur when two interventions, doses, or durations are chosen primarily to simplify trials. Second, sample size estimations for most RCTs enrolling critically ill patients in the intensive care unit (ICU) use overly optimistic effect sizes [7-10], leading to RCTs capable of providing firm evidence for very large effects, but unable to confirm or refute smaller, yet clinically relevant, effects. Consequently, critical care RCTs are frequently inconclusive from a clinical perspective, and "absence of evidence interpreted as evidence of absence" errors of interpretation [11] are common when RCTs are analysed using frequentist statistical methods and interpreted according to whether 'statistical significance' has been reached [8, 11]. Ultimately, this may lead to beneficial interventions being prematurely abandoned, and it has been argued that the conduct of clearly underpowered RCTs is unethical [12]. Third, critical care RCTs frequently focus on mortality [8]; while patient-important [13] and capable of capturing both desirable and undesirable effects, mortality conveys limited information [14] and thus requires large samples. Interventions may reduce morbidity or mortality due to one cause, but if assessed in patients at substantial risk of dying from other causes, differences may be difficult to detect [15]. Similarly, interventions may lead to negative intermediate outcomes and prolonged admission or increased treatment intensity, but not necessarily death [15]. Fourth, conventional RCTs are inflexible. While one or a few interim analyses may be conducted, they often rely on hard criteria for stopping [16]. Benefit or harm can, therefore, only be detected early if the effect is very large. Fifth, planning and initiating RCTs usually takes substantial time and funding, re-use of trial infrastructure is limited, data collection is mostly manual and requires substantial resources, and between-trial coordination is usually absent, increasing the risk of competing trials. Finally, even for conclusive RCTs, disseminating and implementing results into clinical practice requires substantial effort and time [17].

The simple solution to inconclusive, underpowered RCTs is enrolling more patients, which requires more resources and international collaboration but also increases external validity.
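To make the sample-size point above concrete, the sketch below (not from the original article; purely illustrative assumptions) uses the standard normal-approximation formula for comparing two proportions to show how the required size changes when an optimistic absolute risk reduction is replaced by a smaller but still clinically relevant one, assuming a 30% control-group mortality, a two-sided alpha of 5% and 90% power.

```python
# Illustrative sketch (not from the article): two-group sample size for a binary
# outcome using the normal approximation, showing how optimistic effect sizes
# shrink the apparent required sample size.
from scipy.stats import norm

def n_per_group(p_control: float, arr: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate number of patients per group to detect an absolute risk reduction (ARR)."""
    p_interv = p_control - arr
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_interv * (1 - p_interv)
    return int(round((z_alpha + z_beta) ** 2 * variance / arr ** 2))

baseline_mortality = 0.30  # assumed control-group event rate
for arr in (0.15, 0.05, 0.03):  # optimistic vs. smaller, still clinically relevant effects
    print(f"ARR {arr:.0%}: ~{n_per_group(baseline_mortality, arr)} patients per group")
```

Because the required size scales roughly with the inverse square of the targeted effect, modestly optimistic assumptions can make a trial look feasible while leaving it unable to confirm or refute realistic effects.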
Fewer, larger RCTs are more likely to produce conclusive evidence regarding important clinical questions than multiple smaller trials, and they are better suited to assess safety (including rare adverse events) if properly monitored. Thus, there may be a rationale for focussing on widely used interventions, as in the Mega-ROX RCT [18], which aims to compare oxygenation targets in 40,000 ICU patients to provide conclusive evidence for smaller effects than previous RCTs have been able to confirm or reject [19-21]. An alternative to very large RCTs, which are challenging for logistic, economic and administrative reasons, is standardisation or harmonisation of RCT protocols followed by pre-planned, prospective meta-analyses [22], which may also limit competition between trials. This was done for three large RCTs of early goal-directed therapy for septic shock, with results included in a conventional, trial-level meta-analysis with other RCTs and in a prospectively planned individual patient-data meta-analysis [23, 24]. Other examples include prospective meta-analyses of systemic corticosteroids and interleukin-6-receptor antagonists for critically ill patients with coronavirus disease 2019 (COVID-19) [25, 26], and a prospective meta-trial including six RCTs of awake prone positioning for patients with COVID-19 and hypoxia, with separate logistics and infrastructure but harmonised protocols and prospective analysis of combined individual participant data [27].

Importantly, RCTs should ideally be conducted as part of complete research programmes (Fig. 2), with pre-clinical studies (e.g., in-vitro and animal studies), systematic reviews, non-randomised studies and pilot/feasibility RCTs informing RCT designs, including selection of appropriate research questions, populations, interventions and comparators, outcomes and realistic effect sizes. When RCTs are completed, results should be incorporated in updated systematic reviews and clinical practice guidelines to ease implementation [28], all considering relevant patient differences and effects of concomitant interventions. For example, the SUP-ICU programme included topical and systematic reviews summarising existing evidence, a survey describing preferences and indications for stress ulcer prophylaxis, and a cohort study assessing the prevalence, risk factors and outcomes of patients with gastrointestinal bleeding before the RCT was designed [29-33]. Following the RCT, results were incorporated in updated systematic reviews and clinical practice guidelines [33-35].

Historically, most RCTs in critically ill patients have focussed on all-cause landmark mortality assessed at a single time point [8]. As mortality in critically ill patients is high, it needs to be considered regardless of the outcome chosen. However, mortality conveys limited statistical information compared with more granular outcomes, as it only contains two possible values, i.e., dead or alive regardless of health state [14, 36], and it is thus insensitive to changes leading to other clinical improvements, e.g., quicker disease resolution or better functional outcomes in survivors.
Thus, mortality requires large samples, and RCTs focussing on mortality are less frequently 'statistically significant' than RCTs focussing on other outcomes [8]. While mortality may be the most appropriate outcome in some trials, other outcomes should therefore be considered [37]. During the COVID-19 pandemic, multiple RCTs focussed on more granular, higher-information outcomes such as days alive without life support or mechanical ventilation [38-40], which capture mortality, resource use and illness duration. However, these outcomes are challenging due to differing definitions, different handling of death, potentially opposing effects on mortality and on the duration of life support in survivors, a possibly greater risk of bias in unblinded trials, and difficult statistical analysis [41-43]. Use of composite outcomes may increase power due to a larger overall number of events, but hampers interpretability, as components of different importance to patients are weighted equally and as interventions may affect individual components differently (e.g., increase intubation rates but decrease mortality) [44]. Finally, the development of core outcome sets may help in prioritising and standardising outcome selection, allowing easier comparison and synthesis of RCT results [45].

Most RCTs are planned and analysed using frequentist statistical methods, with results dichotomised as 'statistically significant' or not. Non-significant results are misinterpreted as evidence for no difference in approximately half of journal articles [46], and avoiding dichotomisation and abandoning the concept of statistical significance have been repeatedly discussed and recommended [46-49]. P values are calculated assuming that the null hypothesis is true (i.e., that there is exactly no difference, which is often implausible), and as they are indirect probabilities, they are hard to interpret (Fig. 3) [50]. As P values depend on both effect sizes and sample sizes, they will generally be small in large samples and large in small samples, regardless of the potential clinical importance of effects; thus, estimating effect sizes with uncertainty measures [i.e., confidence intervals (CIs)] may be preferable [51], although CIs are frequently misinterpreted, too [50]. As misinterpretations are common [46, 50], increased education of clinicians and researchers is likely needed [48].

Fig. 2 Overview of different study types and their role in clinical research programmes. In general, pre-clinical studies can provide necessary background or laboratory knowledge that may be used to generate hypotheses later assessed in clinical trials. Summarising existing evidence prior to the start of clinical studies is sensible, to identify knowledge gaps, avoid duplication of efforts, and inform further clinical studies. Surveys may identify existing beliefs, practices and attitudes towards further studies; cross-sectional studies and cohort studies can describe prevalence, outcomes, predictors/risk factors and current practice. Randomised clinical trials remain the gold standard for intervention comparisons but may also provide data for secondary studies not necessarily focussing on the randomised intervention comparison. Before randomised clinical trials aimed at assessing efficacy or effectiveness of an intervention are conducted, pilot/feasibility trials may be conducted to prepare larger trials and assess protocol delivery and feasibility. Following the conduct of a randomised clinical trial, relevant systematic reviews and clinical practice guidelines should be updated as necessary, to ease implementation of trial results into clinical practice. Of note, the process is not always linear and unidirectional, and different study types may be conducted at different temporal stages during a research programme. Translational research may incorporate pre-clinical and laboratory studies and clinical studies, including non-randomised cohort studies and randomised clinical trials. Similarly, clinical studies may be used to collect data or samples that are further analysed outside the clinical setting.

The issue with dichotomising results received attention following the publication of several important critical care RCTs with apparent discrepancies between statistical significance and clinical importance. The EOLIA RCT of extracorporeal membrane oxygenation (ECMO) in patients with severe acute respiratory distress syndrome (ARDS) concluded that "60-day mortality was not significantly lower with ECMO than with a strategy of conventional mechanical ventilation that included ECMO as rescue therapy" [52]. While technically correct, this may be considered overly reductionistic, as the conclusion was based on 60-day mortality rates of 35% (ECMO) vs. 46% (control) and a P value of 0.09, following a sample size calculation based on an absolute risk reduction of 20 percentage points [52]. Similarly, the ANDROMEDA-SHOCK RCT conducted in patients with septic shock concluded that "a resuscitation strategy targeting normalization of capillary refill time, compared with a strategy targeting serum lactate levels, did not reduce all-cause 28-day mortality", based on 28-day mortality rates of 34.9% vs. 43.4% and a P value of 0.06, following a sample size calculation based on an absolute risk reduction of 15 percentage points [53]. Arguably, smaller effect sizes are clinically relevant in both cases.

There has been increased interest in supplementing or replacing conventional analyses with Bayesian statistical methods [37, 54, 55], which start with probability distributions expressing prior beliefs. Once data have been collected, these are updated to posterior probability distributions [56, 57]. Different prior distributions can be used, including uninformative, vaguely informative, evidence-based, sceptical, positive or negative priors [58]. The choice of prior may be difficult and may potentially be abused to obtain the 'desired' results; typically, however, weakly informative, neutral priors with minimal influence on the results are used for the primary Bayesian analyses of critical care RCTs, with sensitivity analyses assessing the influence of other priors [59-63]. If priors are transparently reported (and ideally pre-specified), assessing whether they are reasonable is fairly easy. Posterior probability distributions can be summarised in multiple ways. Credible intervals (CrIs) directly represent the most probable values (which is how frequentist CIs are often erroneously interpreted) [50, 57], and direct probabilities of any effect size can be calculated, i.e., the probability of any benefit (relative risk < 1.00), clinically important benefit (e.g., absolute risk difference > 2 percentage points) or practical equivalence (e.g., absolute risk difference between -2 and 2 percentage points) (Fig. 3).
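As a minimal illustration of these posterior summaries, the sketch below uses a simple conjugate beta-binomial model with vague priors (not the models used in the published re-analyses) and event counts chosen to approximate the EOLIA mortality rates quoted above (assumed group sizes of 124 and 125); the probabilities of any benefit, of a clinically important benefit and of practical equivalence all come from the same posterior draws.

```python
# Minimal Bayesian sketch: beta-binomial model with vague Beta(1, 1) priors.
# Event counts are illustrative, EOLIA-like numbers, not a published analysis.
import numpy as np

rng = np.random.default_rng(42)
n_draws = 1_000_000

# Assumed data: deaths / patients in each group
deaths_interv, n_interv = 44, 124
deaths_ctrl, n_ctrl = 57, 125

# Posterior distributions of the event probabilities (conjugate update of Beta(1, 1))
p_interv = rng.beta(1 + deaths_interv, 1 + n_interv - deaths_interv, n_draws)
p_ctrl = rng.beta(1 + deaths_ctrl, 1 + n_ctrl - deaths_ctrl, n_draws)

rr = p_interv / p_ctrl   # relative risk
ard = p_interv - p_ctrl  # absolute risk difference (negative favours the intervention)

print("Posterior median RR:", np.round(np.median(rr), 2))
print("95% credible interval for ARD:", np.round(np.percentile(ard, [2.5, 97.5]), 3))
print("Pr(any benefit, RR < 1):", np.round(np.mean(rr < 1), 3))
print("Pr(clinically important benefit, ARD < -0.02):", np.round(np.mean(ard < -0.02), 3))
print("Pr(practical equivalence, |ARD| < 0.02):", np.round(np.mean(np.abs(ard) < 0.02), 3))
```

Even this simple model yields a probability of any benefit in the mid-90s percent range for these illustrative data, broadly consistent with the published re-analysis discussed next, and it answers all three questions without any additional tests.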
In Bayesian re-analyses of EOLIA and ANDROMEDA-SHOCK, there were 96% and 98% probabilities of benefit with the interventions, respectively, using minimally informative or neutral priors [59, 60]; while thresholds for adopting interventions may vary depending on resources/availability, preferences and cost, these re-analyses led to more nuanced interpretations, with the use of multiple priors allowing readers to form their own context-dependent conclusions. Several Bayesian analyses have been conducted post hoc [59, 60, 64-66], sometimes motivated by apparently clinically important effect sizes that did not reach statistical significance, while others have been pre-specified [61, 62, 67], which is preferable as selection driven by trial results is thus avoided. Nuanced interpretations avoiding dichotomisation are also possible using conventional, frequentist statistics [46-49]; however, assessments of statistical significance may be so ingrained in many clinicians, researchers and journal editors that more nuanced interpretation may be more easily facilitated with alternative statistical approaches.

Fig. 3 This figure illustrates the direction of probabilities in frequentist (conventional) and Bayesian statistical analyses. A Frequentist P values, Pr(data | H0): the probability of obtaining data (illustrated with a spreadsheet) at least as extreme as what was observed, given the assumption that the null hypothesis (illustrated with a light bulb with 0 next to it) is correct. This means that frequentist statistical tests assume that the null hypothesis (generally, that there is exactly no difference between interventions) is true, and then calculate the probability of obtaining a result at least as extreme (i.e., a difference that is at least as large as what was observed) under the assumption that there is no difference. Low P values thus provide direct evidence against the null hypothesis, but only indirect evidence related to the hypothesis of interest (i.e., that there is a difference), which makes them difficult to interpret. With more frequent analyses, there is an increased risk of obtaining results that would be surprising if the null hypothesis were true, and thus, with more tests or interim analyses, the risk of rejecting the null hypothesis due to chance (a type I error) increases. B Bayesian probabilities, Pr(H | data): the probability of any hypothesis of interest (illustrated with a light bulb; e.g., that there is benefit with the intervention) given the data collected. Bayesian probabilities thus provide direct evidence for any hypothesis of interest, and the probabilities for multiple hypotheses, e.g., any benefit, clinically important benefit, or a difference smaller than what is considered clinically important, can be calculated from the same posterior distribution without any additional analyses or multiplicity issues. If further data are collected, the posterior probability distribution is updated and replaces the old posterior probability distribution. For both frequentist and Bayesian models, these probabilities are calculated according to a defined model and all its included assumptions (and, for Bayesian analyses, also a defined prior probability distribution), all of which are assumed to be correct or appropriate for the results to be trusted. Abbreviations and explanations: data: the results/difference observed; H: a hypothesis of interest; H0: a null hypothesis (i.e., that there is no difference); Pr: probability; |: should be read as "given".
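The distinction illustrated in Fig. 3 can also be written compactly; the notation follows the figure legend, and the second line is simply Bayes' theorem:

```latex
% Frequentist: probability of the data (or more extreme), assuming the null hypothesis
P = \Pr(\text{data at least as extreme} \mid H_0)

% Bayesian: probability of a hypothesis of interest, given the observed data
\Pr(H \mid \text{data}) \;=\; \frac{\Pr(\text{data} \mid H)\,\Pr(H)}{\Pr(\text{data})}
\;\propto\; \Pr(\text{data} \mid H)\,\Pr(H)
```

The posterior on the left-hand side of the second expression is what readers typically want to know; the P value conditions on the null hypothesis instead and therefore answers a different question.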
Different evidence thresholds may be appropriate depending on the intervention, i.e., less certain evidence may be required when comparing commonly used and well-known interventions with similar costs and disadvantages, and more certain evidence may be required before implementing new, costly or burdensome interventions [68]. This is similar to how clinical practice guidelines consider the entire evidence base and the nature of the interventions being compared, including costs, burden of implementation and patient preferences [1]. While claims of "no difference" based solely on lack of statistical significance should be avoided, clearly pre-defined thresholds may still be required for approving new interventions, for declaring trials "successful" and for limiting the risk of "spin" in conclusions. Thus, a nuanced set of standardised policy responses to more nuanced evidence summaries may be warranted to ensure some standardisation of interpretation and implementation, while still considering differences in patient characteristics and preferences.

The primary RCT results generally represent the average treatment effects across all included patients; however, heterogeneity of treatment effects (HTE) [69, 70] in subpopulations is likely and, despite being difficult to prove, has been suggested in multiple previous critical care RCTs [33, 64, 71-74]. A neutral average effect may represent benefit in some patients and harm in others (Fig. 4), and a beneficial average effect may differ in magnitude across subgroups, which could influence decisions to use the intervention [1, 75, 76]. It is sometimes assumed that the risk of adverse events is similar for patients at different risk of the primary outcome [70], in which case the balance between benefits and harms of a treatment shifts with baseline risk, although this assumption may not always hold [77]. While large, pragmatic RCTs may be preferred for detecting clinically relevant average treatment effects, for guiding overall clinical practice recommendations and for public healthcare, they have been criticised for including too heterogeneous populations, often due to inclusion of general acutely ill ICU patients or ICU patients with broad syndromic conditions, e.g., sepsis or ARDS [78]. Even if present, HTE may be of limited importance if some patients benefit while others are mostly unaffected, if cost or burden of implementation is limited, or if some patients are harmed while others are mostly unaffected.

Fig. 4 Fictive example of heterogeneity of treatment effects in a randomised clinical trial. In this trial, the average treatment effect may be considered neutral, with a relative risk (RR) of 0.96 and a 95% confidence interval of 0.90-1.04 (or inconclusive, if this interval included clinically relevant effects). The trial population consists of three fictive subgroups with heterogeneity of treatment effects: A, with an intervention effect that is neutral (or inconclusive), similar to the pooled result; B, with substantial benefit from the intervention; and C, with substantial harm from the intervention. If only the average intervention effect is assessed, it may be concluded, based on the apparently neutral overall result, that whether the intervention or control is used has little influence on patient outcomes, and it may be missed that the intervention provides substantial benefit in some patients and substantial harm in others. Similarly, an intervention with an overall beneficial effect may be more beneficial in some subgroups than in others and may cause harm in some patients, and vice versa.

Most RCTs assess potential HTE by conducting conventional subgroup analyses despite important limitations [79]. As substantially more patients are required to assess subgroup differences than for primary analyses, most subgroup analyses are substantially underpowered and may miss clinically relevant differences [79]. In addition, larger numbers of subgroup analyses increase the risk of chance findings [79]. Conventional subgroup analyses assess one characteristic at a time, which may not reflect biology or clinical practice, where multiple risk factors are often synergistic or additive [79], or where effect modifiers may be dynamic and change during the illness course. Finally, conventional subgroup analyses frequently dichotomise continuous variables, which limits power [80] and makes assessment of gradual changes in responses difficult. Alternative and better solutions for assessing HTE include predictive HTE analysis, where a prediction model incorporating multiple relevant clinical variables predictive of either the outcome or the change in outcomes with the intervention is used [77]; use of clustering algorithms and clinical knowledge to identify subgroups and distinct clinical pheno-/endotypes for syndromic conditions [64, 81, 82]; assessment of interactions with continuous variables without categorisation [63-65]; use of Bayesian hierarchical models, where subgroup effect estimates are partially pooled, limiting the risk of chance findings in smaller subgroups [63-65]; and adaptive enrichment [83, 84], discussed below. Improved and more granular analyses seem the most realistic way towards "personalised" medicine [77] but require more data and thus overall larger RCTs. Regardless of the approach, appropriate caution should always be employed when interpreting subgroup and HTE analyses.

Adaptive trials are more flexible and can be more efficient than conventional RCTs [85], while being designed to have similar error rates. Adaptive trials often, but not always, use Bayesian statistical methods, which are well suited for continuous assessment of accumulating evidence [83, 86]. Adaptive trials can be adaptive in multiple ways [87]. First, pre-specified decision rules (for stopping for inferiority/superiority/equivalence/futility) allow trials to run without pre-specified sample sizes or to revise target sample sizes, thus allowing trials to run until just enough data have been accumulated. Expected sample sizes are estimated using simulation; if the expected baseline risks and effect sizes are incorrect, the final sample sizes will differ from expectations, but adaptive trials are still able to continue until sufficient evidence is obtained (a simulation sketch of such stopping rules is shown below). Further, adaptive sample sizes are better suited for new diseases, where absent or limited existing knowledge complicates sample size calculations. For example, conducting the OSCAR RCT assessing high-frequency oscillation in ARDS using a Bayesian adaptive design could have reduced the number of patients and total deaths by > 15% [88]. Second, trials may be adaptive regarding the interventions assessed; multiple interventions or doses may be studied simultaneously or in succession, and the least promising may be dropped while assessment of better-performing interventions continues until conclusive evidence has been obtained [83, 86].
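The kind of pre-trial simulation mentioned above can be sketched as follows. This is a deliberately simplified, hypothetical two-arm design (not any specific trial's algorithm): a beta-binomial model with interim looks every 200 patients, stopping for superiority or futility at assumed posterior-probability thresholds, which yields an estimate of the expected sample size under an assumed true effect.

```python
# Hypothetical simulation sketch of a Bayesian adaptive two-arm trial:
# interim analyses every 200 patients, stopping for superiority or futility
# based on the posterior probability that the intervention reduces mortality.
import numpy as np

rng = np.random.default_rng(7)

def simulate_trial(p_ctrl=0.30, p_interv=0.25, batch=200, max_n=4000,
                   superiority=0.99, futility=0.10, post_draws=4000):
    events = np.zeros(2)  # deaths in [control, intervention]
    n = np.zeros(2)       # patients in [control, intervention]
    while n.sum() < max_n:
        # Enrol one batch with 1:1 allocation and simulate outcomes
        new_n = np.array([batch // 2, batch // 2])
        new_events = rng.binomial(new_n, [p_ctrl, p_interv])
        n += new_n
        events += new_events
        # Posterior draws for each arm's mortality risk (Beta(1, 1) priors)
        post_ctrl = rng.beta(1 + events[0], 1 + n[0] - events[0], post_draws)
        post_int = rng.beta(1 + events[1], 1 + n[1] - events[1], post_draws)
        pr_benefit = np.mean(post_int < post_ctrl)
        if pr_benefit > superiority:
            return n.sum(), "superiority"
        if pr_benefit < futility:
            return n.sum(), "futility"
    return n.sum(), "max size reached"

results = [simulate_trial() for _ in range(500)]
sizes = np.array([r[0] for r in results])
prop_superior = np.mean([r[1] == "superiority" for r in results])
print(f"Expected sample size: {sizes.mean():.0f}")
print(f"Stopped for superiority in {prop_superior:.0%} of simulations")
```

In a real design, many such simulations across plausible scenarios (including a scenario with no effect) would be used to check that the stopping thresholds keep the risk of false-positive stopping acceptably low and to estimate expected sample sizes.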
Adaptive selection among interventions has been used for dose-finding trials: the SEPSIS-ACT RCT initially compared three selepressin doses with placebo, followed by selection of the best dose for further comparison [67], and the ongoing adaptive phase II/III REVOLUTION trial [89] compares antiviral drugs with placebo, focussing on reducing viral loads in its first phases and on increasing the number of days without respiratory support in the third phase. Similarly, interventions may be added during the trial, as in the platform trials discussed below. Third, trials may use response-adaptive randomisation to update allocation ratios based on accumulating evidence, thereby increasing the chance that patients will be allocated to more promising interventions before conclusive evidence has been reached. This can increase efficiency in some situations, but it can also decrease it, as in two-armed RCTs and some multi-armed RCTs [90, 91]. Thus, it has been argued that while response-adaptive randomisation may benefit patients inside the trial, it may not always be preferable, as it can in some cases lead to slower discovery of interventions that would benefit patients outside the trial [91, 92]. Finally, trials may use adaptive enrichment to adapt or restrict inclusion criteria to focus on patients more likely to benefit, or use different allocation ratios for different subpopulations [84, 87].

Platform trials are RCTs that, instead of focussing on single intervention comparisons, focus on a disease or condition and assess multiple interventions according to a master protocol [83, 93]. Platform trials may run perpetually, with interventions added or dropped continuously [83, 94], and often employ multiple adaptive features and probabilistic decision rules [83, 93]. The interventions assessed can be nested in multiple domains; e.g., REMAP-CAP assesses interventions in patients with severe community-acquired pneumonia in several domains, including antibiotics, corticosteroids and immune-modulating therapies. By assessing multiple interventions simultaneously and by re-using controls for comparisons with multiple interventions, platform trials can be more efficient than sequential two-armed comparisons and than simpler adaptive trials [94, 95]. Adaptive platform trials are capable of "learning while doing" and potentially allow tighter integration of clinical research and clinical practice, i.e., a better exploration-exploitation trade-off (learning versus doing based on existing knowledge) [83, 96]. If response-adaptive randomisation is used, the probabilities of allocation to potentially superior interventions increase as evidence accumulates, and interventions that are deemed superior may immediately be implemented as standard of care by becoming the new control group [83]. Thus, implementation of results into practice, at least in participating centres, may become substantially faster. While platform trials have only recently been used in critically ill patients, the RECOVERY and REMAP-CAP trials have led to substantial improvements in the treatment of patients with COVID-19 within a short time frame [38, 97-99], although this may be explained not only by the platform design but also by the case load and urgency of the situation.
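One common way to implement response-adaptive randomisation is Thompson-type sampling, in which allocation probabilities are proportional to the posterior probability that each arm is best. The hypothetical sketch below (beta-binomial model, illustrative parameters only, not any specific platform's algorithm) shows how allocation drifts towards a better-performing arm as outcome data accumulate.

```python
# Hypothetical Thompson-sampling sketch: allocation probabilities are updated
# from the posterior probability that each arm has the lowest mortality.
import numpy as np

rng = np.random.default_rng(11)

true_mortality = np.array([0.30, 0.24, 0.30])  # assumed true risks for 3 arms
events = np.ones(3)      # Beta(1, 1) priors tracked as pseudo-counts of deaths
non_events = np.ones(3)  # pseudo-counts of survivors

for interim in range(10):  # 10 interim updates
    # Probability that each arm has the lowest mortality, from posterior draws
    draws = rng.beta(events, non_events, size=(10_000, 3))
    pr_best = np.bincount(draws.argmin(axis=1), minlength=3) / 10_000
    # Allocate the next 300 patients in proportion to pr_best, then observe outcomes
    alloc = rng.multinomial(300, pr_best)
    deaths = rng.binomial(alloc, true_mortality)
    events += deaths
    non_events += alloc - deaths
    print(f"Interim {interim + 1}: allocation probabilities {np.round(pr_best, 2)}")
```

Real platform trials typically restrict such adaptation, for example by fixing or bounding the control-arm allocation and using burn-in periods, for the reasons discussed below.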
Comparable to how data from multiple conventional RCTs may be prospectively planned to be analysed together, data from multiple platform trials may be combined in multiplatform trials, with benefits and challenges similar to those of individual platform trials and of standardisation across individual, conventional RCTs [100].

In addition to the tighter integration between research and clinical practice that may come with adaptive platform trials, and that ultimately may lead to learning healthcare systems [83], integration may be increased in other ways. Trials may be embedded in electronic health records, where automatic integration may lead to substantial logistic improvements regarding data collection, integration of randomisation modules, and alerts about potentially eligible patients. This may improve logistics and data collection and facilitate closer integration between research and practice [61, 83]. Similarly, RCTs may use data already collected in registers or clinical databases, substantially decreasing the data-collection burden, as has been done in, e.g., the PEPTIC cluster-randomised, register-embedded trial [74]. Finally, fostering an environment where clinical practice and clinical research are tightly integrated, and where enrolment in clinical trials is considered an integral part of clinical practice in individual centres by clinicians, patients and relatives, may lead to faster improvements of care for all patients.

While the methods discussed may mitigate some challenges of conventional RCTs, they are not without limitations (Table 1). First, larger trials come with challenges regarding logistics, regulatory requirements (including approvals, consent procedures and requirements for reporting adverse events), economy, collaboration, between-centre heterogeneity in other interventions administered, and potential challenges related to academic merits. Second, standardisation and meta-analyses may require compromises or an increased data-collection burden in some centres, or may not be possible due to between-trial differences. Third, while complete research programmes may lead to better RCTs, they may not be possible in, e.g., emergency situations such as pandemics caused by new diseases. Fourth, using outcomes other than mortality comes with difficulties relating to statistical analysis, handling of death and possibly interpretation, and mortality should not be abandoned for outcomes that are not important to patients. Fifth, while avoiding dichotomisation of results and using Bayesian methods have some advantages, they may lead to larger differences in how evidence is interpreted and possibly to lower thresholds for accepting new evidence if adequate caution is not employed. In addition, switching to Bayesian methods requires additional education of clinicians, researchers and statisticians, and specification of priors and estimation of required sample sizes add complexity. Sixth, while improved analyses of HTE have benefits compared with conventional subgroup analyses, the risk of chance findings and the lack of power remain. Finally, adaptive and platform trials come with logistic and practical challenges, as listed in Table 1 and discussed below. As adaptive and platform trials are substantially less common than more conventional RCTs, there is less methodological guidance, and interpretation may be more difficult for readers.
Fortunately, several successful platform trials have received substantial coverage in the critical care community [64, 99], and an extension of the Consolidated Standards of Reporting Trials (CONSORT) statement for adaptive trials was recently published [101]. Planning adaptive and platform trials comes with additional logistic and financial challenges related to the current project-based funding model, which is better suited for fixed-size RCTs [83, 85, 93]. While adaptive trials are more flexible, large samples may still be required to firmly assess all clinically relevant effect sizes, which may not always be feasible. In addition, statistical simulation is required instead of simple sample size estimations [83, 94]. Further, the regulatory framework for adaptive and platform trials is less well developed than for conventional RCTs, and regulatory approvals may thus be more complex and time-consuming [83]. There are also challenges with the adaptive features, and careful planning is necessary to avoid aggressive adaptations to random, early fluctuations. Initial "burn-in" phases, where interventions are not compared until a sufficient number of patients have been enrolled, can be used, as can more restrictive rules for response-adaptive randomisation and arm dropping early in the trial [94].

Table 1 Advantages and disadvantages/challenges of the discussed approaches

Larger trials (increased sample sizes)
Advantages:
- Decreased uncertainty, increased precision
- Easier to detect potential subgroup differences
- Less chance of inconclusive results (i.e., greater precision and less uncertainty); results from fewer large RCTs are easier to compare than results from many smaller RCTs
- Easier to address safety concerns if properly monitored, as larger trials have higher chances of detecting rare adverse events
- Increased generalisability/external validity in multicentre/international RCTs
Disadvantages/challenges:
- Economic: trial cost and optimal use of overall research resources
- Collaboration: increased coordination workload; different regulatory requirements, including different handling of consent procedures and reporting of adverse events; challenges with coordination due to language and time zone differences
- Comparability: potential differences in standard of care/available resources in international trials
- Academic challenges: fewer individually led projects due to increased collaboration; group authorships may be less attractive in settings where individual author positions are valued (e.g., grant applications)

Standardisation/harmonisation of trial protocols and prospective meta-analyses
Advantages:
- Increased comparability/less heterogeneity
- Less competition between trials
- Meta-analysis may be more sensible; less statistical inconsistency may lead to more precise results
- Prospective meta-analyses or meta-trials may provide quicker answers than individual conventional RCTs and meta-analyses, especially if trialists share data earlier and adequate certainty is obtained before individual trials finish
Disadvantages/challenges:
- Agreement between investigators on the design and variables could be challenging and time-consuming; compromises may be necessary for standardisation; core outcome sets may improve this
- Data not routinely collected in one setting may be required due to standardisation, potentially increasing workload in some centres

Improved analyses of heterogeneity of treatment effects (HTE)
Disadvantages/challenges:
- Subgroup or HTE analyses, regardless of approach, generally require more patients; trials may still be underpowered to detect differences
- The more analyses conducted, the greater the risk of chance findings; this may be mitigated, but not completely solved, by the discussed approaches
- Requires careful consideration of whether HTE analyses should be conducted on the absolute or the relative scale; when the baseline risk differs between groups, there will always be HTE on either the relative or the absolute scale (often, intervention effects are most consistent on the relative scale)

Adaptation
Advantages:
- Adaptive sample sizes/stopping rules may lead to optimally sized trials, more likely to reach conclusive evidence
- Adaptive arm adding/dropping may increase overall trial efficiency
- Adaptive randomisation may increase the chance of receiving better interventions in some situations, which may make trial participation more attractive to patients
- Adaptive enrichment may enable trials to better detect differences in responses and tailor interventions to different subpopulations or phenotypes
Disadvantages/challenges:
- Logistic and economic challenges in planning and funding trials without fixed sample sizes; alternative financing models may be necessary
- Planning may be more difficult; instead of simple sample size calculations, advanced statistical simulation may be necessary to estimate required sizes and risks of random errors, requiring increased collaboration with statisticians and increased training of clinician-researchers
- Pre-specified criteria for stopping/adaptation are necessary and may be difficult to define
- Adaptation requires more real-time data collection and verification, increasing the data registration burden on individual sites
- Adaptations may be complex to implement and communicate
- Outcomes with longer follow-up lead to slower adaptation than shorter-term outcomes, which may add additional complexity; consequently, the use of shorter-term outcomes to guide adaptive trials instead of the outcome of primary interest may be considered in some situations
- Risk of adaptations based on chance findings/fluctuations may require restraints on adaptation to avoid random errors, which is difficult to plan and handle
- While adaptive trials may be more likely to reach conclusive evidence if continued until a stopping rule is reached, they may need to be substantially larger to confirm or refute all clinically relevant effects (as is the case for conventional trials, too)

Adaptive platform trials
Advantages:
- Increased efficiency, and potentially similar advantages as for adaptive conventional trials
- May decrease time to clinical adoption and enable "learning while doing"
- Re-use of trial infrastructure and embedding in electronic health records and clinical practice may increase efficiency and decrease cost
- Potential improvement of informed consent procedures compared with consent when co-enrolment in multiple trials occurs
- Familiarity and consistency with a common platform design may be easier in practice than repeated conduct of independent RCTs
Disadvantages/challenges:
- Same challenges as adaptive trials in general
- Potential regulatory issues; a less well-known design may complicate approvals
- May take longer to set up and implement than regular trials
- More complex: may be more difficult to implement and to train staff, more difficult to explain to patients, relatives and other stakeholders (with potential complication of consent procedures), and may be more difficult to work with for non-researcher clinicians
- Standards for conduct and reporting are less developed; results may be more difficult to report and explain
- Additional complexity with time drift/temporal variation, response-adaptive randomisation and potential re-use of non-concurrent controls requires adequate statistical handling to avoid bias
- Potential challenges with workload/stress of perpetual trials
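One entry in Table 1 is worth a brief numerical illustration: with a constant relative effect, patients at different baseline risk necessarily experience different absolute effects, so HTE is always present on at least one scale (illustrative numbers only):

```latex
% Constant relative effect, different baseline risks => different absolute effects
\mathrm{RR} = 0.8:\qquad
\begin{aligned}
\text{baseline risk } 40\%: \;& 0.8 \times 0.40 = 0.32 \;\Rightarrow\; \mathrm{ARD} = 8\ \text{percentage points}\\
\text{baseline risk } 10\%: \;& 0.8 \times 0.10 = 0.08 \;\Rightarrow\; \mathrm{ARD} = 2\ \text{percentage points}
\end{aligned}
```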
In addition to burn-in phases and more restrictive early rules, simulation may be required to ensure that the risk of stopping due to chance is kept at an acceptable level, analogous to alpha-spending functions in conventional, frequentist trials [102]. Temporal changes in case mix or in concomitant interventions may influence results in all RCTs, but these issues are complicated further if adaptive randomisation or arm dropping/adding is used, thus requiring additional consideration, especially if patients randomised at earlier stages are re-used for comparisons with more recently introduced interventions [83]. Further, comparisons with non-concurrent controls may affect interpretation and introduce bias if inappropriately handled [103]. Adaptations require continuous protocol amendments and additional resources to implement and communicate, and may require additional training when new interventions are added [104]. Finally, while platform trials come with potential logistic and efficiency benefits, they may be more time-consuming initially, and the lack of a clear-cut "finish line" may stress involved personnel [105], although familiarity and consistency may also have the opposite effect once implemented, compared with the repeated initiation, running and closure of consecutive, independent RCTs.

We expect that the discussed methodological features will become more common in future critical care RCTs, that this will improve efficiency and flexibility, and that it may help answer more complex questions. These methods come with challenges, though, and conventional RCTs may be preferred for simple, straightforward comparisons. Some challenges may be mitigated as these designs become more familiar to clinicians and researchers, and as additional methodological guidance is developed. We expect the future critical care RCT landscape to be a mix of relatively conventional RCTs and more advanced, adaptive trials. We propose that researchers consider the optimal methodological approach carefully when planning new RCTs. While different designs may be preferable in different situations, the choice should be based on careful thought instead of convenience or tradition, and more advanced approaches may be necessary in some situations to move critical care RCTs and practice forward.

In this review, we have discussed the challenges and limitations of conventional RCTs, along with recent developments, novel methodological approaches and their advantages and potential disadvantages.
Table 1 (continued)

Embedding trials in clinical practice and registers
Advantages:
- Tighter integration of clinical practice and clinical trials may lead to faster improvements in patient care
- Embedding clinical trials in electronic health records may reduce data-collection burden and cost and alert clinicians and researchers to eligible patients and clinical events
- Register-based trials (including register-based cluster-randomised trials) may reduce data-collection burden and trial cost by using clinical registers already in place
Disadvantages/challenges:
- Register-based data collection may not be as easily standardised without changing individual registers; compromises based on availability in registers may be necessary
- Embedding trials in registers or electronic health records poses additional challenges with different electronic health record software and across borders
- Data quality and completeness in registers may not be as good as when data are prospectively collected for all variables
- Limited long-term outcome data are generally available in registers due to the additional complexity of data collection

HTE, heterogeneity of treatment effects; RCT, randomised clinical trial

References
1. Use of the GRADE approach in systematic reviews and guidelines
2. Evolution of clinical trials throughout history
3. Evolution of clinical research: a history before and beyond James Lind
4. Overall bias and sample sizes were unchanged in ICU trials over time: a metaepidemiological study
5. Pragmatic trials
6. Clinical research: from case reports to international multicenter clinical trials
7. Effect sizes in ongoing randomized controlled critical care trials
8. Outcomes and statistical power in adult critical care randomized trials
9. "Paying the Piper": the downstream implications of manipulating sample size assumptions for critical care randomized control trials
10. Powering bias and clinically important treatment effects in randomized trials of critical illness
11. Statistics notes: absence of evidence is not evidence of absence
12. Statistics and ethics in medical research III: how large a sample
13. Patient-important outcomes in randomized controlled trials in critically ill patients: a systematic review
14. The added value of ordinal analysis in clinical trials: an example in traumatic brain injury
15. Is mortality a useful primary end point for critical care trials
16. Comparison of Bayesian and frequentist group-sequential clinical trial designs
17. From best evidence to best practice: effective implementation of change in patients' care
18. MEGA-ROX (ANZICS CTG endorsed study)
19. Conservative oxygen therapy during mechanical ventilation in the ICU
20. Higher versus lower fraction of inspired oxygen or targets of arterial oxygenation for adults admitted to the intensive care unit
21. Lower or higher oxygenation targets for acute hypoxemic respiratory failure
22. Prospective meta-analysis using individual patient data in intensive care medicine
23. A systematic review and meta-analysis of early goal-directed therapy for septic shock: the ARISE, ProCESS and ProMISe investigators
24. Early, goal-directed therapy for septic shock: a patient-level meta-analysis
25. The WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group (2020) Association between administration of systemic corticosteroids and mortality among critically ill patients with COVID-19: a meta-analysis
26. The WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group (2021) Association between administration of IL-6 antagonists and mortality among patients hospitalized for COVID-19: a meta-analysis
27. Awake prone positioning for COVID-19 acute hypoxaemic respiratory failure: a randomised, controlled, multinational, open-label meta-trial
28. Intensive care medicine rapid practice guidelines (ICM-RPG): paving the road of the future
29. Stress ulcer prophylaxis in the intensive care unit: is it indicated? A topical systematic review
30. Stress ulcer prophylaxis versus placebo or no prophylaxis in critically ill patients: a systematic review of randomised clinical trials with meta-analysis and trial sequential analysis
31. Stress ulcer prophylaxis in the intensive care unit: an international survey of 97 units in 11 countries
32. Prevalence and outcome of gastrointestinal bleeding and use of acid suppressants in acutely ill adult intensive care patients
33. Pantoprazole in patients at risk for gastrointestinal bleeding in the ICU
34. Stress ulcer prophylaxis with proton pump inhibitors or histamine-2 receptor antagonists in adult intensive care patients: a systematic review with meta-analysis and trial sequential analysis
35. Gastrointestinal bleeding prophylaxis for critically ill patients: a clinical practice guideline
36. Statistical thinking: information gain from using ordinal instead of binary outcomes
37. Contemporary strategies to improve clinical trial design for critical care research: insights from the First Critical Care Clinical Trialists Workshop
38. Effect of hydrocortisone on mortality and organ support in patients with severe COVID-19: the REMAP-CAP COVID-19 corticosteroid domain randomized clinical trial
39. Low-dose hydrocortisone in patients with COVID-19 and severe hypoxia: the COVID STEROID randomised, placebo-controlled trial
40. Effect of 12 mg vs 6 mg of dexamethasone on the number of days alive without life support in adults with COVID-19 and severe hypoxemia: the COVID STEROID 2 randomized trial
41. Patient-important outcomes other than mortality in recent ICU trials: protocol for a scoping review
42. Ventilator-free day outcomes can be misleading
43. Reappraisal of ventilator-free days in critical care research
44. The "utility" in composite outcome measures: measuring what is important to patients
45. Progress on core outcome sets for critical care research
46. Scientists rise up against statistical significance
47. Sifting the evidence: what's wrong with significance tests?
48. Shifting the focus away from binary thinking of statistical significance and towards education for key stakeholders: revisiting the debate on whether it's time to de-emphasize or get rid of statistical significance
49. The ASA's statement on p-values: context, process, and purpose
50. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
51. To test or to estimate? P-values versus effect sizes
52. Extracorporeal membrane oxygenation for severe acute respiratory distress syndrome
53. Effect of a resuscitation strategy targeting peripheral perfusion status vs serum lactate levels on 28-day mortality among patients with septic shock
54. Using Bayesian methods to augment the interpretation of critical care trials. An overview of theory and example reanalysis of the alveolar recruitment for acute respiratory distress syndrome trial
55. A manifesto for the future of ICU trials
56. A gentle introduction to the comparison between null hypothesis testing and Bayesian analysis: reanalysis of two randomized controlled trials
57. Doing Bayesian data analysis
58. Seven items were identified for inclusion when reporting a Bayesian analysis of a clinical study
59. Extracorporeal membrane oxygenation for severe acute respiratory distress syndrome and posterior probability of mortality benefit in a post hoc Bayesian analysis of a randomized clinical trial
60. Effect of a resuscitation strategy targeting peripheral perfusion status vs serum lactate levels on 28-day mortality among patients with septic shock: a Bayesian reanalysis of the ANDROMEDA-SHOCK trial
61. The REMAP-CAP (randomized embedded multifactorial adaptive platform for community-acquired pneumonia) study: rationale and design
62. Dexamethasone 12 mg versus 6 mg for patients with COVID-19 and severe hypoxaemia: a pre-planned, secondary Bayesian analysis of the COVID STEROID 2 trial
63. Lower versus higher oxygenation targets in ICU patients with severe hypoxaemia: secondary Bayesian analyses of mortality and heterogeneous treatment effects in the HOT-ICU trial
64. Heterogeneous effects of alveolar recruitment in acute respiratory distress syndrome: a machine learning reanalysis of the Alveolar Recruitment for Acute Respiratory Distress Syndrome Trial
65. Heterogeneity of treatment effect of prophylactic pantoprazole in adult ICU patients: a post hoc analysis of the SUP-ICU trial
66. Perioperative haemodynamic therapy for major gastrointestinal surgery: the effect of a Bayesian approach to interpreting the findings of a randomised controlled trial
67. Effect of selepressin vs placebo on ventilator- and vasopressor-free days in patients with septic shock: the SEPSIS-ACT randomized clinical trial
68. When should clinicians act on non-statistically significant results from clinical trials
69. Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence
70. Implications of heterogeneity of treatment effect for reporting and analysis of randomized trials in critical care
71. Albumin replacement in patients with severe sepsis or septic shock
72. Restrictive or liberal red-cell transfusion for cardiac surgery
73. A comparison of albumin and saline for fluid resuscitation in the intensive care unit
74. Alberta Health Services Critical Care Strategic Clinical Network, and the Irish Critical Care Trials Group (2020) Effect of stress ulcer prophylaxis with proton pump inhibitors vs histamine-2 receptor blockers on in-hospital mortality among ICU patients receiving invasive mechanical ventilation: the PEPTIC randomized clinical trial
75. GRADE guidelines: 14. Going from evidence to recommendations: the significance and presentation of recommendations
76. GRADE guidelines: 15. Going from evidence to recommendation: determinants of a recommendation's direction and strength
77. The predictive approaches to treatment effect heterogeneity (PATH) statement: explanation and elaboration
78. Time to stop randomized and large pragmatic trials for intensive care medicine syndromes: the case of sepsis and acute respiratory distress syndrome
79. Three simple rules to ensure reasonably credible subgroup analyses
80. The cost of dichotomising continuous variables
81. Early sedation with dexmedetomidine in ventilated critically ill patients and heterogeneity of treatment effect in the SPICE III randomised controlled trial
82. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis
83. Adaptive platform trials: definition, design, conduct and reporting considerations
84. Adaptive enrichment designs for clinical trials
85. Adaptive designs in clinical trials: why use them, and how to run and report them
86. Arguing for adaptive clinical trials in sepsis
87. Adaptive designs in clinical trials in critically ill patients: principles, advantages and pitfalls
88. Using Bayesian adaptive designs to improve phase III trials: a respiratory care example
89. Antiviral agents against COVID-19 infection (REVOLUTION)
90. Comparison of methods for control allocation in multiple arm studies using response adaptive randomization
91. Statistical controversies in clinical research: scientific and ethical problems with adaptive randomization in comparative clinical trials
92. A simulation study of outcome adaptive randomization in multi-arm clinical trials
93. The platform trial: an efficient strategy for evaluating multiple treatments
94. An overview of platform trials with a checklist for clinical readers
95. Efficiencies of platform clinical trials: a vision of the future
96. Optimizing the trade-off between learning and doing in a pandemic
97. Interleukin-6 receptor antagonists in critically ill patients with Covid-19
98. Tocilizumab in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial
99. Dexamethasone in hospitalized patients with Covid-19
100. Therapeutic anticoagulation with heparin in critically ill patients with Covid-19
101. The adaptive designs CONSORT extension (ACE) statement: a checklist with explanation and elaboration guideline for reporting randomised trials that use an adaptive design
102. Do we need to adjust for interim analyses in a Bayesian adaptive trial design
103. Platform trials: beware the noncomparable control group
104. This is a platform alteration: a trial management perspective on the operational aspects of adaptive platform trials
105. Mind the gap? The platform trial as a working environment