key: cord-0276657-zsp24d9l authors: Hoffman, K. L.; Schenck, E. J.; Satlin, M. J.; Whalen, W.; Pan, D.; Williams, N.; Diaz, I. title: Corticosteroids in COVID-19: Optimizing Observational Research through Target Trial Emulations date: 2022-05-29 journal: nan DOI: 10.1101/2022.05.27.22275037 sha: f1a2579c9a09d49015873c250730df1a5ad39908 doc_id: 276657 cord_uid: zsp24d9l Background: Observational research provides a unique opportunity to learn causal effects when randomized trials are not available, but obtaining the correct estimates hinges on a multitude of design and analysis choices. We illustrate the advantages of modern causal inference methods and compare to standard research practice to estimate the effect of corticosteroids on mortality in hospitalized COVID-19 patients in an observational dataset. We use several large RCTs to benchmark our results. Methods: Our retrospective data source consists of 3,293 COVID-19 patients hospitalized at New York Presbyterian March 1-May 15, 2020. We design our study using the Target Trial Emulation framework. We estimate the effect of an intervention consisting of 6 days of corticosteroids administered at the time of severe hypoxia and contrast with an intervention consisting of no corticosteroids administration. The dataset includes dozens of time-varying confounders. We estimate the causal effects using a doubly robust estimator where the probabilities of treatment, outcome, and censoring are estimated using flexible regressions via super learning. We compare these analyses to standard practice in clinical research, consisting of two main methods: (i) Cox models for an exposure of corticosteroids receipt within various time windows of hypoxia, and (ii) a Cox time-varying model where the exposure is daily administration of corticosteroids starting at the time of hospitalization. 
Results: The effect in our target trial emulation is qualitatively identical to an RCT benchmark, estimated to reduce 28-day mortality from 32% (95% confidence interval: 31-34) to 23% (21-24). The estimated effect from meta-analyses of RCTs for corticosteroids is an odds ratio of 0.66 (0.53-0.82)(1). Hazard ratios from the Cox models range in size and direction from 0.50 (0.41-0.62) to 1.08 (0.80-1.47) and all study designs suffer from various forms of bias. Conclusion: We demonstrate in a case study that clinical research based on observational data can unveil true causal relations. However, the correctness of these effect estimates requires designing and analyzing the data based on principles which are different from the current standard in clinical research. The widespread communication and adoption of these design and analytical techniques is of high importance for the improvement of clinical research based on observational data. Observational databases are an invaluable resource for studying causality when randomized controlled trials (RCTs) are infeasible or unavailable. Landmark examples of studies using observational data to conclusively establish causal effects include the effect of smoking on lung cancer (1) and the efficacy of early (versus delayed) antiretroviral therapy in patients with human immunodeficiency virus (HIV) (2) . While observational data provides a unique resource to answer critical questions, the correctness of the conclusions gleaned from analyses of observational data hinges on the careful consideration of study design principles and choice of estimation methodology. Threats to the validity of causality include unmeasured confounding, incorrectly modeling measured confounding, time-alignment biases such as immortal time and selection bias, and incorrectly accounting for time-dependent confounders, among others (3) . 
Examples of studies that suffer from selection and immortal time bias, model misspecification bias, and that do not correctly handle time-varying confounding are pervasive in the clinical literature (4) (5) (6) . A major reason for the failure to address these biases is the widespread adoption of a so-called model-first approach to observational research. In a model-first approach, a model is first chosen according to the data type and outcome of interest, and the quantity used to answer the research question is automatically determined by the model choice. For example, when faced with a time-to-event outcome, clinical researchers and analysts automatically employ a Cox regression model. It is common practice to then use the coefficients of the model or transformations thereof (e.g., hazard ratios) as the answer to the clinical question of interest. A model-first approach induces multiple problems for the estimation of causal effects (7) . First, model parameters often do not represent quantities of scientific interest (e.g, it has been recently established that hazard ratios often do not represent well-defined causal effects (8) ). Second, assumptions such as the proportional hazards assumption used in Cox models are rarely correct in medical research since hazards cannot be proportional when a treatment effect changes over time (9) . Third, regression models cannot correctly handle time-varying confounders, such as that arising from the time-dependent feedback between CD4 counts, antiretroviral therapy, and mortality in HIV (3) . Fourth, the model-first approach results in a tendency to report and interpret all coefficients in the model as causal effects, which is a mistake known as the Table 2 fallacy (10) . 
Lastly, model-first analyses often employ less-than-optimal model selection techniques such as stepwise regression or adding only significant confounders from univariate analyses, which leads to improper variance estimates and may result in model misspecification bias (11) . Recent developments in the causal inference literature provide researchers with a number of tools that can be used to alleviate some of the above biases. We hypothesize that using these new tools will have improved success in recovering causal effects as opposed to a model-first approach. To test this hypothesis, we use a retrospective cohort of 3,293 COVID-19 patients hospitalized at NewYork-Presbyterian March 1-May 15, 2020. Lack of guidance for clinical practice at the beginning of the pandemic meant that high variability existed in the administration and timing of corticosteroids ( Figure 1 ). While high variability in provider practice aids in the estimation of causal effects due to higher "experimentation", the resulting complex longitudinal treatment patterns can complicate study design and analytical methods. Correctly handling these complex treatment-outcome patterns requires the use of design and analysis methods from the causal inference literature to avoid time-alignment biases. Our dataset together with results from numerous randomized clinical trials (RCTs) on corticosteroids provide a unique opportunity to assess various design and analysis methods. We benchmark the results of our analyses against the effect measures obtained in The WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group's prospective meta-analysis of RCTs for corticosteroids (12) . Our proposed approach using modern causal inference methodology is a question-first approach. 
It proceeds by (i) using the target trial framework of Hernán and Robins (13) and the roadmap for causal inference outlined by Petersen and van der Laan (14) to design the study in a manner that avoids time-alignment and other design biases, and (ii) using optimal machine learning estimators that help mitigate model misspecification bias (15, 16). The target trial approach and the roadmap for causal inference seek to apply design principles from RCTs to observational data analysis. This framework can help clarify the research question, define an estimand of interest, determine study eligibility requirements, and identify enrollment and follow-up times. Furthermore, this framework can aid in deciding whether enough confounder data are available to sufficiently account for confounding by indication, selection biases, etc. (17, 18). The target trial framework has been recently used in numerous studies whose results have been corroborated by RCTs, including studies aiming to assess the effects of platelet inhibition on myocardial infarction (19), statins on mortality for individuals (20), as well as early convalescent plasma therapy (21) and use of tocilizumab for COVID-19 (22). Instead of defaulting to effect measures provided by regression models (e.g., hazard ratios), a question-first approach begins by defining a target of inference that answers the scientific question of interest. This is the so-called estimand, i.e., the quantity to be estimated. The estimand principle is advocated by the Food and Drug Administration for the analysis of randomized trials (23), and we argue that it should also be followed by studies that seek to establish causal relations by analyzing observational data. Examples of interesting estimands include the odds ratio, risk ratio, average treatment effect, etc. After the estimand of interest is chosen, the most appropriate statistical technique for said estimand may be chosen. 
Here we advocate for the use of a doubly robust estimation technique from the statistics literature that can help researchers mitigate model misspecification biases that arise when adjusting for a potentially large set of confounding variables. In accordance with the target trial framework, we now describe one randomized trial we could hypothetically run to analyze the effect of corticosteroids for severely ill COVID-19 patients. The inclusion criteria cover all adult patients with COVID-19 who were admitted to NewYork-Presbyterian Hospital (NYPH)/Weill Cornell, Lower Manhattan Hospital, or NYPH Queens. Cases are confirmed through reverse-transcriptase-polymerase chain-reaction assays performed on nasopharyngeal swab specimens. Patients who have chronic use of corticosteroids prior to hospitalization or who are transferred into the hospital from an outside hospital are excluded. In our target trial, patients are randomized on their first day of hospitalization to receive either (1) standard of care therapy (without corticosteroids) or (2) standard of care plus a corticosteroid regimen to be administered if and when criteria for severe hypoxia are met. The corticosteroid dosage is 0.5 mg/kg body weight of methylprednisolone corticosteroid equivalent per 24-hour period and the duration of therapy is six days (24). Corticosteroids include prednisone, prednisolone, methylprednisolone, hydrocortisone, and dexamethasone, and the choice of drug is at the attending physician's discretion. Severe hypoxia is defined as the initiation of high flow nasal cannula, venti-mask, non-invasive or invasive mechanical ventilation, or as an oxygen saturation below 93% after the patient is on 6 liters of supplemental oxygen via nasal cannula. Of note, this analysis plan was developed with clinical input in April 2020, prior to established RCT knowledge. The primary outcome is 28-day mortality from time of randomization. 
The estimand of interest is the difference in 28-day mortality rates between the two treatment strategies. In this hypothetical target trial with no loss-to-follow-up, we would analyze the difference in the proportion of patients who experienced the outcome between those who were randomized to the "standard of care" treatment regime and those who were randomized to the "standard of care plus corticosteroids at time of hypoxia" treatment regime. This is known as the intention-to-treat effect. Since we do not have randomized data, we will use the closest possible analog of an intention-to-treat analysis, which is a comparison of hypothetical treatment strategies with appropriate adjustment for confounders. We will now outline this emulation of a target trial using our observational data. Our target trial emulation uses a retrospective, longitudinal dataset from patients who meet our target trial's inclusion and exclusion criteria between March 3 and May 15, 2020. Our data source includes two distinct observational databases. Demographics, comorbidity, intubation, death, and discharge data were manually abstracted from electronic health records by trained medical professionals into a secure REDCap database (25). These data were supplemented with the Weill Cornell Critical carE Database for Advanced Research (WC-CEDAR), a comprehensive data repository housing laboratory, procedure, diagnosis, medication, microbiology, and flowsheet data documented as part of standard care (26). Subjects are followed for 28 days from the time of hospitalization and are lost to follow-up by discharge or transfer to an external hospital system. 
To emulate the target trial corticosteroid treatment strategy, we estimate the effect of a hypothetical dynamic treatment regime (27), whereby each patient is administered six days of corticosteroids if and when they meet the criteria of severe hypoxia. This dynamic corticosteroid treatment regime is contrasted with a static treatment regime where patients never receive corticosteroids. We measure severe hypoxia using vital signs (for oxygen saturation) and flow sheet data (for supplemental oxygen documentation) and define it in the same way as in our target trial. We measure corticosteroid exposure by extracting corticosteroids from the Medication Administration Record and computing cumulative mg/kg dosing over rolling 24-hour windows from the time of randomization. If a patient received more than 0.5 mg/kg methylprednisolone equivalent of corticosteroids within a 24-hour window, they are denoted as having corticosteroids treatment that day. Unlike our target trial, patients in the observational study are subject to loss-to-follow-up. Thus, we must also conceptualize a hypothetical world whereby patients are not lost to follow-up, so that we can observe their 28-day mortality status. An illustration of the treatment regimens as they relate to the naturally observed data is shown in Figure 2. In contrast to the target trial, treatment assignment in the observational study is not randomized and depends on physiologic and biological characteristics of the patients. Correct emulation of the target trial requires: (i) careful consideration of all possible confounders, and (ii) careful adjustment for these confounders in data analysis. 
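The daily exposure definition described above (more than 0.5 mg/kg of methylprednisolone equivalent within a 24-hour window) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the conversion factors are the commonly cited glucocorticoid equivalences and are not given in the text, the function names are hypothetical, and consecutive 24-hour bins from time zero stand in for the rolling windows the authors describe.

```python
from collections import defaultdict

# ASSUMPTION: commonly cited glucocorticoid equivalences, expressed as the
# dose (mg) of each drug taken to be equivalent to 4 mg of methylprednisolone.
# The paper does not list the factors it used.
MG_EQUIVALENT_TO_4MG_METHYLPRED = {
    "methylprednisolone": 4.0,
    "prednisone": 5.0,
    "prednisolone": 5.0,
    "hydrocortisone": 20.0,
    "dexamethasone": 0.75,
}

THRESHOLD_MG_PER_KG = 0.5  # daily threshold stated in the text


def methylpred_equivalent_mg(drug, dose_mg):
    """Convert a dose of `drug` to mg of methylprednisolone equivalent."""
    return dose_mg * 4.0 / MG_EQUIVALENT_TO_4MG_METHYLPRED[drug]


def daily_treatment_indicators(administrations, weight_kg, n_days=28):
    """Return one 0/1 indicator per 24-hour bin since time zero.

    administrations: iterable of (hours_since_time_zero, drug, dose_mg).
    A day is flagged 1 when the summed methylprednisolone-equivalent dose
    in that bin exceeds THRESHOLD_MG_PER_KG times the body weight.
    """
    totals = defaultdict(float)
    for hours, drug, dose_mg in administrations:
        day = int(hours // 24)
        if 0 <= day < n_days:
            totals[day] += methylpred_equivalent_mg(drug, dose_mg)
    return [int(totals[d] / weight_kg > THRESHOLD_MG_PER_KG) for d in range(n_days)]
```

Under these assumed factors, a single 6 mg dose of dexamethasone corresponds to 32 mg of methylprednisolone equivalent, so whether a day counts as treated depends on body weight: it exceeds the threshold for a 60 kg patient but not for an 80 kg patient.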
Baseline confounders included age, sex, race, ethnicity, Body Mass Index (BMI), comorbidities (coronary artery disease, cerebral vascular event, hypertension, diabetes mellitus, cirrhosis, chronic obstructive pulmonary disease, active cancer, asthma, interstitial lung disease, chronic kidney disease, immunosuppression, HIV-infection, home oxygen use), mode of respiratory support upon arrival to the ED, and hospital admission location. Figure 3 summarizes the relationship between confounders, treatment, and outcomes in the form of a Directed Acyclic Graph (DAG). Time-dependent confounders included heart rate, pulse oximetry percentage, respiratory rate, temperature, blood pressure (systolic and diastolic), BUN-creatinine ratio, creatinine, neutrophils, lymphocytes, platelets, bilirubin, blood glucose, D-dimers, C-reactive protein, activated partial thromboplastin time, prothrombin time, arterial partial pressure of oxygen, arterial partial pressure of carbon dioxide, and level of supplemental oxygen support. In cases of multiple laboratory results or vital signs in a 24-hour period, the clinically worst value was used. Missing categorical baseline variables were handled using an indicator for missingness. Missing continuous baseline variables were handled using mean imputation with an indicator for missingness added as a covariate. Missing laboratory results and vital signs were all continuous and handled using a combination of last observation carried forward or mean imputation when no previous measures were recorded. An indicator for missing data was used to account for potentially informative time-dependent missingness mechanisms. All potential confounders were determined through input from clinicians. 
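The missing-data rules described above can be illustrated with a minimal sketch: last observation carried forward for a daily series, falling back to a mean when nothing has been observed yet, plus a missingness indicator in each case. The function names and the use of `None` for a missing value are assumptions made for this example, and the fallback mean is passed in by the caller because the text does not say over which patients it is computed.

```python
def locf_with_indicator(values, fallback_mean):
    """Fill a daily series by last observation carried forward (LOCF).

    values: list of measurements, with None where the value is missing.
    fallback_mean: value used when no earlier measurement exists.
    Returns (filled_values, missing_flags), where a flag of 1 marks a
    day whose value was imputed rather than observed.
    """
    filled, flags = [], []
    last_observed = None
    for v in values:
        if v is None:
            flags.append(1)
            filled.append(last_observed if last_observed is not None else fallback_mean)
        else:
            flags.append(0)
            last_observed = v
            filled.append(v)
    return filled, flags


def mean_impute_with_indicator(column):
    """Mean-impute a baseline column and return it with a missingness flag."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in column]
    flags = [int(v is None) for v in column]
    return filled, flags
```

Keeping the indicator alongside the imputed value lets a downstream regression learn whether missingness itself is informative, which is the stated motivation in the text.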
Our estimand of interest is the difference in 28-day mortality rates in a hypothetical world where we had implemented two different corticosteroid treatment strategies, as well as an intervention to prevent loss-to-follow-up. Under the assumption that treatment and loss-to-follow-up each day are randomized conditional on baseline and time-dependent confounders, this estimand is identified by the longitudinal g-computation formula (28). This longitudinal g-computation formula for our two corticosteroids treatment regimens with a censoring intervention will be our estimand of interest. The g-computation formula can be conceptualized as a sequential regression of the outcome as follows (29). First, a model for the probability of mortality at day 28 is estimated amongst patients who have not been lost to follow-up, with predictors given by all the data prior to day 28. This model fit is then used to produce predictions of the probability of mortality for all patients had the hypothetical intervention of interest been implemented at day 28. This probability is then used as a pseudo-outcome in a regression model where the predictors are all variables measured up to day 27. This model is then used to obtain new predictions for all patients had the hypothetical intervention of interest been implemented at day 27. This process is iterated until day 1, at which point the predictions are averaged. This average is the estimated mortality rate under the hypothetical treatment rule. Correct emulation of a target trial requires proper adjustment for measured confounding through correct estimation of the g-computation formula. It is important to use estimation methods capable of fitting the data using flexible mathematical relations so that confounding is appropriately removed, especially when the number of baseline and time-dependent confounders is large. 
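The sequential regression described above can be made concrete with a toy example. The sketch below is a deliberately reduced illustration, not the analysis in the paper: it uses two days instead of 28, a single binary confounder (hypoxic or not), no censoring, and stratified means in place of flexible regressions. It also assumes that each regression only needs the most recent covariate and treatment. All variable names (`L1`, `A1`, `L2`, `A2`, `Y`) and all patient counts are invented for illustration.

```python
from collections import defaultdict


def fit_stratified_mean(rows, predictors, target):
    """Saturated 'regression': the mean of `target` within each predictor stratum."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in rows:
        key = tuple(r[p] for p in predictors)
        sums[key][0] += r[target]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def treat_if_hypoxic(hypoxic):
    """Dynamic rule: give corticosteroids if and only if the patient is hypoxic."""
    return 1 if hypoxic else 0


def g_computation_two_days(rows, policy):
    # Day 2: regress the outcome on the day-2 covariate and treatment.
    q2 = fit_stratified_mean(rows, ["L2", "A2"], "Y")
    # Predict for every patient as if day-2 treatment had followed the policy,
    # and keep that prediction as a pseudo-outcome.
    pseudo = [dict(r, Q2=q2[(r["L2"], policy(r["L2"]))]) for r in rows]
    # Day 1: regress the pseudo-outcome on the day-1 covariate and treatment.
    q1 = fit_stratified_mean(pseudo, ["L1", "A1"], "Q2")
    # Predict under the policy at day 1 and average over all patients.
    preds = [q1[(r["L1"], policy(r["L1"]))] for r in rows]
    return sum(preds) / len(preds)


def cell(count, deaths, L1, A1, L2, A2):
    """Build `count` identical patients, the first `deaths` of whom die."""
    return [dict(L1=L1, A1=A1, L2=L2, A2=A2, Y=1 if i < deaths else 0)
            for i in range(count)]


# Invented toy cohort of 18 patients (6 deaths overall).
rows = (
    cell(8, 1, 0, 0, 0, 0) + cell(2, 0, 1, 1, 0, 0)
    + cell(1, 1, 0, 0, 1, 1) + cell(2, 1, 1, 1, 1, 1) + cell(1, 0, 1, 0, 1, 1)
    + cell(1, 1, 0, 0, 1, 0) + cell(1, 1, 1, 1, 1, 0) + cell(2, 1, 1, 0, 1, 0)
)

estimate = g_computation_two_days(rows, treat_if_hypoxic)
crude = sum(r["Y"] for r in rows) / len(rows)
```

In this invented cohort the crude mortality is 6/18, while the estimated mortality had everyone followed the "treat if and when hypoxic" rule is about 0.25. The same backward recursion extends to 28 daily steps, with the stratified means replaced by whatever regression method the analyst trusts.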
Several methods can be used to estimate the g-computation formula (e.g., inverse probability weighting (IPW), parametric g-formula, targeted minimum loss based estimators (TMLE), sequentially doubly robust estimators (SDR), etc.) (15, 16). These estimation methods rely on two kinds of mathematical models: (i) models of the outcome as a function of the time-dependent confounders, and (ii) models of treatment as a function of time-dependent confounders. Methods that use only one of these models are often called singly robust, because their correctness relies on correctly specifying one of the models (e.g., IPW relies on estimating treatment models correctly). Estimation methods that use both of these models are often called doubly robust, because they remain correct under misspecification of at most one of the above two models. Furthermore, doubly robust estimators such as TMLE and SDR allow the use of machine learning to flexibly fit relevant treatment and outcome regressions (30, 31). This is desirable because these regression functions might include complex relationships between exposures and treatments, and capturing those relationships is not possible using simpler models such as the Cox proportional hazards model (32). The primary analysis is conducted using SDR estimation with a dynamic intervention, time-varying confounders, and a time-to-event outcome. The SDR estimator uses the probability of receiving the intervention, the probability of experiencing the outcome, and the probability of censoring at each time point to estimate the final survival probability under a given intervention. For longitudinal studies, the SDR estimator estimates these probabilities iteratively at discrete time points. 
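The double robustness property described above can be checked numerically in the simplest possible setting. The sketch below uses a single time point and the augmented inverse probability weighted (AIPW) estimator of mean mortality under treatment. It is not the longitudinal SDR estimator used in the primary analysis, only a point-treatment analogue. The data are invented, and the "wrong" models are taken to be models that ignore the confounder `L` entirely.

```python
def aipw_mean_under_treatment(rows, outcome_model, propensity_model):
    """AIPW estimate of mean outcome had everyone been treated.

    rows: list of (L, A, Y) with confounder L, treatment A, outcome Y.
    outcome_model(L): predicted outcome among the treated with confounder L.
    propensity_model(L): predicted probability of treatment given L.
    """
    total = 0.0
    for L, A, Y in rows:
        m = outcome_model(L)
        g = propensity_model(L)
        total += m + (A / g) * (Y - m)
    return total / len(rows)


def cell(n, deaths, L, A):
    """Build n identical patients, the first `deaths` of whom die."""
    return [(L, A, 1 if i < deaths else 0) for i in range(n)]


# Invented data: sicker patients (L=1) are treated more often and die more often.
rows = cell(2, 0, 0, 1) + cell(8, 1, 0, 0) + cell(4, 2, 1, 1) + cell(4, 3, 1, 0)


def mean_by_stratum(rows, value, keep):
    """Mean of value(row) among rows passing keep(row), within each level of L."""
    out = {}
    for level in {r[0] for r in rows}:
        vals = [value(r) for r in rows if r[0] == level and keep(r)]
        out[level] = sum(vals) / len(vals)
    return out


# "Correct" models condition on L; "wrong" models ignore it.
m_by_L = mean_by_stratum(rows, lambda r: r[2], lambda r: r[1] == 1)
g_by_L = mean_by_stratum(rows, lambda r: r[1], lambda r: True)
treated = [r for r in rows if r[1] == 1]
m_marginal = sum(r[2] for r in treated) / len(treated)
g_marginal = len(treated) / len(rows)

both_right = aipw_mean_under_treatment(rows, m_by_L.get, g_by_L.get)
outcome_wrong = aipw_mean_under_treatment(rows, lambda L: m_marginal, g_by_L.get)
propensity_wrong = aipw_mean_under_treatment(rows, m_by_L.get, lambda L: g_marginal)
both_wrong = aipw_mean_under_treatment(rows, lambda L: m_marginal, lambda L: g_marginal)
```

With these invented counts, the first three estimates all equal 4/18 (about 0.22), while the estimate with both models wrong falls back to the crude treated mortality of 1/3. One correctly specified model is enough, which is the sense in which such estimators are called doubly robust.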
At each time point, only the intervention or outcome model need be correctly specified for the final estimate to be consistent (16). We fit each model using ensemble machine learning via the super learner algorithm (33, 34). The super learner algorithm creates a weighted combination of algorithms in a user-supplied library to obtain a final prediction that is guaranteed to perform at least as well as the best algorithm in the library as sample size grows (34). We use a super learner library containing candidate learners of multivariate adaptive regression splines (MARS) with varying parameters, Bayesian Additive Regression Trees (BART), penalized linear regression with penalties of 0, 0.5, and 1 (Ridge, Elastic Net, and LASSO), and an intercept-only (mean) model. To fit these regressions, we assume that outcomes at day t only depend on baseline confounders and time-varying covariates measured in the previous two days. Supplemental Figure 1 shows an illustrated example of the analytical file. Time-dependent confounders, censoring indicators, and outcome indicators are binned in 24-hour windows from time of randomization. The final analytical file contains one row per patient with one column for each baseline confounder and one column for each time-dependent confounder, outcome, and censoring indicator at all 28 time points. For contrast with the target trial emulation strategy described above, we review the methodology of papers cited in Chaharom et al.'s meta-analysis (35), and then analyze the data using model-first strategies common in observational corticosteroid COVID-19 research. The data source, outcome, and confounders are the same as the above target trial. Modifications to the cohort and treatment definitions to accommodate the model-first approaches are outlined below. The first model-first approach we explore is a regression for mortality with a point (as opposed to time-varying) treatment variable. 
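The idea behind the super learner described above, using cross-validation to decide how much to trust each candidate algorithm, can be illustrated with its simplest variant. The sketch below is a "discrete" super learner, which selects the single candidate with the lowest cross-validated squared error rather than forming a weighted combination. The two candidate learners (an intercept-only model and a one-variable least-squares line) and all function names are assumptions chosen to keep the example small. They are not the library used in the paper.

```python
def fit_mean(train):
    """Intercept-only learner: always predicts the training mean."""
    mu = sum(y for _, y in train) / len(train)
    return lambda x: mu


def fit_linear(train):
    """One-variable least-squares line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def cv_risk(fit, data, k=5):
    """Mean squared error of `fit` estimated by k-fold cross-validation."""
    squared_error = 0.0
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        valid = [d for i, d in enumerate(data) if i % k == fold]
        predict = fit(train)
        squared_error += sum((y - predict(x)) ** 2 for x, y in valid)
    return squared_error / len(data)


def discrete_super_learner(library, data, k=5):
    """Pick the learner with the lowest cross-validated risk, refit on all data."""
    risks = {name: cv_risk(fit, data, k) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, library[best](data), risks


library = {"mean": fit_mean, "linear": fit_linear}
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]  # invented, exactly linear
best_name, predict, risks = discrete_super_learner(library, data)
```

On this exactly linear toy data the selector picks the linear learner. The full algorithm goes one step further and chooses weights for a combination of all candidates that minimizes the same cross-validated risk.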
The inclusion criteria and time zero are defined as the time of meeting hypoxia criteria, which is the intended indication for corticosteroids. A study design using this analytical approach entails a number of choices. These choices include but are not limited to: These choices are crucial to the correctness of the estimates from observational research, and even the best choices do not guarantee that a point exposure type analysis will succeed in emulating the target trial. We fit Cox proportional hazards models to data sets obtained from the following design choices, which are some of the most common choices in the clinical literature: A Corticosteroid exposure defined as anytime during the course of hospitalization. All patients satisfying inclusion criteria are included in the analysis and time to event is defined as time from hypoxia to death. B Corticosteroid exposure defined as any administration up to one day after meeting hypoxia criteria. All patients satisfying inclusion criteria are included in the analysis and time to event is defined as time from hypoxia to death. E Corticosteroid exposure defined as any administration up to one day after meeting hypoxia criteria. Patients who receive corticosteroids before hypoxia are excluded. Patients who receive corticosteroids after the one-day time window passes are censored at the time of corticosteroids receipt. Models B-E are repeated using a treatment window of five days, and denoted as F-I, respectively. All model specifications are summarized in Supplemental Table 1. Results are obtained using the above study designs with all baseline confounders plus the predefined time-dependent confounders from day zero (hypoxia) and the corticosteroid exposure as variables in a Cox regression model. The exponentiated coefficient associated with corticosteroids is interpreted as the hazard ratio for corticosteroid exposure within the defined treatment window for severely ill COVID-19 patients. 
In our second model-first approach, we fit a time-varying Cox model for time to mortality up to 28 days from the day of hospitalization. The analytical file contains our entire cohort, is in a long format (by day), and includes all baseline and time-dependent confounders and daily corticosteroid administration. The coefficient for corticosteroids in the model is exponentiated and used as an estimate of the hazard ratio for corticosteroids. All data were analyzed in R version 4.0.3 in combination with the open-source packages ggplot2, gtsummary, survival, survminer, lmtp, and sl3 (36) (37) (38) (39) (40) (41) (42). A simulated data set, relevant code, and [...] Several randomized controlled trials have established the effectiveness of corticosteroids in the treatment of severe COVID-19 hospitalized patients (43) (44) (45). The largest and most influential trial was run by the RECOVERY network, which enrolled more than 6,000 patients, assigning 2,104 to receive the corticosteroid dexamethasone (46). Dexamethasone significantly reduced 28-day mortality in ventilated patients (29.3% compared to 41.3% in the usual care group) and in patients receiving oxygen support that was not mechanical ventilation (23.3% vs. 26.2%) (46). Months later, the WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group published a meta-analysis of seven RCTs and estimated the odds ratio of mortality to be 0.66 (95% confidence interval 0.53-0.82) (12). We use this estimate, as well as supporting evidence from other RCT meta-analyses (35, 47), to benchmark our results. The final cohort includes 3,298 patients with a median age of 65 (IQR 53, 77), 60% of whom are male. 
The median BMI of these patients was 27 (interquartile range 23-31). There were 1033 (31%) patients with diabetes mellitus, 460 (14%) with coronary artery disease, 1780 (54%) with hypertension, and 159 (4.8%) with kidney disease. Table 1 shows other key baseline demographics of the cohort, overall and stratified by any corticosteroid exposure meeting our target trial definition. Baseline characteristics of those who did and did not receive corticosteroids at any point were similar. There were 1,690 patients who reached the randomization criteria of severe hypoxia, and 423 patients received corticosteroids at any point during follow-up. There were 699 patients who died before 28 days. Of the patients who never received corticosteroids, 574 (20%) died, while 125 (30%) of the patients who received corticosteroids died. Supplemental Figure 2 shows the movement of patients across exposures and outcomes throughout the study's follow up. In the target trial emulation analysis, all 3,298 patients who were admitted to the hospital are analyzed. The estimated mortality rate under a hypothetical intervention of no corticosteroids to any patients is 32% (95% CI 31-34). The estimated mortality rate under a hypothetical intervention in which corticosteroids are administered for 6 days upon patients becoming severely hypoxic is 23% (95% CI 21-24). This yields an estimated reduction in mortality of 9.6% (8.8-10.4) if this policy had been implemented. In a subset of 1,690 patients who met the randomization criteria of hypoxia, 72 patients received corticosteroids within one day of hypoxia and 191 patients received corticosteroids within 5 days
of hypoxia. There were 18 and 451 patients who died within one and five days of hypoxia without receiving corticosteroids, respectively. Model A, which defined corticosteroid exposure as anytime during hospitalization, yielded an HR of 0.50 (0.41-0.62). Models B-I, which placed either a one- or five-day limit on the time-frame for corticosteroids treatment from the time of hypoxia, yielded mostly non- 04 (0.75-1.45) ). The exception to this was Model I, which excluded patients who died before five days and estimated the HR of corticosteroids to be 0.63 (0.48-0.83). Model F also reached the edge of statistical significance, 0.77 (0.60-0.99), and was the result of a 5-day treatment window with no exclusion or censoring variations. Model J, the time-varying Cox model, yielded an HR of 1.08 (0.80-1.47). All hazard ratios for the model-first approaches are summarized in Figure 4. This study shows that observational data can be used to obtain correct estimates of causal effects, but highlights the importance of using a rigorous causal inference framework and the perils of less rigorous approaches to the design and analysis of the study. We illustrate how the incorporation of the principles of the target trial framework and the roadmap for causal inference can aid in devising an optimal study design and choice of estimation procedure. Specifically, we show that using the above principles is helpful in achieving results that recover the benchmark causal effect obtained in an RCT. Our final estimate that corticosteroids would reduce overall 28-day mortality in a hospitalized cohort from approximately one-third to one-fourth matches multiple meta-analyses, most notably the WHO REACT Working Group's estimate, which guides the current clinical practice recommendation to administer corticosteroids to moderate-to-severely ill COVID-19 patients (12, 48). 
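Because the target trial emulation reports mortality risks while the RCT benchmark is an odds ratio, a rough conversion helps put the two on the same scale. The arithmetic below is a back-of-the-envelope check using the rounded point estimates quoted in the text (32% and 23%). It is not a calculation reported by the authors, and a marginal odds ratio from one cohort is not directly interchangeable with a pooled meta-analytic odds ratio.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(risk_treated, risk_control):
    """Marginal odds ratio comparing two risks."""
    return odds(risk_treated) / odds(risk_control)


# Rounded point estimates quoted in the text.
risk_no_steroids = 0.32
risk_steroids_if_hypoxic = 0.23

risk_difference = risk_steroids_if_hypoxic - risk_no_steroids
risk_ratio = risk_steroids_if_hypoxic / risk_no_steroids
implied_or = odds_ratio(risk_steroids_if_hypoxic, risk_no_steroids)

print(f"risk difference: {risk_difference:.2f}")
print(f"risk ratio:      {risk_ratio:.2f}")
print(f"odds ratio:      {implied_or:.2f}")
```

The implied odds ratio is roughly 0.63, which falls inside the benchmark interval of 0.53-0.82 around the meta-analytic estimate of 0.66. The rounded inputs give a risk difference of 9 percentage points, slightly smaller than the 9.6% quoted in the text, presumably because the latter was computed before rounding.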
Our study design allowed us to create a realistic trial with a meaningful intervention (i.e., randomize patients at hospitalization but do not give corticosteroids unless the patient becomes severely hypoxic) that further aligned with randomized trial conclusions that corticosteroids benefit those who are severely ill. Our analysis plan allowed us to adjust for multiple potential confounders and employ model selection in a principled way. Not only does the protective effect of corticosteroids through target trial emulation match the cumulative RCT evidence, but the failure to identify a treatment effect using a model-first approach also aligns with the current observational literature on corticosteroids. In a recent meta-analysis including observational analyses on corticosteroid treatment for over 18,000 patients, there was no impact found on mortality (OR 1.12, (0.83-1.50)) (35). Dozens of other studies have attempted to reproduce the trial results with observational data with flawed approaches, e.g., selection bias, poorly defined time zero or interventions, inappropriate adjustment for time-dependent confounders, model misspecification, etc. The task of creating reliable evidence from complex longitudinal data is not an easy one, as evidenced by issues which arise in the model-first approaches. In a review of the current observational study literature on corticosteroids for COVID-19, most studies used a design approach similar to Model A, where no treatment window is defined (49) (50) (51). This is problematic because it leads to immortal time bias. Patients who end up in the treated group must be, by definition, alive until they are given corticosteroids. This biases the results towards
; a protective effect of corticosteroids because there is a spurious correlation between survival time and treatment (52) . This is reflected in our results; Model A is the approach most susceptible to immortal time bias and shows the strongest protective estimate of corticosteroids. A few studies do limit the time frame for corticosteroids treatment (Models B-I), but the "grace period" for assigning treatment still results in some degree of immortal time bias (3) . To alleviate the bias in patients dying before they are able to receive corticosteroids, studies tend to handle this in an arbitrary way, i.e. excluding patients who die prior to a time window after inclusion criteria (53) . According to the target trial emulation literature, this is an incorrect way to handle the grace period; Hernan and Robins discuss alternatives to alleviate bias (3) . A related issue is the handling of patients who received corticosteroids after the treatment window ended. In the corticosteroids literature, these patients who received treatment after the treatment window ended are sometimes excluded (54, 55) . This can again lead to bias and spurious associations(3). In Model E and I, we explored censoring these patients at their time of receiving treatment if it is after the treatment window has passed. However, since Cox regression cannot handle time-dependent censoring, this is also not optimal and will again bias results in favor of the corticosteroids group (3) . In addition to these issues, it is often unclear in the current literature how patients who receive corticosteroids prior to meeting inclusion criteria (e.g. severe hypoxia or pneumonia definitions) are handled in the analysis (49) (50) (51) 56) . We excluded them in Models D, E, H, and I. A subtler, related issue is that corticosteroids can, according to RCTs, affect severity of illness, which these studies' inclusion criteria relied on; this is a form of bias called collider bias(57). 
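The immortal time bias described above for designs like Model A can be illustrated with a small simulation. The sketch below is not the study's analysis, and every parameter in it (sample size, mean survival, the window in which treatment may start, the 50% chance of being scheduled for treatment) is a made-up assumption. Treatment has no effect on survival in this simulation; patients are labeled "ever treated" only if they are still alive on their scheduled start day, which is the classification a no-treatment-window design uses.

```python
import random


def simulate(n=100_000, mean_survival_days=60.0, max_start_day=14.0,
             horizon_days=28.0, seed=0):
    """Simulate a null treatment and return 28-day mortality by grouping.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Each entry holds [deaths within horizon, patients].
    counts = {"ever_treated": [0, 0], "never_treated": [0, 0],
              "scheduled": [0, 0], "not_scheduled": [0, 0]}
    for _ in range(n):
        # Survival time is drawn independently of treatment (no true effect).
        death_day = rng.expovariate(1.0 / mean_survival_days)
        scheduled = rng.random() < 0.5           # decided at time zero
        start_day = rng.uniform(0.0, max_start_day)
        # A patient can only receive treatment if alive on the start day.
        received = scheduled and start_day < death_day
        died = death_day <= horizon_days
        keys = ("ever_treated" if received else "never_treated",
                "scheduled" if scheduled else "not_scheduled")
        for key in keys:
            counts[key][0] += died
            counts[key][1] += 1
    return {key: deaths / total for key, (deaths, total) in counts.items()}


mortality = simulate()
for group, risk in mortality.items():
    print(f"{group}: {risk:.3f}")
```

With these assumed parameters, the "ever treated" group shows markedly lower 28-day mortality than the "never treated" group (roughly 30% versus 43%) even though treatment does nothing, because patients who die before their start day are counted as untreated. Grouping by the assignment made at time zero ("scheduled" versus "not scheduled") shows essentially no difference, which is the comparison a well-defined time zero is meant to protect.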
All of our point-treatment designs were subject to collider bias.

Although the second model-first approach, a time-varying Cox regression, does not suffer from all of the biases discussed for the point-exposure designs, the time-varying Cox model cannot properly account for time-dependent confounders (3), such as the relationship between intubation, corticosteroid administration, and mortality. Nevertheless, this modeling strategy is commonly used in clinical research on longitudinal observational data and has been used in corticosteroids research (51, 54). In our study, the time-varying Cox model estimated the effect of corticosteroids to be non-significantly harmful.

Notably, much of the observational research on corticosteroids additionally includes some form of propensity score matching or reweighting, but no estimation methodology can solve the above study design issues (3). In addition, many of the studies employed model selection that relies on either the statistical significance of univariate analyses or multivariable regression coefficients (i.e., stepwise regression). This is problematic because (i) true confounders may happen not to be statistically significant while other variables are coincidentally statistically significant, and (ii) the final regression standard errors are incorrect because they ignore the model-selection process (11).

Our study has limitations. First, while our study time frame of Spring 2020 is ideal in terms of corticosteroid experimentation, it includes New York City's initial pandemic surge conditions and rapidly changing clinical practice. We cannot rule out the presence of unmeasured confounding. Second, we did not have the data to examine individual corticosteroid types, making an exact comparison to a specific randomized trial impossible.
Despite these limitations, our study has numerous strengths and serves as an example in which the current standard for clinical research methods fails to recover the correct treatment effect where a modern causal inference method succeeds. Using observational data to guide clinical practice is possible, but it relies on the incorporation of advanced epidemiological and statistical methodology principles. We hope this study emphasizes the importance of incorporating these innovative techniques into study designs and statistical analyses of observational data.

Figure 1: A random sample of 50 patients' observed hospital courses. Each line represents one patient, and the color of the line represents that patient's intubation status (a time-dependent confounder of corticosteroid administration and death). A red circle indicates the time of severe hypoxia. Yellow squares indicate corticosteroid administration. Patients received corticosteroids at various times and for various durations throughout their hospitalization and in relation to meeting the criteria for severe hypoxia. This is ideal for experimentation, but makes designing a point-treatment study extremely difficult.

Figure 2: Illustrated example of two patients under the two hypothetical interventions of our target trial emulation. Patient A reaches severe hypoxia criteria at study day 2 and is followed for the entire study duration.
Patient B never reaches severe hypoxia criteria and is lost to follow-up after five study days. Under the dynamic corticosteroids intervention (Intervention #1), Patient A receives 6 days of corticosteroids, and under Intervention #2 they receive no corticosteroids. Patient B does not receive corticosteroids under either intervention strategy; however, in both hypothetical worlds they are observed for the entire study duration.

Table 1: Demographics and outcome for study cohort, overall and stratified by any corticosteroid exposure.

[1] A Study of Six Hundred and Eighty-Four Proved Cases
[2] HIV-CAUSAL Collaboration, et al. Comparative effectiveness of strategies for antiretroviral treatment initiation in HIV-positive individuals in high-income countries: an observational cohort study of immediate universal treatment versus CD4-based initiation
[3] Causal Inference: What If
[4] Methods of public health research - strengthening causal inference from observational data
[5] Statistical modeling methods: challenges and strategies
[6] Handling time varying confounding in observational research
[7] Statistical modeling: The two cultures (with comments and a rejoinder by the author)
[8] The Hazards of Hazard Ratios
[9] Why Test for Proportional Hazards?
[10] The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients
[11] Step away from stepwise
[12] Association between administration of systemic corticosteroids and mortality among critically ill patients with COVID-19: a meta-analysis
[13] Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available
[14] Causal models and learning from data: Integrating causal modeling and statistical estimation
[15] Sequential double robustness in right-censored longitudinal models
[16] Nonparametric causal effects based on longitudinal modified treatment policies
[17] Target trial emulation: teaching epidemiology and beyond
[18] Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses
[19] Emulating a randomised controlled trial with observational data: An introduction to the target trial framework
[20] Electronic medical records can be used to emulate target trials of sustained treatment strategies
[21] Early Convalescent Plasma Therapy and Mortality Among US Veterans Hospitalized With Nonsevere COVID-19: An Observational Analysis Emulating a Target Trial
[22] Association Between Early Treatment With Tocilizumab and Mortality Among Critically Ill Patients With COVID-19
[23] E9(R1) statistical principles for clinical trials: Addendum: Estimands and sensitivity analysis in clinical trials. gov/regulatory-information/search-fda-guidance-documents/e9r1-statistical-principles-clinical-trials-addendum-estimands-and-sensitivity-anal
[24] Steroid conversion calculator
[25] Clinical characteristics of COVID-19 in New York City
[26] Critical care database for advanced research (CEDAR): An automated method to support intensive care units with electronic health record data
[27] Statistical methods for dynamic treatment regimes
[28] A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect
[29] Doubly robust estimation in missing data and causal inference models
[30] Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies
[31] Double/debiased machine learning for treatment and structural parameters
[32] On functional misspecification of covariates in the Cox regression model
[33] Stacked regressions. Machine learning
[34] Super learner
[35] Effects of corticosteroids on COVID-19 patients: A systematic review and meta-analysis on clinical outcomes
[36] ggplot2: Elegant Graphics for Data Analysis
[37] Reproducible summary tables with the gtsummary package
[38] A Package for Survival Analysis in R
[39] Modeling Survival Data: Extending the Cox Model
[40] survminer: Drawing Survival Curves using 'ggplot2'
[41] lmtp: Non-parametric causal effects of feasible interventions based on modified treatment policies
[42] Pipelines for Machine Learning and Super Learning
[43] COVID-19-associated ARDS treated with dexamethasone (CoDEX): study design and rationale for a randomized trial
[44] Intravenous methylprednisolone pulse as a treatment for hospitalised severe COVID-19 patients: results from a randomised controlled clinical trial
[45] Methylprednisolone in adults hospitalized with COVID-19 pneumonia
[46] Dexamethasone in hospitalized patients with COVID-19
[47] Systemic corticosteroids for the treatment of COVID-19
[48] Corticosteroids for COVID-19: Living guidance
[49] A retrospective controlled cohort study of the impact of glucocorticoid treatment in SARS-CoV-2 infection mortality
[50] Clinical outcomes associated with methylprednisolone in mechanically ventilated patients with COVID-19
[51] Corticosteroid treatment in severe COVID-19 patients with acute respiratory distress syndrome
[52] Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes
[53] Efficacy of corticosteroid treatment for hospitalized patients with severe COVID-19: a multicentre study
[54] Treatment with tocilizumab or corticosteroids for COVID-19 patients with hyperinflammatory state: a multicentre cohort study (SAM-COVID-19)
[55] Corticosteroid pulses for hospitalized patients with COVID-19: effects on mortality
[56] Corticosteroids for COVID-19 patients requiring oxygen support? Yes, but not for everyone: effect of corticosteroids on mortality and intensive care unit admission