key: cord-0750030-q3h9apss authors: Kim, Jae Hyun; Ta, Casey N; Liu, Cong; Sung, Cynthia; Butler, Alex M; Stewart, Latoya A; Ena, Lyudmila; Rogers, James R; Lee, Junghwan; Ostropolets, Anna; Ryan, Patrick B; Liu, Hao; Lee, Shing M; Elkind, Mitchell S V; Weng, Chunhua title: Towards clinical data-driven eligibility criteria optimization for interventional COVID-19 clinical trials date: 2020-12-01 journal: J Am Med Inform Assoc DOI: 10.1093/jamia/ocaa276 sha: c7e0fdfcab5470dd5c49e00e35a1cf72315f1922 doc_id: 750030 cord_uid: q3h9apss OBJECTIVE: This research aims to evaluate the impact of eligibility criteria on recruitment and observable clinical outcomes of COVID-19 clinical trials using electronic health record (EHR) data. MATERIALS AND METHODS: On June 18, 2020, we identified frequently used eligibility criteria from all the interventional COVID-19 trials in ClinicalTrials.gov (n = 288), including age, pregnancy, oxygen saturation, alanine/aspartate aminotransferase, platelets, and estimated glomerular filtration rate. We applied the frequently used criteria to the EHR data of COVID-19 patients in Columbia University Irving Medical Center (CUIMC) (March 2020–June 2020) and evaluated their impact on patient accrual and the occurrence of a composite endpoint of mechanical ventilation, tracheostomy, and in-hospital death. RESULTS: There were 3251 patients diagnosed with COVID-19 from the CUIMC EHR included in the analysis. The median follow-up period was 10 days (interquartile range 4–28 days). The composite events occurred in 18.1% (n = 587) of the COVID-19 cohort during the follow-up. In a hypothetical trial with common eligibility criteria, 33.6% (690/2051) were eligible among patients with evaluable data and 22.2% (153/690) had the composite event. DISCUSSION: By adjusting the thresholds of common eligibility criteria based on the characteristics of COVID-19 patients, we could observe more composite events from fewer patients. 
CONCLUSIONS: This research demonstrated the potential of using the EHR data of COVID-19 patients to inform the selection of eligibility criteria and their thresholds, supporting data-driven optimization of participant selection towards improved statistical power of COVID-19 trials. The disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been spreading worldwide rapidly since late 2019. The number of confirmed COVID-19 cases and deaths globally exceeded 15 million and 600 thousand, respectively, as of July 22, 2020. 1 Despite the rocketing number of confirmed cases and deaths, there is no consensus on a definitive therapy for COVID-19 as of late July 2020. Meanwhile, many clinical trials have been launched to evaluate the efficacy and safety of experimental agents. 2 As COVID-19 clinical trials are being created rapidly, concerns regarding the robustness of trial designs have been raised, 3 particularly related to insufficient power or sample sizes, inadequate statistical adjustments, and inconsistent definitions for endpoints. [4] [5] [6] However, no one yet has evaluated the influence of the eligibility criteria of the COVID-19 studies on these issues. Clinical trials on investigational agents test efficacy upon a sample of patients who meet the eligibility criteria. Results from trials with restrictive eligibility criteria do not provide evidence regarding the risks and benefits of tested drugs for patients who were excluded from the trials and may not generalize to real-world patients with the target condition. 7 On the other hand, applying overly permissive eligibility criteria may include heterogeneous patients and reduce the probability of detecting the drug's true effect. 7 Thus, it is vital to balance the tradeoff between internal validity and external validity during eligibility criteria definition. 
Better understanding of the baseline characteristics and the events of interest among potentially eligible patients could inform optimal definition of eligibility criteria for COVID-19 interventional trials. Towards this understanding, we aimed to measure the influence of eligibility criteria on patient recruitment and outcome event observation in this retrospective cohort study. We first identified the frequently used eligibility criteria in COVID-19 trials. Then, using electronic health record (EHR) data, we assessed the influence of individual eligibility criteria based on the number of patients that could be included or excluded by each criterion. The outcome events were compared between the included and excluded groups to estimate how eligibility criteria could be modified to optimize the balance between internal and external validity. Eligibility criteria and descriptive trial information were obtained from the Aggregative Analysis of ClinicalTrials.gov (AACT) database. 8 AACT provides a copy of ClinicalTrials.gov, updated daily, with some postprocessing and data formatting, enabling users to easily analyze multiple aspects of clinical trials in a structured format. A total of 288 COVID-19 interventional trials with at least 1 recruiting site in the US were registered in ClinicalTrials.gov as of June 18, 2020. Two researchers (AMB, LAS) shared the workload for the annotation of these trials and independently annotated 457 eligibility criteria from a common set of 32 trials manually and mapped them to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). 9 For example, in the exclusion criterion "Exclude patients with aspartate aminotransferase (AST) or alanine aminotransferase (ALT) > 3 times the upper limit of normal," aspartate aminotransferase (AST) and alanine aminotransferase (ALT) were annotated as measurement entities and > 3 times upper limit of normal was annotated as a value entity. 
Corresponding values were retrieved from the Columbia University Irving Medical Center (CUIMC) OMOP database using the OMOP concept identifier for AST, 3013721.

The COVID-19 cohort in Columbia University Irving Medical Center (CUIMC)

The EHR of CUIMC was transformed to the OMOP CDM v5.3 in May 2020, henceforth referred to as the CDM. 10 The operational definition of the COVID-19 cohort was based on the cohort definition developed during the Observational Health Data Sciences and Informatics (OHDSI) study-a-thon, namely "persons hospitalized with COVID-19 narrow, with no prior observation required." 11 Cohort enrollment date was defined as the admission date of a person hospitalized with COVID-19 after March 1, 2020. We included each patient with a confirmed diagnosis of COVID-19 within 3 weeks prior to or during hospitalization. Using the OMOP CDM, the occurrence of outcome events was identified in patients with COVID-19. The event of interest was the composite of mechanical ventilation, tracheostomy, and in-hospital death, corresponding to a severity score ≥ 6 on the World Health Organization ordinal scale for clinical improvement. 12 Measurements, including glomerular filtration rate, liver function tests, blood oxygen saturation, and platelet counts, were extracted from the OMOP CDM. The glomerular filtration rate was estimated using the Modification of Diet in Renal Disease equation for patients aged ≥ 18 and the Schwartz equation for patients aged < 18. 13 Measurements of oxygen saturation and supplemental oxygen requirements were also extracted from the EHR narratives along with their values using natural language processing. Primary terminology related to oxygen saturation or supplemental oxygen delivery was recognized in the Emergency Department provider notes, and the data were then manually reviewed to identify oxygen saturation and the requirement of supplemental oxygen.
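The age-dependent eGFR estimation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the text does not say which variant of either equation was used, so the IDMS-traceable 4-variable MDRD form (coefficient 175) and the bedside Schwartz constant (0.413) are assumptions, and the function names are hypothetical.

```python
def egfr_mdrd(scr_mg_dl, age_years, female, black=False):
    """4-variable MDRD estimate in mL/min/1.73 m^2.

    Assumption: IDMS-traceable form with coefficient 175; the paper
    does not state which MDRD variant was applied.
    """
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def egfr_schwartz_bedside(scr_mg_dl, height_cm):
    """Bedside Schwartz estimate in mL/min/1.73 m^2 (assumed constant 0.413)."""
    return 0.413 * height_cm / scr_mg_dl


def estimate_egfr(scr_mg_dl, age_years, female, height_cm=None, black=False):
    """Dispatch by age as in the text: MDRD for age >= 18, Schwartz for age < 18."""
    if age_years >= 18:
        return egfr_mdrd(scr_mg_dl, age_years, female, black)
    if height_cm is None:
        raise ValueError("height_cm is required for the Schwartz equation")
    return egfr_schwartz_bedside(scr_mg_dl, height_cm)
```

Serum creatinine is taken in mg/dL and height in cm; a pediatric record without a height measurement cannot be evaluated under this sketch.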
Measured values at the cohort enrollment date ± 1 day were used as the initial presenting characteristics to evaluate the simulated eligibility of patients for COVID-19 interventional clinical trials, selecting the earliest measure in the case of multiple measurements. A validated algorithm was used to identify pregnancy episodes in the CUIMC OMOP database. 14 The OMOP concepts and corresponding concept codes that were used to identify the occurrence of events are presented in Supplementary Table 1. This study was approved by the CUIMC institutional review board. The impact of the most frequently used eligibility criteria from COVID-19 clinical trials was evaluated using EHR data from COVID-19 patients hospitalized at CUIMC. The joint influence of multiple criteria was estimated using synthetic eligibility criteria, designed using the most commonly used cutoff values. The number of potentially eligible patients, the proportion of patients with events during the follow-up period, and the required sample size to detect a 20% reduction of composite events with 80% power and a significance level alpha of 0.05 were compared among synthetic trials with different thresholds for the same set of criteria. In the case of a censored event, the observation is censored on the last date for which EHR data are available from the CUIMC OMOP database. For time-to-event analysis, the cumulative incidence was compared among different subgroups stratified by criteria. A log-rank test with post hoc false discovery rate adjustment was used to test the significance of differences in the survival analysis. Cohort generation and descriptive analysis were done using Python v3.5. Plotting figures and sample size calculation were done using R v4.0. There were 288 interventional trials (301 cohorts; analysis was done at the cohort level hereafter) targeting COVID-19 in the United States as of June 18, 2020.
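To make the sample size comparison described in the methods concrete, the sketch below computes the patients needed per arm to detect a 20% reduction in the composite event rate with 80% power at a two-sided alpha of 0.05. It is an illustration under stated assumptions, not the authors' R calculation: the text does not name the formula, so a standard normal-approximation formula for two independent proportions with 1:1 allocation is assumed, the 20% reduction is read as a relative reduction, and `n_per_arm` is a hypothetical name. It needs Python 3.8 or later for `statistics.NormalDist`.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_reduction=0.20, alpha=0.05, power=0.80):
    """Patients per arm for comparing two proportions (normal approximation).

    Assumptions: two-sided test, equal allocation, and a *relative*
    reduction in the event rate of the treated arm.
    """
    p_treat = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treat) ** 2)


# Enriching the event rate lowers the required sample size:
for rate in (0.181, 0.222, 0.30):
    print(f"control event rate {rate:.1%}: {n_per_arm(rate)} per arm")
```

Under this formula the required sample size falls as the control-arm event rate rises, which is the mechanism by which event-enriching eligibility thresholds can reduce the number of patients a trial needs.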
Frequently used exclusion criteria of COVID-19 trials are presented in Supplementary Table 2. Medical concepts that belong to the condition, drug, and measurement domains of the OMOP CDM were frequently used in the eligibility criteria of COVID-19 trials. We evaluated the inter-annotator agreement as an F1 score on instance-level in a subset (32/288) of annotated trials. 15 The instance-level F1 scores for condition, drug, and measurement concepts were 0.789, 0.845, and 0.884, respectively. We evaluated the following frequently used criteria: 1) age, 2) pregnancy, 3) oxygen saturation, 4) liver function (AST, ALT), 5) platelet counts, and 6) estimated glomerular filtration rate (eGFR). The most frequent minimum age was 18 years, with 261 trials (86.7%) using this criterion. Only 24 trials (8.0%) included pediatric patients. Six trials excluded patients over 65. The next most frequent exclusion criteria were pregnancy, oxygen saturation, hepatic impairment, requiring mechanical ventilation, ratio of arterial oxygen tension to inspired oxygen fraction, platelet counts, and renal impairment. Criteria concerning mechanical ventilation and ratio of arterial oxygen tension to inspired oxygen fraction were not evaluated because those are related to composite events in the study cohort. A total of 3251 COVID-19 patients were identified from the CUIMC EHR. The baseline characteristics of patients are presented in Table 1. Median age was 65 years (IQR, 50-77 years) with 4.1% (n = 132) pediatric patients and 49.5% (n = 1609) of patients above 65 years old. The percentage of male patients was 54.4% (n = 1767). About 44.2% of patients (n = 1360) had an eGFR less than 60 mL/min/1.73 m². Median values for liver enzymes were 29 U/L (IQR, 18-51 U/L) for ALT and 42 U/L (IQR, 28-70 U/L) for AST. The level of ALT and AST was within normal range in 75.0% (n = 2169) and 43.8% (n = 1276) of patients, respectively.
The values of ALT and AST were greater than 5 times the upper limit of normal (ULN) in 1.6% (n = 47) and 4.1% (n = 120) of patients. About 42.9% of patients (n = 969) required supplemental oxygen or had oxygen saturation ≤ 93% at room air. Platelets were less than 50 ×10³/μL in 19 (0.6%) patients. At the time of hospital admission, 49 (1.5%) patients were pregnant. Supplementary Figure 1 shows the cumulative proportion of patients meeting the eligibility criteria concerning eGFR, AST/ALT, oxygen saturation, and platelets for trials that used 1 of those criteria. If a clinical trial includes patients with eGFR above 60 mL/min/1.73 m², only 56% of the COVID-19 patients would qualify (Figure S1A). Relaxing the threshold to 30 mL/min/1.73 m² would make 81% of the patients eligible. Using liver enzyme criteria with cutoff values of 3 times ULN, 97% of the patients would qualify (Figure S1B). Figure 1 shows the proportion of trials applicable to patients with specific measurement values. Patients with eGFR ≥ 30 mL/min/1.73 m² were eligible for 87% of the trials (Figure 1A). Patients with AST or ALT level equal to 3 times ULN were eligible for 89% of the trials (Figure 1B). Patients with oxygen saturation at room air of 93% were eligible for 69% of the trials (Figure 1C). The most common cutoff for platelet counts was 50 ×10³/μL. Patients with platelet counts of < 50 ×10³/μL were eligible for only 12% of the trials (Figure 1D). The median follow-up period of patients in the cohort was 10 days (IQR 4-28 days). The composite event of mechanical ventilation, tracheostomy, or in-hospital death occurred in 18.1% (n = 587) of the total cohort. The numbers of composite events stratified by baseline characteristics are presented in Table 1. More events were observed in males (20.1%, 355/1767) than females (15.6%, 232/1483) and in patients with age over 65 (28.8%, 464/1609) than in adults ≤ 65 (7.9%, 120/1510) or pediatric patients (2.3%, 3/132).
While the composite events occurred in 10.8% of patients with eGFR ≥ 60 mL/min/1.73 m², more than 30% of patients with eGFR < 30 mL/min/1.73 m² had events. Although the number of patients with AST or ALT > 5 times ULN was small, the proportions of patients with composite events were higher in these groups (17/

Figure 2 shows the cumulative incidence of composite events over the follow-up period stratified by age group, eGFR level, and oxygen saturation. Overall, the difference in the occurrence of composite events was statistically significant across age groups, renal function levels, and oxygen saturation levels (Figure 2, Figure 2A). Cumulative incidences of composite events over the follow-up period by age, with more detailed groupings, are presented in Figure 2B. Patients with eGFR ≥ 60 mL/min/1.73 m² had a significantly lower risk of composite events than those with eGFR < 60 mL/min/1.73 m² (Figure 2C). Patients with oxygen saturation at room air > 93% had a significantly lower incidence of composite events than those with a supplemental oxygen requirement or oxygen saturation ≤ 93% (Figure 2D). The number of eligible patients and corresponding incidence of composite events were modeled using a synthetic clinical trial with the following inclusion criteria: 1) Age ≥ 18; 2) eGFR ≥ 30 mL/min/1.73 m²; 3) AST/ALT ≤ 5 times the ULN; 4) not pregnant; 5) Oxygen saturation at room air ≤ 93% or requirement of supplemental oxygen; and 6) platelets ≥ 50 ×10³/μL. The 2051 COVID-19 patients with available eGFR, AST/ALT, platelet, and oxygen saturation values were used in the estimation. In the synthetic trial, 33.6% (690/2051) of patients were eligible and 22.2% (153/690) of eligible patients had composite events (Table 2). Patients were mostly ineligible because of oxygen saturation (n = 1153), followed by renal function (n = 405).
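The synthetic trial above amounts to filtering a cohort by six thresholds and counting events among the patients who pass. The sketch below illustrates that logic only; it is not the authors' pipeline. The `Patient` fields, the function names, and the five-patient toy cohort are hypothetical and invented for illustration, and liver enzymes are assumed to be pre-converted to multiples of the ULN.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: float
    egfr: float            # mL/min/1.73 m^2
    ast_x_uln: float       # AST as a multiple of the upper limit of normal
    alt_x_uln: float       # ALT as a multiple of the upper limit of normal
    pregnant: bool
    spo2_room_air: float   # percent
    on_oxygen: bool        # requires supplemental oxygen
    platelets: float       # x10^3/uL
    had_event: bool        # ventilation, tracheostomy, or in-hospital death


def is_eligible(p, min_age=18, min_egfr=30, max_liver_x_uln=5,
                spo2_cutoff=93, min_platelets=50):
    """Apply the six inclusion criteria of the synthetic trial."""
    return (p.age >= min_age
            and p.egfr >= min_egfr
            and max(p.ast_x_uln, p.alt_x_uln) <= max_liver_x_uln
            and not p.pregnant
            and (p.on_oxygen or p.spo2_room_air <= spo2_cutoff)
            and p.platelets >= min_platelets)


def summarize(cohort, **thresholds):
    """Count eligible patients and events among them for one criteria set."""
    eligible = [p for p in cohort if is_eligible(p, **thresholds)]
    events = sum(p.had_event for p in eligible)
    return {
        "n_eligible": len(eligible),
        "n_events": events,
        "event_rate": events / len(eligible) if eligible else 0.0,
    }


# Toy cohort (made-up values, not study data).
cohort = [
    Patient(70, 45, 1.0, 0.8, False, 91, False, 210, True),
    Patient(55, 80, 0.9, 0.7, False, 97, False, 250, False),  # SpO2 > 93%, no O2
    Patient(16, 95, 0.5, 0.5, False, 90, True, 300, False),   # under 18
    Patient(82, 22, 1.2, 1.0, False, 88, True, 180, True),    # eGFR < 30
    Patient(60, 65, 2.0, 3.0, False, 96, True, 150, False),
]

print(summarize(cohort))               # common thresholds
print(summarize(cohort, min_egfr=15))  # relaxed renal threshold
```

Passing alternative thresholds as keyword arguments mirrors how the study compares criteria sets: each set yields its own eligible count and event rate, which can then feed a sample size calculation.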
Ineligible patients showed comparable risks regarding the incidence of composite events when compared to potentially eligible patients (Figure 3; HR 0.92 (95% CI 0.75-1.12), P = .4). The number of eligible patients, composite events, and sample size required to detect a 20% reduction in event rates under different sets of eligibility criteria are presented in Table 2. By modifying the thresholds for age, renal, and liver functions, more patients were potentially eligible for the trials and more events could be observed during the follow-up. Selected alternative sets of eligibility criteria that enable study sponsors to decrease the required sample size are presented in Table 2. We recommend sponsors use "Age ≥ 50" instead to enrich the event rates. Sponsors could adopt 15 mL/min/1.73 m² and 10x ULN as thresholds for renal and liver functions, respectively, to enrich the event rates. Alternatives that showed greater accrual and higher event rates are presented in Supplementary Table 3.

The ULN of AST was 37 U/L.

The number of interventional COVID-19 clinical trials is growing rapidly. Without knowing the baseline characteristics and incidence of events of interest of COVID-19 patients, the protocols of previous clinical trials and drug monographs were the only knowledge sources for clinical trial designers when they specified eligibility criteria. With the availability of real-world data on COVID-19 patients, it is possible to better estimate the influence of eligibility criteria on accrual and outcome rates and to develop data-driven criteria design. The influence of eligibility criteria for age, renal function, liver function, oxygen saturation, platelets, and pregnancy on patient selection and event observation for COVID-19 trials was evaluated.
For the criteria concerning age, renal function, and oxygen saturation, the adoption of certain cutoff values could lead to the exclusion of a significant number of patients and observable events as well as to the inclusion of patients who do not contribute much information. Regarding the age range, the most common age range criterion was "18 years or older." While patients most vulnerable to COVID-19 (ie, those with age over 65) qualified for most of the trials, pediatric patients were largely excluded from most ongoing trials. In our cohort, 2.3% (3/132) of pediatric patients experienced the composite events, implying pediatric patients were less likely to have severe pulmonary events when compared to adults but are not completely free of risks. 16, 17 Recently, a multisystem inflammatory syndrome in children has been reported and the association with COVID-19 is being investigated. 18, 19 Clinical presentation and severe outcomes commonly involved the intestines and heart, with very distinctive features from acute COVID-19 progression in adults. 20 Therefore, pediatric patients should be studied in trials tailored to the unique progression of COVID-19 in children rather than being excluded or being included in trials recruiting all age ranges. Among the measurements, oxygen saturation was the most commonly used eligibility criterion of the COVID-19 trials. However, the way in which each clinical trial formulated criteria related to oxygen saturation varied. Some trials simply mentioned the numerical cutoff for oxygen saturation, while others provided more information about the delivery methods or flow rates. In some cases, other respiratory indexes, including respiratory rate and the ratio of arterial oxygen tension to inspired oxygen fraction, were also added to form a combined criterion in order to enroll patients meeting at least 1 of the criteria.
Regardless of the complexity of eligibility criteria, patients with low oxygenation status had more opportunities to participate in clinical trials. Patients with a supplemental oxygen requirement or oxygen saturation ≤ 93% had a significantly higher incidence of severe events in our evaluation. Observable events could be enriched by excluding patients with room-air oxygen saturation above 93%. Only 1.5% (49/3251) of the cohort was pregnant at enrollment; composite events were not observed among them. When compared to nonpregnant patients, pregnancy does not seem to confer additional risk regarding the clinical outcome of COVID-19. [21] [22] [23] Given the relatively small number of pregnant patients hospitalized in each institution, a collaborative approach with a multisite clinical trial or registry is more feasible to generate COVID-19-related clinical trial evidence in pregnant patients. 24 When the occurrence of events was evaluated stratified by eGFR, the rate of the composite event dropped for patients with eGFR ≥ 60 mL/min/1.73 m². To enrich the study population, sponsors should consider enrolling patients with moderate renal impairment, although it will depend upon the tolerability and pharmacokinetic properties of the interventional agent. For example, remdesivir, baricitinib, and tocilizumab are currently being tested for treatment of COVID-19 and can be administered to patients with eGFR ≥ 30 mL/min/1.73 m². Using the eGFR criteria of ≥ 30 or 45 mL/min/1.73 m² would have the dual benefit of expanding the pool of eligible patients and enriching the trial with a group of patients who have a greater need for a potentially effective drug. For patients with eGFR < 30 mL/min/1.73 m², retrospective evaluation could be an alternative if it is not feasible to include these patients as trial participants considering the fragile clinical state in this subgroup.
25 By assessing commonly used eligibility criteria with real-world data, we identified alternate thresholds for eligibility criteria in COVID-19 trials. Ongoing and planned clinical trials could consider revisiting their eligibility criteria, including the modification of certain criteria to help with accrual and to enrich for patient subgroups at risk. According to ClinicalTrials.gov, the median number of target enrollments in the Phase 3 trials targeting COVID-19 was 325. Given the number of patients required to detect the efficacy of a drug in our analysis, those trials might be underpowered. The baseline distribution of characteristics and observed incidences of composite events in our study could serve as a reasonable rationale for trialists to revisit their eligibility criteria for efficient accrual and appropriate statistical power.

Figure 3. Cumulative incidence of mechanical ventilation, tracheostomy, or in-hospital death by eligibility to a hypothetical clinical trial using common eligibility criteria. Inclusion criteria for this hypothetical trial are as follows: 1) Age ≥ 18; 2) eGFR ≥ 30 mL/min/1.73 m²; 3) AST/ALT ≤ 5 times the ULN; 4) not pregnant; 5) Oxygen saturation at room air ≤ 93% or requirement of supplemental oxygen; and 6) platelets ≥ 50 ×10³/μL. Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; eGFR, estimated glomerular filtration rate; ULN, upper limit of normal.

This study has a few limitations. First, study results were generated from a single-center cohort, which, although sizeable, may not be representative of populations at other locations and nonhospitalized patients. Previous studies indicated that COVID-19 treatment outcome could be influenced by the availability of medical resources or ethnicity. 26, 27 The baseline characteristics of our study population were comparable to the patient characteristics in other studies (Supplementary Table 4).
28, 29 In an international study that included 95 hospitals across 5 countries, the laboratory trajectories including hepatic and renal function were consistent on the day of COVID-19 diagnosis. 30 However, the characteristics of the COVID-19 patient population may be evolving as the pandemic progresses and its epicenters move geographically. Further studies are warranted to test the generalizability of the results to cohorts with different demographics or comorbidities. Our analytical software was based on the OMOP CDM and can be easily run on other OMOP CDM-compliant clinical databases, enabling future studies like this in different institutions. Second, we focused our evaluation of the influence of eligibility criteria on severe events. A better understanding of patients who progress from mild to moderate disease is as important as that of severe cases. Future research is needed to evaluate the impact of eligibility criteria on the occurrence of moderate-level events. Third, this retrospective cohort study used the CUIMC OMOP database as a primary data source for the analysis. Deaths that might have occurred outside the hospital could not be included in the analysis. During the early period of the pandemic, the location of death was unknown or not specified in 26.5% of the decedents. 31 Linking to death registry or claims data would lead to a more precise estimate of the death rate. In addition, the stored EHR data might be incomplete and/or inconsistent, and the data collection process could be biased. 32 The conversion of EHR data to OMOP can introduce additional data quality issues; however, the conversion process involves several data quality checks, and other multicenter observational studies using this database have shown consistency between CUIMC and other healthcare databases. 11, 33 Fourth, the values of measurements are dynamic in nature.
The eligibility status of patients could be temporal and hence change as the patients are being followed up. Future research is needed to address the impact of dynamic change of clinical characteristics on patients' eligibility. We present a novel use of electronic health record data for estimating the influence of eligibility criteria of COVID-19 trials on patient accrual and the incidence of severe events, and provide data-driven recommendations for the thresholds for age, oxygen saturation, and kidney and liver function, enabling potential COVID-19 trials with greater power without increasing sample sizes. This method promises to improve feasibility and efficiency for COVID-19 clinical trial recruitment. This study was sponsored by National Library of Medicine grant 5R01LM009886-11 and National Center for Advancing Clinical and Translational Science grants UL1TR001873 and 1OT2TR003434-01. JHK wrote the manuscript; JHK, CNT, CS, CL, AMB, JRR, HL, SML, MSVE, and CW edited the manuscript; JHK, CS, and CW designed the research; JHK analyzed the data; JHK, AO, CNT, CL, JL, and HL acquired and interpreted the data; PBR defined the phenotype; AMB and LAS manually reviewed the eligibility criteria; LE contributed to the natural language processing of notes in the electronic health records. Supplementary material is available at Journal of the American Medical Informatics Association online.
An interactive web-based dashboard to track COVID-19 in real time
Ongoing clinical trials for the management of the COVID-19 pandemic
Against pandemic research exceptionalism
COVID-19 coronavirus research has overall low methodological quality thus far: case in point for chloroquine/hydroxychloroquine
The quality of the reported sample size calculation in clinical trials on COVID-19 patients indexed in PubMed
Harmonizing heterogeneous endpoints in COVID-19 trials without loss of information: an essential step to facilitate decision making
Evaluating Inclusion and Exclusion Criteria in Clinical Trials
The database for aggregate analysis of ClinicalTrials.gov (AACT) and subsequent regrouping by clinical specialty
Feasibility and utility of applications of the common data model to multiple, disparate observational health databases
Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study
World Health Organization. COVID-19 Therapeutic Trial Synopsis
Assessment of glomerular filtration rate in acute and chronic settings
Inferring pregnancy episodes and outcomes within a network of observational databases
Annotating German clinical documents for deidentification
Epidemiology of COVID-19 among children in China
Characteristics and outcomes of children with coronavirus disease 2019 (COVID-19) infection admitted to US and Canadian pediatric intensive care units
Multisystem inflammatory syndrome related to COVID-19 in previously healthy children and adolescents in New York City
Clinical characteristics of 58 children with a pediatric inflammatory multisystem syndrome temporally associated with SARS-CoV-2
Multisystem inflammatory syndrome in US children and adolescents
Clinical characteristics of pregnant women with Covid-19 in Wuhan, China
Characteristics and outcomes of pregnant women admitted to hospital with confirmed SARS-CoV-2 infection in UK: national population based cohort study
Intensive care unit admissions for pregnant and nonpregnant women with coronavirus disease 2019
A call for action for COVID-19 surveillance and research during pregnancy
Remdesivir in Patients with Acute or Chronic Kidney Disease and COVID-19
Potential association between COVID-19 mortality and health-care resource availability
The impact of ethnicity on clinical outcomes in COVID-19: a systematic review
Characteristics and predictors of death among 4035 consecutively hospitalized patients with COVID-19 in Spain
Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium
Characteristics of persons who died with COVID-19-United States
Biases in electronic health record data due to processes within the healthcare system: retrospective observational study
Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis

The authors would like to thank Carol Friedman for the task related to natural language processing of the EHR notes; Karthik Natarajan for the provision and the maintenance of the OMOP CDM; and Ning Shang for offering advice for EHR data interpretation.

None declared.