key: cord-1010568-g0q8dq3g
authors: Dash, Dev; Gokhale, Arjun; Patel, Birju S.; Callahan, Alison; Posada, Jose; Krishnan, Gomathi; Collins, William; Li, Ron; Schulman, Kevin; Ren, Lily; Shah, Nigam H.
title: Building a Learning Health System: Creating an Analytical Workflow for Evidence Generation to Inform Institutional Clinical Care Guidelines
date: 2022-03-02
journal: Appl Clin Inform
DOI: 10.1055/s-0042-1743241
sha: a98617f38a22c523fcb715e09afb057a4c92312a
doc_id: 1010568
cord_uid: g0q8dq3g

Abstract

Background: One key aspect of a learning health system (LHS) is utilizing data generated during care delivery to inform clinical care. However, institutional guidelines that utilize observational data are rare and require months to create, making current processes impractical for more urgent scenarios such as those posed by the COVID-19 pandemic. There exists a need to rapidly analyze institutional data to drive guideline creation where evidence from randomized controlled trials is unavailable.

Objectives: This article provides background on the current state of observational data use in institutional guideline creation and details our institution's experience in creating a novel workflow to (1) demonstrate the value of such a workflow, (2) present a real-world example, and (3) discuss difficulties encountered and future directions.

Methods: Utilizing a multidisciplinary team of database specialists, clinicians, and informaticists, we created a workflow for identifying and translating a clinical need into a queryable format in our clinical data warehouse, creating data summaries, and feeding this information back into clinical guideline creation.

Results: Clinical questions posed by the hospital medicine division were answered in a rapid time frame and informed creation of institutional guidelines for the care of patients with COVID-19. Setting up the workflow, answering the questions, and producing the data summaries required around 300 hours of effort and $300,000 USD.

Conclusion: A key component of an LHS is the ability to learn from data generated during care delivery. Such examples are rare in the literature; we demonstrate one here, along with our thoughts on ideal multidisciplinary team formation and deployment.

Background

In 2015, the National Academy of Medicine set a goal that, by 2020, 90% of clinical decisions be supported by accurate, timely, and up-to-date clinical information reflecting the best available evidence. 1 While randomized controlled trials (RCTs) remain the gold standard for producing clinical evidence, only 65% of clinical decisions (and as few as 14% in some medical disciplines) are supported by RCTs, and the scope of these decisions is limited by cost and external validity. 2-4 In fact, the percentage of recommendations supported by RCTs has decreased in fields like cardiology over the prior decade, and many of these guidelines are derived from expert opinion. 5,6 There exists a clear need for other evidence sources to support guideline generation.

Learning health systems (LHSs) may bridge this evidence gap by creating the infrastructure and processes to leverage data created through care delivery. 7 The National Academy of Medicine defined an LHS as a system where "science, informatics, incentives, and culture are aligned for continuous improvement and innovation, with best practices seamlessly embedded in the delivery process and new knowledge captured as an integral byproduct of the delivery experience." 8,9
Though definitions of an LHS vary, a core tenet is the ability to analyze care delivery and learn from institutional data. Given the increase in the number of hospitals with a certified electronic health record (EHR), learning from a health system's own care delivery and visualizing findings in a meaningful manner should be possible. 10,11

With the assistance of a librarian, a literature search was conducted in PubMed to locate relevant English-language literature published between January 2010 and November 2021, using keywords related to "hospital guideline" and "development." Of the 89 articles returned regarding how institutional guidelines are created, we found that most relied on expert opinion and review of the literature. Only two articles discussed the use of observational data to inform the creation of institutional guidelines. Both related to antibiotic stewardship, and both operated on time frames that would have limited utility in more urgent situations. 12,13 Therefore, while the concept of an LHS has generated significant interest, there are no examples of wide-scale use of observational data that address a range of disciplines and are intended to persist so as to continuously inform institutional care guidelines.

The ability to learn from observational health data remains challenging across health systems, and analyzing these data to generate usable evidence requires significant effort. 14,15 We have previously demonstrated the ability to answer questions prompted during individual patient care. 16 In 2019, our institution led the world's first pilot of an informatics consultation service using routinely collected data on millions of individuals to provide on-demand evidence not addressed by primary literature or other data sources. 17-19 Although this need was first identified over a decade ago, 20 we are aware of only a few health systems and organizations that have been able to translate foreground questions into insights, such as Kaiser Permanente, Geisinger, the Food and Drug Administration Sentinel Initiative, the Mental Health Research Network, and the Vaccine Safety Datalink. 21 While these efforts were remarkable in their translation of observational data into clinical insights, a novel contribution of our work is the short time frame from identification of a need to delivery of data that guide the design of practice guidelines. 16,19

The COVID-19 pandemic presented unprecedented challenges, and while some systems were able to rapidly implement surveillance mechanisms, 22 the pandemic exposed equipment and personnel capacity limitations as well as a shortfall of systems able to perform rapid analyses of observational data. 15 On a local level, in early January 2021 at Stanford Hospital, a 605-bed quaternary care hospital located in Stanford, California, the general medicine service appeared at risk of being overwhelmed by new cases of COVID-19. The hospital medicine division sought to standardize care and rapidly create and disseminate guidelines for the inpatient management of these patients, leveraging data that had already been generated during the care of COVID-19 patients. In the current work, we describe the creation of an interdisciplinary team anchored by clinical informatics fellows that used existing institutional infrastructure 18,19,23 to generate timely and actionable insights from existing data to support the design of institutional management guidelines for COVID-19 patients.
We describe both the process and infrastructure, as well as the evidence generated to inform hospital medicine division guidelines for the care of COVID-19 patients, as an example of what an LHS can accomplish. We conclude with a discussion of how institutions can leverage their existing fellowship programs, informatics teams, and data science expertise to jump-start LHS efforts.

Methods

The workflow and methodology of our institution's previous informatics consultation service were repurposed to answer questions for near real-time clinical decision support at a population level for continuous guideline evolution, as shown in Fig. 1. 19 This was accomplished by a multidisciplinary team consisting of practicing clinicians, electronic medical record reporting specialists, data scientists, and clinical informaticians (CIs). Briefly, the workflow consisted of specifying clinical questions, translating the details of those questions into patient cohort definitions that were then executed as queries against our clinical data warehouse (CDW), and conducting statistical analyses appropriate to the question(s) over the retrieved patient data. The question, cohort definitions, analyses, and their results were then summarized as written reports and discussed with the requesting team in an in-person debrief. Details of the workflow, data retrieval, and analysis tools developed for this service are further described in our previous work. 19

Practicing clinicians from the hospital medicine division (end-users) identified clinical questions with practice-changing implications to inform the creation of guidelines for the management of patients with COVID-19. This operational need arose from the need to mitigate a predicted inpatient surge of COVID-19 patients, with a requested turnaround of less than 2 weeks for the results. To meet this deadline, a streamlined analytical process was created. The aforementioned team was assembled by an associate chief information officer (CIO; N.H.S.), and the databases containing sufficient information for the study were identified. Epic Clarity and STARR-OMOP (Stanford Research Repository-Observational Medical Outcomes Partnership), a separate CDW that converts the Epic Clarity schema to the OMOP common data model (CDM), were deemed to contain the necessary clinical information, and this informed the staffing of EHR database specialists with expertise in those databases. Data scientists with an understanding of data integrity issues and experience with clinical research and communication were also required to summarize findings in an actionable manner. CIs with an understanding of clinical workflows and the clinical data model were needed to identify where key data elements were generated and stored, as well as to perform data validation.

Practicing clinicians posed questions which were then formalized into Population/Intervention/Comparator/Outcome (PICO) questions by a data science and CI team. EHR-based phenotyping was performed by this team to define the cohorts of interest. Patient cohorts were created by the data scientists using the Observational Health Data Sciences and Informatics (OHDSI) ATLAS Cohort Creation Tool, 24 an open-source, web-based clinical data search tool designed for use on the OMOP CDM and operating over Stanford Medicine's CDW, STARR. 25
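To make the cohort-definition step concrete, the following is a minimal sketch of what a cohort query against an OMOP CDM warehouse such as STARR-OMOP could look like. This is not the ATLAS-generated SQL used by the team; the concept IDs, date window, and connection string are illustrative placeholders.

```python
# A minimal sketch (not the study's actual ATLAS output) of a COVID-positive
# phenotype queried from an OMOP CDM warehouse: first SARS-CoV-2 test per
# person with a "detected" result. All identifiers below are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@host/omop_cdw")  # hypothetical DSN

COVID_TEST_CONCEPTS = (706170, 706173)  # placeholder SARS-CoV-2 measurement concepts
DETECTED_CONCEPT = 45884084             # placeholder "Detected" result concept

cohort_sql = f"""
    SELECT m.person_id,
           MIN(m.measurement_date) AS index_date   -- first positive test per person
    FROM   measurement m
    JOIN   person p USING (person_id)
    WHERE  m.measurement_concept_id IN {COVID_TEST_CONCEPTS}
      AND  m.value_as_concept_id = {DETECTED_CONCEPT}
      AND  m.measurement_date BETWEEN DATE '2020-03-01' AND DATE '2021-01-31'
      AND  p.year_of_birth <= 2002                 -- crude adult filter for the sketch
    GROUP  BY m.person_id
"""

cohort = pd.read_sql(cohort_sql, engine)
print(f"{len(cohort)} patients in the COVID-positive cohort")
```

In practice, ATLAS serializes cohort definitions like this one as shareable JSON and generates the dialect-specific SQL automatically, which is one reason the team standardized on the OMOP CDM.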
With the cohorts defined, the two databases were accessed separately and an iterative process of data validation was performed by the CIs in conjunction with data scientists and database specialists. During this process, EHR database specialists, data scientists, and CIs focused on creating cohort definitions given knowledge of the clinical workflow and of where data were stored in the Clarity database. This was followed by sanity checks on cohort generation and troubleshooting of issues that affected the sensitivity and specificity of the defined phenotypes. As an example, there were multiple ways to define ventilated patients, such as O2 delivery method, documentation in ventilator flowsheet rows, new addition of an airway record, and signing of a new ventilator setting order. Troubleshooting via database queries and verification of retrieved data revealed that initiation of a new ventilator setting order was the most accurate of these definitions. Other examples of findings from this process included the retirement of an old C-reactive protein (CRP) laboratory record in use prior to COVID-19, redefining the "COVID positive" phenotype to use COVID-19 nasal swab test positivity rather than International Classification of Diseases (ICD) codes, excluding patients who had "do not intubate" orders from the denominator of potential intubation events, and expanding criteria to find adult patients who had been accidentally registered under pediatric testing departments. These final phenotypes are included in Supplementary Appendix A (available in the online version) for reproducibility.

Once the cohorts were properly defined, we ascertained the location of salient pieces of information within the EHR through synchronous meetings between CIs and database specialists. In our cohort, defining COVID-19 positivity as a SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) test result of "detected" was more specific than applying a COVID-19 ICD code. Other features within the EHR, such as D-dimer, CRP, supplemental oxygen information, and ventilator usage, required iterative meetings with the database specialists to ensure accuracy of cohort generation. Data scientists then conducted analyses to answer the questions received and summarized the results, in conjunction with CIs, through written and oral reports. Finally, debrief sessions were scheduled to perform a warm handoff of the reports to end-user clinicians, who then utilized them in creating and refining clinical care guidelines.

Fig. 1 This figure illustrates the iterative cycles that can be performed to create and continuously update institutional guidelines. The process starts with a clinical need that arises from an institutional concern (e.g., resource constraints at our institution during the COVID-19 pandemic). This concern is then translated into a PICO-T question as described in the Methods section. Next, report writers query the clinical data warehouses and this information is validated by clinical informaticians. Once the data are validated, data scientists create report summaries and the evidence is presented to the initiators of the request to inform clinical guideline creation. As a clinical need evolves over time, this process can undergo further iterative cycles to refine guidelines to adapt to a changing environment.
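The sanity checks described above amount to scoring each candidate phenotype against a chart-reviewed reference. A self-contained sketch of that comparison follows; all patient IDs and candidate sets are dummy values invented for illustration, arranged so the winning definition mirrors the ventilator-order finding reported in the text.

```python
# Sketch of the phenotype sanity checks: score each candidate definition of
# "ventilated patient" against a small chart-reviewed gold standard.
# All inputs are hypothetical dummy data.

def sensitivity_specificity(candidate: set, gold_pos: set, gold_neg: set):
    """Sensitivity/specificity of a candidate patient set versus chart review."""
    tp = len(candidate & gold_pos)
    fn = len(gold_pos - candidate)
    fp = len(candidate & gold_neg)
    tn = len(gold_neg - candidate)
    return tp / (tp + fn), tn / (tn + fp)

# Candidate definitions, each a set of person_ids pulled by a separate query.
candidates = {
    "o2_delivery_method":     {1, 2, 3, 5, 8},       # dummy IDs
    "vent_flowsheet_rows":    {1, 2, 3, 4, 5, 9},
    "new_airway_record":      {1, 3, 5},
    "new_vent_setting_order": {1, 2, 3, 5},
}
gold_positive = {1, 2, 3, 5}   # chart-reviewed ventilated patients
gold_negative = {4, 8, 9, 10}  # chart-reviewed non-ventilated patients

for name, ids in candidates.items():
    sens, spec = sensitivity_specificity(ids, gold_positive, gold_negative)
    print(f"{name:24s} sensitivity={sens:.2f} specificity={spec:.2f}")
```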
Results

We used this approach to answer four clinical questions posed by the hospital medicine division pertinent to the management of COVID-19 patients. These results were then disseminated via a presentation at hospital grand rounds and electronically via email. We were able to successfully answer all four questions, with answers summarized here for brevity and detailed in full PICO format in Supplementary Appendix A (available in the online version).

The answers to these questions informed institutional guideline creation, helped alleviate concerns about resource constraints, and informed possible measures in the setting of high levels of ICU utilization. In particular, data demonstrating that patients discharged with home oxygen had 30-day ED readmission rates similar to those of patients discharged on room air alleviated concerns about needing to increase patient length of stay until the resolution of an oxygen requirement. Observing that 14.3% of patients with significant oxygen requirements progressed to intubation helped address concerns about being able to predict ICU demand and reassured clinicians that a significant proportion of these patients could be monitored on the general medicine service if needed. Finally, seeing the potential prognostic value of an admission D-dimer for ICU transfer and venous thromboembolism (VTE) helped inform the development of screening guidelines for VTE on the COVID-19 service. The cohorts were not of sufficient size to enable subgroup analyses by age, sex, race, or ethnicity.

Answering the questions using our repurposed informatics consult service required over 300 hours of effort from a multidisciplinary team. These hours included performing an intake of the clinical questions, refining the questions into PICO format, performing data integrity checks, and creating data summaries. The hours were then converted into a dollar amount using standard salary rates. After accounting for the fixed costs of creating the information technology, informatics, and data science infrastructure, the total approximate cost amounted to $300,000 USD for the four reports.
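As an illustration of the shape of the analysis behind the readmission finding above, a two-by-two comparison of 30-day ED readmission between discharge groups could be run as below. The counts are invented solely to show the computation; they are not the study's data, and the actual analyses used the validated cohorts described in the Methods.

```python
# Illustrative only: compare 30-day ED readmission between patients
# discharged on home oxygen vs. room air. Counts are made up.
from scipy.stats import fisher_exact

#               readmitted  not readmitted
home_oxygen = [5,          55]    # hypothetical counts
room_air    = [9,          111]

odds_ratio, p_value = fisher_exact([home_oxygen, room_air])

rate_o2  = home_oxygen[0] / sum(home_oxygen)
rate_air = room_air[0] / sum(room_air)
print(f"home O2: {rate_o2:.1%}  room air: {rate_air:.1%}  "
      f"OR={odds_ratio:.2f}  p={p_value:.2f}")
```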
Discussion

Currently, institutional clinical management guidelines rely on a combination of literature review and expert opinion. Health care systems have the requisite data, expertise, and staffing to rapidly generate evidence from local institutional data for clinical questions that are unaddressed in traditional literature sources. The urgent nature of the above questions, posed at a time when it appeared COVID-19 cases might overwhelm our hospital, highlights the importance of creating institutional infrastructure to rapidly generate insights from observational data. Such infrastructure must include data warehouses, a multidisciplinary team, and an analytical process. We utilized such a process to answer questions about oxygen therapy, VTE risk, and ICU transfer risk as shown, and demonstrated the feasibility of rapidly generating data summaries using observational data to support institutional guidelines. The four key questions were identified and answered within a 2-week time frame to help inform guidelines for the care of patients with COVID-19 on the hospital medicine service; these would have been helpful in supporting triage decisions had the COVID-19 surge happened. Our efforts to define and allocate the resources needed to operationalize this analytic process help inform future planning. Although we presented one disease process and four associated questions, similar inquiries can be generalized to clinical questions regarding management across all medical specialties. With proper resources and an analytic process in place, given sufficient patient-level data, it should be possible to individualize recommendations, as more homogeneous cohorts can be created through subsetting. 26

This effort also demonstrated the value of an interdisciplinary team in answering clinically oriented questions using EHR data, and the role that board-certified CIs can play on such teams. While each team member had a specific role at different points in the process, continuous collaboration among all was critical in refining questions and adjusting to issues presented by the data. Informaticians played a particularly important role in ensuring that the populations and interventions of interest were appropriately defined so that the EHR specialists and data scientists could accurately build patient cohorts. Broad questions of interest to clinicians, such as "Can patients be discharged with home oxygen safely?", needed to be translated into detailed population, intervention, and outcome definitions by CIs; accordingly, collaboration was critical in building patient cohorts. For example, consensus and compromise between the data science team and clinicians were needed to define an effective but not overly complicated phenotype for an intubation event.

Collaboration between the groups also led to the discovery of key data integrity issues, such as adult patients who initially showed up under pediatric emergency medicine due to intermittent registration under legacy construction holdovers, inclusion of surgical patients without COVID-19 who had been given COVID-19 encounter diagnoses as part of surgical prescreening, and the exclusion of a D-dimer test that had been retired in the middle of the study period and was no longer among the currently available laboratory tests. Rectifying these issues led to significant changes in cohort sizes. Additionally, initial reports of the intubation rate for patients with a high oxygen requirement were lower than expected by clinical suspicion, leading the clinical team to review patient charts and find that patients with do-not-intubate orders had not been excluded from the initial cohort. These data validation steps would not have been possible without knowledge of the inner workings of our hospital's clinical workflows, data scientists with a high degree of familiarity with the EHR, and the involvement of clinicians who cared for the patient cohort of interest.

Although this particular application was a single use case, it expands our previously described informatics consult service from generating personalized real-world evidence to using these data to support clinical care guidelines. 18 While this effort did not focus on iterating within a specific disease, it serves as a proof of concept, and the underlying strategies are fundamental to iterative cycles that continuously provide evidence within our institution and will inform future clinical care guideline creation across diseases. Additionally, in the event of a changing institutional concern in relation to COVID-19, the created infrastructure can be easily adapted to support evolving clinical care guidelines. The process, as detailed in the Methods section, relied on resources such as the ATLAS cohort analysis tool and the STARR-OMOP database, as well as on human resources. Indeed, database management and assessment have been highlighted as critical in the development of an LHS. 15
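One of the integrity checks described above, flagging a lab concept that was retired mid-study (as happened with the old CRP record and the D-dimer test), can be approximated by profiling result volume over time and flagging concepts that go silent before the study window ends. The following is a sketch under the assumption that measurements are already loaded into a pandas DataFrame; the function, column names, and dummy data are hypothetical.

```python
# Sketch of a "retired lab test" check: find the last month each lab concept
# produced results, and flag concepts that go quiet well before study end.
# DataFrame, column names, and concept IDs are hypothetical.
import pandas as pd

def flag_retired_labs(meas: pd.DataFrame, study_end: str, quiet_months: int = 2):
    """Return concepts with no results in the last `quiet_months` of the study."""
    meas = meas.assign(month=meas["measurement_date"].dt.to_period("M"))
    last_seen = meas.groupby("measurement_concept_id")["month"].max()
    cutoff = pd.Period(study_end, freq="M") - quiet_months
    return last_seen[last_seen < cutoff]

# Dummy example: concept 1001 (an old CRP record) stops in June 2020, while
# concept 1002 (its replacement) runs through the end of the study.
meas = pd.DataFrame({
    "measurement_concept_id": [1001] * 3 + [1002] * 6,
    "measurement_date": pd.to_datetime(
        ["2020-04-15", "2020-05-10", "2020-06-20",
         "2020-04-01", "2020-06-01", "2020-08-01",
         "2020-10-01", "2020-12-01", "2021-01-05"]),
})
print(flag_retired_labs(meas, study_end="2021-01"))  # flags concept 1001
```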
To create and deliver these four reports in less than 2 weeks, over 300 hours of effort from a multidisciplinary team comprising 11 members, including clinical end-users, CIs, data scientists, and EHR data specialists, were required, at a total estimated cost of $300,000 USD. While there were large initial costs in team formation and in defining an effective workflow, the marginal cost of producing additional reports is likely close to $20,000 USD. This cost also does not take into account the research and development necessary to create and maintain the tools and databases, which are supported via the office of the senior associate dean of research. The majority of the effort was spent on steps upstream of data analysis, such as population definition, data extraction, and data validation, suggesting that to improve the cost-effectiveness of future projects, effort should go toward supporting diverse and complex data extraction processes and ensuring the integrity of the data sources. In comparison, a 2014 estimate of running an RCT from phase 1 through phase 3 came to $285 million USD and additionally required 6.4 years. 27

There were multiple ways in which our process could have been streamlined to help reduce costs and turnaround time. An ideal LHS analytic process starts with high-level institutional support, especially to protect time for personnel such as specific EHR/database professionals and clinicians and to financially support the creation and maintenance of data warehouses. Clinical questions asked by end-users should be presented in PICO format to provide clarity regarding the specific ask. A clear point of contact, ideally a CI, is helpful for refining these clinical questions. Additionally, a clear turnaround time and the rationale for the PICO question from the clinical team will help minimize scope creep. After this initial intake, the PICO question is further refined by CIs along with data scientists, who then interface with database experts in an iterative manner for optimal phenotyping and real-time data validation. An understanding of which data items are present in which databases is key to knowing what questions can be answered. Having multiple teams query different databases can help with verifying data fidelity as well as containing costs. 28 The data validation steps are laborious and not easy to streamline, but they are critical for uptake of the final results and are best done by CIs who have a clear understanding of the workflow and the data model. Random sampling of patient charts, and comparison of intermediate results pulled from relational databases against operational-facing databases (e.g., those used by business intelligence), is helpful at this phase; a sketch of such a check follows below. Once the data validation steps are complete, the data science team can generate statistical reports and visualizations for the final report. Deadlines should be set ahead of time for the aforementioned steps, and dissemination of the final report should be tailored to institutional culture, take advantage of existing communication workflows, and be consistent across reports. This dissemination step is as important as the others and must be operationalized to oppose the inertia of deploying well-validated clinical knowledge into clinical practice, a process historically thought of as taking years. 29 This should be done by local leaders, in particular CIs, who, having been involved throughout the entire workflow, are well suited to this final task; accordingly, this role for practicing physicians has been highlighted at other systems. 30
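The cross-database verification recommended above can be as simple as pulling the same intermediate cohort from both warehouses and sampling the disagreements for chart review. A minimal sketch, assuming both cohorts have already been materialized as sets of patient IDs; all names and IDs here are hypothetical.

```python
# Sketch of the validation step described above: reconcile a cohort pulled
# from the operational warehouse (e.g., Clarity) against the research
# warehouse (e.g., an OMOP CDW), then sample disagreements for chart review.
import random

clarity_ids = {101, 102, 103, 104, 105, 107}   # hypothetical cohort from Clarity
omop_ids    = {101, 102, 103, 105, 106}        # hypothetical cohort from OMOP CDW

only_clarity = clarity_ids - omop_ids
only_omop    = omop_ids - clarity_ids
overlap      = clarity_ids & omop_ids

print(f"overlap={len(overlap)}  clarity-only={len(only_clarity)}  "
      f"omop-only={len(only_omop)}")

# Chart-review a random sample of the disagreements to find the cause
# (e.g., a retired lab code or a registration-department quirk).
disagreements = sorted(only_clarity | only_omop)
to_review = random.sample(disagreements, k=min(3, len(disagreements)))
print("sample for chart review:", to_review)
```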
Data generated during care delivery can be used in a variety of ways, including quality metric reporting, clinical guideline creation, and answering questions about specific patients. For health systems interested in leveraging their existing data to rapidly generate evidence, we are freely sharing our tools and lessons learned. 19,23 In our case, we are fortunate that a surge of COVID-19 patients did not occur, but we were well equipped in the event that it did, in part because of the insights derived by our COVID-19 data science task force. While the merits of using observational data for evidence generation are often debated, there is a clear argument for utilizing available data to inform questions that are unaddressed in the literature and need to be answered urgently. 15 While efforts exist to answer patient-level questions, 18,21 an LHS also needs to be able to rapidly answer operational questions. The management questions posed by the COVID-19 pandemic are one such example. Our experience shows that, by utilizing interdisciplinary teams, judicious use of observational data can support the rapid creation of evidence-based guidelines. By employing a high degree of collaboration, data scientists and clinicians can validate the utility of a clinical question, define the populations and interventions of interest, validate the integrity of the data, and define analytic methods that protect against misleading results. Given the interdisciplinary nature of this analytic process, CIs are well suited to lead endeavors such as these.

Conclusion

A learning health system should be able to provide actionable insights through analysis of available observational data. An institution-supported multidisciplinary team can execute the stages of our proposed analytic process to produce evidence that can be incorporated into guidelines. The creation of such a process will likely require significant upfront investment but can provide value to the health care system through insight generation and improved patient outcomes.

This manuscript discusses development of an analytic process and reports QA results, which do not constitute human subjects research.

References

Integrating Research and Practice: Health System Leaders Working Toward High-Value Care: Workshop Summary
Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review
A literature review on the representativeness of randomized controlled trial samples and implications for the external validity of trial results
How good is the evidence to support primary care practice?
Scientific evidence underlying the ACC/AHA clinical practice guidelines
Levels of evidence supporting American College of Cardiology/American Heart Association and European Society of Cardiology Guidelines
Bridging the inferential gap: the electronic health record and clinical evidence
Toward a science of learning systems: a research agenda for the high-functioning learning health system
Best Care at Lower Cost: The Path to Continuously Learning Health Care in America
Non-federal acute care hospital electronic health record adoption
Graphical presentations of clinical data in a learning electronic medical record
Development of institutional guidelines for management of gram-negative bloodstream infections: incorporating local evidence
An antimicrobial stewardship program with a focus on reducing fluoroquinolone overuse
How learning health systems learn: lessons from the field
Advancing the learning health system
Evidence-based medicine in the EMR era
A 'green button' for using aggregate patient data at the point of care
It is time to learn from patients like mine
Using aggregate patient data at the bedside via an on-demand consultation service
Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care: Workshop Series Summary
Data consult service: can we use observational data to address immediate clinical needs?
Rapid implementation of a complex, multimodal technology response to COVID-19 at an integrated community-based health care system
A new paradigm for accelerating clinical data science at Stanford Medicine
Observational Health Data Sciences and Informatics GitHub Library
Enabling individualised health in learning healthcare systems
Examination of Clinical Trial Costs and Barriers for Drug Development: Report to the Assistant Secretary of Planning and Evaluation (ASPE). Washington, DC: Department of Health and Human Services
Barriers to achieving economies of scale in analysis of EHR data. A cautionary tale
Achieving a nationwide learning health system
Embedded research in the learning healthcare system: ongoing challenges and recommendations for researchers, clinicians, and health system leaders

Acknowledgments

We would like to thank Stanford Research IT for providing material support, as well as the Office of the CIO for institutional support of this initiative.