Shen, Lin; Levie, Alexandra; Singh, Hardeep; Murray, Kristen; Desai, Sonali. Harnessing Event Report Data to Identify Diagnostic Error During the COVID-19 Pandemic. Jt Comm J Qual Patient Saf. 2021-10-29. DOI: 10.1016/j.jcjq.2021.10.002

Introduction: COVID-19 exposed systemic gaps with increased potential for diagnostic error. We implemented a new approach leveraging electronic safety reporting to identify and categorize diagnostic errors during the pandemic.

Methods: We evaluated all safety event reports from March 1, 2020 to February 28, 2021 at our academic medical center using two complementary pathways. Pathway 1 comprised all reports with explicit mention of COVID-19. Pathway 2 comprised all reports without explicit mention of COVID-19, to which natural language processing (NLP) plus logic-based stratification was applied to identify potential cases. Cases were evaluated by manual review to identify diagnostic error/delay and to categorize error type using a recently proposed classification framework of 8 categories of pandemic-related diagnostic errors.

Results: We included 14,230 reports and identified 95 (0.67%) cases of diagnostic error/delay. Pathway 1 (n = 1,780 eligible reports) yielded 45 reports with diagnostic error/delay (positive predictive value [PPV] = 2.5%), of which 35.6% (16/45) were attributed to pandemic-related strain. In Pathway 2, the NLP-based algorithm flagged 110 safety reports for manual review from 12,450 eligible reports. Of these, 50 reports had diagnostic error/delay (PPV = 45.5%); 94.0% (47/50) were related to strain. Errors from all 8 categories of the taxonomy were found on analysis.

Conclusion: An event reporting-based strategy including use of simple NLP identified COVID-related diagnostic errors/delays and uncovered several safety concerns related to COVID-19.
An NLP-based approach can complement traditional reporting and be used as a just-in-time monitoring system to enable early detection of emerging risks from large volumes of safety reports.

Keywords: Safety reporting; Diagnostic Errors; COVID-19; informatics; natural language processing

Diagnostic errors are receiving intense investigation in the safety community due to their high prevalence and harmful impact on patients. 1-3 Diagnostic errors affect 12 million US adult patients per year in the outpatient setting, 3 and at least 0.7% of adult admissions involve a harmful diagnostic error. 4 The COVID-19 pandemic has further strained the health care system, resulting in cognitive errors, burnout, challenges with hospital resources, and a rapid shift in operational workflows that may contribute to missed and delayed diagnoses. 5-8 Due to the novel characteristics of COVID-19-related disease as well as its impact on hospital capacity, staffing shortages, and burnout, Gandhi and Singh proposed that the COVID-19 pandemic would exacerbate diagnostic errors. 7 They developed a taxonomy defining 8 types of diagnostic errors that could be expected in the pandemic: Classic, Anomalous, Anchor, Secondary, Acute Collateral, Chronic Collateral, Strain, and Unintended. Classic and Anomalous refer to missed or delayed COVID-19 diagnoses, whereas the other six categories pertain to missed or delayed non-COVID-19 diagnoses that may result from factors related to the COVID-19 pandemic. The classification definitions as well as examples are described in Table 1. This approach accounted for diagnostic errors or delays based on possible disruptions COVID-19 may have on health care providers and the health care system, and identification of these errors can inform the specific mitigation strategies described in the original paper.
7 Examples include cognitive errors, such as various forms of availability and anchoring bias; care deferment; and effects of rapidly expanding care delivery changes, such as use of telemedicine and personal protective equipment. At our institution, the ability to recognize COVID-19-related diagnostic errors was an important part of the COVID-19 response. This included an increased emphasis on the use of data and transparency for more rapid strategic response. 9 We thus embarked on a project to characterize diagnostic errors at our institution using the Gandhi and Singh taxonomy by leveraging our incident reporting systems. While simplified clinician reporting mechanisms have recently been developed to improve reporting of diagnostic error, incident reporting has not yet been widely used to study diagnostic error. 10,11 There are a number of criticisms of incident reporting systems, particularly the voluntary nature of reporting, which may lead to reporting bias and hindsight bias. 12-14 However, we felt that the data would be a readily accessible and valuable source of information about events pertaining to diagnostic errors during the pandemic. As the project evolved, it became clear that the task of identifying COVID-19-related diagnostic errors required analyzing a high volume of safety event reports to discern whether a diagnostic error or delay occurred. Manual chart review for this volume of reports was not feasible given the multiple demands on our patient safety and risk management team during the pandemic; thus, we developed an informatics-based approach capable of preprocessing large numbers of safety reports. To our knowledge, such a natural language processing (NLP)- and logic-based cohort enrichment approach has not been applied to safety event reports to facilitate identification of COVID-related diagnostic errors.
In this paper, we describe the results of a study that analyzed diagnostic errors at a large US health care system to identify patient safety risks during the COVID-19 pandemic. To achieve the study aims, we sought to: (1) rapidly develop a safety reporting-based workflow to identify sources of diagnostic error in our institution, especially in the context of a novel pandemic, and (2) examine application of Gandhi and Singh's classification framework in the real world and describe the categories of potential diagnostic errors that were found. We conducted the study at an academic tertiary care referral center in the Northeastern United States with 753 inpatient beds and more than 135 ambulatory practices. The institution uses an electronic vendor-based safety reporting system (RL Solutions, RLDatix, London, United Kingdom) capturing both inpatient and ambulatory safety events. Approximately 10% of our safety reports are from the ambulatory setting, with the remainder from the inpatient setting. Project managers within the Department of Quality and Safety reviewed all safety events related to COVID-19. We created customized fields within RL Solutions corresponding to the 8 classes of COVID-related diagnostic errors and delays proposed by Gandhi and Singh. These custom fields were available to patient safety and risk management specialists who routinely review safety reports, but not to frontline staff filing the initial safety report. Early in the pandemic, we developed a workflow using safety reports that were either manually flagged as COVID-related or explicitly mentioned COVID-19 or coronavirus. 15 All safety reports containing the keywords "COVID" or "coronavirus" were extracted from the safety report database. These reports were then manually reviewed for potential diagnostic error using the classification for COVID-related diagnostic error developed by Gandhi and Singh.
Chart reviews were performed in instances where the safety reports had insufficient detail (Figure 1). Because the resources needed were substantial, it was not feasible to review all safety reports to look for diagnosis-related signals. We developed a second pathway later in the pandemic response to complement Pathway 1, which looked specifically at safety reports excluded from Pathway 1 and reduced the number of manual chart reviews through the application of an NLP approach. This second pathway included cases that may not have been explicitly linked to COVID-19 by the staff member filing the safety report. We developed the software algorithm iteratively, following methods used successfully to develop other health informatics innovations. 16 This included forming a working group with both clinical and informatics expertise, establishing design requirements, and using an agile approach for iterative development to build a rapidly deployable tool that could be integrated into an existing workflow. We considered two main requirements for the NLP algorithm in the design phase. First, the algorithm must be able to process a high volume of safety reports; the high volume precluded manual review of each report to assess for COVID-related diagnostic errors. Second, the algorithm must be able to rank cases to optimize the efficiency of human review. In other words, safety reports should be rank-ordered such that study personnel can review a high-yield, enriched cohort of cases to discover COVID-19-related diagnostic errors. Pathway 2 included the following steps: (1) extraction of case-related details of safety reports from RL Solutions, (2) automated processing of safety report free text to categorize the concepts it contained, (3) creation of a ranked list of safety reports based on the number of concept categories flagged, and (4) manual case review of the enriched cohort.
We wrote the algorithm in the R programming language and developed a custom NLP approach using heuristic keyword checking against a custom lexicon. In essence, we parsed the free-text report narrative for the presence of specific keywords in specific concept categories. The list of keywords was derived from working group consensus. Case-insensitive string matching was used for keywords, partial word fragments, common abbreviations, and misspellings. We used 11 concept categories based on an association with the diagnostic process ("COVID", "Communication", "Testing", "Orders", "Precautions", "Workflow gap", "Patient condition", "Symptoms", "PPE", "Diagnostic", and "Care plan"), with multiple keywords for each. For example, the "Communication" concept category included the following keywords: call, video, virtual, VV, misunderst, telemedicine, hear, ipad, communic, phone. A full list of terms used for heuristic matching, as well as the R code for our algorithm, can be found in the GitHub repository (distributed under the GPL v3 license). 17 The "COVID" category was used specifically to exclude reports processed through Pathway 1 to avoid redundancy. A safety report was deemed to have a match for a particular concept category if any matching keywords were found in the safety report narrative. A single safety report could contain multiple concept categories. For each safety report, the categories found were tallied; reports were then sorted in descending order of the number of concept categories matched. We hypothesized that cases involving more concept categories were more likely to be high yield for human review. Safety reports meeting the threshold of 6 or more flagged categories were manually reviewed by project managers to assess for the presence of diagnostic error or delay and for subcategorization using concept analysis. The threshold of 6 was chosen from a practical perspective, as it generated a cohort of reasonable size for our team to review manually.
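The tally-and-rank logic described above can be sketched as follows. The study's actual implementation is in R (in the cited GitHub repository); this is an illustrative Python translation with an abridged lexicon: only the "Communication" and "COVID" keywords come from the text above, and the "Testing" and "PPE" keywords are hypothetical placeholders.

```python
# Minimal sketch of the category-tally ranking heuristic, not the study's R code.
LEXICON = {
    "Communication": ["call", "video", "virtual", "vv", "misunderst",
                      "telemedicine", "hear", "ipad", "communic", "phone"],
    "Testing": ["test", "swab", "pcr", "specimen"],   # hypothetical keywords
    "PPE": ["ppe", "mask", "gown", "respirator"],     # hypothetical keywords
    "COVID": ["covid", "coronavirus"],                # Pathway 1 exclusion keywords
}

def categories_found(narrative):
    """Return the set of concept categories with at least one keyword hit
    (case-insensitive substring matching, as described in the text)."""
    text = narrative.lower()
    return {cat for cat, words in LEXICON.items() if any(w in text for w in words)}

def rank_reports(reports, threshold=2):
    """Tally matched categories per report, skip COVID-explicit reports
    (those are handled by Pathway 1), keep reports at or above the
    threshold, and sort in descending order of category count."""
    scored = []
    for rid, narrative in reports:
        cats = categories_found(narrative)
        if "COVID" in cats:
            continue  # routed to Pathway 1 instead of the NLP pathway
        if len(cats) >= threshold:
            scored.append((rid, len(cats), sorted(cats)))
    return sorted(scored, key=lambda s: -s[1])
```

With the full 11-category lexicon, the study used a threshold of 6 or more categories; the lower default here only reflects the abridged lexicon of this sketch.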
Our working group, which consisted of two project managers, a clinician quality and safety expert, a clinician-informatician, as well as an external clinician expert leader on quality and safety, was tasked with design of the pathway and rigorous review of potential diagnostic error/delay cases using the Gandhi and Singh framework. Team members included both clinical (physicians, nurses) and non-clinical staff. Primary reviewers of cases were non-clinical project managers (MPHs) who were extensively trained in the Gandhi and Singh taxonomy at the start of the project. Additionally, reviewers were provided with an infographic with definitions and examples of each category. A report was classified as having a diagnostic error or delay when it met the definition of one of the 8 categories in the Gandhi and Singh taxonomy. When the safety report itself did not contain sufficient information for categorization, additional chart review in the electronic health record (EHR) was performed to find contextual details. Any reports where either the presence of diagnostic error/delay or the most appropriate classification was unclear during the project manager review were subsequently reviewed by a physician team member of the working group for a clinical assessment. Unclear cases were then presented in a group setting by the physician team member who performed the secondary review for discussion until consensus was reached. A total of 14,230 safety reports were included in the study period of March 1, 2020 to February 28, 2021. COVID-related diagnostic error or delay was identified in 95 (0.67%) reports. Of all reports, 1,780 were tagged as COVID-related (Pathway 1); the remaining 12,450 were not explicitly COVID-related (Pathway 2). Manual report review was conducted on 1,890 safety reports (Pathway 1 = 1,780, Pathway 2 = 110). Manual review of the 1,780 COVID-explicit reports in Pathway 1 revealed 45 reports with diagnostic error/delay, a PPV of 2.5% (45/1,780).
Ten safety reports in Pathway 1 required additional chart review by a physician for classification. Safety reports explicitly mentioning COVID-19 peaked in April 2020 with 260 reports, before gradually declining to a steady average of around 100 reports per month (Figure 2). Pathway 1 had its highest yield in April and May 2020, with 10 diagnostic error/delay-related safety reports, with subsequent months having fewer cases. Of the cases identified in Pathway 1, the most common error type was "strain" (n = 16, 35.6%), followed by "unintended" (n = 9, 20.0%) (Figure 3). Of the 12,450 reports in Pathway 2, 110 safety reports were highlighted for manual review by the NLP-based tool using the threshold of 6 or more category flags. Fifteen of the 110 reports required additional content review, including accessing the EHR for clinical context. Fifty of the 110 reports were found to have a diagnostic error/delay, a PPV of 45.5% (50/110). Pathway 2 yielded an average of approximately 4.2 cases per month (Figure 2). Of the 50 diagnostic errors/delays found in Pathway 2, the predominant type was "strain" (n = 47, 94.0%), with "unintended" and "delayed presentation" making up the remainder (Figure 3). Due to the disproportionate representation of the "strain" categorization in Pathway 2, our team opted to conduct additional qualitative content review of these safety reports. Some reports had other miscellaneous causes unrelated to the identified driver categories (Table 2). We found COVID-19 diagnostic errors accounted for 0.67% of all safety reporting volume during a one-year period of the pandemic (95 out of 14,230 safety reports). We developed two complementary pathways to enrich the cohort of reports for manual review: a manual review pathway for COVID-explicit reports (PPV 2.5%) and an NLP pre-screened manual review pathway for the remainder of reports (PPV 45.5%).
Additionally, qualitative review of Pathway 2 safety reports revealed 3 major drivers and 1 minor COVID-specific driver that contributed to safety events (major: supply versus demand imbalance, patient handoff, and care provider fatigue and burden; minor: COVID-19 status uncertainty). Safety reporting during the COVID-19 pandemic can serve as an important tool to recognize potential gaps that lead to diagnostic errors or delays in care, with the potential to serve as an early monitoring system. As our project evolved, we quickly realized that the volume of safety reports made it impossible to manually review them all for COVID-related diagnostic errors. As such, we needed a way to extract potential reports of interest from the larger pool. We first developed a workflow reviewing only COVID-explicit reports. For the remainder, which constituted the majority of the reports, we developed a rapidly deployable method of processing large volumes of safety reports to identify potential diagnostic errors. By focusing on a simple yet practical NLP approach that drove a logic-based ranking algorithm, we were able to better focus human resources on finding COVID-related diagnostic errors (Pathway 2 PPV 45.5%). While the intent was to find diagnostic errors, we found that our algorithm logic was also sensitive to strain-related safety events. While a wide range of machine learning-based NLP methods of varying complexity have been applied to safety reporting in the past, 18-21 we opted for a simple heuristic keyword approach for the practicality of rapid deployment, avoiding the lengthy validation that machine learning-based NLP approaches require. We have successfully used the keyword heuristic approach in other informatics innovations at our institution. 16 The methods we developed may serve as resources that can be adapted and implemented in other health systems.
We chose the R programming language because it is freely available, has easily interpretable (non-compiled) code, and is widely used in the biostatistics and informatics community, lowering the barrier to entry. Additionally, by avoiding a machine learning NLP approach, retraining of an ML model for localization would not be necessary to deploy to other sites, a task that would likely require expertise and resources more commonly found only in large academic medical centers. Detection of early signals of trends or systemic patterns is increasingly important in the age of large volumes of data. 22 Our approach is generalizable beyond the COVID-19 pandemic because the keywords used are COVID-agnostic. For the specific tool developed in Pathway 2, the sensitivity and specificity can be easily adjusted by changing the threshold of categories flagged. An advantage of using a logical framework as the underpinning of the tool is that it can easily be adapted to search for other types of safety reports. For example, if an institution were interested in monitoring for signals related to COVID-19 testing in safety reports, one could create a query to identify safety reports containing both the COVID-19 and testing concept categories. Additionally, new features could be built on top of this foundation, including more complex tasks such as aggregating data for summaries; searching for more specific concepts for in-depth secondary analysis for quality improvement, such as whether handoff errors were more common in specific locations; or feeding traditional and nontraditional dashboards, such as word clouds. One limitation of our tool is that, because manual review for diagnostic error was not performed for all 14,230 safety reports, we are unable to estimate the sensitivity of our approach for all diagnostic errors. This is mainly because the tool was developed out of necessity as the project evolved and was not planned a priori.
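The category-combination query described above (for example, COVID-19 plus testing signals) could be sketched as follows. This is a hypothetical Python illustration rather than the study's R tooling, and the "Testing" keywords here are placeholders; the study's actual lexicon and code are in its GitHub repository.

```python
# Hypothetical sketch: require a specific combination of concept categories
# instead of (or in addition to) a category-count threshold.
LEXICON = {
    "COVID": ["covid", "coronavirus"],                  # Pathway 1 keywords from the text
    "Testing": ["test", "swab", "specimen", "result"],  # hypothetical keywords
}

def categories_found(narrative):
    """Set of concept categories with a case-insensitive keyword hit."""
    text = narrative.lower()
    return {cat for cat, words in LEXICON.items() if any(w in text for w in words)}

def query_reports(reports, required=frozenset({"COVID", "Testing"})):
    """Return IDs of reports whose narratives match every required category."""
    return [rid for rid, narrative in reports
            if required <= categories_found(narrative)]
```

The same structure supports the tuning point above: loosening `required` (or lowering a count threshold) trades specificity for sensitivity, which is why the cohort size surfaced for manual review is easy to adjust.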
As such, our project was not scoped to have the resources to conduct a rigorous evaluation of the tool. However, given that the tool was designed to generate a cohort that was practical for our study staff, we think that this approach is still useful in a real-world setting. Second, the lexicon we developed for keyword heuristic checking may not be all-inclusive; future work could include further refinement and expansion of the heuristic categories used in this project. Although the number of diagnostic errors found was low, the figure in our study was quite similar to the 0.7% pooled rate of errors for hospitalized patients in a recent meta-analysis. 4 Overall safety report volume was similar in 2020 as compared to 2019, except for a notable decline from March to May. We saw a shift in the type of COVID-19 diagnostic errors as the pandemic went on. Initially, diagnostic error types varied, with the strain, unintended, anomalous, and chronic collateral categories being the most frequent (Figure 3). Over the first 4 months, we saw both a decline in diagnostic errors found in COVID-19-labeled reports and a decrease in non-strain-related diagnostic errors. By June 2020, diagnostic errors were predominantly strain-related and were found using Pathway 2 (Figure 2). Our observations reflect that certain types of diagnostic errors may have been more frequent early in the pandemic, when knowledge of the disease and its management was still evolving. Specifically, certain factors were more prominent at the start of the pandemic, including: (1) development, implementation, and repeated revision of regulations and policies that impact care delivery 23-25; (2) scientific and epidemiologic knowledge gaps on novel infectious diseases 26; (3) the need for education for both health care workers and patients 27; and (4) development of individual attitudes and emotional responses toward the pandemic.
28,29 Recognizing that different types of errors may occur in different stages of a pandemic or with a novel disease may be useful for assessing future safety risks. COVID-19 emphasized the need for real-time data mining for detection of early risk signals to target. 22,30 By coupling safety reports with an NLP-based approach, human resources can be directed to safety events that may be indicative of larger systemic problems while the number of cases is still small. At our institution, safety reports drew attention to various drivers of safety events. Some of these drivers, such as communication challenges related to patient hand-off, have long been recognized as an important source of medical error. 31,32 Others, such as supply versus demand imbalance, took on new meaning during periods of extreme clinical demands on the hospital system. From a capacity and surge response perspective, significant attention has been paid to considerations such as the number of beds and durable equipment such as ventilators and PPE. 33-35 However, Pathway 2-flagged safety reports repeatedly highlighted cases of supply versus demand imbalance in less visible areas, such as patient transport and phlebotomy services. Early recognition of these signals can help inform a more robust response to ensure additional resources are not overlooked in key low-visibility areas. Clinician fatigue and burden are often difficult phenomena to identify. Diagnostic errors, especially certain categories such as strain, anomalous, or anchor errors, 7 may suggest increasing clinician exhaustion. Finding a general signal of strain may be useful as an institutional barometer as well as for identifying systemic areas to reinforce, as an overburdened clinician is an ineffective safety net and a potential source of error. The use of an NLP-based tool to assess for diagnostic error and strain may provide critical information for health system leaders.
Within our institution, we saw an increase in strain-related reporting several months after the start of the pandemic that persisted for the duration of the study period, regardless of the number of active COVID-19 patients. The NLP-based tool can be applied to identify diagnostic errors using safety reporting data. Although our organization is not currently leveraging this approach due to limited bandwidth on our team, our goal is to employ it in the near future.

During a one-year period in the COVID-19 pandemic, our organization developed a new safety report-based workflow to identify diagnostic errors and delays related to COVID-19. A strategy using two complementary approaches, combining traditional reporting with a simple NLP and logic-based algorithm, was effective in discovering diagnostic errors and could serve as a useful early signal detector for trends in safety reports. This strategy significantly reduced the number of manual reviews needed to find a true diagnostic error and can be readily adapted and applied to other settings and situations. All 8 categories of diagnostic errors previously described by Gandhi and Singh were found at our institution, highlighting the need to address each of them through multifaceted interventions.

Table 1. Adapted with permission from Gandhi and Singh, 2020. 7 Examples are taken directly from real cases identified at our institution as part of this project.

Figure 1. A total of 14,230 safety reports were filed between March 1, 2020 and February 28, 2021. These were processed through two pathways. Pathway 1 (1,780 reports) contained all reports with explicit mention of COVID-19, whereas Pathway 2 (12,450 reports) utilized automated natural language processing to highlight specific cases for manual review. Manual review was performed for 1,780 reports in Pathway 1 and 110 safety reports in Pathway 2. A total of 95 cases of diagnostic error or delay were identified.

Figure 2. Panel A: total hospital census (ambulatory and inpatient). Panel B: safety report volume. Panel C: volume for Pathway 1 (COVID-tagged safety reports), Pathway 2 (natural language processing-based), and diagnostic errors or delays found using each pathway. Safety reporting volume roughly mirrored total hospital volume. Pathway 1 volume peaked in April 2020 and declined by mid-summer. Pathway 2-identified cases increased in the summer and remained steady thereafter. Important hospital policy events during this period include deferment of non-urgent care (3/13/2020), resumption of non-urgent care (7/20/2020), and a second deferment of non-urgent care (12/26/2020).

Figure 3. COVID-tagged safety reports (Pathway 1) had all types of errors represented as compared to natural language processing-based reports (Pathway 2). Pathway 2 was most sensitive for detecting strain-type diagnostic errors.

References
1. Learning from patients' experiences related to diagnostic errors is essential for progress in patient safety.
2. Measures to Improve Diagnostic Safety in Clinical Practice.
3. The frequency of diagnostic errors in outpatient care: Estimations from three large observational studies involving US adult populations.
4. Prevalence of harmful diagnostic errors in hospitalised adults: a systematic review and meta-analysis.
5. The Effect of COVID-19 on Interventional Pain Management Practices: A Physician Burnout Survey.
6. Contributing factors to personal protective equipment shortages during the COVID-19 pandemic.
7. Reducing the risk of diagnostic error in the COVID-19 era.
8. Lessons from Operations Management to Combat the COVID-19 Pandemic.
9. Fighting a common enemy: a catalyst to close intractable safety gaps.
10. Increasing physician reporting of diagnostic learning opportunities.
11. Using voluntary reports from physicians to learn from diagnostic errors in emergency medicine.
12. Measuring errors and adverse events in health care.
13. Error reporting and disclosure systems: Views from hospital leaders.
14. The incident reporting system does not detect adverse drug events: a problem for quality improvement.
15. Rapid-Cycle Improvement During the COVID-19 Pandemic: Using Safety Reports to Inform Incident Command.
16. Clinical decision support system, using expert consensus-derived logic and natural language processing, decreased sedation-type order errors for patients undergoing endoscopy.
17. GitHub repository containing the lexicon and R code for the study algorithm (distributed under the GPL v3 license).
18. Applications of network analysis to routinely collected health care data: A systematic review.
19. Status of text-mining techniques applied to biomedical text.
20. Automatic analysis of critical incident reports: Requirements and use cases.
21. A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis.
22. COVID-19 and Patient Safety: Time to Tap Into Our Investment in High Reliability.
23. Regional Strategies for Academic Health Centers to Support Primary Care During the COVID-19 Pandemic: A Plea From the Front Lines.
24. Staying connected in the COVID-19 pandemic: Telehealth at the largest safety-net system in the United States.
25. Labor and Delivery Visitor Policies during the COVID-19 Pandemic: Balancing Risks and Benefits.
26. An Imperative Need for Research on the Role of Environmental Factors in Transmission of Novel Coronavirus (COVID-19).
27. Departmental Experience and Lessons Learned With Accelerated Introduction of Telemedicine During the COVID-19 Crisis.
28. suffering among COVID-19 patients isolated in the ICU and their families.
29. Psychological effects caused by the COVID-19 pandemic in health professionals: A systematic review with meta-analysis.
30. Fighting a common enemy: a catalyst to close intractable safety gaps.
31. Impact of the communication and patient hand-off tool SBAR on patient safety: A systematic review.
32. Shift-to-Shift Handoff Effects on Patient Safety and Outcomes: A Systematic Review.
33. Hospital preparedness for COVID-19: A practical guide from a critical care perspective.
34. Hospital surge capacity in a tertiary emergency referral centre during the COVID-19 outbreak in Italy.
35. Locally Informed Simulation to Predict Hospital Capacity Needs During the COVID-19 Pandemic.