The Elusive and Illusive Quest for Diagnostic Safety Metrics

Gordon D. Schiff, MD1,2 and Elise L. Ruan, MD, MPH3

1Harvard Medical School Center for Primary Care, Boston, MA, USA; 2Brigham and Women's Hospital Center for Patient Safety Research and Practice, Boston, MA, USA; 3Tufts University School of Medicine, Boston, MA, USA.

J Gen Intern Med 33(7):983–5
DOI: 10.1007/s11606-018-4454-2
© Society of General Internal Medicine 2018
Published online April 30, 2018

"Not everything that counts can be counted, and not everything that can be counted counts."
Variously attributed to Albert Einstein, William Bruce Cameron, Lord Platt, and others1

"Can't improve what you can't measure? Nonsense. Over the decades my relationship with my wife has continuously improved. But I've never administered a survey to her, nor tracked metrics of our relationship. Not only was this not needed for improvement, but it likely would have been detrimental and disrespectful."
Don Berwick, speaking at the Institute for Healthcare Improvement Forum2

Diagnostic errors have come out of the periphery of the patient safety movement. With the publication of the 2015 National Academy of Medicine report Improving Diagnosis in Health Care, and with recent reports suggesting that diagnostic errors are the leading type of error reported by patients and a top reason they file malpractice suits, diagnostic errors are finally gaining the "respect" they warrant.3,4,5 In the current issue of JGIM, three leading voices in the movement to improve diagnosis propose a framework that they argue will help advance metrics of diagnostic performance within and across health care systems, as well as enable researchers and systems to determine the impact of improvement interventions.6

The goal of developing and reporting standardized measures related to diagnostic safety has been an elusive one.
Despite the urging of multiple organizations and advocates, crafting measures of diagnostic quality has not proven simple.7 But how will we know if we are making progress, and how can we hold organizations and clinicians accountable, without some objective measures? To overcome a host of past difficulties in creating such metrics, the authors propose a framework with seven criteria for designing measures of "Undesirable Diagnostic Events" (UDEs). They suggest six diagnoses as logical places to start (Olson, Table 1). One clue that this may not be so simple is the fact that, in their article, Olson et al. mention twice that number of diagnoses as examples that would not lend themselves to the UDE measurement framework, including herpes zoster, pneumothorax, adult-onset Still's disease, amyloidosis, Alzheimer's disease, depression, spinal metastasis, mitochondrial disorders, bacterial overgrowth, adrenal insufficiency, and certain psychiatric conditions.6 Perhaps just by sheer coincidence, one of us (GS) has personally had two of these (zoster, pneumothorax) misdiagnosed by skilled physicians (in addition to initially self-misdiagnosing). This list is thus revealing not only because it suggests several personally experienced diagnostic failures would fall outside the purview of the UDE framework, but also because we suspect that strictly applying the authors' criteria for the type of "never-event" UDEs they advocate would exclude most of the diagnostic errors, and problems in the diagnostic process, occurring in health care today.

Let us examine just one of the diagnoses they suggest would be a good candidate: tuberculosis. TB is indeed important, being highly prevalent worldwide, as well as an important diagnosis not to miss or delay.
Consider the consequences of overlooking a hospitalized patient with active pulmonary TB, both in terms of exposure of other patients and health workers and in terms of failure or delay in treating a seriously ill patient with lifesaving medications. So how should we go about designing the proposed UDE performance metric? Should we measure only pulmonary TB? If we did, we would be excluding TB meningitis, miliary TB, and renal and spinal TB, all serious diagnoses not to miss. What about active vs. latent TB, which is often difficult to differentiate? What about the false negative and false positive rates of the various TB sputum, skin, and blood tests: how should we factor these in when evaluating care quality? The authors mention that the finding of TB on autopsy would be the gold-standard basis for this metric. However, autopsies are rarely performed in the USA and are subject to serious selection bias, which would markedly limit the utility of missed TB found at autopsy as an accurate and fair diagnostic performance measure. Finally, one would hope we do not have to wait for a patient to die to uncover diagnostic improvement opportunities.

The purpose of raising these questions is not to nitpick, or to deny that smarter minds can overcome some of these technical challenges in crafting a TB (or other) diagnostic performance measure. Rather, it is to raise more fundamental questions that those of us in the diagnostic safety movement, as well as clinicians and patients, need to consider. How will these metrics help us move forward, and, importantly, how will they positively engage clinicians in achieving the goal of more reliable and timely diagnosis?

OUTSIDE IN VS. INSIDE OUT MEASUREMENT

Quality guru W. Edwards Deming is reported to have said, "When I see workers measuring themselves, I see quality." In this profound statement, he was both downplaying the value of external measurement and extolling the importance of motivated, empowered workers, trained in self-measurement skills (e.g., using statistical process control (SPC) charts to differentiate "special cause" variation, reflecting special circumstances and unexpected outlier defects, from "common cause" variation, the random variation that is part of the system), taking the initiative to examine and improve their own quality.

There is an even deeper significance to this concept of ensuring quality by creating a culture in which front-line staff, rather than external inspection or metrics, are the key to safe diagnosis. To explain, consider an analogy to modern-day approaches to ensuring medication quality. For decades, the US Pharmacopeia (USP), the official certifier of chemical quality for drug products marketed in the USA, relied primarily on sophisticated laboratory methods for inspecting drug product samples produced and submitted by each manufacturer. Using established laboratory techniques such as chromatography, USP scientists would check the purity and strength of the ingredients of these samples against reference standards to ensure they conformed to the strict standards established for that drug entity. Increasingly, however, this inspection approach to medication quality has been displaced by a very different one: continuous process verification, whereby continuous assurance is available to detect any unplanned departures and allow manufacturers to identify and adjust for them, thus helping prevent product failures. USP standards now provide precise formulas and preparation guidelines, along with pure reference samples for testing, so that drugs can be made consistently, every time.
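The special-cause vs. common-cause distinction Deming pointed to can be made concrete with a small sketch. The snippet below is a deliberately simplified individuals-style control chart, using the sample standard deviation rather than a formal moving-range estimate; the weekly counts of delayed test-result follow-ups are hypothetical, invented purely for illustration.

```python
# Minimal SPC sketch: flag "special cause" points that fall outside the
# usual 3-sigma control limits computed from the process's own data.
# The weekly counts below are hypothetical, for illustration only.
from statistics import mean, stdev

weekly_delays = [4, 5, 3, 6, 4, 5, 4, 3, 5, 14, 4, 5]

center = mean(weekly_delays)
sigma = stdev(weekly_delays)
upper = center + 3 * sigma
lower = max(0.0, center - 3 * sigma)  # a count cannot go below zero

# Points beyond the control limits suggest a special cause worth
# investigating; everything inside them is common-cause (ordinary
# system) variation that no single worker should be blamed for.
special_cause = [(week, n) for week, n in enumerate(weekly_delays, start=1)
                 if n > upper or n < lower]
print(special_cause)  # flags week 10: [(10, 14)]
```

In practice, count data like this would more often be plotted on a c-chart (limits at the mean plus or minus three times the square root of the mean), but the logic is the same: the chart tells front-line staff which variation to investigate and which to leave to system redesign.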
A similar approach underlies the ISO (International Organization for Standardization) good manufacturing standards, which "provide requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose." In short, quality, safety, and efficacy are designed and built into the product, not "inspected in" after the fact.

Perhaps we need to apply these state-of-the-art manufacturing constructs to improving diagnosis. What would be the steps to specify "well-made" diagnoses? For example, we would certainly want to ensure that every test result was returned, acknowledged, and acted on, as well as communicated to the patient by the ordering clinician. Although this process standard would appear to be a relatively straightforward one (at least compared with other, more complex aspects of diagnosis), we now know that both specifying and ensuring such a reliable closed-loop system is not a trivial feat. Under what conditions should diagnoses be produced (really, co-produced) with patients? Given what we know about the importance of safety culture in general, could we specify (and even measure) organizational expectations, processes, and conditions for assuring quality diagnoses and learning from errors (Text Box 1)? Compare an organization that relied on public reporting of its "never event" numbers for the six proposed UDE diagnoses with one that was hard-wired with the meaningful processes, learning culture, and safeguards embodied in our diagnosis culture framework. In which organization or system would you feel safer and more confident that the safeguards were in place for reliable, timely diagnosis? In which system would you rather be a patient? And in which would you rather work?8

Text Box 1.
Culture of diagnostic safety and improvement

• Replacing blame and fear with learning and improvement, so that no one is afraid to ask questions, question a diagnosis, or transparently share when things go wrong
• Commitment to improving diagnosis and learning from delays and diagnostic process errors
  - Organizational recognition that misdiagnosis is the #1 cause of patient-reported errors
  - Comprehensive reporting and appreciative investigation of adverse events
  - Relentless curiosity/worry/conferencing: what we might be missing, what can go wrong in the system
  - Attention to details of the diagnostic process and what can go wrong; awareness of the limitations of tests
• Recognition that uncertainty is inherent in diagnoses, tests, and illness presentation and evolution; anticipation of common pitfalls
  - Situational awareness of local, disease-specific, and literature-reported vulnerabilities and pitfalls
  - Hard-wired, proactive, reliable follow-up safety nets and feedback systems to detect and protect
  - Conservative approaches to testing and imaging, enabled by shared decision-making and reliable follow-up
• Respect for human limitations and the need for cognitive and process support
  - Decreased reliance on human memory; minimizing the negative effects of stress, fatigue, and fear; appreciating the risks of multitasking
  - Redesign of EMR and communication systems to support cognition, collaborative diagnosis, and follow-up
• Enhanced role for patients in co-producing diagnosis
  - Working collaboratively to formulate the history and diagnosis, monitor the course, and raise and research questions

We now know that so-called quality reporting (particularly the more market-oriented approaches that encourage patients to shop around for quality, or that financially incentivize or punish institutions based on their performance) is vulnerable to a myriad of problems: measurement difficulties; case-mix adjustment; incentives to game measures to make performance look better than it is; neglect of areas (in this case, diagnoses) not covered; clinician cynicism and skepticism about "box-ticking" and ill-informed second-guessing; and the time and resources required to collect (often manually) data for public reporting of dubious proven value that presents an incomplete picture of clinicians' diagnostic work.9,10,11

These two approaches, metrics vs. culture, are of course not mutually exclusive and do not require an either/or choice. And in many ways, narrower outcome metrics and culture could work together in a complementary fashion. However, before we go down this "metrics" road, we need to critically weigh what such measurements will and will not bring to improving diagnosis. Perhaps focusing more closely on collaborative learning from the stories and details of actual cases of diagnostic error can be a more powerful lever for accountability and improvement than bar graphs or pie charts.12,13 The success of the #MeToo movement in exposing and limiting sexual misconduct demonstrates the power of impact over metrics. Qualitatively understanding the plethora of diagnostic errors, locally and across institutions, can help us build the situational awareness and safety nets we need for better diagnostic conduct.

Acknowledgements: The authors acknowledge support for research in diagnostic error and improvement from CRICO (the malpractice insurer of the Harvard-affiliated organizations) and the Gordon and Betty Moore Foundation.

Corresponding Author: Gordon D. Schiff, MD; Harvard Medical School Center for Primary Care, Boston, MA, USA (e-mail: gschiff@bwh.harvard.edu).

Compliance with Ethical Standards:

Conflict of Interest: The authors have no financial conflicts.

REFERENCES

1. Quote Investigator. Not everything that counts can be counted. 2010. https://quoteinvestigator.com/2010/05/26/everything-counts-einstein/#note-455-9. Accessed January 15, 2018.
2. Berwick DM. Escape fire: designs for the future of health care.
John Wiley & Sons; 2010.
3. Balogh E, Miller BT, Ball J, eds. Improving Diagnosis in Health Care. National Academies Press; 2015.
4. Schiff GD, Puopolo AL, Huben-Kearney A, et al. Primary care closed claims experience of Massachusetts malpractice insurers. JAMA Intern Med. 2013;173(22):2063–2068.
5. The Public's Views on Medical Error in Massachusetts. https://cdn1.sph.harvard.edu/wp-content/uploads/sites/94/2014/12/MA-Patient-Safety-Report-HORP.pdf.
6. Olson A, Graber M, Singh H. Tracking progress in improving diagnosis: a framework for defining undesirable diagnostic events. J Gen Intern Med. 2018.
7. National Quality Forum. Improving Diagnostic Quality and Safety: Draft Measurement Framework. Washington, DC: National Quality Forum; 2017.
8. Berwick DM. Continuous improvement as an ideal in health care. N Engl J Med. 1989;320(1):53–56.
9. Greener I, Harrington B, Hunter D, Mannion R, Powell M. A Realistic Review of Clinico-Managerial Relationships in the NHS: 1991–2010. National Institute for Health Research, Service Delivery and Organisation Programme; 2011.
10. Himmelstein D, Woolhandler S. Quality improvement: "Become good at cheating and you never need to become good at anything else." Health Affairs Blog. 2015.
11. Casalino LP, Nicholson S, Gans DN, et al. What does it cost physician practices to interact with health insurance plans? Health Aff. 2009;28(4):w533–w543.
12. Hoff TJ. Next in Line: Lowered Care Expectations in the Age of Retail- and Value-Based Health. Oxford University Press; 2017.
13. Berwick DM. The stories beneath. Med Care. 2007;45(12):1123–1125.