title: Radiology Extenders: Impact on Throughput and Accuracy for Routine Chest Radiographs
authors: Borthakur, Arijitt; Barbosa, Eduardo M.; Katz, Sharyn; Knollmann, Friedrich D.; Kahn, Charles E.; Schnall, Mitchell D.; Litt, Harold
date: 2020-10-13
journal: J Am Coll Radiol
DOI: 10.1016/j.jacr.2020.09.044

Interpretation of chest radiographs, although clinically highly impactful, entails a disproportionate amount of work relative to the low reimbursement for these studies. Increasing volumes of chest radiographs in the thoracic imaging division of an academic radiology practice led to the incorporation of two radiology extenders (REs) to draft chest radiograph reports. We evaluated the difference in productivity and accuracy between reports drafted by REs and those drafted by radiology trainees. Impact on throughput was analyzed by measuring flow rates (the number of radiograph interpretations finalized per hour) for four subspecialist attending thoracic radiologists under three conditions: independent interpretation, reviewing RE-drafted cases, and reviewing resident-drafted cases. Improvements were calculated as the change in flow rate for the latter two conditions relative to independent interpretation. Accuracy of RE-drafted reports was compared with that of junior residents by evaluating their draft reports for the same chest radiographs (n = 49). A blinded judging panel of three attending radiologists scored these reports using the 3-point RADPEER scoring system; the original report dictated by an attending radiologist served as the reference standard. RADPEER scores were then compared between REs and residents. Flow rates improved significantly (P < .05), by 52%, for RE-drafted cases, versus a small improvement (17%) for resident-drafted cases, compared with independently completed cases. RE-drafted cases generated 36% greater efficiency for attending radiologists than resident-drafted cases (P < .05). There was no significant difference in accuracy scores between RE-drafted and resident-drafted reports. In summary, RE-drafted reports were finalized more rapidly by attending radiologists, with insignificant differences in discrepancy rates compared with resident-drafted reports.

Thoracic radiographs are performed in patients in the intensive care unit primarily to identify placement of support tubes, catheters, and other devices. Although highly impactful for patient care, interpreting these radiographs entails a disproportionate amount of work (eg, retrieving patient history, completing standard dictation templates, and ensuring proper communication of important findings before finalization of reports). First, given the low reimbursement rates for these studies, economic necessity pushes radiologists to provide faster interpretations, contributing to burnout. Second, tasks characterized by low complexity and variability make for mundane work and a low sense of personal accomplishment, reducing job satisfaction for attending radiologists. Although both high throughput and repetitive tasks lead to underperformance in any occupation [1], including medicine [2, 3], this has been of particular concern in the field of radiology [4-6]. We previously described a method to improve radiologists' productivity by using a radiology extender (RE) in a musculoskeletal radiology section [7].
That experience motivated a trial of the concept in the cardiothoracic imaging division of our academic medical center, which reports over 2,500 radiographs per week. The first experiment determined the impact of REs on throughput by measuring the time taken by attending radiologists to finalize examinations. The second experiment compared the accuracy of RE-drafted reports with that of resident-drafted reports by comparing their dictations for the same chest radiographs.

Double reading of imaging examinations is an accepted method of assessing interpretation quality [8, 9]. In this approach, examinations are reinterpreted by different readers with the same clinical information and prior examinations that were available to the attending radiologist at the time of the initial interpretation. A third set of readers then compared the RE- and resident-drafted reports and scored any discrepancies using the RADPEER system [10, 11]. Major discrepancy rates were obtained by counting the reports that received the highest score (3b) for REs and residents.

Both REs were recruited from an experienced pool of radiologic technologists through a formal interview process. Interview candidates responded to an advertisement circulated internally within our hospital system. Interviews were conducted over a period of a month by two radiology faculty members, a clinical administration leader, and an operational business advisor. Nearly all candidates exhibited desirable characteristics such as radiographic technical knowledge, intellectual curiosity, and the ability to make decisions under ambiguity; the two best candidates additionally had concrete evidence of performing beyond their current job requirements. Both REs were trained over a period of 2 months to interpret one- or two-view chest radiographs by a senior thoracic imaging radiologist with over 30 years of experience as a faculty member. Consistent with previous RE training efforts [7], training included one-on-one review of cases predrafted by the trainee and directed reading, both in an online course designed to train radiographers in the UK to interpret images (http://eintegrity.org, Ware, United Kingdom) and in standard radiologic texts. Finally, the REs were trained to dictate using standardized templates in our hospital's clinical reading rooms using standard reporting technology (IDS7 PACS, Sectra Inc, Shelton, Connecticut; PowerScribe 360 reporting system, Nuance, Burlington, Massachusetts). The REs were deployed to the chest radiology section and drafted on average 130 to 160 cases each per day over a 6-month period. This project was undertaken as a quality improvement initiative and thus was exempt from institutional review board review.

Four board-certified, subspecialty-trained thoracic radiologists with at least 10 years of experience as faculty volunteered for this study. All radiographs were read in one of our hospital's standard clinical reading rooms using the same reporting system used for training the REs. Examination types were restricted to those routinely processed by the REs as part of their normal workflow: one-view anteroposterior or two-view posteroanterior and lateral chest radiographs, corresponding to Current Procedural Terminology (CPT) codes 71010 and 71020, respectively. Examinations were not controlled for case variety or complexity, to replicate the realistic clinical scenario of the daily workflow. A single observer (A.B.) manually measured the time from examination opening in PACS to final signing for three categories of examination: those interpreted independently by the attending radiologist, RE-drafted reports, and resident-drafted reports. To reduce variability from repeated stop-start events, the attending radiologists read out blocks of 5 to 30 or more cases in each category during each timing episode without interruption. These experiments were performed over several days to accumulate at least 50 cases per category for each attending radiologist. Flow rate was defined as the number of radiographic examinations finalized per hour.

A sample of 49 examination reports finalized by attending radiologists alone, with CPT codes 71010 and 71020, was extracted from the radiology database mPower (Nuance Communications) to serve as the reference standard. Only accession numbers, the unique record number in the local Radiological Information System for each diagnostic imaging examination, were supplied to the reporters, who proceeded to dictate new reports using standardized templates without viewing the original attending radiologist's report. These reports were copied into a Word (Microsoft Corp, Redmond, Washington) document and securely e-mailed to a single researcher (A.B.) for preprocessing, which involved automatically extracting report elements using custom-built code written in Python 3.6 in a Jupyter notebook (jupyter.org), an open-source web application. All 49 extracted examination reports were stored in a scoring template created in Excel (Microsoft Corp) and saved under a coded file name to blind the reporter's identity.
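The extraction step might have looked something like the following minimal sketch. It assumes the dictations were exported to a plain-text file with dash-separated reports and ACCESSION, FINDINGS, and IMPRESSION headers; the header names, separator, file names, and output columns are illustrative assumptions, not the authors' actual code.

```python
import re
import pandas as pd

def extract_elements(report_text: str) -> dict:
    """Pull the accession number, findings, and impression out of one report."""
    def section(label: str) -> str:
        # Capture everything after "LABEL:" up to the next all-caps header or
        # the end of the report.
        m = re.search(rf"{label}:\s*(.*?)(?=\n[A-Z ]+:|\Z)", report_text, re.S)
        return m.group(1).strip() if m else ""
    return {
        "accession": section("ACCESSION"),
        "findings": section("FINDINGS"),
        "impression": section("IMPRESSION"),
    }

# Reports are assumed to be separated by a dashed line in the exported file.
with open("re_drafts.txt") as f:  # hypothetical export of one reporter's drafts
    reports = [r for r in f.read().split("-" * 10) if r.strip()]

# One row per report, written under a coded name ("reporter_A") so the judging
# panel cannot infer the reporter's identity from the file.
rows = [extract_elements(r) for r in reports]
pd.DataFrame(rows).to_excel("reporter_A.xlsx", index=False)
```

Keying each row by accession number lets the judges line up the two drafts and the original attending radiologist's report for the same examination while the coded file name preserves blinding.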
The reporters' scoring template files were sent to a judging panel consisting of three attending radiologists, each with over 10 years of specialist thoracic imaging experience. Judges were also provided the original attending radiologists' reports to use as a gold standard. Comparisons were restricted to the text of the reports; no images were supplied. Diagnostic discrepancies were classified according to the ACR's 2016 RADPEER scoring system [11]. This system rates the significance of a discrepancy on a 3-point scale: 1 = concur with interpretation; 2 = discrepancy in interpretation, not ordinarily expected to be made (understandable miss); and 3 = discrepancy in interpretation that should be made most of the time. Modifiers for scores 2 and 3 (a = unlikely to be clinically significant; b = likely to be clinically significant) yield five possible scores. RADPEER scores of 3b were deemed major discrepancies, and their counts were recorded in the Excel spreadsheet for both REs and residents.

Statistical analyses were performed in JMP Pro 15 (SAS, Cary, North Carolina), and results were graphed in Excel. A P value of <.05 was considered statistically significant for all hypothesis testing. For the flow rate measurements, average read time, defined as the time from opening the first case in a queue until finalization of the last case, was recorded to within a fraction of a minute. Analysis of variance was performed on the flow rate measurements with a three-level categorical independent variable (ie, independently interpreted, resident, and RE), followed by t tests with a pooled standard error. An ordered logit model [12] was used to regress the ordinal RADPEER scores against reporter category (ie, RE or resident) and the accession numbers of the 49 cases, to account for case mix. The model fits the cumulative response probabilities to the logistic function of a linear model using maximum likelihood. The likelihood ratio χ² statistic and its associated P value were recorded as the figure of merit for each effect; a low value of the statistic indicates that the null hypothesis should not be rejected.
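As a rough sketch of this analysis (which, as noted above, was actually performed in JMP), an ordered logit with likelihood ratio tests could be set up in Python with statsmodels as follows; the input file and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

# One row per drafted report; columns are illustrative, not the actual schema.
df = pd.read_csv("radpeer_scores.csv")  # hypothetical: reporter, accession, radpeer

# Encode RADPEER scores as an ordered categorical: 1 < 2a < 2b < 3a < 3b.
df["radpeer"] = pd.Categorical(df["radpeer"],
                               categories=["1", "2a", "2b", "3a", "3b"],
                               ordered=True)

# Dummy-code both effects; OrderedModel estimates its own threshold
# (intercept) parameters, so no constant column is added.
X_reporter = pd.get_dummies(df["reporter"], drop_first=True, dtype=float)   # RE vs resident
X_casemix = pd.get_dummies(df["accession"], drop_first=True, dtype=float)   # 49-level case mix
X_full = pd.concat([X_reporter, X_casemix], axis=1)

full = OrderedModel(df["radpeer"], X_full, distr="logit").fit(
    method="bfgs", disp=False)

# Likelihood ratio chi-square for each effect: drop it, refit, and compare
# log-likelihoods; a small statistic means the effect explains little.
for effect, X_reduced in [("reporter", X_casemix), ("case mix", X_reporter)]:
    reduced = OrderedModel(df["radpeer"], X_reduced, distr="logit").fit(
        method="bfgs", disp=False)
    lr = 2 * (full.llf - reduced.llf)
    dof = X_full.shape[1] - X_reduced.shape[1]
    print(f"{effect}: L-R chi2 = {lr:.2f}, DF = {dof}, P = {chi2.sf(lr, dof):.4f}")
```

Treating the accession number as a many-level categorical effect mirrors the case-mix adjustment described above, although fitting that many parameters to two drafts per case makes this a sketch rather than a well-powered model.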
The REs at our practice were formally recruited from a pool of x-ray radiologic technologists (rad techs or RTs), whose qualifications vary by state but include at least an associate's degree or a technical diploma. Both of our REs hold bachelor's degrees in addition to having been licensed radiologic technologists for at least 6 years. One studied art history in college and served as a licensed medic; the other has an undergraduate degree in medical imaging, completed her RT training at our institution, and had served as the lead inpatient technologist for the preceding 2 years.

Flow rates for the three categories of examinations finalized by the four attending radiologists who participated in the timing study are shown in Figure 1.

[Figure 1. Flow rate, defined as the number of radiographic examinations finalized per hour, for the three categories of examinations finalized by the four attending radiologists who participated in the timing study. RE-drafted cases were finalized more rapidly than resident-drafted cases or cases completed by the attending radiologists independently. Error bars indicate ±95% confidence intervals using a pooled estimate of error variance.]

RE-drafted cases were finalized more rapidly than both resident-drafted cases and cases completed by the attending radiologists themselves. On average, the four attending radiologists finalized 93.6 cases per hour when reviewing RE-drafted cases, 72.6 cases per hour when reviewing resident-drafted cases, and 62.4 cases per hour interpreting independently, with a pooled standard error of ±10.2 cases per hour. Consequently, flow rates improved significantly (P < .05), by 52%, for RE-drafted cases, compared with a smaller improvement of 17% for resident-drafted cases, which was within the error variance of the flow rate for independently completed cases. RE-drafted cases were also finalized 36% faster (P < .05) by the attending radiologists than resident-drafted cases, suggesting that more edits were necessary on resident reports than on RE reports.

The distribution of RADPEER scores across the reporters (Fig. 2) revealed no significant difference in scores between RE-drafted and resident-drafted reports. The major discrepancy rate amounted to less than 2% of all cases, with the major discrepancies generated only by the REs. Modeling scores with reporter type and examination case mix (ie, accession number) showed a 100-fold greater contribution by the latter variable to the likelihood ratio χ² (Table 1), regardless of whether the reporter was a resident or an RE. Together, these results suggested that the variability in scores was significantly (P < .05) related to case mix and indifferent to reporter type.

The results of this project established baseline values for flow rate improvement, a key performance indicator for gauging efficiency gains from incorporating an RE into practice. Previous work at our institution that graded trainee reports demonstrated that discrepancy rates, as measured by major changes during attending radiologist review, varied from 1.5% to 4% across individual residents, modalities, and levels of resident training [13, 14].
Hence the discrepancy rate of the REs is within the range expected of trainees. Major discrepancy rates, although low and consistent with values previously measured at our institution, still merit deeper evaluation to detect patterns of mischaracterization or bias by the reporters. The measured values will be leveraged to optimize the deployment of REs in the reading rooms, synchronizing their work schedules with radiography volumes that fluctuate over the course of a day or week. Furthermore, the variability in efficiency gains across radiologists could be used to redesign the process of allocating cases to each radiologist's queue.

The RE job role is controversial among some who consider REs a cost-effective replacement for radiology trainees. We respectfully disagree with this presumption. The incorporation of REs was a strategic decision to address recent mergers and acquisitions, which have resulted in our radiology service providing care in multiple affiliated hospitals and diagnostic imaging centers across our region, and growth within those facilities, which together perform over a million cases annually. As with other academic hospitals, we have relied on trainees to share the burden of interpreting studies at lower cost; however, their supply has not kept pace with the increasing volume of work. Regardless, as an academic medical center, we hold training the next generation of radiology leaders as a higher priority than any revenue-improving or cost-control initiative. We are confident that the RE job function makes more business sense when REs are deployed in a strategic and meaningful manner.

Physician wellness is acknowledged as an essential goal in enhancing the safety and quality of health care [15]. Although we are in the process of surveying radiologists' satisfaction with the new workflow, concerns raised in response to our earlier work on the role of REs included the suggestion that reallocating radiologists' tasks would devalue their work. We contend that diagnostic responsibility remains with the radiologist in the RE-enhanced workflow, while some technical tasks, such as operating the speech recognition software and editing text, are taken over by support staff. Our model does not reassign the radiologist's intellectual contribution; rather, it improves their professional standing by relieving them of some information technology tasks, in a manner better suited to the modern, fast-paced health care arena than alternatives such as tape-recorded dictation. We continue to manage the process improvement proactively by tracking success with performance indicators such as average daily volume for each RE. Over a typical month, each RE predrafts upwards of 160 cases per day, which at the activity time difference measured would free up 51 min of a radiologist's time per day for higher-value work (at the mean flow rates above, reviewing an RE-drafted case takes about 60/93.6 ≈ 0.64 min versus 60/62.4 ≈ 0.96 min for independent interpretation, a saving of roughly 0.32 min per case, or about 51 min across 160 cases).

Transforming our radiology practice required effective change management to clearly articulate the business need and projected impact. We relied on data from a variety of internal systems, such as portable x-ray, radiologist, and resident schedules, to optimize the REs' work hours, and on external sources, such as online courses, for training. Although the current REs at our practice are licensed x-ray technologists with bachelor's degrees, as the role evolves and is adopted more widely, we foresee additional licensing and certification requirements, most likely on a state-by-state basis.

[Figure 2. Distribution of RADPEER scores from radiology extenders (gray bars) and residents (white bars). A score of 3b is considered a major discrepancy in this study.]

[Table 1. Likelihood ratio tests from the ordered logit model, showing that the variability in RADPEER scores was significantly (P < .05) related to case mix (ie, examination accession number) and indifferent to reporter type. DF = degrees of freedom; L-R χ² = likelihood ratio χ² statistic.]
A limitation of our approach is that an ideal quality evaluation would also assess the accuracy of the initial report, independently verifying its findings against operative findings, pathology reports, or a consensus interpretation by experts. This would lead to a better understanding of the discrepancies observed in the current study and will be pursued in a subsequent study. Another limitation, for the timing experiments, was that differences in case types were not controlled, as attending radiologists were timed finalizing cases each morning as part of their routine clinical workflow. There may also be a "Hawthorne effect," sometimes described as the increase in human productivity when people are observed and measured in real time [16], which may differ for each radiologist. Additionally, although these interpretations were performed as part of the radiologists' clinical work, they were completed and timed without interruption, which may not reflect the typical routine.

As their medical colleagues have become more comfortable with their services, the REs have taken on additional responsibilities in the reading room. A frequent complaint of radiologists is the many daily interruptions that impede one's ability to sustain a high rate of examination interpretation. Fishman et al [5] describe these interruptions as including telephone calls with referring physicians to communicate important results and work with technologists on protocoling or image acquisition. A recent publication from our institution on deploying a "call triage assistant" demonstrated improved workflow efficiency and reduced resident stress on call [17]. At our practice, the REs are routinely the first to answer telephone calls and communicate previously reported findings to referring physicians, routing more difficult questions to senior house staff or attending radiologists as needed.

Finally, advances in artificial intelligence and machine learning, particularly deep learning algorithms, have demonstrated significant success in image recognition [18]. In thoracic imaging, these algorithms can perform a variety of tasks, such as automatically identifying and categorizing benign or malignant lung nodules on CT images, predicting long-term mortality from chest radiographs [19], and distinguishing COVID-19 from other types of pneumonia on CT images [20]. As artificial intelligence becomes more integral to patient care in radiology, radiologists who have leveraged its potential will likely become indispensable to the field [21]. We foresee the RE role evolving in conjunction, with REs serving as data curators who leverage their technical expertise to assess image quality, guide reacquisition, and perform hyperparameter tuning during model training, accelerating the adoption of machine learning algorithms into the radiology workflow.

- Deployment of REs in the thoracic imaging section of our academic radiology practice improved the efficiency of chest radiography.
- RE-drafted reports were finalized more rapidly than resident-drafted reports by attending radiologists, with insignificant differences in interpretation discrepancy rates.

[1] A new strategy for job enrichment.
[2] Estimating the attributable cost of physician burnout in the United States.
[3] Changes in burnout and satisfaction with work-life integration in physicians and the general US working population between 2011 and 2017.
[4] A call to action: our radiology chairs are burning out.
[5] The road to wellness: engagement strategies to help radiologists achieve joy at work.
[6] Addressing burnout in radiologists.
[7] Improving performance by using a radiology extender.
[8] Radiology double reads.
[9] Added value of double reading in diagnostic radiology: a systematic review.
[10] Getting the most out of RADPEER.
[11] ACR RADPEER Committee white paper with 2016 updates: revised scoring system, new classifications, self-review, and subspecialized reports.
[12] Regression models for ordinal data.
[13] Identifying benchmarks for discrepancy rates in preliminary interpretations provided by radiology trainees at an academic institution.
[14] Orion: a web-based application designed to monitor resident and fellow performance on-call.
[15] The relationship between physician burnout and quality of healthcare in terms of safety and acceptability: a systematic review.
[16] Radiologist productivity increases with real-time monitoring: the Hawthorne effect.
[17] Evaluating the impact of a call triage assistant on resident efficiency, errors, and stress.
[18] Current applications and future impact of machine learning in radiology.
[19] Deep learning to assess long-term mortality from chest radiographs.
[20] Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy.
[21] Will machine learning end the viability of radiology as a thriving medical specialty?