key: cord-0734636-dw7c23du
authors: Foster, Madison; Presseau, Justin; McCleary, Nicola; Carroll, Kelly; McIntyre, Lauralyn; Hutton, Brian; Brehaut, Jamie
title: Audit and feedback to improve laboratory test and transfusion ordering in critical care: a systematic review
date: 2020-06-19
journal: Implement Sci
DOI: 10.1186/s13012-020-00981-5
sha: 5e75896574b66c4a0eb3e3cf61a1260457631343
doc_id: 734636
cord_uid: dw7c23du

BACKGROUND: Laboratory tests and transfusions are sometimes ordered inappropriately, particularly in the critical care setting, which sees frequent use of both. Audit and Feedback (A&F) is a potentially useful intervention for modifying healthcare provider behaviors, but its application to the complex, team-based environment of critical care is not well understood. We conducted a systematic review of the literature on A&F interventions for improving test or transfusion ordering in the critical care setting. METHODS: Five databases, two registries, and the bibliographies of relevant articles were searched. We included critical care studies that assessed the use of A&F targeting healthcare provider behaviors, alone or in combination with other interventions to improve test and transfusion ordering, as compared to historical practice, no intervention, or another healthcare behaviour change intervention. Studies were included only if they reported laboratory test or transfusion orders, or the appropriateness of orders, as outcomes. There were no restrictions based on study design, date of publication, or follow-up time. Intervention characteristics and absolute differences in outcomes were summarized. The quality of individual studies was assessed using a modified version of the Effective Practice and Organisation of Care Cochrane Review Group’s criteria. RESULTS: We identified 16 studies, including 13 uncontrolled before-after studies, one randomized controlled trial, one controlled before-after study, and one controlled clinical trial (quasi-experimental). These studies described 17 interventions, mostly (88%) multifaceted interventions with an A&F component. Feedback was most often provided in a written format only (41%) and more than once (53%), and it most often presented only group-level aggregated data (41%). Most studies saw a change in the hypothesized direction, but not all studies provided statistical analyses to formally test improvement.
Overall study quality was low, with studies often lacking a concurrent control group. CONCLUSIONS: Our review summarizes characteristics of A&F interventions implemented in the critical care context, points to some mechanisms by which A&F might be made more effective in this setting, and provides an overview of how the appropriateness of orders was reported. Our findings suggest that A&F can be effective in the context of critical care; however, further research is required to characterize approaches that optimize the effectiveness in this setting alongside more rigorous evaluation methods. TRIAL REGISTRATION: PROSPERO CRD42016051941.


Keywords: Audit, Feedback, Intensive Care, Critical Care, Laboratory Utilization, Test Use, Transfusion

Background

Laboratory testing is an important and high-volume medical resource that facilitates disease detection and monitoring of patient status [1]. However, lab testing is prone to inappropriate use [1], with estimates suggesting that 20-30% of tests ordered are low-value, i.e., unnecessary, not indicated, or potentially harmful [1, 2]. While the tests themselves directly comprise only 4% of overall hospital expenditure, they are thought to be important in up to 70% of subsequent healthcare decisions and their related expenditures, and thus represent an important area for quality improvement [3, 4].

Critical care is one setting where tests are ordered often [5] and where there is concern that overuse contributes to clinically important poor outcomes in vulnerable patients [6-14]. Blood loss from frequent diagnostic sampling can contribute to iatrogenic anemia [5, 15], and the red blood cell (RBC) transfusions that may follow [5, 15] can be associated with nontrivial risks such as transfusion-associated circulatory overload (TACO), transfusion-related acute lung injury (TRALI), and transfusion-related immunomodulation (TRIM) [15, 16]. Similar to laboratory testing, transfusion ordering has been flagged as an important area for quality improvement due to inappropriate use [17-22]. The Critical Care Societies Collaborative called for improvement in both practices in its "Five Things Physicians and Patients Should Question" list, part of the Choosing Wisely initiative [23]. The potential risks and downstream consequences of laboratory testing and transfusion ordering, along with increased expenditure and limited blood resources, all provide motivation to reduce inappropriate use [5, 15, 24, 25].

Audit and Feedback (A&F), the collection and provision of clinical performance data to healthcare providers, represents a potentially low-cost and sustainable class of intervention [26, 27] for improving test and transfusion ordering in the critical care setting. A Cochrane review has demonstrated that A&F is effective across a wide range of clinical behaviors [28]. It is a broadly used intervention, familiar to most healthcare providers. We hypothesize that this class of intervention may be particularly well suited to the critical care setting, as A&F can be provided at the individual or group level through a variety of modalities. Furthermore, test and transfusion ordering is increasingly documented electronically, providing accessible data to produce feedback reports at a reasonable cost [27, 29-32]. Across various clinical settings, A&F interventions targeting test ordering have shown a 22% relative risk reduction in test volume [33]. To date, however, no review has examined the effectiveness of A&F interventions to modify these behaviors in the complex, team-based critical care setting.
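As a rough illustration of what a 22% relative reduction in test volume means, the sketch below computes a relative reduction as one minus the ratio of two ordering rates. The function names, the per-patient-day exposure measure, and the counts are hypothetical; a 22% relative reduction corresponds to the intervention group ordering about 78% as many tests as the comparison group.

```python
def ordering_rate(tests_ordered: int, patient_days: float) -> float:
    """Tests ordered per patient-day (hypothetical exposure measure)."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return tests_ordered / patient_days


def relative_reduction(rate_intervention: float, rate_comparison: float) -> float:
    """Relative reduction = 1 - rate ratio; positive values mean fewer tests."""
    if rate_comparison <= 0:
        raise ValueError("comparison rate must be positive")
    return 1.0 - rate_intervention / rate_comparison


# Hypothetical example: 780 vs 1000 tests over the same 100 patient-days.
comparison = ordering_rate(1000, 100)
intervention = ordering_rate(780, 100)
print(f"relative reduction: {relative_reduction(intervention, comparison):.0%}")
# prints: relative reduction: 22%
```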

To review how A&F interventions targeting healthcare professionals have been implemented in the critical care setting to improve the appropriateness of laboratory test and transfusion ordering.

To summarize the effectiveness of these interventions as compared to usual care or other interventions in modifying laboratory test and transfusion ordering. 

We used the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P checklist) [34] to draft our protocol, which was registered with the International Prospective Register of Systematic Reviews (PROSPERO: CRD42016051941) [35, 36]. All deviations from the protocol were minor and were implemented before the start of data extraction.

Studies with the following PICOS characteristics were included in the review.

Population Studies that targeted healthcare professionals (physicians, nurses, phlebotomists, or respiratory therapists) ordering laboratory tests or blood transfusion components (red blood cells (RBCs), platelets, plasma, or cryoprecipitate) for patients in an intensive care unit (ICU). Articles targeting healthcare professionals ordering laboratory tests or blood transfusion components for patients in a non-ICU setting were excluded.

Intervention Studies assessing Audit and Feedback (A&F) interventions, defined as "Any summary of clinical performance of health care over a specified period of time. The summary may also have included recommendations for clinical action. The information may have been obtained from medical records, computerized databases, or observations from patients" [37] . We also included multifaceted interventions that included an A&F component (e.g., A&F paired with educational sessions).

Comparator Studies that compared A&F interventions to usual care (no intervention; historical or concurrent), or any other single or multifaceted behavioral intervention that did not involve A&F (e.g., education, incentives, reminders, or systems-based changes).

Outcomes Primary outcomes included the number of laboratory tests or transfusions ordered. Secondary outcomes included the appropriateness of ordered laboratory tests or transfusions (for example as judged by the clinical context, or as compared to specified guidelines), length of stay (LOS), mortality, infection, and laboratory test or blood product expenditure.

Study design We included randomized controlled trials (RCTs), controlled clinical trials (CCTs), and observational studies (controlled before-after studies (CBAs), interrupted time series studies (ITSs), and uncontrolled before-after studies (UBAs)).

Setting We assessed studies that implemented interventions in an intensive care setting. All types of hospitals (i.e., academic, community) and ICUs (i.e., surgical, medical, pediatric, neonatal, etc.) were included. Studies implementing interventions across multiple settings (i.e., hospital-wide) were only included if ICU-specific data was reported for the primary outcome.

No date restrictions or language filters were applied to the searches. We did, however, exclude conference abstracts, commentaries, and letters to the editor, as well as studies not published in English, to maintain feasibility. Previous literature suggests such language restrictions do not greatly affect review conclusions [38]. Studies implementing interventions across multiple settings, but not reporting ICU-specific data for the primary outcome, were excluded.
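Taken together, the eligibility criteria above amount to a series of checks applied to each full text. A minimal sketch, assuming a hypothetical record layout (the field names and category strings are illustrative, not the review's extraction form); the comparator criterion is not encoded here.

```python
from dataclasses import dataclass, field

# Designs listed under "Study design" above.
ELIGIBLE_DESIGNS = {"RCT", "CCT", "CBA", "ITS", "UBA"}
EXCLUDED_PUBLICATION_TYPES = {"conference abstract", "commentary", "letter to the editor"}
# A study qualifies if it reports order counts or order appropriateness.
QUALIFYING_OUTCOMES = {
    "test_orders", "transfusion_orders",
    "test_appropriateness", "transfusion_appropriateness",
}


@dataclass
class CandidateStudy:
    # Hypothetical fields a screener might record for each full text.
    design: str
    publication_type: str
    language: str
    setting: str  # "icu", "non_icu", or "multi" (e.g., hospital-wide)
    icu_specific_primary_outcome: bool
    has_af_component: bool  # A&F alone or within a multifaceted intervention
    outcomes: set = field(default_factory=set)


def is_eligible(study: CandidateStudy) -> bool:
    """Apply the population, intervention, outcome, design, and exclusion checks."""
    if study.design not in ELIGIBLE_DESIGNS:
        return False
    if study.publication_type in EXCLUDED_PUBLICATION_TYPES:
        return False
    if study.language != "English":
        return False
    if study.setting == "non_icu":
        return False
    # Multi-setting studies count only if ICU-specific primary outcome data exist.
    if study.setting == "multi" and not study.icu_specific_primary_outcome:
        return False
    if not study.has_af_component:
        return False
    return bool(study.outcomes & QUALIFYING_OUTCOMES)
```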

Our Medline (database inception: 1946) search strategy (Additional File 1) was developed with help from an information specialist. The strategy was then peer reviewed by a second, independent information specialist, as recommended by the Centre for Reviews and Dissemination [39-41]. Medical Subject Headings (MeSH terms) and title and abstract terms (".tw") were chosen for the general categories "Laboratory Tests," "Transfusions," "Intensive Care," and "Audit and Feedback." This template strategy was translated for use in the remaining databases: Embase (1947), EBM Reviews-Cochrane Central Register of Controlled Trials, CINAHL (1981), and PsycINFO (1806). These searches were run on October 28th, 2016, starting from database inception. The trial registries "ClinicalTrials.gov" and International Standard Registered Clinical/soCial sTudy Number (ISRCTN) were additionally searched on December 23rd, 2016 to identify any relevant ongoing trials, using the search terms "intensive care" and "feedback." The bibliographies of included articles and relevant systematic reviews [28, 33, 42-44] were also hand searched to identify any further articles meeting the inclusion criteria.

Citations retrieved from the search were imported into the reference manager software program Mendeley Desktop 1.17.12 (Mendeley Ltd., London, UK) for de-duplication, then imported into Covidence [45] for screening.
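Mendeley's duplicate-detection rules are not described in the review, so the following is only a generic sketch of de-duplication under an assumed matching rule: a normalized DOI when present, otherwise a normalized title plus publication year. The function names are hypothetical.

```python
import re


def dedup_key(record: dict) -> tuple:
    """Hypothetical matching key: normalized DOI, else normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Collapse case, punctuation, and spacing so trivially different titles match.
    # Limitation: a record with a DOI and its copy without one are not matched.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```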

The titles and abstracts of unique citations identified from electronic database searches were screened by two independent reviewers (MF and KC), and registry citations were screened by one reviewer (MF). Conflicts were resolved through discussion or reference to a third independent reviewer (JCB, JP). Full text articles were screened by one reviewer (MF), and justifications for inclusion or exclusion were confirmed by a second member of the research team (KC).
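Dual independent screening reduces to comparing two sets of decisions and routing only the disagreements onward. The sketch below assumes hypothetical citation IDs and "include"/"exclude" labels, with discussion or the third reviewer modelled as a callback; it is an illustration of the workflow, not a description of Covidence.

```python
from typing import Callable, Dict, List

Decision = str  # e.g., "include" or "exclude"


def find_conflicts(reviewer_1: Dict[str, Decision], reviewer_2: Dict[str, Decision]) -> List[str]:
    """Citation IDs on which two independent screeners disagree."""
    if reviewer_1.keys() != reviewer_2.keys():
        raise ValueError("both reviewers must screen the same set of citations")
    return sorted(cid for cid in reviewer_1 if reviewer_1[cid] != reviewer_2[cid])


def final_decisions(
    reviewer_1: Dict[str, Decision],
    reviewer_2: Dict[str, Decision],
    adjudicate: Callable[[str], Decision],
) -> Dict[str, Decision]:
    """Agreements stand; conflicts go to discussion or a third reviewer (adjudicate)."""
    conflicts = set(find_conflicts(reviewer_1, reviewer_2))
    return {
        cid: adjudicate(cid) if cid in conflicts else decision
        for cid, decision in reviewer_1.items()
    }
```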

Data were extracted by two independent reviewers (MF and NM) using a standardized data extraction form implemented in Microsoft Excel 2011. One reviewer piloted the form on the first five articles, and only minor refinements were required. Conflicts between data extraction forms were identified by one reviewer (MF), and consensus was reached between reviewers through discussion. If reviewers could not reach agreement, a third reviewer (JCB, JP) was consulted to reach consensus.

We extracted several A&F intervention details based on characteristics described in the most recent Cochrane review [28] (format type, interval between reports (frequency)) and recently published guidance for the optimization of A&F [27] (type of data, specificity of data, number of reports, mode of delivery). We also extracted details about study design, type of control (e.g., historical, concurrent), type of ICU, type of patient (if applicable), type of laboratory test or blood component targeted, study participants (e.g., healthcare provider type), number of participants, follow-up time points, study country, funding, year of publication, and each study's definition for an appropriate test or transfusion (if applicable). We also extracted other intervention components (for multifaceted interventions) according to the following categories adapted from the Effective Practice and Organisation of Care (EPOC) Taxonomy [46] and a review by Kobewka et al. [33] : Education, Guidelines, Opinion Leader, Administrative Intervention, Financial Incentive, or "Other."

Two independent reviewers (NM and MF) assessed the methodological quality of studies using a modified version of the EPOC Review Group's quality criteria [37] used by Kobewka et al. [33] (Additional File 2). At present, there is not enough evidence to choose an appropriate cut-off to differentiate between high- and low-quality studies. Furthermore, Cochrane recommends that researchers avoid a scaled approach and instead advocates complete reporting of quality criteria [47]. We have thus presented results for each criterion and have not excluded any studies from our qualitative review. Reviewers were not blinded during data extraction or quality assessment. Cohen's Kappa [48] was calculated manually to evaluate inter-rater reliability for extraction of the quality assessment criteria.
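Cohen's kappa corrects observed agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the proportion of items both reviewers rated identically and p_e is the sum, over categories, of the products of the two reviewers' marginal proportions. The sketch below reproduces that hand calculation for two reviewers' categorical ratings; the example ratings are invented.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters rating the same items on a nominal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same rating by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Invented ratings of four quality items ("met" / "not met").
print(cohens_kappa(["met", "met", "not met", "not met"],
                   ["met", "not met", "not met", "not met"]))
# prints: 0.5
```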

Because of high heterogeneity in study designs, methods, outcomes, and reporting formats, we deemed meta-analysis to be inappropriate. Tables of study characteristics, intervention characteristics, and intervention effects were prepared to describe the set of included studies; absolute differences were calculated for study outcomes. Our results have otherwise been reported as per the PRISMA guidelines, and a PRISMA checklist has been completed to document the inclusion of all critical elements of this review (Additional File 3) [49].

Sixteen studies (17 publications) [16, 50-65] were identified for inclusion (note: Merlani et al. [60] and Diby et al. [61] are publications assessing different aspects of the same study). A list of the excluded full-text articles, sorted by reason for exclusion, can be found in Additional File 4. Table 1 describes characteristics of the included studies (n = 16). Ten of the 16 studies (63%) included transfusion outcomes [16, 50-57, 64], eight studies (50%) included test ordering outcomes [51, 57-63, 65], and two studies included both [51, 57]. Of the studies including test ordering outcomes, six aimed to reduce overall test ordering [57, 58, 60-63, 65], and four aimed to improve the appropriateness of tests: one aimed to increase compliance with a sepsis bundle [51], one aimed to improve compliance with arterial blood gas guidelines (an algorithm) [60, 61], one aimed to improve compliance with standards for practice in the ICU [59], and one aimed to reduce "unordered" tests (tests with no written order) [62]. Of the studies including transfusion ordering outcomes, three aimed to reduce the overall number of transfusions [50, 52, 57], while seven aimed to improve the appropriateness of transfusions [16, 51, 53-56, 64]. 
Of those assessing appropriateness, two aimed to improve compliance with a bundle [16, 51] , three assessed appropriateness as per guidelines or a protocol involving a transfusion "trigger" (defined level(s) at which to transfuse) and sometimes other patient factors [54] [55] [56] , and one study assessed appropriateness as per guidelines but included an additional category based on clinical context, "inconsistent with guidelines yet appropriate for ICU" [53] . The remaining study used a combination of transfusion "triggers" and an audit of clinical factors; however, several transfusion triggers were noted in the publication and it was not entirely clear which were used to specify appropriateness [64] . Further details on the criteria used to assess appropriateness can be found in Additional File 5.
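Several studies judged appropriateness against a transfusion "trigger", sometimes combined with other patient factors. The sketch below shows how such a rule could be applied to audit data; the 7.0 g/dL hemoglobin trigger and the active-bleeding exception are hypothetical placeholders, not the criteria of any included study (those are summarized in Additional File 5).

```python
from dataclasses import dataclass


@dataclass
class RbcTransfusion:
    patient_id: str
    pre_transfusion_hb_g_dl: float  # hemoglobin measured before the order
    active_bleeding: bool = False   # example of an "other patient factor"


def is_appropriate(t: RbcTransfusion, hb_trigger_g_dl: float = 7.0) -> bool:
    """Hypothetical rule: appropriate if below the trigger or actively bleeding."""
    return t.active_bleeding or t.pre_transfusion_hb_g_dl < hb_trigger_g_dl


def proportion_inappropriate(transfusions: list, hb_trigger_g_dl: float = 7.0) -> float:
    """Share of audited transfusions that fail the hypothetical rule."""
    if not transfusions:
        raise ValueError("no transfusions to audit")
    flagged = sum(not is_appropriate(t, hb_trigger_g_dl) for t in transfusions)
    return flagged / len(transfusions)
```

Changing `hb_trigger_g_dl` changes which orders are flagged, which is one reason results from studies with different definitions of appropriateness are hard to compare.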

Most studies (81%) used an uncontrolled before-after design [50, 51, 53-58, 60-65]. Only one RCT [59], one controlled clinical trial (CCT) with a quasi-experimental comparative design [16], and one controlled before-after study [52] were identified. Most studies (56%) were conducted in the United States [50-52, 54-57, 59, 62], and two (13%) were conducted in Canada [53, 58]. Half (50%) of the included studies did not report their source of funding [16, 50, 55, 56, 58, 62, 63, 65]; four studies (25%) reported government grant funding [51, 54, 57, 59]. Most studies (56%) were conducted in a single ICU [51-55, 58, 60-62, 64], while four studies (25%) were conducted in multiple ICUs at a single centre [16, 50, 56, 65]. Most (69%) took place at academic hospitals [16, 51-53, 55, 57, 60-65]. The year of publication ranged from 1988 to 2016, and study duration ranged from 25 weeks to 4 years.

A Cohen's Kappa of 0.67 was computed for interrater reliability (Additional File 6), representing "substantial agreement" as per Landis and Koch, but just meeting the cut-off for "suggesting that … conclusions tentatively be made" as per Krippendorff [48]. As such, reviewers discussed all disagreements to reach a consensus. Additional File 2 describes the quality of included studies (n = 16). Overall quality of the studies was judged to be poor; 94% of studies [16, 50-58, 60-65] scored 4 or lower on the 8-9 criteria (risk of contamination was often not applicable). Most studies reported similar providers between groups (94%) [16, 51-63, 65] and used an objective primary outcome measure or blinded assessment of the primary outcome (88%; 13 studies [16, 50-52, 54, 56-59, 62-65] and one study [53], respectively). However, most studies lacked a concurrent control group (88%) [50-58, 60-65], did not use time series analysis (100%), provided insufficient detail to allow for replication (100%), and did not report the number of tests per patient (56%) [16, 50, 53, 54, 56, 59, 62, 64, 65].
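The two benchmarks used to interpret the kappa of 0.67 can be written as simple lookups. The sketch below encodes the Landis and Koch bands and Krippendorff's 0.667/0.800 cut-offs (Krippendorff proposed these for his alpha coefficient; applying them to kappa follows the text above). How values falling exactly on a boundary are assigned is an assumption.

```python
def landis_koch(kappa: float) -> str:
    """Landis and Koch strength-of-agreement bands (boundary handling assumed)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def krippendorff_guidance(coefficient: float) -> str:
    """Krippendorff: rely on >= 0.800; 0.667 to 0.800 supports tentative conclusions."""
    if coefficient >= 0.800:
        return "reliable"
    if coefficient >= 0.667:
        return "tentative conclusions only"
    return "below tentative cut-off"


kappa = 0.67
print(landis_koch(kappa), "/", krippendorff_guidance(kappa))
# prints: substantial / tentative conclusions only
```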

There was a range of A&F interventions (n = 17) used in the 16 included studies. As shown in Table 2, most interventions were multifaceted (88%) [16, 50-53, 55-62, 64, 65], including A&F and one or more additional components (i.e., education, guidelines, opinion leaders, financial incentives, checklists, or administrative interventions). Seven interventions (41%) reported providing feedback in a written format only [16, 52, 56, 57, 60, 61, 63, 65], four (24%) provided at least verbal feedback [54, 55, 58, 64], and three (18%) reported providing both written and verbal feedback [16, 59, 62]. Four interventions (24%) provided feedback only once [50, 59, 64, 65], nine (53%) provided feedback more than once [16, 51, 52, 54, 56, 57, 60, 61, 63], and in four cases (24%) the frequency was unclear or feedback was provided variably [53, 55, 58, 62]. Where reported, feedback was provided daily in one study [63], weekly in two (12%) [51, 54], monthly in three (18%) [16, 57, 60, 61], and at various instances in four (24%) [16, 52, 55, 56]. Feedback most often presented data on group performance only, in seven of the interventions (41%) [16, 50, 53, 57, 59-61, 65]; three interventions (18%) provided both group and individual feedback [16, 54, 56], and one intervention clearly reported only individual feedback (it was unclear whether group data were provided) [58]. Feedback recipients were most commonly multiple groups of healthcare providers.

Among the studies that sought to reduce test orders, all three that tested significance found the reductions to be statistically significant [57, 60, 61, 65].

Four studies aimed to improve the appropriateness of test orders (as per compliance with a bundle [51], guidelines (an algorithm) [60, 61], standards for practice [59], or whether the test had a written order [62]). Three of these four studies reported statistically significant increases in compliance (range +5.3 to +27%) [51, 59-61]. The remaining study reported a decrease in the proportion of inappropriate tests; however, upon assessing the number of overall tests per patient and inappropriate tests per patient, we noted undesired increases in both outcomes (range +144 to +214 total tests per patient; +6 to +15 unordered tests per patient). No statistical test was reported [62].
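The apparent contradiction for study [62] arises because a proportion can fall while absolute counts rise, if total ordering grows faster than inappropriate ordering. The sketch below shows this with hypothetical before and after figures (not the study's data) and computes absolute differences as after minus before.

```python
from dataclasses import dataclass


@dataclass
class AuditPeriod:
    patients: int
    total_tests: int
    unordered_tests: int  # tests drawn without a written order

    def proportion_unordered(self) -> float:
        return self.unordered_tests / self.total_tests

    def tests_per_patient(self) -> float:
        return self.total_tests / self.patients

    def unordered_per_patient(self) -> float:
        return self.unordered_tests / self.patients


def absolute_differences(before: AuditPeriod, after: AuditPeriod) -> dict:
    """After minus before for each outcome (negative values are decreases)."""
    return {
        "proportion_unordered": after.proportion_unordered() - before.proportion_unordered(),
        "tests_per_patient": after.tests_per_patient() - before.tests_per_patient(),
        "unordered_per_patient": after.unordered_per_patient() - before.unordered_per_patient(),
    }


# Hypothetical figures: the proportion falls (10% to 7.5%)
# while both per-patient counts rise.
diffs = absolute_differences(AuditPeriod(10, 500, 50), AuditPeriod(10, 2000, 150))
for outcome, change in diffs.items():
    print(f"{outcome}: {change:+.3f}")
```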

Three studies sought to reduce transfusion orders. All three reported decreases (range −0. … [57], one reported a statistically significant decrease for a subset of patients (overall significance not reported) [52], and one did not report a statistical test [50].

Seven studies [16, 51, 53-56, 64] aimed to improve the appropriateness of transfusion orders (as per compliance with a bundle [16, 51], a protocol/guideline [54-56], guidelines plus clinical context [53], and a combination of transfusion triggers and audit of patient factors [specifics unclear] [64]). Outcomes included the over-transfusion rate, the odds of an inappropriate transfusion, the proportion of patients receiving inappropriate orders, the threshold at which a transfusion was given, the proportion of transfusions with an inappropriate threshold, or compliance with a bundle. Two studies saw significant decreases (range: OR of inappropriate transfusion 0.37-0.52; proportion of patients receiving unnecessary transfusion −6.6%) [54, 55]; one saw significant reductions during the intervention period and non-significant reductions at follow-up (range −8 to −23% inappropriate transfusions; −5 to −8% over-transfusion rate; −0.3 to −0.5 g/dL mean pre-transfusion trigger) [56]; one saw a significant reduction for one transfusion outcome but no significant difference for another (−6.9% to −17% in proportion of transfusions over specific triggers; distribution of pre-transfusion platelet counts: p = 0.452) [64]; and one saw a non-significant increase in compliance (range +3.1 to +3.8% compliant episodes of transfusion) [51]. Another study saw non-significant decreases in both inappropriate transfusions and transfusions consistent with guidelines (−14% and −1%, respectively) [53]. As described in Table 4, the final included study was a head-to-head comparison of different types of A&F and found that the enhanced intervention (timely individual plus monthly group feedback) significantly improved transfusion bundle compliance compared with monthly group A&F (range +31 to +36% bundle compliance) [16]. Table 3 also describes A&F in light of different comparators. 
Fourteen studies (88%) compared multifaceted interventions to usual care [16, 50-53, 55-62, 64, 65]. In most cases, data were reported only for the baseline and post-intervention periods, so the effect of the A&F component alone could not be assessed directly. Nine of these studies [51, 52, 55-57, 59-61, 64, 65] saw a statistically significant change in the hypothesized direction for at least one of the outcomes (range +15 to +27% in compliance, +5.3 to +21.6% compliant episodes, −0.1 to −1.6 orders/encounter, −1.7 to −3.4 median tests per patient day, −613.1 tests/100 hospital days, −6.9 to −17% in proportion of transfusions over specific triggers, −23% in inappropriate transfusions, −8% in over-transfusion rate, −0.5 g/dL mean pre-transfusion trigger, −6.6% in patients receiving unnecessary transfusion, −15.9% of patients receiving transfusion); three [50, 58, 63] reported changes in the hypothesized direction but did not report significance (range −1.72 to −8 tests per patient; −79 FFP use/month [units not reported]); and one [53] saw a statistically significant increase in transfusions "inconsistent with guidelines yet appropriate for the ICU" (+15% in requests), but non-significant decreases in both inappropriate (−14% in requests) and "consistent with guidelines" transfusions (−1% in requests). One study [62] did, however, compare A&F alone with usual care before implementing additional intervention components; undesired increases were seen for both overall and unordered tests per patient. The only study [54] to implement a sole A&F intervention saw a significant decrease in the odds and proportion of inappropriate transfusion (OR 0.37-0.52).
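Several transfusion results in this section are reported as odds ratios (for example, OR 0.37-0.52 for inappropriate transfusion). As a reminder of how an unadjusted OR and its Woolf (log-based) 95% confidence interval are computed from a 2 × 2 table, here is a sketch with hypothetical counts; it is not a reanalysis of any included study, whose models may have adjusted for covariates.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """
    Unadjusted OR for a 2x2 table (hypothetical layout):
                         inappropriate  appropriate
      post-intervention        a             b
      pre-intervention         c             d
    Returns (OR, lower, upper) using the Woolf standard error of log(OR).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (no continuity correction here)")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical audit: 20/100 transfusions inappropriate after A&F vs 40/100 before.
print(odds_ratio_with_ci(20, 80, 40, 60))
```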

Additional outcomes of interest, including length of stay, mortality, infection, and expenditure, are summarized in Tables 5 and 6. Length of stay (ICU or hospital) and mortality (ICU or hospital) outcomes were reported in 11 studies [16, 51, 52, 54-57, 59-62, 64] and ten studies [16, 51-57, 59-61], respectively. A statistically significant reduction in LOS was reported in only one of the seven studies in which it was tested [51]. Statistically significant decreases in mortality were found in three of the eight studies in which it was tested [51, 54, 57]. Of the two studies that reported infection rates, one found no statistically significant difference [59], and the other did not report statistical tests [52]. Savings or expenditure were reported in five studies [52, 57, 58, 60-62]; however, no statistical tests were reported.

A&F is known to be an effective component of interventions to improve practice [28] , and it is suggested to be a feasible intervention due to the availability of electronic health data [27, 29, 30, 32] . However, relatively little work has explored how this behaviour change intervention can be effectively implemented in the complex, team-based critical care setting. Our systematic review yielded 16 studies, the majority of which showed positive effects, though their overall quality and rigour of design were assessed to be relatively weak.

Of the 16 included studies, only one [54] assessed A&F alone as the sole intervention; the remaining studies assessed the effects of A&F alongside a range of intervention components (and in one case it was unclear whether there were additional components). That most studies used a multifaceted intervention was reasonable, as previous literature has suggested that these interventions are more effective than single-component interventions [33, 66-68]. While the lack of simple comparison studies would seem to prevent us from directly assessing the effectiveness of A&F, some investigators have argued that the substantial literature (the latest Cochrane review included 140 trials [28]) demonstrates A&F's effectiveness and negates the need for further testing of this intervention on its own [69]. Instead, assessing the conditions and mechanisms under which A&F is most effective is argued to be more likely to improve the effectiveness of interventions [28, 69, 70]. Future primary studies may therefore consider the application of theory, process evaluations, and methods to compare different intervention component combinations to facilitate identification of those that are most effective and to better understand the potential mechanisms [71, 72]. Syntheses of the literature of the sort we report here are another way to advance work in this field.

Our review points to some mechanisms by which A&F might be made more effective in the critical care context. Two studies in our review [16, 54] suggest that enhancing group feedback with individual feedback may improve intervention effectiveness. This is in line with a previous meta-analysis which found that combined group and individual feedback yielded a larger effect size than either type of feedback alone [32]. Recent guidance on A&F [27] also suggests that providing individualized feedback whenever possible is more likely to be effective, as group-level feedback is easier for an individual to discount. In the critical care context, combining both levels of feedback may be preferable, as this addresses the team-based nature of critical care [73, 74] while still providing specific data for individual practitioners.
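One way to produce both levels of feedback from the same audit data is sketched below. It assumes, hypothetically, that each order can be attributed to a single clinician and that patient-days per clinician are known; in a team-based ICU that attribution is itself a design decision. Each report line shows the clinician's own rate next to the unit (group) rate.

```python
from collections import Counter


def feedback_rates(orders_by_clinician: list, patient_days_by_clinician: dict):
    """
    Hypothetical inputs:
      orders_by_clinician: one clinician ID per test ordered
      patient_days_by_clinician: patient-days each clinician was responsible for
    Returns (group rate, {clinician: individual rate}) in tests per patient-day.
    """
    counts = Counter(orders_by_clinician)
    individual = {}
    for clinician, days in patient_days_by_clinician.items():
        if days <= 0:
            raise ValueError(f"no patient-days recorded for {clinician}")
        individual[clinician] = counts[clinician] / days
    # Group rate pools the same clinicians' orders over their pooled patient-days.
    total_orders = sum(counts[c] for c in patient_days_by_clinician)
    group = total_orders / sum(patient_days_by_clinician.values())
    return group, individual


def feedback_lines(group: float, individual: dict) -> list:
    """One line per clinician: own rate shown alongside the unit (group) rate."""
    return [
        f"{clinician}: {rate:.2f} tests/patient-day (unit: {group:.2f})"
        for clinician, rate in sorted(individual.items())
    ]


orders = ["A"] * 30 + ["B"] * 40
group, individual = feedback_rates(orders, {"A": 10, "B": 20})
print("\n".join(feedback_lines(group, individual)))
```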

In eight of the 17 interventions, feedback was either presented only once, it was not clearly specified how often feedback was provided, or the feedback was provided variably (only when an inappropriate order was placed) [50, 53, 55, 58, 59, 62, 64, 65] . The finding that not all A&F interventions provide iterative feedback suggests that the important notion of the feedback loop [27] is overlooked in some cases. Recent guidance [27] recommends that feedback be provided multiple times, in order to close the feedback loop (i.e., a provider identifies a practice gap(s) based on the first instance of feedback, makes a change, and then needs subsequent instances of feedback to understand whether the practice change has resulted in improved outcomes).
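Closing the feedback loop implies reports at a fixed interval, each comparable with the previous one. A minimal sketch, assuming a hypothetical list of order dates and a monthly interval; it uses raw counts for brevity, whereas a real report would normally divide by patient-days because ICU census varies.

```python
from collections import Counter
from datetime import date


def monthly_counts(order_dates: list) -> list:
    """Count orders per (year, month); months with no orders are omitted."""
    counts = Counter((d.year, d.month) for d in order_dates)
    return sorted(counts.items())


def feedback_cycle(order_dates: list) -> list:
    """Each report shows the current count and the change since the previous report."""
    reports = []
    previous = None
    for (year, month), count in monthly_counts(order_dates):
        change = None if previous is None else count - previous
        reports.append({"period": f"{year}-{month:02d}", "orders": count, "change": change})
        previous = count
    return reports


orders = [date(2016, 1, 5), date(2016, 1, 20), date(2016, 1, 28),
          date(2016, 2, 3), date(2016, 2, 17)]
for report in feedback_cycle(orders):
    print(report)
```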

While we were primarily interested in studies that aimed to reduce inappropriate tests and transfusions, it can be difficult both to define and to adjudicate whether these resources are used appropriately [4, 44]. Thus, some studies aim to reduce inappropriate orders but measure only the overall reduction in tests or blood components. In our small sample, for instance, six studies (37.5%) did not assess appropriateness. Clear definitions of appropriate use are needed to ensure that the tests and transfusions avoided are in fact unnecessary, and that underuse and patient harm do not occur, especially in the context of the ICU. The remaining ten studies (62.5%) assessed appropriateness, with the majority defining "appropriateness" as compliance with guidelines or protocols. Across studies, there was great variation in definitions of appropriateness, study aims, and outcomes measured. While it is plausible that varying definitions of appropriateness may have affected the effectiveness of A&F, the small number of studies identified limited our ability to detect any differences and precluded statistical analysis.

The limited evidence we could find pertaining to patient length of stay (LOS) and mortality showed few significant differences. In part, this may be due to a lack of reporting on patient outcomes, an issue that has also been identified in other reviews [33] .

We found studies in this area lacking in important quality indicators. Many studies lacked a concurrent control group, and only one study used randomization. No time-series analyses were identified. Interventions were rarely described in enough detail to allow replication. The lack of an appropriate control group and of time-series analysis makes study results difficult to interpret, as any effect seen may simply be due to coincidence, Hawthorne effects, seasonal differences, or another undocumented change [75-77]. Non-randomized studies are at risk of selection bias [47]. Furthermore, poor reporting of intervention details makes synthesis and replication more difficult.

A&F interventions for laboratory test and transfusion ordering exhibited differences that may be important but that we were unable to test statistically due to the low number of studies available to us. They differed substantially in terms of the outcomes reported for the two types of studies (e.g., number of tests ordered per 100 hospital days versus number of blood component units ordered per year; unordered "blood work" tests per patient versus proportion of patients receiving an unnecessary transfusion). We noted that a greater proportion of studies assessing transfusion practices (7/10) reported measures of appropriateness as compared to studies assessing laboratory test ordering (4/8), which more often focused on reduction alone. These findings may warrant further investigation when more studies are available.

We conducted the first comprehensive review of A&F interventions for improving test and transfusion ordering in critical care. Our search strategy was developed and peer reviewed with guidance from library information specialists, and screening, data extraction, and risk of bias assessment were completed by two independent reviewers. Furthermore, in addition to summarizing the effectiveness of these interventions, our review is the first to assess characteristics of the A&F interventions in light of recent best practice guidance [27].

Our study has limitations that warrant consideration. Inconsistent reporting and differences in the nomenclature of intervention components complicated our categorization of intervention types. Standard intervention categories and terms (such as those outlined by the EPOC taxonomy [46] or the Expert Recommendations for Implementing Change (ERIC) project [78]), reporting guidelines (such as the Template for Intervention Description and Replication (TIDieR) checklist [79]), and online access to more detailed descriptions of the interventions may facilitate comparisons between studies in future reviews. Our use of an unvalidated subset of quality items also precluded us from computing an overall quality score for each study. Although we aimed to be comprehensive, some relevant studies may have been missed, because not all publications report the relevant information in the abstract. Considerable work aiming to improve test and transfusion ordering may be conducted as quality improvement initiatives, and may therefore be published less often or be more difficult to identify in electronic searches [80, 81]. Finally, there is the potential for publication bias. Many of the included studies showed desired, albeit weak, effects, which may suggest that studies with positive and/or significant findings are more likely to be submitted and published. Because outcomes were heterogeneous, we could not assess publication bias with a funnel plot; Cochrane recommends that statistical tests for funnel plot asymmetry be used only when at least ten studies are available [82]. Future updates to this review, however, may be able to address this issue.

Our research identifies several ways to advance this literature. More rigorous study designs, such as randomized controlled trials or cluster randomized controlled trials, would help to produce a higher quality evidence base for A&F interventions in the critical care setting. Head-to-head trials comparing different types of A&F would help to clarify potential mechanisms of action and to test whether theory-informed suggestions for best practice help to optimize this intervention [27, 28, 69]. To allow for more robust and conclusive synthesis techniques such as meta-analysis and network meta-analysis, primary studies should use comparative designs that measure and report common outcomes (e.g., the number of laboratory tests per patient). Furthermore, adoption of consistent [46, 78] and thorough [79] reporting practices, improved access to feedback templates, and development of core outcome sets would enable research teams to produce cumulative knowledge. Measuring and reporting core patient outcomes and cost data will also help to assess whether these interventions are safe and sustainable. In future updates of this review, it may be of interest to describe intervention components in light of established frameworks (e.g., the Consolidated Framework for Implementation Research [83], TIDieR [79]) and to describe implementation outcomes (e.g., acceptability, adoption, feasibility) [84].

This study showed that A&F is potentially effective in the critical care setting. However, the interventions are typically inconsistent with best practice recommendations for A&F, and the studies lack important indicators of quality. In the majority of cases, A&F was implemented as one part of a multi-component intervention, limiting our ability to determine which components contributed to the overall success. Additionally, the majority of studies in our sample were uncontrolled, leaving the results prone to bias [76].

More research focused on optimizing A&F in critical care is warranted; initial signals of efficacy, together with the lack of consistency with best practices, suggest that these types of intervention can be improved. Future work should focus on understanding the mechanisms by which this intervention works [27, 85], particularly in this team-based environment. Assessing whether interventions that incorporate more of the best practice recommendations [27] are more effective would help to advance this literature. Further work to develop a tool for assessing A&F interventions against these best practice recommendations would also be valuable. Such work will help us determine how A&F interventions can optimally improve test and transfusion ordering in the critical care setting.

The landscape of inappropriate laboratory testing: a 15-year meta-analysis

Australian Association of Pathology Practices Inc. An analysis of pathology test use in Australia

The effectiveness of interventions to improve laboratory requesting patterns among primary care physicians: a systematic review

Utilization management in the clinical laboratory: an introduction and overview

Laboratory Testing in the intensive care unit

Reducing unnecessary lab testing in the ICU with artificial intelligence

Multipronged strategy to reduce routine-priority blood testing in intensive care unit patients

Reduction of laboratory utilization in the intensive care unit

Reducing unnecessary testing in the intensive care unit by choosing wisely

Phlebotomy in the Intensive Care Unit: Strategies for blood conservation

Reducing unnecessary laboratory testing in the medical ICU

Hospital-acquired anemia: prevalence, outcomes, and healthcare implications

Hospital-acquired anemia and in-hospital mortality in patients with acute myocardial infarction

Diagnostic blood loss from phlebotomy and hospital-acquired anemia during acute myocardial infarction

Anemia, bleeding, and blood transfusion in the intensive care unit: causes, risks, costs, and new strategies

Timely individual audit and feedback significantly improves transfusion bundle compliance: a comparative study

Cryoprecipitate use in 25 Canadian hospitals: commonly used outside of the published guidelines

Audit of appropriate use of platelet transfusions: validation of adjudication criteria

Utilization of frozen plasma in Ontario: a provincewide audit reveals a high rate of inappropriate transfusions

Evaluation of RBC transfusion practice in adult ICUs and the effect of restrictive transfusion protocols on routine care

A systematic review and meta-analysis of the clinical appropriateness of blood transfusion in China

The scientific basis for patient blood management

Critical Care Societies Collaborative. Five things physicians and patients should question

Non-essential blood tests in the intensive care unit: a prospective observational study

Transfusion thresholds and other strategies for guiding allogeneic red blood cell transfusion. Cochrane Database Syst Rev

Testing feedback message framing and comparators to address prescribing of high-risk medications in nursing homes: protocol for a pragmatic, factorial, cluster-randomized trial

Practice feedback interventions: 15 Suggestions for Optimizing Effectiveness

Audit and feedback: effects on professional practice and healthcare outcomes (Review)

Survey of information technology in intensive care units in Ontario, Canada

Next-generation audit and feedback for inpatient quality improvement using electronic health record data: a cluster randomised controlled trial

The use of big data in transfusion medicine

Meta-analysis: audit and feedback features impact effectiveness on care quality

Influence of educational, audit and feedback, system based, and incentive and penalty interventions to reduce laboratory test utilization: A systematic review

Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation

PROSPERO International Prospective Register of Systematic Reviews: guidance notes for registering a systematic review protocol with PROSPERO

Evaluation of feedback interventions for the reduction of inappropriate laboratory tests and transfusions in intensive care units: a systematic review protocol

Cochrane Effective Practice and Organisation of Care Review Group (EPOC). Data collection checklist. Ottawa

The effect of English-language restriction on systematic review-based meta-analyses: a systematic review of empirical studies

Roles for librarians in systematic reviews: a scoping review

Committee on standards for systematic reviews of comparative effectiveness research. Standards for finding and assessing individual studies

Finding What Works in Health Care: Standards for Systematic Reviews

Systematic reviews: CRD's guidance for undertaking reviews in health care

Reducing the amount of blood transfused: a systematic review of behavioral interventions to change physicians' transfusion practices

Appropriateness of fresh-frozen plasma usage in hospital settings: a meta-analysis of the impact of organizational interventions

The effectiveness of interventions to reduce physician's levels of inappropriate transfusion: what can be learned from a systematic review of the literature

Veritas Health Innovation Ltd. Covidence

Effective Practice and Organisation of Care (EPOC)

Cochrane Handbook for Systematic Reviews of Interventions

Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol

Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement

The use of laboratory intervention to stem the flow of fresh-frozen plasma

Septic shock: a multidisciplinary response team and weekly feedback to clinicians improve the process of care and mortality

Establishing a culture of blood management through education: a quality initiative study of postoperative blood use in CABG patients at Methodist DeBakey Heart & Vascular Center

A multifaceted strategy to reduce inappropriate use of frozen plasma transfusions in the intensive care unit

Disclosure of physician-specific behavior improves blood utilization protocol adherence in cardiac surgery

Impact of guideline implementation on transfusion practices in a surgical intensive care unit

Peer-to-peer physician feedback improves adherence to blood transfusion guidelines in the surgical intensive care unit

Using incentives to improve resource utilization: a quasi-experimental evaluation of an ICU quality improvement program

Changing physicians' behavior using combined strategies and an evidence-based protocol

Outreach education to improve quality of rural ICU care: results of a randomized trial

Quality improvement report: linking guideline to regular feedback to increase appropriate requests for clinical tests: blood gas analysis in intensive care

Harmonization of practice among different groups of caregivers: a guideline on arterial blood gas utilization

Reducing unnecessary blood work in the neurosurgical ICU

Blood loss from laboratory tests

A simple automatized audit system for following and managing practices of platelet and plasma transfusions in a neonatal intensive care unit

An administrative intervention to improve the utilization of laboratory tests within a university hospital

From best evidence to best practice: Effective implementation of change in patients' care

Changing physicians' practices

Implementing guidelines in general practice care. Qual Health Care

Reducing research waste with implementation laboratories

Growing literature, stagnant science? Systematic review, meta-regression and cumulative analysis of audit and feedback interventions in health care

Deconstructing interventions: approaches to studying behavior change techniques across obesity interventions

Developing and evaluating complex interventions: The new Medical Research Council guidance

Balancing intertwined responsibilities: a grounded theory study of teamwork in everyday intensive care unit practice

The intensive care unit work environment: current challenges and recommendations for the future

Uncontrolled before-after studies: Discouraged by Cochrane and the EMJ

Research designs for studies evaluating the effectiveness of change and improvement strategies

Block design allowed for control of the Hawthorne effect in a randomized controlled trial of test ordering

A refined compilation of implementation strategies: Results from the Expert Recommendations for Implementing Change (ERIC) project

Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide

Perspective on publishing quality improvement efforts

Identifying quality improvement intervention publications: a comparison of electronic search strategies

Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0

Fostering implementation of health services research findings into practice: A consolidated framework for advancing implementation science

Outcomes for implementation research: Conceptual distinctions, measurement challenges, and research agenda

Health professionals' perceptions about their clinical performance and the influence of audit and feedback on their intentions to improve practice: a theory-based study in Dutch intensive care units


Supplementary information accompanies this paper at https://doi.org/10.1186/s13012-020-00981-5.

Endnotes: a) One study compared two different types of feedback; proportions for the audit and feedback intervention characteristics have therefore been calculated using a denominator of 17.

Authors' contributions: JCB and JP were responsible for the conception of this project and provided guidance and expertise throughout the entire project. MF and KC completed title and abstract screening, and KC provided confirmation for inclusion and exclusion of full-text articles screened by MF. Data extraction and quality assessment were completed by MF and NM. MF drafted the manuscript, and JCB, JP, NM, LM, and BH provided critical input and aided in the revision of the manuscript. All authors have read and approved the final manuscript. The guarantor of this review is MF.

Funding: MF received a Queen Elizabeth II scholarship for her Master's thesis. MF also received a University of Ottawa Graduate Studies Scholarship and held a graduate studentship with the Ottawa Hospital Research Institute. Funding bodies had no role in the design of the study, collection, analysis, interpretation of data, or in the writing of the manuscript.

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate: Not applicable.

Not applicable.

The authors declare that they have no competing interests.