authors: Ben Charif, Ali; Zomahoun, Hervé Tchala Vignon; Gogovor, Amédé; Abdoulaye Samri, Mamane; Massougbodji, José; Wolfenden, Luke; Ploeg, Jenny; Zwarenstein, Merrick; Milat, Andrew J.; Rheault, Nathalie; Ousseine, Youssoufa M.; Salerno, Jennifer; Markle-Reid, Maureen; Légaré, France
title: Tools for assessing the scalability of innovations in health: a systematic review
date: 2022-03-24
journal: Health Res Policy Syst
DOI: 10.1186/s12961-022-00830-5

BACKGROUND: The last decade has seen growing interest in scaling up of innovations to strengthen healthcare systems. However, the lack of appropriate methods for determining their potential for scale-up is an unfortunate global handicap. Thus, we aimed to review tools proposed for assessing the scalability of innovations in health.
METHODS: We conducted a systematic review following the COSMIN methodology. We included any empirical research which aimed to investigate the creation, validation or interpretability of a scalability assessment tool in health. We searched Embase, MEDLINE, CINAHL, Web of Science, PsycINFO, Cochrane Library and ERIC from their inception to 20 March 2019. We also searched relevant websites, screened the reference lists of relevant reports and consulted experts in the field. Two reviewers independently selected and extracted eligible reports and assessed the methodological quality of tools. We summarized data using a narrative approach involving thematic syntheses and descriptive statistics.
RESULTS: We identified 31 reports describing 21 tools. Types of tools included criteria (47.6%), scales (33.3%) and checklists (19.0%). Most tools were published from 2010 onwards (90.5%), in open-access sources (85.7%) and funded by governmental or nongovernmental organizations (76.2%). All tools were in English; four were translated into French or Spanish (19.0%). Tool creation involved single (23.8%) or multiple (19.0%) types of stakeholders, or stakeholder involvement was not reported (57.1%). No studies reported involving patients or the public, or reported the sex of tool creators. Tools were created for use in high-income countries (28.6%), low- or middle-income countries (19.0%), or both (9.5%), or for transferring innovations from low- or middle-income countries to high-income countries (4.8%). Healthcare levels included public or population health (47.6%), primary healthcare (33.3%) and home care (4.8%). Most tools provided limited information on content validity (85.7%), and none reported on other measurement properties. The methodological quality of tools was deemed inadequate (61.9%) or doubtful (38.1%).
CONCLUSIONS: We inventoried tools for assessing the scalability of innovations in health. Existing tools are as yet of limited utility for assessing scalability in health. More work needs to be done to establish key psychometric properties of these tools.
Trial registration: We registered this review with PROSPERO (identifier: CRD42019107095).
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12961-022-00830-5.
Keywords: Scalability, Scaling up, Scaling, Spread, Health innovations, Patient and public involvement, Scalability assessment tool, Systematic review

Background
Various innovations have been developed and successfully piloted to strengthen healthcare systems in low-, middle- or high-income countries [1-3].
A health innovation refers to a set of behaviours, routines and ways of working that are perceived as new; that aim to improve health outcomes, administrative efficiency, cost-effectiveness or user experience; and that are implemented through planned action [4-6]. But there is a global delivery gap between innovations for which evidence of effectiveness has been established and those that actually reach the people who could benefit [7, 8]. Thus, the last decade has seen growing interest in the scaling up of health innovations. Scaling up, or expanding the impact and reach of effective innovations, could reduce waste and inequalities in health settings and improve outcomes [7-9]. For example, up to 85% of all maternal, neonatal and child deaths in low- or middle-income countries could potentially be averted through scaling up of successfully piloted innovations [10]. The science of knowledge mobilization, or moving knowledge into action (also known variously as knowledge translation and implementation science), can be a key instrument for closing this gap by taking evidence-based innovations and testing strategies to move them into wider practice [11-13]. Thus, there is a need for tools to help identify evidence-based innovations that could be successfully expanded or scaled up to reach more patients in healthcare systems.

There are various definitions of scaling up [14], ranging from an increase in the number of beneficiaries, organizations or geographic sites, to more complex definitions in which expanding the variety, equity and sustainability of an innovation is also considered [1, 6, 15]. Some innovations are implemented at scale before ever going through a pilot trial or small-scale introduction [16]. This was the case with the coronavirus disease 2019 (COVID-19) vaccines in Canada, for example, which were developed elsewhere through clinical research and then introduced simultaneously nationwide at the local level. In some situations, scale-up is transnational; for example, innovations adopted first in a low- or middle-income country are then transferred or scaled up to a high-income country [17, 18]. Scale-up can be nonlinear, and is inherently complex and often political [19].

Scalability is defined as the "ability of a health innovation shown to be efficacious on a small scale and/or under controlled conditions to be expanded under real-world conditions to reach a greater proportion of the eligible population, while retaining effectiveness" [20]. Here, we consider scalability broadly as also including assessing whether the innovation can be replicated, transferred or sustained [6, 21]. Among other considerations in preparing for scale-up, decision-makers need to assess the more technical scalability components of an innovation [2, 3]. In 2003, Everett Rogers identified key innovation characteristics relevant for assessing scalability: relative advantage (which includes effectiveness), compatibility, complexity, comprehensibility (to the user), trialability, observability and potential re-invention (i.e. adaptation) [5]. Since then, others have adapted and added to these characteristics [6]. Milat's scalability assessment tool [22], for example, based on existing frameworks, guides and checklists, is a recent and comprehensive effort to select and summarize essential components of a scale-up preparedness plan [1, 15, 21, 23].
In spite of these advances, however, scalability assessments are still often overlooked by those responsible for developing and delivering innovations in health [1, 16]. Scalability assessments target certain key components or properties that are critical for scale-up. For example, many health innovations are scaled up in the absence of evidence of beneficial impact [16], a scalability component that is an essential predictor of successful scale-up [3, 6, 21]. Scalability assessments should also anticipate known pitfalls of scale-up, that is, elements that have compromised the success of scaling up, such as the replication of harms at scale [24]. While few studies focus on scale-up failures, studies that do so can throw into relief gaps that otherwise might be overlooked [25]. Failing to involve patients and the public, especially those who may be socially excluded owing to age, ethnicity, or sex and gender, may also result in poor programmatic outcomes, as scale-up could overlook the concerns of its intended beneficiaries [14, 15, 26]. In addition to the complex strategic, political and environmental considerations surrounding scale-up, end-users (e.g. policy-makers, implementers) lack theoretical, conceptual and practical tools for guiding scalability assessments in health settings [27]. In Canada, many innovation teams have expressed the need for a validated tool for scalability assessment in primary healthcare [2, 3]. No previous knowledge synthesis has been conducted on the measurement properties (i.e. quality aspects such as reliability, validity and responsiveness) of scalability assessment tools. Thus, we aimed to review existing tools for assessing the scalability of health innovations, describing how the tools were created and validated, and describing the scalability components they target. Our research question was as follows: "What tools are available for assessing the scalability of innovations in health, how were they created, what are their measurement properties, and what components do they target?"

We performed a systematic review with a comprehensive overview of the components targeted by scalability assessment tools and their measurement properties. We adapted and followed the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology for systematic reviews [28]. We reported the review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [29] and the COSMIN reporting recommendations [28]. In this manuscript, the noun "report" refers to a document (paper or electronic) supplying information about a study, and the noun "record" refers to the title or abstract of a report indexed in a database or website [29]. We registered this review in the International Prospective Register of Systematic Reviews (PROSPERO) on 2 May 2019 (registration identifier: CRD42019107095) [30].

Following the COSMIN approach, we used the following eligibility criteria.
• Construct: We included any tool aiming to assess or measure scalability of innovations in health. WHO defines health as "a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity".
According to the International Classification of Health Interventions, types of health innovations could include management, prevention, therapeutic, diagnostic, other (i.e. not classified elsewhere) or unspecified [3, 31].
• Population: We included any type of stakeholder or end-user. Stakeholders refer to persons who were involved in the conception, creation or validation of the tool [32]. End-users refer to individuals such as policy-makers who are likely to use the tool to make decisions about scaling up an innovation [33]. Stakeholders can also be end-users, and both can include patients and the public, healthcare providers, policy-makers, investigators, trainees and funders [14]. End-users can be involved in the creation or validation process of the tool, and the level of their involvement may vary from minimal (i.e. receiving information about it, but with no contributing role) to coproducing the tool (i.e. participating as an equal member of the research team) [14, 34, 35].
• Instrument: We included any tool containing items proposed for assessing the scalability of an innovation in health. A tool refers to a structured instrument such as a guide, framework, questionnaire, factors, facilitators or barriers. Items refer to individual elements of the tool such as questions or statements that were mapped to targeted components.
• Measurement properties: We included any reports presenting (1) creation of a scalability assessment tool, (2) assessment of one or more measurement properties of the tool or (3) assessment of the interpretability of the tool. A measurement property is defined as a quality aspect of a tool, i.e. reliability, validity and responsiveness [28]. We included any of the following nine measurement properties: content validity, structural validity, internal consistency, cross-cultural validity or measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness.
We excluded any study protocol and any editorial material, defined as an article that gives the opinions of a person, group or organization (e.g. editorials, commentaries and letters). In other words, we included any empirical research which aimed to investigate the creation, validation or interpretability of a scalability assessment tool in health settings (Table 1).

Overall, we performed a comprehensive search to identify records through both electronic databases of peer-reviewed literature and secondary searches, including hand searching relevant websites, screening reference lists of included or relevant reports, and consulting experts in the field of scale-up. There was no restriction regarding language, date or country of publication, or type of reports. First, we searched Embase via embase.com, MEDLINE via Ovid, CINAHL via EBSCO, Web of Science, PsycINFO via Ovid, the Cochrane Library, and ERIC via EBSCO from their inception to 20 March 2019. An information specialist with the Unité de soutien SSA Québec [36] (NR) drafted the preliminary version of the search strategy for Ovid MEDLINE. The search terms were based on previous works to reflect three concepts: scalability [1], tool [37] and health [38]. The preliminary search strategy was reviewed by eight international experts (ABC, HTVZ, LW, JP, MZ, AJM, JS and MMR), and then by a second information specialist, following the PRESS guideline [39].
The experts were university-based investigators (from Benin, Togo, Comoros, Australia, and Canada) and experts in knowledge mobilization, health services research, health research methodology and scaling up. We resolved any disagreements through a consensus meeting between the two information specialists and a third party (ABC and HTVZ). The search terms were adapted to the above-mentioned databases by removing search terms related to the concept of health in all biomedical databases; the difference in the number of records found in MEDLINE when removing health-related terms was minimal (104 records out of a total of 2528). Details of the search strategy in each electronic database can be found in the appendix (Additional file 1).

Second, we identified other records by searching relevant websites, screening reference lists of included or relevant reports, and consulting experts in the field of scale-up. This approach is promoted as a way of reducing publication bias [40]. We consulted Google Scholar, Google web search, and the websites of a list of 24 Canadian and international organizations in both English and French from 10 October to 20 December 2019 (Additional file 2). In French, we used the following keywords: "potentiel de mise à l'échelle", "potentiel de passage à grande échelle", "transférabilité", "mise à l'échelle", "passage à grande échelle", "accroissement d'échelle", "passage à l'échelle" and "diffusion". In English, we used terms related to the concept of scalability including scalability, transferability, readiness, scale, scaling, upscaling and spread (Additional file 1). We also established a list of experts in the field of scale-up and asked them via email about documentation of tools they had created or knew about, from 5 to 29 May 2020 (Additional file 3). The list of experts included authors of reports included in this systematic review, authors of reports included in our previous systematic review [1], members of the 12 Canadian Institutes of Health Research (CIHR)-funded Community-Based Primary Health Care (CBPHC) teams [2, 41], and members of the Research on Patient-Oriented Scaling-up (RePOS) network [14].

First, we operationalized eligibility criteria using questions with the following responses: "met", "not met" and "unclear". Five author reviewers (ABC, AG, MAS, JM and YMO) independently screened a random sample of 5% of records identified with our literature search. We discussed the results of this pilot and reviewed the eligibility criteria. Second, two senior end-users and experts in scaling up (JP and MZ) independently screened five records and suggested a minor change in wording to clarify eligibility criteria. Third, the same five reviewers independently piloted the selection of another random sample of 5% of the remaining records. We calculated inter-reviewer agreement between these five reviewers using the weighted Cohen's kappa [42] and considered it substantial when we reached a value of at least 0.60 [43]. Fourth, the five reviewers (ABC, AG, MAS, JM and YMO) independently screened all remaining records. We detail the assignment of records and the kappa calculation in the appendix (Additional file 4).
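The following minimal sketch illustrates how such an agreement check can be computed. The screening decisions shown are hypothetical, and the use of Python and scikit-learn's cohen_kappa_score with linear weights is one possible implementation offered for illustration only; the review itself used an Excel form and SAS.

```python
# Minimal, hypothetical sketch of the agreement check: weighted Cohen's kappa
# between two reviewers' screening decisions, compared against the 0.60 threshold.
from sklearn.metrics import cohen_kappa_score

# Hypothetical screening decisions for the same sample of records
reviewer_1 = ["met", "unclear", "not met", "met", "not met", "met", "unclear", "not met"]
reviewer_2 = ["met", "met",     "not met", "met", "unclear", "met", "unclear", "not met"]

# Order labels from "not met" to "met" so that linear weights reflect ordinal distance
labels = ["not met", "unclear", "met"]
kappa = cohen_kappa_score(reviewer_1, reviewer_2, labels=labels, weights="linear")

print(f"Weighted kappa = {kappa:.2f}")
print("Substantial agreement" if kappa >= 0.60 else "Below 0.60: revisit the eligibility criteria")
```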
Fifth, two reviewers (ABC and MAS) assessed all potentially relevant reports to identify reports meeting the eligibility criteria. For all ineligible reports, we documented the main reason for exclusion. Finally, in all steps, we resolved all disagreements through consensus among reviewers in face-to-face meetings and, when required, with the project leader (ABC).

Table 1 Eligibility criteria
• Aim of study. Inclusion: development of a tool (e.g. a guide, framework, questionnaire, factors, facilitators or barriers); assessment of one or more measurement (or psychometric) properties of a tool; assessment of the interpretability of a tool. Exclusion: the study did not present or describe a guide, framework, questionnaire, factors, facilitators or barriers (hereafter referred to as "tool").
• Aim of the tool. Inclusion: tool aiming to assess the scalability of an innovation (i.e. potential or readiness for scale-up, for spread, for transfer, for diffusion or for system-wide implementation). Exclusion: the tool is not intended to be used for assessing the scalability of an innovation. Question: does the tool aim to evaluate the scalability of an innovation?
• Context. Inclusion: any health context. Exclusion: the tool is not intended to be used for an innovation in the field of health. Question: is the tool intended to be used for innovation in the field of health?

Records that referred to the same report were considered duplicates, but records that referred to reports that were merely similar were considered unique [29]. We used EndNote X9 software to identify duplicates and an Excel form for the selection process.

We developed an Excel form to guide extraction of variables based on the COSMIN manual [28]. Six reviewers (ABC, HTVZ, AG, MAS, JM and YMO) performed a calibration exercise to ensure that the form captured all relevant data. Then two reviewers (ABC and MAS) independently extracted data using the Excel form. The following information was extracted from each included unique report:
• characteristics of included tools (e.g. type, date of issue or publication, funding support, language, stakeholder, open-access source, name, scalability components targeted, content and pitfall predictions);
• intended context of use (e.g. income level of country, healthcare level, focus area, end-user and aim); and
• data that could be considered sources of validity for measurement properties.
For example, data regarding the tool's content validity could include test blueprint, representativeness of items in relation to the scalability component, logical or empirical relationship of content tested to scalability component, strategies to ensure appropriate content representation, item writer qualifications, and analyses by experts regarding how adequately items represent the content of the scalability component [44]. All disagreements were resolved through consensus between ABC and MAS in face-to-face and virtual meetings. We used Microsoft Teams for the virtual meetings.
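To illustrate how the extracted variables can be organized, the sketch below represents one extraction record as a simple data structure. The field names paraphrase the variables listed above and do not reproduce the actual Excel form; the example values are hypothetical.

```python
# Hypothetical sketch of one extraction record; field names paraphrase the
# variables described in the text and are not the actual Excel form headings.
from dataclasses import dataclass
from typing import List

@dataclass
class ExtractionRecord:
    tool_name: str                  # or authors/organization if the tool is unnamed
    tool_type: str                  # "scale", "checklist" or "criteria"
    year: int
    funding: str                    # e.g. governmental, nongovernmental, not reported
    languages: List[str]
    open_access: bool
    stakeholders: List[str]         # types of stakeholders involved in creation or validation
    components_targeted: List[str]  # e.g. ["C3", "C5", "C8"]
    intended_context: str           # income level of country, healthcare level, focus area
    validity_evidence: str          # notes later used as sources of content validity evidence

record = ExtractionRecord(
    tool_name="Example tool", tool_type="checklist", year=2016,
    funding="governmental", languages=["English"], open_access=True,
    stakeholders=["policy-makers"], components_targeted=["C3", "C5", "C8"],
    intended_context="primary healthcare, high-income country",
    validity_evidence="items listed without justification",
)
```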
We used the COSMIN Risk of Bias checklist to assess the methodological quality of included tools [28]. This checklist contains one box with standards for assessing the tool's methodological quality and nine boxes for assessing the methodological quality of studies that reported measurement properties for tools. In this review, because there were very limited data on content validity and no data on other measurement properties, we assessed the methodological quality of tool creation only, which is also part of content validity. Two reviewers (ABC and MAS) independently assessed the quality of all included tools after a pilot using a sample of two tools. We resolved all disagreements through consensus between ABC and MAS in virtual meetings using Microsoft Teams.

The COSMIN standards for tool creation consist of 35 items divided into two parts [45]: Part A addresses the quality of the design and Part B the quality of the pilot study. Part A includes a concept elicitation study performed with end-users to identify relevant items for a new tool, and a clear description of the construct and how it relates to the theory or conceptual framework from which it originates. Part B includes a pilot study performed with end-users to evaluate comprehensiveness and comprehensibility. Each standard is scored on a four-point rating scale: "very good", "adequate", "doubtful" or "inadequate". A standard is rated as "doubtful" if it is doubtful whether the quality aspect is adequate (i.e. minor methodological flaws), and "inadequate" when evidence is provided that the quality aspect is not adequate (i.e. important methodological flaws) [28]. Where a score for a standard was not requested, the option "not applicable" was available. Total scores are determined separately for concept elicitation and pilot test. A total score per tool is obtained by taking the lowest rating of any item (i.e. worst score counts).

We analysed and summarized extracted data using a narrative approach involving framework and content analysis [46]. We created an integrated framework of categories for the purpose of this study based on recent work on scaling up. All classification was carried out independently by two reviewers (ABC and MAS) and all disagreements were resolved through consensus in virtual meetings using Microsoft Teams. We used the PRISMA 2020 flowchart to describe the process of tool selection [29]. We summarized the main characteristics of tools, including components targeted by the tools and their methodological quality, in a tabular display using SAS 9.4 software.

First, we classified each tool using the three types: (1) scale, (2) checklist or (3) set of criteria. To be considered a scale, each item within the tool had to have a numeric score attached to it so that an overall summary score could be calculated. To be considered a checklist, the tool had to include multiple items for checking whether scalability criteria were met. To be considered "criteria", the tool had to include a list of items (questions or statements) with no proposed responses. Second, we mapped each item of each tool to the following 12 possible components targeted by the tool: (C1) health problem addressed by the innovation; (C2) development process of the innovation; (C3) innovation characteristics; (C4) strategic, political or environmental context of the innovation; (C5) evidence available for effectiveness of the innovation; (C6) innovation costs and quantifiable benefits; (C7) potential for implementation fidelity and adaptation of the innovation; (C8) potential reach and acceptability to the target population; (C9) delivery setting and workforce; (C10) implementation infrastructure required for scale-up; (C11) sustainability (i.e. longer-term outcomes of the scale-up); and (C-Other) other components. This classification was based on Milat's 10-component framework [22], to which we added items related to the development process of the innovation, such as the use of a theoretical, conceptual or practical framework (C2) [2, 3], which is the primary stage of scale-up [16].
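The sketch below illustrates, with invented data, how this item-to-component mapping and the COSMIN "worst score counts" rule described above can be operationalized. It shows the logic only; the item texts, mappings and ratings are hypothetical and this is not the code used in the review.

```python
# Hypothetical sketch of the analysis logic described above.
from statistics import median

# Each item of a tool is mapped to one or more of the 12 component codes (C1-C11, C-Other)
tool_items = {
    "Is there evidence that the innovation is effective?": ["C5"],
    "Can the delivery workforce absorb the innovation at scale?": ["C9", "C10"],
    "Is the innovation acceptable to the target population?": ["C8"],
}

# Components targeted by the tool = union of the components its items map to
components_targeted = sorted({c for comps in tool_items.values() for c in comps})
print("Components targeted:", components_targeted)

# Descriptive statistics across tools, e.g. the median number of items per tool
items_per_tool = [16, 13, 29, 8, 21]          # hypothetical counts
print("Median items per tool:", median(items_per_tool))

# COSMIN "worst score counts": the total score per tool is the lowest rating of any standard
RATING_ORDER = {"inadequate": 0, "doubtful": 1, "adequate": 2, "very good": 3}
standard_ratings = ["adequate", "doubtful", "very good"]   # hypothetical ratings for one tool
overall = min(standard_ratings, key=RATING_ORDER.get)
print("Overall methodological quality:", overall)          # -> "doubtful"
```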
Third, we determined whether each tool included items related to eight potential pitfalls to be anticipated when planning scale-up of the innovation. Six of those pitfalls were based on a rapid review of points of concern regarding the success or failure of scale-up efforts [24]. To these six pitfalls we added patient and public involvement and sex and gender, which concern whether the development or piloting of the innovation excluded its targeted beneficiaries (e.g. excluding women from a programme about women's health) [1, 14, 15, 26]. The expanded pitfalls thus consisted of the following: (P1) sex and gender considerations; (P2) patient and public involvement; (P3) the difficulty of cost-effectiveness estimates; (P4) the production of health inequities; (P5) scaled-up harm; (P6) ethics (e.g. informed consent at scale); (P7) top-down approaches (i.e. the needs, preferences and culture of beneficiaries of the innovation may be forgotten when scale-up is directed from above); and (P8) context (e.g. difficulty in adapting the innovation to certain contexts). Finally, we adopted a previous rating system to quantify the extent to which sources of validity evidence for measurement properties of the tools were reported: 0 = "no discussion or data presented as a source of validity evidence"; 1 = "data that weakly support the validity evidence"; 2 = "some data (intermediate level) that support the validity evidence, but with gaps"; and 3 = "multiple sets of data that strongly support the validity evidence" [44].

Our electronic search identified 11 299 potentially relevant records. Of these, 2805 were duplicates, leaving 8494 records. Of these, 8422 did not meet the review criteria. With the second random sample of 5% of the 8494 records, we found substantial pairwise inter-reviewer agreement for decisions regarding inclusion, with kappa values ranging from 0.66 to 0.89 across all reviewers (Additional file 4). Finally, we reviewed a total of 72 reports, retained 13 [2, 47-58] and excluded 59 (Additional file 5). In addition, our secondary searches led to the inclusion of 18 additional reports [3, 6, 20-22, 118-130]. Overall, we included a total of 31 reports from all sources [2, 3, 6, 20-22, 47-58, 118-130], which described a total of 21 unique tools (Fig. 1).

We included the following tools: the Innovation Scalability Self-administered Questionnaire (ISSaQ) [2, 3], the AnalySe de la Transférabilité et accompagnement à l'Adaptation des Interventions en pRomotion de la santE (ASTAIRE) [53, 54], the Process model for the assessment of transferability (PIET-T) [55], the CORRECT attributes [6, 121, 122], the scalability assessment framework [57], the Intervention Scalability Assessment Tool (ISAT) [22], the Readiness to Spread Assessment Scoring Sheet [125], the Readiness to Receive Assessment Scoring Sheet [126], the Applicability and Transferability of Evidence Tool (A&T Tool) [118, 119], the Scalability Assessment and Planning (SAP) Toolkit [130] and the Scalability Checklist [127-129]. We did not find names for 10 of the tools [20, 21, 47-52, 56, 58, 120, 123, 124]. Characteristics of included tools are outlined in Table 2.

Type and source of tools: most tools were criteria (n = 10, 47.6%), followed by scales (n = 7, 33.3%) and checklists (n = 4, 19.0%). Included tools were created or published from 2005 onwards and the majority since 2010 (n = 19, 90.5%). Their creation was funded by governmental or nongovernmental organizations (n = 16, 76.2%).
All tools were in English; three were translated into French only (14.3%) and one into French and Spanish (4.8%). Most tools were available through open-access peer-reviewed journals, ResearchGate or organizational websites (n = 18, 85.7%).

Scalability components: all tools targeted multiple components. The most frequently targeted components were potential implementation fidelity and adaptation (81.0%), delivery setting and workforce (81.0%), and implementation infrastructure (81.0%). The three least frequently targeted were health problems addressed by the innovation (57.1%), sustainability (47.6%), and development process of innovations (28.6%) (Table 2).

Content of tools: tools contained a total of 320 items (e.g. questions, statements) mapping to targeted components (Additional file 6). There was a median of 16 items per tool (interquartile range: 13 items). In 286 items, just one scalability component was targeted; in 27 items, two scalability components were targeted; in five items, three scalability components were targeted; and in two items, four scalability components were targeted. Most items covered delivery setting and workforce (68 items), reach and acceptability for the target population (62 items), and evidence available for effectiveness of the innovation (42 items). Components least covered by items were problem addressed by the innovation (19 items), development process of the innovation (16 items), and sustainability (12 items).

Pitfall predictions: most tools included items that considered contextual pitfalls (90.5%) and cost-effectiveness estimation pitfalls (71.4%). Pitfalls least considered were scaled-up harms (14.3%) and health inequities (4.8%) (Table 2).

Stakeholder involvement: no information on stakeholder involvement in tool creation or validation was found for 12 out of the 21 tools (57.1%) (Table 2). No studies reported involving patients or the public, for example, or reported on the sex of tool creators. Tool creation involved single (n = 5, 23.8%) or multiple (n = 4, 19.0%) types of stakeholders, including clinicians, policy-makers, researchers and civil society organizations (Table 2).

Eight tools did not report the income levels of countries for which they were created (38.1%) (Table 3). Six tools were reported as created for use in high-income countries (28.6%), four in low- or middle-income countries (19.0%), two in both (9.5%), and one for transnational transfers from low- or middle-income to high-income contexts (4.8%). Seven tools did not report which healthcare levels they were created for (33.3%) (Table 3). The largest proportion of tools for which this information was reported were created for public or population health (47.6%), primary healthcare (33.3%) or home care (4.8%) initiatives.

Notes to Table 2: (a) We did not find names for 10 of the tools, in which case we indicate names of authors or organizations. (b) We found no discussion or data presented as a source of validity evidence for the eight other measurement properties; 1 = only a limited amount of data (e.g. simply listing items without justification); 2 = listing items with some references and justifications, limited description of the process for creating the tool; 3 = well-defined process for developing tool content, including both an explicit theoretical, conceptual or practical basis for the tool items and systematic item review by experts.
(c) According to COSMIN definitions, a standard is rated as "doubtful" if it is doubtful whether the quality aspect is adequate (i.e. minor methodological flaws), and "inadequate" when evidence is provided that the quality aspect is not adequate (i.e. important methodological flaws). (d) CORRECT attributes: C-credible in that they are based on sound evidence or advocated by respected persons or institutions; O-observable to ensure that potential users can see the results in practice; R-relevant for addressing persistent or sharply felt problems; R-relative advantage over existing practices so that potential users are convinced the costs of implementation are warranted by the benefits; E-easy to install and understand rather than complex and complicated; C-compatible with the potential users' established values, norms and facilities, and fit well into the practices of the national programme; and T-testable so that potential users can see the innovation on a small scale prior to large-scale adoption.

Nine tools did not report on the focus area (42.9%) (Table 3). The largest proportion of tools for which this information was reported were created for innovations related to reproductive, maternal, newborn, child or adolescent health (n = 7, 33.3%). We found no information about intended end-users for 11 tools (52.4%) (Table 3). Tools for which this information was reported were intended for researchers, policy-makers, programme managers, healthcare providers or funders (n = 10, 47.6%). No tool was created for lay end-users, including patients or the public. All tools presented information for content validity, but most tools (n = 18, 85.7%) provided limited information (e.g. simply listing items without justification, limited description of the process for creating the tool). Only three tools (14.3%) provided multiple sets of information that strongly supported content validity, such as descriptions and origins of constructs, or comprehensibility and comprehensiveness of items. No tool reported on the other measurement properties. According to COSMIN standards, the methodological quality of tools was deemed inadequate in 61.9% of cases (n = 13) and doubtful in 38.1% of cases (n = 8) (Table 3). The main reason was that design requirements were not met: for example, there was no clear description of the target population, context of use, or the tool's evaluative or predictive purpose.

We reviewed tools proposed for assessing the scalability of innovations in health. Altogether, identified tools targeted 11 scalability components and predicted eight pitfalls of scale-up. All included tools were created or published since 2005, but their methodological quality was inadequate or doubtful. No studies reported that patients or the public were involved in the creation or validation process of tools, and there was limited information on how the tools were intended to be used or on their intended end-users. These findings lead us to make the following observations.

First, all items found in the included tools were covered by our 11 defined scalability components, confirming that these classifications come close to reflecting the full range identified by others [22], and were enriched by items contributing to avoiding identified pitfalls such as replication of harms. Scalability assessment should ensure that innovations do not replicate social inequities when implemented at scale [15, 24, 131, 132]. For example, if the design of an innovation to be scaled up was based on the male body as the norm [131], its scale-up could reproduce harmful outcomes at scale.
This is the case with the conventional seat belt: seat belts are not tested with pregnant women, and their design has undergone almost no changes since they were first patented in 1958 [133]. Yet car crashes are the main cause of foetal deaths related to maternal trauma. The force of the seat belt against a pregnant woman's abdomen can lead to placental abruption, causing foetal death [133]. More scalability assessments should also involve patients and the public [1, 14]. For example, members of the advisory committee, together with patient representatives and other stakeholders, could visit actual or potential sites to review arrangements for the project and to assess the potential for scale-up if the innovation proves successful. Discussion with providers, programme managers and community members could provide insights into how the project will be implemented on the ground and possible challenges and opportunities for scaling up, and could inspire reflection on possible adjustments to enhance its scalability [15, 124]. Certain scalability components could be less relevant for some innovations depending on the political circumstances, or on whether they are outcome evaluations under ideal circumstances (efficacy) or real-world circumstances (effectiveness) [2, 3, 15]. In addition, epidemics (e.g. COVID-19) have highlighted how dramatically scalability considerations can change when the world changes [1, 15].

Second, included tools were created or published since 2005, had inadequate or doubtful methodological quality, and most were of the "criteria" type. As key psychometric properties of these tools are yet to be established, for many of the tools there is still insufficient evidence to justify their claims. Future reviews involving the use of included tools could therefore limit their searches to 2005 onwards. Our results suggest that scalability assessment tools for health are still in their infancy. Previous studies confirm this, particularly in high-income countries [1, 22, 27, 134]. Indeed, the sophistication of our included tools varied from a simple list of items (i.e. criteria) to elaborate scales [135], although none of these had been validated [22, 27]. There were also important limitations in terms of sample representativity in the creation or validation of tool content. Intended context of use, for example, and content validity, the primary measurement property, were not fully addressed in most of the included tools [135]. However, we believe that content validation may increase over time as we learn more about the notion of scalability [136]. Nevertheless, for end-users wanting to adopt an existing tool or create a new one, we propose a useful inventory of items (Additional file 6). We also hope to create a repertory of existing items whose language is accessible to lay end-users, including patients and the public. This will contribute to increasing patient and public involvement in the science and practice of scale-up in health and social services [14].

Third, we noticed an absence of patient and public involvement in the creation of the scalability assessment tools. Patient perspectives are not only essential in innovation development; they are also important in the creation of scalability assessment tools [14, 15], asking the right questions and providing suggestions regarding items to include [135].
Although researchers, clinicians and policy-makers may be well positioned to describe the nature, scope and impact of a health problem that is being addressed, only those who experience the issues can report on the more subjective elements [135]. When appropriate, innovation teams have a responsibility to work with target patients to anticipate potential benefits and risks associated with scaling up, and to learn what risks they are willing to accept at each step of scale-up [15]. In practice, however, involving multiple stakeholders, including patients and the public, in scalability assessments is a highly complex process [14, 15]. We have established the RePOS network to build patient-oriented research capacity in the science and practice of scaling up and to ensure that patients, the public and other stakeholders are meaningfully and equitably engaged [14]. This international network will undertake the next phase of this review, conducting a multi-stakeholder consensus exercise to propose patient-oriented scalability assessment tools.

Finally, we acknowledge that our findings should be interpreted with caution. First, the interpretability criteria for what constitutes a useful item (e.g. reading level, lack of ambiguity, asking only a single question) are not met by all items listed in our inventory [135, 137]. However, at this early stage in the creation of scalability assessment tools, our interest is in creating an item pool. We aimed to be as inclusive as possible, even to the point of being overinclusive, as nothing can be done after the fact to compensate for items we neglected to include. Indeed, our research findings can be used to detect and weed out poor items using interpretability criteria proposed in the literature for item selection (Additional file 7) [135, 137, 138]. Second, characteristics of the innovation are important in scalability assessments, but other assessments are equally relevant. Examples include comparing effects over time, namely at different stages of scale-up, so that innovations can be refined as coverage expands [27], and taking into account ongoing interactions between the innovation and its potential contexts [21, 23].

We reviewed and inventoried tools proposed for assessing the scalability of innovations in health and described the scalability components they targeted. Overall, the included tools covered many components of scalability and helped predict the pitfalls of scale-up in health such as the replication of harms at scale. However, our findings show that these tools are still at an early stage of creation and their key psychometric properties are yet to be established. Scalability is a relatively new concept, and as our understanding of this construct evolves, tools will need to be revised accordingly. Our review may aid future investigators in weighting or prioritizing where planning and actions for scale-up should focus. Future studies could further compare and contrast the identified tools to illuminate the many perspectives on scale-up and the diverse approaches needed. Further analyses of our identified tools could also deepen understanding of how implementers, including patient partners, evaluate scalability components and how tools differ in their incorporation of evidence about acceptability. We also need to identify further scalability components, nuances of components already identified, and precisely how each scalability component contributes to the scale-up process.
References
Effective strategies for scaling up evidence-based practices in primary care: a systematic review
Assessment of scalability of evidence-based innovations in community-based primary health care: a cross-sectional study
Assessing the scalability of innovations in primary care: a cross-sectional study
Diffusion of innovations in health service organisations: a systematic literature review
Diffusion of innovations
Nine steps for developing a scaling-up strategy
A country of perpetual pilot projects
Statewide implementation of evidence-based programs
Research: increasing value, reducing waste
Improving implementation: building research capacity in maternal, neonatal, and child health in Africa
Implementation science: what is it and why should I care?
Knowledge translation in health care: moving from evidence to practice
Applying systems thinking to knowledge mobilisation in public health
Strategies for involving patients and the public in scaling-up initiatives in health and social services: protocol for a scoping review and Delphi survey
Scaling impact: innovation for the public good
Pathways for scaling up public health interventions
A model for 'reverse innovation' in health care
Reverse innovation: a systematic literature review
From pilot project to system solution: innovation, spread and scale for health system leaders
The concept of scalability: increasing the scale and potential adoption of health promotion interventions into policy and practice
A guide to scaling up population health interventions
Intervention scalability assessment tool: a decision support tool for health policy makers and implementers
Spreading and scaling up innovation and improvement
The pitfalls of scaling up evidence-based interventions in health
Millions saved: new cases of proven success in global health
Integrating gender into the scale-up of family planning and maternal neonatal and child health programs
Assessing scalability of an intervention: why, how and who? Health Policy Plan
COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs)
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
Tools for assessing the scalability of health innovations: a systematic review
International classification of health interventions (ICHI)
Successful communication: a toolkit for researchers and civil society organisations
Canadian Institutes of Health Research (CIHR)
The Montreal model: the challenges of a partnership relationship between patients and healthcare professionals
Patient and family engagement: a framework for understanding the elements and developing interventions and policies
Unité soutien SRAP
Risk assessment tools for the prevention of pressure ulcers. Cochrane Database Syst Rev
Biomedical information of the Dutch Library Association (KNVI)
PRESS peer review of electronic search strategies: 2015 guideline statement
Methods guide for effectiveness and comparative effectiveness reviews
Promoting cross-jurisdictional primary health care research: developing a set of common indicators across 12 community-based primary health care teams in Canada. Prim Health Care Res Dev
Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit
The measurement of observer agreement for categorical data
Technical skills assessment toolbox: a review using the unitary framework of validity
COSMIN methodology for assessing the content validity of PROMs
Content validity-establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 1-eliciting concepts for a new PRO instrument
Assessment of applicability and transferability of evidence-based antenatal interventions to the Australian indigenous setting
Strengthening scaling up through learning from implementation: comparing experiences from Afghanistan, Bangladesh and Uganda
Criteria to assess potential reverse innovations: opportunities for shared learning between high- and low-income countries
How do we know when research from one setting can be useful in another? A review of external validity, applicability and transferability frameworks
When can research from one setting be useful in another? Understanding perceptions of the applicability and transferability of research
Transferability of interventions in health education: a review
A tool to analyze the transferability of health promotion interventions
Un outil pour accompagner la transférabilité des interventions en promotion de la santé: ASTAIRE
Criteria for evaluating transferability of health interventions: a systematic review and thematic synthesis
'Scaling-up is a craft not a science': catalysing scale-up of health innovations in Ethiopia, India and Nigeria
Understanding and measuring scalability in disaster risk reduction
Applicability and transferability of interventions in evidence-based public health
'The development sector is a graveyard of pilot projects!' Six critical actions for externally funded implementers to foster scale-up of maternal and newborn health innovations in low- and middle-income countries
'The stars seem aligned': a qualitative study to understand the effects of context on scale-up of maternal and newborn health innovations in Ethiopia
A conceptual framework for transferring research to practice
A contextual factors framework to inform implementation and evaluation of public health initiatives
A new scale to assess barriers to adopting motivational interviewing
A process evaluation of the scale up of a youth-friendly health services initiative in northern Tanzania
A quantitative, multi-national and multi-stakeholder assessment of barriers to the adoption of cell therapies
A systems-based typological framework for understanding the sustainability, scalability, and reach of childhood obesity interventions. Children's Health Care
Adapting and RE-AIMing a heart disease prevention program for older women with diabetes
An efficient and inexpensive system for the distribution and tracking of investigational medicinal products in streamlined safety trials
An instrument to measure maturity of integrated care: a first validation study
Applying a framework for assessing the health system challenges to scaling up mHealth in South Africa
Assessing the maturity of the healthcare system for integrated care: testing measurement properties of the SCIROCCO tool
Barriers and facilitators of transferring research to practice: an exploratory case study of motivational interviewing
Becoming baby friendly: a complex adaptive systems toolbox for scaling up breastfeeding programs globally
Becoming breastfeeding friendly index: development and application for scaling-up breastfeeding programmes globally
Community readiness within systems of care: the validity and reliability of the system of care readiness and implementation measurement scale (SOC-RIMS)
Criteria for evaluating transferability of child health interventions: a systematic review
Defining dimensions of research readiness: a conceptual model for primary care research networks
Development of a serious illness care model implementation framework
Differences in the scalability of formal and informal in-home care of urban elderly
Diffusion of an effective tobacco prevention program. Part I: evaluation of the dissemination phase
Enablers and inhibitors: a review of the situation regarding mHealth adoption in low- and middle-income countries
Exploring barriers and enablers for scaling up a community-based grain bank intervention for improved infant and young child feeding in Ethiopia: a qualitative process evaluation
Exploring the adoption and scale of mobile health solutions: antenatal mobile application data exchange in Nigeria
I-RREACH: an engagement and assessment tool for improving implementation readiness of researchers, organizations and communities in complex interventions
Implementation factors affecting the large-scale deployment of digital health and well-being technologies: a qualitative study of the initial phases of the "Living-It-Up" programme
Information technology in complex health services: organizational impediments to successful technology transfer and diffusion
Is the EUnetHTA HTA Core Model® fit for purpose? Evaluation from an industry perspective
Key factors influencing adoption of an innovation in primary health care: a qualitative study based on implementation theory
Knowledge transfer methods in German disease prevention and health promotion. A survey of experts in the federal prevention research program
Knowledge transfer: theoretical framework to systematically spread best practice
Large-scale improvement initiatives in healthcare: a scan of the literature
Measuring maturity of integrated care: a first validation study
Measuring quality improvement in public health: the development and psychometric testing of a QI maturity tool
Pilot of a low-resource, EHR-based protocol for sepsis monitoring, alert, and intervention
Prior schemata transfer as an account for assessing the intuitive use of new technology
Prioritizing health system and disease burden factors: an evaluation of the net benefit of transferring health technology interventions to different districts in Zimbabwe
Psychometric properties of the Norwegian version of the evidence-based practice attitude scale (EBPAS): to measure implementation readiness
Qualitative comparative analysis: a mixed-method tool for complex implementation questions
Scaling up [to] a population health intervention: a readiness assessment framework
Scaling up a community-based grain bank intervention for improved infant and young child feeding (IYCF) in Ethiopia
Scaling up a school-based sexual and reproductive health intervention in rural Tanzania: a process evaluation describing the implementation realities for the teachers
Scaling up a tobacco control intervention in low resource settings: a case example for school teachers in India
Scaling up strategies of the chronic respiratory disease programme of the European innovation partnership on active and healthy ageing: executive summary
Strategies to facilitate the implementation of collaborative practice agreements in chain community pharmacies
Sustainability and scalability of a volunteer-based primary care intervention (Health TAPESTRY): a mixed-methods analysis
The development and validation of a meta-tool for quality appraisal of public health evidence: meta quality appraisal tool (MetaQAT)
The reach, transferability, and impact of the evidence-based practice questionnaire: a methodological and narrative literature review
Towards a framework to measure knowledge transfer in organizations
Transferability of health technology assessments and economic evaluations: a systematic review of approaches for assessment and application
Transferability of model-based economic evaluations: the case of trastuzumab for the adjuvant treatment of HER2-positive early breast cancer in the Netherlands
Transferability of national institute for health and clinical excellence recommendations for pharmaceutical therapies in oncology to Central-Eastern European countries
"Translational formative evaluation": critical in up-scaling public health programmes
Type II translation: transporting prevention interventions from research to real-world settings
Understanding quality improvement at scale in general practice: a qualitative evaluation of a COPD improvement programme
Using a framework for spread of best practices to implement successful venous thromboembolism prophylaxis throughout a large hospital system
Using nominal group technique among clinical providers to identify barriers and prioritize solutions to scaling up opioid agonist therapies in Ukraine
Winning big
Can I use this evidence in my program decision? Assessing applicability and transferability of evidence. Hamilton
Évaluation de l'applicabilité et de la transférabilité des données probantes: Puis-je utiliser ces données probantes dans mes décisions de programmes?
Increasing the scale of population health interventions: a guide. Evidence and Evaluation guidance series, population and public health division. Sydney: NSW Ministry of Health, Centre for Epidemiology and Evidence
Nueve pasos para formular una estrategia de ampliación a escala. Geneva: World Health Organization
Avoir le but à l'esprit dès le début : la planification des projets pilotes et d'autres recherches programmatiques pour un passage à grande échelle réussi. World Health Organization, Department of Reproductive Health and Research-ExpandNet
Beginning with the end in mind: planning pilot projects and other programmatic research for successful scaling up. Geneva: World Health Organization
Canadian Foundation for Healthcare Improvement (CFHI)
Canadian Foundation for Healthcare Improvement (CFHI)
Scaling up-from vision to large-scale change: a management framework for practitioners
Scaling up-from vision to large-scale change: a management framework for practitioners
Scaling up-from vision to large-scale change: a management framework for practitioners
Toolkit: scalability assessment and planning (SAP) (including workshop guidance)
Exploring equity in primary-care-based physical activity interventions using PROGRESS-Plus: a systematic review and evidence synthesis
Applying an equity lens to interventions: using PROGRESS ensures consideration of socially stratifying factors to illuminate inequities in health
Analysis of the design of a car seatbelt: a study of the invention and a proposal to minimize the risk of injuries during pregnancy
Narrative review of models and success factors for scaling up public health interventions
Health measurement scales: a practical guide to their development and use
Content validity in psychological assessment: a functional approach to concepts and methods
Content validity-establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 2-assessing respondent understanding
Criteria for item selection for a preference-based measure for use in economic evaluation

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements: We wish to acknowledge the following persons for their dedicated assistance with various aspects of this systematic review: Dr Arlene Bierman, M. Frédéric Bergeron, Dr Louisa Blair and Dr Annie LeBlanc. Also, we thank Dr Louisa Blair, English-language editor, for her kind help with the manuscript.
Supplementary information: The online version contains supplementary material available at https://doi.org/10.1186/s12961-022-00830-5. Additional file 2: List of relevant websites used to identify potential eligible records. Additional file 3: Email sent to experts to identify potential eligible records.
Availability of data and materials: Please send all requests for study data or materials to Dr Ali Ben Charif (ali.bencharif@gmail.com) or Dr France Légaré (france.legare@mfa.ulaval.ca).
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors declare that they have no competing interests.