key: cord-0817567-j7to9aug
authors: Page, Matthew J; Moher, David; Bossuyt, Patrick M; Boutron, Isabelle; Hoffmann, Tammy C; Mulrow, Cynthia D; Shamseer, Larissa; Tetzlaff, Jennifer M; Akl, Elie A; Brennan, Sue E; Chou, Roger; Glanville, Julie; Grimshaw, Jeremy M; Hróbjartsson, Asbjørn; Lalu, Manoj M; Li, Tianjing; Loder, Elizabeth W; Mayo-Wilson, Evan; McDonald, Steve; McGuinness, Luke A; Stewart, Lesley A; Thomas, James; Tricco, Andrea C; Welch, Vivian A; Whiting, Penny; McKenzie, Joanne E
title: PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews
date: 2021-03-29
journal: BMJ
DOI: 10.1136/bmj.n160
sha: f17e8e4be67ddd09453a6fa8d85dc7b6a9bd206f
doc_id: 817567
cord_uid: j7to9aug

The methods and results of systematic reviews should be reported in sufficient detail to allow users to assess the trustworthiness and applicability of the review findings. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement was developed to facilitate transparent and complete reporting of systematic reviews and has been updated (to PRISMA 2020) to reflect recent advances in systematic review methodology and terminology. Here, we present the explanation and elaboration paper for PRISMA 2020, where we explain why reporting of each item is recommended, present bullet points that detail the reporting recommendations, and present examples from published reviews. We hope that changes to the content and structure of PRISMA 2020 will facilitate uptake of the guideline and lead to more transparent, complete, and accurate reporting of systematic reviews.

"Currently there is no clear evidence to indicate which surgery is the best choice. It is unclear if the older operations that were previously available (such as anterior repair and colposuspension) really result in equivalent or better outcomes than the polypropylene mid-urethral sling. However, the feeling of our clinical experts who used to offer colposuspension and traditional slings is that these techniques had more frequent and severe associated complications and returning to them may be detrimental to women. To enable women to make an evidence-based choice and inform practice guidelines, it is essential to collect reliable evidence in a transparent, concise manner to allow impartial counselling of women regarding the benefits and risks of the alternative surgical operations for the management of stress urinary incontinence. The wide range of surgical operations available, the different techniques used to perform these operations and the lack of a consensus among surgeons make it challenging to establish which procedure is the most effective. The existing evidence base, including the Cochrane systematic reviews, has focused on discrete two-way comparisons, with no attempt being made to collate all of the evidence on the surgical options available and rank them in terms of clinical effectiveness, safety and cost-effectiveness. This has resulted in a piecemeal evidence base that is difficult for women and clinicians to interpret. This assessment includes an evidence synthesis of all available randomized controlled trials to determine the relative clinical effectiveness and safety of interventions, a discrete choice experiment (DCE) to explore women's preferences, an economic decision model to determine the most cost-effective treatment and a value-of-information (VOI) analysis to help inform the focus of further research." (11)

Example 4: In a review examining the effects of dietary inorganic nitrate for lowering blood pressure in hypertensive adults, the authors report what information the review seeks to add to current knowledge, and indicate that no systematic review addressing the same question exists:

"…it is well known that the organic nitrates lower blood pressure in hypertensive individuals, which brings about the question of whether inorganic nitrates have the same ability. This review focuses on the dietary alteration component of lifestyle modifications by the use of inorganic nitrate in the treatment of hypertension. The appraisal of the evidence was completed to ultimately help providers make informed decisions regarding interventions to address one of the nation's biggest killers. There was a systematic review published in 2013 that addressed the effects of dietary inorganic nitrate on blood pressure with an overrepresentation of healthy, normotensive participants. That review found that inorganic nitrates decrease blood pressure. For this reason, this review examines studies published from 2013 through 2018 with blood pressure greater than 120/80 mmHg in participants, which would be considered elevated according to the guidelines published by the American College of Cardiology (ACC) and American Heart Association (AHA). The results of this review will contribute towards a greater understanding of possible treatments for hypertension, sequentially resulting in less morbidity and mortality from cardiovascular diseases. At the time of this systematic review, there was no systematic review that evaluated the effects of inorganic nitrate specifically on adults with blood pressure greater than 120/80 mmHg." (12)

Item 4. OBJECTIVES: Provide an explicit statement of the objective(s) or question(s) the review addresses.

Example 1: In a review examining the effects of anti-tumour necrosis factor-blocking agents for rheumatoid arthritis, the authors report a single objective of the review:

"Objectives: To evaluate the benefits and harms of down-titration (dose reduction, discontinuation, or disease activity-guided dose tapering) of anti-tumour necrosis factor-blocking agents (adalimumab, certolizumab pegol, etanercept, golimumab, infliximab) on disease activity, functioning, costs, safety, and radiographic damage compared with usual care in people with rheumatoid arthritis and low disease activity." (13)

Example 2: In a review examining the effects of pre-exposure prophylaxis for the prevention of HIV infection, the authors report five key questions the review addresses:

"Key Questions:

1. What are the benefits of PrEP in persons without pre-existing HIV infection versus placebo or no PrEP (including deferred PrEP) on the prevention of HIV infection and quality of life?

a. How do the benefits of PrEP differ by population subgroups?

b. How do the benefits of PrEP differ by dosing strategy or regimen?

2. What is the diagnostic accuracy of provider or patient risk assessment tools in identifying persons at increased risk of HIV acquisition who are candidates for PrEP?

3. What are rates of adherence to PrEP in U.S. primary care-applicable settings?

4. What is the association between adherence to PrEP and effectiveness for preventing HIV acquisition?

5. What are the harms of PrEP versus placebo or no PrEP when used for the prevention of HIV infection?" (14)

Example 3: In a review examining the effects of mobile health interventions during the perinatal period for mothers in low- and middle-income countries, the authors report the primary objective of the review and specify two questions the review addresses:

"The primary objective of this review was to determine the impact of mother-targeted mHealth educational interventions during the perinatal period in low- and middle-income countries on maternal and neonatal outcomes. Thus, this quantitative review aimed to answer the following questions:

i. What is the impact of mother-targeted mHealth educational interventions on maternal knowledge, self-efficacy and antenatal/postnatal care clinic attendance in low- and middle-income countries?

ii. What is the impact of mother-targeted mHealth educational interventions on neonatal mortality and morbidity in low- and middle-income countries?" (15)

Example 4: In a review examining the effects of screening for esophageal adenocarcinoma in patients with chronic gastroesophageal reflux disease, the authors report two key questions the review addresses:

"In order to determine the effectiveness of screening for esophageal adenocarcinoma among gastroesophageal reflux disease patients, the following key questions were addressed:

1a. In adults (≥ 18 years) with chronic gastroesophageal reflux disease with or without other risk factors, what is the effectiveness (benefits and harms) of screening for esophageal adenocarcinoma and precancerous conditions (Barrett's Esophagus and low- and high-grade dysplasia)? What are the effects in relevant subgroup populations?

1b. If there is evidence of effectiveness, what is the optimal time to initiate and to end screening, and what is the optimal screening interval (includes single and multiple tests and ongoing 'surveillance')?" (16)

Item 5. ELIGIBILITY CRITERIA: Specify the inclusion and exclusion criteria for the review and how studies were grouped for the syntheses.

Example 1: In a review examining the effects of family therapy for people with anorexia nervosa, the authors report the types of studies, participants, interventions, comparators, and outcomes that were eligible for inclusion in the review, and state that there were no restrictions on the type of reports that were eligible (i.e. published or unpublished, any language, any date of publication):

"Types of studies: We include all published or unpublished randomised controlled trials (RCTs). We would also have included cluster-randomised controlled trials and cross-over trials, but we found none. There were no language restrictions, nor did we exclude studies on the basis of the date of publication.

Types of participants: We included people of any age or gender with a primary clinical diagnosis of anorexia nervosa (AN), either or both purging or restricting subtypes, based on DSM (APA 2013) or ICD criteria (WHO 1992) or clinicians' judgement, and of any severity. We included those with chronic AN. We included those with psychiatric comorbidity, with the details of comorbidity documented. Participants may have received the intervention in any setting (including in-, day- or outpatient) and may have started in the trial at the beginning of treatment or part-way through (e.g. after discharge from hospital or some other indication/definition of stabilisation). We included those living in a family unit (of any nature, as described/defined by study authors), and those living outside of a family unit.

Types of interventions: Trials where the intervention describes inclusion of the family in some way and is labelled 'family therapy'. These interventions may have been delivered as a monotherapy or in conjunction with other interventions (including standard care, which may or may not be in the context of an inpatient admission). The main categories of family therapy approaches considered were:

• Structural family therapy

• Systems (systemic) family therapy

• Strategic family therapy

• Family-based therapy and its variants (including short-term, long-term, and separated) and behavioural family systems therapy (these two therapies were grouped together, given the similarity of approach)

• Other (including other approaches that use family involvement in therapy but are less specific about the theoretical underpinning of the therapy and its procedures).

Family therapy approaches were compared with:

• Standard care or treatment as usual

• Biological interventions (for example, antidepressants, antipsychotics, mood stabilisers, anxiolytics, neutraceuticals, and other agents such as anti-glucocorticoids)

• Educational interventions (for example, nutritional interventions and dietetics)

• Psychological interventions (for example, cognitive behavioural therapy (CBT) and its derivatives, cognitive analytical therapy, interpersonal therapy, supportive therapy, psychodynamic therapy, play therapy, other)

• Alternative or complementary interventions (for example, massage, exercise, light therapies).

Additionally, different types of family therapy approaches were compared to each other. The addition of a family therapy approach to other interventions (including standard care) was also compared to other interventions alone. We would also have included the following comparisons: Family therapy approaches versus biological interventions; and Family therapy approaches versus alternative/complementary interventions; however, we had neither the relevant trials nor useable data from these.

Types of outcome measures: Primary outcomes included:

• Remission (by DSM or ICD or trialist-defined cut-off on standardised scale measure for remission versus no remission)

• All-cause mortality

Secondary outcomes included:

• Family functioning as measured on standardised, validated and reliable measures, e.g. Family Environment Scale (Moos 1994), Expressed Emotions (Vaughn 1976), FACES III (Olson 1985)

• General functioning, measured by return to school or work, or by general mental health functioning measures, e.g. Global Assessment of Functioning (GAF) (APA 1994)

• Dropout (by rates per group during treatment)

• Eating disorder psychopathology (evidence of ongoing preoccupation with weight/shape/food/eating by eating-disorder symptom measures using any recognised validated eating disorders questionnaire or interview schedule, e.g. the Morgan-Russell Assessment Schedule (Morgan 1988), Eating Attitudes Test (EAT, Garner 1979), Eating Disorders Inventory (Garner 1983; Garner 1991).

• Weight, including all representations of this measure such as kilograms, body mass index (BMI, kg/m2) and average body weight (ABW) calculations. We included this measure after the finalisation of our protocol, due to the lack of universal reporting on remission, and the differing definitions used for remission

• Relapse (by DSM or ICD or trialist-defined criteria for relapse or hospitalisation)" (17)

Example 2: In a review examining the effects of perioperative interventions for prevention of postoperative pulmonary complications, the authors report the types of studies, participants, interventions and outcomes that were eligible for inclusion in the review, indicating that studies were excluded if they measured outcomes that were neither patient centric nor clinically relevant:

"Population: We included RCTs of adult (age ≥18 years) patients undergoing non-cardiac surgery, excluding organ transplantation surgery (as findings in patients who need immunosuppression may not be generalisable to others).

Intervention: We considered all perioperative care interventions identified by the search if they were protocolised (therapies were systematically provided to patients according to pre-defined algorithm or plan) and were started and completed during the perioperative pathway (that is, during preoperative preparation for surgery, intraoperative care, or inpatient postoperative recovery). Examples of interventions that we did or did not deem perioperative in nature included long term preoperative drug treatment (not included, as not started and completed during the perioperative pathway) and perioperative physiotherapy interventions (included, as both started and completed during the perioperative pathway). We excluded studies in which the intervention was directly related to surgical technique.

Outcomes: To be included, a trial had to use a defined clinical outcome relating to postoperative pulmonary complications, such as "pneumonia" diagnosed according to the Centers for Disease Control and Prevention's definition. RCTs reporting solely physiological (for example, lung volumes and flow measurements) or biochemical (for example, lung inflammatory markers) outcomes are valuable but neither patient centric nor necessarily clinically relevant, and we therefore excluded them. We applied […]"

Example 3: In a review examining the effects of pharmacological or non-pharmacological interventions for adults with exacerbation of chronic obstructive pulmonary disease, the authors report inclusion and exclusion criteria for participants, interventions, comparators, outcomes, settings, study designs, and language of publication. More detail is provided in a table.

"The eligible studies had to meet all of the following criteria: 1) adult 18 years and older with exacerbations of chronic obstructive pulmonary disease (ECOPD); 2) received pharmacologic intervention or nonpharmacologic interventions; 3) compared with placebo, standard care, for antibiotics and systemic corticosteroids: different types of agents, different delivery modes, and different durations of treatments; 4) reported outcomes of interest; 5) conducted in outpatient, inpatients, and emergency department; 6) randomized controlled trials (RCTs); and 7) published in English. We excluded studies conducted in the intensive care unit, or chronic ventilator unit or respiratory care unit; studies of patients with exacerbation of chronic bronchitis if they did not have any evidence of airflow limitation on spirometry (at any time, including during a stable state); and studies of health service interventions (e.g. hospital in the home as alternative to hospitalization). We focused only on interventions during the initial acute phase of an exacerbation of COPD and not during the convalescence period. We did not restrict study location or sample size. The detailed inclusion and exclusion criteria are listed in Table 1 . All outcomes were final health outcomes except for the intermediate outcome, "forced expiratory volume in one second" (FEV1). FEV1 was included because it is a commonly used outcome in COPD studies and has been shown to be highly predictive of final health outcomes during ECOPD (including mortality, need for intubation, or hospital admission for COPD)." (19)

Item 6. INFORMATION SOURCES: Specify all databases, registers, websites, organisations, reference lists and other sources searched or consulted to identify studies.
Specify the date when each source was last searched or consulted.

Example 1: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors list the electronic bibliographic databases (with dates of coverage for each), trials registers and websites searched. They also indicate that reference lists of all eligible study reports were reviewed and forward citation tracking of all eligible study reports was conducted:

"We conducted electronic searches for eligible studies within each of the following databases:

• Cochrane Central Register of Controlled Trials (CENTRAL) (1992 to 23rd July 2018);

• MEDLINE (including MEDLINE In-Process) (OvidSP) (1946 to 23rd July 2018);

• Embase (OvidSP) (1980 to 23rd July 2018);

• PsycINFO (OvidSP) (1806 to 23rd July 2018);

• Applied Social Sciences Index and Abstracts (ASSIA) (ProQuest) (1987 to 24th July 2018);

• Science Citation Index Expanded (Web of Science) (1900 to 24th July 2018);

• Social Sciences Citation Index (Web of Science) (1956 to 24th July 2018); and

• Trials Register of Promoting Health Interventions (EPPI Centre) (2004 to 27th July 2018).

We conducted electronic searches of the following grey literature databases using search strategies adapted from the final MEDLINE search strategy, as described above:

• Conference Proceedings Citation Index - Science (Web of Science) (1990 to 24th July 2018);

• Conference Proceedings Citation Index - Social Science & Humanities (Web of Science) (1990 to 24th July 2018); and

• OpenGrey (1997 to 24th July 2018).

We searched trial registers (US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov/), the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch/), and the EU Clinical Trials Register (www.clinicaltrialsregister.eu/) to identify registered trials (up to 25th July 2018), and the websites of key organisations in the area of health and nutrition, including the following:

• UK Department of Health;

• Centers for Disease Control and Prevention (CDC), USA;

• World Health Organization (WHO);

• International Obesity Task Force; and

• EU Platform for Action on Diet, Physical Activity and Health.

In addition, we searched the reference lists of all eligible study reports and undertook forward citation tracking (using Google Scholar) to identify further eligible studies or study reports (up to 25th July 2018)." (20)

Example 2: In a review examining the educational outcomes of children in contact with social care in England, the authors report the databases and other sources consulted, along with the date each source was searched:

"On 21 December 2017, MAJ searched 16 health, social care, education and legal databases, the names and date coverage of which are given in Table 2. […] We also carried out a 'snowball' search to identify additional studies by searching the reference lists of publications eligible for full-text review and using Google Scholar to identify and screen studies citing them.

[…] On 26 April 2018, we conducted a search of Google Scholar and additional supplementary searches for publications on websites of 10 relevant organisations (including government departments, charities, think-tanks and research institutes). Full details of these supplementary searches can be found in the Additional file 2. Finally, we updated the database search on 7 May 2019, and the snowball and additional searches on 10 May 2019 as detailed in Additional file 3. We used the same search method, except that we narrowed the searches to 2017 onwards." (21)

Example 3: In a review examining the effects of environmental interventions to reduce the consumption of sugar-sweetened beverages, the authors report the databases, trials registers and websites searched (noting the date when each was searched) and indicate in an Appendix the reports for which forward and backward citation searching occurred:

"We performed searches in the following databases: […] In addition, we searched the websites of key organisations in the area of health, health promotion and nutrition, including the following:

• EU platform for action on diet, physical activity and health (ec.europa.eu/health/ph_determinants/life_style/nutrition/platform/database/dsp_search.cfm).

• U.S. Centers for Disease Control and Prevention (www.cdc.gov/nutrition/data-statistics/sugar-sweetened-beveragesintake.html).

• Rudd Center for Food Policy and Obesity (www.uconnruddcenter.org/publications).

• Harvard TH Chan School of Public Health Obesity Prevention Source (www.hsph.harvard.edu/obesity-prevention-source).

• World Obesity (www.worldobesity.org/what-we-do/policy-prevention).

We handsearched reference lists of included studies and previously published reviews, and contacted the corresponding author of included studies and previously published reviews as well as the members of the Review Advisory Group to identify additional studies. We also conducted a citing studies search with Scopus, i.e. we searched for studies that have cited included studies and previously published reviews. The studies used for these forward and backward citation searches are provided in Appendix 6…

The following terms were searched individually using the CADTH site search engine.

Five known relevant studies were used to identify records within databases. Candidate search terms were identified by looking at words in the titles, abstracts and subject indexing of those records. A draft search strategy was developed using those terms and additional search terms were identified from the results of that strategy. Search terms were also identified and checked using the PubMed PubReMiner word frequency analysis tool. The MEDLINE strategy makes use of the Cochrane RCT filter reported in the Cochrane Handbook v5.2. The RCT filter used in the Embase search was developed by the authors. Animal studies are removed from MEDLINE by using a standard algorithm and from Embase using an approach that uses animal-related subject headings but excluding records that are also indexed with the Emtree heading 'Human'. As per the eligibility criteria the strategy was limited to English language studies.

The search strategy was validated by testing whether it could identify the five known relevant studies and also three further studies included in two systematic reviews identified as part of the strategy development process. All eight studies were identified by the search strategies in MEDLINE and Embase.
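As an editorial illustration (not the quoted authors' method), the validation step above amounts to measuring recall against a benchmark set of known relevant studies. The minimal Python sketch below assumes each record has a stable identifier; the identifiers used are hypothetical placeholders (in practice they might be PMIDs or database accession numbers).

```python
def benchmark_recall(retrieved_ids, benchmark_ids):
    """Return (recall, missed) for a search run against known relevant records.

    recall is the fraction of benchmark records the search retrieved;
    missed lists the benchmark records it failed to retrieve.
    """
    benchmark = set(benchmark_ids)
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    found = benchmark & set(retrieved_ids)
    missed = sorted(benchmark - found)
    return len(found) / len(benchmark), missed


# Hypothetical IDs: eight known relevant studies, all present in the search results.
known = [f"study-{i}" for i in range(1, 9)]
search_results = known + ["other-1", "other-2"]
recall, missed = benchmark_recall(search_results, known)
print(recall, missed)  # 1.0 []
```

A recall below 1.0, together with the list of missed records, points to terms or subject headings the strategy may be lacking. Recall against a small benchmark set is only a rough check, not an estimate of the strategy's overall sensitivity.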

The strategy was developed by an information specialist and the final strategies were peer reviewed by an experienced information specialist within our team. Peer review involved proofreading the syntax and spelling and overall structure, but did not make use of the PRESS checklist.

Three additional approaches were used to identify further studies. The reference lists of the eligible trials were screened, the included studies of two recent systematic reviews were screened and a forward citation search of the eligible trials to identify publications that had cited them was conducted using Web of Science on 3 Dec 2013. (24)

Example 2: In a review examining the effects of environmental interventions to reduce the consumption of sugar-sweetened beverages, the authors report the full search strategy for all databases searched. The following is an excerpt of how they reported the search strategy for trials registers:

"For clinicaltrials.gov we used the advanced search interface, and used the search syntax "(sugar-sweetened beverage) OR SSB OR soda" to run searches in the following fields:

The search yielded 646 records, which we collated and de-duplicated in MS Excel. After de-duplication, 282 unique records remained.

For the International Clinical Trials Registry Platform (ICTRP) we used the advanced search interface, and used the search syntax "sugar-sweetened beverage OR SSB OR soda" to run searches in the following fields (with synonyms, all recruitment status):

The search resulted in 171 hits.

Based on the search, we identified two completed studies eligible for inclusion in our review (Collins 2016 SNAP; Collins 2016 WIC), which we found through clinicaltrials.gov. Moreover, we identified 10 ongoing studies which we judged likely to meet our eligibility criteria upon completion. We present details of these in Characteristics of ongoing studies. We found eight of these through our search in clinicaltrials.gov, and two through our search in the ICTRP. We ran trial register searches on 21 June 2018." (22)

Item 8. SELECTION PROCESS: Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process.

Example 1: In a review examining the key components of shared decision-making models, the authors report piloting, double screening, and consensus methods for study selection:

"Three researchers (AP, HB-R, FG) independently reviewed titles and abstracts of the first 100 records and discussed inconsistencies until consensus was obtained. Then, in pairs, the researchers independently screened titles and abstracts of all articles retrieved. In case of disagreement, consensus on which articles to screen full-text was reached by discussion. If necessary, the third researcher was consulted to make the final decision. Next, two researchers (AP, HB-R) independently screened full-text articles for inclusion. Again, in case of disagreement, consensus was reached on inclusion or exclusion by discussion and if necessary, the third researcher (FG) was consulted." (25)

Example 2: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors report piloting, single screening titles/abstracts, partial single screening of full-text, and linking reports to studies:

"Citations identified from the literature searches and reference list checking were imported to EndNote and duplicates were removed. Three reviewers independently screened a sample of 109 citations to pre-test and refine coding guidance based on the inclusion criteria. Disagreements about eligibility were resolved through discussion. One reviewer (SB, JR, or SM) then each screened about a third of the remaining citations (grouped by year of publication) for inclusion in the review using the pre-tested coding guidance.

Full-text of all potentially eligible studies were retrieved. A sample of full-text studies was independently screened by two reviewers (SB and JR) until concordance was achieved (~15%; 37/228 of full-text studies screened). The remaining full-text studies were screened by one reviewer (SB or JR). All included studies, and those for which eligibility was uncertain, were screened by a second reviewer (JR or SB). Disagreements or uncertainty about eligibility were resolved through discussion, with advice from the review biostatisticians (JM, AF, or both) to confirm eligibility based on study design and analysis methods. Further information was sought from the authors of two studies (Piumatti 2018, Wardzala 2018) to clarify methods and interpretation of the analysis.

Citations that did not meet the inclusion criteria were excluded and the reason for exclusion was recorded at the full-text screening. Cohort names, author names, and study locations, dates and sample characteristics were used to identify multiple reports arising from the same study (deemed to be a 'cohort'). These reports were matched and data extracted only from the report that provided the most relevant analysis and complete information for the review. In most cases, the decision was based on the outcome reported (global function was prioritised)." (26)

Example 3: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors report priority screening methods and how non-English language articles were handled:

"We imported titles and abstracts retrieved by the searches into EPPI Reviewer v.4.10.2 (ER4) systematic review software. Duplicate records were identified, manually reviewed and then removed using ER4's automatic de-duplication feature, with the similarity threshold initially set to 0.85 and then to 0.80. Due to the large number of records retrieved, we developed a semiautomated screening workflow in ER4 that used machine learning to assign title-abstract records for duplicate manual screening. This workflow was designed to maximise the recall of eligible studies while reducing the overall screening workload to match the resources available. We planned for duplicate manual screening to apply to up to a third of records retrieved.
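The quoted workflow relies on EPPI-Reviewer's built-in de-duplication, whose similarity measure is not described here. As a rough, hypothetical stand-in (an editorial illustration only), the sketch below uses Python's standard-library `difflib.SequenceMatcher` on normalised titles to flag candidate duplicate pairs at or above a chosen threshold, leaving the final decision to manual review. Its scores will not match EPPI-Reviewer's, and `normalise` and `candidate_duplicates` are names invented for this example.

```python
import re
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    # Lower-case and collapse punctuation/whitespace so trivial formatting
    # differences do not lower the similarity score.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def candidate_duplicates(titles, threshold=0.85):
    """Return (i, j, score) for every pair of titles with similarity >= threshold."""
    norm = [normalise(t) for t in titles]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            score = SequenceMatcher(None, norm[i], norm[j]).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical records: the first two differ only in punctuation.
titles = [
    "Effect of X on Y: a randomised trial",
    "Effect of X on Y - a randomised trial.",
    "Something completely different",
]
print(candidate_duplicates(titles, threshold=0.85))  # [(0, 1, 1.0)]
```

Lowering the threshold (the quoted workflow moved from 0.85 to 0.80) flags more pairs, trading extra manual checking for fewer missed duplicates. The pairwise loop is quadratic, so a large record set would usually be split into blocks (for example, by publication year) before comparison.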

In developing the workflow, we first screened a random sample of 500 title-abstract records to calculate inter-rater reliability and establish an initial estimate of the baseline inclusion rate (sample size determined as per Shemilt 2014). Secondly, title-abstract records were prioritised for manual screening using active learning to distinguish between relevant and irrelevant records in conjunction with manual user input. This phase of the workflow stopped when each review author had completed 15 hours of duplicate screening without identifying any further potentially eligible studies. In practice, this equated to 1700 title-abstract records.
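The quoted authors calculated inter-rater reliability on a random sample but do not name the statistic. Assuming Cohen's kappa (a common choice for two screeners' include/exclude decisions), a minimal Python sketch follows; the decision lists are hypothetical and the function is an editorial illustration, not the review's code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty decision lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of records where both raters made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters used one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on four title-abstract records.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

Here the reviewers agree on 3 of 4 records (0.75), chance agreement from their marginals is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. The same random sample gives the baseline inclusion rate directly: the number of records judged eligible divided by the sample size.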

When we found non-English language articles, we used Google Translate in the first instance to determine potential eligibility. We intended that if an article appeared to be eligible, we would have the article translated by a native language speaker or professional translation service, however no articles needed translating." (20)

Example 4: The following is a made-up example showing how to report use of machine learning and crowdsourcing in the study selection process:

Study selection followed a three-stage process that involved machine learning classifiers, crowdsourcing and manual screening. After removing duplicates, we applied Cochrane's RCT machine learning classifier (Thomas 2020) and removed from further consideration any record classified as highly unlikely to report a randomized trial (i.e. below the externally calibrated recall threshold of 99%). Records that remained were then screened by Cochrane Crowd (Noel-Storr 2020), a crowdsourcing platform that has consistently been shown to be over 99% accurate. In Cochrane Crowd, every record is screened by at least two crowd members, with all disagreements resolved by two expert screeners. Records rejected by the crowd were removed from further consideration. Finally, records the crowd deemed likely to be reports of randomized trials were screened independently by two members of the review team in Covidence. [Example drafted by Steve McDonald and James Thomas, March 2020]

Item 9. DATA COLLECTION PROCESS: Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process.

Example 1: In a review examining the effects of pharmacological interventions for promoting smoking cessation during pregnancy, the authors report using a data collection form, the number of authors collecting data from studies and the process for resolving disagreements, and indicate that study authors were contacted if any data were unclear:

"We designed a data extraction form based on that used by Lumley 2009, which two review authors (RC and TC) used to extract data from eligible studies. Extracted data were compared, with any discrepancies being resolved through discussion. RC entered data into Review Manager 5 software (Review Manager 2014), double checking this for accuracy. When information regarding any of the above was unclear, we contacted authors of the reports to provide further details." (27)

Example 2: In a review examining the effects of pharmacological or non-pharmacological interventions for adults with exacerbation of chronic obstructive pulmonary disease, the authors report using a standardized form that was pilot tested, and indicate that independent reviewers extracted data, which was checked by another reviewer:

"We developed a standardized data extraction form to extract study characteristics...The standardized form was pilot-tested by all study team members using five randomly selected studies. Reviewers worked independently to extract study details. A third reviewer reviewed data extraction, and resolved conflicts." (19)

Example 3: In a review examining the association between smoking and sickness absence, the authors report using a standardized form that was pilot tested, that one reviewer extracted data which was checked by another reviewer, and that study authors were contacted:

"A data extraction sheet was developed, pilot tested on ten randomly selected included articles and then refined. After finalizing the data extraction sheet, one reviewer performed the initial data extraction for all included articles and a second reviewer checked all proceedings…. Corresponding authors were asked for additional information in cases where data provided in the published articles were insufficient." (28)

Item 10a. DATA ITEMS: List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (e.g. for all measures, time points, analyses), and if not, the methods used to decide which results to collect.

Example 1: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors list and define the outcomes for which data were sought (e.g. cognitive function), and specify the decision rules used to decide which results to collect when multiple were available in studies (e.g. when multiple measures, time points and unadjusted and adjusted analyses were available):

"Eligible outcomes were broadly categorised as follows:

Cognitive function

• Global cognitive function
• Domain-specific cognitive function (especially domains that reflect specific alcohol-related neuropathologies, such as psychomotor speed and working memory)

Clinical diagnoses of cognitive impairment

• Mild cognitive impairment (also referred to as mild neurocognitive disorders)

These conditions were 'characterised by a decline from a previously attained cognitive level'.

Major cognitive impairment (also referred to as major neurocognitive disorders; including dementia) was excluded.

We expected that definitions and diagnostic criteria would vary across studies, so we accepted a range of definitions as noted under the 'Methods of outcome assessment' section. Table 1 provides an example of specific domains of cognitive function used in the diagnosis of mild and major cognitive impairment in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).

Any measure of cognitive function was eligible for inclusion. The tests or diagnostic criteria used in each study should have had evidence of validity and reliability for the assessment of mild cognitive impairment, but studies were not excluded on this basis.

We anticipated that many different methods would be used to assess cognitive functioning across studies. These include the following.

Clinical diagnoses of Neuropsychological tests used to assess global cognitive function, for example the:

• Addenbrooke's Cognitive Examination-Revised (ACE-R) which "incorporates the MMSE and assesses attention, orientation, fluency, language, visuospatial function, and memory, yielding subscale scores for each domain"

• Montreal Cognitive Assessment (MoCA), which provides measures for specific cognitive abilities and may be more suitable for assessing mild cognitive impairment than the MMSE

Neuropsychological tests for assessing domain-specific cognitive function, for example, tests of:

• Attention and processing speed, for example, the Trail Making Test (TMT-A)

Results could be reported as an overall test score that provides a composite measure across multiple areas of cognitive ability (i.e. global cognitive function), sub-scales that provide a measure of domain-specific cognitive function or cognitive abilities (e.g. processing speed, memory), or both.

Studies with a minimum follow-up of 6 months were eligible, a time frame chosen to ensure that studies were designed to examine more persistent effects of alcohol consumption. This threshold was based on previous reviews examining the association between long-term cognitive impairment and alcohol consumption (e.g. Anstey 2009 specified 12 months) and guidance from the Cochrane Dementia and Cognitive Improvement Group, which suggests a minimum follow-up of 9 months for studies examining progression from mild cognitive impairment to dementia. We deliberately specified a shorter period to ensure studies reporting important long-term effects were not missed.

No restrictions were placed on the number of points at which the outcome was measured, but the length of follow-up and number of measurement points (including a baseline measure of cognition) was considered when interpreting study findings and in deciding which outcomes were similar enough to combine for synthesis. Since long-term cognitive impairment is characterised as a decline from a previous level of cognitive function and implies a persistent effect, studies with longer-term outcome follow up at multiple time points should provide the most direct evidence.

We anticipated that individual studies would report data for multiple cognitive outcomes. Specifically, a single study may report results:

• For multiple constructs related to cognitive function, for example, global cognitive function and cognitive ability on specific domains (e.g. memory, attention, problem-solving, language);
• Using multiple methods or tools to measure the same or similar outcome, for example reporting measures of global cognitive function using both the Mini-Mental State Examination and the Montreal Cognitive Assessment;
• At multiple time points, for example, at 1, 5, and 10 years.

Where multiple cognition outcomes were reported, we selected one outcome for inclusion in analyses and for reporting the main outcomes (e.g. for GRADEing), choosing the result that provided the most complete information for analysis. Where multiple results remained, we listed all available outcomes (without results) and asked our content expert to independently rank these based on relevance to the review question, and the validity and reliability of the measures used. Measures of global cognitive function were prioritised, followed by measures of memory, then executive function.

In the circumstance where results from multiple multivariable models were presented, we extracted associations from the most fully adjusted model, except in the case where an analysis adjusted for a possible intermediary along the causal pathway (i.e. post-baseline measures of prognostic factors (e.g. smoking, drug use, hypertension))" (26)

Example 2: In a review examining the effects of shockwave therapy for rotator cuff disease, the authors list and define the outcomes for which data were sought (e.g. pain, function), and specify the decision rules used to decide which results to collect when multiple were available in studies (e.g. when multiple measures, time points and unadjusted and adjusted analyses were available):

"We presented the major outcomes below in the 'Summary of findings' tables.

• Participant-reported pain relief of 30% or greater.

• Mean pain score, or mean change in pain score on VAS or Numerical Rating Scale (NRS) or categorical rating scale (in that order of preference).

• Disability or function.

• Composite endpoints measuring 'success' of treatment such as participants feeling no further symptoms.

• Quality of life.

• Number of participant withdrawals, for example, due to adverse events or intolerance to treatment.

• Number of participants experiencing any adverse event.

We extracted outcome measures assessing benefits of treatment (e.g. pain, function, success, quality of life) at the time points:

• up to six weeks;
• greater than six weeks to three months (this was the primary time point);
• greater than three months to up to six months;
• greater than six months to 12 months;
• greater than 12 months.

If data were available in a trial at multiple time points within each of the above periods (e.g. at four, five and six weeks), we only extracted data at the latest possible time point of each period. We extracted adverse events, calcification resolution and treatment success at the end of the trial.

For a particular systematic review outcome there may be a multiplicity of results available in the trial reports (e.g. multiple scales, time points and analyses). To prevent selective inclusion of data based on the results, we used the following a priori defined decision rules to select data from trials.

• Where trialists reported both final values and change from baseline values for the same outcome, we extracted final values.

• Where trialists reported both unadjusted and adjusted values for the same outcome, we extracted unadjusted values.

• Where trialists reported data analysed based on the intention-to-treat (ITT) sample and another sample (e.g. per-protocol, as-treated), we extracted ITT-analysed data.

Where trials did not include a measure of overall pain but included one or more other measures of pain, for the purpose of combining data for the primary analysis of overall pain, we combined overall pain with other types of pain in the following hierarchy:

• overall or unspecified pain;

• pain at rest;

• pain with activity;

• daytime pain;

• night-time pain.

Where trials included more than one measure of disability or function, we extracted data from the one function scale that was highest on the following a priori defined list:

• Shoulder Pain And Disability Index (SPADI);

• Shoulder Disability Questionnaire (SDQ);

• Constant score;

• Disabilities of the Arm, Shoulder and Hand (DASH);

• Health Assessment Questionnaire (HAQ);

• any other function scale.

Where trials included more than one measure of treatment success, we extracted data from the one function scale that was highest on the following a priori defined list:

• participant-defined measures of success, such as asking participants if treatment was successful;

• trialist-defined measures of success, such as a 30-point increase on the Constant Score." (29)
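Because these rules are fixed before any results are seen, they can be applied mechanically. The following is a minimal Python sketch, not the review authors' procedure. It assumes follow-up is recorded in weeks, approximates 3, 6 and 12 months as 13, 26 and 52 weeks, and assumes the final-value rule takes precedence over the unadjusted rule, which in turn takes precedence over the ITT rule (the review does not state an order). All function names and dictionary keys are hypothetical.

```python
# Hypothetical helpers applying a priori result-selection rules.

# Time windows from the quoted review; month boundaries approximated in weeks
# (assumption: 3 months = 13 weeks, 6 months = 26 weeks, 12 months = 52 weeks).
WINDOWS = [
    ("up to 6 weeks", 6),
    (">6 weeks to 3 months", 13),
    (">3 months to 6 months", 26),
    (">6 months to 12 months", 52),
    (">12 months", float("inf")),
]

# A priori ranked list of function scales (highest priority first).
FUNCTION_HIERARCHY = ["SPADI", "SDQ", "Constant score", "DASH", "HAQ"]


def window_for(weeks):
    """Return the label of the first window whose upper bound covers `weeks`."""
    for label, upper in WINDOWS:
        if weeks <= upper:
            return label
    raise ValueError(f"invalid time point: {weeks}")


def latest_per_window(timepoints_weeks):
    """Keep only the latest reported time point within each window."""
    latest = {}
    for t in timepoints_weeks:
        label = window_for(t)
        latest[label] = max(t, latest.get(label, t))
    return latest


def pick_function_scale(reported_scales):
    """Pick the highest-ranked listed scale, else 'any other function scale'."""
    for scale in FUNCTION_HIERARCHY:
        if scale in reported_scales:
            return scale
    # Fallback (assumption): the first other scale the trial reported.
    return reported_scales[0] if reported_scales else None


def prefer(results):
    """Prefer final values, then unadjusted values, then ITT-analysed data.

    Each result is a dict with hypothetical keys 'value_type' ('final' or
    'change'), 'adjusted' (bool) and 'population' ('ITT' or another label).
    False sorts before True, so preferred attributes give the smallest key.
    """
    def rank(r):
        return (r["value_type"] != "final", r["adjusted"], r["population"] != "ITT")
    return min(results, key=rank)
```

For example, a trial reporting pain at 4, 5, 6, 10 and 20 weeks contributes 6 weeks to the first window, 10 weeks to the primary window and 20 weeks to the third window. Encoding the rules this way makes it easy to show that selection never depended on the observed results.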

Example 3: In a review examining the effects of strategies to improve the implementation of healthy eating, physical activity and obesity prevention policies, practices or programmes within childcare services, the authors report the following decision rules to select results when multiple were available in study reports (e.g. multiple time points, multiple outcome measures, change scores versus final values):

"We reported measures of treatment effect from included studies that were adjusted for potential confounding variables over reported estimates that were not adjusted for potential confounding. Where studies used multiple follow-up periods, we used data from the final (most recent) study follow-up. We included data from the primary implementation outcome in meta-analyses. In instances where the authors of included studies did not identify a primary implementation outcome, we used the outcome on which the study sample size and power calculation was based. In its absence, for studies using score-based measures of implementation, and reporting total and subscale scores, we assumed the total score represented the primary implementation outcome. Otherwise, we attempted to calculate a relative effect size for each implementation outcome measure, ranked these based on effect size and used the measure reporting the median effect size to include in any pooled analysis. We calculated the effect size by subtracting the change from baseline of the primary implementation outcome for the control or comparison group from the change from baseline in the experimental or intervention group. If data to enable calculation of the change from baseline were unavailable, we used the differences between groups post-intervention. For score-based measures, we calculated a standardised ('d') measure of effect size for each outcome to rank the effect size. Where there was an even number of implementation outcomes, one of the two measures at the median was randomly selected and used for inclusion in meta-analysis." (30)
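The median-effect rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the review's code: outcomes are ranked by signed effect size, and the standardiser for d is passed in as `pooled_sd` because the quoted text does not say which standard deviation was used. Function names are hypothetical.

```python
import random


def change_score_d(change_int, change_ctrl, pooled_sd):
    """Standardised difference in change from baseline (intervention minus control).

    `pooled_sd` is an assumed standardiser; the quoted review does not specify it.
    """
    return (change_int - change_ctrl) / pooled_sd


def select_median_outcome(effects, rng=None):
    """Return the outcome whose effect size is the median of all candidates.

    `effects` maps outcome name -> effect size. With an even number of
    outcomes, one of the two middle outcomes is chosen at random.
    """
    if not effects:
        raise ValueError("no outcomes to choose from")
    rng = rng or random.Random()
    ranked = sorted(effects, key=effects.get)  # ascending by signed effect size
    mid = len(ranked) // 2
    if len(ranked) % 2 == 1:
        return ranked[mid]
    return rng.choice([ranked[mid - 1], ranked[mid]])
```

Passing a seeded `random.Random` instance makes the random tie-break reproducible, which is worth reporting alongside the rule.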

Example 4: In a review examining the effects of interventions for aggression and agitation in people with dementia, the authors report how the outcome domains were selected and decision rules used to select results from among multiple measurement instruments:

"Twelve dementia care partners (nurses, allied health professionals, physicians, and a caregiver) selected our study outcomes (18) by independently ranking a group of commonly reported neuropsychiatric symptoms (for example, aggression, agitation, and sleep disturbances) in descending order of importance. The care partners selected change in aggression as our main outcome and change in agitation as our secondary outcome… For all of our NMAs, we preferentially abstracted a scale (e.g. Neuropsychiatric Inventory (NPI)-agitation subscale, CMAI) reported by study authors before abstracting an individual aggressive or agitated behaviour (e.g. kicking, biting, screaming). Only in the case of our NMA for the outcome of overall agitation and aggression were there cases where study authors reported more than one scale for the same outcome (e.g. NPI-agitation subscale and CMAI). The CMAI was the most commonly reported scale for the outcome of overall agitation and aggression. The NPI-agitation subscale was the second most common scale for the outcome of overall agitation and aggression. Other scales were reported much less frequently. Therefore, the CMAI was always preferentially abstracted, where reported. If the CMAI was not reported, but the NPI-agitation subscale was reported, then it was preferentially abstracted before any other scales used to report the outcome of overall agitation and aggression." (31)

Item 10b. DATA ITEMS: List and define all other variables for which data were sought (e.g. participant and intervention characteristics, funding sources).

Describe any assumptions made about any missing or unclear information.

Example 1: In a review examining the long-term effects of alcohol consumption on cognitive function, the authors list and define all variables for which data were sought, including characteristics of the study design, exposure and comparator, and participants:

"We extracted information relating to the characteristics of included studies and results as follows.

• Study references (multiple publications arising from the same study were matched to an index reference, which is the study from which results were selected for analysis or summary)
• Study or cohort name, location, and commencement date
• Study design (categorised as 'prospective cohort study', 'nested case-control study', or 'other' using the checklist of study design features developed by Reeves and colleagues)
• Funding sources and funder involvement in the study

2. Characteristics of the exposure and comparator groups

• Levels of alcohol consumption as defined in the study, including details of how consumption was measured and categorised, and information required to convert data for reporting and analysis
  o Qualitative descriptors of each category, if used (e.g. never or non-drinker, abstainer, former drinker, low/moderate/heavy consumption)
  o Upper and lower boundaries of each category (e.g. 1 to 29 g per day; 5.1 to 10 units per week based on a standard drink in the UK)
  o Group used as referent category (comparator) in analyses and how defined
  o Units of measurement (e.g. standard units of alcohol per day and definition of unit)
  o Method of collecting alcohol consumption data (e.g. retrospective survey involving recall of alcohol consumption over different periods of life; intake diaries to measure current alcohol consumption); time points at which exposure data were collected
  o Sample size for each exposure group at each measurement point and included in analysis; number lost to follow up [these data were used in the analysis and risk of bias assessment]
  o Any additional parameters used to derive each category or exposure measure (e.g. alcohol consumption at each drinking occasion; frequency of drinking; recall period)
• Patterns of exposure
  o Any additional data not listed above that characterises and quantifies different patterns of alcohol exposure (e.g. consumption on heaviest drinking day; diagnosis of an alcohol-use disorder such as dependence or harmful drinking, and the method of assessment; definition of other frequency-based categories used to characterise patterns of drinking such as occasional drinking or infrequent consumption)
  o Duration/length of exposure period at study baseline and follow-up (directly reported or data that can be used to calculate)
  o Age at commencement of drinking (initial exposure)

• Age at baseline and follow up, sex, ethnicity, co-morbidities, socio-economic status (including education), use of licit or illicit drugs, family history of alcohol dependence
• Other characteristics of importance within the context of each study
• Eligibility criteria used in the study" (26)

Example 2: In a review examining the effects of pharmacological, psychological, and non-invasive brain stimulation interventions for treating depression after stroke, the authors list and define all variables for which data were sought, including characteristics of the report, participants, study design and intervention:

"We collected data on:

• the report: author, year, and source of publication;

• the study: sample characteristics, social demography, and definition and criteria used for depression;

• the participants: stroke sequence (first ever vs recurrent), social situation, time elapsed since stroke onset, history of psychiatric illness, current neurological status, current treatment for depression, and history of coronary artery disease;

• the research design and features: sampling mechanism, treatment assignment mechanism, adherence, non-response, and length of follow up;

• the intervention: type, duration, dose, timing, and mode of delivery." (32)

"When trial authors reported child grade rather than age, we assumed the following age distributions: kindergarten, four to six years; first grade, five to seven years; second grade, six to eight years; third grade, seven to nine; fourth grade, 8 to 10; fifth grade 9 to 11; sixth grade, 10 to 12; seventh grade, 11 to 13; eighth grade, 12 to 14; ninth grade, 13 to 15; tenth grade, 14 to 16; eleventh grade, 15 to 17; and twelfth grade, 16 to 18." (33)

Item 11. STUDY RISK OF BIAS ASSESSMENT: Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and if applicable, details of automation tools used in the process.

Example 1: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors specify the risk of bias tool used, the domains of bias addressed by the tool, how many reviewers assessed each study and how an overall judgement was reached:

"We assessed risk of bias in the included studies using the revised Cochrane 'Risk of bias' tool for randomised trials (RoB 2.0) (Higgins 2016a), employing the additional guidance for cluster-randomised and cross-over trials (Eldridge 2016; Higgins 2016b). RoB 2.0 addresses five specific domains: (1) bias arising from the randomisation process; (2) bias due to deviations from intended interventions; (3) bias due to missing outcome data; (4) bias in measurement of the outcome; and (5) bias in selection of the reported result. Two review authors independently applied the tool to each included study, and recorded supporting information and justifications for judgements of risk of bias for each domain (low; high; some concerns). Any discrepancies in judgements of risk of bias or justifications for judgements were resolved by discussion to reach consensus between the two review authors, with a third review author acting as an arbiter if necessary. Following guidance given for RoB 2.0 (Section 1.3.4) (Higgins 2016a), we derived an overall summary 'Risk of bias' judgement (low; some concerns; high) for each specific outcome, whereby the overall RoB for each study was determined by the highest RoB level in any of the domains that were assessed." (20)
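The "highest level in any domain" rule quoted above is a simple maximum over an ordered scale. A minimal Python sketch follows; the function name is hypothetical. RoB 2 guidance also allows an overall "high" judgement when several domains have "some concerns" that together substantially lower confidence in the result; that step is a reviewer judgement and is deliberately not encoded here.

```python
# RoB 2 judgement levels, ordered from least to most concerning.
LEVELS = ("low", "some concerns", "high")


def overall_rob(domain_judgements):
    """Overall judgement = the worst (highest) level across assessed domains.

    Does not implement the optional escalation of multiple 'some concerns'
    judgements to 'high', which requires reviewer judgement.
    """
    if not domain_judgements:
        raise ValueError("no domain judgements supplied")
    for j in domain_judgements:
        if j not in LEVELS:
            raise ValueError(f"unknown judgement: {j!r}")
    return max(domain_judgements, key=LEVELS.index)
```

For instance, a result judged low risk in four domains and "some concerns" in one is summarised as "some concerns" overall.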

Example 2: In a review examining the effects of red light camera interventions for reducing traffic violations and traffic crashes, the authors report the risk of bias domains they assessed, how each were rated, and how many reviewers performed assessments:

"The expanded risk of bias analysis was based on six dimensions that focused on the design of the study, the analysis of the data, and the contents of the study report. These six dimensions, which conform to the requirements set forth by the UK Economic and Social Research Council (ESRC), are:

1. Selection and matching of intervention and control areas
2. Blinding of data collection and analysis
3. Pre- and post-intervention data collection periods
4. Reporting of results
5. Control of confounders
6. Control of other potential sources of bias

See Appendix G for a list of the 17 specific criteria included in each dimension. Each individual criterion statement was scored on whether it was True, False, or Unclear and these were used to assess each study on whether it presented a high, low, or unclear risk of bias across the six domains.

Risk of bias assessment was performed independently by three review authors (E.G.C., S.K., and C.P.). For the studies identified in the previous review, the same three review authors independently assessed the risk of bias of the included studies. Any discrepancies were resolved by deferment to further review authors (R.S. and P.E.). All disagreements were resolved by consensus." (34)

Item 12. EFFECT MEASURES: Specify for each outcome the effect measure(s) (e.g. risk ratio, mean difference) used in the synthesis or presentation of results.

Example 1: In a review examining the effects of psychological interventions to foster resilience in healthcare students, the authors report planning to use the risk ratio for dichotomous outcomes and the standardised mean difference for continuous outcomes:

"We planned to analyse dichotomous outcomes by calculating the risk ratio (RR) of a successful outcome (i.e. improvement in relevant variables) for each trial…Because the included resilience-training studies used different measurement scales to assess resilience and related constructs, we used standardised mean difference (SMD) effect sizes (Cohen's d) and their 95% confidence intervals (CIs) for continuous data in pair-wise meta-analyses." (35)

Example 2: In a review comparing the effects of pars plana vitrectomy combined with scleral buckle with pars plana vitrectomy alone for giant retinal tear, the authors report using the risk ratio in the synthesis or presentation of results for dichotomous outcomes:

"We estimated the risk ratio (RR) and its 95% confidence interval (CI) after surgery (pars plana vitrectomy combined with scleral buckle vs pars plana vitrectomy alone) for the following dichotomous outcomes with information obtained from the included studies.

• Primary surgical success.

• Second surgery for retinal reattachment.

• Development of adverse events such as retinal detachment recurrence, elevation of intraocular pressure above 21 mmHg, choroidal detachment, cystoid macular edema, macular pucker, proliferative vitreoretinopathy, progression of cataract in initially phakic eyes, and any other adverse events reported by included trials at any time from day one up to the last reported follow-up visit after surgery." (36)

Example 3: In a review examining the effects of metformin for endometrial hyperplasia, the authors report using the hazard ratio or odds ratio in the synthesis or presentation of time-to-event (survival) outcomes:

"For survival outcomes (e.g. regression of endometrial hyperplasia, recurrence of endometrial hyperplasia, progression to endometrial carcinoma), we planned to calculate hazard ratios if data were available. Otherwise, we would calculate rates at a set time point, using the Mantel-Haenszel odds ratio (OR) and the numbers of events in control and intervention groups." (37)

Item 13a. SYNTHESIS METHODS: Describe the processes used to decide which studies were eligible for each synthesis (e.g. tabulating the study intervention characteristics and comparing against the planned groups for each synthesis (item #5)).

Example 1: In a review examining the effects of interventions to reduce homelessness, the authors report categorising the interventions delivered in the included studies according to four dimensions:

"Given the complexity of the interventions being investigated, we attempted to categorize the included interventions along four dimensions: (1) was housing provided to the participants as part of the intervention; (2) to what degree was the tenants' residence in the provided housing dependent on, for example, sobriety, treatment attendance, etc.; (3) if housing was provided, was it segregated from the larger community, or scattered around the city; and (4) if case management services were provided as part of the intervention, to what degree of intensity. We created categories of interventions based on the above dimensions:

1. Case management only
2. Abstinence-contingent housing
3. Non-abstinence-contingent housing
4. Housing vouchers

Some of the interventions had multiple components (e.g. abstinence-contingent housing with case management). These interventions were categorized according to the main component (the component that the primary authors emphasized). They were also placed in separate analyses. We then organized the studies according to which comparison intervention was used (any of the above interventions, or usual services)." (38)

Item 13b. SYNTHESIS METHODS: Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics, or data conversions.

Example 1: In a review examining the effects of interventions to reduce homelessness, the authors report methods used to calculate standard deviations from other statistics reported:

"In cases where the means, number of participants and test statistics for t-test were reported, but not the standard deviations, and there was the opportunity to include results in a meta-analysis, we calculated standard deviations, assuming the same standard deviation for each of the two groups (intervention and control)" (38).
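Under the equal-SD assumption stated above, a two-sample t statistic is t = (M1 − M2) / (SD × √(1/N1 + 1/N2)), so the common SD can be recovered by rearranging. This follows the approach described in chapter 6 of the Cochrane Handbook; the Python function below is a hypothetical sketch, not the review's code.

```python
import math


def sd_from_t(mean1, mean2, n1, n2, t):
    """Common within-group SD implied by a two-sample (pooled-variance) t statistic.

    Rearranges t = (mean1 - mean2) / (SD * sqrt(1/n1 + 1/n2)),
    assuming both groups share the same SD.
    """
    if t == 0:
        raise ValueError("t = 0: the SD cannot be recovered")
    se_diff = abs(mean1 - mean2) / abs(t)  # standard error of the mean difference
    return se_diff / math.sqrt(1 / n1 + 1 / n2)
```

For example, means of 10 and 8 with 50 participants per group and t = 2 give a standard error of 1 and a common SD of 5. The derived SD is only as reliable as the assumption that the original t-test pooled the variances.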

"Where we were interested in an intervention and it was compared to two or more comparison interventions that were both considered to be within the realm of "usual services", we combined the two comparison arms into one comparison group and compared the means of the combined control groups to the intervention for a given outcome (for Morse 1992). In one study we have combined two intervention arms that both employed slightly differing versions of an intervention (assertive community treatment) into one intervention group and compared that to the usual services comparison condition (for Morse 1997)" (38).
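The quoted text does not give a formula for merging arms. A widely used option is the Cochrane Handbook (chapter 6) formula for combining two groups' sample sizes, means and SDs into a single group. The Python sketch below implements that formula; the function name is hypothetical, and we do not know whether the review used exactly this method.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two arms into one group: returns (N, mean, SD).

    SD uses the Cochrane Handbook formula, which adds a between-arm term
    so the result equals the SD of the pooled individual-level data.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = (
        (n1 - 1) * sd1**2
        + (n2 - 1) * sd2**2
        + n1 * n2 / n * (m1**2 + m2**2 - 2 * m1 * m2)
    ) / (n - 1)
    return n, mean, math.sqrt(var)
```

As a check, arms with values {1, 3} and {5, 7} (each with mean 2 or 6 and SD √2) combine to N = 4, mean 4 and SD √(20/3), which matches the SD computed directly from {1, 3, 5, 7}.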

Example 3: In a review examining the effects of food fortification with multiple micronutrients on health outcomes in the general population, the authors report estimating and imputing intra-cluster correlation coefficients for cluster-randomised trials:

"We used cluster-adjusted estimates from cluster randomised controlled trials (c-RCTs) where available. If the studies had not adjusted for clustering, we attempted to adjust their standard errors using the methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019), using an estimate of the intra-cluster correlation coefficient (ICC) derived from the trial. If the trial did not report the cluster-adjusted estimated or the ICC, we imputed an ICC from a similar study included in the review, adjusting if the nature or size of the clusters was different (e.g. households compared to classrooms). We assessed any imputed ICCs using sensitivity analysis." (39)
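The Handbook adjustment referred to above is based on the design effect, 1 + (M − 1) × ICC, where M is the average cluster size. The standard error from an analysis that ignores clustering is multiplied by the square root of the design effect, or equivalently the sample size is divided by it. A minimal Python sketch, with hypothetical function names:

```python
import math


def design_effect(mean_cluster_size, icc):
    """Design effect 1 + (M - 1) * ICC for clusters of average size M."""
    return 1 + (mean_cluster_size - 1) * icc


def inflate_se(se, mean_cluster_size, icc):
    """Standard error corrected for clustering: SE * sqrt(design effect)."""
    return se * math.sqrt(design_effect(mean_cluster_size, icc))


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size deflated by the design effect."""
    return n / design_effect(mean_cluster_size, icc)
```

With clusters of 20 and an imputed ICC of 0.05, the design effect is 1.95, so 400 participants count as roughly 205. Repeating the calculation over a range of plausible ICCs is one way to carry out the sensitivity analysis the authors describe.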

Example 4: In a review examining the effects of manually-generated reminders delivered on paper on professional practice and patient outcomes, the authors report standardising the direction of effects across studies:

"Some studies targeted quality problems that involve 'underuse', so that improvements in quality correspond to increases in the percentage of patients who receive a target process of care (for example, increasing the percentage of patients who receive the influenza vaccine). However, other studies targeted 'overuse', so that improvements correspond to reductions in the percentage of patients receiving inappropriate or unnecessary processes of care (for example, reducing the percentage of patients who receive antibiotics for viral upper respiratory tract infections). In order to standardise the direction of effects, we defined all process adherence outcomes so that higher values represented an improvement. For example, data from a study aimed at reducing the percentage of patients receiving inappropriate medications would be captured as the complementary percentage of patients who did not receive inappropriate medications. Increasing this percentage of patients for whom providers did not prescribe the medications would thus represent an improvement. Each outcome can then be interpreted as compliance with desired practice." (40)

Item 13c. SYNTHESIS METHODS: Describe any methods used to tabulate or visually display results of individual studies and syntheses.

Example 1: In a review examining the effects of interventions to reduce ambient particulate matter air pollution on health, the authors report their chosen plot, along with a rationale:

"… in line with the review protocol we synthesized evidence narratively as well as graphically using harvest plots. Harvest plots have been shown to be an effective, clear and transparent way to summarize evidence of effectiveness for complex interventions (Ogilvie 2008; Turley 2013). We created eight separate harvest plots, one for health outcomes and one for air quality outcomes for each intervention category." (41)

Example 2: In a review examining the effects of transfers and vouchers on the use and quality of maternity care services, the authors report using albatross plots to present results of individual studies, along with a rationale:

"Meta-analyses could not be undertaken due to the heterogeneity of interventions, settings, study designs and outcome measures. Albatross plots were created to provide a graphical overview of the data for interventions with more than five data points for an outcome. Albatross plots are a scatter plot of p-values against the total number of individuals in each study. Small p-values from negative associations appear at the left of the plot, small p-values from positive associations at the right, and studies with null results towards the middle. The plot allows p-values to be interpreted in the context of the study sample size; effect contours show a standardised effect size (expressed as relative risk, RR) for a given p-value and study size, providing an indication of the overall magnitude of any association. We estimated an overall magnitude of association from these contours, but this should be interpreted cautiously." (42)

Example 3: In a review examining the effects of altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption, the authors describe using 'Summary of findings' tables to present the synthesis results:

"We developed 'Summary of findings' tables using GRADEpro GDT. These tables comprise summaries of the estimated intervention effect and the number of participants and studies for each primary outcome, and include justifications underpinning GRADE assessments. We planned to present separate summary effect sizes and certainty of evidence ratings for food, alcohol, and tobacco products, and for availability and proximity interventions within each of these product types, but in practice no eligible alcohol or tobacco studies were identified. Results of random-effects meta-analyses are presented as SMDs with 95% CIs.

To facilitate interpretation of these estimated effect sizes, we re-expressed them employing selected familiar metrics of selection or consumption using observational data from a population-representative sample." (20)

Item 13d. SYNTHESIS METHODS: Describe any methods used to synthesize results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used.

Example 1: In a review examining the effects of functional appliance treatment on the temporomandibular joint, the authors report their chosen meta-analysis model, along with a rationale, the between-study variance estimator used, methods used to quantify statistical heterogeneity, and the software packages used:

"As the effects of functional appliance treatment were deemed to be highly variable according to patient age, sex, individual maturation of the maxillofacial structures, and appliance characteristics, a random-effects model was chosen to calculate the average distribution of treatment effects that can be expected. A restricted maximum likelihood random-effects variance estimator was used instead of the older DerSimonian-Laird one, following recent guidance. Random effects 95% predictions were to be calculated for meta-analyses with at least three studies to aid in their interpretation by quantifying expected treatment effects in a future clinical setting. The extent and impact of between-study heterogeneity were assessed by inspecting the forest plots and by calculating the tau-squared and the I-squared statistics, respectively. The 95% CIs (uncertainty intervals) around tau-squared and the I-squared were calculated to judge our confidence about these metrics. We arbitrarily adopted the I-squared thresholds of > 75% to be considered as signs of considerable heterogeneity, but we also judged the evidence for this heterogeneity (through the uncertainty intervals) and the localization on the forest plot…All analyses were run in Stata SE 14.0 (StataCorp, College Station, TX) by one author." (43)
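The quoted methods name a restricted maximum likelihood (REML) estimator of the between-study variance, the tau-squared and I-squared statistics, and random-effects prediction intervals. As a rough illustration only, the Python sketch below computes these quantities from study effect estimates and their within-study variances; it is not the authors' Stata analysis. The function names are hypothetical, REML is approximated with a standard fixed-point iteration truncated at zero, I-squared is derived from Cochran's Q, and the prediction interval takes a caller-supplied t critical value with k − 2 degrees of freedom because the Python standard library has no t quantile function.

```python
import math


def reml_random_effects(y, v, tol=1e-10, max_iter=1000):
    """Illustrative sketch (hypothetical helper): random-effects meta-analysis
    with a REML estimate of tau-squared.

    y: study effect estimates (e.g. log risk ratios or mean differences)
    v: within-study sampling variances of those estimates
    Returns (pooled estimate, its SE, tau-squared, Cochran's Q, I-squared).
    """
    k = len(y)
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Fixed-point update for the REML estimator, truncated at zero
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi**2 for wi in w) + 1.0 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    # Random-effects (inverse-variance) pooling using the final tau-squared
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))

    # Cochran's Q and I-squared are based on fixed-effect weights
    w_fe = [1.0 / vi for vi in v]
    mu_fe = sum(wi * yi for wi, yi in zip(w_fe, y)) / sum(w_fe)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w_fe, y))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return mu, se, tau2, q, i2


def prediction_interval(mu, se, tau2, t_crit):
    """Approximate 95% prediction interval for the effect in a new study.

    t_crit: 97.5th percentile of the t distribution with k - 2 df, supplied
    by the caller (an assumption of this sketch).
    """
    half_width = t_crit * math.sqrt(tau2 + se**2)
    return mu - half_width, mu + half_width


# Hypothetical data: four studies with equal within-study variances
mu, se, tau2, q, i2 = reml_random_effects([-0.5, 0.0, 0.5, 1.0], [0.01] * 4)
low, high = prediction_interval(mu, se, tau2, t_crit=4.303)
print(f"pooled {mu:.3f}, tau^2 {tau2:.3f}, I^2 {i2:.1%}, PI {low:.2f} to {high:.2f}")
```

With four studies, the 97.5th percentile of t on 2 degrees of freedom is about 4.303, which is the value passed above. Confidence intervals for tau-squared and I-squared, which the quoted review also reports, are omitted from this sketch.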

"Diverse interventions, settings, and participants characterise the field of smoking cessation. We judged it likely that the included studies would show heterogeneity in treatment effect (the observed intervention effects being more different from each other than one would expect because of random error alone). As such, the assumptions of a fixed-effect meta-analysis (that all studies in the meta-analysis share a common overall effect size and that all factors that could influence the effect size are the same across studies), were unlikely to hold…In random-effects meta-analysis models (restricted maximum-likelihood method), we calculated pooled risk ratios (RRs) with 95% confidence intervals (CIs) for both socioeconomic-position-tailored and nonsocioeconomic-position-tailored interventions as the weighted average of each individual study's estimated intervention effect. All computations were done on a log scale with the log RR, its variance, and standard error (SE), before exponentiating the summary effect for interpretation. We explored heterogeneity by observation of forest plots and use of the χ² test to show whether observed differences in results were compatible with chance alone. We calculated I² statistics to examine the level of inconsistency across study findings…Analyses were done in the RStudio development environment version 1.1.463 using R version 3.5.2 and the metafor package." (44)
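The quote describes pooling risk ratios on the log scale and exponentiating the summary effect only at the end. A minimal sketch of those steps, assuming each study supplies event counts and group sizes for two arms with no zero cells, might look like the following. The function names and the example counts are hypothetical, and the between-study variance `tau2` is taken as an input (for example, from a REML estimator) rather than estimated here; this is not the authors' R/metafor code.

```python
import math
from statistics import NormalDist


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its large-sample variance for one two-arm study
    (hypothetical helper; assumes no zero cells)."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var


def pooled_risk_ratio(studies, tau2=0.0, level=0.95):
    """Inverse-variance pooling on the log scale, back-transformed to a RR
    (hypothetical helper).

    studies: list of (events_t, n_t, events_c, n_c) tuples
    tau2: between-study variance; 0 gives a fixed-effect summary, while a
          random-effects analysis would plug in an estimate such as REML's
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_rrs, weights = [], []
    for study in studies:
        y, v = log_risk_ratio(*study)
        log_rrs.append(y)
        weights.append(1 / (v + tau2))
    mu = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    # Exponentiate the summary and its confidence limits only at the end
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)


# Hypothetical counts and tau-squared, for illustration only
rr, low, high = pooled_risk_ratio([(12, 150, 20, 148), (30, 400, 41, 410)], tau2=0.02)
print(f"RR {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```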

"We based our primary analyses upon consideration of dichotomous process adherence measures (for example, the proportion of patients managed according to evidence-based recommendations). In order to provide a quantitative assessment of the effects associated with reminders without resorting to numerous assumptions or conveying a misleading degree of confidence in the results, we used the median improvement in dichotomous process adherence measures across studies…With each study represented by a single median outcome, we calculated the median effect size and interquartile range across all included studies for that comparison." (40)
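The median-based approach in this quote can be sketched in a few lines. The sketch assumes each study reports one or more absolute improvements (in percentage points) in dichotomous adherence outcomes; it first reduces each study to its median improvement and then summarises across studies with the median and interquartile range. The input structure, the function name and the example data are hypothetical, and quartile conventions vary between software, so the interquartile range from Python's default `statistics.quantiles` method may differ slightly from other packages.

```python
from statistics import median, quantiles


def median_improvement(studies):
    """Summarise study-level improvements with a median and IQR
    (hypothetical helper).

    studies: mapping of study id -> list of absolute improvements (percentage
             points) in dichotomous process adherence outcomes
    """
    # Step 1: represent each study by the median of its own outcomes
    per_study = [median(values) for values in studies.values()]
    # Step 2: median and interquartile range across studies
    q1, _, q3 = quantiles(per_study, n=4)  # default 'exclusive' method
    return median(per_study), q1, q3


# Hypothetical data, not taken from the quoted review
example = {
    "A": [1.0],
    "B": [3.0, 4.0],
    "C": [4.2],
    "D": [6.0, 5.0, 7.0],
    "E": [10.0],
}
med, q1, q3 = median_improvement(example)
print(f"median {med} (IQR {q1} to {q3}) percentage points")
```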

"The statistical approach used, therefore, was the combination of the significance levels (P values). The rationale for this choice is that all the trials explored the same broad question, i.e. "is homeopathic treatment efficacious?", even if, for individual trials, the question asked expressed more specific terms and focused on a given treatment of a particular disease. Thus, unlike in the conventional meta-analytical methods, the method used does not involve pooling the numerical estimates of treatment effect sizes obtained, in our case, in very different situations. Using this approach, the null hypothesis tested is that the effect of interest (in this case, the efficacy of homeopathic treatment) is not present in any of the trials considered. If the null hypothesis is rejected, we can conclude that in at least one trial there is a non-null effect…Thus, we used seven methods: the sum of logs, the sum of Z, the weighted sum of Z, the sum of t, the mean Z, the mean P, the count test and the logit procedure. We present the results obtained with the method that gave the most conservative (least optimistic) results. A two-sided approach was adopted because of the format of the tested hypothesis (i.e. the effect could be either "negative" or "positive")." (45)

Item 13e. SYNTHESIS METHODS: Describe any methods used to explore possible causes of heterogeneity among study results (e.g. subgroup analysis, meta-regression).

disadvantaged socioeconomic position, the authors report conducting meta-regression to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined, and noted that these were pre-specified:

"Given a sufficient number of trials, we used unadjusted and adjusted mixed-effects meta-regression analyses to assess whether variation among studies in smoking cessation effect size was moderated by tailoring of the intervention for disadvantaged groups. The resulting regression coefficient indicates how the outcome variable (log risk ratio (RR) for smoking cessation) changes when interventions take a socioeconomic-position-tailored versus non-socioeconomic-tailored approach. A statistically significant (p<0·05) coefficient indicates that there is a linear association between the effect estimate for smoking cessation and the explanatory variable. More moderators (study-level variables) can be included in the model, which might account for part of the heterogeneity in the true effects. We pre-planned an adjusted model to include important study covariates related to the intensity and delivery of the intervention (number of sessions delivered (above median vs below median), whether interventions involved a trained smoking cessation specialist (yes vs no), and use of pharmacotherapy in the intervention group (yes vs no). These covariates were included a priori as potential confounders given that programmes tailored to socioeconomic position might include more intervention sessions or components or be delivered by different professionals with varying experience. The regression coefficient estimates how the intervention effect in the socioeconomic-position-tailored subgroup differs from the reference group of non-socioeconomic-position-tailored interventions." (44)

Example 2: In a review examining the effects of intensive LDL cholesterol-lowering treatment for the prevention of major vascular events, the authors report conducting subgroup analyses and meta-regression to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined:

"First, we assessed the association between absolute LDL cholesterol reduction (calculated as a difference in baseline minus last-measured achieved LDL cholesterol between the treatment groups) and the relative risk (RR) of major vascular events for statins, ezetimibe, and PCSK9 inhibitors. Second, we did analyses to establish the effect of a reduction of 1 mmol/L in LDL cholesterol on the RR of major vascular events, stratified into four groups with mean baseline LDL cholesterol concentrations of 2.60 mmol/L or less, 2.61-3.40 mmol/L, 3.41-4.10 mmol/L, and more than 4.10 mmol/L (the recommended LDL cholesterol thresholds for treatment initiation). Subgroups of trials that reported outcomes of patients with baseline LDL cholesterol less than 2.07 mmol/L (80 mg/dL) were also analysed to most closely approximate a mean baseline LDL cholesterol of 1.80 mmol/L (70 mg/dL; the LDL cholesterol threshold for treatment in high-risk patients in the ACC/AHA and CCS guidelines). Subgroups of trials that reported outcomes of patients by sex, presence or absence of diabetes, presence or absence of chronic kidney disease (defined as estimated glomerular filtration rate <60 mL/min per 1.73 m²), and presence or absence of heart failure were also meta-analysed. Meta-regression analyses were done with the following covariates: baseline LDL cholesterol, extent of LDL cholesterol reduction, mean age, 10-year risk of atherosclerotic cardiovascular disease, and median duration of follow-up. Non-standardised and standardised analyses were done for each 1 mmol/L reduction in LDL cholesterol. We used a multivariable model including the same covariates and drug class... Heterogeneity of RRs were assessed using I² and Cochran's Q statistic was used to test for differences between subgroups." (46)

Example 3: In a review examining the effects of psychological interventions for common mental disorders in women experiencing intimate partner violence in low-income and middle-income countries, the authors report conducting post-hoc subgroup analyses to explore possible causes of heterogeneity among study results, indicating the potential effect modifiers considered and how they were defined:

"We anticipated that, in settings where intimate partner violence was sufficiently prevalent to be measured, female therapists might have been considered more culturally acceptable to female participants. We did post-hoc subgroup analyses to compare differences in standardised mean differences (dSMDs) of trauma-focused interventions versus generic psychological interventions, female-delivered interventions versus mixed gender-delivered interventions, novel treatments for low and middle income countries (LMICs) versus those with an established evidence base in high-income countries, and those asking only about recent (within the past 12 months) intimate partner violence versus lifetime intimate partner violence." (5)

Item 13f. SYNTHESIS METHODS: Describe any sensitivity analyses conducted to assess robustness of the synthesized results.

"We conducted sensitivity meta-analyses restricted to trials with recent publication (2000 or later); overall low risk of bias (low risk of bias in all seven criteria); and enrolment of generally healthy women (rather than those with a specific clinical diagnosis).

To incorporate trials with zero events in both intervention and control arms (which are automatically dropped from analyses of pooled relative risks), we also did sensitivity analyses for dichotomous outcomes in which we added a continuity correction of 0.5 to zero cells." (47)
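The continuity correction described above can be sketched as follows, assuming two-arm trials summarised by event counts and group sizes. The function name and example trial are hypothetical. Adding 0.5 to each of the four cells of the 2×2 table (so that each arm's denominator grows by 1) is a common convention and is assumed here; the sketch applies it to any table containing a zero cell, not only to trials with zero events in both arms.

```python
import math


def log_rr_with_correction(events_t, n_t, events_c, n_c, cc=0.5):
    """Log risk ratio and variance, adding `cc` to every cell of a 2x2 table
    that contains a zero (hypothetical helper; the all-cells convention is an
    assumption of this sketch)."""
    cells = (events_t, n_t - events_t, events_c, n_c - events_c)
    if 0 in cells:
        events_t, events_c = events_t + cc, events_c + cc
        # Adding cc to both the event and non-event cells enlarges each arm by 2*cc
        n_t, n_c = n_t + 2 * cc, n_c + 2 * cc
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var


# A hypothetical double-zero trial now contributes log RR = 0 with a large variance
print(log_rr_with_correction(0, 50, 0, 50))
```

Because the corrected trial carries a large variance, it receives little weight in an inverse-variance meta-analysis, which is why such sensitivity analyses usually change pooled estimates only modestly.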

"At the request of the funders, we did an additional sensitivity analysis with respect to compliance. Our protocol stated an intention to subgroup by "recent publications"; we changed this to run a sensitivity analysis including publications before 2010 combined with all publications from 2010 onwards with a trials registry entry (even if published retrospectively). As our funders were particularly interested in effects within trials of at least 12 months, we also ran an analysis limiting to trials of at least 52 weeks' duration." (48)

Item 14. REPORTING BIAS ASSESSMENT: Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases).

Example 1: In a review examining the effects of surgery for rotator cuff tears, the authors report using funnel plots to assess small-study effects, noting that publication bias is one of several reasons for any asymmetry detected. They also report comparing outcomes specified within and across reports of studies to assess outcome reporting bias:

"To assess small-study effects, we planned to generate funnel plots for meta-analyses including at least 10 trials of varying size. If asymmetry in the funnel plot was detected, we planned to review the characteristics of the trials to assess whether the asymmetry was likely due to publication bias or other factors such as methodological or clinical heterogeneity of the trials. To assess outcome reporting bias, we compared the outcomes specified in trial protocols with the outcomes reported in the corresponding trial publications; if trial protocols were unavailable, we compared the outcomes reported in the methods and results sections of the trial publications." (49)

"To assess selective reporting bias, we compared the measurements and outcomes planned by the original investigators during the trial with those reported within the published paper by checking the trial protocols (when available) against the information in the final publication. Where published protocols were not available and the trial authors did not provide an unpublished protocol upon request, we compared the methods and the results sections of the published papers. We also used our knowledge of the clinical area to identify where trial investigators had not reported commonly used outcome measures." (50)

Example 3: In a review examining the association between quality of dietary fat and genetic risk of type 2 diabetes, the authors report using contour-enhanced funnel plots and a statistical test for funnel plot asymmetry to assess small-study effects, noting that publication bias is one of several reasons for any asymmetry detected:

"Small study effects owing to potential publication bias, poor methodological quality in smaller studies, artefactual associations, true heterogeneity, or chance were evaluated by using contour-enhanced funnel plots alongside visual examination and statistical tests for asymmetry (Debray's test)." (51)
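The quoted review used Debray's test for funnel plot asymmetry. For illustration, the sketch below implements a different and widely used test, Egger's regression test, which regresses each study's standardised effect (estimate divided by its standard error) on its precision; an intercept far from zero suggests small-study effects, which, as the quotes in this item stress, need not be due to publication bias. The function name and example data are hypothetical, the sketch returns a t statistic (k − 2 degrees of freedom) rather than a P value, and such tests are generally reserved for meta-analyses with at least 10 studies, consistent with the threshold in the first example for this item.

```python
import math


def egger_test(estimates, std_errors):
    """Egger's regression test for funnel plot asymmetry (hypothetical helper).

    Fits estimate/SE = b0 + b1 * (1/SE) by ordinary least squares.
    Returns (b0, SE of b0, t statistic for b0 with k - 2 df, b1).
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in std_errors]  # precision
    z = [e / s for e, s in zip(estimates, std_errors)]  # standardised effect
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    b0 = mz - b1 * mx
    rss = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
    se_b0 = math.sqrt(rss / (k - 2) * (1 / k + mx**2 / sxx))
    return b0, se_b0, b0 / se_b0, b1


# Hypothetical log odds ratios and standard errors for 10 studies
est = [0.9, 0.7, 0.65, 0.5, 0.45, 0.4, 0.35, 0.3, 0.28, 0.25]
ses = [0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.12, 0.1]
b0, se_b0, t_stat, slope = egger_test(est, ses)
print(f"intercept {b0:.2f} (SE {se_b0:.2f}), t = {t_stat:.2f}")
```

In this parameterisation the slope estimates the effect in a hypothetical study of infinite precision; it equals the intercept of a regression of effect estimates on their standard errors weighted by 1/SE², the precision-effect estimate used by some reviews.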

"Small-study effects (e.g. publication bias) was checked by contour-enhanced funnel plots and adjusted for by obtaining a precision-effect estimate with standard error. Although precision-effect estimate with standard error tends to slightly underestimate the true association if the observed effects were generated by questionable research practices, simulations suggest that it provides the most precise estimates in the presence of residual effect heterogeneity and small-study effects." (52)

Item 15. CERTAINTY ASSESSMENT: Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome.

Example 1: In a review examining the effects of Tai Chi for rheumatoid arthritis, the authors report using the GRADE approach for assessing certainty in the body of evidence, stating how many reviewers performed assessments, the domains considered, and software used to perform assessments:

"Two people (AM, JS) independently assessed the certainty of the evidence. We used the five GRADE considerations (study limitations, consistency of effect, imprecision, indirectness, and publication bias) to assess the certainty of the body of evidence as it related to the studies that contributed data to the meta-analyses for the prespecified outcomes. We assessed the certainty of evidence as high, moderate, low, or very low. We considered the following criteria for upgrading the certainty of evidence, if appropriate: large effect, dose-response gradient, and plausible confounding effect. We used the methods and recommendations described in sections 8.5 and 8.7, and chapters 11 and 12, of the Cochrane Handbook for Systematic Reviews of Interventions. We used GRADEpro GDT software to prepare the 'Summary of findings' tables (GRADEpro GDT 2015). We justified all decisions to down- or up-grade the certainty of studies using footnotes, and we provided comments to aid the reader's understanding of the results where necessary." (53)

Example 2: In a review examining the effects of implantable cardiac defibrillators for people with non-ischaemic cardiomyopathy, the authors report using standardised language to convey their certainty in the body of evidence for an outcome:

"We reported our findings using the language suggested by Glenton and colleagues, focusing on the size of the effect and its clinical significance in relation to the certainty of the evidence on which the result is based (including the precision of the effect) (see Appendix 3)." (54)

"We classified the overall strength of evidence (SOE) for each outcome as high, moderate, low, or insufficient by using an established method that considers study quality, consistency of findings, directness of the comparisons, precision, and applicability (Berkman et al.). For findings with SOE greater than insufficient, we classified the direction of effect as "evidence of benefit," "no benefit" (that is, no difference from placebo or mixed findings), or "favors placebo."" (55)

Item 16a. STUDY SELECTION: Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram.

Example 1: In a review examining the effects of text message reminders for improving sun protection habits, the authors report results of the search and selection process in text and in a flow diagram:

"We found 1,333 records in databases searching. After duplicates removal, we screened 1,092 records, from which we reviewed 34 full-text documents, and finally included six papers [cited]. Later, we searched documents that cited any of the initially included studies as well as the references of the initially included studies. However, no extra articles that fulfilled inclusion criteria were found in these searches (Fig 1)." (56)

"Our search of the Cochrane Kidney and Transplant specialised register identified 869 records. We identified an additional 78 records using other sources (reference lists of review articles, relevant studies, and clinical practice guidelines) – therefore a total of 947 records (n=176 studies) were identified. We excluded 61 studies (n=252 records), either due to a population other than heart failure (n=38 studies), a non-pharmacological intervention (n=5), follow-up shorter than three months (n=16), or a study design other than a RCT (n=2) (see Characteristics of excluded studies). Overall, 115 studies were eligible. Of these, three are ongoing and awaiting publication of primary data (PARAGON-HF 2018; RELAX-AHF-2 2017; TMAC 2007) and will be included in a future update of this review. As a result, 112 studies were included in this review (Figure 1)." (57)
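Because the counts in a flow diagram must add up, a simple arithmetic check can catch transcription errors before a review is submitted. The sketch below applies such a check to the counts reported in this quote. The dictionary keys and the function name are hypothetical, and a real flow diagram would track more stages (for example, duplicates removed and reports not retrieved).

```python
def flow_inconsistencies(c):
    """Return arithmetic inconsistencies in study-selection counts
    (hypothetical helper and field names)."""
    problems = []
    if c["records_register"] + c["records_other"] != c["records_total"]:
        problems.append("records from each source do not sum to the total")
    if sum(c["excluded_by_reason"].values()) != c["excluded_studies"]:
        problems.append("exclusion reasons do not sum to the number excluded")
    eligible = c["studies_identified"] - c["excluded_studies"]
    if eligible != c["eligible_studies"]:
        problems.append("identified minus excluded does not equal eligible")
    if c["eligible_studies"] - c["ongoing_studies"] != c["included_studies"]:
        problems.append("eligible minus ongoing does not equal included")
    return problems


# Counts as reported in the quoted review
counts = {
    "records_register": 869,
    "records_other": 78,
    "records_total": 947,
    "studies_identified": 176,
    "excluded_studies": 61,
    "excluded_by_reason": {
        "population other than heart failure": 38,
        "non-pharmacological intervention": 5,
        "follow-up shorter than three months": 16,
        "study design other than an RCT": 2,
    },
    "eligible_studies": 115,
    "ongoing_studies": 3,
    "included_studies": 112,
}
print(flow_inconsistencies(counts))  # an empty list means the counts agree
```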

"A total of 3191 articles resulted from searching the four databases during the initial search (21 March 2018). After authors removed duplicates, 2822 articles remained for title and abstract review, including 14 articles identified through manual search of references. Two authors (CM and HMB) reviewed the titles and abstracts of all 2822 articles. A third author (SK) resolved any discrepancies. Following this step, two authors (CM and HMB) reviewed the full text of all 114 articles eligible for full-text screening. A third author (SK) resolved any discrepancies. Eighty articles were excluded for the following reasons: they did not have data on the specified outcomes (n=27), used qualitative methodologies (n=27), focused on a tobacco product other than e-cigarettes (n=12), were only focused on menthol flavour (n=2), was a duplicate (n=1) or were not peer-reviewed, did not include original data, did not include full-text or included only a conference abstract (n=11). Articles that addressed e-cigarettes from the original systematic review (n=17) were then added to the 34 articles identified from this current review, combining for a total of 51 articles included in the final analysis. The study selection processes, which approximate but do not exactly follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, are illustrated in figure 1." (58)

Example 4: In a review examining the effects of gabapentin for neuropathic pain, the authors present a flow diagram delineating the number of records according to the information source from which they originated (i.e. databases, trials registers, or other "hidden" sources):

See page 52 of Supplement B of the systematic review by Mayo-Wilson et al. (59), available at https://ars.els-cdn.com/content/image/1-s2.0-S0895435617307217-mmc1.pdf

Item 16b. STUDY SELECTION: Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded.

"We excluded seven studies from our review (Bosiers 2015; ConSeQuent; DEBATE-ISR; EXCITE ISR; NCT00481780; NCT02832024; RELINE), and we listed reasons for exclusion in the Characteristics of excluded studies tables. We excluded studies because they compared stenting in Bosiers 2015 and RELINE, laser atherectomy in EXCITE ISR, or cutting balloon angioplasty in NCT00481780 versus uncoated balloon angioplasty for in-stent restenosis. The ConSeQuent trial compared DEB versus uncoated balloon angioplasty for native vessel restenosis rather than in-stent restenosis. The DEBATE-ISR study compared a prospective cohort of patients receiving DEB therapy for in-stent restenosis against a historical cohort of diabetic patients. Finally, the NCT02832024 study compared stent deployment versus atherectomy versus uncoated balloon angioplasty alone for in-stent restenosis." (60)

"Of the remaining 64 articles, 54 were excluded for a variety of other reasons (Fig. 1) . Ultimately, this review included a total of ten studies. All included studies were present in the initial database search. Excluded articles are listed in appendix study and present its characteristics.

"Of the 12 unique studies, three were prospective cohort studies [15, 18, 22], three were case-control studies [20, 25, 26], and six were cross-sectional studies [14, 16, 17, 23, 24, 39] (table 1)." (62)

Example 2: In a review examining the effects of antenatal corticosteroids for maturity of term or near term foetuses, the authors include a table presenting for each included study the citation, study location, study design, number of centres, duration of the study, percentage of participants lost to follow-up, number of participants, inclusion criteria, specific drug delivered and its dosage, the control, gestational age of participants at randomization, and how outcomes were defined:

"Table 1 shows the characteristics of the included clinical trials." (63)

"A summary of the main intervention components is described using the items from the Template for Intervention Description and Replication (TIDieR) checklist (see Table 1)." (64)

Item 18. RISK OF BIAS IN STUDIES: Present assessments of risk of bias for each included study.

"We used the RoB 2.0 tool to assess risk of bias for each of the included studies. A summary of these assessments is provided in Table 1. In terms of overall risk of bias, there were concerns about risk of bias for the majority of studies (20/24), with two of these assessed as at high risk of bias (Musher-Eizenman 2010; Wansink 2013a). A text summary is provided below for each of the six individual components of the 'Risk of bias' assessment. Justifications for assessments are available at the following (http://dx.doi.org/10.6084/m9.figshare.9159824)" (20)

"Fig 2. Forest plot (including the risk of bias assessment) demonstrating significant reduction in the risk of acute grade 2 or worse xerostomia with intensity modulated radiation therapy (IMRT) compared to conventional techniques. Note comparable benefit of IMRT over two-dimensional radiotherapy (2D-RT) and three-dimensional radiotherapy (3D-RT) on subgroup analyses.

See meta-analysis in Figure 3. The footnote to this forest plot states: "CSPP100A2308 study: the SBP reduction in the treatment and placebo group are reported from the CSR page 61. CSPP100A2405 study: the SD for all treatment groups are calculated from SEM reported on page 7 in the CSR". (67)

Item 20a. RESULTS OF SYNTHESES: For each synthesis, briefly summarise the characteristics and risk of bias among contributing studies.

"Twelve included studies (described in 13 publications) assessed the effectiveness of Baby-Friendly Hospital Initiative interventions. All focused on postpartum women enrolled from hospital wards or birth facilities soon after delivery. Studies were conducted in diverse country settings including the United States (two studies); Taiwan (two studies); and one each in the Republic of Belarus, Hong Kong, Czech Republic, Russia, Croatia, Brazil, United Kingdom (multiple regions), and Scotland. All studies focused on multiple hospitals (>4) or clusters of hospitals. The majority of studies focused on women giving birth between 2000 and 2009; two enrolled women in the late 1990s…One included study was an RCT, 10 were prospective cohort studies, and 1 was a single-group pre-post study…In terms of population characteristics, seven studies reported on maternal age and generally enrolled women in their 20s and 30s. Three studies (set in the United States and United Kingdom) reported on race; the percentage of nonwhite participants enrolled ranged from 3 to 47 percent. In the six studies reporting on the percentage of enrolled women who were primiparous, the range was 38 to 67 percent" (68)

"Overall, 61 studies described in 83 publications investigated our included tools for determining stroke risk in patients with nonvalvular AF and met the other inclusion criteria for Key Question 1. The included studies explored tools in studies of diverse quality, design, funding, and geographical location. Forty-three included studies were of good quality or rated as low risk of bias, 11 of fair quality or rated as medium risk of bias, and 7 were of poor quality or rated as high risk of bias. Studies with increased risk of bias had potential limitations related to handling of missing data, length of follow up between groups, blinding of outcomes assessors, whether confounders were assessed with reliable measures, and whether potential outcomes were prespecified.
The studies covered broad geographical locations with 32 studies conducted in UK or continental Europe, 18 exclusively in the United States, 3 studies exclusively conducted in Canada, and 7 multinational trials. There was one study that did not report geographic location of enrolment. Ten studies were supported solely by industry, 8 studies received solely government support, 6 studies were supported by non-government, non-industry organizations, 15 studies received funding from multiple sources including government, industry, non-government and non-industry, and 22 studies did not report funding or it was unclear. We identified 52 studies using observational study design (prospective and retrospective cohorts) while 9 studies were identified as randomized controlled trials (RCTs)." (69)

Example 3: In a review examining the effects of antipsychotics for the prevention and treatment of delirium, the authors summarise various characteristics and the risk of bias in studies comparing delirium incidence between haloperidol and placebo groups:

"Nine randomized controlled trials (RCTs) directly compared delirium incidence between haloperidol and placebo groups. These RCTs enrolled 3,408 patients in both surgical and medical intensive care and non-intensive care unit settings and used a variety of validated delirium detection instruments. Five of the trials were low risk of bias, three had unclear risk of bias, and one had high risk of bias owing to lack of blinding and allocation concealment. Intravenous haloperidol was administered in all except two trials; in those two exceptions, oral doses were given. These nine trials were pooled, as they each identified new onset of delirium (incidence) within the week after exposure to prophylactic haloperidol or placebo." (70)

See Graphical Overview for Evidence Reviews visual summary (

Example 1: In a review examining the effects of aspirin for primary prevention of cardiovascular disease, the authors report for a meta-analysis of risk ratios the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the relative effect into absolute terms:

"Twelve studies [each study cited], including a total of 159,086 patients, reported on the rate of major bleeding complications. Aspirin use was associated with a 46% relative risk increase of major bleeding complications (risk ratio 1.46; 95% CI, 1.30-1.64; p < 0.00001; I² = 31%; absolute risk increase 0.077%; number needed to treat to harm 1295; Fig 1)" (71)

Example 2: In a review examining the effect of exercise programmes for ankylosing spondylitis, the authors report for a meta-analysis of mean differences the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the absolute effect into relative terms and describe the clinical importance of the result:

"Physical function (BASFI, 0 to 10 scale; lower score indicates higher function): Seven studies (312 participants) found a reduction in physical function score with exercise versus no intervention at the end of the intervention (mean difference (MD) -1.3, 95% confidence interval (CI) -1.7 to -0.9); absolute risk difference 13% (95% CI 9% to 17%); relative change 32% (95% CI 23% to 42%); Analysis 1.1). The statistical heterogeneity was not important (I² = 23%). There was no important clinical meaningful benefit." (72)

Example 3: In a review examining the effects of strategies to improve the implementation of healthy eating, physical activity and obesity prevention policies, practices or programmes within childcare services, the authors report for a meta-analysis of standardised mean differences the number of included studies and participants, summary estimate and its 95% confidence interval, the I² measure of inconsistency, and they translate the result into units of a particular measurement scale:

"Score-based measures of implementation were the most common continuous outcomes in studies comparing an implementation strategy with usual practice or minimal support control and were reported in 11 studies including nine randomised trials. Pooled analysis providing moderate-certainty evidence including all nine randomised trials with score-based measures of implementation [each study cited] reported an improvement (standardised mean difference 0.49; 95% confidence interval 0.19 to 0.79; I² = 54%; P < 0.001; participants = 495 services; equivalent to a mean difference of 0.88 on the Environment and Policy Assessment and Observation (EPAO) scale) favouring groups receiving implementation support strategies (Analysis 1.1)." (30)
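Re-expressing a standardised mean difference (SMD) in the units of a familiar scale amounts to multiplying it by a representative standard deviation (SD) for that scale. The quote does not state which SD was used; the value below is back-calculated from the two reported numbers (0.88 / 0.49, roughly 1.8 EPAO points) and is an assumption, as is the function name. Scaling the confidence limits the same way ignores uncertainty in the SD itself.

```python
def smd_to_scale_units(smd, ci_low, ci_high, sd_reference):
    """Re-express an SMD and its confidence limits in units of a familiar
    scale by multiplying by a reference SD (hypothetical helper)."""
    return smd * sd_reference, ci_low * sd_reference, ci_high * sd_reference


# Assumed reference SD, back-calculated from the reported SMD (0.49) and
# mean difference (0.88); not a value reported in the quoted text.
implied_sd = 0.88 / 0.49

md, md_low, md_high = smd_to_scale_units(0.49, 0.19, 0.79, implied_sd)
print(f"MD {md:.2f} EPAO points (95% CI {md_low:.2f} to {md_high:.2f})")
```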

Example 4: In a review examining the effects of workplace interventions for reducing sitting at work, the authors report for a meta-analysis of mean differences the number of included studies, summary estimate and its 95% confidence interval, the I² measure of inconsistency and a prediction interval:

"Ten studies compared the effects of using a sit-stand desk with or without information and counselling to the effects of using a sit-desk [each study cited]. The pooled analysis showed that the sit-stand desk with or without information and counselling intervention reduced sitting time at work by on average 100 minutes per eight-hour workday (95% confidence interval -116 to -84, I² = 37%; Analysis 1.1)… Data presented by one study, Sandy 2016, did not allow for calculation of time spent in sitting time at work and therefore we did not include the study in the quantitative synthesis. The prediction interval for sitting time ranged from -146 to -54 minutes a day." (73)

"Among the 4 trials that recruited critically ill patients who were and were not receiving invasive mechanical ventilation at randomization, the association between corticosteroids and lower mortality was less marked in patients receiving invasive mechanical ventilation (ratio of …

Example 2: In a review examining the effects of community-based coordinating interventions in dementia care, the authors present results of several subgroup analyses, indicating the summary estimate and its precision for each subgroup and the P value for a test for subgroup differences:

"Interventions using a case manager with a nursing background showed a greater positive effect on caregiver quality of life compared to those that used other professional backgrounds (standardised mean difference = 0.94 versus 0.03, respectively; p < 0.001). Interventions that did not provide case managers with supervision showed greater effectiveness for reducing the percentage of patients that are institutionalised compared to those that provided supervision (odds ratio = 0.27 versus 0.96 respectively; p = 0.02). There was weak evidence that interventions using a lower caseload for case managers had greater effectiveness for reducing the number of patients institutionalised compared to interventions using a higher caseload for case managers (odds ratio = 0.23 versus 1.20 respectively; p = 0.08). There was little evidence that the other intervention components modify treatment effects (see Table 3 )." (75)
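The I² values, the prediction interval in Example 4 of the previous item (the sit-stand desk review) and the tests for subgroup differences in Example 2 above all come from standard random-effects calculations. The Python sketch below assumes the DerSimonian-Laird estimator of the between-study variance τ² (the quoted reviews do not state which estimator they used) and a 95% prediction interval of the form pooled ± t × √(τ² + SE²), with t taken from a t distribution on k - 2 degrees of freedom. All function names and toy inputs are hypothetical, and the subgroup P values quoted above cannot be re-derived here because their standard errors are not reported.

```python
import math
from statistics import NormalDist


def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    effects: study estimates on an additive scale (e.g. MDs or log odds ratios)
    variances: within-study variances (SE squared)
    Returns (pooled estimate, its SE, tau^2, I^2 as a percentage).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, i2


def prediction_interval(pooled, se, tau2, t_crit):
    """Approximate 95% prediction interval: pooled +/- t_crit * sqrt(tau^2 + SE^2).

    t_crit should be the 97.5th percentile of a t distribution with k - 2
    degrees of freedom (so k >= 3); it is passed in because the standard
    library has no t quantile function (e.g. 2.306 for k = 10 studies).
    """
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width


def subgroup_difference_test(est_a, se_a, est_b, se_b):
    """Two-sided z test that two independent subgroup estimates differ.

    With two subgroups this matches the Q-between test on 1 degree of freedom.
    For odds ratios, pass log odds ratios and their SEs.
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Toy data: two studies with effects 0 and 2, each with variance 1
    print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))  # (1.0, 1.0, 1.0, 50.0)
    # Hypothetical subgroup log odds ratios and SEs, not taken from the review
    print(subgroup_difference_test(math.log(0.27), 0.45, math.log(0.96), 0.25))
```

Other τ² estimators (for example restricted maximum likelihood) are often preferred in practice; DerSimonian-Laird is used here only because it has a closed form.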

"The results of the five meta-regressions…are highlighted in Table 5. The training duration, frequency, total trainings dose and training-to-sustainability ratio showed no impact on the effect size of the primary outcome pain. The PEDro sum score was negatively associated with the effect size; a study with a score-decrease of 1 point shows an increase in the effect size of .24. Fig 9 illustrates this association." (76)

Example 4: In a review examining the effects of cannabinoid administration for pain, the authors present results of several meta-regression analyses, indicating for each the regression coefficient and its confidence interval, and using plots to visualise the relationships:

"Meta-regression results revealed that, when controlling for other explanatory variables, drug administration conditions were linked with pain reduction among included studies, such that cannabinoids (whole-plant cannabis and whole-cannabis extracts) β = −0.43, 95% confidence interval (CI) (−0.62, −0.24), p < 0.05 (Figure 4), and synthetic cannabinoids (Dronabinol, Nabilone, and CT3) β = −0.39, 95% CI (−0.65, −0.14), p < 0.05 (Figure 4), performed better than placebo. Furthermore, meta-regression results showed that, when controlling for other explanatory variables, sample size was linked with pain reduction, β = 0.01, 95% CI (0.00, 0.01), p < 0.05, such that studies involving smaller samples tended to report greater pain reduction effects (Figure 4). There were no observed interactions between drug administration conditions and sample size. Finally, meta-regression results showed that, when controlling for other explanatory variables, sample sex composition was linked with a modest, however non-significant, effect, β = −0.64, 95% CI (−1.37, 0.09), p = 0.09, such that studies including more female participants tended to report greater pain reductions (Figure 5)." (77)

Item 20d. RESULTS OF SYNTHESES: Present results of all sensitivity analyses conducted to assess the robustness of the synthesized results.

"The magnitude of the pooled effect remained relatively stable in sensitivity analyses (table S13 in appendix 10)" (47)

Example 2: In a review examining the effects of quadruple versus triple combination antiretroviral therapies for treatment-naive people with HIV, the authors report that results of several sensitivity analyses were consistent with results of primary meta-analyses:

"Sensitivity analyses that removed studies with potential bias showed consistent results with the primary meta-analyses (risk ratio 1.00 for undetectable HIV-1 RNA, 1.00 for virological failure, 0.98 for severe adverse effects, and 1.02 for AIDS defining events; supplement 3E, 3F, 3H, and 3I, respectively). Such sensitivity analyses were not performed for other outcomes because none of the studies reporting them was at a high risk of bias. Sensitivity analysis that pooled the outcome data reported at 48 weeks, which also showed consistent results, was performed for undetectable HIV-1 RNA and increase in CD4 T cell count only (supplement 3J and 3K) and not for other outcomes owing to lack of relevant data. When the standard deviations for increase in CD4 T cell count were replaced by those estimated by different methods, the results of figure 3 either remained similar (that is, quadruple and triple arms not statistically different) or favoured triple therapies (supplement 2)." (66)

Example 3: In a review examining the effects of operative treatment versus nonoperative treatment of Achilles tendon ruptures, the authors show in a table the results of primary and sensitivity analyses for two meta-analyses and present forest plots for sensitivity analyses in an appendix:

"Table 3 shows the results of the secondary sensitivity analyses. Re-rupture rate was reported in 17 (59%) high quality studies: 10 randomized controlled trials and seven observational studies. The overall pooled effect showed that operative treatment was associated with a significant reduction in re-rupture rate compared with nonoperative treatment (risk difference 5.1%; risk ratio 0. 44 other three studies]…The authors reported small but significant improvements on the CIBIC-Plus for 183 patients (89 on latrepirdine and 94 on placebo) favouring latrepirdine following the 26-week primary endpoint (MD -0.60, 95% CI -0.89 to -0.31, P < 0.001). Similar results were found at the additional 52-week follow-up (MD -0.70, 95% CI -1.01 to -0.39, P < 0.001). However, we considered this to be low quality evidence due to imprecision and reporting bias. Thus, we could not draw conclusions about the efficacy of latrepirdine in terms of changes in clinical impression." (79)

Example 2: In a review examining the effects of pharmacotherapy for social anxiety disorder, the authors used visual inspection of a contour-enhanced funnel plot to conclude that the asymmetry observed was likely due to publication bias:

"There is evidence of possible funnel plot asymmetry providing data on response to short-term medication treatment, both for the SSRIs and all medications combined. Inspection of the contour enhanced funnel plots for the SSRIs (Figure 4) and all of the trials (Figure 5) suggests that this asymmetry is due to publication bias, as trials with less precise treatment response outcomes are more likely than their higher precision counterparts to be missing from regions of the plot representing statistically nonsignificant treatment effects. Egger regression tests quantitatively confirmed this visual impression, providing evidence of possible publication bias for all of the medication trials (t = 2.8226, df = 49, P = 0.0069) and for the SSRIs (t = 2.6426, df = 22, P = 0.015)." (80)

Example 3: In a review examining the effects of bystander programs on the prevention of sexual assault among adolescents and college students, the authors used visual inspection of a contour-enhanced funnel plot to conclude that the asymmetry observed was unlikely to be due to publication bias:

"To examine small study and publication bias we created a contour-enhanced funnel plot of the 11 effect sizes plotted against their standard errors (see Figure 32). Visual inspection of the funnel plot reveals an absence of adverse intervention effects. Given the absence of negative effects in the regions of statistical significance and non-significance, the results from this contour-enhanced funnel plot indicate a potential risk of publication bias. To further investigate the possibility of bias, we conducted an Egger test for funnel plot asymmetry. The results provided no significant evidence of small study effects (bias coefficient: 0.36; t: −0.60, p = .56)...With these collective findings, we therefore conclude that the meta-analysis results shown in Figure 31 are likely robust to any small study/publication bias." (81)

Item 22. CERTAINTY OF EVIDENCE: Present assessments of certainty (or confidence) in the body of evidence for each outcome assessed.

Example 1: In a review examining the effects of surgery for rotator cuff tears, the authors report their certainty in text, along with rationale for their judgement, and present a Summary of Findings table including certainty judgements for several outcomes:

"Compared with non-operative treatment, low-certainty evidence indicates surgery (repair with subacromial decompression) may have little or no effect on function at 12 months. The evidence was downgraded two steps, once for bias and once for imprecision: the 95% CIs overlap minimal important difference in favour of surgery at this time point." (49). The summary of findings table presents the same information as the text above, with footnotes explaining judgements.
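The certainty rating in Example 1 follows GRADE bookkeeping: a body of evidence from randomised trials starts at high certainty, and each serious concern moves it down one level, so two downgrades (bias and imprecision) give low certainty. A minimal Python sketch of that counting is below. It assumes the evidence started at high (the quotation implies this but does not say so), and the function and constant names are hypothetical. It captures only the arithmetic; the judgement about each domain remains with the review authors.

```python
GRADE_LEVELS = ("very low", "low", "moderate", "high")

# The five GRADE domains that can lower certainty
DOWNGRADE_DOMAINS = {"risk of bias", "inconsistency", "indirectness",
                     "imprecision", "publication bias"}


def rate_certainty(start: str, downgrades: dict, upgrades: int = 0) -> str:
    """Return the GRADE level after applying downgrades (domain -> steps)
    and any upgrades, clamped to the four-level scale."""
    unknown = set(downgrades) - DOWNGRADE_DOMAINS
    if unknown:
        raise ValueError(f"not a GRADE downgrading domain: {sorted(unknown)}")
    index = GRADE_LEVELS.index(start) - sum(downgrades.values()) + upgrades
    return GRADE_LEVELS[max(0, min(index, len(GRADE_LEVELS) - 1))]


if __name__ == "__main__":
    # Rotator cuff example: one step for bias, one for imprecision
    print(rate_certainty("high", {"risk of bias": 1, "imprecision": 1}))  # low
```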

Example 2: In a review examining the effects of polyunsaturated fatty acids on patient-important outcomes in children and adolescents with autism spectrum disorder, the authors report their certainty in the abstract text and present Summary of Findings tables including certainty judgements for several outcomes:

"Polyunsaturated fatty acids (PUFAs) were superior compared to placebo in reducing anxiety in individuals with autism spectrum disorder (standardised mean difference -1.01, 95% CI -1.86 to -0.17; very low certainty of evidence)…Summary of findings for the comparisons PUFAs versus placebo and PUFAs versus healthy diet are presented in Table 2 and Table 3." (82)

Item 23a. DISCUSSION: Provide a general interpretation of the results in the context of other evidence.

Caribbean, the authors compare their findings with those observed in other relevant reviews:

"Although we need to exercise caution in interpreting these findings because of the small number of studies, these findings nonetheless appear to be largely in line with the recent systematic review on what works to improve education outcomes in low- and middle-income countries of Snilstveit et al. (2012). They found that structured pedagogical interventions may be among the effective approaches to improve learning outcomes in low- and middle-income countries. This is consistent with our findings that teacher training is only effective in improving early grade literacy outcomes when it is combined with teacher coaching. The finding is also consistent with our result that technology in education programs may have at best no effects unless they are combined with a focus on pedagogical practices. In line with our study, Snilstveit et al.

Example 2: In a review examining the effects of individualized funding interventions to improve health and social care outcomes for people with a disability, the authors describe how their review differs from the methods used in two previous reviews:

"As outlined in the protocol, the authors were aware of only two previous systematic reviews prior to commencing this study (Carter Anand et al., 2012; Webber et al., 2014). In one sense, the eligibility criteria within the current study were broader and more inclusive; for example, Webber et al. limited their review to mental health users only. The need for a results refinement process further highlights the broad scope of the current review. In another sense, however, this review was more restrictive in terms of the quality of evidence. To this end, quantitative studies were excluded if they were not designed to robustly evaluate effectiveness or did not have a control group, while previous reviews included studies without control groups (for example). Therefore, the studies included in this review are very different, in some respects from those captured in the above reviews. At the same time, however, the findings from this review were consistent in many respects with the two reviews previously identified." (84)

Example 3: In a review examining the effects of Alcoholics Anonymous and other 12-step programs for alcohol use disorder, the authors compare the current review with the previous version of the review and with other reviews and studies:

"The evidence contained in this review is similar to, and extends that of the prior Cochrane Review (Ferri 2006b), which this review updates and replaces, as well as of other narrative reviews which found overall positive effects for AA/TSF interventions (e.g. Kaskutas 2009a; Kelly 2003b). The results presented in this review are also supported by other published analyses. One study from Project MATCH (Longabaugh 1998), found that regardless of whether outpatients' pre-treatment network was supportive or unsupportive of alcohol use at treatment intake, AA/TSF participants were more likely to be involved with AA, which in turn, subsequently explained the observed lower drinks per drinking day (DDD) and greater PDA advantages for TSF-treated participants observed at the 36-month follow-up. The prior Cochrane Review contained eight studies with 3417 participants (Ferri 2006b), and found that on the whole, AA/TSF interventions were as effective, but not more effective, than the interventions to which they were compared. This new review is based on 27 studies reported in 36 articles and has a total of 10,565 participants. It is considerably larger, comprises more rigorous studies, and found that, compared to other active psychosocial interventions for AUD, AA/TSF interventions often produce greater abstinence, notably continuous abstinence, as well as some reductions in drinking intensity, fewer alcohol-related consequences, and lower alcohol addiction severity. This review also included economic analyses, which augments prior reviews and adds important information regarding the cost-benefits of providing AA/TSF in clinical settings." (85)

Item 23b. DISCUSSION: Discuss any limitations of the evidence included in the review.

"Study populations were young, and few studies measured longitudinal exposure. The included studies were often limited by selection bias, recall bias, small sample of marijuana-only smokers, reporting of outcomes on marijuana users and tobacco users combined, and inadequate follow-up for the development of cancer… Most studies poorly assessed exposure, and some studies did not report details on exposure, preventing meta-analysis for several outcomes." (86)

Example 2: In a review examining indicators associated with job morale among physicians and dentists in low-income and middle-income countries, the authors describe the limited applicability of the conclusions to people in low- and middle-income countries:

"…despite the use of a comprehensive search strategy, almost all included studies were from middle-income countries, possibly reflecting the shortage of resources for such studies in low-income countries. This means that our findings cannot be generalized to low-income countries. Also, relatively fewer findings were available from Africa, Southern Europe, and Central, Southern, and Southeastern Asia, which made it challenging to generalize conclusions about low and middle income countries." (87)

Item 23c. DISCUSSION: Discuss any limitations of the review processes used.

Example 1: In a review examining the effect of quarantine alone or in combination with other public health measures to control COVID-19, the authors report several limitations of the review processes used:

"Because of time constraints…we dually screened only 30% of the titles and abstracts; for the rest, we used single screening. A recent study showed that single abstract screening misses up to 13% of relevant studies (Gartlehner 2020). In addition, single review authors rated risk of bias, conducted data extraction and rated certainty of evidence. A second review author checked the plausibility of decisions and the correctness of data. Because these steps were not conducted dually and independently, we introduced some risk of error…Nevertheless, we are confident that none of these methodological limitations would change the overall conclusions of this review. Furthermore, we limited publications to English and Chinese languages. Because COVID-19 has become a rapidly evolving pandemic, we might have missed recent publications in languages of countries that have become heavily affected in the meantime (e.g. Italian or Spanish)." (88)

Example 2: In a review examining the effects of regular inhaled therapies for patients with stable chronic obstructive pulmonary disease, the authors report several limitations of the review processes used:

"We acknowledge several limitations…Although our network meta-analysis included all available randomized controlled trials, we could not conduct a subgroup analysis to identify a specific group of patients who could benefit from triple therapy more prominently…Because studies reporting information, such as eosinophil counts and chronic bronchitis, were fewer than expected, we could not generate a sufficient network for the sensitivity and meta-regression analyses. In addition, we did not evaluate the symptoms, use of rescue medication, quality of life, and lung function, which are other important outcomes." (89)

Example 3: In a review examining the effects of red and processed meat consumption on risk of cardiometabolic and cancer outcomes, the authors report several limitations of the review processes used:

"One of the primary limitations of our work is the heterogeneity of dietary patterns across studies. Although all patterns discriminated between participants with low and high intake of red and processed meat, other food and nutrient characteristics of dietary patterns and the quantity of red and processed meat consumed varied widely across studies. Moreover, the quantity of red and processed meat consumed differed across dietary patterns and studies. For example, one study compared 1.4 versus 3.5 servings of processed meat per week, whereas another compared 0.7 versus 4.9 servings per week. Such inconsistencies may have increased heterogeneity of meta-analyses and potentially reduced the magnitude of observed associations. Also, analyses of extreme categories of adherence may artificially inflate effect estimates and may not be indicative of effects observed at typical levels of adherence. Second, we were unable to analyze the data separately for red and processed meat because authors typically combined them or did not distinguish between them in primary studies." (90)

Item 23d. DISCUSSION: Discuss implications of the results for practice, policy, and future research.

Example 1: In a review examining the effects of bystander programs on the prevention of sexual assault among adolescents and college students, the authors discuss the implications for practice given the evidence of benefit observed:

"Implications for practice and policy: Findings from this review indicate that bystander programs have significant beneficial effects on bystander intervention behaviour. This provides important evidence of the effectiveness of mandated programs on college campuses. Additionally, the fact that our (preliminary) moderator analyses found program effects on bystander intervention to be similar for adolescents and college students suggests early implementation of bystander programs (i.e., in secondary schools with adolescents) may be warranted. Importantly, although we found that bystander programs had a significant beneficial effect on bystander intervention behaviour, we found no evidence that these programs had an effect on participants' sexual assault perpetration. Bystander programs may therefore be appropriate for targeting bystander behaviour, but may not be appropriate for targeting the behaviour of potential perpetrators. Additionally, effects of bystander programs on bystander intervention behaviour diminished by 6-month post-intervention. Thus, programs effects may be prolonged by the implementation of booster sessions conducted prior to 6 months post-intervention.

Implications for research: Findings from this review suggest there is a fairly strong body of research assessing the effects of bystander programs on attitudes and behaviors. However, there are a couple of important questions worth further exploration. First, according to one prominent logical model, bystander programs promote bystander intervention by fostering prerequisite knowledge and attitudes (Burn, 2009). Our meta-analysis provides inconsistent evidence of the effects of bystander programs on knowledge and attitudes, but promising evidence of short-term effects on bystander intervention. This casts uncertainty around the proposed relationship between knowledge/attitudes and bystander behavior. Although we were unable to assess these issues in the current review, this will be an important direction for future research. Our understanding of the causal mechanisms of program effects on bystander behavior would benefit from further analysis (e.g., path analysis mapping relationships between specific knowledge/attitude effects and bystander intervention). Second, bystander programs exhibit a great deal of content variability, most notably in framing sexual assault as a gendered or gender-neutral problem. That is, bystander programs tend to adopt one of two main approaches to addressing sexual assault: (a) presenting sexual assault as a gendered problem (overwhelmingly affecting women) or (b) presenting sexual assault as a gender-neutral problem (affecting women and men alike). Differential effects of these two types of programs remain largely unexamined. Our analysis indicated that (a) the sex of victims/perpetrators (i.e., portrayed in programs as gender neutral or male perpetrator and female victim) and (b) whether programs were implemented in mixed- or single-sex settings were not significant moderators of program effects on bystander intervention. However, these findings are limited to a single outcome and they should be considered preliminary, as they are based on a small sample (n = 11). Our understanding of the differential effects of gendered versus gender neutral programs would benefit from the design and implementation of high-quality primary studies that make direct comparisons between these two types of programs (e.g., RCTs comparing the effects of two active treatment arms that differ in their gendered approach). Finally, our systematic review and meta-analysis demonstrate the lack of global evidence concerning bystander program effectiveness. Our understanding of bystander programs' generalizability to non-US contexts would be greatly enhanced by high quality research conducted across the world." (81)

Example 2: In a review examining the effects of trauma-informed approaches in schools, the authors discuss the implications for practice given the lack of evidence of benefit:

"From this review, it seems like the most prudent thing for school leaders, policymakers, and school mental health professionals to do would be proceed with caution in their embrace of a trauma-informed approach as an overarching framework and conduct rigorous evaluation of this approach. We simply do not have the evidence (yet) to know if this works, and indeed, we do not know if using a trauma-informed approach could actually have unintended negative consequences for traumatized youth and school communities. We also do not have evidence of other potential costs in implementing this approach in schools, whether they be financial, academic, or other opportunity costs, and whether benefits outweigh the costs of implementing and maintaining this approach in schools. That said, calling for caution in adopting trauma-informed care in schools does not preclude schools from continuing to implement evidence-informed programs that target trauma symptoms in youth, or that they should simply wait for the research to provide unequivocal answers. The benefit of the trauma-informed approach being made freely available by SAMHSA and other policymakers is that these components can form the basis for a school (or school district) to begin to adapt and apply this approach in schools." (91)

Example 3: In a review examining the effects of lenvatinib and sorafenib for differentiated thyroid cancer, the authors list implications for future research in order of priority:

"In order of priority, the assessment group suggests the following further research priorities:

1. Clinical advice to the assessment group is that only radioactive iodine-refractory differentiated thyroid cancer patients experiencing symptoms, or those who have clinically significant progressive disease, are likely to be treated in routine clinical practice. Subgroup analyses suggest that the effects on progression-free survival are similar for patients treated with sorafenib regardless of whether they are symptomatic or asymptomatic. However, these findings are post hoc and include only a minority of symptomatic patients. It is unclear if other outcomes, such as overall survival, objective tumour response rate, adverse events and health-related quality of life, differ by symptomatic or asymptomatic disease. Future

Item 24b. REGISTRATION AND PROTOCOL: Indicate where the review protocol can be accessed, or state that a protocol was not prepared.

Example 1: In a review examining the effects of psychological interventions for fatigue in cancer survivors, the authors report that the protocol for the review is published and provide a citation for it:

"The review protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO) database (registration number: CRD42014015219) and the protocol has been published [citation for protocol provided]." (96)

Example 2: In a review examining psychotropic medication non-adherence and its associated factors among patients with major psychiatric disorders, the authors report that the protocol for the review is published and provide a citation for it:

"…this systematic review and meta-analysis protocol has been published elsewhere [citation for the protocol provided]." (97)

Item 24c. REGISTRATION AND PROTOCOL: Describe and explain any amendments to information provided at registration or in the protocol.

Example 1: In a review examining the effects of pharmacological interventions for the treatment of delirium in critically ill adults, the authors describe and explain several amendments to information provided in the protocol:

"Differences between protocol and review: In our protocol (Burry 2015), we planned the primary outcome to be duration of delirium, defined as the time from which it was first identified to when it was first resolved (i.e. screened negative as defined by study authors (e.g. first negative screen, two consecutive screenings)), measured in days, and our secondary outcome to be the total duration of delirium, measured in days. There was far more variability in the definition of the outcome used than we had anticipated. Only two trials reported on the duration of delirium's first episode, and the remaining trials reported days with delirium, time in delirium, or total duration of delirium; most did not report when delirium was identified or how trial authors defined resolution of delirium. We therefore chose to report the total duration of delirium as our primary outcome and to pool the variable definitions. We added the outcome number of days in coma, as this outcome was reported in four trials, and we believed it important to include it in this review, as it is a newer outcome that is likely to be included in subsequent studies." (98)

Example 2: In a review examining the effects of pharmacologic therapies on patients with idiopathic sudden sensorineural hearing loss, the authors describe and explain several amendments to information provided in the protocol:

"We incurred no deviations from the a priori review protocol, with the exception of a minor modification of our modelling approach for: 1) continuous endpoints at baseline and at final follow-up with corresponding standard deviations (but without average changes and corresponding standard deviations per group) in certain studies; and 2) follow-up time due to variations in endpoints assessment time across studies." (99)

Example 3: In a review examining the effects of deworming in non-pregnant adolescent girls and adult women, the authors describe and explain several amendments to information provided in the protocol:

…collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review.

Example 2: In a review examining the effects of self-management smartphone-based apps for post-traumatic stress disorder symptoms, the authors report that the data and analytic code are publicly available in the Open Science Framework repository and provide a DOI for readers to access the files:

"All data and code are stored on a repository of the Open Science Framework (doi: 10.17605/OSF.IO/DZJT7)" (110)

Example 3: In a review examining the effects of specialised treatments for anorexia nervosa, the authors report that the data and analytic code are publicly available in the Open Science Framework repository and provide a URL for readers to access the files:

"The dataset and script to perform the analyses are available at https://osf.io/q7v2d/?view_only=c3cdaf346298411eab9ed15e863c9f21." (111) 

Urinary Bladder, Overactive/

((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability or incontinen$) adj3 bladder$).ti,ab.

(OAB or OABS or IOAB or IOABS).ti,ab.

((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability) adj3 detrusor$).ti,ab.

Urination Disorders/

exp Urinary Incontinence/

Urinary Bladder Diseases/

(urin$ adj3 (incontinen$ or leak$ or urgen$ or frequen$)).ti,ab.

(urin$ adj3 (disorder$ or dysfunct$)).ti,ab.

(detrusor$ adj3 (hyperreflexia$ or hyper-reflexia$ or hypertoni$ or hyper-toni$)).ti,ab.

((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability or incontinen$) adj3 bladder$).ti,ab.

(OAB or OABS or IOAB or IOABS).ti,ab.

((overactiv$ or over-activ$ or hyperactiv$ or hyper-activ$ or unstable or instability) adj3 detrusor$).ti,ab.

(urin$ adj3 (incontinen$ or leak$ or urgen$ or frequen$)).ti,ab.

(urin$ adj3 (disorder$ or dysfunct$)).ti,ab.

detrusor dyssynergia/

(detrusor$ adj3 (hyperreflexia$ or hyper-reflexia$ or hypertoni$ or hyper-toni$)).ti,ab.

(mirabegron or betmiga$ or myrbetriq$ or betanis$ or YM-178 or YM178 or 223673-61-8 or "223673618" or MVR3JL3B2V).ti,ab,rn.

Differences from protocol: We modified the lower limit for age in our eligibility criteria from 12 years of age to 10 years of age because the age of adolescence was reduced. We used the WHO measures for severe anaemia, defined by haemoglobin levels < 80 g/L instead of < 70 g/L as stated in the protocol. We decided to add adverse events to our list of primary outcomes (instead of secondary) and we changed reinfection rate to a secondary outcome

Differences between protocol and review: • Title: The original title of the protocol was "Interventions for pruritus of unknown cause"; it was changed to "Interventions for chronic pruritus of unknown origin" as this is currently the most familiar and widely used term among clinicians

When we found studies with a subset of patients with a diagnosis of CPUO, we included them if data are presented separately for these patients, or if the majority (> 50%) of the included participants met the inclusion criteria. If data were not available for this subset of participants

aprepitant" as a systemic intervention in representation of the pharmacological group "substance P and neurokinin 1 receptor (NK1R) antagonist"; therefore for the report of the review, we changed this in the inclusion criteria. We found no studies evaluating the prioritised comparisons: emollient creams, cooling lotions, topical corticosteroids, topical antidepressants, systemic antihistamines, systemic antidepressants, systemic anticonvulsants, and phototherapy. Therefore

• Search methods: Due to the large number of excluded studies (n=67), we did not screen the bibliographies of excluded studies for further references to relevant reviews

Risk of bias assessment: We have updated the 'risk of bias' methods with the new tool ROB 2.0 in line with guidance from the new version of the Cochrane Handbook for Systematic Reviews of Interventions, and based on the protocol "Therapeutic interventions for alcohol dependence in non-inpatient settings: a systematic review and network metaanalysis

Because this review included only one study, we could not perform any meta-analyses, and hence could not assess publication bias nor perform sensitivity analysis or subgroup analyses. We did not impute missing data because we considered missing data to be minimal

Comparison of the Therapeutic Effects of Rivaroxaban Versus Warfarin in Antiphospholipid Syndrome: A Systematic Review. Archives of Rheumatology

Repetitive Transcranial Magnetic Stimulation for the Treatment of Lower Limb Dysfunction in Patients Poststroke: A Systematic Review with Meta-Analysis. Journal of Stroke and Cerebrovascular Diseases: The Official Journal of National Stroke Association

Efficacy, tolerability and safety of cannabis-based medicines for cancer pain: A systematic review with meta-analysis of randomised controlled trials

Does Routine Anti-Osteoporosis Medication Lower the Risk of Fractures in Male Subjects? An Updated Systematic Review With Meta-Analysis of Clinical Trials

Psychological interventions for common mental disorders in women experiencing intimate partner violence in low-income and middle-income countries: a systematic review and meta-analysis. The Lancet Psychiatry

Efficacy of cognitive bias modification interventions in anxiety and depressive disorders: a systematic review and network meta-analysis. The Lancet Psychiatry

Effectiveness of interventions targeting antibiotic use in long-term aged care facilities: a systematic review and meta-analysis

Effect of dose and duration of reduction in dietary sodium on blood pressure levels: systematic review and meta-analysis of randomised trials

Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis

Mass deworming for improving health and cognition of children in endemic helminth areas: A systematic review and individual participant data network meta-analysis

Surgical treatments for women with stress urinary incontinence: the ESTER systematic review and economic evaluation

Effectiveness of dietary inorganic nitrate for lowering blood pressure in hypertensive adults: a systematic review. JBI Database of Systematic Reviews and Implementation Reports

Down-titration and discontinuation strategies of tumour necrosis factor-blocking agents for rheumatoid arthritis in patients with low disease activity

Preventive Services Task Force Evidence Syntheses, formerly Systematic Evidence Reviews. Pre-Exposure Prophylaxis for the Prevention of HIV Infection: A Systematic Review for the US Preventive Services Task Force

Impact of mobile health (mHealth) interventions during the perinatal period for mothers in low- and middle-income countries: a systematic review. JBI Database of Systematic Reviews and Implementation Reports

Screening for esophageal adenocarcinoma and precancerous conditions (dysplasia and Barrett's esophagus) in patients with chronic gastroesophageal reflux disease with or without other risk factors: two systematic reviews and one overview of reviews to inform a guideline of the Canadian Task Force on Preventive Health Care (CTFPHC). Systematic Reviews

Family therapy approaches for anorexia nervosa

Perioperative interventions for prevention of postoperative pulmonary complications: systematic review and meta-analysis

AHRQ Comparative Effectiveness Reviews. Pharmacologic and Nonpharmacologic Therapies in Adult Patients With Exacerbation of COPD: A Systematic Review

Altering the availability or proximity of food, alcohol, and tobacco products to change their selection and consumption

Educational outcomes of children in contact with social care in England: a systematic review

Environmental interventions to reduce the consumption of sugar-sweetened beverages and their effects on health

HPV vaccination and Native Americans: protocol for a systematic review of factors associated with HPV vaccine uptake among American Indians and Alaska Natives in the USA

Comparative assessment of onabotulinumtoxinA and mirabegron for overactive bladder: an indirect treatment comparison

Key components of shared decision making models: a systematic review

Long-term effects of alcohol consumption on cognitive function: a systematic review and dose-response analysis of evidence published between

Pharmacological interventions for promoting smoking cessation during pregnancy

Comparative Efficacy of Interventions for Aggressive and Agitated Behaviors in Dementia: A Systematic Review and Network Meta-analysis

Pharmacological, psychological, and non-invasive brain stimulation interventions for treating depression after stroke

Caregiver involvement in interventions for improving children's dietary intake and physical activity behaviors

Red light camera interventions for reducing traffic violations and traffic crashes: A systematic review

Psychological interventions to foster resilience in healthcare students

Pars plana vitrectomy combined with scleral buckle versus pars plana vitrectomy for giant retinal tear

Metformin for endometrial hyperplasia

Effectiveness of interventions to reduce homelessness: a systematic review and meta-analysis

Food fortification with multiple micronutrients: impact on health outcomes in general population

Interventions to reduce ambient particulate matter air pollution and their effect on health

The effects of cash transfers and vouchers on the use and quality of maternity care services: A systematic review

What effect does functional appliance treatment have on the temporomandibular joint? A systematic review with meta-analysis

Individual-level behavioural smoking cessation interventions tailored for disadvantaged socioeconomic position: a systematic review and meta-regression. The Lancet Public Health

Evidence of clinical efficacy of homeopathy. A meta-analysis of clinical trials

Intensive LDL cholesterol-lowering treatment beyond current recommendations for the prevention of major vascular events: a systematic review and meta-analysis of randomised trials including 327 037 participants

Vitamin D supplementation during pregnancy: state of the evidence from a systematic review of randomised trials

Omega-3, omega-6, and total dietary polyunsaturated fat for prevention and treatment of type 2 diabetes mellitus: systematic review and meta-analysis of randomised controlled trials

Surgery for rotator cuff tears

Treatments for seizures in catamenial (menstrual-related) epilepsy

Quality of dietary fat and genetic risk of type 2 diabetes: individual participant data meta-analysis

Association of Testosterone Treatment With Alleviation of Depressive Symptoms in Men: A Systematic Review and Meta-analysis

Implantable cardiac defibrillators for people with non-ischaemic cardiomyopathy

Pharmacotherapy for the Treatment of Cannabis Use Disorder: A Systematic Review

Text message reminders for improving sun protection habits: A systematic review

Pharmacological interventions for heart failure in people with chronic kidney disease

Impact of non-menthol flavours in e-cigarettes on perceptions and use: an updated systematic review

Cherry-picking by trialists and meta-analysts can drive conclusions about intervention efficacy

Drug-eluting balloon angioplasty versus uncoated balloon angioplasty for the treatment of in-stent restenosis of the femoropopliteal arteries

Effect of organised cervical cancer screening on cervical cancer mortality in Europe: a systematic review

Aspirin and fracture risk: a systematic review and exploratory meta-analysis of observational studies

Antenatal corticosteroids for maturity of term or near term fetuses: systematic review and meta-analysis of randomized controlled trials

Interventions to facilitate shared decision making to address antibiotic use for acute respiratory infections in primary care

Systematic review and meta-analyses of intensity-modulated radiation therapy versus conventional two-dimensional and/or or three-dimensional radiotherapy in curative-intent management of head and neck squamous cell carcinoma

Quadruple versus triple combination antiretroviral therapies for treatment naive people with HIV: systematic review and meta-analysis of randomised controlled trials

Blood pressure lowering efficacy of renin inhibitors for primary hypertension

AHRQ Comparative Effectiveness Reviews. Breastfeeding Programs and Policies, Breastfeeding Uptake, and Maternal Health Outcomes in Developed Countries

AHRQ Comparative Effectiveness Reviews. Stroke Prevention in Patients With Atrial Fibrillation: A Systematic Review Update

AHRQ Comparative Effectiveness Reviews. Antipsychotics for the Prevention and Treatment of Delirium

Agency for Healthcare Research and Quality (US)

Aspirin for primary prevention of cardiovascular disease: a meta-analysis with a particular focus on subgroups

Exercise programmes for ankylosing spondylitis

Workplace interventions for reducing sitting at work

Association Between Administration of Systemic Corticosteroids and Mortality Among Critically Ill Patients With COVID-19: A Meta-analysis

The effectiveness of community-based coordinating interventions in dementia care: a meta-analysis and subgroup analysis of intervention components

Sustainability effects of motor control stabilisation exercises on pain and function in chronic nonspecific low back pain patients: A systematic review with meta-analysis and meta-regression

Effects of cannabinoid administration for pain: A meta-analysis and meta-regression

Operative treatment versus nonoperative treatment of Achilles tendon ruptures: systematic review and meta-analysis

Latrepirdine for Alzheimer's disease. The Cochrane Database of Systematic Reviews

Pharmacotherapy for social anxiety disorder (SAnD)

Effects of bystander programs on the prevention of sexual assault among adolescents and college students: A systematic review

Impact of polyunsaturated fatty acids on patient-important outcomes in children and adolescents with autism spectrum disorder: a systematic review

What works to improve early grade literacy in Income Countries: A Systematic Review and Meta-analysis

Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review. The Cochrane Database of Systematic Reviews

Comparisons of exacerbations and mortality among regular inhaled therapies for patients with stable chronic obstructive pulmonary disease: Systematic review and Bayesian network meta-analysis

Patterns of Red and Processed Meat Consumption and Risk for Cardiometabolic and Cancer Outcomes: A Systematic Review and Meta-analysis of Cohort Studies

Effects of trauma-informed approaches in schools: A systematic review

Lenvatinib and sorafenib for differentiated thyroid cancer after radioactive iodine: a systematic review and economic evaluation

Applicability of augmented reality in orthopedic surgery - A systematic review

Dose-response relationship between exercise and cognitive function in older adults with and without cognitive impairment: A systematic review and meta-analysis

A systematic review and meta-analysis: the effect of feedback on satisfaction with the outcome of task performance

The effectiveness of psychological interventions for fatigue in cancer survivors: systematic review of randomised controlled trials. Systematic Reviews

Psychotropic medication non-adherence and its associated factors among patients with major psychiatric disorders: a systematic review and meta-analysis. Systematic Reviews

Pharmacological interventions for the treatment of delirium in critically ill adults

A systematic review and network meta-analysis of existing pharmacologic therapies in patients with idiopathic sudden sensorineural hearing loss

Deworming in non-pregnant adolescent girls and adult women: a systematic review and meta-analysis

Interventions for chronic pruritus of unknown origin

Screening for Hepatitis C Virus Infection in Adolescents and Adults: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force

Switching from a gonadotropin-releasing hormone (GnRH) agonist to a GnRH antagonist in prostate cancer patients: A systematic review and meta-analysis

Mobile health applications for improving the sexual health outcomes among adults with chronic diseases: A systematic review

Percutaneous vertebroplasty for osteoporotic vertebral compression fracture

Psychiatric morbidity and suicidal behaviour in low- and middle-income countries: A systematic review and meta-analysis

Traumatic brain injury in homeless and marginally housed individuals: a systematic review and meta-analysis

The protective effect of alcohol consumption on the incidence of cardiovascular diseases: is it real? A systematic review and meta-analysis of studies conducted in community settings

How Much Does Education Improve Intelligence? A Meta-Analysis

Efficacy of Self-Management Smartphone-Based Apps for Post-traumatic Stress Disorder Symptoms: A Systematic Review and Meta-Analysis

Treatment outcomes for anorexia nervosa: a systematic review and meta-analysis of randomized controlled trials

studies of patients should aim to include a greater proportion of patients with symptomatic disease and investigate possible differences. Consideration should be given to using the classification of patients as symptomatic or asymptomatic as a randomisation stratification factor. 2. It would be useful to record, and report, health-related quality of life outcomes from any future clinical study of lenvatinib and sorafenib. In particular, data should be collected, using the EQ-5D questionnaire, throughout the whole trial period, not only from patients whose disease has not progressed. Further research on health-related quality of life from treating patients who have symptomatic disease compared with those who do not is also required. 3. Currently, evidence does not allow a comparison of the effectiveness of treatment with lenvatinib with the effectiveness of treatment with sorafenib. A head-to-head trial considering these treatments and placebo would generate results that would be valuable to decision-makers. 4. It would be useful to explore how lenvatinib, sorafenib and best supportive care should be positioned in the treatment pathway." (92)

Item 24a. REGISTRATION AND PROTOCOL: Provide registration information for the review, including register name and registration number, or state that the review was not registered.

Example 1: In a review examining the applicability of augmented reality in orthopaedic surgery, the authors report that the review was registered, specifying the register name (PROSPERO) and registration number:

"…this systematic review has been registered in the international prospective register of systematic reviews (PROSPERO) under the registration number: CRD42019128569" (93)

Example 2: In a review examining the dose-response relationship between exercise and cognitive function in older adults with and without cognitive impairment, the authors report that the review was registered, specifying the register name (Open Science Framework) and a URL to the register entry:

"The current protocol is registered with the Open Science Framework (url: https://osf.io/qe43p/)" (94)

Example 3: In a review examining the effects of routine anti-osteoporosis medication in men, the authors report that the review was not registered:

"A protocol for this systematic review was developed before the research began; however, this review was not registered" (4)

Example 4: In a review examining the effect of feedback on satisfaction with the outcome of task performance, the authors report that the review was not registered and indicate the potential limitations of not doing so:

"The present meta-analysis was not registered online while it was in the planning stage. This of course increases the probability of an unplanned duplication, and does not allow a verification that review methods were carried out as planned." (95)

Item 25. SUPPORT: Describe sources of financial or non-financial support for the review, and the role of the funders or sponsors in the review.

"Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: NK chairs and contributes to a number of guidelines for self-harm and suicidal behaviour and sits on the main government advisory group for suicide prevention in England. NK and DK also advised the Sri Lankan Ministry of Health on their suicide prevention strategy. NK receives research funding from government and charity sources. NK does not receive industry funding or personal remuneration." (106)

Example 3: In a review examining the prevalence of traumatic brain injury in homeless and marginally housed individuals, the authors declare several competing interests:

"Declaration of interests: NDS has sat on the paid advisory board of Highmark Interactive, received consulting or speaking fees from WorkSafeBC and Yukon WCB, the National Hockey League, and Major League Soccer, and has received fees for expert testimony in neuropsychology. WGH has received consulting fees or sat on paid advisory boards for the Canadian Agency for Drugs and Technology in Health, AlphaSights, Guidepoint, In Silico, Translational Life Sciences, Otsuka, Lundbeck, and Newron. WJP is the founder and chief executive officer of Translational Life Sciences, an early stage biotechnology company. He is also on the scientific advisory board of Medipure Pharmaceuticals and Vitality Biopharma, and in the past has been on the board of directors for Abattis Bioceuticals and on the advisory board for Vinergy Resources; these companies are early stage biotechnology enterprises with no relation to brain injury. All other authors declare no competing interests." (107)

Example 4: In a review examining the effects of alcohol consumption on the incidence of cardiovascular diseases, the authors declare having no competing interests:

"Competing interest: The authors declare that they have no competing interests" (108)

Item 27. AVAILABILITY OF DATA, CODE AND OTHER MATERIALS: Report which of the following are publicly available and where they can be found: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review.

Example 1: In a review examining the effects of education on intelligence, the authors report that the origin of each data point is described in a spreadsheet made publicly available in the Open Science Framework repository, and provide a URL for readers to access the file (the repository also included datasets and analytic code for the review):

"All meta-analytic data and all codebooks and analysis scripts (for Mplus and R) are publicly available at the study's associated page on the Open Science Framework (https://osf.io/r8a24/)... The precise sources (table, section, or paragraph) for each estimate are described in notes in the master data spreadsheet, available on the Open Science Framework page for this study (https://osf.io/r8a24/)" (109)