title: Effectiveness of training interventions to improve quality of medical certification of cause of death: systematic review and meta-analysis
authors: Gamage, U. S. H.; Mahesh, Pasyodun Koralage Buddhika; Schnall, Jesse; Mikkelsen, Lene; Hart, John D.; Chowdhury, Hafiz; Li, Hang; McLaughlin, Deirdre; Lopez, Alan D.
journal: BMC Med
date: 2020-12-11
DOI: 10.1186/s12916-020-01840-2

BACKGROUND: Valid cause of death data are essential for health policy formation. The quality of medical certification of cause of death (MCCOD) by physicians directly affects the utility of cause of death data for public policy and hospital management. Whilst training in correct certification has been provided for physicians and medical students, the impact of training is often unknown. This study was conducted to systematically review and meta-analyse the effectiveness of training interventions to improve the quality of MCCOD.

METHODS: This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; Registration ID: CRD42020172547) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The CENTRAL, Ovid MEDLINE and Ovid EMBASE databases were searched using pre-defined search strategies covering the eligibility criteria. Studies were selected with four screening questions using the DistillerSR software. Risk of bias was assessed with the GRADE recommendations for randomised studies and the ROBINS-I criteria for non-randomised studies. Study selection, data extraction and bias assessments were performed independently by two reviewers, with a third reviewer resolving conflicts. Clinical, methodological and statistical heterogeneity assessments were conducted.
Meta-analyses were performed with Review Manager 5.4 software using the 'generic inverse variance method', with risk difference as the pooled estimate. A 'summary of findings' table was prepared using the 'GRADEproGDT' online tool. Sensitivity analyses and a narrative synthesis of the findings were also performed.

RESULTS: After de-duplication and restriction to studies published from 1994 onwards, 616 articles were identified, and 21 were subsequently selected for synthesis of findings; four interventions underwent meta-analysis. The meta-analyses indicated that selected training interventions significantly reduced error rates among participants, with pooled risk differences of 15–33%. The sensitivity analyses confirmed the robustness of these estimates. The findings of the narrative synthesis similarly suggested favourable outcomes for both physicians and medical trainees.

CONCLUSIONS: Training physicians in correct certification improves the accuracy and policy utility of cause of death data. Investment in MCCOD training activities should be considered a key component of strategies to improve vital registration systems, given the potential of such training to substantially improve the quality of cause of death data.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at https://doi.org/10.1186/s12916-020-01840-2.

The Medical Certificate of Cause of Death (Fig. 1) is a standardised form recommended by the WHO for international use and has been adopted by most WHO member states [6]. The WHO also provides instructions on correct cause of death reporting to improve the quality of medical certification and of the resulting data [7]. When a single cause of death is reported on the death certificate, it becomes the underlying cause of death used for tabulation. When more than one cause is reported, the disease or injury that initiated the sequence of events leading to death becomes the underlying cause of death [6].
Despite the availability of guidance, errors in cause of death certification have been observed in all geographical regions, with inadequate certification by doctors remaining the principal reason for inaccurate death data [8, 9]. Over the past few decades, therefore, training medical doctors in death certification has become a key intervention employed by health services and national governments to improve mortality statistics. Interventions have included improved death certificate formats, training programmes on completing death certificates, self-learning educational materials, cause of death query systems, periodic peer auditing of death certificates and increased autopsy rates [10-12]. Several studies have investigated the effectiveness of interventions to improve the quality of death certification [13-15]. Whilst improvements in death certification accuracy are often reported, negative findings have also been published [16]. Moreover, few randomised controlled trials (RCTs) or similar studies have produced high-quality evidence. A 2010 literature review identified 129 studies on the effectiveness of educational interventions for death certification and ultimately reviewed 14, including three RCTs [8]. All educational interventions identified in that review improved certain aspects of death certification, although the statistical significance of the evaluation results varied with the type of intervention. Given the absence of any systematic review and meta-analysis of death certification training interventions, the increase in experimental data produced in the past decade, and the need, made even more urgent by the COVID-19 pandemic, to strengthen national vital registration and cause of death data systems, further evaluation is essential.
In this study, we systematically review and meta-analyse the effectiveness of training interventions for improving the quality of medical certification of cause of death (MCCOD). To our knowledge, no previous systematic review has specifically investigated interventions intended to reduce errors in MCCOD.

This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO; Registration ID: CRD42020172547). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed throughout the review process [17]. A comprehensive literature search was conducted to identify published articles investigating the effectiveness of training and education interventions to improve death certification (Additional file 1: Fig. S1). The search of the CENTRAL, Ovid MEDLINE and Ovid EMBASE electronic databases returned 1060 results, which were exported to the EndNote X9 citation manager and de-duplicated. The remaining 676 studies were then limited to those published from 1994 onwards (the year ICD-10 was implemented), leaving 616 studies for screening.

This study aimed to assess the effectiveness of training interventions for current and prospective physicians in improving the quality of MCCOD, compared with generic academic training in the curricula (in randomised studies) or with pre-intervention quality parameters (in non-randomised studies) [8]. Two reviewers (BPK and JS) independently assessed each study against the inclusion and exclusion criteria (Additional file 2: Fig. S2). Studies were screened by title and abstract using the DistillerSR online screening software. The full texts of 44 records were then reviewed, together with eight additional records identified from the study reference lists. All disputes were resolved by an expert third reviewer (LM). Reviewers were blinded to each other's decisions.
A total of 21 studies were included for data extraction and final analysis (Fig. 2). One reviewer (BPK) extracted data from the selected studies, and the findings were then reviewed by a second reviewer (JS). Disputes were resolved independently by the third reviewer (LM). Selected studies were categorised as 'randomised' or 'non-randomised', and risk of bias was assessed by two reviewers (BPK and JS), with disputes resolved by the third reviewer (LM). Randomised trials were assessed using the seven domains of the GRADE recommendations, and non-randomised studies using the seven domains of the ROBINS-I criteria [18, 19]. All studies were initially assessed for clinical and methodological heterogeneity [20]. Four interventions were eligible for meta-analysis in relation to five outcomes. As these were before-and-after studies without control groups, the 'generic inverse variance method' was used for pooling [21]. Review Manager 5.4 software was used for the meta-analysis, and the effect measure was the 'risk difference' (i.e. the difference between the pre- and post-intervention percentages of death certificates with each error). Statistical heterogeneity was assessed using the I² statistic and the chi-square test. When potential outliers were removed to address statistical heterogeneity, sensitivity analyses were performed with and without the excluded studies [22]. The robustness of the effect measures was explored further using sensitivity analyses under both fixed-effect and random-effects assumptions [22]. Potential publication bias was explored with funnel plots. The meta-analysis findings were imported into the 'GRADEproGDT' online tool, where a 'summary of findings' table was prepared and related narrative comments were added [23]. Certainty was assessed starting from the study design and applying eight criteria: risk of bias, potential publication bias, imprecision, inconsistency, indirectness, magnitude of effect, dose-response gradient and effect of plausible confounders [24].
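To make the pooling step concrete, the following Python sketch shows how a generic inverse variance meta-analysis of risk differences can be computed by hand, together with the chi-square (Cochran's Q) and I² heterogeneity statistics. It is an illustrative reimplementation, not the Review Manager 5.4 code used in the review. The function names and the study counts in the usage section are hypothetical, and the sketch assumes that pre- and post-intervention certificates are independent samples, so each study's standard error is the usual unpaired one.

```python
import math


def risk_difference(errors_pre, n_pre, errors_post, n_post):
    """Risk difference (pre minus post) and its standard error for one study.

    A positive value means fewer erroneous certificates after training.
    Assumes the pre- and post-intervention certificate samples are
    independent; paired data would need a different standard error.
    """
    p_pre = errors_pre / n_pre
    p_post = errors_post / n_post
    rd = p_pre - p_post
    se = math.sqrt(p_pre * (1 - p_pre) / n_pre + p_post * (1 - p_post) / n_post)
    return rd, se


def pool_fixed(estimates):
    """Generic inverse variance pooling of (estimate, se) pairs (fixed effect).

    Returns the pooled estimate, its 95% CI, Cochran's Q (the chi-square
    heterogeneity statistic, df = k - 1) and I^2 as a proportion.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1 / total)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, q, i2


if __name__ == "__main__":
    # Hypothetical (errors, certificates) counts, NOT data from this review.
    studies = [
        risk_difference(40, 200, 10, 200),
        risk_difference(60, 300, 30, 300),
        risk_difference(25, 150, 12, 150),
    ]
    pooled, ci, q, i2 = pool_fixed(studies)
    print(f"RD = {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f}), "
          f"Q = {q:.2f}, I2 = {100 * i2:.0f}%")
```

Each study enters only through its estimate and standard error, which is what allows before-after comparisons without control groups to be pooled with this method.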
Studies or subgroups that were not included in the meta-analysis were included in a narrative synthesis of findings.

Within the 21 selected articles [13-15, 25-42], there were 24 distinct interventions, with one article describing four interventions across four countries [30]. In another, findings were stratified by two study populations [27]. Three were randomised controlled trials [13, 35, 37] and 21 were non-randomised interventions. Amongst the latter, one was a non-randomised controlled study [31], whilst the remainder were uncontrolled before-after studies. Characteristics of the selected studies are shown in Table 1.

In seven interventions, the study populations consisted of medical students [14, 15, 27, 29, 35, 39, 41]. These comprised first-year students (UK) [35], medical trainees in teaching hospitals (Spain) [41], third-year students (USA) [14] and final-year students (Fiji and Spain) [15, 29]. Most study populations, however, consisted of physicians, variously described as residents (Canada, USA, India) [13, 28, 34, 36], medical interns (South Africa, Spain) [37, 39], postgraduates (USA, India) [31, 36, 40], secondary healthcare physicians (Bahrain) [26], family doctors (Spain, Canada) [27, 33, 39] or Senior House Officers (England) [38].

Seminars, interactive workshops, teaching programmes and training sessions were the terms most commonly used to describe the interventions. These ranged in duration from 45 min [13] to 5 h [27], and some interventions included follow-up sessions on additional days [36]. Other formats included 'training of trainers' (Philippines, Myanmar, Sri Lanka) [30], a video (UK) [35] and web-based or online training (USA, Fiji) [14, 15, 31]. In Peru, training complemented an online death certification system [32].
For the majority of interventions, a comparison of certification errors before and after the intervention was used as the measure of impact, although some studies developed a special knowledge test or used a quality index. These included the Mid-America Heart Institute (MAHI) Death Certificate Scoring System (two interventions) [13, 14], knowledge assessment tests developed by the investigators (three interventions) [31, 35, 37], and quality indexes providing numerical scores based on the best-practice certification guidelines in ICD Volume 2 [15].

The risk of bias assessments for the randomised studies [13, 35, 37] are shown in Fig. 3a, and those for the non-randomised studies in Fig. 3b. For all randomised studies, 'blinding of participants and personnel' was assessed as high-risk, given the difficulty of maintaining blinding in training interventions. All three studies had pre-determined outcomes and were rated low-risk for 'selective reporting'. All but one of the non-randomised studies were before-after studies without a separate control group. Owing to the method of recruitment, none of these studies was rated low-risk for confounding and selection bias. However, since the intervention periods were clearly defined, all were rated low-risk for 'bias in classification of interventions'.

Since the interventions targeting medical students were found to be clinically heterogeneous, potential meta-analyses were restricted to those targeting physicians. In anticipation of substantial methodological heterogeneity, the meta-analysis was planned separately for non-randomised studies. Findings of the studies and subgroups initially entered into the meta-analysis are summarised in Additional file 3: Tables S1-S5. As the initial meta-analyses showed statistical heterogeneity, sensitivity analyses were performed after excluding a potential outlier in each comparison, under both fixed-effect and random-effects assumptions (Table 2).
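The outlier-based sensitivity analyses can be sketched in the same way: each study is omitted in turn, the remaining studies are re-pooled under fixed-effect and random-effects assumptions, and each result is checked for whether its 95% confidence interval still excludes zero. This Python sketch uses the DerSimonian-Laird moment estimator of the between-study variance, a common choice for random-effects inverse variance meta-analysis; that it matches the review's exact software settings is an assumption. The function names and input pairs are hypothetical.

```python
import math


def pool(estimates, random_effects=False):
    """Inverse variance pooling of (estimate, se) pairs.

    Fixed effect: weights 1/se^2. Random effects: the between-study variance
    tau^2 is estimated with the DerSimonian-Laird moment estimator and added
    to each study's variance before re-weighting.
    """
    w = [1 / se ** 2 for _, se in estimates]
    fixed = sum(wi * y for wi, (y, _) in zip(w, estimates)) / sum(w)
    tau2 = 0.0
    if random_effects and len(estimates) > 1:
        q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, estimates))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(wi * y for wi, (y, _) in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled), tau2


def significant(ci):
    """True if the 95% confidence interval excludes zero."""
    lo, hi = ci
    return lo > 0 or hi < 0


def leave_one_out(estimates):
    """Re-pool with each study omitted in turn, under both assumptions."""
    rows = []
    for i in range(len(estimates)):
        subset = estimates[:i] + estimates[i + 1:]
        fe_est, fe_ci, _ = pool(subset)
        re_est, re_ci, _ = pool(subset, random_effects=True)
        rows.append((i, fe_est, significant(fe_ci), re_est, significant(re_ci)))
    return rows


if __name__ == "__main__":
    # Hypothetical (risk difference, SE) pairs, NOT data from this review;
    # the third pair plays the role of a potential outlier.
    data = [(0.15, 0.03), (0.18, 0.035), (0.40, 0.04)]
    for row in leave_one_out(data):
        print("omitted study %d: FE %.3f (sig=%s), RE %.3f (sig=%s)" % row)
```

A robust result in this sense is one where the sign of the pooled estimate and the `significant` flag stay the same across rows and across both assumptions.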
Except for 'ill-defined underlying cause of death' [43], the direction and significance of the estimates did not change in these sensitivity analyses. The forest plots of the five outcomes included in the meta-analyses (after excluding the outliers) are shown in Fig. 4a-e. Three interventions were included in each meta-analysis [30]. The lowest pooled risk difference (15%) was observed for 'multiple causes per line' and 'ill-defined underlying cause of death', whereas the highest was for 'no disease time interval' (33%). Funnel plots exploring potential publication bias are shown in Fig. 5a-e. All funnel plots were generally symmetrical; a cautious interpretation of these is given in the Discussion. The certainty assessments for these five outcomes are presented in the 'summary of findings' table (Table 3). 'Moderate certainty' was assigned to four outcomes and 'low certainty' to one. Findings of related additional studies are also summarised as comments in Table 3.

In two of the three randomised studies, which were conducted on doctors (residents [13] and medical interns [37]), overall scores improved with the intervention (p < 0.05). In the third study, which was conducted on medical students, there was weak evidence of improvement in the overall performance score (p = 0.046) and in a 'skill score' (p = 0.066) [35]. In one study, 'correct identification of the COD' improved more in the intervention group (15% to 91%) than in the control group (16% to 55%), and 'erroneous identification of cardiac deaths' decreased more with the intervention (56% to 6%) than in the controls (64% to 43%) [13]. In a South African study, three errors ('mechanism only', 'improper sequence' and 'absence of time interval') were significantly reduced in the intervention group only, whereas 'competing causes' and 'abbreviations' were reduced in both groups [37].

Non-randomised study findings on medical students

Degani et al. (2009) showed improvements in the modified MAHI score following the intervention (mean difference of 7.1; p < 0.0001) [14]. Vilar and Perez (2007) reported improvements in 'at least one error' (p < 0.0001), including 'mechanism of death only', 'improper sequence', 'listing cause of death in Part 2' and 'mechanism as UCOD' (all p < 0.0001) [41]. In the same study, two error types ('abbreviations' and 'listing two causally related causes as COD') did not show evidence of improvement (p = 0.413 and p = 0.290) [41]. In a Fijian study, training produced improvements ranging from 1.67% to 19.4% in the 'quality index score', 'average error rate', 'abbreviations', 'sequence', 'one cause per line', 'not reporting a mode of death' and 'legibility' [15]. In two Spanish studies, the intervention improved performance in 'sequence', 'cause of death', 'precision of terms', 'abbreviations' and 'legibility' [29, 39]. Case-wise comparisons across a set of errors were conducted in two studies [25, 27]; most errors decreased following the intervention. In one non-randomised controlled study, a custom performance score increased after the intervention [31]. One study in England explored 'mentioning the consultant's name' and 'completion by a non-involved doctor', both of which improved following the intervention [38]. In a Canadian study, the use of specific diseases as the UCOD increased in the intervention group, as did participants' knowledge that conditions such as 'old age' should not be used [33]. 'Competing causes' were less common post-intervention in two Indian studies, with varying strength of evidence (p = 0.001 and p = 0.069) [28, 36], but not in a Canadian study (p = 0.81) [34]. 'Mechanism of death followed by a legitimate UCOD' showed non-significant reductions in three studies (45.9% to 36.1%, 13.5% to 7.8% and 16% to 6.6%) [28, 34, 36].
Other studies that assessed 'presence of at least one major error' and 'keeping blank lines' in the sequence generally showed a reduction following the intervention [30, 34].

We conducted a systematic review of the impact of 24 selected interventions to improve the quality of MCCOD. Our meta-analysis suggests that selected training interventions significantly reduced error rates amongst participants, with moderate certainty (four outcomes) and low certainty (one outcome). Similarly, the findings of the narrative synthesis suggest a positive impact on both physicians and medical trainees. These findings highlight the feasibility and importance of strengthening the training of current and prospective physicians in correct MCCOD, which will in turn increase the quality and policy utility of the data routinely produced by countries' vital statistics systems.

The systematic approach we followed distinguishes this study from the more common narrative reviews, whilst the meta-analysis provides pooled and more precise estimates of training impact [44]. Rigorous heterogeneity and 'certainty of evidence' assessments were performed. To enable a better comparison of the quality of the selected studies, risk of bias was assessed using different criteria for randomised and non-randomised studies [18, 19]. Because conventional direct comparison methods for before-after studies are controversial in the literature, owing to the non-independence of the compared measurements [45], the less controversial 'generic inverse variance method' was used in this review. Irrespective of study design (randomised or not) and population (physicians or medical students), training interventions were shown to reduce diagnostic errors, as reflected either in lower error rates or in higher scores on scaled instruments. Risk differences were used as pooled effect measures and typically suggested that certification errors decreased by between 15 and 33 percentage points as a result of the training.
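Because a risk difference is an absolute measure, the same value can correspond to very different relative changes depending on the baseline error rate, which is one reason to weigh the direction of the pooled effect more heavily than its size. The Python sketch below illustrates this with hypothetical baselines. It also shows the standard GRADE calculation described in the footnote to the 'summary of findings' table, in which the intervention-group risk is derived from an assumed comparison-group risk and a relative effect. Function names and numbers are illustrative only.

```python
def absolute_and_relative(p_pre, p_post):
    """Return (risk difference, risk ratio, relative risk reduction).

    p_pre and p_post are hypothetical proportions of certificates with a
    given error before and after training.
    """
    rd = p_pre - p_post          # absolute reduction (risk difference)
    rr = p_post / p_pre          # risk ratio, post vs pre
    return rd, rr, 1 - rr        # 1 - RR is the relative risk reduction


def grade_absolute_effect(assumed_risk, relative_risk):
    """GRADE-style absolute effect: the risk with the intervention is the
    assumed comparison-group risk multiplied by the relative effect."""
    risk_with_intervention = assumed_risk * relative_risk
    return risk_with_intervention, assumed_risk - risk_with_intervention


if __name__ == "__main__":
    # Two hypothetical baselines with the same 15-point risk difference.
    for p_pre, p_post in [(0.20, 0.05), (0.50, 0.35)]:
        rd, rr, rrr = absolute_and_relative(p_pre, p_post)
        print(f"baseline {p_pre:.2f}: RD {rd:.2f}, RRR {rrr:.2f}")
    # prints "baseline 0.20: RD 0.15, RRR 0.75"
    # then   "baseline 0.50: RD 0.15, RRR 0.30"
```

In the first case training removes three quarters of the errors and in the second fewer than a third, although both risk differences are 15 percentage points.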
Our findings also suggest that refresher training and regular dissemination of MCCOD quality assessment findings can further reduce diagnostic errors. However, given the inherent limitations of absolute risk estimates such as risk differences, we place greater emphasis on the direction of the effect than on its size [46]. The pre-intervention percentages of all error categories selected for meta-analysis were below 51%, except for 'absence of time intervals', which ranged from 37 to 93% [30]. Against these baselines, the post-intervention percentages indicate a markedly favourable impact of the intervention. For example, post-intervention errors were reduced to between 6.0 and 20.8% for 'multiple causes in a single line' and to between 5.8 and 20.3% for 'improper sequence'. For all interventions included in the meta-analysis, post-training assessments were conducted between 6 months and 2 years after the intervention. Hence, the observed risk differences reflect the impact of the intervention over a longer period, which is likely to be a more useful measure of the sustainability and effectiveness of training than the more commonly used immediate post-training assessments.

The classification of errors as 'minor' or 'major' varies between studies. For example, 'absence of time intervals' was considered a major error in one study [32] but a minor error in several others [28, 30, 34, 36]. Some studies, although not all, classified 'mechanism of death followed by a legitimate UCOD' as an error [26, 28, 34, 36, 40]; furthermore, the scoring method and content of the assessment varied between studies [13, 14, 31, 35, 37]. Given this heterogeneity, it is important to focus on the patterns of individual errors and to be clear about how errors are defined before comparing results across studies. Interestingly, we found greater variation across studies for post-intervention composite error indicators than for specific errors.
Across the six interventions considered, post-intervention measures of 'at least one major error' ranged from 3.75 to 44.8% [30, 34, 40], whilst the fraction of cases with 'at least one error' ranged from 9 to 74.8% [30, 38, 41].

It is also interesting to note that doctors appeared to benefit less from the interventions than interns. This may in part reflect the lower priority doctors give to certification compared with patient management, possibly due to limited understanding of the public policy utility of data derived from individual death certificates. In some studies, a small proportion of post-intervention death certificates may actually have been completed by doctors who had not undergone training, which would dilute the estimated impact of the training interventions. Further, constructing the causal sequence on the death certificate may involve a degree of public health and epidemiological consideration, in addition to clinical reasoning, which some doctors may find challenging to incorporate into the certification process. This could explain the generally lower improvement scores reported for the causal sequence. Finally, correct certification practices depend heavily on doctors' attitudes towards the process, as well as on the level of monitoring, accountability and feedback related to their certification performance.

Most interventions were conducted as interactive workshops that enabled participants to undergo 'on-the-spot' training [13, 25-30, 33, 34, 36, 37, 41]. There is a paucity of controlled studies comparing different interventions. One study concluded that a 'face-to-face' intervention was more effective than 'printed instructions' [13]. However, another concluded that an added 'teaching session' did not improve performance compared with an 'education handout', although both strategies were independently effective [13, 37].
More research is required to test the relative effectiveness of training methods, such as online interventions, compared with those requiring face-to-face interaction.

Our findings point to several potentially cost-effective options for improving the quality of medical certification. To the extent that individual-level training of doctors in correct medical certification is costly, strengthening the medical school curricula that teach students how to correctly certify causes of death, and ensuring that these curricula are universally applied, is likely to be the most economical and sustainable way to improve the quality of medical certification. How and when this training is delivered before the completion of medical training is likely to vary from one context to another and will depend on local requirements for internship training. Training smaller groups of physicians as master trainers in medical certification, and subsequently rolling out the training in provincial and district hospitals, is likely to be an effective and economical interim measure to improve certification accuracy, as has been demonstrated in a number of countries [30].

In some countries, electronic death certification has been used as a means to standardise and improve the quality of cause of death data [32]. Electronic death certification can help avoid certain errors, such as illegible handwriting and reporting multiple causes on a single line (by not allowing the certifier to report more than one condition per line) [47]. An electronic certification system can also generate pop-up messages reminding the certifier not to report modes of dying, or symptoms and signs, as the underlying cause. However, electronic certification cannot improve the accuracy of the causal sequence, prevent the reporting of competing causes or unspecified neoplasms, or prevent the non-reporting of external causes. Furthermore, whilst entering cause of death data in free-text format could improve the quality of medical certification [48], enhancing electronic certification with suggested text options and 'pick' lists can lead to systematic errors in medical certification.

Table 3 notes
Comment: In one Sri Lankan study, ill-defined underlying cause of death was observed to be higher post-intervention (10.6% versus 4.4%).
GRADE Working Group grades of evidence:
High certainty: We are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low certainty: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
CI: confidence interval.
a Because these were non-randomised studies, and in some studies the pre- and post-intervention analyses were not conducted close to the time of the intervention, the risk of bias due to confounding was rated 'serious'.
b Funnel plot not fully symmetrical for one of the meta-analyses.
* The risk in the intervention group (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

This review has several limitations. The studies examined in this review included a diverse range of participants and intervention methods and were conducted in various cultural settings. The duration and modality of the training interventions varied substantially across studies. Only three interventions were randomised, and owing to the diversity of the non-randomised studies, the potential influence of confounding factors on the quality parameters assessed cannot be excluded.
These factors were, however, considered in the risk of bias and heterogeneity assessments. There is also considerable subjectivity in the assessment of some criteria, including 'legibility' and 'incorrect sequence', which could lead to bias in the assessments. Although outcomes were usually pre-defined, adherence to risk-lowering strategies such as 'blinding the assessor' was often not described [14, 15, 25, 26, 28-33, 36, 38-42]. Although only three interventions were included in each meta-analysis, each had an adequate number of at least 1500 observations per group. Funnel plots were presented for a rough exploration of publication bias, but their interpretation is generally recommended only for meta-analyses with more than 10 comparisons. Furthermore, little evidence is available on the appropriateness of funnel plots drawn with risk differences [49].

References
Death certificates: why it matters how your patient died
Improving the quality and use of birth, death and cause-of-death information: guidance for a standards-based review of country practices
Department of Health and Human Services, Centers for Disease Control and Prevention, NCHS
Who needs cause-of-death data
Mortality certification and cause-of-death reporting in developing countries
International Statistical Classification of Diseases and Related Health Problems. 10th Revision
World Health Organization. ICD-10 International statistical classification of diseases and related health problems, Volume 2: Instruction manual
Teaching cause-of-death certification: lessons from international experience
Errors in the certification of deaths from cancer and the limitations for interpreting the site of origin
Improvement of the quality and comparability of causes-of-death statistics inside the European Community. EUROSTAT Task Force on "causes of death statistics"
Improving cause-of-death statistics
A critical assessment of mortality statistics in Thailand: potential for improvements
Improving death certificate completion: a trial of two training interventions
The effect of student training on accuracy of completion of death certificates
An accessible method for teaching doctors about death certification
The effect of educational intervention on medical diagnosis recording among residents
The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration
GRADE guidelines: 4. Rating the quality of evidence: study limitations (risk of bias)
ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions
What is heterogeneity? The Cochrane Collaboration
Effect estimates and generic inverse variance meta-analysis. The Cochrane Collaboration
Strategies for addressing heterogeneity. The Cochrane Collaboration
GRADE guidelines: 12. Preparing summary of findings tables: binary outcomes
GRADE guidelines: 3. Rating the quality of evidence
Programa piloto para la mejora de la certificación de las causas de muerte en atención primaria en Cataluña (A pilot program to improve causes of death certification in primary care of Catalonia, Spain)
Improving the accuracy of death certification among secondary care physicians
B-learning training in the certification of causes of death
Impact of an educational intervention on errors in death certification: an observational study from the intensive care unit of a tertiary care teaching hospital
Eficacia de un seminario informativo en la certificacion de causas de muerte (Efficacy of an informative seminar in the certification of causes of death)
Improving medical certification of cause of death: effective strategies and approaches based on experiences from the Data for Health Initiative
Integrating public health-oriented e-learning into graduate medical education
Saving lives through certifying deaths: assessing the impact of two interventions to improve cause of death data in Peru
Death duties: workshop on what family physicians are expected to do when patients die
Improving the accuracy of death certification
Death certification: production and evaluation of a training video
Educational intervention to improve death certification at a teaching hospital
Death certificates: let's get it right!
A good death certificate: improved performance by simple educational measures
Aprendizaje y satisfaccion de los talleres de pre y postgrado de medicina para la mejora en la certificacion de las causas de defuncion, 1992-1996 (Learning and satisfaction of the workshops for pre- and postgraduates of medicine for the improvement in the certification of the causes of death, 1992-1996)
Assessment of standards in issuing cause of death certificate before and after educational intervention
Evaluating an educational intervention to improve the accuracy of death certification among trainees from various specialties
Death certification in Northern Alberta: error occurrence rate and educational intervention
Avoiding ill-defined and unusable underlying causes
Systematic review and meta-analysis: when one study is just not enough
Pre-post effect sizes should be avoided in meta-analyses
Implementing GRADE: calculating the risk difference from the baseline risk and the relative risk
Pertinence of electronic death certificates for real-time surveillance and alert
Recommendations on testing for funnel plot asymmetry

The authors would like to acknowledge Sara Hudson and Avita Streatfield of the University of Melbourne for proofreading and editing the manuscript.

Authors' contributions
USHG and ADL conceptualised the review. ADL supervised and guided the overall review. USHG, PKBM, JS, HC, HL, DM and ADL contributed to developing the protocol. USHG obtained the registration of the review. PKBM and JS conducted the study search. PKBM, JS and LM contributed to study selection and risk of bias assessments. PKBM, JS and JH contributed to data extraction. PKBM conducted the meta-analysis. DM and USHG coordinated the overall review. All authors contributed to drafting the initial manuscript.
JH, HC, HL, LM, DM and ADL were involved in revising the manuscript. All authors read and approved the final manuscript.

The subscription for the DistillerSR application was funded by the Bloomberg Philanthropies Data for Health Initiative of the University of Melbourne. All data generated or analysed during this study are included in this published article (and its supplementary information files). Ethics approval is not applicable for this review of previously conducted studies. The authors declare that they have no competing interests.

Received: 20 August 2020 Accepted: 3 November 2020

Both pooled estimates and narrative findings demonstrate the effectiveness of training interventions in improving the accuracy of death certification. Meta-analyses revealed that these interventions are effective in reducing diagnostic errors, including 'no time interval', 'using abbreviations', 'improper sequence' and 'multiple causes per line' (moderate certainty) and 'ill-defined underlying CoDs' (low certainty). In general, 'no time interval' was the most common and 'illegibility' the least common pre-intervention error. 'No time interval' also appeared to be the error that improved most following intervention, as evidenced by both the pooled and narrative findings.

Strategic investment in MCCOD training activities will enable long-term improvements in the quality of cause of death data in CRVS systems, thus improving the utility of these data for health policy. Whilst these findings strengthen the evidence base for improving the quality of MCCOD, more research is needed on the relative effectiveness of different training methods in different study populations.
From the limited evidence thus far, our meta-analysis indicates that training doctors and interns in correct cause of death certification can increase the accuracy of certification and should be routinely implemented in all settings as a means of improving the quality of cause of death data.

The online version contains supplementary material available at https://doi.org/10.1186/s12916-020-01840-2.
Additional file 1: Figure S1. Search strategy used in the review of literature.
Additional file 2: Figure S2. Selection criteria used in study selection.
Additional file 3: Tables S1-S5. Data used for meta-analysis.