key: cord-0859095-1ynjplf7 authors: Neumann, Ignacio; Quiñelen, Eduardo; Nahuelhual, Paula; Burdiles, Pamela; Celedón, Natalia; Cerda, Katherine; Herrera-Omegna, Paloma; Kraemer, Patricia; Cancino, Karen Dominguez; Valenzuela, Juan Pablo; Sepúlveda, Dino; Morgano, Gian Paolo; Akl, Elie A.; Schünemann, Holger J. title: Using Explicit Thresholds for Benefits and Harms in partially contextualized GRADE Guidelines. Pilot experience from a living COVID-19 guideline date: 2022-03-30 journal: J Clin Epidemiol DOI: 10.1016/j.jclinepi.2022.03.017 sha: a64083d2fdedcafc7550d1b6a7ab9cf16d7bebce doc_id: 859095 cord_uid: 1ynjplf7 Objective: Guideline panels must assess the magnitude of health benefits and harms to develop sensible recommendations. However, they rarely use explicit thresholds. In this paper we report on the piloting and use of thresholds for benefits and harms. Study Design and Setting: We piloted the use of thresholds in a Chilean COVID-19 living guideline. For each of the critical outcomes, we asked panelists to suggest values of the thresholds for large, moderate, small, or trivial or no effect. We collected this information through a survey and an online discussion. Results: Twelve panelists decided on thresholds for 3 critical outcomes (mortality, need for mechanical ventilation and serious adverse events). For all outcomes, an absolute risk reduction was considered large with more than 50 events, moderate with less than 50 events, small with less than 25 events, and trivial with less than 10 events. Having these a priori thresholds in place significantly impacted the development of recommendations. Conclusions: Explicit thresholds were a valuable addition to the judgment of the certainty in the evidence, to decide the direction and strength of the recommendation and to evaluate the need for update. We believe this is a line of research worth pursuing.
Guideline panels must assess the magnitude of the potential health benefits and harms to develop sensible recommendations. Structured processes, e.g., the GRADE Evidence to Decision (EtD) Framework, 1 were designed to make the judgements about the size of the benefits and the size of the harms transparent. The EtD framework categorizes these effect sizes as large, moderate, small, or trivial or no effect. However, guideline panels have, until now, rarely used explicit thresholds for these categories. Rather, panelists reach an implicit agreement through discussion and consensus. Thresholds may also be needed to judge the certainty of evidence. 2 In the context of guidelines, panelists may choose to use a partially contextualized approach, in which the judgment of the certainty of the evidence reflects the confidence that the true effect lies within a specific range (large, moderate, small, or trivial or no effect). In Chile, since August 2020, the Ministry of Health and several National Scientific Societies have been developing and maintaining a GRADE living guideline about management options for COVID-19. 3 Over the span of 18 months, the guideline panel has made recommendations and updated them in the light of new evidence. Maintaining consistency in judgments over an extended period was a challenge, and thus we developed thresholds for the different criteria in the EtD Framework to anchor future discussions. In this paper we report on the piloting and use of thresholds for benefits and harms.
The Chilean COVID-19 living guideline is an ongoing national effort. It currently has 25 recommendations: 21 about pharmacologic interventions, 2 about diagnostic tests and 2 about non-pharmacologic interventions. The guideline panel has 12 content experts (from relevant scientific societies and front-line clinicians), 2 methodological coordinators and 2 representatives from payers and administrators.
Content experts come from different disciplines: five are specialists in infectious diseases, two in respiratory diseases, two in emergency medicine, one in critical care, one in general internal medicine and one in palliative care. Additionally, the work of the guideline is supported by a team of 6 methodologists who conduct and update systematic reviews. Half of the content experts had participated in an evidence-based guideline before, and half were new to the process. At the time of this report, none of the participants, neither content experts nor methodologists, had conflicts of interest with any of the recommendations addressed in the guideline. After the initial nine recommendations, we obtained estimates of the thresholds for large, moderate, small, or trivial or no effect for benefits and harms from the twelve content experts. First, using SurveyMonkey (www.surveymonkey.com), we designed a survey and asked each panelist what magnitude of effect they considered large, moderate, small, or trivial or no effect for each of the critical outcomes: mortality, need for mechanical ventilation and serious adverse events. The magnitude of each outcome was considered independently, following the GRADE partially contextualized approach. We provided 4 independent questions for each outcome, each asking about a specific category, following the formula "in your opinion, what magnitude of the effect corresponds to a large/moderate/small/trivial effect?". We framed these questions as neutral statements without introducing a specific direction. For the outcomes mortality and need for mechanical ventilation, we did not provide a detailed description, since our respondents were clinicians and were aware of what those outcomes entail. However, for serious adverse events, we developed a standard description of the outcome for consistency.
We described it as follows: negative effects of the intervention that are serious enough to discontinue the treatment but resolve spontaneously or with specific treatment after discontinuation. The response options were sliding bars going from 1 per 1000 to 200 per 1000. In each question, panelists had to select the magnitude they believed corresponded to the effect size being asked about. The second step in our process was to reach consensus based on the survey results. We averaged the responses of the panelists and presented them in a formal meeting. Through discussion and consensus, the panel agreed on the final thresholds for all the critical outcomes: for simplicity they decided to use the multiple of 5 closest to the survey average. Additionally, they decided to place the same value on all the critical outcomes and hence to use the same thresholds for the three of them. The results are presented in table 1 (and conceptualized in figure 1). As can be observed, in the survey the magnitude of the effect corresponding to each category differed between outcomes. However, during the panel meeting, through discussion, panelists decided to use the same thresholds for the three critical outcomes.
How did thresholds impact recommendations?
In the context of the GRADE partially contextualized approach, panelists must judge, for each outcome, how confident they are that the true effect of an intervention lies within a specific range (large, moderate, small, or trivial or no effect). Having explicit thresholds facilitated judgments about precision. For example, our meta-analysis of 3 randomized trials (n=4,628) showed that in patients with COVID-19, the use of colchicine may reduce mortality (RR 0.47, 95% CI 0.18-1.23) and the need for mechanical ventilation (RR 0.47, 95% CI 0.24-0.94).
3 By focusing on the relative estimates, a panel may decide to rate down the certainty of the evidence for mortality for imprecision, given that the confidence interval includes no effect and potential harm. Using this approach, the need for mechanical ventilation would likely be considered precise. However, since mortality is a critical outcome, rating it down for imprecision would lead to a lower overall certainty of the evidence. Another example is the use of budesonide in non-hospitalized COVID-19 patients. Our meta-analysis of 2 randomized trials showed that budesonide did not impact outcomes such as mortality or need for mechanical ventilation. However, it did reduce the chance of hospitalization (RR 0.71, 95% CI 0.53-0.95). 3 The patients enrolled in the trials, however, were in general at high risk of hospitalization: older than 65 years and with significant comorbidity. This contrasts with usual practice, where most patients with COVID-19 do not require hospitalization. We contextualized this finding using appropriate baseline risks. To do this, we took as moderate risk of hospitalization the national average during the pandemic: 8% 5 ; and to illustrate high and low risk, we multiplied the average by 2 for high risk (16%), and divided it by four for low risk (2%) (although arbitrary, panelists considered it an appropriate range, given the lack of more precise data). As we see in figure 3, these baseline risks led to confidence intervals of different widths around the absolute effects. For lower risk patients, the confidence interval around the absolute effect (from 1 to 9 fewer per 1000) was entirely within what our panel considered a priori a trivial effect. For moderate (average) risk patients, the confidence interval (4 to 38 fewer per 1000) crossed the trivial, small, and moderate effect categories. Finally, for high-risk patients, the confidence interval crossed the entire range of potential benefits.
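The contextualization described above can be sketched in a few lines of Python. This is a minimal illustration, not the panel's actual tooling. It assumes the absolute effect is obtained as baseline risk times (1 - RR), and it takes the category boundaries (10, 25 and 50 events per 1000) from the thresholds reported in the abstract. The function names, and the choice to assign a value lying exactly on a boundary to the larger category, are assumptions made here.

```python
# Hypothetical helper names. Boundaries (events per 1000) follow the thresholds
# reported in the abstract; assigning a value exactly on a boundary to the
# larger category is an assumption.
CATEGORIES = ["trivial", "small", "moderate", "large"]
LOWER_BOUNDS = [0, 10, 25, 50]


def fewer_per_1000(baseline_risk, rr):
    """Absolute risk reduction per 1000 patients for a relative risk rr."""
    return 1000 * baseline_risk * (1 - rr)


def category_index(events_per_1000):
    """Index in CATEGORIES of the largest lower bound the effect size reaches."""
    magnitude = abs(events_per_1000)
    return max(i for i, bound in enumerate(LOWER_BOUNDS) if magnitude >= bound)


def categories_crossed(baseline_risk, rr_lower, rr_upper):
    """Categories covered by the confidence interval of the absolute effect.

    Assumes both RR limits are below 1, so the whole interval is a benefit.
    """
    smallest = fewer_per_1000(baseline_risk, rr_upper)  # RR closest to 1
    largest = fewer_per_1000(baseline_risk, rr_lower)
    return CATEGORIES[category_index(smallest):category_index(largest) + 1]


average_risk = 0.08  # national average risk of hospitalization
scenarios = [("low", average_risk / 4), ("moderate", average_risk), ("high", average_risk * 2)]
for label, risk in scenarios:
    low = fewer_per_1000(risk, 0.95)
    high = fewer_per_1000(risk, 0.53)
    crossed = categories_crossed(risk, 0.53, 0.95)
    print(f"{label} risk: {low:.0f} to {high:.0f} fewer per 1000, crosses {crossed}")
```

Under these assumptions the sketch reproduces the intervals quoted in the text for low-risk (1 to 9 fewer) and moderate-risk (4 to 38 fewer) patients, which suggests, but does not confirm, that the panel used the same conversion.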
This may be interpreted as meaning that there is more uncertainty regarding the effect of budesonide in high or moderate-risk patients than in low-risk patients. The benefit seems to exist in all; however, the confidence interval of the absolute estimate in low-risk patients is precise enough to be confident that the benefit is trivial in magnitude. The confidence interval for high-risk patients, on the other hand, is so wide that it may be appropriate to further penalize it for imprecision: for different baseline risks, the certainty of the evidence seems to be different, even within the same outcome. An upcoming GRADE guidance paper will provide rules to operationalize the judgements about precision using explicit thresholds.
[Figure panel text: Significant increase of adverse events (diarrhea: 70 more per 1000; 95% CI 46 to 97 more); inexpensive drug.]
In the examples presented in the previous section, the observation that the use of colchicine will likely lead to a trivial benefit, in addition to the increase in adverse events, led to a recommendation against its use. In the example of budesonide, given the differences in absolute effects, the panel decided to make two separate conditional recommendations: one against its use in low-risk individuals and one in favor in moderate and high-risk patients (table 2). In another example, our meta-analysis of 10 randomized trials (n=6,700) showed that the use of tocilizumab may be associated with lower mortality (RR 0.84, 95% CI 0.75-0.94). 3 Here, there were no serious concerns regarding risk of bias, inconsistency, indirectness, or publication bias. With this evidence, it is perfectly reasonable for a panel to make a recommendation in favor of using tocilizumab in settings where it is accessible and affordable.
[Figure panel text: No significant increase of adverse events; inexpensive drug, widely available.]
In our case, however, there were some concerns with access and affordability, and hence also with equity. We decided to explore further the effect of tocilizumab considering different groups of patients with different baseline risks. The largest study evaluating tocilizumab vs usual care to date is the RECOVERY trial. 6 Investigators categorized patients according to the degree of respiratory support at randomization: no ventilatory support, non-invasive ventilation, and invasive mechanical ventilation. Mortality in the control arm for those groups was 23%, 42% and 51%, respectively. No interaction between the severity of the disease at randomization and the effect of tocilizumab was detected. Therefore, the authors appropriately concluded that the benefit of tocilizumab was observed in a wide range of patients regardless of the level of respiratory support. One immediate concern with these data is the unexpectedly high mortality in patients with no respiratory support. Thus, we decided to use a more conservative estimate from a systematic review of observational studies, which reported a mortality in hospitalized COVID-19 patients of 5%. 7 With these data, we estimated that in hospitalized patients with mild COVID-19 (death risk of 5%), the use of tocilizumab would result in a trivial effect (8 fewer deaths per 1000, 95% CI from 3 to 13 fewer). In contrast, in severe patients (death risk of 51%) the effect is large (82 fewer deaths per 1000, 95% CI from 31 to 128 fewer). The use of tocilizumab did not significantly increase the risk of adverse events (RR 0.93, 95% CI 0.78-1.10). In short, the direction of the recommendation had to balance the size of the benefits of the intervention against its cost and the limited stock. Having the explicit thresholds in place greatly facilitated discussion and increased the transparency of the decision.
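The absolute effects quoted above appear consistent with the usual conversion from a relative risk and a baseline risk. The worked arithmetic below is a sketch under that assumption. The symbol p_0 for the baseline risk of death is introduced here for illustration, and the figures use the pooled tocilizumab estimate (RR 0.84, 95% CI 0.75-0.94).

```latex
\[
  \text{fewer deaths per 1000} = 1000 \times p_0 \times (1 - \mathrm{RR})
\]
% Mild disease, p_0 = 0.05
\begin{align*}
  \text{point estimate:} \quad & 1000 \times 0.05 \times (1 - 0.84) = 8 \\
  \text{CI limits:} \quad & 1000 \times 0.05 \times (1 - 0.94) = 3, \qquad
                            1000 \times 0.05 \times (1 - 0.75) = 12.5 \approx 13
\end{align*}
% Severe disease, p_0 = 0.51
\begin{align*}
  \text{point estimate:} \quad & 1000 \times 0.51 \times (1 - 0.84) = 81.6 \approx 82 \\
  \text{CI limits:} \quad & 1000 \times 0.51 \times (1 - 0.94) = 30.6 \approx 31, \qquad
                            1000 \times 0.51 \times (1 - 0.75) = 127.5 \approx 128
\end{align*}
```

The same relative effect therefore yields a point estimate below the 10 per 1000 boundary at a 5% baseline risk and above the 50 per 1000 boundary at a 51% baseline risk.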
A trivial benefit in hospitalized patients with no respiratory support was not enough to justify the use of tocilizumab in a context of resource, feasibility, and equity concerns. However, the situation was different for patients who may get a large mortality reduction. This analysis led to recommendations with opposite directions for different groups of patients (figure 4 and table 2). Another example is the use of dexamethasone in COVID-19 patients, which results in a significant reduction of mortality (RR 0.90, 95% CI 0.83-0.98, 3 RCTs n=6774). However, the RECOVERY trial, 4 the largest of the three studies available, showed compelling evidence of an interaction between the effect of dexamethasone and disease severity. The relative effect of the intervention had a gradient with the severity of the disease and the interaction test was statistically significant. 8 Considering this interaction, and the influence of baseline risk, we estimated that in severe patients (death risk 44%) the use of dexamethasone will result in 120 fewer deaths per 1000 patients (95% CI from 58 to 174 fewer). In moderate patients, however, with a death risk of 26%, the use of dexamethasone will prevent 29 deaths per 1000 (95% CI 0 to 55 fewer). There were no concerns regarding risk of bias, inconsistency, indirectness, or publication bias. However, as before, the precision of the estimates for different baseline risks was not the same: in severe patients, the confidence interval was entirely within the boundaries of a large effect. In contrast, for moderate patients, the confidence interval crossed the entire range of benefits. Accordingly, we rated down the certainty of the evidence for this group for imprecision. Dexamethasone in relatively low doses and for a short period of time was associated with mild adverse effects (mainly hyperglycemia). Also, it is not an expensive drug, and it is widely available and accessible.
Considering that at the time it was the only intervention that provided a mortality reduction, the threshold for a strong recommendation in favor of dexamethasone in COVID-19 patients was relatively small. In the case of severe patients, this threshold was clearly achieved, and the panel quickly decided on a strong recommendation in favor. However, given the striking differences in the absolute effect (120 vs 29 fewer deaths per 1000) and the differences in the [...] (figure 5 and table 2). However, it is worth noting that the recommendation in favor of dexamethasone in severe patients is based on high certainty evidence and is "stronger" than the recommendation for moderate patients, which is based on moderate certainty evidence.
[Figure panel text: May lead to hyperglycemia (28 more per 1000); negligible resources required; no feasibility or equity concerns.]
[...] was "a significant change in the effect estimates" for any of the critical outcomes. Here, having the thresholds defined a priori was very useful. For example, before the publication of the Solidarity trial, 9 our meta-analysis of the use of remdesivir in COVID-19 patients suggested a mortality benefit but with imprecise values (RR 0.76, 95% CI 0.57-1.01, 3 trials n=1882). 3 Using the median baseline risk observed in the control groups (12%, to standardize calculations) we estimated an absolute effect of 29 fewer deaths per 1000 (95% CI 52 fewer to 1 more). This effect, according to our pre-defined thresholds, was considered moderate, and the certainty of the evidence low (there were some concerns regarding the risk of bias of the included trials and the result was judged imprecise, since the confidence interval included trivial benefit and harm). With the publication of the Solidarity trial, the pooled estimate changed to RR 0.93, 95% CI 0.81 to 1.06. Using the same baseline risk as before, the absolute effect moved from a moderate to a trivial benefit (8 fewer deaths per 1000, 95% CI 23 fewer to 7 more) (Figure 6).
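The role of the thresholds in the update decision can be illustrated with a short sketch. It is one possible reading of the text, not the guideline's actual update rule. It flags an update whenever the point estimate of the absolute effect moves to a different threshold category, using assumed boundaries of 10, 25 and 50 events per 1000 and the conversion baseline risk times (RR - 1). All function names are hypothetical.

```python
def effect_per_1000(baseline_risk, rr):
    """Absolute effect per 1000 patients; negative values mean fewer events."""
    return 1000 * baseline_risk * (rr - 1)


def category(events_per_1000):
    """Map an absolute effect to a size category (assumed symmetric boundaries)."""
    magnitude = abs(events_per_1000)
    if magnitude >= 50:
        return "large"
    if magnitude >= 25:
        return "moderate"
    if magnitude >= 10:
        return "small"
    return "trivial"


def update_flagged(baseline_risk, rr_before, rr_after):
    """Assumed rule: flag an update when the point estimate changes category."""
    before = category(effect_per_1000(baseline_risk, rr_before))
    after = category(effect_per_1000(baseline_risk, rr_after))
    return before != after, before, after


# Remdesivir and mortality at a 12% baseline risk:
# pooled RR before (0.76) and after (0.93) the Solidarity trial.
flagged, before, after = update_flagged(0.12, 0.76, 0.93)
print(f"update flagged: {flagged} ({before} -> {after})")
```

This simplified rule looks only at the size category of the point estimate. It ignores the direction of the effect and the confidence interval, both of which a panel would also weigh.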
The certainty of the evidence did not change, since the new estimate was still considered imprecise because its confidence interval included a small benefit. However, the change in the point estimate was enough to trigger a recommendation update. This is similar to what has happened with other living guidelines when the effect estimates change. For example, the American Society of Hematology guidelines on the use of anticoagulation for thromboprophylaxis in patients with COVID-19 changed their recommendations as a consequence of an evidence update. 10 Having the thresholds in place from the beginning may help to expedite the process. Ideally, when judging the size of benefits and harms in a partially contextualized approach, thresholds should come from empirical data: what users of guidelines consider a small, moderate, or large benefit or harm. A survey of guideline panelists and users is underway. 11 Given the emergency of the COVID-19 pandemic, we took a pragmatic approach to identify empirical thresholds and estimated them directly from the panelists' judgments. One particularity of our pilot experience is that panelists placed the same value on all the critical outcomes. This is unlikely to be the case for most guideline recommendations, since evidence indicates that different outcomes do not have the same importance for patients. 12 It is interesting that panelists decided to match the thresholds for mortality and mechanical ventilation, even when the survey results suggested that, a priori, they placed more value on avoiding mortality. One potential explanation is the unique circumstances of the pandemic, where the excessive number of severe patients quickly overwhelmed critical care units and there was an actual shortage of mechanical ventilators. In these extreme circumstances, considering avoiding mechanical ventilation as important as avoiding mortality is a sensible decision from a public health perspective. However, this is likely an exception.
Another particularity of our pilot experience is that we assumed that the thresholds for benefits and harms were symmetrical. Although pragmatic and easy to implement, this may not be the case if guideline users place a different value on potential benefits and harms within the same outcome. For example, patients will likely place more value on avoiding an increase in mortality; thus, lower thresholds on the "harm side" may be appropriate. In fact, for some outcomes it may not be necessary to quantify the size of the harms; once a critical threshold of harm is surpassed, the intervention may no longer be acceptable for patients and clinicians. One limitation of our work is that we did not assess the utilities of the critical outcomes. To establish thresholds in a sensible way, panels should consider the dyad of the magnitude and the relative importance of the effect. Making these values explicit may even lead to more quantitative approaches, like estimating the net benefit or the net harm. 13 Another limitation is our sample size: only one panel. However, our pilot experience was very positive: using explicit thresholds for benefits and harms in a real panel helped to maintain consistency and to enhance transparency in living guidelines. Further, they were a valuable addition to the judgments of the certainty in the evidence, to decide the direction and strength of the recommendation and to evaluate the need for update. We believe this is a line of research worth pursuing.
GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines
The GRADE Working Group clarifies the construct of certainty of evidence
Evidence based recommendations for management of COVID-19
Dexamethasone in Hospitalized Patients with Covid-19
Tocilizumab in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial
Prevalence and risk factors of mortality among hospitalized patients with COVID-19: A systematic review and Meta-analysis
Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses
Repurposed Antiviral Drugs for Covid-19 - Interim WHO Solidarity Trial Results
American Society of Hematology 2021 guidelines on the use of anticoagulation for thromboprophylaxis in patients with COVID-19
Defining decision thresholds for judgments on health benefits and harms using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Evidence to Decision (EtD) frameworks: a protocol for a randomized methodological study. Sent to publication
Patient values and preferences regarding VTE disease: a systematic review to inform American Society of Hematology guidelines
Defining certainty of net benefit: a GRADE concept