key: cord-0720340-46e7xb7b
authors: Marfori, Cherie Q.; Klebanoff, Jordan S.; Wu, Catherine Z.; Barnes, Whitney A.; Carter-Brooks, Charelle M.; Amdur, Richard L.
title: Reliability and Validity of Two Surgical Prioritization Systems for Reinstating Non-Emergent Benign Gynecologic Surgery During the COVID-19 Pandemic
date: 2020-07-30
journal: J Minim Invasive Gynecol
DOI: 10.1016/j.jmig.2020.07.024
sha: a7c49c418d54859302a7b5351b3c365d30054b65
doc_id: 720340
cord_uid: 46e7xb7b

STUDY OBJECTIVE: Scientifically evaluate the validity and reproducibility of two novel surgical triaging systems, as well as offer modifications to the MeNTS criteria for improved application in gynecologic surgeries.
DESIGN: Retrospective cohort study.
SETTING: Academic university hospital.
PATIENTS: 97 patients with delayed benign gynecologic procedures due to the COVID-19 pandemic.
INTERVENTION(S): Surgical prioritization was assessed using two novel scoring systems, the Gyn-MeNTS and mESAS systems, for all 93 patients included.
MEASUREMENTS AND MAIN RESULTS: The inter-rater reliability and validity of two novel surgical prioritization systems (Gyn-MeNTS and mESAS) were assessed. Gyn-MeNTS scores were calculated by three raters and analyzed as continuous variables, with a lower score indicating more urgency/priority. The mESAS score was calculated by two raters and analyzed as a 3-level ordinal variable with a higher score indicating more urgency/priority. All five raters were blinded to reduce bias. Gyn-MeNTS inter-rater reliability was tested using Spearman r and paired t-tests were used to detect systematic differences between raters. Weighted kappa indicated mESAS reliability. Concurrent validity with mESAS and surgeon self-prioritization (SSP) was examined with Spearman r and logistic regression. Spearman r's for all Gyn-MeNTS rater pairs were above 0.80 (0.84 for 1 vs. 2, 0.82 for 1 vs. 3, 0.82 for 2 vs. 3, all p<.0001) indicating strong agreement.
The weighted kappa for the 2 mESAS raters was 0.57 (95% CI 0.40-0.73) indicating moderate agreement. When used together, both scores were significantly independently associated with SSP, with strong discrimination (AUC 0.89).
CONCLUSIONS: Inter-rater reliability is acceptable for both scoring systems, and concurrent validity of each is moderate for predicting SSP, but discrimination improves to a high level when they are used together.

The respiratory illness caused by the novel coronavirus SARS-CoV-2 has upended global healthcare delivery. In March 2020, the American College of Surgeons (ACS), the US Surgeon General, and the Centers for Medicare & Medicaid Services (CMS) all made recommendations to cancel elective surgery throughout the US in order to preserve resources to treat those critically ill and to reduce transmission between doctors and patients. [1] [2] [3] However, the term elective does not mean unnecessary, and postponement of these surgeries may lead to significant morbidity or even mortality. Surgeons are faced with the potentially overwhelming and morally exhausting task of prioritizing postponed patients within their departments and hospital systems.

In response, the American College of Surgeons (ACS) developed a tiered ranking system for prioritization of elective surgeries based on the risk of delay and patient co-morbidities, the Elective Surgery Acuity Scale (ESAS). [4] While this system provides a framework, many hospitals and surgeons find implementation difficult given its vagueness. Recognizing these limitations, the Society of Gynecologic Surgeons (SGS) issued a joint statement which applied the ACS tiers to numerous benign gynecologic procedures to serve as a guide of acuity. [5] Additionally, a third and novel scoring system by Prachand et al individually ranks non-emergent surgeries, termed Medically-Necessary, Time-Sensitive (MeNTS) procedures. [6] This scoring system attempts to objectively prioritize surgeries by grading 21 factors within the broad categories of procedure variables, disease state, and comorbidities. All of these scoring systems are based on expert opinion and consensus, and none has been externally validated or tested for reliability.

At our institution, the Division of Benign Gynecologic Surgery incurred a large number of case cancellations with the orders to halt elective surgery. We recognized a need for a reproducible system beyond a three-tiered scale to help triage and prioritize affected patients. We took great interest in the MeNTS tool as a potential solution. While our institution elected not to adopt the MeNTS criteria system-wide, our Division sought to critically evaluate it as a solution to ethically and efficiently prioritize patients beyond the ACS criteria. Our objective was to scientifically evaluate the validity and reproducibility of both of these triaging systems, as well as offer modifications to the MeNTS criteria for improved application in gynecologic surgeries.

After obtaining IRB exemption (IRB#NCR202525), we performed a single-center retrospective cohort study evaluating the inter-rater reliability and validity of two novel prioritization systems, the SGS adaptation of the ESAS (mESAS) and a modified version of the MeNTS tool (Gyn-MeNTS). A total of 97 benign gynecologic procedures were affected between March 16, 2020 and April 30, 2020 in our tertiary academic institution in Washington, DC, including cases from eight General Obstetrician Gynecologists, two Minimally-Invasive Gynecologic Surgeons, a Urogynecologist, and a Gynecologic Oncologist. We excluded three patients who had not completed their preoperative evaluations and whose disease severity was therefore unclear. One additional patient was excluded because she suffered a subarachnoid hemorrhage and was no longer eligible for her planned procedure, leaving 93 patients for analysis.
Beginning March 16, as required by our hospital system, all patients were categorized by their respective surgeons into one of three categories based on their level of morbidity if delayed. Level 1 indicated no morbidity with delay, level 2 indicated some morbidity with delay, and level 3 indicated significant morbidity and/or mortality with delay. Only those deemed level 3 were initially allowed to proceed as scheduled. Level 1 and 2 cases were postponed until further notice, with the delay expected to last 2-3 months. For the purpose of the validity analysis, those patients who were allowed to proceed with surgery are referred to as "Urgent SSP".

The ESAS scale divides elective surgeries into three tiers, low acuity (tier 1) defined as "non life-threatening illness", intermediate acuity (tier 2) defined as "nonlife-threatening but with potential for future morbidity and mortality", and high acuity (tier 3). Additionally, each tier is further divided into subtype A (healthy patients) or subtype B (unhealthy patients) to better discriminate patient risk and hospital resource use. The ACS recommended proceeding with tier 3 surgery and postponing tier 1 and 2 surgery (or performing it at an ambulatory surgery center). [4] The joint statement published by SGS on April 28 applied the ACS tiered system to numerous benign gynecologic procedures (Figure 1). [5] This framework categorizes specific surgical procedures taking into consideration their indications and severity of disease symptoms.

Two authors (WAB and CMCB) assigned all patients to one of the three mESAS tiers by reviewing each patient's electronic medical record. These authors were asked about their awareness of the alternative MeNTS scoring system and, having been found unfamiliar with it, were instructed to proceed without reviewing it to reduce potential bias toward the other scoring system.
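Agreement between two independent tier assignments of this kind is commonly summarized with a weighted kappa, the statistic this study reports for mESAS. The following is a minimal Python sketch, not the authors' SAS analysis. It assumes linear disagreement weights (the source does not state whether linear or quadratic weights were used), and the function name `weighted_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories=(1, 2, 3)):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    Assumption: linear disagreement weights |i - j| / (k - 1).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Joint counts (rater 1 tier, rater 2 tier) and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(r1, r2))
    m1 = Counter(index[a] for a in r1)
    m2 = Counter(index[b] for b in r2)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * joint[(i, j)] / n
            expected += w * (m1[i] / n) * (m2[j] / n)

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical tier")
    return 1.0 - observed / expected


# Hypothetical tier assignments for six patients (not study data).
rater_a = [1, 1, 2, 2, 3, 3]
rater_b = [1, 2, 2, 3, 3, 3]
print(round(weighted_kappa(rater_a, rater_b), 3))
```

Linear weights penalize a one-tier disagreement half as much as a two-tier disagreement on a three-level scale; quadratic weights would penalize it one quarter as much and typically give a different kappa.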
The original MeNTS scoring criteria attempt to objectively prioritize surgeries by grading 21 factors within the broad categories of procedure variables, disease state, and comorbidities. [6] The cumulative score ranges between 21 and 105 and serves as a rank in priority, with lower numbers equating to greater priority (Table 1). Higher scores equate to poorer perioperative outcomes, higher hospital resource utilization, increased risk of COVID-19 transmission, and an increased ability to safely defer surgery. When attempting to apply the MeNTS model to our gynecologic patients, we felt adaptations could be made to improve clarity, objectivity, and validity. Specific adaptations made to the original MeNTS score included:
3. Instead of calculating "non-operative treatment effectiveness percentage" and "exposure risk", we transformed these two variables into five distinct categories pertinent to the gynecologic surgical patient: whether and how many alternative therapies have been tried, the presence/severity of pain, the presence/severity of anemia, the impact on desired immediate fertility, and the impact on adjacent genitourinary and gastrointestinal systems.
6. Removing the variable "influenza-like illness symptoms." In our opinion, any patient demonstrating these symptoms should continue to be delayed until resolution.
7. Limiting the options for "exposure to known COVID positive patient in the last 14 days" to improve reproducibility. At our institution, we implemented universal COVID testing for all patients undergoing emergent and scheduled surgery.

Three authors (CQM, JSK, CZW) adapted and applied the modified Gyn-MeNTS scoring system to all patients. All five authors were blinded to each other's scores to reduce bias. Authors could not be blinded to the Surgeons' Self-Prioritization (Urgent SSP) because these surgeries were performed as scheduled and this could be determined from chart review.

Variables.
Gyn-MeNTS scores were calculated by three reviewers and analyzed as continuous variables with possible scores ranging from 21-105. The lower the score, the higher the priority of the surgery. The mESAS score was calculated by two different reviewers and analyzed as a three-level ordinal variable (1/2/3) with a higher score indicating more urgency/prioritization. Finally, the SSP score was made into a binary variable with those considered highest priority (Urgent SSP) separated from those considered lower priority.

Reliability. The three Gyn-MeNTS raters' scores were examined using Spearman r to determine the level of monotonic association, and paired t-tests to determine whether there were systematic differences between raters. A relevant systematic difference was indicated by a mean difference > 0.1 along with a significant difference on the paired t-test. If all Spearman r's were above 0.80, this indicated strong inter-rater reliability. In order to determine which Gyn-MeNTS items had the worst reliability, agreement between raters' Gyn-MeNTS item scores was examined using percent exact agreement, rather than kappa, because raters used different numbers of categories on several items.

Concurrent Validity. We took the mean of the Gyn-MeNTS scores and examined the Spearman r of this score with the mean of the mESAS scores, and with Urgent SSP as measured by actual scheduling (a binary variable, Yes/No). Concurrent validity for the Gyn-MeNTS score was indicated by strong Spearman r with both the mESAS score and with Urgent SSP. We also examined whether Urgent SSP could be predicted independently using both the Gyn-MeNTS score and the mESAS rating in a multivariable logistic regression model.
If both were significant independent predictors, we then used the log-linear equation produced by the regression model to calculate each patient's probability of being classified as Urgent SSP and examined the association of probability quartile with Urgent SSP status using chi-square. We examined the distribution of mESAS rating with the Gyn-MeNTS scores achieving the highest 67%, 75%, 80%, 85%, 90% and 95% of urgency levels using chi-square. SAS (version 9.4, Cary, NC) was used for data analysis, with p<.05 considered significant.

The mean ± SD Gyn-MeNTS scores for raters 1, 2, and 3 were 58.0 ± 4.8, 59.2 ± 4.3, and 58.0 ± 5.0, respectively. Gyn-MeNTS scores for raters 1 and 3 did not differ p=.002), after adjusting for the mESAS score. When the probabilities derived from this model were coded into quartiles, the incidence of being Urgent SSP was 75% in the highest priority quartile, 14% and 12% in the middle two quartiles, and 0% in the lowest quartile (p<.0001). The equation for calculating probabilities is in Appendix I. The 67th, 75th, 80th, 85th, 90th, and 95th percentiles of Gyn-MeNTS score are shown in Table 4.

In order to make it easier to visualize these relationships, we coded the Gyn-MeNTS into three levels (high, medium, and low priority) based on quartiles: the lowest quartile of scores is the highest priority, the middle two quartiles are medium priority, and the highest quartile of scores is the lowest priority. We then compared these three levels of Gyn-MeNTS urgency and the three levels of mESAS urgency with the Urgent SSP (Table 5). Comparing Gyn-MeNTS to Urgent SSP, we found that from the highest priority quartile on Gyn-MeNTS to the lowest, 52%, 23%, and 4% of patients were Urgent SSP (p=.0009). For patients with mESAS levels of high, moderate, and low priority, the percentages found within the Urgent SSP category were 88%, 36% and 10%, respectively (p<.0001).
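The quartile-based coding of Gyn-MeNTS scores into three priority levels described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' SAS code: the cut points come from `statistics.quantiles` with its default method (the percentile definition used in the study is not stated), scores falling exactly on a cut point are assigned to the outer group, and the function name `code_priority` and the example scores are hypothetical.

```python
from statistics import quantiles


def code_priority(scores):
    """Code scores into 'high', 'medium', or 'low' priority by quartile.

    Lower Gyn-MeNTS scores mean higher priority, so the lowest quartile
    of scores is 'high' and the highest quartile is 'low'.
    Assumption: ties with a cut point go to the outer group.
    """
    q1, _median, q3 = quantiles(scores, n=4)  # three quartile cut points
    levels = []
    for s in scores:
        if s <= q1:
            levels.append("high")
        elif s >= q3:
            levels.append("low")
        else:
            levels.append("medium")
    return levels


# Hypothetical mean Gyn-MeNTS scores for eight patients (not study data).
example_scores = [50, 52, 54, 56, 58, 60, 62, 64]
for score, level in zip(example_scores, code_priority(example_scores)):
    print(score, level)
```

Cross-tabulating these levels against the binary Urgent SSP indicator would then give the kind of comparison summarized in Table 5.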
When looking only at the 12 of 24 patients who were deemed the "highest priority" by the SSP scheme and their surgeries performed without delay, the mESAS system was able to capture 92% of these patients in its most urgent quartile, while the Gyn-MeNTS system captured only 67% in its most urgent quartile. Despite finding overall high inter-rater reproducibility in the Gyn-MeNTS scoring system (Spearman r 0.82-0.84), it does not appear this scoring system strongly discriminates the most urgent cases as determined by either the mESAS system (Spearman r 0.31) or when surgeons proceed using their instinct alone, the SSP (Spearman r 0.46). The inter-rater reproducibility of the mESAS tiered system was moderate (weighted kappa 0.57), and it appears to perform slightly better in discerning how surgeons instinctively prioritize (Spearman r 0.53). In fact, the mESAS system identified 92% of the most urgent SSP patients while the Gyn-MeNTS found only 67%. However, when used together, the two scoring systems had high discrimination in capturing clinicians' instinctive beliefs about urgency, and each contributed independently, suggesting that a) they capture distinct issues related to urgency, and b) their combined use may provide the optimal system for objectively rating surgical urgency. Despite seeing merit in the original scoring system, we felt additional steps could be taken to make the MeNTS model more objective, yielding higher inter-observer reliability. Our goal was to create a modified gynecologic MeNTS that would still score in comparable ranges to the original in the event our institution later decided to prioritize by this route. Most modifications were made within the disease factors category. We felt the category should be given more weight of importance (with more scored items) and tailored to the disease burden gynecologic patients uniquely incur. 
Thus, we expanded the number of scoring categories and created objective criteria for quantifying pain, anemia, fertility impact, and impact on adjacent organ systems such as the genitourinary and gastrointestinal systems.

While we agree that exposure to known COVID-19 should be considered, we recommend that, if available, all patients undergoing scheduled surgery be tested for COVID-19 within 48 hours of their planned surgery and delayed if found positive. If testing is unavailable, we recommend screening all elective surgery patients for influenza-like symptoms and postponing surgery if they are present. Given the reports of significantly worsened morbidity and mortality when surgery is unwittingly performed in pre-symptomatic COVID-19 patients, all attempts to identify these patients should be made. [9] [10] Additionally, universal COVID-19 testing protects health care workers from unnecessary exposure and creates a binary PPE triaging system ("standard precautions" or "transmission-based precautions") to protect scarce resources.

Despite our attempts to make the scoring system as objective as possible, many categories are open to wider interpretation than may first appear. We recommend that, prior to implementing the scoring system, an initial dialogue to "lay the ground rules" occur to improve reviewer reproducibility and thus reliability across the cohort. For example:
1. Lung Disease. Attempt to define what constitutes 'minimal' disease. Our system did not capture smoking history other than a binary yes/no. Quantifying risk of disease with a pack-year history calculation may prove beneficial, as smoking affects the risks of both respiratory disease and impaired wound healing, with direct impact on outcomes. It is also becoming increasingly known that lung disease, such as Chronic Obstructive Pulmonary Disease (COPD) and asthma, can worsen patient outcomes in the setting of COVID, independent of their usual perioperative risks. [11]
2. Surgical team size.
We agreed team size would be calculated by the minimum number of surgeons needed to perform a procedure safely and efficiently. In an academic teaching institution, the size of surgical teams can easily be twice the actual number needed. Thus, all hysteroscopy (that did not require intraoperative sonography) was scored a one, and almost all laparoscopy a two. When constant uterine manipulation was needed, a score of three was given to laparoscopy.
3. OR time. At our institution, we are asked to provide estimates of "wheels-in" to "wheels-out" rather than "incision-to-closure" time when posting cases. As long as consistency is applied across graders, this variable has the potential to be reliable. Given that surgeons are notoriously poor predictors of needed surgical time, it is not surprising that this variable performed among the worst in inter-rater agreement.
1. Pain. Despite our attempt to objectively define pain using a VAS system and incorporating the ability to control this pain based on frequency of office or ER visits, our inter-rater agreement remained fair (agreement = 57%) (Table 3). This was due, in large part, to provider differences in documentation of pain, its impact on quality of life, and the ability to control pain with medical management.
2. Alternative Therapy. Although we attempted to objectively quantify the number of alternative therapies tried and incorporated the frequency of office visits to maintain therapy, our inter-rater agreement was poor (agreement = 34%), even with our extensive list of what constitutes alternative therapy in the Table 2 footnotes. We feel this was due to difficulty in abstracting this information from charts.
3. Prediction of peri-operative transfusion. We agree the presence of anemia can be inconsequential for procedures with a low risk of blood loss. The converse is true as well; high blood loss procedures can be tolerated when the patient has no baseline anemia. The most important question, particularly during a time of blood shortage, is whether and to what degree the patient will need perioperative blood products. Finding an objective way to quantify this risk was difficult, a problem compounded by surgeons' inability to reliably predict estimates of EBL (agreement = 73%).
4. Cardiovascular disease. Although we simplified the determination of heart disease to the number of medications needed to control it, care should be taken with this simple assumption. Patients with untreated hypertension may carry significantly more heart disease than patients well-controlled on three medications, due simply to differences in access to health care and compliance with recommended therapy.
5. Emphasis on hospital resource utilization with a bias towards the young, healthy patient. Perhaps the biggest limitation of the Gyn-MeNTS scoring system is its bias toward quick procedures on young, healthy patients. Despite our attempts to increase the weight of disease burden by adding more graded variables, young and healthy patients getting quick, elective procedures (such as tubal ligation or polypectomy) were consistently prioritized in this grading system. While this is not an invalid conclusion when a hospital system is severely limited in its capacity to do anything but the quickest and simplest procedures on healthy patients, it would be difficult to justify, for example, elective sterilization over treatment of debilitating pain from endometriosis, especially if patients have access to alternative contraception. In our institution, despite receiving some of the lowest Gyn-MeNTS scores, patients requesting sterilization are not being prioritized at this time.
6. We also must acknowledge that many of our changes to the original MeNTS scoring system could be specific to the study institution, which limits the generalizability of the Gyn-MeNTS system.
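The item-level agreement figures quoted in the list above (for example 57%, 34%, and 73%) are percent exact agreement. A minimal Python sketch of that calculation follows. It assumes that "exact agreement" on an item means every rater assigned the identical score to a patient (the source does not say whether agreement among the three raters was computed jointly or pairwise), and the function name `percent_exact_agreement` and the example scores are hypothetical.

```python
def percent_exact_agreement(*raters):
    """Percentage of patients for whom all raters gave the identical item score.

    Each positional argument is one rater's list of scores for a single item,
    in the same patient order. Assumption: agreement is joint across all raters.
    """
    if len(raters) < 2:
        raise ValueError("at least two raters are required")
    if len({len(r) for r in raters}) != 1 or not raters[0]:
        raise ValueError("all raters must score the same non-empty set of patients")
    # zip(*raters) yields one tuple of scores per patient.
    agreeing = sum(1 for scores in zip(*raters) if len(set(scores)) == 1)
    return 100.0 * agreeing / len(raters[0])


# Hypothetical scores on one item for four patients (not study data).
rater_1 = [1, 2, 3, 4]
rater_2 = [1, 2, 3, 5]
rater_3 = [1, 2, 4, 5]
print(percent_exact_agreement(rater_1, rater_2, rater_3))
```

Unlike kappa, this measure does not correct for chance agreement, but it remains usable when raters draw on different numbers of categories for an item, which is the reason the Methods give for choosing it.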
Despite the extensive list of surgical examples given by major gynecologic surgical societies in the mESAS table, we found inconsistencies in application between reviewers. It is clear there is still room for interpretation of disease severity, and thus acuity, making assignment of tiers prone to significant variation. Additionally, at the end of this exercise, a high-volume institution could still have large numbers of patients within each tier who must be prioritized further, and this system provides no guidance on how to do so. Finally, the mESAS system appears to contain some inconsistencies within its surgical examples. For instance, endometriosis with poorly controlled pain and desire for fertility is categorized as a level 2, while myomectomy for an asymptomatic patient who is experiencing infertility is categorized as a level 3. A hysteroscopic polypectomy in the infertile patient is categorized more urgently (level 3) than a hysteroscopic evaluation or polypectomy in patients over 50 with inability to sample in the office (level 2), who have a higher risk of malignancy. While these surgical assignments have been agreed upon by major societal stakeholders, these discrepancies deserve attention.

To a certain extent, both scoring systems depend on accurate and elaborate chart documentation. Ideally, surgeons would grade their own patients to improve accuracy and overcome this obstacle. Difficulty was encountered when reviewers graded each other's patients in categories such as efficacy of alternative therapies, immediacy of fertility desire, and severity of pain. This clearly impacts the inter-rater reliability and can lead to the false assumption that poor inter-rater reliability means a scoring system is invalid. Finally, further evaluation in a setting with clinicians who did not develop the scoring system is warranted.
To our knowledge, this is the first study to assess reliability and validity of previously published surgical scoring systems. Additionally, we are the first to report application of these scoring systems in gynecologic patients and have made recommendations for implementation in this arena. More robust prospective data are needed to either confirm or refute our retrospective findings. Further study should also evaluate the efficacy of utilizing both the Gyn-MeNTS and mESAS system together to triage non-emergent procedures. We also feel strongly that a system to provide more emphasis on the disease variables of the Gyn-MeNTS scoring system would yield an even more valid triaging system.

CQM, JSK, CZW, and RLA were responsible for study design and creation of the modified scoring system. CQM, JSK, CZW, WAB, and CMCB were responsible for chart review and data abstraction. RLA was responsible for data analysis and interpretation. CQM, JSK, CZW, WAB, and CMCB were responsible for data interpretation. CQM, JSK, CZW, WAB, CMCB, and RLA were responsible for manuscript drafting and editing. JSK was responsible for publication management.

Data sharing: All data collected for the study including the study protocol, a data dictionary defining each field, and de-identified patient data will be made available to qualified researchers using a data sharing platform upon reasonable request with publication.

1. Recommendations for Management of Elective Surgical Procedures. American College of Surgeons.
2. Surgeon General. Hospitals & healthcare systems, PLEASE CONSIDER STOPPING ELECTIVE PROCEDURES until we can #FlattenTheCurve.
3. Centers for Medicare & Medicaid Services. CMS Releases Recommendations on Adult Elective Surgeries, Non-Essential Medical, Surgical, and Dental Procedures During COVID-19 Response.
4. American College of Surgeons. COVID-19: Guidance for Triage of Non-Emergent Surgical Procedures. American College of Surgeons.
5. Joint Statement on Re-Introduction of Hospital and Office-Based Procedures in the COVID-19 Climate. IL: Society of Gynecologic Surgeons.
6. Medically Necessary, Time-Sensitive Procedures: Scoring System to Ethically and Efficiently Manage Resource Scarcity and Provider Risk During the COVID-19
7. Joint Statement on Minimally Invasive Gynecologic Surgery During the COVID-19 Pandemic.
8. Society of American Gastrointestinal and Endoscopic Surgeons. SAGES and EAES Recommendations Regarding Surgical Response to Covid-19 Crisis.
9. Clinical Characteristics and Outcomes of Patients Undergoing Surgeries During the Incubation Period of COVID-19 Infection. EClinicalMedicine.
10. Author's Reply: Hazardous Postoperative Outcomes of Unexpected COVID-19 Infected Patients: A Call for Global Consideration of Sampling All Asymptomatic Patients Before Surgical Treatment.
11. Epidemiological, comorbidity factors with severity and prognosis of COVID-19: a systematic review and meta-analysis.