Biomarkers of acute appendicitis: systematic review and cost–benefit trade-off analysis

REVIEW

Amish Acharya1 • Sheraz R. Markar1 • Melody Ni1 • George B. Hanna1

Received: 17 March 2016 / Accepted: 9 July 2016 / Published online: 5 August 2016
© The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract

Background Acute appendicitis is the most common surgical emergency and can represent a challenging diagnosis, with a negative appendectomy rate as high as 20 %. This review aimed to evaluate the clinical utility of individual biomarkers in the diagnosis of appendicitis and appraise the quality of these studies.

Methods A systematic review of the literature between January 2000 and September 2015 using PubMed, OvidMedline, EMBASE and Google Scholar was conducted. Studies in which the diagnostic accuracy, statistical heterogeneity and predictive ability for severity of several biomarkers could be elicited were included. Information regarding costs and process times was retrieved from the regional laboratory. European surgeons blinded to these reviews were independently asked to rank which characteristics of biomarkers were most important in acute appendicitis to inform a cost–benefit trade-off. Sensitivity testing and the QUADAS-2 tool were used to assess the robustness of the analysis and study quality, respectively.

Results Sixty-two studies met the inclusion criteria and were assessed. Traditional biomarkers (such as white cell count) were found to have a moderate diagnostic accuracy (area under the summary ROC curve 0.75) but lower costs in the diagnosis of acute appendicitis. Conversely, novel markers (pro-calcitonin, IL-6 and urinary 5-HIAA) were found to have high process-related costs, including analytical times, but improved diagnostic accuracy. QUADAS-2 analysis revealed significant potential biases in the literature.
Conclusion When assessing biomarkers, an appreciation of the trade-offs between the costs and benefits of individual biomarkers is needed. Further studies should seek to investigate new biomarkers and address concerns over bias, in order to improve the diagnosis of acute appendicitis.

Keywords Acute appendicitis · Biomarkers · Cost–benefit trade-off

Surg Endosc (2017) 31:1022–1031
DOI 10.1007/s00464-016-5109-1

Electronic supplementary material The online version of this article (doi:10.1007/s00464-016-5109-1) contains supplementary material, which is available to authorized users.

Corresponding author: George B. Hanna, g.hanna@imperial.ac.uk

1 Division of Surgery, Department of Surgery and Cancer, Imperial College London, 10th Floor QEQM Building, St Mary’s Hospital, South Wharf Road, London W2 1NY, UK

Acute appendicitis is the most common surgical emergency, with an annual incidence in the USA of 9.38 per 100,000 [1]. Cases are characterized by an acute inflammatory process, but in approximately 16.5 % the appendix has perforated and become gangrenous or there is overt peritonitis, termed ‘complicated appendicitis’ [2]. Whilst in rare special circumstances management may differ, the mainstay of treatment for the majority of patients remains surgery, by either an open or a laparoscopic approach. With 326,000 appendectomies performed in the USA during 2007, at an average estimated cost of $6242 [3], appendicitis represents a highly prevalent condition with significant expenditure associated with its treatment.

Despite the frequency of appendicitis, accurate diagnosis remains difficult. The National Surgical Research Collaborative in the UK has estimated that the negative appendectomy rate is as high as 20.6 % [2]. The use of ultrasound and computerized tomography (CT) has in some cases been shown to improve diagnostic accuracy and reduce the number of negative appendectomies [7], with the latter shown to decrease rates to less than 10 % [4]. However, exposing patients to high levels of radiation is undesirable given the lifetime risk of cancer, and the costs associated with increased utilization of CT are a further drawback of this approach. Whilst this radiation dosage is avoided by using ultrasound scanning, the technique is operator dependent, and in as many as 55 % of cases the appendix fails to be visualized [5].

Several studies have previously examined a variety of biomarkers associated with appendicitis to more appropriately assign risk and allocate further diagnostic investigation. These have the potential to provide noninvasive objective criteria to aid clinicians in the diagnosis of appendicitis and in some cases to predict the severity of the condition, with no adverse effects upon the patient. In several studies, biomarkers have been shown to have potentially good diagnostic accuracy and reliability, but with variable financial and timing implications. These implications can significantly limit the clinical effectiveness of a biomarker in the emergency setting. The ‘ideal’ diagnostic biomarker would therefore maximize clinical utility and minimize procedural cost, including analytical time. The aim of this study was to evaluate the specific characteristics of biomarkers that surgeons value and to critically assess the cost–benefit of both traditional and novel biomarkers in the diagnosis of acute appendicitis from published literature.
Materials and methods

Literature search strategy

A literature search of PubMed, OvidMedline, EMBASE and Google Scholar electronic databases was conducted from January 1, 2000, up to and including September 1, 2015, for studies regarding the use of urine or serum biomarkers in the diagnosis of appendicitis or the prediction of complicated appendicitis. Search terms used included: appendicitis, serum, blood, urine, biomarkers, diagnosis, diagnostics, perforation, complicated and severity in various combinations, as well as the name of specific biomarkers previously identified.

Research titles were then screened for suitability, and full-text copies were retrieved. Further potentially appropriate papers were highlighted by assessing the reference lists and citations of the articles being screened. All studies that investigated the diagnostic ability of a single or multiple biomarkers that could be tested in the urine or blood of patients were included. Exclusion criteria involved studies with no available English translation, no full-text edition available, and those assessing the predictive ability of biomarkers for severity in which no diagnostic accuracy could be calculated.

Of those studies meeting inclusion criteria, the year of publication, population demographics, the number of patients enrolled and the stated specificity and sensitivity of the biomarker for diagnosis and severity were extracted. For studies that did not explicitly state the sensitivity and specificity of the biomarker, provided sufficient data were available, these were independently calculated.

Literature standard

The QUADAS-2 tool was used to appraise the standard of the literature. It was implemented, as it has been previously described, to assess the quality and risk of bias of the included studies [6]. The tool involves four domains: patient selection, index test, reference standard and the flow of subjects through the study.
Prompting questions are used to allow the reviewer to assess whether there is a risk of bias with respect to each of the four domains. It also allows the reviewer to gauge the applicability of the study to the review with respect to the first three domains. In this review, the reference standard is the histological examination of the appendix.

Biomarker survey

General surgeon members of the European Association of Endoscopic Surgery (EAES) were asked to complete an anonymous survey regarding their opinions on the most desirable characteristics that the ideal diagnostic biomarker of acute appendicitis would possess (Table 1). The surgeons were asked to rank each characteristic in order of importance, including diagnostic benefits (high sensitivity, high specificity, reproducibility and predictive ability of perforation), process-related financial costs, time for result, ease of testing and patient acceptability. The average rank for each of the attributes, e.g., sensitivity, was then calculated, to identify which were the most desired characteristics. These ranks were used to inform the weightings for the cost–benefit trade-off, with greater importance placed upon higher ranked attributes.

Statistical methodology

For each of the assessments of acute appendicitis and severity of appendicitis (perforation), paired sensitivity and specificity were calculated for diagnosis or severity, as appropriate, from each eligible study. A bivariate model for meta-analysis of diagnostic accuracy provides more accurate results than fixed-effect modeling. Following the validated methodology of Harbord et al. [7], bivariate meta-analyses were therefore performed to generate pooled point estimates and 95 % confidence intervals for the sensitivity and specificity of the biomarker under investigation with histopathological confirmation of acute appendicitis, together with hierarchical summary receiver operating characteristic (ROC) curves. The software used for this analysis was the custom-designed statistical package MIDAS [8]. Areas under the hierarchical summary ROC curves, and I² statistics, were obtained directly from the MIDAS output. See Zhou and Tu [9] for an in-depth description of the statistical methods used.

Cost–benefit trade-off analysis

To evaluate the biomarkers, we applied decision analysis methodology, employing multi-criteria decision analysis (MCDA) [10] to assess trade-offs between cost (both time and financial) and benefit amongst the biomarkers, in terms of their performance characteristics (Table 2). The list of performance characteristics was grouped into three areas, namely monetary costs, time to results and benefits, the last encompassing all the remaining characteristics that were neither costs nor time. Through the literature review and expert survey, we determined the mean level of performance of the biomarkers on each of the characteristics. Criteria on which all biomarkers had identical performance, such as patient acceptability, were removed. The performance level was converted to a score by assigning a value of 0 to represent the worst performance (e.g., the highest unit price or worst sensitivity) and a value of 100 to represent the best performance (e.g., lowest unit price or highest sensitivity). We assumed linearity between performance and value, such that for any intermediate level the corresponding value was interpolated from the worst and best performances on that criterion (valued 0 and 100, respectively).
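The linear conversion from performance level to value score can be sketched in a few lines of Python. This is a minimal illustration rather than the HiView implementation: the function name `value_scores` and its `higher_is_better` flag are hypothetical, and the inputs are the unit costs and times for result listed in Table 2. Under this reading, the rounded scores coincide with the cost and time rows of Table 3.

```python
def value_scores(levels, higher_is_better=True):
    """Linearly rescale raw levels so the worst option scores 0 and the best 100.

    Hypothetical helper illustrating the interpolation described in the text.
    """
    lo, hi = min(levels.values()), max(levels.values())
    worst, best = (lo, hi) if higher_is_better else (hi, lo)
    if best == worst:
        raise ValueError("criterion does not discriminate between options")
    # Every level lies between worst and best, so the absolute values
    # give the fraction of the way from worst (0) to best (100).
    return {name: 100.0 * abs(x - worst) / abs(best - worst)
            for name, x in levels.items()}


# Unit cost (GBP) and time for result (hours), as listed in Table 2.
cost_gbp = {"WCC": 2.5, "CRP": 30, "Bilirubin": 2,
            "Pro-calcitonin": 17.42, "IL-6": 15.5, "5-HIAA": 21}
time_h = {"WCC": 1, "CRP": 1, "Bilirubin": 1,
          "Pro-calcitonin": 12, "IL-6": 168, "5-HIAA": 240}

# Lower cost and shorter time are better, so the scale is inverted.
cost_scores = value_scores(cost_gbp, higher_is_better=False)
time_scores = value_scores(time_h, higher_is_better=False)

for name in cost_gbp:
    print(f"{name:15s} cost {cost_scores[name]:5.1f}  time {time_scores[name]:5.1f}")
```

Because the scale is anchored on the observed worst and best options, adding or removing a biomarker can shift every score on that criterion.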
Criteria weightings were derived from the rankings assigned by the European surgeons. The highest ranked criterion was given a weighting of 100, the second highest ranked criterion was given a weighting of 90, and so forth. The weightings were normalized so that they totaled 1, for each performance area. We applied a weighted average rule to combine the value scores across criteria as in:

$$\mathrm{Value} = \sum_{k} W_k V_k,$$

where $V_k$ indicates the value of an option on the $k$th criterion and $W_k$ is the weighting assigned to that criterion. The overall value was therefore bounded between 0 and 100: a biomarker that had the worst performance on all the criteria would have an overall value of 0, whereas the biomarker that had the best performance on all the criteria would have an overall value of 100. The more desirable the biomarker was, the higher this value was. Two-way cost–benefit maps highlighted the trade-offs between different aspects of the biomarkers. Sensitivity analyses examined the robustness of the results. Trade-off analyses were performed using decision analytic software HiView (version 3.2.0.7, educational copy).

Table 1 Definitions of the characteristics of biomarkers the consultants were asked to rank

| Characteristic | Outcome utilized |
| --- | --- |
| Sensitivity | Result of pooled sensitivity for diagnosis of acute appendicitis |
| Specificity | Result of pooled specificity for diagnosis of acute appendicitis |
| Predictive of perforation | Area under the curve of summary ROC for diagnosis of perforated appendicitis |
| Cost | Cost of investigation from Imperial College NHS Trust |
| Ease of testing | Level of invasiveness of testing |
| Acceptability | The impression of patient acceptability |
| Time for result | Time from sample being taken to result being available for clinician interpretation, as described by Imperial College NHS Trust |
| Reproducibility | I² statistic for heterogeneity: increasing value indicates LESS consistency |

Table 2 Performance of various biomarkers with respect to the surgeon rankings

| Biomarker | Sens. (%) | Spec. (%) | Ease of test | Predictive of perforation (%) | Cost (£) | Time for result (h) | Acceptability | Reproducibility (I², %) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WCC | 79 | 55 | Easy | 69 | 2.5 | 1 | Good | 92 |
| CRP | 76 | 50 | Easy | 78 | 30 | 1 | Good | 81 |
| Bilirubin | 51 | 78 | Easy | 71 | 2 | 1 | Good | 98 |
| Pro-calcitonin | 36 | 88 | Easy | 83 | 17.42 | 12 | Good | 96 |
| IL-6 | 73 | 72 | Easy | 84 | 15.5 | 168 | Good | 91 |
| 5-HIAA | 72 | 86 | Easy | 0 | 21 | 240 | Good | 93 |
| Surgeon rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

Acceptability considered ‘good’ as all can be done routinely. Ease of testing all considered ‘easy’ as all are noninvasive.
WCC White cell count, CRP C-reactive protein, IL-6 Interleukin 6, 5-HIAA Urinary serotonin, Sens Sensitivity, Spec Specificity

Results

Literature search

Sixty-two full-text articles met the inclusion criteria and were appraised following the literature search (Fig. 1). Forty-nine of these were used to assess the diagnostic accuracy of biomarkers. Eight studies assessed urinary markers (7 for urinary serotonin and 1 for leucine-rich glycoprotein). Forty-three studies investigated serum biomarkers (23 on white cell count, 24 on C-reactive protein, 13 on bilirubin, 3 on serum amyloid A, 1 on S100 A8/9 protein, 2 on calprotectin, 7 on pro-calcitonin, 1 on D-dimer, 5 on interleukin 6, 1 on interleukin 10, 1 on leucine-rich glycoprotein, 1 on fibrinogen, 1 on lipopolysaccharide-binding protein and 1 on high mobility group box protein-1). Thirty-seven studies assessed whether biomarkers were predictive of severity (20 on white cell count, 19 on C-reactive protein, 19 on bilirubin, 7 on pro-calcitonin, 3 on interleukin 6 and 1 on urinary serotonin) [12, 15, 20, 21, 23, 24, 26–29, 31–33, 38–41, 43, 44, 47, 48, 50, 58, 60–72].
The demographics of these studies are shown in Appendixes 1 and 2 in ESM.

QUADAS-2 evaluation

The results of the QUADAS-2 assessment of the studies are shown in Fig. 2. Fifty-nine percent of studies had an ‘unclear’ or ‘high’ risk of bias with respect to patient selection due to constraining exclusion criteria. This limited the applicability of fifty-eight percent of the studies with respect to patient selection. Forty-one percent and thirty-one percent of studies had an ‘unclear’ or ‘high’ risk of bias with respect to the index and reference standards, respectively. This was due to a lack of information regarding blinding, thresholds and the order in which they were assessed. Only thirteen percent of studies had an ‘unclear’ or ‘high’ risk of bias with respect to the patient flow.

Fig. 1 Schematic to show the strategic literature search

Biomarkers that were included in more than 2 studies were taken forward for pooled analysis.

Pooled analysis for individual serum biomarkers in acute appendicitis

White cell count

The pooled sensitivity of white cell count for the diagnosis of acute appendicitis was 0.79 (95 % CI 0.78–0.81; I² = 92.0 %), and its pooled specificity was 0.55 (95 % CI 0.54–0.57; I² = 88.0 %). The area under the curve for the summary ROC was 0.75 ± 0.02. For the diagnosis of perforated appendicitis, the pooled sensitivity was 0.70 (95 % CI 0.68–0.73; I² = 95.5 %) and pooled specificity was 0.49 (95 % CI 0.48–0.50; I² = 98.5 %), giving an area under the curve of 0.69 ± 0.05.

C-reactive protein

The pooled sensitivity of C-reactive protein for the diagnosis of acute appendicitis was 0.76 (95 % CI 0.75–0.78; I² = 81.4 %), and its pooled specificity was 0.50 (95 % CI 0.48–0.52; I² = 94.2 %). The area under the curve for the summary ROC was 0.80 ± 0.02.
For the diagnosis of perforated appendicitis, the pooled sensitivity was 0.76 (95 % CI 0.74–0.78; I² = 95.2 %) and pooled specificity was 0.52 (95 % CI 0.51–0.53; I² = 98.4 %), giving an area under the curve of 0.78 ± 0.02.

Bilirubin

The pooled sensitivity of bilirubin for the diagnosis of acute appendicitis was 0.51 (95 % CI 0.50–0.52; I² = 97.7 %), and its pooled specificity was 0.78 (95 % CI 0.76–0.80; I² = 92.0 %). The area under the curve for the summary ROC was 0.72 ± 0.05. For the diagnosis of perforated appendicitis, the pooled sensitivity was 0.52 (95 % CI 0.49–0.54; I² = 87.2 %) and pooled specificity was 0.76 (95 % CI 0.75–0.77; I² = 97.8 %), giving an area under the curve of 0.71 ± 0.04.

Pro-calcitonin

The pooled sensitivity of pro-calcitonin for the diagnosis of acute appendicitis was 0.36 (95 % CI 0.31–0.40; I² = 96.0 %), and its pooled specificity was 0.88 (95 % CI 0.83–0.91; I² = 81.8 %). The area under the curve for the summary ROC was 0.82 ± 0.10. For the diagnosis of perforated appendicitis, the pooled sensitivity was 0.69 (95 % CI 0.62–0.76; I² = 93 %) and pooled specificity was 0.67 (95 % CI 0.62–0.71; I² = 97 %), giving an area under the curve of 0.83 ± 0.07.

IL-6

The pooled sensitivity of IL-6 for the diagnosis of acute appendicitis was 0.73 (95 % CI 0.67–0.78; I² = 91.1 %), and its pooled specificity was 0.72 (95 % CI 0.63–0.79; I² = 62.3 %). The area under the curve for the summary ROC was 0.74 ± 0.04. For the diagnosis of perforated appendicitis, the pooled sensitivity was 0.79 (95 % CI 0.72–0.85; I² = 65.1 %) and pooled specificity was 0.62 (95 % CI 0.55–0.68; I² = 95 %), giving an area under the curve of 0.84 ± 0.03.

Pooled analysis for 5-HIAA from urine in acute appendicitis

The pooled sensitivity of urinary 5-HIAA for the diagnosis of acute appendicitis was 0.72 (95 % CI 0.68–0.76; I² = 93.4 %), and its pooled specificity was 0.86 (95 % CI 0.80–0.92; I² = 68 %).
The area under the curve for the summary ROC was 0.88 ± 0.07. Pooled analysis for severity was precluded as only one study met the inclusion criteria.

Fig. 2 A Graph displaying the percentage of studies with varying degree of bias for each of the four QUADAS-2 domains. B Graph displaying the percentage of studies of varying applicability with respect to three of the four QUADAS-2 domains

Biomarker survey

Six hundred and eighty-eight surgeon members of the EAES responded to the survey (77 % of whom were consultants, 18 % registrar level and 4 % other grades), giving a response rate of 12.7 %. Diagnostic sensitivity was given the highest average rank by the surgeon consensus and was thus weighted as the most important biomarker characteristic. The results of the other parameters are listed in Table 2.

Cost–benefit trade-off

Since all biomarkers had identical performances in terms of ‘ease of test’ and ‘acceptability,’ these two criteria were removed from the trade-off analysis. Table 3 displays the normalized weighted scores out of 100 for each of the six biomarkers with respect to the costs, time for result and benefits (diagnostic sensitivity, specificity, prediction of perforation and reproducibility), as well as an overall performance score. Figure 3A displays trade-offs between the benefits, as defined above, and the costs.

White cell count and bilirubin performed best overall, with the latter scoring marginally higher. When appraising the benefits in isolation, interleukin-6 performed the best. Sensitivity analysis demonstrated how the performance of the biomarkers would change if the relative importance of the various characteristics, as determined by the survey, was altered. If less importance was placed upon the financial cost or the time for result than upon the relative benefits (such as sensitivity), then the surgeons’ preference would be shifted further in favor of novel markers such as IL-6 (Fig. 3B, C).
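The weighted-average rule and the effect of shifting weight between benefit and financial cost can be illustrated with a short Python sketch based on the Table 2 data. Several choices here are assumptions rather than details reported in the paper: the weight for each benefit criterion is taken as 100 − 10 × (survey rank − 1), the overall score is a simple two-way blend of benefit and cost (time for result is ignored), and the helper names are hypothetical. The scores are therefore not expected to reproduce Table 3 exactly. Under these assumptions, WCC leads when benefit and cost are weighted equally, and IL-6 leads once benefit carries most or all of the weight.

```python
def value_scores(levels, higher_is_better=True):
    """Linearly rescale raw levels so the worst option scores 0 and the best 100."""
    lo, hi = min(levels.values()), max(levels.values())
    worst, best = (lo, hi) if higher_is_better else (hi, lo)
    return {m: 100.0 * abs(x - worst) / abs(best - worst) for m, x in levels.items()}


MARKERS = ["WCC", "CRP", "Bilirubin", "Pro-calcitonin", "IL-6", "5-HIAA"]


def column(values):
    """Attach biomarker names to one column of Table 2."""
    return dict(zip(MARKERS, values))


# Benefit criteria from Table 2. 5-HIAA has no pooled perforation estimate;
# Table 2 records it as 0, and that value is used unchanged here.
raw = {
    "sensitivity": column([79, 76, 51, 36, 73, 72]),
    "specificity": column([55, 50, 78, 88, 72, 86]),
    "perforation": column([69, 78, 71, 83, 84, 0]),
    "reproducibility": column([92, 81, 98, 96, 91, 93]),  # I^2: lower is better
}
cost_gbp = column([2.5, 30, 2, 17.42, 15.5, 21])

# Surgeon ranks from Table 2. ASSUMPTION: weight = 100 - 10 * (rank - 1),
# normalized so the benefit weights sum to 1.
survey_rank = {"sensitivity": 1, "specificity": 2, "perforation": 4, "reproducibility": 8}
raw_weight = {c: 100 - 10 * (r - 1) for c, r in survey_rank.items()}
total = sum(raw_weight.values())
weight = {c: w / total for c, w in raw_weight.items()}

scores = {c: value_scores(v, higher_is_better=(c != "reproducibility"))
          for c, v in raw.items()}
# Weighted average rule: Value = sum_k W_k * V_k over the benefit criteria.
benefit = {m: sum(weight[c] * scores[c][m] for c in weight) for m in MARKERS}
cost_score = value_scores(cost_gbp, higher_is_better=False)


def best_marker(benefit_weight):
    """ASSUMPTION: overall = w * benefit + (1 - w) * cost score (time ignored)."""
    overall = {m: benefit_weight * benefit[m] + (1 - benefit_weight) * cost_score[m]
               for m in MARKERS}
    return max(overall, key=overall.get)


for w in (0.5, 0.9, 1.0):
    print(f"benefit weight {w:.1f}: best marker = {best_marker(w)}")
```

Sweeping `benefit_weight` in this way mirrors the kind of one-parameter sensitivity analysis shown in Fig. 3B, although the published analysis was run in HiView with its own weights.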
The remaining biomarkers (C-reactive protein, serotonin and pro-calcitonin) were inferior to those previously mentioned, in that they were probabilistically dominated by the other three tests.

Discussion

This study has highlighted the variable performance of biomarkers in the diagnosis of appendicitis, which reduces their potential to provide established objective criteria when used in isolation. This analysis has shown that whilst traditional markers including white cell count are associated with low temporal and financial cost, their overall diagnostic accuracy is relatively poor. As such, weighting the analysis in favor of diagnostic characteristics such as high sensitivity or specificity, as opposed to process-related performance, would favor the use of novel biomarkers. The low diagnostic accuracy of elevated WCC is likely because the underlying generalized inflammatory process is seen not only with acute appendicitis but also with a number of other inflammatory conditions [12]. Conversely, novel markers that are less commonly used clinically in the diagnosis of appendicitis, such as interleukin-6, have been shown to have a higher diagnostic benefit, but are associated with significant costs. The results of the literature search also highlight the expansion of work to look for novel diagnostic biomarkers, which to date have been tested only in isolated studies, preventing meaningful analysis for clinical application [34].

There was a ‘high’ or ‘unclear’ risk of bias in 59 % of the studies with respect to patient selection. This was due to insufficient information regarding selection criteria. A number of studies assessing novel markers utilized healthy controls or, as with bilirubin, excluded patients in whom a raised level could be caused by alternative pathology. This, however, leads to a selection bias when assessing the diagnostic ability of the biomarker with respect to suspected appendicitis and can spuriously improve the specificity.
There was also an ‘unclear’ bias with respect to the index tests, especially with novel biomarkers, as diagnostic thresholds were not stated. The majority of the studies showed good applicability, but the assessment of a restricted demographic, such as pediatric patients, limited the studies’ performance with respect to this domain.

This study has highlighted the challenges associated with using single biomarkers in the diagnosis of appendicitis. Radiological investigation, especially CT, has been shown to have far superior diagnostic ability, with a reported sensitivity and specificity of 94 and 95 %, respectively [73]. However, the estimated radiation dose associated with a CT abdomen is 14 mSv, equating to an increase of 0.2 % in the cancer risk for a 30-year-old patient [74]. Furthermore, CT remains a relatively expensive modality that could not be practically used in all patients in many areas of the world. Several studies have already suggested the use of diagnostic algorithms to ensure judicious use of radiology [73] and have demonstrated the potential to halve the use of CT scanning without increasing the negative appendectomy rate. Biomarkers could therefore be incorporated into these diagnostic algorithms in order to rationalize and more appropriately allocate further investigations.

Table 3 Normalized scores (out of 100) for the six biomarkers with respect to financial cost, time, diagnostic benefit (composite of sensitivity, specificity, reproducibility and prediction of perforation) and overall performance

| | WCC | CRP | Bilirubin | Pro-calcitonin | IL-6 | 5-HIAA |
| --- | --- | --- | --- | --- | --- | --- |
| Cost performance | 98 | 0 | 100 | 45 | 52 | 32 |
| Time performance | 100 | 100 | 100 | 95 | 30 | 0 |
| Diagnostic benefit | 64.3 | 45 | 44 | 58 | 53 | 87 |
| Overall performance | 74.6 | 52.0 | 75.1 | 65.0 | 68.3 | 52.2 |

WCC White cell count, CRP C-reactive protein, IL-6 Interleukin 6, 5-HIAA Urinary serotonin

Previous studies on biomarkers in appendicitis have focused solely upon their diagnostic accuracy.
However, this study has highlighted the importance of considering clinical utility when assessing biomarkers. Interleukin-6 had the highest overall benefit; however, this neglects its 168-h process time and high cost per test, which would preclude it from actual clinical use. This is further highlighted by the sensitivity analysis, which demonstrated that, once the significance of costs is factored in, more traditional biomarkers such as WCC will be preferred. This study has therefore highlighted the potential importance of cost–benefit modeling to improve this decision-making process when considering regional or national allocation of resources for diagnostic investigations.

In fact, no single biomarker had all the desired characteristics for the diagnosis of acute appendicitis. More commonly used biomarkers have lower process-related costs due to the widespread availability of the testing, but are of relatively poor diagnostic accuracy when used in isolation. Newly proposed biomarkers, whilst having high diagnostic value, often require more complex assays, which in some circumstances must be sent to regional centers for analysis. However, a combination of biomarkers, as is used by some institutions clinically with white cell count and CRP, may improve the diagnostic ability [41, 45].

Alternatively, the use of a biomarker in conjunction with a consistent clinical history and examination may improve diagnostic accuracy in a more feasible manner. This could be achieved by utilization of stratification scores such as the Alvarado, which is a 10-point scoring system incorporating the typical signs and symptoms seen with appendicitis. With a cutoff of 7, this diagnostic algorithm has a reported specificity as high as 100 % [75]. However, the limitation of these scoring systems is the subjective interpretation of clinical history and examination findings [42]. Furthermore, a surgeon’s clinical impression has in some cases been shown to be of equivalent diagnostic value to these scoring systems, highlighting the value of clinical experience and the limitations of the widespread utilization of scoring systems [36].

Fig. 3 A Cost–benefit trade-off for the six biomarkers. Benefits include a summation of sensitivity, specificity, predictive ability and reproducibility. B Sensitivity analysis revealing the effect of changing the current weighting (dashed line) placed upon financial cost and overall benefits. C Sensitivity analysis revealing the effect of changing the current weighting (dashed line) placed upon time for result and overall benefits

In effect, therefore, we have shown that clinically white cell count and bilirubin should be considered of greater use in the diagnosis of acute appendicitis when compared with other biomarkers. However, given the limitations associated with current biomarkers, a high level of discrimination is required when interpreting these in practice, and the use of clinical impression in conjunction with radiological investigations remains the mainstay of the diagnostic paradigm.

The limitations of this study are primarily a result of the studies included to inform the cost–benefit trade-off. Patient selection varied, and a lack of details regarding exclusion criteria limited the applicability of the studies to a patient population. Moreover, there was heterogeneity in the study designs, with a number of retrospective studies being included. Many of these trials did not explicitly mention blinding of the investigators, which is another potential source of bias and a limitation of this review. Inherently with the use of novel biomarkers, no preexisting widely accepted threshold exists, leading many studies to assess various diagnostic cutoff values. Without blinding the investigators to the results of the histology, this increases the scope for bias.
Furthermore, these studies often employed ‘healthy’ controls to formulate the testing thresholds; however, minimal details were provided as to the demographics of these controls, and their use leads to the aforementioned issues regarding specificity. A further limitation of this type of review is the potential for publication bias. Whilst this was mitigated by conducting a thorough multi-database search, language and publication bias still persist.

The results are further limited by the fact that the weighting was based upon the results of an online survey which had a response rate of 12.7 % and represented only surgeons affiliated with the EAES. Moreover, as the best overall marker changed when the importance of sensitivity was increased, the reliance upon the weighting system demonstrates how the conclusions would change depending on the opinions of the surgeons.

Conclusion

Appendicitis continues to pose a diagnostic challenge to emergency physicians and surgeons. Clinical impression remains a crucial tool in diagnosis and treatment allocation in those with suspected appendicitis. As yet, no biomarker has been shown to have sufficient diagnostic performance to be used in isolation clinically. This would suggest that further research should focus upon the search for novel diagnostic tests and the clinical utility of the tests, rather than repeating existing research into previously studied biomarkers. Through this approach, the accuracy of diagnosis of appendicitis can be enhanced, reducing the number of negative appendectomies performed, the implied adverse impact to patients and treatment costs to hospitals.

Funding Mr. Sheraz Markar is funded by the National Institute for Health Research (NIHR). This research was supported by the National Institute for Health Research (NIHR) Diagnostic Evidence Co-operative London at Imperial College Healthcare NHS Trust.
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Compliance with ethical standards

Disclosures Amish Acharya, Sheraz R. Markar, Melody Ni and George B. Hanna have no conflicts of interest or financial ties to disclose.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. D’Souza N, Nugent K (2016) Appendicitis. Am Fam Physician 93(2):142–143
2. National Surgical Research Collaborative (2013) Multicentre observational study of performance variation in provision and outcome of emergency appendicectomy. Br J Surg 100(9):1240–1252
3. Nguyen NT, Zainabadi K, Mavandadi S, Paya M, Stevens CM, Root J, Wilson SE (2004) Trends in utilization and outcomes of laparoscopic versus open appendectomy. Am J Surg 188(6):813–820
4. Jones K, Pena AA, Dunn EL, Nadalo L, Mangram AJ (2004) Are negative appendectomies still acceptable? Am J Surg 188(6):748–754
5. D’Souza N, D’Souza C, Grant D, Royston E, Farouk M (2015) The value of ultrasonography in the diagnosis of appendicitis. Int J Surg 13C:165–169
6. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM (2011) QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 155(8):529–536
7. Harbord RM, Whiting P, Sterne JA, Egger M, Deeks JJ, Shang A, Bachmann LM (2008) An empirical comparison of methods for meta-analysis of diagnostic accuracy showed hierarchical models are necessary. J Clin Epidemiol 61(11):1095–1103
8. Young K, Weber P, Schuff N (2006) The MIDAS statistical analysis package. University of California, San Francisco, California
9. Zhou XH, Tu W (2000) Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics 56(4):1118–1125
10. Keeney RL, Raiffa H (1993) Decisions with multiple objectives: preferences and value trade-offs, 2nd edn. Cambridge University Press, Cambridge
11. Abbas MH, Choudhry MN, Hamza N, Ali B, Amin AA, Ammori BJ (2014) Admission levels of serum amyloid a and procalcitonin are more predictive of the diagnosis of acute appendicitis compared with c-reactive protein. Surg Laparosc Endosc Percutaneous Tech 24(6):488–494
12. Al-Abed YA, Alobaid N, Myint F (2014) Diagnostic markers in acute appendicitis. Am J Surg 209(6):1043–1047
13. Albayrak Y, Albayrak A, Celik M, Gelincik I, Demiryilmaz I, Yildrim R, Ozogul B (2011) High mobility group box protein-1 (HMGB-1) as a new diagnostic marker in patients with acute appendicitis. Scand J Trauma Resusc Emerg Med 19:27
14. Asfar S, Safar H, Khoursheed M, Dashti H, Al-Bader A (2000) Would measurement of C-reactive protein reduce the rate of negative exploration for acute appendicitis? J R Coll Surg Edinb 45(1):21–24
15. Atahan K, Ureyen O, Aslan E, Deniz M, Cokmez A, Gur S, Avci A, Tarcan E (2011) Preoperative diagnostic role of hyperbilirubinaemia as a marker of appendix perforation. J Int Med Res 39(2):609–618
16. Bealer JF, Colgin M (2010) S100A8/A9: a potential new diagnostic aid for acute appendicitis. Acad Emerg Med 17(3):333–336
17. Beltran MA, Almonacid J, Vicencio A, Gutierrez J, Cruces KS, Cumsille MA (2007) Predictive value of white blood cell count and C-reactive protein in children with appendicitis. J Pediatr Surg 42(7):1208–1214
18. Bolandparvaz S, Vasei M, Owji AA, Ata-Ee N, Amin A, Daneshbod Y, Hosseini SV (2004) Urinary 5-hydroxy indole acetic acid as a test for early diagnosis of acute appendicitis. Clin Biochem 37(11):985–989
19. Cardall T, Glasser J, Guss DA (2004) Clinical value of the total white blood cell count and temperature in the evaluation of patients with suspected appendicitis. Acad Emerg Med 11(10):1021–1027
20. Chandel V, Batt SH, Bhat MY, Kawoosa NU, Yousuf A, Zargar BR (2011) Procalcitonin as the biomarker of inflammation in diagnosis of appendicitis in pediatric patients and prevention of unnecessary appendectomies. Indian J Surg 73(2):136–141
21. Emmanuel A, Murchan P, Wilson I, Balfe P (2011) The value of hyperbilirubinaemia in the diagnosis of acute appendicitis. Ann R Coll Surg Engl 93(3):213–217
22. Erkasap S, Ates E, Ustuner Z, Sahin A, Yilmaz S, Yasar B, Kiper H (2000) Diagnostic value of interleukin-6 and C-reactive protein in acute appendicitis. Swiss Surg 6(4):169–172
23. Estrada JJ, Petrosyan M, Barnhart J, Tao M, Sohn H, Towfigh S, Mason RJ (2007) Hyperbilirubinemia in appendicitis: a new predictor of perforation. J Gastrointest Surg 11(6):714–718
24. Farooqui W, Pommergaard HC, Burcharth J, Eiksen JR (2014) The diagnostic value of a panel of serological markers in acute appendicitis. Scand J Surg 104(2):72–78
25. Groselj-Grenc M, Repse S, Dolenc-Strazar Z, Hojker S, Derganc M (2007) Interleukin-6 and lipopolysaccharide-binding protein in acute appendicitis in children. Scand J Clin Lab Invest 67(2):197–206
26. Gurleyik G, Gurleyik E, Cetinkaya F, Unalmiser S (2002) Serum interleukin-6 measurement in the diagnosis of acute appendicitis. ANZ J Surg 72(9):665–667
27. Hong YR, Chung CW, Kim JW, Kwon CI, Ahn DH, Kwon SW, Kim SK (2012) Hyperbilirubinemia is a significant indicator for the severity of acute appendicitis. J Korean Soc Coloproctol 28(5):247–252
28.
Kaser SA, Fankhauser G, Willi N, Maurer CA (2010) C-reactive protein is superior to bilirubin for anticipation of perforation in acute appendicitis. Scand J Gastroenterol 45(7–8):885–892 29. Kaya B, Sana B, Eris C, Karabulut K, Bat O, Kutanis R (2012) The diagnostic value of D-dimer, procalcitonin and CRP in acute appendicitis. Int J Med Sci 9(10):909–915 30. Keskek M, Tez M, Yoldas O, Acar A, Akgul O, Gocmen E, Koc M (2008) Receiver operating characteristic analysis of leukocyte counts in operations for suspected appendicitis. Am J Emerg Med 26(7):769–772 31. Khan MN, Davie E, Irshad K (2004) The role of white cell count and C-reactive protein in the diagnosis of acute appendicitis. J Ayub Med Coll Abbottabad JAMC 16(3):17–19 32. Khan S (2008) Elevated serum bilirubin in acute appendicitis: a new diagnostic tool. Kathmandu Univ Med J (KUMJ) 6(2):161–165 33. Khan S (2009) The diagnostic value of hyperbilirubinaemia and total leucocyte count in the evaluation of acute appendicitis. J Clin Diagn Res 3:1647 34. Kharbanda AB, Rai AJ, Cosme Y, Liu K, Dayan PS (2012) Novel serum and urine markers for pediatric appendicitis. Acad Emerg Med 19(1):56–62 35. Kouame DB, Garrigue MA, Lardy H, Machet MC, Giraudeau B, Robert M (2005) Is procalcitonin able to help in pediatric appendicitis diagnosis? Ann Chir 130(3):169–174 36. Lameris W, van Randen A, Go PM, Bouma WH, Donkervoort SC, Bossuyt PM, Stoker J, Boermeester MA (2009) Single and combined diagnostic value of clinical features and laboratory tests in acute appendicitis. Acad Emerg Med 16(9):835–842 37. Lycopoulou L, Mamoulakis C, Hantzi E, Demetriadis D, Antypas S, Giannaki M, Bakoula C, Chrousos G, Papassotiriou I (2005) Serum amyloid A protein levels as a possible aid in the diagnosis of acute appendicitis in children. Clin Chem Lab Med 43(1):49–53 38. McGowan DR, Sims HM, Zia K, Uheba M, Shaikh IA (2013) The value of biochemical markers in predicting a perforation in acute appendicitis. ANZ J Surg 83(1–2):79–83 39. 
Mentes O, Eryilmaz M, Harlak A, Ozturk E, Tufan T (2012) The value of serum fibrinogen level in the diagnosis of acute appen- dicitis. Turk J Trauma Emerg Surg 18(5):384–388 40. Paajanen H, Mansikka A, Laato M, Ristamaki R, Pulkki K, Kostiainen S (2002) Novel serum inflammatory markers in acute appendicitis. Scand J Clin Lab Invest 62(8):579–584 41. Panagiotopoulou IG, Parashar D, Lin R, Antonowicz S, Wells AD, Bajwa FM, Krijgsman B (2013) The diagnostic value of white cell count, C-reactive protein and bilirubin in acute appendicitis and its complications. Ann R Coll Surg Engl 95(3):215–221 42. Pruekprasert P, Maipang T, Geater A, Apakupakul N, Ksuntigij P (2004) Accuracy in diagnosis of acute appendicitis by comparing serum C-reactive protein measurements, Alvarado score and clinical impression of surgeons. J Med Assoc Thail 87(3):296–303 43. Sand M, Trullen XV, Bechara FG, Pala XF, Sand D, Landgrafe G, Mann B (2009) A prospective bicenter study investigating the diagnostic value of procalcitonin in patients with acute appen- dicitis. Eur Surg Res 43(3):291–297 44. Sand M, Bechara FG, Holland-Letz T, Sand D, Mehnert G, Mann B (2009) Diagnostic value of hyperbilirubinemia as a predictive factor for appendiceal perforation in acute appendicitis. Am J Surg 198(2):193–198 45. Schellekens DH, Hulsewe KW, van Acker BA, van Bijnen AA, de Jaegere TM, Sastrowijoto SH, Buurman WA, Derikx JP (2013) Evaluation of the diagnostic accuracy of plasma markers for early diagnosis in patients suspected for acute appendicitis. Acad Emerg Med 20(7):703–710 46. Sengupta A, Bax G, Paterson-Brown S (2009) White cell count and C-reactive protein measurement in patients with possible appendicitis. Ann R Coll Surg Engl 91(2):113–115 47. D’Souza N, Karim D, Sunthareswaran R (2013) Bilirubin; a diagnostic marker for appendicitis. Int J Surg 11(10):1114–1117 48. 
Vaziri M, Ehsanipour F, Pazouki A, Tamannaie Z, Taghavi R, Pishgahroudsari M, Jesmi F, Chaichian S (2014) Evaluation of 1030 Surg Endosc (2017) 31:1022–1031 123 procalcitonin as a biomarker of diagnosis, severity and postop- erative complications in adult patients with acute appendicitis. Med J Islam Repub Iran 28:50 49. Wu HP, Lin CY, Chang CF, Chan YJ, Huang CY (2005) Pre- dictive value of C-reactive protein at different cutoff levels in acute appendicitis. Am J Emerg Med 23(4):449–453 50. Wu JY, Chen HC, Lee SH, Chan RC, Lee CC, Chang SS (2012) Diagnostic role of procalcitonin in patients with suspected appendicitis. World J Surg 36(8):1744–1749 51. Xharra S, Gashi-Luci L, Xharra K, Veselaj F, Bicaj B, Sada F, Krasnigi A (2012) Correlation of serum C-reactive protein, white blood count and neutrophil percentage with histopathology findings in acute appendicitis. World J Emerg Surg 7(1):27-7922- 7-27 52. Yang HR, Wang YC, Chung PK, Chen WK, Jeng LB, Chen RJ (2006) Laboratory tests in patients with acute appendicitis. ANZ J Surg 76(1–2):71–74 53. Yildirim O, Solak C, Kocer B, Unal B, Karabeyoglu M, Bozkurt B, Aksara S, Cengiz O (2006) The role of serum inflammatory markers in acute appendicitis and their success in preventing negative laparotomy. J Investig Surg 19(6):345–352 54. Hernandez R, Jain A, Rosiere L, Henderson SO (2008) A prospective clinical trial evaluating urinary 5-hydroxyin- doleacetic acid levels in the diagnosis of acute appendicitis. Am J Emerg Med 26(3):282–286 55. Ilkhanizadeh B, Owji AA, Tavangar SM, Vasei M, Tabei SM (2001) Spot urine 5-hydroxy indole acetic acid and acute appendicitis. Hepatogastroenterology 48(39):609–613 56. Jangjoo A, Varasteh AR, Mehrabi Bahar M, Tayyebi Meibodi N, Esmaili H, Nazeri N, Aliakbarian M, Azizi SH (2012) Is urinary 5-hydroxyindoleacetic acid helpful for early diagnosis of acute appendicitis? Am J Emerg Med 30(4):540–544 57. 
Mihmanli M, Uysalol M, Coskun H, Demir U, Dilege E, Eroglu T (2004) The value of 5-hydroxyindolacetic acid levels in spot urine in the diagnosis of acute appendicitis. Turk J Trauma Emerg Surg TJTES 10(3):173–176 58. Oruc MT, Kulah B, Ozozan O, Ozer V, Kulacoglu H, Turhan T, Coskun F (2004) The value of 5-hydroxy indole acetic acid measurement in spot urine in diagnosis of acute appendicitis. East Afr Med J 81(1):40–41 59. Sarhan H, Hatroosh A, Alobaidi A (2013) The role of urinary 5-hydroxyindoleacetic acid determination in diagnosis of acute appendicitis. J Investig Biochem 2(1):1 60. Sack U, Biereder B, Elouahidi T, Bauer K, Keller T, Trobs RB (2006) Diagnostic value of blood inflammatory markers for detection of acute appendicitis in children. BMC Surg 28(6):15 61. Muller S, Falch C, Wilhelm P, Hein D, Konigsrainer A, Kirschniak A (2014) Diagnostic accuracy of hyperbilirubinaemia in anticipating appendicitis and its severity. Emerg Med J 32(9):698–702 62. Gavela T, Cabeza B, Serrano A, Casado-Flores J (2012) C-re- active protein and procalcitonin are predictors of the severity of acute appendicitis in children. Pediatr Emerg Care 28(5):416–419 63. Chambers AC, Bismohun SL, Davies H, White P, Patil AV (2015) Predictive value of abnormally raised serum bilirubin in acute appendicitis: a cohort study. Int J Surg 13:207–210 64. Socea B, Carap A, Rac-Albu M, Constantin V (2013) The value of serum bilirubin level and of white blood cell count as severity markers for acute appendicitis. Chirugia 108(6):829–834 65. Nomura S, Watanabe M, Komine O, Shioya T, Toyoda T, Bou H, Shibuya T, Suzuki H, Uchida E (2014) Serum total bilirubin elevation is a predictor of the clinicopathological severity of acute appendicitis. Surg Today 44(6):1104–1108 66. Khan S (2006) Evaluation of hyperbilirubinemia in acute inflammation of appendix: a prospective study of 45 cases. Kathmandu Univ Med J 4(3):281–289 67. 
Kentsis A, Ahmed S, Kurek K, Brennan E, Bradwin G, Steen H, Bachur R (2012) Detection and diagnostic value of urine leucine- rich a-2-glycoprotein in children with suspected acute appen- dicitis. Ann Emerg Med 60(1):78–83 68. Siddique K, Baruah P, Bhandari S, Mirza S, Harinath G (2011) Diagnostic accuracy of white cell count and C-reactive protein for assessing the severity of paediatric appendicitis. JRSM Short Rep 2(7):59 69. Yokoyama S, Takifuji K, Hotta T, Matsuda K, Nasu T, Nakamore M, Hirabayashi N, Kinoshita H, Yamaue H (2009) C-Reactive protein is an independent surgical indication marker for appen- dicitis: a retrospective study. World J Emerg Surg 4:36 70. Al-gaithy ZK (2012) Clinical value of total white blood cells and neutrophil counts in patients with suspected appendicitis: retro- spective study. World J Emerg Surg 7(1):32 71. Zyluk A, Ostrowski P (2011) An analysis of factors influencing accuracy of the diagnosis of acute appendicitis. Pol Przegl Chir 83(3):135–143 72. Shera AH, Nizami FA, Malik AA, Naikoo ZA, Wani MA (2011) Clinical scoring system for diagnosis of acute appendicitis in children. Indian J Pediatr 78(3):287–290 73. Tan WJ, Acharyya S, Goh YC, Chan WH, Wong WK, Ooi LL, Ong HS (2014) Prospective comparison of the Alvarado score and CT scan in the evaluation of suspected appendicitis: a pro- posed algorithm to guide CT use. J Am Coll Surg 220(2): 218–224 74. Mettler FA Jr, Huda W, Yoshizumi TT, Mahesh M (2008) Effective doses in radiology and diagnostic nuclear medicine: a catalog. Radiology 248(1):254–263 75. McKay R, Shepherd J (2007) The use of the clinical scoring system by Alvarado in the decision to perform computed tomography for acute appendicitis in the ED. 
Am J Emerg Med 25(5):489–493 Surg Endosc (2017) 31:1022–1031 1031 123 Biomarkers of acute appendicitis: systematic review and cost--benefit trade-off analysis Abstract Background Methods Results Conclusion Materials and methods Literature search strategy Literature standard Biomarker survey Statistical methodology Cost--benefit trade-off analysis Results Literature search QUADAS-2 evaluation Pooled analysis for individual serum biomarkers in acute appendicitis White cell count C-reactive protein Bilirubin Pro-calcitonin IL-6 Pooled analysis for 5-HIAA from urine in acute appendicitis Biomarker survey Cost--benefit trade-off Discussion Conclusion Funding References