This is a repository copy of "A dataset of systematic review updates". White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/146321/ (Accepted Version).

Proceedings Paper: Alharbi, A. and Stevenson, R. (orcid.org/0000-0002-9483-6006) (2019) A dataset of systematic review updates. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 21–25 Jul 2019, Paris, France. ACM, pp. 1257–1260. ISBN 978-1-4503-6172-9. https://doi.org/10.1145/3331184.3331358

© 2019 The Authors. This is an author-produced version of a paper subsequently published in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Uploaded in accordance with the publisher's self-archiving policy.
A Dataset of Systematic Review Updates

Amal Alharbi* (King Abdulaziz University, Jeddah, Saudi Arabia; ahalharbi1@sheffield.ac.uk) and Mark Stevenson (University of Sheffield, Sheffield, United Kingdom; mark.stevenson@sheffield.ac.uk)

ABSTRACT
Systematic reviews identify, summarise and synthesise evidence relevant to specific research questions. They are widely used in the field of medicine, where they inform the health care choices of both professionals and patients. It is important for systematic reviews to stay up to date as evidence changes, but this is challenging in a field such as medicine where a large number of publications appear on a daily basis. Developing methods to support the updating of reviews is important to reduce the workload required and thereby ensure that reviews remain up to date. This paper describes a dataset of systematic review updates in the field of medicine created using 25 Cochrane reviews. Each review includes the Boolean query and relevance judgements for both the original and updated versions. The dataset can be used to evaluate approaches to study identification for review updates.

KEYWORDS
Systematic review; systematic review update; test collection; evaluation

ACM Reference Format:
Amal Alharbi and Mark Stevenson. 2019. A Dataset of Systematic Review Updates. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19), July 21–25, 2019, Paris, France. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3331184.3331358

1 INTRODUCTION
Systematic reviews are widely used in the field of medicine, where they inform treatment decisions and health care choices. They are based on an assessment of the evidence about a research question that is available at the time the review is created. Reviews need to be updated as evidence changes to continue to be useful.
However, the volume of publications that appear in the field of medicine on a daily basis makes this difficult [2]. In fact, it has been estimated that 7% of systematic reviews are already out of date by the time of publication, and almost a quarter (23%) two years after they have appeared [19].

A review can be updated at any point after it has been created and would ideally be updated whenever new evidence becomes available, but the effort required makes this impractical. The Cochrane Collaboration recommends that reviews should be updated every two years. Cochrane's Living Evidence Network has recently started developing living systematic reviews, for which evidence is reviewed frequently (normally monthly) [7], but it is unclear whether this effort is sustainable. The Agency for Healthcare Research and Quality suggests that reviews be updated depending on need, priority and the availability of new evidence [15].

The process applied to update a systematic review is similar to the one used to create a new review [6]. A search query is run and the resulting citations are screened in a two-stage process. In the first stage (abstract screening) only the title and abstract of the papers retrieved by the Boolean search are examined. It is common for the majority of papers to be removed from consideration during the abstract screening stage.

*Currently studying at the University of Sheffield
The remaining papers are considered in a second stage (content screening) during which the full papers are examined. If any new relevant studies are found, their data is extracted and integrated into the review. The review's findings are also updated if the evidence is found to have changed from the previous version.

The screening stages are among the most time-consuming parts of this process, since an experienced reviewer takes at least 30 seconds to review an abstract, and substantially longer for complex topics [22]. The problem is made more acute by the fact that the search queries used for systematic reviews are designed to maximise recall, with precision a secondary concern, while the volume of medical publications increases rapidly. Developing methods to support the updating of reviews is therefore required to reduce the workload involved and thereby ensure that reviews remain up to date. However, previous work on the application of Information Retrieval (IR) to the systematic review process has paid only limited attention to the problem of updating reviews (see Section 2).

This paper describes a dataset created for evaluating automated methods applied to the problem of identifying relevant evidence for the updating of systematic reviews. It is, to our knowledge, the first resource made available for this purpose. In addition, this paper reports the performance of some baseline approaches applied to the dataset. The dataset described in this paper is available from https://github.com/Amal-Alharbi/Systematic_Reviews_Update.

2 RELATED WORK
A significant number of previous studies have demonstrated the usefulness of IR techniques for reducing the workload involved in the systematic review screening process for new reviews, for example [3, 5, 12–14, 16, 17, 22]. A range of datasets have been made available to support the development of automated methods for study identification.
Widely used datasets include one containing 15 systematic reviews about drug class efficiency [3] and another containing two reviews (on Chronic Obstructive Pulmonary Disease and Proton Beam therapy) [22]. Recently, the CLEF eHealth track on Technology Assisted Reviews in Empirical Medicine [9, 20] developed datasets containing 72 topics created from diagnostic test accuracy systematic reviews produced by the Cochrane Collaboration. Another test collection has also been derived from 94 Cochrane reviews [18]. However, none of these datasets focus on review updates.

Only a few previous studies have explored the use of IR techniques to support the problem of updating reviews [3, 11, 21]. In the majority of cases this work has been evaluated against simulations of the update process, for example by "time slicing" the included studies and treating those that appeared in the three years before review publication as being added in an update [11]. An exception is work that used update information for nine drug therapy systematic reviews [4], but this dataset is not publicly available.

To the best of our knowledge there is no accessible dataset that focuses on the problem of identifying studies for inclusion in a review update. The problem is subtly different from the identification of studies for inclusion in a new review, since relevance judgements are available (from the original review) which have the potential to improve performance. A suitable dataset for this problem would include the list of studies considered for inclusion in both the original and updated reviews, together with a list of the studies that were actually included in each review. This paper describes such a resource.
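The "time slicing" simulation mentioned above can be sketched as follows. This is a minimal illustration of the general idea rather than the procedure of [11]; the record layout and the example PMIDs are hypothetical, and the three-year window follows the description in the text.

```python
from datetime import date

def time_slice(included_studies, review_date, years=3):
    """Simulate a review update by 'time slicing': studies published in
    the final `years` before the review appeared are treated as if they
    were added in an update; earlier ones form the original review."""
    cutoff = date(review_date.year - years, review_date.month, review_date.day)
    original = [s for s in included_studies if s["published"] < cutoff]
    update = [s for s in included_studies if s["published"] >= cutoff]
    return original, update

# Hypothetical study records (PMID and publication date).
studies = [
    {"pmid": "2494988", "published": date(1989, 6, 1)},
    {"pmid": "17636741", "published": date(2007, 7, 18)},
    {"pmid": "20927376", "published": date(2010, 9, 21)},
]
original, update = time_slice(studies, review_date=date(2010, 12, 1))
```

A simulated split of this kind avoids needing genuine update metadata, which is precisely the gap the dataset described below fills.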
3 DATASET
The dataset is constructed using systematic reviews from the Cochrane Database of Systematic Reviews¹, a standard source of evidence to inform healthcare decision-making. Intervention reviews, that is, reviews which assess the effectiveness of a particular healthcare intervention for a disease, are the most common type of review carried out by Cochrane. A set of 25 published intervention systematic reviews were selected for inclusion in the dataset. Reviews included in the dataset must have been available in an original and updated version (i.e. an updated version of the review has been published), with at least one new relevant study identified during the abstract screening stage for the update.

The following information was automatically extracted from each review: (1) review title, (2) Boolean query, (3) set of included and excluded studies (for both the original and updated versions) and (4) update history (including publication date and URL of original and updated versions).

3.1 Boolean Query
Candidate studies for inclusion in systematic reviews are identified using Boolean queries constructed by domain experts. These queries are designed to optimise recall, since reviews aim to identify and assess all relevant evidence. Queries are often complex and include operators such as AND, OR and NOT, in addition to advanced operators such as wildcard, explosion and truncation [10]. Boolean queries in the reviews included in the dataset were created for either the OVID or PubMed interfaces to the MEDLINE database of medical literature. For ease of processing, each OVID query was automatically converted to a single-line PubMed query using a Python script created specifically for this purpose (see Figure 1).

¹https://www.cochranelibrary.com/cdsr/about-cdsr

(a) Multi-line query in OVID format

1. endometriosis/
2. (adenomyosis or endometrio$).tw.
3.
or/1-2

(b) One-line PubMed translation

endometriosis[Mesh:NoExp] OR adenomyosis[Text Word] OR endometrio*[Text Word]

Figure 1: Example portion of a Boolean query [8] in (a) OVID format and (b) its translation into single-line PubMed format. This portion of the query contains three clauses; the last clause combines the results of clauses 1 and 2 in a disjunction (OR).

3.2 Included and Excluded Studies
For each version of the reviews (original and updated) the dataset includes a list of all the studies that were included after each stage of the screening process (abstract and content). The set of studies included after the content-level screening is a subset of those included after abstract screening and represents the studies included in the updated review.

Included and excluded studies are listed in the dataset as PMIDs (unique identifiers for PubMed citations that make it straightforward to access details about the publication). If the PMID for a study was listed in the systematic review (which accounted for the majority of cases) then it was used. If it was not, then the title of the study and year of publication were used to form a query which was used to search PubMed (see Figure 2). If the entire text of the title, publication year and volume of the retrieved record match the details listed in the systematic review then the PMID of that citation is used.

Study title: Clinical experience treating endometriosis with nafarelin.
Publication Year: 1989
Search Query: clinical[Title] AND experience[Title] AND treating[Title] AND endometriosis[Title] AND nafarelin[Title] AND 1989[Date - Publication]

Figure 2: Example of a search query generated from the title and publication year of a study from Topic CD000155 [8].

3.3 Update History
Details of the date of publication of each version (original and update) are also extracted and included.

3.4 Dataset Characteristics
Descriptive statistics for the 25 systematic reviews that form the dataset are shown in Table 1.
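The PMID lookup described in Section 3.2 can be sketched as follows. The paper does not publish its script, so the helper names, the stop-word list (Figure 2 suggests words such as "with" are dropped) and the URL construction are illustrative assumptions; the query format follows Figure 2, and the endpoint is NCBI's public esearch service, which interfaces such as Biopython wrap.

```python
import re
from urllib.parse import urlencode

# Assumed stop-word list; the exact rule used by the original script is unknown.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

def build_pubmed_query(title, year):
    """Form a PubMed query from a study title and publication year, in the
    style of Figure 2: each remaining title word becomes a [Title] clause
    and the year restricts the publication date field."""
    words = [w for w in re.findall(r"[a-z]+", title.lower())
             if w not in STOPWORDS]
    clauses = [f"{w}[Title]" for w in words]
    clauses.append(f"{year}[Date - Publication]")
    return " AND ".join(clauses)

def esearch_url(term, retmax=20):
    """URL for NCBI's E-utilities esearch endpoint, which returns the
    PMIDs of citations matching `term`."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": "pubmed", "term": term,
                                   "retmode": "json", "retmax": retmax})

query = build_pubmed_query(
    "Clinical experience treating endometriosis with nafarelin.", 1989)
```

Fetching `esearch_url(query)` and comparing the returned record's title, publication year and volume against the entry in the review would complete the lookup described above.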
It is worth drawing attention to the small number of studies included after the initial abstract screening stage.

Table 1: List of the 25 systematic reviews with the total number of studies returned by the query (Total) and the number included following the abstract (Abs) and content (Cont) screening stages. The average (unweighted mean) number of studies is shown in the bottom row. Note that for the updated review, the number of included studies listed is only the new studies that were added during the update.

             Original Review      Updated Review
Review       Total   Abs  Cont    Total   Abs  Cont
CD000155       397    42    14      101     6     4
CD000160       433     7     6     1980     1     1
CD000523        34     6     3       18     1     1
CD001298      1384    22    15     1020    17    13
CD001552      2082     2     2      844     2     2
CD002064        38     2     2        9     1     0
CD002733     13778    30    10     6109     6     6
CD004069       951     5     2      771     9     7
CD004214        57     5     2       21     4     1
CD004241       838    25     9      193     5     3
CD004479       112     6     1      153     4     3
CD005025      1524    43     8     1309    46     4
CD005055       648     8     4      353     3     0
CD005083       462    46    16      107     9     2
CD005128     25873     5     4     5820     9     3
CD005426      6289    13     8     1413     3     0
CD005607       851    11     7      103     2     1
CD006839       239     8     6       93     3     3
CD006902       290    18     6      106    10     5
CD007020       348    47     4       47     4     3
CD007428       157     7     3      190     9     3
CD008127      5460     7     0     6720     2     1
CD008392      5548    15     5     1095     2     0
CD010089     41675    22    10     4514     4     0
CD010847       571    15     1      111     6     0
Average       4402    17     6     1335     7     3

4 EXPERIMENTS AND RESULTS
Experiments were conducted to establish baseline performance figures for the dataset. The aim is to reduce the workload in the screening stage of the review update by ranking the list of studies retrieved by the Boolean query. Performance at both abstract and content screening levels was explored.

The collection was created by using the Boolean query to search MEDLINE using the Entrez package from biopython.org. The list of studies included after abstract screening was used as the relevance judgements for abstract-level evaluation and the list of studies included after the content screening was used for content-level evaluation.

4.1 Approaches
4.1.1 Baseline Query.
A "baseline query" was formed using the review title and terms extracted from the Boolean query. This query is passed to BM25 [1] to rank the set of studies returned by the Boolean query for the review update.

4.1.2 Relevance Feedback. A feature of the problem of identifying studies for inclusion in updates of systematic reviews is that a significant amount of knowledge about which studies are suitable is available from the original review, and this information was exploited using relevance feedback. Rocchio's algorithm [1] was used to reformulate the baseline query by making use of relevance judgements derived from the original review. Content screening judgements (included and excluded studies) were used for the majority of reviews. Abstract screening judgements were used if these were not available, i.e. when no studies remained after content screening.

4.2 Evaluation Metrics
Mean average precision (MAP) and work saved over sampling (WSS) are the metrics most commonly used to evaluate approaches to study identification for systematic reviews, e.g. [5, 9, 20]. MAP represents the mean of the average precision scores over all reviews. WSS measures the work saved in retrieving a defined percentage of the included studies; for example, WSS@95 measures the work saved in retrieving 95% of the included studies. WSS at recall levels of 95% and 100% (WSS@95 and WSS@100) was used for the experiments reported in this paper.

4.3 Results
Results of the experiments are shown in Table 2. As expected, performance improves when relevance feedback is used. The screening effort required to identify all relevant studies (100% recall) is reduced by 63.5% at abstract level and 74.9% at content level. This demonstrates that making use of information from the original review can improve study selection for review updating.

Table 2: Performance ranking abstracts for updated reviews at (a) abstract and (b) content levels.
Results are computed across all reviews at abstract level (25 reviews) and, for content level, only across reviews in which a new study was added in the updated version (19 reviews).

Approach             MAP    WSS@95   WSS@100
(a) abstract level (25 reviews)
Baseline Query       0.213  51.70%   56.60%
Relevance Feedback   0.413  58.80%   63.50%
(b) content level (19 reviews)
Baseline Query       0.260  65.50%   70.50%
Relevance Feedback   0.382  69.90%   74.90%

Figure 3 shows the AP scores for all 25 reviews. Relevance feedback improved AP for 23 (92%) of the reviews. There are also four reviews where the use of relevance feedback produced an AP score of 1, indicating that the approach reduces the work required by up to 99.9%.

Figure 3: Abstract screening AP scores for each review using Baseline Query and Relevance Feedback.

5 CONCLUSION
Updating systematic reviews is an important problem but one which has largely been overlooked. This paper described a dataset containing 25 intervention reviews from the Cochrane Collaboration that can be used to support the development of approaches to automate the updating process. The title, Boolean query and relevance judgements for both the original and the updated versions are included for each systematic review.

Standard approaches were applied to the dataset in order to establish baseline performance figures. Experiments demonstrated that information from the original review can be used to improve study selection for systematic review updates. It is hoped that this resource will encourage further research into the development of approaches that support the updating of systematic reviews, thereby helping to keep them up to date and valuable.

REFERENCES
[1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 2011. Modern Information Retrieval (2nd ed.). Addison-Wesley Publishing Company, Boston, MA, USA.
[2] Hilda Bastian, Paul Glasziou, and Iain Chalmers. 2010. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up?
PLOS Medicine 7, 9 (Sep 2010), 1–6. https://doi.org/10.1371/journal.pmed.1000326
[3] Aaron Cohen. 2008. Optimizing feature representation for automated systematic review work prioritization. AMIA Annual Symposium Proceedings (2008), 121–125.
[4] Aaron Cohen, Kyle Ambert, and Marian McDonagh. 2012. Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Medical Informatics and Decision Making 12, 1 (2012), 33. https://doi.org/10.1186/1472-6947-12-33
[5] Aaron Cohen, William Hersh, Kim Peterson, and Po-Yin Yen. 2006. Reducing workload in systematic review preparation using automated citation classification. Journal of the American Medical Informatics Association 13, 2 (2006), 206–219. https://doi.org/10.1197/jamia.M1929
[6] Mark R Elkins. 2018. Updating systematic reviews. Journal of Physiotherapy 64, 1 (2018), 1–3. https://doi.org/10.1016/j.jphys.2017.11.009
[7] Julian H. Elliott, Anneliese Synnot, Tari Turner, Mark Simmonds, Elie A. Akl, et al. 2017. Living systematic review: 1. Introduction - the why, what, when, and how. Journal of Clinical Epidemiology 91 (November 2017), 23–30. https://doi.org/10.1016/j.jclinepi.2017.08.010
[8] Edward Hughes, Julie Brown, John Collins, Cindy Farquhar, Donna Fedorkow, et al. 2007. Ovulation suppression for endometriosis for women with subfertility. Cochrane Database of Systematic Reviews 3 (2007). https://doi.org/10.1002/14651858.CD000155.pub2
[9] Evangelos Kanoulas, Dan Li, Leif Azzopardi, and Rene Spijker. 2017. CLEF 2017 technologically assisted reviews in empirical medicine overview. In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin, Ireland, September 11-14, 2017, CEUR Workshop Proceedings, Vol. 1866. 1–29.
[10] Sarvnaz Karimi, Stefan Pohl, Falk Scholer, Lawrence Cavedon, and Justin Zobel. 2010. Boolean versus ranked querying for biomedical systematic reviews.
BMC Medical Informatics and Decision Making 10, 1 (2010), 1–20. https://doi.org/10.1186/1472-6947-10-58
[11] Madian Khabsa, Ahmed Elmagarmid, Ihab Ilyas, Hossam Hammady, Mourad Ouzzani, et al. 2016. Learning to identify relevant studies for systematic reviews using random forest and external information. Machine Learning 102, 3 (Mar 2016), 465–482. https://doi.org/10.1007/s10994-015-5535-7
[12] Halil Kilicoglu, Dina Demner-Fushman, Thomas C Rindflesch, Nancy Wilczynski, and Brian Haynes. 2009. Towards automatic recognition of scientifically rigorous clinical research evidence. AMIA 16 (2009), 25–31. https://doi.org/10.1197/jamia.M2996
[13] Seunghee Kim and Jinwook Choi. 2014. An SVM-based high-quality article classifier for systematic reviews. Journal of Biomedical Informatics 47 (2014), 153–159.
[14] Athanasios Lagopoulos, Antonios Anagnostou, Adamantios Minas, and Grigorios Tsoumakas. 2018. Learning-to-Rank and Relevance Feedback for Literature Appraisal in Empirical Medicine. In Experimental IR Meets Multilinguality, Multimodality, and Interaction - 9th International Conference of the CLEF Association, CLEF 2018, Avignon, France, September 10-14, 2018, Proceedings. 52–63. https://doi.org/10.1007/978-3-319-98932-7_5
[15] Ersilia Lucenteforte, Alessandra Bettiol, Salvatore De Masi, and Gianni Virgili. 2018. Updating Diagnostic Test Accuracy Systematic Reviews: Which, When, and How Should They Be Updated? Springer International Publishing, Cham, 205–227. https://doi.org/10.1007/978-3-319-78966-8_15
[16] David Martinez, Sarvnaz Karimi, Lawrence Cavedon, and Timothy Baldwin. 2008. Facilitating biomedical systematic reviews using ranked text retrieval and classification. In 13th Australasian Document Computing Symposium (ADCS). Hobart, Tasmania, 53–60.
[17] Makoto Miwa, James Thomas, Alison O'Mara-Eves, and Sophia Ananiadou. 2014. Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics 51 (2014), 242–253.
https:⁄⁄doi.org⁄10.1016⁄j.jbi.2014. 06.005 [18] Harrise♪ Scells, Guido Zucco♪, Beva♪ Koopma♪, A♪tho♪y Deaco♪, Leif Azzopardi, et al. 2017. A test collectio♪ for evaluati♪g retrieval of studies for i♪clusio♪ i♪ systematic reviews. I♪ 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Tokyo, Japa♪, 1237±1240. https: ⁄⁄doi.org⁄10.1145⁄3077136.3080707 [19] Kaveh G Shoja♪ia, ℧argaret Sampso♪, ℧ohammed T A♪sari, Ju♪ Ji, Steve Doucette, et al. 2007. How quickly do systematic reviews go out of date? A survival a♪alysis. Annals of Internal Medicine 147 (2007), 224±233. https: ⁄⁄doi.org⁄10.7326⁄0003-4819-147-4-200708210-00179 [20] Ha♪♪a Suomi♪e♪, Liadh Kelly, Lorrai♪e Goeuriot, Eva♪gelos Ka♪oulas, Leif Azzopardi, et al. 2018. Overview of the CLEF eHealth Evaluatio♪ Lab 2018. I♪ Experimental IR Meets Multilinguality, Multimodality, and Interaction. Spri♪ger I♪ter♪atio♪al Publishi♪g, Cham, 286±301. [21] Byro♪ C Ωallace, Kevi♪ Small, Carla E Brodley, Joseph Lau, Christopher H Schmid, et al. 2012. Toward moder♪izi♪g the systematic review pipeli♪e i♪ ge♪etics: eicie♪t updati♪g via data mi♪i♪g. Genetics in Medicine 14 (2012), 663. https:⁄⁄doi.org⁄10.1038⁄gim.2012.7 [22] Byro♪ C Ωallace, Thomas A Trikali♪os, Joseph Lau, Carla E Brodley, a♪d Christo- pher H Schmid. 2010. Semi-automated scree♪i♪g of biomedical citatio♪s for systematic reviews. BMC Bioinformatics (2010). 