Commentary

 

How to Develop a Validated Geographic Search Filter: Five Key Steps

 

Lynda Ayiku

Information Specialist

National Institute for Health and Care Excellence (NICE)

Manchester, United Kingdom

Email: lynda.ayiku@nice.org.uk

 

Jenny Craven

Information Specialist

National Institute for Health and Care Excellence (NICE)

Manchester, United Kingdom

Email: jenny.craven@nice.org.uk

 

Thomas Hudson

Information Specialist

National Institute for Health and Care Excellence (NICE)

Manchester, United Kingdom

Email: thomas.hudson@nice.org.uk

 

Paul Levay

Information Specialist

National Institute for Health and Care Excellence (NICE)

Manchester, United Kingdom

Email: paul.levay@nice.org.uk

 

Received: 5 Sept. 2019                                                                  Accepted: 24 Jan. 2020

 

 

cc-ca_logo_xl 2020 Ayiku, Craven, Hudson, and Levay. This is an Open Access article distributed under the terms of the Creative Commons‐Attribution‐Noncommercial‐Share Alike License 4.0 International (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly attributed, not used for commercial purposes, and, if transformed, the resulting work is redistributed under the same or similar license to this one.

 

 

DOI: 10.18438/eblip29633

 


Introduction

 

The purpose of this commentary is to increase awareness of the existing validated geographic search filters and to encourage the creation of new filters for additional places in the world.

 

Search filters are collections of search terms that are designed to find evidence with a common feature (Glanville et al., 2008). They differ from search strategies because their retrieval ability has been tested (validated) against a set of relevant references (Glanville et al., 2008). This provides users with an indication of how successfully filters work for retrieving the type of evidence that they wish to identify.

 

Most filters aim to retrieve evidence with a specific study design (Damarell, May, Hammond, Sladek & Tieman, 2019). Information professionals will probably be most familiar with those for systematic reviews or randomized controlled trials. However, an increasing number of “topic search filters” have been developed for clinical conditions, demography, health care delivery issues, and geographic locations (Damarell et al., 2019).

 

Geographic search filters are applied to literature searches with the aim of retrieving evidence about geographic locations such as continents or countries. As of 2020, only three validated geographic filters are available in published literature (Glanville, Lefebvre & Wright, 2020):

 

1.       Spain: PubMed (Valderas, Mendivil, Parada, Losada‐Yáñez,& Alonso, 2006)

2.       Africa: PubMed and Embase (Pienaar, Grobler, Busgeeth, Eisinga, & Siegfried, 2011)

3.       UK: MEDLINE and Embase, OVID platform (Ayiku et al., 2017, 2019)

 

There are search strategies for other geographic locations that are labelled as “search filters”, but these have not been created and validated using recognized filter development methods (Ayiku et al., 2017).

 

Geographic restrictions are not always applied to searches with a geographic focus when validated geographic filters are unavailable. For instance, in a post-development study for the National Institute for Health and Care Excellence (NICE) UK filters, 100 UK-focused systematic reviews were identified that had no geographic restrictions in their searches (the searches were conducted before the UK filters were available publicly) (Ayiku & Finnegan, 2019). A potential reason for this is that information professionals may have concerns about excluding relevant geographic evidence by accident through the use of untested search approaches. However, when restrictions are not applied, references about a specific location need to be identified from a larger set of irrelevant geographic literature. This approach is time-consuming and inefficient.

 

Geographic filters enable effective and efficient literature searches for topics with a geographic focus. They can retrieve most of the evidence about a geographic region while limiting the retrieval of irrelevant references about other geographic regions (Ayiku et al., 2017, 2019). Geographic filters therefore save time and associated resource costs spent on selecting evidence for topics about specific regions.

 

Developing and Validating Geographic Search Filters: Five Key Steps

 

The following steps are based on filter development methodologies (Jenkins, 2004; Sampson et al., 2006; Glanville et al., 2008) in addition to the authors’ knowledge gained during the creation of the NICE UK filters for MEDLINE and Embase (Ayiku et al., 2017, 2019). The process for developing geographic filters is outlined in Figure 1.

 
Figure 1
Process for developing a geographic search filter.

Step 1. Define the Geographic Region

 

Official definitions can help to specify the geographic region for the filter if required.

 

Step 2. Find References for the Region

 

2a. Identifying References

 

A set of references about the geographic region for the filter is required to develop and validate geographic search filters. This set is called a “gold standard” (also known as a “reference set”) (Jenkins, 2004). Evidence based sources such as systematic reviews or guidelines usually provide descriptions about the geographic setting of the references that informed them. The gold standard set can be created by pooling relevant references that have informed evidence-based sources (Sampson et al., 2006). The aim is to enable the pragmatic collection of references that have been previously identified for the topic of the filter. This method of reference identification is used to validate filters via the “relative recall” approach and it is quicker than finding relevant references by hand searching journals (Sampson et al., 2006). However, hand searching can be used to create a gold standard set for geographic filters if preferred.

 

The authors identified references with a UK setting for the gold standard set from NICE guidance documents to develop the NICE UK filters (Ayiku et al., 2017, 2019).

2b. How Many References for the Gold Standard Set are Needed?

 

The authors advise that at least 300 references about a geographic location should be identified for the gold standard set. This is because it is possible that some references will not be available in the bibliographic database for the filter. In addition, the references will need to be divided into the following sets:

 

1.       Development set: used to create filters

2.       Validation set: used to validate filters

 

Sampson et al. (2006) suggest that at least 100 references are required to validate filters because this sample size will provide a reasonable confidence interval (assuming that the filter retrieves 90% of the validation set references). Finding a minimum of 300 references will help to ensure that there are at least 100 references for the validation set.

 

Step 3. Form the Gold Standard Set

 

3a. Locating References in the Bibliographic Database for the Filter

 

When 300 or more references have been identified, their availability in the database for the filter needs to be checked. To locate the references in the database, enter key bibliographic details (such as title and author) for each reference into the database. The references that are available will form the gold standard set.

 

The existing geographic filters have been designed for the PubMed, MEDLINE, and Embase bibliographic databases (Valderas et al., 2006; Pienaar et al., 2011; Ayiku et al., 2017, 2019). However, it may be appropriate to design a filter for another database if it is relevant to do so.

3b. Creating the Development and Validation Sets

 

Next, the references in the gold standard set need to be split into a development set and a validation set. For rigor, the references should be randomized prior to their division. To do this, assign each of the references a number (this could simply be their number order). A free online randomizer tool can be used to randomize the numbers. The authors used RANDOM.ORG (Randomness and Integrity Services Ltd, 2020) for the NICE UK filters (Ayiku et al., 2017, 2019).

 

Once the references have been randomized and divided, create two search strategies in the database for the filter; one for the development set references and another for the validation set references. For both search strategies, combine the references at the end using the OR Boolean operator. As an example, the NICE UK filter search strategies for the development set and validation set references were structured as follows:

 

1.       Langford I (author) AND “The potential effects of climate change on winter mortality in England and Wales” (title) AND 1995 (year)

2.       Chahal R (author) AND “A study of the morbidity, mortality and long-term survival following radical cystectomy and radical radiotherapy in the treatment of invasive bladder cancer in Yorkshire” (title) AND 2003 (year)

3.       Saka O (author) AND “Cost of stroke in the United Kingdom” (title) AND 2009 (year)

4.       Etc…

5.       1 OR 2 OR 3 OR 4…

 

Save both search strategies in the database account so that they can be re-run to test the retrieval ability of the filter during steps four and five.

 

Step 4. Develop Filter

 

4a. Development Set

 

The purpose of the development set references is to identify the most relevant search fields and search terms to create the geographic filter. Creating filters using fields and terms from the development set references will help to ensure that the most relevant details for the filter are identified (Hausner, Waffenschmidt, Kaiser, & Simon, 2012). Filters that are created in this way are known as “objectively-derived” filters (Jenkins, 2004). The authors used this approach to create the NICE UK filters (Ayiku et al., 2017, 2019).

 

Identifying Relevant Search Fields

 

An Excel spreadsheet can be used to identify relevant search fields from the development set references. If the filter is for an Ovid database, the “Excel sheet” export option can be used to transfer the database records for development set references into Excel. Using the “CSV” export option will work in a similar way to transfer database records into Excel if the filter is for PubMed.

 

In the Excel spreadsheet, the content for each search field from the development set database records is displayed in separate columns. The search fields that contain geographic setting details about your region of interest will be the relevant fields for your filter.

 

The most relevant fields found in Excel for the NICE UK filters (Ayiku et al., 2017, 2019) were:

 

·         Subject heading

·         Title

·         Abstract

·         Journal name

·         Institution

 

UK setting terms also appeared in the ‘country of publication’ field but it was not included in the final version of the filter. This is because several UK-based publishing companies produce journals that contain international content. However, it may be useful to add the ‘country of publication’ field if your filter is for a country in which publishing companies are more likely to publish geographic-specific content.

 

Identifying Search Terms

 

Once the relevant search fields have been identified, word frequency analysis can be conducted to find candidate geographic setting search terms for the filter. The authors used the WriteWords (2020) word and phrase counter tool to conduct the frequency analysis for the NICE UK filters (Ayiku et al., 2017, 2019). WriteWords (2020) is available for free online. Other free online counters are available such as DataBasic (Bhargava & D'Ignazio, 2020) and commercial counters can be used too.

 

For the NICE UK filters, the authors copied the content contained in each relevant search field from Excel and pasted it into WriteWords (2020) one field at a time. The frequency of single words up to phrases containing four words was then recorded for each field. Next, the high frequency words and phrases used to describe UK settings were examined. The most frequent UK settings identified from the development set references were:

 

·         Countries

·         Nationalities

·         Cities

·         UK National Health Service (NHS)

 

4b. Constructing the Filter

 

A geographic filter can be drafted once the relevant search fields and geographic setting terms have been identified. Save the draft filter in the database account so that it can be easily re-run to test its retrieval ability.

 

As an example of a geographic filter structure, an outline of the NICE UK filters is provided in Figure 2. The full NICE UK filters for MEDLINE and Embase can be found in published journal articles (Ayiku et al., 2017, 2019) and in the InterTASC Information Specialists’ Sub‐Group (ISSG) Search Filter Resource section on geographic search filters (Glanville et al., 2020).

Figure 2

Example structure for a geographic filter to retrieve evidence about a country.

4c. Internal Validity Test

 

When the geographic filter is drafted, the next step is to test how successfully it retrieves the references that were used to create it. This is known as an “internal validity” test (Jenkins, 2004). To do this, run the saved search strategy for the development set references. Next, run the saved search strategy for the draft filter and apply it to the development set search strategy using the AND Boolean operator. For example, the search strategy structure used to test the retrieval ability of the NICE UK filters was as follows:

 

1.       Langford I (author) AND “The potential effects of climate change on winter mortality in England and Wales” (title) AND 1995 (year)

2.       Chahal R (author) AND “A study of the morbidity, mortality and long-term survival following radical cystectomy and radical radiotherapy in the treatment of invasive bladder cancer in Yorkshire” (title) AND 2003 (year)

3.       Saka O (author) AND “Cost of stroke in the United Kingdom” (title) AND 2009 (year)

4.       Etc…

5.       1 OR 2 OR 3 OR 4…

6.       Draft UK geographic search filter

7.       5 AND 6

 

It is unlikely that the draft filter will retrieve all of the development set references because it is rare for search filters to have a 100% retrieval rate. For instance, some references will contain no details about their geographic setting in their database records (Ayiku et al., 2017, 2019).

 

If the draft filter retrieves all of the references in the development set, it can be validated using the instructions in step five. If the draft filter does not retrieve all of the references, the reasons why the missing references were not retrieved must be investigated. Carefully look through the database records for the missing references to see if any geographic setting details are contained within them. Consider making modifications to the filter to retrieve missing references that contain setting details for the region. Ensure that you record any changes you make to the filter and provide explanations about why the changes were made. Also make a record of any references that cannot retrieved by the draft filter and explain why the references were not retrieved. Save the final version of the filter in the database account so that it can be easily re-run to validate the filter (see step five).

 

Step 5. Validate Filter

 

Validation is the final process for filter development. The validation set contains references that have not been used previously to develop the filter and it is used to assess the filter’s “external validity” (Glanville et al., 2008). Validating filters using an independent set of references provides an indication of how well filters perform in retrieving relevant evidence in any search (Glanville et al., 2008).

 

To validate the filter, run the saved search strategy for the validation set references. Next, run the saved search strategy for the final version of the filter. Apply the filter to the validation set search strategy using the AND Boolean operator following the same example structure shown above in step four.

 

The filter’s recall can now be calculated. “Recall”, also known as “sensitivity”, is used to measure a filter’s ability to retrieve a set of known relevant references and it is calculated as follows (Jenkins, 2004):

 

·         Number of relevant records retrieved by filter/Total number of relevant records (× 100 to express as a percentage)

 

The term “relative recall” is more accurate than “recall” when the relative recall approach has been used to identify references pooled from multiple evidence based sources for the validation set (Sampson et al., 2006), however, in practice both terms are used.

 

It is unlikely that the filter will achieve 100% recall and the reasons why missing references were not retrieved should be investigated and recorded. There is no standard definition of “high” recall. However, 90% or above has been used as a threshold in previous studies (Beynon et al., 2013). The existing geographic filters performed as follows:

 

·         Spain filter: PubMed: 88.1% recall (Valderas et al., 2006)

·         Africa filters: PubMed: 74% recall, Embase: 73% recall (Pienaar et al., 2011)

·         NICE UK filters: MEDLINE UK filter: 99.5% recall, Embase UK filter: 99.8% recall (for references with UK identifiers) (Ayiku et al., 2017, 2019)

 

Note that no changes can be made to the filter once its recall against the validation set has been calculated. Another validation set containing at least 100 previously unused references will need to be created if filter modifications are required to increase recall. In this case, the former validation set becomes a “test set” that was used to inform the filter’s development.

 

Tips for Creating Filters

 

Seek Advice

 

It may be helpful to seek advice from a professional peer with relevant experience if needed.

 

Limiting Retrieval of Irrelevant Results

 

Some setting names for the geographic region of the filter may be found elsewhere in the world. Using the NOT Boolean operator can help to minimize the retrieval of irrelevant geographic references. For example, the NICE UK filters included the following strategy to help minimize the retrieval of irrelevant geographic references about the US: York NOT “New York” (Ayiku et al., 2017, 2019).

 

Language Variations

 

If relevant, use language variations for the geographic region. For instance, the Spain filter included the following language variations for the country: Spain, Espagne, Espana, and Spagna (Valderas et al., 2006).

 

Retrieving References by Language

 

Consider retrieving references by language if the filter is for a region with a language that is uncommon in other geographic locations. The search strategy to retrieve references by language is: “language.lg” for OVID databases or “language.la” for PubMed (e.g., welsh.lg or welsh.la). Add the language search strategy to the rest of the filter using the OR Boolean operator.

 

Share the Filter

 

The filter should be published along with the accompanying filter development processes to make it widely available. It will be added to the ISSG Search Filter Resource section on geographic search filters when it is published which will increase its dissemination (Glanville et al., 2020). In addition, the filter could be promoted at conferences and on social media.

 

Acknowledge Limitations

 

No filter is perfect, it is unlikely that the filter will achieve 100% recall. Make sure to explain why the filter does not retrieve certain geographic references so that users understand its limitations.

 

Keep the Filter Up to Date

 

Make sure that the filter is kept updated with any changes to the geographic setting terms. The updated filter may not be validated but the original recall level can still be considered as a baseline for this type of change.

 

Conclusion

 

Geographic search filters enable effective and efficient systematic literature searches for topics with a geographic focus. There are currently only three validated filters identified in the published literature for Spain, Africa and the UK (Glanville et al., 2020). The authors hope that this commentary has increased awareness of the existing filters and encourages the creation of new geographic filters for additional places in the world.

 

References

 

Ayiku, L., & Finnegan, A. (2019). OP23 smart searches for context-sensitive topics: Geographic search filters. International Journal of Technology Assessment in Health Care, 35(S1), 5. https://doi.org/10.1017/S0266462319000953

 

Ayiku, L., Levay, P., Hudson, T., Craven, J., Barrett, E., Finnegan, A., & Adams, R. (2017). The MEDLINE UK filter: Development and validation of a geographic search filter to retrieve research about the UK from OVID MEDLINE. Health Information and Libraries Journal, 34(3), 200–216. https://doi.org/10.1111/hir.12187

 

Ayiku, L., Levay, P., Hudson, T., Craven, J., Finnegan, A., Adams, R., & Barrett, E. (2019). The Embase UK filter: Validation of a geographic search filter to retrieve research about the UK from OVID Embase. Health Information and Libraries Journal, 36(2), 121-133. https://doi.org/10.1111/hir.12252

 

Beynon, R., Leeflang, M. M., McDonald, S., Eisinga, A., Mitchell, R. L., Whiting, P., & Glanville, J. M. (2013). Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database of Systematic Reviews (9), MR000022. https://doi.org/10.1002/14651858.mr000022.pub3

 

Bhargava, R., & D'Ignazio, C. (2020). DataBasic Word Counter. Emerson College and University of Massachusetts, Massachusetts, USA. Retrieved from  https://DataBasic.io/en/wordcounter/

 

Damarell, R. A., May, N., Hammond, S., Sladek R. M., & Tieman, J. J. (2019). Topic search filters: A systematic scoping review. Health Information and Libraries Journal, 36(1), 4-40. https://doi.org/10.1111/hir.12244

 

Glanville, J., Bayliss, S., Booth, A., Dunda, Y., Fernandes, H., Fleeman, N. D., Foster, L., Fraser, C., Fry-Smith, A., Golder, S., Lefebvre, C., Miller, C., Paisley, S., Payne, L., Price, A., Welch, K. (2008). So many filters, so little time: The development of a search filter appraisal checklist. Journal of the Medical Library Association, 96(4), 356–361. https://doi.org/10.3163/1536-5050.96.4.011.

 

Glanville, J., Lefebvre, C., & Wright, K. (2020). The InterTASC information specialists’ sub‐group search filter resource: Filters to find studies of geographic locations. ISSG Filters Resource. University of York, York, UK. Retrieved from https://sites.google.com/a/york.ac.uk/issg-search-filters-resource/other-filters/filters-to-find-studies-of-geographic-locations

 

Hausner E., Waffenschmidt, S., Kaiser, T., & Simon, M. (2012). Routine development of objectively derived search strategies. Systematic Reviews, 1(19). https://doi.org/10.1186/2046-4053-1-19

 

Jenkins, M. (2004). Evaluation of methodological search filters – A review. Health Information and Libraries Journal, 21(3), 148–163.

https://doi.org/10.1111/j.1471-1842.2004.00511.x

 

Pienaar, E., Grobler, L., Busgeeth, K., Eisinga, A., & Siegfried, N. (2011). Developing a geographic search filter to identify randomised controlled trials in Africa: Finding the optimal balance between sensitivity and precision. Health Information and Libraries Journal, 28(3), 210–215. https://doi.org/10.1111/j.1471-1842.2011.00936.x

 

Randomness and Integrity Services Ltd. (2020). RANDOM.ORG. Retrieved from https://www.random.org/

 

Sampson, M., Zhang, L., Morrison, A., Barrowman, N. J., Clifford, T. J., Platt, R. W., Klassen, T. P., & Moher, D. (2006). An alternative to the hand searching gold standard: Validating methodological search filters using relative recall. BMC Medical Research Methodology, 6(33). https://doi.org/10.1186/1471-2288-6-33

 

Valderas, J., Mendivil, J., Parada, A., Losada‐Yáñez, M. & Alonso, J. (2006). Development of a geographic filter for PubMed to identify studies performed in Spain. Revista Española de Cardiología, 59(12), 1244–1251. https://doi.org/10.1016/S1885-5857(07)60080-2

 

WriteWords. (2020). WriteWords Frequency Counters. Retrieved from http://www.writewords.org.uk/word_count.asp