Conducting a Search
Last updated: August 11th 2020
Key CEE Standards for Conduct and Reporting
Systematic Reviews and Systematic Maps:
- Sources of articles used should capture both commercially published scientific literature and grey literature (may or may not be peer-reviewed).
- Comprehensiveness of the search should be demonstrated by a series of tests using samples of the relevant literature to demonstrate adequate sensitivity.
- All search terms and/or strings, Boolean operators (‘AND’, ‘OR’ etc.) and wildcards should be clearly provided (in text or additional files) so that the exact search is replicable by a third party.
- Comprehensive information should be given about the databases and websites searched and search engines used (including any search options or settings chosen), together with dates of searches.
- Any update to searches undertaken during the conduct of the review should be reported and justified.
- A clear account of grey literature and supplementary searches should be provided.
- Limitations due to, for example, language or publication date should be considered.
To achieve a rigorous evidence synthesis, searches should be transparent and reproducible and should minimise biases. A key requirement of a review team engaged in evidence synthesis is to try to gather as much as possible of the available relevant documented bibliographic evidence in articles and the studies reported therein. Biases (including those linked to the search itself) should be minimised and/or highlighted as they may affect the outputs of the synthesis (Petticrew & Roberts, 2006; EFSA, 2010; Higgins & Green 2011). Failing to include relevant information in an evidence synthesis could significantly affect and/or bias its findings (Konno & Pullin 2020). This section assumes that full planning has previously been undertaken and the Protocol sets out the plan for the search.
A step-by-step overview of the search process for evidence synthesis is illustrated in Figure 3.1 (Section 3). The planning section (Subsection 3.2) should be read and used in combination with this Section; the overlap is intentional.
5.2 Conducting the Search
Once the search terms and strategy have been reviewed and agreed in the published Protocol, the review team can conduct the search by implementing the whole search strategy.
5.2.1 Prioritizing bibliographic sources
Glanville et al. (in press) suggest that the Review Team should start the search using the source where the largest number of relevant papers are likely to be found; subsequent searches can then be constructed to complement these first results. Sources containing abstracts allow greater understanding of relevance and should be given priority. Combined with the use of the test-list, ordering the use of sources may allow the Review Team to find the largest number of relevant articles early during the search, which is useful when time and resources are limited. Searching the grey literature can be conducted in parallel with searches in sources of indexed documents.
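One way to think about ordering sources is sketched below. It is only an illustration, assuming the Review Team already knows (for example from scoping) which test-list articles each source retrieves. The source names, the article identifiers and the function `order_sources` are hypothetical, and the greedy ordering is a simple heuristic rather than a method prescribed by the guidance.

```python
def order_sources(coverage):
    """Greedily order sources by how many not-yet-found test-list articles each adds.

    coverage maps a source name to the set of test-list articles it retrieves.
    Returns a list of (source name, number of newly found articles) pairs.
    """
    remaining = {name: set(ids) for name, ids in coverage.items()}
    covered = set()
    order = []
    while remaining:
        # Pick the source adding the most new articles; ties go to the
        # alphabetically first name so the result is deterministic.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        order.append((best, len(remaining[best] - covered)))
        covered |= remaining.pop(best)
    return order

# Hypothetical scoping results: test-list articles T1..T5 found per source.
scoping = {
    "Source A": {"T1", "T2", "T3"},
    "Source B": {"T2", "T3", "T4"},
    "Source C": {"T5"},
}
search_order = order_sources(scoping)
for name, gain in search_order:
    print(name, "adds", gain, "test-list article(s)")
```

In this toy case Source B is searched second because, once Source A has been searched, it contributes only one additional test-list article.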
5.2.2 Modifying the search string
The list of search terms needs to be combined into search strings that retrieve as many relevant results as possible (exhaustiveness) while also limiting the number of irrelevant results (precision). This will first be done at the Protocol stage (see Section 3). However, search strings need to be modified (usually simplified) to match the functionality of each electronic bibliographic source to be searched (e.g. Haddaway et al. 2015). To modify the string, the team should consult the syntax described in the help pages of each bibliographic source, including details of any limitations on the use of Boolean operators, where applicable. All modifications should be fully recorded and reported.
The search syntax is the set of options provided in the interface of the bibliographic source to achieve searches. The syntax options can usually be found in the help pages of the bibliographic source interface.
Typical syntax features are listed below and will vary by interface:
- Wildcards and truncation: symbols used within words or at the end of the root of the word to signal that the spelling may vary. Wildcards are useful within words to capture British and US spelling variants, for example ‘behavio?r’ in interfaces where ‘?’ stands for zero or one character will retrieve records containing ‘behaviour’ as well as ‘behavior’. As well as wildcards within words, many interfaces offer truncation options at the end of word stems. Truncation can help with identifying plurals and other grammatical forms of a word. For example, ‘forest*’ in some bibliographic sources will retrieve records containing forest, forests, forestry, forestal, etc. Some options can also be further defined, for example in the Ovid interface ‘forest$1’ can be used to restrict searches to words with no or one extra character.
- Parentheses are used, where provided, to group search terms together (e.g. a set of synonyms linked by a Boolean operator, see below) and they determine the sequence in which search operations will be carried out by the interface. Search string operations within parentheses are, typically, carried out before those that are not enclosed within parentheses. In complex search strings, nesting of groups of search terms within different sets of parentheses may be helpful, and the search operation is then performed first on the search terms that are within the innermost set of parentheses. In this sense, parentheses as used in search strings function in a similar way to those used in mathematical calculations. For example: (road* OR railway*) AND (killing OR mortality) (for more explanations about OR, see Boolean operators below).
- Phrase searching: Some database interfaces allow words to be grouped and searched as phrases by using, for example, double quotation marks. For example, “organic farming”, “tropical forest”.
- Lemmatization: lemmatization involves the automated reduction of words to their respective “lemmas” (roots). For example, the lemma for the words “computation” and “computer” is the word “compute”. Similarly, a search using defense as a search term would also find variants such as defence. Lemmatization can reduce or eliminate the need to use wildcards to retrieve plurals and variant spellings of a word, but it may also retrieve irrelevant variants (e.g. cite as a search term may retrieve articles with citing, cities, cited and citation, Web of Science helpfile). Web of Science automatically applies lemmatization rules to Topic and Title search queries. This facility is not available in all interfaces.
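The way parentheses, Boolean operators and truncation combine can be illustrated with a short sketch. It is not tied to any real database interface: the functions `build_search_string` and `wildcard_to_regex` are hypothetical helpers, and the wildcard convention assumed here ('*' for any number of extra characters, '$n' for at most n extra characters) must always be checked against the help pages of the source being searched.

```python
import re

def build_search_string(concept_groups):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    blocks = []
    for terms in concept_groups:
        # Quote multi-word terms for interfaces that support phrase searching.
        quoted = ['"' + t + '"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)

def wildcard_to_regex(term):
    """Translate an illustrative wildcard syntax into a regular expression.

    Assumed convention (not universal): '*' means any number of extra
    characters and '$n' means at most n extra characters.
    """
    pattern = ""
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "*":
            pattern += r"\w*"
        elif ch == "$" and i + 1 < len(term) and term[i + 1].isdigit():
            pattern += r"\w{0," + term[i + 1] + "}"
            i += 1  # skip the digit that follows '$'
        else:
            pattern += re.escape(ch)
        i += 1
    return re.compile(pattern, re.IGNORECASE)

search_string = build_search_string([["road*", "railway*"], ["killing", "mortality"]])
print(search_string)  # (road* OR railway*) AND (killing OR mortality)

words = ["forest", "forests", "forestry", "forestal"]
broad = [w for w in words if wildcard_to_regex("forest*").fullmatch(w)]
narrow = [w for w in words if wildcard_to_regex("forest$1").fullmatch(w)]
print(broad)   # all four words
print(narrow)  # ['forest', 'forests']
```

Keeping the concept groups as data and generating the string from them makes it easier to record each source-specific adaptation of the same strategy.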
5.2.3 Refining the results
The finalised search extracts a first pool of articles that is a mixture of relevant and irrelevant articles, because the search, in trying to capture the maximum number of relevant papers, inevitably captures other articles that do not attempt to answer the question. Screening the outputs of the search for eligibility will be done by examining the extracted papers at title, abstract and full-text levels (see Section 6). If the volume of search results is too large to process within available resources, the Review Team may consider using tools provided by some electronic databases (e.g. Web of Science) to refine the results of the search by categories (e.g. discipline, research areas) in order to discard some irrelevant articles prior to extracting the final pool of articles and thus lower the number of articles to be screened. There is a real risk in using such tools, as removing articles based on one irrelevant category may remove relevant papers that also belong to another relevant category. This can occur because categories characterise the journal rather than each article and because the approach relies on the categories being applied consistently. As a consequence, refining tools provided by electronic bibliographic sources should be used with great caution and should only target categories that are clearly irrelevant to the question (e.g. excluding PHYSICS APPLIED, PERIPHERAL VASCULAR DISEASE or LIMNOLOGY in a search about reintroduction or release of carnivores). Using these tools on the results of a search should not change the number of articles of the test list that have been successfully retrieved. The test-list is again an indicator of the performance of the strategy when using such tools. If the Review Team do decide to use such tools, they should report all details of tools used to refine the outputs of the search prior to screening in the evidence synthesis Protocol and discuss the limitations of the approach they have used.
5.2.4 Searching for grey literature
More and more documents are being indexed, including those in the grey literature (Mahood et al. 2014). Nevertheless, conducting a search for grey literature requires time, and the authors should assess whether or not it needs to be included in the synthesis (Haddaway and Bayliss 2015). Repeatability and susceptibility to bias should be assessed and reported as much as possible.
Bibliographic tools for grey literature
There are some databases or platforms which reference grey literature. INIST (Institute for Scientific and Technical Information, France) holds the European OpenSIGLE resource (opensigle.inist.fr), which provides access to all the SIGLE records (System for Information on Grey Literature), new data added by EAGLE members (the European Association for Grey Literature Exploitation) and information from Greynet. There are also some programs which can help to make web-based searches for grey literature more transparent, a practice that is part of “scraping methods” (Haddaway 2015). Examples of sources available for grey literature:
- BASE (https://www.base-search.net) allows the selection of document types and provides the option to focus on unpublished material
- eu provides access to more than 700.000 bibliographical references of grey literature produced in Europe.
- Zenodo is an open-access repository initially linked to European projects. It welcomes research outputs from all over the world and all disciplines, including grey literature. It allows search by keywords and includes publications, theses, datasets, figures, posters, etc.
Examples of sources providing access to theses and dissertations include: DART-Europe (free); Open Access Theses and Dissertations (free); ProQuest Dissertations and Theses (http://pqdtopen.proquest.com/, upon subscription); OAISTER; EThOS (British Library, free); WorldCat.org (free); OpenThesis.org (free, dissertations/theses, but also includes other types of publications). Further resources can be found at http://www.ndltd.org/resources/find-etds. Individual universities frequently provide access to their thesis collections.
Websites of organisations and professional networks
Many organisations and professional networks make documents freely available through their web pages, and many more contain lists of projects, datasets and references. The list of organisations to be searched is dependent upon both the subject of the evidence synthesis and any regional focus (see examples in Land et al. 2013; Ojanen et al. 2014; Söderström et al. 2014; Bottrill et al. 2014). Many websites have a search facility, but its functionality tends to be quite limited; this must be taken into consideration when planning the time allocated to such a task.
- TROPENBOS is a non-governmental agency created in the Netherlands in 1986. It contributes to the establishment of research programmes in tropical forestry and it has its own website with many documents, including proceedings of workshops, books and articles that contain useful datasets and references. tropenbos.org
- Databases such as ScienceResearch.com and AcademicInfo.net contain links to hand-selected sites of relevance for a given topic or subject area and are particularly useful when searching for subject experts or pertinent organisations, helping to focus the searching process and ensure relevance.
Asking authors, experts and colleagues
Direct contact with knowledge-holders and other stakeholders in networks and organisations may be very time-consuming but may allow collection of very relevant articles (Bayliss & Beyer 2015; Schindler et al. 2016). This can be especially useful to help access older or unpublished data sources, when the research area is sensitive to controversy (e.g. GMO, Frampton, pers. comm.) or when resources are limited (Doerr et al. 2015). This may also help enable access to articles written in languages other than English.
Search engines (e.g. Google, Yahoo) cannot index the entire web, and they differ widely in the order of their results. They all have their own algorithms favouring different criteria, and both retrieval and ranking of results may be affected by the location, the device used to search (mobile, desktop), the business model of the search engine and commercial purposes. It is important to use more than one search engine to increase the chance of identifying relevant papers. Google Scholar is often used to scope for existing relevant literature but it cannot be used as a standalone resource for evidence synthesis (see 1.3.2, Bramer et al. 2013; Haddaway et al. 2015).
5.2.5 Additional approaches: hand-searching, snowballing and citation searching
Hand-searching is a traditional (pre-digital) mode of searching which involves looking at all items in a bibliographic source rather than searching the publication using search terms. Hand-searching can involve thoroughly reading the tables of contents of journals, meeting proceedings or books (Glanville, in press).
Snowballing and citation searching (also referred to as ‘pearl growing’, ‘citation chasing’, ‘footnote chasing’, ‘reference scanning’, ‘checking’ or ‘reference harvesting’) refer to methods where the reference lists contained within articles are used to identify other relevant articles (Sayers 2007). Citation searching (or ‘reverse snowballing’) uses known relevant articles to identify later publications which have cited those papers on the assumption that such publications may be relevant for the review.
Using these methods depends on the resources available to the Review Team (access to sources, time). Hand-searching is rarely at the core of the search strategy, but snowballing and citation searching are frequently used (e.g. McKinnon et al. 2016). Recent developments in some bibliographic sources automatically highlight, and allow the user to link to, cited and related articles when viewing (e.g. when scanning Elsevier journals, or when downloading full-text PDF). This may be difficult to handle as those references may or may not have been found by the systematic approach using search strings and may have to be reported as additional articles. The use of those methods and their outputs should be reported in detail in the final evidence synthesis.
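The difference between the two directions of searching, and the need to report articles not found by the search strings as additional articles, can be sketched as follows. The citation data, the article labels and both function names are hypothetical; real citation links would come from a bibliographic source that provides them.

```python
def backward_snowball(seed_ids, references):
    """Snowballing: collect the articles listed in the reference lists of the seeds."""
    found = set()
    for seed in seed_ids:
        found.update(references.get(seed, []))
    return found - set(seed_ids)

def forward_citation_search(seed_ids, references):
    """Citation searching: collect later articles whose reference lists cite a seed."""
    seeds = set(seed_ids)
    citing = {article for article, cited in references.items() if seeds & set(cited)}
    return citing - seeds

# Hypothetical citation data: each article maps to the articles it cites.
references = {
    "A": ["B", "C"],
    "B": [],
    "D": ["A"],
    "E": ["F"],
}
seeds = ["A"]                      # a known relevant article
found_by_search_strings = {"A", "B"}

backward = backward_snowball(seeds, references)
forward = forward_citation_search(seeds, references)

# Articles not retrieved by the search strings are recorded separately
# so they can be reported as additional articles.
additional = sorted((backward | forward) - found_by_search_strings)
print(additional)
```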
5.3 Managing References and Recording the Search
Good documenting, recording and archiving of searches and their resulting articles may save a substantial amount of time and resource by reducing duplication of results and enabling the search to be re-assessed or amended easily (Higgins & Green 2011). Good recording ensures that any of the limitations of the search are explicit and hence allows assessment of any possible consequences of those limitations on the synthesis’ findings. Good archiving enables the Review Team to respond to the queries about the search process efficiently. If a Review Team is asked why they did not include an article in their review, for example, proper archiving of the workflow will allow the team to check whether the article was detected by the search, and if it was, why it was discarded.
Good documenting, recording and archiving has two main aspects: (1) the clear recording of the search strategy and the results of all of the searches (records) and (2) the way the search is reported in the evidence synthesis Protocol and final report. Reporting standards keep improving (see a comparative study in Mullins et al. 2014) and many reporting checklists exist to help Review Teams (Rader et al. 2014). See Section 10 for CEE reporting standards.
5.3.1 Keeping track of the search strategy and recording results
The Review Team should document its search methodology in order to be transparent and to be able to justify its use of a search term or its choice of resources. Enough detail should be provided to allow the search to be replicated, including the name of the database, the interface, the date of the search and the full search with all the search terms, which should be reported exactly as run (Kugley et al. 2016). The search history and number of articles retrieved by each search should be recorded in a logbook or using screenshots and may be reported in the final evidence synthesis (e.g. as supplementary material). The number of articles retrieved, screened and discarded should be recorded in a flow diagram (see ROSES template) and this should accompany the reporting of the search and eligibility screening stages within an evidence synthesis report.
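A logbook of this kind can be kept as a simple table with one row per search. The sketch below assumes a CSV layout; the field names, the `SearchRecord` class and every value in the example row are hypothetical and only illustrate the details listed above (database, interface, date, exact string, number of results).

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class SearchRecord:
    source: str          # name of the database or website
    interface: str       # platform through which the source was searched
    search_date: str     # ISO date on which the search was run
    search_string: str   # the string exactly as run
    n_results: int       # number of articles retrieved

def write_log(records, handle):
    """Write search records as CSV rows with a header line."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(SearchRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))

# Hypothetical entry; none of these values come from a real search.
log = [
    SearchRecord(
        source="Example Database",
        interface="Example Platform",
        search_date="2020-01-15",
        search_string="(road* OR railway*) AND (killing OR mortality)",
        n_results=1234,
    )
]

buffer = io.StringIO()   # a real logbook would be written to a file instead
write_log(log, buffer)
print(buffer.getvalue())
```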
For internet searches, reviewers should record and report the URL, the date of the search, the search strategy used (search strings with all options making the search replicable), as well as the number of results of the search, even if this may not be easily reproducible. Saving search results as HTML pages (possibly as screenshots to allow archiving that can be perused later even if the webpage has changed in the meantime) provides transparency for this type of search (Haddaway et al. 2017). Recording searches in citation formats (e.g. RIS files) makes them compatible with reference or review management software and allows archiving for future use.
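As an illustration of recording an internet search result in a citation format, the sketch below writes one record using a small subset of common RIS tags (TY, AU, TI, PY, UR, N1, ER). The function `to_ris`, the dictionary keys and the example record are hypothetical, and reference managers may expect further tags.

```python
def to_ris(record):
    """Format one bibliographic record as a minimal RIS entry."""
    lines = ["TY  - " + record.get("type", "GEN")]
    for author in record.get("authors", []):
        lines.append("AU  - " + author)
    if "title" in record:
        lines.append("TI  - " + record["title"])
    if "year" in record:
        lines.append("PY  - " + str(record["year"]))
    if "url" in record:
        lines.append("UR  - " + record["url"])
    if "note" in record:
        # Notes field used here to keep the search date and strategy with the record.
        lines.append("N1  - " + record["note"])
    lines.append("ER  - ")
    return "\n".join(lines)

# Hypothetical grey-literature report found through a website search.
report = {
    "type": "RPRT",
    "authors": ["Example, A.", "Sample, B."],
    "title": "Hypothetical report title",
    "year": 2019,
    "url": "https://example.org/report",
    "note": "Retrieved 2020-01-15 via website search for 'carnivore release'",
}
ris_text = to_ris(report)
print(ris_text)
```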
5.3.2 Reporting the final search strategy and findings
Although the search strategy will have been listed in the Protocol, the searches as finally run should be reported in the final evidence synthesis report, possibly as additional files or supplementary information, since the search as finally run may be different from the Protocol. The final synthesis reports the results and performance of the search. Minor amendments to the Protocol (e.g. adding or removing search terms) should be reported in the final synthesis, but the search should not be substantially changed once approved by reviewers (but see below).
The Review Team may report the details of each search string and how it was developed (e.g. Bottrill et al. 2014) and whether the strategy has been adjusted to the various databases consulted (e.g. Land et al. 2013; Haddaway et al. 2015) or developed in several languages (e.g. Land et al. 2013). Limitations of the search should be reported as fully as possible, including the range of languages, types of documents, time period covered by the search, date of the search (e.g. Land et al. 2013; Söderström et al. 2014), and any unexpected difficulty that affected the search compared to what was described in the Protocol (e.g. end of access, Haddaway et al. 2015).
5.4 Updating and Amending Searches
Updating or amending a search may be conducted by the same Review Team that undertook the initial searches, but this is not always the case. Therefore, it is important that the original searches are well documented and, if possible, libraries (e.g. EndNote databases) of retrieved articles are saved (and, if possible, reported or made available) to ensure that new search results can be differentiated from previous ones, as easily as possible.
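Differentiating new results from a saved library can be sketched as a comparison on a normalised key. The matching rule assumed here (DOI when present, otherwise lower-cased title plus year) is only one possible choice, and the records, DOIs and function names are hypothetical.

```python
import re

def record_key(record):
    """Build a comparison key: the DOI if available, else normalised title and year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return "title:" + title + "|" + str(record.get("year", ""))

def new_records(previous, updated):
    """Return the records from the updated search that are absent from the saved library."""
    seen = {record_key(r) for r in previous}
    return [r for r in updated if record_key(r) not in seen]

# Hypothetical saved library from the original search.
library = [
    {"doi": "10.0000/example.1", "title": "Road mortality of carnivores", "year": 2015},
    {"title": "Railway Killing: A Survey", "year": 2016},
]
# Hypothetical results of the rerun search.
rerun = [
    {"doi": "10.0000/EXAMPLE.1", "title": "Road mortality of carnivores", "year": 2015},
    {"title": "Railway killing - a survey", "year": 2016},
    {"doi": "10.0000/example.9", "title": "New study on road mortality", "year": 2020},
]
to_screen = new_records(library, rerun)
print([r["title"] for r in to_screen])
```

Only the third record is new in this example; the first two match the saved library despite differences in case and punctuation.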
Because research effort is dynamic, new articles may be published from the moment a search is completed. There are two main reasons why a search may need to be changed. The first may occur when the evidence synthesis extends over a long time period (for instance more than 2 years) and the publication rate of relevant documents on the topic is high. In this case, the conclusions of the review may be out of date even before it is published. It is recommended that the search is rerun using the same search strategy (Bayliss et al. 2016) for the time period elapsed subsequent to the end of the initial search and before the report is finalised. The second case occurs when the evidence synthesis final report has already been published, and there is a need for revision because new primary research data or developments have subsequently been published and need to be taken into account. In this case the search Protocol should be checked to identify whether new search terms need to be added or additional sources need to be searched. Deciding whether a new Protocol needs to be published will depend on the extent of the amendments and may be discussed with the Collaboration for Environmental Evidence.
There are a number of issues that need to be considered when updating a search:
- Do you have access to the original search strings and sources, and can you read these files (i.e. is the necessary software available)?
- Was the original search Protocol adequate and appropriate or does it need revising?
- Do you know when the initial search took place and which time boundaries were set up at that time? If not, can you contact the authors to get those details?
- If relevant, do you have similar details regarding searches in grey literature?
- Do you have access to the same sources of documents (e.g. database platforms), including institutional websites and subscriptions?
- Will the same languages be used?
Then the revised (or original) strategy may be run (Bayliss et al. 2016). As with the original searches, it is important to document clearly any updates to the searches, their dates, and any reasons for changes to the original searches, most typically in an appendix. If the new search differs from the initial one, a new Protocol may need to be submitted before the amendment is conducted (Bayliss et al. 2016).