**Section 8**

**Data Synthesis**

*Last updated: June 25th 2023*

The CEE Standards for conduct and reporting of data synthesis are provided separately.

**8.1 Developing data synthesis methods**

Data synthesis refers to the collation of all relevant evidence identified in the Systematic Review in order to answer the review question. A narrative synthesis of the data should always be planned, involving the listing of eligible studies and the tabulation of their key characteristics and outcomes. For Systematic Reviews, if evidence is available in a suitable format and quantity, a quantitative synthesis, such as aggregation by meta-analysis, may also be planned. The likely form of the data synthesis may be informed by the earlier pilot-testing of the data extraction and critical appraisal steps. For example, the Review Team may identify whether the studies reported in the articles are likely to be of sufficient quality to allow a relatively robust statistical synthesis, and which study designs are appropriate to include. This pilot-testing should also inform the approach to the synthesis by allowing, for example: the identification of the range of data types and methodological approaches; the determination of appropriate effect size metrics and analytical approaches (e.g. meta-analysis or qualitative synthesis); and the identification of study covariates. This Section gives an overview of the different forms of synthesis: narrative, quantitative and qualitative. All Systematic Reviews should present some form of narrative synthesis, and many will contain more than one of these approaches (e.g. Bowler et al. 2010). It is not the intention here to give detailed guidelines on synthesis methods, since each has its own supporting literature. Instead, this Section concentrates on how to decide on the appropriate form of synthesis to conduct.

**8.2 Systematic Reviews**

### 8.2.1 Narrative synthesis

Narrative synthesis is the tabulation and/or visualisation (often with descriptive statistics) of the findings of individual primary studies, with supporting text to explain the context. Narrative synthesis is often viewed as preparatory when compared with quantitative synthesis. This may be true in terms of analytical rigour and statistical power, but narrative synthesis has advantages when dealing with broader questions and disparate outcomes. Narrative synthesis is often the only option when faced with a pool of disparate studies at relatively high risk of bias, but such syntheses also accompany quantitative syntheses to provide context and background and to help characterise the full evidence base. Some form of narrative synthesis should be provided in any Systematic Review, simply to present the context and an overview of the evidence. A valuable guide to the conduct of narrative synthesis is provided by Popay et al. (2006).

Narrative synthesis requires the construction of tables, developed from the data coding and extraction forms (see the Section on data coding and data extraction), that provide details of the study or population characteristics, data quality and relevant outcomes, all of which should have been defined a priori in the Protocol. Narrative synthesis should include a statement of the measured effect reported in each study and the Review Team's assessment of study validity (including internal and external validity). Where the validity of studies varies greatly, reviewers may wish to give greater weight to some studies than to others. In these instances it is vital that the studies have been subject to standardised a priori critical appraisal, with the value judgements regarding both internal and external validity clearly stated. Ideally, these judgements will have been subject to stakeholder scrutiny at the Protocol stage. The level of detail and the emphasis placed on narrative synthesis will depend on whether other types of synthesis are also employed. An example of an entirely narrative synthesis (Davies et al. 2006) and of a narrative synthesis that complements a quantitative synthesis (Bowler et al. 2010) are available in the CEE Library.

Simple vote counting as a form of synthesis (e.g. comparing how many studies showed a positive versus a negative or neutral outcome based on the statistical significance of their results) should be avoided. Vote counting is misleading because it does not take into account differences in study validity and statistical power. Moreover, vote counting does not provide an estimate of the magnitude of the effect in question. Whilst tabulation may make it easy for readers to vote count, review authors should not use vote counting when developing and reporting their findings.
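The short Python sketch below illustrates why vote counting can mislead. The data are invented for illustration only: five small hypothetical studies each estimate the same positive effect, but none is individually significant, so a vote count would suggest "no effect". Pooling the same estimates with inverse-variance weights (a simple fixed-effect meta-analysis, using a normal approximation for p-values) shows a clearly non-zero effect.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: five small studies, each estimating an effect of 0.3
# with a standard error of 0.2. These numbers are invented for illustration.
effects = [0.3, 0.3, 0.3, 0.3, 0.3]
std_errors = [0.2, 0.2, 0.2, 0.2, 0.2]

# Vote counting: classify each study by its own significance test.
p_values = [two_sided_p(y / se) for y, se in zip(effects, std_errors)]
n_significant = sum(p < 0.05 for p in p_values)

# Inverse-variance pooling uses the magnitude and precision of each estimate.
weights = [1 / se**2 for se in std_errors]
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_p = two_sided_p(pooled / pooled_se)

print(f"Studies significant at p < 0.05: {n_significant} of {len(effects)}")
print(f"Pooled effect {pooled:.2f} (SE {pooled_se:.3f}), p = {pooled_p:.4f}")
```

Each study alone has p of about 0.13, yet the pooled estimate has a much smaller standard error because the information from all five studies is combined.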

Recording of key characteristics of each study included in a narrative synthesis is vital if the Systematic Review is to be useful in summarising the evidence base. Key characteristics are normally presented in tabular form and a minimum list is given below.

- Article reference
- Subject population
- Intervention/exposure variable
- Setting/context
- Outcome measures
- Methodological design
- Relevant reported results

Note that the study authors' own interpretation of their results is normally not summarised, as this could compound subjective assessments or decisions.

### 8.2.2 Quantitative data synthesis

Usually, when attempting to measure the effect of an intervention or exposure, a quantitative synthesis is desirable to make the best use of the available data and optimise the power of the analysis and the precision of the effect estimate. A quantitative data synthesis estimates the overall mean and variance of the effect (of an intervention or exposure) by weighting and aggregating the individual effect estimates from all individual studies included in the analysis. Quantitative data synthesis also enables robust investigation of the impacts of effect modifiers and other sources of heterogeneity (e.g. due to varying populations or environmental conditions) in the contributing studies.

Meta-analysis and meta-regression are now the most commonly used methods of quantitative data synthesis in the environmental sciences and there is a well-developed supporting literature (e.g. Gurevitch & Hedges 2001; Gates 2002; Borenstein et al. 2009; Koricheva et al. 2013). The most recent guidance for quantitative data synthesis in environmental sciences (Nakagawa et al. 2023) also provides supporting online guidance and training resources (e.g. https://itchyshin.github.io/Meta-analysis_tutorial/). Consequently, we provide here an overview rather than detailed guidance.

Commonly-used terms are explained in Table 8.1.

Table 8.1. Summary of terms commonly used in quantitative synthesis (Deeks et al. 2022 and Nakagawa et al. 2023)

| Term | Explanation |
|---|---|
| Effect modifier | A variable that changes the association of an exposure with the outcome of interest. Where effect modification is present, the effect size for the outcome of interest differs according to the value of the effect modifier (e.g. with the age or sex of the population, if these are effect modifiers). Effect modification can be investigated by subgroup analysis (e.g. by conducting the analysis separately by age or sex) or by meta-regression. |
| Effect size | A statistical estimate of the size of an effect (of an exposure or intervention) on a given outcome of interest. The effect size is commonly expressed as a standardised mean difference, or as a risk difference (or ratio), of the outcome between the exposure/intervention group and a comparator (control) group. Effect size can also refer to the magnitude of change in the outcome of interest within a group, or to the strength of association between two or more variables of interest. The effect size is also the response variable in a meta-analytic statistical model. |
| Fixed-effect analysis | An analysis model which assumes that the true effect of an exposure/intervention is the same (in magnitude and direction) in every study, so that observed differences among the studies' results are due solely to chance, not to statistical heterogeneity. |
| Heterogeneity | The degree of inconsistency (variation) among the effect sizes of the studies included in a quantitative data synthesis, beyond what would be expected from chance (sampling error) alone. The identification and explanation of heterogeneity is an important component of the data synthesis process. |
| *I*² statistic | A statistic for estimating heterogeneity among observed effect sizes. It describes the proportion of the observed variance that is likely to reflect variation in true effects rather than sampling error. |
| Meta-analysis | A statistical method of quantitative data synthesis that estimates an overall mean effect size by aggregating effect sizes from two or more studies addressing the same question (i.e. studies with the same or a similar population, exposure/intervention, comparator and outcome). Formal (weighted) meta-analysis accounts for differences in sample size between studies by assigning each study a weight based on its sampling variance. |
| Overall mean effect | The mean effect size estimated across all the studies included in a quantitative data synthesis, based on a meta-analytic statistical model. The overall mean and variance of the effect of an intervention or exposure are obtained by weighting and aggregating the individual effect estimates from all studies included in the analysis. |
| Meta-regression | A regression model which extends a meta-analytic model to include one or more moderator variables (i.e. effect modifiers), aiming to explain heterogeneity (the proportion explained can be quantified as *R*²) and to quantify the effect of each moderator variable. |
| Publication bias | The preferential publication of certain (e.g. positive or favourable) results such that other relevant evidence (e.g. negative or unfavourable results) remains unpublished. Publication bias should be accounted for where feasible at the data synthesis stage of evidence synthesis. |
| Random-effects analysis | An analysis model which assumes that the true effect of an intervention/exposure varies across the included studies, usually according to a normal distribution, so that observed differences among the studies' results are due both to chance and to variation in intervention effects (i.e. heterogeneity). |
| Risk of bias | The potential for bias within an individual study (as opposed to publication bias, which is the potential for bias across the whole set of included studies). As with publication bias, the results of the critical appraisal stage of the systematic review (Section 7) should be considered at the data synthesis stage of the evidence synthesis. |
| Sensitivity analysis | A set of statistical analyses that checks the robustness of the main analysis; if a sensitivity analysis gives different results (qualitatively and/or quantitatively), the robustness of the main findings is in doubt. |

**Meta-analysis methods**

Meta-analysis provides summary (overall mean) effect sizes and measures of heterogeneity, and allows reasons for heterogeneity to be explored. Different methods for calculating effect sizes are used depending on the nature of the data: whether the data are comparisons between groups, single groups or associations; the scale of the outcome measure (continuous, ordinal, dichotomous, time-to-event or count data); and whether the effect estimate is a difference or a ratio (Deeks et al. 2022; Nakagawa et al. 2023).
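As an illustration for a comparison between two groups with a continuous outcome, the Python sketch below computes two effect size metrics from group means, standard deviations and sample sizes: the standardised mean difference with a small-sample correction (Hedges' g) and the log response ratio (lnRR), which is widely used in ecology. The variances are the commonly used large-sample approximations (see e.g. Borenstein et al. 2009); the function names and example numbers are hypothetical, and other outcome types (dichotomous, time-to-event, correlations) need different formulas.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardised mean difference (Hedges' g) and its sampling variance.

    Group 1 is the intervention/exposure group, group 2 the comparator.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / sd_pooled                  # uncorrected SMD (Cohen's d)
    j = 1 - 3 / (4 * df - 1)                   # common approximation to the correction factor J
    g = j * d
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return g, var_g

def log_response_ratio(m1, s1, n1, m2, s2, n2):
    """Log response ratio ln(m1/m2) and its delta-method sampling variance.

    Only meaningful for ratio-scale outcomes with positive group means.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("lnRR requires positive group means")
    lnrr = math.log(m1 / m2)
    var_lnrr = s1**2 / (n1 * m1**2) + s2**2 / (n2 * m2**2)
    return lnrr, var_lnrr

# Hypothetical summary statistics: mean, SD and sample size for each group.
g, v_g = hedges_g(12.0, 3.0, 20, 10.0, 3.0, 20)
lnrr, v_lnrr = log_response_ratio(12.0, 3.0, 20, 10.0, 3.0, 20)
print(f"Hedges' g = {g:.3f} (variance {v_g:.4f})")
print(f"lnRR = {lnrr:.3f} (variance {v_lnrr:.5f})")
```

The choice between such metrics depends on the outcome scale and the review question; lnRR, for example, expresses a proportional change and is undefined for outcomes that can be zero or negative.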

In quantitative data synthesis, each included study's effect size is generally weighted in proportion to its sample size or in inverse proportion to the variance of its effect estimate (i.e. more weight is given to large studies with precise effect estimates and less to small studies with imprecise effect estimates).
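The inverse-variance weighting described above can be sketched in a few lines of Python. This is a minimal fixed-effect version, assuming each study supplies an effect size and its sampling variance; the 95% confidence interval uses a normal approximation (±1.96 standard errors), and the numbers are hypothetical.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance weighted (fixed-effect) mean, its SE, 95% CI and relative weights."""
    weights = [1 / v for v in variances]       # more precise studies get more weight
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1 / total)
    ci = (mean - 1.96 * se, mean + 1.96 * se)
    return mean, se, ci, [w / total for w in weights]

# Hypothetical effect sizes (e.g. Hedges' g) and their sampling variances.
y = [0.2, 0.5, 0.8]
v = [0.04, 0.01, 0.04]
mean, se, ci, rel_w = fixed_effect(y, v)
print(f"mean = {mean:.3f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("relative weights:", [round(w, 3) for w in rel_w])
```

The middle study, with a quarter of the variance of the others, receives two thirds of the total weight.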

Quantitative data synthesis may not always be feasible. For instance, there may be insufficient quantitative data available in the required format for analysis; or the studies available for meta-analysis may be too dissimilar (heterogeneous) for quantitative pooling of their results to make sense. An important consideration is therefore to establish whether quantitative data synthesis should be conducted or not. In practice, this is likely to vary according to which outcomes are being assessed within an evidence synthesis since issues of data availability and heterogeneity often differ between outcomes.

If there is considerable variation in the results of the included studies (for a given outcome) then a quantitative data synthesis may be inappropriate unless the heterogeneity among studies can be taken into account (for further discussion of heterogeneity see below). If quantitative data synthesis is not conducted for a given outcome then a descriptive (narrative) synthesis should always be provided as an alternative.

**Meta-analysis models**

Overall effect estimates across the studies can be obtained with fixed-effect or random-effects statistical models. Fixed-effect models estimate the overall effect of an intervention/exposure assuming that there is a single true underlying effect across the studies, whereas random-effects models assume that the true effects vary across studies according to a distribution (e.g. because of differences in study characteristics). Random-effects models include between-study variability; thus, when there is heterogeneity, an effect estimate from a random-effects model usually has wider confidence intervals than one from a fixed-effect model (NHS CRD 2001; Khan et al. 2003). When no heterogeneity among studies is estimated (i.e. the between-study variance is zero), a fixed-effect model will give the same result as a random-effects model. Random- or mixed-effects models (containing both random and fixed effects) may be appropriate for the analysis of environmental data, because the numerous complex interactions common in environmental sciences are likely to result in heterogeneity between studies or sites.
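A random-effects model adds an estimate of the between-study variance (τ²) to each study's sampling variance before weighting. The Python sketch below uses the DerSimonian–Laird moment estimator of τ² because it has a closed form; it is only one option, and other estimators, such as restricted maximum likelihood (REML, the default in the R package metafor), are often preferred in practice. The second call illustrates the point made above: when the estimated between-study variance is zero, the random-effects result coincides with the fixed-effect one. All data are hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects mean and SE using the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)          # between-study variance, truncated at zero
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1 / (v + tau2) for v in variances]
    mean_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return mean_re, se_re, tau2

v = [0.04, 0.01, 0.04]
mean_re, se_re, tau2 = random_effects_dl([0.2, 0.5, 0.8], v)
se_fe = math.sqrt(1 / sum(1 / vi for vi in v))  # fixed-effect SE for comparison
print(f"tau^2 = {tau2:.4f}, mean = {mean_re:.3f}, SE = {se_re:.4f} (fixed-effect SE {se_fe:.4f})")

# Identical estimates: Q = 0, tau^2 is truncated to 0, and the result
# coincides with the fixed-effect (inverse-variance) estimate.
same_mean, same_se, same_tau2 = random_effects_dl([0.5, 0.5, 0.5], v)
print(f"tau^2 = {same_tau2:.4f}, SE = {same_se:.4f}")
```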

Where data are limited, because there are few studies and/or the size of the studies is small, a random-effects model may fail to adequately estimate the amount of heterogeneity. Fixed-effect methods may perform better at estimating the overall effect size in such situations but at the expense of ignoring heterogeneity (Deeks et al. 2022).

Fixed- and random-effects models are widely used (Deeks et al. 2022). However, in a review of recently published meta-analyses in environmental sciences, Nakagawa et al. (2023) found that almost all of them included multiple effect estimates from the same study without accounting for the non-independence of these estimates. Nakagawa et al. (2023) outline multilevel models that can account for dependence among effect sizes, as an alternative to random-effects models. CEE recommends that review teams read Deeks et al. (2022) and Nakagawa et al. (2023) and carefully consider the strengths and limitations of the different approaches when justifying their choice of meta-analytic model.

The output of a quantitative data synthesis should include:

- The individual effect estimates for each included study and, where the studies are sufficiently homogeneous, the overall mean effect estimate (i.e. the pooled effect across the included studies).
- Where possible, a **forest plot** should be provided to display the effect estimates. If there is considerable heterogeneity such that it is inappropriate to calculate an overall effect estimate, a forest plot may be presented without the overall effect estimate. Forest plots should be clearly labelled to identify each study and the sample size of each group in each study. For substantial meta-analyses with too many studies to display in a forest plot, an alternative is an **orchard plot** (Nakagawa et al. 2023). It is important that the data displayed in forest or orchard plots can be clearly traced back to the source studies; data extraction tables included with the final review report should be clearly structured to facilitate this.
- An investigation of heterogeneity of the effect estimates across studies (see the next section below).
- A consideration of the risk of bias for each individual study included in the analysis (i.e. the results of the critical appraisal stage of the systematic review, Section 7; see **Investigating sources of bias** below).
- A consideration of potential publication bias across the whole body of evidence (see **Investigating sources of bias** below).

**Accounting for heterogeneity of effect estimates**

It is important to investigate heterogeneity among the effect sizes of the studies included in the meta-analysis, to support interpretation of the relevance of the study findings to the review question. Exploration of heterogeneity is also important from a management perspective, as there is rarely a one-size-fits-all solution to environmental problems. However, investigation of heterogeneity requires an adequate sample size and may not be feasible if only small studies are available.

Exploration of heterogeneity consists of identifying whether heterogeneity is present and then attempting to identify its cause(s). Important factors that could produce variation in effect size should be considered when developing the review protocol (e.g. factors identified through stakeholder engagement) and defined a priori. These factors may include differing populations, interventions, outcomes and methodology.

Heterogeneity of effect estimates may be identified:

- By analysing the studies according to subgroups pre-specified in the review protocol.
- By conducting meta-regression to identify the relative importance of moderator variables (effect modifiers) pre-specified in the protocol. Meta-regression aims to provide summary effects after adjusting for such specified study-level covariates.
- By considering, post hoc, the spread of effect estimates and their confidence intervals across the studies, which may be visualised in a forest plot (NB limitations of interpreting confidence intervals of effect estimates are discussed below).
- By using post hoc heterogeneity statistics such as Chi-squared or *I*².
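To show what a meta-regression estimates, the Python sketch below fits a single continuous moderator by inverse-variance weighted least squares. This is a simplified fixed-effect meta-regression that treats the sampling variances as known; a mixed-effects meta-regression, which also estimates residual between-study variance, is usually more appropriate for environmental data and is available in dedicated software (e.g. the R package metafor). The moderator ("temperature") and all data are hypothetical, and the p-value uses a normal approximation.

```python
import math

def fixed_effect_meta_regression(x, effects, variances):
    """Meta-regression with one moderator by inverse-variance weighted least squares."""
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)               # known-variance (fixed-effect) standard error
    p = math.erfc(abs(slope / se_slope) / math.sqrt(2))  # two-sided, normal approximation
    return intercept, slope, se_slope, p

# Hypothetical moderator (e.g. mean annual temperature, deg C) for ten studies
temperature = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
y = [0.05, 0.12, 0.10, 0.22, 0.25, 0.31, 0.28, 0.40, 0.44, 0.47]
v = [0.02, 0.03, 0.02, 0.04, 0.02, 0.03, 0.05, 0.02, 0.03, 0.04]
b0, b1, se_b1, p = fixed_effect_meta_regression(temperature, y, v)
print(f"intercept = {b0:.3f}, slope = {b1:.4f} per unit (SE {se_b1:.4f}), p = {p:.4g}")
```

The slope estimates how the effect size changes per unit of the moderator; as noted in the text, such associations are observational and should be pre-specified.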

The methods and variables for subgroup analyses and meta-regression should be pre-specified in the review protocol where feasible, to reduce the risk of over-interpreting the data, which could lead to erroneous conclusions. These analyses may be referred to as 'sensitivity analyses', since they explore how sensitive the overall effect estimate is to the influence of moderator variables. Sensitivity analyses must be interpreted with caution because statistical power may be limited (Type II errors, i.e. false negatives, are possible) and multiple analyses of numerous subgroups could result in spurious significance (Type I errors, i.e. false positives, are possible). Results of within-subgroup statistical tests (e.g. p-values for intervention/exposure versus comparator) should not be used to infer between-subgroup differences (Deeks et al. 2022).

Where heterogeneity is detected (e.g. from the spread of effect estimates in a forest plot), its source(s) may be explored by considering the features in which the studies differ (e.g. using tables that group together studies with similar characteristics and outcomes). Ideally, variation in effect sizes across studies can then be explored by meta-regression. However, meta-regression usually requires a minimum of 10 studies to be feasible (Deeks et al. 2022). Alternative options could be (i) to exclude any studies identified as outliers on key study characteristics and then re-run the meta-analysis; or (ii) to group studies according to their similarity on specified study characteristics and run separate meta-analyses on those groups. However, such post hoc analyses risk introducing bias, as they could allow selective data analysis. Therefore, the intended approach for exploring heterogeneity should be specified a priori in the review protocol and reported fully and transparently.
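A related, commonly used check is a leave-one-out analysis: the meta-analysis is re-run omitting each study in turn, showing how strongly any single study drives the overall estimate. The Python sketch below uses a fixed-effect (inverse-variance) mean for brevity; the same loop could wrap a random-effects or multilevel model. The data are hypothetical, with the fourth study deliberately outlying, and, as with the options above, the analysis is best pre-specified in the protocol.

```python
def inverse_variance_mean(effects, variances):
    """Fixed-effect (inverse-variance weighted) mean."""
    w = [1 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

def leave_one_out(effects, variances):
    """Overall mean re-estimated with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        results.append(inverse_variance_mean(ys, vs))
    return results

y = [0.2, 0.5, 0.8, 1.6]          # hypothetical; study 4 is an outlier
v = [0.04, 0.04, 0.04, 0.04]
full = inverse_variance_mean(y, v)
for i, m in enumerate(leave_one_out(y, v), start=1):
    print(f"omit study {i}: mean = {m:.3f} (change {m - full:+.3f})")
```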

If insufficient data are available for a quantitative assessment of heterogeneity, a descriptive (narrative) consideration of heterogeneity should be provided where possible to highlight those factors that could explain variability among the studies but for which quantitative analysis is lacking.

Note that subgroup analyses and meta-regression are regarded as observational rather than experimental evidence (Deeks et al. 2022), since their results may be influenced by unmeasured or non-analysed factors that differ between the studies.

Heterogeneity statistics such as Chi-squared (Cochran's Q) and *I*² are useful for indicating the extent to which variation among the observed effect estimates reflects heterogeneity in the true effects rather than chance. Such statistics should be used to interpret the results of a meta-analysis, but not to decide which meta-analytic model (e.g. fixed- or random-effects) to use (Deeks et al. 2022; Nakagawa et al. 2023).
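The two heterogeneity statistics mentioned above can be computed directly from the effect sizes and their sampling variances. The Python sketch below calculates Cochran's Q (the Chi-squared heterogeneity statistic, with k − 1 degrees of freedom) and *I*² = (Q − df)/Q, truncated at zero. A p-value for Q could be obtained from a chi-squared distribution, e.g. with `scipy.stats.chi2.sf(q, df)`, which is omitted here to keep the example dependency-free. The second case uses the same hypothetical effect sizes with variances ten times smaller: *I*² rises sharply although the spread of effects is unchanged, illustrating that *I*² is a relative rather than an absolute measure.

```python
def q_and_i2(effects, variances):
    """Cochran's Q, its degrees of freedom and I^2 (as a proportion)."""
    w = [1 / v for v in variances]
    mean_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

y = [0.2, 0.5, 0.8]                                # hypothetical effect sizes
small_studies = [0.04, 0.01, 0.04]                 # hypothetical sampling variances
large_studies = [vi / 10 for vi in small_studies]  # same effects, 10x more precise

for label, variances in [("small studies", small_studies), ("large studies", large_studies)]:
    q, df, i2 = q_and_i2(y, variances)
    print(f"{label}: Q = {q:.2f} on {df} df, I^2 = {100 * i2:.1f}%")
```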

**Investigating sources of bias**

Two potential sources of bias arising within the evidence base should be considered at the data synthesis stage of the systematic review, to ensure that any bias identified is taken into account when formulating the review’s conclusions. These are: (i) the risks of bias within each individual study included in the data synthesis (as assessed at the critical appraisal stage of the systematic review; Section 7); and (ii) bias arising from selective publication of certain types of studies (publication bias). Note that publication bias is not a property of an individual study (so it is not assessed at the critical appraisal stage of the systematic review) but occurs at the level of the whole evidence base (i.e. all studies together).

(i) Within-study risks of bias

Results of the critical appraisal of study validity (Section 7) should always be considered in the quantitative data synthesis, unless the critical appraisal identifies that there are no validity concerns for any of the studies. If sufficient data exist, meta-analyses should be undertaken on subgroups (subsets of studies) identified from the critical appraisal of study validity, e.g. grouping together studies deemed to be at low, medium or high risk of bias and investigating whether the effect estimate differs between these subgroups. An alternative approach is meta-regression with the study validity (or risk of bias) assessment class as a categorical moderator variable.
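A minimal Python sketch of the subgroup approach: a fixed-effect (inverse-variance) mean is computed within each risk-of-bias class, and the difference between subgroups is tested with Q_between = Σ W_g (mean_g − overall mean)², where W_g is the total weight in subgroup g, compared with a chi-squared distribution on (number of subgroups − 1) degrees of freedom. This tests the between-subgroup difference directly, rather than comparing within-subgroup p-values. The risk-of-bias labels and data are hypothetical, and random-effects versions of this test may be preferable when heterogeneity is present within subgroups.

```python
from collections import defaultdict

def subgroup_q_between(effects, variances, groups):
    """Fixed-effect subgroup means, Q-between statistic and its degrees of freedom."""
    sums = defaultdict(lambda: [0.0, 0.0])   # group -> [sum of w*y, sum of w]
    for y, v, g in zip(effects, variances, groups):
        w = 1 / v
        sums[g][0] += w * y
        sums[g][1] += w
    total_w = sum(sw for _, sw in sums.values())
    overall = sum(swy for swy, _ in sums.values()) / total_w
    means = {g: swy / sw for g, (swy, sw) in sums.items()}
    q_between = sum(sw * (means[g] - overall) ** 2 for g, (_, sw) in sums.items())
    return means, q_between, len(sums) - 1

y = [0.10, 0.20, 0.50, 0.60]
v = [0.01, 0.01, 0.01, 0.01]
risk_of_bias = ["low", "low", "high", "high"]
means, qb, df = subgroup_q_between(y, v, risk_of_bias)
print(means, f"Q_between = {qb:.2f} on {df} df")
```

With one degree of freedom, the 5% critical value of the chi-squared distribution is about 3.84, so in this invented example the higher effect in the high risk-of-bias subgroup would warrant attention when interpreting the overall estimate.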

(ii) Publication bias

Publication bias refers to the preferential publication of only some of the relevant studies, meaning that the published information may not be reflective of all information available on a topic, e.g. positive and/or statistically significant results are more likely to be available than non-significant or negative results. Where possible, the data synthesis should be accompanied by an exploration of possible effects of publication bias. There are a number of exploratory plots and tests for publication bias. One example is the funnel plot, often accompanied by Egger's regression test (Egger et al. 1997). This approach aims to test for a relationship between the size and precision of study effects, plotted on the x- and y-axes of the funnel plot respectively. Another approach is to calculate the fail-safe number, which is the number of null-result studies that would have to be added to a meta-analysis to lower the significance or the magnitude of the effect to a specified level (e.g. where it would be considered statistically or biologically non-significant). Wherever possible, grey literature and unpublished studies should be included in a meta-analysis to allow direct assessment of publication bias by comparison of effect sizes in published and unpublished studies.
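Egger's regression test can be sketched as follows in Python: each study's standardised effect (effect / SE) is regressed on its precision (1 / SE) by ordinary least squares, and an intercept that differs from zero suggests funnel plot asymmetry, which may (but need not) reflect publication bias. The sketch returns the intercept, its standard error and the t statistic with k − 2 degrees of freedom; a p-value could be obtained from a t distribution (e.g. `2 * scipy.stats.t.sf(abs(t), df)`). The data are hypothetical, and such tests have low power when there are few studies.

```python
import math

def egger_test(effects, std_errors):
    """Egger's regression: standardised effect regressed on precision (OLS)."""
    k = len(effects)
    x = [1 / se for se in std_errors]                   # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardised effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    s2 = sum(r**2 for r in residuals) / (k - 2)          # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_bar**2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, k - 2

# Hypothetical data in which smaller studies (larger SE) report larger effects
y = [0.20, 0.28, 0.41, 0.52, 0.58]
se = [0.10, 0.20, 0.30, 0.40, 0.50]
b0, se_b0, t, df = egger_test(y, se)
print(f"intercept = {b0:.3f} (SE {se_b0:.3f}), t = {t:.2f} on {df} df")
```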

In the event that a quantitative data synthesis is not feasible or appropriate for certain outcomes, the influence of the two types of bias described above should be considered in the narrative synthesis (and limitations of the review) for those outcomes.

**Interpreting test statistics**

Review teams should be aware of how to correctly interpret test statistics (e.g. those for assessing heterogeneity, such as Chi-squared or *I*²). Note that *I*² provides a relative, not an absolute, measure of heterogeneity. To express the impact of heterogeneity on the effect estimate in absolute terms, 'prediction intervals' may be calculated (for details see Deeks et al. 2022 and Nakagawa et al. 2023).

Non-overlapping confidence intervals for individual studies in a forest plot can usefully indicate the presence of heterogeneity among the studies. However, note that the confidence interval of the overall effect estimate in a random-effects meta-analysis does not describe the degree of heterogeneity among the studies; instead, it describes uncertainty in the mean of systematically different effects in the different studies (Deeks et al. 2022).
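To contrast the confidence interval of the mean effect with a prediction interval, the Python sketch below applies one commonly used approximation for a random-effects meta-analysis of k studies: mean ± t × √(τ² + SE²), where τ² is the estimated between-study variance, SE the standard error of the mean effect and t the 97.5th percentile of a t distribution with k − 2 degrees of freedom. The inputs are hypothetical random-effects results from 10 studies; the t value (2.306 for 8 degrees of freedom) is hard-coded to avoid dependencies and could instead be obtained with `scipy.stats.t.ppf(0.975, 8)`.

```python
import math

def prediction_interval(mean, se, tau2, t_crit):
    """Approximate 95% prediction interval for the true effect in a new study."""
    half_width = t_crit * math.sqrt(tau2 + se**2)
    return mean - half_width, mean + half_width

# Hypothetical random-effects results from k = 10 studies.
mean, se, tau2 = 0.40, 0.08, 0.04
t_crit = 2.306   # 97.5th percentile of the t distribution with k - 2 = 8 df
ci = (mean - 1.96 * se, mean + 1.96 * se)
pred = prediction_interval(mean, se, tau2, t_crit)
print(f"95% CI for the mean effect: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% prediction interval:    ({pred[0]:.3f}, {pred[1]:.3f})")
```

Here the confidence interval excludes zero, indicating that the mean effect is positive, whereas the prediction interval spans zero: in a new setting the true effect could plausibly be negative.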

**Face validity checks and quality assurance**

As with all stages of a systematic review, to minimise errors, data for analysis should be checked for their accuracy by a second reviewer (this might be conducted as part of the data extraction stage of the review). Results of the analysis should also be checked for their face validity, i.e. that the results are plausible and interpretable in a way that is meaningful to answer the review question.

**Reporting the data synthesis**

The quantitative data synthesis should follow methods which have been pre-specified in the review protocol. If any deviations from the protocol were required these should be reported with a clear justification.

The overall approach to the data synthesis should be clearly explained (it may be sufficient to cite the review protocol for this). If some or all of the outcomes of interest were analysed descriptively (narratively) rather than quantitatively an explanation should be provided.

The choice of meta-analytic model (e.g. frequentist or Bayesian; fixed-effect, random-effects, mixed-effects or multilevel) and the approach for assessing heterogeneity (including subgroup analyses, meta-regression and/or heterogeneity statistics) should be justified (it may be sufficient to cite the review protocol for this information). Adequate information on the methods followed should be provided for the statistical approach to be reproducible by other researchers. The statistical software used should be stated and the statistical software code, if available, provided in an appendix to the review report.

The input data used in the quantitative data synthesis should be provided and should be traceable back to the individual studies (e.g. included as part of a data extraction table). If assumptions or imputations were required to account for missing data these should be clearly explained so it is clear which data were missing from which study groups and at which timepoints. Any calculations used to convert data in the included studies into a suitable format for meta-analysis should be provided with the data extraction table.

Both the risks of bias in individual studies and publication bias arising across the whole evidence base should be considered in the data synthesis where feasible. If either of these bias analyses is not feasible, an explanation should be provided.

Although meta-analysis aims to achieve objectivity in reviewing scientific data, considerable subjective judgement is involved. Subjective judgements include the choice of effect measure, the choice of analysis method, which data are methodologically sound enough to be included, and which sources of heterogeneity are important (Thompson 1994; Nakagawa et al. 2023). Reviewers should state these decisions explicitly to minimise bias and ensure transparency.

A statement of the strengths, limitations and uncertainties of the analysis approach should be provided, clarifying how these impact on the overall review conclusions. This should include a consideration of whether the results of the analysis are subject to any unexplained heterogeneity.