How to Do a Meta-Analysis
What Meta-Analysis Adds to Systematic Reviews
Meta-analysis is the quantitative component of a systematic review. While a systematic review identifies and evaluates all relevant studies on a topic, the meta-analysis takes the additional step of statistically combining their results. Not all systematic reviews include a meta-analysis, as pooling is only appropriate when the included studies are sufficiently similar in design, population, intervention, and outcome measurement. When studies are too heterogeneous, narrative synthesis is more appropriate than statistical combination.
The primary advantage of meta-analysis is increased precision. Individual studies, especially small ones, have wide confidence intervals that make it difficult to determine whether an effect is real or due to chance. By combining data across studies, meta-analysis narrows the confidence interval around the pooled estimate, providing a clearer picture of the true effect size. This increased precision can reveal statistically significant effects that no single study had the power to detect.
Step 1: Define the Research Question
A well-defined question is essential because it determines which studies are eligible for inclusion and what outcome measures will be pooled. The PICO framework (Population, Intervention, Comparison, Outcome) provides structure. For example, a meta-analysis might ask whether cognitive behavioral therapy (Intervention) reduces symptoms of depression (Outcome) in adults with major depressive disorder (Population) compared to waitlist controls (Comparison). Specificity in the question prevents scope creep and ensures that the included studies are genuinely comparable.
Step 2: Conduct the Systematic Search and Select Studies
The search process follows the same rigorous standards described for systematic reviews. Comprehensive, reproducible searching across multiple databases minimizes the risk of missing relevant studies. After screening, the final set of included studies should share enough methodological common ground to make statistical pooling meaningful. Studies must report the same type of outcome measure, or their outcomes must be convertible to a common metric.
Step 3: Extract Effect Sizes
The effect size is the standardized measure of the treatment effect reported or calculated from each study. Common effect size metrics include the standardized mean difference (Cohen's d or Hedges' g) for continuous outcomes, the odds ratio or risk ratio for binary outcomes, and the correlation coefficient for association studies. Each effect size is accompanied by a measure of its precision, typically the standard error or the confidence interval, which reflects the sample size and variability within the study.
When studies report results in different formats, the meta-analyst must convert them to a common effect size metric. Formulas exist for converting between most common metrics, such as converting t-statistics to Cohen's d. When a study does not report the statistics needed for conversion, its authors may need to be contacted for additional information.
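As a rough illustration of this step, the sketch below computes Hedges' g and a log odds ratio, each with its standard error, from summary statistics. The function names and the input numbers are hypothetical, the variance of d uses the usual large-sample approximation, and the small-sample correction is the common approximate form rather than the exact one.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, se) for two independent groups, from summary statistics."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled  # Cohen's d
    # Large-sample approximation to the sampling variance of d.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    # Approximate small-sample correction that turns d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    return j * d, j * math.sqrt(var_d)


def log_odds_ratio(events_t, nonevents_t, events_c, nonevents_c):
    """Return (log OR, se) from a 2x2 table with no zero cells."""
    log_or = math.log((events_t * nonevents_c) / (nonevents_t * events_c))
    se = math.sqrt(
        1 / events_t + 1 / nonevents_t + 1 / events_c + 1 / nonevents_c
    )
    return log_or, se


# Hypothetical summary data, for illustration only.
g, se_g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
log_or, se_log_or = log_odds_ratio(15, 35, 25, 25)
print(f"Hedges' g = {g:.3f} (SE {se_g:.3f})")
print(f"log OR = {log_or:.3f} (SE {se_log_or:.3f})")
```

Odds ratios are handled on the log scale here because their sampling distribution is closer to normal there; the result is exponentiated only for reporting.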
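A few of these conversions can be sketched in code. The helpers below are illustrative (their names are not from any particular package) and assume an independent two-group design. The d-to-log-odds-ratio conversion additionally assumes the outcome follows a logistic distribution, so it should be read as an approximation.

```python
import math


def t_to_cohens_d(t, n_1, n_2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt(1 / n_1 + 1 / n_2)


def cohens_d_to_r(d, n_1, n_2):
    """Convert Cohen's d to a (point-biserial) correlation."""
    # The correction term equals 4 when the two groups are the same size.
    a = (n_1 + n_2) ** 2 / (n_1 * n_2)
    return d / math.sqrt(d**2 + a)


def cohens_d_to_log_odds_ratio(d):
    """Convert Cohen's d to a log odds ratio (logistic assumption)."""
    return d * math.pi / math.sqrt(3)


# Hypothetical study reporting only t = 2.0 with 50 participants per group.
d = t_to_cohens_d(2.0, 50, 50)
print(round(d, 3), round(cohens_d_to_r(d, 50, 50), 3))
```

Whichever conversions are used, the standard error has to be converted alongside the effect size so that each study keeps an appropriate weight.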
Step 4: Choose a Statistical Model
The fixed-effect model assumes that all studies estimate the same true effect, and that variation between study results is due entirely to sampling error. This model is appropriate when studies are highly similar in design and population. The random-effects model assumes that the true effect varies across studies due to genuine differences in populations, interventions, or contexts, and estimates both the average effect and the variance between studies. Random-effects models are typically more conservative, producing wider confidence intervals whenever between-study variance is detected, and are generally preferred when any meaningful heterogeneity is expected.
The choice of model affects the weights assigned to each study. Under fixed-effect models, each study is weighted by the inverse of its sampling variance, so larger, more precise studies dominate the pooled estimate. Under random-effects models, the estimated between-study variance is added to every study's sampling variance before the weights are computed. Because this extra term is the same for all studies, the weights become more balanced: small studies gain relative weight and large studies lose some.
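The difference between the two models can be seen in a short sketch of inverse-variance pooling. The random-effects version below uses the DerSimonian-Laird estimator of the between-study variance, which is one common choice among several; the function names and the two-study data set are made up for illustration.

```python
import math


def pool_fixed(effects, ses):
    """Fixed-effect pooling: weight each study by 1 / se^2."""
    weights = [1 / se**2 for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1 / total), [w / total for w in weights]


def dersimonian_laird_tau2(effects, ses):
    """Moment estimate of the between-study variance, truncated at zero."""
    weights = [1 / se**2 for se in ses]
    fixed_estimate, _, _ = pool_fixed(effects, ses)
    q = sum(w * (y - fixed_estimate) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    return max(0.0, (q - (len(effects) - 1)) / c)


def pool_random(effects, ses):
    """Random-effects pooling: weight each study by 1 / (se^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(effects, ses)
    weights = [1 / (se**2 + tau2) for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1 / total), [w / total for w in weights]


# Hypothetical data: one large, precise study and one small, imprecise study.
effects, ses = [0.2, 0.8], [0.1, 0.2]
print("fixed :", pool_fixed(effects, ses))
print("random:", pool_random(effects, ses))
```

With these made-up inputs the larger study carries 80 percent of the fixed-effect weight but only a little over half of the random-effects weight, and the random-effects standard error is noticeably larger.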
Step 5: Assess Heterogeneity and Publication Bias
Heterogeneity, the variation in effect sizes across studies, is quantified using statistics like Cochran's Q (which tests whether heterogeneity is statistically significant) and I-squared (which estimates the percentage of total variation across studies that is due to genuine differences rather than chance). An I-squared value above 50 percent is commonly read as substantial heterogeneity, and a value above 75 percent as considerable heterogeneity, although these thresholds are rough guides rather than strict cutoffs. When heterogeneity is high, subgroup analyses and meta-regression can explore whether specific study characteristics (such as study quality, sample characteristics, or intervention dosage) explain the variation.
Publication bias is assessed using funnel plots, which graph each study's effect size against its precision. In the absence of bias, the plot should be symmetrical. Asymmetry suggests that small studies with non-significant results may be missing from the evidence base, although it can also arise from genuine heterogeneity or from differences in study quality. Egger's regression test provides a formal test of asymmetry, and trim-and-fill analysis estimates how the pooled effect would change if the apparently missing studies were added; both complement visual inspection. Sensitivity analyses that exclude outlier studies or low-quality studies help assess the robustness of the pooled estimate.
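Both statistics follow directly from the inverse-variance weights, as the minimal sketch below shows. The function name and data are illustrative. The p-value for Q is omitted because the standard library has no chi-square distribution; it could be obtained from a statistics package, for example with scipy.stats.chi2.sf(q, df) if SciPy is available.

```python
def heterogeneity(effects, ses):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of Q in excess of its expected value under
    # homogeneity (df), truncated at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical effect sizes and standard errors.
q, df, i2 = heterogeneity([0.2, 0.8], [0.1, 0.2])
print(f"Q = {q:.2f} on {df} df, I-squared = {i2:.1f}%")
```

With only a handful of studies, Q has low power and I-squared is estimated imprecisely, so both are best read alongside the forest plot rather than in isolation.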
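The regression behind Egger's test is simple enough to sketch by hand: each study's standardized effect (effect divided by standard error) is regressed on its precision (one divided by standard error), and an intercept far from zero signals funnel-plot asymmetry. The function below is an illustrative ordinary-least-squares version with a made-up name and data; it returns the intercept and its standard error and leaves the t-test p-value to a statistics package.

```python
import math


def egger_intercept(effects, ses):
    """Return (intercept, se_intercept) of Egger's regression (needs 3+ studies)."""
    z = [y / se for y, se in zip(effects, ses)]  # standardized effects
    x = [1 / se for se in ses]                   # precisions
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    residual_ss = sum(
        (zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z)
    )
    s2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(s2 * (1 / n + mean_x**2 / sxx))
    return intercept, se_intercept


# Hypothetical data in which smaller studies (larger SE) show larger effects.
effects = [0.30, 0.50, 0.70, 0.90]
ses = [0.10, 0.20, 0.30, 0.40]
print(egger_intercept(effects, ses))
```

The intercept divided by its standard error is compared against a t distribution with n - 2 degrees of freedom. The test has little power when few studies are available, so a non-significant result does not rule out bias.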
Presenting Results
Meta-analysis results are typically displayed in a forest plot, which shows each study's effect size as a square (sized in proportion to its weight), its confidence interval as a horizontal line, and the pooled estimate as a diamond at the bottom. Forest plots provide a clear visual summary of the evidence, showing both the individual study contributions and the overall conclusion. The pooled effect should be reported with its confidence interval, significance level, and measures of heterogeneity.
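The numbers reported for the pooled effect can be derived from the estimate and its standard error. The sketch below assumes a normal approximation (a Wald-type interval and z-test), which is the conventional default but not the only option; the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def summarize_pooled(estimate, se, level=0.95):
    """Return the confidence interval, z statistic, and two-sided p-value."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - (1 - level) / 2)
    z = estimate / se
    return {
        "ci_low": estimate - z_crit * se,
        "ci_high": estimate + z_crit * se,
        "z": z,
        "p_value": 2 * (1 - normal.cdf(abs(z))),
    }


# Hypothetical pooled standardized mean difference and standard error.
summary = summarize_pooled(0.5, 0.25)
print({key: round(value, 4) for key, value in summary.items()})
```

For ratio measures pooled on the log scale, the estimate and both interval limits are exponentiated before reporting.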
Advanced Meta-Analytic Techniques
Network meta-analysis (also called mixed-treatment comparison) extends traditional meta-analysis by comparing multiple treatments simultaneously, even when some pairs of treatments have never been directly compared in a head-to-head trial. By combining direct evidence (from trials that compared treatments A and B) with indirect evidence (inferred from trials comparing A to C and B to C), network meta-analysis produces a ranking of all available treatments and estimates the probability that each is the best option. This technique is particularly valuable for clinical decision-making when multiple treatment options exist.
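The simplest building block of indirect evidence is the adjusted indirect comparison through a common comparator, sketched below. It is not a full network meta-analysis, which fits all comparisons jointly, and it assumes the A versus C and B versus C trials are similar enough in the characteristics that modify treatment effects. The function name and inputs are illustrative.

```python
import math


def indirect_comparison(effect_ac, se_ac, effect_bc, se_bc):
    """Estimate A versus B from A versus C and B versus C results.

    Effects must be on an additive scale (for example mean differences
    or log odds ratios) and come from independent sets of trials.
    """
    effect_ab = effect_ac - effect_bc
    # Variances add because the two estimates are independent.
    se_ab = math.sqrt(se_ac**2 + se_bc**2)
    return effect_ab, se_ab


# Hypothetical pooled results against a shared control C.
effect_ab, se_ab = indirect_comparison(-0.5, 0.3, -0.2, 0.4)
print(f"A vs B: {effect_ab:.2f} (SE {se_ab:.2f})")
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is why combining direct and indirect evidence is attractive when both exist.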
Individual participant data (IPD) meta-analysis uses the raw data from each included study rather than relying on published summary statistics. This approach allows more sophisticated analyses, including the examination of treatment effect modifiers at the individual level, standardization of outcome definitions across studies, and proper handling of missing data. IPD meta-analyses are often considered the gold standard for evidence synthesis, but they require cooperation from original study authors and substantial additional effort to obtain, clean, and harmonize data from multiple sources.
Cumulative meta-analysis adds studies one at a time in chronological order and recalculates the pooled estimate after each addition. This technique reveals how the evidence has evolved over time and can identify the point at which sufficient evidence had accumulated to establish a reliable conclusion. Cumulative meta-analyses sometimes reveal that definitive evidence was available years before it was recognized, highlighting the potential waste of resources from redundant studies conducted after the question had already been answered.
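The procedure amounts to re-pooling after each study is added. The sketch below uses fixed-effect inverse-variance pooling for brevity, which is a simplifying assumption since a random-effects model could be refitted at each step instead. The function name and the (year, effect, standard error) tuples are hypothetical.

```python
import math


def cumulative_fixed_effect(studies):
    """Pool studies in chronological order.

    `studies` is an iterable of (year, effect, se) tuples. Returns one
    (year, pooled_estimate, pooled_se) tuple per study added.
    """
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, effect, se in sorted(studies, key=lambda study: study[0]):
        weight = 1 / se**2
        sum_w += weight
        sum_wy += weight * effect
        results.append((year, sum_wy / sum_w, math.sqrt(1 / sum_w)))
    return results


# Hypothetical studies, deliberately listed out of order.
studies = [(2001, 0.2, 0.1), (1999, 0.8, 0.2), (2005, 0.3, 0.15)]
for year, estimate, se in cumulative_fixed_effect(studies):
    print(year, round(estimate, 3), round(se, 3))
```

Reading down the output shows the pooled standard error shrinking as evidence accumulates, which is the pattern a cumulative forest plot displays graphically.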
Meta-analysis software and tools have become increasingly accessible. Free programs such as RevMan (from the Cochrane Collaboration), OpenMeta-Analyst, and the meta and metafor packages in R provide comprehensive functionality for conducting meta-analyses. Commercial software like Comprehensive Meta-Analysis offers user-friendly interfaces for researchers less comfortable with statistical programming. Regardless of which tool is used, transparency about analytical choices and sensitivity analyses showing how results change under different assumptions are essential for credible meta-analytic work.
Meta-analysis combines study results statistically to produce more precise and reliable estimates than any individual study. Its value depends on the quality of the underlying systematic review and the appropriateness of pooling, making transparent methods and rigorous assessment of heterogeneity and bias essential.