
How to identify and deal with outliers in meta-analysis

How to identify and deal with outliers in meta-analysis, using R.

One of the first steps when analysing primary data is to visualise your results. This can help identify unusual trends, outliers, and data points that have a disproportionate influence on your results, which might not be clear by looking at summary statistics alone.

To demonstrate, let's have a look at the thirteen datasets in the datasauRus R package, each of which contains x- and y-values that produce the same means and standard deviations, and roughly the same correlation coefficients.

install.packages("datasauRus")
library(datasauRus)

if(requireNamespace("dplyr")){
  suppressPackageStartupMessages(library(dplyr))
  # summary statistics for each of the thirteen datasets
  datasaurus_dozen %>% 
    group_by(dataset) %>% 
    summarize(
      mean_x    = mean(x),
      mean_y    = mean(y),
      std_dev_x = sd(x),
      std_dev_y = sd(y),
      corr_x_y  = cor(x, y)
    )
}

This code will generate the following output:

# A tibble: 13 x 6
   dataset    mean_x mean_y std_dev_x std_dev_y corr_x_y
   <chr>       <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
 1 away         54.3   47.8      16.8      26.9  -0.0641
 2 bullseye     54.3   47.8      16.8      26.9  -0.0686
 3 circle       54.3   47.8      16.8      26.9  -0.0683
 4 dino         54.3   47.8      16.8      26.9  -0.0645
 5 dots         54.3   47.8      16.8      26.9  -0.0603
 6 h_lines      54.3   47.8      16.8      26.9  -0.0617
 7 high_lines   54.3   47.8      16.8      26.9  -0.0685
 8 slant_down   54.3   47.8      16.8      26.9  -0.0690
 9 slant_up     54.3   47.8      16.8      26.9  -0.0686
10 star         54.3   47.8      16.8      26.9  -0.0630
11 v_lines      54.3   47.8      16.8      26.9  -0.0694
12 wide_lines   54.3   47.8      16.8      26.9  -0.0666
13 x_shape      54.3   47.8      16.8      26.9  -0.0656

Next, let's visualise these x-y relationships via a series of scatterplots:

if(requireNamespace("ggplot2")){
  library(ggplot2)
  library(viridis)
  dp <- ggplot(datasaurus_dozen, aes(x=x, y=y, colour=dataset))+
        geom_point()+
        theme_void()+
        theme(legend.position = "none")+
        facet_wrap(~dataset, ncol=3)
  dp + scale_colour_viridis_d(option = "plasma")
}

[Figure: faceted scatterplots of the thirteen datasaurus_dozen datasets]

Despite having near-identical summary statistics, the shape of the data is wildly different between datasets (and if you look carefully at the "dino" panel, you'll see why the package is called datasauRus).

Meta-analysis is no different. Typically, a meta-analyst will construct a forest plot to visualise the effect sizes and their variances, which are used to synthesize the data into a summary effect size. Here's an example meta-analysis and forest plot, with the code to construct them, using the dat.molloy2014 dataset from metafor and the meta-analysis and forest plot functions from the meta package.

library(meta)
library(metafor)

dat <- dat.molloy2014

# random-effects meta-analysis of correlations, pooled on the Fisher z scale
datcor <- metacor(ri,
                  ni,
                  data = dat,
                  studlab = paste(authors),
                  method.tau = "REML",
                  comb.random = TRUE,
                  comb.fixed = FALSE,
                  sm = "ZCOR")

meta::forest(datcor, print.I2 = FALSE)

datcor

## OUTPUT WITH SOME INFO OMITTED ##

Number of studies combined: k = 16

                        COR           95%-CI    z  p-value
Random effects model 0.1488 [0.0878; 0.2087] 4.75 < 0.0001

Quantifying heterogeneity:
 tau^2 = 0.0081 [0.0017; 0.0378]; tau = 0.0901 [0.0412; 0.1944]
 I^2 = 60.7% [32.1%; 77.2%]; H = 1.59 [1.21; 2.10]

Test of heterogeneity:
     Q d.f. p-value
 38.16   15  0.0009
 

Looking at the results of the meta-analysis, the random-effects model was statistically significant (p < 0.0001), with a summary effect size estimate of 0.1488. The test of heterogeneity was also statistically significant (p = 0.0009).
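As an aside, because sm = "ZCOR" was specified, the pooling is done on Fisher's z scale and the printed results are back-transformed to correlations. You can check this yourself; a quick sketch, assuming meta's usual slot names (TE.random, lower.random, upper.random):

tanh(datcor$TE.random)     # back-transform Fisher's z to r: ~0.1488
tanh(datcor$lower.random)  # lower bound of the 95% CI: ~0.0878
tanh(datcor$upper.random)  # upper bound of the 95% CI: ~0.2087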

Now let's look at our forest plot. Visual inspection is subjective, of course, but forest plots can flag studies that are worth closer inspection.

[Figure: forest plot of the dat.molloy2014 meta-analysis]

At first glance, no studies stand out in this meta-analysis, but it's always worth taking a more in-depth look for possible outliers and influential studies.

One approach to outlier detection is a "leave-one-out" analysis, which assesses the influence of individual studies by performing a series of meta-analyses, each leaving out one of the studies in the original meta-analysis. The idea is that by removing one study at a time, you can observe how much the results change, which shows the effect that including that particular study has on your results.

It's worth noting that none of the tests within a leave-one-out analysis is definitive, as you're arbitrarily excluding one study per analysis that would ordinarily meet your inclusion criteria. So if your primary meta-analysis was not statistically significant but one of your leave-one-out meta-analyses was significant, you can't make big claims regarding the significance of your finding. Instead, consider this approach a sensitivity test.

Leave-one-out analysis can also help identify sources of heterogeneity. Perhaps there are two studies that seem to be influential—what is it about those studies that make them influential? Maybe they include participants from the same kind of population? Sometimes these factors don't become clear until you see these studies grouped together via leave-one-out analysis.
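metafor makes the basic version of this easy with its leave1out() function, which refits the model k times, omitting one study per fit. A minimal, self-contained sketch (the same escalc()/rma() setup reappears in the next block):

library(metafor)

# convert correlations to Fisher's z and fit a random-effects model
dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat.molloy2014)
res <- rma(yi, vi, data=dat)

# pooled estimate with each study omitted in turn,
# back-transformed from Fisher's z to correlations
leave1out(res, transf=transf.ztor)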

It's straightforward to go further with the metafor R package. Beyond the raw leave-one-out estimates, metafor's influence() function contains a suite of leave-one-out diagnostic tests, summarised in Viechtbauer & Cheung (2010), that you can run to help identify influential studies.

library(metafor)

# convert correlations to Fisher's z and fit a random-effects model
dat <- escalc(measure="ZCOR",
              ri=ri, ni=ni,
              data=dat.molloy2014,
              slab=paste(authors, year, sep=", "))

res <- rma(yi, vi, data=dat)

# suite of leave-one-out influence diagnostics
inf <- influence(res)
inf

# OUTPUT #

                          rstudent  dffits cook.d  cov.r 
Axelsson et al., 2009       0.2918  0.0485 0.0025 1.1331 
Axelsson et al., 2011       0.1196 -0.0031 0.0000 1.2595 
Bruce et al., 2010          1.2740  0.2595 0.0660 0.9942 
Christensen et al., 1999    1.4711  0.3946 0.1439 0.9544 
Christensen & Smith, 1995   0.8622  0.1838 0.0339 1.0505 
Cohen et al., 2004         -0.9795 -0.2121 0.0455 1.0639 
Dobbels et al., 2005        0.2177  0.0296 0.0010 1.1740 
Ediger et al., 2007        -0.9774 -0.3120 0.1001 1.1215 
Insel et al., 2006          0.7264  0.1392 0.0195 1.0561 
Jerant et al., 2011        -1.8667 -0.5861 0.2198 0.8502 
Moran et al., 1997         -1.4985 -0.2771 0.0756 1.0073 
O'Cleirigh et al., 2007     1.8776  0.4918 0.2148 0.8819 
Penedo et al., 2003        -1.1892 -0.2939 0.0859 1.0550 
Quine et al., 2012         -0.0020 -0.0423 0.0021 1.2524 
Stilley et al., 2004        0.8066  0.2126 0.0459 1.0907 
Wiebe & Christensen, 1997  -0.7160 -0.1656 0.0280 1.0853 
                          tau2.del  QE.del    hat  weight 
Axelsson et al., 2009       0.0091 37.7109 0.0568  5.6776 
Axelsson et al., 2011       0.0100 36.7672 0.1054 10.5396 
Bruce et al., 2010          0.0075 35.3930 0.0364  3.6432 
Christensen et al., 1999    0.0068 33.5886 0.0562  5.6195 
Christensen & Smith, 1995   0.0082 36.5396 0.0441  4.4069 
Cohen et al., 2004          0.0084 37.1703 0.0411  4.1094 
Dobbels et al., 2005        0.0094 37.6797 0.0714  7.1362 
Ediger et al., 2007         0.0084 36.1484 0.0889  8.8886 
Insel et al., 2006          0.0083 37.0495 0.0379  3.7886 
Jerant et al., 2011         0.0047 25.0661 0.1058 10.5826 
Moran et al., 1997          0.0077 35.6617 0.0369  3.6922 
O'Cleirigh et al., 2007     0.0059 31.9021 0.0511  5.1150 
Penedo et al., 2003         0.0080 36.3291 0.0587  5.8732 
Quine et al., 2012          0.0100 37.7339 0.0998  9.9778 
Stilley et al., 2004        0.0083 35.8385 0.0684  6.8403 
Wiebe & Christensen, 1997   0.0087 37.7017 0.0411  4.1094 
                             dfbs inf 
Axelsson et al., 2009      0.0481     
Axelsson et al., 2011     -0.0032     
Bruce et al., 2010         0.2623     
Christensen et al., 1999   0.3994     
Christensen & Smith, 1995  0.1837     
Cohen et al., 2004        -0.2112     
Dobbels et al., 2005       0.0296     
Ediger et al., 2007       -0.3128     
Insel et al., 2006         0.1387     
Jerant et al., 2011       -0.5430     
Moran et al., 1997        -0.2791     
O'Cleirigh et al., 2007    0.5059     
Penedo et al., 2003       -0.2941     
Quine et al., 2012        -0.0434     
Stilley et al., 2004       0.2125     
Wiebe & Christensen, 1997 -0.1642 

This analysis also provides a classification of what's considered influential. Look for an asterisk in the last column of the output, called 'inf' (this particular analysis had no influential studies, so there are no asterisks here). See the metafor documentation for the cut-offs used, but here's an important point from that documentation worth emphasising:

Note that the chosen cut-offs are (somewhat) arbitrary. Substantively informed judgment should always be used when examining the influence of each study on the results.
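If you'd rather pull those flags out programmatically than scan the printed table, the influence object stores them. A small sketch, assuming the is.infl logical vector documented for metafor's influence() method:

# which studies were flagged as influential by metafor's cut-offs?
res$slab[inf$is.infl]   # empty here, since no study was flagged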

You can visualise these diagnostics with plot(inf). The dmetar package makes much the same plot (with slightly different names for the influence tests):

# dmetar is installed from GitHub via devtools
library(devtools)
devtools::install_github("MathiasHarrer/dmetar")
library(dmetar)

inf.analysis <- InfluenceAnalysis(x = datcor, random = TRUE)

plot(inf.analysis, "influence")

[Figure: dmetar influence diagnostics plot]

If you want a nice visualisation for your leave-one-out analysis, you can pair the 'meta' and 'dmetar' packages:


library(dmetar)
library(meta)
library(metafor)

dat <- dat.molloy2014 

datcor <- metacor(ri,
                  ni,
                  data = dat,
                  studlab = paste(authors),
                  method.tau = "REML",
                  comb.random = TRUE,
                  comb.fixed = FALSE,
                  sm = "ZCOR")

inf.analysis <- InfluenceAnalysis(x = datcor,
                                  random = TRUE)
                                 
plot(inf.analysis, "es")

[Figure: leave-one-out forest plot, sorted by effect size]

This plot is like a forest plot, but instead of visualising the effect size and variance for each study in each row, it visualises the summary effect sizes for meta-analyses without the study named in each row. The original summary effect size (including all studies) is shown as a dotted vertical line, and the 95% confidence interval of the original meta-analysis is shown by the green bounds. In this example, there isn't a big shift between the smallest correlation at the top and the largest correlation at the bottom.
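If you'd like to build this kind of plot with metafor alone, you can pass the leave-one-out estimates straight to the default forest() method. A rough sketch, reusing the res model object from the influence analysis above:

l1o <- leave1out(res)

# forest plot of the pooled estimate with each study removed,
# back-transformed from Fisher's z to correlations
forest(l1o$estimate, sei=l1o$se, slab=res$slab,
       transf=transf.ztor, refline=transf.ztor(coef(res)),
       xlab="Pooled correlation with study omitted")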

Outliers based on effect size alone are fairly easy to eyeball from a forest plot; an outlier in terms of heterogeneity, however, is a little trickier to identify.

To better understand studies that are influential in terms of heterogeneity, you can also order the plot by I2, one measure of heterogeneity. While a single study's effect size may not have a large influence on the summary effect size, it could still have an impact on heterogeneity.

plot(inf.analysis, "I2")

[Figure: leave-one-out forest plot, sorted by I2]

Removing the Jerant et al study seems to have an appreciable effect on reducing study heterogeneity. To explore this more closely, we can construct a Baujat plot, which plots each study's contribution to overall heterogeneity against its influence on the pooled result.

plot(inf.analysis, "baujat")

[Figure: Baujat plot of the dat.molloy2014 meta-analysis]

Jerant et al is really sticking out now: it has almost three times more influence on the pooled result than any other study, and a comparatively high overall heterogeneity contribution. If we look back at our leave-one-out plot ordered by correlation, removing Jerant et al leads to the largest summary effect size, but not the largest by much. This indicates that Jerant et al is pulling down the overall summary effect size estimate, but not by a large amount.
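metafor can also draw a Baujat plot directly from the model object, if you'd rather skip dmetar for this step:

# Baujat plot straight from the rma model object fitted earlier
baujat(res)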

What do you do if your analysis identifies potential outlier studies?

I don't think outliers should be removed from your primary analysis, unless they're implausible.

What do I mean by implausible?

To give a recent example, a man in the UK was put on the priority list for a COVID vaccine due to morbid obesity. Thinking this may have been a mistake, as he considered his weight relatively normal, he followed it up and discovered that the local authorities had registered him with a BMI of 28,000, because his height of 6ft 2ins had been recorded as 6.2cm. As a BMI of 40 or over is considered 'morbidly obese', a BMI of 28,000 is obviously implausible.
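You can see how this happens from the BMI formula (weight in kilograms divided by height in metres squared); the weight below is a hypothetical figure chosen purely to illustrate:

weight <- 108       # hypothetical weight in kg
weight / 1.88^2     # height of 6ft 2ins (~1.88 m): BMI of ~31
weight / 0.062^2    # height recorded as 6.2 cm: BMI of ~28,000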

Leave-one-out analyses are better framed as robustness checks and as a way to identify sources of study heterogeneity that you may have missed otherwise. You've set your inclusion/exclusion criteria for your meta-analysis for a reason, so if a study fulfills criteria, then it should be included.

Outlier analyses are also a good check to make sure that you have entered numbers correctly. A common reason for mistakes in meta-analysis is that standard error and standard deviation get mixed up. Of course, this should be carefully checked for all studies, but outlier analysis is a good backstop, just in case.
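The relationship between the two is simple, which is exactly why the mix-up is so easy to make; a quick illustration:

sd_val <- 12
n      <- 36
se_val <- sd_val / sqrt(n)   # SE = SD / sqrt(n) = 2
# entering the SD (12) where an SE belongs makes the study look six
# times less precise; the reverse mistake makes it look six times
# more precise, and either error can turn a study into an outlier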

It's worthwhile reporting your leave-one-out analysis, regardless of whether you discover outliers. If your result hangs on the inclusion of a single study, then your findings aren't terribly robust and this should be made clear in the discussion of your results.