Meta-Analysis Random Effects#
Random-effects meta-analysis accounts for variability in effect sizes across studies by assuming each study estimates a different effect drawn from a shared distribution, thus modeling both within-study error and between-study heterogeneity.
Graphical Summary#

Key Formula#
In random-effects meta-analysis, we assume that the true effect sizes vary across studies. The weighted mean effect size is calculated as:

\[\hat{\beta} = \frac{\sum_{k=1}^{K} w_k^* \hat{\beta}_k}{\sum_{k=1}^{K} w_k^*}\]
Where:
- \(\hat{\beta}\) is the combined effect estimate across all studies
- \(\hat{\beta}_k\) is the effect estimate from study \(k\)
- \(w_k^* = \frac{1}{\text{SE}_k^2 + \tau^2}\) is the random-effects weight for study \(k\)
- \(\tau^2\) is the between-study variance (heterogeneity)
- \(K\) is the number of studies
The key difference from the fixed-effects model is that the weights now include \(\tau^2\), which accounts for true heterogeneity between studies.
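As a minimal numerical sketch (the effect estimates, standard errors, and \(\tau^2\) below are made-up illustrative values, not from any real study), the formula can be applied directly to summary statistics:

# Hypothetical summary statistics from three studies (illustrative values only)
beta_hat <- c(0.8, 1.2, 1.0)    # per-study effect estimates
se       <- c(0.10, 0.15, 0.20) # per-study standard errors
tau2     <- 0.05                # assumed between-study variance

w_star      <- 1 / (se^2 + tau2)                    # random-effects weights
beta_pooled <- sum(w_star * beta_hat) / sum(w_star) # weighted mean effect
se_pooled   <- sqrt(1 / sum(w_star))                # SE of the pooled effect
c(beta_pooled, se_pooled)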
Technical Details#
Heterogeneity#
Heterogeneity refers to the variability in effect sizes across different studies beyond what we’d expect from random sampling error alone. In other words, are the studies telling us the same story, or are they finding genuinely different effects? Common sources of heterogeneity include differences in genetic ancestry, study design or measurement methods, and environmental context.
When heterogeneity is present, we assume each study estimates its own true effect drawn from a common distribution.
Assumption: the true effects follow a common distribution

\[\beta_k \sim N(\beta, \tau^2)\]
Where:
- \(\beta\) = mean effect across all possible studies (what we want to estimate)
- \(\tau^2\) = between-study variance (how much true effects vary across studies)
- Each study \(k\) has its own true effect \(\beta_k\)
Measuring Heterogeneity#
\(I^2\) statistic: the most intuitive measure of heterogeneity; it tells us what percentage of the observed variation comes from real differences between studies rather than from random chance.
Interpretation:
- \(I^2 = 0\%\): Studies are consistent - variation is just due to random sampling
- \(I^2 = 25\%\): Low heterogeneity - studies are mostly similar
- \(I^2 = 50\%\): Moderate heterogeneity - some real differences between studies
- \(I^2 = 75\%\): High heterogeneity - studies are finding quite different effects
Example: If \(I^2 = 60\%\), this means 60% of the variation we see across studies reflects real differences in effect sizes, while only 40% is due to random sampling error.
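For reference, both quantities used in the code below follow the standard definitions: Cochran’s \(Q\) statistic and the \(I^2\) derived from it, where \(w_k = 1/\text{SE}_k^2\) are the fixed-effects weights and \(\hat{\beta}_F\) is the fixed-effects pooled estimate:

\[Q = \sum_{k=1}^{K} w_k (\hat{\beta}_k - \hat{\beta}_F)^2, \qquad I^2 = \max\left(0,\ \frac{Q - (K - 1)}{Q}\right) \times 100\%\]

Under the null hypothesis of no heterogeneity, \(Q\) follows a \(\chi^2\) distribution with \(K - 1\) degrees of freedom, which is how the heterogeneity p-value is obtained.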
Comparison Between Fixed and Random Effects#
| Aspect | Fixed Effect | Random Effect |
|---|---|---|
| Assumption | All studies estimate the same true effect | Studies estimate different true effects drawn from a distribution |
| Weight Formula | \(\frac{1}{\text{SE}_k^2}\) (only within-study variance) | \(\frac{1}{\text{SE}_k^2 + \tau^2}\) (within-study + between-study variance) |
| When to Use | Studies are very similar in design and population | Significant heterogeneity between studies |
| Results | Narrower confidence intervals; estimates the effect in the specific populations studied | Wider confidence intervals; estimates the average effect across broader populations |
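To make the weight formulas in the table above concrete, here is a minimal sketch with made-up standard errors and an assumed \(\tau^2\); adding \(\tau^2\) to every study’s variance makes the weights more even, so large studies dominate less:

# Hypothetical standard errors for a small and a large study (illustrative values)
se   <- c(0.20, 0.05)
tau2 <- 0.10  # assumed between-study variance

w_fixed  <- 1 / se^2            # fixed-effect weights
w_random <- 1 / (se^2 + tau2)   # random-effects weights

round(w_fixed / sum(w_fixed), 2)   # 0.06 0.94 - the large study dominates
round(w_random / sum(w_random), 2) # 0.42 0.58 - weights are much more even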
Example#
We simulate two studies with different true effect sizes (drawn from a distribution with τ² = 0.3) to demonstrate random-effects meta-analysis. This scenario mimics real heterogeneity where populations differ in genetic backgrounds or environmental factors, making the random-effects model more appropriate than the fixed-effects model.
Setup#
rm(list=ls())
set.seed(18)
# Simulate 2 diverse cohorts where TRUE EFFECT SIZES are drawn from a distribution
K <- 2 # Number of studies
N <- c(5000, 8000) # Different sample sizes
# Different MAFs reflecting population diversity
mafs <- c(0.25, 0.40)
# RANDOM EFFECTS MODEL: True effect sizes drawn from distribution
# This is the key difference - betas are RANDOM, not fixed
beta_mean <- 1.0 # Mean effect size across all possible studies
tau_squared_true <- 0.3 # True between-study variance
# Draw true effect sizes from distribution
true_betas <- rnorm(K, mean = beta_mean, sd = sqrt(tau_squared_true))
Then we generate the data for each study and create a summary table.
# Generate data for each study
studies_data <- list()
for(i in 1:K) {
  # Generate genotypes
  genotypes <- rbinom(N[i], 2, mafs[i])
  # Generate phenotypes using the RANDOM true effect
  phenotypes <- true_betas[i] * genotypes + rnorm(N[i], 0, 3)
  # Run regression
  lm_result <- lm(phenotypes ~ genotypes)
  # Store results
  studies_data[[i]] <- list(
    study_id = i,
    n = N[i],
    maf = mafs[i],
    true_beta = true_betas[i],
    observed_beta = coef(lm_result)["genotypes"],
    se = summary(lm_result)$coefficients["genotypes", "Std. Error"]
  )
}
# Create summary table
studies <- data.frame(
  Study = 1:K,
  N = sapply(studies_data, function(x) x$n),
  MAF = sapply(studies_data, function(x) x$maf),
  True_Beta = sapply(studies_data, function(x) x$true_beta),
  Observed_Beta = sapply(studies_data, function(x) x$observed_beta),
  SE = sapply(studies_data, function(x) x$se)
)
studies$P_Value <- 2 * pnorm(-abs(studies$Observed_Beta / studies$SE))
studies
| Study | N | MAF | True_Beta | Observed_Beta | SE | P_Value |
|---|---|---|---|---|---|---|
| <int> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | 5000 | 0.25 | 1.507443 | 1.507900 | 0.06967405 | 7.1981e-104 |
| 2 | 8000 | 0.40 | 1.998400 | 1.973478 | 0.04827749 | 0.0000e+00 |
Heterogeneity#
We first assess the heterogeneity for this meta-analysis:
# Calculate inverse variance weights for heterogeneity testing
w <- 1 / studies$SE^2
studies$Weight <- w / sum(w)
# Calculate naive weighted average (fixed-effects estimate)
beta_naive <- sum(studies$Observed_Beta * w) / sum(w)
# Calculate Q statistic (test for heterogeneity)
Q <- sum(w * (studies$Observed_Beta - beta_naive)^2)
df <- K - 1
p_heterogeneity <- 1 - pchisq(Q, df)
# I^2 statistic (percentage of variation due to heterogeneity)
I_squared <- max(0, (Q - df) / Q) * 100
cat("Heterogeneity Statistics:\n")
cat("Q statistic:", round(Q, 3), "\n")
cat("p-value for heterogeneity:", round(p_heterogeneity, 4), "\n")
cat("I^2 statistic:", round(I_squared, 1), "%\n\n")
if(I_squared > 25) {
  cat("HETEROGENEITY DETECTED - Random-effects model is appropriate!\n")
} else {
  cat("Low heterogeneity detected\n")
}
Heterogeneity Statistics:
Q statistic: 30.168
p-value for heterogeneity: 0
I^2 statistic: 96.7 %
HETEROGENEITY DETECTED - Random-effects model is appropriate!
Meta-analysis Random Effects#
Now we conduct the random-effects meta-analysis of the two studies:
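The between-study variance is estimated with the DerSimonian-Laird (DL) method, which the code below implements; each study is then re-weighted by its total (within-study plus between-study) variance:

\[\hat{\tau}^2 = \max\left(0,\ \frac{Q - (K - 1)}{\sum_{k} w_k - \frac{\sum_{k} w_k^2}{\sum_{k} w_k}}\right), \qquad w_k = \frac{1}{\text{SE}_k^2}\]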
# Estimate between-study variance (tau^2) using DerSimonian-Laird method
sum_w <- sum(w)
sum_w_squared <- sum(w^2)
tau_squared_est <- max(0, (Q - df) / (sum_w - sum_w_squared/sum_w))
# Calculate random-effects weights
w_random <- 1 / (studies$SE^2 + tau_squared_est)
studies$Weight_Random <- w_random / sum(w_random)
# Random-effects estimate
beta_random <- sum(studies$Observed_Beta * w_random) / sum(w_random)
se_random <- sqrt(1 / sum(w_random))
z_random <- beta_random / se_random
p_random <- 2 * pnorm(-abs(z_random))
cat("Random-Effects Meta-Analysis Results:\n")
results <- data.frame(
  Estimate = round(beta_random, 4),
  SE = round(se_random, 4),
  Z_score = round(z_random, 4),
  P_value = format(p_random, scientific = TRUE, digits = 3)
)
results
Random-Effects Meta-Analysis Results:
| Estimate | SE | Z_score | P_value |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> |
| 1.7434 | 0.2328 | 7.4897 | 6.9e-14 |
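As an optional cross-check (not part of the analysis above), the same DerSimonian-Laird fit can be reproduced with the metafor package, assuming it is installed:

# Optional cross-check with the metafor package (assumes metafor is installed)
library(metafor)
rma(yi = studies$Observed_Beta, sei = studies$SE, method = "DL")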
Meta-analysis Fixed Effects#
We also perform a fixed-effects meta-analysis for comparison:
# Also perform fixed-effects meta-analysis for comparison
beta_fixed <- sum(studies$Observed_Beta * w) / sum(w)
se_fixed <- sqrt(1 / sum(w))
z_fixed <- beta_fixed / se_fixed
p_fixed <- 2 * pnorm(-abs(z_fixed))
cat("\nFixed-Effects Meta-Analysis Results:\n")
results_fixed <- data.frame(
  Estimate = round(beta_fixed, 4),
  SE = round(se_fixed, 4),
  Z_score = round(z_fixed, 4),
  P_value = format(p_fixed, scientific = TRUE, digits = 3)
)
results_fixed
Fixed-Effects Meta-Analysis Results:
| Estimate | SE | Z_score | P_value |
|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <chr> |
| 1.8225 | 0.0397 | 45.9261 | 0e+00 |
Comparison of Results#
# Compare the two approaches
comparison <- data.frame(
  Model = c("Fixed-Effects", "Random-Effects"),
  Estimate = c(round(beta_fixed, 4), round(beta_random, 4)),
  SE = c(round(se_fixed, 4), round(se_random, 4)),
  CI_Width = c(round(1.96 * se_fixed * 2, 4), round(1.96 * se_random * 2, 4)),
  P_value = c(format(p_fixed, scientific = TRUE, digits = 3),
              format(p_random, scientific = TRUE, digits = 3))
)
comparison
| Model | Estimate | SE | CI_Width | P_value |
|---|---|---|---|---|
| <chr> | <dbl> | <dbl> | <dbl> | <chr> |
| Fixed-Effects | 1.8225 | 0.0397 | 0.1556 | 0e+00 |
| Random-Effects | 1.7434 | 0.2328 | 0.9125 | 6.9e-14 |
With high heterogeneity (\(I^2 = 96.7\%\)), the fixed-effects and random-effects models yield different results:
- Fixed-effects (\(\hat{\beta}=1.82\), SE = 0.04): assumes a single shared true effect, producing a smaller SE
- Random-effects (\(\hat{\beta}=1.74\), SE = 0.23): accounts for between-study variance (\(\tau^2\)), producing wider confidence intervals
The random-effects estimate represents the average effect across populations, while the wider SE appropriately reflects both within-study and between-study uncertainty.
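As a small follow-up sketch (using the objects defined above), the 95% confidence intervals behind this comparison can be computed directly from the pooled estimates and their standard errors:

# 95% confidence intervals for the two pooled estimates
ci_fixed  <- beta_fixed  + c(-1, 1) * 1.96 * se_fixed   # roughly 1.74 to 1.90
ci_random <- beta_random + c(-1, 1) * 1.96 * se_random  # roughly 1.29 to 2.20
round(ci_fixed, 3)
round(ci_random, 3)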