Skip to contents

Overview

The OTTERS pipeline (Zhang et al. 2024) performs TWAS in two stages:

  1. Stage I (otters_weights): Train eQTL weights using multiple summary-statistics-based (RSS) methods.
  2. Stage II (otters_association): Compute per-method TWAS z-scores and combine p-values across methods using ACAT.

Loading real LD data

In real analyses, LD matrices are loaded from PLINK2 pgen files or pre-computed LD blocks using pecotmr’s data infrastructure. Here we show the loading pattern using chr22 data from the ADSP reference panel (the code is not evaluated here since the data paths are machine-specific):

library(pecotmr)

# LD metadata file points to PLINK2 pgen/pvar/psam files
ld_meta <- "~/Documents/ADSP_chr22_sketch/ld_meta_chr22.tsv"

# Define gene regions of interest
gene_regions <- c(
  "chr22:20000000-21000000",
  "chr22:25000000-26000000",
  "chr22:30000000-31000000"
)

# Option 1: Load LD on the fly via ld_loader() (memory-efficient)
loader <- ld_loader(ld_meta_path = ld_meta, regions = gene_regions)
R_gene1 <- loader(1)  # loads LD for first gene region on demand

# Option 2: Pre-load all blocks into a list
R_list <- lapply(gene_regions, function(reg) {
  ld <- load_LD_matrix(ld_meta, region = reg)
  ld$LD_matrix
})
loader_from_list <- ld_loader(R_list = R_list)

# Option 3: Load genotype matrix and compute LD with shrinkage
ld_data <- load_LD_matrix(ld_meta, region = gene_regions[1],
                          return_genotype = TRUE)
X <- ld_data$LD_matrix
R_shrunk <- compute_LD(X, shrinkage = 0.5)  # lassosum-style regularization

# eQTL summary stats are loaded similarly:
# eqtl <- load_rss_data(sumstat_path, column_file_path, region, n)

Simulating data for demonstration

We simulate a realistic scenario: load genotype matrices from real data (or simulate them), compute LD, then simulate eQTL effects and GWAS z-scores on top.

library(pecotmr)
set.seed(2024)

# Simulate 3 gene regions with realistic LD
# (In practice, these would come from ld_loader() as shown above)
n_eqtl <- 500
n_gwas <- 50000
n_genes <- 3
gene_p <- c(40, 50, 30)  # variants per gene

# Simulate genotype matrices and LD per gene
X_list <- lapply(gene_p, function(p) {
  X <- matrix(rbinom(n_eqtl * p, 2, 0.3), nrow = n_eqtl)
  X <- scale(X); X[is.na(X)] <- 0
  X
})
R_list <- lapply(X_list, cor)

# Create a loader from the pre-loaded list
loader <- ld_loader(R_list = R_list)

# Simulate eQTL effects and GWAS signal per gene
eqtl_sumstats_list <- list()
gwas_z_list <- list()
beta_true_list <- list()

for (g in 1:n_genes) {
  p <- gene_p[g]
  beta_eqtl <- rep(0, p)
  # 2-3 causal eQTLs per gene
  n_causal <- sample(2:3, 1)
  causal <- sort(sample(p, n_causal))
  beta_eqtl[causal] <- rnorm(n_causal, sd = 0.25)
  beta_true_list[[g]] <- beta_eqtl

  # eQTL z-scores
  expr <- X_list[[g]] %*% beta_eqtl + rnorm(n_eqtl)
  eqtl_z <- as.vector(cor(expr, X_list[[g]])) * sqrt(n_eqtl)
  eqtl_sumstats_list[[g]] <- data.frame(z = eqtl_z)

  # GWAS z-scores (gene g affects trait with effect 0.05)
  gwas_z_list[[g]] <- as.numeric(
    R_list[[g]] %*% (beta_eqtl * 0.05 * sqrt(n_gwas))
  ) + rnorm(p)
}

Running the OTTERS pipeline per gene

Default methods and parameters

otters_weights() runs three methods by default, with parameters matching the original OTTERS (Zhang et al. 2024):

Method Key defaults Description
P+T p_thresholds = c(0.001, 0.05) Select SNPs by eQTL p-value; weights = marginal z/sqrt(n)
lassosum s = c(0.2, 0.5, 0.9, 1.0), lambda = 20 values from 1e-4 to 0.1 L1-penalized regression; grid search over shrinkage
PRS-CS phi = 1e-4 (fixed), n_iter = 1000, n_burnin = 500, thin = 5 Bayesian continuous shrinkage; fixed global shrinkage
SDPR iter = 1000, burn = 200, thin = 1 Dirichlet process mixture; no thinning

Process all genes using the loader

all_results <- list()

for (g in 1:n_genes) {
  # Get LD from loader (or could pass R_list[[g]] directly)
  R_g <- loader(g)

  # Stage I: train weights
  weights_g <- otters_weights(
    sumstats = eqtl_sumstats_list[[g]],
    LD = R_g,
    n = n_eqtl
  )

  # Stage II: association test
  twas_g <- otters_association(
    weights = weights_g,
    gwas_z = gwas_z_list[[g]],
    LD = R_g
  )
  twas_g$gene <- paste0("gene_", g)
  all_results[[g]] <- twas_g
}

results_df <- do.call(rbind, all_results)
print(results_df[, c("gene", "method", "twas_z", "twas_pval", "n_snps")])
#>      gene        method   twas_z    twas_pval n_snps
#> 1  gene_1      PT_0.001 4.718430 2.376714e-06      1
#> 2  gene_1       PT_0.05 4.947677 7.510444e-07      2
#> 3  gene_1  lassosum_rss 4.344763 1.394264e-05     40
#> 4  gene_1        prs_cs 4.737833 2.160159e-06     40
#> 5  gene_1          sdpr 4.718923 2.370960e-06     40
#> 6  gene_1 ACAT_combined       NA 1.845938e-06     NA
#> 7  gene_2      PT_0.001 3.911016 9.190882e-05      1
#> 8  gene_2       PT_0.05 3.143661 1.668488e-03      3
#> 9  gene_2  lassosum_rss 1.714494 8.643795e-02     50
#> 10 gene_2        prs_cs 3.822629 1.320362e-04     50
#> 11 gene_2          sdpr 3.910520 9.209771e-05     50
#> 12 gene_2 ACAT_combined       NA 1.670973e-04     NA
#> 13 gene_3      PT_0.001 3.529987 4.155804e-04      2
#> 14 gene_3       PT_0.05 3.235894 1.212622e-03      3
#> 15 gene_3  lassosum_rss 3.008507 2.625348e-03     29
#> 16 gene_3        prs_cs 3.067179 2.160893e-03     30
#> 17 gene_3          sdpr 2.355212 1.851216e-02     30
#> 18 gene_3 ACAT_combined       NA 1.211075e-03     NA

Extending with additional learners

Any function following the *_weights(stat, LD, ...) convention can be added. No code changes needed - methods are dispatched dynamically.

Example: adding mr.ash.rss

mr_ash_rss_weights() (from susieR) uses a flexible mixture-of-normals prior that can capture both sparse and polygenic architectures:

weights <- otters_weights(
  sumstats = eqtl_sumstats_list[[1]],
  LD = R_list[[1]],
  n = n_eqtl,
  methods = list(
    # Default OTTERS methods
    lassosum_rss = list(),
    prs_cs = list(phi = 1e-4, n_iter = 1000, n_burnin = 500, thin = 5),
    sdpr = list(iter = 1000, burn = 200, thin = 1, verbose = FALSE),
    # Additional learner
    mr_ash_rss = list(
      var_y = 1,
      sigma2_e = 0.5,
      s0 = c(0, 0.001, 0.01, 0.1, 0.5),
      w0 = rep(0.2, 5)
    )
  )
)

Example: quick two-method run

weights <- otters_weights(
  sumstats = eqtl_sumstats_list[[1]],
  LD = R_list[[1]],
  n = n_eqtl,
  methods = list(lassosum_rss = list()),
  p_thresholds = c(0.05)
)

Note on data preparation

otters_association() assumes that weights, GWAS z-scores, and LD are aligned to the same variants and allele orientation. For real data, use pecotmr’s QC functions before calling the pipeline:

  • allele_qc() - match alleles between summary stats and LD reference, handle strand flips (A/T, C/G), negate effects for allele-flipped SNPs
  • rss_basic_qc() - full QC pipeline including allele matching, indel filtering, and region-based variant selection

References

  • Zhang et al. (2024). OTTERS: a powerful TWAS framework leveraging summary-level reference data. Nature Communications.
  • Gusev et al. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics.
  • Mak et al. (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology.
  • Ge et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications.
  • Zhou et al. (2023). SDPR: a statistical method for leveraging summary-level data for polygenic risk prediction.
  • Liu & Xie (2020). Cauchy combination test: a powerful test with analytic p-value calculation. JASA.