OTTERS: Omnibus TWAS using multiple RSS methods
Source:vignettes/otters-pipeline.Rmd
otters-pipeline.RmdOverview
The OTTERS pipeline (Zhang et al. 2024) performs TWAS in two stages:
-
Stage I (
otters_weights): Train eQTL weights using multiple summary-statistics-based (RSS) methods. -
Stage II (
otters_association): Compute per-method TWAS z-scores and combine p-values across methods using ACAT.
Loading real LD data
In real analyses, LD matrices are loaded from PLINK2 pgen files or pre-computed LD blocks using pecotmr’s data infrastructure. Here we show the loading pattern using chr22 data from the ADSP reference panel (the code is not evaluated here since the data paths are machine-specific):
library(pecotmr)
# LD metadata file points to PLINK2 pgen/pvar/psam files
ld_meta <- "~/Documents/ADSP_chr22_sketch/ld_meta_chr22.tsv"
# Define gene regions of interest
gene_regions <- c(
"chr22:20000000-21000000",
"chr22:25000000-26000000",
"chr22:30000000-31000000"
)
# Option 1: Load LD on the fly via ld_loader() (memory-efficient)
loader <- ld_loader(ld_meta_path = ld_meta, regions = gene_regions)
R_gene1 <- loader(1) # loads LD for first gene region on demand
# Option 2: Pre-load all blocks into a list
R_list <- lapply(gene_regions, function(reg) {
ld <- load_LD_matrix(ld_meta, region = reg)
ld$LD_matrix
})
loader_from_list <- ld_loader(R_list = R_list)
# Option 3: Load genotype matrix and compute LD with shrinkage
ld_data <- load_LD_matrix(ld_meta, region = gene_regions[1],
return_genotype = TRUE)
X <- ld_data$LD_matrix
R_shrunk <- compute_LD(X, shrinkage = 0.5) # lassosum-style regularization
# eQTL summary stats are loaded similarly:
# eqtl <- load_rss_data(sumstat_path, column_file_path, region, n)Simulating data for demonstration
We simulate a realistic scenario: load genotype matrices from real data (or simulate them), compute LD, then simulate eQTL effects and GWAS z-scores on top.
library(pecotmr)
set.seed(2024)
# Simulate 3 gene regions with realistic LD
# (In practice, these would come from ld_loader() as shown above)
n_eqtl <- 500
n_gwas <- 50000
n_genes <- 3
gene_p <- c(40, 50, 30) # variants per gene
# Simulate genotype matrices and LD per gene
X_list <- lapply(gene_p, function(p) {
X <- matrix(rbinom(n_eqtl * p, 2, 0.3), nrow = n_eqtl)
X <- scale(X); X[is.na(X)] <- 0
X
})
R_list <- lapply(X_list, cor)
# Create a loader from the pre-loaded list
loader <- ld_loader(R_list = R_list)
# Simulate eQTL effects and GWAS signal per gene
eqtl_sumstats_list <- list()
gwas_z_list <- list()
beta_true_list <- list()
for (g in 1:n_genes) {
p <- gene_p[g]
beta_eqtl <- rep(0, p)
# 2-3 causal eQTLs per gene
n_causal <- sample(2:3, 1)
causal <- sort(sample(p, n_causal))
beta_eqtl[causal] <- rnorm(n_causal, sd = 0.25)
beta_true_list[[g]] <- beta_eqtl
# eQTL z-scores
expr <- X_list[[g]] %*% beta_eqtl + rnorm(n_eqtl)
eqtl_z <- as.vector(cor(expr, X_list[[g]])) * sqrt(n_eqtl)
eqtl_sumstats_list[[g]] <- data.frame(z = eqtl_z)
# GWAS z-scores (gene g affects trait with effect 0.05)
gwas_z_list[[g]] <- as.numeric(
R_list[[g]] %*% (beta_eqtl * 0.05 * sqrt(n_gwas))
) + rnorm(p)
}Running the OTTERS pipeline per gene
Default methods and parameters
otters_weights() runs three methods by default, with
parameters matching the original OTTERS (Zhang et al. 2024):
| Method | Key defaults | Description |
|---|---|---|
| P+T | p_thresholds = c(0.001, 0.05) |
Select SNPs by eQTL p-value; weights = marginal z/sqrt(n) |
| lassosum |
s = c(0.2, 0.5, 0.9, 1.0), lambda = 20
values from 1e-4 to 0.1 |
L1-penalized regression; grid search over shrinkage |
| PRS-CS |
phi = 1e-4 (fixed), n_iter = 1000,
n_burnin = 500, thin = 5
|
Bayesian continuous shrinkage; fixed global shrinkage |
| SDPR |
iter = 1000, burn = 200,
thin = 1
|
Dirichlet process mixture; no thinning |
Process all genes using the loader
all_results <- list()
for (g in 1:n_genes) {
# Get LD from loader (or could pass R_list[[g]] directly)
R_g <- loader(g)
# Stage I: train weights
weights_g <- otters_weights(
sumstats = eqtl_sumstats_list[[g]],
LD = R_g,
n = n_eqtl
)
# Stage II: association test
twas_g <- otters_association(
weights = weights_g,
gwas_z = gwas_z_list[[g]],
LD = R_g
)
twas_g$gene <- paste0("gene_", g)
all_results[[g]] <- twas_g
}
results_df <- do.call(rbind, all_results)
print(results_df[, c("gene", "method", "twas_z", "twas_pval", "n_snps")])
#> gene method twas_z twas_pval n_snps
#> 1 gene_1 PT_0.001 4.718430 2.376714e-06 1
#> 2 gene_1 PT_0.05 4.947677 7.510444e-07 2
#> 3 gene_1 lassosum_rss 4.344763 1.394264e-05 40
#> 4 gene_1 prs_cs 4.737833 2.160159e-06 40
#> 5 gene_1 sdpr 4.718923 2.370960e-06 40
#> 6 gene_1 ACAT_combined NA 1.845938e-06 NA
#> 7 gene_2 PT_0.001 3.911016 9.190882e-05 1
#> 8 gene_2 PT_0.05 3.143661 1.668488e-03 3
#> 9 gene_2 lassosum_rss 1.714494 8.643795e-02 50
#> 10 gene_2 prs_cs 3.822629 1.320362e-04 50
#> 11 gene_2 sdpr 3.910520 9.209771e-05 50
#> 12 gene_2 ACAT_combined NA 1.670973e-04 NA
#> 13 gene_3 PT_0.001 3.529987 4.155804e-04 2
#> 14 gene_3 PT_0.05 3.235894 1.212622e-03 3
#> 15 gene_3 lassosum_rss 3.008507 2.625348e-03 29
#> 16 gene_3 prs_cs 3.067179 2.160893e-03 30
#> 17 gene_3 sdpr 2.355212 1.851216e-02 30
#> 18 gene_3 ACAT_combined NA 1.211075e-03 NAExtending with additional learners
Any function following the *_weights(stat, LD, ...)
convention can be added. No code changes needed - methods are dispatched
dynamically.
Example: adding mr.ash.rss
mr_ash_rss_weights() (from susieR) uses a flexible
mixture-of-normals prior that can capture both sparse and polygenic
architectures:
weights <- otters_weights(
sumstats = eqtl_sumstats_list[[1]],
LD = R_list[[1]],
n = n_eqtl,
methods = list(
# Default OTTERS methods
lassosum_rss = list(),
prs_cs = list(phi = 1e-4, n_iter = 1000, n_burnin = 500, thin = 5),
sdpr = list(iter = 1000, burn = 200, thin = 1, verbose = FALSE),
# Additional learner
mr_ash_rss = list(
var_y = 1,
sigma2_e = 0.5,
s0 = c(0, 0.001, 0.01, 0.1, 0.5),
w0 = rep(0.2, 5)
)
)
)Example: quick two-method run
weights <- otters_weights(
sumstats = eqtl_sumstats_list[[1]],
LD = R_list[[1]],
n = n_eqtl,
methods = list(lassosum_rss = list()),
p_thresholds = c(0.05)
)Note on data preparation
otters_association() assumes that weights, GWAS
z-scores, and LD are aligned to the same variants and allele
orientation. For real data, use pecotmr’s QC functions before calling
the pipeline:
-
allele_qc()- match alleles between summary stats and LD reference, handle strand flips (A/T, C/G), negate effects for allele-flipped SNPs -
rss_basic_qc()- full QC pipeline including allele matching, indel filtering, and region-based variant selection
References
- Zhang et al. (2024). OTTERS: a powerful TWAS framework leveraging summary-level reference data. Nature Communications.
- Gusev et al. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics.
- Mak et al. (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology.
- Ge et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications.
- Zhou et al. (2023). SDPR: a statistical method for leveraging summary-level data for polygenic risk prediction.
- Liu & Xie (2020). Cauchy combination test: a powerful test with analytic p-value calculation. JASA.