Enrichment analysis of molecular QTL in genetic variants associated with complex traits

This vignette demonstrates the enrichment analysis part of the pecotmr package, which largely follows from fastENLOC (https://github.com/xqwen/fastenloc) but uses susieR fitted objects as input to estimate priors for use with the coloc package (coloc v5, aka SuSiE-coloc). The main differences are: 1) now enrichment is based on all QTL variants whether or not they are inside signal clusters; 2) Causal QTL are sampled from SuSiE single effects, not signal clusters; 3) Allow a variant to be QTL for not only multiple conditions (eg cell types) but also multiple regions (eg genes).

The example uses de-identified data shipped with the package. All sample names, variant positions, and identifiers are synthetic.

Set up environment and Load data

Load the pecotmr package for this enrichment analysis.

library(pecotmr)

Load fine-mapping results from the package example data and write them to temporary RDS files, since xqtl_enrichment_wrapper expects file paths.

data(gwas_finemapping_example)
data(qtl_finemapping_example)
gwas_path <- tempfile(fileext = ".rds")
qtl_path <- tempfile(fileext = ".rds")
saveRDS(gwas_finemapping_example, gwas_path)
saveRDS(qtl_finemapping_example, qtl_path)
gwas <- gwas_finemapping_example
qtl <- qtl_finemapping_example

Input data requirements

GWAS data requirements

In the GWAS data, the file is stored as a list, with the fine mapping results already stored under the first layer.

names(gwas[[1]])

## [1] "alpha" "pip"   "V"     "sets"

Posterior inclusion probabilities of GWAS data:

plot(gwas[[1]]$pip, ylab = "PIP", xlab = "Variant index", main = "GWAS PIPs", pch = 16, cex = 0.6)

QTL data requirements

In the QTL data, the file is stored as a list and is more complicated. The first layer is named as the molecular trait region ID, followed by different contexts as the second layer. The variant names and fine mapping results can be found under these contexts (variant_names and susie_result_trimmed).

names(qtl)

## [1] "region_1"

names(qtl[["region_1"]])

## [1] "context_1"

names(qtl[["region_1"]][["context_1"]])

## [1] "susie_result_trimmed" "variant_names"

names(qtl[["region_1"]][["context_1"]][["susie_result_trimmed"]])

## [1] "alpha" "pip"   "V"     "sets"

Posterior inclusion probabilities of QTL data:

plot(qtl[["region_1"]][["context_1"]][["susie_result_trimmed"]]$pip,
     ylab = "PIP", xlab = "Variant index", main = "QTL PIPs", pch = 16, cex = 0.6)

Perform enrichment analysis

The input for xqtl_enrichment_wrapper requires the paths for GWAS and QTL fine mapping data. You need to specify the list name of SuSiE results in each dataset as xqtl_finemapping_obj and the list name of variant names as xqtl_varname_obj. For example, if SuSiE results for QTL data are under qtl[["region_1"]][["context_1"]][["susie_result_trimmed"]], then xqtl_finemapping_obj would be c("context_1", "susie_result_trimmed"), and xqtl_varname_obj would be c("context_1", "variant_names") since the variant names are stored directly under context_1. For GWAS data, the fine mapping results and variant names are stored under the first layer directly, so you do not need to specify gwas_finemapping_obj and gwas_varname_obj. However, if your data does not follow this structure, you may need to specify these parameters accordingly.

enrich_res <- xqtl_enrichment_wrapper(
  gwas_files = gwas_path, xqtl_files = qtl_path,
  xqtl_finemapping_obj = c("context_1", "susie_result_trimmed"),
  xqtl_varname_obj = c("context_1", "variant_names")
)

## Warning in compute_qtl_enrichment(gwas_pip = dat$gwas_pip, susie_qtl_regions =
## dat$xqtl_data, : num_gwas is not provided. Estimating pi_gwas from the data.
## Note that this estimate may be biased if the input gwas_pip does not contain
## genome-wide variants.

## Estimated pi_gwas:  0.00141

## Warning in compute_qtl_enrichment(gwas_pip = dat$gwas_pip, susie_qtl_regions =
## dat$xqtl_data, : pi_qtl is not provided. Estimating pi_qtl from the data. Note
## that this estimate may be biased if either 1) the input susie_qtl_regions does
## not have enough data, or 2) the single effects only include variables inside of
## credible sets or signal clusters.

## Estimated pi_qtl:  0.00071

## Fine-mapped GWAS and QTL data loaded successfully for enrichment analysis!
## Proportion of xQTL missing from GWAS variants: 0 in MI round 0
## Proportion of xQTL missing from GWAS variants: 0 in MI round 1
## Proportion of xQTL missing from GWAS variants: 0 in MI round 2
## Proportion of xQTL missing from GWAS variants: 0 in MI round 3
## Proportion of xQTL missing from GWAS variants: 0 in MI round 4
## Proportion of xQTL missing from GWAS variants: 0 in MI round 5
## Proportion of xQTL missing from GWAS variants: 0 in MI round 6
## Proportion of xQTL missing from GWAS variants: 0 in MI round 7
## Proportion of xQTL missing from GWAS variants: 0 in MI round 8
## Proportion of xQTL missing from GWAS variants: 0 in MI round 9
## Proportion of xQTL missing from GWAS variants: 0 in MI round 10
## Proportion of xQTL missing from GWAS variants: 0 in MI round 11
## Proportion of xQTL missing from GWAS variants: 0 in MI round 12
## Proportion of xQTL missing from GWAS variants: 0 in MI round 13
## Proportion of xQTL missing from GWAS variants: 0 in MI round 14
## Proportion of xQTL missing from GWAS variants: 0 in MI round 15
## Proportion of xQTL missing from GWAS variants: 0 in MI round 16
## Proportion of xQTL missing from GWAS variants: 0 in MI round 17
## Proportion of xQTL missing from GWAS variants: 0 in MI round 18
## Proportion of xQTL missing from GWAS variants: 0 in MI round 19
## Proportion of xQTL missing from GWAS variants: 0 in MI round 20
## Proportion of xQTL missing from GWAS variants: 0 in MI round 21
## Proportion of xQTL missing from GWAS variants: 0 in MI round 22
## Proportion of xQTL missing from GWAS variants: 0 in MI round 23
## Proportion of xQTL missing from GWAS variants: 0 in MI round 24
## EM updates completed!

print(enrich_res[[1]])

## $`Alternative (coloc) p1`
## [1] 0.001413425
## 
## $`Alternative (coloc) p12`
## [1] 1.000238e-06
## 
## $`Alternative (coloc) p2`
## [1] 0.0007061743
## 
## $`Effective MI rounds`
## [1] 24
## 
## $`Enrichment (no shrinkage)`
## [1] -7.950078
## 
## $`Enrichment (w/ shrinkage)`
## [1] -7.953078e-06
## 
## $Intercept
## [1] -6.559616
## 
## $`sd (intercept)`
## [1] 0.5002662
## 
## $`sd (no shrinkage)`
## [1] 999.8109
## 
## $`sd (w/ shrinkage)`
## [1] 0.9999995

print(head(enrich_res[["unused_xqtl_variants"]][[1]]))

## NULL

The enrichment analysis output is stored as a 2-layer list. The first layer contains the enrichment results, which will be used as priors in coloc analysis. The second layer stores the QTL variants that were not detected in the GWAS dataset.

Session information

Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.

sessionInfo()

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-conda-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /tmp/pixi/.pixi/envs/r44/lib/libopenblasp-r0.3.30.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] pecotmr_0.3.21
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1    viridisLite_0.4.3   bigsnpr_1.12.21    
##  [4] dplyr_1.2.0         farver_2.1.2        viridis_0.6.5      
##  [7] fastmap_1.2.0       reshape_0.8.10      bigassertr_0.1.7   
## [10] flock_0.7           digest_0.6.39       lifecycle_1.0.5    
## [13] magrittr_2.0.4      compiler_4.4.3      rlang_1.1.7        
## [16] sass_0.4.10         rngtools_1.5.2      tools_4.4.3        
## [19] yaml_2.3.12         data.table_1.17.8   knitr_1.51         
## [22] doRNG_1.8.6.3       htmlwidgets_1.6.4   bit_4.6.0          
## [25] plyr_1.8.9          RColorBrewer_1.1-3  rmio_0.4.0         
## [28] coloc_5.2.3         withr_3.0.2         bigsparser_0.7.3   
## [31] purrr_1.2.1         bigstatsr_1.6.2     BiocGenerics_0.52.0
## [34] desc_1.4.3          grid_4.4.3          stats4_4.4.3       
## [37] susieR_0.15.51      future_1.69.0       ggplot2_3.5.2      
## [40] globals_0.19.0      scales_1.4.0        iterators_1.0.14   
## [43] cli_3.6.5           rmarkdown_2.30      crayon_1.5.3       
## [46] ragg_1.5.0          generics_0.1.4      otel_0.2.0         
## [49] future.apply_1.20.2 tzdb_0.5.0          cachem_1.1.0       
## [52] stringr_1.6.0       parallel_4.4.3      matrixStats_1.5.0  
## [55] vctrs_0.7.1         Matrix_1.7-4        jsonlite_2.0.0     
## [58] IRanges_2.40.0      hms_1.1.4           S4Vectors_0.44.0   
## [61] mixsqp_0.3-54       bit64_4.6.0-1       irlba_2.3.7        
## [64] listenv_0.10.0      systemfonts_1.3.1   foreach_1.5.2      
## [67] tidyr_1.3.2         jquerylib_0.1.4     bigparallelr_0.3.2 
## [70] glue_1.8.0          parallelly_1.46.1   pkgdown_2.2.0      
## [73] codetools_0.2-20    cowplot_1.2.0       stringi_1.8.7      
## [76] gtable_0.3.6        doFuture_1.2.1      tibble_3.3.1       
## [79] pillar_1.11.1       furrr_0.3.1         htmltools_0.5.9    
## [82] R6_2.6.1            textshaping_1.0.4   doParallel_1.0.17  
## [85] vroom_1.6.7         evaluate_1.0.5      lattice_0.22-9     
## [88] readr_2.2.0         bslib_0.10.0        Rcpp_1.1.1         
## [91] gridExtra_2.3       xfun_0.56           fs_1.6.6           
## [94] pkgconfig_2.0.3