Simulation Studies

This directory contains the simulation designs and implementation code used in the paper.

1. Simulation Design Overview

This section summarizes the simulation notebooks (1-11), each of which implements a different aspect of the evaluation of our multi-trait colocalization method.

  • Phenotype data simulation: Establishes the fundamental simulation framework for generating synthetic phenotype data. Simulates phenotype data (Y matrix) based on real genotype data (X matrix) using total heritability and SNP-level heritability approaches. Configurable for different numbers of traits (2, 5, 10, 20) and causal variants with controllable heritability.

  • Run ColocBoost: Executes the ColocBoost algorithm on simulated datasets to identify colocalizing variants and trait clusters. Processes and standardizes the results, with key output metrics, for performance evaluation.

  • Other colocalization methods: Implements competing colocalization methods (HyPrColoc, MOLOC, and COLOC (V5)) for benchmarking. Standardizes outputs across methods to enable fair comparison.

  • Colocalization result summary: Calculates performance metrics including power and false discovery rates from method results. Generates standardized comparison tables summarizing method effectiveness across simulation scenarios.

  • Secondary simulations: Creates advanced simulation scenarios including 50-trait datasets and complex colocalization configurations. Implements specialized trait clustering patterns (5+5, 3+3+2+2) and random variant sharing to test method robustness.

  • GWAS / weaker signal simulation and ColocBoost: Implements simulations specifically designed to mimic real-world GWAS summary statistics.

  • Correlated phenotypes simulation: Evaluates method performance under scenarios with correlated traits and complex pleiotropy patterns.

  • Null simulation: Tests type I error control and false discovery rates under null scenarios where no colocalization exists.

  • FineBoost (single-trait ColocBoost): Demonstrates the FineBoost extension that incorporates fine-mapping capabilities into the ColocBoost framework.

  • OPERA running: Compares performance with the OPERA method and evaluates under OPERA-specific simulation settings.

  • OPERA: simulation using original proportion configuration: Implements the original OPERA design for benchmark comparisons and methodological validation.
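
The phenotype simulation step described in the first bullet can be illustrated with a minimal sketch of the total-heritability approach. This is not the code used in the notebooks, which are written in R and rely on real genotype data: the function name `simulate_trait`, the toy genotype matrix, the N(0, 1) effect sizes, and the chosen causal indices are all assumptions made for illustration. The only idea carried over from the text is that the residual variance is set so that the genetic component explains a target fraction h2 of the phenotypic variance.

```python
import random
import statistics


def simulate_trait(X, causal_idx, h2, rng):
    """Return (y, g): simulated phenotype and its genetic component."""
    # Assumption: causal effect sizes are drawn from N(0, 1).
    beta = {j: rng.gauss(0.0, 1.0) for j in causal_idx}
    # Genetic component: sum of effect * genotype over the causal variants.
    g = [sum(beta[j] * row[j] for j in causal_idx) for row in X]
    var_g = statistics.pvariance(g)
    # Residual variance chosen so that var_g / (var_g + var_e) equals h2.
    var_e = var_g * (1.0 - h2) / h2
    y = [gi + rng.gauss(0.0, var_e ** 0.5) for gi in g]
    return y, g


rng = random.Random(2024)
n, p = 1000, 20
# Toy genotype matrix X (allele counts 0/1/2); a stand-in for real genotypes.
X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
# Hypothetical configuration: both traits share causal variant 3,
# and trait2 has one additional trait-specific causal variant.
causal = {"trait1": [3], "trait2": [3, 11]}
sim = {t: simulate_trait(X, idx, 0.3, rng) for t, idx in causal.items()}
# Realized heritability in the sample; it fluctuates around the target 0.3.
realized_h2 = {
    t: statistics.pvariance(g) / statistics.pvariance(y)
    for t, (y, g) in sim.items()
}
```

Stacking the simulated `y` vectors column-wise gives a Y matrix with one column per trait; a variant listed for more than one trait plays the role of a colocalized causal variant.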

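The power and false discovery rate computations mentioned in the result-summary bullet can be sketched as follows. The definitions used here are one common convention and are an assumption, not necessarily the exact definitions in the paper: power is the fraction of true causal variants captured by at least one reported set, and FDR is the fraction of reported sets that contain no true causal variant. The function name `power_and_fdr` and the variant identifiers are hypothetical.

```python
def power_and_fdr(true_causal, reported_sets):
    """Compute (power, fdr) under the set-level convention described above."""
    truth = set(true_causal)
    sets = [set(s) for s in reported_sets]
    # A true causal variant counts as detected if any reported set contains it.
    detected = {v for v in truth if any(v in s for s in sets)}
    # A reported set counts as a false discovery if it holds no true causal variant.
    false_sets = [s for s in sets if not (s & truth)]
    power = len(detected) / len(truth) if truth else float("nan")
    fdr = len(false_sets) / len(sets) if sets else 0.0
    return power, fdr


# Hypothetical example: three true causal variants, four reported sets.
power, fdr = power_and_fdr(
    ["rs1", "rs7", "rs9"],
    [{"rs1", "rs2"}, {"rs7"}, {"rs20", "rs21"}, {"rs30"}],
)
# Two of three causal variants are captured; two of four sets are false.
```

In a null simulation the truth set is empty, so every reported set counts as a false discovery under this convention.
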
2. Simulation Dependencies

This section contains R functions for implementing competing multi-trait colocalization methods and supporting utilities.

  • colocboost_summary.r: Contains the core functions for summarizing colocalization results from ColocBoost.

  • fineboost_summary.r: Contains the core functions for summarizing fine-mapping results from FineBoost.

  • hypercoloc_set.r: Contains configuration settings and implementation for running HyPrColoc, including parameter optimization and result parsing functions.

  • moloc_set.r: Provides parameter settings and configuration for the MOLOC method, including prior specifications and output formatting.

  • moloc.r: Contains the core implementation of the MOLOC method, which extends COLOC to multiple traits.

  • susie_coloc.r: Contains the core implementation of COLOC (V5), used for pairwise colocalization comparisons. COLOC (V5) adapts the SuSiE (Sum of Single Effects) fine-mapping framework to colocalization analysis in order to relax the single-causal-variant assumption.

  • ld_utils.R: Contains utility functions for handling linkage disequilibrium (LD) matrices, including LD estimation, pruning, and conditioning operations that are used by multiple colocalization methods.
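
As a rough illustration of the LD estimation step mentioned for ld_utils.R, the sketch below computes an LD matrix as the pairwise Pearson correlation between genotype columns. It is not the repository's R implementation, and it does not cover pruning or conditioning; the function name `ld_matrix`, the toy genotypes, and the choice to report 0.0 for monomorphic variants are assumptions made here.

```python
import math


def ld_matrix(X):
    """Pairwise Pearson correlation between the columns (variants) of X."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    # Center each variant's genotype vector and record its Euclidean norm.
    centered = []
    for c in cols:
        m = sum(c) / n
        centered.append([v - m for v in c])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            if norms[i] == 0.0 or norms[j] == 0.0:
                r = 0.0  # assumption: monomorphic variants get zero correlation
            else:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                r = dot / (norms[i] * norms[j])
            R[i][j] = R[j][i] = r
    return R


# Toy genotypes: variants 0 and 1 are identical, variant 2 is their mirror image.
X_toy = [[0, 0, 2], [1, 1, 1], [2, 2, 0], [1, 1, 1]]
R = ld_matrix(X_toy)
# R[0][1] is approximately 1 and R[0][2] is approximately -1.
```

Methods that work from summary statistics would typically take a matrix like `R`, computed from a reference panel or from the simulated genotypes, as input.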

3. References

[1] simxQTL: In-house R simulation package that supports investigations of various QTL association methods.

[2] colocboost: R package that implements ColocBoost for multi-trait colocalization analysis. See our tutorial website for details.

[3] hyprcoloc: R package that implements HyPrColoc, an efficient deterministic Bayesian divisive clustering algorithm that uses GWAS summary statistics and can detect colocalization across vast numbers of traits simultaneously.

[4] moloc: R package that implements MOLOC, an extension of the Bayesian colocalization method COLOC to multiple traits.

[5] coloc: R package that implements COLOC, which performs genetic colocalisation analysis of two potentially related phenotypes to ask whether they share common genetic causal variant(s) in a given region. COLOC (V5) introduces the SuSiE approach to handle multiple causal variants rather than conditioning or masking.

[6] OPERA: Software tool that implements the OPERA (omics pleiotropic association) method, which tests combinatorial pleiotropic associations between multiple molecular phenotypes (e.g., expression level of a gene and DNA methylation level at CpG sites) and a complex trait of interest using summary-level data from GWAS and molecular QTL studies.