Simulation Schematics


Overview

This notebook provides an overview of the simulation framework used to evaluate ColocBoost and benchmark it against competing methods. We present a structured series of simulation designs that test the method's performance under various conditions.

Simulation Framework

Comparison with multi-trait colocalization methods

We compared ColocBoost against three established colocalization methods (COLOC v5, HyPrColoc, and MOLOC) using realistic simulation analyses, simulation designs adopted from the competing methods, and simulations with correlated traits. These designs encompass multiple scenarios that vary in the number of causal variants, variant-trait causal configurations, and effect-size heterogeneity (Figure S2a).

Primary numerical study: Realistic genomic scenarios that mirror empirical genetic architecture (see more details in notebook Phenotype Data Simulation).
Secondary simulations: Benchmark simulations replicating designs from the original publications of competing colocalization methods (see more details in notebook Secondary Simulations), including (i) a fully colocalized design and (ii) clustered and randomized colocalization designs.
Correlated phenotypes simulations: Phenotypes with complex inter-trait correlations, used to evaluate performance when traits are interdependent (see more details in notebook Correlated Phenotypes Simulation).
Weaker signal simulations: Simulations with weaker association signals, designed to mimic real-world GWAS summary statistics (see more details in notebook Weaker Signal Simulation).
Null simulation: Evaluation of type I error control and false discovery rates under null scenarios in which no colocalization exists (see more details in notebook Null Simulation).
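As a rough illustration of how such designs encode colocalization, the sketch below simulates several traits from one genotype matrix under a generic additive model. Each trait is the sum of its causal variants' standardized genotypes times their effect sizes, plus Gaussian noise scaled so that the causal variants explain roughly a target proportion of variance. Listing the same variant as causal for two traits makes it a colocalized variant, and giving it different effect sizes across traits introduces effect-size heterogeneity. The function `simulate_traits`, its arguments, and the toy values are hypothetical; this is a minimal sketch, not the exact procedure used in the Phenotype Data Simulation notebook.

```python
import numpy as np


def simulate_traits(X, causal_idx, effects, h2, rng):
    """Simulate K traits from a genotype matrix X (n samples x p variants).

    causal_idx[k]: indices of the causal variants for trait k; a variant listed
        for more than one trait is shared (colocalized) across those traits.
    effects[k]: effect sizes matching causal_idx[k].
    h2[k]: target proportion of trait-k variance explained by its causal variants.
    """
    n = X.shape[0]
    # Standardize each variant (assumes no monomorphic columns)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = np.empty((n, len(causal_idx)))
    for k, (idx, b) in enumerate(zip(causal_idx, effects)):
        g = Xs[:, idx] @ np.asarray(b, dtype=float)  # genetic component
        # Choose the noise variance so that var(g) / var(y) is about h2[k]
        sigma2 = g.var() * (1.0 - h2[k]) / h2[k]
        Y[:, k] = g + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return Y


rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(500, 50)).astype(float)  # toy 0/1/2 genotypes
# Variant 10 is causal for both traits (colocalized); variant 20 only for trait 2
Y = simulate_traits(X, [np.array([10]), np.array([10, 20])],
                    [[0.5], [0.5, -0.3]], h2=[0.05, 0.05], rng=rng)
```

Real designs would draw genotypes from a reference panel so that linkage disequilibrium is realistic; the binomial toy genotypes here are independent across variants.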

These simulations systematically assess performance in terms of statistical power, false discovery control, computational efficiency, and robustness to various data conditions.
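One simple way to tally power and FDR across simulation replicates is sketched below. It assumes, as a simplification, that each method reports colocalization as sets of variant indices, that a true colocalized variant counts as detected if any reported set contains it, and that a reported set is a false discovery if it contains no true colocalized variant. The exact definitions used in the manuscript are given in Supplementary Note S.6.2; the function name `power_and_fdr` and the input layout are hypothetical.

```python
from typing import List, Sequence, Set


def power_and_fdr(true_sets: Sequence[Set[int]],
                  reported: Sequence[List[Set[int]]]):
    """Pool power and FDR over replicates.

    true_sets[r]: true colocalized variant indices in replicate r.
    reported[r]: colocalization sets (variant indices) reported in replicate r.
    """
    n_true = n_detected = n_reported = n_false = 0
    for truth, sets in zip(true_sets, reported):
        n_true += len(truth)
        covered = set().union(*sets)  # every variant in any reported set
        n_detected += len(truth & covered)
        n_reported += len(sets)
        # A reported set with no true colocalized variant is a false discovery
        n_false += sum(1 for s in sets if not (s & truth))
    power = n_detected / n_true if n_true else float("nan")
    fdr = n_false / n_reported if n_reported else 0.0
    return power, fdr


power, fdr = power_and_fdr(
    true_sets=[{3}, {7, 12}],
    reported=[[{2, 3}], [{7}, {40, 41}]],
)
```

Here two of the three true variants are captured (power 2/3) and one of the three reported sets contains no true variant (FDR 1/3). Under a null design with no true colocalization, power is undefined and every reported set counts toward the false discovery rate.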

Comparison with OPERA using ‘target trait’ benchmarks

We also benchmarked ColocBoost against OPERA, a recently proposed multi-omics colocalization method. Because OPERA focuses on colocalization with a ‘target trait’ (GWAS) and requires genome-wide GWAS summary statistics to estimate its hyperparameters, we computed summary statistics for 500 of the 1,287 independent, non-overlapping gene regions previously described, which serve as 500 replicates for power and FDR calculations. We considered the following simulation designs:

Primary numerical study: Realistic genomic scenarios that mirror empirical genetic architecture (see more details in notebook Run OPERA).
Secondary simulations: Benchmark simulations replicating designs from the OPERA paper, with the original proportion configurations for 3 and 5 exposures (see more details in notebook Comparison with OPERA).
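The per-region summary statistics described above can be illustrated with the standard per-variant marginal linear regression used in GWAS: for each variant, regress the phenotype on that variant alone and report the slope, its standard error, and the z-score. This is an assumption about how the statistics were computed rather than a statement of the pipeline actually used, and the function `marginal_sumstats` and the toy data are hypothetical.

```python
import numpy as np


def marginal_sumstats(X, y):
    """Marginal (one-variant-at-a-time) OLS effect, standard error, and z-score.

    X: n x p genotype matrix; y: length-n phenotype vector.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centering absorbs the intercept
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    beta = Xc.T @ yc / sxx
    # Residual variance per variant with n - 2 degrees of freedom (intercept + slope)
    resid = yc[:, None] - Xc * beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    se = np.sqrt(sigma2 / sxx)
    return beta, se, beta / se


rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(1000, 20)).astype(float)  # toy genotypes
y = 0.4 * X[:, 5] + rng.normal(size=1000)                 # variant 5 is causal
beta, se, z = marginal_sumstats(X, y)
```

Vectorizing over variants keeps the computation to a few matrix operations per region, which matters when repeating it for hundreds of regions and several traits.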

For comprehensive technical details of simulation parameters, data generation procedures, and evaluation metrics, please refer to Supplementary Note S.6.2 in our manuscript.

Figure S2a