Simulation Studies

This directory contains the simulation designs and implementation code used in the paper.

1. Simulation Design Overview

This section summarizes the simulation notebooks, each of which implements one part of our evaluation of multi-trait colocalization methods.

  • Simulation Schematics: Comprehensive overview of the simulation framework used to evaluate ColocBoost and benchmark it against competing methods.

  • Phenotype Data Simulation: Establishes the core simulation framework for generating synthetic phenotype data. Simulates phenotype data (Y matrix) for \(L\) traits from real genotype data (X matrix) using total heritability and SNP-level heritability approaches. The number of traits (2, 5, 10, 20), the number of causal variants, and the heritability are all configurable.

  • Run ColocBoost: Executes the ColocBoost algorithm on simulated datasets to identify colocalizing variants and trait clusters. Processes and standardizes results for performance evaluation with key output metrics.

  • Other Colocalization Methods: Implements competing colocalization methods (HyPrColoc, MOLOC, and COLOC (V5)) for benchmarking. Standardizes outputs across methods to enable fair comparison.

  • Colocalization Result Summary: Calculates performance metrics including power and false discovery rates from method results. Generates standardized comparison tables summarizing method effectiveness across simulation scenarios.

  • Secondary Simulations: Creates advanced simulation scenarios including 50-trait datasets and complex colocalization configurations. Implements specialized trait clustering patterns (5+5, 3+3+2+2) and random variant sharing to test method robustness.

  • Weaker Signal Simulation: Implements simulations specifically designed to mimic real-world GWAS summary statistics.

  • Correlated Phenotypes Simulation: Evaluates method performance under scenarios with correlated traits and complex pleiotropy patterns.

  • Null Simulation: Tests type I error control and false discovery rates under null scenarios where no colocalization exists.

  • FineBoost (single-trait ColocBoost): Demonstrates the FineBoost extension that incorporates fine-mapping capabilities into the ColocBoost framework.

  • Run OPERA: Compares performance with the OPERA method and evaluates under OPERA-specific simulation settings.

  • Comparison with OPERA: Implements the original OPERA design for benchmark comparisons and methodological validation.
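
To make the phenotype simulation step more concrete, the following is a minimal sketch of the total-heritability idea, not the notebooks' actual implementation and not the simxQTL API. All function names (`simulate_genotypes`, `simulate_phenotype`, `simulate_traits`) are hypothetical, the genotype generator is a synthetic stand-in for the real genotype data, and drawing effect sizes from a standard normal is an assumption. Each trait is generated as y = g + e, where g is the genetic component and the residual variance is chosen so that Var(g) / Var(y) is close to the target heritability.

```python
import random
import statistics


def simulate_genotypes(n_samples, n_variants, maf=0.3, seed=0):
    """Hypothetical stand-in for a real genotype matrix X (allele counts 0/1/2)."""
    rng = random.Random(seed)
    return [
        [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n_variants)]
        for _ in range(n_samples)
    ]


def simulate_phenotype(X, causal_idx, h2_total, rng):
    """Simulate one trait y = g + e with Var(g) / Var(y) close to h2_total."""
    # Assumption: causal effect sizes are drawn from a standard normal.
    effects = {j: rng.gauss(0.0, 1.0) for j in causal_idx}
    # Genetic component: sum of genotype times effect over the causal variants.
    g = [sum(row[j] * b for j, b in effects.items()) for row in X]
    var_g = statistics.pvariance(g)
    # Residual variance chosen so that var_g / (var_g + var_e) = h2_total.
    sd_e = (var_g * (1.0 - h2_total) / h2_total) ** 0.5
    return [g_i + rng.gauss(0.0, sd_e) for g_i in g]


def simulate_traits(X, n_traits, causal_idx, h2_total, seed=1):
    """Return Y as a list of rows (samples) with one column per trait."""
    rng = random.Random(seed)
    columns = [simulate_phenotype(X, causal_idx, h2_total, rng) for _ in range(n_traits)]
    return [list(row) for row in zip(*columns)]


# Five traits that all share one causal variant (column 3 of X).
X = simulate_genotypes(n_samples=2000, n_variants=20)
Y = simulate_traits(X, n_traits=5, causal_idx=[3], h2_total=0.3)
```

In this sketch every trait shares the same causal variant but receives its own effect size and noise, which is the simplest configuration in which the traits colocalize. Non-colocalizing traits could be produced by passing a different `causal_idx` per trait.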

Some of the data required to run the notebooks is provided in the Data folder.
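
The power and false discovery rate calculations mentioned in the result summary can be illustrated with a small sketch. The definitions used here are assumptions made for illustration and may differ from those in the notebooks: power is taken as the fraction of true causal variants captured by at least one reported set, and FDR as the fraction of reported sets that contain no true causal variant, with counts pooled across simulation replicates. The function names are hypothetical.

```python
def count_discoveries(true_causal, reported_sets):
    """Count true, detected, reported, and false discoveries for one replicate."""
    true_causal = set(true_causal)
    reported_sets = [set(s) for s in reported_sets]
    # A true causal variant is detected if any reported set contains it.
    n_detected = sum(any(v in s for s in reported_sets) for v in true_causal)
    # A reported set is a false discovery if it contains no true causal variant.
    n_false = sum(not (s & true_causal) for s in reported_sets)
    return len(true_causal), n_detected, len(reported_sets), n_false


def power_and_fdr(replicates):
    """Pool counts over (true_causal, reported_sets) pairs and return (power, fdr)."""
    n_true = n_detected = n_reported = n_false = 0
    for true_causal, reported_sets in replicates:
        t, d, r, f = count_discoveries(true_causal, reported_sets)
        n_true += t
        n_detected += d
        n_reported += r
        n_false += f
    power = n_detected / n_true if n_true else float("nan")
    fdr = n_false / n_reported if n_reported else 0.0
    return power, fdr


# Two toy replicates with made-up variant identifiers.
replicates = [
    ({"rs1", "rs2"}, [{"rs1", "rs9"}, {"rs7"}]),
    ({"rs5"}, [{"rs5"}]),
]
power, fdr = power_and_fdr(replicates)
```

Pooling counts before dividing, rather than averaging per-replicate ratios, avoids undefined ratios in replicates where a method reports nothing.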

2. References

[1] simxQTL: In-house R package for simulations that support investigations of various QTL association methods.

[2] colocboost: R package implementing ColocBoost for multi-trait colocalization analysis. See our tutorial website for details.

[3] hyprcoloc: R package implementing HyPrColoc, an efficient deterministic Bayesian divisive clustering algorithm that uses GWAS summary statistics and can detect colocalization across vast numbers of traits simultaneously.

[4] moloc: R package implementing MOLOC, an extension of COLOC, a Bayesian method for colocalization across multiple traits.

[5] coloc: R package implementing COLOC, which performs genetic colocalisation analysis of two potentially related phenotypes to ask whether they share common genetic causal variant(s) in a given region. COLOC (V5) introduces the SuSiE approach to deal with multiple causal variants rather than conditioning or masking.

[6] OPERA: Software tool implementing the OPERA (omics pleiotropic association) method, which tests the combinatorial pleiotropic associations between multiple molecular phenotypes (e.g., expression level of a gene and DNA methylation level at CpG sites) and a complex trait of interest using summary-level data from GWAS and molecular QTL studies.