Generate eQTL Data with Multiple Genetic Architecture Components
Source:R/simulate_eQTL.R
generate_cis_qtl_data.Rd
Usage
generate_cis_qtl_data(
G,
h2g = 0.25,
prop_h2_sparse = 0.5,
prop_h2_oligogenic = 0.35,
prop_h2_infinitesimal = 0.15,
n_sparse = 2,
n_oligogenic = 5,
n_inf = 15,
mixture_props = c(0.75, 0.25),
sparse_sd = 0.5,
oligo_sds = c(0.05, 0.15),
inf_sd = 0.01,
standardize = TRUE,
independent = TRUE,
ld_threshold = 0.15,
max_attempts = 200,
seed = NULL
)
Arguments
- G
Standardized genotype matrix (samples x SNPs).
- h2g
Total SNP heritability (proportion of variance explained by genotyped SNPs).
- prop_h2_sparse
Proportion of h2g explained by sparse effects.
- prop_h2_oligogenic
Proportion of h2g explained by oligogenic effects.
- prop_h2_infinitesimal
Proportion of h2g explained by infinitesimal effects.
- n_sparse
Number of sparse SNPs.
- n_oligogenic
Number of oligogenic SNPs to simulate.
- n_inf
Number of infinitesimal SNPs to simulate (the usage signature sets a default of 15). If NULL, all SNPs remaining after sparse and oligogenic selection receive infinitesimal effects.
- mixture_props
Mixture proportions for oligogenic effects (must sum to 1). Default c(0.75, 0.25) means 75
- sparse_sd
Standard deviation for drawing sparse effects (default 0.5).
- oligo_sds
Standard deviations for oligogenic mixture components (default c(0.05, 0.15)).
- inf_sd
Standard deviation for drawing infinitesimal effects (default 0.01).
- standardize
Logical; if TRUE, the genotype matrix will be standardized.
- independent
Logical; if TRUE, ensures all sparse and oligogenic SNPs have |r| < ld_threshold with each other (default TRUE).
- ld_threshold
Numeric; maximum allowed absolute correlation between causal variants when independent = TRUE (default 0.15).
- max_attempts
Integer; maximum number of attempts to find SNPs satisfying LD constraints (default 200).
- seed
Optional seed for reproducibility.
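The arguments above describe three effect components that share a total heritability budget, with an LD constraint on the sparse and oligogenic SNPs. The sketch below illustrates one way these pieces could fit together. It is not the package's implementation: the helper `pick_independent()`, the helper `rescale()`, the per-component variance rescaling, and the toy genotype matrix are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy genotypes: 500 samples x 50 SNPs, standardized by column
n <- 500; p <- 50
G <- scale(matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p))

# Split the total heritability across the three components
h2g <- 0.25
props <- c(sparse = 0.5, oligogenic = 0.35, infinitesimal = 0.15)
stopifnot(isTRUE(all.equal(sum(props), 1)))
h2_target <- h2g * props

# Hypothetical helper: resample until all pairwise |r| are below the threshold
pick_independent <- function(G, k, ld_threshold = 0.15, max_attempts = 200) {
  for (attempt in seq_len(max_attempts)) {
    cand <- sample(ncol(G), k)
    r <- abs(cor(G[, cand, drop = FALSE]))
    if (all(r[upper.tri(r)] < ld_threshold)) return(cand)
  }
  stop("No SNP set satisfied the LD constraint within max_attempts")
}

# Sparse and oligogenic SNPs are mutually independent; infinitesimal SNPs
# are drawn from the remaining columns
causal <- pick_independent(G, k = 2 + 5)
sparse_idx <- causal[1:2]
oligo_idx <- causal[3:7]
inf_idx <- sample(setdiff(seq_len(p), causal), 15)

# Raw effects: oligogenic effects come from a two-component normal mixture
beta <- numeric(p)
beta[sparse_idx] <- rnorm(2, sd = 0.5)
component <- sample(1:2, 5, replace = TRUE, prob = c(0.75, 0.25))
beta[oligo_idx] <- rnorm(5, sd = c(0.05, 0.15)[component])
beta[inf_idx] <- rnorm(15, sd = 0.01)

# Assumed rescaling: make each component's genetic variance equal its target
rescale <- function(idx, target) {
  g <- as.vector(G[, idx, drop = FALSE] %*% beta[idx])
  beta[idx] * sqrt(target / var(g))
}
beta[sparse_idx] <- rescale(sparse_idx, h2_target[["sparse"]])
beta[oligo_idx] <- rescale(oligo_idx, h2_target[["oligogenic"]])
beta[inf_idx] <- rescale(inf_idx, h2_target[["infinitesimal"]])

# Phenotype with total variance of roughly 1, plus realized heritability
g <- as.vector(G %*% beta)
y <- g + rnorm(n, sd = sqrt(1 - h2g))
h2_realized <- var(g) / var(y)
```

Because the components' genetic values are correlated in a finite sample and the noise is random, `h2_realized` will generally differ somewhat from `h2g`, which is presumably why the function reports realized heritability estimates alongside the targets.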
Value
A list containing the standardized genotype matrix, simulated phenotype, combined beta values, indices for each effect component, realized heritability estimates, effect size ranges, hierarchy validation results, and causal indices.
This function generates simulated gene expression data
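A minimal usage sketch follows, assuming the package that provides `generate_cis_qtl_data()` is already loaded. The toy genotype matrix is hypothetical, and because the names of the returned list elements are not given on this page, the example inspects the result with `str()` rather than accessing specific fields.

```r
set.seed(42)

# Hypothetical toy genotypes: 500 samples x 100 SNPs, standardized by column
n <- 500; p <- 100
G <- scale(matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p))

# Only call the function if the package that provides it has been loaded
if (exists("generate_cis_qtl_data", mode = "function")) {
  sim <- generate_cis_qtl_data(
    G,
    h2g = 0.25,
    n_sparse = 2,
    n_oligogenic = 5,
    n_inf = 15,
    seed = 42
  )
  # Element names are not listed on this page, so inspect the top level only
  str(sim, max.level = 1)
}
```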