Skip to contents

This vignette demonstrates how to use the simulate_phenotype() function to simulate a sparse architecture, with control over:

  • The number of causal variants
  • Per-SNP heritability
  • Whether causal variants should be in low linkage disequilibrium (LD) with each other

Data

library(simxQTL)
set.seed(1)

# Simulate genotype matrix (500 individuals x 100 SNPs)
G <- matrix(rbinom(500 * 100, size = 2, prob = 0.3), nrow = 500, ncol = 100)

# Standardize the genotype matrix
G <- scale(G)

Simulate Phenotype

Now we use simulate_phenotype() to generate a phenotype with 3 independent causal variants, each explaining 5% of phenotypic variance:

result <- simulate_phenotype(
  X = G,
  n_causal = 3,
  h2_per_snp = 0.05,
  independent = TRUE
)

Key parameters:

  • n_causal: Number of causal variants to simulate
  • h2_per_snp: Heritability contributed by each causal SNP (total h2 = n_causal * h2_per_snp)
  • independent: If TRUE, causal SNPs are constrained to have low LD (|r| < 0.15) with each other

Examine Output

The function returns a list with all simulation components:

# Output structure
names(result)
# [1] "G"                 "y"                 "beta"             
# [4] "causal"            "h2_total"          "h2_per_snp"       
# [7] "residual_variance"

# Phenotype vector
head(result$y)
# [1]  -4.7634051  -0.1104029   5.5577233 -14.4453739  -2.8296897  -5.1891438

# Indices of causal variants
result$causal
# [1] 93 87 46

# Effect sizes (non-zero for causal variants)
result$beta[result$causal]
# [1] 1 1 1

# Heritability
cat("Total heritability:", result$h2_total, "\n")
# Total heritability: 0.15
cat("Per-SNP heritability:", result$h2_per_snp, "\n")
# Per-SNP heritability: 0.05

Visualize Effect Sizes

We can visualize the effect sizes, highlighting the causal variants:

plot(abs(result$beta), ylab = "Absolute Effect Size", xlab = "SNP Index", pch = 16)
points(which(result$beta != 0), abs(result$beta[result$beta != 0]), col = 2, pch = 16)

Verify LD Constraint

We can verify the causal variants have low LD when independent = TRUE:

# Check LD among causal variants
causal_cor <- cor(G[, result$causal])
print(round(causal_cor, 3))
#        [,1]   [,2]   [,3]
# [1,]  1.000  0.006 -0.056
# [2,]  0.006  1.000 -0.012
# [3,] -0.056 -0.012  1.000