SDPR (Summary-Statistics-Based Dirichelt Process Regression for Polygenic Risk Prediction)

This function is a wrapper for the SDPR C++ implementation, which performs Markov Chain Monte Carlo (MCMC) for estimating effect sizes and heritability based on summary statistics and reference LD matrices.

Usage

sdpr(
  bhat,
  LD,
  n,
  per_variant_sample_size = NULL,
  array = NULL,
  a = 0.1,
  c = 1,
  M = 1000,
  a0k = 0.5,
  b0k = 0.5,
  iter = 1000,
  burn = 200,
  thin = 5,
  n_threads = 1,
  opt_llk = 1,
  verbose = TRUE,
  seed = NULL
)

Arguments

bhat: A vector of marginal beta values for each SNP.
LD: A list of LD matrices, where each matrix corresponds to a subset of SNPs.
n: The total sample size of the GWAS.
per_variant_sample_size: (Optional) A vector of sample sizes for each SNP. If NULL (default), it will be initialized to a vector of length equal to `bhat`, with all values set to `n`.
array: (Optional) A vector of genotyping array information for each SNP. If NULL (default), it will be initialized to a vector of 1's with length equal to `bhat`.
a: Factor to shrink the reference LD matrix. Default is 0.1.
c: Factor to correct for the deflation. Default is 1.
M: Max number of variance components. Default is 1000.
a0k: Hyperparameter for inverse gamma distribution. Default is 0.5.
b0k: Hyperparameter for inverse gamma distribution. Default is 0.5.
iter: Number of iterations for MCMC. Default is 1000.
burn: Number of burn-in iterations for MCMC. Default is 200.
thin: Thinning interval for MCMC. Default is 5.
n_threads: Number of threads to use. Default is 1.
opt_llk: Which likelihood to evaluate. 1 for equation 6 (slightly shrink the correlation of SNPs) and 2 for equation 5 (SNPs genotyped on different arrays in a separate cohort). Default is 1.
verbose: Whether to print verbose output. Default is true.

Value

A list containing the estimated effect sizes (beta) and heritability (h2).

Note

This function is a wrapper for the SDPR C++ implementation, which is a rewritten and adopted version of the SDPR package. The original SDPR documentation is available at https://htmlpreview.github.io/?https://github.com/eldronzhou/SDPR/blob/main/doc/Manual.html

Examples

# Generate example data
set.seed(985115)
n <- 350
p <- 16
sigmasq_error <- 0.5
zeroes <- rbinom(p, 1, 0.6)
beta.true <- rnorm(p, 1, sd = 4)
beta.true[zeroes] <- 0

X <- cbind(matrix(rnorm(n * p), nrow = n))
X <- scale(X, center = TRUE, scale = FALSE)
y <- X %*% matrix(beta.true, ncol = 1) + rnorm(n, 0, sqrt(sigmasq_error))
y <- scale(y, center = TRUE, scale = FALSE)

# Calculate sufficient statistics
XtX <- t(X) %*% X
Xty <- t(X) %*% y
yty <- t(y) %*% y

# Set the prior
K <- 9
sigma0 <- c(0.001, .1, .5, 1, 5, 10, 20, 30, .005)
omega0 <- rep(1 / K, K)

# Calculate summary statistics
b.hat <- sapply(1:p, function(j) {
  summary(lm(y ~ X[, j]))$coefficients[-1, 1]
})
s.hat <- sapply(1:p, function(j) {
  summary(lm(y ~ X[, j]))$coefficients[-1, 2]
})
R.hat <- cor(X)
var_y <- var(y)
sigmasq_init <- 1.5

# Run SDPR
LD <- list(blk1 = R.hat)
out <- sdpr(b.hat, LD, n)
#> 100 iter. h2: 128.105 max beta: 8.97683
#> 200 iter. h2: 124.916 max beta: 8.81275
#> 300 iter. h2: 128.34 max beta: 8.93466
#> 400 iter. h2: 127.515 max beta: 8.9221
#> 500 iter. h2: 125.586 max beta: 8.82759
#> 600 iter. h2: 129.451 max beta: 9.00726
#> 700 iter. h2: 127.766 max beta: 8.94951
#> 800 iter. h2: 126.963 max beta: 8.94212
#> 900 iter. h2: 127.994 max beta: 8.90338
#> 1000 iter. h2: 131.222 max beta: 9.01457
#> h2: 127.746 max: 8.91668
# In sample prediction correlations
cor(X %*% out$beta_est, y) #
#>           [,1]
#> [1,] 0.9983386