Bayesian linear regression using summary statistics

This function is adapted from those written by Peter Sorensen in the qgg package. The following prior distributions are provided:

Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).

Bayes L: Assigning a double-exponential (Laplace) prior to marker effects gives the density used in the Bayesian LASSO.

Bayes A: Similar to ridge regression, but with a t-distribution prior (rather than Gaussian) for the marker effects; the variance comes from an inverse chi-square distribution instead of being fixed. Estimation is via Gibbs sampling.

Bayes C: Uses a "rounded spike" (a low-variance Gaussian) at the origin, so that many small effects can contribute to the polygenic component. This reduces the dimensionality of the model (and makes Gibbs sampling feasible).

Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001, 0.001, and 0.01.
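
To make the Bayes R prior more concrete, the following base R sketch draws marker effects from a four-component mixture whose variances are scaled by 0, 0.0001, 0.001, and 0.01. It is only an illustration of the prior, not the sampler used by the function: the base variance `v_base`, the mixing proportions `pi_mix`, and the number of markers `m` are assumed values chosen for the example, not package defaults.

```r
# Illustrative draw from a Bayes R-style mixture prior (not the package's sampler).
set.seed(1)

m      <- 10000                        # number of markers (hypothetical)
v_base <- 1                            # variance that the scaling factors multiply (assumed)
gamma  <- c(0, 0.0001, 0.001, 0.01)    # variance scaling factors listed above
pi_mix <- c(0.95, 0.02, 0.02, 0.01)    # mixing proportions (assumed for illustration)

# Assign each marker to one of the four mixture components.
k <- sample(seq_along(gamma), size = m, replace = TRUE, prob = pi_mix)

# Draw each effect from N(0, gamma[k] * v_base).
# The first component has variance 0, so it is a point mass at zero.
b <- rnorm(m, mean = 0, sd = sqrt(gamma[k] * v_base))

# Share of markers with a nonzero effect, and the empirical variance per component.
mean(b != 0)
tapply(b, k, var)
```

Most markers fall in the zero-variance component, so only a small share carries any effect, and the empirical variance within each remaining component should be close to its scaling factor times `v_base`.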

Usage

gbayes_rss(
  sumstats = NULL,
  LD = NULL,
  variant_ids = NULL,
  nit = 100,
  nburn = 0,
  nthin = 4,
  method = "bayesR",
  vg = NULL,
  vb = NULL,
  ve = NULL,
  ssg_prior = NULL,
  ssb_prior = NULL,
  sse_prior = NULL,
  lambda = NULL,
  h2 = NULL,
  pi = 0.001,
  updateB = TRUE,
  updateG = TRUE,
  updateE = TRUE,
  updatePi = TRUE,
  adjustE = TRUE,
  nug = 4,
  nub = 4,
  nue = 4,
  mask = NULL,
  ve_prior = NULL,
  vg_prior = NULL,
  algorithm = "mcmc",
  tol = 0.001,
  nit_local = NULL,
  nit_global = NULL
)

Arguments

sumstats: dataframe with marker summary statistics. Required: beta coefficient (beta), standard error of the beta coefficient (se), GWAS sample size (n). Optional: variant_id or rsid, alleles (A1 and A2), minor allele frequency (maf).
LD: is the LD matrix corresponding to the same markers as in the sumstats dataframe
variant_ids: is an optional character vector of variant ids or rsids, provided outside of the sumstats dataframe
nit: is the number of iterations
nburn: is the number of burnin iterations
nthin: is the thinning parameter
method: specifies the method used (one of "bayesN", "bayesA", "bayesL", "bayesC", "bayesR")
vg: is a scalar or matrix of genetic (co)variances
vb: is a scalar or matrix of marker (co)variances
ve: is a scalar or matrix of residual (co)variances
ssg_prior: is a scalar or matrix of prior genetic (co)variances
ssb_prior: is a scalar or matrix of prior marker (co)variances
sse_prior: is a scalar or matrix of prior residual (co)variances
lambda: is a vector or matrix of lambda values
h2: is the trait heritability
pi: is the proportion of markers in each marker variance class
updateB: is a logical for updating marker (co)variances
updateG: is a logical for updating genetic (co)variances
updateE: is a logical for updating residual (co)variances
updatePi: is a logical for updating pi
adjustE: is a logical for adjusting residual variance
nug: is a scalar or vector of prior degrees of freedom for prior genetic (co)variances
nub: is a scalar or vector of prior degrees of freedom for marker (co)variances
nue: is a scalar or vector of prior degrees of freedom for prior residual (co)variances
mask: is a vector or matrix of TRUE/FALSE values specifying whether each marker should be ignored
ve_prior: is a scalar or matrix of prior residual (co)variances
vg_prior: is a scalar or matrix of prior genetic (co)variances
algorithm: is the algorithm to use. Must be one of "mcmc" or "em-mcmc"
tol: is the tolerance, i.e. the convergence criterion used in gbayes
nit_local: is the number of local iterations
nit_global: is the number of global iterations
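
As a rough sketch of how the inputs described above fit together, the following base R code builds a toy `sumstats` dataframe with the required columns (`beta`, `se`, `n`) and a matching LD matrix, then checks that the two line up. All values are simulated, the variant ids are made up, and the AR(1)-style LD pattern is arbitrary. The example also assumes that a dense matrix with variant ids as dimnames is an acceptable LD format, which is not stated here. The call itself is guarded so that it only runs when `gbayes_rss` is available in the session, and it may need adjusting for real data.

```r
# Toy inputs for gbayes_rss; all numbers are simulated for illustration.
set.seed(42)
m <- 5                                  # number of markers (toy value)
n <- 1000                               # GWAS sample size (toy value)

# Made-up variant ids and an AR(1)-style correlation matrix standing in for LD.
ids <- paste0("rs", seq_len(m))
LD  <- 0.5 ^ abs(outer(seq_len(m), seq_len(m), "-"))
dimnames(LD) <- list(ids, ids)          # assumed convention: ids as row/column names

# Summary statistics with the required columns plus an optional id column.
sumstats <- data.frame(
  rsid = ids,
  beta = rnorm(m, mean = 0, sd = 0.05),
  se   = rep(1 / sqrt(n), m),           # rough se for standardized data (assumption)
  n    = rep(n, m),
  stringsAsFactors = FALSE
)

# Consistency checks: required columns present, same markers in the same order.
stopifnot(
  all(c("beta", "se", "n") %in% names(sumstats)),
  nrow(sumstats) == nrow(LD),
  isSymmetric(LD),
  identical(sumstats$rsid, rownames(LD))
)

# Fit only if the function is available; other arguments keep their defaults.
if (exists("gbayes_rss", mode = "function")) {
  fit <- gbayes_rss(sumstats = sumstats, LD = LD, method = "bayesR")
  str(fit$bm)
}
```

Running the checks before fitting helps catch the most common mismatch, where the rows of `sumstats` and the rows of `LD` refer to different markers or are in a different order.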

Value

Returns a list structure including

bm: vector of posterior means for marker effects
dm: vector of posterior means for marker inclusion probabilities
vbs: scalar or vector (t) of posterior means for marker variances
vgs: scalar or vector (t) of posterior means for genomic variances
ves: scalar or vector (t) of posterior means for residual variances
pis: vector of probabilities for each MCMC iteration
pim: posterior distribution probabilities
r: vector of residuals
b: vector of estimates from the final MCMC iteration
param: a list of current parameters (same information as the items listed above), used to restart the analysis
stat: matrix (mxt) of marker information and effects used for genomic risk scoring
method: the method used
mask: which loci were masked from analysis
conv: dataframe of convergence metrics
post: posterior parameter estimates
ve: mean residual variance
vg: mean genomic variance
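
The returned components can be combined in a few simple ways. The sketch below uses a mock list in place of a real fit, with made-up numbers and only the `bm`, `dm`, `vg`, and `ve` elements filled in. It computes a heritability estimate as vg / (vg + ve) and ranks markers by posterior inclusion probability. Treat it as an illustration of the list structure described above rather than output from the function.

```r
# Mock result with the same element names as the returned list (values are made up).
fit <- list(
  bm = c(0.02, 0.00, -0.05, 0.01),   # posterior means for marker effects
  dm = c(0.60, 0.05, 0.95, 0.20),    # posterior inclusion probabilities
  vg = 0.25,                         # mean genomic variance
  ve = 0.75                          # mean residual variance
)

# Heritability implied by the mean variance components.
h2 <- fit$vg / (fit$vg + fit$ve)
h2                                   # 0.25

# Markers ordered from highest to lowest posterior inclusion probability.
top <- order(fit$dm, decreasing = TRUE)
data.frame(marker = top, bm = fit$bm[top], dm = fit$dm[top])
```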