Bayesian linear regression using summary statistics
Source: R/regularized_regression.R (gbayes_rss.Rd)
This function is adapted from those written by Peter Sørensen in the qgg package. The following prior distributions are provided:
Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (same as Ridge Regression).
Bayes L: Assigns a double-exponential (Laplace) prior to marker effects, the density used in the Bayesian LASSO.
Bayes A: similar to ridge regression, but with a t-distribution prior (rather than Gaussian) for the marker effects; the variance comes from an inverse-chi-square distribution instead of being fixed. Estimation is via Gibbs sampling.
Bayes C: uses a “rounded spike” (low-variance Gaussian) at the origin, so that many small effects can contribute to the polygenic component; reduces the dimensionality of the model (makes Gibbs sampling feasible).
Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001, 0.001, and 0.01.
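The Bayes N statement above (Gaussian prior, posterior mean equal to BLUP and ridge regression) can be checked numerically. The following is a minimal sketch on simulated individual-level data, not the code used by gbayes_rss; the variances `ve` and `vb` are treated as known, and all object names are made up for the illustration.

```r
set.seed(1)
n <- 200; m <- 10
X <- scale(matrix(rnorm(n * m), n, m))     # standardized marker matrix (simulated)
b_true <- rnorm(m, sd = 0.3)
y <- as.vector(X %*% b_true + rnorm(n))
y <- y - mean(y)

ve <- 1            # residual variance, assumed known in this sketch
vb <- 0.09         # prior variance of each marker effect, assumed known
lambda <- ve / vb  # ridge penalty implied by the Gaussian prior

# Ridge form: posterior mean under b ~ N(0, vb I), e ~ N(0, ve I)
b_ridge <- solve(crossprod(X) + lambda * diag(m), crossprod(X, y))

# Mixed-model (BLUP) form: vb X' (vb XX' + ve I)^-1 y
V <- vb * tcrossprod(X) + ve * diag(n)
b_blup <- vb * crossprod(X, solve(V, y))

max_diff <- max(abs(b_ridge - b_blup))     # agreement up to numerical error
```

Note that the ridge form only needs `crossprod(X)` and `crossprod(X, y)`, which is why this family of models can be fitted from summary statistics and an LD matrix.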
Usage
gbayes_rss(
sumstats = NULL,
LD = NULL,
variant_ids = NULL,
nit = 100,
nburn = 0,
nthin = 4,
method = "bayesR",
vg = NULL,
vb = NULL,
ve = NULL,
ssg_prior = NULL,
ssb_prior = NULL,
sse_prior = NULL,
lambda = NULL,
h2 = NULL,
pi = 0.001,
updateB = TRUE,
updateG = TRUE,
updateE = TRUE,
updatePi = TRUE,
adjustE = TRUE,
nug = 4,
nub = 4,
nue = 4,
mask = NULL,
ve_prior = NULL,
vg_prior = NULL,
algorithm = "mcmc",
tol = 0.001,
nit_local = NULL,
nit_global = NULL
)
Arguments
- sumstats
dataframe with marker summary statistics. Required: beta coefficient (beta), standard error of the beta coefficient (se), GWAS sample size (n). Optional: variant_id or rsid, alleles (A1 and A2), minor allele frequency (maf).
- LD
is the LD matrix corresponding to the same markers as in the sumstats dataframe
- variant_ids
is an optional character vector of variant ids or rsids, provided outside of the sumstats dataframe
- nit
is the number of iterations
- nburn
is the number of burnin iterations
- nthin
is the thinning parameter
- method
specifies the method used; one of "bayesN", "bayesA", "bayesL", "bayesC", "bayesR"
- vg
is a scalar or matrix of genetic (co)variances
- vb
is a scalar or matrix of marker (co)variances
- ve
is a scalar or matrix of residual (co)variances
- ssg_prior
is a scalar or matrix of prior genetic (co)variances
- ssb_prior
is a scalar or matrix of prior marker (co)variances
- sse_prior
is a scalar or matrix of prior residual (co)variances
- lambda
is a vector or matrix of lambda values
- h2
is the trait heritability
- pi
is the proportion of markers in each marker variance class
- updateB
is a logical for updating marker (co)variances
- updateG
is a logical for updating genetic (co)variances
- updateE
is a logical for updating residual (co)variances
- updatePi
is a logical for updating pi
- adjustE
is a logical for adjusting residual variance
- nug
is a scalar or vector of prior degrees of freedom for prior genetic (co)variances
- nub
is a scalar or vector of prior degrees of freedom for marker (co)variances
- nue
is a scalar or vector of prior degrees of freedom for prior residual (co)variances
- mask
is a vector or matrix of TRUE/FALSE specifying if marker should be ignored
- ve_prior
is a scalar or matrix of prior residual (co)variances
- vg_prior
is a scalar or matrix of prior genetic (co)variances
- algorithm
is the algorithm to use; one of "mcmc" or "em-mcmc"
- tol
is the tolerance, i.e. the convergence criterion used in gbayes
- nit_local
is the number of local iterations
- nit_global
is the number of global iterations
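As an illustration of the sumstats and LD arguments described above, the sketch below builds both from simulated genotypes. It is only an assumption about a convenient way to prepare the inputs: the simulated data, the `rs1`, `rs2`, ... ids, and the helper `fit_one` are hypothetical, and the final call is guarded so that it only runs when gbayes_rss is available in the session.

```r
set.seed(42)
n <- 500; m <- 20
p <- runif(m, 0.05, 0.5)
X <- sapply(p, function(q) rbinom(n, 2, q))   # n x m genotypes coded 0/1/2
colnames(X) <- paste0("rs", seq_len(m))       # hypothetical variant ids
y <- as.vector(0.5 * X[, 3] + rnorm(n))       # one causal marker

# Marginal regression of y on one marker at a time: beta and its standard error
fit_one <- function(x) {
  coef(summary(lm(y ~ x)))["x", c("Estimate", "Std. Error")]
}
est <- t(apply(X, 2, fit_one))

af <- colMeans(X) / 2                         # frequency of the counted allele
sumstats <- data.frame(
  variant_id = colnames(X),
  beta = est[, "Estimate"],
  se   = est[, "Std. Error"],
  n    = n,
  maf  = pmin(af, 1 - af)
)

LD <- cor(X)                                  # marker correlation matrix, same order as sumstats

# Guarded call using only arguments listed under Usage; settings are illustrative
if (exists("gbayes_rss")) {
  fit <- gbayes_rss(sumstats = sumstats, LD = LD, method = "bayesR",
                    nit = 1000, nburn = 200)
}
```

The rows of `sumstats` and the rows and columns of `LD` are kept in the same marker order, since the LD matrix is documented as corresponding to the same markers as the summary statistics.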
Value
Returns a list structure including
- bm
vector of posterior means for marker effects
- dm
vector of posterior means for marker inclusion probabilities
- vbs
scalar or vector (t) of posterior means for marker variances
- vgs
scalar or vector (t) of posterior means for genomic variances
- ves
scalar or vector (t) of posterior means for residual variances
- pis
vector of probabilities for each mcmc iteration
- pim
posterior distribution probabilities
- r
vector of residuals
- b
vector of estimates from the final mcmc iteration
- param
a list of current parameters (same information as the items listed above), used to restart the analysis
- stat
matrix (mxt) of marker information and effects used for genomic risk scoring
- method
the method used
- mask
which loci were masked from analysis
- conv
dataframe of convergence metrics
- post
posterior parameter estimates
- ve
mean residual variance
- vg
mean genomic variance
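A short sketch of how some of the returned elements might be used downstream. The `fit` object here is a hand-made mock that only mimics the documented names (bm, dm, vg, ve), not real output; the heritability formula is the standard definition vg / (vg + ve), and the 0.5 inclusion threshold and the genotype scale of `X_new` are assumptions made for the illustration.

```r
set.seed(7)
m <- 5
fit <- list(                       # mock stand-in for a gbayes_rss() result
  bm = c(0.00, 0.12, 0.00, -0.30, 0.01),   # posterior mean marker effects
  dm = c(0.02, 0.65, 0.01, 0.98, 0.10),    # posterior inclusion probabilities
  vg = 0.25,                               # mean genomic variance
  ve = 0.75                                # mean residual variance
)

# Heritability implied by the variance components
h2_hat <- fit$vg / (fit$vg + fit$ve)       # 0.25

# Markers whose inclusion probability exceeds an (arbitrary) threshold
selected <- which(fit$dm > 0.5)            # markers 2 and 4

# Genomic score for new individuals: genotypes times posterior mean effects
X_new <- matrix(rbinom(3 * m, 2, 0.3), nrow = 3)   # 3 simulated individuals
score <- as.vector(X_new %*% fit$bm)
```

Whether genotypes should be centered or scaled before scoring depends on the scale of the effects in bm, which this page does not specify.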