Impute Summary Statistics Using LD (RAISS)

This function is adapted from the statistical library for SNP imputation in RAISS (https://gitlab.pasteur.fr/statistical-genetics/raiss/-/blob/master/raiss/stat_models.py). It is an R implementation of the imputation model described by Bogdan Pasaniuc, Noah Zaitlen, et al. in "Fast and accurate imputation of summary statistics enhances evidence of functional enrichment", published in Bioinformatics in 2014.

Usage

raiss(
  ref_panel,
  known_zscores,
  LD_matrix = NULL,
  genotype_matrix = NULL,
  lamb = 0.01,
  rcond = 0.01,
  svd_tol = 1e-08,
  R2_threshold = 0.6,
  minimum_ld = 5,
  verbose = TRUE
)

Arguments

ref_panel: A data frame containing 'chrom', 'pos', 'variant_id', 'A1', and 'A2'.
known_zscores: A data frame containing 'chrom', 'pos', 'variant_id', 'A1', 'A2', and 'z' values.
LD_matrix: Either a square matrix or a list of matrices for LD blocks. Provide either LD_matrix or genotype_matrix, not both.
genotype_matrix: A centered and scaled genotype matrix (n x p) as an alternative to LD_matrix. Column order must match the variant order in ref_panel. When provided, the imputation uses an SVD-based approach that avoids forming the p x p LD matrix.
lamb: Regularization term added to the diagonal of the LD_matrix.
rcond: Threshold for filtering eigenvalues in the pseudo-inverse computation (used only with the LD_matrix path).
svd_tol: Relative tolerance for filtering small singular values (used only with the genotype_matrix path).
R2_threshold: R-squared threshold; SNPs whose imputation R-squared falls below this value are filtered from the output.
minimum_ld: Minimum LD score threshold for SNP filtering.
verbose: Logical indicating whether to print progress information.
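
As an illustration of the input shapes described above, the following base R sketch builds toy inputs from simulated data. The genotype values, positions, alleles, and the variant_id format are all made up for the example and are not prescribed by the function.

```r
set.seed(1)
n <- 50  # samples in a hypothetical reference panel
p <- 8   # variants

# Simulated allele counts (0/1/2); real data would come from a reference panel.
G <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

# Center and scale each column, as genotype_matrix expects.
X <- scale(G, center = TRUE, scale = TRUE)

# Reference panel: one row per variant, in the same order as the columns of X.
pos <- as.integer(seq(1000, by = 100, length.out = p))
ref_panel <- data.frame(
  chrom = 1L,
  pos = pos,
  variant_id = paste0("1:", pos, ":G:A"),  # illustrative ID format only
  A1 = "A",
  A2 = "G",
  stringsAsFactors = FALSE
)
colnames(X) <- ref_panel$variant_id

# LD matrix implied by X: the p x p correlation matrix of the variant columns.
LD_matrix <- crossprod(X) / (n - 1)

# Known z-scores for the typed subset of variants (values simulated).
typed <- c(1, 3, 4, 6, 8)
known_zscores <- ref_panel[typed, ]
known_zscores$z <- rnorm(length(typed))
```

With the package that provides raiss() loaded, these objects could be passed either as LD_matrix = LD_matrix or as genotype_matrix = X, but not both in the same call.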

Value

A list containing the filtered results, the unfiltered results, and the filtered LD matrix (LD_mat is NULL when the genotype_matrix path is used).
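
The relationship between the unfiltered and filtered results can be sketched roughly as below. The helper filter_imputed, the column names R2 and ld_score, the list element names, and the use of inclusive (>=) boundaries are assumptions made for illustration; they are not taken from the function's actual output.

```r
# Hypothetical helper: keep variants that pass both quality thresholds.
filter_imputed <- function(res, R2_threshold = 0.6, minimum_ld = 5) {
  keep <- res$R2 >= R2_threshold & res$ld_score >= minimum_ld
  list(
    unfiltered = res,
    filtered = res[keep, , drop = FALSE]
  )
}

# Made-up imputation results for four variants.
unfiltered <- data.frame(
  variant_id = c("v1", "v2", "v3", "v4"),
  z = c(1.2, -0.4, 2.5, 0.1),
  R2 = c(0.9, 0.5, 0.7, 0.95),
  ld_score = c(12, 20, 3, 8),
  stringsAsFactors = FALSE
)

out <- filter_imputed(unfiltered)
out$filtered$variant_id  # "v1" "v4": v2 fails on R2, v3 fails on LD score
```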

Details

This function can process either a single LD matrix or a list of LD matrices, one per LD block. When given a list, it processes each block separately and combines the results. Alternatively, it can accept a genotype matrix X directly, which avoids forming the p x p LD matrix and saves memory and computation when n << p.
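
The two computational paths can be illustrated with a minimal base R sketch of regularized conditional-expectation imputation. This is a sketch under stated assumptions, not the function's implementation: it uses a plain solve() instead of a truncated pseudo-inverse (so rcond and svd_tol play no role), it does not rescale the imputed z-scores, and the helper names impute_ld and impute_svd, the simulated data, and the output names are all hypothetical.

```r
set.seed(42)
n <- 30
p <- 12

# Simulated, serially correlated columns so that neighbouring variants are in LD.
Z <- matrix(rnorm(n * p), n, p)
for (j in 2:p) Z[, j] <- 0.7 * Z[, j - 1] + sqrt(1 - 0.7^2) * Z[, j]
X <- scale(Z)                # centered and scaled genotype-like matrix (n x p)
LD <- crossprod(X) / (n - 1) # p x p correlation (LD) matrix

typed <- c(1, 2, 4, 5, 7, 8, 10, 12)  # variants with known z-scores
untyped <- setdiff(seq_len(p), typed) # variants to impute
z_t <- rnorm(length(typed))           # simulated known z-scores
lamb <- 0.01

# LD-matrix path: weights = Sigma_it (Sigma_tt + lamb * I)^{-1}.
impute_ld <- function(LD, typed, untyped, z_t, lamb) {
  sig_tt <- LD[typed, typed, drop = FALSE] + lamb * diag(length(typed))
  sig_it <- LD[untyped, typed, drop = FALSE]
  w <- sig_it %*% solve(sig_tt)
  list(
    z = drop(w %*% z_t),         # imputed z-scores
    R2 = rowSums(w * sig_it),    # variance explained by the typed variants
    ld_score = rowSums(sig_it^2) # sum of squared LD with the typed variants
  )
}

# Genotype path: same imputed z-scores from a thin SVD of the typed columns,
# X_t = U D V', without forming any p x p matrix.
impute_svd <- function(X, typed, untyped, z_t, lamb) {
  n <- nrow(X)
  s <- svd(X[, typed, drop = FALSE])
  shrink <- s$d / (s$d^2 / (n - 1) + lamb)
  proj <- s$u %*% (shrink * crossprod(s$v, z_t)) # U diag(shrink) V' z_t
  drop(crossprod(X[, untyped, drop = FALSE], proj)) / (n - 1)
}

res_ld <- impute_ld(LD, typed, untyped, z_t, lamb)
z_svd <- impute_svd(X, typed, untyped, z_t, lamb)
all.equal(res_ld$z, z_svd) # the two paths agree up to numerical error
```

The SVD form follows from the fact that the columns of V are eigenvectors of the regularized typed-variant LD matrix, so its inverse reduces to a per-singular-value shrinkage. For a list of LD blocks, the same computation would be applied to each block in turn and the per-block results combined.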