This function computes weights for a Transcriptome-Wide Association Study (TWAS) by fitting models with mvSuSiE and mr.mash. Optionally, a limited number of variants selected from mvSuSiE fine-mapping can be used when computing TWAS weights with cross-validation.

Usage

multivariate_analysis_pipeline(
  X,
  Y,
  maf,
  X_variance = NULL,
  other_quantities = list(),
  imiss_cutoff = 1,
  maf_cutoff = 0.01,
  xvar_cutoff = 0.01,
  ld_reference_meta_file = NULL,
  pip_cutoff_to_skip = 0,
  max_L = -1,
  data_driven_prior_matrices = NULL,
  data_driven_prior_matrices_cv = NULL,
  data_driven_prior_weights_cutoff = 1e-04,
  canonical_prior_matrices = TRUE,
  mrmash_max_iter = 5000,
  mvsusie_max_iter = 200,
  signal_cutoff = 0.025,
  coverage = c(0.95, 0.7, 0.5),
  min_abs_corr = 0.8,
  twas_weights = TRUE,
  sample_partition = NULL,
  max_cv_variants = -1,
  cv_folds = 5,
  cv_threads = 1,
  verbose = 0
)

Arguments

X

A matrix of genotype data where rows represent samples and columns represent genetic variants.

Y

A matrix of phenotype measurements where rows represent samples and columns represent conditions.

maf

A list of vectors of minor allele frequencies for each variant in X. Values must be between 0 and 1.

ld_reference_meta_file

An optional path to a file containing linkage disequilibrium reference data. If provided, variants in X are filtered based on this reference.

pip_cutoff_to_skip

Cutoff value for skipping conditions based on PIP values. Default is 0.

max_L

The maximum number of components in mvSuSiE. Default is -1.

data_driven_prior_matrices

A list of data-driven covariance matrices for mr.mash weights.

data_driven_prior_matrices_cv

A list of data-driven covariance matrices for mr.mash weights in cross-validation.

data_driven_prior_weights_cutoff

The minimum weight for prior covariance matrices. Default is 1e-4.

canonical_prior_matrices

If set to TRUE, will compute canonical covariance matrices and add them into the prior covariance matrix list in mrmash_wrapper. Default is TRUE.
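
As a rough illustration of what "canonical" covariance matrices usually look like in mash-style priors, the base R sketch below builds an identity matrix, a fully shared matrix, one singleton matrix per condition, and a few matrices with intermediate correlation. The helper `canonical_covs()` and its `rho_grid` argument are hypothetical names used only here; the matrices actually produced inside mrmash_wrapper may differ.

```r
# Hypothetical helper: build a small set of canonical covariance matrices
# for r conditions. This is an illustration, not the pecotmr implementation.
canonical_covs <- function(r, rho_grid = c(0.25, 0.5, 0.75)) {
  covs <- list(
    identity = diag(r),         # independent effects across conditions
    shared   = matrix(1, r, r)  # identical effects across conditions
  )
  # Singleton matrices: an effect in exactly one condition
  for (i in seq_len(r)) {
    e <- matrix(0, r, r)
    e[i, i] <- 1
    covs[[paste0("singleton_", i)]] <- e
  }
  # Heterogeneity matrices: unit variance, common correlation rho
  for (rho in rho_grid) {
    m <- matrix(rho, r, r)
    diag(m) <- 1
    covs[[paste0("het_", rho)]] <- m
  }
  covs
}

covs <- canonical_covs(3)
length(covs)  # 2 + 3 singletons + 3 heterogeneity matrices = 8
```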

mrmash_max_iter

The maximum number of iterations for mr.mash. Default is 5000.

mvsusie_max_iter

The maximum number of iterations for mvSuSiE. Default is 200.

signal_cutoff

Cutoff value for signal identification in PIP values for susie_post_processor. Default is 0.025.

coverage

A vector of coverage probabilities, with the first element being the primary coverage and the rest being secondary coverage probabilities for credible set refinement. Defaults to c(0.95, 0.7, 0.5).

min_abs_corr

Minimum absolute correlation for credible set purity filtering. Default is 0.8, which is stricter than the susieR default of 0.5.
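
To make the purity criterion concrete, the sketch below computes the smallest absolute pairwise correlation among the variants of a candidate credible set and compares it with min_abs_corr. The data are simulated, and `cs_purity()` is a hypothetical helper written for this illustration; it is not a function exported by pecotmr or susieR.

```r
set.seed(42)
n <- 200
z <- rnorm(n)

# Simulated genotype-like matrix: v1-v3 are strongly correlated, v4 is independent
X_demo <- cbind(
  v1 = z + rnorm(n, sd = 0.3),
  v2 = z + rnorm(n, sd = 0.3),
  v3 = z + rnorm(n, sd = 0.3),
  v4 = rnorm(n)
)

# Hypothetical helper: purity as the minimum absolute pairwise correlation
cs_purity <- function(X, idx) {
  if (length(idx) < 2) return(1)  # a single-variant set is trivially pure
  R <- abs(cor(X[, idx, drop = FALSE]))
  min(R[upper.tri(R)])
}

min_abs_corr <- 0.8
tight_cs <- c(1, 2, 3)  # only correlated variants
loose_cs <- c(1, 2, 4)  # includes the independent variant

cs_purity(X_demo, tight_cs) >= min_abs_corr  # this set passes the filter
cs_purity(X_demo, loose_cs) >= min_abs_corr  # this set is filtered out
```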

sample_partition

Sample partition for cross-validation.

max_cv_variants

The maximum number of variants to be included in cross-validation. Defaults to -1, which means no limit.

cv_folds

The number of folds to use for cross-validation. Set to 0 to skip cross-validation. Default is 5.
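
The idea of splitting samples into cv_folds groups can be sketched in base R as below. This page does not state the format that sample_partition expects, so the column names `Sample` and `Fold` and the sample identifiers are assumptions made purely for illustration.

```r
set.seed(123)

# Hypothetical sample identifiers
sample_ids <- paste0("sample_", 1:23)
cv_folds <- 5

# Assign fold labels as evenly as possible, then shuffle them across samples
fold <- sample(rep(seq_len(cv_folds), length.out = length(sample_ids)))

# Assumed layout: one row per sample with its fold label
partition <- data.frame(Sample = sample_ids, Fold = fold)

# Fold sizes differ by at most one sample
table(partition$Fold)
```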

cv_threads

The number of threads to use for parallel computation in cross-validation. Defaults to 1.

verbose

Verbosity level. Default is 0.

min_cv_maf

The minimum minor allele frequency for variants to be included in cross-validation. Default is 0.05.

Value

A list containing the multivariate analysis results.

Examples

library(pecotmr)

data(multitrait_data)
attach(multitrait_data)

data_driven_prior_matrices <- list(
  U = prior_matrices,
  w = rep(1 / length(prior_matrices), length(prior_matrices))
)

data_driven_prior_matrices_cv <- lapply(prior_matrices_cv, function(x) {
  list(U = x, w = rep(1 / length(x), length(x)))
})

result <- multivariate_analysis_pipeline(
  X = multitrait_data$X,
  Y = multitrait_data$Y,
  maf = colMeans(multitrait_data$X),
  X_variance = multitrait_data$X_variance,
  max_L = 10,
  ld_reference_meta_file = NULL,
  max_cv_variants = -1,
  pip_cutoff_to_skip = 0,
  signal_cutoff = 0.025,
  data_driven_prior_matrices = data_driven_prior_matrices,
  data_driven_prior_matrices_cv = data_driven_prior_matrices_cv,
  canonical_prior_matrices = TRUE,
  sample_partition = NULL,
  cv_folds = 5,
  cv_threads = 2,
  data_driven_prior_weights_cutoff = 1e-4
)
#> Error in multivariate_analysis_pipeline(X = multitrait_data$X, Y = multitrait_data$Y,     maf = colMeans(multitrait_data$X), X_variance = multitrait_data$X_variance,     max_L = 10, ld_reference_meta_file = NULL, max_cv_variants = -1,     pip_cutoff_to_skip = 0, signal_cutoff = 0.025, data_driven_prior_matrices = data_driven_prior_matrices,     data_driven_prior_matrices_cv = data_driven_prior_matrices_cv,     canonical_prior_matrices = TRUE, sample_partition = NULL,     cv_folds = 5, cv_threads = 2, data_driven_prior_weights_cutoff = 1e-04): maf values must be between 0 and 1
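
The error above is consistent with colMeans() returning values greater than 1, which happens when a genotype matrix stores allele dosages coded 0/1/2. Whether multitrait_data$X is coded this way is an assumption. Under that assumption, a minimal sketch of deriving minor allele frequencies from a dosage matrix looks like this, using simulated data and the illustrative names `X_demo` and `maf_demo`:

```r
set.seed(1)

# Simulated 0/1/2 dosage matrix: 100 samples by 5 variants (illustrative only)
X_demo <- matrix(rbinom(100 * 5, size = 2, prob = 0.7), nrow = 100)

# Column means of dosages lie in [0, 2], so they are not frequencies
raw_means <- colMeans(X_demo)

# Halving gives the frequency of the counted allele, in [0, 1]
allele_freq <- raw_means / 2

# Folding around 0.5 gives the minor allele frequency, in [0, 0.5]
maf_demo <- pmin(allele_freq, 1 - allele_freq)

range(raw_means)  # can exceed 1
range(maf_demo)   # stays within [0, 0.5]
```

If the assumption holds, passing a vector computed this way as maf would satisfy the range check that raised the error.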