Intro to mr.mash
Chunming Liu and Anqi Wang
2026-03-01
Source:vignettes/mrmash-intro.Rmd
mrmash-intro.RmdThis vignette demonstrates how we use pecotmr package to
implement mr.mash to compute weights matrices for
performing TWAS analysis. We use simulated data of a gene (Y) with expression level in multi-conditions
in N \approx 400 individuals. We want
to first determine the imputability of a genotype matrix X_{N\times P} (P\approx3000) for gene expression level using
cross-validation, then compute the weight matrix for TWAS analysis
input.
Load data-set
X and Y matrices are simulated based on real-world eQTL data sets.
dim(X)
# [1] 400 3277
dim(Y)
# [1] 400 7The multitrait_data includes mixture prior matrices for
the scenario of cross-validation and without cross-validation.
prior_matrices_cv contains a list of data for each fold. In
this example, we have prepared the data for five-fold cross-validation
on the partition of the subjects.
names(multitrait_data)
# [1] "X" "Y" "prior_matrices"
# [4] "prior_matrices_cv" "gwas_sumstats"
str(prior_matrices)
# List of 14
# $ XtX : num [1:7, 1:7] 32.35 24.79 11.51 22.87 9.78 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ tFLASH_default : num [1:7, 1:7] 7.14 9.29 1.05 6.95 1.88 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_default_1: num [1:7, 1:7] 2.123 5.953 -0.694 4.455 1.656 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_default_2: num [1:7, 1:7] 0.788 2.653 0.315 0.793 0.333 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_default_3: num [1:7, 1:7] 34.26 -2.66 -2.41 2.31 1.66 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_default_4: num [1:7, 1:7] 0.1431 0.2182 0.0753 0.6721 0.0267 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ tFLASH_nonneg : num [1:7, 1:7] 56.909 3.505 1.767 -1.225 0.829 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_nonneg_1 : num [1:7, 1:7] 5.119 12.254 0.859 5.919 2.263 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_nonneg_2 : num [1:7, 1:7] 8.367 -0.204 1.176 1.662 0.35 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ FLASH_nonneg_3 : num [1:7, 1:7] 16.68 2.329 -0.428 1.758 1.341 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ PCA_1 : num [1:7, 1:7] 3.13 10.69 1.92 6.1 2.17 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ PCA_2 : num [1:7, 1:7] 0.0806 0.1373 0.6542 0.0108 -0.026 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ PCA_3 : num [1:7, 1:7] 1.2912 0.0777 0.1645 -0.0686 0.2428 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# $ tPCA : num [1:7, 1:7] 29.23 39.53 5.14 33.27 17.35 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
# .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...Cross-Validation to Select Imputable Genes
Note: The code below requires the
mr.mash.alpha package. These chunks are not executed during
the vignette build but are shown for reference.
A sample partition need to be provided if intend to perform
cross-validation. We will first partition samples for 5 fold cross
validation, which will be later used to compute univariate summary
statistics and prior grids, then to fit mr.mash for weight matrix
computation. This step computes sumstats,
prior_grid, and prior weights matrices on the fly to fit
mr.mash. Additionally, we fit mr.mash with
pre-computed mixture prior matrices to compute weight matrices.
A simple sample partition can be obtained as below:
set.seed(999)
sampleid <- rownames(X)
k_fold <- 5
Fold <-sample(rep(1:k_fold, length.out=length(sampleid)))
sample_partition <- data.frame(Sample=sampleid, Fold=Fold)To perform cross-validation, either fold or
sample_partitions must be provided. If both
fold and sample_partitions are provided, we
will partition samples by the provided sample_partitions.
Via cross-validation, we evaluate prediction accuracy to select genes
that are imputable of the gene expression. We will then use these
selected genes to perform TWAS analysis.
methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))
weight <- twas_weights_cv(X=X,Y=Y,fold=5, weight_methods=methods)In the case of providing sample_partitions.
methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))
weight_cv <- twas_weights_cv(X=X,Y=Y,
sample_partition=sample_partition, weight_methods=methods)
str(weight_cv)The wrapper function returns a list of results, including
sample_partition and prediction which is
predicted Y matrices, as well as cross-validation evaluation metrics for
each method across conditions, including corr, adj_rsq, adj_rsq_pval,
RMSE, MAE, and the time_elapsed which is the time spent on
computing weights and performance evaluation on the prediction accuracy
with 5 fold cross-validation.
Imputable genes were selected based on the prediction accuracy threshold of p-values determined by Bonferroni correction and selected RMSE cut-off values.
weight_cv$performanceWe will then compute weights for these selected genes for TWAS analysis.
Compute Weights for Imputable Genes
Here we compute weight for imputable genes without cross-validation.
weight_methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))
weight <- twas_weights(X=X,Y=Y,weight_methods=weight_methods)
head(weight[["mrmash_weights"]])FIXME: Anqi, please add some visualization and comparison with univariate results