Skip to contents

This vignette demonstrates how we use pecotmr package to implement mr.mash to compute weights matrices for performing TWAS analysis. We use simulated data of a gene (Y) with expression level in multi-conditions in N \approx 400 individuals. We want to first determine the imputability of a genotype matrix X_{N\times P} (P\approx3000) for gene expression level using cross-validation, then compute the weight matrix for TWAS analysis input.

Set up your environment

Load the pecotmr package.

Load data-set

data(multitrait_data)
attach(multitrait_data)

X and Y matrices are simulated based on real-world eQTL data sets.

dim(X)
# [1]  400 3277
dim(Y)
# [1] 400   7

The multitrait_data includes mixture prior matrices for the scenario of cross-validation and without cross-validation. prior_matrices_cv contains a list of data for each fold. In this example, we have prepared the data for five-fold cross-validation on the partition of the subjects.

names(multitrait_data)
# [1] "X"                 "Y"                 "prior_matrices"   
# [4] "prior_matrices_cv" "gwas_sumstats"
str(prior_matrices)
# List of 14
#  $ XtX            : num [1:7, 1:7] 32.35 24.79 11.51 22.87 9.78 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ tFLASH_default : num [1:7, 1:7] 7.14 9.29 1.05 6.95 1.88 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_default_1: num [1:7, 1:7] 2.123 5.953 -0.694 4.455 1.656 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_default_2: num [1:7, 1:7] 0.788 2.653 0.315 0.793 0.333 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_default_3: num [1:7, 1:7] 34.26 -2.66 -2.41 2.31 1.66 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_default_4: num [1:7, 1:7] 0.1431 0.2182 0.0753 0.6721 0.0267 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ tFLASH_nonneg  : num [1:7, 1:7] 56.909 3.505 1.767 -1.225 0.829 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_nonneg_1 : num [1:7, 1:7] 5.119 12.254 0.859 5.919 2.263 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_nonneg_2 : num [1:7, 1:7] 8.367 -0.204 1.176 1.662 0.35 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ FLASH_nonneg_3 : num [1:7, 1:7] 16.68 2.329 -0.428 1.758 1.341 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ PCA_1          : num [1:7, 1:7] 3.13 10.69 1.92 6.1 2.17 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ PCA_2          : num [1:7, 1:7] 0.0806 0.1373 0.6542 0.0108 -0.026 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ PCA_3          : num [1:7, 1:7] 1.2912 0.0777 0.1645 -0.0686 0.2428 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#  $ tPCA           : num [1:7, 1:7] 29.23 39.53 5.14 33.27 17.35 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...
#   .. ..$ : chr [1:7] "cond_1" "cond_2" "cond_3" "cond_4" ...

Cross-Validation to Select Imputable Genes

Note: The code below requires the mr.mash.alpha package. These chunks are not executed during the vignette build but are shown for reference.

A sample partition need to be provided if intend to perform cross-validation. We will first partition samples for 5 fold cross validation, which will be later used to compute univariate summary statistics and prior grids, then to fit mr.mash for weight matrix computation. This step computes sumstats, prior_grid, and prior weights matrices on the fly to fit mr.mash. Additionally, we fit mr.mash with pre-computed mixture prior matrices to compute weight matrices.

A simple sample partition can be obtained as below:

sampleid <- rownames(X)
k_fold <- 5
Fold <-sample(rep(1:k_fold, length.out=length(sampleid)))
sample_partition <- data.frame(Sample=sampleid, Fold=Fold)

To perform cross-validation, either fold or sample_partitions must be provided. If both fold and sample_partitions are provided, we will partition samples by the provided sample_partitions. Via cross-validation, we evaluate prediction accuracy to select genes that are imputable of the gene expression. We will then use these selected genes to perform TWAS analysis.

methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))
weight <- twas_weights_cv(X=X,Y=Y,fold=5, weight_methods=methods)

In the case of providing sample_partitions.

methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))
weight_cv <- twas_weights_cv(X=X,Y=Y,
                          sample_partition=sample_partition, weight_methods=methods)

str(weight_cv)

The wrapper function returns a list of results, including sample_partition and prediction which is predicted Y matrices, as well as cross-validation evaluation metrics for each method across conditions, including corr, adj_rsq, adj_rsq_pval, RMSE, MAE, and the time_elapsed which is the time spent on computing weights and performance evaluation on the prediction accuracy with 5 fold cross-validation.

Imputable genes were selected based on the prediction accuracy threshold of p-values determined by Bonferroni correction and selected RMSE cut-off values.

weight_cv$performance

We will then compute weights for these selected genes for TWAS analysis.

Compute Weights for Imputable Genes

Here we compute weight for imputable genes without cross-validation.

weight_methods <- list(mrmash_weights = list(data_driven_prior_matrices=prior_matrices))

weight <- twas_weights(X=X,Y=Y,weight_methods=weight_methods)

head(weight[["mrmash_weights"]])

FIXME: Anqi, please add some visualization and comparison with univariate results

detach(multitrait_data)
rm(multitrait_data)