This function computes weights for a transcriptome-wide association study (TWAS). Its steps include filtering variants to those present in the linkage disequilibrium (LD) reference panel, fitting models with SuSiE and other methods, and calculating TWAS weights and predictions. Optionally, it can also cross-validate the TWAS weights.
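The optional cross-validation step can be pictured with a minimal base-R sketch. It is not pecotmr's implementation: `make_partition`, `ridge_weights` and `cv_r2` are hypothetical helpers, ridge regression stands in for whichever methods `weight_methods` specifies, and cross-validated R-squared is assumed here to be the squared Pearson correlation between held-out predictions and observed phenotypes. The partition uses the same `Sample`/`Fold` layout documented for `sample_partition`.

```r
# Hypothetical helper: random, balanced fold assignment in the documented
# Sample/Fold layout. Note that set.seed() resets the global RNG state.
make_partition <- function(sample_ids, cv_folds = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep_len(seq_len(cv_folds), length(sample_ids)))
  data.frame(Sample = sample_ids, Fold = folds, stringsAsFactors = FALSE)
}

# Hypothetical stand-in weight method: closed-form ridge regression.
ridge_weights <- function(X, y, lambda = 1) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  drop(solve(crossprod(Xc) + diag(lambda, ncol(Xc)), crossprod(Xc, yc)))
}

# K-fold cross-validation: fit on the training folds, predict held-out samples.
cv_r2 <- function(X, y, partition, fit_fun = ridge_weights) {
  pred <- numeric(length(y))
  for (k in sort(unique(partition$Fold))) {
    test <- rownames(X) %in% partition$Sample[partition$Fold == k]
    w <- fit_fun(X[!test, , drop = FALSE], y[!test])
    # Center held-out genotypes with the training means before applying weights
    mu <- colMeans(X[!test, , drop = FALSE])
    pred[test] <- sweep(X[test, , drop = FALSE], 2, mu) %*% w + mean(y[!test])
  }
  cor(pred, y)^2  # assumed R-squared definition: squared correlation
}

# Toy data: 200 samples, 20 variants coded 0/1/2, three causal variants
set.seed(42)
n <- 200; p <- 20
X <- matrix(rbinom(n * p, 2, 0.3), n, p,
            dimnames = list(paste0("s", 1:n), paste0("v", 1:p)))
y <- drop(X[, 1:3] %*% c(0.5, -0.4, 0.3)) + rnorm(n)
part <- make_partition(rownames(X), cv_folds = 5)
r2 <- cv_r2(X, y, part)
```

Assigning every sample to exactly one fold means each prediction comes from a model that never saw that sample, which is what makes the resulting R-squared an out-of-sample estimate.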
Usage
twas_weights_pipeline(
  X,
  y,
  susie_fit = NULL,
  fitted_models = NULL,
  cv_folds = 5,
  sample_partition = NULL,
  weight_methods = "default",
  max_cv_variants = -1,
  cv_threads = 1,
  cv_weight_methods = NULL,
  ensemble = TRUE,
  ensemble_r2_threshold = 0.01,
  ensemble_solver = "quadprog",
  ensemble_alpha = 1,
  estimate_pi = TRUE,
  verbose = 1
)
Arguments
- X
A matrix of genotype data where rows represent samples and columns represent genetic variants.
- y
A vector of phenotype measurements for each sample.
- susie_fit
An object returned by the SuSiE function, containing the SuSiE model fit.
- fitted_models
Optional named list of fitted fine-mapping models, such as list(susie = susie_fit, susie_inf = susie_inf_fit).
- cv_folds
The number of folds to use for cross-validation. Set to 0 to skip cross-validation. Defaults to 5.
- sample_partition
Optional data frame with Sample and Fold columns for cross-validation. If NULL, a random partition is generated.
- weight_methods
List of methods used to compute TWAS weights, along with their parameters.
- max_cv_variants
The maximum number of variants to be included in cross-validation. Defaults to -1 which means no limit.
- cv_threads
The number of threads to use for parallel computation in cross-validation. Defaults to 1.
- cv_weight_methods
List of methods to use for cross-validation. If NULL, uses the same methods as weight_methods.
- ensemble
Logical. If TRUE and cv_folds > 1, learn ensemble combination weights via stacked regression (SR-TWAS). Requires at least two individual methods to have been run and to pass the R-squared cutoff set by ensemble_r2_threshold. Defaults to TRUE.
- ensemble_r2_threshold
Minimum cross-validated R-squared for an individual method to be included in the ensemble. Methods below this threshold are excluded. Defaults to 0.01.
- ensemble_solver
Character string specifying the optimization backend for ensemble learning: one of "quadprog", "nnls", "lbfgsb", or "glmnet". Passed to ensemble_weights. Defaults to "quadprog".
- ensemble_alpha
Elastic net mixing parameter, used only when ensemble_solver = "glmnet". Defaults to 1 (lasso).
- estimate_pi
Logical. If TRUE, estimate the spike-and-slab sparsity from mr.ash before running the Bayesian alphabet methods that require prior inclusion probabilities.
- verbose
Integer controlling verbosity level: 0 = suppress all messages, 1 = show pecotmr messages but suppress external package messages (default), 2 = show all messages including those from external packages.
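The stacked-regression ensemble controlled by ensemble, ensemble_r2_threshold and ensemble_solver can be illustrated with a base-R sketch. This is not the ensemble_weights implementation: `stack_weights` is a hypothetical helper that drops methods below the R-squared threshold, requires at least two survivors, and then finds non-negative least-squares weights with base R's `optim` using L-BFGS-B bounds. It assumes cross-validated predictions are stored one column per method and that y is centered, so no intercept is fitted; whether pecotmr imposes further constraints (for example, weights summing to one) is not assumed here.

```r
# Hypothetical helper: non-negative stacking weights over CV predictions.
stack_weights <- function(cv_pred, y, cv_r2, r2_threshold = 0.01) {
  keep <- names(cv_r2)[cv_r2 >= r2_threshold]
  if (length(keep) < 2) return(NULL)  # an ensemble needs at least two methods
  P <- cv_pred[, keep, drop = FALSE]
  # Squared-error loss of the combined held-out prediction, and its gradient
  loss <- function(w) sum((y - P %*% w)^2)
  grad <- function(w) drop(-2 * crossprod(P, y - P %*% w))
  fit <- optim(rep(1 / length(keep), length(keep)), loss, grad,
               method = "L-BFGS-B", lower = 0)  # weights bounded below by 0
  setNames(fit$par, keep)
}

# Toy CV predictions from three hypothetical methods; "noise" carries no signal
set.seed(7)
n <- 300
y <- rnorm(n)
cv_pred <- cbind(
  susie = y + rnorm(n, sd = 1),
  enet  = y + rnorm(n, sd = 1.5),
  noise = rnorm(n)
)
r2 <- apply(cv_pred, 2, function(p) cor(p, y)^2)
w <- stack_weights(cv_pred, y, r2)
```

Fitting the combination on held-out predictions rather than in-sample fits is what keeps the ensemble from simply rewarding the method that overfits most.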