Skip to contents

Given cross-validated predictions from multiple TWAS weight methods, learns non-negative combination coefficients (summing to 1) via constrained least squares. Returns ensemble weights and per-method performance metrics.

Usage

ensemble_weights(
  cv_results,
  Y,
  twas_weight_list = NULL,
  context_index = 1,
  solver = c("quadprog", "nnls", "lbfgsb", "glmnet"),
  alpha = 1
)

Arguments

cv_results

Output of twas_weights_cv, with $prediction (named list of method -> out-of-fold prediction matrix, keys like "susie_predicted"). For multi-dataset: a list of such objects.

Y

Observed outcome vector or matrix (samples x contexts). For multi-dataset: a list of vectors/matrices, one per dataset.

twas_weight_list

Optional named list of weight matrices from twas_weights, with keys like "susie_weights". Used to construct the final combined TWAS weight vector. For multi-dataset: a list of such lists (the first is used as the weight template).

context_index

Integer indicating which column of Y to use when Y is a matrix. Default is 1 (univariate).

solver

Character string specifying the optimization backend. One of "quadprog" (default), "nnls", "lbfgsb", or "glmnet". "quadprog" solves a constrained QP with sum-to-1 and non-negativity constraints. "nnls" uses non-negative least squares (Lawson-Hanson algorithm, as in SuperLearner) and normalizes post-hoc. "lbfgsb" uses optim(method = "L-BFGS-B") with non-negativity bounds and normalizes post-hoc. "glmnet" uses cv.glmnet with lower.limits = 0 for penalized non-negative regression, providing automatic method selection via regularization. All solvers fall back to equal weights on failure.

alpha

Elastic net mixing parameter, used only when solver = "glmnet". alpha = 1 (default) is lasso (sparse method selection), alpha = 0 is ridge, and intermediate values give elastic net.

Value

A list with components:

method_coef

Named numeric vector of combination coefficients (\(\zeta_k\)), non-negative and summing to 1. Names are method base names (e.g., "susie", "enet").

ensemble_twas_weights

Final combined weight vector \(w = \sum_k \zeta_k w_k\), or NULL if twas_weight_list is not provided. Returned as a vector for univariate Y, matrix otherwise.

method_performance

Named numeric vector of per-method R-squared computed from out-of-fold CV predictions. Preserved so users can still report individual method performance.

Details

This implements the stacked regression approach of SR-TWAS (Dai et al., Nature Communications, 2024, doi:10.1038/s41467-024-50983-w ). The ensemble provides a principled way to combine predictions from many TWAS weight methods without requiring the user to pick one method a priori or pay a multiple-testing penalty for running several.

For single-dataset usage, pass one twas_weights_cv() result directly. For multi-dataset ensemble (e.g., combining cell types or reference panels such as CUMC1 + MIT), pass a list of twas_weights_cv() results along with a list of observed Y vectors - this learns a single joint set of coefficients.

The stacked regression solves: $$\min_{\zeta} \|y - P\zeta\|^2 \quad \text{s.t.} \quad \zeta_k \geq 0,\ \sum_k \zeta_k = 1$$ where P is the \(n \times K\) matrix of out-of-fold predictions from K methods. Four solver backends are available: "quadprog" enforces both constraints during optimization; "nnls", "lbfgsb", and "glmnet" enforce non-negativity only, then normalize coefficients to sum to 1. The "glmnet" solver additionally applies regularization, which can produce sparse solutions (method selection). If any solver fails, the function falls back to equal weights with a warning.

Methods whose CV predictions have zero variance (e.g., when all weights are zero) are excluded from the optimization and assigned \(\zeta_k = 0\).

Predictions and Y are aligned by sample names (rownames) when available, rather than assuming positional order.

Examples

if (FALSE) { # \dontrun{
# After running twas_weights_pipeline with CV:
res <- twas_weights_pipeline(X, y, cv_folds = 5, weight_methods = methods)

ens <- ensemble_weights(
  cv_results = res$twas_cv_result,
  Y = y,
  twas_weight_list = res$twas_weights
)
ens$method_coef           # combination weights, sum to 1

# Multi-dataset ensemble (e.g., CUMC1 + MIT cell types):
ens_multi <- ensemble_weights(
  cv_results = list(res_cumc$twas_cv_result, res_mit$twas_cv_result),
  Y = list(y_cumc, y_mit),
  twas_weight_list = list(res_cumc$twas_weights, res_mit$twas_weights)
)
} # }