Given cross-validated predictions from multiple TWAS weight methods, learns non-negative combination coefficients (summing to 1) via constrained least squares. Returns ensemble weights and per-method performance metrics.
Usage
ensemble_weights(
cv_results,
Y,
twas_weight_list = NULL,
context_index = 1,
solver = c("quadprog", "nnls", "lbfgsb", "glmnet"),
alpha = 1
)Arguments
- cv_results
Output of
twas_weights_cv, with$prediction(named list of method -> out-of-fold prediction matrix, keys like"susie_predicted"). For multi-dataset: a list of such objects.- Y
Observed outcome vector or matrix (samples x contexts). For multi-dataset: a list of vectors/matrices, one per dataset.
- twas_weight_list
Optional named list of weight matrices from
twas_weights, with keys like"susie_weights". Used to construct the final combined TWAS weight vector. For multi-dataset: a list of such lists (the first is used as the weight template).- context_index
Integer indicating which column of Y to use when Y is a matrix. Default is 1 (univariate).
- solver
Character string specifying the optimization backend. One of
"quadprog"(default),"nnls","lbfgsb", or"glmnet"."quadprog"solves a constrained QP with sum-to-1 and non-negativity constraints."nnls"uses non-negative least squares (Lawson-Hanson algorithm, as in SuperLearner) and normalizes post-hoc."lbfgsb"usesoptim(method = "L-BFGS-B")with non-negativity bounds and normalizes post-hoc."glmnet"usescv.glmnetwithlower.limits = 0for penalized non-negative regression, providing automatic method selection via regularization. All solvers fall back to equal weights on failure.- alpha
Elastic net mixing parameter, used only when
solver = "glmnet".alpha = 1(default) is lasso (sparse method selection),alpha = 0is ridge, and intermediate values give elastic net.
Value
A list with components:
- method_coef
Named numeric vector of combination coefficients (\(\zeta_k\)), non-negative and summing to 1. Names are method base names (e.g.,
"susie","enet").- ensemble_twas_weights
Final combined weight vector \(w = \sum_k \zeta_k w_k\), or NULL if
twas_weight_listis not provided. Returned as a vector for univariate Y, matrix otherwise.- method_performance
Named numeric vector of per-method R-squared computed from out-of-fold CV predictions. Preserved so users can still report individual method performance.
Details
This implements the stacked regression approach of SR-TWAS (Dai et al., Nature Communications, 2024, doi:10.1038/s41467-024-50983-w ). The ensemble provides a principled way to combine predictions from many TWAS weight methods without requiring the user to pick one method a priori or pay a multiple-testing penalty for running several.
For single-dataset usage, pass one twas_weights_cv() result directly.
For multi-dataset ensemble (e.g., combining cell types or reference panels
such as CUMC1 + MIT), pass a list of twas_weights_cv() results along
with a list of observed Y vectors - this learns a single joint set of
coefficients.
The stacked regression solves:
$$\min_{\zeta} \|y - P\zeta\|^2 \quad \text{s.t.} \quad \zeta_k \geq 0,\ \sum_k \zeta_k = 1$$
where P is the \(n \times K\) matrix of out-of-fold predictions from K
methods. Four solver backends are available: "quadprog" enforces
both constraints during optimization; "nnls", "lbfgsb", and
"glmnet" enforce non-negativity only, then normalize coefficients
to sum to 1. The "glmnet" solver additionally applies
regularization, which can produce sparse solutions (method selection).
If any solver fails, the function falls back to equal weights with a
warning.
Methods whose CV predictions have zero variance (e.g., when all weights are zero) are excluded from the optimization and assigned \(\zeta_k = 0\).
Predictions and Y are aligned by sample names (rownames) when available, rather than assuming positional order.
Examples
if (FALSE) { # \dontrun{
# After running twas_weights_pipeline with CV:
res <- twas_weights_pipeline(X, y, cv_folds = 5, weight_methods = methods)
ens <- ensemble_weights(
cv_results = res$twas_cv_result,
Y = y,
twas_weight_list = res$twas_weights
)
ens$method_coef # combination weights, sum to 1
# Multi-dataset ensemble (e.g., CUMC1 + MIT cell types):
ens_multi <- ensemble_weights(
cv_results = list(res_cumc$twas_cv_result, res_mit$twas_cv_result),
Y = list(y_cumc, y_mit),
twas_weight_list = list(res_cumc$twas_weights, res_mit$twas_weights)
)
} # }