Multi-functional, multi-outcome SuSiE

Fit the multi-functional, multi-outcome Sum of Single Effects (mfSuSiE) regression model. Each outcome m carries a per-sample functional response Y_m of length T_m; T_m may differ across outcomes, and either T_m = 1 (univariate response) or T_m > 1 (functional response on a regular grid) is supported. mfsusieR runs a per-outcome wavelet decomposition, then dispatches into susieR's IBSS workhorse with mfsusieR-specific S3 methods that handle the per-(scale, outcome) mixture-of-normals prior and residual variance.

Usage

mfsusie(
  X,
  Y,
  pos = NULL,
  L = 20,
  prior_variance_grid = NULL,
  prior_variance_scope = c("per_outcome", "per_scale"),
  null_prior_weight = 2,
  cross_outcome_prior = NULL,
  prior_weights = NULL,
  residual_variance = NULL,
  residual_variance_scope = c("per_outcome", "per_scale"),
  standardize = TRUE,
  intercept = TRUE,
  max_iter = 50,
  tol = 1e-04,
  coverage = 0.95,
  min_abs_corr = 0.5,
  L_greedy = 5,
  greedy_lbf_cutoff = 0.1,
  estimate_prior_variance = TRUE,
  convergence_method = c("pip", "elbo"),
  pip_stall_window = 5L,
  estimate_residual_variance = TRUE,
  verbose = FALSE,
  track_fit = FALSE,
  max_padded_log2 = 10,
  wavelet_basis_order = 10,
  wavelet_family = "DaubLeAsymm",
  wavelet_magnitude_cutoff = 0,
  wavelet_qnorm = TRUE,
  control_mixsqp = NULL,
  mixsqp_null_penalty = 0.1,
  mixsqp_alpha_eps = 1e-06,
  model_init = NULL,
  small_sample_correction = FALSE,
  attach_smoothing_inputs = TRUE
)

Arguments

X: numeric matrix n x p of covariates (e.g., genotype dosages).
Y: list of length M; each element a numeric matrix n x T_m (or a length-n vector when T_m = 1).
pos: optional list of length M; each element a numeric vector of length T_m recording the sampling positions for that outcome. Defaults to seq_len(T_m) per outcome.
L: integer, maximum number of single effects to fit. Default 20. With the default L_greedy = 5, the IBSS workhorse grows the effect count from 5 toward this cap in steps of 5 until the fit saturates (min(lbf) < greedy_lbf_cutoff); set L_greedy = NULL to fit the full L effects in one round.
prior_variance_grid: optional length-K vector of mixture variances for the scale-mixture-of-normals prior. When NULL, the data-driven path (compute_marginal_bhat_shat + ash) selects the grid per outcome.
prior_variance_scope: "per_outcome" (default) or "per_scale". Controls whether the mixture-weight matrix pi_V[[m]] collapses across scales (one weight vector per outcome) or keeps a separate row per wavelet scale. The per-outcome scope is the default because it is roughly S_m-fold cheaper per IBSS iter (M mixsqp calls per iter instead of M*S_m); switch to "per_scale" only when you need scale-specific mixture weights for power on shape- varying signals.
null_prior_weight: numeric, weight on the null prior component. Default 2.
cross_outcome_prior: optional object that controls how the per-outcome log-Bayes factors are combined into the joint log-Bayes factor used by the SER step. Each IBSS effect first computes a length-p log-BF per outcome m (one entry per variable, summed over wavelet positions); these are then reduced to a single length-p joint log-BF before the posterior alpha is taken as a softmax over variables. Defaults to outcome independence (cross_outcome_prior_independent()), under which the joint log-BF is the elementwise sum of the per-outcome log-BFs; equivalently, the joint BF is the product of per-outcome BFs. The argument is an extension point for non-independence combiners (e.g. modality-covariance priors) registered as S3 methods on combine_outcome_lbfs; only the independence combiner is shipped, so most users leave this NULL.
prior_weights: optional length-p numeric vector, the variable-selection prior. Defaults to uniform 1/p.
residual_variance: optional list of length M, initial residual variance per outcome. Defaults to the per-outcome sample variance.
residual_variance_scope: "per_outcome" (default) or "per_scale". Controls the sigma2 update shape: a single scalar per outcome (default) or one scalar per wavelet scale per outcome.
standardize: logical, scale X columns to unit variance.
intercept: logical, center X columns to mean zero.
max_iter: integer, maximum IBSS iterations. Default 50.
tol: numeric, ELBO change tolerance for convergence.
coverage: numeric in (0, 1), credible-set coverage.
min_abs_corr: numeric, minimum variable-to-variable correlation inside a credible set (CS purity threshold).
L_greedy: integer or NULL. When non-NULL (default 5), run susieR's greedy outer loop, growing the effect count from L_greedy up to L in linear steps until the fit saturates (min(lbf) < greedy_lbf_cutoff). Set NULL to fit the full L effects in one round.
greedy_lbf_cutoff: numeric saturation threshold for the greedy outer loop. Default 0.1.
estimate_prior_variance: logical. When TRUE (default), run the per-effect empirical-Bayes update of the mixture weights pi_V[[m]] per (outcome, scale) using the mixsqp solver. When FALSE, hold the prior fixed at the initial prior_variance_grid / null_prior_weight; useful when collapsing mfsusie() to susieR::susie() for sanity checks.
convergence_method: one of "pip" (default) or "elbo". Selects the IBSS convergence criterion. "pip" stops when max(abs(prev_alpha - alpha)) falls below tol (PIP-difference convergence) with stall detection – the default because alpha is robust to the small per-iteration ELBO oscillations that arise from mixsqp's approximate M-step (Generalized EM residual: ELBO(t+1) >= ELBO(t) - O(eps_mixsqp * L * M * S_m), Neal & Hinton 1998). "elbo" stops when the change in ELBO falls below tol; the per-iteration ELBO is a coherent variational free energy thanks to the get_objective.mfsusie post-iteration KL refresh (refresh_lbf_kl.mf_individual) which evaluates per- effect KL[l] against the iter-final pi_V rather than the pi_V state at the moment effect l was updated.
pip_stall_window: integer. Number of consecutive iterations without PIP-difference improvement after which the PIP-based convergence path declares convergence even when tol has not been reached. Default 5. Only consulted when convergence_method = "pip" (or when ELBO produces non-finite values and the PIP fallback fires).
estimate_residual_variance: logical. When TRUE (default), update sigma2 per IBSS iteration via update_variance_components. When FALSE, hold sigma2 at the initial value across iterations; useful for sensitivity analysis.
verbose: logical.
track_fit: logical, retain a per-iteration tracking list on the fit. Default FALSE.
max_padded_log2: integer, log2 cap on the post-remap grid length per outcome. Default 10.
wavelet_basis_order: integer; selects the wavelet basis member within wavelet_family. For Daubechies families this equals the number of vanishing moments (higher = smoother, longer-support filter). Forwarded to wavethresh::wd's filter.number; see also filter.select. Default 10.
wavelet_family: character; selects the wavelet family. Forwarded to wavethresh::wd's family. Default "DaubLeAsymm" (Daubechies least-asymmetric, a.k.a. Symmlet).
wavelet_magnitude_cutoff: non-negative numeric. After the wavelet decomposition, each wavelet column t of the response has its median absolute value median(|Y_wd[, t]|) compared against this cutoff; columns at or below are zeroed in the data and treated as Bhat = 0, Shat = 1 at every IBSS iteration. Useful for sparse-coverage assays where high-frequency wavelet coefficients are dominated by zeros. Default 0; only columns with a strictly-zero absolute-value median are masked.
wavelet_qnorm: logical. When TRUE (default), applies a column-wise rank-based normal quantile transform to the wavelet-domain response before the IBSS loop. Useful for non-Gaussian wavelet coefficients arising from heavy-tailed assays.
control_mixsqp: optional named list of mixsqp control arguments forwarded to the per-(outcome, scale) M-step.
mixsqp_null_penalty: numeric, per-coefficient pseudo-count on the null component of the mixsqp M-step. The total null pseudo-count fed to mixsqp is mixsqp_null_penalty * idx_size, versus a total data weight of idx_size from the zeta- weighted variable rows; the ratio mixsqp_null_penalty : 1 is therefore the null:data weight ratio at the M-step. Default 0.1 corresponds to a 10% null:data ratio, anchored to ashr::ash.workhorse(nullweight = 10) over a typical idx_size = 128 (ratio ~0.08). For multi-outcome fits the penalty is scaled by the number of outcomes M so that the null:data balance stays invariant as the per-effect variable posterior concentrates linearly in M; single-outcome fits are unchanged. Default 0.1.
mixsqp_alpha_eps: numeric, threshold below which a variable's per-effect posterior alpha[l, j] is dropped from the mixsqp M-step input. The truncation error on the M-step gradient is bounded by sum_{j outside} alpha_j * max_k(L_jk), well under floating-point precision for typical concentrated SuSiE posteriors. Set to 0 to use every variable regardless of alpha[l, j]. Default 1e-6.
model_init: optional mfsusie fit object from a prior call. When supplied, the IBSS loop is seeded from the supplied alpha, mu, mu2, KL, lbf, V, pi_V, sigma2, fitted, and intercept rather than the cold- start zero state. The supplied fit must have the same L as the requested L; otherwise the call errors. Useful for resuming a long-running fit after a per-call iteration budget. Default NULL (cold start).
small_sample_correction: logical. When TRUE, replaces the per-variable Wakefield Normal marginal Bayes factor in the SER step with a Johnson 2005 scaled Student-t marginal with df = n - 1 degrees of freedom. Useful for small sample sizes where the Wakefield approximation under-propagates residual-variance uncertainty into the per-variable BF, inflating PIPs at null variants. The correction acts on variable selection probabilities only; posterior moments given inclusion are unchanged. Default FALSE.
attach_smoothing_inputs: logical. When TRUE (default), the fit carries Y_grid (post-remap position-space Y) and X_eff (per-effect alpha-weighted aggregate of X) so that mf_post_smooth(fit) runs without re-passing the data. Set FALSE to drop these and call mf_post_smooth(fit, X = X, Y = Y, ...) instead; useful when sharing fits where the per-individual data should not travel with the fit.

Value

A list of class c("mfsusie", "susie") carrying:

alpha: L x p posterior inclusion probabilities.
mu, mu2: list[L] of list[M] of p x T_basis[m] matrices: per-effect, per-outcome posterior mean and second moment in the wavelet domain.
pi_V: list[M] of S_m x K mixture-weight matrices.
G_prior: list[M] of list[S_m] ash-shaped prior records (mutated by the IBSS loop).
sigma2: list[M]. Element m is either a scalar (when residual_variance_scope = "per_outcome", the default) or a length-S_m vector of per-scale residual variances (when residual_variance_scope = "per_scale"). The richer per-scale shape is mfsusieR-specific; downstream code reading sigma2 should handle both.
elbo: numeric vector of ELBO values per iteration.
niter, converged: IBSS termination metadata.
pip: length-p posterior inclusion probabilities.
sets: credible sets via susie_get_cs.
fitted: list[M] of running per-outcome fits in the wavelet domain. Used by model_init to warm-start a follow-up call.
Y_grid: list[M], the post-remap position-space response on the padded grid. Attached when attach_smoothing_inputs = TRUE (default).
X_eff: list[L] of per-effect alpha-weighted X aggregates. Attached when attach_smoothing_inputs = TRUE.
lbf_variable_outcome: L x p x M array of per-(effect, variant, outcome) log Bayes factors. Parallels lbf_variable (which is L x p, the joint composite summed across scales and outcomes); lbf_variable_outcome keeps the M axis intact. Always attached; consumed by susie_post_outcome_configuration(fit, by = "outcome").

References

Manuscript: methods/online_method.tex (mfsusieR algorithm).