
Given a candidate predictor matrix X and an optional covariate matrix C (which may be unnamed), builds the design matrix [1, X, C] and drops columns of X until the design has full column rank. Columns of X involved in a linear dependency are identified from the column pivot of qr([1, X, C]), and each iteration drops one such column using drop_collinear_columns. If iterative pruning does not reach full rank within max_iterations, the function falls back to ld_prune_by_correlation, trying each threshold in the descending sequence corr_thresholds.
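The QR-pivot step can be illustrated with a minimal sketch. It relies on the fact that base R's `qr()` (LINPACK, limited column pivoting) moves numerically dependent columns past position `rank` in `pivot`. The helper `prune_by_qr_pivot()` is hypothetical and is not the package implementation: in place of drop_collinear_columns it uses a stand-in rule that drops the flagged X column with the smallest variance, and it omits the correlation fallback.

```r
# Hypothetical sketch of the iterative QR-pivot pruning loop.
# Stand-in rule (assumption): among flagged columns of X, drop the one with
# the smallest variance. The real function delegates this to
# drop_collinear_columns().
prune_by_qr_pivot <- function(X, C = NULL, max_iterations = 300L) {
  if (is.null(C)) C <- matrix(numeric(0), nrow(X), 0)  # no covariates
  for (i in seq_len(max_iterations)) {
    D <- cbind(1, X, C)
    q <- qr(D)
    if (q$rank == ncol(D)) return(X)        # full column rank reached
    # Columns pivoted beyond the numerical rank are the dependent ones
    flagged <- q$pivot[(q$rank + 1):ncol(D)]
    # Keep only positions belonging to X (column 1 is the intercept)
    flagged_x <- flagged[flagged >= 2 & flagged <= ncol(X) + 1] - 1
    if (length(flagged_x) == 0) break       # dependency lies outside X
    vars <- apply(X[, flagged_x, drop = FALSE], 2, var)
    X <- X[, -flagged_x[which.min(vars)], drop = FALSE]  # drop one column
  }
  X  # may still be rank-deficient; the real function would fall back here
}
```

With the data from the Examples section, column `d` (= `a + b`) is the one pivoted to the end, so `prune_by_qr_pivot(X, C)` keeps `a`, `b` and `c`.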

Usage

enforce_design_full_rank(
  X,
  C,
  strategy = c("correlation", "variance", "response_correlation"),
  response = NULL,
  max_iterations = 300L,
  corr_thresholds = seq(0.75, 0.5, by = -0.05),
  verbose = FALSE
)

Arguments

X

Numeric matrix with column names (the predictors subject to pruning).

C

Numeric matrix of covariates that are always retained; column names are optional. Pass NULL or a zero-column matrix when there are no covariates.

strategy

Passed through to drop_collinear_columns.

response

Passed through to drop_collinear_columns when strategy = "response_correlation".

max_iterations

Integer. Maximum number of iterations of the pruning loop; each iteration drops one column of X. Default 300L.

corr_thresholds

Numeric vector of |cor| thresholds used for the ld_prune_by_correlation fallback, tried in order. Default seq(0.75, 0.5, by = -0.05).
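The threshold fallback can be sketched as follows: for each threshold, in the order given, prune X by pairwise |cor| and stop at the first threshold for which [1, X, C] has full column rank. Both `greedy_cor_prune()` and `fallback_prune()` are hypothetical helpers; the greedy "keep a column only if its |cor| with every already-kept column is at most the threshold" rule is an assumed stand-in for ld_prune_by_correlation, whose exact rule is not documented here.

```r
# Hypothetical stand-in for ld_prune_by_correlation(): scan columns left to
# right and keep a column only if its |cor| with every kept column is
# <= threshold. Assumes no constant columns (their cor is NA; isTRUE()
# then treats them as failing the check and drops them).
greedy_cor_prune <- function(X, threshold) {
  keep <- integer(0)
  for (j in seq_len(ncol(X))) {
    r <- if (length(keep)) abs(cor(X[, j], X[, keep, drop = FALSE])) else 0
    if (isTRUE(all(r <= threshold))) keep <- c(keep, j)
  }
  X[, keep, drop = FALSE]
}

# Try thresholds in order; return the first pruned X giving a full-rank design.
fallback_prune <- function(X, C = NULL,
                           corr_thresholds = seq(0.75, 0.5, by = -0.05)) {
  if (is.null(C)) C <- matrix(numeric(0), nrow(X), 0)
  Xp <- X
  for (thr in corr_thresholds) {
    Xp <- greedy_cor_prune(X, thr)
    D <- cbind(1, Xp, C)
    if (qr(D)$rank == ncol(D)) return(Xp)
  }
  Xp  # result at the last threshold tried; full rank is not guaranteed
}
```

Lower thresholds prune more aggressively, which is why the sequence is tried from the most permissive value downward: the first threshold that restores full rank removes the fewest columns under this rule.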

verbose

Logical. If TRUE, print per-iteration progress. Default FALSE.

Value

The pruned predictor matrix, i.e. X with the dropped columns removed. Covariates C are never dropped and are not returned.

Examples

set.seed(1)
X <- matrix(rnorm(100 * 4), 100, 4)
X[, 4] <- X[, 1] + X[, 2]          # d = a + b, so [1, X, C] is rank-deficient
colnames(X) <- c("a", "b", "c", "d")
C <- matrix(rnorm(100), 100, 1)
X2 <- enforce_design_full_rank(X, C, strategy = "variance")
qr(cbind(1, X2, C))$rank == ncol(cbind(1, X2, C))
#> [1] TRUE