Imputes missing values in a numeric matrix by iteratively fitting one XGBoost model per column on the observed entries and predicting the missing ones. Columns that are entirely missing are removed. Missing entries are initialized with column means before the first iteration.
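
The two preprocessing steps described above (dropping all-NA columns, then filling the remaining gaps with column means) can be sketched in base R. This is only an illustration under assumptions, not the package's internal code: `init_impute` is a hypothetical helper name, and the order of the two steps is assumed.

```r
# Hypothetical helper, not part of the package: illustrates the preprocessing
# described above using base R only.
init_impute <- function(data) {
  stopifnot(is.matrix(data), is.numeric(data))
  miss <- is.na(data)

  # Drop columns that have no observed values at all.
  keep <- colSums(!miss) > 0
  data <- data[, keep, drop = FALSE]
  miss <- miss[, keep, drop = FALSE]

  # Fill each remaining NA with the mean of the observed values in its column.
  col_means <- colMeans(data, na.rm = TRUE)
  data[miss] <- col_means[col(data)[miss]]

  # Return the filled matrix plus the mask of originally missing cells,
  # which later iterations would need in order to know what to re-predict.
  list(data = data, missing = miss)
}

m <- cbind(a = c(1, NA, 3), b = c(NA, NA, NA), c = c(2, 4, NA))
init <- init_impute(m)
init$data
```

Column `b` is dropped because it has no observed values, which is why the result can have fewer columns than the input.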

Usage

xgboost_imputation(
  data,
  maxiter = 10L,
  max_depth = 2L,
  nrounds = 50L,
  decreasing = FALSE,
  num_workers = 1L,
  verbose = TRUE
)
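
A minimal call might look like the sketch below. The toy matrix and the chosen argument values are illustrative assumptions; the call is guarded so the snippet also runs when the package that provides `xgboost_imputation()` is not attached.

```r
# Toy input: a 100 x 5 numeric matrix with exactly 10% of entries set to NA.
set.seed(1)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
x[sample(length(x), 50)] <- NA

# Only call the function if it is available in the current session.
if (exists("xgboost_imputation", mode = "function")) {
  imputed <- xgboost_imputation(
    x,
    maxiter = 5L,
    nrounds = 25L,
    verbose = FALSE
  )
  dim(imputed)
}
```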

Arguments

data

Numeric matrix with missing values (NA).

maxiter

Maximum number of imputation iterations (default 10).

max_depth

Maximum tree depth for XGBoost (default 2).

nrounds

Number of boosting rounds per variable (default 50).

decreasing

Logical. If TRUE, impute the variables with the most missing values first. Default FALSE (fewest missing values first).
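
The effect of `decreasing` on the visiting order can be illustrated in base R. This sketch assumes that columns are ranked by their NA count and that complete columns need no model; the variable names are hypothetical and the package may order columns differently in detail.

```r
x <- cbind(a = c(1, NA, NA), b = c(1, 2, 3), c = c(NA, 2, 3))

# Number of missing values per column: a = 2, b = 0, c = 1.
n_missing <- colSums(is.na(x))

# Assumption: only columns with at least one NA are imputed.
to_impute <- names(n_missing)[n_missing > 0]

# decreasing = FALSE: fewest missing values first.
order_increasing <- to_impute[order(n_missing[to_impute], decreasing = FALSE)]

# decreasing = TRUE: most missing values first.
order_decreasing <- to_impute[order(n_missing[to_impute], decreasing = TRUE)]
```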

num_workers

Number of parallel workers for BiocParallel. Default 1 (sequential).
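
How a worker count can translate into a BiocParallel backend is sketched below. Which backend the function actually selects is not stated here, so the choice between `SerialParam()` and `SnowParam()` is an assumption, and `square` is a stand-in task. The snippet falls back to `lapply()` when BiocParallel is not installed.

```r
num_workers <- 1L  # raise above 1 to request parallel workers
square <- function(i) i^2  # stand-in for per-column work

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # Assumed mapping: one worker means sequential, more means a socket cluster.
  param <- if (num_workers > 1L) {
    BiocParallel::SnowParam(workers = num_workers)
  } else {
    BiocParallel::SerialParam()
  }
  res <- BiocParallel::bplapply(1:4, square, BPPARAM = param)
} else {
  # Fallback so the sketch runs without BiocParallel.
  res <- lapply(1:4, square)
}
```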

verbose

Logical. If TRUE, print progress messages (default TRUE).

Value

The imputed matrix, with the same number of rows as the input and the same columns except any that were entirely NA.