Performs linkage disequilibrium (LD) pruning using one of two backends. The default "hclust"
backend computes the full correlation matrix, builds a single-linkage
hierarchical clustering on the distance (1 - |cor|), and keeps one
representative column per cluster. The "snprelate" backend delegates
to SNPRelate::snpgdsLDpruning, which performs a sliding-window
greedy prune directly on a temporary GDS file.
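The "hclust" approach described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the helper name prune_by_hclust_sketch is hypothetical, and keeping the first column of each cluster as the representative is an assumption, since the text does not say how the survivor is chosen.

```r
# Sketch only: a base-R approximation of the "hclust" backend.
# The function name and the "first column wins" rule are assumptions.
prune_by_hclust_sketch <- function(X, cor_thres = 0.8) {
  # Full absolute correlation matrix between columns
  abs_cor <- abs(stats::cor(X))
  # Distance 1 - |cor|, so highly correlated columns are close together
  d <- stats::as.dist(1 - abs_cor)
  # Single-linkage hierarchical clustering on that distance
  tree <- stats::hclust(d, method = "single")
  # Cutting at height 1 - cor_thres groups columns linked by |cor| >= cor_thres
  groups <- stats::cutree(tree, h = 1 - cor_thres)
  # Keep the first column encountered in each cluster
  X[, !duplicated(groups), drop = FALSE]
}

set.seed(1)
a <- rnorm(200)
b <- rnorm(200)
# a_copy is a near-duplicate of a; b is independent of both
X <- cbind(a = a, a_copy = a + rnorm(200, sd = 0.01), b = b)
pruned <- prune_by_hclust_sketch(X, cor_thres = 0.8)
colnames(pruned)
```

Because single linkage chains clusters together, two columns can end up in the same group even when their direct |cor| is below the threshold, as long as they are connected through intermediate columns that exceed it.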

Usage

ld_prune_by_correlation(
  X,
  cor_thres = 0.8,
  backend = c("hclust", "snprelate"),
  verbose = FALSE
)

Arguments

- X
Numeric matrix. Columns are the variables to prune (typically SNP genotype dosages); rows are observations.
- cor_thres
Numeric in (0, 1). Absolute correlation threshold. Columns whose pairwise |cor| exceeds this are grouped; one survivor is kept per group. Default 0.8.
- backend
Character, one of "hclust" (default) or "snprelate". Controls the pruning algorithm:
"hclust": Uses the internal hierarchical-clustering approach with Rfast::cora (if available) or base cor().
"snprelate": Requires SNPRelate and gdsfmt. Creates a temporary GDS file and runs SNPRelate::snpgdsLDpruning(method = "corr").
- verbose
Logical. If TRUE, print progress messages. Default FALSE.
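
A vector default such as backend = c("hclust", "snprelate") is the usual R idiom for a choice argument that is resolved with match.arg(), so that omitting the argument selects the first entry. Whether this function does exactly that is an assumption; the helpers below (resolve_backend, missing_snprelate_deps) are hypothetical names used only to illustrate the idiom and the dependency check that the "snprelate" backend implies.

```r
# Hypothetical helper: resolves a choice argument the way match.arg() does.
resolve_backend <- function(backend = c("hclust", "snprelate")) {
  match.arg(backend)
}

# Hypothetical helper: names of the packages the "snprelate" backend
# requires that are not currently installed (empty if both are present).
missing_snprelate_deps <- function() {
  pkgs <- c("SNPRelate", "gdsfmt")
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

default_backend <- resolve_backend()              # first choice is the default
explicit_backend <- resolve_backend("snprelate")  # explicit selection
missing_pkgs <- missing_snprelate_deps()
```

With this idiom an unrecognised value, for example resolve_backend("plink"), raises an error rather than silently falling back to the default.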