Wraps bigsnpr::snp_clumping and takes care of the surrounding boilerplate: it converts a numeric dosage matrix into a bigstatsr::FBM.code256 object and handles the common pitfall of a single-variant input.
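The wrapping steps described above might look roughly like the following. This is a hedged sketch, not the package's actual source: the name clump_sketch, the mapping of missing dosages to raw code 3, and the recoding of character chromosome labels to integers are all assumptions made for illustration. It assumes bigsnpr and bigstatsr are installed.

```r
# Hypothetical sketch of the wrapping steps; NOT the package's actual source.
clump_sketch <- function(X, score, chr, pos, r2 = 0.2, window_kb = 100 / r2) {
  # Single variant: there is nothing to clump against, so keep it.
  if (ncol(X) == 1L) return(1L)

  # Store dosages as raw codes. bigsnpr::CODE_012 decodes raw 0, 1, 2 to
  # 0, 1, 2 and every other code to NA.
  # Assumption: missing dosages are mapped to raw code 3 (decoded as NA).
  codes <- as.raw(ifelse(is.na(X), 3, X))
  G <- bigstatsr::FBM.code256(nrow(X), ncol(X),
                              code = bigsnpr::CODE_012, init = codes)

  # snp_clumping expects integer chromosomes.
  # Assumption: character labels are recoded in order of first appearance.
  chr_int <- if (is.character(chr)) {
    as.integer(factor(chr, levels = unique(chr)))
  } else {
    as.integer(chr)
  }

  # With infos.pos supplied, size is interpreted in kilobases.
  bigsnpr::snp_clumping(G, infos.chr = chr_int, infos.pos = pos,
                        S = score, thr.r2 = r2, size = window_kb)
}
```

The early return for a single column mirrors the documented behaviour of returning 1L; how the real function treats missing dosages or non-numeric chromosome labels is not stated here.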

Usage

ld_clump_by_score(
  X,
  score,
  chr,
  pos,
  r2 = 0.2,
  window_kb = 100/r2,
  verbose = FALSE
)

Arguments

X

Numeric matrix of 0/1/2 allele dosages, n rows by p variants. Column names are expected to be variant IDs but are not required.

score

Numeric vector of length ncol(X). Higher values favour retention during clumping (e.g. -log10 p, |Z|, MAF). May be NULL, in which case bigsnpr falls back to minor allele frequency computed from X.

chr

Integer or character vector of length ncol(X) giving the chromosome for each variant.

pos

Integer vector of length ncol(X) giving the base-pair position for each variant.

r2

Numeric in (0, 1]. Squared-correlation threshold for clumping: a variant is removed when it lies within window_kb of a higher-scoring variant and its squared correlation with that variant exceeds r2. Default 0.2.

window_kb

Numeric. Window size in kilobases. Default is 100 / r2 (500 kb for the default r2 = 0.2), matching the common "ld-clump size = 100/r2" heuristic used in many GWAS pipelines.

verbose

Logical. If TRUE, print the number of retained variants. Default FALSE.
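How score, r2 and window_kb interact can be illustrated with a small greedy procedure in base R. This is a simplified sketch for intuition only and is not bigsnpr's implementation; the function name greedy_clump, the use of Pearson correlation on raw dosages, and the inclusive window boundary are assumptions.

```r
# Illustrative greedy clumping in base R; NOT bigsnpr's implementation.
greedy_clump <- function(X, score, chr, pos, r2 = 0.2, window_kb = 100 / r2) {
  p <- ncol(X)
  if (p == 1L) return(1L)              # a single variant is trivially kept
  keep <- rep(TRUE, p)
  # Visit variants from highest to lowest score.
  for (i in order(score, decreasing = TRUE)) {
    if (!keep[i]) next                 # already removed by a higher-scoring variant
    # Still-kept neighbours on the same chromosome and within the window.
    near <- which(keep & chr == chr[i] & abs(pos - pos[i]) <= window_kb * 1000)
    near <- setdiff(near, i)
    if (length(near) == 0L) next
    r2_i <- as.vector(cor(X[, i], X[, near, drop = FALSE]))^2
    # which() skips NA correlations (e.g. from zero-variance columns).
    keep[near[which(r2_i > r2)]] <- FALSE
  }
  which(keep)
}

# Toy data: columns 1 and 2 are identical, column 3 is uncorrelated with both.
a <- rep(c(0, 1, 2), times = 3)
b <- rep(c(0, 1, 2), each = 3)
X <- cbind(a, a, b)
kept <- greedy_clump(X, score = c(0.5, 0.9, 0.1),
                     chr = rep(1L, 3), pos = c(1000L, 2000L, 3000L))
kept  # columns 2 and 3: column 1 duplicates the higher-scoring column 2
```

Because variants are visited in decreasing score order, a variant can only ever be removed by one with a higher score, which is the behaviour the r2 argument describes.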

Value

An integer vector of indices (into X columns) kept after clumping. For a single-column X, returns 1L.

Examples

if (FALSE) { # \dontrun{
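  set.seed(1)
  n <- 500  # individuals
  p <- 20   # variants
  # Simulated 0/1/2 dosages with allele frequency 0.3
  X <- matrix(rbinom(n * p, 2, 0.3), n, p)
  colnames(X) <- paste0("chr1:", seq_len(p) * 1000, ":A:G")
  s <- runif(p)  # arbitrary scores; higher values favour retention
  chr <- rep(1L, p)
  pos <- seq_len(p) * 1000L
  # Column indices of the variants retained after clumping
  keep <- ld_clump_by_score(X, score = s, chr = chr, pos = pos, r2 = 0.2)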
} # }
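
Since the return value is a vector of column indices, it can be used to subset the dosage matrix and the per-variant vectors consistently. The snippet below is a small sketch that uses a hypothetical stand-in for keep so that it runs without the package; in practice keep would come from ld_clump_by_score().

```r
# Stand-in inputs so the snippet runs on its own.
set.seed(1)
n <- 500
p <- 20
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
s <- runif(p)
chr <- rep(1L, p)
pos <- seq_len(p) * 1000L

# Hypothetical result; in practice use ld_clump_by_score(X, s, chr, pos).
keep <- c(2L, 5L, 11L)

# drop = FALSE keeps a matrix even when only one variant is retained.
X_kept <- X[, keep, drop = FALSE]
s_kept <- s[keep]
chr_kept <- chr[keep]
pos_kept <- pos[keep]
```

Using drop = FALSE matters for the single-variant case, where plain indexing would otherwise collapse the matrix to a vector.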