ColocBoost analysis with optional pipeline QC
Source: R/colocboost_pipeline.R, colocboost_analysis.Rd
This wrapper accepts the same arguments as colocboost(). All
ColocBoost inputs and model parameters are supplied through the ... argument. When
no QC options are requested, the call is passed directly to
colocboost(). When QC options are requested, the wrapper
inspects the named X/Y and/or sumstat/LD/X_ref
arguments in ..., runs the relevant reusable QC step, and then calls
colocboost() on the cleaned inputs. If the required named inputs are not
available, QC is skipped with a warning and colocboost() is called with the
original inputs.
Usage
colocboost_analysis(
  ...,
  missing_rate_thresh = NULL,
  maf_cutoff = NULL,
  xvar_cutoff = NULL,
  ld_reference_meta_file = NULL,
  pip_cutoff_to_skip_ind = NULL,
  keep_indel = TRUE,
  pip_cutoff_to_skip_sumstat = NULL,
  qc_method = NULL,
  impute = FALSE,
  impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
  LD_reference_info = NULL,
  variant_convention = c("A2_A1", "A1_A2")
)
Arguments
- ...
Arguments passed to colocboost(), including data inputs such as X, Y, sumstat, LD, X_ref, dict_YX, dict_sumstatLD, outcome_names, and all ColocBoost model/post-processing options. QC can only inspect inputs that are supplied by name.
- missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind
Individual-level QC controls. If all are NULL, individual-level QC is not run.
- keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts
Summary-statistic QC controls. qc_method = "none" runs basic allele harmonization without LD-mismatch outlier detection. Imputation is only run when impute = TRUE.
- LD_reference_info
Optional LD reference information for summary-statistic QC. This is only needed when the native LD matrix row/column names or X_ref column names are missing or are not parseable genomic variant IDs. It can be a .bim/.pvar/.pvar.zst file path, a data.frame with variant metadata, or a load_LD_matrix() result. This is a QC-only argument and is not passed to colocboost().
- variant_convention
Allele order used by native ColocBoost-style sumstat$variant and LD/X_ref names when deriving QC inputs: "A2_A1" for the pecotmr canonical chr:pos:A2:A1, or "A1_A2" for chr:pos:A1:A2.
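To make the two allele-order conventions concrete, the base R sketch below splits IDs of the form chr:pos:a:b into columns and assigns A1/A2 according to variant_convention. The helper parse_variant_ids() is hypothetical and is not part of pecotmr; it assumes exactly four colon-separated fields and stops otherwise, which loosely mirrors the "not parseable" case in which LD_reference_info would be needed.

```r
# Hypothetical helper (not part of pecotmr): split "chr:pos:a:b" variant IDs
# into columns, assigning A1/A2 according to the allele-order convention.
parse_variant_ids <- function(ids, convention = c("A2_A1", "A1_A2")) {
  convention <- match.arg(convention)
  parts <- strsplit(ids, ":", fixed = TRUE)
  # Assumption: only IDs with exactly four fields count as parseable here.
  ok <- vapply(parts, length, integer(1)) == 4L
  if (!all(ok)) {
    stop("Unparseable variant IDs: ", paste(ids[!ok], collapse = ", "))
  }
  fields <- do.call(rbind, parts)  # character matrix, one row per ID
  third <- fields[, 3]
  fourth <- fields[, 4]
  data.frame(
    chrom = fields[, 1],
    pos = as.integer(fields[, 2]),
    # "A2_A1": chr:pos:A2:A1, so A1 is the 4th field; "A1_A2": the reverse.
    A1 = if (convention == "A2_A1") fourth else third,
    A2 = if (convention == "A2_A1") third else fourth,
    stringsAsFactors = FALSE
  )
}

parse_variant_ids(c("chr1:12345:G:A", "chr1:12400:T:C"), "A2_A1")
```

The same string read under "A1_A2" swaps A1 and A2, which is why a mismatched convention would silently flip effect directions rather than raise an error.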
Details
Use colocboost_analysis() the same way you would use
colocboost(): pass the native ColocBoost arguments by
name, for example X, Y, sumstat, LD,
X_ref, dict_YX, dict_sumstatLD,
outcome_names, focal_outcome_idx, effect_est,
effect_se, effect_n, M, and other ColocBoost model or
post-processing options. These arguments are forwarded unchanged unless one
or more QC controls are requested.
Individual-level QC is only attempted when at least one individual QC control
is non-NULL and both X and Y are passed by name through the ... argument.
Summary-statistic QC is only attempted when qc_method,
pip_cutoff_to_skip_sumstat, or LD_reference_info is supplied, or
impute = TRUE, and when a named sumstat is available together with at least
one of LD, X_ref, or LD_reference_info.
qc_method = "none" runs basic allele/variant harmonization
only; it does not run SLALOM or DENTIST
LD-mismatch QC. RAISS imputation is controlled separately by
impute = TRUE.
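As a generic illustration of what basic allele harmonization usually involves, the base R sketch below aligns summary statistics to reference alleles: variants whose alleles match are kept, variants whose alleles are swapped have their z-score sign flipped, and variants with incompatible alleles are dropped. This is a common textbook approach, not the pecotmr implementation; the helper harmonize_alleles(), the variant_key column, and the z column are hypothetical, and strand flips are deliberately ignored.

```r
# Generic sketch of basic allele harmonization (NOT the pecotmr implementation).
# ss:  data.frame(variant_key, A1, A2, z)   summary statistics
# ref: data.frame(variant_key, A1, A2)      reference (e.g. LD panel) alleles
harmonize_alleles <- function(ss, ref) {
  m <- merge(ss, ref, by = "variant_key", suffixes = c("", ".ref"))
  same <- m$A1 == m$A1.ref & m$A2 == m$A2.ref
  swapped <- m$A1 == m$A2.ref & m$A2 == m$A1.ref
  # A swapped allele order reverses the direction of effect.
  m$z[swapped] <- -m$z[swapped]
  # Drop variants whose alleles match neither orientation (strand flips ignored).
  out <- m[same | swapped, c("variant_key", "A1.ref", "A2.ref", "z")]
  names(out) <- c("variant_key", "A1", "A2", "z")
  rownames(out) <- NULL
  out
}

ss <- data.frame(variant_key = c("1:100", "1:200", "1:300"),
                 A1 = c("A", "G", "T"), A2 = c("G", "A", "C"),
                 z = c(2, 3, 1.5), stringsAsFactors = FALSE)
ref <- data.frame(variant_key = c("1:100", "1:200", "1:300"),
                  A1 = c("A", "A", "G"), A2 = c("G", "G", "A"),
                  stringsAsFactors = FALSE)
harmonize_alleles(ss, ref)
```

Here 1:100 is kept as-is, 1:200 is kept with its z-score negated, and 1:300 is dropped. LD-mismatch outlier detection would be a separate step on top of this.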
If no QC controls are supplied, this function is a thin direct call to
colocboost(...).
When QC removes outcomes, outcome_names and focal_outcome_idx
are updated to match the post-QC outcome order. If the requested focal outcome
is removed by QC, focal_outcome_idx is set to NULL with a
warning.
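The outcome bookkeeping described above amounts to subsetting outcome_names and re-indexing focal_outcome_idx against the outcomes that survive QC. The base R sketch below shows that step under the assumption that QC reports the surviving outcomes as a vector of original indices (kept_idx); remap_outcomes() and kept_idx are hypothetical names, not pecotmr functions.

```r
# Hypothetical sketch (not the pecotmr source): after QC drops some outcomes,
# subset outcome_names and re-index focal_outcome_idx to the kept outcomes.
# kept_idx: original indices of outcomes retained by QC, in post-QC order.
remap_outcomes <- function(outcome_names, focal_outcome_idx, kept_idx) {
  new_names <- outcome_names[kept_idx]
  new_focal <- NULL
  if (!is.null(focal_outcome_idx)) {
    # Position of the original focal outcome within the kept outcomes.
    pos <- match(focal_outcome_idx, kept_idx)
    if (is.na(pos)) {
      warning("Focal outcome was removed by QC; setting focal_outcome_idx to NULL.")
    } else {
      new_focal <- pos
    }
  }
  list(outcome_names = new_names, focal_outcome_idx = new_focal)
}

# Outcome "b" is removed; the focal outcome "c" moves from index 3 to 2.
remap_outcomes(c("a", "b", "c", "d"), focal_outcome_idx = 3, kept_idx = c(1, 3, 4))
```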
Examples
if (FALSE) { # \dontrun{
# Direct ColocBoost call without QC.
fit <- colocboost_analysis(X = X, Y = Y, M = 500)
# Summary-statistic input with basic allele/variant harmonization only.
fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
                           qc_method = "none", M = 500)
# Summary-statistic input with LD-mismatch QC and RAISS imputation.
fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
                           qc_method = "slalom", impute = TRUE)
# Use richer LD metadata from load_LD_matrix() for QC, while still passing
# ColocBoost's native LD input.
ld_data <- load_LD_matrix(ld_meta_file, region)
fit <- colocboost_analysis(sumstat = sumstat, LD = ld_data$LD_matrix,
                           LD_reference_info = ld_data, qc_method = "none")
# Individual-level input with explicit genotype QC thresholds.
fit <- colocboost_analysis(X = X, Y = Y,
                           missing_rate_thresh = 0.1,
                           maf_cutoff = 0.0005)
} # }