This function performs quality control on the processed summary statistics
using the specified method. It wraps ld_mismatch_qc and handles
subsetting of the summary statistics and LD matrix.
Usage
summary_stats_qc(
sumstats,
LD_data,
n = NULL,
method = c("slalom", "dentist"),
rss_input = NULL,
keep_indel = TRUE,
skip_region = NULL,
pip_cutoff_to_skip = 0,
qc_method = NULL,
impute = FALSE,
impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
study = NULL,
var_y = NULL,
return_on_skip = c("null", "preprocess"),
R_finite = NULL,
R_mismatch = NULL
)
Arguments
- sumstats
A data frame containing the processed summary statistics.
- LD_data
An LDData S4 object or a legacy list containing combined LD variant data, as generated by load_LD_matrix.
- n
Sample size for the LD reference panel (used by the dentist method).
- method
The quality control method to use. Options are "slalom" or "dentist" (default: "slalom").
- rss_input
Optional loaded RSS input: either a single load_rss_data() result or a named list of them. Supplying this selects the additional combined RSS/ColocBoost QC workflow. A single RSS record is detected by its structure, not by the name sumstats, so a multi-study list may safely include a study named "sumstats".
- keep_indel, skip_region, pip_cutoff_to_skip, qc_method, impute, impute_opts, study, var_y, return_on_skip, R_finite, R_mismatch
Additional controls for the combined RSS/ColocBoost QC workflow. The historical LD-mismatch-only call ignores them; supplying rss_input or any combined-QC control selects the combined workflow instead.
Value
For the historical call, a list containing the quality-controlled summary statistics and the updated LD matrix:
- sumstats: The quality-controlled summary statistics data frame.
- LD_mat: The updated LD matrix after quality control.
- outlier_number: The number of outlier variants removed.
When rss_input or combined-QC controls are supplied, the function instead
returns a single cleaned RSS/LD record if given one RSS record, or a named
list of cleaned records if given a list of RSS records.
Details
This function applies the specified quality control method to the
processed summary statistics via ld_mismatch_qc, then subsets
the summary statistics and LD matrix to keep only non-outlier variants.
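The subsetting step can be pictured with a short base-R sketch. It is not the package's code: the helper subset_non_outliers() and its logical outlier argument are hypothetical, standing in for whatever outlier flags ld_mismatch_qc reports. It only shows why the LD matrix has to be subset on rows and columns together, and how the three elements listed under Value relate to each other.

```r
# Hypothetical helper (not part of the package): keep only non-outlier
# variants. `outlier` is an assumed logical vector aligned with the rows of
# `sumstats` and with the rows/columns of `LD_mat`.
subset_non_outliers <- function(sumstats, LD_mat, outlier) {
  stopifnot(is.logical(outlier),
            nrow(sumstats) == length(outlier),
            nrow(LD_mat) == length(outlier),
            ncol(LD_mat) == length(outlier))
  keep <- !outlier
  list(
    sumstats = sumstats[keep, , drop = FALSE],
    # Subset rows and columns with the same index so the matrix stays square
    # and aligned with the retained summary statistics.
    LD_mat = LD_mat[keep, keep, drop = FALSE],
    outlier_number = sum(outlier)
  )
}

# Toy example: variant v2 is flagged as an outlier.
ss <- data.frame(variant_id = c("v1", "v2", "v3"), z = c(1.2, 5.0, -0.4),
                 stringsAsFactors = FALSE)
ld <- matrix(c(1.0, 0.2, 0.1,
               0.2, 1.0, 0.3,
               0.1, 0.3, 1.0), nrow = 3,
             dimnames = list(ss$variant_id, ss$variant_id))
res <- subset_non_outliers(ss, ld, outlier = c(FALSE, TRUE, FALSE))
res$outlier_number     # 1
rownames(res$LD_mat)   # "v1" "v3"
```

Dropping only rows of the LD matrix, or subsetting it with a different index than the summary statistics, would silently misalign z-scores and correlations; using one keep vector for both avoids that.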
As an additional workflow for ColocBoost/RSS pipelines, callers may supply
rss_input or combined-QC controls. That path runs rss_basic_qc() first,
followed by optional PIP screening, optional LD-mismatch QC through this same
function, and optional RAISS imputation. Internally, the combined path
normalizes a single RSS record or a list of records into the same per-record
loop; only true single-record input is unwrapped on return.
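A minimal sketch of the one-or-many pattern, assuming (hypothetically) that a single RSS record is a list whose sumstats element is a data frame. is_single_rss_record() and process_rss() are illustrative names, not functions from this package, and the real structural test may inspect different fields. Because the check looks at the type of the sumstats element rather than at its name, a multi-study list containing a study called "sumstats" is still treated as a list of records.

```r
# Hypothetical structural test (an assumption, not the package's rule):
# a single RSS record is a list whose `sumstats` element is a data frame.
is_single_rss_record <- function(x) {
  is.list(x) && !is.data.frame(x) && is.data.frame(x[["sumstats"]])
}

# Normalize one-or-many input to a list, apply the same per-record step to
# every element, and unwrap the result only for true single-record input.
process_rss <- function(rss_input, per_record) {
  single <- is_single_rss_record(rss_input)
  records <- if (single) list(rss_input) else rss_input
  out <- lapply(records, per_record)
  if (single) out[[1]] else out
}

rec <- list(sumstats = data.frame(z = c(1, 2, 3)))
count_variants <- function(r) nrow(r$sumstats)
process_rss(rec, count_variants)                               # 3
process_rss(list(sumstats = rec, eqtl = rec), count_variants)  # list(sumstats = 3, eqtl = 3)
```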
Setting qc_method = NULL or "none" selects basic-only summary-statistic
preprocessing, without SLALOM/DENTIST outlier QC.
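A sketch of how NULL and "none" can be folded into a single basic-only setting. resolve_qc_method() and runs_outlier_qc() are hypothetical helpers, and the accepted set of values ("none", "slalom", "dentist") is an assumption based on the method options documented above.

```r
# Hypothetical helper: fold NULL into "none" and validate the rest against
# an assumed set of values. match.arg() errors on values that do not match.
resolve_qc_method <- function(qc_method = NULL) {
  if (is.null(qc_method)) return("none")
  match.arg(qc_method, c("none", "slalom", "dentist"))
}

# Outlier QC runs only when the resolved method is not "none".
runs_outlier_qc <- function(qc_method = NULL) {
  resolve_qc_method(qc_method) != "none"
}

runs_outlier_qc(NULL)       # FALSE
runs_outlier_qc("none")     # FALSE
runs_outlier_qc("dentist")  # TRUE
```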
When the combined path receives genotype-backed reference data
(X_ref), basic harmonization avoids constructing an LD matrix. PIP
screening uses the LD-independent single-effect summary-statistic model
susie_ser(coverage = NULL), and LD-mismatch QC
computes only the filtered local correlation matrix required by
SLALOM/DENTIST. RAISS imputation temporarily centers/scales
genotype-backed X_ref before using the whole-region genotype/SVD
path. This avoids LD-block partition artifacts while matching the LD scale
used by compute_LD(X_ref). Final ColocBoost calls still keep the
original X_ref as the reference input.
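The scale-matching point can be checked numerically. Assuming compute_LD(X_ref) returns the Pearson correlation matrix of the genotype columns (an assumption; its exact options are not described here), centering and scaling X_ref and dividing the cross-product by n - 1 reproduces that matrix. The SVD of the standardized matrix gives the same result, since X = U D V' implies X'X = V D^2 V'. The simulated dosage matrix is illustrative only, and no package functions are called.

```r
# Simulated 0/1/2 dosages: illustrative data only.
set.seed(1)
n <- 200
p <- 5
X_ref <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)

# Center and scale each column (scale() divides by the n - 1 standard deviation).
X_std <- scale(X_ref, center = TRUE, scale = TRUE)

# Cross-product of standardized genotypes divided by n - 1 is the correlation matrix.
R_crossprod <- crossprod(X_std) / (n - 1)

# The same matrix from the SVD X_std = U D V': X_std' X_std = V D^2 V'.
sv <- svd(X_std)
R_svd <- sv$v %*% diag(sv$d^2) %*% t(sv$v) / (n - 1)

all.equal(R_crossprod, cor(X_ref), check.attributes = FALSE)  # TRUE
all.equal(R_svd, cor(X_ref), check.attributes = FALSE)        # TRUE
```

The sketch stops at the LD matrix; it does not implement RAISS imputation itself. Without the centering and scaling step, the cross-product would be on a covariance-like scale that no longer matches a correlation-based LD matrix.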
Examples
# Perform SLALOM quality control (default)
qc_results <- summary_stats_qc(sumstats, LD_data, method = "slalom")
# Combined RSS QC with basic preprocessing only (no SLALOM/DENTIST outlier QC)
qc_results <- summary_stats_qc(rss_input = rss_input, LD_data = LD_data,
                               qc_method = "none")