
This function performs quality control on the processed summary statistics using the specified method. It wraps ld_mismatch_qc and subsets the summary statistics and LD matrix to the variants that pass QC.

Usage

summary_stats_qc(
  sumstats,
  LD_data,
  n = NULL,
  method = c("slalom", "dentist"),
  rss_input = NULL,
  keep_indel = TRUE,
  skip_region = NULL,
  pip_cutoff_to_skip = 0,
  qc_method = NULL,
  impute = FALSE,
  impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
  study = NULL,
  var_y = NULL,
  return_on_skip = c("null", "preprocess"),
  R_finite = NULL,
  R_mismatch = NULL
)
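The defaults for impute_opts in the signature form a four-element list. One way to change a single option is to start from those defaults with base R's modifyList() and pass the full list. The documentation does not say whether summary_stats_qc() fills in omitted entries itself, so this sketch takes the cautious route and passes all four.

```r
# Default imputation options, copied from the Usage signature.
default_impute_opts <- list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01)

# Override only R2_threshold and keep the other defaults. The full list is
# passed in case the function does not merge partial lists on its own.
my_impute_opts <- modifyList(default_impute_opts, list(R2_threshold = 0.8))

str(my_impute_opts)

# Hypothetical call (rss_input and LD_data are not defined here):
# summary_stats_qc(rss_input = rss_input, LD_data = LD_data, impute = TRUE,
#                  impute_opts = my_impute_opts)
```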

Arguments

sumstats

A data frame containing the processed summary statistics.

LD_data

An LDData S4 object or a legacy list containing combined LD variants data, as generated by load_LD_matrix.

n

Sample size for the LD reference panel (used by the dentist method).

method

The quality control method to use. Options are "slalom" or "dentist" (default: "slalom").
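The method argument is declared as a vector of choices, which is the usual R idiom for match.arg(). The function body is not shown here. If summary_stats_qc() follows that idiom, leaving method unset selects the first choice, "slalom", and an unrecognised value raises an error. The hypothetical resolve_method() below shows this behaviour on its own. return_on_skip is declared the same way.

```r
# Hypothetical stand-in (not part of the package). With a vector default,
# match.arg() returns the first choice when the caller leaves it untouched,
# accepts partial matches, and errors on anything else.
resolve_method <- function(method = c("slalom", "dentist")) {
  match.arg(method)
}

resolve_method()            # "slalom"
resolve_method("dentist")   # "dentist"
resolve_method("dent")      # "dentist" (partial match)
# resolve_method("foo") signals an error
```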

rss_input

Optional loaded RSS input, either one load_rss_data() result or a named list of them. Supplying this selects the additional combined RSS/ColocBoost QC workflow. A single RSS record is detected by structure, not by the name sumstats, so a multi-study list may safely include a study named "sumstats".
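Below is a minimal sketch of structure-based detection. It assumes that a single RSS record is a list whose sumstats element is a data frame. is_single_rss_record() and as_rss_list() are hypothetical helpers, not package functions, and the toy records are not real load_rss_data() output. In a multi-study list, a study named "sumstats" holds a whole record (a list) rather than a data frame, so the list is not mistaken for a single record.

```r
# Hypothetical helper: x is a single RSS record when its "sumstats" element
# is itself a data frame. In a multi-study list, an element named "sumstats"
# would be a record (a list), not a data frame.
is_single_rss_record <- function(x) {
  is.list(x) && !is.data.frame(x) && is.data.frame(x[["sumstats"]])
}

# Hypothetical helper: wrap a single record so downstream code can always
# loop over a list of records.
as_rss_list <- function(x) {
  if (is_single_rss_record(x)) list(x) else x
}

# Toy inputs for illustration only.
record <- list(sumstats = data.frame(variant_id = c("v1", "v2"), z = c(1.2, -3.4)))
multi  <- list(sumstats = record, study2 = record)  # a study literally named "sumstats"

is_single_rss_record(record)  # TRUE
is_single_rss_record(multi)   # FALSE
length(as_rss_list(record))   # 1
length(as_rss_list(multi))    # 2
```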

keep_indel, skip_region, pip_cutoff_to_skip, qc_method, impute, impute_opts, study, var_y, return_on_skip, R_finite, R_mismatch

Additional controls for the combined RSS/ColocBoost QC workflow. The historical LD-mismatch-only call ignores them. Supplying rss_input or any of these combined-QC options selects the combined workflow instead.
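Under Details, the PIP screening step in the combined workflow is described as using susie_ser(coverage = NULL), an LD-independent single-effect model. The sketch below is not that function. It computes single-effect PIPs from marginal beta and se using Wakefield's approximate Bayes factor with a uniform prior over variants, and pairs them with a hypothetical skip rule. How pip_cutoff_to_skip is actually compared against PIPs is an assumption. Under this rule, the default of 0 never skips a region.

```r
# Single-effect PIPs from marginal summary statistics via Wakefield's
# approximate Bayes factor with prior effect variance W (illustrative value).
# This is NOT the implementation of susie_ser().
ser_pips <- function(beta, se, W = 0.2^2) {
  V <- se^2
  r <- W / (V + W)
  z <- beta / se
  lbf <- 0.5 * log(1 - r) + 0.5 * r * z^2  # log Bayes factor per variant
  w <- exp(lbf - max(lbf))                 # subtract max for numerical stability
  w / sum(w)                               # uniform prior: PIPs sum to 1
}

# Hypothetical rule: skip the region when no variant's PIP exceeds the cutoff.
should_skip_region <- function(pips, pip_cutoff_to_skip = 0) {
  max(pips) <= pip_cutoff_to_skip
}

# Toy summary statistics for illustration only.
pips <- ser_pips(beta = c(0.02, 0.15, 0.01), se = c(0.03, 0.03, 0.03))
round(pips, 4)
should_skip_region(pips)  # FALSE with the default cutoff of 0
```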

Value

For the historical call, a list containing the quality-controlled summary statistics and the updated LD matrix:

  • sumstats: The quality-controlled summary statistics data frame.

  • LD_mat: The updated LD matrix after quality control.

  • outlier_number: The number of outlier variants removed.

When rss_input or combined-QC controls are supplied, the function returns one cleaned RSS/LD record when given a single RSS record, or a named list of cleaned records when given a list of RSS records.

Details

This function applies the specified quality control method to the processed summary statistics via ld_mismatch_qc, then subsets the summary statistics and LD matrix to keep only non-outlier variants.
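The subsetting step takes only a few lines. This sketch assumes that the rows of sumstats are in the same order as the rows and columns of the LD matrix. The logical vector is_outlier is hypothetical and stands in for whatever ld_mismatch_qc() actually returns. The returned element names follow the Value section.

```r
# Hypothetical sketch: keep non-outlier variants in both the summary
# statistics and the LD matrix, and report how many were removed.
subset_non_outliers <- function(sumstats, LD_mat, is_outlier) {
  stopifnot(nrow(sumstats) == nrow(LD_mat), length(is_outlier) == nrow(sumstats))
  keep <- !is_outlier
  list(
    sumstats       = sumstats[keep, , drop = FALSE],
    LD_mat         = LD_mat[keep, keep, drop = FALSE],  # subset rows AND columns
    outlier_number = sum(is_outlier)
  )
}

# Toy inputs for illustration only.
ids <- c("v1", "v2", "v3")
sumstats <- data.frame(variant_id = ids, z = c(1, 2, 3))
LD_mat <- diag(3)
dimnames(LD_mat) <- list(ids, ids)

res <- subset_non_outliers(sumstats, LD_mat, is_outlier = c(FALSE, TRUE, FALSE))
res$outlier_number  # 1
```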

As an additional workflow for ColocBoost/RSS pipelines, callers may supply rss_input or combined-QC controls. That path runs rss_basic_qc(), followed by optional PIP screening, optional LD-mismatch QC through this same function, and optional RAISS imputation. Internally, the combined path normalizes one or many RSS records into the same loop. Only genuine single-record input is unwrapped on return. Setting qc_method to NULL or "none" requests basic summary-statistic preprocessing only, without SLALOM/DENTIST outlier QC.
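The order of steps and the single-record unwrapping can be illustrated with placeholder steps. Everything in this sketch is hypothetical. The step functions only record their own names. The single flag stands in for structure-based record detection, and pip_screen_on stands in for whatever control actually enables PIP screening.

```r
# Hypothetical placeholder steps: each takes one record and appends its name.
basic_qc     <- function(rec) { rec$steps <- c(rec$steps, "basic"); rec }
pip_screen   <- function(rec) { rec$steps <- c(rec$steps, "pip"); rec }
mismatch_qc  <- function(rec, method) { rec$steps <- c(rec$steps, method); rec }
raiss_impute <- function(rec) { rec$steps <- c(rec$steps, "impute"); rec }

# Hypothetical driver mirroring the documented order of steps.
run_combined_qc <- function(records, single, qc_method = NULL,
                            pip_screen_on = FALSE, impute = FALSE) {
  # NULL and "none" both mean basic preprocessing only, with no outlier QC.
  do_outlier_qc <- !is.null(qc_method) && qc_method != "none"
  out <- lapply(records, function(rec) {
    rec <- basic_qc(rec)
    if (pip_screen_on) rec <- pip_screen(rec)
    if (do_outlier_qc) rec <- mismatch_qc(rec, qc_method)
    if (impute) rec <- raiss_impute(rec)
    rec
  })
  # Only a genuine single-record input is unwrapped on return.
  if (single) out[[1]] else out
}

one  <- run_combined_qc(list(list()), single = TRUE, qc_method = "none")
many <- run_combined_qc(list(a = list(), b = list()), single = FALSE,
                        qc_method = "slalom", impute = TRUE)
one$steps     # "basic"
many$a$steps  # "basic" "slalom" "impute"
```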

When the combined path receives genotype-backed reference data (X_ref), basic harmonization avoids constructing an LD matrix. PIP screening uses the LD-independent single-effect summary-statistic model susie_ser(coverage = NULL), and LD-mismatch QC computes only the filtered local correlation matrix required by SLALOM/DENTIST. RAISS imputation temporarily centers and scales genotype-backed X_ref before using the whole-region genotype/SVD path. This avoids LD-block partition artifacts while matching the LD scale used by compute_LD(X_ref). Final ColocBoost calls still use the original X_ref as the reference input.
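A short numerical check shows that centering and scaling matches the LD scale. It relies on the assumption that compute_LD(X_ref) returns Pearson correlations. Under that assumption, the cross-product of a column-standardized genotype matrix divided by n - 1 is the same matrix as cor(X_ref). The toy X_ref below is simulated, and the kept variant set stands in for the variants that survive QC.

```r
set.seed(1)
# Simulated toy genotype dosages: 50 individuals x 4 variants, values 0/1/2.
X_ref <- matrix(sample(0:2, 50 * 4, replace = TRUE), nrow = 50,
                dimnames = list(NULL, paste0("v", 1:4)))

# Center and scale each column (sample sd, n - 1 denominator).
X_std <- scale(X_ref, center = TRUE, scale = TRUE)

# For standardized columns, X'X / (n - 1) equals the Pearson correlation matrix.
R_from_std <- crossprod(X_std) / (nrow(X_std) - 1)

# Filtered local LD: correlation restricted to variants retained after QC.
kept <- c("v1", "v3", "v4")
R_local <- cor(X_ref[, kept])

all.equal(unname(R_from_std), unname(cor(X_ref)))  # TRUE up to floating-point tolerance
```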

Examples

# Perform SLALOM quality control (default).
# Assumes sumstats and LD_data (e.g. from load_LD_matrix()) are already loaded.
qc_results <- summary_stats_qc(sumstats, LD_data, method = "slalom")

# Combined RSS QC with basic preprocessing only (no SLALOM/DENTIST outlier QC).
# Assumes rss_input was loaded with load_rss_data().
qc_results <- summary_stats_qc(rss_input = rss_input, LD_data = LD_data,
                               qc_method = "none")