This function runs an end-to-end RSS analysis pipeline: data loading, preprocessing, quality control, imputation, and SuSiE RSS analysis. The quality control method, fine-mapping method, and imputation settings are configurable through the arguments below.
Usage
rss_analysis_pipeline(
sumstat_path,
column_file_path,
LD_data,
n_sample = 0,
n_case = 0,
n_control = 0,
region = NULL,
skip_region = NULL,
extract_region_name = NULL,
region_name_col = NULL,
qc_method = c("dentist", "slalom"),
finemapping_method = c("susie_rss", "single_effect", "bayesian_conditional_regression"),
finemapping_opts = list(init_L = 5, max_L = 20, l_step = 5, coverage = c(0.95, 0.7,
0.5), signal_cutoff = 0.025, min_abs_corr = 0.8),
impute = TRUE,
impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
pip_cutoff_to_skip = 0,
remove_indels = FALSE,
comment_string = "#",
diagnostics = FALSE
)
Arguments
- sumstat_path
File path to the summary statistics.
- column_file_path
File path to the column file for mapping.
- LD_data
A list of combined LD variant data, as generated by load_LD_matrix.
- n_sample
User-specified sample size. If unknown, set to 0 to retrieve it from the sumstat file.
- n_case
User-specified number of cases.
- n_control
User-specified number of controls.
- region
The region that tabix uses to subset the input dataset.
- skip_region
A character vector specifying regions to be skipped in the analysis (optional). Each region should be in the format "chrom:start-end" (e.g., "1:1000000-2000000").
- extract_region_name
User-specified gene/phenotype name used to further subset the phenotype data.
- region_name_col
Name of the column to filter on when matching extract_region_name.
- qc_method
Quality control method to use. Options are "dentist" or "slalom" (default: "dentist").
- finemapping_opts
A list of fine-mapping options: init_L, max_L, l_step, coverage, signal_cutoff, and min_abs_corr (minimum absolute correlation for credible set purity, default 0.8; susieR default is 0.5).
- impute
Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).
- impute_opts
A list of imputation options: rcond, R2_threshold, minimum_ld, and lamb (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01)).
- pip_cutoff_to_skip
PIP cutoff to skip imputation (default: 0).
- init_L
Field of finemapping_opts. Initial number of causal configurations to consider in the analysis (default: 5).
- max_L
Field of finemapping_opts. Maximum number of causal configurations to consider when dynamically adjusting L (default: 20).
- l_step
Field of finemapping_opts. Step size for increasing L when the limit is reached during dynamic adjustment (default: 5).
- finemapping_method
Fine-mapping method to use. Options are "susie_rss", "single_effect", or "bayesian_conditional_regression" (default: "susie_rss").
- coverage
Field of finemapping_opts. Coverage levels for SuSiE RSS analysis (default: c(0.95, 0.7, 0.5)).
- signal_cutoff
Field of finemapping_opts. Signal cutoff for susie_post_processor (default: 0.025).
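Examples
The following is a minimal sketch of how the arguments above might be assembled, not a tested recipe. The file paths, the helper is_valid_region, and the wrapper run_pipeline_sketch are hypothetical names introduced for illustration. This page does not say whether a partial finemapping_opts list is merged with the defaults, so the sketch passes complete lists. It assumes the package that provides rss_analysis_pipeline is already loaded.

```r
# Defaults copied from the Usage section.
default_finemapping_opts <- list(
  init_L = 5, max_L = 20, l_step = 5,
  coverage = c(0.95, 0.7, 0.5),
  signal_cutoff = 0.025, min_abs_corr = 0.8
)
default_impute_opts <- list(
  rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01
)

# Override selected fields while keeping the remaining defaults intact.
finemapping_opts <- utils::modifyList(
  default_finemapping_opts,
  list(init_L = 10, min_abs_corr = 0.5)
)

# Hypothetical helper: check the "chrom:start-end" format described
# for skip_region.
is_valid_region <- function(x) {
  grepl("^[^:]+:[0-9]+-[0-9]+$", x)
}

skip_region <- c("1:1000000-2000000", "6:25000000-35000000")
stopifnot(all(is_valid_region(skip_region)))

# Hypothetical wrapper: defined here but not executed, because it needs
# real input files and an LD_data list produced by load_LD_matrix.
run_pipeline_sketch <- function(LD_data) {
  rss_analysis_pipeline(
    sumstat_path = "path/to/sumstats.tsv.gz",      # hypothetical path
    column_file_path = "path/to/column_mapping.yml", # hypothetical path
    LD_data = LD_data,
    n_sample = 0,                 # 0: retrieve from the sumstat file
    skip_region = skip_region,
    qc_method = "slalom",
    finemapping_method = "susie_rss",
    finemapping_opts = finemapping_opts,
    impute = TRUE,
    impute_opts = default_impute_opts
  )
}
```

Passing single strings for qc_method and finemapping_method avoids relying on how the function resolves the multi-value defaults shown in Usage.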