RSS Fine-mapping with GWAS Summary Statistics

RSS Fine-mapping with GWAS Summary Statistics#

End-to-end SuSiE-RSS fine-mapping over GWAS summary statistics, per study × per LD block, using the current pecotmr S4 API (GwasSumStats / summaryStatsQc() / fineMappingPipeline()). Four SoS steps:

[generate_manifest] — gwas_rss_manifest.R resolves --gwas-meta + --gwas-tsv-list against --region-list + --regions into one manifest TSV (one row per study × region).
[generate_gwas_sumstats] — gwas_sumstats_construct.R builds one GwasSumStats RDS per study × LD block, running summaryStatsQc() (optional SLALOM/DENTIST QC + RAISS imputation; harmonization re-keys variants to the LD-panel id convention).
[gwas_fine_mapping] — fine_mapping.R dispatches to pecotmr::fineMappingPipeline(gwasSumStats, methods = ...) (SuSiE-RSS).
[gwas_rss_plot] (opt-out via --no-plot) — gwas_rss_plot.R renders a PIP plot.

This replaces the legacy two-step workflow (get_analysis_regions + univariate_rss, which called the removed rss_analysis_pipeline()). The LD reference must be genotype-backed (PLINK2/PLINK1/GDS/VCF); a precomputed .cor.xz LD matrix is not supported for the GwasSumStats path.

Inputs#

--gwas-meta — TSV with columns study_id, path (relative to the meta file or absolute), and optional column_mapping (per-study YAML). One row per study.
--gwas-tsv-list S1=path1 ... — explicit STUDY=PATH pairs (alternative to --gwas-meta; no per-study column mapping).
--region-list — BED-like TSV of LD blocks; and/or --regions chr:start-end ....
--ld-meta — LD-reference meta TSV (#chr, start, end, path -> genotype prefix). One LD block per row (or one row with start=end=0 for a whole chromosome).

QC knobs (forwarded to `gwas_sumstats_construct.R` -> `summaryStatsQc()`)#

--qc-method — none (default), slalom, or dentist.
--impute — enable RAISS sumstat imputation.
--qc-args '{"key":value,...}' — JSON passthrough for any other summaryStatsQc() kwarg (e.g. {"mafCutoff":0.01,"nCutoff":0,"alleleFlipKriging":true}).

Fine-mapping knobs (forwarded to `fine_mapping.R`)#

--methods (default susie), --coverage (0.95), --secondary-coverage (0.7,0.5).
--min-abs-corr (0.8), --median-abs-corr (off; OR-logic purity), --pip-cutoff (0.025).
--L (20) / --L-greedy (5) — SuSiE fit defaults; --method-args '{"susie":{...}}' overrides per key.

Outputs#

{cwd}/{manifest_name}.manifest.tsv
{cwd}/sumstats/{study}.{chrN_S_E}.gwas_sumstats.rds
{cwd}/fine_mapping/{study}.{chrN_S_E}.gwas_finemap.rds — one GwasFineMappingResult per (study, block).
{cwd}/plots/{study}.{chrN_S_E}.pip_plot.png (unless --no-plot).

Minimal working example#

Genotype-LD MWE (AD_Bellenguez chr22 toy data under `input/`). There is no single combined workflow#

cd /restricted/projectnb/xqtl/jaempawi/xqtl-protocol
sos run pipeline/rss_analysis.ipynb \
    generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
    --cwd output/rss_analysis --modular-script-dir code/script \
    --gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
    --regions chr22:49355984-50799822 \
    --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv

This fine-maps one study × one LD block and writes the manifest, the QC’d GwasSumStats, the GwasFineMappingResult, and a PIP plot under output/rss_analysis/.

With SLALOM QC + RAISS imputation + per-method tuning:#

sos run pipeline/rss_analysis.ipynb \
    generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
    --cwd output/rss_analysis --modular-script-dir code/script \
    --gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
    --regions chr22:49355984-50799822 \
    --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv \
    --qc-method slalom --impute --qc-args '{"mafCutoff":0.01}' \
    --min-abs-corr 0.5 --method-args '{"susie":{"L":10}}'

Command interface#

[global]
parameter: cwd = path('output')
parameter: modular_script_dir = path('code/script')
# --- per-study GWAS inputs (use either or both) ---------------------
parameter: gwas_meta = path('.')
parameter: gwas_tsv_list = []   # list of STUDY=PATH items
# --- region inputs (use either or both) -----------------------------
parameter: region_list = path('.')
parameter: regions = []         # list of chr:start-end strings
# --- LD reference + genome ------------------------------------------
parameter: ld_meta = path
parameter: genome = 'GRCh38'
# --- QC knobs (forwarded to gwas_sumstats_construct.R) --------------
parameter: qc_method = 'none'   # none | slalom | dentist
parameter: impute = False
parameter: qc_args = ''         # JSON object spliced into summaryStatsQc()
# --- fine-mapping knobs (forwarded to fine_mapping.R) ---------------
parameter: methods = 'susie'
parameter: coverage = 0.95
parameter: secondary_coverage = '0.7,0.5'
parameter: min_abs_corr = 0.8
parameter: median_abs_corr = ''   # empty -> off (OR-logic purity; needs pecotmr step-1)
parameter: pip_cutoff = 0.025
parameter: L = 20
parameter: L_greedy = 5
parameter: method_args = ''     # nested per-method JSON object
# --- plot opt-out ---------------------------------------------------
parameter: no_plot = False
# --- output prefix for the manifest file ----------------------------
parameter: manifest_name = 'gwas_rss'
# --- infrastructure -------------------------------------------------
parameter: container = ''
parameter: job_size = 1
parameter: walltime = '2h'
parameter: mem = '16G'
parameter: numThreads = 1

[generate_manifest]
# Resolve --gwas-meta / --gwas-tsv-list x --region-list / --regions into a
# single manifest TSV via gwas_rss_manifest.R. Downstream steps fan out
# over its rows; no Python parsing in this notebook.
input: None
output: f"{cwd}/{manifest_name}.manifest.tsv"
task: trunk_workers = 1, trunk_size = 1, walltime = '15m', mem = '2G', cores = 1, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
    Rscript ${modular_script_dir}/pecotmr_integration/gwas_rss_manifest.R \
        ${('--gwas-meta ' + str(gwas_meta)) if gwas_meta.is_file() else ''} \
        ${('--gwas-tsv-list ' + ' '.join(str(x) for x in gwas_tsv_list)) if gwas_tsv_list else ''} \
        ${('--region-list ' + str(region_list)) if region_list.is_file() else ''} \
        ${('--regions ' + ' '.join(str(x) for x in regions)) if regions else ''} \
        --output ${_output}

[generate_gwas_sumstats]
# Fan out over the manifest's rows: one (study, region) GwasSumStats per
# row. Manifest columns: study_id, gwas_tsv, column_mapping, chr, start,
# end, region_id, gwas_tsv_basename.
import csv
jobs = list(csv.DictReader(open(f"{cwd}/{manifest_name}.manifest.tsv"), delimiter='\t'))
input: for_each = 'jobs'
output: f"{cwd}/sumstats/{_jobs['study_id']}.{_jobs['region_id']}.gwas_sumstats.rds"
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
    Rscript ${modular_script_dir}/pecotmr_integration/gwas_sumstats_construct.R \
        --study ${_jobs['study_id']} \
        --gwas-tsv ${_jobs['gwas_tsv']} \
        --ld-block ${_jobs['chr']}:${_jobs['start']}-${_jobs['end']} \
        --ld-meta ${ld_meta} \
        --genome ${genome} \
        ${('--column-mapping ' + _jobs['column_mapping']) if _jobs['column_mapping'] else ''} \
        --qc-method ${qc_method} \
        ${'--impute' if impute else ''} \
        ${('--qc-args ' + repr(qc_args)) if qc_args else ''} \
        --output ${_output}

[gwas_fine_mapping]
# Per-RDS fan-out: one fine-mapping task per (study, region) GwasSumStats.
output: f"{cwd}/fine_mapping/{_input:bnn}.gwas_finemap.rds", group_by = 1
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
    Rscript ${modular_script_dir}/pecotmr_integration/fine_mapping.R \
        --gwas-sumstats ${_input} \
        --methods ${methods} \
        --coverage ${coverage} \
        --secondary-coverage ${secondary_coverage} \
        --min-abs-corr ${min_abs_corr} \
        --pip-cutoff ${pip_cutoff} \
        --L ${L} \
        --L-greedy ${L_greedy} \
        ${('--median-abs-corr ' + str(median_abs_corr)) if str(median_abs_corr) != '' else ''} \
        ${('--method-args ' + repr(method_args)) if method_args else ''} \
        --output ${_output}

[gwas_rss_plot]
stop_if(no_plot, '--no-plot set; skipping PIP plot step.')
output: f"{cwd}/plots/{_input:bnn}.pip_plot.png", group_by = 1
task: trunk_workers = 1, trunk_size = job_size, walltime = '30m', mem = '4G', cores = 1, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
    Rscript ${modular_script_dir}/pecotmr_integration/gwas_rss_plot.R \
        --input ${_input} \
        --output ${_output}