RSS Fine-mapping with GWAS Summary Statistics#
End-to-end SuSiE-RSS fine-mapping over GWAS summary statistics, per study × per LD block, using
the current pecotmr S4 API (GwasSumStats / summaryStatsQc() / fineMappingPipeline()). Four SoS
steps:
[generate_manifest]—gwas_rss_manifest.Rresolves--gwas-meta+--gwas-tsv-listagainst--region-list+--regionsinto one manifest TSV (one row per study × region).[generate_gwas_sumstats]—gwas_sumstats_construct.Rbuilds oneGwasSumStatsRDS per study × LD block, runningsummaryStatsQc()(optional SLALOM/DENTIST QC + RAISS imputation; harmonization re-keys variants to the LD-panel id convention).[gwas_fine_mapping]—fine_mapping.Rdispatches topecotmr::fineMappingPipeline(gwasSumStats, methods = ...)(SuSiE-RSS).[gwas_rss_plot](opt-out via--no-plot) —gwas_rss_plot.Rrenders a PIP plot.
This replaces the legacy two-step workflow (get_analysis_regions + univariate_rss, which called the
removed rss_analysis_pipeline()). The LD reference must be genotype-backed (PLINK2/PLINK1/GDS/VCF);
a precomputed .cor.xz LD matrix is not supported for the GwasSumStats path.
Inputs#
--gwas-meta— TSV with columnsstudy_id,path(relative to the meta file or absolute), and optionalcolumn_mapping(per-study YAML). One row per study.--gwas-tsv-list S1=path1 ...— explicit STUDY=PATH pairs (alternative to--gwas-meta; no per-study column mapping).--region-list— BED-like TSV of LD blocks; and/or--regions chr:start-end ....--ld-meta— LD-reference meta TSV (#chr,start,end,path-> genotype prefix). One LD block per row (or one row withstart=end=0for a whole chromosome).
QC knobs (forwarded to gwas_sumstats_construct.R -> summaryStatsQc())#
--qc-method—none(default),slalom, ordentist.--impute— enable RAISS sumstat imputation.--qc-args '{"key":value,...}'— JSON passthrough for any othersummaryStatsQc()kwarg (e.g.{"mafCutoff":0.01,"nCutoff":0,"alleleFlipKriging":true}).
Fine-mapping knobs (forwarded to fine_mapping.R)#
--methods(defaultsusie),--coverage(0.95),--secondary-coverage(0.7,0.5).--min-abs-corr(0.8),--median-abs-corr(off; OR-logic purity),--pip-cutoff(0.025).--L(20) /--L-greedy(5) — SuSiE fit defaults;--method-args '{"susie":{...}}'overrides per key.
Outputs#
{cwd}/{manifest_name}.manifest.tsv{cwd}/sumstats/{study}.{chrN_S_E}.gwas_sumstats.rds{cwd}/fine_mapping/{study}.{chrN_S_E}.gwas_finemap.rds— oneGwasFineMappingResultper (study, block).{cwd}/plots/{study}.{chrN_S_E}.pip_plot.png(unless--no-plot).
Minimal working example#
Genotype-LD MWE (AD_Bellenguez chr22 toy data under input/). There is no single combined workflow#
cd /restricted/projectnb/xqtl/jaempawi/xqtl-protocol
sos run pipeline/rss_analysis.ipynb \
generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
--cwd output/rss_analysis --modular-script-dir code/script \
--gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
--regions chr22:49355984-50799822 \
--ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv
This fine-maps one study × one LD block and writes the manifest, the QC’d GwasSumStats, the
GwasFineMappingResult, and a PIP plot under output/rss_analysis/.
With SLALOM QC + RAISS imputation + per-method tuning:#
sos run pipeline/rss_analysis.ipynb \
generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
--cwd output/rss_analysis --modular-script-dir code/script \
--gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
--regions chr22:49355984-50799822 \
--ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv \
--qc-method slalom --impute --qc-args '{"mafCutoff":0.01}' \
--min-abs-corr 0.5 --method-args '{"susie":{"L":10}}'
Command interface#
[global]
parameter: cwd = path('output')
parameter: modular_script_dir = path('code/script')
# --- per-study GWAS inputs (use either or both) ---------------------
parameter: gwas_meta = path('.')
parameter: gwas_tsv_list = [] # list of STUDY=PATH items
# --- region inputs (use either or both) -----------------------------
parameter: region_list = path('.')
parameter: regions = [] # list of chr:start-end strings
# --- LD reference + genome ------------------------------------------
parameter: ld_meta = path
parameter: genome = 'GRCh38'
# --- QC knobs (forwarded to gwas_sumstats_construct.R) --------------
parameter: qc_method = 'none' # none | slalom | dentist
parameter: impute = False
parameter: qc_args = '' # JSON object spliced into summaryStatsQc()
# --- fine-mapping knobs (forwarded to fine_mapping.R) ---------------
parameter: methods = 'susie'
parameter: coverage = 0.95
parameter: secondary_coverage = '0.7,0.5'
parameter: min_abs_corr = 0.8
parameter: median_abs_corr = '' # empty -> off (OR-logic purity; needs pecotmr step-1)
parameter: pip_cutoff = 0.025
parameter: L = 20
parameter: L_greedy = 5
parameter: method_args = '' # nested per-method JSON object
# --- plot opt-out ---------------------------------------------------
parameter: no_plot = False
# --- output prefix for the manifest file ----------------------------
parameter: manifest_name = 'gwas_rss'
# --- infrastructure -------------------------------------------------
parameter: container = ''
parameter: job_size = 1
parameter: walltime = '2h'
parameter: mem = '16G'
parameter: numThreads = 1
[generate_manifest]
# Resolve --gwas-meta / --gwas-tsv-list x --region-list / --regions into a
# single manifest TSV via gwas_rss_manifest.R. Downstream steps fan out
# over its rows; no Python parsing in this notebook.
input: None
output: f"{cwd}/{manifest_name}.manifest.tsv"
task: trunk_workers = 1, trunk_size = 1, walltime = '15m', mem = '2G', cores = 1, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
Rscript ${modular_script_dir}/pecotmr_integration/gwas_rss_manifest.R \
${('--gwas-meta ' + str(gwas_meta)) if gwas_meta.is_file() else ''} \
${('--gwas-tsv-list ' + ' '.join(str(x) for x in gwas_tsv_list)) if gwas_tsv_list else ''} \
${('--region-list ' + str(region_list)) if region_list.is_file() else ''} \
${('--regions ' + ' '.join(str(x) for x in regions)) if regions else ''} \
--output ${_output}
[generate_gwas_sumstats]
# Fan out over the manifest's rows: one (study, region) GwasSumStats per
# row. Manifest columns: study_id, gwas_tsv, column_mapping, chr, start,
# end, region_id, gwas_tsv_basename.
import csv
jobs = list(csv.DictReader(open(f"{cwd}/{manifest_name}.manifest.tsv"), delimiter='\t'))
input: for_each = 'jobs'
output: f"{cwd}/sumstats/{_jobs['study_id']}.{_jobs['region_id']}.gwas_sumstats.rds"
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
Rscript ${modular_script_dir}/pecotmr_integration/gwas_sumstats_construct.R \
--study ${_jobs['study_id']} \
--gwas-tsv ${_jobs['gwas_tsv']} \
--ld-block ${_jobs['chr']}:${_jobs['start']}-${_jobs['end']} \
--ld-meta ${ld_meta} \
--genome ${genome} \
${('--column-mapping ' + _jobs['column_mapping']) if _jobs['column_mapping'] else ''} \
--qc-method ${qc_method} \
${'--impute' if impute else ''} \
${('--qc-args ' + repr(qc_args)) if qc_args else ''} \
--output ${_output}
[gwas_fine_mapping]
# Per-RDS fan-out: one fine-mapping task per (study, region) GwasSumStats.
output: f"{cwd}/fine_mapping/{_input:bnn}.gwas_finemap.rds", group_by = 1
task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
Rscript ${modular_script_dir}/pecotmr_integration/fine_mapping.R \
--gwas-sumstats ${_input} \
--methods ${methods} \
--coverage ${coverage} \
--secondary-coverage ${secondary_coverage} \
--min-abs-corr ${min_abs_corr} \
--pip-cutoff ${pip_cutoff} \
--L ${L} \
--L-greedy ${L_greedy} \
${('--median-abs-corr ' + str(median_abs_corr)) if str(median_abs_corr) != '' else ''} \
${('--method-args ' + repr(method_args)) if method_args else ''} \
--output ${_output}
[gwas_rss_plot]
stop_if(no_plot, '--no-plot set; skipping PIP plot step.')
output: f"{cwd}/plots/{_input:bnn}.pip_plot.png", group_by = 1
task: trunk_workers = 1, trunk_size = job_size, walltime = '30m', mem = '4G', cores = 1, tags = f"{step_name}_{_output:bn}"
bash: expand = '${ }', stderr = f"{_output}.stderr", stdout = f"{_output}.stdout", container = container
Rscript ${modular_script_dir}/pecotmr_integration/gwas_rss_plot.R \
--input ${_input} \
--output ${_output}