Regression with Summary Statistics (RSS) Fine-Mapping and TWAS with SuSiE#
Description#
Last, we include an option to conduct fine-mapping with SuSiE Regression using Summary Statistics (RSS) model and TWAS.
Input#
--gwas-meta— TSV with columnsstudy_id,path(GWAS sumstats, relative or absolute), and optionalcolumn_mapping(per-study YAML). One row per study:
study_id path column_mapping
AD_Bellenguez_2022 input/rss_analysis/protocol_example.gwas_sumstats.chr22.tsv.gz input/rss_analysis/protocol_example.gwas_column_mapping.fixed.yml
--ld-meta— LD-reference meta TSV with columns#chr,start,end,path(genotype prefix pointing to PLINK2/PLINK1 files; produced byrss_ld_sketch.ipynb):
#chr start end path
chr22 49355984 50799822 input/ld_reference/chr22_49355984_50799822
--regions— one or more regions inchr:start-endformat (e.g.chr22:49355984-50799822)--cwd— output root directory--modular-script-dir— path tocode/script/directory
Optional QC parameters (forwarded to summaryStatsQc()):
Parameter |
Default |
Description |
|---|---|---|
|
|
QC method: |
|
off |
Enable RAISS sumstat imputation for missing variants |
|
|
JSON object of extra |
|
|
Minimum absolute pairwise LD r² for credible set purity filter |
|
|
Maximum number of causal signals SuSiE-RSS searches for |
|
|
Nested per-method JSON, e.g. |
Output#
Two output subdirectories are created under --cwd:
sumstats/ — one GwasSumStats S4 RDS per (study × LD block):
{study_id}.{chrom}_{start}_{end}.gwas_sumstats.rds
fine_mapping/ — one GwasFineMappingResult S4 RDS per (study × LD block):
{study_id}.{chrom}_{start}_{end}.gwas_finemap.rds
plots/ — PIP Manhattan plots:
{study_id}.{chrom}_{start}_{end}.pip_plot.png
Minimal Working Example Steps#
v. Run the Summary Statistics Fine-Mapping#
sos run pipeline/rss_analysis.ipynb \
generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \
--cwd output/rss_analysis --modular-script-dir code/script \
--gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \
--regions chr22:49355984-50799822 \
--ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv
Anticipated Results#
Summary statistics fine-mapping produces a results file for each region and gwas of interest.
GwasFineMappingResult S4 object#
suppressPackageStartupMessages(library(pecotmr))
rds <- readRDS("output/rss_analysis/fine_mapping/AD_Bellenguez_2022.chr22_49355984_50799822.gwas_finemap.rds")
class(rds) # "GwasFineMappingResult"
ld <- attr(rds, "listData")
names(ld) # "study" "method" "region_id" "entry"
ld$study # "AD_Bellenguez_2022"
ld$method # "susie"
e1 <- ld$entry[[1]]
names(attributes(e1)) # "variantIds" "susieFit" "topLoci" "cvResult" "class"
topLoci table (one row per variant in the region):
tl <- attr(e1, "topLoci")
dim(tl) # 24 × 23
names(tl)
# [1] "variant_id" "chrom" "pos" "A1" "A2"
# [6] "N" "af" "marginal_beta" "marginal_se" "marginal_z"
# [11] "marginal_p" "pip" "posterior_mean" "posterior_sd" "cs_95"
# [16] "cs_70" "cs_50" "cs_95_purity" "method" "gene"
# [21] "event" "grange_start" "grange_end"
# Top 3 by PIP
tl_s <- tl[order(tl$pip, decreasing = TRUE), ]
tl_s[1:3, c("variant_id", "af", "marginal_z", "pip", "cs_95", "cs_95_purity")]
# variant_id af marginal_z pip cs_95 cs_95_purity
# chr22:49363017:G:A 0.317 2.753 0.382 susie_rss_0 0
# chr22:49784794:G:A 0.073 -2.280 0.292 susie_rss_0 0
# chr22:49514095:A:G 0.937 2.053 0.268 susie_rss_0 0
Toy data note: All variants fall in one large credible set (
cs_95_purity = 0), indicating insufficient LD contrast to localize the signal. In real large-sample GWAS the top CS typically contains 1–10 variants with purity ≥ 0.5.
GwasSumStats S4 object#
ss <- readRDS("output/rss_analysis/sumstats/AD_Bellenguez_2022.chr22_49355984_50799822.gwas_sumstats.rds")
class(ss) # "GwasSumStats"
# main data — GRanges with GWAS columns
ld_ss <- attr(ss, "listData")
ld_ss$study # "AD_Bellenguez_2022"
e_ss <- ld_ss$entry[[1]] # GRanges
names(mcols(e_ss)) # "SNP" "A1" "A2" "Z" "N" "MAF" "BETA" "SE" "P"
length(e_ss) # 24 (variants passing QC and matched to LD panel)
# QC audit
attr(ss, "qcInfo")$entryAudit[[1]]
# $variantsIn: 7609 (raw input variants)
# $contentFilters$nDropped: 2563
# $betaSeFromZ$nDerived: 5046 (beta/se imputed from z + N)
# $matchedAgainstSketch: 24 (variants matched to LD reference)
# $variantsOut: 24
qcInfo$entryAudit key fields:
Field |
Description |
|---|---|
|
Raw variants in the GWAS sumstats for this region |
|
Dropped by QC filters (strand-ambiguous, missing data, etc.) |
|
Variants where β/SE were imputed from Z-score and N |
|
Variants successfully matched to LD reference panel |
|
Final variant count used for fine-mapping |